Class OcrActor
- All Implemented Interfaces:
com.scivicslab.pojoactor.core.CallableByActionName, AutoCloseable
Actor that reads OCR TSV files and provides page text one page at a time.
OCR TSV format: hiragana TAB kanji TAB page (hiragana column is always empty).
Supported actions:
loadFile - Load a single OCR TSV file
nextPage - Advance to the next page; returns failure when exhausted
getPageText - Get the current page's OCR text (newline-joined fragments)
getPageInfo - Get the current page number and source filename
-
Field Summary
Fields inherited from class com.scivicslab.pojoactor.core.ActorRef
actorName, actorSystem, object -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
com.scivicslab.pojoactor.core.ActionResult | getPageInfo(String args) | Get metadata about the current page: "pageNum\tsourceFile".
com.scivicslab.pojoactor.core.ActionResult | getPageText(String args) | Get the OCR text of the current page (fragments joined by newlines).
com.scivicslab.pojoactor.core.ActionResult | loadFile | Load an OCR TSV file.
com.scivicslab.pojoactor.core.ActionResult | nextPage | Advance to the next page.
Methods inherited from class com.scivicslab.turingworkflow.workflow.IIActorRef
callByActionName, hasAnnotatedAction, invokeAnnotatedAction, parseFirstArgument
Methods inherited from class com.scivicslab.pojoactor.core.ActorRef
ask, ask, askNow, clearJsonState, clearPendingMessages, close, createChild, expandVariables, getJsonBoolean, getJsonInt, getJsonString, getJsonString, getLastResult, getName, getNamesOfChildren, getParentName, hasJson, hasJsonState, initLogger, isAlive, json, putJson, setLastResult, setParentName, system, tell, tell, tellNow, toStringOfJson, toStringOfYaml
-
Constructor Details
-
OcrActor
-
-
Method Details
-
loadFile
Load an OCR TSV file. Groups kanji-column text by page number. -
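The grouping step can be illustrated with a minimal, standalone sketch. It does not use the OcrActor API: the class name OcrTsvGroupingSketch and the method groupByPage are hypothetical, and the sketch assumes that page values are integers, that malformed or non-numeric rows are skipped, and that pages keep the order in which they first appear in the file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of OcrActor: groups kanji-column text by page number.
final class OcrTsvGroupingSketch {

    /** Groups column 1 (kanji) by column 2 (page), preserving first-seen page order. */
    static Map<Integer, List<String>> groupByPage(List<String> lines) {
        Map<Integer, List<String>> pages = new LinkedHashMap<>();
        for (String line : lines) {
            // A limit of -1 keeps the empty leading hiragana column.
            String[] cols = line.split("\t", -1);
            if (cols.length < 3) {
                continue; // assumption: rows with fewer than three columns are skipped
            }
            int page;
            try {
                page = Integer.parseInt(cols[2].trim());
            } catch (NumberFormatException e) {
                continue; // assumption: rows with a non-numeric page (e.g. a header) are skipped
            }
            pages.computeIfAbsent(page, k -> new ArrayList<>()).add(cols[1]);
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "\tfragment A\t1",
                "\tfragment B\t1",
                "\tfragment C\t2");
        groupByPage(lines).forEach((page, fragments) ->
                System.out.println(page + " -> " + String.join(" / ", fragments)));
    }
}
```

A LinkedHashMap is used here so that iteration follows file order; whether the actor itself sorts pages numerically is not stated in this documentation.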
nextPage
Advance to the next page. Returns failure (false) once all pages are exhausted, which causes the workflow to try the next row (e.g., a transition to the end state). -
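The advance-until-failure behavior can be modeled with a small cursor. This is a sketch only, not the actor's implementation: PageCursorSketch and its methods are hypothetical names, and it assumes the cursor starts before the first page, so that the first nextPage call moves onto page one.

```java
import java.util.List;

// Hypothetical stand-in for the actor's paging state; not the OcrActor implementation.
final class PageCursorSketch {
    private final List<List<String>> pages;
    private int index = -1; // assumption: the cursor starts before the first page

    PageCursorSketch(List<List<String>> pages) {
        this.pages = pages;
    }

    /** Moves to the next page; returns false once every page has been visited. */
    boolean nextPage() {
        if (index + 1 >= pages.size()) {
            return false;
        }
        index++;
        return true;
    }

    /** Returns the current page's fragments joined by newlines. */
    String pageText() {
        if (index < 0) {
            throw new IllegalStateException("call nextPage() before pageText()");
        }
        return String.join("\n", pages.get(index));
    }

    public static void main(String[] args) {
        PageCursorSketch cursor = new PageCursorSketch(
                List.of(List.of("fragment A", "fragment B"), List.of("fragment C")));
        // The loop ends when nextPage() reports exhaustion, mirroring the workflow's fallback.
        while (cursor.nextPage()) {
            System.out.println(cursor.pageText());
            System.out.println("--");
        }
    }
}
```

In a workflow, the false return plays the role of the loop's exit condition: the row that calls nextPage fails, and a later row handles the "no more pages" case.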
getPageText
Get the OCR text of the current page (fragments joined by newlines). -
getPageInfo
Get metadata about the current page: "pageNum\tsourceFile".
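A caller that receives the "pageNum\tsourceFile" string may want to split it into its two fields. The following is a minimal sketch under stated assumptions: PageInfoSketch and PageInfo are hypothetical names, the page number is assumed to be an integer, and the way the string is obtained from the returned ActionResult is not shown because that accessor is not documented here.

```java
// Hypothetical parser for the "pageNum<TAB>sourceFile" payload; not part of OcrActor.
final class PageInfoSketch {

    /** Simple value holder for the two fields of the payload. */
    static final class PageInfo {
        final int pageNum;
        final String sourceFile;

        PageInfo(int pageNum, String sourceFile) {
            this.pageNum = pageNum;
            this.sourceFile = sourceFile;
        }
    }

    /** Splits on the first tab only, so a filename containing tabs stays intact. */
    static PageInfo parse(String payload) {
        String[] parts = payload.split("\t", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected pageNum<TAB>sourceFile: " + payload);
        }
        // Assumption: the page number is an integer.
        return new PageInfo(Integer.parseInt(parts[0].trim()), parts[1]);
    }

    public static void main(String[] args) {
        PageInfo info = parse("12\tvolume1.tsv");
        System.out.println("page " + info.pageNum + " of " + info.sourceFile);
    }
}
```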
-