TextScan
The FilterOpticalCharacterRecognition is a pluggable filter that extracts text from image frames using Optical Character Recognition (OCR). It supports multiple OCR backends and offers flexible configuration for language support, output, and debug logging.
Features
- Dual OCR Engine Support: Choose between Tesseract and EasyOCR with the `ocr_engine` option.
- Multi-language OCR: Use the `ocr_language` option to specify one or more language codes (e.g., `en`, `fr`).
- Multi-topic Processing
  - Process multiple video regions simultaneously using pub/sub topics
  - Filter frames by topic using the `topic_pattern` regex
  - Exclude specific topics using the `exclude_topics` list
  - Support for exact topic names or regex patterns in exclusions
  - Main-first output ordering, which ensures a consistent topic structure
- Flexible Output Options
  - Write results to a JSON file (configurable via `write_output_file`)
  - Forward OCR results in frame metadata (configurable via `forward_ocr_texts`)
  - Results are written to `output_json_path` as newline-delimited JSON
  - Data forwarding: non-image frames are forwarded when `forward_upstream_data` is enabled
- Debug Mode: Enabling `debug: true` increases logging verbosity for troubleshooting and transparency.
- Frame-level Skipping: Add the metadata flag `skip_ocr: true` to individual frames to bypass OCR processing.
- Custom Tesseract Path: You can specify a custom `tesseract_cmd` binary path if using the Tesseract engine (defaults to a bundled AppImage).
- Safe Streaming Output: Results are flushed to disk immediately after processing each frame.
  Note: This may lead to heavy I/O operations. A configurable flushing strategy is planned for future releases.
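The topic filtering and frame-level skipping rules listed above can be sketched in a few lines of Python. This is only an illustration, not the filter's actual implementation: the function names are hypothetical, and it assumes that `topic_pattern` is applied with `re.search`, that each `exclude_topics` entry is tried first as an exact name and then as a full-match regex, and that `skip_ocr` is read from a plain metadata dict.

```python
import re


def _is_excluded(topic, exclude_topics):
    """Return True if the topic matches any exclusion entry (hypothetical helper)."""
    for entry in exclude_topics:
        if topic == entry:
            return True  # exact topic name
        try:
            if re.fullmatch(entry, topic):
                return True  # entry interpreted as a regex
        except re.error:
            continue  # not a valid regex, so only the exact comparison applies
    return False


def should_run_ocr(topic, meta, topic_pattern=None, exclude_topics=()):
    """Decide whether a frame on `topic` should be sent to OCR (assumed semantics)."""
    if meta.get("skip_ocr") is True:
        return False  # frame-level opt-out
    if _is_excluded(topic, exclude_topics):
        return False
    if topic_pattern is not None and not re.search(topic_pattern, topic):
        return False
    return True
```

With these assumptions, `should_run_ocr("camera", {}, topic_pattern="^cam")` accepts the frame, while the same call with `exclude_topics=["camera"]` or with `{"skip_ocr": True}` as metadata rejects it.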
Example Output
Each processed frame will produce a JSON line similar to:
{
  "topic": "camera",
  "frame_id": "abc123",
  "texts": ["Detected text line 1", "Detected text line 2"]
}
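Since the output file is newline-delimited JSON, it can be read back one record at a time. The sketch below assumes each record occupies a single line in the file and carries the `topic` and `texts` fields shown above; the helper names are hypothetical.

```python
import json
from collections import defaultdict


def iter_ocr_records(lines):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def texts_by_topic(records):
    """Group detected text lines by the topic they came from."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["topic"]].extend(record.get("texts", []))
    return dict(grouped)
```

Any iterable of strings works as `lines`, including an open file handle for the path configured in `output_json_path`, because file objects iterate line by line.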
When forwarding results in metadata, they are stored under the ocr_texts key in the frame metadata, with topics as keys:
{
  "meta": {
    "ocr_texts": {
      "camera": ["Detected text line 1", "Detected text line 2"],
      "thermal": ["Temperature: 25°C"]
    }
  }
}
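A downstream consumer could pull the forwarded texts out of this structure as follows. This is a minimal sketch that assumes the frame data is available as a plain dict with the shape shown above; `flatten_ocr_texts` is a hypothetical helper, not part of the filter.

```python
def flatten_ocr_texts(frame_data):
    """Return (topic, text) pairs from the `ocr_texts` metadata, in stored order."""
    ocr_texts = frame_data.get("meta", {}).get("ocr_texts", {})
    return [
        (topic, text)
        for topic, texts in ocr_texts.items()
        for text in texts
    ]
```

Frames without OCR results simply yield an empty list, so callers do not need to check for the key first.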
When to Use
This filter is ideal for any pipeline that requires reading printed or handwritten text from images, such as:
- Scanned documents
- Signboards or product packaging in photos
- Scene text in videos
- Multi-camera systems with different text sources
Configuration Reference
| Key | Type | Default | Description |
|---|---|---|---|
| ocr_engine | string | "easyocr" | OCR engine to use: "tesseract" or "easyocr" |
| ocr_language | string[] | ["en"] | List of language codes for OCR |
| output_json_path | string | "./output/ocr_results.json" | Path to save output results |
| debug | boolean | false | Enable debug logging |
| tesseract_cmd | string | Packaged AppImage path | Path to Tesseract binary |
| forward_ocr_texts | boolean | true | Whether to forward OCR results in frame metadata |
| write_output_file | boolean | false | Whether to write results to output file |
| topic_pattern | string | null | Regex pattern to match topic names |
| exclude_topics | string[] | [] | List of topics to exclude from OCR processing |
| forward_upstream_data | boolean | true | Whether to forward non-image frames through the pipeline |
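For reference, the defaults in the table can be mirrored in a small Python dataclass. This is a hypothetical sketch for illustration only: `OcrFilterOptions` is not the filter's real configuration class, and `None` stands in for the packaged AppImage path, which is not spelled out here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OcrFilterOptions:  # hypothetical name, mirrors the table above
    ocr_engine: str = "easyocr"
    ocr_language: List[str] = field(default_factory=lambda: ["en"])
    output_json_path: str = "./output/ocr_results.json"
    debug: bool = False
    tesseract_cmd: Optional[str] = None  # None = use the packaged AppImage (assumption)
    forward_ocr_texts: bool = True
    write_output_file: bool = False
    topic_pattern: Optional[str] = None
    exclude_topics: List[str] = field(default_factory=list)
    forward_upstream_data: bool = True

    def __post_init__(self):
        # The table lists exactly two supported engines.
        if self.ocr_engine not in ("tesseract", "easyocr"):
            raise ValueError(f"unsupported ocr_engine: {self.ocr_engine!r}")
```

For example, `OcrFilterOptions(ocr_engine="tesseract", ocr_language=["en", "fr"], write_output_file=True)` describes a Tesseract setup that also writes results to the default output path.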
Environment Variables
All configuration options can be overridden using environment variables with the prefix FILTER_. For example:
- FILTER_OCR_ENGINE
- FILTER_OCR_LANGUAGE
- FILTER_DEBUG
- FILTER_TOPIC_PATTERN
- FILTER_EXCLUDE_TOPICS
- FILTER_FORWARD_UPSTREAM_DATA
Boolean values should be set to "true" or "false" (case-insensitive). List values should be comma-separated strings.
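The parsing rules for booleans and lists can be illustrated with a short sketch. It is not the filter's own loader: the function names are hypothetical, and it assumes each variable name is `FILTER_` followed by the upper-cased option key from the configuration table.

```python
import os

# Option keys grouped by type, taken from the configuration table.
BOOL_KEYS = {"debug", "forward_ocr_texts", "write_output_file", "forward_upstream_data"}
LIST_KEYS = {"ocr_language", "exclude_topics"}
STRING_KEYS = {"ocr_engine", "output_json_path", "tesseract_cmd", "topic_pattern"}


def parse_bool(raw):
    """Parse "true"/"false" case-insensitively; reject anything else."""
    value = raw.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {raw!r}")
    return value == "true"


def overrides_from_env(environ=None):
    """Collect FILTER_* overrides into a dict keyed by option name."""
    environ = os.environ if environ is None else environ
    overrides = {}
    for name, raw in environ.items():
        if not name.startswith("FILTER_"):
            continue
        key = name[len("FILTER_"):].lower()
        if key in BOOL_KEYS:
            overrides[key] = parse_bool(raw)
        elif key in LIST_KEYS:
            # Comma-separated string -> list, ignoring empty items.
            overrides[key] = [item.strip() for item in raw.split(",") if item.strip()]
        elif key in STRING_KEYS:
            overrides[key] = raw
    return overrides
```

Under these assumptions, setting `FILTER_DEBUG=True` and `FILTER_OCR_LANGUAGE="en, fr"` produces the overrides `debug=True` and `ocr_language=["en", "fr"]`; variables without the prefix, or with keys not in the table, are ignored.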