
TextScan

The FilterOpticalCharacterRecognition is a pluggable filter that extracts text from image frames using Optical Character Recognition (OCR). It supports multiple OCR backends and offers flexible configuration for language support, output, and debug logging.

Features

  • Dual OCR Engine Support
    Choose between Tesseract and EasyOCR with the ocr_engine option (default: easyocr).

  • Multi-language OCR
    Use the ocr_language option to specify one or more language codes (e.g., en,fr).

  • Multi-topic Processing

    • Process multiple video regions simultaneously using pub/sub topics
    • Filter frames by topic using topic_pattern regex
    • Exclude specific topics using exclude_topics list
    • Support for exact topic names or regex patterns in exclusions
    • Main-first output ordering: the main topic is always emitted first, so the topic structure of the output stays consistent
  • Flexible Output Options

    • Write results to a JSON file (configurable via write_output_file)
    • Forward OCR results in frame metadata (configurable via forward_ocr_texts)
    • File results are written to output_json_path as newline-delimited JSON
    • Data forwarding: non-image frames are passed through when forward_upstream_data is enabled
  • Debug Mode
    Setting debug: true increases logging verbosity for troubleshooting.

  • Frame-level Skipping
    Add the metadata flag skip_ocr: true to individual frames to bypass OCR processing.

  • Custom Tesseract Path
    When using the Tesseract engine, set tesseract_cmd to the path of a custom Tesseract binary (defaults to a bundled AppImage).

  • Safe Streaming Output
    Results are flushed to disk immediately after processing each frame.

    Note: flushing after every frame may cause heavy disk I/O. A configurable flushing strategy is planned for a future release.
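The topic-selection and skipping rules above can be summarized as a small decision function. This is a minimal sketch, not the filter's actual code: should_process, order_topics and _matches are hypothetical names, the use of full-string regex matching (re.fullmatch) is an assumption since the matching mode is not documented, and treating the main topic as literally named "main" is also an assumption.

```python
import re
from typing import List, Optional

def _matches(pattern: str, topic: str) -> bool:
    """True if `topic` equals `pattern` exactly or fully matches it as a regex."""
    if topic == pattern:
        return True
    try:
        return re.fullmatch(pattern, topic) is not None
    except re.error:
        # Invalid regex: only the exact-name comparison above applies
        return False

def should_process(topic: str, meta: dict,
                   topic_pattern: Optional[str] = None,
                   exclude_topics: Optional[List[str]] = None) -> bool:
    """Hypothetical helper: decide whether a frame on `topic` goes through OCR."""
    if meta.get("skip_ocr"):
        return False  # frame-level opt-out via the skip_ocr metadata flag
    if topic_pattern is not None and re.fullmatch(topic_pattern, topic) is None:
        return False  # topic does not match the inclusion pattern
    # Exclusions may be exact topic names or regex patterns
    return not any(_matches(p, topic) for p in (exclude_topics or []))

def order_topics(topics: List[str], main: str = "main") -> List[str]:
    """Hypothetical helper: put the main topic first, keeping the others in order."""
    # sorted() is stable and False sorts before True, so only `main` moves
    return sorted(topics, key=lambda t: t != main)
```

For example, with topic_pattern=r"cam_\d+" and exclude_topics=["cam_2"], frames on cam_1 are processed while frames on cam_2 and thermal are not. Checking skip_ocr first keeps the per-frame opt-out independent of any topic configuration.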

Example Output

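When write_output_file is enabled, each processed frame appends one JSON line to output_json_path, similar to the following (pretty-printed here for readability):

{
  "topic": "camera",
  "frame_id": "abc123",
  "texts": ["Detected text line 1", "Detected text line 2"]
}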
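Because the output file is newline-delimited JSON, it can be read one record per line rather than with a single json.load call. The sketch below assumes every record carries the topic and texts fields shown in the example; iter_ocr_results and texts_by_topic are hypothetical helper names.

```python
import json
from typing import Dict, Iterator, List

def iter_ocr_results(path: str) -> Iterator[dict]:
    """Yield one OCR record per non-empty line of a newline-delimited JSON file."""
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)

def texts_by_topic(path: str) -> Dict[str, List[str]]:
    """Collect every detected text line per topic across all frames in the file."""
    grouped: Dict[str, List[str]] = {}
    for record in iter_ocr_results(path):
        grouped.setdefault(record["topic"], []).extend(record.get("texts", []))
    return grouped
```

Streaming line by line keeps memory usage flat even when the results file grows large over a long run.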

When forward_ocr_texts is enabled, results are stored in the frame metadata under the ocr_texts key, keyed by topic:

{
  "meta": {
    "ocr_texts": {
      "camera": ["Detected text line 1", "Detected text line 2"],
      "thermal": ["Temperature: 25°C"]
    }
  }
}
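A downstream step might consume the forwarded ocr_texts mapping as sketched below. This assumes the step already has the frame's metadata as a plain dict (how a real pipeline stage obtains it is not covered here); flatten_ocr_texts and contains_text are hypothetical names.

```python
from typing import Dict, List, Tuple

def flatten_ocr_texts(meta: dict) -> List[Tuple[str, str]]:
    """Return (topic, text) pairs from a frame's ocr_texts metadata, in key order."""
    # A missing key (e.g. forward_ocr_texts disabled) yields an empty result
    ocr_texts: Dict[str, List[str]] = meta.get("ocr_texts") or {}
    return [(topic, text) for topic, texts in ocr_texts.items() for text in texts]

def contains_text(meta: dict, needle: str) -> bool:
    """Case-insensitive check whether any topic's OCR output contains `needle`."""
    needle = needle.lower()
    return any(needle in text.lower() for _, text in flatten_ocr_texts(meta))
```

With the metadata example above, contains_text(meta, "temperature") returns True because of the thermal topic.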

When to Use

This filter is ideal for any pipeline that requires reading printed or handwritten text from images, such as:

  • Scanned documents
  • Signboards or product packaging in photos
  • Scene text in videos
  • Multi-camera systems with different text sources

Configuration Reference

| Key | Type | Default | Description |
|---|---|---|---|
| ocr_engine | string | "easyocr" | OCR engine to use: "tesseract" or "easyocr" |
| ocr_language | string[] | ["en"] | List of language codes for OCR |
| output_json_path | string | "./output/ocr_results.json" | Path of the newline-delimited JSON results file |
| debug | boolean | false | Enable debug logging |
| tesseract_cmd | string | Packaged AppImage path | Path to the Tesseract binary |
| forward_ocr_texts | boolean | true | Whether to forward OCR results in frame metadata |
| write_output_file | boolean | false | Whether to write results to the output file |
| topic_pattern | string | null | Regex pattern to match topic names |
| exclude_topics | string[] | [] | Topics to exclude from OCR processing (exact names or regex patterns) |
| forward_upstream_data | boolean | true | Whether to forward non-image frames through the pipeline |
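The table can be mirrored as a typed config object with basic validation, which is handy for catching typos before a pipeline starts. This is a hypothetical sketch, not the filter's real configuration class: OCRConfigSketch is an invented name, None stands in for "use the packaged AppImage", and the validation rules (known engine, non-empty language list, compilable topic_pattern) are this sketch's own choices.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class OCRConfigSketch:
    """Hypothetical mirror of the configuration table; not the filter's real class."""
    ocr_engine: str = "easyocr"
    ocr_language: List[str] = field(default_factory=lambda: ["en"])
    output_json_path: str = "./output/ocr_results.json"
    debug: bool = False
    tesseract_cmd: Optional[str] = None  # None: use the packaged AppImage
    forward_ocr_texts: bool = True
    write_output_file: bool = False
    topic_pattern: Optional[str] = None
    exclude_topics: List[str] = field(default_factory=list)
    forward_upstream_data: bool = True

    def __post_init__(self) -> None:
        if self.ocr_engine not in ("tesseract", "easyocr"):
            raise ValueError(
                f"ocr_engine must be 'tesseract' or 'easyocr', got {self.ocr_engine!r}")
        if not self.ocr_language:
            raise ValueError("ocr_language must contain at least one language code")
        if self.topic_pattern is not None:
            re.compile(self.topic_pattern)  # raises re.error on an invalid regex
```

The list fields use default_factory so that separate instances never share the same list object.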

Environment Variables

All configuration options can be overridden with environment variables named after the option in upper case with the prefix FILTER_ (for example, ocr_engine becomes FILTER_OCR_ENGINE). For example:

  • FILTER_OCR_ENGINE
  • FILTER_OCR_LANGUAGE
  • FILTER_DEBUG
  • FILTER_TOPIC_PATTERN
  • FILTER_EXCLUDE_TOPICS
  • FILTER_FORWARD_UPSTREAM_DATA

Boolean values should be set to "true" or "false" (case-insensitive). List values should be comma-separated strings.
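The override rules above (case-insensitive "true"/"false" for booleans, comma-separated strings for lists) can be sketched as follows. This is an illustration under assumptions, not the filter's parser: env_overrides is a hypothetical name, each option's type is inferred from its default value, and stripping whitespace and dropping empty list items are this sketch's own choices.

```python
import os
from typing import List, Mapping, Optional

def _parse_bool(name: str, raw: str) -> bool:
    """Accept 'true'/'false' in any letter case; reject anything else."""
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")

def _parse_list(raw: str) -> List[str]:
    """Split a comma-separated string, trimming spaces and dropping empty items."""
    return [item.strip() for item in raw.split(",") if item.strip()]

def env_overrides(defaults: dict, environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of `defaults` with FILTER_<KEY> environment variables applied."""
    environ = os.environ if environ is None else environ
    config = dict(defaults)
    for key, default in defaults.items():
        name = "FILTER_" + key.upper()
        raw = environ.get(name)
        if raw is None:
            continue  # not overridden
        if isinstance(default, bool):
            config[key] = _parse_bool(name, raw)
        elif isinstance(default, list):
            config[key] = _parse_list(raw)
        else:
            config[key] = raw  # strings and null defaults are taken verbatim
    return config
```

For instance, FILTER_OCR_LANGUAGE="en, fr" becomes ["en", "fr"] and FILTER_DEBUG=TRUE becomes True.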