OCR text extraction

The "OCR text extraction" service task automatically recognizes all text content in a PDF document or image and returns it as structured text. Recognition is powered by a high-performance OCR engine (PaddleOCR) and supports both printed text and handwriting.

This enables incoming documents such as delivery notes, invoices, or forms to be read automatically and processed in subsequent steps — for example, for AI-powered classification or data extraction.


Input parameters

Provide the following fields as task input:

{
  "pdf": {
    "referenceId": "...",
    "filename": "delivery-note.pdf"
  },
  "enableHandwriting": true,
  "outputFormat": "json"
}

Explanation:

  • pdf.referenceId: Reference to the uploaded file (PDF or image). The file must be available as a fileReference, e.g. from a previous upload step.
  • pdf.filename: The filename including extension. Supported formats: PDF, JPG, PNG, BMP, TIFF.
  • enableHandwriting: Optional (true/false, default: true). Enables handwriting recognition and automatic document de-skewing. For purely printed documents, this can be set to false to speed up processing.
  • outputFormat: Optional (json or markdown, default: json). Determines the output format.
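The parameter rules above can be checked before the task is started. The following is a minimal Python sketch, not part of the service itself: the function name build_ocr_input and the constants are hypothetical, and the extension set only covers the spellings listed above (variants such as .jpeg or .tif are not documented here, so they are rejected).

```python
from pathlib import PurePath

# Formats and output modes listed in the parameter description above.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png", ".bmp", ".tiff"}
OUTPUT_FORMATS = {"json", "markdown"}


def build_ocr_input(reference_id, filename, enable_handwriting=True, output_format="json"):
    """Assemble the task input, rejecting values the description does not list.

    Hypothetical helper: the defaults mirror the documented defaults
    (enableHandwriting: true, outputFormat: json).
    """
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return {
        "pdf": {"referenceId": reference_id, "filename": filename},
        "enableHandwriting": enable_handwriting,
        "outputFormat": output_format,
    }
```

For a purely printed document, build_ocr_input("ref-123", "delivery-note.pdf", enable_handwriting=False) would produce the input shown above with handwriting recognition switched off; "ref-123" is a placeholder, not a real reference.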

Output

With the default output format (json), the task returns the recognized text page by page:

{
  "metadata": {
    "source_file": "delivery-note.pdf",
    "total_pages": 1,
    "total_text_blocks": 49,
    "total_tables": 0
  },
  "pages": [
    {
      "page_number": 0,
      "text": "Delivery Note No. 7208166\nDate: 2025-03-19\nCustomer: Sample Ltd.",
      "text_blocks": [
        {
          "text": "Delivery Note No. 7208166",
          "confidence": 0.98,
          "bbox": { "x_min": 50, "y_min": 30, "x_max": 400, "y_max": 60 }
        }
      ]
    }
  ],
  "full_text": "Delivery Note No. 7208166\nDate: 2025-03-19\nCustomer: Sample Ltd."
}

Explanation:

  • metadata: Summary including source file (source_file), page count (total_pages), and the total numbers of recognized text blocks (total_text_blocks) and tables (total_tables).
  • pages: Array with one entry per recognized document page.
    • page_number: The number of the page; the only page in the example above is numbered 0.
    • text: The complete recognized text of the page as plain text.
    • text_blocks: Individual text blocks with position (bbox) and confidence score (confidence, 0–1).
  • full_text: The combined text from all pages — ideal as input for downstream AI services.
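To illustrate how the structure described above can be traversed, here is a minimal Python sketch that flattens all text blocks into one sequence. The function name iter_text_blocks is hypothetical, the sample data is a shortened copy of the example output, and the unit of the bbox coordinates is not stated in this document, so width and height are simply coordinate differences.

```python
def iter_text_blocks(result):
    """Yield (page_number, text, confidence, width, height) for every text block."""
    for page in result.get("pages", []):
        for block in page.get("text_blocks", []):
            bbox = block["bbox"]
            # Box size as the difference of the corner coordinates.
            width = bbox["x_max"] - bbox["x_min"]
            height = bbox["y_max"] - bbox["y_min"]
            yield page["page_number"], block["text"], block["confidence"], width, height


# Shortened copy of the example output above.
sample = {
    "pages": [
        {
            "page_number": 0,
            "text": "Delivery Note No. 7208166",
            "text_blocks": [
                {
                    "text": "Delivery Note No. 7208166",
                    "confidence": 0.98,
                    "bbox": {"x_min": 50, "y_min": 30, "x_max": 400, "y_max": 60},
                }
            ],
        }
    ],
    "full_text": "Delivery Note No. 7208166",
}

for row in iter_text_blocks(sample):
    print(row)
```

A flat sequence like this is convenient when the position of a block matters, for example to locate a header line near the top of the page; for plain text processing, full_text is usually sufficient.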

JSONata examples

/* Reference a file from a previous upload step */
{
  "pdf": {
    "referenceId": fileUploadStep.referenceId,
    "filename": fileUploadStep.filename
  },
  "enableHandwriting": true
}

/* Pass the extracted text to an AI service */
{
  "text": ocrResult.full_text,
  "categories": ["Invoice", "Delivery Note", "Quote", "Other"]
}

Notes

  • Processing time varies depending on document size and complexity — typically 10–90 seconds.
  • With handwriting recognition enabled (enableHandwriting: true), the document is automatically de-skewed, which improves recognition for tilted scans or photographed documents.
  • Multi-page PDFs are fully processed — each page appears as a separate element in the pages array.
  • The confidence value of each text block is useful for quality assessment — values below 0.7 indicate uncertain recognition.
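The confidence check from the last note can be sketched in Python as follows. This is an illustration under assumptions, not a built-in feature: the function name low_confidence_blocks and the sample data are hypothetical, and 0.7 is the threshold named in the note.

```python
def low_confidence_blocks(result, threshold=0.7):
    """Return (page_number, text, confidence) for blocks scoring below threshold."""
    flagged = []
    for page in result.get("pages", []):
        for block in page.get("text_blocks", []):
            if block["confidence"] < threshold:
                flagged.append((page["page_number"], block["text"], block["confidence"]))
    return flagged


# Hypothetical result with one confident and one uncertain block.
ocr_result = {
    "pages": [
        {
            "page_number": 0,
            "text_blocks": [
                {"text": "Delivery Note No. 7208166", "confidence": 0.98},
                {"text": "Rec'd by J. Smth", "confidence": 0.55},
            ],
        }
    ]
}

for page_number, text, confidence in low_confidence_blocks(ocr_result):
    print(f"page {page_number}: {text!r} ({confidence:.2f})")
```

Blocks flagged this way could be routed to a manual review step instead of being passed on unchecked; whether that is worthwhile depends on how critical the extracted values are.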

Tip

Combining this with the "AI: Classify document" or "AI: Extract structured data" service task is particularly effective: first, the text is recognized via OCR, and then it is automatically classified or converted into structured data fields.