OCR full analysis

The "OCR full analysis" service task combines text extraction and structure analysis in a single processing step. Along with the complete running text, it recognizes tables, headings, and layout regions in the same pass.

This makes it ideal for documents that contain both text paragraphs and tabular content — such as invoices with line items, delivery notes, or forms with free-text fields.


Input parameters

Provide the following fields as task input:

{
  "pdf": {
    "referenceId": "...",
    "filename": "delivery-note.pdf"
  },
  "enableHandwriting": true,
  "outputFormat": "json"
}

Explanation:

  • pdf.referenceId: Reference to the uploaded file (PDF or image). The file must be available as a fileReference.
  • pdf.filename: The filename including extension. Supported formats: PDF, JPG, PNG, BMP, TIFF.
  • enableHandwriting: Optional (true/false, default: true). Enables handwriting recognition and automatic document de-skewing.
  • outputFormat: Optional (json or markdown, default: json). When set to markdown, the result is returned as structured Markdown text.
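
The parameter rules above can be checked before the task is started. The following Python sketch shows one possible way to do that. The helper name build_task_input and the client-side validation are assumptions made for illustration and are not part of the service; the extension list is derived from the supported formats named above.

```python
from pathlib import Path

# Extensions derived from the formats listed above. Assumption: whether
# aliases such as .jpeg or .tif are accepted is not stated in this document,
# so they are left out here.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png", ".bmp", ".tiff"}
OUTPUT_FORMATS = {"json", "markdown"}


def build_task_input(reference_id, filename, enable_handwriting=True, output_format="json"):
    """Return the task input as a dict, raising ValueError on invalid values.

    Hypothetical helper; the defaults mirror the documented ones.
    """
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(OUTPUT_FORMATS)}")
    return {
        "pdf": {"referenceId": reference_id, "filename": filename},
        "enableHandwriting": enable_handwriting,
        "outputFormat": output_format,
    }


# "ref-123" is a placeholder, not a real file reference.
task_input = build_task_input("ref-123", "delivery-note.pdf", output_format="markdown")
```

Validating early keeps an unsupported file from occupying a long-running OCR step.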

Output

JSON format (default)

With outputFormat: "json", the result includes both text and structural data per page:

{
  "metadata": {
    "source_file": "delivery-note.pdf",
    "total_pages": 1,
    "total_text_blocks": 49,
    "total_tables": 1
  },
  "pages": [
    {
      "page_number": 0,
      "text": "Delivery Note No. 7208166\nDate: 2025-03-19",
      "text_blocks": [
        {
          "text": "Delivery Note No. 7208166",
          "confidence": 0.98,
          "bbox": { "x_min": 50, "y_min": 30, "x_max": 400, "y_max": 60 }
        }
      ],
      "tables": [
        {
          "index": 0,
          "html": "<table><tr><td>Item</td><td>Qty</td><td>Product</td></tr><tr><td>1</td><td>50</td><td>Bolt M8</td></tr></table>"
        }
      ]
    }
  ],
  "full_text": "Delivery Note No. 7208166\nDate: 2025-03-19\nCustomer: Sample Ltd."
}
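
The per-block confidence values can be used to flag passages for manual review. The Python sketch below walks the JSON result and collects blocks below a threshold. The function name low_confidence_blocks and the threshold of 0.9 are assumptions chosen for illustration, and the second text block in the sample data is invented.

```python
def low_confidence_blocks(result, threshold=0.9):
    """Collect text blocks whose confidence is below the threshold.

    Hypothetical helper; the threshold is an illustrative value, not a
    recommendation from the service.
    """
    flagged = []
    for page in result.get("pages", []):
        for block in page.get("text_blocks", []):
            if block["confidence"] < threshold:
                flagged.append({
                    "page_number": page["page_number"],
                    "text": block["text"],
                    "bbox": block["bbox"],
                })
    return flagged


# Sample shaped like the JSON output above; the second block and its
# values are made up for illustration.
sample = {
    "pages": [
        {
            "page_number": 0,
            "text_blocks": [
                {"text": "Delivery Note No. 7208166", "confidence": 0.98,
                 "bbox": {"x_min": 50, "y_min": 30, "x_max": 400, "y_max": 60}},
                {"text": "Date: 2025-03-19", "confidence": 0.71,
                 "bbox": {"x_min": 50, "y_min": 70, "x_max": 260, "y_max": 95}},
            ],
        }
    ]
}

for block in low_confidence_blocks(sample):
    print(block["page_number"], block["text"], block["bbox"])
```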

Markdown format

With outputFormat: "markdown", the result is returned as combined Markdown text:

{
  "markdown": "## Delivery Note No. 7208166\n\nDate: 2025-03-19\n\n| Item | Qty | Product |\n|------|-----|---------|\n| 1 | 50 | Bolt M8 |",
  "ocr_text": "Delivery Note No. 7208166\nDate: 2025-03-19\nItem Qty Product\n1 50 Bolt M8"
}

Explanation:

  • text_blocks: Individual text blocks with position and confidence score — for precise text localization.
  • tables: Recognized tables as HTML.
  • full_text: The combined text from all pages as plain text — ideal as input for AI services.
  • markdown: Structured Markdown text with tables and paragraphs (only with outputFormat: "markdown").
  • ocr_text: Additional plain text independent of the structure analysis (only with outputFormat: "markdown").
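
Because tables arrive as HTML strings, a consumer usually has to turn them back into rows and cells. The sketch below does this with Python's standard html.parser module. The class and function names are hypothetical, and it assumes simple tables like the one in the example (no nested tables, no colspan or rowspan handling).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell texts of a simple <table> as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # tolerate a cell that appears without a <tr>
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_to_rows(html):
    """Parse one table's HTML into a list of rows, each a list of strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# The HTML string is taken from the JSON example above.
html = ("<table><tr><td>Item</td><td>Qty</td><td>Product</td></tr>"
        "<tr><td>1</td><td>50</td><td>Bolt M8</td></tr></table>")
rows = table_to_rows(html)
header, items = rows[0], rows[1:]
```

Treating the first row as the header is a convention of this sketch; the example output marks header cells with td rather than th, so the parser cannot tell them apart on its own.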

JSONata examples

/* Reference a file from an upload step */
{
  "pdf": {
    "referenceId": uploadResult.referenceId,
    "filename": uploadResult.filename
  },
  "enableHandwriting": true,
  "outputFormat": "json"
}

/* Pass the recognized text to an AI service */
{
  "text": ocrResult.full_text,
  "prompt": "Extract the invoice number, date, and total amount."
}

/* Check whether tables were recognized (gateway condition) */
ocrResult.metadata.total_tables > 0

Notes

  • Processing time depends on document size and is typically 30–180 seconds, because both text and structure recognition are performed.
  • When only the running text is needed (without tables), the "OCR text extraction" service task is faster and more resource-efficient.
  • When only the table structure is relevant, the "OCR structure analysis" service task can be used instead.
  • All three OCR variants support image files (JPG, PNG, BMP, TIFF) in addition to PDF — photographed documents can be processed directly.

Tip

The Markdown format (outputFormat: "markdown") works well as input for AI services. The structured text with table formatting helps the AI recognize items, quantities, and amounts more reliably than plain text alone.
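
A downstream step can prefer the Markdown text when it is present and fall back to full_text otherwise. The following Python sketch illustrates that choice. The function name build_ai_input is hypothetical, and the "text" and "prompt" fields are assumed to match the AI service input shown in the JSONata examples.

```python
def build_ai_input(ocr_result, prompt):
    """Build an AI service input, preferring Markdown over plain text.

    Hypothetical helper: 'markdown' exists only with outputFormat "markdown",
    'full_text' only with outputFormat "json".
    """
    text = ocr_result.get("markdown") or ocr_result.get("full_text")
    if not text:
        raise ValueError("OCR result contains neither 'markdown' nor 'full_text'")
    return {"text": text, "prompt": prompt}


# Shortened sample shaped like the Markdown output documented above.
markdown_result = {
    "markdown": "## Delivery Note No. 7208166\n\n| Item | Qty | Product |\n|------|-----|---------|\n| 1 | 50 | Bolt M8 |",
    "ocr_text": "Delivery Note No. 7208166\nItem Qty Product\n1 50 Bolt M8",
}
ai_input = build_ai_input(markdown_result, "Extract the item numbers and quantities.")
```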