
OCR structure analysis

The "OCR structure analysis" service task automatically recognizes the layout structure of a document — including tables, headings, text areas, and other visual elements. Unlike plain text extraction, this service analyzes the spatial arrangement of the document.

This enables tabular content such as item lists on delivery notes, invoice line items, or structured forms to be captured automatically and processed in subsequent steps.


Input parameters

Provide the following fields as task input:

{
  "pdf": {
    "referenceId": "...",
    "filename": "invoice.pdf"
  },
  "enableHandwriting": true,
  "outputFormat": "json"
}

Explanation:

  • pdf.referenceId: Reference to the uploaded file (PDF or image). The file must be available as a fileReference.
  • pdf.filename: The filename including extension. Supported formats: PDF, JPG, PNG, BMP, TIFF.
  • enableHandwriting: Optional (true/false, default: true). Enables handwriting recognition and automatic document de-skewing.
  • outputFormat: Optional (json or markdown, default: json). When set to markdown, the recognized structure is returned as Markdown text — particularly useful as input for AI services.
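
If the task input is assembled in code before the process starts, the constraints listed above can be checked up front. The following is a minimal sketch, not part of the service itself: the helper name build_task_input is hypothetical, and the extension check only covers the formats named in the list above (whether variants such as .jpeg or .tif are accepted is not stated here).

```python
from pathlib import PurePosixPath

# Formats named in the parameter description; other spellings are not confirmed.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png", ".bmp", ".tiff"}


def build_task_input(reference_id, filename, enable_handwriting=True, output_format="json"):
    """Assemble the task input (hypothetical helper), rejecting undocumented values."""
    extension = PurePosixPath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or '(none)'}")
    if output_format not in ("json", "markdown"):
        raise ValueError(f"outputFormat must be 'json' or 'markdown', got {output_format!r}")
    return {
        "pdf": {"referenceId": reference_id, "filename": filename},
        "enableHandwriting": enable_handwriting,
        "outputFormat": output_format,
    }


task_input = build_task_input("ref-123", "invoice.pdf", output_format="markdown")
```

Both optional fields are written out explicitly here so the resulting input is easy to inspect; omitting them should yield the documented defaults.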

Output

JSON format (default)

With outputFormat: "json", the recognized structure is returned page by page:

{
  "metadata": {
    "source_file": "invoice.pdf",
    "total_pages": 1,
    "total_text_blocks": 0,
    "total_tables": 1
  },
  "pages": [
    {
      "page_number": 0,
      "text": "Item | Qty | Product | Price",
      "tables": [
        {
          "index": 0,
          "html": "<table><tr><td>Item</td><td>Qty</td><td>Product</td><td>Price</td></tr><tr><td>1</td><td>50</td><td>Bolt M8</td><td>€0.12</td></tr></table>"
        }
      ]
    }
  ],
  "full_text": "Item | Qty | Product | Price"
}
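
Because each table arrives as an HTML string, a later step can turn it into rows with nothing more than a standard HTML parser. The sketch below uses Python's built-in html.parser and assumes that the first table row holds the column headers, as in the sample above; that is an assumption about the data, not documented behaviour. Merged cells (colspan/rowspan) and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of a simple <table> as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def table_records(html):
    """Return one dict per body row, keyed by the cells of the first row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header = [cell.strip() for cell in parser.rows[0]]
    return [dict(zip(header, (cell.strip() for cell in row))) for row in parser.rows[1:]]


# Table HTML copied from the sample output above.
sample_html = (
    "<table><tr><td>Item</td><td>Qty</td><td>Product</td><td>Price</td></tr>"
    "<tr><td>1</td><td>50</td><td>Bolt M8</td><td>€0.12</td></tr></table>"
)
print(table_records(sample_html))
# [{'Item': '1', 'Qty': '50', 'Product': 'Bolt M8', 'Price': '€0.12'}]
```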

Markdown format

With outputFormat: "markdown", the result is returned as Markdown text:

{
  "markdown": "## Invoice\n\n| Item | Qty | Product | Price |\n|------|-----|---------|-------|\n| 1 | 50 | Bolt M8 | €0.12 |",
  "ocr_text": "Invoice\nItem Qty Product Price\n1 50 Bolt M8 €0.12"
}

Explanation:

  • tables: (JSON format) Recognized tables as HTML, one entry per table on each page, ready for parsing in subsequent steps.
  • markdown: (Markdown format) Structured Markdown text preserving tables, headings, and paragraphs.
  • ocr_text: (Markdown format) Plain text returned alongside the Markdown; it also captures text areas outside of tables.
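
When a later step needs the table rows from the Markdown result rather than from the HTML, the pipe-delimited lines can be split directly. This is a rough sketch under the assumption that tables look like the sample above (one row per line, leading and trailing pipes); the function name markdown_table_rows is hypothetical, and escaped pipes inside cells are not handled.

```python
def markdown_table_rows(markdown):
    """Return the cells of every pipe-table line, skipping separator rows."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # headings, paragraphs, blank lines
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # A separator row consists only of dashes and optional alignment colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows


# Markdown string copied from the sample output above.
sample_markdown = (
    "## Invoice\n\n| Item | Qty | Product | Price |\n"
    "|------|-----|---------|-------|\n| 1 | 50 | Bolt M8 | €0.12 |"
)
print(markdown_table_rows(sample_markdown))
# [['Item', 'Qty', 'Product', 'Price'], ['1', '50', 'Bolt M8', '€0.12']]
```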

JSONata examples

/* Reference a file from a previous step */
{
  "pdf": {
    "referenceId": uploadResult.referenceId,
    "filename": uploadResult.filename
  },
  "outputFormat": "markdown"
}

/* Check whether tables were recognized on the first page (gateway condition) */
$count(ocrResult.pages[0].tables) > 0
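
The gateway condition above inspects only the first page. If the same check is needed in code outside the process engine, for example in a test, it can be extended to every page. A minimal Python sketch, assuming the JSON output shape shown earlier; the function name has_tables is hypothetical:

```python
def has_tables(ocr_result):
    """True if any page of a JSON-format result contains at least one table."""
    return any(page.get("tables") for page in ocr_result.get("pages", []))


# Reduced version of the sample JSON output, plus a second page without tables.
sample_result = {
    "pages": [
        {"page_number": 0, "tables": [{"index": 0, "html": "<table></table>"}]},
        {"page_number": 1, "tables": []},
    ]
}
print(has_tables(sample_result))  # True
```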

Notes

  • Processing time varies depending on document size — typically 20–120 seconds, somewhat longer than plain text extraction.
  • The Markdown format is particularly well suited as input for AI services such as "AI: Extract structured data" or "AI: Query JSON content".
  • Tables are returned as HTML, making programmatic processing straightforward.

Tip

For documents where both the running text and the table structure are needed, the "OCR full analysis" service task is recommended — it combines text extraction and structure analysis in a single pass.