OCR text extraction
The "OCR text extraction" service task automatically recognizes all text content in a PDF document or image and returns it as structured text. Recognition is powered by a high-performance OCR engine (PaddleOCR) and supports both printed text and handwriting.
This enables incoming documents such as delivery notes, invoices, or forms to be read automatically and processed in subsequent steps — for example, for AI-powered classification or data extraction.
Input parameters
Provide the following fields as task input:
{
  "pdf": {
    "referenceId": "...",
    "filename": "delivery-note.pdf"
  },
  "enableHandwriting": true,
  "outputFormat": "json"
}
Explanation:
- pdf.referenceId: Reference to the uploaded file (PDF or image). The file must be available as a fileReference, e.g. from a previous upload step.
- pdf.filename: The filename including extension. Supported formats: PDF, JPG, PNG, BMP, TIFF.
- enableHandwriting: Optional (true/false, default: true). Enables handwriting recognition and automatic document de-skewing. For purely printed documents, this can be set to false to speed up processing.
- outputFormat: Optional (json or markdown, default: json). Determines the output format.
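As an illustration of the parameters described above, the following Python sketch assembles the task input and rejects unsupported values before the task is called. The helper name build_ocr_input is hypothetical and not part of the service. The extension set is a literal reading of the supported formats listed here; whether variants such as .jpeg or .tif are accepted is not stated and is left out.

```python
from pathlib import Path

# Literal reading of the supported formats in the parameter description.
# Assumption: variants such as ".jpeg" or ".tif" are not covered here.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".png", ".bmp", ".tiff"}
SUPPORTED_OUTPUT_FORMATS = {"json", "markdown"}


def build_ocr_input(reference_id, filename, enable_handwriting=True, output_format="json"):
    """Assemble the task input (hypothetical helper, not part of the service)."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or '(none)'}")
    if output_format not in SUPPORTED_OUTPUT_FORMATS:
        raise ValueError(f"Unsupported output format: {output_format}")
    return {
        "pdf": {"referenceId": reference_id, "filename": filename},
        "enableHandwriting": enable_handwriting,
        "outputFormat": output_format,
    }


# Printed-only document: handwriting recognition switched off to save time.
payload = build_ocr_input("ref-123", "invoice.PDF", enable_handwriting=False)
```

Checking the extension up front is only a convenience; the service itself remains the authority on which files it accepts.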
Output
The task returns recognized text page by page:
{
  "metadata": {
    "source_file": "delivery-note.pdf",
    "total_pages": 1,
    "total_text_blocks": 49,
    "total_tables": 0
  },
  "pages": [
    {
      "page_number": 0,
      "text": "Delivery Note No. 7208166\nDate: 2025-03-19\nCustomer: Sample Ltd.",
      "text_blocks": [
        {
          "text": "Delivery Note No. 7208166",
          "confidence": 0.98,
          "bbox": { "x_min": 50, "y_min": 30, "x_max": 400, "y_max": 60 }
        }
      ]
    }
  ],
  "full_text": "Delivery Note No. 7208166\nDate: 2025-03-19\nCustomer: Sample Ltd."
}
Explanation:
- metadata: Summary including source file, page count, and total number of recognized text blocks.
- pages: Array with one entry per recognized document page.
  - text: The complete recognized text of the page as plain text.
  - text_blocks: Individual text blocks with position (bbox) and confidence score (confidence, 0–1).
- full_text: The combined text from all pages, ideal as input for downstream AI services.
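To show how the position data in text_blocks can be used, here is a minimal Python sketch that picks out the blocks lying fully inside a rectangle, for example the header area of a page. The helper name blocks_in_region and the chosen region are illustrative assumptions. The coordinate unit of bbox is not stated in this documentation; the sketch only assumes that the region uses the same unit as the output.

```python
def blocks_in_region(page, x_min, y_min, x_max, y_max):
    """Return the text of all blocks whose bbox lies fully inside the rectangle."""
    matches = []
    for block in page.get("text_blocks", []):
        box = block["bbox"]
        inside = (
            box["x_min"] >= x_min
            and box["y_min"] >= y_min
            and box["x_max"] <= x_max
            and box["y_max"] <= y_max
        )
        if inside:
            matches.append(block["text"])
    return matches


# Page entry taken from the sample output above.
sample_page = {
    "page_number": 0,
    "text_blocks": [
        {
            "text": "Delivery Note No. 7208166",
            "confidence": 0.98,
            "bbox": {"x_min": 50, "y_min": 30, "x_max": 400, "y_max": 60},
        }
    ],
}

# Hypothetical header region covering the top of the page.
header_texts = blocks_in_region(sample_page, 0, 0, 500, 100)
```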
JSONata examples
/* Reference a file from a previous upload step */
{
  "pdf": {
    "referenceId": fileUploadStep.referenceId,
    "filename": fileUploadStep.filename
  },
  "enableHandwriting": true
}
/* Pass the extracted text to an AI service */
{
  "text": ocrResult.full_text,
  "categories": ["Invoice", "Delivery Note", "Quote", "Other"]
}
Notes
- Processing time varies with document size and complexity and is typically 10–90 seconds.
- With handwriting recognition enabled (enableHandwriting: true), the document is automatically de-skewed, which improves recognition for tilted scans or photographed documents.
- Multi-page PDFs are fully processed; each page appears as a separate element in the pages array.
- The confidence value of each text block is useful for quality assessment; values below 0.7 indicate uncertain recognition.
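The confidence check from the last note could be sketched in Python as follows. This is a minimal example, assuming the JSON output has already been parsed into a dictionary; the helper name low_confidence_blocks is hypothetical, and the second text block in the sample data is invented purely to illustrate a low score.

```python
def low_confidence_blocks(ocr_result, threshold=0.7):
    """Return (page_number, text, confidence) for every block scoring below threshold."""
    flagged = []
    for page in ocr_result.get("pages", []):
        for block in page.get("text_blocks", []):
            if block["confidence"] < threshold:
                flagged.append((page["page_number"], block["text"], block["confidence"]))
    return flagged


# Sample data: the first block comes from the output example,
# the second is made up to demonstrate an uncertain recognition.
ocr_result = {
    "pages": [
        {
            "page_number": 0,
            "text_blocks": [
                {"text": "Delivery Note No. 7208166", "confidence": 0.98},
                {"text": "Custorner: Sample Ltd.", "confidence": 0.55},
            ],
        }
    ]
}

uncertain = low_confidence_blocks(ocr_result)
```

Blocks flagged this way could be routed to manual review instead of being passed straight to a downstream step.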
Tip
Combining this with the "AI: Classify document" or "AI: Extract structured data" service task is particularly effective: first, the text is recognized via OCR, and then it is automatically classified or converted into structured data fields.