
AI: Extract key-value pairs

The "AI: Extract key-value pairs" service task identifies and extracts all relevant key-value pairs from unstructured text without requiring predefined questions. The AI decides on its own which information is relevant and returns it in a uniform format.

This is ideal for documents where the exact field structure is not known in advance, or where additional data should be captured alongside targeted queries.


Input parameters

Provide the following fields as task input:

{
  "content": "Delivery Note\n\nDelivery note no.: LS-2024-4711\nDate: 15.03.2024\nCustomer no.: K-98765\nOrder reference: B-2024-001\n\nRecipient:\nMüller GmbH\nIndustriestraße 5\n70173 Stuttgart\n\nArticle        Quantity    Unit\nWidget A       100         pcs\nWidget B       50          pcs",
  "language": "en",
  "context": "delivery note",
  "maxContentLength": 50000
}

Explanation:

  • content: The text to be analyzed (e.g. OCR output in Markdown format or plain text).
  • language (optional, default: "de"): Language of the labels in the output ("de" or "en").
  • context (optional): A hint about the document type (e.g. "invoice", "delivery note", "contract"). This helps the AI identify domain-specific fields more accurately.
  • maxContentLength (optional, default: 50000): Maximum character count of the input content.
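
The parameters above can also be assembled programmatically. The following Python sketch is illustrative only: the helper name `build_task_input` is hypothetical, and rejecting over-long content locally is an assumption, since this page does not state how the task itself reacts when `maxContentLength` is exceeded.

```python
from typing import Optional

# The two language values named in the parameter list above.
ALLOWED_LANGUAGES = {"de", "en"}


def build_task_input(
    content: str,
    language: str = "de",
    context: Optional[str] = None,
    max_content_length: int = 50000,
) -> dict:
    """Assemble the task input, applying the documented defaults."""
    if language not in ALLOWED_LANGUAGES:
        raise ValueError(f"language must be one of {sorted(ALLOWED_LANGUAGES)}")
    # Assumption: fail early on the client side. The service's own
    # behaviour at the limit is not specified in this document.
    if len(content) > max_content_length:
        raise ValueError(
            f"content has {len(content)} characters, limit is {max_content_length}"
        )
    task_input = {
        "content": content,
        "language": language,
        "maxContentLength": max_content_length,
    }
    # context is optional, so it is only included when provided.
    if context is not None:
        task_input["context"] = context
    return task_input


example = build_task_input("Delivery note no.: LS-2024-4711", language="en",
                           context="delivery note")
```

Leaving `context` out of the payload when it is not set keeps the input identical to what a caller would write by hand for the minimal case.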

Output

The task returns all extracted key-value pairs as an array:

{
  "keyValues": [
    {"label": "Delivery note number", "value": "LS-2024-4711"},
    {"label": "Date", "value": "15.03.2024"},
    {"label": "Customer number", "value": "K-98765"},
    {"label": "Order reference", "value": "B-2024-001"},
    {"label": "Recipient", "value": "Müller GmbH"},
    {"label": "Address", "value": "Industriestraße 5, 70173 Stuttgart"}
  ]
}

Each entry contains:

  • label: The recognized field name (e.g. "Delivery note number", "Date").
  • value: The extracted value. If a label is detected but no clear value can be determined, the value is "-".
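
For downstream code that consumes this output outside the workflow, it can be convenient to turn the array into a lookup table. The Python sketch below is one possible approach, not part of the task itself: the function name `key_values_to_dict` is hypothetical, and keeping the first occurrence of a repeated label is an assumption, since this page does not say whether labels are unique.

```python
from typing import Dict, Optional

# Placeholder the task returns when a label has no clear value.
PLACEHOLDER = "-"


def key_values_to_dict(result: dict) -> Dict[str, Optional[str]]:
    """Map each label to its value, turning the "-" placeholder into None."""
    lookup: Dict[str, Optional[str]] = {}
    for pair in result.get("keyValues", []):
        value = pair["value"]
        # Assumption: if a label appears more than once, the first one wins.
        lookup.setdefault(pair["label"], None if value == PLACEHOLDER else value)
    return lookup


result = {
    "keyValues": [
        {"label": "Delivery note number", "value": "LS-2024-4711"},
        {"label": "Date", "value": "15.03.2024"},
        {"label": "Order reference", "value": "-"},
    ]
}
fields = key_values_to_dict(result)
print(fields["Date"])             # 15.03.2024
print(fields["Order reference"])  # None
```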

JSONata examples

Use OCR output from a previous step:

{
  "content": ocr_result.markdown,
  "context": "invoice",
  "language": "en"
}

Access specific values in a subsequent step:

(
  $kv := previous_step.keyValues;
  $kv[label = "Invoice number"].value
)

Notes

  • The AI extracts all recognizable key-value pairs; the exact number and labels depend on the document content.
  • The optional context parameter can noticeably improve quality for domain-specific documents by guiding the AI toward the fields typical of that document type.
  • If a label is found but the value is missing or unclear, "-" is returned instead.
  • The output is always an object with a keyValues array, even if no relevant data is found.
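
Because labels vary with the document and the array may be empty, consumers should not rely on one exact label being present. The Python sketch below shows one defensive lookup under stated assumptions: the function name `find_value` and the list of candidate spellings are hypothetical, and case-insensitive matching is a design choice of this example, not a documented behavior of the task.

```python
from typing import Iterable, Optional


def find_value(result: dict, candidate_labels: Iterable[str]) -> Optional[str]:
    """Return the first real value whose label matches any candidate."""
    wanted = {label.strip().casefold() for label in candidate_labels}
    for pair in result.get("keyValues", []):
        if pair["label"].strip().casefold() in wanted:
            # Skip the "-" placeholder and keep looking for a real value.
            if pair["value"] != "-":
                return pair["value"]
    return None


result = {"keyValues": [{"label": "Invoice no.", "value": "RE-1001"}]}

# Hypothetical label spellings the AI might produce for the same field.
invoice_number = find_value(result, ["Invoice number", "Invoice no."])
missing = find_value({"keyValues": []}, ["Invoice number"])
```

Here `invoice_number` is "RE-1001" and `missing` is None, so an empty result is handled the same way as an absent label.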

Tip

Combine this task with the "AI: Query Markdown content" task: first use key-value extraction for a general overview, then ask targeted questions with format and validation for specific fields. The extraction covers the document broadly, and the targeted queries cover the critical fields precisely.
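
One way to decide which fields deserve a targeted follow-up question is to compare the extraction result against the labels a process requires. The Python sketch below is only an illustration: the function name `labels_needing_follow_up` and the required labels are hypothetical, and it stops at listing the labels because the input format of the "AI: Query Markdown content" task is not described on this page.

```python
from typing import List


def labels_needing_follow_up(result: dict, required_labels: List[str]) -> List[str]:
    """List required labels that are absent or only have the "-" placeholder."""
    found = {
        pair["label"]
        for pair in result.get("keyValues", [])
        if pair["value"] != "-"
    }
    # Preserve the caller's order so follow-up questions stay predictable.
    return [label for label in required_labels if label not in found]


result = {
    "keyValues": [
        {"label": "Delivery note number", "value": "LS-2024-4711"},
        {"label": "Order reference", "value": "-"},
    ]
}
follow_up = labels_needing_follow_up(
    result, ["Delivery note number", "Order reference", "Delivery date"]
)
print(follow_up)  # ['Order reference', 'Delivery date']
```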