Unzip ZIP file

The "Unzip ZIP file" service task extracts a ZIP archive and returns its contents as new file references. The files are also grouped by their folder inside the archive, so downstream tasks can target specific subfolders. This is useful for importing ZIP exports from third-party systems or for processing several uploaded files in a single step.

Input parameters

The task expects the following fields:

{
  "zip_file": {
    "referenceId": "...",
    "filename": "archive.zip",
    "contentType": "application/zip"
  },
  "filter": "*.pdf",
  "output_prefix": "",
  "maxUncompressedBytes": 524288000,
  "maxFiles": 1000,
  "maxCompressionRatio": 100
}

Explanation:

zip_file: File reference of the ZIP archive to extract (required).
filter: Glob pattern, or array of glob patterns, that restricts extraction to matching entries (optional). Supports:
- * – any characters except /
- ** – any characters including / (recursive)
- ? – exactly one character
- Matching is case-insensitive.
- Examples: "*.pdf", ["*.pdf", "*.csv"], "invoices/**".
output_prefix: Prefix prepended to every output filename (optional, default: empty).
maxUncompressedBytes: Maximum total uncompressed size in bytes (optional, default: 524288000 = 500 MB). Exceeding this limit aborts the task with an error.
maxFiles: Maximum number of entries in the archive (optional, default: 1000).
maxCompressionRatio: Zip-bomb protection. The task aborts if the compression ratio (uncompressed size / compressed size) of any entry exceeds this value (optional, default: 100).
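
The glob rules listed for filter can be illustrated with a short Python sketch. It is not the service's implementation: the names glob_to_regex and matches are hypothetical, the sketch assumes patterns are matched against the full entry path (the documentation does not say whether the path or only the file name is used), and it lets ? match any single character.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob using *, ** and ? into a case-insensitive regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")      # ** may cross folder boundaries
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")   # * stays inside one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(".")       # exactly one character (assumption: any character)
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


def matches(path: str, patterns) -> bool:
    """Return True if path matches the pattern or any pattern in the list."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)
```

Under the full-path assumption, "*.pdf" matches only PDFs in the archive root, while "invoices/**" matches everything below invoices/ at any depth.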

No filter provided

Without a filter, all files in the archive are extracted. Directory entries are always skipped – only files are returned.

Output

The task returns a flat list of all extracted files as well as a structure grouped by folder.

{
  "files": [
    {
      "referenceId": "...",
      "filename": "INV2024-001.pdf",
      "contentType": "application/pdf"
    },
    {
      "referenceId": "...",
      "filename": "B-042.csv",
      "contentType": "text/csv"
    }
  ],
  "groups": {
    "invoices": [
      {
        "referenceId": "...",
        "filename": "INV2024-001.pdf",
        "contentType": "application/pdf"
      }
    ],
    "bookings": [
      {
        "referenceId": "...",
        "filename": "B-042.csv",
        "contentType": "text/csv"
      }
    ],
    "": []
  },
  "count": 2,
  "skipped": 0
}

Explanation:

files: Flat list of all extracted files using the standard file schema (referenceId, filename, contentType). These entries can be passed directly to other service tasks such as "Parse CSV file into structured data" or "Split PDF".
groups: Files grouped by their folder within the ZIP archive. The key is the folder path inside the archive; "" holds files located directly in the root.
count: Number of files extracted.
skipped: Number of entries excluded by the filter.
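
How groups relates to the paths inside the archive can be sketched in Python. This is an illustration under assumptions, not the service's code: group_by_folder is a hypothetical helper, the input is assumed to be pairs of archive path and file reference, and the root key "" is always created to mirror the example output above.

```python
import posixpath


def group_by_folder(entries):
    """Group file references by the folder part of their archive path.

    entries: iterable of (archive_path, file_reference) pairs.
    """
    groups = {"": []}  # assumption: the root key is present even when empty
    for path, ref in entries:
        folder = posixpath.dirname(path)  # "" for files in the archive root
        groups.setdefault(folder, []).append(ref)
    return groups
```

The file references themselves stay unchanged; the folder only appears as the dictionary key.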

Name collisions

If the archive contains multiple files with the same name (e.g. invoice.pdf in different folders), a counter is appended to the filename automatically: invoice.pdf, invoice_2.pdf, invoice_3.pdf. Each renamed file still appears under its original folder key in groups.
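
The renaming scheme can be reproduced with a small Python sketch. The function unique_name is hypothetical, and the sketch assumes a case-sensitive comparison and a counter that starts at 2, which matches the sequence shown above but is not confirmed beyond it.

```python
import posixpath


def unique_name(filename: str, used: set) -> str:
    """Return filename, or filename with a _N counter if it is already taken."""
    if filename not in used:
        used.add(filename)
        return filename
    stem, ext = posixpath.splitext(filename)
    counter = 2
    # Increase the counter until a free name is found.
    while f"{stem}_{counter}{ext}" in used:
        counter += 1
    candidate = f"{stem}_{counter}{ext}"
    used.add(candidate)
    return candidate
```

Calling it three times with "invoice.pdf" and the same used set yields invoice.pdf, invoice_2.pdf and invoice_3.pdf.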

JSONata examples

Extract the entire archive:

{
  "zip_file": $.archive
}

Extract PDFs only:

{
  "zip_file": $.archive,
  "filter": "*.pdf"
}

Allow multiple file types:

{
  "zip_file": $.archive,
  "filter": ["*.pdf", "*.csv"]
}

Extract only files from a specific folder:

{
  "zip_file": $.archive,
  "filter": "invoices/**"
}

Typical use cases

Data migration from third-party systems: A ZIP export contains folders such as invoices/, bookings/, receipts/. Using groups.invoices, invoices are processed in one branch while groups.bookings goes to another.
Batch upload: Multiple PDFs are uploaded as a single ZIP. Each file is then handled independently in a multi-instance subprocess.
Data import: A ZIP contains multiple CSV files in the root. Using groups[""], the root CSVs are passed to "Parse CSV file into structured data".

Downstream processing

Groups can be used directly as the collection in a multi-instance subprocess:

{
  "collection": $.groups.invoices
}

Each invoice is then passed individually to the subprocess and can be handled by "Extract text from PDF (invoice)", for example.

For simple cases without folder structure, the flat list is sufficient:

{
  "collection": $.files
}
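
For code that consumes the task output outside the process engine, the choice between one group and the flat list could look like the following Python sketch. The helper pick_collection is hypothetical; it only relies on the files and groups fields described in the Output section.

```python
from typing import Optional


def pick_collection(output: dict, folder: Optional[str] = None) -> list:
    """Return the files of one folder, or the flat list if no folder is given."""
    if folder is None:
        return output["files"]
    # An unknown folder yields an empty collection instead of an error.
    return output["groups"].get(folder, [])
```

Passing "" as the folder returns the files located directly in the archive root.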

Notes

Returned file references contain only referenceId, filename and contentType. The folder information is kept exclusively in the keys of groups.
Directory entries in the archive are skipped; only files are extracted.
Entries with path-traversal patterns (.., absolute paths, backslashes) are rejected for security reasons.
Entries with a suspiciously high compression ratio (above maxCompressionRatio) are treated as a zip bomb and cause the task to abort.
Total size and entry count are verified against the Central Directory before extraction, so a limit violation is detected without unpacking the archive first.
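
The checks described in these notes can be approximated with Python's standard zipfile module, which reads entry metadata from the Central Directory without decompressing anything. This is a sketch under assumptions, not the service's implementation: precheck, is_unsafe_path and ArchiveRejected are hypothetical names, and the sketch aborts on an unsafe path, whereas the service may only reject the affected entry.

```python
import io
import zipfile


class ArchiveRejected(Exception):
    """Raised when an archive violates a limit or contains an unsafe path."""


def is_unsafe_path(name: str) -> bool:
    # Absolute paths, backslashes and ".." segments are treated as unsafe.
    return name.startswith("/") or "\\" in name or ".." in name.split("/")


def precheck(data: bytes, max_bytes: int = 524_288_000,
             max_files: int = 1000, max_ratio: float = 100) -> list:
    """Validate an archive from its metadata and return the file paths."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()  # metadata only; no entry is decompressed
    if len(infos) > max_files:
        raise ArchiveRejected("too many entries")
    if sum(info.file_size for info in infos) > max_bytes:
        raise ArchiveRejected("uncompressed size exceeds limit")
    for info in infos:
        if is_unsafe_path(info.filename):
            raise ArchiveRejected(f"unsafe path: {info.filename}")
        # Written as a multiplication so a compressed size of 0 is handled.
        if info.file_size > max_ratio * info.compress_size:
            raise ArchiveRejected(f"compression ratio too high: {info.filename}")
    # Directory entries are skipped; only files remain.
    return [info.filename for info in infos if not info.is_dir()]
```

The default values mirror the documented defaults of maxUncompressedBytes, maxFiles and maxCompressionRatio.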

Tip

Combined with a multi-instance subprocess, this service task is well suited to parallel processing: use groups.invoices as the collection and invoice as the element variable – every invoice then runs through extraction, validation and posting on its own.