Document Agent

2025.1.01+

The Document Agent is responsible for handling and processing content items in Flowable.
It enables AI-powered document classification and data extraction, typically in coordination with a case or process.
All operations for the Document Agent take a Content item as input and can produce various output types depending on configuration.

When to use the Document Agent?

Use the Document Agent when your use case involves documents that need to be automatically classified or parsed for structured data, such as extracting metadata from invoices and other documents.

Key Features

  • Automatically classifies documents into content models
  • Accepts a Content item as input and supports flexible output types
  • Integrates with associated forms to generate output schemas
  • Works standalone or as part of an Orchestrator Agent
  • Supports a wide range of document formats, including emails and Microsoft Office files

Document Classification

Document classification is a default operation that enables you to classify a document into one of your defined content models.

To configure classification:

  • Use the Add classification button to add candidate content models.
  • For each classification, specify:
    • The content model
    • The default operation used for data extraction

When the Document Agent is part of an Orchestrator Agent, it will automatically classify documents attached to the case.
Once classified, the agent runs the Extract data operation to retrieve structured content and attaches the result as metadata on the content item.

You can configure which extraction logic to use via the Extract data option. A built-in operation called Data extraction is provided, but custom operations are also supported.
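The classify-then-extract flow described above can be pictured with a small sketch. It is an illustration only and does not use Flowable's API: the content model keys, the `keyword_classifier` stand-in for the LLM, and the function names are all hypothetical.

```python
from typing import Callable, Dict, List

# An extraction operation takes the document text and returns structured data.
ExtractOperation = Callable[[str], Dict[str, str]]


def extract_invoice(text: str) -> Dict[str, str]:
    """Stand-in for an Extract data operation; a real one would call the LLM."""
    return {"documentKind": "invoice", "characters": str(len(text))}


def extract_contract(text: str) -> Dict[str, str]:
    """Second stand-in operation, used for a different content model."""
    return {"documentKind": "contract", "characters": str(len(text))}


# Hypothetical classification configuration:
# candidate content model -> operation used for data extraction.
CLASSIFICATIONS: Dict[str, ExtractOperation] = {
    "invoice": extract_invoice,
    "contract": extract_contract,
}


def keyword_classifier(text: str, candidates: List[str]) -> str:
    """Stand-in for LLM classification: first candidate named in the text wins."""
    lowered = text.lower()
    for candidate in candidates:
        if candidate in lowered:
            return candidate
    raise ValueError("no candidate content model matched")


def classify_and_extract(
    text: str,
    classify: Callable[[str, List[str]], str] = keyword_classifier,
    classifications: Dict[str, ExtractOperation] = CLASSIFICATIONS,
) -> Dict[str, object]:
    """Classify the document, then run the extraction configured for that model."""
    content_model = classify(text, list(classifications))
    metadata = classifications[content_model](text)
    # The result mirrors "attach the extracted data as metadata on the content item".
    return {"contentModel": content_model, "metadata": metadata}


result = classify_and_extract("Invoice 1001, total 250.00 EUR")
```

The point of the mapping is that each candidate content model carries its own extraction operation, so swapping the built-in Data extraction for a custom operation only changes one entry.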

Operations

Operations executed by the Document Agent follow the same structure described in the operations concept.
The input type is always a Content item. The item's content is converted to Markdown and can be referenced as ${text} in both the user message and the system message.
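To illustrate how a ${text} placeholder behaves, the following minimal sketch uses Python's standard `string.Template`, which happens to share the `${name}` syntax. It is not Flowable's expression engine, and the prompt wording and sample document are made up for the example.

```python
from string import Template

# Hypothetical user message; only the ${text} placeholder comes from the documentation.
USER_MESSAGE = Template("Summarize the following document:\n\n${text}")


def render_prompt(template: Template, document_markdown: str) -> str:
    """Substitute the document's Markdown content for the ${text} placeholder."""
    return template.substitute(text=document_markdown)


# Sample Markdown standing in for a converted content item.
rendered = render_prompt(USER_MESSAGE, "# Invoice 1001\n\nTotal: 250.00 EUR")
print(rendered)
```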

Output Types

In addition to the default output type, the following output types are supported:

  • Data based on associated form:
    This uses the content type of the input item to find a linked form for the Create action.
    That form is parsed (on a best-effort basis) and used as the output schema for the operation.

Multi-document Support

Operations can be invoked with multiple documents in a single request. In this case, all documents are processed together by the LLM, and the output is returned once, as a combined or shared result.
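The source does not describe how multiple documents are presented to the LLM. As one way to picture "processed together, one shared result", the sketch below joins several converted documents into a single text block; the `Document` type, the headings, and the separator are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    """Hypothetical holder for a content item's name and converted Markdown."""
    name: str
    markdown: str


def combine_documents(documents: List[Document]) -> str:
    """Join all documents into one text so a single request covers them all."""
    sections = [
        f"## Document {index}: {doc.name}\n\n{doc.markdown}"
        for index, doc in enumerate(documents, start=1)
    ]
    # A horizontal rule between sections is an arbitrary choice for this sketch.
    return "\n\n---\n\n".join(sections)


combined = combine_documents(
    [
        Document("invoice.docx", "Total: 250.00 EUR"),
        Document("cover-mail.eml", "Please find the invoice attached."),
    ]
)
```

Because the documents share one request, the operation's output schema describes the combined result rather than one result per file.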

Data Extraction

The default Data extraction operation can be used to retrieve structured data from a classified document, but it is optional.
You can also define custom operations that return structured output, which lets you decouple the extraction schema from the associated form.

Flowable provides a default prompt for the Data extraction operation.
This prompt can be customized using either a Simple prompt or a Prompt template, giving you control over how data is extracted from the document.

Supported File Types

The Document Agent supports the following MIME types for conversion to text, which can then be classified and used for data extraction:

  • Email files (message/rfc822, application/vnd.ms-outlook)
  • Plain text files (text/plain)
  • Markdown files (text/markdown)
  • Microsoft Word documents (application/vnd.openxmlformats-officedocument.wordprocessingml.document)
  • Microsoft Excel spreadsheets (application/vnd.openxmlformats-officedocument.spreadsheetml.sheet)
  • Microsoft PowerPoint presentations (application/vnd.openxmlformats-officedocument.presentationml.presentation)

The LLM in use may also support additional MIME types, such as images. These can be configured as part of the model settings. By default, the following MIME types are configured to be handled by the LLM:

  • Images (image/jpeg, image/png, image/webp, image/gif)
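The two lists above can be read as a simple routing rule. The sketch below encodes them as sets and reports which path a given MIME type would take. The function name and the returned labels are hypothetical, and "not-listed" only means the type appears in neither list here, not that Flowable rejects it.

```python
# MIME types listed above as convertible to text.
TEXT_CONVERTIBLE = frozenset({
    "message/rfc822",
    "application/vnd.ms-outlook",
    "text/plain",
    "text/markdown",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
})

# MIME types listed above as handled by the LLM by default (configurable per model).
LLM_HANDLED_DEFAULT = frozenset({
    "image/jpeg",
    "image/png",
    "image/webp",
    "image/gif",
})


def handling_for(mime_type: str) -> str:
    """Return a hypothetical label describing how a MIME type would be handled."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    normalized = mime_type.split(";", 1)[0].strip().lower()
    if normalized in TEXT_CONVERTIBLE:
        return "convert-to-text"
    if normalized in LLM_HANDLED_DEFAULT:
        return "pass-to-llm"
    return "not-listed"


print(handling_for("text/plain; charset=utf-8"))
print(handling_for("image/png"))
```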

Summary

  • The Document Agent classifies documents and extracts structured data from Content items.
  • Multiple files can be processed in one request.
  • Common file types like emails, text, and Office documents are supported.
  • The Document Agent is often used alongside the Orchestrator Agent in case workflows.