The challenge

IBM Research is hosting a competition challenging the document analysis community to tackle the task of “Form Understanding in Noisy Scanned Documents” (FUNSD).

Form understanding aims to extract and structure the textual content of forms. The structure is obtained by constructing labeled semantic entities that are linked to each other, such as a question–answer pair.

Form understanding is a daunting task, especially if the documents come from different sources or are highly variable in terms of structure and visual appearance.

The dataset

We manually checked 25,000 images from the form category of the RVL–CDIP dataset and discarded the following:

  • unreadable forms,
  • non-English-language documents,
  • documents that are not forms.

Of the eligible forms, 200 images were manually annotated. The competition will provide 100 documents for training and validation and 100 for testing.

The tasks

1 Text detection
This task aims to locate the bounding box of each word in the form.
Example: (Name) is a single word. Special characters often encountered in forms, such as checkboxes, should be treated as regular words.

2 Text detection and recognition
This task aims to both localize and recognize the words in the form. Only legible text is eligible for evaluation.
Example: Signatures are not evaluated.

3 Form understanding
This task will be evaluated along the following three axes:

  • Word grouping. Aggregate words that belong to the same semantic entity.
    Examples: A question, a header or an answer.
  • Semantic entity labeling. Label each semantic entity from a set of four pre-defined labels:
    – question (Q)
    – answer (A)
    – header (H)
    – other (O)
    Note that a single label should be assigned to a semantic entity (i.e., a group of words).
  • Entity linking. Predict the relations between semantic entities. The label of the group determines the type of relation.
    Example: If a semantic entity tagged as question is linked to another semantic entity tagged as answer, the link forms a question–answer pair. Links can be drawn from Q → A and from H → Q.
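
The three axes above can be pictured with a small data structure. The following Python sketch is only an illustration: the class names, field names, the (x_min, y_min, x_max, y_max) box layout and the sample form content are all assumptions made here, not the competition's annotation or submission format. It groups words into entities, restricts labels to the four listed above, and accepts only the Q → A and H → Q link directions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical types for illustration only; not the official annotation format.
ALLOWED_LABELS = {"question", "answer", "header", "other"}
# Link directions named in the task description: Q -> A and H -> Q.
ALLOWED_LINKS = {("question", "answer"), ("header", "question")}


@dataclass
class Word:
    text: str
    # Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
    box: Tuple[int, int, int, int]


@dataclass
class Entity:
    entity_id: int
    label: str  # exactly one label per group of words
    words: List[Word] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"unknown label: {self.label!r}")

    @property
    def text(self) -> str:
        # Word grouping: the entity text is its words joined in order.
        return " ".join(w.text for w in self.words)


def is_valid_link(source: Entity, target: Entity) -> bool:
    """Return True if the labels of the two entities allow a link."""
    return (source.label, target.label) in ALLOWED_LINKS


# Made-up sample content, not taken from the dataset.
header = Entity(0, "header", [Word("Contact", (40, 20, 120, 40)),
                              Word("details", (125, 20, 200, 40))])
question = Entity(1, "question", [Word("Name:", (40, 60, 95, 80))])
answer = Entity(2, "answer", [Word("Jane", (110, 60, 150, 80)),
                              Word("Doe", (155, 60, 190, 80))])

for source, target in [(header, question), (question, answer)]:
    print(f"{source.label} -> {target.label}:",
          f"{source.text!r} -> {target.text!r}",
          is_valid_link(source, target))
```

In this sketch the label of each entity determines which links are accepted, so a reversed pair such as answer → question is rejected.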


Example: Task 1

Example: Task 2

Example: Task 3

Does my entry have to cover all three tasks?


Participants are free to use their own text detector and OCR engine for Tasks 1 and 2.

We will provide a text detection and text recognition output from a state-of-the-art OCR engine for participants who want to focus on Task 3.
Note: Task 3 will be evaluated end-to-end from the original image to the form understanding output.

Example: Word grouping and labeling on a form from RVL–CDIP

How to participate

  • Register your interest below.
    Deadline [TBD – postponed].
  • Download the training dataset.
    Available from [TBD – postponed].
  • Download the test dataset.
    Available from [TBD – postponed].
  • Details on how to submit your results and method description will be sent by email to the registered participants.

Once the competition is completed, the winners will be announced at ICDAR 2019.


Data privacy

Your personal data will be used only in connection with this competition and will be deleted within 30 days thereafter.
You can withdraw your consent at any time by sending an email to the .
For more information about our handling of personal data, see IBM’s Privacy Statement.


We will alert you as soon as the training and test datasets are available.


Antonio Foncubierta Rodríguez
IBM Research scientist

Guillaume Jaume
PhD candidate, IBM Research