FUNSD: Form Understanding in
Noisy Scanned Documents

A dataset for Text Detection, Optical Character Recognition, Spatial Layout Analysis and Form Understanding.

Dataset Overview

A dataset for the document understanding community.

199 fully annotated forms
31485 words
9707 semantic entities
5304 relations

Citation

If you use this dataset for your research, please cite our paper:

G. Jaume, H. K. Ekenel, J. Thiran "FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents," 2019

Bibtex format:


	      @inproceedings{jaume2019,

	          title = {FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents},

	          author = {Guillaume Jaume, Hazim Kemal Ekenel, Jean-Philippe Thiran},

	          booktitle = {Accepted to ICDAR-OST},

	          year = {2019}

	      }

Examples

Word grouping and semantic entity labeling.