Trillions of documents. Millions of concurrent users.

PDF parser: Parses the PDF code and presents the document's raw data (text cells, embedded images, and vector graphics) in a consumable format.

PDF interpreter: Captures ground truth through massive crowdsourcing. A Big Data system then uses high-performance computing and machine-learning techniques (deep learning) to train automatic annotation models.

Semantic representation: Uses high-performance computing and Big Data systems to produce a semantic representation of the original text in JSON format.
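
As a rough illustration of the final step, the sketch below pairs parsed text cells with labels and serializes the result to JSON. The `TextCell` fields, the label names, and the `to_semantic_json` helper are hypothetical, chosen only for illustration; the actual schema used by the system is not described here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TextCell:
    """A raw text cell as a PDF parser might emit it (hypothetical shape)."""
    page: int
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates
    text: str


def to_semantic_json(cells, labels):
    """Pair each parsed cell with a label and serialize the result to JSON.

    `labels` stands in for the output of a trained annotation model.
    """
    if len(cells) != len(labels):
        raise ValueError("need exactly one label per cell")
    items = [
        {"label": label, **asdict(cell)}
        for cell, label in zip(cells, labels)
    ]
    return json.dumps({"items": items}, indent=2)


# Two made-up cells and labels, for illustration only.
cells = [
    TextCell(page=1, bbox=(72.0, 700.0, 540.0, 720.0), text="A sample title"),
    TextCell(page=1, bbox=(72.0, 600.0, 540.0, 690.0), text="A sample paragraph."),
]
document = to_semantic_json(cells, ["title", "paragraph"])
print(document)
```

Keeping the label next to the cell's page and bounding box means a downstream consumer could, under these assumptions, trace every labeled item back to its position in the original PDF.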

Iterative machine-learning process


PDF document parsing

Manuscript parser

PDF parsing

Machine learning

SmartAnnotator

Ask the expert

Peter Staar

IBM Research scientist