We describe a feature-rich conditional random field (CRF) model for extracting conference and workshop information (e.g., name, date, location, deadline) from calls for papers (CFPs). One application is the automatic construction of a conference knowledge base from a collection of CFPs. Relevant information in CFPs often appears in regions that lack complete, grammatical sentences but can be distinguished visually from the rest of the text by their formatting. We show that in this situation layout features, i.e., features that measure physical layout properties of a text, improve extraction accuracy considerably. On a corpus of CFPs we observe a 30% gain in F1 through the use of layout features.
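As a rough illustration of what "layout features" could look like for a plain-text CFP, the sketch below computes a few per-line properties that a sequence labeler such as a CRF might consume. The function name `layout_features`, the `width` parameter, and the specific features chosen (indentation, surrounding blank lines, approximate centering, uppercase ratio) are assumptions made for this example; the abstract does not specify the actual feature set.

```python
def layout_features(lines, i, width=80):
    """Return illustrative (hypothetical) layout features for line i.

    `width` is an assumed page width in characters, used only to
    estimate whether a line looks centered.
    """
    line = lines[i].rstrip("\n")
    stripped = line.strip()
    # Number of leading spaces before the first visible character.
    indent = len(line) - len(line.lstrip(" "))
    # Free space to the right of the text, relative to the assumed width.
    trailing = max(width - len(line.rstrip()), 0)
    letters = [c for c in stripped if c.isalpha()]
    return {
        "indent": indent,
        "length": len(stripped),
        # Document boundaries are treated like blank lines.
        "blank_before": i == 0 or not lines[i - 1].strip(),
        "blank_after": i == len(lines) - 1 or not lines[i + 1].strip(),
        # Roughly equal margins on both sides suggest a centered title line.
        "centered": bool(stripped) and indent > 0 and abs(indent - trailing) <= 2,
        "upper_ratio": (sum(c.isupper() for c in letters) / len(letters)) if letters else 0.0,
    }


if __name__ == "__main__":
    cfp = [
        "",
        "        ACL 2005 Workshop",
        "",
        "Submission deadline: May 1",
    ]
    for idx in range(len(cfp)):
        print(idx, layout_features(cfp, idx, width=33))
```

Feature dictionaries of this kind would typically be combined with lexical features (tokens, capitalization, date patterns) before training; how the two are weighted or combined here is left open.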
Fostering both the creation and the linking of data with the scope of supporting the growth of the L...
Transformer-based Language Models are widely used in Natural Language Processi...
Abstract. For members of the research community it is vital to stay informed about conferences, work...
Due to the increasing number of conferences, researchers need to spend more and more time browsing t...
Abstract. Due to the increasing number of conferences, researchers need to spend more and more time ...
The ability to find tables and extract information from them is a necessary component of data mining...
Traditional information extraction methods mainly rely on visual feature assisted techniques; but wi...
We address the problem of academic conference homepage understanding for the Semantic Web. This prob...
16 pages. Abstract. This article describes a new approach using ... Fields ...
The paper describes a new approach using Conditional Random Fields (CRFs) to...
This article describes a new approach using conditional random fields...
Abstract. A huge number of academic papers (including research reports) are being released in web page...
Repetition of layout structure is prevalent in document images. In document design, such repetition...
With the increasing use of research paper search engines, such as CiteSeer, for both literature sear...