Wikipedia is a rich source of information across many knowledge domains. Yet, recovering articles relevant to a specific domain is a difficult problem since such articles may be rare and tend to cover multiple topics. Furthermore, Wikipedia's categories provide an ambiguous classification of articles as they relate to all topics and thus are of limited use. In this paper, we develop a new methodology to isolate Wikipedia's articles that describe a specific topic within the scope of relevant categories; the methodology uses supervised machine learning to retrieve a decision tree classifier based on articles' features (URL patterns, summary text, infoboxes, links from list articles). In a case study, we retrieve 3000+ ar...
Wikipedia article names can be utilized as a controlled vocabulary for identifying the main topics i...
We propose a language-independent graph-based method to build a-la-carte article collections on user...
When humans approach the task of text categorization, they interpret the specific wording of the doc...
The process whereby inferences are made from textual data is broadly referred to as text mining. In ...
This thesis deals with automatic type extraction in English Wikipedia articles and their attributes....
Wikipedia provides an information quality assessment model with criteria for human peer reviewers to...
There are many opportunities to improve the interactivity of information retrieval systems beyond th...
Nowadays natural language generation (NLG) is used in everything from news reporting and chatbots to...
This thesis focuses on the design of algorithms for the extraction of knowledge (in terms of entitie...
Wikipedia is a goldmine of information; not just for its many readers, but also for the growing comm...
The number of scientific publications is increasing by 3% per year, making it difficult for scientis...
Although primarily an encyclopedia, Wikipedia’s expansive content provides a knowledge bas...
Reflecting the rapid growth of science, technology, and culture, it has become common practice to co...
In 2005 Wikipedia implemented a category system for the purposes of facilitating navigation througho...