International audience. This paper describes the application of an information-theoretic approach to document segmentation. Several segmentation criteria are proposed, based either on topic shift detection or on a blind comparison of the contents of cache memories in which keywords are temporarily stored as a document is analyzed. Experiments with a large corpus of articles from the French newspaper Le Monde show tangible advantages when different models are combined with a suitable strategy. The results also show that different strategies for topic shift detection have to be used depending on whether high recall or high precision is sought. Furthermore, methods based on topic independent distributions provide complementary candidates with respect to the...
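The cache comparison mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how cache contents are compared, so the symmetric Kullback-Leibler divergence used here is an assumption, and the function names, the add-one smoothing, the cache size, and the threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cache_distribution(words, vocab, alpha=1.0):
    """Smoothed unigram distribution of the words held in one cache.

    Add-alpha smoothing (an assumption) keeps every probability positive,
    so the divergence below is always finite.
    """
    counts = Counter(words)
    total = len(words) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p).

    The sum of both directions simplifies to sum((p - q) * log(p / q)).
    """
    return sum((p[w] - q[w]) * math.log(p[w] / q[w]) for w in p)


def boundary_candidates(tokens, cache_size=4, threshold=1.0):
    """Slide two adjacent caches over the keyword stream.

    A position is reported as a candidate segment boundary when the cache
    of keywords before it and the cache after it diverge by at least
    `threshold` (a hypothetical tuning parameter).
    """
    candidates = []
    for i in range(cache_size, len(tokens) - cache_size + 1):
        left = tokens[i - cache_size:i]
        right = tokens[i:i + cache_size]
        vocab = set(left) | set(right)
        distance = symmetric_kl(
            cache_distribution(left, vocab),
            cache_distribution(right, vocab),
        )
        if distance >= threshold:
            candidates.append((i, distance))
    return candidates


if __name__ == "__main__":
    stream = ["vote", "election", "vote", "election",
              "goal", "match", "goal", "match"]
    for position, distance in boundary_candidates(stream):
        print(position, round(distance, 3))
```

Raising the threshold in such a scheme would favour precision, and lowering it would favour recall, which is one simple way to see why different settings suit different goals.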
Most documents are about more than one subject, but the majority of natural language processing algor...
National audience. A new statistical method for Language Modeling and spoken document classification i...
Text segmentation is a traditional task in NLP where a document is broken down into smaller, coheren...
International audience. The use of cache memories and symmetric Kullback-Leibler distances is proposed...
The use of cache memories and symmetric Kullback-Leibler distances is proposed for topic classificat...
International audience. The use of cache memories and symmetric Kullback-Leibler distances is proposed...
Most documents are about more than one subject, but the majority of natural language processing algo...
Most documents are about more than one subject, but the majority of natural language processing algo...
International audience. A new statistical method for Language Modeling and spoken document classificat...
This paper presents a new method for topic-based document segmentation, i.e., the identification of ...
International audience. A new statistical method for Language Modeling and spoken document classificat...
Most documents are about more than one subject, but the majority of natural language processing algor...
We investigate the problem of text segmentation by topic. Applications for this task include topic...
Conference date: 09/2008. International audience. An alternative way to tackle Information Retrie...
National audience. A new statistical method for Language Modeling and spoken document classification i...