International audience. We consider an industrial context in which a stream of unlabelled documents becomes available progressively over time. Based on an adaptive incremental neural gas algorithm (AING), we propose a new stream-based semi-supervised active learning method (A2ING) for document classification. The method actively queries a human annotator for the class labels of the documents that are most informative for learning, according to an uncertainty measure. It maintains a model as a dynamically evolving graph topology of labelled document representatives that we call neurons. Experiments on different real datasets show that the proposed method requires on average only 36.3% of the incoming documents to be lab...
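The query loop described in this abstract can be illustrated with a minimal sketch. It is not the A2ING algorithm itself: the graph topology and the adaptive parts of AING are omitted, and a plain nearest-prototype model stands in for them. The class name `StreamActiveLearner`, the distance-ratio uncertainty measure, and the parameters `threshold` and `learning_rate` are all assumptions made for illustration only.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class StreamActiveLearner:
    """Hypothetical stand-in for a stream-based active learner.

    Each neuron is a [vector, label] pair. Uncertainty is assumed to be the
    ratio between the distances to the nearest neurons of the two closest
    classes; the abstract does not specify the actual measure.
    """

    def __init__(self, threshold=0.7, learning_rate=0.1):
        self.threshold = threshold          # assumed query threshold
        self.learning_rate = learning_rate  # assumed update step
        self.neurons = []
        self.queries = 0

    def uncertainty(self, x):
        # Smallest distance from x to a neuron of each known class.
        best = {}
        for vector, label in self.neurons:
            d = distance(x, vector)
            if label not in best or d < best[label]:
                best[label] = d
        ranked = sorted(best.values())
        if len(ranked) < 2:
            return 1.0  # fewer than two classes seen: always query
        d1, d2 = ranked[0], ranked[1]
        return d1 / d2 if d2 > 0 else 1.0

    def process(self, x, oracle):
        """Handle one incoming document vector and return its label."""
        if self.uncertainty(x) >= self.threshold:
            # Uncertain: ask the annotator and store a new labelled neuron.
            label = oracle(x)
            self.queries += 1
            self.neurons.append([list(x), label])
        else:
            # Confident: reuse the nearest neuron's label (semi-supervised
            # step) and move that neuron slightly towards the document.
            nearest = min(self.neurons, key=lambda n: distance(x, n[0]))
            label = nearest[1]
            nearest[0] = [w + self.learning_rate * (xi - w)
                          for w, xi in zip(nearest[0], x)]
        return label


def toy_oracle(x):
    """Hypothetical annotator: class depends on the first coordinate."""
    return "a" if x[0] < 0.5 else "b"


learner = StreamActiveLearner()
stream = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0), (0.5, 0.5)]
labels = [learner.process(x, toy_oracle) for x in stream]
```

In this toy run the annotator is only consulted for documents that fall far from any known class or close to the boundary between two classes; the rest are labelled by the model. The fraction `learner.queries / len(stream)` plays the role of the labelling rate the abstract reports, though the numbers here say nothing about the published results.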
Active learning has been successfully applied to a number of NLP tasks. In thi...
[Departement_IRSTEA]Territoires [TR1_IRSTEA]SYNERGIE [Axe_IRSTEA]TETIS-SISO Dat...
In many machine learning problem domains large amounts of data are available but the cost of correct...
In this paper, we propose a stream-based semi-supervised active learning metho...
Getting correctly labelled data is an important preliminary stage for many supervised machine learnin...
Supervised machine learning methods are increasingly employed in political science. Such models requ...
Data labeling is an expensive and time-consuming task. Choosing which labels to use is increasingly ...
As technology evolves and electronic devices become widespread, the amount of data produced in the f...
With the exponential growth of data amount and sources, access to large collections of data has beco...
In this paper, we propose a stream-based semi-supervised active learning metho...
Mislabelling is a critical problem for stream-based active learning methods be...
This thesis focuses on machine learning for data classification. To reduce the labelling cost, activ...
This paper shows how a text classifier’s need for labeled training documents can be reduced by takin...
In recent decades, the availability of a large amount of data has propelled the field of machine lea...
The continuous increase of digital documents on the web creates the need to search for information p...