Extracting top-k keywords and documents using weighting schemes is a popular technique in text mining and machine learning for a range of analysis and retrieval tasks. The weights are usually computed in the data preprocessing step, because it is costly to update them and to keep track of all the modifications performed on the dataset. Furthermore, computation errors are introduced when only subsets of the dataset are analyzed. Therefore, in a Big Data context, it is crucial to lower the runtime of computing weighting schemes without hindering the analysis process or the accuracy of the machine learning algorithms. To address this requirement for the task of top-k keywords and documents, it is customary to design benchmar...
Data mining, also known as knowledge discovery in databases (KDD), is the process of discovering interes...
In automated text classification, a bag-of-words representation followed by the tfidf weighting ...
The spectacular increase in data is due to the emergence of networks and smartphones. Amount 42% ...
Top-k keyword and top-k document extraction are very popular text analysis tec...
Information retrieval from textual data focuses on the construction of vocabul...
Summarization: Big data, which is derived from humans or machines, starting with social media and ex...
Today, highly scalable computing environments make it possible to carry out various data-...
Text search engines return a set of k documents ranked by similarity to a query. Typically, document...
Much progress has been made in the area of information retrieval, but improvements can still be d...
Analyzing textual data is a very challenging task because of the huge volume o...
This paper proposes PASTA (PArameter-free Solutions for Textual Analysis), a large-scale engine prov...
Abstract. The paper describes possible representation models and ways of weighting text documents, w...
Clustering is a powerful technique for large-scale topic discovery from text. It involves two phases...
Within text categorization and other data mining tasks, the use of suitable methods for term weighti...
In text mining research, the Vector Space Model (VSM) has been commonly used to represent text docum...