The demand for computational power has grown dramatically in recent years. This is also true of many language processing tasks, where very large quantities of documents must be processed within a reasonable time frame. This scenario has led to a paradigm shift in the computing architectures and large-scale text processing strategies used in the NLP field. In this paper we describe a series of experiments carried out in the context of the NewsReader project with the goal of analyzing the scaling capabilities of its language processing pipeline. We explore the use of Storm in a new approach for scalable distributed language processing across multiple machines and evaluate its effectiveness and efficiency for processing documents on a ...
Streaming computing models allow for on-the-fly processing of large data sets. With the in-c...
The impressive progress in NLP techniques has been driven by the development of multi-task benchmark...
Clustering is a central problem in unsupervised learning for discovering interesting patterns in...
The availability of large and rich quantities of text data is due to the emergence of the World Wide...
In light of widespread digitization endeavors and ever-growing textual data generation, developing e...
Next generation real-time applications demand big-data infrastructures to process huge and continuou...
Summarization: Big data, which is derived from humans or machines, starting with social media and ex...
This poster describes experiences processing the two-billion-word Hansard corpus using a fairly stan...
More and more use cases require fast, accurate, and reliable processing of large volumes of data. To...
The amount of information available through the Internet has been showing a significant growth in t...
Cloud computing is increasingly being regarded as a key enabler of the ‘democratization of science’,...
Much of the previous work in Big Data has focussed on numerical sources of information. However, wit...
The demand for Natural Language Processing has been thriving rapidly due to the various emerging Int...