This poster describes experiences processing the two-billion-word Hansard corpus using a fairly standard NLP pipeline on a high-performance cluster. We report how we parallelised a “traditional” single-threaded, batch-oriented application and applied it to a platform that differs greatly from the one for which it was originally designed. We start by discussing the tagging toolchain, its specific requirements and properties, and its performance characteristics. This is contrasted with a description of the cluster on which it was to run, and specific limitations are discussed, such as the overhead of using SAN-based storage. We then go on to discuss the nature of the Hansard corpus, and describe which properties of this corpus i...
Natural Language Processing (NLP) is an important research direction, since it addresses the needs of...
One of the old and fundamental problems in natural language processing, POS (part-of-speech) tagging...
This paper introduces research on the parallelization of an entire applicatio...
Common Crawl is a considerably large, heterogeneous multilingual corpus compri...
Semantic analysis often uses a pipeline of Natural Language Processing (NLP) tools such a...
Computational power needs have grown dramatically in recent years. This is also the case in many lan...
The Intelcomp NLP pipeline can be defined as a collection of tools that apply the requested transfor...
2006 saw the start of a project for compiling a multifunctional parallel corpus with Dutch...
Parallel programming is becoming increasingly popular. Computers have increasingly many cores (proce...
Research in natural language processing indicates that more data leads to better accuracy. Process...
We report on methods to create the largest publicly available parallel corpora by crawling the web, ...