User-generated discourse from Web 2.0 poses particular challenges to natural language processing (NLP) because it is noisy and error-prone. A data cleansing step that precedes the analysis steps in an NLP pipeline can reduce these problems. While recent efforts provide general-purpose collections of UIMA-based analysis components, data cleansing does not yet seem to be covered. The five-stage data cleansing approach proposed here offers maximum flexibility in identifying problematic artifacts, deciding how to deal with them, and analysing the cleansed data. At the same time, it allowed us to create reusable UIMA-based components for the actual data cleansing and for mapping annotations created on the clean data back to the original representation. The...
Current natural language processing systems have a wide coverage of English, but are unforgiving of ...
WAC More and more people are using Web data for linguistic and NLP research. The Web as Corpus worksh...
Discourse parsing is an important task in natural language processing as it supports a wide range of...
Current Natural Language Processing (NLP) systems feature high-complexity processing pipelines that ...
The Unstructured Information Management Architecture (UIMA) [1] framework is a growing platform for ...
A discourse constitutes a locally and globally coherent text in which words, clauses and sentences a...
Recent advances in natural language processing have produced libraries that extract low level featur...
The field of natural language processing (aka NLP) is an intersection of the study of linguistics, c...
This paper describes a system that learns discourse rules for domain-specific analysis of unrestrict...
[Context & Motivation] Developers need to learn about the requirements of software users, who give ...
Before sharing clinical data, personal and sensitive information must be removed or de-identified. T...
We address the problem of learning discourse-level merging strategies within the context of a natura...
The UIMA Framework aids in discovering knowledge from unstructured information by coordinating Analy...
The paper discusses data cleaning techniques and machine learning algorithms. Illustrative exam...
In this paper we will show how clustering techniques provide empirical evidence for a characterisati...