In this paper, we introduce Profiling–UD, a new text analysis tool inspired by the principles of linguistic profiling that can support language variation research from different perspectives. It allows the extraction of more than 130 features spanning different levels of linguistic description. Beyond the large number of features that can be monitored, a main novelty of Profiling–UD is that it has been specifically devised to be multilingual, since it is based on the Universal Dependencies framework. In the second part of the paper, we demonstrate the effectiveness of these features in a number of theoretical and applied studies in which they were successfully used for text and author profiling.
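Profiling–UD builds on the Universal Dependencies framework, so its features can be computed from UD-annotated text. Such text is commonly stored in the CoNLL-U format. The Python sketch below is an illustration only and is not Profiling–UD's own implementation. It assumes an already-parsed CoNLL-U input and computes a handful of features of the kind the tool monitors:

- sentence length
- the distribution of UD part-of-speech tags
- lemma type/token ratio
- maximum dependency tree depth
- mean dependency link length

The function names (`read_conllu`, `max_tree_depth`, `profile`) are hypothetical. So are the exact feature definitions, such as counting the root as depth 1 and keeping punctuation tokens.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # word index within the sentence (CoNLL-U column 1)
    form: str    # word form (column 2)
    lemma: str   # lemma (column 3)
    upos: str    # universal part-of-speech tag (column 4)
    head: int    # index of the syntactic head, 0 for the root (column 7)
    deprel: str  # universal dependency relation (column 8)


def read_conllu(text: str) -> list[list[Token]]:
    """Split CoNLL-U text into sentences made of syntactic words."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():  # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as sent_id or text
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        current.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                             int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def max_tree_depth(sentence: list[Token]) -> int:
    """Longest root-to-token path, counted in nodes (assumption: root = 1)."""
    heads = {t.idx: t.head for t in sentence}
    best = 0
    for t in sentence:
        depth, node = 1, t.idx
        while heads[node] != 0:  # climb towards the root
            node = heads[node]
            depth += 1
            if depth > len(sentence):
                raise ValueError("cycle in dependency tree")
        best = max(best, depth)
    return best


def profile(sentences: list[list[Token]]) -> dict[str, float]:
    """Compute a small, illustrative subset of profiling features."""
    tokens = [t for s in sentences for t in s]
    n_tok = len(tokens)
    upos = Counter(t.upos for t in tokens)
    # Linear distance between each dependent and its head (root excluded).
    arcs = [abs(t.idx - t.head) for t in tokens if t.head != 0]
    return {
        "n_sentences": len(sentences),
        "n_tokens": n_tok,
        "avg_sentence_length": n_tok / len(sentences),
        "avg_max_tree_depth": sum(max_tree_depth(s) for s in sentences) / len(sentences),
        "avg_link_length": sum(arcs) / len(arcs) if arcs else 0.0,
        "lemma_ttr": len({t.lemma.lower() for t in tokens}) / n_tok,
        **{f"upos_dist_{tag}": c / n_tok for tag, c in sorted(upos.items())},
    }


SAMPLE = "\n".join([
    "# text = The cat sleeps.",
    "1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_",
    "2\tcat\tcat\tNOUN\tNN\t_\t3\tnsubj\t_\t_",
    "3\tsleeps\tsleep\tVERB\tVBZ\t_\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_",
    "",
])

if __name__ == "__main__":
    print(profile(read_conllu(SAMPLE)))
```

Take the one-sentence sample "The cat sleeps." The path *The* → *cat* → *sleeps* gives a maximum depth of 3 nodes. Every dependency arc spans one position, so the mean link length is 1.0. A real profiling tool would aggregate many more features over a whole corpus, and would probably distinguish content words from punctuation.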
Texts are composed for multiple audiences and for numerous purposes. Each form of text follows a set...
The existence of universal models to describe the syntax of languages has been...
Author profiling and identification are two areas of data-driven computational linguistics that have...
A new technique is introduced, linguistic profiling, in which large numbers of counts of linguistic ...
Language sample analysis provides rich information about the language abilities in the written or sp...
Since computer technology became widely available at universities during the last quarter of the...
Contains full text: 61135.pdf (author's version) (Open Access). 23 August 2004, 7 p.
Most natural language models and tools are restricted to one language, typically English. For resear...
This paper describes various experiments done to investigate author profiling of tweets in 4 differe...
A linguist uses various kinds of linguistic data – both text corpora or text collections and dictio...
In this work, we discuss the benefits of using automatically parsed corpora to study language variat...
Language varies not only between countries, but also along regional and socio-demographic lines. Thi...
This is a study exploring the feasibility of a fully automated analysis of linguistic data. It ident...
Moving from the assumption that formal, rather than content, features can be used to detect differen...
PhD thesis. The objective of this thesis is to test if it is possible to design a program (Automatic ...