Arabic is a widely spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties. Therefore, studying the history of the language has so far been mostly limited to manual analyses on a small scale. In this work, we present a large-scale historical corpus of the written Arabic language, spanning 1400 years. We describe our efforts to clean and process this corpus using Arabic NLP tools, including the identification of reused text. We study the history of the Arabic language using a novel automatic periodization algorithm, as well as other techniques. Our findings confirm the established division of written Arabic into Modern Standard and Classical Arabic, and confir...
Broad-coverage language resources which provide prior linguistic knowledge must improve the accuracy...
Research into statistical parsing for English has enjoyed over a decade of successful results. Howev...
The term corpus comes from Latin and means “body”. According to corpus linguists, a corpus can be de...
Classical Arabic forms the basis of Arabic linguistic theory and it is well understood by the educat...
Over the past two decades (since around 2000), Arabic NLP researchers have investigated a variety of ...
Language Engineering, including Information Retrieval, Machine Translation and other Natural Languag...
We examine the relationship between Arabic grammar and the corpora to explain ...
Due to the rapid developments in technology and the sudden expansion of social media use, Dialect Ar...
Our Artificial Intelligence research group at the University of Leeds has collected, analysed and an...
To comprehend how Arabic became a pluricentric language, we need to navigate through its rich histor...
The varied textual traditions of the premodern Islamicate World represent an opportunity and a probl...
Master's thesis in Theoretical and Applied Linguistics. In the last few years, there has been an in...