We investigate the use of multiword features to improve Arabic document classification. Arabic is both morphologically rich and highly inflected, so bringing Arabic information retrieval to a level comparable to English presents additional challenges. The multiword features are modeled as combinations of words appearing within windows of varying sizes. Our experiments show that multiword features combined with the Dice similarity measure outperform the cosine similarity function and produce results comparable to the TF-IDF representation. Multiword features are under-explored, and we believe they have the potential to improve Arabic information retrieval and, in particular, Arabic document classification.
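To make the idea of window-based multiword features concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`multiword_features`, `dice_similarity`, `cosine_similarity`), the choice of unordered word pairs, the default window size, and the whitespace-tokenized input are all assumptions made for illustration. The abstract does not specify the preprocessing, the feature size, or how the similarity scores feed the classifier.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def multiword_features(tokens, window=3, size=2):
    """Count unordered combinations of `size` words that co-occur
    within a sliding window of `window` tokens (illustrative assumption)."""
    feats = Counter()
    for start in range(len(tokens)):
        span = tokens[start:start + window]
        if len(span) < size:
            break
        # Anchor each combination on the first word of the window so that
        # every co-occurrence is counted exactly once.
        for rest in combinations(span[1:], size - 1):
            feats[tuple(sorted((span[0],) + rest))] += 1
    return feats


def dice_similarity(a, b):
    """Dice coefficient over the sets of features present in each document."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def cosine_similarity(a, b):
    """Cosine similarity over raw feature counts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, already-tokenized documents (placeholders, not real data).
doc_a = multiword_features(["w1", "w2", "w3", "w4"], window=3)
doc_b = multiword_features(["w1", "w2", "w3"], window=3)
print(dice_similarity(doc_a, doc_b), cosine_similarity(doc_a, doc_b))
```

In a classifier of the kind described, a document would presumably be compared against each category's feature profile and assigned to the most similar one. Varying `window` changes how far apart two words may be and still form a feature.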
There have been great improvements in web technology over the past years which...
Information retrieval aims to provide an easy information access to a user. To achieve this goal, an...
This paper compares and contrasts two feature selection techniques when applied to Arabic corpus; in...
Document categorization is an important topic that is central to many applications that dem...
This paper describes the impact of dataset characteristics on the results of Arabic document classif...
Text Categorization (classification) is the process of classifying documents into a predefined set o...
Feature selection problem is one of the main important problems in the text and data mining domain. ...
Cosine similarity is one of the most popular distance measures in text classification problems. In t...
Preprocessing is one of the main components in a conventional document categorization (DC) framework...
There is a huge content of Arabic text available over online that requires an organization of these ...
The Arabic language is a highly flexional and morphologically very rich language. It prese...
We study the performance of Arabic text classification combining various techn...
© 2018 Elsevier Ltd Multi-label text categorization refers to the problem of assigning each document...