This paper focuses on topic identification for the Arabic language based on topic models. We study Latent Dirichlet Allocation (LDA) as an unsupervised method for Arabic topic identification. A detailed study of LDA is carried out at two levels: the stemming process and the choice of the LDA hyper-parameters. At the first level, we study the effect of different Arabic stemmers on LDA. At the second level, we focus on the LDA hyper-parameters α and β and their impact on topic identification. This study shows that LDA is an efficient method for Arabic topic identification, especially with the right choice of hyper-parameters. Another important result is the strong impact of the stemming algorithm on topic identification.
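The role of the hyper-parameters α and β mentioned above can be made concrete with a minimal collapsed Gibbs sampler for LDA. This is a sketch under stated assumptions, not the authors' implementation: the function name `lda_gibbs`, the default values α = 0.1 and β = 0.01, the iteration count, and the toy tokens are all illustrative, and the input documents are assumed to be already stemmed (the stemmer is the other factor the study varies). In standard LDA, α is the symmetric Dirichlet prior on each document's topic distribution and β the prior on each topic's word distribution; smaller values favor sparser distributions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical helper).

    docs: list of token lists, assumed already stemmed/normalized.
    alpha: symmetric Dirichlet prior on document-topic distributions (theta).
    beta: symmetric Dirichlet prior on topic-word distributions (phi).
    Returns (theta, phi): per-document topic proportions and per-topic word probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: tokens of topic k in doc d, word w in topic k, total tokens in topic k
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    z = []

    # Random initial topic assignment for every token
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = word_id[w], z[d][i]
                # Remove the current token from all counts
                n_dk[d][k] -= 1
                n_kw[k][wi] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), up to a constant
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wi] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wi] += 1
                n_k[k] += 1

    # Smoothed point estimates; alpha and beta act as pseudo-counts
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        {vocab[w]: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)}
        for k in range(K)
    ]
    return theta, phi


if __name__ == "__main__":
    # Toy, hypothetical "stemmed" documents (two sports-like, two economy-like)
    docs = [
        ["goal", "match", "team", "goal"],
        ["match", "team", "coach"],
        ["bank", "market", "price"],
        ["market", "price", "bank", "price"],
    ]
    theta, phi = lda_gibbs(docs, num_topics=2)
    for k, topic in enumerate(phi):
        print(k, sorted(topic, key=topic.get, reverse=True)[:3])
```

Re-running with different `alpha` and `beta` values on the same stemmed corpus, or with the same values on corpora produced by different stemmers, mirrors the two levels of the study in miniature.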
This paper investigates the use of latent topic modeling for spoken language recognition, where a to...
Currently, a large amount of news exists in digital format and needs to be classified or labeled automa...
During the past few years, the production of digital content has been increasing rapidly, raising th...
In this paper, we present a new algorithm based on LDA (Latent Dirichlet Allocation) and the Sup...
This paper explains how to extract named entities and topics from Arabic news arti...
This paper is in the field of natural language processing. It applies unsupervised machine learning ...
In this paper we present two well-known methods for topic identification. The ...
Topic identification is one of the important keys for the success of many appli...
Arabic topic identification is a part of text classification that aims to assign a given text a set ...
This paper deals with the problem of automatic theme identification of noisy Arabic texts. Actually,...
One of the main factors that characterize a text is its content. Nowadays, the number of documents s...
This paper focuses on studying topic identification for the Arabic language by usin...
Topic modeling is a statistical process that derives latent themes from extensive collections ...
Latent Dirichlet Allocation (LDA) is a popular machine-learning technique that identifies latent str...
We tackle the task of author identification at PAN 2015 through a Latent Diric...