C1 - Journal Articles Refereed

Probabilistic latent semantic analysis (PLSA) is a method for computing term and document relationships from a document set. The probabilistic latent semantic index (PLSI) has been used to store PLSA information, but the PLSI uses excessive storage space relative to a simple term frequency index, which causes lengthy query times. To overcome the storage and speed problems of the PLSI, we introduce the probabilistic latent semantic thesaurus (PLST), an efficient and effective method of storing the PLSA information. We show that through methods such as document thresholding and term pruning, we are able to maintain the high-precision results found using PLSA while using a very small percentage (0.15%) of t...
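For readers unfamiliar with PLSA, the following is a minimal sketch of the standard EM fitting procedure on a small term-count matrix. It is an illustration only: it is not the authors' implementation, it does not build a PLSI or PLST, and the function names (`plsa`, `normalize`), the toy `counts` matrix, and the parameter defaults are all assumptions made for this example.

```python
import random


def normalize(row):
    """Scale a list of non-negative numbers to sum to 1 (uniform if all zero)."""
    total = sum(row)
    if total > 0:
        return [v / total for v in row]
    return [1.0 / len(row)] * len(row)


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a documents x terms count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | document) and P(term | topic).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    # Random positive initialisation, each row normalised to a distribution.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_terms)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # M-step accumulation: expected counts per topic.
                for z in range(n_topics):
                    new_z_d[d][z] += n * post[z]
                    new_w_z[z][w] += n * post[z]
        p_z_d = [normalize(r) for r in new_z_d]
        p_w_z = [normalize(r) for r in new_w_z]
    return p_z_d, p_w_z


# Toy corpus (hypothetical): 4 documents over 4 terms, two obvious term groups.
counts = [[2, 1, 0, 0],
          [1, 2, 0, 0],
          [0, 0, 2, 1],
          [0, 0, 1, 2]]
p_z_d, p_w_z = plsa(counts, n_topics=2)
```

Each row of `p_z_d` and `p_w_z` is a probability distribution. Storing these dense tables for every document is what makes a full index expensive, which is the cost the thresholding and pruning mentioned in the abstract aim to reduce.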
A dual probability model is constructed for the Latent Semantic Indexing (LSI) using the cosine simi...
Topic models have been shown to be one of the most effective tools in Content-Based Multimedia Retrieval...
Data collected from the web are unclean, amorphous, formless and unstructured. In order to get...
Probabilistic latent semantic analysis (PLSA) is a method for computing term and document relationsh...
Due to the availability of internet-based abstract services and patent databases, bibliometric analy...
Probabilistic Latent Semantic Analysis (PLSA) is an effective technique for information retrieval, ...
Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis ...
This research project consists of a system, which attempts to combine two methods of indexing docume...
Probabilistic Latent Semantic Analysis (PLSA) is an information retrieval technique proposed to im...
Experiments show that information retrieval and filtering can be much improved by Latent Se...
This paper proposes a novel statistical approach to intelligent document retrieval. It seeks to off...