We explore a matrix-space model that is a natural extension of the vector space model for Information Retrieval. Each document can be represented by a matrix that is based on document extracts (e.g. sentences, paragraphs, sections). We focus on the performance of this model for the specific case in which documents are originally represented as term-by-sentence matrices. We use the singular value decomposition to approximate the term-by-sentence matrices and assemble these results to form the pseudo-"term-document" matrix that forms the basis of a text mining method offered as an alternative to traditional VSM and LSI. We investigate the singular values of this matrix and provide experimental evidence suggesting that the method can be particula...
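The pipeline described in this abstract (one term-by-sentence matrix per document, an SVD approximation of each, then assembly into a pseudo-"term-document" matrix) can be sketched as follows. This is only an illustrative sketch: the abstract does not say how the SVD results are assembled, so the rule used here (each document contributes its dominant left singular vector scaled by its largest singular value) is an assumption, and the function names `dominant_direction` and `pseudo_term_document` as well as the toy data are hypothetical.

```python
import numpy as np


def dominant_direction(term_by_sentence: np.ndarray) -> np.ndarray:
    """Summarize one document by sigma_1 * u_1 of its term-by-sentence matrix.

    ASSUMPTION: this rank-1 summary is one plausible assembly rule,
    not necessarily the one used in the paper.
    """
    U, s, _ = np.linalg.svd(term_by_sentence, full_matrices=False)
    u1 = U[:, 0]
    # The sign of a singular vector is arbitrary; pick the orientation
    # whose entries sum to a nonnegative value so columns are comparable.
    if u1.sum() < 0:
        u1 = -u1
    return s[0] * u1


def pseudo_term_document(docs: list) -> np.ndarray:
    """Stack one summary column per document (terms x documents)."""
    return np.column_stack([dominant_direction(A) for A in docs])


# Toy data (hypothetical): 4 terms; documents with 3 and 2 sentences.
doc_a = np.array([[1, 1, 0],
                  [1, 0, 1],
                  [0, 0, 0],
                  [0, 1, 0]], dtype=float)
doc_b = np.array([[0, 0],
                  [0, 1],
                  [2, 1],
                  [1, 1]], dtype=float)

M = pseudo_term_document([doc_a, doc_b])
print(M.shape)  # (4, 2)

# Singular values of the assembled matrix, the quantity the abstract
# says is investigated.
print(np.linalg.svd(M, compute_uv=False))
```

All documents must share the same term vocabulary (the same row ordering) for the columns to be stackable; only the number of sentences may differ between documents.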
Statistical analysis prior to processing queries in text mining is important and can help text searc...
The aim of the paper is to propose a Text Mining strategy based on statistical tools, which make more ef...
The authors present a detailed analysis of matrices satisfying the so-called low-rank-plus-shift pro...
(A–C) U and V^T contain the LSI vectors for terms and documents, respectively, while Σ c...
A semi-structured document has more structured information compared to an ordinary document, and the...
Document Clustering is an issue of measuring similarity between documents and grouping similar docum...
In this paper we propose a new method of classifying text documents. Unlike conventional vector spac...
Evidently there is a tremendous proliferation in the amount of information found today on the larges...
Topic modeling is a useful tool in computational social science, digital humanities, biology, and ch...
In this article we show the existence of a formal convergence between the matrix models of biologica...
Article published in Mathematics Exchange, 1(1), 2003. Text retrieval is an important area of resear...
The rapid growth of internet technology makes it easy to assemble a huge volume of data as text docume...
Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of...
Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singul...
Our capabilities for collecting and storing data of all kinds are greater than ever. On the other si...