Latent semantic indexing (LSI) is an information retrieval method that represents a dataset as a term-document matrix. LSI uses a matrix factorization method known as the partial singular value decomposition (PSVD) to reduce noise in the data by projecting the term-document matrix into a lower-dimensional vector space. Calculating the PSVD of a large term-document matrix is computationally expensive. In a rapidly expanding environment, a term-document matrix is altered often as new documents and terms are added. Recomputing the PSVD of the term-document matrix each time changes occur can be too expensive. Folding-in is one method of adding new documents or terms to an LSI database; updating the PSVD of the database is another method. Folding-in is computationally inexpensive, but may cause a loss of accuracy in the PSVD. PSVD-updating is more expensive, but maintains the PSVD accuracy. We introduce folding-up, a new method that combines folding-in and PSVD-updating. Folding-up offers a significant improvement in computation time compared with recomputing the PSVD or PSVD-updating, while avoiding the degradation in the PSVD that can occur when the folding-in method alone is used.
Jane E. Mason.
Jane E. Mason is currently a doctoral candidate at Dalhousie University in Halifax, Nova Scotia. She has won numerous research awards, including first place (graduate division) in the grand finals of the 2005 ACM international student research competition. Her research interests include both numerical linear algebra and information retrieval.