Effective Techniques for Indonesian Text Retrieval. Edition No. 1
VDM Publishing House, June 2009, Pages: 292
In this thesis, we investigate information retrieval techniques for Indonesian.
Stemming is the process of reducing morphological variants of a word to a common stem form. Although several stemming algorithms have been proposed for Indonesian, there is no consensus on which performs best. We empirically compare these algorithms, propose novel extensions to the best one, develop a new Indonesian stemmer, and show that these contributions can improve stemming correctness.
We propose a range of techniques to enhance the performance of Indonesian information retrieval. Our experiments show that many of these techniques can increase retrieval performance.
We also address the problem of automatically creating parallel corpora, which are essential for cross-lingual information retrieval and other natural language processing tasks, including machine translation. We describe algorithms that we have developed to automatically identify parallel documents for Indonesian and English. We also investigate whether our identification algorithms apply to other languages that use the Latin alphabet, including German and French.