Full Text Search and Indexing in Languages With Two Alphabets
DOI:
https://doi.org/10.7251/JIT1401041T
Abstract: The languages spoken in Bosnia and Herzegovina use the Cyrillic and Latin alphabets equally, which poses an additional problem for indexing and full-text search. In this paper, we analyze that problem and present a solution built with the tools available in PostgreSQL and with ispell dictionaries. As part of the solution, we created a dictionary of stop words, adjusted the affix file for both alphabets, and built functional dictionaries for indexing and searching from a word list. The result is a full-text search configuration that can index texts in both alphabets.
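To make the two-alphabet problem concrete, the following minimal Python sketch shows why the same word written in Cyrillic and in Latin must end up under one index entry. It is not the method of the paper, which relies on PostgreSQL ispell dictionaries adapted to both alphabets. Instead it assumes a simpler, swapped-in technique: every token is transliterated to Latin before indexing. The names `CYR_TO_LAT`, `normalize`, `build_index`, and `search` are hypothetical, and the sketch does no stemming or stop-word removal.

```python
from typing import Dict, Set

# Lowercase Serbian Cyrillic letters mapped to their Latin equivalents.
# Illustrative assumption: transliteration stands in for the paper's
# dictionary-based approach.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def normalize(text: str) -> str:
    """Lowercase the text and map Cyrillic letters to Latin ones."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Build an inverted index keyed by normalized (Latin) tokens."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for token in normalize(text).split():
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Look up a single-word query written in either alphabet."""
    return index.get(normalize(query), set())


# Two documents containing the same word in different alphabets.
docs = {1: "Школа је затворена", 2: "Nova škola u gradu"}
index = build_index(docs)
```

With this index, `search(index, "škola")` and `search(index, "ШКОЛА")` both return documents 1 and 2, because both spellings collapse to the same Latin token. A dictionary-based configuration such as the one the abstract describes would additionally handle inflected forms and stop words, which this sketch ignores.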
Published
2014-06-29
Issue
Section
Articles