Full Text Search and Indexing in Languages With Two Alphabets

Authors

  • Tijana Talić Paneuropean University APEIRON, Banja Luka

DOI:

https://doi.org/10.7251/JIT1401041T

Abstract

The languages spoken in Bosnia and Herzegovina use the Cyrillic and Latin alphabets equally, which complicates indexing and full text search. This paper analyzes that problem and presents a solution built with the tools available in PostgreSQL and ispell dictionaries. As part of the solution, we created a dictionary of stop words, adjusted the affix file for both alphabets, and built working dictionaries for indexing and searching from a list of words. The result is a full text search configuration that can index texts in both alphabets.
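
To illustrate the dual-alphabet problem, the following is a minimal sketch, not the paper's actual method, of how a Latin word list (for example a stop-word list) might be expanded so that every entry also appears in Cyrillic. The function names `to_cyrillic` and `both_alphabets` and the mapping tables are assumptions introduced here for illustration. The digraph handling is naive: it always treats "lj", "nj" and "dž" as single letters, which is wrong for the few words where those letter pairs are separate sounds.

```python
# Hypothetical helper, not taken from the paper: expand a Latin word list
# so that each word is also present in its Cyrillic spelling.

# Single-letter mapping from the Latin to the Cyrillic alphabet.
LATIN_TO_CYRILLIC = {
    "a": "а", "b": "б", "c": "ц", "č": "ч", "ć": "ћ", "d": "д", "đ": "ђ",
    "e": "е", "f": "ф", "g": "г", "h": "х", "i": "и", "j": "ј", "k": "к",
    "l": "л", "m": "м", "n": "н", "o": "о", "p": "п", "r": "р", "s": "с",
    "š": "ш", "t": "т", "u": "у", "v": "в", "z": "з", "ž": "ж",
}

# Latin digraphs that correspond to a single Cyrillic letter.
# Assumption: they are always converted, which is not correct for every word.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}


def to_cyrillic(word):
    """Transliterate a Latin-script word to Cyrillic (lowercased)."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            # Consume both Latin letters and emit one Cyrillic letter.
            out.append(DIGRAPHS[pair])
            i += 2
        else:
            # Characters without a mapping (digits, hyphens) pass through.
            out.append(LATIN_TO_CYRILLIC.get(word[i], word[i]))
            i += 1
    return "".join(out)


def both_alphabets(words):
    """Return each word in Latin followed by its Cyrillic form."""
    result = []
    for w in words:
        latin = w.lower()
        result.append(latin)
        cyrillic = to_cyrillic(latin)
        if cyrillic != latin:
            result.append(cyrillic)
    return result
```

A list produced this way could be written out one word per line, which is the layout PostgreSQL expects for a stop-word file. Whether the authors generated their dictionaries like this or prepared them by other means is not stated in the abstract.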

Published

2014-06-29

Issue

Section

Articles