Full Text Search and Indexing in Languages With Two Alphabets

Tijana Talić
Paneuropean University APEIRON, Banja Luka

DOI: https://doi.org/10.7251/JIT1401041T

Abstract

The languages spoken in Bosnia and Herzegovina use the Cyrillic and Latin alphabets on an equal footing, which creates an additional problem for indexing and full-text search. In this paper, we analyze this problem and build a solution with the tools available in PostgreSQL together with ispell dictionaries. As part of the solution, we created a stop-word dictionary, adapted the affix file to both alphabets, and turned a word list into working dictionaries for indexing and searching. The result is a full-text search configuration suitable for indexing texts written in either alphabet.
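The paper's solution is built from PostgreSQL text search configurations and ispell dictionaries. As a minimal, language-neutral sketch of the underlying idea (not the author's implementation), the Python example below folds Serbian Cyrillic into its Latin equivalent before tokenizing, so that a query in either alphabet reaches the same index entries. The letter table follows the standard one-to-one Serbian Cyrillic-to-Latin correspondence; the stop-word list, the sample documents, and the function names (`normalize`, `tokens`, `build_index`, `search`) are hypothetical. Converting in the Cyrillic-to-Latin direction avoids the ambiguity of the reverse direction, where a Latin pair such as "nj" may be one letter (њ) or two (нј). The sketch does no stemming, which is the part the paper handles with ispell affix rules.

```python
import re
from collections import defaultdict

# Standard one-to-one mapping of lowercase Serbian Cyrillic letters to Latin (Gaj's alphabet).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}
_CYR_TABLE = str.maketrans(CYR_TO_LAT)

# Hypothetical, deliberately tiny stop-word list (Latin form), for illustration only.
STOP_WORDS = {"i", "u", "na", "je", "da", "se", "za", "od"}


def normalize(text: str) -> str:
    """Lowercase the text and fold any Serbian Cyrillic letters into Latin."""
    return text.lower().translate(_CYR_TABLE)


def tokens(text: str) -> list[str]:
    """Split normalized text into word tokens, dropping stop words."""
    return [w for w in re.findall(r"\w+", normalize(text)) if w not in STOP_WORDS]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Build an inverted index: normalized token -> set of document ids."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokens(text):
            index[tok].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents that contain every normalized query term."""
    terms = tokens(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy, so the index is never mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {1: "Jezici u Bosni i Hercegovini", 2: "Ћирилица и латиница"}
    index = build_index(docs)
    print(search(index, "Језици"))             # Cyrillic query, Latin document: {1}
    print(search(index, "latinica ćirilica"))  # Latin query, Cyrillic document: {2}
```

Because both documents and queries pass through the same `normalize` step, the index stores a single Latin form per word, which is the same effect the paper pursues by making its dictionaries cover both alphabets.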
