The influence of preprocessing on the accuracy of the classification of posts on social networks about the corona virus

Authors

  • Jelena Lazić, School of Electrical Engineering (Elektrotehnički fakultet), University of Belgrade

DOI:

https://doi.org/10.7251/ZRSNG2223021L

Abstract

At the beginning of 2020, a pandemic caused by the coronavirus broke out. Restrictive measures were introduced worldwide to prevent further spread of the virus, after which almost all aspects of life, including work, moved into the home. The restriction of face-to-face communication led to increased user activity on social networks, and analysing the content published there can provide insight into the feelings and attitudes that prevail among users. In this paper, English-language posts (tweets) about the coronavirus published on the social network Twitter were classified, using a publicly available dataset from the Kaggle platform. Each tweet was assigned, according to its sentiment, to one of five classes: extremely positive, positive, neutral, negative, or extremely negative. The goal of the paper is to examine how data preprocessing affects classification accuracy. A Naive Bayes classifier, the k-nearest neighbors (KNN) algorithm, and artificial neural networks were used. The results indicate that the way links and tags are preprocessed does not affect classification accuracy, whereas the way hashtags are processed can affect it.
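To make the compared preprocessing choices concrete, the following is a minimal sketch of how links, user tags (@mentions) and hashtags might each be handled in alternative ways before tokenization. The abstract does not state the paper's exact rules, so the regular expressions, the option names (`"remove"`, `"token"`, `"keep_word"`, `"keep"`) and the `<url>`/`<user>` placeholders are hypothetical illustrations, not the author's method.

```python
import re

# Hypothetical patterns: the paper does not specify its exact preprocessing rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def preprocess(tweet: str, links: str = "remove", mentions: str = "remove",
               hashtags: str = "keep_word") -> list[str]:
    """Lowercase and tokenize a tweet under one combination of options.

    links / mentions: "remove" drops the item, "token" replaces it with a placeholder.
    hashtags: "remove" drops the whole hashtag, "keep_word" strips the '#' and keeps
              the word, "keep" leaves the hashtag (with '#') as a separate token.
    """
    text = tweet.lower()
    # Links are handled first so that '@' or '#' inside a URL is not treated separately.
    text = URL_RE.sub(" " if links == "remove" else " <url> ", text)
    text = MENTION_RE.sub(" " if mentions == "remove" else " <user> ", text)
    if hashtags == "remove":
        text = HASHTAG_RE.sub(" ", text)
    elif hashtags == "keep_word":
        text = HASHTAG_RE.sub(r" \1 ", text)
    # Tokens: placeholders, hashtags kept with '#', and plain words; punctuation is dropped.
    return re.findall(r"<url>|<user>|#\w+|[a-z0-9']+", text)


if __name__ == "__main__":
    tweet = "Stay safe @WHO! Read https://t.co/abc #COVID19 #StayHome"
    for opts in ({}, {"links": "token", "mentions": "token", "hashtags": "keep"},
                 {"hashtags": "remove"}):
        print(opts, preprocess(tweet, **opts))
```

With `hashtags="keep"` the token `#covid19` stays distinct from the plain word `covid19`, while `"keep_word"` merges the two and `"remove"` discards the hashtag content entirely. This changes the vocabulary a bag-of-words classifier such as Naive Bayes sees, whereas replacing or dropping links and mentions mostly affects tokens that carry little sentiment.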

Published

2023-05-11