Sentiment Analysis on Social Media Content

Boris Borovčanin

doi:10.7251/JIT2601040B

Authors

Boris Borovčanin, Department of Information Technologies, Faculty of Engineering, Natural and Medical Sciences, International Burch University, Sarajevo, https://orcid.org/0009-0002-7993-0544

DOI:

https://doi.org/10.7251/JIT2601040B

Keywords:

sentiment analysis, Twitter data, machine learning, performance metrics

Abstract

This study evaluated conventional machine learning and deep learning algorithms for binary text classification, building on previous research that demonstrated the advantages of supervised learning models such as Naive Bayes, Logistic Regression, and LSTM networks. Five models were implemented: Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Random Forest, and LSTM. Responses from nonprofit organizations were cleaned, tokenized, and preprocessed using either TF-IDF vectorization or sequence trimming, depending on the model. Because of computational limitations, most models were run on 50,000 samples, whereas the LSTM was run on only 5,000 samples. LinearSVC was used to accelerate training of the SVM model, and the Random Forest parameters were tuned for computational efficiency. The LSTM model, by contrast, consisted of an embedding component and a single LSTM unit to preserve sequence information. Model performance was evaluated using accuracy, precision, recall, and F1 score. The findings indicate that the conventional models perform effectively and consistently, whereas the LSTM model demands more computational capacity to provide context for classification.
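
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch for the binary case. This is not the author's code: the function name binary_metrics, the convention that label 1 marks the positive class, and the sample labels are all assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: `positive` marks the class treated as positive sentiment;
    every other label is treated as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    # Count the four cells of the confusion matrix.
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    accuracy = (tp + tn) / len(y_true)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, not data from the study.
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_scores = binary_metrics(example_true, example_pred)
```

Reporting precision and recall alongside accuracy matters when the two sentiment classes are imbalanced, because accuracy alone can look high for a model that mostly predicts the majority class.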
