Sentiment analysis of coronavirus data with ensemble and machine learning methods

Year 2024, Volume: 8 Issue: 2, 175 – 185, 30.04.2024

https://doi.org/10.31127/tuje.1352481

Abstract

The coronavirus pandemic distanced people from social life and increased their use of social media. People’s emotions can be inferred from text data collected from social media applications, an approach used in many fields, especially commerce. This study aims to predict people’s sentiments about the pandemic by applying sentiment analysis to tweets about it, using single machine learning classifiers (Decision Tree (DT), K-Nearest Neighbor (KNN), Logistic Regression (LR), Naïve Bayes (NB), and Random Forest (RF)) and ensemble learning methods (Majority Voting (MV), Probabilistic Voting (PV), and Stacking (STCK)). The tweets were vectorized using two predictive methods, Word2Vec (W2V) and Doc2Vec, and two traditional word representation methods, Term Frequency-Inverse Document Frequency (TF-IDF) and Bag of Words (BOW). Classification models built with single machine learning classifiers were then compared with ensemble models (MV, PV, and STCK) built by heterogeneously combining the single classifier algorithms. Accuracy (ACC), F-measure (F), precision (P), and recall (R) were used as performance measures, with training/test splits of 70%-30% and 80%-20%. Among these models, the ACC of the ensemble learning models ranged from 73% to 89%, while the ACC of the single classifier models ranged from 60% to 80%. Among the ensemble learning methods, STCK with the Doc2Vec text representation/embedding method gave the best ACC, 89%. According to the experimental results, ensemble models built from heterogeneous machine learning classifiers gave better results than single machine learning classifiers.
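As a rough illustration of how the two voting schemes named in the abstract differ, the sketch below combines the outputs of three base classifiers for a single tweet and also computes the four performance measures. It is a minimal pure-Python example and not the study's implementation: the class labels, the probability values, and every function name are hypothetical, and stacking (which trains a meta-classifier on the base classifiers' outputs) is not shown.

```python
from collections import Counter

# Hypothetical class labels; the abstract does not list the sentiment classes.
LABELS = ("negative", "neutral", "positive")


def majority_vote(predicted_labels):
    """Hard voting: every base classifier casts one vote for its predicted label."""
    # most_common() orders equal counts by first appearance, so a tie goes
    # to the label that was seen first in the list.
    return Counter(predicted_labels).most_common(1)[0][0]


def probabilistic_vote(class_probabilities):
    """Soft voting: average the per-class probabilities, pick the largest mean."""
    n = len(class_probabilities)
    mean = {label: sum(p[label] for p in class_probabilities) / n for label in LABELS}
    return max(mean, key=mean.get)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f(y_true, y_pred, target):
    """Precision, recall, and F-measure for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Made-up probability outputs of three base classifiers for a single tweet.
tweet_probs = [
    {"negative": 0.40, "neutral": 0.35, "positive": 0.25},
    {"negative": 0.45, "neutral": 0.40, "positive": 0.15},
    {"negative": 0.05, "neutral": 0.15, "positive": 0.80},
]
# Each classifier's hard prediction is its most probable class.
hard_labels = [max(p, key=p.get) for p in tweet_probs]

mv_label = majority_vote(hard_labels)
pv_label = probabilistic_vote(tweet_probs)
print(mv_label, pv_label)
```

With these made-up numbers the two schemes disagree: majority voting returns "negative" because two of the three classifiers lean that way, while probabilistic voting returns "positive" because the third classifier is far more confident than the other two.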

Keywords

Ensemble learning, Machine learning, Sentiment analysis, Text representation, Word embedding

Ethical Statement

This study was not conducted on any animals or humans.

Supporting Institution

None

Project Number

None

Acknowledgments

No organization provided financial support for this study.



There are 33 references in total.

Details

Primary Language: English
Subjects: Communications Engineering (Other)
Section: Articles
Authors

Muhammet Sinan Başarslan İSTANBUL MEDENİYET ÜNİVERSİTESİ 0000-0002-7996-9169 Türkiye

Fatih Kayaalp DUZCE UNIVERSITY 0000-0002-8752-3335 Türkiye

Project Number: None
Early View Date: 7 April 2024
Publication Date: 30 April 2024
Published in Issue: Year 2024 Volume: 8 Issue: 2

Cite

APA: Başarslan, M. S., & Kayaalp, F. (2024). Sentiment analysis of coronavirus data with ensemble and machine learning methods. Turkish Journal of Engineering, 8(2), 175-185. https://doi.org/10.31127/tuje.1352481
AMA: Başarslan MS, Kayaalp F. Sentiment analysis of coronavirus data with ensemble and machine learning methods. TUJE. April 2024;8(2):175-185. doi:10.31127/tuje.1352481
Chicago: Başarslan, Muhammet Sinan, and Fatih Kayaalp. “Sentiment Analysis of Coronavirus Data With Ensemble and Machine Learning Methods”. Turkish Journal of Engineering 8, no. 2 (April 2024): 175-85. https://doi.org/10.31127/tuje.1352481.
EndNote: Başarslan MS, Kayaalp F (01 April 2024) Sentiment analysis of coronavirus data with ensemble and machine learning methods. Turkish Journal of Engineering 8 2 175–185.
IEEE: M. S. Başarslan and F. Kayaalp, “Sentiment analysis of coronavirus data with ensemble and machine learning methods”, TUJE, vol. 8, no. 2, pp. 175–185, 2024, doi: 10.31127/tuje.1352481.
ISNAD: Başarslan, Muhammet Sinan – Kayaalp, Fatih. “Sentiment Analysis of Coronavirus Data With Ensemble and Machine Learning Methods”. Turkish Journal of Engineering 8/2 (April 2024), 175-185. https://doi.org/10.31127/tuje.1352481.
JAMA: Başarslan MS, Kayaalp F. Sentiment analysis of coronavirus data with ensemble and machine learning methods. TUJE. 2024;8:175–185.
MLA: Başarslan, Muhammet Sinan and Fatih Kayaalp. “Sentiment Analysis of Coronavirus Data With Ensemble and Machine Learning Methods”. Turkish Journal of Engineering, vol. 8, no. 2, 2024, pp. 175-85, doi:10.31127/tuje.1352481.
Vancouver: Başarslan MS, Kayaalp F. Sentiment analysis of coronavirus data with ensemble and machine learning methods. TUJE. 2024;8(2):175-85.
