Chat Topic Classification for Student Counseling using Text Mining


Akkapon Wongkoblap
Bongse Varavuddhi Muenyuddhi
Phichayasini Kitwatthanathawon
Satidchoke Phosaard
Thara Angskun
Warissadee Duangrasee
Jitimon Angskun

Abstract

Background and Objectives: Alongside their standard duties, teachers have a crucial responsibility to guide and support students. Technology can make student counseling more effective by improving communication between teachers and students. This study applies text-mining techniques to analyze and categorize chat messages exchanged between teachers and students on social media, demonstrating how technology can support this communication.


Methodology: This research applies text mining with natural language processing (NLP) to build a classification model for chat topics. Chat messages were collected from 20 students and 10 teachers via the Line and Facebook Messenger apps. Because messages from both platforms had similar characteristics, they were combined for analysis. Of the 4,500 chat messages gathered, 2,610 remained after cleaning. The messages were represented with Term Frequency–Inverse Document Frequency (TF-IDF) features, and text classifiers were built with three machine learning methods: random forest (RF), support vector machines (SVM), and logistic regression (LR). The resulting model is intended to predict the counseling objectives behind students' messages.
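The TF-IDF step can be illustrated with a minimal pure-Python sketch. It assumes the messages have already been tokenized (if the chats are in Thai, this requires a word segmenter, because Thai is written without spaces between words) and uses one common weighting, raw term count × log(N / df), where N is the number of messages and df is the number of messages containing the term. The abstract does not say which TF-IDF variant the study used; the function name `tf_idf` and the sample messages are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf = raw count of the term in the document,
    idf = log(N / df). The study's exact variant is not stated.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        )
    return vectors


# Hypothetical, already-tokenized chat messages (English stand-ins).
messages = [
    ["request", "leave", "tomorrow", "sick"],
    ["when", "is", "homework", "due"],
    ["check", "my", "score", "please"],
    ["request", "leave", "family", "event"],
]

if __name__ == "__main__":
    for vec in tf_idf(messages):
        # Show the two highest-weighted terms of each message.
        print(sorted(vec.items(), key=lambda kv: -kv[1])[:2])
```

A term that appears in every message gets an idf of log(1) = 0 and contributes nothing, while terms concentrated in a few messages, such as "sick" above, receive the largest weights. In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would typically be used instead; note that by default it smooths the idf and L2-normalizes each vector, so its numbers differ from this sketch.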


Main Results: Model performance was evaluated with 10-fold cross-validation, chosen because the dataset is small; this makes full use of the limited data and guards against overly optimistic results caused by overfitting. The RF model performed best, with an overall F1 score of 89.55 percent, followed by SVM at 88.68 percent and LR at 88.06 percent. Broken down by chat topic, the highest F1 score was recorded for "Leave," followed by "Urgency," "Score," and "Homework."
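The evaluation described above can be sketched with two small helpers: one that splits message indices into 10 folds, and one that computes per-topic F1 = 2PR / (P + R), which equals 2TP / (2TP + FP + FN), plus a macro average across topics. The abstract does not say whether the folds were shuffled or stratified by topic, or whether the "overall" F1 is a macro or weighted average; this sketch assumes plain contiguous folds and a macro average, and all helper names are hypothetical.

```python
from collections import Counter


def k_fold_indices(n: int, k: int = 10) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k contiguous folds; return (train, test) pairs.

    Plain, unstratified k-fold. Shuffle the messages beforehand if they are
    stored grouped by topic, or contiguous folds will be badly unbalanced.
    """
    # The first n % k folds get one extra item so every index is used once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        folds.append((train, test))
        start += size
    return folds


def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """F1 for each label, computed as 2TP / (2TP + FP + FN)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 (one possible 'overall' F1)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    folds = k_fold_indices(2610, k=10)
    print([len(test) for _, test in folds])  # 261 messages per test fold
    y_true = ["Leave", "Leave", "Score", "Homework"]
    y_pred = ["Leave", "Score", "Score", "Homework"]
    print(per_class_f1(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 3))  # 0.778
```

In a full pipeline, the TF-IDF vocabulary and the classifier would be fitted on each training split only and scored on the held-out fold, so that nothing from the test fold leaks into training; the ten fold scores are then averaged.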


Discussions: The RF technique yielded the highest scores for every chat topic, indicating that it classified chats most accurately of the three techniques compared. Analysis of the model's errors found that they were caused by words that appear repeatedly across all chat topics. Such words carry little information for distinguishing one topic from another and are not normally useful for identifying relationships in the data. Future analyses may therefore involve language experts to identify and remove these words.
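One way to produce a candidate list of such cross-topic words for experts to review is to collect each topic's vocabulary and intersect the sets. This is a hypothetical heuristic, not the study's method (the authors propose expert review), and the function names and sample data are illustrative. It complements TF-IDF: idf is computed per message, so it suppresses a word only to the extent that the word appears in many messages, and a word spread across all topics can still occur in relatively few messages and keep a high weight.

```python
from collections import defaultdict


def shared_topic_words(messages: list[list[str]], labels: list[str]) -> set[str]:
    """Return words that occur in at least one message of every topic.

    Such words cannot by themselves distinguish topics. This is an
    automatic heuristic, not the expert review proposed in the study.
    """
    vocab_by_topic: dict[str, set[str]] = defaultdict(set)
    for tokens, label in zip(messages, labels):
        vocab_by_topic[label].update(tokens)
    topic_vocabs = list(vocab_by_topic.values())
    if not topic_vocabs:
        return set()
    return set.intersection(*topic_vocabs)


def remove_words(messages: list[list[str]], stop_words: set[str]) -> list[list[str]]:
    """Drop the flagged words from every tokenized message before vectorization."""
    return [[t for t in tokens if t not in stop_words] for tokens in messages]


if __name__ == "__main__":
    # Hypothetical tokenized messages and topic labels.
    msgs = [
        ["teacher", "please", "leave", "sick"],
        ["teacher", "please", "homework", "due"],
        ["teacher", "score", "check"],
    ]
    topics = ["Leave", "Homework", "Score"]
    shared = shared_topic_words(msgs, topics)
    print(shared)  # {'teacher'}
    print(remove_words(msgs, shared))
```

Intersecting across all topics is deliberately strict ("please" above is kept because it never appears in a "Score" message); a looser variant might flag words found in most topics, and either list would still need expert review before any words are removed.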


Conclusions: The research findings can be used to categorize chats and predict their topics for student counseling. They can also support the development of automated communication tools, such as chatbots integrated with e-learning systems. In addition, the model can help resolve issues and streamline communication, reducing the time students wait for a response. However, the designed system has some limitations. Improving the model's accuracy with text-mining techniques requires an extensive vocabulary corpus for each chat topic, and building such corpora requires linguistic experts and considerable time. Furthermore, the analyzed data come from social media and include emerging vocabulary, such as chat language, which is challenging for the model. Several improvements are possible in the near future; for instance, the model could be improved by applying deep learning techniques and by engaging linguistic experts to better understand word characteristics and chat language.

Section: Research Articles
