Detection the topics of Facebook posts using text mining with Latent Dirichlet Allocation (LDA) algorithm

Shahlaa Mashhadani

doi:10.30526/38.1.4033

Authors

Shahlaa Mashhadani Computer Science Department, College of Education for Pure Sciences (Ibn Al-Haitham), University of Baghdad, Baghdad, Iraq. https://orcid.org/0000-0003-2193-4796


Keywords:

Data Mining, Text Mining, Latent Dirichlet Allocation (LDA) algorithm, Digital Forensics, Machine Learning Techniques

Abstract

The development of artificial intelligence technologies has led to their widespread integration into many fields, including daily life. Text data plays a pivotal role in artificial intelligence, especially in machine learning, where valuable insights can be extracted from massive data sets to support informed decisions. Latent Dirichlet Allocation (LDA) and digital forensics intersect in the analysis and classification of textual digital evidence from social media, including Facebook, where text data is the main focus. LDA is a topic modeling technique that is particularly useful for uncovering hidden patterns in text data. In digital forensics involving Facebook, this covers text analysis and evidence discovery, where LDA is used to extract meaningful topics from large amounts of unstructured text data, such as emails, documents, or chat logs. Investigators often deal with huge amounts of text-based evidence, so this technique helps them identify topics such as fraud in text data, which is the core of our research. It not only saves effort and time but also holds considerable potential for security applications. This work presents a method for processing Facebook posts with the Latent Dirichlet Allocation (LDA) algorithm to classify these texts into coherent themes. The significance of the research lies in its ability to discover themes within each post, which is crucial for analyzing user behavior and addressing security concerns. The use of relevant Facebook data enhances the real-world relevance of the results, facilitating targeted analysis based on the language patterns used in these posts and thus contributing to security objectives. Compared with existing methodologies, this study demonstrates improved performance by optimizing the LDA algorithm to match the unique features of the target data more accurately, which also reduces errors. The results demonstrate the effectiveness of the LDA approach, which showed significant improvements over traditional strategies in terms of accuracy and applicability to real-world security situations and digital analytics.
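
As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch fits LDA to a handful of short posts using collapsed Gibbs sampling in plain Python. It is not the author's implementation: the example posts, the stopword list, the helper names (`tokenize`, `fit_lda`), and the hyperparameters (`alpha`, `beta`, `iters`, two topics) are all assumptions chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical stopword list and example posts (not from the paper's data set).
STOPWORDS = {"a", "the", "to", "your", "our", "and", "it", "now", "last"}
posts = [
    "Win a free prize now, send your bank details to claim the prize",
    "Claim your free money today, bank transfer required",
    "Great football match last night, our team won the cup",
    "The team played a great match and the fans loved it",
]

def tokenize(post, stopwords=STOPWORDS):
    """Lowercase the post, keep alphabetic tokens, and drop stopwords."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in post)
    return [w for w in cleaned.split() if w not in stopwords]

def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]          # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                        # tokens per topic
    assign = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions; each row sums to 1.
    theta = [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in row]
        for row, doc in zip(doc_topic, docs)
    ]
    # Most frequent words currently assigned to each topic.
    top_words = [
        [w for w, c in topic_word[k].most_common(5) if c > 0]
        for k in range(num_topics)
    ]
    return theta, top_words

docs = [tokenize(p) for p in posts]
theta, top_words = fit_lda(docs, num_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
for post, row in zip(posts, theta):
    print([round(p, 2) for p in row], post[:40])
```

Each row of `theta` gives the estimated topic mix of one post, and `top_words` lists the words that characterize each topic. On a real collection, the number of topics and the priors `alpha` and `beta` would need tuning, and a library implementation would usually be preferable to this hand-written sampler.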
