Topic modeling and social network analysis approach to explore diabetes discourse on Twitter in India
Authors
Thilagavathi Ramamoorthy
School of Public Health, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
Vaitheeswaran Kulothungan
ICMR-National Centre for Disease Informatics and Research, Bengaluru, India
Bagavandas Mappillairaju
Centre for Statistics, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
Keywords:
diabetes, social media, Twitter, India, content analysis, network analysis, machine learning, topic modeling
Abstract
Introduction
Social media is a promising channel for the prevention and management of diabetes. Meeting the community's diabetes-related knowledge, support, and intervention needs requires a clearer understanding of the extent and content of online discussion about the disease. This study compares several topic modeling techniques to determine the most effective model for identifying the core themes in diabetes-related tweets, the sources disseminating this information, the reach of these themes, and the influential users within the Twitter community in India.
Methods
Tweets originating from India and posted between 7 November 2022 and 28 February 2023 were collected using the Twitter API. Four unsupervised machine learning topic models, namely Latent Dirichlet Allocation (LDA), Non-negative Matrix Factorization (NMF), BERTopic, and Top2Vec, were compared, and the best-performing model was used to identify common diabetes-related topics. Influential users were identified through social network analysis.
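The abstract does not state which metric was used to compare the topic models. As a minimal sketch, assuming a coherence score is the yardstick, the following computes a mean-per-pair variant of UMass coherence for the top words of each topic. The toy corpus, the topic word lists, and the function names are hypothetical and serve only to illustrate the comparison step.

```python
import math


def umass_coherence(top_words, docs):
    """Mean over word pairs of log((D(wi, wj) + 1) / D(wj)), with wj ranked above wi.

    D(...) counts the documents that contain all of the given words.
    """
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            wi, wj = top_words[i], top_words[j]
            dj = doc_freq(wj)
            if dj == 0:
                continue  # skip words that never occur in the corpus
            scores.append(math.log((doc_freq(wi, wj) + 1) / dj))
    return sum(scores) / len(scores) if scores else float("nan")


def mean_coherence(topics, docs):
    """Average the per-topic coherence over all topics of one model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)


# Hypothetical tokenized tweets (not data from the study).
docs = [
    ["insulin", "dose", "sugar"],
    ["insulin", "sugar", "diet"],
    ["diet", "exercise", "sugar"],
    ["exercise", "walk", "diet"],
]

# Hypothetical top words produced by two competing models.
model_a_topics = [["insulin", "sugar"], ["diet", "exercise"]]
model_b_topics = [["insulin", "walk"], ["dose", "exercise"]]

score_a = mean_coherence(model_a_topics, docs)
score_b = mean_coherence(model_b_topics, docs)
best = "model_a" if score_a > score_b else "model_b"
```

Under this assumption, the model whose topics group words that actually co-occur in tweets receives the higher score and would be carried forward to the topic identification step.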
Results
The NMF model outperformed the LDA model, whereas BERTopic performed better than Top2Vec. Diabetes-related conversations revolved around eight topics: promotion; management; drug and personal story; consequences; risk factors and research; raising awareness and providing support; diet; and opinion and lifestyle changes. The influential nodes identified were mainly health professionals and healthcare organizations.
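The abstract does not name the centrality measure used to find influential nodes. As one possible illustration, assuming a directed network in which an edge points from a user to the account they mention or retweet, in-degree centrality ranks accounts by how many distinct users point to them. The account names and edges below are hypothetical.

```python
from collections import defaultdict


def in_degree_centrality(edges):
    """Fraction of other nodes that point to each node in a directed edge list."""
    nodes = set()
    incoming = defaultdict(set)
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:  # ignore self-mentions
            incoming[dst].add(src)
    denom = max(len(nodes) - 1, 1)
    return {n: len(incoming[n]) / denom for n in nodes}


# Hypothetical (source, target) pairs: source mentioned or retweeted target.
edges = [
    ("user_a", "clinic_x"),
    ("user_b", "clinic_x"),
    ("user_c", "clinic_x"),
    ("user_a", "dr_y"),
    ("user_b", "dr_y"),
]

scores = in_degree_centrality(edges)
# Sort by score (descending), then by name for a stable order.
ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Here `ranked` lists the accounts most often pointed to first. Other measures, such as betweenness or PageRank, would reward different kinds of influence and could give a different ordering.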
Discussion
The study identified important topics of discussion along with health professionals and healthcare organizations involved in sharing diabetes-related information with the public. Collaborations among influential healthcare organizations, health professionals, and the government can foster awareness and prevent noncommunicable diseases.