Preprint / Version 1

Topic modeling and social network analysis approach to explore diabetes discourse on Twitter in India

Authors

  • Thilagavathi Ramamoorthy, School of Public Health, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
  • Vaitheeswaran Kulothungan, ICMR-National Centre for Disease Informatics and Research, Bengaluru, India
  • Bagavandas Mappillairaju, Centre for Statistics, SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India

Keywords:

diabetes, social media, Twitter, India, content analysis, network analysis, machine learning, topic modeling

Abstract

Introduction: The utilization of social media presents a promising avenue for the prevention and management of diabetes. To effectively cater to the diabetes-related knowledge, support, and intervention needs of the community, it is imperative to attain a deeper understanding of the extent and content of discussions pertaining to this health issue. This study aims to assess and compare various topic modeling techniques to determine the most effective model for identifying the core themes in diabetes-related tweets, the sources responsible for disseminating this information, the reach of these themes, and the influential individuals within the Twitter community in India.

Methods: Twitter messages from India, dated between 7 November 2022 and 28 February 2023, were collected using the Twitter API. The unsupervised machine learning topic models, namely, Latent Dirichlet Allocation (LDA), non-negative matrix factorization (NMF), BERTopic, and Top2Vec, were compared, and the best-performing model was used to identify common diabetes-related topics. Influential users were identified through social network analysis.

Results: The NMF model outperformed the LDA model, whereas BERTopic performed better than Top2Vec. Diabetes-related conversations revolved around eight topics, namely, promotion, management, drug and personal story, consequences, risk factors and research, raising awareness and providing support, diet, and opinion and lifestyle changes. The influential nodes identified were mainly health professionals and healthcare organizations.

Discussion: The study identified important topics of discussion along with health professionals and healthcare organizations involved in sharing diabetes-related information with the public. Collaborations among influential healthcare organizations, health professionals, and the government can foster awareness and prevent noncommunicable diseases.
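As a rough illustration of the social network analysis step mentioned in the Methods, the sketch below ranks accounts by normalized in-degree centrality on a directed interaction graph. The abstract does not say which centrality measure the authors used, so the choice of in-degree, the function names, and the toy account names are all assumptions made here for illustration only.

```python
from collections import defaultdict


def in_degree_centrality(edges):
    """Normalized in-degree for each node of a directed graph.

    edges: iterable of (source_user, target_user) pairs, e.g. one pair per
    retweet or mention (an assumed encoding, not taken from the study).
    """
    nodes = set()
    incoming = defaultdict(set)
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:  # ignore self-loops
            incoming[dst].add(src)
    n = len(nodes)
    if n < 2:
        return {node: 0.0 for node in nodes}
    # Divide by the maximum possible number of distinct incoming neighbours.
    return {node: len(incoming[node]) / (n - 1) for node in nodes}


def top_influencers(edges, k=3):
    """Return the k highest-scoring (node, score) pairs, ties broken by name."""
    scores = in_degree_centrality(edges)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical toy interactions; these accounts are invented for the example.
edges = [
    ("user_a", "clinic_x"),
    ("user_b", "clinic_x"),
    ("user_c", "clinic_x"),
    ("user_a", "dr_y"),
    ("user_c", "dr_y"),
    ("dr_y", "clinic_x"),
]

ranking = top_influencers(edges, k=2)
```

In-degree is only one plausible proxy for influence; betweenness or eigenvector-based measures may rank accounts differently, and the study itself may have relied on another measure.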

Author Biography

Vaitheeswaran Kulothungan, ICMR-National Centre for Disease Informatics and Research, Bengaluru, India

SRM Institute of Science and Technology, Kattankulathur, Tamil Nadu, India
