Managing Online Breastfeeding Consultations Using LDA and BERTopic: A Topic Modeling Approach

Jiahui NIAN; Huiqing LIANG; Caixin YIN; Yue PENG

Jiahui NIAN ¹, Huiqing LIANG ², Caixin YIN *,¹, Yue PENG

¹ Department of Nursing, Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou 510180, China
² School of Nursing, Guangdong Pharmaceutical University, Guangzhou 510310, China

Academic Editor: Rüdiger Pryss

Published: 20 March 2026 by MDPI in The 1st International Online Conference on Healthcare session Clinical Data Management—Balancing Transparency with Innovation for Enhanced Care Quality

Abstract:

Background: Online consultation platforms are central to maternal and infant healthcare in China, providing continuous professional support. Large volumes of unstructured clinician–patient dialogues remain underused for extracting reproducible clinical patterns, necessitating robust computational approaches that balance transparency and NLP innovation.

Objective: To systematically compare Latent Dirichlet Allocation (LDA) and BERTopic on large-scale online consultation data and to evaluate how well each identifies breastfeeding-related inquiry patterns.

Methods: We analyzed 527,979 messages from the Internet Outpatient Platform of Guangzhou Women and Children’s Medical Center, Guangzhou, China (2021–2024). After nontextual elements were removed, 2,735 consultation-level records were generated through segmentation, customized stopword refinement, synonym merging, and construction of domain-specific medical dictionaries. The number of LDA topics was chosen using perplexity (≈12.4) and coherence (≈0.72), and 5 topics were selected. BERTopic used all-MiniLM-L6-v2 embeddings, UMAP dimensionality reduction, HDBSCAN clustering, and c-TF-IDF weighting. The two models were compared on topic coherence, topic distinctiveness, and visualization outputs.
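
The c-TF-IDF weighting step named above can be illustrated with a minimal pure-Python sketch. It assumes the commonly documented form W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf(t, c) is the normalized frequency of term t in cluster c, A is the average number of tokens per cluster, and f(t) is the frequency of t across all clusters. The function names `c_tf_idf` and `top_terms` and the toy English tokens are hypothetical and are not taken from the study's code or data.

```python
import math
from collections import Counter


def c_tf_idf(cluster_tokens):
    """Compute class-based TF-IDF weights.

    cluster_tokens maps a cluster id to the list of tokens obtained by
    concatenating every document assigned to that cluster.
    Returns a dict: cluster id -> {term: weight}.
    """
    if not cluster_tokens:
        raise ValueError("at least one cluster is required")

    # Term counts inside each cluster.
    counts = {c: Counter(tokens) for c, tokens in cluster_tokens.items()}

    # f(t): frequency of each term across all clusters.
    total = Counter()
    for cnt in counts.values():
        total.update(cnt)

    # A: average number of tokens per cluster.
    avg_tokens = sum(total.values()) / len(counts)

    weights = {}
    for c, cnt in counts.items():
        n = sum(cnt.values())
        if n == 0:
            weights[c] = {}  # empty cluster: nothing to weight
            continue
        weights[c] = {
            t: (k / n) * math.log(1 + avg_tokens / total[t])
            for t, k in cnt.items()
        }
    return weights


def top_terms(weights, k=3):
    """Return the k highest-weighted terms per cluster (ties broken alphabetically)."""
    return {
        c: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for c, w in weights.items()
    }


# Hypothetical toy clusters, standing in for HDBSCAN output.
clusters = {
    "supply": ["milk", "supply", "low", "weight", "baby", "milk", "supply"],
    "pain": ["nipple", "pain", "breast", "pain", "baby", "milk"],
}
weights = c_tf_idf(clusters)
print(top_terms(weights, k=2))
```

Terms that occur in many clusters, such as "milk" or "baby" in this toy input, are down-weighted by the logarithmic factor, so each cluster's top terms lean toward vocabulary that sets it apart from the others.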

Results: Both models identified five key thematic clusters in online breastfeeding consultations: (1) milk supply and infant weight concerns, (2) latch and sucking issues, (3) breast/nipple pain, (4) maternal diet, medication, and galactagogues, and (5) milk expression challenges. BERTopic achieved higher coherence (c_v = 0.78) and produced more compact, well-separated clusters, whereas LDA generated more stable macro-level topic structures. Comparative analysis demonstrates that LDA and BERTopic provide complementary strengths in topic extraction, combining macro-level stability with fine-grained semantic distinction.
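
The abstract does not state how topic distinctiveness was computed. One common proxy, assumed here purely for illustration, is topic diversity: the share of unique words among the top-k words of all topics, where 1.0 means no topic shares a top word with another. The function name `topic_diversity` and the English keyword lists below are hypothetical and loosely paraphrase the five reported themes; they are not the models' actual output.

```python
def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic.

    topics is a list of word lists, each ordered from most to least
    representative. Values near 1.0 indicate little overlap between topics.
    """
    top_words = [word for words in topics for word in words[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    return len(set(top_words)) / len(top_words)


# Hypothetical top words for five topics (illustration only).
example_topics = [
    ["milk", "supply", "weight"],
    ["latch", "sucking", "position"],
    ["nipple", "pain", "breast"],
    ["diet", "medication", "galactagogue"],
    ["pump", "expression", "milk"],  # "milk" overlaps with the first topic
]
print(round(topic_diversity(example_topics, top_k=3), 3))
```

Running the same function on the top words of each model would give a single number per model, which is one way a comparison of how well separated the topics are could be made concrete.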

Conclusions: Topic modeling of online consultation data enables systematic extraction of patterns in breastfeeding-related inquiries. Integrating LDA and BERTopic supports scalable analysis of unstructured clinical dialogue, facilitates identification of broad and detailed thematic patterns, and advances secondary use of digital health data for telehealth optimization. These findings demonstrate the utility of structured topic modeling in leveraging online consultation platforms for clinical information extraction.

Keywords: Online Consultation; Breastfeeding; Topic Modeling; LDA; BERTopic; Telehealth; Digital Health Data
