In an era marked by a growing reliance on digital platforms for health care consultation, the subreddit r/AskDocs has emerged as a pivotal forum. However, the vast, unstructured nature of forum data presents a formidable challenge: extracting and meaningfully analyzing such data requires advanced tools that can navigate the complexities of language and context inherent in user-generated content. Large language models (LLMs) offer new tools for extracting health-related content from unstructured text on social media platforms such as Reddit. This methodological study aimed to evaluate the use of LLMs to systematically transform the rich, unstructured textual data from the AskDocs subreddit into a structured dataset, an approach that aligns more closely with human cognitive processes than traditional data extraction methods. Human annotators and LLMs were used to extract data from 2800 randomly sampled r/AskDocs subreddit posts. For human annotation, at least 2 medical students extracted demographic information, type of inquiry (diagnosis, symptom, or treatment), proxy relationship, chronic condition, health care consultation status, and primary focus topic. For LLM data extraction, specially engineered prompts were created using JavaScript Object Notation (JSON) and few-shot prompting. These prompts were used to query several state-of-the-art LLMs (eg, Llama 3, Gemma, and GPT). Cohen κ was calculated across all human annotators, and the human-annotated dataset served as the gold standard for comparison with LLM data extraction. Human annotators showed high reliability when coding demographic information and lower reliability when coding the health-related content of the posts. Compared with the gold standard, the highest performance scores were achieved by Llama 3 70B with 7 few-shot prompt examples (average accuracy=87.4) and GPT-4 with 2 few-shot prompt examples (average accuracy=87.4). 
Llama 3 70B excelled at coding health-related content, while GPT-4 performed better at coding demographic content from unstructured posts. LLMs performed comparably with human annotators in extracting demographic and health-related information from the unstructured AskDocs subreddit posts. This study validates the use of LLMs for analyzing digital health care communications and highlights their potential as reliable tools for understanding online behaviors and interactions, marking a shift toward more sophisticated methodologies in digital research and practice.
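The JSON-plus-few-shot extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's prompt: the field names (age, sex, inquiry_type), the example posts, and the helper names are hypothetical, and the actual LLM call is left out because the abstract does not name an API or decoding settings. The sketch builds a few-shot prompt, parses the model's JSON reply defensively, and scores each field against the human gold standard.

```python
import json

# Hypothetical schema: the abstract lists the coded variables, not the exact JSON keys.
FIELDS = ("age", "sex", "inquiry_type")

# Hypothetical few-shot examples (post text -> expected JSON answer).
FEW_SHOT = [
    ("28F, itchy rash on my forearm for a week. What could this be?",
     {"age": 28, "sex": "female", "inquiry_type": "diagnosis"}),
    ("My dad (61M) was prescribed metformin. Is it safe with his statin?",
     {"age": 61, "sex": "male", "inquiry_type": "treatment"}),
]


def build_prompt(post, examples=FEW_SHOT):
    """Assemble a few-shot prompt that asks the model to answer with a JSON object."""
    parts = [f"Extract these fields from the post as a JSON object: {', '.join(FIELDS)}."]
    for text, answer in examples:
        parts.append(f"Post: {text}\nJSON: {json.dumps(answer)}")
    parts.append(f"Post: {post}\nJSON:")
    return "\n\n".join(parts)


def parse_response(raw):
    """Parse a model reply; malformed JSON, non-object JSON, or missing keys become None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {field: data.get(field) for field in FIELDS}


def field_accuracy(predictions, gold):
    """Share of posts on which each extracted field matches the human gold standard."""
    return {
        field: sum(p[field] == g[field] for p, g in zip(predictions, gold)) / len(gold)
        for field in FIELDS
    }


print(build_prompt("34M, sharp chest pain when breathing in. Should I go to the ER?"))
```

Averaging the per-field accuracies gives one summary score per model and prompt configuration; whether the study averaged in exactly this way is an assumption.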
Background: In 2024, Reddit, an emerging social media platform, saw a 50% increase in monthly users to nearly 100 million. Reddit has also emerged as a significant space for discussions about health conditions, including epilepsy, which affects about 50 million people globally. Purpose: This study aims to explore trends in the volume, timing, themes, emotional tone, and sentiment of posts on the r/Epilepsy subreddit from 1 December 2023 to 31 December 2024. Methods: We collected 25,222 original English-language posts from r/Epilepsy using Reddit's Application Programming Interface (API). Data extraction was restricted to English-language submissions to ensure compatibility with sentiment and thematic analyses. We analyzed post volume and timing using chi-square tests and Poisson regression. Emotional tone was measured using TextBlob (version 0.19.0), while compound sentiment scores were calculated via VADER (Valence Aware Dictionary and Sentiment Reasoner) (NLTK version 3.9.1). A Pearson correlation assessed agreement between sentiment and emotional tone, with statistical significance set at p < 0.05. Thematic analysis was conducted using a KMeans clustering algorithm (scikit-learn version 1.6.1) to identify recurring discussion topics. Results: Monthly post volume increased steadily, reaching its highest value (2175 posts) in December 2024; the next busiest months were August 2024 and November 2024. Posts were not evenly distributed across the week, with a significant peak on Mondays (χ2 = 86.75, p < 0.001), and Poisson regression confirmed higher activity early in the week (p = 0.001). Emotional tone fluctuated, with positive sentiment in January and October 2024 and negative sentiment in March and August 2024. KMeans clustering identified five main themes: treatment experiences, community engagement, personal experiences, solidarity, and subreddit gratitude. 
Manual validation of a random subset of posts demonstrated moderate concordance between automated sentiment classification and human ratings. Conclusions: This study highlights temporal patterns, sentiment dynamics, and thematic structure in online discussions on epilepsy. Social media may offer valuable, real-time insights into patient-centered concerns and community engagement, which can inform healthcare professionals and advocacy groups in supporting individuals affected by epilepsy. Future studies may compare trends of epilepsy discussions across various social media platforms, such as X and Instagram, to further understand online patient experiences.
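The weekday analysis can be sketched as a Pearson chi-square goodness-of-fit test against equal expected counts for all 7 days; treating "evenly distributed" as equal daily counts is an assumption about how the reported χ2 was set up. The sketch below uses made-up counts. With 6 degrees of freedom (an even number), the upper-tail p-value has the closed form used in chi2_sf_even_df; if SciPy is available, scipy.stats.chisquare computes the same statistic.

```python
import math

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def chi_square_uniform(counts):
    """Pearson chi-square statistic against equal expected counts in every category."""
    expected = sum(counts) / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability for even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!"""
    if df % 2:
        raise ValueError("this closed form only holds for even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def weekday_test(counts):
    """Test whether posts are spread evenly over the 7 weekdays (df = 6)."""
    if len(counts) != len(WEEKDAYS):
        raise ValueError("expected one count per weekday")
    stat = chi_square_uniform(counts)
    return stat, chi2_sf_even_df(stat, len(counts) - 1)


# Hypothetical Monday-to-Sunday post counts, not the study's data.
stat, p = weekday_test([420, 350, 340, 330, 310, 290, 300])
print(f"chi2 = {stat:.2f}, p = {p:.2g}")
```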
Introduction: While the definitive treatment of Cushing's disease (CD) is transsphenoidal surgery (TSS), little is known about patients' perceptions of their treatment experience. Reddit, an online forum, allows users to interact on "subreddits" specific to their interests. We aimed to assess patient sentiments regarding TSS for CD on the Cushing's subreddit. Methods: Cushing's subreddit posts were sorted by "top" of "all time" to evaluate those with the most engagement throughout the site's history. Posts unrelated to surgical management of CD were excluded. Descriptive statistics were used to compare pre- and postoperative posters. Sentiment analysis was performed using TextBlob, a Python library, and thematic analysis was conducted using grounded theory qualitative methods. Results: Of 68 entries, 53 (77.9%) were written by individuals who had undergone TSS. Of posters with a history of TSS, many (68%, n = 25/38) reported a difficult recovery, but an overwhelming majority (91.3%, n = 42/46) also reported positive long-term outcomes. Posters who had undergone TSS were more likely to post content with negative sentiment (p = 0.007), often regarding issues with access to the surgery. Thematic analysis revealed general themes of seeking and sharing advice, healthcare access issues, excitement for TSS, short-term postoperative symptoms, and long-term outcomes. Conclusion: This study is the first to use Reddit to analyze patient perceptions of TSS for CD. This analysis suggests that most posters feel positively about their long-term outcome, while negative sentiments are often related to difficulties accessing care. Further studies should assess access to care for those with CD.
Introduction: With improvements in complication and remission rates, recent studies have suggested the viability of transsphenoidal surgery as first-line management for prolactinomas. Reddit, an online forum, allows posters to interact with one another and discuss symptoms, treatments, and disease courses through specialized forums known as "subreddits." Given the lack of research comparing patient experience with pharmacotherapy versus surgery, we sought to assess sentiment about treatment within the r/Prolactinoma subreddit community. Methods: A search was performed within the r/Prolactinoma subreddit. Posts were sorted by "top" of "all time," meaning entries with the most engagement throughout the site's history. Welch's t-test was used to compare entries by treatment type, while sentiment regarding treatment was analyzed using grounded theory qualitative methods. Results: Of 189 total entries, 82 were included; 33% (n = 27/82) were posts. Of posters disclosing their treatment, 11% underwent surgery (n = 9/79), while 76% received medication (n = 60/79). The proportion of positive:negative sentiment and the level of engagement on posts regarding pharmacotherapy versus surgical treatment were not significantly different (p > 0.05). Qualitative analysis showed themes of changes after medical treatment, immediate changes after surgery, and online community. Conclusion: Our mixed-methods study found no statistically significant differences in sentiment between Reddit posts from patients who underwent medical management and those who underwent surgical management. Qualitative analysis revealed several themes regarding patients' perceptions of medical and surgical management and the benefits of an online community. The qualitative themes brought to light by this study may serve to inform future studies examining the patient experience with prolactinoma care.
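Welch's t-test compares two group means without assuming equal variances. The sketch below computes the statistic and the Welch-Satterthwaite degrees of freedom with the standard library only; the upvote counts are hypothetical, and comparing engagement between treatment groups is an assumption about how the test was applied. For a p-value, scipy.stats.ttest_ind(a, b, equal_var=False) runs the same test.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    for two independent samples with possibly unequal variances."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df


# Hypothetical upvote counts per entry, grouped by the treatment discussed.
medication = [12, 30, 8, 22, 15, 40, 9]
surgery = [25, 60, 18, 44]
t, df = welch_t(medication, surgery)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The fractional degrees of freedom are what distinguish Welch's test from Student's pooled-variance t-test.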
Some individuals interpret persistent depression and anxiety, alongside the exhaustion of standard treatment options, as leading to a desperate need for relief. While desperation may shape how people assess treatment options, its role in psychedelic self-treatment and its relevance for emerging clinical models of psychedelic treatment remain underexplored. We collected discussion threads from two online substance use communities on Reddit.com and investigated how members interpreted desperation in relation to their decisions to self-treat depression and anxiety with psychedelics (including microdosing and macrodosing). Using constructivist grounded theory, we compared posts across 108 threads (173,229 words) from members who expressed desperation (n = 50) and those who did not (n = 68). Members described mounting frustration with standard treatments and worsening mental health that culminated in a tipping point interpreted as desperation for relief. They redefined their need to relieve depression and anxiety as urgent, which contributed to how they made sense of rapid, unplanned self-treatment with available psychedelics, without researching these options or implementing harm reduction and despite acknowledged risks. While some interpreted outcomes as relieving distress, others constructed worsening mental health as linked to multiple desperation-driven decisions to relieve distress. Our findings show how desperation shaped meaning-making around psychedelic self-treatment for depression and anxiety in online communities. These interpretations may be relevant to clinical models, where access is often granted after other treatments fail, and where similar urgency and expectations may shape engagement, decision-making, and outcome interpretations.
Millions of people affected by complex medical conditions with diverse symptoms often turn to online discourse to share their experiences. While some studies have explored natural language processing methods and medical information extraction tools, these typically focus on generic symptoms in clinical notes and struggle to identify subtle, patient-reported, disease-specific symptoms in online health discourse. We aimed to extract patient-reported, disease-specific symptoms shared on social media that reflect the lived experiences of thousands of affected individuals, and to explore the characteristics, prevalence, and occurrence patterns of these symptoms. We propose a lexicon-based symptom extraction (LSE) method to identify a diverse list of disease-specific, patient-reported symptoms. We initially used a large language model to accelerate the extraction of symptom-related key phrases that formed the lexicon. We evaluated the effectiveness of lexicon extraction against human annotation using the Jaccard index. We then leveraged BERT-Base, BioBERT, and Phrase-BERT-based embeddings to learn representations of these symptom-related key phrases and clustered similar symptoms using k-means and hierarchical density-based spatial clustering of applications with noise (HDBSCAN). Among the options explored in our experiments, BioBERT-based k-means clustering was the most effective. Finally, we applied symptom normalization to eliminate duplicate and redundant entries in the comprehensive symptom list. On a real-world polycystic ovary syndrome (PCOS) subreddit dataset, LSE significantly outperformed state-of-the-art baselines, achieving F1-scores (mean 86.10) at least 41% and 20% higher than those of automatic medical extraction tools and large language models, respectively. Notably, the comprehensive list of 64 PCOS symptoms generated via LSE ensured extensive coverage of the symptoms reported in 7 reputable eHealth forums. 
Analyzing PCOS symptomatology revealed 28 potentially emerging symptoms and 8 self-reported comorbidities co-occurring with PCOS. The comprehensive patient-reported, disease-specific symptom list can help patients and health practitioners resolve uncertainties surrounding the disease and reduce the variability in how PCOS symptoms are described within the community. Analyzing PCOS symptomatology across varied dimensions provides valuable insights for public health research.
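The clustering step can be illustrated with plain Lloyd's k-means. In this sketch, toy two-dimensional vectors stand in for BioBERT embeddings of symptom phrases (the phrases and vectors are hypothetical), and the first k points serve as initial centroids for reproducibility; scikit-learn's KMeans defaults to k-means++ initialization instead.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means, seeded with the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed cluster: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy 2-D vectors standing in for BioBERT embeddings of symptom phrases (hypothetical).
phrases = ["hair loss", "thinning hair", "acne", "breakouts"]
vectors = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.85]]
labels, centroids = kmeans(vectors, k=2)
for cluster in sorted(set(labels)):
    print(cluster, [ph for ph, lab in zip(phrases, labels) if lab == cluster])
```

In this toy case the two hair-related phrases and the two acne-related phrases end up in separate clusters, even though both initial centroids start on hair-related points.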
Cochlear implants (CIs) are surgical devices used for rehabilitation of sensorineural hearing loss. More individuals are receiving CIs as technology and surgical techniques improve and candidacy guidelines expand. Despite growing public awareness, CI utilization remains low, so understanding sentiment regarding these devices is important. Reddit is a text-based website wherein posters interact on specialized forums, "subreddits." The r/Cochlearimplants subreddit allows for unique CI sentiment analysis. A search was done on r/Cochlearimplants from October 2024 to November 2024. Posts were sorted by highest engagement. Metadata regarding date, comments, and upvotes were collected. Sentiment was analyzed using TextBlob and VADER, two Python natural language processing libraries. Four hundred and twenty unique posters made 1068 total entries. Entries spanned 2019 to 2024, with the majority in 2024 (51%, n = 543). VADER classified the majority of entries as positive (n = 562, 52.9%), while TextBlob classified the majority as neutral (n = 928, 87%). Sentiment distribution over time was significantly different (P < .001), with more negative sentiment in 2024 than in 2019 to 2023. Negative VADER entries had significantly higher word counts (P < .001). Positive VADER entries had more upvotes (P < .001). Sentiment regarding CIs, including cultural and ethical issues, is more nuanced than this analysis can capture. This study demonstrates that sentiment on r/Cochlearimplants is generally neutral or positive but has trended relatively more negative over time. This could suggest that negativity towards CIs is growing with increased utilization. Awareness of online sentiment may help providers understand patient perspectives and dispel misinformation.
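Category counts like these depend on how each tool's continuous score is binned. The sketch below shows VADER's normalization of a summed valence score into a compound score, score / sqrt(score² + α) with α = 15 as in the vaderSentiment source, and the ±0.05 cut-offs recommended in VADER's documentation. The summed valence inputs are hypothetical; producing them requires VADER's lexicon and rules, which are not reproduced here, and the binning rule the authors applied to TextBlob is not stated.

```python
import math
from collections import Counter


def normalize(score, alpha=15):
    """Squash a summed valence score into [-1, 1], as VADER does for its compound score."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def vader_label(compound, threshold=0.05):
    """Conventional VADER bins: >= 0.05 positive, <= -0.05 negative, otherwise neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Hypothetical summed valence scores for four entries (not from the study).
summed_valences = [2.0, -1.5, 0.3, 0.0]
counts = Counter(vader_label(normalize(s)) for s in summed_valences)
print(counts)
```

Because the compound score saturates smoothly, even a weakly positive summed valence such as 0.3 clears the 0.05 cut-off, which is one reason VADER tends to label fewer entries neutral than a rule that reserves "neutral" for a score of exactly zero.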
Vitiligo significantly impairs quality of life. Topical ruxolitinib is a novel Janus kinase inhibitor approved for nonsegmental vitiligo, but real-world patient experiences, particularly regarding efficacy, side effects, and access challenges following approval, are not fully captured by clinical trials. Online patient forums like Reddit offer valuable insights into these aspects. This study analyzed discussions on the r/Vitiligo subreddit regarding topical ruxolitinib to understand real-world patient experiences, perceived treatment success, side effects, access barriers, and overall sentiment. We conducted a retrospective, cross-sectional infodemiology study of posts and comments mentioning ruxolitinib or Opzelura on r/Vitiligo between January 2022 and December 2024. After filtering and preprocessing, 2950 entries were analyzed. We used computational linguistic methods, including sentence-transformer embeddings (all-MiniLM-L6-v2) for semisupervised topic classification into therapy success, side effects, insurance and cost, and off-topic categories. Sentiment was scored with the Valence Aware Dictionary and Sentiment Reasoner (range -1 to +1). Temporal trends were analyzed, and model performance was validated against blinded manual annotation of 500 entries. Representative qualitative data were reviewed. Discussions increased following regulatory approvals. Therapy success was the largest cluster (entries: 1765/2950, 59.83%; 95% CI 58.1 to 61.6) with positive sentiment (mean score 0.473, 95% CI 0.46 to 0.48), frequently describing facial repigmentation and adjunctive use with phototherapy. Users reported encouraging hair repigmentation within treated areas and success even on vitiligo spots present for over 20 years, while noting that areas like the hands and feet were particularly treatment-resistant. 
The side effects cluster (entries: 558/2950, 18.91%; 95% CI 17.5 to 20.3) had negative sentiment (mean score -0.110, 95% CI -0.14 to -0.07), frequently mentioning application-site acne, fatigue, and panic attacks or anemia. The insurance and cost cluster (entries: 491/2950, 16.64%; 95% CI 15.3 to 18.0) had positive sentiment (mean score 0.349, 95% CI 0.31 to 0.39), even though it was dominated by discussions of high costs and access difficulties; these discussions also covered strategies such as co-pay programs and noted insurance denials. Manual model validation showed substantial agreement (accuracy 88.4%, 95% CI 86 to 91; F1-score 0.893, 95% CI 0.865 to 0.918; Cohen κ 0.801, 95% CI 0.760 to 0.840). Real-world Reddit narratives broadly corroborate clinical trial efficacy signals, particularly facial repigmentation and utility alongside phototherapy, while highlighting practical barriers, including frequent application-site acne and cost or insurance friction. These findings have direct clinical and policy implications: clinicians can proactively counsel about the expected benefit on facial areas, monitor and manage acne or irritation, and discuss combination with phototherapy; practices and payers can mitigate access delays using prior authorization templates and co-pay assistance. Social media infodemiology thus complements pharmacovigilance and health services research by quantifying patient-reported outcomes and surfacing access issues at scale, informing patient counseling and coverage decisions in routine care.
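The reported 95% CIs for the cluster shares can be checked with a normal-approximation (Wald) interval for a proportion. The abstract does not state which interval method was used, but this sketch reproduces the reported bounds to one decimal place for all three clusters (58.1 to 61.6, 17.5 to 20.3, and 15.3 to 18.0).

```python
import math
from statistics import NormalDist


def wald_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Cluster sizes reported in the abstract, out of 2950 analyzed entries.
for cluster, size in [("therapy success", 1765), ("side effects", 558), ("insurance and cost", 491)]:
    _, low, high = wald_ci(size, 2950)
    print(f"{cluster}: 95% CI {100 * low:.1f} to {100 * high:.1f}")
```

With clusters this large (hundreds of entries each), the Wald interval is a reasonable approximation; for small counts a Wilson interval would be the safer choice.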
Social-media data are increasingly used in medicine for diverse applications. Although several studies have examined social media in rheumatology, most have focused on individual diseases. To date, no work has systematically explored the full spectrum of rheumatology-specific communities on Reddit, and sentiment analysis has not been broadly applied across these communities. This study addresses that gap by identifying rheumatology-related subreddits, describing their characteristics, and analysing the sentiment of their discussions. The Reddit search engine was used to identify candidate subreddits. For each subreddit we collected its name, creation date, subscriber count, public status, and activity since May 2023. We included only active subreddits with >1000 subscribers, a clear rheumatology focus, and data retrievable through Pushshift.io. Descriptive metrics were calculated to characterise the selected communities, and a pre-trained, fine-tuned sentiment-analysis model was applied to classify posts. Twenty subreddits met the inclusion criteria, with subscriber counts ranging from 2000 (r/Behcets) to 70,000 (r/Fibromyalgia). All communities exhibited near-exponential growth from 2016-2017 onward. Analysis of the ten most-commented threads in each subreddit yielded 32 thematic categories; the most frequent were "Patients Like Me", "Asking for Emotional Support", "Asking for Demographic Information", and "COVID-19". Negative sentiment predominated in subreddits devoted to musculoskeletal disorders of mechanical origin (e.g., costochondritis, back pain) and fibromyalgia. Themes such as "Expressing Hopelessness" and "Asking for Help: Symptom Management" were also associated with higher levels of negativity. Rheumatology communities use Reddit to discuss health-related issues, suggesting opportunities to enhance patient support and engagement. 
Study limitations include the demographic skew of Reddit users, reliance on a model trained on Twitter data, and the exclusion of subreddits with fewer than 1000 subscribers, potentially omitting smaller emerging communities.
Social media is also a powerful tool for discussing mental health. The conversations that take place in these spaces provide unique insight into how users talk about the issue. This study uses fine-tuned pretrained transformer models (BERT and MentalBERT) to classify Reddit posts about anxiety, depression, bipolar disorder and borderline personality disorder (BPD) in specialised subreddits. By assessing how well subreddit conversations align with their intended mental health focus, the analysis helps gauge whether these communities are serving their purpose as support spaces. Our classification models achieve an average accuracy of 82%, with MentalBERT slightly outperforming BERT. To ensure transparency, we use Local Interpretable Model-agnostic Explanations (LIME) to identify key linguistic patterns that influence the model predictions. The results reveal distinct language use across conditions: for example, discussions in bipolar disorder subreddits often refer to mood instability, while BPD communities emphasise challenges in emotional regulation. By integrating classification with explainability, this study offers insights into thematic patterns in online discourse that can support mental health professionals in identifying trends. While our models are not diagnostic tools, they function as subreddit-alignment classifiers, helping to uncover how different topics are discussed across communities. These insights may inform human-in-the-loop community management strategies and contribute to raising awareness and reducing stigma around mental health issues, ultimately fostering more supportive digital environments.
Propofol is widely used in procedural sedation and general anesthesia but often provokes anxiety among patients and some providers. This study investigates the emotional and thematic landscape of propofol-related discourse on Reddit, a major online health information platform. We analyzed 921 publicly available Reddit posts referencing "propofol" and related sedation terms using a mixed-methods approach. Sentiment analysis was performed with TextBlob and complemented by manual thematic coding. Posts were categorized by subreddit, sentiment, and topic. Descriptive statistics and correlation analyses examined relationships between sentiment, word count, and subreddit type. Two coders achieved strong agreement (Cohen's κ = 0.82). Half of the posts were neutral, whereas 30% were negative and 20% were positive. Negative sentiment was most common in patient-focused subreddits such as r/colonoscopy (38%), while provider forums like r/anesthesiology were more neutral or analytical. Of all posts, 52% were patient-authored, 28% were provider-authored, and 20% were of unclear authorship. Patients more often expressed anxiety and confusion, while providers discussed clinical dilemmas and ethical issues. Higher word count was weakly correlated with more negative sentiment (r = -0.19). Four thematic clusters emerged: clinical sedation and medication questions; provider professionalism and ethics; veterinary use and animal care; and exam stress or career anxiety. Reddit reveals emotionally rich propofol discourse, spanning patient fears and provider uncertainties. Analysis using digital health frameworks such as affective publics and the Technology Acceptance Model highlights opportunities for improved patient communication, education, and digital tool design. Limitations include platform demographic bias and limited generalizability. These findings offer a methodological foundation and conceptual framework for future digital health research and sentiment-aware clinical tools.
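Cohen's κ, as reported for the two coders here, corrects raw percent agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_e is the probability that both coders independently pick the same label. A minimal stdlib sketch for two coders follows; the label lists are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    p_observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently pick the same label.
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical sentiment codes from two coders for the same six posts.
coder_1 = ["negative", "neutral", "neutral", "positive", "negative", "neutral"]
coder_2 = ["negative", "neutral", "positive", "positive", "negative", "neutral"]
print(f"kappa = {cohen_kappa(coder_1, coder_2):.2f}")
```

Here the coders agree on 5 of 6 posts (83%), but because some agreement is expected by chance, κ is lower than raw agreement.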
Antidepressant use and withdrawal are often accompanied by side effects such as dizziness, weight gain, and sexual dysfunction. Antidepressants and their associated side effects are stigmatized topics. Social media platforms such as Reddit are considered "safe spaces" by users because they can freely share their experiences and receive support. This pilot study analyzed discussions from the subreddit r/depression to examine how users discuss antidepressant side effects, withdrawal symptoms, and related experiences of depression. We scraped 10 high-engagement threads from the subreddit r/depression using the Python wrapper for the Reddit application programming interface and conducted a 2-step analysis. First, a pilot test was performed using sertraline (Zoloft) threads, followed by an analysis of all antidepressant-related threads. A subset of the data was hand-coded to create and validate regular expressions, which were then used to automatically code the remaining dataset. The resulting coded data were analyzed using epistemic network analysis and complemented with qualitative analysis and elements of semantic networks and hypergraphs. We found that posts were more likely to discuss emotional flattening, sleep, and memory or cognitive issues (Mann-Whitney U=33,235.5; P=.003). Additionally, references to dizziness tended to co-occur with discussions of withdrawal and offers of empathy, while reports of dream-related side effects and requests for personal experiences also co-occurred frequently. By incorporating elements of semantic networks and hypergraphs, we deduced that offers of empathy occurred when users said they experienced dizziness caused by withdrawal, while mentions of "brain zaps" associated with withdrawal often co-occurred with offers of teaching support. 
Study findings highlight how individuals experiencing antidepressant side effects and withdrawal symptoms use online forums such as Reddit to seek validation, share coping strategies, and provide emotional support to others. The nuanced discussions observed, particularly those related to empathy, symptom management, and shared learning, underscore the role of peer-to-peer networks in normalizing stigmatized experiences and mitigating isolation associated with antidepressant use. Clinicians and digital health practitioners can leverage these insights to better understand patient language, emotional framing, and informational needs outside clinical settings.
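The regular-expression coding step, followed by a count of which codes appear together in the same post, can be sketched as below. The codebook patterns and code names are hypothetical illustrations, not the study's validated expressions, and pairwise co-occurrence counts are only the raw input that network approaches build on; this is not an implementation of epistemic network analysis.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical codebook; the study's validated regular expressions are not published here.
CODEBOOK = {
    "dizziness": re.compile(r"\bdizz(?:y|iness)\b", re.IGNORECASE),
    "brain_zaps": re.compile(r"\bbrain\s*zaps?\b", re.IGNORECASE),
    "withdrawal": re.compile(r"\b(?:withdrawal|tapering|stopped taking)\b", re.IGNORECASE),
    "empathy": re.compile(r"\b(?:sorry you'?re|you'?re not alone|sending hugs)\b", re.IGNORECASE),
}


def code_post(text):
    """Return the set of codes whose pattern matches somewhere in the post."""
    return {code for code, pattern in CODEBOOK.items() if pattern.search(text)}


def cooccurrence(posts):
    """Count how often each pair of codes appears in the same post."""
    pairs = Counter()
    for text in posts:
        pairs.update(combinations(sorted(code_post(text)), 2))
    return pairs


posts = [
    "Withdrawal made me so dizzy",
    "Dizzy during withdrawal, sorry you're going through it too",
]  # hypothetical posts
for pair, count in cooccurrence(posts).most_common():
    print(pair, count)
```

Hand-coding a subset first, as the study describes, is what lets patterns like these be checked for false positives and misses before they are applied to the remaining data.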
As the largest online rosacea forum, r/Rosacea, a subreddit hosted on Reddit, provides a unique opportunity to better understand the concerns of rosacea patients. Using Python software and artificial intelligence models, a total of 1,000 posts from the r/Rosacea subreddit were analyzed for emotional tone, post category, and mentions of signs, symptoms, and medications. The majority of posts were classified as seeking advice (n = 631), and posts categorized as patient stories received the highest median upvotes (P<0.001). Rosacea patients were found to be most concerned with external appearance, with redness (45.2%) and pustules (24.2%) being the most discussed signs in advice-seeking posts. Posts referencing burning exhibited strong negative emotional tones of anger and disgust. Ivermectin (12.7%) and azelaic acid (10.9%) were the most discussed medications in advice-seeking posts, and ivermectin received significantly lower median upvotes (P=0.0067). The insights from this cross-sectional analysis provide a deeper understanding of the rosacea patient perspective. Additional emotional analysis of the posts highlights the need for greater focus on the psychological burden of the disease. The frequency and popularity of rosacea medication discussions reveal potential gaps in patient education and raise concerns regarding treatment adherence, including adherence to ivermectin. It is imperative to increase rosacea patient access to quality-assured educational resources and to limit the potential spread of misinformation on r/Rosacea.
Suicide bereavement is a uniquely challenging form of loss, yet little is known about how it is expressed in language and how it reflects the meaning-making process. Here, we leveraged naturalistic online language to capture grief expressions beyond traditional help-seeking populations, applying a validated computational text-analysis method (LIWC-22) to 713 posts from the r/SuicideBereavement subreddit and comparing them with 1149 bereavement posts from the r/GriefSupport subreddit. Compared with other bereaved individuals, suicide-loss survivors used more cognitive processing words, reflecting deeper engagement in meaning-making, and displayed a distinct attentional focus, frequently revisiting the past and the deceased's life to make sense of the loss. They also expressed greater anger and interpersonal conflict, and used language emphasizing collective and relational aspects of grief. These findings illuminate transdiagnostic processes relevant to bereavement, advance understanding of suicide loss, and suggest new avenues for monitoring and supporting survivors' adjustment in online and community-based postvention contexts.
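LIWC-style analysis reports, for each category, the percentage of a text's words that appear in that category's dictionary, with a trailing asterisk marking a word stem. The sketch below uses a tiny made-up dictionary; LIWC-22's actual dictionaries are proprietary and far larger, so the category names and entries here are assumptions for illustration only. Averaging these percentages over the posts of each subreddit would give the kind of group-level comparison described above.

```python
import re

# Toy category dictionary (hypothetical); a trailing "*" marks a stem, as in LIWC dictionaries.
CATEGORIES = {
    "cognitive": ["think*", "understand*", "why", "because", "realiz*"],
    "anger": ["angry", "hate*", "furious", "mad"],
}


def matches(word, entry):
    """Stem entries match any word starting with the stem; others need an exact match."""
    return word.startswith(entry[:-1]) if entry.endswith("*") else word == entry


def category_percentages(text):
    """Percentage of the words in text that fall into each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {cat: 0.0 for cat in CATEGORIES}
    return {
        cat: 100 * sum(any(matches(w, e) for e in entries) for w in words) / len(words)
        for cat, entries in CATEGORIES.items()
    }


print(category_percentages("I keep asking why. I think I understand, but I'm so angry."))
```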
We aimed to assess the extent to which large language models (LLMs) amplify or attenuate inaccurate or contested narratives in radiation contexts and to evaluate their potential influence on public risk perception, patient communication in radiotherapy, and radiation protection policy implementation. We developed a structured framework to extract agreement and sentiment from LLMs. This framework was applied to OpenAI's GPT family of models to examine susceptibility to strong or misframed radiological opinions, cultural and linguistic bias on controversial radiological topics, and philosophical or moral alignment in radiation-related scenarios. Additionally, GPT-4o mini was used to analyze sentiment trends in the r/Radiation subreddit (February 2021-December 2023). A novel model, AntiRadiophobeGPT, was created to counter radiophobic and myth-driven narratives and was evaluated against real user comments. Smaller LLMs (e.g., GPT-4o mini) assigned significantly higher risk to potentially radiophobic statements than their larger counterparts on general-domain radiological risk assessment questions and showed higher agreement on controversial expert-domain questions. Using Chinese-language prompts or models further increased bias on culturally sensitive radiological topics. All tested models showed deontological tendencies in moral alignment, with variations across scenarios. Subreddit analysis indicated that health-related myths were most prevalent, but overall community-wide radiophobia and hostility declined over the 3 years. AntiRadiophobeGPT effectively addressed misconceptions with high factual accuracy and demonstrated significantly lower levels of radiophobia and antagonism than user-generated responses. These findings underscore the importance of careful LLM deployment in radiological contexts to avoid misinformation propagation and support effective science communication. 
Overall, this work bridges artificial intelligence and radiation biology by demonstrating how LLM-driven communication can influence radiation risk perception and inform radiological safety practices.
Patients contemplating orthognathic surgery are increasingly turning to online forums, such as Reddit, as a supplementary source of information before undertaking a surgical treatment pathway. This study was designed to profile the main topics relating to orthognathic surgery that patients were discussing online, using data analytics methodologies and machine learning algorithms. An Application Programming Interface (API) wrapper was used to identify posts relating to orthognathic surgery on Reddit, within the r/jawsurgery subreddit. The information was exported for further analysis. Topic modelling was undertaken with the Latent Dirichlet Allocation (LDA) algorithm, the Valence Aware Dictionary and sEntiment Reasoner (VADER) tool was used for sentiment analysis, and a machine learning tool was used to pinpoint language cues in the posts. A total of 10265 posts were included in the analysis. These posts were categorised into one of ten topic groups based on qualitative analysis by the authors. Overall, sentiment toward orthognathic surgery was positive in every topic group within the subreddit. The topic groups with the highest sentiment were aesthetic change, aesthetic preferences and disagreements, and postoperative experience. Patients are actively seeking alternative sources of information regarding orthognathic surgery, particularly relating to the experiences of other patients. Whilst overall sentiment toward surgery is positive, both positive and negative experiences are posted online.
Social media platforms such as Reddit have become important spaces where individuals articulate their distress, seek support, and explore alternative ways of understanding mental health outside traditional institutional frameworks. These environments provide an opportunity to examine mental health discourse at scale, offering perspectives that extend beyond traditional clinical and research settings. This study aims to examine the structure of mental health communities on Reddit by identifying patterns of association between mental disorders reflected in user activity and by assessing how these relationships align with established diagnostic categories in the ICD (International Classification of Diseases). We manually curated 114 Reddit communities focused on specific mental health conditions from the 20,000 most active subreddits in 2022. Each community was labeled with 1 of 49 disorders, and these disorders were categorized under 9 ICD diagnostic categories within the group of mental and behavioral disorders, collectively known as the F codes. We constructed a disorder association network by identifying statistically significant user overlaps (coposting) between subreddit pairs using a bipartite configuration model, with Bonferroni-corrected significance (P<.001). We analyzed the connectivity of the network within and across diagnostic categories, examining inter- and intracategory links. Finally, we compared the structure of disorder associations inferred from Reddit with the ICD classification derived from diagnostic criteria using hierarchical clustering. The inferred Reddit network of psychopathology revealed an interconnected structure (density=0.135), with all but 6 disorders forming a single giant component that spans all 9 diagnostic categories.
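The abstract names a bipartite configuration model (BiCM) as the null model for coposting, which is beyond a short example. The sketch below swaps in a simpler hypergeometric null, the overlap expected if each subreddit's posters were drawn at random from all users, purely to show the overall pipeline: count shared users for every subreddit pair, test each overlap, apply a Bonferroni correction by dividing alpha by the number of pairs, and report network density as 2E / (N(N - 1)) for E edges and N nodes. The input format and the names `overlap_pvalue` and `coposting_network` are hypothetical, and treating P<.001 as the family-wise alpha is an assumption about how the correction was applied.

```python
from itertools import combinations
from math import comb


def overlap_pvalue(n_users, size_a, size_b, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population=n_users, successes=size_a, draws=size_b)."""
    upper = min(size_a, size_b)
    tail = sum(comb(size_a, k) * comb(n_users - size_a, size_b - k)
               for k in range(overlap, upper + 1))
    return tail / comb(n_users, size_b)


def coposting_network(user_subs, alpha=0.001):
    """Return (edges, density) of the significant co-posting network.

    user_subs maps each user to the set of subreddits they posted in.
    NOTE: hypergeometric null used as a stand-in for the paper's BiCM.
    """
    members = {}
    for user, subs in user_subs.items():
        for sub in subs:
            members.setdefault(sub, set()).add(user)
    nodes = sorted(members)
    pairs = list(combinations(nodes, 2))
    threshold = alpha / max(len(pairs), 1)  # Bonferroni correction over all pairs
    edges = []
    for a, b in pairs:
        shared = len(members[a] & members[b])
        if shared and overlap_pvalue(len(user_subs), len(members[a]),
                                     len(members[b]), shared) < threshold:
            edges.append((a, b))
    n = len(nodes)
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    return edges, density
```

A hypergeometric test only conditions on subreddit sizes, whereas a BiCM also accounts for how active each user is, so highly active cross-posters can inflate significance under this simpler null.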
The most prominent disorders by number of users included hyperkinetic disorders (85,000), depressive episodes and recurrent depressive disorders (73,000), habit and impulse disorders (69,000), pervasive developmental disorders (52,000), and generalized anxiety disorder (44,000). In terms of connectivity, posttraumatic stress disorder (17/48 of all possible connections), obsessive-compulsive disorder (16/48), and depersonalization-derealization disorder (15/48) emerged as the most central in the network of positive disorder associations, while schizotypal disorder, avoidant personality disorder, and agoraphobia were the most central when association strength was taken into account. At the level of disorder categories, several disorders, such as bipolar disorder and premenstrual dysphoric disorder, displayed strong intercategory associations but weak intracategory ties, indicating blurred diagnostic boundaries. The network of negative coposting associations diverged from the expectations of past research; for instance, addiction-related communities (eg, alcohol and opioids) were negatively associated with much of the broader mental health discourse. Finally, hierarchical comparisons showed moderate overlap between the Reddit network of disorder associations and the ICD network of diagnostic criteria, both in pairwise edge similarity (13% of edges present in both networks) and in overall clustering (Adjusted Rand Index=0.295). Reddit-based mental health communities reveal a complementary structure of disorder associations shaped by lived experience, one that often diverges from formal diagnostic criteria and established diagnostic boundaries.
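For readers unfamiliar with the Adjusted Rand Index (ARI), the sketch below computes it from the contingency table of two flat clusterings of the same items, for example disorder labels obtained by cutting the Reddit-derived and ICD-derived dendrograms at some level (how the authors cut their trees is not stated here). An ARI of 1 means identical partitions and values near 0 mean chance-level agreement, so 0.295 indicates partial agreement. The function name is illustrative; scikit-learn's `adjusted_rand_score` implements the same measure.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no item pairs to compare
    pair_counts = Counter(zip(labels_a, labels_b))  # contingency table n_ij
    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both clusterings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

For instance, `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` gives 4/7 (about 0.57): the second clustering splits one of the first clustering's groups, so agreement is partial but above chance.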
This data article describes a dataset capturing public discourse on generative AI (GenAI) in education, collected from Reddit between 1 September 2022 and 31 October 2025. The dataset comprises 984 unique posts and 8346 associated comments drawn from 320 education-related subreddits, grouped into three stakeholder categories: students, educators, and parents. The posts and comments focus on how GenAI tools, such as ChatGPT, are used, experienced, and debated in relation to teaching, learning, assessment, and broader educational practices. Data were retrieved via the Reddit API and processed through a privacy-preserving pipeline that removed personal identifiers in accordance with platform policies. Each entry is accompanied by rich metadata, including a unique identifier, content type (post or comment), stakeholder group, subreddit, creation timestamp, and engagement metrics. Spanning the period before and after the introduction of ChatGPT, the dataset enables temporal analyses of discourse volume and content evolution. This openly available dataset provides a valuable resource for research in education, human-AI interaction, social media analytics, and natural language processing, and supports reproducibility in studies of public engagement with GenAI technologies.
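Because each entry carries a content type and a creation timestamp, the temporal analyses the dataset is meant to support can start from simple counts. The sketch below tallies entries per month and content type and splits them at ChatGPT's public release on 30 November 2022. The field names `created_utc` (Unix seconds) and `content_type` are assumptions about the released schema, not confirmed column names, and `discourse_volume` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timezone

# ChatGPT's public release date (30 November 2022), used as the split point.
CHATGPT_RELEASE = datetime(2022, 11, 30, tzinfo=timezone.utc)


def discourse_volume(entries):
    """Count entries per (month, content type) and before/after ChatGPT's release.

    entries: iterable of dicts with hypothetical keys 'created_utc' (Unix seconds)
    and 'content_type' ('post' or 'comment').
    """
    per_month = Counter()
    by_period = Counter()
    for entry in entries:
        ts = datetime.fromtimestamp(entry["created_utc"], tz=timezone.utc)
        per_month[(ts.strftime("%Y-%m"), entry["content_type"])] += 1
        by_period["post-ChatGPT" if ts >= CHATGPT_RELEASE else "pre-ChatGPT"] += 1
    return per_month, by_period
```

Working in UTC avoids entries near midnight shifting between months depending on the analyst's local time zone.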
The use of social media platforms, such as Reddit, to seek and share information about disease management and treatment strategies is increasingly common. In the context of asthma, a chronic condition characterized by limiting symptoms and exacerbations that require active patient engagement and adherence to treatment, there is a lack of research describing the content of Reddit posts and the specific topics of interest to patients. This study aimed to describe the topics discussed by users on the Reddit asthma forum and to identify the sentiment and polarity of the language used in the posts. We conducted a retrospective observational study of public posts on the asthma subreddit (r/Asthma) over a 1-year period (October 2023-October 2024). All posts and related threads were included and were subdivided into the hot, new, and top categories, by whether they were voted "up" or "down," and by whether they received "awards" (categorized as "golds"). Messages were reviewed manually and excluded if they were not related to asthma. A mixed methods analysis was conducted, comprising (1) text lemmatization, (2) structural topic modeling to identify topics based on word frequency, and (3) sentiment and polarity analysis. This approach aimed to identify the most frequently discussed topics, detect positive and negative sentiment based on the words used, and detect acceptance or rejection (polarity) based on the language used in the asthma subreddit. Statistical analyses were performed using R software (version 4.1.3; R Foundation for Statistical Computing), with a significance threshold of P<.05. After duplicates were removed, 7806 posts were identified. The chosen topic model was confirmed to be suitable, as it offered the best balance between exclusivity and semantic coherence. A total of 25 topic clusters were identified and ordered by their weight. The topics with the highest weight were Topic 7 (Symptoms and severity of asthma attacks) and Topic 18 (Causes of asthma).
Apart from Topic 20 (Seeking advice from people with asthma; P=.04), Topic 21 (Medical tests that should be reviewed periodically; P=.04), and Topic 22 (Times of year when attacks occur; P=.03), no significant differences were found in how topics evolved over the year. The proportion of feelings and emotions remained stable throughout the year. However, results for feelings and emotions differed depending on the lexicon used: the AFINN lexicon showed a higher probability of positive feelings, whereas negative feelings were significant in the Stanford Natural Language Processing, Bing, and National Research Council Canada lexicons. These results can serve as a guide for identifying hidden patient needs and can help professionals develop specific interventions on topics relevant to patients.
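The dependence on the chosen dictionary reported above follows from how lexicon-based scoring works: a post's score is built only from the words a given lexicon contains and the values it assigns them. The toy sketch below contrasts an AFINN-style scorer (summed integer valences) with a Bing-style scorer (positive minus negative word counts). Both mini-lexicons and all function names are invented for illustration; they are not excerpts from the real AFINN or Bing lexicons, and the study itself worked in R.

```python
import re

# Toy stand-ins for real lexicons; entries and scores are invented for illustration.
AFINN_LIKE = {"better": 2, "relief": 2, "help": 2, "attack": -1, "worse": -3}
BING_LIKE = {"better": "positive", "relief": "positive", "attack": "negative",
             "worse": "negative", "wheezing": "negative", "struggle": "negative"}


def tokenize(text):
    """Lowercase and split into word tokens (no lemmatization in this sketch)."""
    return re.findall(r"[a-z']+", text.lower())


def afinn_style_score(text, lexicon=AFINN_LIKE):
    """Sum of integer valences of matched words (AFINN-style)."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(text))


def bing_style_score(text, lexicon=BING_LIKE):
    """Positive minus negative word counts (Bing-style binary labels)."""
    labels = [lexicon[tok] for tok in tokenize(text) if tok in lexicon]
    return labels.count("positive") - labels.count("negative")
```

With `post = "The new inhaler is a relief and I feel better, but the wheezing is a struggle"`, `afinn_style_score(post)` returns 4 while `bing_style_score(post)` returns 0: the AFINN-like lexicon ignores "wheezing" and "struggle", so the same post reads as positive under one dictionary and neutral under the other.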
After more than half a century of political struggle, queer people in 2025 have unprecedented visibility and strong communities and are part of powerful political movements. At the same time, we face growing vulnerability and threat. This contrast reflects the distance between understandings of sexuality within and beyond queer communities, a distance that is as present online as it is in the physical world. We turn to the social media platform Reddit because its affordances, especially its 'subreddit' communities, up/down voting, and peer moderation, give rise not only to queer-supportive communities but also to spaces filled with virulent hate speech. Using social network analysis on billions of Reddit posts, we identify SuddenlyGay as the subreddit with the most interaction between queer and mainstream users. We then use qualitative content analysis to examine how users on SuddenlyGay make meaning around sexual conduct. We find that meaning-making is often collaborative, especially when focused on bodies and shared pleasure. In contrast, when posts reference ideological political divides, users fall back on inherited meanings of sexuality and sexual conduct. The findings contribute to understanding how meanings around sexuality develop and which contexts lead toward collaborative meaning-making.