Wikipedia has become a widely consulted source for health-related information, including transfusion medicine, by both healthcare professionals and the general public. However, the accuracy and completeness of transfusion-related content on this platform remain understudied. We therefore aimed to systematically evaluate the current state and quality of transfusion medicine-related content available on Wikipedia. The Wikipedia Subgroup of the Clinical Transfusion Working Party of the International Society of Blood Transfusion (ISBT) conducted a cross-sectional analysis of transfusion medicine-related Wikipedia articles up to 31 December 2024. Articles were identified using the search terms 'Transfusion', 'Blood Components', 'Blood Groups' and related topics. Data extracted and analysed included article metadata, content metrics, visibility indicators and editorial activity. A total of 190 Wikipedia pages related to transfusion medicine were identified, with an additional 14 domain-specific webpages. The most common categories were blood groups (15.3%), blood components (13.7%) and clinical transfusion medicine (11.6%). Nearly 50% of pages were created between 2006 and 2010. Only 21.6% of pages were classified as complete, while 48.4% remained in the development phase. This study uncovers significant gaps in transfusion medicine content on Wikipedia, with many articles found to be incomplete or poorly maintained. These findings present a clear opportunity for healthcare professionals, particularly members of the ISBT Clinical Transfusion Working Party's Wikipedia Subgroup, to enhance the quality, accuracy and accessibility of transfusion-related information through coordinated, collaborative editing efforts.
Wikipedia is one of the most widely accessed sources of scientific information globally, serving as a critical platform for public understanding and engagement with science. Despite its influence, many scientific societies are underrepresented or insufficiently described on the platform, limiting their visibility, outreach potential, and opportunities for public scholarship. This gap is particularly significant given Wikipedia's high placement in search engine results and its role in shaping perceptions of scientific institutions. To address this issue, a structured case study was conducted focusing on the Wikipedia article for the American Association for Anatomy (AAA). A baseline evaluation revealed major deficiencies in article structure, historical coverage, citation quality, and alignment with Wikipedia's editorial standards. Using a methodical editing process informed by science communication principles and content guidelines, the article was substantially revised. High-quality secondary sources were used to develop new sections on AAA's mission, governance, publications, awards, meetings, and outreach efforts. Following these edits, the article's classification improved from Stub to Start-class, its visibility and connectivity within Wikipedia increased, and the number of internal and external references, links, and structural elements grew substantially. To complement these objective improvements, a survey of anatomy stakeholders (n = 29) indicated increased perceptions of credibility, trust in the AAA, and educational usefulness. These outcomes demonstrate that scholarly engagement with Wikipedia can meaningfully enhance institutional presence and public accessibility of scientific knowledge. This case study offers a transparent and replicable model for scientific organizations and educators seeking to improve digital visibility and contribute to open-access science.
Individuals seeking health information often turn to the Internet for answers. Wikipedia is a dynamic, crowdsourced encyclopedia and one of the most accessed online sources for this content. However, the Spanish Wikipedia is not nearly as in-depth as the English version, creating a large disparity. Medical students with English and Spanish proficiency possess a distinct skill set that positions them to contribute timely, trusted, evidence-based content to the platform and reduce this inequity. This case study presents the implementation of a credit-bearing Spanish Wikipedia translation elective by the library for fourth-year medical students at Western Michigan University Homer Stryker M.D. School of Medicine, currently the only Spanish Wikipedia elective in a medical school in the United States. The purpose of the course is to increase the quality and readability of medical articles in the English and Spanish versions of the online encyclopedia using evidence-based medicine (EBM) principles. The output from this elective demonstrates that medical students can use their medical knowledge and skills to create and improve articles in English and Spanish on Wikipedia and disseminate evidence-based information to millions of consumers worldwide seeking reputable health information. Learners can leverage their specialized training to minimize the gap between these versions and become active participants in global health. By using technology to their advantage, they provide enduring health information that impacts and reaches many more people in a virtual setting than in a traditional one-on-one clinical encounter.
The inference of unstructured text semantics is a crucial preprocessing task for NLP and AI applications. Word sense disambiguation and entity linking tasks resolve ambiguous terms within unstructured text corpora to senses from a predefined knowledge source. Wikipedia has been one of the most popular sources due to its completeness, high link density, and multi-language support. In the context of chatbot-mediated consumption of information in recent years through implicit disambiguation and semantic representations in LLMs, Wikipedia remains an invaluable source and reference point. This survey covers methodologies for entity linking with Wikipedia, including early systems based on hyperlink statistics and semantic relatedness, methods using graph inference problem formalizations and graph label propagation algorithms, neural and contextual methods based on sense embeddings and transformers, and multimodal, cross-lingual, and cross-domain settings. Moreover, we cover semantic annotation workflows that facilitate the scaled-up use of Wikipedia-centric entity linking. We also provide an overview of the available datasets and evaluation measures. We discuss challenges such as partial coverage, NIL concepts, the level of sense definition, combining WSD and large-scale language models, as well as the complementary use of Wikidata.
Wikipedia is one of the most widely accessed sources of information worldwide, containing nearly 25,000 molecular entries spanning drugs, natural products, specialty chemicals, and other compounds. Despite its prominence, the chemical content of Wikipedia has not been systematically studied. In this work, we analyzed molecular entries and classified them into use categories, providing a first overview of their roles and applications. Structural diversity was examined using scaffold analysis and UMAP visualization, which revealed well-defined clusters corresponding to major chemical classes. In addition, Wikipedia pageview statistics were analyzed to explore the popularity of molecular entries. These data revealed a strong public focus on CNS-active drugs, recreational substances, and molecules with current medical or cultural relevance, while industrial and specialty chemicals attracted comparatively little attention. Overall, our findings show that Wikipedia offers both a chemically diverse and socially informative perspective on molecules, making it a unique resource at the intersection of chemistry, open data, and public knowledge.
A short survey was distributed to 40,402 authors of papers cited in Wikipedia (n=21,854 surveys sent, n=750 complete responses received). The survey asked published authors for their views on Wikipedia's trustworthiness in relation to citations of their published works. The survey findings were analysed with a mix of quantitative and qualitative methods, using Python, Google BigQuery and Looker Studio. Overall, authors expressed positive sentiment towards research citation in Wikipedia and researcher engagement practices (mean scores >7/10). Sub-analyses revealed significant differences in sentiment based on publication type (articles vs. books) and discipline (Humanities and Social Sciences vs. Science, Technology, and Medicine), but not access status (open vs. closed access). This study provides unique insights into author perceptions of Wikipedia's trustworthiness. Further research is needed to deepen the understanding of the benefits to researchers and publishers of including academic citations in Wikipedia.
This study aimed to analyse the trend in use of the main antineoplastic agents (ANP) in Spain, to determine the association of this trend with the number of visits to the related pages in the Spanish edition of Wikipedia, and to verify the existence of information aimed at reducing the associated risks of exposure to these drugs. The study had an ecological, descriptive cross-sectional design. The ANP for which more than 100,000 units were used per year in the Spanish Health System were included in the analyses. The trend in the use of these ANP and the number of visits to the pages for these ANP in Wikipedia were analysed using a regression model, and the correlation of these variables was evaluated. Fulfilment of the criteria related to medical-pharmaceutical information (MPI) and safety measure information (SMI) was determined. An increasing trend in use was observed for the 9 ANP included in this study, which were the most commonly used ANP in the study period: paclitaxel, fluorouracil, azacitidine, oxaliplatin, rituximab, carboplatin, doxorubicin, etoposide, and cyclophosphamide. Visits to the Wikipedia pages for the 9 ANP showed a decreasing trend, with an inverse relationship between use and visits to the related Wikipedia pages. Regarding MPI criteria, only the indication/use was included in all ANP pages, and no more than 30% of the remaining criteria were met, with the exception of rituximab, for which 50% of the remaining criteria were met. The SMI criteria were not fulfilled by the ANP pages; effects on fertility were included in 2 (22%) ANP pages and effects on pregnancy were included in 4 (44%) ANP pages. A molecular identifier appeared in 7 of the ANP pages. The consumption of ANP increased, whereas the population interest in visiting related Wikipedia pages decreased. Neither MPI nor SMI were readily available in the ANP articles (pages), including information on the risk of exposure to these dangerous drugs or how to reduce this risk.
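The trend and correlation analysis described in the abstract above can be illustrated with a minimal sketch. It assumes a simple least-squares linear trend and a Pearson correlation; the abstract only says "a regression model", so both choices are assumptions, and every figure below is hypothetical rather than taken from the study.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (the yearly trend)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for one drug (NOT from the study), in thousands per year.
years = [2015, 2016, 2017, 2018, 2019]
units_used = [100, 110, 120, 130, 140]
page_visits = [50, 45, 40, 35, 30]

use_trend = slope(years, units_used)       # positive: use increases over time
visit_trend = slope(years, page_visits)    # negative: visits decrease over time
use_vs_visits = pearson(units_used, page_visits)  # negative: inverse relationship
```

A positive use slope, a negative visit slope and a negative correlation together correspond to the pattern the abstract reports: rising consumption alongside falling page visits.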
Wikipedia is a vital open educational resource in computational biology; however, a significant knowledge gap exists between English and non-English Wikipedias. Reducing this knowledge gap via intensive editing events, or "editathons," would be beneficial in reducing language barriers that disadvantage learners whose native language is not English. We present a framework to guide educators in organizing editathons for learners to improve and create relevant Wikipedia articles. As a case study, we present the results of an editathon held at the 2024 ISCB Latin America conference, in which ten new articles were created for the Spanish-language edition of Wikipedia. We also present a web tool, "compbio-on-wiki," which identifies relevant English Wikipedia articles missing in other languages. We demonstrate the value of editathons to expand the accessibility and visibility of computational biology content in multiple languages. Source code for the compbio-on-wiki Toolforge site is available at: https://github.com/lubianat/compbio-on-wiki.
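The idea of finding English articles that are missing in another language can be sketched as a simple filter over interlanguage links. This is not the compbio-on-wiki implementation; the function name, the data structure and the placeholder titles and language coverage are all hypothetical and serve only to illustrate the concept.

```python
def missing_in_language(langlinks, target_lang):
    """Return English titles with no counterpart in target_lang, sorted by title.

    langlinks maps an English article title to the set of language codes
    in which a linked article exists (a hypothetical structure).
    """
    return sorted(
        title for title, langs in langlinks.items() if target_lang not in langs
    )

# Hypothetical interlanguage-link data (NOT real Wikipedia coverage).
langlinks = {
    "Topic A": {"es", "pt", "de"},
    "Topic B": {"es", "fr"},
    "Topic C": {"de"},
    "Topic D": {"pt"},
}

missing_es = missing_in_language(langlinks, "es")
```

In this toy data, "Topic C" and "Topic D" have no Spanish counterpart, so they would be candidates for an editathon worklist.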
BioWikiNet is a multilingual dataset describing biodiversity representation across 11 Wikipedia language editions selected for their global reach and relevance to biodiversity-rich regions. The dataset includes 1,266,215 taxonomic articles linked to 751,843 unique taxa from the GBIF Backbone Taxonomy, derived from January 2025 Wikipedia dumps and mapped via Wikidata identifiers. Each record contains article-level metadata (pageviews, edits, editors, creation dates, content metrics) combined with GBIF taxonomic classifications. The dataset provides 6,955,289 taxonomic hyperlinks connecting articles within and across language editions, along with three network-based indices (Species Connectivity Index, Core Index, and Excess Focus Index) that quantify the structural characteristics of taxonomic linkages. BioWikiNet enables transparent, reproducible analyses of biodiversity representation and editorial coverage across linguistic communities, serving as an open resource for biodiversity informatics, conservation culturomics, and multilingual knowledge equity research.
To cope with the large number of publications, more and more researchers are automatically extracting data of interest using natural language processing methods based on supervised learning. Much data, especially in the natural and engineering sciences, is quantitative, but there is a lack of datasets for identifying quantities and their context in text. To address this issue, we present two large datasets based on Wikipedia and Wikidata. Wiki-Quantities is a dataset consisting of over 1.2 million annotated quantities in the English-language Wikipedia. Wiki-Measurements is a dataset of 38 738 annotated quantities in the English-language Wikipedia along with their respective measured entity, property, and optional qualifiers. Manual validation of 100 samples each of Wiki-Quantities and Wiki-Measurements found 100% and 84-94% correct, respectively. The datasets can be used in pipeline approaches to measurement extraction, where quantities are identified first and their measurement context is extracted afterwards. To allow reproduction of this work using newer or different versions of Wikipedia and Wikidata, we publish the code used to create the datasets along with the data.
Well-being has attracted the interest of scholars and policymakers for decades. This study examines how respondents evaluate a "Good Life," "Happy Life," and "Meaningful Life" through the analysis of fictional celebrity articles using the supervised Indian Buffet Process (sIBP). We identify key patterns in well-being perception, highlighting the importance of artistic engagement, public influence, and career success across all three dimensions. While happiness is closely linked to career achievements and personal stability, meaning is driven by cultural and artistic contributions, and a good life balances both elements. Personal hardships negatively impact all three dimensions but are particularly detrimental to happiness. Conversely, creative contributions and public engagement enhance perceptions of a meaningful life. These findings suggest that external success and intrinsic fulfillment are both essential for well-being. Our approach demonstrates the value of computational text analysis in uncovering nuanced insights into societal conceptions of a fulfilling life, paving the way for further interdisciplinary research on well-being and perception.
The high cost of manual data labeling and privacy concerns result in a considerable dearth of medical annotations in non-English texts. Recent work by Frank and Kramer [1] introduces an unsupervised approach for constructing ontology-annotated corpora from Wikipedia (https://www.wikidata.org) for German medical NER. We evaluate the proposed approach across English, German, Spanish, and French for medication and diagnosis entity recognition. Our multilabel corpora yield notable improvements in German medication detection under sparse annotations compared to the baseline, with consistent performance across other languages.
The scope of knowledge is constantly evolving, due to such factors as environmental changes, cultural evolution, and scientific discovery. Consequently, we are frequently confronted with gaps in our knowledge, compelling us to seek information from available sources. Sometimes the information we seek is easy to find; other times it has yet to be established by others, requiring us to creatively come up with an original perspective. Yet, little is known about how our foraging strategies change depending on how readily available the information we seek is. We investigated how the need to generate new ideas influences the rate at which individuals explore or exploit existing information. Participants (N=138) answered questions either fully answerable (low-creativity condition) or not fully answerable (high-creativity condition) with information they foraged for on Wikipedia. We created knowledge networks from the foraged information, wherein Wikipedia pages were nodes. The edges linked pairs of Wikipedia pages when they were visited by the participant either sequentially or within the same condition, and were weighted based on the semantic similarity between the pair of pages. This approach allowed us to measure exploration (jumping between disparate pages) and exploitation (viewing closely related pages). In the high-creativity condition, participants were more likely to trade off between exploration (lower average edge weights) and exploitation (higher average clustering coefficients). This trade-off was associated with responses that were more novel, diverging further from the Wikipedia text, compared to less novel responses. These findings reveal how foraging strategies differ in creative versus non-creative contexts, and provide insight into the processes that underlie learning and scientific discovery.
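The two network measures named in the abstract above, average edge weight and average clustering coefficient, can be sketched on a toy knowledge network. This is a minimal illustration, not the authors' pipeline: it assumes the standard unweighted local clustering coefficient (the abstract does not say which variant was used), and the page names and similarity scores are hypothetical.

```python
from itertools import combinations

def average_edge_weight(edges):
    """Mean similarity over all edges; lower values suggest more exploration."""
    return sum(edges.values()) / len(edges)

def average_clustering(edges):
    """Mean unweighted local clustering coefficient over all nodes.

    Nodes with fewer than two neighbours contribute a coefficient of 0.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    total = 0.0
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count pairs of neighbours that are themselves linked.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        total += 2 * links / (k * (k - 1))
    return total / len(neighbours)

# Hypothetical visited pages and semantic similarity weights (NOT study data).
edges = {
    ("Honey bee", "Pollination"): 0.8,
    ("Pollination", "Flower"): 0.7,
    ("Honey bee", "Flower"): 0.6,
    ("Flower", "Architecture"): 0.1,
}

mean_weight = average_edge_weight(edges)
mean_clustering = average_clustering(edges)
```

In this toy network the tightly linked triangle of related pages raises the clustering coefficient (exploitation), while the single low-similarity jump to an unrelated page lowers the average edge weight (exploration).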
With increasing division and conflict amongst groups with different opinions on social and political issues, there is a growing need to effectively manage intergroup conflict. The current paper examined the role of superordinate identities in helping, versus hindering, competing opinion-based groups to work through value-based intergroup conflict and reach value consensus. We examined interactions on Wikipedia as a novel, 'real-world' context where people with different opinions and perspectives work through disagreement guided by the rules and norms of a Wikipedian superordinate identity. We thematically analysed 22 discussion topics (comprising 9837 words) involving 21 editors on the Wikipedia talk page corresponding to the Indigenous Voice to Parliament article. Analyses revealed that supporters and opponents of the Voice often shared the same values but disagreed about how those values should be expressed (i.e., the implications of those values). Moreover, we found evidence that working through intergroup conflict involved perceiving value consensus, a process that was facilitated by a Wikipedian superordinate identity. The results highlight the conditions under which superordinate groups can productively structure disagreement and attenuate conflict between opinion-based groups.