In this paper I propose a micro-based innovation driven general equilibrium growth-model allowing for endogenous entry and exit as well as three different types of research. I make the novel distinction between three types of firms, namely basic and applied research where applied research is again differentiated into high and low type research to account for a heterogeneous firm environment. I further propose a new and flexible way to model basic research spillovers in this model class. While previous literature addresses the effect of R&D policies in general, my findings suggest, there is no optimal market solution in which basic research takes a noteworthy share. In order to counteract this undersupply and to take advantage of basic research spillovers, a sizable basic research subsidy is required to sufficiently crowd-in basic research which then allows for a substantial increase in welfare. My findings also suggest the necessity for purely state-financed basic research to approach the social planner solution in a feasible manner.
In the context of unprecedented U.S. Department of Defense (DoD) budgets, this paper examines the recent history of DoD funding for academic research in algorithmically based warfighting. We draw from a corpus of DoD grant solicitations from 2007 to 2023, focusing on those addressed to researchers in the field of artificial intelligence (AI). Considering the implications of DoD funding for academic research, the paper proceeds through three analytic sections. In the first, we offer a critical examination of the distinction between basic and applied research, showing how funding calls framed as basic research nonetheless enlist researchers in a war fighting agenda. In the second, we offer a diachronic analysis of the corpus, showing how a 'one small problem' caveat, in which affirmation of progress in military technologies is qualified by acknowledgement of outstanding problems, becomes justification for additional investments in research. We close with an analysis of DoD aspirations based on a subset of Defense Advanced Research Projects Agency (DARPA) grant solicitations for the use of AI in battlefield applications. Taken together, we argue that grant solicitations work as a vehi
Electronic health record (EHR) notes are dense medical documents containing large amounts of information, often filled with complex medical jargon. Highlighting all details in EHRs helps reduce the likelihood of missing crucial information by drawing attention to key content. This study proposes the design of a Cardiology Interface Terminology (CIT) to accurately highlight all details in EHR notes of cardiology patients. We introduce an innovative Machine Learning (ML) technique for the design of CIT. The ML technique requires training data. Manual preparation of such training data is time-consuming and expensive. The process of the CIT design includes three phases. In the first two phases, we innovatively derive a training data CIT to be used by the third phase, ML technique. We start by designing an initial CIT, composed of several components: the cardiology-related sub-hierarchies of SNOMED, other SNOMED concepts mined from EHRs of build set, and necessary components of terms e.g., medical abbreviations and medications. Utilizing an iterative process, fine-grained phrases containing initial CIT concepts are extracted from build set as CIT concept candidates. The candidate concep
Biomedical text embeddings have primarily been developed using research literature from PubMed, yet clinical cardiology practice relies heavily on procedural knowledge and specialized terminology found in comprehensive textbooks rather than research abstracts. This research practice gap limits the effectiveness of existing embedding models for clinical applications incardiology. This study trained CardioEmbed, a domain-specialized embedding model based on Qwen3-Embedding-8B, using contrastive learning on a curated corpus of seven comprehensive cardiology textbooks totaling approximately 150,000 sentences after deduplication. The model employs InfoNCE loss with in-batch negatives and achieves 99.60% retrieval accuracy on cardiac-specific semantic retrieval tasks, a +15.94 percentage point improvement over MedTE, the current state-of-the-art medical embedding model. On MTEB medical benchmarks, the model obtained BIOSSES 0.77 Spearman and SciFact 0.61 NDCG@10, indicating competitive performance on related biomedical domains. Domain-specialized training on comprehensive clinical textbooks yields near-perfect cardiology retrieval (99.60% Acc@1), improving over MedTE by +15.94 percentage
Software is at the core of most scientific discoveries today. Therefore, the quality of research results highly depends on the quality of the research software. Rigorous testing, as we know it from software engineering in the industry, could ensure the quality of the research software but it also requires a substantial effort that is often not rewarded in academia. Therefore, this research explores the effects of research software testing integrated into teaching on research software. In an in-vivo experiment, we integrated the engineering of a test suite for a large-scale network simulation as group projects into a course on software testing at the Blekinge Institute of Technology, Sweden, and qualitatively measured the effects of this integration on the research software. We found that the research software benefited from the integration through substantially improved documentation and fewer hardware and software dependencies. However, this integration was effortful and although the student teams developed elegant and thoughtful test suites, no code by students went directly into the research software since we were not able to make the integration back into the research software
This scientometric study analyzes Avian Influenza research from 2014 to 2023 using bibliographic data from the Web of Science database. We examined publication trends, sources, authorship, collaborative networks, document types, and geographical distribution to gain insights into the global research landscape. Results reveal a steady increase in publications, with high contributions from Chinese and American institutions. Journals such as PLoS One and the Journal of Virology published the highest number of studies, indicating their influence in this field. The most prolific institutions include the Chinese Academy of Sciences and the University of Hong Kong, while the College of Veterinary Medicine at South China Agricultural University emerged as the most productive department. China and the USA lead in publication volume, though developed nations like the United Kingdom and Germany exhibit a higher rate of international collaboration. "Articles" are the most common document type, constituting 84.6% of the total, while "Reviews" account for 7.6%. This study provides a comprehensive view of global trends in Avian Influenza research, emphasizing the need for collaborative efforts ac
An increasing number of countries and institutions are investing in large-scale research infrastructures (RIs) and in basic research. Scientific discoveries, which are expected thanks to RIs, may have a non-use value, in analogy with environmental and cultural public goods. This paper provides, for the first time, an empirical estimation of the willingness to pay (WTP) for discoveries in basic research by the general public. We focus on the Large Hadron Collider (LHC), the largest particle accelerator worldwide, where in 2012 the Higgs boson was discovered. Nobody knows the practical value of such discovery, beyond knowledge per se. The findings of our study are based on a dichotomous choice contingent valuation (CV) survey carried out in line with the NOAA guidelines. The survey involved 1,022 undergraduate students enrolled in more than 30 different degrees (including the humanities) at five universities located in four countries (Italy, France, Spain, UK). We ask two main research questions: Which are the determinants of the WTP for the LHC discoveries? What is the average contribution that the respondents would be willing to pay? Our results confirm that income, interest and at
Demographic data collection is essential in education research, as demographic data allows researchers to better describe the participant population they study and to contextualize findings. However, current research practices for neurodiversity demographics often rely on prescriptive methods (e.g., requiring participants to report official diagnoses) rather than allowing participants to self-identify. This approach can: a) not allow participants to express their intersecting identities in ways that are authentic; and b) limit trustworthiness and reliability of the data and interpretation. In addition, inconsistent dissemination and representation of demographic data across studies hinder the accessibility and usability of this work. Through a literature review of neurodivergent student experiences with learning and performing STEM, we identified widespread discrepancies in how demographic information is collected and reported. This paper explores how neurodivergent identities can be more accurately and inclusively represented in education research. We present findings of a thematic analysis on the ways neurodivergent demographic data collection is done in the literature using data
This paper presents a scientometric analysis of research output from the University of Lagos, focusing on the two decades spanning 2004 to 2023. Using bibliometric data retrieved from the Web of Science, we examine trends in publication volume, collaboration patterns, citation impact, and the most prolific authors, departments, and research domains at the university. The study reveals a consistent increase in research productivity, with the highest publication output recorded in 2023. Health Sciences, Engineering, and Social Sciences are identified as dominant fields, reflecting the university's interdisciplinary research strengths. Collaborative efforts, both locally and internationally, show a positive correlation with higher citation impact, with the United States and the United Kingdom being the leading international collaborators. Notably, open-access publications account for a significant portion of the university's research output, enhancing visibility and citation rates. The findings offer valuable insights into the university's research performance over the past two decades, providing a foundation for strategic planning and policy formulation to foster research excellence
The rapidly increasing volume of electronic health record (EHR) data underscores a pressing need to unlock biomedical knowledge from unstructured clinical texts to support advancements in data-driven clinical systems, including patient diagnosis, disease progression monitoring, treatment effects assessment, prediction of future clinical events, etc. While contextualized language models have demonstrated impressive performance improvements for named entity recognition (NER) systems in English corpora, there remains a scarcity of research focused on clinical texts in low-resource languages. To bridge this gap, our study aims to develop multiple deep contextual embedding models to enhance clinical NER in the cardiology domain, as part of the BioASQ MultiCardioNER shared task. We explore the effectiveness of different monolingual and multilingual BERT-based models, trained on general domain text, for extracting disease and medication mentions from clinical case reports written in English, Spanish, and Italian. We achieved an F1-score of 77.88% on Spanish Diseases Recognition (SDR), 92.09% on Spanish Medications Recognition (SMR), 91.74% on English Medications Recognition (EMR), and 88.
Cardiovascular diseases are becoming increasingly prevalent in modern society, with a profound impact on global health and well-being. These Cardiovascular disorders are complex and multifactorial, influenced by genetic predispositions, lifestyle choices, and diverse socioeconomic and clinical factors. Information about these interrelated factors is dispersed across multiple types of textual data, including patient narratives, medical records, and scientific literature. Natural language processing (NLP) has emerged as a powerful approach for analysing such unstructured data, enabling healthcare professionals and researchers to gain deeper insights that may transform the diagnosis, treatment, and prevention of cardiac disorders. This review provides a comprehensive overview of NLP research in cardiology from 2014 to 2025. We systematically searched six literature databases for studies describing NLP applications across a range of cardiovascular diseases. After a rigorous screening process, we identified 265 relevant articles. Each study was analysed across multiple dimensions, including NLP paradigms, cardiology-related tasks, disease types, and data sources. Our findings reveal sub
Heart diseases remain a leading cause of morbidity and mortality worldwide, necessitating accurate and trustworthy differential diagnosis. However, existing artificial intelligence-based diagnostic methods are often limited by insufficient cardiology knowledge, inadequate support for complex reasoning, and poor interpretability. Here we present HeartAgent, a cardiology-specific agent system designed to support a reliable and explainable differential diagnosis. HeartAgent integrates customized tools and curated data resources and orchestrates multiple specialized sub-agents to perform complex reasoning while generating transparent reasoning trajectories and verifiable supporting references. Evaluated on the MIMIC dataset and a private electronic health records cohort, HeartAgent achieved over 36% and 20% improvements over established comparative methods, in top-3 diagnostic accuracy, respectively. Additionally, clinicians assisted by HeartAgent demonstrated gains of 26.9% in diagnostic accuracy and 22.7% in explanatory quality compared with unaided experts. These results demonstrate that HeartAgent provides reliable, explainable, and clinically actionable decision support for cardio
Research and development (R&D) of countries play a major role in a long-term development of the economy. We measure the R&D efficiency of all 28 member countries of the European Union in the years 2008--2014. Super-efficient data envelopment analysis (DEA) based on robustness of classification into efficient and inefficient units is adopted. We use the number of citations as output of basic research, the number of patents as output of applied research and R&D expenditures with manpower as inputs. To meet DEA assumptions and to capture R&D characteristics, we analyze a homogeneous sample of countries, adjust prices using purchasing power parity and consider time lag between inputs and outputs. We find that the efficiency of general R&D is higher for countries with higher GDP per capita. This relation also holds for specialized efficiencies of basic and applied research. However, it is much stronger for applied research suggesting its outputs are more easily distinguished and captured. Our findings are important in the evaluation of research and policy making.
As Engineering Education Research (EER) develops as a discipline it is necessary for EER scholars to contribute to the development of learning theory rather than simply being informed by it. It has been suggested that to do this effectively will require partnerships between Engineering scholars and psychologists, education researchers, including other social scientists. The formation of such partnerships is particularly important when considering the introduction of business-related skills into engineering curriculum designed to prepare 21st Century Engineering Students for workplace challenges. In order to encourage scholars beyond Engineering to engage with EER, it is necessary to provide an introduction to the complexities of EER. With this aim in mind, this paper provides an outline review of what is considered rigorous research from an EER perspective as well as highlighting some of the core methodological traditions of EER. The paper aims to facilitate further discussion between EER scholars and researchers from other disciplines, ultimately leading to future collaboration on innovative and rigorous EER.
Recently, significant efforts have been made to create Health Digital Twins (HDTs), digital twins for clinical applications. Heart modeling is one of the fastest-growing fields, which favors the effective application of HDTs. The clinical application of HDTs will be increasingly widespread in the future of healthcare services and has a huge potential to form part of the mainstream in medicine. However, it requires the development of both models and algorithms for the analysis of medical data, and advances in Artificial Intelligence (AI) based algorithms have already revolutionized image segmentation processes. Precise segmentation of lesions may contribute to an efficient diagnostics process and a more effective selection of targeted therapy. In this paper, a brief overview of recent achievements in HDT technologies in the field of cardiology, including interventional cardiology was conducted. HDTs were studied taking into account the application of Extended Reality (XR) and AI, as well as data security, technical risks, and ethics-related issues. Special emphasis was put on automatic segmentation issues. It appears that improvements in data processing will focus on automatic segme
Physics education researchers (PER) often analyze student data with single-level regression models (e.g., linear and logistic regression). However, education datasets can have hierarchical structures, such as students nested within courses, that single-level models fail to account for. The improper use of single-level models to analyze hierarchical datasets can lead to biased findings. Hierarchical models (a.k.a., multi-level models) account for this hierarchical nested structure in the data. In this publication, we outline the theoretical differences between how single-level and multi-level models handle hierarchical datasets. We then present analysis of a dataset from 112 introductory physics courses using both multiple linear regression and hierarchical linear modeling to illustrate the potential impact of using an inappropriate analytical method on PER findings and implications. Research can leverage multi-institutional datasets to improve the field's understanding of how to support student success in physics. There is no post hoc fix, however, if researchers use inappropriate single-level models to analyze multi-level datasets. To continue developing reliable and generalizable
Domain-specific text embeddings are critical for clinical natural language processing, yet systematic comparisons across model architectures remain limited. This study evaluates ten transformer-based embedding models adapted for cardiology through Low-Rank Adaptation (LoRA) fine-tuning on 106,535 cardiology text pairs derived from authoritative medical textbooks. Results demonstrate that encoder-only architectures, particularly BioLinkBERT, achieve superior domain-specific performance (separation score: 0.510) compared to larger decoder-based models, while requiring significantly fewer computational resources. The findings challenge the assumption that larger language models necessarily produce better domain-specific embeddings and provide practical guidance for clinical NLP system development. All models, training code, and evaluation datasets are publicly available to support reproducible research in medical informatics.
This article investigates how translation memories (TM) can be created by translators or other language professionals in order to compile domain-specific parallel corpora , which can then be used in different scenarios, such as machine translation training and fine-tuning, TM leveraging, and/or large language model fine-tuning. The article introduces a semi-automatic TM preparation methodology leveraging primarily translation tools used by translators in favor of data quality and control by the translators. This semi-automatic methodology is then used to build a cardiology-based Turkish -> English corpus from bilingual abstracts of Turkish cardiology journals. The resulting corpus called TRENCARD Corpus has approximately 800,000 source words and 50,000 sentences. Using this methodology, translators can build their custom TMs in a reasonable time and use them in their bilingual data requiring tasks.
Drawing on 1,178 safety and reliability papers from 9,439 generative AI papers (January 2020 - March 2025), we compare research outputs of leading AI companies (Anthropic, Google DeepMind, Meta, Microsoft, and OpenAI) and AI universities (CMU, MIT, NYU, Stanford, UC Berkeley, and University of Washington). We find that corporate AI research increasingly concentrates on pre-deployment areas -- model alignment and testing & evaluation -- while attention to deployment-stage issues such as model bias has waned. Significant research gaps exist in high-risk deployment domains, including healthcare, finance, misinformation, persuasive and addictive features, hallucinations, and copyright. Without improved observability into deployed AI, growing corporate concentration could deepen knowledge deficits. We recommend expanding external researcher access to deployment data and systematic observability of in-market AI behaviors.
Background: Citation analysis has become an important tool for research performance assessment in the medical sciences. However, different areas of medical research may have considerably different citation practices, even within the same medical field. Because of this, it is unclear to what extent citation-based bibliometric indicators allow for valid comparisons between research units active in different areas of medical research. Methodology: A visualization methodology is introduced that reveals differences in citation practices between medical research areas. The methodology extracts terms from the titles and abstracts of a large collection of publications and uses these terms to visualize the structure of a medical field and to indicate how research areas within this field differ from each other in their average citation impact. Results: Visualizations are provided for 32 medical fields, defined based on journal subject categories in the Web of Science database. The analysis focuses on three fields. In each of these fields, there turn out to be large differences in citation practices between research areas. Low-impact research areas tend to focus on clinical intervention resea