The contributions of model complexity, data volume, and feature modalities to knowledge graph-based drug repurposing remain poorly quantified under rigorous temporal validation. We constructed a pharmacology knowledge graph from ChEMBL 36 comprising 5,348 entities, including 3,127 drugs, 1,156 proteins, and 1,065 indications. A strict temporal split was enforced with training data up to 2022 and testing data from 2023 to 2025, together with biologically verified hard negatives mined from failed assays and clinical trials. We benchmarked five knowledge graph embedding models and a standard graph neural network with 3.44 million parameters that incorporates drug chemical structure using a graph attention encoder and ESM-2 protein embeddings. Scaling experiments ranging from 0.78 to 9.75 million parameters and from 25 to 100 percent of the data, together with feature ablation studies, were used to isolate the contributions of model capacity, graph density, and node feature modalities. Removing the graph-attention-based drug structure encoder and retaining only topological embeddings combined with ESM-2 protein features improved drug-protein PR-AUC from 0.5631 to 0.5785 while reducing V
Physics-Informed Kolmogorov-Arnold Networks (PIKANs) are gaining attention as an effective counterpart to the original multilayer perceptron-based Physics-Informed Neural Networks (PINNs). Both representation models can address inverse problems and facilitate gray-box system identification. However, their relative performance in terms of accuracy and speed remains underexplored. Here, we introduce a modified PIKAN architecture, tanh-cPIKAN, which parametrizes the univariate functions with Chebyshev polynomials and adds an extra nonlinearity for enhanced performance. We then present a systematic investigation of how choices of the optimizer, representation, and training configuration influence the performance of PINNs and PIKANs in the context of systems pharmacology modeling. We benchmark a wide range of first-order, second-order, and hybrid optimizers, including various learning rate schedulers. We use the new Optax library to identify the most effective combinations for learning gray-boxes under ill-posed, non-unique, and data-sparse conditions. We examine the influence of model architecture (MLP vs. KAN), numerical precision (single
The morphological form of a word can often give cues to its meaning, but purely relying on these mappings can lead to overgeneralization in high-stakes domains. In the medical domain, for instance, LLMs can confidently reason about fictitious drugs from their affixes alone (e.g., wugcillin) and generate plausible-looking clinical content. We present a behavioral and mechanistic study of LLM "affix heuristics" in pharmacology. Using fictitious drug names built from real affixes, we show that affix signals alone elicit class-level pharmacological responses. We introduce a framework for identifying whether a model's drug semantics are driven mainly by the affix, the stem, or the drug name as a whole. Applied across 653 drugs, our framework reveals that models often induce drug meaning primarily through affix cues, yet rarely explicitly indicate this reliance, and sometimes incorrectly conflate properties among affix-sharing drugs. Activation patching across models further localizes this behavior to early-mid layers. These findings show that morphological shortcuts pose a subtle but measurable risk to safety.
Chronic superficial gastritis (CSG) severely affects quality of life and can progress to worse gastric pathologies. Traditional Chinese Medicine (TCM) effectively treats CSG, as exemplified by Jinhong Tablets (JHT) with known anti-inflammatory properties, though their mechanism remains unclear. This study integrated network pharmacology, untargeted metabolomics, and gut microbiota analyses to investigate how JHT alleviates CSG. A rat CSG model was established and evaluated via H&E staining. We identified JHT's target profiles and constructed a multi-layer biomolecular network. Differential metabolites in plasma were determined by untargeted metabolomics, and gut microbiota diversity/composition in fecal and cecal samples was assessed via 16S rRNA sequencing. JHT markedly reduced gastric inflammation. Network pharmacology highlighted metabolic pathways, particularly lipid and nitric oxide metabolism, as essential to JHT's therapeutic effect. Metabolomics identified key differential metabolites including betaine (enhancing gut microbiota), phospholipids, and citrulline (indicating severity of CSG). Pathway enrichment supported the gut microbiota's involvement. Further microbiota
Large language models (LLMs) have shown strong empirical performance across pharmacology and drug discovery tasks, yet the internal mechanisms by which they encode pharmacological knowledge remain poorly understood. In this work, we investigate how drug-group semantics are represented and retrieved within Llama-based biomedical language models using causal and probing-based interpretability methods. We apply activation patching to localize where drug-group information is stored across model layers and token positions, and complement this analysis with linear probes trained on token-level and sum-pooled activations. Our results demonstrate that early layers play a key role in encoding drug-group knowledge, with the strongest causal effects arising from intermediate tokens within the drug-group span rather than the final drug-group token. Linear probing further reveals that pharmacological semantics are distributed across tokens and are already present in the embedding space, with token-level probes performing near chance while sum-pooled representations achieve maximal accuracy. Together, these findings suggest that drug-group semantics in LLMs are not localized to single tokens but
A fundamental mistake in receptor theory has led to an enduring misunderstanding of how to estimate the affinity and efficacy of an agonist. These properties are inextricably linked and cannot be easily separated in any case where the binding of a ligand induces a conformational change in its receptor. Consequently, binding curves and concentration-response relationships for receptor agonists have no straightforward interpretation. This problem, the affinity-efficacy problem, remains overlooked and misunderstood despite having been recognised in 1987. To avoid further propagation of this misunderstanding, we propose that the affinity-efficacy problem should be included in the core curricula for pharmacology undergraduates proposed by the British Pharmacological Society and IUPHAR.
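The entanglement of affinity and efficacy described above can be illustrated with a small numerical sketch. It assumes the simplest two-step scheme, A + R <-> AR <-> AR*, which the abstract itself does not name; the function names, parameter names, and numerical values are all hypothetical and chosen only for illustration. Under that assumption, the concentration giving half-maximal binding is k_a / (1 + efficacy), so a binding curve alone cannot separate the two quantities.

```python
# Illustrative sketch only: the two-step scheme and all names are assumptions,
# not taken from the abstract.

def bound_fraction(conc, k_a, efficacy):
    """Fraction of receptors with ligand bound (AR + AR*) at equilibrium.

    Scheme: A + R <-> AR <-> AR*
    k_a      : microscopic dissociation constant of the binding step
    efficacy : equilibrium constant [AR*]/[AR] of the conformational step
    """
    c = conc / k_a
    return c * (1.0 + efficacy) / (1.0 + c * (1.0 + efficacy))


def apparent_kd(k_a, efficacy):
    """Concentration giving half-maximal binding: k_a / (1 + efficacy)."""
    return k_a / (1.0 + efficacy)


# Two hypothetical agonists with different affinity and efficacy ...
agonist_1 = {"k_a": 1.0, "efficacy": 9.0}
agonist_2 = {"k_a": 0.2, "efficacy": 1.0}

# ... that share the same apparent dissociation constant,
kd_1 = apparent_kd(**agonist_1)
kd_2 = apparent_kd(**agonist_2)

# and therefore produce the same binding curve at every concentration.
concs = [0.01, 0.1, 1.0, 10.0]
curve_1 = [bound_fraction(x, **agonist_1) for x in concs]
curve_2 = [bound_fraction(x, **agonist_2) for x in concs]
```

In this sketch the two agonists differ fivefold in microscopic affinity, yet their binding curves coincide, which is one way to see why a measured binding constant need not equal the affinity of the binding step.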
Small Angle Neutron Scattering (SANS) is a non-destructive technique utilized to probe the nano- to mesoscale structure of materials by analyzing the scattering pattern of neutrons. Accelerating SANS acquisition for in-situ analysis is essential, but it often reduces the signal-to-noise ratio (SNR), highlighting the need for methods to enhance SNR even with short acquisition times. While deep learning (DL) can be used for enhancing SNR of low quality SANS, the amount of experimental data available for training is usually severely limited. We address this issue by proposing a Plug-and-play Restoration for SANS (PR-SANS) that uses domain-adapted priors. The prior in PR-SANS is initially trained on a set of generic images and subsequently fine-tuned using a limited amount of experimental SANS data. We present a theoretical convergence analysis of PR-SANS by focusing on the error resulting from using inexact domain-adapted priors instead of the ideal ones. We demonstrate with experimentally collected SANS data that PR-SANS can recover high-SNR 2D SANS detector images from low-SNR detector images, effectively increasing the SNR. This advancement enables a reduction in acquisition times
With many advancements in in silico biology in recent years, the paramount challenge is to translate the accumulated knowledge into exciting industry partnerships and clinical applications. The construction of models that link molecular interactions to the activity and structure of a whole organ is termed multiscale biophysics. Historically, the pharmaceutical industry has worked well with in silico models by leveraging their prediction capabilities for drug testing. However, the higher fidelity and resolution needed for efficient prediction of pharmacological phenomena dictate that in silico approaches account for verifiable multiscale biophysical phenomena, since spatial and temporal dimensions vary across processes and models. A collection of multiscale models for different tissues and organs can be composed into digital twin solutions that move towards a service for researchers, clinicians, and drug developers. Our paper has two main goals: 1) To clarify to what extent detailed single- and multiscale modeling has been accomplished thus far, we provide a review on this topic focusing on the biophysics of epithelial, cardiac, and
Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process human language, understand it to a certain degree, and use it in various applications. This area has developed rapidly in the last few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adverse drug interactions in social media. We split our coverage into five categories to survey modern NLP methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers.
3D Referring Expression Segmentation (3D-RES) aims to segment 3D objects by correlating referring expressions with point clouds. However, traditional approaches frequently encounter issues like over-segmentation or mis-segmentation, due to insufficient emphasis on spatial information of instances. In this paper, we introduce a Rule-Guided Spatial Awareness Network (RG-SAN) by utilizing solely the spatial information of the target instance for supervision. This approach enables the network to accurately depict the spatial relationships among all entities described in the text, thus enhancing the reasoning capabilities. The RG-SAN consists of the Text-driven Localization Module (TLM) and the Rule-guided Weak Supervision (RWS) strategy. The TLM initially locates all mentioned instances and iteratively refines their positional information. The RWS strategy, acknowledging that only target objects have supervised positional information, employs dependency tree rules to precisely guide the core instance's positioning. Extensive testing on the ScanRefer benchmark has shown that RG-SAN not only establishes new performance benchmarks, with an mIoU increase of 5.1 points, but also exhibits si
Dietary flavonoids associate with disease prevention in epidemiological studies, yet their polypharmacological mechanisms remain unclear. We establish network pharmacology as a systematic framework to characterize flavonoid therapeutic properties through integrated computational, experimental, and epidemiological validation. We constructed a master network of 17,869 human proteins, 14 dietary flavonoids, and 1,496 FDA-approved drugs with 278,768 interactions. Flavonoids averaged 45.3 target proteins per compound compared to 16.8 for FDA-approved drugs (2.7-fold higher; p=7.5x10^-4), reflecting multi-target architecture. Statistical analysis revealed that 71.4% of flavonoids targeted proteins associated with cardiovascular drugs and 78.6% aligned with antineoplastic drug targets. MTT-based Jurkat cell assays confirmed network predictions: high-association flavonoids (luteolin LC50=31.4 microM, myricetin=29.5 microM) produced strong cytotoxicity, while low-association flavonoids showed minimal activity (LC50>200 microM). Network-predicted association strengths correlated with experimental bioactivity (Pearson r=0.918; R^2=0.843). We translated network associations into food-level
The effects of social influence and homophily suggest that both network structure and node attribute information should inform the tasks of link prediction and node attribute inference. Recently, Yin et al. proposed Social-Attribute Network (SAN), an attribute-augmented social network, to integrate network structure and node attributes to perform both link prediction and attribute inference. They focused on generalizing the random walk with restart algorithm to the SAN framework and showed improved performance. In this paper, we extend the SAN framework with several leading supervised and unsupervised link prediction algorithms and demonstrate performance improvement for each algorithm on both link prediction and attribute inference. Moreover, we make the novel observation that attribute inference can help inform link prediction, i.e., link prediction accuracy is further improved by first inferring missing attributes. We comprehensively evaluate these algorithms and compare them with other existing algorithms using a novel, large-scale Google+ dataset, which we make publicly available.
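The idea of an attribute-augmented network can be sketched in a few lines. The example below is a generic random walk with restart on a toy graph in which attributes are added as extra nodes; it is not the formulation of Yin et al. or of this paper, both of which may weight social and attribute edges differently. All names, the toy data, and the restart probability are hypothetical.

```python
from collections import defaultdict


def build_san(social_edges, attributes):
    """Build an undirected graph containing person nodes and attribute nodes.

    social_edges : iterable of (person, person) pairs
    attributes   : dict mapping person -> iterable of attribute values
    """
    adj = defaultdict(set)
    for u, v in social_edges:
        adj[u].add(v)
        adj[v].add(u)
    for person, attrs in attributes.items():
        for a in attrs:
            node = ("attr", a)  # attribute nodes are tagged tuples
            adj[person].add(node)
            adj[node].add(person)
    return adj


def random_walk_with_restart(adj, source, restart=0.15, iters=100):
    """Power iteration for the visiting distribution of a walk from `source`."""
    scores = {n: 0.0 for n in adj}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for n, s in scores.items():
            share = (1.0 - restart) * s / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        nxt[source] += restart  # teleport back to the source node
        scores = nxt
    return scores


# Toy data (hypothetical): a chain of friendships plus a shared attribute.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
attrs = {"alice": ["UCB"], "dave": ["UCB"], "bob": ["Google"]}

without_attrs = random_walk_with_restart(build_san(edges, {}), "alice")
with_attrs = random_walk_with_restart(build_san(edges, attrs), "alice")
```

In this toy graph, the attribute shared by alice and dave gives the walk a second, shorter route between them, so dave's score as a candidate link for alice rises once attribute nodes are included. Scores on attribute nodes can be read in the same way as a crude signal for attribute inference.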
Peptide strings have been developed as a concept for the past fourteen years. They are proposed to entail various quantum states engendered by the physical interactions of proteins containing peptide portions that are similar to one another and, moreover, amino acid sequences that are sterically complementary to each other. In this survey, additional insights are presented that support the notion that peptide strings are likely a biophysical phenomenon that warrants further investigation. Specifically, the putative traits of peptide strings are the capacity for wave-like interferences and electric current-like properties. Therefore, future experimental validation and quantification of these predicted features, as well as application of the potential energy that is stored in the distinct shapes of proteins and peptides, may prove beneficial in addressing challenges in physics and medicine. Within the latter field, the key discipline of pharmacology should be most fundamentally advanced by the peptide strings approach.
Modeling a system's temporal behaviour in reaction to external stimuli is a fundamental problem in many areas. Pure Machine Learning (ML) approaches often fail in the small sample regime and cannot provide actionable insights beyond predictions. A promising modification has been to incorporate expert domain knowledge into ML models. The application we consider is predicting the progression of disease under medications, where a plethora of domain knowledge is available from pharmacology. Pharmacological models describe the dynamics of carefully-chosen medically meaningful variables in terms of systems of Ordinary Differential Equations (ODEs). However, these models only describe a limited collection of variables, and these variables are often not observable in clinical environments. To close this gap, we propose the latent hybridisation model (LHM) that integrates a system of expert-designed ODEs with machine-learned Neural ODEs to fully describe the dynamics of the system and to link the expert and latent variables to observable quantities. We evaluated LHM on synthetic data as well as real-world intensive care data of COVID-19 patients. LHM consistently outperforms previous works,
This study presents a use-case of a network of low-cost acoustic smart sensors deployed in the city of Pamplona to analyse changes in the urban soundscape during the San Fermin Festival. The sensors were installed in different areas of the city before, during, and after the event, capturing continuous acoustic data. Our analysis reveals a significant transformation in the city's sonic environment during the festive period: overall sound pressure levels increase significantly, soundscape patterns change, and the acoustic landscape becomes dominated by sounds associated with human activity. These findings highlight the potential of distributed smart acoustic monitoring systems to characterize the temporal dynamics of urban soundscapes and underscore how the large-scale event of San Fermin drastically reshapes the overall acoustic dynamics of the city of Pamplona. Additionally, to complement the objective measurements, a curated collection of real San Fermin sound recordings has been created and made publicly available, preserving the festival's unique sonic heritage.
The integration of large language models into public transit systems represents a significant advancement in urban transportation management and passenger experience. This study examines the impact of LLMs within San Antonio's public transit system, leveraging their capabilities in natural language processing, data analysis, and real-time communication. By utilizing GTFS and other public transportation information, the research highlights the transformative potential of LLMs in enhancing route planning, reducing wait times, and providing personalized travel assistance. Our case study is the city of San Antonio, as part of a project aiming to demonstrate how LLMs can optimize resource allocation, improve passenger satisfaction, and support decision-making processes in transit management. We evaluated LLM responses to questions related to both information retrieval and understanding. Ultimately, we believe that the adoption of LLMs in public transit systems can lead to more efficient, responsive, and user-friendly transportation networks, providing a model for other cities to follow.
Analyzing the structure and function of urban transportation networks is critical for enhancing mobility, equity, and resilience. This paper leverages network science to conduct a multi-modal analysis of San Diego's transportation system. We construct a multi-layer graph using data from OpenStreetMap (OSM) and the San Diego Metropolitan Transit System (MTS), representing driving, walking, and public transit layers. By integrating thousands of Points of Interest (POIs), we analyze network accessibility, structure, and resilience through centrality measures, community detection, and a proposed metric for walkability. Our analysis reveals a system defined by a stark core-periphery divide. We find that while the urban core is well-integrated, 30.3% of POIs are isolated from public transit within a walkable distance, indicating significant equity gaps in suburban and rural access. Centrality analysis highlights the driving network's over-reliance on critical freeways as bottlenecks, suggesting low network resilience, while confirming that San Diego is not a broadly walkable city. Furthermore, community detection demonstrates that transportation mode dictates the scale of mobility, produ
Many multi-genic systemic diseases such as neurological disorders, inflammatory diseases, and the majority of cancers do not yet have effective treatments. Reinforcement learning-powered systems pharmacology is a potentially effective approach to designing personalized therapies for untreatable complex diseases. In this survey, state-of-the-art reinforcement learning methods and their latest applications to drug design are reviewed. The challenges in harnessing reinforcement learning for systems pharmacology and personalized medicine are discussed. Potential solutions to overcome the challenges are proposed. In spite of the successful application of advanced reinforcement learning techniques to target-based drug discovery, new reinforcement learning strategies are needed to address systems pharmacology-oriented personalized de novo drug design.
In recent years, considerable research has been dedicated to the application of neural models in the field of natural language generation (NLG). The primary objective is to generate text that is both linguistically natural and human-like, while also exerting control over the generation process. This paper offers a comprehensive and task-agnostic survey of the recent advancements in neural text generation. These advancements have been facilitated through a multitude of developments, which we categorize into four key areas: data construction, neural frameworks, training and inference strategies, and evaluation metrics. By examining these different aspects, we aim to provide a holistic overview of the progress made in the field. Furthermore, we explore the future directions for the advancement of neural text generation, which encompass the utilization of neural pipelines and the incorporation of background knowledge. These avenues present promising opportunities to further enhance the capabilities of NLG systems. Overall, this survey serves to consolidate the current state of the art in neural text generation and highlights potential avenues for future research and development in this
Rédei and San Pedro discuss my "Comparing Causality Principles," their main aim being to distinguish reasonable weakened versions of two causality principles presented there, "SO1" and "SO2". They also argue that the proof that SO1 implies SO2 contains a flaw. Here, a reply is made to a number of points raised in their paper. It is argued that the "intuition" that SO1 should be stronger than SO2 is implicitly based on a false premise. It is pointed out that a similar weakening of SO2 was already considered in the original paper. The technical definition of the new conditions is shown to be defective. The argument against the stronger versions of SO1 and SO2 given by Rédei and San Pedro is criticised. The flaw in the original proof is shown to be an easily corrected mistake in the wording. Finally, it is argued that some cited results on causal conditions in AQFT have little relevance to these issues, and are, in any case, highly problematic in themselves.