The role of Artificial Intelligence (AI) is growing in every stage of drug development. Nevertheless, a major challenge in drug discovery AI remains: Drug pharmacokinetic (PK) and Drug-Target Interaction (DTI) datasets collected in different studies often exhibit limited overlap, creating data overlap sparsity. Thus, data curation becomes difficult, negatively impacting downstream research investigations in high-throughput screening, polypharmacy, and drug combination. We propose xImagand-DKI, a novel SMILES/Protein-to-Pharmacokinetic/DTI (SP2PKDTI) diffusion model capable of generating an array of PK and DTI target properties conditioned on SMILES and protein inputs that exhibit data overlap sparsity. We infuse additional molecular and genomic domain knowledge from the Gene Ontology (GO) and molecular fingerprints to further improve our model performance. We show that xImagand-DKI-generated synthetic PK data closely resemble real data univariate and bivariate distributions, and can adequately fill in gaps among PK and DTI datasets. As such, xImagand-DKI is a promising solution for data overlap sparsity and may improve performance for downstream drug discovery research tasks. Code
Environmental and mental conditions are known risk factors for dermatitis and symptoms of skin inflammation, but their interplay is difficult to quantify; epidemiological studies rarely include both, along with possible confounding factors. Infodemiology leverages large online data sets to address this issue, but fusing them produces strong patterns of spatial and temporal correlation, missingness, and heterogeneity. In this paper, we design a causal network that correctly models these complex structures in large-scale infodemiological data from Google, EPA, NOAA and US Census (434 US counties, 134 weeks). Our model successfully captures known causal relationships between weather patterns, pollutants, mental conditions, and dermatitis. Key findings reveal that anxiety accounts for 57.4% of explained variance in dermatitis, followed by NO2 (33.9%), while environmental factors show significant mediation effects through mental conditions. The model predicts that reducing PM2.5 emissions by 25% could decrease dermatitis prevalence by 18%. Through statistical validation and causal inference, we provide unprecedented insights into the complex interplay between environmental and mental he
Accurate drug target affinity prediction can improve drug candidate selection, accelerate the drug discovery process, and reduce drug production costs. Previous work focused on traditional fingerprints or used features extracted based on the amino acid sequence in the protein, ignoring its 3D structure which affects its binding affinity. In this work, we propose GraphPrint: a framework for incorporating 3D protein structure features for drug target affinity prediction. We generate graph representations for protein 3D structures using amino acid residue location coordinates and combine them with drug graph representation and traditional features to jointly learn drug target affinity. Our model achieves a mean square error of 0.1378 and a concordance index of 0.8929 on the KIBA dataset and improves over using traditional protein features alone. Our ablation study shows that the 3D protein structure-based features provide information complementary to traditional features.
In this work we develop and investigate mathematical and computational models that describe drug delivery from a contact lens during wear. Our models are designed to predict the dynamics of drug release from the contact lens and subsequent transport into the adjacent pre-lens tear film and post-lens tear film as well as into the ocular tissue (e.g. cornea), into the eyelid, and out of these regions. These processes are modeled by one dimensional diffusion out of the lens coupled to compartment-type models for drug concentrations in the various accompanying regions. In addition to numerical solutions that are compared with experimental data on drug release in an in vitro eye model, we also identify a large diffusion limit model for which analytical solutions can be written down for all quantities of interest, such as cumulative release of the drug from the contact lens. We use our models to make assessments about possible mechanisms and drug transport pathways through the pre-lens and post-lens tear films and provide interpretation of experimental observations. We discuss successes and limitations of our models as well as their potential to guide further research to help understand
Recent advances in large language models (LLMs) have shown great potential to accelerate drug discovery. However, the specialized nature of biochemical data often necessitates costly domain-specific fine-tuning, posing major challenges. First, it hinders the application of more flexible general-purpose LLMs for cutting-edge drug discovery tasks. More importantly, it limits the rapid integration of the vast amounts of scientific data continuously generated through experiments and research. Compounding these challenges is the fact that real-world scientific questions are typically complex and open-ended, requiring reasoning beyond pattern matching or static knowledge retrieval.To address these challenges, we propose CLADD, a retrieval-augmented generation (RAG)-empowered agentic system tailored to drug discovery tasks. Through the collaboration of multiple LLM agents, CLADD dynamically retrieves information from biomedical knowledge bases, contextualizes query molecules, and integrates relevant evidence to generate responses - all without the need for domain-specific fine-tuning. Crucially, we tackle key obstacles in applying RAG workflows to biochemical data, including data heteroge
Longer timelines and lower success rates of drug candidates limit the productivity of clinical trials in the pharmaceutical industry. Promising de novo drug design techniques help solve this by exploring a broader chemical space, efficiently generating new molecules, and providing improved therapies. However, optimizing for molecular characteristics found in approved oral drugs remains a challenge, limiting de novo usage. In this work, we propose NovoMol, a novel de novo method using recurrent neural networks to mass-generate drug molecules with high oral bioavailability, increasing clinical trial time efficiency. Molecules were optimized for desirable traits and ranked using the quantitative estimate of drug-likeness (QED). Generated molecules meeting QED's oral bioavailability threshold were used to retrain the neural network, and, after five training cycles, 76% of generated molecules passed this strict threshold and 96% passed the traditionally used Lipinski's Rule of Five. The trained model was then used to generate specific drug candidates for the cancer-related PDGFRα receptor and 44% of generated candidates had better binding affinity than the current state-of-the-art drug,
Bayesian dynamic borrowing (BDB) and synthetic control methods (SCM) are both used in clinical trial design when recruitment, retention, or allocation is a challenge. The performance of these approaches has not previously been directly compared due to differences in application, product, and measurement metrics. This study aims to conduct a comparison of power and type 1 error rates of BDB (using meta-analytic predictive prior (MAP)) and SCM using a case study of Pediatric Atopic Dermatitis. Six historical randomised control trials were selected for use in both the creation of the MAP prior and synthetic control arm. The R library RBesT was used to create a MAP prior and the R library Synthpop was used to create a synthetic control arm for the SCM. Power and type 1 error rate were used as comparison metrics. BDB produced a power of 0.580 and a type 1 error rate of 0.026. SCM produced a power of 0.641 and a type 1 error rate of 0.027. In this case study, the SCM model produced a higher power than the BDB method with a similar type 1 error rate. However, the decision to use SCM or BDB should come from the specific needs of the potential trial, since their power and type 1 error rate
Dexterous grasping depends on contact regulation, not motion alone. Stable manipulation requires fingers to maintain appropriate object loading as contacts slip, deform, or become visually occluded. Existing cross-embodiment dexterous policies unify motion through retargeted hand poses or latent actions, but force feedback remains tied to each hand's sensing and actuation, limiting transfer. This work introduces a cross-embodiment force-position interface for contact-aware manipulation across heterogeneous dexterous hands. Motion intent is represented in a shared hand-pose latent, while each hand's effort signal is calibrated through system identification into physical joint torque in N.m. These torques are mapped to fingertip forces and compact per-finger load descriptors, giving the policy comparable observations of where the hand should move and how the object is loaded. Using this interface, a flow-matching visuomotor policy is trained on vision, proprioception, and calibrated contact, with structured visual masking that encourages reliance on force under grasp-relevant occlusion. The same calibrated signal drives a hybrid force-position controller for demonstration collection
Objective: This study aims to improve the reliability and robustness of medical named entity recognition (NER) in Chinese atopic dermatitis (AD) clinical texts through explanation-guided learning. Methods: We propose a stability and boundary-aware explanation-guided NER framework. Perturbation-based analysis is used to evaluate explanation stability and entity boundary sensitivity. An adaptive fusion strategy dynamically combines local and global explanation to generate more reliable token-level explanations. The fused explanation signals are further incorporated into model training through stability, boundary-aware, and consistency constraints. Results: Experiments on Chinese AD NER datasets show that the proposed framework improves explanation robustness and achieves consistent performance gains across multiple NER models. The adaptive fusion strategy also provides more stable explanations and stronger boundary perception than individual explanation methods. Conclusion: The proposed method effectively integrates reliable explanation signals into medical NER training, improving both recognition performance and explanation reliability. The framework provides a practical and general
Despite all the advances achieved in the field of tumor-biology research, in most cases conventional therapies including chemotherapy are still the leading choices. The main disadvantage of these treatments, in addition to the low solubility of many antitumor drugs, is their lack of specificity, which explains the frequent occurrence of serious side effects due to nonspecific drug uptake by healthy cells. Progress in nanotechnology and its application in medicine have provided new opportunities and different smart systems. Such systems can improve the intracellular delivery of the drugs due to their multifunctionality and targeting potential. The purpose of this manuscript is to review and analyze the recent progress made in nanotherapy applied to cancer treatment. First, we provide a global overview of cancer and different smart nanoparticles currently used in oncology. Then, we analyze in detail the development of drug-delivery strategies in cancer therapy, focusing mainly on the intravenously administered smart nanoparticles with protein corona to avoid immune-system clearance. Finally, we discuss the challenges, clinical trials, and future directions of the nanoparticle-based t
The amendments made to the Data 3 Act and impact of COVID-19 have fostered the growth of digital healthcare market and promoted the use of medical data in artificial intelligence in South Korea. Atopic dermatitis, a chronic inflammatory skin disease, is diagnosed via subjective evaluations without using objective diagnostic methods, thereby increasing the risk of misdiagnosis. It is also similar to psoriasis in appearance, further complicating its accurate diagnosis. Existing studies on skin diseases have used high-quality dermoscopic image datasets, but such high-quality images cannot be obtained in actual clinical settings. Moreover, existing systems must ensure accuracy and fast response times. To this end, an ensemble learning-based skin lesion detection system (ENSEL) was proposed herein. ENSEL enhanced diagnostic accuracy by integrating various deep learning models via an ensemble approach. Its performance was verified by conducting skin lesion detection experiments using images of skin lesions taken by actual users. Its accuracy and response time were measured using randomly sampled skin disease images. Results revealed that ENSEL achieved high recall in most images and less
The task of grading atopic dermatitis (or AD, a form of eczema) from patient images is difficult even for trained dermatologists. Research on automating this task has progressed in recent years with the development of deep learning solutions; however, the rapid evolution of multimodal models and more specifically vision-language models (VLMs) opens the door to new possibilities in terms of explainable assessment of medical images, including dermatology. This report describes experiments carried out to evaluate the ability of seven VLMs to assess the severity of AD on a set of test images.
Despite improved rational drug design and a remarkable progress in genomic, proteomic and high-throughput screening methods, the number of novel, single-target drugs fell much behind expectations during the past decade. Multi-target drugs multiply the number of pharmacologically relevant target molecules by introducing a set of indirect, network-dependent effects. Parallel with this the low-affinity binding of multi-target drugs eases the constraints of druggability, and significantly increases the size of the druggable proteome. These effects tremendously expand the number of potential drug targets, and will introduce novel classes of multi-target drugs with smaller side effects and toxicity. Here we review the recent progress in this field, compare possible network attack strategies, and propose several methods to find target-sets for multi-target drugs.
Efficiently steering generative models toward pharmacologically relevant regions of chemical space remains a major obstacle in molecular drug discovery under low-data regimes. We present VECTOR+: Valid-property-Enhanced Contrastive Learning for Targeted Optimization and Resampling, a framework that couples property-guided representation learning with controllable molecule generation. VECTOR+ applies to both regression and classification tasks and enables interpretable, data-efficient exploration of functional chemical space. We evaluate on two datasets: a curated PD-L1 inhibitor set (296 compounds with experimental $IC_{50}$ values) and a receptor kinase inhibitor set (2,056 molecules by binding mode). Despite limited training data, VECTOR+ generates novel, synthetically tractable candidates. Against PD-L1 (PDB 5J89), 100 of 8,374 generated molecules surpass a docking threshold of $-15.0$ kcal/mol, with the best scoring $-17.6$ kcal/mol compared to the top reference inhibitor ($-15.4$ kcal/mol). The best-performing molecules retain the conserved biphenyl pharmacophore while introducing novel motifs. Molecular dynamics (250 ns) confirm binding stability (ligand RMSD < $2.5$ angst
Accurate identification of the epileptogenic zone (EZ) is essential for seizure freedom after resective surgery in drug-resistant epilepsy, yet seizure freedom rates remain below 50%. We developed EpiiSLM, a dual foundation model system for EZ identification with stereo-electroencephalography (sEEG), by training a signal foundation model on 104,990 minutes of sEEG recordings from the Montreal Neurological Institute & Hospital, while leveraging all recordings regardless of surgical outcome and anchoring EZ biomarker extraction on non-epileptic signals. A language foundation model then integrates sEEG-derived outputs with multimodal clinical information to produce interpretable predictions. Under leave-one-patient-out evaluation, EpiiSLM achieved 0.978 contact-level positive predictive value (PPV), outperforming the seizure onset zone(SOZ)-as-EZ baseline by 15.1% (p < 0.05), and 100% region-level accuracy; on an external dataset, EpiiSLM achieved 0.857 contact-level PPV. EpiiSLM requires only one night of interictal sleep data, suggesting potential to reduce invasive sEEG monitoring duration and improve surgical outcomes.
Developing and discovering new drugs is a complex and resource-intensive endeavor that often involves substantial costs, time investment, and safety concerns. A key aspect of drug discovery involves identifying novel drug-target (DT) interactions. Existing computational methods for predicting DT interactions have primarily focused on binary classification tasks, aiming to determine whether a DT pair interacts or not. However, protein-ligand interactions exhibit a continuum of binding strengths, known as binding affinity, presenting a persistent challenge for accurate prediction. In this study, we investigate various techniques employed in Drug Target Interaction (DTI) prediction and propose novel enhancements to enhance their performance. Our approaches include the integration of Protein Language Models (PLMs) and the incorporation of Contact Map information as an inductive bias within current models. Through extensive experimentation, we demonstrate that our proposed approaches outperform the baseline models considered in this study, presenting a compelling case for further development in this direction. We anticipate that the insights gained from this work will significantly narr
Drug discovery is lengthy and expensive, with traditional computer-aided design facing limits. This paper examines integrating quantum computing across the drug development cycle to accelerate and enhance workflows and rigorous decision-making. It highlights quantum approaches for molecular simulation, drug-target interaction prediction, and optimizing clinical trials. Leveraging quantum capabilities could accelerate timelines and costs for bringing therapies to market, improving efficiency and ultimately benefiting public health.
Glaucoma treatment relies on effective delivery of therapeutics to the anterior chamber; however, conventional approaches such as topical administration and intracameral injection are limited by rapid clearance and low intraocular bioavailability. In this study, a Computational Fluid Dynamics (CFD) framework was developed to comparatively evaluate drug transport, retention, and spatial distribution across three delivery strategies: intracameral injection, drug-eluting implants, and topical delivery via contact lens.
Despite the great popularity of virtual screening of existing compound libraries, the search for new potential drug candidates also takes advantage of generative protocols, where new compound suggestions are enumerated using various algorithms. To increase the activity potency of generative approaches, they have recently been coupled with molecular docking, a leading methodology of structure-based drug design. In this review, we summarize progress since docking-based generative models emerged. We propose a new taxonomy for these methods and discuss their importance for the field of computer-aided drug design. In addition, we discuss the most promising directions for further development of generative protocols coupled with docking.
Designing reward functions that guide generative molecular design (GMD) algorithms to desirable areas of chemical space is of critical importance in AI-driven drug discovery. Traditionally, this has been a manual and error-prone task; the selection of appropriate computational methods to approximate biological assays is challenging and the aggregation of computed values into a single score even more so, leading to potential reliance on trial-and-error approaches. We propose a novel approach for automated reward configuration that relies solely on experimental data, mitigating the challenges of manual reward adjustment on drug discovery projects. Our method achieves this by constructing a ranking over experimental data based on Pareto dominance over the multi-objective space, then training a neural network to approximate the reward function such that rankings determined by the predicted reward correlate with those determined by the Pareto dominance relation. We validate our method using two case studies. In the first study we simulate Design-Make-Test-Analyse (DMTA) cycles by alternating reward function updates and generative runs guided by that function. We show that the learned fu