Transition metal ions play crucial roles in the structure and function of numerous proteins, contributing to essential biological processes such as catalysis, electron transfer, and oxygen binding. However, accurately modeling the electronic structure and properties of metalloproteins poses significant challenges due to the complex nature of their electronic configurations and strong correlation effects. Multiconfigurational quantum chemistry methods are, in principle, the most appropriate tools for addressing these challenges, offering the capability to capture the inherent multi-reference character and strong electron correlation present in bio-inorganic systems. Yet their computational cost has long hindered wider adoption, making methods such as Density Functional Theory (DFT) the method of choice. However, advancements over the past decade have substantially alleviated this limitation, rendering multiconfigurational quantum chemistry methods more accessible and applicable to a wider range of bio-inorganic systems. In this perspective, we discuss some of these developments and how they have already been used to answer some of the most important questions in bio-inorganic chemis
We introduce ChemPro, a progressive benchmark with 4100 natural language question-answer pairs in Chemistry, across 4 coherent sections of difficulty designed to assess the proficiency of Large Language Models (LLMs) in a broad spectrum of general chemistry topics. We include Multiple Choice Questions and Numerical Questions spread across fine-grained information recall, long-horizon reasoning, multi-concept questions, problem-solving with nuanced articulation, and straightforward questions in a balanced ratio, effectively covering Bio-Chemistry, Inorganic-Chemistry, Organic-Chemistry and Physical-Chemistry. ChemPro is designed to mirror a student's academic evaluation, from basic to high-school chemistry. A gradual increase in the question difficulty rigorously tests the ability of LLMs to progress from solving basic problems to solving more sophisticated challenges. We evaluate 45+7 state-of-the-art LLMs, spanning both open-source and proprietary variants, and our analysis reveals that while LLMs perform well on basic chemistry questions, their accuracy declines with different types and levels of complexity. These findings highlight the critical limitations of LLMs in
Designing novel inorganic materials through generative models remains an important challenge for materials science, driven by the complexity and diversity of inorganic structures across expansive chemical compositions and structural landscapes. The vast combinatorial space of inorganic compounds demands innovative, AI-driven approaches to overcome limitations in generative accuracy and efficiency. To address this, we introduce a novel method that redefines the encoding and generation of inorganic materials by using a domain-specific, symmetry-aware representation. Our approach not only refines the representation of intricate inorganic structures but also contributes to the field of material discovery by enhancing the precision and stability of generated candidates. Central to our methodology is a novel padding technique that exploits crystal symmetry information to enhance the encoding process. By integrating Wyckoff position length-aware padding into an encoder architecture, we achieve a more robust, better-informed representation of inorganic materials. This symmetry-driven enhancement improves the ability of deep learning models to generate stable, previously unexplored inorganic structures with superi
Inorganic/inorganic composites are found in multiple applications crucial for the energy transition, from nuclear reactors to energy storage devices. Their microstructures dictate a number of properties, such as mass transport or fracture resistance. A multitude of processes have been developed to control the microstructure of inorganic/inorganic composites, from powder mixing and the use of short or long fibres, to tape casting for laminates and, more recently, 3D printing. Here, we combine emulsions and slip casting into a simpler, broadly available, inexpensive processing platform that allows for in situ control of a composite's microstructure and also enables complex shaping. Emulsions are used to form droplets of controllable size of one inorganic phase in another, while slip casting enables 3D shaping of the final part. Our study shows that slip casting emulsions triggers a two-step solvent removal that opens the possibility of conformal coating of porosity. By making magnetically responsive droplets, we form inorganic fibres inside an inorganic matrix during slip casting, demonstrating a simple fabrication route for long-fibre reinforced composites. We exemplify the potential of thi
Materials discovery is fundamental to advancing next-generation technologies and to building a sustainable, circular economy. Beyond computational screening, generative models are efficient at finding materials with desired properties, via multi-modal learning using multiscale data. This perspective examines the landscape of generative design for inorganic materials and discusses the integration of multi-modal learning with high-throughput experimental validation. We contextualize these challenges through the lens of a generative design framework as a unified approach to address the data-driven inverse design of functional materials. The framework is built around a foundation AI model for inorganic materials that is deeply interlinked with various property databases and high-throughput experiments via a machine-learning-driven closed loop, which enables the framework to solve key challenges in functional materials. We argue that domain-specific implementations of such integrated workflows represent a promising pathway toward the unresolved challenge of data-driven inverse design for atom-engineered inorganic functional materials.
To enhance large language models (LLMs) for chemistry problem solving, several LLM-based agents augmented with tools have been proposed, such as ChemCrow and Coscientist. However, their evaluations are narrow in scope, leaving a large gap in understanding the benefits of tools across diverse chemistry tasks. To bridge this gap, we develop ChemToolAgent, an enhanced chemistry agent over ChemCrow, and conduct a comprehensive evaluation of its performance on both specialized chemistry tasks and general chemistry questions. Surprisingly, ChemToolAgent does not consistently outperform its base LLMs without tools. Our error analysis with a chemistry expert suggests that: For specialized chemistry tasks, such as synthesis prediction, we should augment agents with specialized tools; however, for general chemistry questions like those in exams, agents' ability to reason correctly with chemistry knowledge matters more, and tool augmentation does not always help.
Retrosynthesis strategically plans the synthesis of a chemical target compound from simpler, readily available precursor compounds. This process is critical for synthesizing novel inorganic materials, yet traditional methods in inorganic chemistry continue to rely on trial-and-error experimentation. Emerging machine-learning approaches struggle to generalize to entirely new reactions due to their reliance on known precursors, as they frame retrosynthesis as a multi-label classification task. To address these limitations, we propose Retro-Rank-In, a novel framework that reformulates the retrosynthesis problem by embedding target and precursor materials into a shared latent space and learning a pairwise ranker on a bipartite graph of inorganic compounds. We evaluate Retro-Rank-In's generalizability on challenging retrosynthesis dataset splits designed to mitigate data duplicates and overlaps. For instance, for Cr2AlB2, it correctly predicts the verified precursor pair CrB + Al despite never seeing them in training, a capability absent in prior work. Extensive experiments show that Retro-Rank-In sets a new state-of-the-art, particularly in out-of-distribution generalization and candid
Modelling the inorganic exciton contribution to 2D hybrid organic-inorganic perovskites is essential to understand their properties and screen for new materials. Here, we combine hybrid density functional theory calculations including spin-orbit coupling (SOC) with the experimental relationship between the inorganic band gap and exciton binding energy to predict the inorganic exciton energy. For this purpose, we determine a universal exchange parameter for the HSE06 hybrid functional with SOC for lead-based 2D hybrid organic-inorganic perovskites. We further identify a relationship that connects PBE calculations to experiment-quality optical gaps and allows us to generalize the exchange mixing parameter to other SOC approximations. Our approach opens the path to screen for 2D hybrid organic-inorganic perovskites with optimized spectra, e.g., for new solar cell or light-emitting materials.
Machine learning is revolutionizing chemistry. Beyond the value of predictive models accelerating virtual screening, generative AI aims at enabling inverse design, reversing the compound-to-property prediction paradigm into property-to-compound generation. Chemists now have access to a rich AI toolbox for organic chemistry, including drug discovery. However, the application of these methods to inorganic compounds remains limited by the challenges posed by their intrinsic nature. This Review analyzes how these challenges have been addressed, considering widely diverse systems ranging from molecules to crystals, including transition metal complexes and microporous materials. The analysis focuses on how generative AI methods have evolved towards data-representation-model pipelines that address the full complexity of inorganic compounds, including their chemical composition, geometry, symmetry, and electronic structure. Future directions, like benchmark standardization and the development of synthesizability metrics, are also discussed.
The spatial distribution of the chemical reservoirs in protoplanetary disks is key to elucidating the composition of planets, especially habitable ones. However, the partitioning of the main elements among the refractory and volatile phases is still elusive. Key parameters such as the carbon-to-oxygen (C/O) elemental ratio and the ionization fraction remain poorly constrained, with the latter potentially orders of magnitude lower than in the interstellar medium. Moreover, the thermal structure of the gas is also poorly known, despite its deep influence on gas-phase chemistry. In this context, ortho-to-para ratios could provide selective and sensitive probes. Recent ALMA observations have measured the spatially resolved column density of ortho- and para-H2CO in the transition disk orbiting TW Hya and derived the radial profile of the ortho-to-para ratio. Yet, current disk models do not include the nuclear-spin-resolved chemistry required to interpret these observations. The present work aims to fill this gap by combining a parametric disk physical model of TW Hya with the UGAN network, updated to include a comprehensive description of the nuclear-spin-resolved chemistry of formaldehyde.
The NSF Workshop in Quantum Information and Computation for Chemistry assembled experts from directly quantum-oriented fields such as algorithms, chemistry, machine learning, optics, simulation, and metrology, as well as experts in related fields such as condensed matter physics, biochemistry, physical chemistry, inorganic and organic chemistry, and spectroscopy. The goal of the workshop was to summarize recent progress in research at the interface of quantum information science and chemistry as well as to discuss the promising research challenges and opportunities in the field. Furthermore, the workshop hoped to identify target areas where cross fertilization among these fields would result in the largest payoff for developments in theory, algorithms, and experimental techniques. The ideas can be broadly categorized in two distinct areas of research that obviously have interactions and are not separated cleanly. The first area is quantum information for chemistry, or how quantum information tools, both experimental and theoretical can aid in our understanding of a wide range of problems pertaining to chemistry. The second area is chemistry for quantum information, which aims to di
Accelerated materials discovery is critical for addressing global challenges. However, developing new laboratory workflows relies heavily on real-world experimental trials, and this can hinder scalability because of the need for numerous physical make-and-test iterations. Here we present MATTERIX, a multiscale, graphics processing unit-accelerated robotic simulation framework designed to create high-fidelity digital twins of chemistry laboratories, thus accelerating workflow development. This multiscale digital twin simulates robotic physical manipulation, powder and liquid dynamics, device functionalities, heat transfer and basic chemical reaction kinetics. This is enabled by integrating realistic physics simulation and photorealistic rendering with a modular graphics processing unit-accelerated semantics engine, which models logical states and continuous behaviors to simulate chemistry workflows across different levels of abstraction. MATTERIX streamlines the creation of digital twin environments through open-source asset libraries and interfaces, while enabling flexible workflow design via hierarchical plan definition and a modular skill library that incorporates learning-based
Zeolites are inorganic materials known for their diversity of applications, synthesis conditions, and resulting polymorphs. Although their synthesis is controlled both by inorganic and organic synthesis conditions, computational studies of zeolite synthesis have focused mostly on organic template design. In this work, we use a strong distance metric between crystal structures and machine learning (ML) to create inorganic synthesis maps in zeolites. Starting with 253 known zeolites, we show how the continuous distances between frameworks reproduce inorganic synthesis conditions from the literature without using labels such as building units. An unsupervised learning analysis shows that neighboring zeolites according to our metric often share similar inorganic synthesis conditions, even in template-based routes. In combination with ML classifiers, we find synthesis-structure relationships for 14 common inorganic conditions in zeolites, namely Al, B, Be, Ca, Co, F, Ga, Ge, K, Mg, Na, P, Si, and Zn. By explaining the model predictions, we demonstrate how (dis)similarities towards known structures can be used as features for the synthesis space. Finally, we show how these methods can be
Multimodal scientific reasoning remains a significant challenge for large language models (LLMs), particularly in chemistry, where problem-solving relies on symbolic diagrams, molecular structures, and structured visual data. Here, we systematically evaluate 40 proprietary and open-source multimodal LLMs, including GPT-5, o3, Gemini-2.5-Pro, and Qwen2.5-VL, on a curated benchmark of Olympiad-style chemistry questions drawn from over two decades of U.S. National Chemistry Olympiad (USNCO) exams. These questions require integrated visual and textual reasoning across diverse modalities. We find that many models struggle with modality fusion, where in some cases, removing the image even improves accuracy, indicating misalignment in vision-language integration. Chain-of-Thought prompting consistently enhances both accuracy and visual grounding, as demonstrated through ablation studies and occlusion-based interpretability. Our results reveal critical limitations in the scientific reasoning abilities of current MLLMs, providing actionable strategies for developing more robust and interpretable multimodal systems in chemistry. This work provides a timely benchmark for measuring progress in
The Gibbs energy, G, determines the equilibrium conditions of chemical reactions and materials stability. Despite this fundamental and ubiquitous role, G has been tabulated for only a small fraction of known inorganic compounds, impeding a comprehensive perspective on the effects of temperature and composition on materials stability and synthesizability. Here, we use the SISSO (sure independence screening and sparsifying operator) approach to identify a simple and accurate descriptor to predict G for stoichiometric inorganic compounds with ~50 meV/atom (~1 kcal/mol) resolution, and with minimal computational cost, for temperatures ranging from 300-1800 K. We then apply this descriptor to ~30,000 known materials curated from the Inorganic Crystal Structure Database (ICSD). Using the resulting predicted thermochemical data, we generate thousands of temperature-dependent phase diagrams to provide insights into the effects of temperature and composition on materials synthesizability and stability and to establish the temperature-dependent scale of metastability for inorganic compounds.
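As a quick check of the quoted resolution, the equivalence between ~50 meV/atom and ~1 kcal/mol can be reproduced with a standard unit conversion. The sketch below is illustrative only: the function name is hypothetical, and the conversion factor (1 eV per particle is approximately 23.06 kcal/mol) is a standard constant rather than something taken from the abstract.

```python
# Convert an energy per atom (meV) to an energy per mole of atoms (kcal/mol).
# Assumption: 1 eV per particle ~ 23.0605 kcal/mol (96.485 kJ/mol / 4.184 kJ/kcal).
EV_TO_KCAL_PER_MOL = 23.0605


def mev_per_atom_to_kcal_per_mol(energy_mev: float) -> float:
    """Hypothetical helper: meV/atom -> kcal per mole of atoms."""
    return energy_mev / 1000.0 * EV_TO_KCAL_PER_MOL


resolution = mev_per_atom_to_kcal_per_mol(50.0)
print(f"{resolution:.2f} kcal/mol")  # prints "1.15 kcal/mol"
```

The result, about 1.15 kcal/mol, is consistent with the rounded "~1 kcal/mol" figure, the range commonly referred to as chemical accuracy.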
The Caltech HEP Crystal Lab has been actively investigating novel inorganic scintillators along three directions: fast and radiation-hard inorganic scintillators to face the challenge of the severe radiation environment expected at future HEP experiments at hadron colliders, such as the high-luminosity LHC (HL-LHC) and FCC-hh; ultrafast inorganic scintillators to face the challenge of the unprecedented event rates expected at future HEP experiments searching for rare decays, such as Mu2e-II, and at ultrafast time-of-flight systems at hadron colliders; and cost-effective inorganic scintillators for the homogeneous hadron calorimeter concept to face the challenge of both the electromagnetic and jet mass resolutions required by the proposed Higgs factory. We report novel materials along all directions: LuAG:Ce ceramic fibers for the HL-LHC, Lu2O3:Yb ceramic scintillators for ultrafast applications, and ABS:Ce and DSB:Ce glass scintillators for the proposed Higgs factory. The results of this investigation may also benefit nuclear physics experiments, GHz hard X-ray imaging, medical imaging, and homeland security applications.
Cyanopolyynes, a family of nitrogen-containing carbon chains, are common in the interstellar medium and possibly form the backbone of species relevant to prebiotic chemistry. Following their gas-phase formation, they are expected to freeze out on ice grains in cold interstellar regions. In this work we present the hydrogenation reaction network of isolated HC_{3}N, the smallest cyanopolyyne, which consists of over-a-barrier radical-neutral reactions and barrierless radical-radical reactions. We employ density functional theory, coupled cluster and multiconfigurational methods to obtain activation and reaction energies for the hydrogenation network of HC_{3}N. This work explores the reaction network of the isolated molecule and constitutes a preview of the reactions occurring on the ice grain surface. We find that the reactions where the hydrogen atom adds to the carbon chain at the carbon atom opposite the cyano group give the lowest and narrowest barriers. Subsequent hydrogenation leads to the astrochemically relevant vinyl cyanide and ethyl cyanide. Alternatively, the cyano group can hydrogenate via radical-radical reactions, leading to the fully saturated propylamine. These results
Two-dimensional (2D) transition-metal carbides and nitrides (MXenes) show impressive performance in applications, such as supercapacitors, batteries, electromagnetic interference shielding, or electrocatalysis. These materials combine the electronic and mechanical properties of 2D inorganic crystals with chemically modifiable surfaces, and surface-engineered MXenes represent an ideal platform for fundamental and applied studies of interfaces in 2D functional materials. A natural step in structural engineering of MXene compounds is the development and understanding of MXenes with various organic functional groups covalently bound to inorganic 2D sheets. Such hybrid structures have the potential to unite the tailorability of organic molecules with the unique electronic properties of inorganic 2D solids. Here, we introduce a new family of hybrid MXenes (h-MXenes) with amido- and imido-bonding between organic and inorganic parts. The description of h-MXene structure requires an intricate mix of concepts from the fields of coordination chemistry, self-assembled monolayers (SAMs) and surface science. The optical properties of h-MXenes reveal coherent coupling between the organic and inor
Three-body recombination, or ternary association, is a termolecular reaction in which three particles collide, forming a bound state between two, whereas the third escapes freely. Three-body recombination reactions play a significant role in many systems relevant to physics and chemistry. In particular, they are relevant in cold and ultracold chemistry, quantum gases, astrochemistry, atmospheric physics, physical chemistry, and plasma physics. As a result, three-body recombination has been the subject of extensive work during the last 50 years, although primarily from an experimental perspective. Indeed, a general theory for three-body recombination remains elusive despite the available experimental information. Our group recently developed a direct approach based on classical trajectory calculations in hyperspherical coordinates for three-body recombination to amend this situation, leading to a first principle explanation of ion-atom-atom and atom-atom-atom three-body recombination processes. This review aims to summarize our findings on three-body recombination reactions and identify the remaining challenges in the field.
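For orientation, the termolecular process described above can be written schematically as a reaction and its rate law. The species labels A, B, C and the rate-coefficient symbol k_3 are illustrative choices, not notation taken from the review.

```latex
\mathrm{A} + \mathrm{B} + \mathrm{C} \;\longrightarrow\; \mathrm{AB} + \mathrm{C},
\qquad
\frac{d[\mathrm{AB}]}{dt} = k_3\,[\mathrm{A}]\,[\mathrm{B}]\,[\mathrm{C}]
```

Here the square brackets denote number densities, so k_3 carries units of volume squared per unit time (for example cm^6 s^-1). When all three particles are the same species, the recombination rate scales with the cube of the density.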
Efficient chemical kinetic model inference and application in combustion are challenging due to large ODE systems and widely separated time scales. Machine learning techniques have been proposed to streamline these models, though strong nonlinearity and numerical stiffness combined with noisy data sources make their application challenging. Here, we introduce ChemKANs, a novel neural network framework with applications both in model inference and simulation acceleration for combustion chemistry. ChemKAN's novel structure augments the generic Kolmogorov Arnold Network Ordinary Differential Equations (KAN-ODEs) with knowledge of the information flow through the relevant kinetic and thermodynamic laws. This chemistry-specific structure combined with the expressivity and rapid neural scaling of the underlying KAN-ODE algorithm instills in ChemKANs a strong inductive bias, streamlined training, and higher accuracy predictions compared to standard benchmarks, while facilitating parameter sparsity through shared information across all inputs and outputs. In a model inference investigation, we benchmark the robustness of ChemKANs to sparse data containing up to 15% added noise, and superfl