Chemical synthesis, as a foundational methodology in the creation of transformative molecules, exerts substantial influence across diverse sectors from life sciences to materials and energy. Current chemical synthesis practices rely on laborious and costly trial-and-error workflows, underscoring the urgent need for advanced AI assistants. Large language models (LLMs), typified by GPT-4, have recently been introduced as an efficient tool to facilitate scientific research. Here, we present Chemma, a fully fine-tuned LLM with 1.28 million pairs of Q&A about reactions, as an assistant to accelerate organic chemistry synthesis. Chemma surpasses the best-known results in multiple chemical tasks, e.g., single-step retrosynthesis and yield prediction, which highlights the potential of general AI for organic chemistry. By predicting yields across the experimental reaction space, Chemma significantly improves the reaction exploration capability of Bayesian optimization. More importantly, integrated in an active learning framework, Chemma exhibits advanced potential for autonomous experimental exploration and optimization in open reaction spaces. For an unreported Suzuki-Miyaura cross
The emergence of life on Earth required a prior organic chemistry leading to the formation of prebiotic molecules. The origin and evolution of organic matter on the early Earth are not yet firmly understood. Several hypotheses, possibly complementary, are considered. They can be divided into two categories: endogenous and exogenous sources. In this work we investigate the contribution of a specific endogenous source: the organic chemistry occurring in the ionosphere of the early Earth, where the significant VUV contribution of the young Sun led to an efficient formation of reactive species. We address whether this chemistry can lead to the formation of complex organic compounds with CO2 as the only source of carbon in an early atmosphere made of N2, CO2 and H2, by experimentally mimicking this type of chemistry using a low-pressure plasma reactor. By analyzing the gaseous phase composition, we strictly identified the formation of H2O, NH3, N2O and C2N2. The formation of a solid organic phase is also observed, confirming the possibility to trigger organic chemistry in the upper atmosphere of the early Earth. The identification of Nitrogen-bearing chemical functio
Organic chemistry is undergoing a major paradigm shift, moving from a labor-intensive approach to a new era dominated by automation and artificial intelligence (AI). This transformative shift is being driven by technological advances, the ever-increasing demand for greater research efficiency and accuracy, and the burgeoning growth of interdisciplinary research. AI models, supported by computational power and algorithms, are drastically reshaping synthetic planning and introducing groundbreaking ways to tackle complex molecular synthesis. In addition, autonomous robotic systems are rapidly accelerating the pace of discovery by performing tedious tasks with unprecedented speed and precision. This article examines the multiple opportunities and challenges presented by this paradigm shift and explores its far-reaching implications. It provides valuable insights into the future trajectory of organic chemistry research, which is increasingly defined by the synergistic interaction of automation and AI.
Reaction prediction remains one of the major challenges for organic chemistry, and is a prerequisite for efficient synthetic planning. It is desirable to develop algorithms that, like humans, "learn" from being exposed to examples of the application of the rules of organic chemistry. We explore the use of neural networks for predicting reaction types, using a new reaction fingerprinting method. We combine this predictor with SMARTS transformations to build a system which, given a set of reagents and reactants, predicts the likely products. We test this method on problems from a popular organic chemistry textbook.
Many interstellar complex organic molecules (COMs) are believed to be produced on the surfaces of icy grains at low temperatures. Atomic carbon is considered responsible for the skeletal evolution processes, such as C-C bond formation, via insertion or addition reactions. Before reactions, C atoms must diffuse on the surface to encounter reaction partners; therefore, information on their diffusion process is critically important for evaluating the role of C atoms in the formation of COMs. In situ detection of C atoms on ice was achieved by a combination of photostimulated desorption and resonance enhanced multiphoton ionization methods. We found that C atoms weakly bound to the ice surface diffused above approximately 30 K and produced C2 molecules. The activation energy for C-atom surface diffusion was experimentally determined to be 88 meV (1,020 K), indicating that the diffusive reaction of C atoms is activated at approximately 22 K on interstellar ice. The facile diffusion of C atoms at T > 22 K on interstellar ice opens a previously overlooked chemical regime where the increase in complexity of COMs is driven by C atoms. Carbon addition chemistry can be an alternative sourc
Planetary systems such as our own are formed after a long process where matter condenses from diffuse clouds to stars, planets, asteroids, comets and residual dust, undergoing dramatic changes in physical and chemical state in less than a few million years. Several studies have shown that the chemical composition during the early formation of a Solar-type planetary system is a powerful diagnostic to track the history of the system itself. Among the approximately 270 molecules so far detected in the ISM, the so-called interstellar complex organic molecules (iCOMs) are of particular interest both because of their evolutionary diagnostic power and because they might be potential precursors of biomolecules, which are at the basis of terrestrial life. This Chapter focuses on the evolution of organic molecules during the early stages of a Solar-type planetary system, represented by the prestellar, Class 0/I and protoplanetary disk phases, and compares them with what is observed presently in Solar System comets. Our twofold goal is to review the processes at the base of organic chemistry during Solar-type star formation and, in addition, to possibly provide constraints on the early histor
We present an oriented numerical summarizer algorithm, applied to producing automatic summaries of scientific documents in Organic Chemistry. We present its implementation named Yachs (Yet Another Chemistry Summarizer) that combines a specific document pre-processing with a sentence scoring method relying on the statistical properties of documents. We show that Yachs achieves the best results among several other summarizers on a corpus of Organic Chemistry articles.
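As a rough illustration of what a statistical sentence-scoring summarizer can look like, the sketch below ranks sentences by the average corpus frequency of their content words and returns the top ones in document order. It is not the Yachs algorithm: the stopword list, the scoring formula, and the function names (`split_sentences`, `content_words`, `summarize`) are all assumptions chosen for brevity, and the chemistry-specific pre-processing described in the abstract is omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "are",
             "was", "we", "that", "with", "for", "on", "by"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower())
            if w not in STOPWORDS]


def summarize(text, n=1):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        ws = content_words(sentence)
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in ws) / len(ws) if ws else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:n]
    return [sentences[i] for i in sorted(ranked)]


if __name__ == "__main__":
    doc = ("Aldol reactions form carbon-carbon bonds. "
           "The aldol reaction needs a base. It rained today.")
    print(summarize(doc, n=1))
```

Averaging rather than summing word frequencies is one simple way to avoid a bias toward long sentences; a system tuned for chemistry articles would presumably also need to handle formulae and compound names during tokenization.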
Astrochemistry lies at the nexus of astronomy, chemistry, and molecular physics. On the basis of precise laboratory data, a rich collection of more than 200 familiar and exotic molecules has been identified in the interstellar medium, the vast majority by their unique rotational fingerprint. Despite this large body of work, there is scant evidence in the radio band for the basic building blocks of chemistry on Earth, five- and six-membered rings, even after long-standing and sustained efforts during the past 50 years. In contrast, a peculiar structural motif, highly unsaturated carbon in a chain-like arrangement, is instead quite common in space. The recent astronomical detection of cyanobenzene, the simplest aromatic nitrile, in the dark molecular cloud TMC-1, and soon afterwards in additional pre-stellar, and possibly protostellar sources, establishes that aromatic chemistry is likely widespread in the earliest stages of star formation. The subsequent discovery of cyanocyclopentadienes and even cyanonaphthalenes in TMC-1 provides further evidence that organic molecules of considerable complexity are readily synthesized in regions with high visual extinction but where the low tempe
Finding the main product of a chemical reaction is one of the important problems of organic chemistry. This paper describes a method of applying a neural machine translation model to the prediction of organic chemical reactions. In order to translate 'reactants and reagents' to 'products', we built a gated recurrent unit based sequence-to-sequence model and a parser that generates input tokens for the model from reaction SMILES strings. Training sets are composed of reactions from the patent databases and reactions generated manually by applying the elementary reactions in Wade's organic chemistry textbook. The trained models were tested on examples and problems in the textbook. The prediction process does not need manual encoding of rules (e.g., SMARTS transformations) to predict products; hence it only needs sufficient training reaction sets to learn new types of reactions.
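The parsing step mentioned above, turning a reaction SMILES string into source and target token sequences for a translation model, can be sketched as follows. This is a minimal illustration and not the paper's parser: the regular expression is a widely used SMILES tokenization pattern, and the function names and the choice to keep `>` as the separator between reactants and reagents are assumptions.

```python
import re

# A commonly used SMILES tokenization pattern (assumed here; the paper's own
# parser may differ). Bracket atoms such as [OH-] are kept as single tokens,
# and two-letter halogens (Br, Cl) are matched before single letters.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize: {smiles!r}")
    return tokens


def split_reaction(rxn_smiles):
    """Return (source_tokens, target_tokens) for 'reactants>reagents>products'."""
    reactants, reagents, products = rxn_smiles.split(">")
    # Keep '>' between reactants and reagents only when reagents are present.
    source = reactants + (">" + reagents if reagents else "")
    return tokenize_smiles(source), tokenize_smiles(products)


if __name__ == "__main__":
    src, tgt = split_reaction("CCBr.[OH-]>>CCO")
    print(src)
    print(tgt)
```

The round-trip check in `tokenize_smiles` (joined tokens must reproduce the input) catches characters that the pattern silently skips, which is useful when cleaning patent-derived reaction data.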
The second part of this paper is devoted to the following important question in organic chemistry: given two isomers of a molecule, how can they be identified with their structural formulae using only type properties of that molecule? A classical answer to this question is given for benzene by the identification of its di-substituted (para, ortho, and meta) and tri-substituted (asymmetric, vicinal, and symmetric) derivatives via the Körner substitution reactions among them. Here we develop a machinery within the framework of Lunn and Senior's mathematical model of isomerism in organic chemistry, which, in principle, answers this question. In particular, it is shown that the members of a chiral pair cannot be distinguished via substitution reactions. The examples of ethene, benzene, and cyclopropane are discussed.
We introduce ChemPro, a progressive benchmark with 4100 natural language question-answer pairs in Chemistry, across 4 coherent sections of difficulty designed to assess the proficiency of Large Language Models (LLMs) in a broad spectrum of general chemistry topics. We include Multiple Choice Questions and Numerical Questions spread across fine-grained information recall, long-horizon reasoning, multi-concept questions, problem-solving with nuanced articulation, and straightforward questions in a balanced ratio, effectively covering Bio-Chemistry, Inorganic-Chemistry, Organic-Chemistry and Physical-Chemistry. ChemPro is carefully designed analogous to a student's academic evaluation for basic to high-school chemistry. A gradual increase in the question difficulty rigorously tests the ability of LLMs to progress from solving basic problems to solving more sophisticated challenges. We evaluate 45+7 state-of-the-art LLMs, spanning both open-source and proprietary variants, and our analysis reveals that while LLMs perform well on basic chemistry questions, their accuracy declines with different types and levels of complexity. These findings highlight the critical limitations of LLMs in
The analysis of biorelevant molecules in returned mission samples such as from the carbonaceous asteroid (162173) Ryugu is key to unravelling the role of extraterrestrial organics in the evolution of life. Coordinated analyses using chemically non-destructive techniques at the finest length-scales on pristine samples are particularly important. Here, we identify the chemical signature of uncommon globular and nitrogen-containing diffuse composite organic matter in asteroid Ryugu and map the distribution of biorelevant molecules therein with unprecedented detail. Using a novel electron-microscopy-based combination of vibrational and core-level spectroscopy, we disentangle the chemistry and nanoscale petrography of these organics. We show that some of these organics contain soluble and highly aliphatic components as well as NHx functional groups, that have formed in outer solar nebula environments before parent body incorporation or were synthesized by subtle fluid reactions on the final Ryugu asteroid. These novel coordinated analyses will open up new avenues of research on these types of precious and rare asteroidal dust samples.
Context. Dust grains in circumstellar envelopes are likely to have a spread-out temperature distribution. Aims. To investigate how trends in temperature distribution between small and large grains affect the hot corino chemistry of complex organic molecules (COMs) and warm carbon-chain chemistry (WCCC). Methods. A multi-grain multi-layer astrochemical code with an up-to-date treatment of surface chemistry was used with three grain temperature trends: grain temperature proportional to grain radius to the power -1/6 (Model M-1/6), to 0 (M0), and to 1/6 (M1/6). The cases of hot corino and WCCC chemistry were investigated, for a total of six models. The essence of these changes is that the main ice reservoir, the small grains, has a higher (M-1/6) or lower (M1/6) temperature than the surrounding gas. Results. The chemistry of COMs shows better agreement with observations in models M-1/6 and M1/6 than in Model M0. Model M-1/6 shows the best agreement for WCCC because earlier mass-evaporation of methane ice from small grains induces the WCCC phenomenon at lower temperatures. Conclusions. Models considering several grain populations with different temperatures can more precisely reproduce circu
Impacts are critical to producing the aqueous environments necessary to stimulate prebiotic chemistry on Titan's surface. Furthermore, organic hazes resting on the surface are a likely feedstock of biomolecules. In this work, we conduct impact experiments on laboratory-produced organic haze particles and haze/sand mixtures and analyze these samples for life's building blocks. Samples of unshocked haze and sand particles are also analyzed to determine the change in biomolecule concentrations and distributions from shocking. Across all samples, we detect seven nucleobases, nine proteinogenic amino acids, and five other biomolecules (e.g., urea) using a blank subtraction procedure to eliminate signals due to contamination. We find that shock pressures of 13 GPa variably degrade nucleobases, amino acids, and a few other organics in haze particles and haze/sand mixtures; however, certain individual biomolecules become enriched or are even produced from these events. Xanthine, threonine, and aspartic acid are enriched or produced in impact experiments containing sand, suggesting these minerals may catalyze the production of these biomolecules. On the other hand, thymine and isoleucine/no
To enhance large language models (LLMs) for chemistry problem solving, several LLM-based agents augmented with tools have been proposed, such as ChemCrow and Coscientist. However, their evaluations are narrow in scope, leaving a large gap in understanding the benefits of tools across diverse chemistry tasks. To bridge this gap, we develop ChemToolAgent, an enhanced chemistry agent over ChemCrow, and conduct a comprehensive evaluation of its performance on both specialized chemistry tasks and general chemistry questions. Surprisingly, ChemToolAgent does not consistently outperform its base LLMs without tools. Our error analysis with a chemistry expert suggests that: For specialized chemistry tasks, such as synthesis prediction, we should augment agents with specialized tools; however, for general chemistry questions like those in exams, agents' ability to reason correctly with chemistry knowledge matters more, and tool augmentation does not always help.
The spatial distribution of the chemical reservoirs in protoplanetary disks is key to elucidating the composition of planets, especially habitable ones. However, the partitioning of the main elements among the refractory and volatile phases is still elusive. Key parameters such as the carbon-to-oxygen (C/O) elemental ratio and the ionization fraction remain poorly constrained, with the latter potentially orders of magnitude lower than in the interstellar medium. Moreover, the thermal structure of the gas is also poorly known, despite its deep influence on gas-phase chemistry. In this context, ortho-to-para ratios could provide selective and sensitive probes. Recent ALMA observations have measured the spatially resolved column density of ortho- and para-H2CO in the transition disk orbiting TW Hya and derived the radial profile of the ortho-to-para ratio. Yet, current disk models do not include the nuclear-spin-resolved chemistry required to interpret these observations. The present work aims to fill this gap by combining a parametric disk physical model of TW Hya with the UGAN network, updated to include a comprehensive description of the nuclear-spin-resolved chemistry of formaldehyde.
Multimodal scientific reasoning remains a significant challenge for large language models (LLMs), particularly in chemistry, where problem-solving relies on symbolic diagrams, molecular structures, and structured visual data. Here, we systematically evaluate 40 proprietary and open-source multimodal LLMs, including GPT-5, o3, Gemini-2.5-Pro, and Qwen2.5-VL, on a curated benchmark of Olympiad-style chemistry questions drawn from over two decades of U.S. National Chemistry Olympiad (USNCO) exams. These questions require integrated visual and textual reasoning across diverse modalities. We find that many models struggle with modality fusion, where in some cases, removing the image even improves accuracy, indicating misalignment in vision-language integration. Chain-of-Thought prompting consistently enhances both accuracy and visual grounding, as demonstrated through ablation studies and occlusion-based interpretability. Our results reveal critical limitations in the scientific reasoning abilities of current MLLMs, providing actionable strategies for developing more robust and interpretable multimodal systems in chemistry. This work provides a timely benchmark for measuring progress in
Considerable progress in the field of organic electronic materials and devices has been observed in the last decade. However, understanding these materials remains a challenge. Most studies in the literature focus on active devices such as OTFTs, OLEDs and OPVs. Nevertheless, a complete technology must also include passive devices in order to allow the design of interesting applications and complex circuits. This paper deals with the development of a complete set of passive devices allowing the fabrication of simple applications such as filters or sensors. The process flow is a fully screen-printed technology that uses exclusively organic materials on a gold, laser-ablated flexible substrate. Discrete passive (R, L, C) devices have been processed and characterized. This has permitted the fabrication of RLC low-band pass filters that are dedicated to RF applications, typically around 1 GHz. Furthermore, based on these discrete passive components, we have developed a sensitive sensor on a flexible substrate for RFID applications. We present the state of the art of our process development for RF applications using organic materials.
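To give a feel for the component values that a filter "around 1 GHz" implies, the sketch below evaluates a textbook second-order series RLC low-pass stage (output taken across the capacitor). The topology and the values R = 50 ohm, L = 10 nH and C = 2.533 pF are purely illustrative assumptions; the abstract does not state the filter topology or any component values.

```python
import math


def resonant_frequency(L, C):
    """f0 = 1 / (2*pi*sqrt(L*C)) for an ideal LC pair, in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(L * C))


def quality_factor(R, L, C):
    """Q of a series RLC circuit: Q = sqrt(L/C) / R."""
    return math.sqrt(L / C) / R


def lowpass_gain(f, R, L, C):
    """|H(f)| of a series RLC low-pass with the output across the capacitor."""
    x = f / resonant_frequency(L, C)
    q = quality_factor(R, L, C)
    return 1.0 / math.sqrt((1.0 - x * x) ** 2 + (x / q) ** 2)


if __name__ == "__main__":
    # Hypothetical values chosen so that f0 lands near 1 GHz.
    R, L, C = 50.0, 10e-9, 2.533e-12
    f0 = resonant_frequency(L, C)
    print(f"f0 = {f0 / 1e9:.3f} GHz, Q = {quality_factor(R, L, C):.3f}")
    for f in (0.1 * f0, f0, 10.0 * f0):
        print(f"|H({f / 1e9:.1f} GHz)| = {lowpass_gain(f, R, L, C):.4f}")
```

With these assumed values, nanohenry inductors and picofarad capacitors are the relevant orders of magnitude; the gain is close to 1 well below f0 and falls off as 1/f^2 well above it.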
Several organic cations have been used to passivate perovskite films; however, selecting the optimal cation remains challenging. In this work, we carried out density functional theory calculations to understand the effects induced by 17 different organic cations on the passivation (P-cations) of thin two-dimensional P2MAn-1PbnI3n+1 perovskite films, where n = 1 and 2. We found that the interactions between different types of P-cations and the inorganic slab affect the lengths and angles of the bonds within the inorganic framework (PbI6-octahedra). In general, the binding mechanism includes the interactions of organic cations with the inorganic framework, which leads to the accumulation of electron density within the halides, indicative of Brønsted-Lowry acid-base interactions. Oxygenated groups facilitate additional H-bond formation through -OH and -COOH groups, promoting the localization of the electron density between layers and improving the energetic stability of the system. Based on the results and analysis, we found that three P-cations might have higher potential for real-life applications, namely 4-fluorophenylethylammonium (FPEA), phenylethylammonium (PEA) and butylammoni
Accelerated materials discovery is critical for addressing global challenges. However, developing new laboratory workflows relies heavily on real-world experimental trials, and this can hinder scalability because of the need for numerous physical make-and-test iterations. Here we present MATTERIX, a multiscale, graphics processing unit-accelerated robotic simulation framework designed to create high-fidelity digital twins of chemistry laboratories, thus accelerating workflow development. This multiscale digital twin simulates robotic physical manipulation, powder and liquid dynamics, device functionalities, heat transfer and basic chemical reaction kinetics. This is enabled by integrating realistic physics simulation and photorealistic rendering with a modular graphics processing unit-accelerated semantics engine, which models logical states and continuous behaviors to simulate chemistry workflows across different levels of abstraction. MATTERIX streamlines the creation of digital twin environments through open-source asset libraries and interfaces, while enabling flexible workflow design via hierarchical plan definition and a modular skill library that incorporates learning-based