Physics-informed machine learning (PIML) provides a promising solution for building energy modeling and can serve as a virtual environment to enable reinforcement learning (RL) agents to interact and learn. However, challenges remain in efficiently integrating physics priors, evaluating the effectiveness of physics constraints, balancing model accuracy and physics consistency, and enabling real-world implementation. To address these gaps, this study introduces a Physics-Informed Modularized Neural Network (PI-ModNN), which incorporates physics priors through a physics-informed model structure, loss functions, and hard constraints. A new evaluation metric called "temperature response violation" is developed to quantify the physical consistency of data-driven building dynamic models under varying control inputs and training data sizes. Additionally, a physics prior evaluation framework based on rule importance is proposed to assess the contribution of each individual physics prior, offering guidance on selecting appropriate PIML techniques. Results indicate that incorporating physical priors does not always improve model performance; inappropriate priors may decrease model accuracy a
Time series forecasting has attracted growing attention in recent years, and many studies have proposed solutions to the challenges of time series prediction in order to improve forecasting performance. However, effectively applying these time series forecasting models to the field of financial asset pricing remains a challenging issue. There is still a need for a bridge to connect cutting-edge time series forecasting models with financial asset pricing. To bridge this gap, we have undertaken the following efforts: 1) We constructed three datasets from the financial domain; 2) We selected over ten time series forecasting models from recent studies and validated their performance in financial time series; 3) We developed new metrics, msIC and msIR, in addition to MSE and MAE, to showcase the time series correlation captured by the models; 4) We designed financial-specific tasks for these three datasets and assessed the practical performance and application potential of these forecasting models in important financial problems. We hope the newly developed evaluation suite, FinTSBridge, can provide valuable insights into the effectiveness and robustness of advance
Machine learning has driven an exponential increase in computational demand, leading to massive data centers that consume significant amounts of energy and contribute to climate change. This makes sustainable data center control a priority. In this paper, we introduce SustainDC, a set of Python environments for benchmarking multi-agent reinforcement learning (MARL) algorithms for data centers (DC). SustainDC supports custom DC configurations and tasks such as workload scheduling, cooling optimization, and auxiliary battery management, with multiple agents managing these operations while accounting for the effects of each other. We evaluate various MARL algorithms on SustainDC, showing their performance across diverse DC designs, locations, weather conditions, grid carbon intensity, and workload requirements. Our results highlight significant opportunities for improvement of data center operations using MARL algorithms. Given the increasing use of DC due to AI, SustainDC provides a crucial platform for the development and benchmarking of advanced algorithms essential for achieving sustainable computing and addressing other heterogeneous real-world challenges.
Atomic precision advanced manufacturing (APAM) dopes silicon with enough carriers to change its electronic structure and can be used to create novel devices by defining metallic regions whose boundaries have single-atom abruptness. Incompatibility with the thermal and lithography process requirements for gated silicon transistor manufacturing has inhibited exploration of both how APAM can enhance CMOS performance and how transistor manufacturing steps can accelerate the discovery of new APAM device concepts. In this work, we introduce an APAM process that enables direct integration into the middle of a transistor manufacturing workflow. We show that a process that combines sputtering and annealing with a hardmask preserves a defining characteristic of APAM, a doping density far in excess of the solid solubility limit, while trading another, the atomic precision, for compatibility with manufacturing. The electrical characteristics of a chip combining a transistor with an APAM resistor show that the APAM module has only affected the transistor through the addition of a resistance and not by altering the transistor. This proof-of-concept demonstration also outlines the requirements a
Our paper introduces a novel approach to social network information retrieval and user engagement through a personalized chatbot system empowered by Federated Learning GPT. The system is designed to seamlessly aggregate and curate diverse social media data sources, including user posts, multimedia content, and trending news. Leveraging Federated Learning techniques, the GPT model is trained on decentralized data sources to ensure privacy and security while providing personalized insights and recommendations. Users interact with the chatbot through an intuitive interface, accessing tailored information and real-time updates on social media trends and user-generated content. The system's innovative architecture enables efficient processing of input files, parsing and enriching text data with metadata, and generating relevant questions and answers using advanced language models. By facilitating interactive access to a wealth of social network information, this personalized chatbot system represents a significant advancement in social media communication and knowledge dissemination.
We report the detection of Nipah virus in an infectious clone format, a BSL4-level pathogen and CDC-designated Bioterrorism Agent, in raw RNA-Seq sequencing reads deposited by the Wuhan Institute of Virology (WIV) produced from five December 2019 patients infected with SARS-CoV-2. Research involving Nipah infectious clones has never been reported to have occurred at the WIV. These patient samples have been previously reported to contain reads from several other viruses: Influenza A, Spodoptera frugiperda rhabdovirus and Nipah. Previous authors have interpreted the presence of these virus sequences as indicative of co-infections of the patients in question by these pathogens or laboratory contamination. However, our analysis shows that NiV genes are encapsulated in synthetic vectors, which we infer was for assembly of a NiV infectious clone. In particular, we document the finding of internal N, P-V-W-C and L protein coding sequences as well as coverage of the G and F genes. Furthermore, the format of Hepatitis D virus ribozyme and T7 terminator downstream of the 5-prime end of the NiV sequence is consistent with truncation required at the end of the genome for a full length infectiou
This chapter explores advancements in decoding strategies for large language models (LLMs), focusing on enhancing the Locally Typical Sampling (LTS) algorithm. Traditional decoding methods, such as top-k and nucleus sampling, often struggle to balance fluency, diversity, and coherence in text generation. To address these challenges, Adaptive Semantic-Aware Typicality Sampling (ASTS) is proposed as an improved version of LTS, incorporating dynamic entropy thresholding, multi-objective scoring, and reward-penalty adjustments. ASTS ensures contextually coherent and diverse text generation while maintaining computational efficiency. Its performance is evaluated across multiple benchmarks, including story generation and abstractive summarization, using metrics such as perplexity, MAUVE, and diversity scores. Experimental results demonstrate that ASTS outperforms existing sampling techniques by reducing repetition, enhancing semantic alignment, and improving fluency.
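As a point of reference for the baseline named above, the following is a minimal sketch of plain Locally Typical Sampling, not of ASTS itself. It assumes the usual formulation: rank tokens by how far their surprisal lies from the entropy of the next-token distribution, then keep the smallest set whose probability mass reaches a threshold. The function names and the `tau` parameter are illustrative choices, and the dynamic thresholding, multi-objective scoring, and reward-penalty adjustments described in the abstract are not modeled here.

```python
import math
import random


def locally_typical_filter(probs, tau=0.95):
    """Restrict a next-token distribution to its locally typical set.

    probs: list of token probabilities that sum to 1 (assumed valid).
    tau:   probability mass to retain (hypothetical parameter name).
    Returns a renormalized distribution of the same length.
    """
    # Surprisal -log p(x); zero-probability tokens get infinite surprisal.
    surprisal = [-math.log(p) if p > 0 else math.inf for p in probs]
    # Shannon entropy of the distribution, in nats.
    entropy = sum(p * s for p, s in zip(probs, surprisal) if p > 0)
    # Rank tokens by how close their surprisal is to the entropy.
    order = sorted(range(len(probs)), key=lambda i: abs(surprisal[i] - entropy))
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= tau:  # smallest prefix whose mass reaches tau
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_typical(probs, tau=0.95, seed=None):
    """Draw one token index from the locally typical set."""
    rng = random.Random(seed)
    weights = locally_typical_filter(probs, tau)
    return rng.choices(range(len(probs)), weights=weights)[0]
```

Note that with a small `tau` the filter can drop the single most probable token when its surprisal is far from the entropy, which is the behavior that distinguishes this scheme from top-k and nucleus sampling.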
Third-generation DNA sequencing has emerged in the last few years, using new technologies that allow the production of long-read sequences. Applications of third-generation sequencing enable real-time and on-site data production, changing the research paradigms in environmental and medical sampling in virology. To take full advantage of the large-scale data generated from long-read sequencing, innovation in downstream data analysis is necessary. Here, we discuss futuristic methods using machine learning approaches to analyze big genetic data. Machine learning combines pattern recognition and computational learning to perform predictive and exploratory data analysis. In particular, deep learning is a field of machine learning that is used to solve complex problems through artificial neural networks. Unlike other methods, features can be learned using neural networks entirely from data without manual specifications. We discuss the future of 21st-century virology by presenting futuristic approaches for virus studies using real-time data production and on-site data analysis with third-generation sequencing and machine learning methods. We first introduce the basic conc
We present the Virology Capabilities Test (VCT), a large language model (LLM) benchmark that measures the capability to troubleshoot complex virology laboratory protocols. Constructed from the inputs of dozens of PhD-level expert virologists, VCT consists of $322$ multimodal questions covering fundamental, tacit, and visual knowledge that is essential for practical work in virology laboratories. VCT is difficult: expert virologists with access to the internet score an average of $22.1\%$ on questions specifically in their sub-areas of expertise. However, the most performant LLM, OpenAI's o3, reaches $43.8\%$ accuracy, outperforming $94\%$ of expert virologists even within their sub-areas of specialization. The ability to provide expert-level virology troubleshooting is inherently dual-use: it is useful for beneficial research, but it can also be misused. Therefore, the fact that publicly available models outperform virologists on VCT raises pressing governance considerations. We propose that the capability of LLMs to provide expert-level troubleshooting of dual-use virology work should be integrated into existing frameworks for handling dual-use technologies in the life sciences.
Deep learning has recently gained high interest in ophthalmology, due to its ability to detect clinically significant features for diagnosis and prognosis. Despite these significant advances, little is known about the ability of various deep learning systems to be embedded within ophthalmic imaging devices, allowing automated image acquisition. In this work, we will review the existing and future directions for "active acquisition" embedded deep learning, yielding images of the highest possible quality with little intervention by the human operator. In clinical practice, the improved image quality should translate into more robust deep learning-based clinical diagnostics. Embedded deep learning will be enabled by the constantly improving hardware performance with low cost. We will briefly review possible computation methods in larger clinical systems. Briefly, they can be included in a three-layer framework composed of edge, fog and cloud layers, the first being performed at the device level. Improved edge layer performance via "active acquisition" serves as an automatic data curation operator translating to better quality data in electronic health records (EHRs), as well as on the cloud layer, f
Before the current pandemic, influenza and respiratory syncytial virus (RSV) were the leading etiological agents of seasonal acute respiratory infections (ARI) around the world. In this setting, medical doctors typically based the diagnosis of ARI on patients' symptoms alone and did not routinely conduct virological tests necessary to identify individual viruses, limiting the ability to study the interaction between multiple pathogens and to make public health recommendations. We consider a stochastic kinetic model (SKM) for two interacting ARI pathogens circulating in a large population and an empirically-motivated background process for infections with other pathogens causing similar symptoms. An extended marginal sampling approach, based on the linear noise approximation to the SKM, integrates multiple data sources and additional model components. We infer the parameters defining the pathogens' dynamics and interaction within a Bayesian model and explore the posterior trajectories of infections for each illness based on aggregate infection reports from six epidemic seasons collected by the state health department and a subset of virological tests from a sentinel program at a gen
We present a framework for generating universal semantic embeddings of chemical elements to advance materials inference and discovery. This framework leverages ElementBERT, a domain-specific BERT-based natural language processing model trained on 1.29 million abstracts of alloy-related scientific papers, to capture latent knowledge and contextual relationships specific to alloys. These semantic embeddings serve as robust elemental descriptors, consistently outperforming traditional empirical descriptors with significant improvements across multiple downstream tasks. These include predicting mechanical and transformation properties, classifying phase structures, and optimizing materials properties via Bayesian optimization. Applications to titanium alloys, high-entropy alloys, and shape memory alloys demonstrate up to 23% gains in prediction accuracy. Our results show that ElementBERT surpasses general-purpose BERT variants by encoding specialized alloy knowledge. By bridging contextual insights from scientific literature with quantitative inference, our framework accelerates the discovery and optimization of advanced materials, with potential applications extending beyond alloys to
The networking ability of journals reflects their academic influence among peer journals. This paper analyzes the cited and citing environments of the journal Advances in Atmospheric Sciences using methods from social network analysis. The journal has been actively participating in the international journal environment, but it tends to cite papers published in international journals. Advances in Atmospheric Sciences is closely interrelated with international peer journals in terms of similar citing patterns. However, there is still room for an increase in its academic visibility, given its comparatively smaller reception in terms of cited references.
The rapid neutron-capture process (r-process) is responsible for the creation of roughly half of the elements heavier than iron, including precious metals like silver, gold, and platinum, as well as radioactive elements such as thorium and uranium. Despite its importance, the nature of the astrophysical sites where the r-process occurs, and the detailed mechanisms of its formation, remain elusive. The key to resolving these mysteries lies in the study of chemical signatures preserved in ancient, metal-poor stars. In this review, we explore r-process nucleosynthesis, focusing on the sites, progenitors, and formation mechanisms. We discuss the role of potential astrophysical sites such as neutron star mergers, core-collapse supernovae, magneto-rotational supernovae, and collapsars that can play a key role in producing the heavy elements. We also highlight the importance of studying these signatures through high-resolution spectroscopic surveys, stellar archaeology, and multi-messenger astronomy. Recent advancements, such as the gravitational wave event GW170817 and detection of the r-process in the ejecta of its associated kilonova, have established neutron star mergers as one of t
This paper presents an in-depth analysis of the performance of seven different Large Language Models (LLMs) in solving a diverse set of advanced calculus problems. The study aims to evaluate the accuracy, reliability, and problem-solving capabilities of these models: ChatGPT 4o, Gemini Advanced with 1.5 Pro, Copilot Pro, Claude 3.5 Sonnet, Meta AI, Mistral AI, and Perplexity. The assessment was conducted through a series of thirty-two test problems worth a total of 320 points. The problems covered various topics, from vector calculations and geometric interpretations to integral evaluations and optimization tasks. The results highlight significant trends and patterns in the models' performance, revealing both their strengths and weaknesses. For instance, models like ChatGPT 4o and Mistral AI demonstrated consistent accuracy across various problem types, indicating their robustness and reliability in mathematical problem-solving, while models such as Gemini Advanced with 1.5 Pro and Meta AI exhibited specific weaknesses, particularly in complex problems involving integrals and optimization, suggesting areas for targeted improvements. The study also underscores the
The review is devoted to the theory of collective and local pinning effects in various disordered non-linear driven systems. Although the emphasis is put on charge and spin density waves and magnetic domain walls, the theory also has applications to flux lines and lattices thereof, dislocation lines, adsorbed mono-layers and related systems. In the first part we focus on the theory of collective pinning, which includes the equilibrium properties of elastic systems with frozen-in disorder as well as the features close to the dynamic depinning transition enforced by an external driving force and at finite temperatures. Thermal fluctuations smear out this transition and allow for a creep motion of the elastic objects even at small forces. An ac-driving force also destroys the sharp transition which is replaced by a velocity hysteresis. The second part is devoted to the local pinning picture and its applications. Inclusion of plastic deformations results in a rich cross-over behavior of the force-velocity relation as well as of the frequency dependence of the dynamic response. The local pinning recovers and exploits new elements of the energy landscape such as termination points
Chromatin is a complex of DNA, RNA and proteins whose primary function is to package genomic DNA into the tight confines of a cell nucleus. A fundamental repeating unit of chromatin is the nucleosome, an octamer of histone proteins around which 147 base pairs of DNA are wound in almost two turns of a left-handed superhelix. Chromatin is a dynamic structure which exerts profound influence on regulation of gene expression and other cellular functions. These chromatin-directed processes are facilitated by optimizing nucleosome positions throughout the genome and by remodeling nucleosomes in response to various external and internal signals such as environmental perturbations. Here we discuss large-scale maps of nucleosome positions made available through recent advances in parallel high-throughput sequencing and microarray technologies. We show that these maps reveal common features of nucleosome organization in eukaryotic genomes. We also survey computational models designed to predict nucleosome formation scores or energies, and demonstrate how these predictions can be used to position multiple nucleosomes on the genome without steric overlap.
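To make the steric-overlap constraint concrete, the sketch below places fixed-width particles along a sequence so that no two overlap while the summed formation score is maximal. It is a simplified illustration under stated assumptions, not the method of the surveyed models: `scores[i]` is a hypothetical per-position formation score for a nucleosome starting at position `i` (higher is better), only valid start positions are listed, and a deterministic maximum-score placement is used where a probabilistic treatment would also be possible. The function name and the default `width=147` are illustrative.

```python
def place_nucleosomes(scores, width=147):
    """Choose non-overlapping start positions with maximal total score.

    scores: scores[i] is the (hypothetical) formation score of a
            nucleosome whose footprint begins at position i.
    width:  footprint length in base pairs; two chosen starts must
            differ by at least this much.
    Returns (total_score, list_of_start_positions).
    """
    n = len(scores)
    # best[i] = maximal total score achievable using starts i..n-1.
    best = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        skip = best[i + 1]                          # leave position i empty
        place = scores[i] + best[min(i + width, n)]  # start a nucleosome at i
        best[i] = max(skip, place)

    # Trace back through the table to recover the chosen starts.
    starts, i = [], 0
    while i < n:
        place = scores[i] + best[min(i + width, n)]
        if place > best[i + 1]:
            starts.append(i)
            i += width  # the next nucleosome cannot begin inside this one
        else:
            i += 1
    return best[0], starts
```

For example, `place_nucleosomes([3, 1, 4, 1, 5, 9, 2, 6], width=3)` selects starts 0, 4, and 7 for a total score of 14; the toy width of 3 stands in for the 147 bp footprint.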
In recent years, considerable research has been dedicated to the application of neural models in the field of natural language generation (NLG). The primary objective is to generate text that is both linguistically natural and human-like, while also exerting control over the generation process. This paper offers a comprehensive and task-agnostic survey of the recent advancements in neural text generation. These advancements have been facilitated through a multitude of developments, which we categorize into four key areas: data construction, neural frameworks, training and inference strategies, and evaluation metrics. By examining these different aspects, we aim to provide a holistic overview of the progress made in the field. Furthermore, we explore the future directions for the advancement of neural text generation, which encompass the utilization of neural pipelines and the incorporation of background knowledge. These avenues present promising opportunities to further enhance the capabilities of NLG systems. Overall, this survey serves to consolidate the current state of the art in neural text generation and highlights potential avenues for future research and development in this
This study investigated the use of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis for rapid soil fertility assessment, with a focus on key indicators such as available boron (B), organic carbon (OC), available manganese (Mn), available sulfur (S), and the sulfur availability index (SAI). A total of 1,133 soil samples from diverse agro-climatic zones in Eastern India were analyzed. The research integrated color and texture features from microscopic soil images, PXRF data, and auxiliary soil variables (AVs) using a Random Forest model. Results showed that combining image features (IFs) with AVs significantly improved prediction accuracy for available B (R2 = 0.80) and OC (R2 = 0.88). A data fusion approach, incorporating IFs, AVs, and PXRF data, further enhanced predictions for available Mn and SAI, with R2 values of 0.72 and 0.70, respectively. The study highlights the potential of integrating these technologies to offer rapid, cost-effective soil testing methods, paving the way for more advanced predictive models and a deeper understanding of soil fertility. Future work should explore the application of deep learning models on a larger dataset, incorporatin
Despite recent advances in text-conditioned 3D indoor scene generation, there remain gaps in the evaluation of these methods. Existing metrics often measure realism by comparing generated scenes to a set of ground-truth scenes, but they overlook how well scenes follow the input text and capture implicit expectations of plausibility. We present SceneEval, an evaluation framework designed to address these limitations. SceneEval introduces fine-grained metrics for explicit user requirements (including object counts, attributes, and spatial relationships) and complementary metrics for implicit expectations such as support, collisions, and navigability. Together, these provide interpretable and comprehensive assessments of scene quality. To ground evaluation, we curate SceneEval-500, a benchmark of 500 text descriptions with detailed annotations of expected scene properties. This dataset establishes a common reference for reproducible and systematic comparison across scene generation methods. We evaluate six recent scene generation approaches using SceneEval and demonstrate its ability to provide detailed assessments of the generated scenes, highlighting strengths and areas for improvemen