The rapid rise of open-weight and open-source foundation models is intensifying the obligation and reshaping the opportunity to make AI systems safe. This paper reports outcomes from the Columbia Convening on AI Openness and Safety (San Francisco, 19 Nov 2024) and its six-week preparatory programme involving more than forty-five researchers, engineers, and policy leaders from academia, industry, civil society, and government. Using a participatory, solutions-oriented process, the working groups produced (i) a research agenda at the intersection of safety and open source AI; (ii) a mapping of existing and needed technical interventions and open source tools to safely and responsibly deploy open foundation models across the AI development workflow; and (iii) a mapping of the content safety filter ecosystem with a proposed roadmap for future research and development. We find that openness -- understood as transparent weights, interoperable tooling, and public governance -- can enhance safety by enabling independent scrutiny, decentralized mitigation, and culturally plural oversight. However, significant gaps persist: scarce multimodal and multilingual benchmarks, limited defenses agai
In Rust, unsafe code is the sole source of potential undefined behaviors. To avoid misuse, Rust developers should clarify the safety properties for each unsafe API. However, the community currently lacks a key standard for safety documentation: existing safety comments in the source code and safety documentation can be ad hoc and incomplete. This paper presents a tag-centric methodology for auditing the consistency and completeness of safety documentation. We first derive a taxonomy of Safety Tags to formalize natural-language requirements. Second, because API soundness frequently relies on struct invariants, we propose a set of empirical rules to systematically audit the structural consistency of safety documentation. We implemented this methodology in safety-tool, a static linter that automatically enforces structural consistency between local safety annotations and callee requirements. Our approach was applied to the Rust standard library, fixing documentation issues on 27 APIs with 61 safety tags and identifying safety tags that are applicable to 96.1% of the public unsafe APIs in libstd. Furthermore, we have formalized the tagging idea through a Rust RFC to the wider community
Safety has become the central value around which dominant AI governance efforts are being shaped. Recently, this culminated in the publication of the International AI Safety Report, written by 96 experts of which 30 nominated by the Organisation for Economic Co-operation and Development (OECD), the European Union (EU), and the United Nations (UN). The report focuses on the safety risks of general-purpose AI and available technical mitigation approaches. In this response, informed by a system safety perspective, I refl ect on the key conclusions of the report, identifying fundamental issues in the currently dominant technical framing of AI safety and how this frustrates meaningful discourse and policy efforts to address safety comprehensively. The system safety discipline has dealt with the safety risks of software-based systems for many decades, and understands safety risks in AI systems as sociotechnical and requiring consideration of technical and non-technical factors and their interactions. The International AI Safety report does identify the need for system safety approaches. Lessons, concepts and methods from system safety indeed provide an important blueprint for overcoming
Drug safety research is crucial for maintaining public health, often requiring comprehensive data support. However, the resources currently available to the public are limited and fail to provide a comprehensive understanding of the relationship between drugs and their side effects. This paper introduces DrugWatch, an easy-to-use and interactive multi-source information visualisation platform for drug safety study. It allows users to understand common side effects of drugs and their statistical information, flexibly retrieve relevant medical reports, or annotate their own medical texts with our automated annotation tool. Supported by NLP technology and enriched with interactive visual components, we are committed to providing researchers and practitioners with a one-stop information analysis, retrieval, and annotation service. The demonstration video is available at https://www.youtube.com/watch?v=RTqDgxzETjw. We also deployed an online demonstration system at https://drugwatch.net/.
Kidney abnormality detection is required for all preclinical drug development. It involves a time-consuming and costly examination of hundreds to thousands of whole-slide images per drug safety study, most of which are normal, to detect any subtle changes indicating toxic effects. In this study, we present the first large-scale self-supervised abnormality detection model for kidney toxicologic pathology, spanning drug safety assessment studies from 158 compounds. We explore the complexity of kidney abnormality detection on this scale using features extracted from the UNI foundation model (FM) and show that a simple k-nearest neighbor classifier on these features performs at chance, demonstrating that the FM-generated features alone are insufficient for detecting abnormalities. We then demonstrate that a self-supervised method applied to the same features can achieve better-than-chance performance, with an area under the receiver operating characteristic curve of 0.62 and a negative predictive value of 89%. With further development, such a model can be used to rule out normal slides in drug safety assessment studies, reducing the costs and time associated with drug development.
Adverse drug interactions are a critical concern in pharmacovigilance, as both clinical trials and spontaneous reporting systems often lack the breadth to detect complex drug interactions. This study introduces a computational framework for adverse drug interaction detection, leveraging disproportionality analysis on individual case safety reports. By integrating the Anatomical Therapeutic Chemical classification, the framework extends beyond drug interactions to capture hierarchical pharmacological relationships. To address biases inherent in existing disproportionality measures, we employ a hypergeometric risk metric, while a Markov Chain Monte Carlo algorithm provides robust empirical p-value estimation for the risk associated to cocktails. A genetic algorithm further facilitates efficient identification of high-risk drug cocktails. Validation on synthetic and FDA Adverse Event Reporting System data demonstrates the method's efficacy in detecting established drugs and drug interactions associated with myopathy-related adverse events. Implemented as an R package, this framework offers a reproducible, scalable tool for post-market drug safety surveillance.
The introduction of Smart Electric Vehicles (SEVs) represents an increasingly disruption on automotive area, once integrates advanced computer and communication technologies to highly electrical cars, which come with high performances, environment friendly and user friendly characteristics . But the increasing complexity of SEVs prompted by greater dependence on interconnected systems, autonomous capabilities and electrification, presents new challenges in cybersecurity as well as functional safety. The safety and reliability of such vehicles is paramount, as unsafe or unreliable operation in either case represents an unacceptable risk in terms of the performance of the vehicle and safety of the passenger. This paper investigates the integrated development of cybersecurity and functional safety for SEVs, emphasizing the requirement for the parallel development of these domains as components that are not treated separately. In SEVs, cybersecurity is quite crucial in order to prevent the threats of hacking, data breaches and unauthorized access to vehicle systems. Functional safety ensures that important vehicle functions (braking, steering, battery control, etc.) keep working even i
The role of Artificial Intelligence (AI) is growing in every stage of drug development. Nevertheless, a major challenge in drug discovery AI remains: Drug pharmacokinetic (PK) and Drug-Target Interaction (DTI) datasets collected in different studies often exhibit limited overlap, creating data overlap sparsity. Thus, data curation becomes difficult, negatively impacting downstream research investigations in high-throughput screening, polypharmacy, and drug combination. We propose xImagand-DKI, a novel SMILES/Protein-to-Pharmacokinetic/DTI (SP2PKDTI) diffusion model capable of generating an array of PK and DTI target properties conditioned on SMILES and protein inputs that exhibit data overlap sparsity. We infuse additional molecular and genomic domain knowledge from the Gene Ontology (GO) and molecular fingerprints to further improve our model performance. We show that xImagand-DKI-generated synthetic PK data closely resemble real data univariate and bivariate distributions, and can adequately fill in gaps among PK and DTI datasets. As such, xImagand-DKI is a promising solution for data overlap sparsity and may improve performance for downstream drug discovery research tasks. Code
Multimodal large language models (MLLMs) are increasingly deployed as assistants that interact through text and images, making it crucial to evaluate contextual safety when risk depends on both the visual scene and the evolving dialogue. Existing contextual safety benchmarks are mostly single-turn and often miss how malicious intent can emerge gradually or how the same scene can support both benign and exploitative goals. We introduce the Multi-Turn Multimodal Contextual Safety Benchmark (MTMCS-Bench), a benchmark of realistic images and multi-turn conversations that evaluates contextual safety in MLLMs under two complementary settings, escalation-based risk and context-switch risk. MTMCS-Bench offers paired safe and unsafe dialogues with structured evaluation. It contains over 30 thousand multimodal (image+text) and unimodal (text-only) samples, with metrics that separately measure contextual intent recognition, safety-awareness on unsafe cases, and helpfulness on benign ones. Across eight open-source and seven proprietary MLLMs, we observe persistent trade-offs between contextual safety and utility, with models tending to either miss gradual risks or over-refuse benign dialogues.
This paper revisits three backup-based safety filters -- Backup Control Barrier Functions (Backup CBF), Model Predictive Shielding (MPS), and gatekeeper -- through a unified comparative framework. Using a common safety-filter abstraction and shared notation, we make explicit both their common backup-policy structure and their key algorithmic differences. We compare the three methods through their filter-inactive sets, i.e., the states where the nominal policy is left unchanged. In particular, we show that MPS is a special case of gatekeeper, and we further relate gatekeeper to the interior of the Backup CBF inactive set within the implicit safe set. This unified view also highlights a key source of conservatism in backup-based safety filters: safety is often evaluated through the feasibility of a backup maneuver, rather than through the nominal policy's continued safe execution. The paper is intended as a compact tutorial and review that clarifies the theoretical connections and differences among these methods.
Longer timelines and lower success rates of drug candidates limit the productivity of clinical trials in the pharmaceutical industry. Promising de novo drug design techniques help solve this by exploring a broader chemical space, efficiently generating new molecules, and providing improved therapies. However, optimizing for molecular characteristics found in approved oral drugs remains a challenge, limiting de novo usage. In this work, we propose NovoMol, a novel de novo method using recurrent neural networks to mass-generate drug molecules with high oral bioavailability, increasing clinical trial time efficiency. Molecules were optimized for desirable traits and ranked using the quantitative estimate of drug-likeness (QED). Generated molecules meeting QED's oral bioavailability threshold were used to retrain the neural network, and, after five training cycles, 76% of generated molecules passed this strict threshold and 96% passed the traditionally used Lipinski's Rule of Five. The trained model was then used to generate specific drug candidates for the cancer-related PDGFRα receptor and 44% of generated candidates had better binding affinity than the current state-of-the-art drug,
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-powered Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review defense str
Ensuring the safe alignment of large language models (LLMs) with human values is critical as they become integral to applications like translation and question answering. Current alignment methods struggle with dynamic user intentions and complex objectives, making models vulnerable to generating harmful content. We propose Safety Arithmetic, a training-free framework enhancing LLM safety across different scenarios: Base models, Supervised fine-tuned models (SFT), and Edited models. Safety Arithmetic involves Harm Direction Removal to avoid harmful content and Safety Alignment to promote safe responses. Additionally, we present NoIntentEdit, a dataset highlighting edit instances that could compromise model safety if used unintentionally. Our experiments show that Safety Arithmetic significantly improves safety measures, reduces over-safety, and maintains model utility, outperforming existing methods in ensuring safe content generation.
A system safety case is a compelling, comprehensible, and valid argument about the satisfaction of the safety goals of a given system operating in a given environment supported by convincing evidence. Since the publication of UL 4600 in 2020, safety cases have become a best practice for measuring, managing, and communicating the safety of autonomous vehicles (AVs). Although UL 4600 provides guidance on how to build the safety case for an AV, the complexity of AVs and their operating environments, the novelty of the used technology, the need for complying with various regulations and technical standards, and for addressing cybersecurity concerns and ethical considerations make the development of safety cases for AVs challenging. To this end, safety case frameworks have been proposed that bring strategies, argument templates, and other guidance together to support the development of a safety case. This paper introduces the Open Autonomy Safety Case Framework, developed over years of work with the autonomous vehicle industry, as a roadmap for how AVs can be deployed safely and responsibly.
We propose a new framework to facilitate dynamic assurance within a safety case approach by associating safety performance measurement with the core assurance artifacts of a safety case. The focus is mainly on the safety architecture, whose underlying risk assessment model gives the concrete link from safety measurement to operational risk. Using an aviation domain example of autonomous taxiing, we describe our approach to derive safety indicators and revise the risk assessment based on safety measurement. We then outline a notion of consistency between a collection of safety indicators and the safety case, as a formal basis for implementing the proposed framework in our tool, AdvoCATE.
Aviation safety is paramount, demanding precise analysis of safety occurrences during different flight phases. This study employs Natural Language Processing (NLP) and Deep Learning models, including LSTM, CNN, Bidirectional LSTM (BLSTM), and simple Recurrent Neural Networks (sRNN), to classify flight phases in safety reports from the Australian Transport Safety Bureau (ATSB). The models exhibited high accuracy, precision, recall, and F1 scores, with LSTM achieving the highest performance of 87%, 88%, 87%, and 88%, respectively. This performance highlights their effectiveness in automating safety occurrence analysis. The integration of NLP and Deep Learning technologies promises transformative enhancements in aviation safety analysis, enabling targeted safety measures and streamlined report handling.
With the rapid popularity of large language models such as ChatGPT and GPT-4, a growing amount of attention is paid to their safety concerns. These models may generate insulting and discriminatory content, reflect incorrect social values, and may be used for malicious purposes such as fraud and dissemination of misleading information. Evaluating and enhancing their safety is particularly essential for the wide application of large language models (LLMs). To further promote the safe deployment of LLMs, we develop a Chinese LLM safety assessment benchmark. Our benchmark explores the comprehensive safety performance of LLMs from two perspectives: 8 kinds of typical safety scenarios and 6 types of more challenging instruction attacks. Our benchmark is based on a straightforward process in which it provides the test prompts and evaluates the safety of the generated responses from the evaluated model. In evaluation, we utilize the LLM's strong evaluation ability and develop it as a safety evaluator by prompting. On top of this benchmark, we conduct safety assessments and analyze 15 LLMs including the OpenAI GPT series and other well-known Chinese LLMs, where we observe some interesting f
Real-time safety assessment (RTSA) of dynamic systems is a critical task that has significant implications for various fields such as industrial and transportation applications, especially in non-stationary environments. However, the absence of a comprehensive review of real-time safety assessment methods in non-stationary environments impedes the progress and refinement of related methods. In this paper, a review of methods and techniques for RTSA tasks in non-stationary environments is provided. Specifically, the background and significance of RTSA approaches in non-stationary environments are firstly highlighted. We then present a problem description that covers the definition, classification, and main challenges. We review recent developments in related technologies such as online active learning, online semi-supervised learning, online transfer learning, and online anomaly detection. Finally, we discuss future outlooks and potential directions for further research. Our review aims to provide a comprehensive and up-to-date overview of real-time safety assessment methods in non-stationary environments, which can serve as a valuable resource for researchers and practitioners in t
Modern cyber-physical systems are operated by complex software that increasingly takes over safety-critical functions. Software enables rapid iterations and continuous delivery of new functionality that meets the ever-changing expectations of users. As high-speed development requires discipline, rigor, and automation, software factories are used. These entail methods and tools used for software development, such as build systems and pipelines. To keep up with the rapid evolution of software, we need to bridge the disconnect in methods and tools between software development and safety engineering today. We need to invest more in formality upfront - capturing safety work products in semantically rich models that are machine-processable, defining automatic consistency checks, and automating the generation of documentation - to benefit later. Transferring best practices from software to safety engineering is worth exploring. We advocate for safety factories, which integrate safety tooling and methods into software development pipelines.
Safety lies at the core of developing and deploying large language models (LLMs). However, previous safety benchmarks only concern the safety in one language, e.g. the majority language in the pretraining data such as English. In this work, we build the first multilingual safety benchmark for LLMs, XSafety, in response to the global deployment of LLMs in practice. XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families. We utilize XSafety to empirically study the multilingual safety for 4 widely-used LLMs, including both close-API and open-source models. Experimental results show that all LLMs produce significantly more unsafe responses for non-English queries than English ones, indicating the necessity of developing safety alignment for non-English languages. In addition, we propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT by evoking safety knowledge and improving cross-lingual generalization of safety alignment. Our prompting method can significantly reduce the ratio of unsafe responses from 19.1% to 9.7% for non-English queries. We release our data at https://github.com/Jar