Found 20 results in total
We define and investigate source-modality monitoring -- the ability of multimodal models to track and communicate the input source from which pieces of information originate. We consider source-modality monitoring as an instance of the more general binding problem, and evaluate the extent to which models exploit syntactic vs. semantic signals in order to bind words like "image" in a user-provided prompt to specific components of their input and context (i.e., actual images). Across experiments spanning 11 vision-language models (VLMs) performing target-modality information retrieval tasks, we find that both syntactic and semantic signals play an important role, but that the latter tend to outweigh the former when modalities are highly distinct distributionally. We discuss the implications of these findings for model robustness and for increasingly multimodal agentic systems.
Observability into the decision making of modern AI systems may be required to safely deploy increasingly capable agents. Monitoring the chain-of-thought (CoT) of today's reasoning models has proven effective for detecting misbehavior. However, this "monitorability" may be fragile under different training procedures, data sources, or even continued system scaling. To measure and track monitorability, we propose three evaluation archetypes (intervention, process, and outcome-property) and a new monitorability metric, and introduce a broad evaluation suite. We demonstrate that these evaluations can catch simple model organisms trained to have obfuscated CoTs, and that CoT monitoring is more effective than action-only monitoring in practical settings. We compare the monitorability of various frontier models and find that most models are fairly, but not perfectly, monitorable. We also evaluate how monitorability scales with inference-time compute, reinforcement learning optimization, and pre-training model size. We find that longer CoTs are generally more monitorable and that RL optimization does not materially decrease monitorability even at the current frontier scale. Notably, we find…
Power transformers are critical assets in power networks, whose reliability directly impacts grid resilience and stability. Traditional condition monitoring approaches, often rule-based or purely physics-based, struggle with uncertainty, limited data availability, and the complexity of modern operating conditions. Recent advances in machine learning (ML) provide powerful tools to complement and extend these methods, enabling more accurate diagnostics, prognostics, and control. In this two-part series, we examine the role of Neural Networks (NNs) and their extensions in transformer condition monitoring and health management tasks. This first paper introduces the basic concepts of NNs, explores Convolutional Neural Networks (CNNs) for condition monitoring using diverse data modalities, and discusses the integration of NN concepts within the Reinforcement Learning (RL) paradigm for decision-making and control. Finally, perspectives on emerging research directions are also provided.
We introduce a cross-domain behavioural assay of monitoring-control coupling in LLMs, grounded in the Nelson and Narens (1990) metacognitive framework and applying human psychometric methodology to LLM evaluation. The battery comprises 524 items across six cognitive domains (learning, metacognitive calibration, social cognition, attention, executive function, prospective regulation), each grounded in an established experimental paradigm. Tasks T1-T5 were pre-registered on OSF prior to data collection; T6 was added as an exploratory extension. After every forced-choice response, dual probes adapted from Koriat and Goldsmith (1996) ask the model to KEEP or WITHDRAW its answer and to BET or decline. The critical metric is the withdraw delta: the difference in withdrawal rate between incorrect and correct items. Applied to 20 frontier LLMs (10,480 evaluations), the battery discriminates three profiles consistent with the Nelson-Narens architecture: blanket confidence, blanket withdrawal, and selective sensitivity. Accuracy rank and metacognitive sensitivity rank are largely inverted. Retrospective monitoring and prospective regulation appear dissociable (r = .17, with a wide 95% CI given n = 20)…
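The withdraw delta is defined concretely enough to compute directly. A minimal Python sketch, assuming per-item records of correctness and the KEEP/WITHDRAW decision (the `Item` structure and field names are ours, not the paper's):

```python
from dataclasses import dataclass

@dataclass
class Item:
    correct: bool   # was the forced-choice answer right?
    withdrew: bool  # did the model choose WITHDRAW at the dual probe?

def withdraw_delta(items: list[Item]) -> float:
    """Withdrawal rate on incorrect items minus withdrawal rate on correct ones."""
    def rate(xs: list[Item]) -> float:
        return sum(i.withdrew for i in xs) / len(xs) if xs else 0.0
    return rate([i for i in items if not i.correct]) - rate([i for i in items if i.correct])
```

Under this reading, a clearly positive delta corresponds to the selective-sensitivity profile, while a delta near zero with uniformly high or low withdrawal rates corresponds to blanket withdrawal or blanket confidence, respectively.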
Decentralized autonomous organizations (DAOs) are designed to disperse control, yet recent evidence shows that effective governance is often concentrated in a small number of participants. This note studies one simple mechanism behind that pattern. Because decentralized governance is monitor-intensive, rising proposal flow may eventually outpace the capacity of broad-based participation. Using a DAO-quarter panel, I estimate a fixed-effects kink model with DAO and quarter fixed effects and find a statistically significant decline in the marginal responsiveness of active voters once proposal activity crosses an interior threshold. I then study realized voting concentration using kink specifications with data-driven cutoffs. Across specifications, decentralization gains do not persist indefinitely once governance workload becomes sufficiently high, and load-based measures show especially clear evidence of a transition toward more concentrated realized control. The results provide reduced-form evidence consistent with a "too big to monitor" mechanism in DAO governance: when proposal flow grows faster than broad participation can keep up, effective control may drift toward a smaller…
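A sketch of what a fixed-effects kink specification might look like with statsmodels; the column names (`active_voters`, `proposals`, `dao`, `quarter`) and the grid search for the data-driven cutoff are illustrative choices, not the paper's exact estimator:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fit_kink(df: pd.DataFrame, kink: float):
    """Two-way fixed-effects regression in which the slope of active voters
    in proposal flow is allowed to change once proposals exceed `kink`."""
    df = df.assign(excess=np.maximum(df["proposals"] - kink, 0.0))
    return smf.ols("active_voters ~ proposals + excess + C(dao) + C(quarter)",
                   data=df).fit()

def best_kink(df: pd.DataFrame, candidates: list[float]) -> float:
    # Data-driven cutoff: the candidate kink minimizing the sum of squared residuals.
    return min(candidates, key=lambda k: fit_kink(df, k).ssr)
```

A significantly negative coefficient on `excess` would be the "declining marginal responsiveness past an interior threshold" pattern the note reports.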
With our growing reliance on distributed networks, the security of such networks becomes of prime importance. In large-scale distributed networks it is essential to have an efficient and effective monitoring scheme. Monitoring schemes supervise node behaviour in the network and look for discrepancies. They comprise monitoring components that work together to help the schemes meet various security requirement parameters for the networks. These security parameters are breached via various attacks that manipulate the monitoring components of particular monitoring schemes to produce faulty results, thereby reducing network efficiency, reliability, and security. In this paper we discuss these monitoring components, multiple monitoring schemes and their security parameters, and the various attacks possible on these monitoring components through manipulation of the schemes' assumptions.
With the increasing significance of Research, Technology, and Innovation (RTI) policies in recent years, the demand for detailed information about the performance of these sectors has surged. Many current tools, however, are limited in their scope of application. To address this, we introduce a requirements engineering process to identify stakeholders and elicit requirements, from which we derive a system architecture for a web-based, interactive, and open-access RTI system monitoring tool. Based on several core modules, we introduce a multi-tier software architecture showing how such a tool can be implemented from the perspective of software engineers. A cornerstone of this architecture is the user-facing dashboard module. We describe the requirements for this module in detail and illustrate them with the real example of the Austrian RTI Monitor.
System monitoring is an established tool to measure the utilization and health of HPC systems. Usually, however, system monitoring infrastructures make no connection to job information and do not utilize hardware performance monitoring (HPM) data. To increase the efficient use of HPC systems, automatic and continuous performance monitoring of jobs is an essential component. It can help to identify pathological cases, provide instant performance feedback to users, offer initial data for judging the optimization potential of applications, and build a statistical foundation about application-specific system usage. The LIKWID monitoring stack is a modular framework built on top of the LIKWID tools library. It aims to enable job-specific performance monitoring using HPM data, system metrics, and application-level data for small to medium-sized commodity clusters. Moreover, it is designed to integrate into existing monitoring infrastructures to speed up the change from pure system monitoring to job-aware monitoring.
Indirect structural health monitoring (iSHM) for broken rail detection using onboard sensors presents a cost-effective paradigm for railway track assessment, yet reliably detecting small, transient anomalies (2-10 cm) remains a significant challenge due to complex vehicle dynamics, signal noise, and the scarcity of labeled data, which limits supervised approaches. This study addresses these issues through unsupervised deep learning. We introduce an incremental synthetic data benchmark designed to systematically evaluate model robustness against progressively complex challenges such as speed variations, multi-channel inputs, and realistic noise patterns encountered in iSHM. Using this benchmark, we evaluate several established unsupervised models alongside our proposed Attention-Focused Transformer. Our model employs a self-attention mechanism, trained via reconstruction but innovatively deriving anomaly scores primarily from deviations in learned attention weights, aiming for both effectiveness and computational efficiency. Benchmarking results reveal that while transformer-based models generally outperform others, all tested models exhibit significant vulnerability to high-frequency local…
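One way the attention-deviation idea could be realized in PyTorch; the abstract only states that anomaly scores derive primarily from deviations in learned attention weights, so the architecture and scoring rule below are our assumptions:

```python
import torch
import torch.nn as nn

class AttnAE(nn.Module):
    """Reconstruction model that also exposes its self-attention weights."""
    def __init__(self, d_model: int = 32, nhead: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor):            # x: (batch, seq, d_model)
        ctx, weights = self.attn(x, x, x, need_weights=True)
        return self.proj(ctx), weights             # weights: (batch, seq, seq)

def attention_anomaly_score(weights: torch.Tensor,
                            reference: torch.Tensor) -> torch.Tensor:
    """Score each window by how far its attention map drifts from a reference
    profile, e.g. the mean attention map over normal training windows."""
    return (weights - reference).abs().mean(dim=(1, 2))  # one score per window
```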
While detailed resource usage monitoring is possible at the low level using proper tools, associating such usage with the higher-level abstractions in the application layer that actually cause the resource usage presents a number of challenges. Suppose a large-scale scientific data analysis workflow is run using a distributed execution environment such as a compute cluster or cloud environment, and we want to analyze its I/O behaviour to find and alleviate potential bottlenecks. Different tasks of the workflow can be assigned to arbitrary compute nodes and may even share the same compute nodes. Thus, locally observed resource usage is not directly associated with the individual workflow tasks. By acquiring resource usage profiles of the involved nodes, we seek to correlate the trace data with the workflow and its individual tasks. To accomplish this, we select the set of metadata associated with low-level traces that lets us link them to higher-level task information obtained from the workflow execution logs as well as from job management by a task orchestrator such as Kubernetes with its container management. Ensuring a proper information…
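A toy sketch of that correlation step, assuming trace records and task logs share a container ID as the joining metadata (all record shapes here are hypothetical, not the paper's):

```python
def attribute_traces(traces: list[dict], tasks: list[dict]) -> dict[str, int]:
    """Attach workflow-task labels to node-local I/O trace records and
    aggregate bytes per task. Field names are illustrative.

    traces: e.g. {"node": ..., "container_id": ..., "io_bytes": ...}
    tasks:  e.g. {"task": ..., "container_id": ...} from Kubernetes logs.
    """
    task_of = {t["container_id"]: t["task"] for t in tasks}
    totals: dict[str, int] = {}
    for rec in traces:
        task = task_of.get(rec["container_id"], "unattributed")
        totals[task] = totals.get(task, 0) + rec["io_bytes"]
    return totals
```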
Existing technologies for distributed light-field mapping and light pollution monitoring (LPM) rely on either remote satellite imagery or manual light surveying with single-point sensors such as SQMs (sky quality meters). These modalities offer low-resolution data that are not informative for dense light-field mapping, pollutant factor identification, or sustainable policy implementation. In this work, we propose LightViz -- an interactive software interface to survey, simulate, and visualize light pollution maps in real time. As opposed to error-prone manual methods, LightViz (i) automates the light-field data collection and mapping processes; (ii) provides a platform to simulate various light sources and intensity attenuation models; and (iii) facilitates effective policy identification for conservation. To validate the end-to-end computational pipeline, we design a distributed light-field sensor suite, collect data on Florida's coasts, and visualize the distributed light-field maps. In particular, we perform a case study in St. Johns County, Florida, which has a two-decade conservation program for lighting ordinances. The experimental results demonstrate that LightViz can offer…
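The abstract does not specify which attenuation models LightViz implements; as an illustration only, a generic inverse-square model with optional exponential extinction shows what such a simulation component might compute:

```python
import numpy as np

def light_field(sources, grid_xy, k_ext: float = 0.0):
    """Summed intensity at each map point under inverse-square falloff with
    exponential extinction -- a generic model, not necessarily LightViz's.

    sources: iterable of (x, y, intensity) tuples
    grid_xy: (N, 2) array of map coordinates
    """
    field = np.zeros(len(grid_xy))
    for sx, sy, intensity in sources:
        d = np.hypot(grid_xy[:, 0] - sx, grid_xy[:, 1] - sy) + 1e-6
        field += intensity * np.exp(-k_ext * d) / d**2
    return field
```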
The proliferation of sensors brings an immense volume of spatio-temporal (ST) data in many domains, including monitoring, diagnostics, and prognostics applications. Data curation is a time-consuming process for a large volume of data, making it challenging and expensive to deploy data analytics platforms in new environments. Transfer learning (TL) mechanisms promise to mitigate data sparsity and model complexity by utilizing pre-trained models for a new task. Despite the triumph of TL in fields like computer vision and natural language processing, efforts on complex ST models for anomaly detection (AD) applications are limited. In this study, we present the potential of TL within the context of high-dimensional ST AD with a hybrid autoencoder architecture, incorporating convolutional, graph, and recurrent neural networks. Motivated by the need for improved model accuracy and robustness, particularly in scenarios with limited training data on systems with thousands of sensors, this research investigates the transferability of models trained on different sections of the Hadron Calorimeter of the Compact Muon Solenoid experiment at CERN. The key contributions of the study include…
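A minimal sketch of the transfer step under one common TL recipe (freeze the pre-trained encoder, fine-tune the rest on the new calorimeter section); the module naming is illustrative, and the paper's hybrid convolutional-graph-recurrent architecture is richer than anything shown here:

```python
import torch

def prepare_transfer(pretrained: torch.nn.Module,
                     freeze_prefix: str = "encoder") -> torch.optim.Optimizer:
    """Freeze parameters whose names start with `freeze_prefix` and return an
    optimizer over the remaining (trainable) parameters."""
    for name, param in pretrained.named_parameters():
        param.requires_grad = not name.startswith(freeze_prefix)
    trainable = [p for p in pretrained.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=1e-4)
```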
The Astro2020 Decadal Survey recognizes time-domain astronomy as a key science area over the next decade and beyond. With over 30 years of HST data and the potential for 20 years of JWST operations, these flagship observatories offer an unparalleled prospect for a half-century of space-based observations in the time domain. To take best advantage of this opportunity, STScI charged a working group to solicit community input and formulate strategies to maximize the science return in time-domain astronomy from these two platforms. Here, the HST/JWST Long-Term Monitoring Working Group reports on the input we received and presents our recommendations to enhance the scientific return for time-domain astronomy from HST and JWST. We suggest changes in policies to enable and prioritize long-term science programs of high scientific value. As charged, we also develop recommendations based on community input for a JWST Director's Discretionary Time program to observe high-redshift transients.
We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge Task 2: First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring. Continuing from last year's DCASE 2023 Challenge Task 2, we organize the task as a first-shot problem under settings that require domain generalization. The main goal of the first-shot problem is to enable rapid deployment of ASD systems for new kinds of machines without the need for machine-specific hyperparameter tuning. This problem setting was realized by (1) giving only one section for each machine type and (2) having completely different machine types for the development and evaluation datasets. For the DCASE 2024 Challenge Task 2, data for completely new machine types were collected and provided as the evaluation dataset. In addition, attribute information such as machine operating conditions was concealed for several machine types to mimic situations where such information is unavailable. We will add challenge results and analysis of the submissions after the challenge submission deadline.
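In the spirit of the DCASE autoencoder baselines, anomaly scoring can be as simple as frame-wise reconstruction error on log-mel features; a sketch with the model left abstract (`model` is assumed to be any trained autoencoder exposing a `predict` method, and the exact features vary by system):

```python
import numpy as np

def frame_scores(model, mel_frames: np.ndarray) -> np.ndarray:
    """Mean squared reconstruction error per frame.
    mel_frames: (n_frames, n_dims) log-mel feature vectors for one clip."""
    recon = model.predict(mel_frames)
    return np.mean((mel_frames - recon) ** 2, axis=1)

def clip_score(model, mel_frames: np.ndarray) -> float:
    # First-shot setting: the decision threshold must be calibrated on
    # normal data only, since no anomalies are available for tuning.
    return float(np.mean(frame_scores(model, mel_frames)))
```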
One of the commonly used seismic force-resisting systems in structures is the Reinforced Concrete (RC) Intermediate Moment Frame (IMF). Although the IMF is not allowed in high seismic hazard zones according to ASCE 7-10, it is permitted in both Iran's 2800 Seismic Standard and New Zealand's Seismic Code. This study investigates the seismic behavior of a reinforced concrete IMF subjected to earthquake excitations using shaking table tests on a 2D RC structural model designed under the regulations of ACI 318-19. A scale factor of 1/2.78 is selected for the frame fabrication due to the size limit of the shaking table. The constructed model has three stories with a story height of 115 cm, a clear beam length of 151 cm, and cross-sectional dimensions of 11 $\times$ 11 cm for columns and 12 $\times$ 11 cm for beams. The whole structure is supported by a foundation 173 cm long, 52 cm wide, and 22 cm deep. Columns and beams are reinforced with 8 mm diameter longitudinal ribbed bars and 6 mm diameter stirrups. The tests are conducted in stages of increasing peak ground acceleration (PGA) until failure of the frame. The Sarpol-E-Zahab earthquake…
As road accidents caused by driver inattention increase, automated driver monitoring systems (DMS) have gained increasing acceptance. In this report, we present a real-time DMS that runs on a hardware-accelerator-based edge device. The system consists of an InfraRed camera to record footage of the driver and an edge device to process the data. To port the deep learning models to the edge device while taking full advantage of the hardware accelerators, model surgery was performed. The final DMS achieves 63 frames per second (FPS) on the TI-TDA4VM edge device.
Agricultural production is highly dependent on naturally occurring environmental conditions like the change of seasons and the weather. Especially in fruit and wine growing, late frosts occurring shortly after the crops have sprouted have the potential to cause massive damage to plants [1]. In this article we present a cost-efficient temperature monitoring system for detecting and reacting to late frosts to prevent crop failures. The proposed solution includes a data space where Internet of Things (IoT) devices can form a cyber-physical system (CPS) to interact with their nearby environment and securely exchange data. Based on this data, more accurate predictions can be made in the future using machine learning (ML), which will further contribute to minimising economic damage caused by crop failures.
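The alerting core of such a system can be very small; a deliberately minimal sketch (the threshold and sensor naming are ours, and the described system additionally exchanges data via an IoT data space and adds ML-based prediction):

```python
FROST_THRESHOLD_C = 1.0  # alert margin above 0 °C; tune per crop and site

def frost_alert(latest_readings: dict[str, float]) -> list[str]:
    """Return the sensor IDs whose latest temperature indicates late-frost risk."""
    return [sensor for sensor, temp_c in latest_readings.items()
            if temp_c <= FROST_THRESHOLD_C]

# e.g. frost_alert({"vineyard-north": 0.4, "orchard-east": 3.2}) -> ["vineyard-north"]
```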
We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 2: "First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring". The main goal is to enable rapid deployment of ASD systems for new kinds of machines without the need for hyperparameter tuning. In past ASD tasks, developed methods tuned hyperparameters for each machine type, as the development and evaluation datasets had the same machine types. However, collecting normal and anomalous data as the development dataset can be infeasible in practice. In 2023 Task 2, we focus on solving the first-shot problem, which is the challenge of training a model on a completely novel machine type. Specifically, (i) each machine type has only one section (a subset of a machine type) and (ii) the machine types in the development and evaluation datasets are completely different. Analysis of 86 submissions from 23 teams revealed that the keys to outperforming the baselines were: 1) sampling techniques for dealing with class imbalances across different domains and attributes, 2) generation of synthetic samples for robust detection, and 3) use of multiple…
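As one illustration of the synthetic-sample idea (key 2 above), mixup-style convex combinations of normal feature vectors; the submissions' actual recipes vary, and this particular construction is ours rather than any team's:

```python
import numpy as np

def mixup_synthetic(normals: np.ndarray, n_new: int,
                    rng: np.random.Generator | None = None) -> np.ndarray:
    """Generate synthetic 'normal' feature vectors as convex mixes of
    randomly paired real ones. normals: (n_samples, n_dims)."""
    rng = rng or np.random.default_rng(0)
    i, j = rng.integers(0, len(normals), size=(2, n_new))
    lam = rng.beta(0.4, 0.4, size=n_new)[:, None]
    return lam * normals[i] + (1.0 - lam) * normals[j]
```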
Continuous monitoring of trained ML models to determine when their predictions should and should not be trusted is essential for their safe deployment. Such a framework ought to be high-performing, explainable, post-hoc, and actionable. We propose TRUST-LAPSE, a "mistrust" scoring framework for continuous model monitoring. We assess the trustworthiness of each input sample's model prediction using a sequence of latent-space embeddings. Specifically, (a) our latent-space mistrust score estimates mistrust using distance metrics (Mahalanobis distance) and similarity metrics (cosine similarity) in the latent space, and (b) our sequential mistrust score determines deviations in correlations over the sequence of past input representations in a non-parametric, sliding-window-based algorithm for actionable continuous monitoring. We evaluate TRUST-LAPSE via two downstream tasks: (1) distributionally shifted input detection and (2) data drift detection. We evaluate across diverse domains (audio and vision) using public datasets and further benchmark our approach on challenging, real-world electroencephalogram (EEG) datasets for seizure detection. Our latent-space mistrust scores achieve…
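A sketch of how the two TRUST-LAPSE score families could be composed from the ingredients the abstract names (Mahalanobis distance, cosine similarity, sliding-window correlation); the specific weighting and windowing below are our assumptions, not the paper's:

```python
import numpy as np

def latent_mistrust(z, train_mean, train_cov_inv, class_centroids) -> float:
    """Combine Mahalanobis distance to the training distribution with cosine
    dissimilarity to the nearest class centroid (equal weighting is ours)."""
    d = z - train_mean
    maha = float(np.sqrt(d @ train_cov_inv @ d))
    best_cos = max(float(c @ z / (np.linalg.norm(c) * np.linalg.norm(z) + 1e-12))
                   for c in class_centroids)
    return maha + (1.0 - best_cos)

def sequential_mistrust(latents: list[np.ndarray], window: int = 32) -> float:
    """Non-parametric sliding-window check: how dissimilar the newest embedding
    is from the recent window of past input representations (needs >= 2 items)."""
    recent = np.asarray(latents[-window:])
    sims = recent[:-1] @ recent[-1] / (
        np.linalg.norm(recent[:-1], axis=1) * np.linalg.norm(recent[-1]) + 1e-12)
    return float(1.0 - np.median(sims))
```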
Grid computing consists of the coordinated use of large sets of diverse, geographically distributed resources for high-performance computation. Effective monitoring of these computing resources is extremely important to allow their efficient use on the Grid. The large number of heterogeneous computing entities available in Grids makes this task challenging. In this work, we describe a Grid monitoring system, called GridMonitor, that captures and makes available the most important information from a large computing facility. The system consists of four tiers: local monitoring, archiving, publishing, and harnessing. This architecture was applied to a large-scale Linux farm and network infrastructure. It can be used by many higher-level Grid services, including scheduling and resource brokering.