AI systems are entering critical domains like healthcare, finance, and defense, yet remain vulnerable to adversarial attacks. While AI red teaming is a primary defense, current approaches force operators into manual, library-specific workflows. Operators spend weeks hand-crafting workflows: assembling attacks, transforms, and scorers. When results fall short, workflows must be rebuilt. As a result, operators spend more time constructing workflows than probing targets for security and safety vulnerabilities. We introduce an AI red teaming agent built on the open-source Dreadnode SDK. The agent creates workflows grounded in 45+ adversarial attacks, 450+ transforms, and 130+ scorers. Operators can probe multi-agent systems as well as multilingual and multimodal targets, focusing on what to probe rather than how to implement it. We make three contributions: 1. Agentic interface. Operators describe goals in natural language via the Dreadnode TUI (Terminal User Interface). The agent handles attack selection, transform composition, execution, and reporting, letting operators focus on red teaming. Weeks compress to hours. 2. Unified framework. A single framework for probing traditional ML models (
While distributed workers rely on scheduled meetings for coordination and collaboration, these meetings can also challenge their ability to focus. Protecting worker focus has been addressed from a technical perspective, but companies are now attempting organizational interventions, such as meeting-free weeks. Recognizing distributed collaboration as a sociotechnical challenge, we first present an interview study with distributed workers participating in meeting-free weeks at an enterprise software company. We identify three orientations workers exhibit during these weeks: Focus, Collaborative, and Time-Bound, each with varying levels and use of unstructured time. These different orientations result in challenges in attention negotiation, which may be suited for technical interventions. This motivated a follow-up study investigating attention negotiation and the compensating mechanisms workers developed during meeting-free weeks. Our framework identified tensions between the attention-getting and attention-delegation strategies. We extend past work to show how workers adapt their virtual collaboration mechanisms in response to organizational interventions
Annotating medical images, particularly for organ segmentation, is laborious and time-consuming. For example, annotating an abdominal organ takes an estimated 30-60 minutes per CT volume, depending on the expertise of the annotator and the size, visibility, and complexity of the organ. Therefore, publicly available datasets for multi-organ segmentation are often limited in data size and organ diversity. This paper proposes an active learning method to expedite the annotation process for organ segmentation and creates the largest multi-organ dataset (by far) with the spleen, liver, kidneys, stomach, gallbladder, pancreas, aorta, and IVC annotated in 8,448 CT volumes, equating to 3.2 million slices. Conventional annotation methods would take an experienced annotator up to 1,600 weeks (or roughly 30.8 years) to complete this task. In contrast, our annotation method has accomplished this task in three weeks (based on an 8-hour workday, five days a week) while maintaining a similar or even better annotation quality. This achievement is attributed to three unique properties of our method: (1) label bias reduction using multiple pre-trained segmentation models, (2) effective erro
In the last decade, research in the field of autonomous vehicles has grown immensely, and there is a wealth of information available for researchers to rapidly establish an autonomous vehicle platform for basic maneuvers. In this paper, we design, implement, and test, in ten weeks, a PD approach to longitudinal control for pedestrian emergency braking. We also propose a lateral controller with a similar design for future testing in lane following. Using widely available tools, we demonstrate the safety of the vehicle in pedestrian emergency braking scenarios.
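As an illustration of the PD approach to longitudinal control mentioned above, the following is a minimal sketch, not the authors' implementation. The error signal (a hypothetical safe distance minus the measured distance to the pedestrian), the gains, the function names, and the normalized brake range [0, 1] are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PDController:
    """Generic proportional-derivative controller (illustrative gains only)."""
    kp: float
    kd: float
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        if dt <= 0:
            raise ValueError("dt must be positive")
        # No derivative term on the first sample, since there is no history yet.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.kd * derivative


def brake_command(controller: PDController,
                  distance_to_pedestrian: float,
                  safe_distance: float,
                  dt: float) -> float:
    """Hypothetical mapping from distance error to a brake command in [0, 1]."""
    # Positive error means the vehicle is closer than the assumed safe distance.
    error = safe_distance - distance_to_pedestrian
    u = controller.update(error, dt)
    # Clamp to the assumed actuator range: 0 = no braking, 1 = full braking.
    return min(1.0, max(0.0, u))
```

In this sketch the derivative term raises the brake command when the gap is closing quickly, which is the usual motivation for PD over purely proportional control.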
Psychologists have long known that an expert in a field not only knows significantly more individual facts/skills than a novice but also has these facts/skills organized into a mental hierarchy that links the individual facts (at the bottom of the hierarchy) together with larger more-encompassing ideas (at the top of the hierarchy). In the Spring quarter of 2012, UC Davis offered 4 sections (about 180 students each) of the first quarter of introductory physics, Physics 9A, covering Newtonian mechanics. One of these sections was a "treatment" group and had the entire 10-week quarter's set of ideas introduced, largely qualitatively, in the first 6 weeks, followed by 4 weeks in which students learned to use those ideas to solve the algebraically complicated problems that physicists prize. The other three sections were organized as usual. The treatment group and one of the other sections were taught by the author and were identical (same homework, discussion, lecture, and lab) except for the organization of the content. After controlling for GPA as well as Force Concept Inventory pretest scores, the treatment group was found, with better than 99% confidence, to score higher on the final e
From a scale analysis of hydrodynamic phenomena having a significant action on the drift of an object in coastal ocean waters, we deduce equations modeling the associated hydrodynamic fields over a time period of several weeks. These models are essentially nonlinear hyperbolic systems of PDEs involving a small parameter. From these models we then extract a simplified yet typical one, for which we prove that the classical solution exists on a time interval that is independent of the small parameter. We then show that the solution converges weak-* as the small parameter goes to zero, and we characterize the equation satisfied by the weak-* limit.
Across almost all scientific disciplines, the instruments that record our experimental data and the methods required for storage and data analysis are rapidly increasing in complexity. This gives rise to the need for scientific communities to adapt on shorter time scales than traditional university curricula allow for, and therefore requires new modes of knowledge transfer. The universal applicability of data science tools to a broad range of problems has generated new opportunities to foster exchange of ideas and computational workflows across disciplines. In recent years, hack weeks have emerged as an effective tool for fostering these exchanges by providing training in modern data analysis workflows. While there are variations in hack week implementation, all events consist of a common core of three components: tutorials in state-of-the-art methodology, peer-learning and project work in a collaborative environment. In this paper, we present the concept of a hack week in the larger context of scientific meetings and point out similarities and differences to traditional conferences. We motivate the need for such an event and present in detail its strengths and challenges. We find
This paper presents a capture of the queries managed by an eDonkey server over almost 10 weeks, leading to the observation of almost 9 billion messages involving almost 90 million users and more than 275 million distinct files. Acquisition and management of such data raise several challenges, which we discuss along with the solutions we developed. We obtain a very rich dataset, orders of magnitude larger than previously available ones, which we provide for public use. We finally present a basic analysis of the obtained data, which already gives evidence of non-trivial features.
Massive Open Online Courses (MOOCs) have become popular platforms for online learning. While MOOCs enable students to study at their own pace, this flexibility makes it easy for students to drop out of class. In this paper, our goal is to predict if a learner is going to drop out within the next week, given clickstream data for the current week. To this end, we present a multi-layer representation learning solution based on the branch and bound (BB) algorithm, which learns from low-level clickstreams in an unsupervised manner, produces interpretable results, and avoids manual feature engineering. In experiments on Coursera data, we show that our model learns a representation that allows a simple model to perform similarly well to more complex, task-specific models, and we show how the BB algorithm enables interpretable results. In our analysis of the observed limitations, we discuss promising future directions.
Large language models (LLMs) are regularly evaluated using benchmark datasets. But what justifies making inferences about an LLM's capabilities based on its answers to a curated set of questions? This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs -- such as AP exams -- are also those used to test people. However, this raises an implication: these benchmarks are only valid tests if LLMs misunderstand concepts in ways that mirror human misunderstandings. Otherwise, success on benchmarks only demonstrates potemkin understanding: the illusion of understanding driven by answers irreconcilable with how any human would interpret a concept. We present two procedures for quantifying the existence of potemkins: one using a specially designed benchmark in three domains, the other using a general procedure that provides a lower-bound on their prevalence. We find that potemkins are ubiquitous across models, tasks, and domains. We also find that these failures reflect not just incorrect understanding, but deeper internal incoherence in concept representations.
We present an all-fiber design for a Tm-based fiber amplifier that can tune over 1992-2065 nm with 300-350 W single-frequency (<100 kHz) output. Over 180 W is achieved out to 2085 nm with <10% ASE content without utilizing ASE spectral filters. The amplifier employs both Tm- and Tm/Ho-doped gain fibers in two preamplifier stages in addition to longer sections of Tm fiber to extend the bandwidth of the Tm-based high-power amplifier to longer wavelengths (>2050 nm). Efficiencies of 55% are realized across the full bandwidth. Roll-off occurs beyond 2085 nm where ASE becomes intractable. The amplifier has an average M^2 value of 1.39 at high power due to the presence of light guided within the fiber pedestal. Estimates of the pedestal light and higher-order mode contents are provided.
The recharge oscillator (RO) model has been successfully used to understand different aspects of the El Niño-Southern Oscillation (ENSO). Fitting the RO to observations and climate model simulations consistently suggests that ENSO is a damped oscillator whose variability is sustained and made irregular by external weather noise. We investigate the methods that have been used to estimate the growth rate of ENSO by applying them to simulations of both damped and self-sustained RO regimes. We find that fitting a linear RO leads to parameters that imply a damped oscillator even when the fitted data were produced by a model that is self-sustained. Fitting a nonlinear RO also leads to a significant bias toward a damped regime. As such, it seems challenging to conclude whether ENSO is a damped or a self-sustained oscillation by fitting such models to observations, and the possibility that ENSO is self-sustained cannot be ruled out.
The Gregorian calendar -- first established for daily use on Friday, October 15th, 1582 by Pope Gregory XIII in Catholic countries -- is presently the most pervasive calendar in the world. As such, algorithms for performing various calendrical computations in accurate, performant, and easily implementable ways are extremely useful in fields like software engineering. In this paper, we present a novel algorithm for determining the day of the week for any date in the Gregorian calendar. Of note, our algorithm does not rely on remembering tables of values. Instead, we encode tables needed for computation using simple linear regression with truncation to adjust for any errors present in our linear models in such a way that no tables have to be recalled. In addition, our algorithm does not require a relabeling of days, weeks, months, or years to values other than their intuitive representations. The algorithm works by taking a date in the Gregorian calendar, calculating the number of days (accounting for leap years using simple linear regression with truncation) that have elapsed since the epoch of the Gregorian calendar in 1582 from the specified date and adding this number modulo 7 to
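The day-counting idea described above can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a plain month-length table and the standard leap-year rule instead of the regression-encoded tables the abstract describes, and all function names are hypothetical. It relies only on the fact, stated above, that October 15, 1582 was a Friday.

```python
# Month lengths for a non-leap year (a plain lookup table, unlike the paper's approach).
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Weekday names ordered so that index 0 matches the epoch, a Friday.
WEEKDAYS = ["Friday", "Saturday", "Sunday", "Monday",
            "Tuesday", "Wednesday", "Thursday"]


def is_leap(year: int) -> bool:
    """Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def day_number(year: int, month: int, day: int) -> int:
    """Days elapsed since a fixed reference, using Gregorian rules throughout."""
    y = year - 1
    days = y * 365 + y // 4 - y // 100 + y // 400  # whole years before `year`
    days += sum(DAYS_IN_MONTH[:month - 1])         # whole months before `month`
    if month > 2 and is_leap(year):
        days += 1                                  # leap day already passed
    return days + day


# Epoch of the Gregorian calendar: Friday, October 15, 1582.
EPOCH = day_number(1582, 10, 15)


def day_of_week(year: int, month: int, day: int) -> str:
    """Weekday of a Gregorian date on or after the 1582 epoch."""
    elapsed = day_number(year, month, day) - EPOCH
    if elapsed < 0:
        raise ValueError("date precedes the Gregorian epoch")
    return WEEKDAYS[elapsed % 7]
```

For example, `day_of_week(2000, 1, 1)` returns `"Saturday"`.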
We study quantum advantage in the 1-step graph domination game on cycle graphs numerically, analytically, and through the use of noisy intermediate-scale quantum (NISQ) processors. We find explicit strategies that realise the recently found upper bounds for small graphs and generalise them to larger cycles. We demonstrate that NISQ computers realise the predicted quantum advantages with high accuracy.
Background and Objective: Several studies confirm that the age of hens has a tremendous impact on external and internal egg quality characteristics. Egg production could be at serious risk if egg quality characteristics and age of hens are not seriously considered. This study was conducted to analyze the phenotypic correlations between some internal and external egg quality characteristics in old laying hens. Materials and Methods: A total of 288 eggs from 85-week-old Hy-Line Brown laying hens were collected over 3 weeks and their internal and external egg characteristics were evaluated. Results: Phenotypic correlations between egg quality characteristics in old laying hens indicate a negative impact on shell and albumen quality, whereas yolk quality characteristics were not affected. Conclusion: This study helps to understand that keeping laying hens beyond 80 weeks of age would have a negative impact on egg quality characteristics.
Opportunistic pharmacokinetic (PK) studies have sparse and imbalanced clinical measurement data, and the impact of sample time errors is an important concern when seeking accurate estimates of treatment response. We evaluated an approximate Bayesian model for individualized pharmacokinetics in the presence of time recording errors (TREs), considering both a short and long infusion dosing pattern. We found that the long infusion schedule generally had lower bias in estimates of the pharmacodynamic (PD) endpoint relative to the short infusion schedule. We investigated three different design strategies for their ability to mitigate the impact of TREs: (i) shifting blood draws taken during an active infusion to the post-infusion period, (ii) identifying the best next sample time by minimizing bias in the presence of TREs, and (iii) collecting additional information on a subset of patients based on estimate uncertainty or quadrature-estimated variance in the presence of TREs. Generally, the proposed strategies led to a decrease in bias of the PD estimate for the short infusion schedule, but had a negligible impact for the long infusion schedule. Dosing regimens with periods of high non-
Coastal upwelling, driven by alongshore winds and characterized by cold sea surface temperatures and high upper-ocean nutrient content, is an important physical process sustaining some of the oceans' most productive ecosystems. To fully understand the ocean properties in eastern boundary upwelling systems, it is important to consider the depth of the source waters being upwelled, as it affects both the SST and the transport of nutrients toward the surface. Here, we construct an upwelling source depth distribution for parcels at the surface in the upwelling zone. We do so using passive tracers forced at the domain boundary for every model depth level to quantify their contributions to the upwelled waters. We test the dependence of this distribution on the strength of the wind stress and stratification using high-resolution regional ocean simulations of an idealized coastal upwelling system. We also present an efficient method for estimating the mean upwelling source depth. Furthermore, we show that the standard deviation of the upwelling source depth distribution increases with increasing wind stress and decreases with increasing stratification. These results can be applied to bette
As amorphous materials get jammed, both geometric and dynamic heterogeneity are observed. We investigate the correlation between the local geometric heterogeneity and local rearrangements in a slowly compressed bidisperse quasi-two-dimensional emulsion system. The compression is driven by evaporation of the continuous phase, and causes the area packing fraction to increase from 0.88 to 0.99. We quantify the structural heterogeneity of the system using the radical Voronoi tessellation following the method of [Rieser et al., Phys. Rev. Lett. 116, 088001 (2016)]. We define two structural quantities characterizing local structure, the first of which considers nearest neighbors and the second of which includes information from second nearest neighbors. We find that droplets in heterogeneous local regions are more likely to have local rearrangements. These rearrangements are generally T1 events where two droplets converge toward a void, and two droplets move away from the void to make room for the converging droplets. Thus the presence of the voids tends to orient the T1 events. The presence of a correlation between the structural quantities and the rearrangement dynamics remains qualitativ
Digital competence (DC) is a broad set of skills, attitudes, and knowledge for confident, critical and responsible use of digital technologies in every aspect of life. DC is fundamental to all people in conducting a productive and fulfilling life in an increasingly digital world. However, prejudices, misconceptions, and lack of awareness reduce the diffusion of DC, hindering digital transformation and preventing countries and people from realising their full potential. Teaching Informatics in the curriculum is increasingly supported by the institutions but faces serious challenges, such as teacher upskilling and support, and will require several years to observe sizeable outcomes. In response, grassroots movements promoting computing literacy in an informal setting have grown, including EU Code Week, whose vision is to develop computing skills while promoting diversity and raising awareness of the importance of digital skills. Code Week participation is a form of public engagement that could be affected by socio-economic and demographic factors, as any other form of participation. The aim of the manuscript is twofold: first, to offer a detailed and comprehensive statistical descrip