Literature reviews show stated-preference studies, used to understand the values individuals place on health and health care, are increasingly administered online, potentially maximising respondent access and allowing for enhanced response quality. Online respondents may often choose whether to use a desktop or laptop personal computer (PC), tablet or smartphone, all with different screen sizes and modes of data entry, to complete the survey. To avoid differences in measurement errors, frequently respondents are asked to complete the surveys on a PC despite evidence that handheld devices are increasingly used for internet browsing. As yet, it is unknown if or how the device used to access the survey affects responses and/or the subsequent valuations derived. This study uses data from a discrete choice experiment (DCE) administered online to elicit preferences of a general population sample of females for a national breast screening programme. The analysis explores differences in key outcomes such as completion rates, engagement with the survey materials, respondent characteristics, response time, failure of an internal validity test and health care preferences for (1) handheld devices and (2) PC users. Preferences were analysed using a fully correlated random parameter logit (RPL) model to allow for unexplained scale and preference heterogeneity. One thousand respondents completed the survey in its entirety. The most popular access devices were PCs (n = 785), including Windows (n = 705) and Macbooks (n = 69). Two-hundred and fifteen respondents accessed the survey on a handheld device. Most outcomes related to survey behaviour, including failure of a dominance check, 'flat lining', self-reported attribute non-attendance (ANA) or respondent-rated task difficulty, did not differ by device type (p > 0.100). Respondents accessing the survey using a PC were generally quicker (median time to completion 14.5 min compared with 16 min for those using handheld devices) and were significantly less likely to speed through a webpage. Although there was evidence of preference intensity (taste) or variability (scale) heterogeneity across respondents in the sample, it was not driven by the access device. Overall, we find that neither preferences nor choice behaviour is associated with the type of access device, as long as respondents are presented with question formats that are easy to use on small touchscreens. Health preference researchers should optimise preference instruments for a range of devices and encourage respondents to complete the surveys using their preferred device. However, we suggest that access device characteristics should be gathered and included when reporting results.
We present swarm-attack, an open-source adversarial testing framework in which multiple lightweight LLM agents coordinate through shared memory, parallel exploration, and evolutionary optimization. Together, our results demonstrate that both safety bypass of frontier models and software vulnerability discovery, i.e., the capability class that motivated restricted release of Anthropic's Mythos Preview, are achievable at effectively zero cost using commodity hardware and openly available models. We report two experiments. In the first, five instances of a 1.2 billion parameter model conducted 225 jailbreak attacks each against GPT-4o and Claude Sonnet~4. Against GPT-4o, the swarm achieved an Effective Harm Rate of 45.8%, producing 49 critical-severity breaches; against Claude Sonnet-4, the Effective Harm Rate was 0% despite a 40% technical success rate. In the second experiment, the same models performed combined source code analysis and binary fuzzing against a vulnerable C application with 9 planted CWEs. With a hand-crafted exploit seed corpus, regex pattern detection, and AddressSanitizer-based crash classification, the pipeline recovers 9 of 9 vulnerabilities (100% recall) in ap
How few parameters do we really need to forecast a periodic time series? An hourly electricity series, reshaped as a 24-row matrix with one column per day, is approximately rank-1: a daily shape modulated by a daily level (median centered rank-1 energy 0.82 on GIFT-Eval). Should we learn the shape? Smoothing, shrinkage, and low-rank fits all seem like obvious upgrades over the simple average of the last K=2 cycles. On all 97 GIFT-Eval configurations, we tested 8 such alternatives (e.g., Fourier, EWMA, James-Stein, rank-r SVD): none significantly beats the frozen baseline under Holm correction; two are significantly worse. The resulting method, FLAIR, is (a) Effective: matches PatchTST on aggregate GIFT-Eval (relMASE 0.838 vs 0.849); (b) Compact: 28 scalars for hourly, 57 for weekly; (c) Fast: 22 minutes on one CPU core of a MacBook Pro; (d) Closed-form & Hands-Off: one SVD per period candidate, GCV-averaged Ridge, no GPU, no pre-training, no per-task tuning. In the high-rank-1, many-cycle regime, extra flexibility is estimation noise.
Project Yanasse presents a method for discovering new proofs of theorems in one area of mathematics by transferring proof strategy patterns (e.g., Lean 4 tactic invocation patterns) from a structurally distant area. The system extracts tactic usage distributions across 27 top-level areas of Mathlib (217,133 proof states), computes z-scores to identify tactics that are heavily used in a source area but rare or absent in a target area, matches source and target proof states via GPU-accelerated NP-hard analogy (running on a MacBook Air via Apple's MPS backend), and then asks an AI reasoning agent to semantically adapt--not symbol-substitute--the source tactics invocation pattern to the target theorem. In this first part of the study, the method is applied to the pair Probability -> Representation Theory, producing 4 Lean-verified new proofs out of 10 attempts (40%). The proofs compile with zero sorry declarations. The key finding is that tactic schemas decompose into a head (domain-gated, rarely transfers) and a modifier (domain-general, often transfers): filter upwards's head fails in representation theory (no Filter structure), but its [LIST] with ω modifier transfers cleanly as
Blockchain-based IoT data sharing systems increasingly adopt a hybrid architecture in which a permissioned ledger stores tamper-evident metadata while encrypted payloads are placed in content-addressed storage. In such systems, a central security bottleneck is key access control: enforcing dynamic, multi-user authorization for releasing or using bulk-data decryption keys. Existing designs often rely on always-online RBAC or smart-contract gates that return keys to authorized users, reintroducing a trusted online policy enforcement point and weakening auditability. This paper presents a revocation-ready key management layer that replaces online key release with ciphertext key publication: the ledger records metadata of the form (CID, CK, PolicyID, epoch), where CK is a CP-ABE ciphertext encapsulating an AES-GCM key. Users retrieve CK from the ledger and decrypt locally if their attributes satisfy the policy. To support forward revocation and policy evolution without re-encrypting large files, the design introduces an epoch/time-bound attribute and a lightweight CK-rotation protocol that updates only small ciphertext keys and ledger entries. We implement a minimal end-to-end prototyp
Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generations are key to avoid the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent. Finally, we run adversarial post-training to both accelerate inference and improve generation quality, reducing the number of inference steps while improving fidelity and prompt adherence. Stable Audio 3 models are trained on licensed and Creative Commons data to generate music and sounds in less than a 2s on an H200 GPU and less than a few seconds on a MacBook Pro M4. We release the weights of small and medium, that can run on consumer-grade hardware, together with their training and inference pipeline.
Lossless compression is essential for efficient data storage and transmission. Although learning-based lossless compressors achieve strong results, most of them are designed for a single modality, leading to redundant compressor deployments in multi-modal settings. Designing a unified multi-modal compressor is critical yet challenging, as different data types vary largely in format, dimension, and statistics. Multi-modal large language models offer a promising resolution but remain too complex for practical use. Thus, we propose \textbf{OmniZip}, \textbf{a unified and lightweight lossless compressor for multi-modal data (like image, text, speech, tactile, database, and gene sequence)}. Built on a lightweight backbone, OmniZip incorporates three key components to enable efficient multi-modal lossless compression: a modality-unified tokenizer that reversibly transforms diverse data into tokens, a modality-routing context learning mechanism that enables flexible multi-modal context modeling, and a modality-routing feedforward design that further enhances the model's nonlinear representation flexibility. A reparameterization training strategy is used to enhance model capacity. OmniZip
Robots deployed in dynamic environments must contend with environment-driven changes that reshape computation at runtime: new tasks may appear, precedence relations can shift, and overall workload structure evolves, all of which degrade performance, especially when multi-task inference is required under tight resource and real-time budgets. We present RED, a real-time scheduling framework for multi-task deep neural network workloads on resource-constrained robotic platforms that adapts to Robotic Environmental Dynamics (RED) while preserving end-to-end timing guarantees under modeling assumptions. The core of RED is a deadline-aware scheduler that assigns intermediate sub-deadlines, allowing it to accommodate evolving computation graphs and asynchronous inference induced by unpredictable conditions. The framework also supports flexible deployment of MIMONet (multi-input multi-output neural networks), commonly used in multi-tasking robots to alleviate memory pressure through weight sharing. RED explicitly leverages this shared-parameter property via a workload refinement and graph-reconstruction procedure that aligns MIMONet structure with schedulability requirements, improving comp
Accurate network-traffic forecasting enables proactive capacity planning and anomaly detection in Internet Service Provider (ISP) networks. Recent advances in time-series foundation models (TSFMs) have demonstrated strong zero-shot and few-shot generalization across diverse domains, yet their effectiveness for computer networking remains unexplored. This paper presents a systematic evaluation of a TSFM, IBM's Tiny Time Mixer (TTM), on the CESNET-TimeSeries24 dataset, a 40-week real-world ISP telemetry corpus. We assess TTM under zero-shot and few-shot settings across multiple forecasting horizons (hours to days), aggregation hierarchies (institutions, subnets, IPs), and temporal resolutions (10-minute and hourly). Results show that TTM achieves consistent accuracy (RMSE 0.026-0.057) and stable $R^2$ scores across horizons and context lengths, outperforming or matching fully trained deep learning baselines such as GRU and LSTM. Inference latency remains under 0.05s per 100 points on a single MacBook Pro using CPU-only computation, confirming deployability without dedicated GPU or MPS acceleration. These findings highlight the potential of pretrained TSFMs to enable scalable, efficie
Deploying local large language models and vision-language models on edge devices requires balancing accuracy with constrained computational and energy budgets. Although graphics processors dominate modern artificial-intelligence deployment, most consumer hardware--including laptops, desktops, industrial controllers, and embedded systems--relies on central processing units. Despite this, the computational laws governing central-processing-unit-only inference for local language and vision-language workloads remain largely unexplored. We systematically benchmark large language and vision-language models on two representative central-processing-unit tiers widely used for local inference: a MacBook Pro M2, reflecting mainstream laptop-class deployment, and a Raspberry Pi 5, representing constrained, low-power embedded settings. Using a unified methodology based on continuous sampling of processor and memory usage together with area-under-curve integration, we characterize how computational load scales with input text length for language models and with image resolution for vision-language models. We uncover two empirical scaling laws: (1) computational cost for language-model inference
Pruning large pre-trained transformers in a data-scarce scenario is challenging, as it often requires massive retraining data to recover performance. For instance, Distill-Whisper prunes Whisper by 40 and retrains on 21,000 hours of speech, far beyond what is available for most languages. Can Whisper be made lighter and faster for edge devices in data-scarce settings? Focusing on Bambara with only 32h of speech-to-text data, we propose a new pruning recipe. Instead of vocabulary pruning, which is unsuitable due to frequent code-switching by Bambara speakers, we compress the embeddings with low-rank decomposition and feature distillation. Rather than removing layers, we merge them to limit performance loss. The final model preserves 90 of the original performance while being 48 smaller and 2.15x faster on a MacBook Air M1.
The misuse of Large Language Models (LLMs) to infer emotions from text for malicious purposes, known as emotion inference attacks, poses a significant threat to user privacy. In this paper, we investigate the potential of Apple Intelligence's writing tools, integrated across iPhone, iPad, and MacBook, to mitigate these risks through text modifications such as rewriting and tone adjustment. By developing early novel datasets specifically for this purpose, we empirically assess how different text modifications influence LLM-based detection. This capability suggests strong potential for Apple Intelligence's writing tools as privacy-preserving mechanisms. Our findings lay the groundwork for future adaptive rewriting systems capable of dynamically neutralizing sensitive emotional content to enhance user privacy. To the best of our knowledge, this research provides the first empirical analysis of Apple Intelligence's text-modification tools within a privacy-preservation context with the broader goal of developing on-device, user-centric privacy-preserving mechanisms to protect against LLMs-based advanced inference attacks on deployed systems.
We introduce Nessie, a galaxy group finder implemented in Rust and distributed as both a Python and R package. Nessie employs the friends-of-friends (FoF) algorithm and requires only on-sky position and redshift as input, making it immediately applicable to surveys that lack a well-defined luminosity function. We implement several algorithmic optimizations including binary search and k-d tree pre-selection that significantly improve performance by reducing unnecessary galaxy pair checks. To validate the accuracy of Nessie, we tune its parameters using a suite of GALFORM mock lightcones and achieve a strong Figure of Merit. We further demonstrate its reliability by applying it to both the GAMA and SDSS surveys, where it produces group catalogues consistent with those in the literature. Additional functionality is included for comparison with simulations and mock catalogues. Benchmarking on a standard MacBook Pro (M3 chip with 11 cores) shows that version 1 of Nessie can process about 1 million galaxies in around 10 seconds, highlighting its speed and suitability for next-generation redshift surveys.
We derive an algorithm for computing a classic nonlinearity correction -- applicable to constant and uniform illumination -- in the presence of read noise and photon noise. The algorithm operates simultaneously on many nondestructive ramps at a range of count rates and directly computes the function transforming measured counts into linearized counts. We also compute chi squared for the corrected ramps, enabling the user to identify the polynomial degree beyond which chi squared ceases to improve significantly. The computational cost of our algorithm is linear in the number of reads and ramps, reaching ~100 hours to derive a correction for all 4096 x 4096 pixels of a Hawaii-4 RG detector from 186 illuminated 55-read ramps on a 2023 Macbook Pro laptop (~10,000 reads per pixel). We identify a potential source of bias in the nonlinearity correction when combining ramps of very different illuminations, together with effective mitigations. We apply our algorithm to a random set of pixels from the Roman Space Telescope's Wide Field Instrument. We find that a >=9th order nonlinearity correction is needed, at which point chi squared is close to its theoretically expected value and beyon
The recent widespread adoption of Large Language Models (LLMs) and machine learning in general has sparked research interest in exploring the possibilities of deploying these models on smaller devices such as laptops and mobile phones. This creates a need for frameworks and approaches that are capable of taking advantage of on-device hardware. The MLX framework was created to address this need. It is a framework optimized for machine learning (ML) computations on Apple silicon devices, facilitating easier research, experimentation, and prototyping. This paper presents a performance evaluation of MLX, focusing on inference latency of transformer models. We compare the performance of different transformer architecture implementations in MLX with their Pytorch counterparts. For this research we create a framework called MLX-transformers which includes different transformer implementations in MLX and downloads the model checkpoints in pytorch and converts it to the MLX format. By leveraging the advanced architecture and capabilities of Apple Silicon, MLX-Transformers enables seamless execution of transformer models directly sourced from Hugging Face, eliminating the need for checkpoint
The computational complexity of large language model (LLM) inference significantly constrains their deployment efficiency on edge devices. In contrast, small language models offer faster decoding and lower resource consumption but often suffer from degraded response quality and heightened susceptibility to hallucinations. To address this trade-off, collaborative decoding, in which a large model assists in generating critical tokens, has emerged as a promising solution. This paradigm leverages the strengths of both model types by enabling high-quality inference through selective intervention of the large model, while maintaining the speed and efficiency of the smaller model. In this work, we present a novel collaborative decoding inference system that allows small models to perform on-device inference while selectively consulting a cloud-based large model for critical token generation. Remarkably, the system achieves a 60% performance gain on CommonsenseQA using only a 0.5B model on an M1 MacBook, with under 7% of tokens generation uploaded to the large model in the cloud.
A growing issue within conservation bioacoustics is the task of analysing the vast amount of data generated from the use of passive acoustic monitoring devices. In this paper, we present an alternative AI model which has the potential to help alleviate this problem. Our model formulation addresses the key issues encountered when using current AI models for bioacoustic analysis, namely the: limited training data available; environmental impact, particularly in energy consumption and carbon footprint of training and implementing these models; and associated hardware requirements. The model developed in this work uses associative memory via a transparent, explainable Hopfield neural network to store signals and detect similar signals which can then be used to classify species. Training is rapid ($3$\,ms), as only one representative signal is required for each target sound within a dataset. The model is fast, taking only $5.4$\,s to pre-process and classify all $10384$ publicly available bat recordings, on a standard Apple MacBook Air. The model is also lightweight with a small memory footprint of $144.09$\,MB of RAM usage. Hence, the low computational demands make the model ideal for
Large Language Models (LLMs) excel at evaluating machine translation (MT), but their scale and cost hinder deployment on edge devices and in privacy-sensitive workflows. We ask: how small can you get while still detecting meaning-altering translation errors? Focusing on English->German Critical Error Detection (CED), we benchmark sub-2B models (LFM2-350M, Qwen-3-0.6B/1.7B, Llama-3.2-1B-Instruct, Gemma-3-1B) across WMT21, WMT22, and SynCED-EnDe-2025. Our framework standardizes prompts, applies lightweight logit-bias calibration and majority voting, and reports both semantic quality (MCC, F1-ERR/F1-NOT) and compute metrics (VRAM, latency, throughput). Results reveal a clear sweet spot around one billion parameters: Gemma-3-1B provides the best quality-efficiency trade-off, reaching MCC=0.77 with F1-ERR=0.98 on SynCED-EnDe-2025 after merged-weights fine-tuning, while maintaining 400 ms single-sample latency on a MacBook Pro M4 Pro (24 GB). At larger scale, Qwen-3-1.7B attains the highest absolute MCC (+0.11 over Gemma) but with higher compute cost. In contrast, ultra-small models (0.6B) remain usable with few-shot calibration yet under-detect entity and number errors. Overall, comp
The technological transition of MacBook charging solutions from MagSafe to USB-C, followed by a return to MagSafe 3, encapsulates the dynamic interplay between technological advancement, environmental considerations, and economic factors. This study delves into the broad implications of these charging technology shifts, particularly focusing on the environmental repercussions associated with electronic waste and the economic impacts felt by both manufacturers and consumers. By investigating the lifecycle of these technologies - from development and market introduction through to their eventual obsolescence - this paper underscores the importance of devising strategies that not only foster technological innovation but also prioritize environmental sustainability and economic feasibility. This comprehensive analysis illuminates the crucial factors influencing the evolution of charging technologies and their wider societal and environmental implications, advocating for a balanced approach that ensures technological progress does not compromise ecological health or economic stability.
We present a deep neural network (DNN) accelerated Hamiltonian Monte Carlo (HMC) algorithm called DeepHMC for the inference of binary neutron star systems. The HMC is a non-random walk sampler that uses background gradient information to accelerate the convergence of the sampler. While faster converging than a random-walk sampler, in theory by a factor of the dimensionality of the problem, a known computational bottleneck for HMC algorithms is the calculation of gradients of the log-likelihood. We demonstrate that Hamiltonian trajectories based on a DNN gradients are 30 times faster than those based on the relative binning gradients, and 7000 times faster than trajectories based on a naive likelihood gradient calculation. Using the publicly available 128 second LVK data set for the binary neutron star mergers GW170817 and GW190425, we show that not only does DeepHMC produce produces highly accurate and consistent results with the LVK public data, but acquires 5000 statistically independent samples (SIS) in the $12D$ parameter space in approximately two hours on a Macbook pro for GW170817, with a cost of $<1$ second/SIS, and 2.5 days for GW190425, with a cost of $\sim25$ seconds/