Modeling single-cell gene expression across diverse biological and technical conditions is crucial for characterizing cellular states and simulating unseen scenarios. Existing methods often treat genes as independent tokens, overlooking their high-level biological relationships and leading to poor performance. We introduce SAVE, a unified generative framework based on conditional Transformers for multi-condition single-cell modeling. SAVE leverages a coarse-grained representation by grouping semantically related genes into blocks, capturing higher-order dependencies among gene modules. A Flow Matching mechanism and condition-masking strategy further enhance flexible simulation and enable generalization to unseen condition combinations. We evaluate SAVE on a range of benchmarks, including conditional generation, batch effect correction, and perturbation prediction. SAVE consistently outperforms state-of-the-art methods in generation fidelity and extrapolative generalization, especially in low-resource or combinatorially held-out settings. Overall, SAVE offers a scalable and generalizable solution for modeling complex single-cell data, with broad utility in virtual cell synthesis and
Multimodal deepfakes can exhibit subtle visual artifacts and cross-modal inconsistencies, which remain challenging to detect, especially when detectors are trained primarily on curated synthetic forgeries. Such synthetic dependence can introduce dataset and generator bias, limiting scalability and robustness to unseen manipulations. We propose SAVe, a self-supervised audio-visual deepfake detection framework that learns entirely on authentic videos. SAVe generates on-the-fly, identity-preserving, region-aware self-blended pseudo-manipulations to emulate tampering artifacts, enabling the model to learn complementary visual cues across multiple facial granularities. To capture cross-modal evidence, SAVe also models lip-speech synchronization via an audio-visual alignment component that detects temporal misalignment patterns characteristic of audio-visual forgeries. Experiments on FakeAVCeleb and AV-LipSync-TIMIT demonstrate competitive in-domain performance and strong cross-dataset generalization, highlighting self-supervised learning as a scalable paradigm for multimodal deepfake detection.
For video-text retrieval, the use of CLIP has been a de facto choice. Since CLIP provides only image and text encoders, this consensus has led to a biased paradigm that entirely ignores the soundtrack of videos. While several attempts have been made to reintroduce audio -- typically by incorporating an audio encoder and fusing its output with visual features -- these methods face two challenges: ineffective representation of speech content and suboptimal vision-audio fusion. To address these issues jointly, we propose SAVE, a Speech Aware Video rEpresentation learning method. SAVE improves upon AVIGATE, a SOTA audiovisual method, with a dedicated speech branch for more effective speech embedding. Furthermore, we introduce soft-ALBEF for early vision-audio alignment that facilitates fusion. Extensive experiments on five benchmarks show that SAVE compares favorably against the SOTA, outperforming AVIGATE by +4.1% on MSRVTT-9k, +1.9% on MSRVTT-7k, +2.5% on VATEX, +9.8% on Charades, and +2.1% on LSMDC, as measured by the SumR metric.
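The abstract reports gains in SumR without defining it. The following is a minimal sketch, assuming the common retrieval convention that SumR is the sum of Recall@1, Recall@5, and Recall@10 (in percent) and that text query i is matched to video i. The function names and the layout of the similarity matrix are illustrative assumptions, not taken from SAVE.

```python
import numpy as np

def recall_at_k(sim, k):
    """Recall@k in percent.

    sim[i, j] is the similarity of text query i to video j.
    Assumption: the correct video for query i is video i.
    """
    match_scores = np.diag(sim)[:, None]
    # Rank of the correct video = number of videos scored strictly higher.
    ranks = (sim > match_scores).sum(axis=1)
    return 100.0 * np.mean(ranks < k)

def sum_r(sim, ks=(1, 5, 10)):
    """SumR under the assumed definition: R@1 + R@5 + R@10."""
    return sum(recall_at_k(sim, k) for k in ks)
```

Under this definition SumR ranges from 0 to 300, so a difference in SumR aggregates changes across all three recall levels.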
Although Multimodal Large Language Models (MLLMs) have advanced substantially, they remain vulnerable to object hallucination caused by language priors and visual information loss. To address this, we propose SAVE (Sparse Autoencoder-Driven Visual Information Enhancement), a framework that mitigates hallucination by steering the model along Sparse Autoencoder (SAE) latent features. A binary object-presence question-answering probe identifies the SAE features most indicative of the model's visual information processing, referred to as visual understanding features. Steering the model along these identified features reinforces grounded visual understanding and effectively reduces hallucination. With its simple design, SAVE outperforms state-of-the-art training-free methods on standard benchmarks, achieving a 10%p improvement in CHAIR_S and consistent gains on POPE and MMHal-Bench. Extensive evaluations across multiple models and layers confirm the robustness and generalizability of our approach. Further analysis reveals that steering along visual understanding features suppresses the generation of uncertain object tokens and increases attention to image tokens, mitigating hallucina
Wi-Fi facilitates the Internet connectivity of billions of devices worldwide, making it an indispensable technology for modern life. Wi-Fi networks are becoming significantly denser, making energy consumption and its effects on operational costs and environmental sustainability crucial considerations. Wi-Fi has already introduced several mechanisms to enhance the energy efficiency of non-Access Point (non-AP) stations (STAs). However, the reduction of energy consumption of APs has never been a priority. Always-on APs operating at their highest capabilities consume significant power, which affects the energy costs of the infrastructure owner, aggravates the environmental impact, and decreases the lifetime of battery-powered APs. IEEE 802.11bn, which will be the basis of Wi-Fi 8, makes a big leap forward by introducing the AP Power Save (PS) framework. In this article, we describe and analyze the main proposals discussed in the IEEE 802.11bn Task Group (TGbn), such as Scheduled Power Save, (Semi-)Dynamic Power Save, and Cross-Link Power Save. We also consider other proposals that are being discussed in TGbn, namely the integration of Wake-up Radios (WuRs) and STA offloading. We then
Global warming is often framed in broad planetary numbers such as the 1.5°C and 2°C warming thresholds, creating the false impression that individual corporations' efforts to reduce emissions are meaningless in the absence of collective action. This perspective causes companies to reduce ambition towards voluntarily cutting emissions, as they believe their pollution has negligible impacts on its own. Reframing the issue to focus on the life-saving potential of independent corporate actions empowers companies to act and holds them accountable for inaction. Here, we show the results from an innovative climate-health modeling technique which calculates the avoided deaths from sustainability efforts for 3,084 companies spanning a range of sizes and sectors. From the reported emissions and planned emissions reductions, we create scenarios for 2020-2049 with and without companies' pledged emissions cuts and calculate the resulting warming from 2020-2100 using a climate emulator. We then use temperatures from these scenarios to calculate the deaths resulting from warming by using mortality damage functions. We find that more than 92% of these companies stand to save at least one life by follo
Imagine activating new robots meant to aid staff in an elder care facility, only to discover the robots are counterproductive. They undermine the most meaningful moments of the jobs and increase staff workloads, because robots demand care too. Eventually, they're returned. This vignette captures key elements of James Adrian Wright's ethnography, "Robots Won't Save Japan", an essential resource for understanding the state of elder care robotics. Wright's rich ethnographic interviews and observations challenge the prevailing funding, research, and development paradigms for robotics. Elder care residents tend to be Disabled, so this review article augments Wright's insights with overlooked perspectives from Disability and Robotics research. This article highlights how care recipients' portrayal suggests that Paro, a plush robot seal, might perform better than the care team and author indicated -- leading to insights that support urgent paradigm shifts in elder care, ethnographic studies, and robotics. It presents some of the stronger technical status quo counter-arguments to the book's core narratives, then confronts their own assumptions. Furthermore, it explores exceptional cases wh
The essential requirement for fault-tolerant quantum computation (FTQC) is a total protocol design that achieves a fair balance of all the critical factors relevant to its practical realization, such as the space overhead, the threshold, and the modularity. A major obstacle in realizing FTQC with conventional protocols, such as those based on the surface code and the concatenated Steane code, has been the space overhead, i.e., the required number of physical qubits per logical qubit. Protocols based on high-rate quantum low-density parity-check (LDPC) codes have attracted considerable attention as a way to reduce the space overhead, but problematically, the existing fault-tolerant protocols for such quantum LDPC codes sacrifice the other factors. Here we construct a new fault-tolerant protocol to meet these requirements simultaneously based on more recent progress on the techniques for concatenated codes rather than quantum LDPC codes, achieving a constant space overhead, a high threshold, and flexibility in modular architecture designs. In particular, under a physical error rate of $0.1\%$, our protocol reduces the space overhead to achieve the logical CNOT error rates $10^{-10}$ and $10^{
For the vertical Bell Laboratories layered space-time architecture (V-BLAST), the original fast recursive algorithm was proposed, and several improvements were then proposed successively to further reduce the computational complexity. The improvements include the inverse of a partitioned matrix and the interference cancellation scheme adopted by the known recursive algorithm with the least computations. In this paper, the former is applied to improve the latter into an interference cancellation scheme with memory saving. The resulting recursive algorithm saves memory without sacrificing speed compared to the known recursive algorithm with the least computations, while it achieves a speedup of 1.86 and saves about half the memory compared to the known recursive algorithm with the least memory.
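The inverse of a partitioned matrix mentioned in the abstract can be illustrated with the textbook block-inversion identity. The sketch below is a generic illustration, not the paper's algorithm: it grows the inverse of a Hermitian positive definite matrix R one row and column at a time using the Schur complement. The function names and the regularized Gram matrix in the usage note are assumptions made for the example.

```python
import numpy as np

def grow_inverse(Q_prev, v, beta):
    """Given Q_prev = inv(R_prev), return the inverse of the bordered matrix
    [[R_prev, v], [v^H, beta]], assuming R_prev is Hermitian."""
    n = Q_prev.shape[0]
    w = Q_prev @ v
    # Schur complement of R_prev; real for a Hermitian positive definite matrix.
    s = (beta - np.vdot(v, w)).real
    Q = np.empty((n + 1, n + 1), dtype=complex)
    Q[:n, :n] = Q_prev + np.outer(w, w.conj()) / s
    Q[:n, n] = -w / s
    Q[n, :n] = -w.conj() / s
    Q[n, n] = 1.0 / s
    return Q

def recursive_inverse(R):
    """Invert a Hermitian positive definite R by successive bordering."""
    Q = np.array([[1.0 / R[0, 0]]], dtype=complex)
    for n in range(1, R.shape[0]):
        Q = grow_inverse(Q, R[:n, n], R[n, n])
    return Q
```

As a usage example, for a channel matrix H one could form R = H^H H + 0.1 I and compare recursive_inverse(R) against np.linalg.inv(R). Each bordering step costs O(n^2) operations instead of a fresh O(n^3) inversion, which is the kind of saving recursive schemes exploit.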
Text-to-Image (T2I) diffusion models have achieved remarkable success in synthesizing high-quality images conditioned on text prompts. Recent methods have tried to replicate the success by either training text-to-video (T2V) models on a very large number of text-video pairs or adapting T2I models on text-video pairs independently. Although the latter is computationally less expensive, it still takes a significant amount of time for per-video adaptation. To address this issue, we propose SAVE, a novel spectral-shift-aware adaptation framework, in which we fine-tune the spectral shift of the parameter space instead of the parameters themselves. Specifically, we take the spectral decomposition of the pre-trained T2I weights and only update the singular values while freezing the corresponding singular vectors. In addition, we introduce a spectral shift regularizer aimed at placing tighter constraints on larger singular values compared to smaller ones. This form of regularization enables the model to grasp finer details within the video that align with the provided textual descriptions. We also offer theoretical justification for our proposed regularization technique. Since we are only de
On-device training is an emerging approach in machine learning where models are trained on edge devices, aiming to enhance privacy protection and real-time performance. However, edge devices typically possess restricted computational power and resources, making it challenging to perform computationally intensive model training tasks. Consequently, reducing resource consumption during training has become a pressing concern in this field. To this end, we propose SCoTTi (Save Computation at Training Time), an adaptive framework that addresses the aforementioned challenge. It leverages an optimizable threshold parameter to effectively reduce the number of neuron updates during training which corresponds to a decrease in memory and computation footprint. Our proposed approach demonstrates superior performance compared to the state-of-the-art methods regarding computational resource savings on various commonly employed benchmarks and popular architectures, including ResNets, MobileNet, and Swin-T.
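The abstract does not specify how the threshold decides which neurons are updated. The following is a purely illustrative sketch, assuming a hypothetical criterion in which a neuron's weights are updated only when the norm of its gradient exceeds a threshold tau. The threshold is a fixed scalar here, whereas the abstract describes an optimizable one; the function name and criterion are assumptions, not SCoTTi's actual rule.

```python
import numpy as np

def gated_update(W, grad, lr, tau):
    """One SGD step that skips neurons with small gradients.

    W, grad: arrays of shape (n_neurons, n_inputs).
    tau: hypothetical threshold on the per-neuron gradient norm.
    Returns the updated weights and the boolean mask of updated neurons.
    """
    norms = np.linalg.norm(grad, axis=1)
    active = norms > tau
    W_new = W.copy()
    # Only rows (neurons) above the threshold are touched.
    W_new[active] -= lr * grad[active]
    return W_new, active
```

In this toy setting the fraction of False entries in the returned mask is the share of neuron updates that were skipped, which is the quantity a threshold of this kind trades against accuracy.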
The increasing demand for edge computing is leading to a rise in energy consumption from edge devices, which can have significant environmental and financial implications. To address this, in this paper we present a novel method to enhance the energy efficiency while speeding up computations by distributing the workload among multiple containers in an edge device. Experiments are conducted on two Nvidia Jetson edge boards, the TX2 and the AGX Orin, exploring how using a different number of containers can affect the energy consumption and the computational time for an inference task. To demonstrate the effectiveness of our splitting approach, a video object detection task is conducted using an embedded version of the state-of-the-art YOLO algorithm, quantifying the energy and the time savings achieved compared to doing the computations on a single container. The proposed method can help mitigate the environmental and economic consequences of high energy consumption in edge computing, by providing a more sustainable approach to managing the workload of edge devices.
Driven by the rapid progress in text-to-image (T2I) generation models, text-to-video (T2V) generation has experienced a significant advance as well. Accordingly, tasks such as modifying the object or changing the style in a video have become possible. However, previous methods usually work well on trivial and consistent shapes, and easily collapse on a difficult target that has a largely different body shape from the original one. In this paper, we spot the bias problem in existing video editing methods that restricts the range of choices for the new protagonist and attempt to address this issue using the conventional image-level personalization method. We adopt motion personalization that isolates the motion from a single source video and then modifies the protagonist accordingly. To deal with the natural discrepancy between image and video, we propose a motion word with an inflated textual embedding to properly represent the motion in a source video. We also regulate the motion word to attend to proper motion-related areas by introducing a novel pseudo optical flow, efficiently computed from the pre-calculated attention maps. Finally, we decouple the motion from the appearance o
According to the widely accepted opinion, classical (statistical) physics does not support objective indeterminism, since the statistical laws of classical physics allow a deterministic hidden background, while --- as Arthur Fine writes polemizing with Grünbaum --- "the antilibertarian position finds little room to breathe in a statistical world if we take laws of the quantum theory as exemplars of the statistical laws in such a world. So, it appears that, contrary to what Grünbaum claims, the libertarians' 'could have done otherwise' does indeed find support from indeterminism if we take the indeterministic laws to be of the sort found in the quantum theory." In this paper I will show that, quite the contrary, quantum mechanics does not save free will. For instance, the EPR experiments are compatible with a deterministic world. They admit a deterministic local hidden parameter description if the deterministic model is 'allowed' to describe not only the measurement outcomes, but also the outcomes of the 'decisions' whether this or that measurement will be performed. So, the derivation of the freedom of the will from quantum mechanics is a tautology: from the assumption that t
The correlated stability conjecture (CSC) proposed by Gubser and Mitra [1,2] linked the thermodynamic and classical (in)stabilities of black branes. In [3] it was shown that the thermodynamic instabilities, specifically the negative specific heat, indeed result in instabilities in the hydrodynamic spectrum of holographically dual plasma excitations. Counterexamples to the CSC were presented in the context of black branes with scalar hair undergoing a second-order phase transition [4,5]. The latter translationally invariant horizons have scalar hair, raising the question of whether the asymptotic parameters of the scalar hair can be appropriately interpreted as additional charges leading to a generalization of the thermodynamic stability criterion. In this paper we show that a generalization of the thermodynamic stability criterion of this type cannot save the CSC. We further present a simple statistical model which makes it clear that thermodynamic and dynamical (in)stabilities generically are not correlated.
How much energy, money, and emissions can advanced control of heating and cooling equipment save in real buildings? To address this question, researchers sometimes control a small number of thermal zones within a larger multi-zone building, then report savings for the controlled zones only. That approach can overestimate savings by neglecting heat transfer between controlled zones and adjacent zones. This paper mathematically characterizes the overestimation error when the dynamics are linear and the objectives are linear in the thermal load, as usually holds when optimizing energy efficiency, energy costs, or emissions. Overestimation errors can be large even in seemingly innocuous situations. For example, when controlling only interior zones that have no direct thermal contact with the outdoors, all perceived savings are fictitious. This paper provides an alternative estimation method based on the controlled and adjacent zones' temperature measurements. The new method does not require estimating how much energy the building would have used under baseline operations, so it removes the additional measurement and verification challenge of accurate baseline estimation.
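The interior-zone example can be illustrated with a toy steady-state heat balance. This is a minimal sketch under assumptions that are not from the paper: a single interior zone exchanges heat only with one perimeter zone (conductance g_ip), the perimeter zone exchanges heat with the outdoors (conductance g_po), and the perimeter temperature is held at its setpoint. All names and numbers are hypothetical.

```python
def zone_loads(T_int, T_per, T_out, g_ip, g_po):
    """Steady-state heating loads (W) for a two-zone toy building.

    T_int, T_per, T_out: interior, perimeter, and outdoor temperatures (deg C).
    g_ip: interior-to-perimeter conductance (W/K).
    g_po: perimeter-to-outdoor conductance (W/K).
    """
    # The interior zone loses heat only to the perimeter zone.
    q_int = g_ip * (T_int - T_per)
    # The perimeter zone loses heat outdoors and gains it from the interior.
    q_per = g_po * (T_per - T_out) - g_ip * (T_int - T_per)
    return q_int, q_per

# Baseline: interior at 22, perimeter at 21. Controlled: interior lowered to 21.
baseline = zone_loads(22.0, 21.0, 0.0, 100.0, 50.0)
controlled = zone_loads(21.0, 21.0, 0.0, 100.0, 50.0)
perceived_savings = baseline[0] - controlled[0]   # interior zone only
true_savings = sum(baseline) - sum(controlled)    # whole building
```

In this toy model the interior load drops by g_ip times the setpoint change, but the perimeter load rises by the same amount, so the whole-building load g_po * (T_per - T_out) is unchanged and the savings seen in the controlled zone alone do not exist at the building level.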
Greenhouse gas emissions from the steel, fertiliser and plastic industries can be mitigated by producing their precursors with green hydrogen. In Germany, green production may be economically unviable due to high energy costs. This study quantifies the 'renewables pull' of cheaper production abroad and highlights trade-offs between cost savings and import dependence. Using a detailed European energy system model coupled to global supply curves for hydrogen and industry precursors (hot briquetted iron, ammonia and methanol), we assess five scenarios with increasing degrees of freedom with respect to imports. We find that precursor import is preferred over hydrogen import because there are significant savings in hydrogen infrastructure. Cost savings in the German industry sector from shifting precursor production to European partners compared to domestic production are at 4.1 bnEUR/a or 11.2 %. This strategy captures 47.7 % of the cost savings achievable by precursor import from non-European countries, which lowers industry costs by 8.6 bnEUR/a (23.3 %). Moving energy-intensive precursor production abroad allows Germany to save costs while still retaining a substantial share of subs
The 6G mobile network is the next evolutionary step after 5G, with a prediction of an explosive surge in mobile traffic. It provides ultra-low latency, higher data rates, high device density, and ubiquitous coverage, positively impacting services in various areas. Energy saving is a major concern for new systems in the telecommunications sector because all players are expected to reduce their carbon footprints to contribute to mitigating climate change. Network slicing is a fundamental enabler for 6G/5G mobile networks and various other new systems, such as the Internet of Things (IoT), Internet of Vehicles (IoV), and Industrial IoT (IIoT). However, energy-saving methods embedded in network slicing architectures are still a research gap. This paper discusses how to embed energy-saving methods in network-slicing architectures that are a fundamental enabler for nearly all new innovative systems being deployed worldwide. This paper's main contribution is a proposal to save energy in network slicing. That is achieved by deploying ML-native agents in NS architectures to dynamically orchestrate and optimize resources based on user demands. The SFI2 network slicing reference architecture
The consumption function maps current wealth and the exogenous state to current consumption. We prove the existence and uniqueness of a consumption function when the agent has a preference for wealth. When the period utility functions are restricted to power functions, we prove that the consumption function is asymptotically linear as wealth tends to infinity and provide a complete characterization of the asymptotic slopes. When the risk aversion with respect to wealth is less than that for consumption, the asymptotic slope is zero regardless of other model parameters, implying wealthy households save a large fraction of their income, consistent with empirical evidence.
Empirical evidence shows that wealthy households have substantially higher saving rates and markedly lower marginal propensities to consume (MPCs) than other groups. Existing theory cannot account for this pattern except under restrictive assumptions on returns, discounting, and preferences. This paper develops a general theory of optimal savings with preference shocks, allowing risk aversion to vary across states and over time, and shows that incorporating such heterogeneity in risk attitudes fundamentally reshapes the asymptotic dynamics of consumption and saving. In particular, zero asymptotic MPCs (100% asymptotic saving rates) arise under markedly weaker conditions than in existing theory. Strikingly, such outcomes occur whenever there is a positive probability that agents become less risk averse in the future. Therefore, the vanishing MPC emerges as a generic feature rather than a knife-edge result of the optimal savings model, offering a more theoretically robust and empirically consistent account of the saving behavior of wealthy households.