Found 20 results
Let $G$ be a connected graph with vertex set $V(G)$. The distance, $d_G(u,v)$, between vertices $u$ and $v$ in $G$ is defined as the length of a shortest path between $u$ and $v$ in $G$. The distance matrix of $G$ is the matrix $D(G)=(d_G(u,v))_{u,v\in V(G)}$. The second largest distance eigenvalue of $G$ is the second largest one in the spectrum of $D(G)$. We show that any connected graph with the second largest distance eigenvalue less than $\frac{-3+\sqrt{5}}{2}$ is chordal, and characterize those bicyclic graphs and split graphs with the second largest distance eigenvalue less than $-\frac{1}{2}$.
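As a quick check of these definitions (a worked example, not taken from the paper), consider the complete graph $K_n$ with $n \ge 2$, where every pair of distinct vertices is at distance 1. Here $J_n$ denotes the all-ones matrix and $I_n$ the identity:

$$D(K_n) = J_n - I_n, \qquad \operatorname{spec} D(K_n) = \{\, n-1,\ \underbrace{-1,\dots,-1}_{n-1} \,\}.$$

The second largest distance eigenvalue of $K_n$ is therefore $-1$, which lies below $\frac{-3+\sqrt{5}}{2} \approx -0.382$, and $K_n$ is chordal, consistent with the stated result.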
The concept of continuous-aperture multiple-input multiple-output (CAP-MIMO) technology has been proposed recently, which aims at achieving high spectrum efficiency by deploying extremely dense antennas or even continuous antennas in a given aperture. The fundamental question of CAP-MIMO is whether it can achieve much better performance than the traditional discrete MIMO system. In this paper, to model the CAP-MIMO, we use self-adjoint operators to depict the structural characteristics of the continuous random electromagnetic fields from physical laws. Then, we propose a non-asymptotic performance comparison scheme between continuous and discrete MIMO systems based on the analysis of mutual information. We show the consistency of the proposed scheme by proving that the mutual information between discretized transceivers converges to that between continuous transceivers. Numerical analysis verifies the theoretical results, and suggests that the mutual information obtained from the discrete MIMO with widely adopted half-wavelength spaced antennas almost achieves the mutual information obtained from CAP-MIMO.
We introduce a novel computational unit for neural networks that features multiple biases, challenging the traditional perceptron structure. This unit emphasizes the importance of preserving uncorrupted information as it is passed from one unit to the next, applying activation functions later in the process with specialized biases for each unit. Through both empirical and theoretical analyses, we show that by focusing on increasing biases rather than weights, there is potential for significant enhancement in a neural network model's performance. This approach offers an alternative perspective on optimizing information flow within neural networks. See source code at https://github.com/CuriosAI/dac-dev.
What if consciousness isn’t limited to brains like ours? Philosophers Eric Schwitzgebel and Jeremy Pober argue that consciousness could arise in many different forms of life, even in beings built from radically different materials than those found on Earth. Drawing on the vastness of the universe and the likely existence of countless alien civiliza
Two newly confirmed "super-puff" planets are so diffuse that they are less dense than cotton candy, despite being about the size of Jupiter. Their rare orbital relationship and enormous, lightweight atmospheres could provide valuable clues about how some of the strangest planets in the galaxy come to exist
We show that a Turing machine with two single-head one-dimensional tapes cannot recognize the set $\{x2x' \mid x \in \{0,1\}^* \text{ and } x' \text{ is a prefix of } x\}$ in real time, although it can do so with three tapes, two two-dimensional tapes, or one two-head tape, or in linear time with just one tape. In particular, this settles the longstanding conjecture that a two-head Turing machine can recognize more languages in real time if its heads are on the same one-dimensional tape than if they are on separate one-dimensional tapes.
It is argued that special relativity remains a viable physical theory even when signals traveling faster than light are permitted.
Large language models (LLMs) are increasingly deployed as autonomous agents that negotiate, coordinate, and act on behalf of users. Whether they cooperate in such settings is no longer just an academic question, but a central issue for AI governance. We approach it from a strategic-behaviour angle, asking how two everyday levers - the size of what is at stake, and the language in which the interaction is described - shape the strategies LLMs adopt in a repeated Prisoner's Dilemma. Rather than reading cooperation off raw action counts, we train supervised classifiers to recognise the canonical strategies of repeated games (always cooperate, always defect, Tit-for-Tat, Win-Stay-Lose-Shift) and use them as a lens onto LLM behaviour. To know what the strategy distribution should look like under the same payoffs, we derive an evolutionary game theory (EGT) baseline and compare it with the LLM data. The two outcomes disagree in a revealing way: as stakes grow, evolutionary theory predicts that defection should take over the population, yet LLMs move in the opposite direction, becoming more cooperative - a signature, we argue, of alignment training and the human-like reasoning patterns LL
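The canonical strategies named in this abstract can be sketched in a few lines of Python. This is only an illustration of the standard textbook definitions, not the paper's classifiers or experimental setup; the payoff values (5, 3, 1, 0) and all function names are assumptions.

```python
C, D = "C", "D"

# Assumed payoffs (row player, column player): T=5, R=3, P=1, S=0.
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def always_cooperate(mine, theirs):
    return C

def always_defect(mine, theirs):
    return D

def tit_for_tat(mine, theirs):
    # Cooperate first, then copy the opponent's previous move.
    return theirs[-1] if theirs else C

def win_stay_lose_shift(mine, theirs):
    # Cooperate first; keep the last move after a good outcome
    # (opponent cooperated), otherwise switch.
    if not mine:
        return C
    if theirs[-1] == C:
        return mine[-1]
    return D if mine[-1] == C else C

def play(s1, s2, rounds):
    """Play a repeated game; return both move histories and total scores."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        a, b = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFF[(a, b)]
        score1, score2 = score1 + p1, score2 + p2
        h1.append(a)
        h2.append(b)
    return h1, h2, score1, score2
```

Calling `play(tit_for_tat, always_defect, 3)` shows Tit-for-Tat cooperating once and then mirroring the defector.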
Activation steering has emerged as a key methodology for controlling the behavior of large language models (LLMs). Existing difference-in-means based methods, however, are fundamentally limited: they capture only mean differences between class activations and fail to recover discriminative signals that naturally exist in the nonlinear feature subspace under the superposition hypothesis. Motivated by that, we propose High-Dimensional Random-projection for Activation Steering (HiDRA), a training-free approach that integrates seamlessly with existing activation steering methods. By performing activation addition in the projected high-dimensional space, HiDRA can provably capture a better discriminative structure beyond the reach of linear methods. Experiments across diverse LLM families and benchmarks demonstrate that HiDRA consistently outperforms baseline counterparts, achieving stronger behavioral control without significant computational overhead.
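For context, the difference-in-means baseline that this abstract contrasts with can be sketched as follows. This is a minimal pure-Python illustration of the baseline idea only, not HiDRA itself; the function names and the scaling parameter `alpha` are assumptions.

```python
def mean_vector(rows):
    """Column-wise mean of a non-empty list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def steering_vector(pos_acts, neg_acts):
    """Difference-in-means direction: mean(positive class) - mean(negative class)."""
    mp, mn = mean_vector(pos_acts), mean_vector(neg_acts)
    return [a - b for a, b in zip(mp, mn)]

def steer(hidden, direction, alpha=1.0):
    """Activation addition: shift a hidden state along the steering direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

A usage sketch: with `pos = [[1.0, 2.0], [3.0, 4.0]]` and `neg = [[0.0, 0.0], [2.0, 2.0]]`, `steering_vector(pos, neg)` gives the direction that `steer` then adds to a hidden state.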
In this paper, we establish a set of theoretical impossibility results, termed the No-Free-Fairness theorems, that identify three fundamental sources of disparity in learning systems. First, we show that when a task exhibits irreducible cost on a subgroup, any decision rule must trade off overall performance with disparity, yielding an inherent fairness--cost frontier. Second, we prove that even in ideal, noise-free settings where a perfectly fair and accurate solution exists, finite-sample learning alone induces nontrivial subgroup disparity, ruling out distribution-free fairness guarantees. More seriously, enforcing strict relative fairness creates a statistical bottleneck: achieving low cost may require exponentially many samples. Third, we show that limitations of the model class can independently induce disparity: if the model cannot represent accurate solutions for a subgroup, fairness remains unattainable regardless of data or training procedure. Overall, these results demonstrate that unfairness is not solely a consequence of biased data or suboptimal optimization, but arises from the intrinsic structure of decision problems, the constraints of finite data, and the expressi
A fundamental requirement for intelligent systems is the ability to learn continuously under changing environments. However, models trained in this regime often suffer from catastrophic forgetting. Leveraging pre-trained models has recently emerged as a promising solution, since their generalized feature extractors enable faster and more robust adaptation. While some earlier works mitigate forgetting by fine-tuning only on the first task, this approach quickly deteriorates as the number of tasks grows and the data distributions diverge. More recent research instead seeks to consolidate task knowledge into a unified backbone, or to adapt the backbone as new tasks arrive. However, such approaches may create a (potential) \textit{mismatch} between task-specific classifiers and the adapted backbone. To address this issue, we propose a novel \textit{Local Classifier Alignment} (LCA) loss to better align the classifier with the backbone. Theoretically, we show that this LCA loss can enable the classifier to not only generalize well for all observed tasks, but also improve robustness. Furthermore, we develop a complete solution for continual learning, following the model merging approach and
The special relativistic generalization of isotropic regularized kappa distributions is derived and compared to that of the original Olbertian (or standard) kappa distributions. It is demonstrated that for the latter the kappa parameter is even more strongly limited than in the non-relativistic case, while for the former all positive kappa values remain possible. After a derivation of the non-relativistic limits, the pressures of the distributions are studied as a specific case of the moments of both the relativistic standard and regularized kappa distributions.
Understanding and certifying the behavior of modern deep neural networks remains a fundamental challenge in reliable machine learning. We introduce a new class of data-dependent generalization bounds that apply directly to trained models, without any modification. In particular, we present an exactly computable bound that is non-vacuous across all evaluated networks, including ImageNet-scale models with 600M parameters. This is the first work showing that meaningful generalization guarantees are achievable even for large, unaltered deep networks. Our approach reveals that generalization is governed by the interaction between the trained model and the geometry of the data distribution. We decompose the generalization error into two interpretable components: a distributional complexity term, capturing how the data mass is distributed across the input space, and local model-behavior terms, capturing the network's behavior within individual regions. This joint dependence identifies where and why generalization gaps arise. Empirically, some components of our bound are highly predictive of the true test error, and the bound tightens when the partition aligns with the intrinsic data
Accurately evaluating model performance is crucial for deploying machine learning systems in real-world applications. Traditional methods often require a sufficiently large labeled test set to ensure a reliable evaluation. However, in many contexts, a large labeled dataset is costly and labor-intensive to obtain. Therefore, we sometimes have to evaluate with only a few labeled samples, which is theoretically challenging. Recent advances in generative models offer a promising alternative by enabling the synthesis of high-quality data. In this work, we systematically investigate the use of synthetic data to estimate the test error of a trained model under limited labeled data conditions. To this end, we develop novel generalization bounds that take synthetic data into account. Those bounds suggest novel ways to optimize synthetic samples for evaluation and theoretically reveal the significant role of the generator's quality. Inspired by those bounds, we propose a theoretically grounded method to generate optimized synthetic data for model evaluation. Experimental results on simulation and tabular datasets demonstrate that, compared to existing baselines, our method achieves accurate a
For a weighted graph $G = (V, E, w)$ and a designated source vertex $s \in V$, a spanning tree that simultaneously approximates a shortest-path tree w.r.t. source $s$ and a minimum spanning tree is called a shallow-light tree (SLT). Specifically, an $(α, β)$-SLT of $G$ w.r.t. $s \in V$ is a spanning tree of $G$ with root-stretch $α$ (preserving all distances between $s$ and the other vertices up to a factor of $α$) and lightness $β$ (its weight is at most $β$ times the weight of a minimum spanning tree of $G$). Despite the large body of work on SLTs, the basic question of whether a better approximation algorithm exists was left untouched to date, and this holds in any graph family. This paper makes a first nontrivial step towards this question by presenting two bicriteria approximation algorithms. For any $ε>0$, a set $P$ of $n$ points in constant-dimensional Euclidean space and a source $s\in P$, our first (respectively, second) algorithm returns, in $O(n \log n \cdot {\rm polylog}(1/ε))$ time, a non-Steiner (resp., Steiner) tree with root-stretch $1+O(ε\log ε^{-1})$ and weight at most $O(\mathrm{opt}_ε\cdot \log^2 ε^{-1})$ (resp., $O(\mathrm{opt}_ε\cdot \log ε^{-1})$), where $
Simulating out-of-equilibrium dynamics of quantum field theories in nature is challenging with classical methods, but is a promising application for quantum computers. Unfortunately, simulating interacting bosonic fields involves a high boson-to-qubit encoding overhead. Furthermore, when mapping to qubits, the infinite-dimensional Hilbert space of bosons is necessarily truncated, with truncation errors that grow with energy and time. A qubit-based quantum computer, augmented with an active bosonic register, and with qubit, bosonic, and mixed qubit-boson quantum gates, offers a more powerful platform for simulating bosonic theories. We demonstrate this capability experimentally in a hybrid analog-digital trapped-ion quantum computer, where qubits are encoded in the internal states of the ions, and the bosons in the ions' motional states. Specifically, we simulate nonequilibrium dynamics of a (1+1)-dimensional Yukawa model, a simplified model of interacting nucleons and pions, and measure fermion- and boson-occupation-state probabilities. These dynamics populate high bosonic-field excitations starting from an empty state, and the experimental results capture well such high-occupation
Semi-Supervised Semantic Segmentation reduces reliance on extensive annotations by using unlabeled data and state-of-the-art models to improve overall performance. Despite the success of deep co-training methods, their underlying mechanisms remain underexplored. This work revisits Cross Pseudo Supervision with dual heterogeneous backbones and introduces Knowledge Consultation (SegKC) to further enhance segmentation performance. The proposed SegKC achieves significant improvements on Pascal and Cityscapes benchmarks, with mIoU scores of 87.1%, 89.2%, and 89.8% on Pascal VOC with the 1/4, 1/2, and full split partition, respectively, while maintaining a compact model architecture.
Normalization layers are critical components of modern AI systems, such as ChatGPT, Gemini, DeepSeek, etc. Empirically, they are known to stabilize training dynamics and improve generalization ability. However, the underlying theoretical mechanism by which normalization layers contribute to both optimization and generalization remains largely unexplained, especially when using many normalization layers in a deep neural network (DNN). In this work, we develop a theoretical framework that elucidates the role of normalization through the lens of capacity control. We prove that an unnormalized DNN can exhibit exponentially large Lipschitz constants with respect to either its parameters or inputs, implying excessive functional capacity and potential overfitting. There are uncountably many such bad DNNs. In contrast, the insertion of normalization layers can provably reduce the Lipschitz constant at an exponential rate in the number of normalization layers. This exponential reduction yields two fundamental consequences: (1) it smooths the loss landscape at an exponential rate, facilitating faster and more stable optimization; and (2) it constrains the effective capacity of the network, thereby
Traditional threat modeling remains reactive, focused on known TTPs and past incident data, while threat prediction and forecasting frameworks are often disconnected from operational or architectural artifacts. This creates a fundamental weakness: the most serious cyber threats often do not arise from what is known, but from what is assumed, overlooked, or not yet conceived, and frequently originate from the future, such as artificial intelligence, information warfare, and supply chain attacks, where adversaries continuously develop new exploits that can bypass defenses built on current knowledge. To address this mental gap, this paper introduces the theory and methodology of Future-Back Threat Modeling (FBTM). This predictive approach begins with envisioned future threat states and works backward to identify assumptions, gaps, blind spots, and vulnerabilities in the current defense architecture, providing a clearer and more accurate view of impending threats so that we can anticipate their emergence and shape the future we want through actions taken now. The proposed methodology further aims to reveal known unknowns and unknown unknowns, including tactics, techniques, and procedure
Federated learning (FL) is a machine learning methodology that involves the collaborative training of a global model across multiple decentralized clients in a privacy-preserving way. Several FL methods have been introduced to tackle communication inefficiencies, but they do not address how to sample participating clients in each round effectively and in a privacy-preserving manner. In this paper, we propose \textit{FedSTaS}, a client and data-level sampling method inspired by \textit{FedSTS} and \textit{FedSampling}. In each federated learning round, \textit{FedSTaS} stratifies clients based on their compressed gradients, re-allocates the number of clients to sample using an optimal Neyman allocation, and samples local data from each participating client using a data uniform sampling strategy. Experiments on three datasets show that \textit{FedSTaS} can achieve higher accuracy scores than those of \textit{FedSTS} within a fixed number of training rounds.
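The Neyman allocation mentioned above follows the textbook rule $n_h = n \cdot N_h S_h / \sum_k N_k S_k$, where $N_h$ is the size of stratum $h$ and $S_h$ its standard deviation. The sketch below implements only that general rule. How FedSTaS measures per-stratum variability and rounds the result is not stated in the abstract, so the inputs and the function name `neyman_allocation` are assumptions.

```python
def neyman_allocation(n_total, sizes, stds):
    """Real-valued Neyman allocation of n_total samples across strata.

    sizes[h] is the number of units (e.g. clients) in stratum h and
    stds[h] is an assumed measure of within-stratum variability.
    """
    weights = [size * std for size, std in zip(sizes, stds)]
    total = sum(weights)
    if total == 0:
        # No variability anywhere: fall back to proportional allocation.
        n_units = sum(sizes)
        return [n_total * size / n_units for size in sizes]
    return [n_total * w / total for w in weights]
```

With equal variability the allocation is proportional to stratum size; with equal sizes, strata with larger variability receive more of the sample. Rounding to integers is left out on purpose, since the choice of rounding scheme is a design decision.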