Found 20 results
The study on the expressive power of transformers shows that transformers are permutation equivariant, and they can approximate all permutation-equivariant continuous functions on a compact domain. However, these results are derived under real parameters and exact operations, while real implementations on computers can only use a finite set of numbers and inexact machine operations with round-off errors. In this work, we investigate the representability of floating-point transformers that use floating-point parameters and floating-point operations. Unlike existing results under exact operations, we first show that floating-point transformers can represent a class of non-permutation-equivariant functions even without positional encoding. Furthermore, we prove that floating-point transformers can represent all permutation-equivariant functions when the sequence length is bounded, but they cannot when the sequence length is large. We also identify the minimal equivariance structure in floating-point transformers, and show that any non-trivial additive positional encoding can harm the representability of floating-point transformers.
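The order sensitivity behind such results can be seen in plain floating-point addition, which is not associative. The sketch below only illustrates that round-off effect in IEEE 754 double precision; it is not the paper's construction. `left_to_right_sum` is a hypothetical helper, and an explicit loop is used because some Python versions apply compensated summation inside the built-in `sum`.

```python
def left_to_right_sum(values):
    """Accumulate in sequence order, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers in two different orders.
tokens = [1e16, 1.0, -1e16]
permuted = [1e16, -1e16, 1.0]

a = left_to_right_sum(tokens)    # 1.0 is absorbed by 1e16 before the cancellation
b = left_to_right_sum(permuted)  # the cancellation happens first, so 1.0 survives

print(a, b)
print("order-dependent:", a != b)
```

Under exact real arithmetic both orders would give the same sum, so a sum-based aggregation would be invariant to permuting its inputs; with rounding, the result depends on the order.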
The Milky Way Galaxy is brimming with free-floating objects, including stars, planets and planetesimals. For the purpose of this chapter, we define a free-floating object as a solid body that does not orbit a considerably more massive body. A planet is then considered free floating if it is not orbiting a star, although it may be orbiting another planet. A binary planet, or a planet-moon pair that is not orbiting a star, is therefore considered free floating. Most free-floating objects are not born as such, because most objects form in some sort of coordinated environmental effort, such as a star-forming region or a circumstellar disk. Free-floating stars then originate from dissolved clusters. Free-floating planets are ejected from their parent star in an internal dynamical encounter with another planet, or stripped from the star by other means such as a supernova or a nearby passing star. Free-floating (interstellar) planetesimals probably form in a similar fashion to free-floating planets. The number of free-floating objects in the Galaxy can be large, with billions of stars and planets and trillions of interstellar planetesimals. Although free-floating planets appear to be quite common (
Floating-point programs form the foundation of modern science and engineering, providing the essential computational framework for a wide range of applications, such as safety-critical systems, aerospace engineering, and financial analysis. Floating-point errors can lead to severe consequences. Although floating-point errors are widespread, only a subset of inputs may trigger significant errors in floating-point programs. Therefore, it is crucial to determine whether a given input could produce such errors. Researchers tend to take the results of high-precision floating-point programs as oracles for detecting floating-point errors, which introduces two main limitations: (1) difficulty of implementation and (2) prolonged execution time. Two recent tools, ATOMU and FPCC, partially address these issues. However, ATOMU suffers from false positives, while FPCC, though it eliminates false positives, runs considerably slower. To address these two challenges, we propose a novel approach named PI-detector for computing floating-point errors effectively and efficiently. Our approach is based on the observation that floating-point errors stem from large condition numbers in
In numeric-intensive computations, it is well known that the execution of floating-point programs is imprecise, as floating-point arithmetic incurs round-off errors. Although round-off errors are small for a single floating-point operation, the aggregation of such errors may be dramatic and cause catastrophic program failures. Therefore, to ensure the correctness of floating-point programs, round-off error needs to be carefully taken into account. In this work, we consider polynomial invariant generation for floating-point programs, aiming at generating tight invariants under the perturbation of round-off errors. Our contribution is a novel framework for applying polynomial constraint solving to the invariant generation problem, which is also, to the best of our knowledge, the first approach based on polynomial constraint solving that handles floating-point errors. In our framework, we propose a novel combination of round-off error analysis and polynomial constraint solving, aiming to circumvent the cost of handling a large number of error variables in the floating-point model. Experimental results over a variety of challenging benchmarks show that our framework outperforms SOTA approac
We examine the interaction between floating cylindrical objects and surface waves in the gravity regime. Since the impact of resonance phenomena associated with floating bodies, particularly at laboratory scales, remains underexplored, we focus on the influence of the floats' resonance frequency on wave emission. First, we study the response of floating rigid cylinders to external mechanical perturbations. Using an optical reconstruction technique to measure surface wave fields in both space and time, we study the natural resonance frequency of floats with different sizes. The results indicate that the resonance frequency is influenced by the interplay between the cylinder geometry and the solid-to-fluid density ratio. Second, these floating objects are placed in an incoming wave field. These experiments demonstrate that floats diffract incoming waves, while radiating secondary waves that interfere with the incident wavefield. Minimal wave generation is observed at resonance frequencies. These findings can provide insights for elucidating the behavior of larger structures, such as sea ice floes, in natural wave fields.
Here we present an analytic approximation for the entropy of floating-point numbers, along with bounds on the error of this approximation. It is well-known that the differential entropy is tightly linked to the discrete entropy of a uniformly quantized random variable. Our approximation uncovers a different quantity that provides this link for floating-point quantization. Additionally, we prove that the entropy of a floating-point quantized random variable is approximately unchanged under scaling. Closed-form expressions for the floating-point entropy of common distributions are provided and compared to exact results.
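The scaling claim can be checked numerically for one case. The sketch below is an illustration under stated assumptions, not the paper's analytic approximation: it assumes a zero-mean Gaussian input, IEEE 754 binary16 as the floating-point format, rounding to the nearest representable value, and out-of-range values folded into the largest finite value. The function names are hypothetical.

```python
import math
import struct


def positive_float16_values():
    """All finite non-negative binary16 values, in increasing order."""
    # Bit patterns 0x0000..0x7BFF are the non-negative finite values; 0x7C00 is +inf.
    return [struct.unpack("<e", struct.pack("<H", bits))[0]
            for bits in range(0x7C00)]


def upper_tail(x, sigma):
    """P(X > x) for X ~ N(0, sigma^2)."""
    return 0.5 * math.erfc(x / (sigma * math.sqrt(2.0)))


def float16_entropy_bits(sigma):
    """Discrete entropy (bits) of a N(0, sigma^2) variable rounded to binary16."""
    vals = positive_float16_values()
    # Rounding boundaries are the midpoints between neighbouring values.
    mids = [(lo + hi) / 2.0 for lo, hi in zip(vals, vals[1:])]
    # The bin of zero is (-mids[0], mids[0]).
    probs = [math.erf(mids[0] / (sigma * math.sqrt(2.0)))]
    # Assumption: everything above the last midpoint maps to the largest finite value.
    edges = mids + [math.inf]
    for lo, hi in zip(edges, edges[1:]):
        p = upper_tail(lo, sigma) - upper_tail(hi, sigma)
        probs.extend([p, p])  # the +v and -v bins are equally likely
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


h1 = float16_entropy_bits(1.0)
h3 = float16_entropy_bits(3.0)
print(f"sigma=1: {h1:.4f} bits, sigma=3: {h3:.4f} bits, diff: {abs(h1 - h3):.6f}")
```

For comparison, under uniform quantization with a fixed step, scaling the input by 3 would raise the discrete entropy by roughly log2(3) bits.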
The b-posit, or bounded posit, is a variation of the posit format designed for high performance computing (HPC) and AI applications. Unlike traditional floating-point formats (floats), posits use variable-length fields for exponent scaling and significand, providing better efficiency for the same bit width. However, this flexibility introduces high worst-case overhead in decode-encode logic, exceeding the cost of handling subnormals for floats. To address this, the b-posit restricts the regime field to a 6-bit limit, reducing variability in regime and fraction sizes. With an exponent size eS of 5 bits, the dynamic range is $2^{-192}$ to $2^{192}$ (about $10^{-58}$ to $10^{58}$) and the quire size is 800 bits, for any precision $n>12$. This constraint improves numerical properties and simplifies hardware implementation by allowing decode-encode operations with basic multiplexers. Our 32-bit b-posit decoder circuits achieve significant improvements: 79 percent less power consumption, 71 percent smaller area, and 60 percent reduced latency compared to standard posit decoders. The 32-bit b-posit encoder shows 68 percent lower power usage, 46 percent less area, and 44 percent shorter
Upscaling is central to offshore wind's cost-reduction strategy, with increasingly large rotors and nacelles requiring taller and stronger towers. In Floating Offshore Wind Turbines (FOWTs), this trend amplifies fatigue loads due to coupled wind-wave dynamics and platform motion. Conventional fatigue evaluation requires millions of high-fidelity simulations, creating prohibitive computational costs and slowing design innovation. This paper presents FLOAT (Fatigue-aware Lightweight Optimization and Analysis for Towers), a framework that accelerates fatigue-aware tower design. It integrates three key contributions: a lightweight fatigue estimation method that enables efficient optimization, a Monte Carlo-based probabilistic wind-wave sampling approach that reduces required simulations, and enhanced high-fidelity modeling through pitch/heave-platform calibration and High-Performance Computing execution. The framework is applied to the IEA 22 MW FOWT tower, delivering, to the authors' knowledge, the first fatigue-oriented redesign of this benchmark model: FLOAT 22 MW FOWT tower. Validation against 6,468 simulations shows that the optimized tower extends the estimated fatigue life from
Dynamic and polymorphic languages attach information, such as types, to run time objects, and therefore adapt the memory layout of values to include space for this information. This makes it difficult to efficiently implement IEEE754 floating-point numbers as this format does not leave an easily accessible space to store type information. The three main floating-point number encodings in use today, tagged pointers, NaN-boxing, and NuN-boxing, have drawbacks. Tagged pointers entail a heap allocation of all float objects, and NaN/NuN-boxing puts additional run time costs on type checks and the handling of other objects. This paper introduces self-tagging, a new approach to object tagging that uses an invertible bitwise transformation to map floating-point numbers to tagged values that contain the correct type information at the correct position in their bit pattern, superimposing both their value and type information in a single machine word. Such a transformation can only map a subset of all floats to correctly typed tagged values, hence self-tagging takes advantage of the non-uniform distribution of floating point numbers used in practice to avoid heap allocation of the most freque
Length generalization is the ability of language models to maintain performance on inputs longer than those seen during pretraining. In this work, we introduce a simple yet powerful position encoding (PE) strategy, Random Float Sampling (RFS), that generalizes well to lengths unseen during pretraining or fine-tuning. In particular, instead of selecting position indices from a predefined discrete set, RFS uses randomly sampled continuous values, thereby avoiding out-of-distribution (OOD) issues on unseen lengths by exposing the model to diverse indices during training. Since assigning indices to tokens is a common and fundamental procedure in widely used PEs, the advantage of RFS can easily be incorporated into, for instance, the absolute sinusoidal encoding, RoPE, and ALiBi. Experiments corroborate its effectiveness by showing that RFS results in superior performance in length generalization tasks as well as zero-shot commonsense reasoning benchmarks.
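One way to picture the idea is to feed randomly sampled continuous indices into the absolute sinusoidal encoding. The sketch below is a guess at the mechanics, not the authors' implementation: the uniform sampling range, the sorting step that preserves token order, and the names `random_float_positions` and `sinusoidal_encoding` are all assumptions made for illustration.

```python
import math
import random


def sinusoidal_encoding(position, dim, base=10000.0):
    """Absolute sinusoidal encoding of a (possibly non-integer) position."""
    enc = []
    for i in range(0, dim, 2):
        angle = position / (base ** (i / dim))
        enc.append(math.sin(angle))
        enc.append(math.cos(angle))
    return enc[:dim]


def random_float_positions(seq_len, max_position, rng):
    """Assumption: draw seq_len uniform floats and sort them to keep token order."""
    return sorted(rng.uniform(0.0, max_position) for _ in range(seq_len))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
positions = random_float_positions(seq_len=8, max_position=64.0, rng=rng)
encodings = [sinusoidal_encoding(p, dim=16) for p in positions]

print([round(p, 2) for p in positions])
print(len(encodings), len(encodings[0]))
```

Because the indices are continuous, a sequence of 8 tokens can be exposed during training to index values that a fixed 0..7 scheme would never produce.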
Efficient number representation is essential for federated learning, natural language processing, and network measurement solutions. Due to timing, area, and power constraints, such applications use narrow bit-width (e.g., 8-bit) number systems. The widely used floating-point systems exhibit a trade-off between the counting range and accuracy. This paper introduces Floating-Floating-Point (F2P) - a floating point number that varies the partition between mantissa and exponent. Such flexibility leads to a large counting range combined with improved accuracy over a selected sub-range. Our evaluation demonstrates that moving to F2P from the state-of-the-art improves network measurement accuracy and federated learning.
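The range-versus-accuracy trade-off can be made concrete by tabulating 8-bit formats with different fixed exponent/mantissa splits. The sketch below assumes an IEEE-like layout (1 sign bit, biased exponent, top exponent code reserved for infinities and NaNs); it is not F2P's encoding, and `minifloat_stats` is a hypothetical helper.

```python
def minifloat_stats(exp_bits, man_bits):
    """Largest finite value, smallest normal value and relative step of an
    IEEE-like format with the given exponent and mantissa widths."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias      # top exponent code is reserved
    max_value = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    min_normal = 2.0 ** (1 - bias)
    rel_step = 2.0 ** -man_bits               # spacing relative to the value
    return max_value, min_normal, rel_step


# 8 bits total: 1 sign bit plus 7 bits shared by exponent and mantissa.
for exp_bits in range(2, 7):
    man_bits = 7 - exp_bits
    max_value, min_normal, rel_step = minifloat_stats(exp_bits, man_bits)
    print(f"E{exp_bits}M{man_bits}: max={max_value:g}, "
          f"min normal={min_normal:g}, relative step={rel_step:g}")
```

Each bit moved from mantissa to exponent widens the range and doubles the relative step, which is the tension a variable partition is meant to ease.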
We introduce AbsInf, a lightweight abstract object designed as a high-performance alternative to Python's native float('inf') within pathfinding algorithms. Implemented as a C-based Python extension, AbsInf bypasses IEEE-754 float coercion and dynamic type dispatch, offering constant-time dominance comparisons and arithmetic neutrality. When integrated into Dijkstra's algorithm without altering its logic, AbsInf reduces runtime by up to 17.2%, averaging 9.74% across diverse synthetic and real-world graph datasets. This optimization highlights the performance trade-offs in high-frequency algorithmic constructs, where a symbolic use of infinity permits efficient abstraction. Our findings contribute to the broader discourse on lightweight architectural enhancements for interpreted languages, particularly in performance-critical control flows.
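The idea of a symbolic infinity can be sketched in pure Python. This is only an illustration of the concept: the real AbsInf is described as a C extension, so the class below (the hypothetical `SymbolicInf`) will not reproduce the reported speedups, and treating "arithmetic neutrality" as "adding anything to infinity yields infinity" is an assumption.

```python
import heapq


class SymbolicInf:
    """A sentinel that compares greater than every other value."""
    __slots__ = ()

    def __gt__(self, other):
        return other is not self

    def __ge__(self, other):
        return True

    def __lt__(self, other):
        return False

    def __le__(self, other):
        return other is self

    def __add__(self, other):
        return self  # assumption: infinity absorbs any addition

    __radd__ = __add__

    def __repr__(self):
        return "INF"


INF = SymbolicInf()


def dijkstra(graph, source):
    """Textbook Dijkstra; only the initial distance value differs from float('inf')."""
    dist = {node: INF for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:  # falls back to SymbolicInf.__gt__ when dist[v] is INF
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


graph = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": [], "d": []}
dist = dijkstra(graph, "a")
print(dist)
```

The algorithm body is unchanged from the usual formulation, which mirrors the abstract's point that the sentinel is integrated without altering Dijkstra's logic.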
The study of the expressive power of neural networks has investigated the fundamental limits of neural networks. Most existing results assume real-valued inputs and parameters as well as exact operations during the evaluation of neural networks. However, neural networks are typically executed on computers that can only represent a tiny subset of the reals and apply inexact operations, i.e., most existing results do not apply to neural networks used in practice. In this work, we analyze the expressive power of neural networks under a more realistic setup: when we use floating-point numbers and operations as in practice. Our first set of results assumes floating-point operations where the significand of a float is represented by finite bits but its exponent can take any integer value. Under this setup, we show that neural networks using a binary threshold unit or ReLU can memorize any finite input/output pairs and can approximate any continuous function within an arbitrary error. In particular, the number of parameters in our constructions for universal approximation and memorization coincides with that in classical results assuming exact mathematical operations. We also show similar
Building upon the Integer Lattice Gas Automata framework of Blommel \textit{et al.} \cite{PhysRevE.97.023310}, we introduce a simplified, fluctuation-free variant. This approach relies on floating-point numbers and closely mirrors the Lattice Boltzmann Method (LBM), with the key distinction being a novel collision operator. This operator, derived from the ensemble average of transition probabilities, generates nonlinear terms. We propose this new Float Lattice Gas Automata (FLGA) collision as a computationally efficient alternative to traditional and quantum LBM implementations.
Motif scaffolding seeks to design scaffold structures for constructing proteins with functions derived from the desired motif, which is crucial for the design of vaccines and enzymes. Previous works approach the problem by inpainting or conditional generation. Both of them can only scaffold motifs with fixed positions, and the conditional generation cannot guarantee the presence of motifs. However, prior knowledge of the relative motif positions in a protein is not readily available, and constructing a protein with multiple functions in one protein is more general and significant because of the synergies between functions. We propose a Floating Anchor Diffusion (FADiff) model. FADiff allows motifs to float rigidly and independently in the process of diffusion, which guarantees the presence of motifs and automates the motif position design. Our experiments demonstrate the efficacy of FADiff with high success rates and designable novel scaffolds. To the best of our knowledge, FADiff is the first work to tackle the challenge of scaffolding multiple motifs without relying on the expertise of relative motif positions in the protein. Code is available at https://github.com/aim-uofa/FADif
Conventional hardware-friendly quantization methods, such as fixed-point or integer, tend to perform poorly at very low word sizes as their shrinking dynamic ranges cannot adequately capture the wide data distributions commonly seen in sequence transduction models. We present AdaptivFloat, a floating-point inspired number representation format for deep learning that dynamically maximizes and optimally clips its available dynamic range, at a layer granularity, in order to create faithful encoding of neural network parameters. AdaptivFloat consistently produces higher inference accuracies compared to block floating-point, uniform, IEEE-like float or posit encodings at very low precision ($\leq$ 8-bit) across a diverse set of state-of-the-art neural network topologies. And notably, AdaptivFloat is seen surpassing baseline FP32 performance by up to +0.3 in BLEU score and -0.75 in word error rate at weight bit widths that are $\leq$ 8-bit. Experimental results on a deep neural network (DNN) hardware accelerator, exploiting AdaptivFloat logic in its computational datapath, demonstrate per-operation energy and area that is 0.9$\times$ and 1.14$\times$, respectively, that of equivalent bit
Many small countries are in need of additional territory. They build landfills and expensive artificial islands. The ocean covers 71 per cent of the Earth's surface. Countries (or wealthy individuals) that begin the early colonization of the ocean may gain advantages through additional territory or by creating their own independent state. An old idea is to build a big ship. The best solution to this problem, however, is the provision of floating cities, islands, and states. The author's idea is to build floating cities, islands, and states on a cheap floating platform created from a natural ice field taken from the Arctic or Antarctic oceans. These cheap platforms, protected by an air film (bottom and sides) and a conventional insulating cover (top) and equipped with a cooling system, can exist for an unlimited time. They can be increased in number or size at any time, float in warm oceans, travel to different continents and countries, and serve as artificial airports, harbors and other marine improvements, as well as floating cities and industrial bases for virtually any use. The author researches and computes the parameters of these floating ice platforms, other methods of building such floating territory,
We explore the stability of floating objects through mathematical modeling and experimentation. Our models are based on standard ideas of center of gravity, center of buoyancy, and Archimedes' Principle. We investigate a variety of floating shapes with two-dimensional cross sections and identify analytically and/or computationally a potential energy landscape that helps identify stable and unstable floating orientations. We compare our analyses and computations to experiments on floating objects designed and created through 3D printing. In addition to our results, we provide code for testing the floating configurations for new shapes, as well as giving details of the methods for 3D printing the objects. The paper includes conjectures and open problems for further study.
Reducing hardware overhead of neural networks for faster or lower power inference and training is an active area of research. Uniform quantization using integer multiply-add has been thoroughly investigated, which requires learning many quantization parameters, fine-tuning training or other prerequisites. Little effort is made to improve floating point relative to this baseline; it remains energy inefficient, and word size reduction yields drastic loss in needed dynamic range. We improve floating point to be more energy efficient than equivalent bit width integer hardware on a 28 nm ASIC process while retaining accuracy in 8 bits with a novel hybrid log multiply/linear add, Kulisch accumulation and tapered encodings from Gustafson's posit format. With no network retraining, and drop-in replacement of all math and float32 parameters via round-to-nearest-even only, this open-sourced 8-bit log float is within 0.9% top-1 and 0.2% top-5 accuracy of the original float32 ResNet-50 CNN model on ImageNet. Unlike int8 quantization, it is still a general purpose floating point arithmetic, interpretable out-of-the-box. Our 8/38-bit log float multiply-add is synthesized and power profiled at 28
In this paper we investigate the limit performance of Floating Gossip, a new, fully distributed Gossip Learning scheme which relies on Floating Content to implement location-based probabilistic evolution of machine learning models in an infrastructure-less manner. We consider dynamic scenarios where continuous learning is necessary, and we adopt a mean field approach to investigate the limit performance of Floating Gossip in terms of amount of data that users can incorporate into their models, as a function of the main system parameters. Different from existing approaches in which either communication or computing aspects of Gossip Learning are analyzed and optimized, our approach accounts for the compound impact of both aspects. We validate our results through detailed simulations, proving good accuracy. Our model shows that Floating Gossip can be very effective in implementing continuous training and update of machine learning models in a cooperative manner, based on opportunistic exchanges among moving users.