Found 20 results
In this paper, we investigate the properties of disjoint Cesàro-hypercyclic operators. First, the definition of disjoint Cesàro-hypercyclic operators is provided, and a disjoint Cesàro-Hypercyclicity Criterion is proposed. Next, two methods are used to prove that operators satisfying this criterion are disjoint Cesàro-hypercyclic. Finally, this paper further investigates weighted shift operators and provides detailed characterizations of the weight sequences for disjoint Cesàro-hypercyclic unilateral and bilateral weighted shift operators on sequence spaces.
Ionospheric effects degrade the quality of radar data, which are critical for the precision of the satellite ephemeris produced by space surveillance systems; this degradation is especially noticeable for radars such as GRAVES that operate in the very high frequency range. This article presents a simple and effective method to correct for ionospheric effects, with an evaluation on data obtained with GRAVES, the French space surveillance radar. This method relies on GPS data, and our evaluation relies on GRAVES and DORIS data. We found that the gain in terms of evaluated radial velocity can be as high as 1.76$κ$, where $κ$ is the typical root mean square of the noise on radial velocity measurements for GRAVES (excluding ionospheric effects): the error decreases from 2.60$κ$ to 0.83$κ$ for daytime satellite overhead passes. Our conclusion is that, while this method is very simple to implement, it has proven to be a good correction for ionospheric effects in practice.
The paper extends the well-known Lyusternik-Graves theorem for set-valued mappings to the Hölder framework, offers an affirmative answer to an open problem proposed by Dontchev and improves recent results of He and Ng. Primal and dual necessary and sufficient conditions for Hölder metric regularity are established. The results are applied to convergence analysis of a Newton-type method. Some open problems for future research are also discussed.
Repeatedly solving the parameterized optimal mass transport (pOMT) problem is a frequent task in applications such as image registration and adaptive grid generation. It is thus critical to develop a highly efficient reduced solver that is as accurate as the full order model. In this paper, we propose such a machine learning-like method for pOMT by adapting a new reduced basis (RB) technique specifically designed for nonlinear equations, the reduced residual reduced over-collocation (R2-ROC) approach, to the parameterized Monge-Ampère equation. It builds on top of a narrow-stencil finite difference method (FDM), a so-called truth solver, which we propose in this paper for the Monge-Ampère equation with a transport boundary. Together with the R2-ROC approach, it allows us to handle the strong and unique nonlinearity pertaining to the Monge-Ampère equation, achieving online efficiency without resorting to any direct approximation of the nonlinearity. Several challenging numerical tests demonstrate the accuracy and high efficiency of our method for solving the Monge-Ampère equation with various parametric boundary conditions.
In this paper, we establish interior $C^{1,α}$ estimates for solutions of the linearized Monge-Ampère equation $$\mathcal{L}_φu:=\mathrm{tr}[ΦD^2 u]=f,$$ where the density of the Monge-Ampère measure $g:=\mathrm{det}D^2φ$ satisfies a $\mathrm{VMO}$-type condition and $Φ:=(\mathrm{det}D^2φ)(D^2φ)^{-1}$ is the cofactor matrix of $D^2φ$.
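As a small numerical illustration of the operator in the abstract above, the sketch below evaluates $Φ=(\mathrm{det}D^2φ)(D^2φ)^{-1}$ and $\mathrm{tr}[ΦD^2u]$ at a single point. The function names `cofactor_matrix` and `linearized_monge_ampere` and the sample Hessians are assumptions made for illustration; they do not come from the paper, and the sketch only applies when $D^2φ$ is invertible.

```python
import numpy as np

def cofactor_matrix(hess_phi):
    """Return det(H) * inv(H) for an invertible Hessian H = D^2 phi.

    For a symmetric invertible matrix this equals its cofactor matrix.
    """
    H = np.asarray(hess_phi, dtype=float)
    return np.linalg.det(H) * np.linalg.inv(H)

def linearized_monge_ampere(hess_phi, hess_u):
    """Evaluate tr[Phi D^2 u] at one point, given both Hessians there."""
    Phi = cofactor_matrix(hess_phi)
    return float(np.trace(Phi @ np.asarray(hess_u, dtype=float)))

# Hypothetical sample Hessians at a single point (not from the paper).
H_phi = np.array([[2.0, 1.0], [1.0, 3.0]])
H_u = np.eye(2)
value = linearized_monge_ampere(H_phi, H_u)
```

When $D^2φ$ is the identity (for instance $φ=|x|^2/2$), $Φ$ is the identity and the operator reduces to the Laplacian of $u$.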
In this paper, we establish global $W^{2,p}$ estimates for solutions of the linearized Monge-Ampère equation $$\mathcal{L}_φu:=\mathrm{tr}[ΦD^2 u]=f,$$ where the density of the Monge-Ampère measure $g:=\mathrm{det}D^2φ$ satisfies a $\mathrm{VMO}$-type condition, and $Φ:=(\mathrm{det}D^2φ)(D^2φ)^{-1}$ is the cofactor matrix of $D^2φ$.
In this paper, we establish global $C^{1+α,\frac{1+α}{2}}$ estimates for solutions of the linearized parabolic Monge-Ampère equation $$\mathcal{L}_φu(x,t):=-u_t\,\mathrm{det}D^2φ(x)+\mathrm{tr}[Φ(x) D^2 u]=f(x,t)$$ under appropriate conditions on the domain, Monge-Ampère measures, boundary data and $f$, where $Φ:=\mathrm{det}(D^2φ)(D^2φ)^{-1}$ is the cofactor matrix of the Hessian $D^2φ$.
In certain fields where compositional data are studied, the compositional components, called parts, can be combined into certain subsets, called amalgamations, that are based on domain knowledge. Furthermore, these subsets can form a natural hierarchy of amalgamations subdividing into sub-amalgamations. The authors, a statistician and a biochemist, demonstrate how to create a hierarchy of amalgamations in the context of fatty acid compositions in a sample of marine organisms. Following a tradition in compositional data analysis, these amalgamations are transformed to logratios, and their usefulness as new variables is quantified by the percentage of total logratio variance that they explain. This method is proposed as an alternative method of variable selection in compositional data analysis.
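The logratio step described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the explained share is measured by regressing the centered logratios (CLR) of all parts on a single amalgamation logratio, with unweighted parts, and the function names are hypothetical.

```python
import numpy as np

def amalgamation_logratio(X, num_idx, den_idx):
    """Log of the ratio of two amalgamations (sums of parts), one value per sample."""
    X = np.asarray(X, dtype=float)
    num = X[:, num_idx].sum(axis=1)
    den = X[:, den_idx].sum(axis=1)
    return np.log(num / den)

def percent_variance_explained(X, num_idx, den_idx):
    """Percentage of the summed CLR variance explained by one amalgamation logratio.

    Assumption: unweighted parts, and explained variance taken from a
    least-squares regression of each CLR column on the logratio.
    """
    X = np.asarray(X, dtype=float)
    L = np.log(X)
    clr = L - L.mean(axis=1, keepdims=True)   # centered logratios per sample
    clr_c = clr - clr.mean(axis=0)            # center each column over samples
    z = amalgamation_logratio(X, num_idx, den_idx)
    zc = z - z.mean()
    cov = clr_c.T @ zc / len(z)               # covariance of each CLR column with z
    explained = (cov ** 2).sum() / zc.var()   # summed regression variance
    total = clr_c.var(axis=0).sum()           # proportional to total logratio variance
    return 100.0 * explained / total

# Hypothetical 4-part compositions (rows sum to 1); not data from the paper.
X = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.20, 0.30],
    [0.40, 0.10, 0.35, 0.15],
    [0.15, 0.35, 0.10, 0.40],
])
share = percent_variance_explained(X, [0, 1], [2, 3])
```

Here parts 0 and 1 form one amalgamation and parts 2 and 3 the other; ranking candidate amalgamation logratios by this percentage is one way to use them for variable selection.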
We introduce Delayed Streams Modeling (DSM), a flexible formulation for streaming, multimodal sequence-to-sequence learning. Sequence-to-sequence generation is often cast in an offline manner, where the model consumes the complete input sequence before generating the first output timestep. Alternatively, streaming sequence-to-sequence models rely on learning a policy for choosing when to advance on the input stream or write to the output stream. DSM instead models already time-aligned streams with a decoder-only language model. By moving the alignment to a pre-processing step, and introducing appropriate delays between streams, DSM provides streaming inference of arbitrary output sequences, from any input combination, making it applicable to many sequence-to-sequence problems. In particular, given text and audio streams, automatic speech recognition (ASR) corresponds to the text stream being delayed, while the opposite gives a text-to-speech (TTS) model. We perform extensive experiments for these two major sequence-to-sequence tasks, showing that DSM provides state-of-the-art performance and latency while supporting arbitrarily long sequences, and is even competitive with offline baselines.
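The idea of delaying one time-aligned stream relative to another, as described in the abstract above, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the function name `delayed_pairs`, the `<pad>` token, and the string tokens are hypothetical, and the real model operates on learned token streams rather than Python lists.

```python
def delayed_pairs(audio, text, text_delay, pad="<pad>"):
    """Pair two time-aligned streams after shifting one of them.

    text_delay > 0 delays the text stream (ASR-like: audio is seen first).
    text_delay < 0 delays the audio stream (TTS-like: text is seen first).
    Both streams are padded so no token is dropped.
    """
    if len(audio) != len(text):
        raise ValueError("streams must be aligned to the same number of steps")
    lead = [pad] * abs(text_delay)
    if text_delay >= 0:
        a = list(audio) + lead
        t = lead + list(text)
    else:
        a = lead + list(audio)
        t = list(text) + lead
    return list(zip(a, t))

# Toy aligned streams with three steps each.
audio = ["a0", "a1", "a2"]
text = ["t0", "t1", "t2"]
asr_like = delayed_pairs(audio, text, text_delay=1)
tts_like = delayed_pairs(audio, text, text_delay=-1)
```

With a positive delay, each text token is paired with a later audio step, so a decoder reading the pairs in order has already seen the corresponding audio before it must emit the text.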
Artificial intelligence and machine learning models deployed on edge devices, e.g., for quality control in Additive Manufacturing (AM), are frequently small in size. Such models usually have to deliver highly accurate results within a short time frame. Methods that are commonly employed in literature start out with larger trained models and try to reduce their memory and latency footprint by structural pruning, knowledge distillation, or quantization. It is, however, also possible to leverage hardware-aware Neural Architecture Search (NAS), an approach that seeks to systematically explore the architecture space to find optimized configurations. In this study, a hardware-aware NAS workflow is introduced that couples an edge device located in Belgium with a powerful High-Performance Computing system in Germany, to train possible architecture candidates as fast as possible while performing real-time latency measurements on the target hardware. The approach is verified on a use case in the AM domain, based on the open RAISE-LPBF dataset, achieving ~8.8 times faster inference speed while simultaneously enhancing model quality by a factor of ~1.35, compared to a human-designed baseline.
In this work we present the Consistency-Rebalanced Accuracy (CoRA) metric, improving the reliability of Large Language Model (LLM) scores computed on multiple choice (MC) benchmarks. Our metric explores the response consistency of the LLMs, taking advantage of synthetically-generated questions with altered answer choices. With two intermediate scores, i.e. Bare-Minimum-Consistency Accuracy (BMCA) and Consistency Index (CI), CoRA is computed by adjusting the multiple-choice question answering (MCQA) scores to better reflect the level of consistency of the LLM. We present evaluations in different benchmarks using diverse LLMs, and not only demonstrate that LLMs can present low response consistency even when they present high MCQA scores, but also that CoRA can successfully scale down the scores of inconsistent models.
We address the problem of extending a pretrained large language model to a new domain that was not seen during training. Standard techniques, such as finetuning or low-rank adaptation (LoRA), are successful at domain adaptation but do not formally add capacity to the model. This often leads to a trade-off between performing well on the new domain and degrading performance on the original domain. Here, we revisit and improve adapters to extend LLMs from three angles: data, architecture and training procedure, which are advantageously considered jointly. The resulting method, called neutral residues, modifies adapters in a way that leads each new residual block to output near-zeros on the original domain. This solution leads to strong results when adapting a state-of-the-art model originally trained on English to a new language. Neutral residues significantly outperform competing approaches such as finetuning, LoRA or vanilla adapters in terms of the trade-off between learning the new language and not forgetting English.
We present RAISE-LPBF, a large dataset on the effect of laser power and laser dot speed in powder bed fusion (LPBF) of 316L stainless steel bulk material, monitored by on-axis 20k FPS video. Both process parameters are independently sampled for each scan line from a continuous distribution, so interactions of different parameter choices can be investigated. The data can be used to derive statistical properties of LPBF, as well as to build anomaly detectors. We provide example source code for loading the data, baseline machine learning models and results, and a public benchmark to evaluate predictive models.
Recent techniques such as retrieval-augmented generation or chain-of-thought reasoning have led to longer contexts and increased inference costs. Context compression techniques can reduce these costs, but the most effective approaches require fine-tuning the target model or even modifying its architecture. This can degrade its general abilities when not used for this specific purpose. Here we explore an alternative approach: an encoder that compresses the context into continuous representations which replace token embeddings in decoder LLMs. First, we perform a systematic study of training strategies and architecture choices for the encoder. Our findings led to the design of an Adaptable text Representations Compressor, named ARC-Encoder, which outputs $x$-times fewer continuous representations (typically $x\!\in\!\{4,8\}$) than text tokens. We evaluate ARC-Encoder across a variety of LLM usage scenarios, ranging from in-context learning to context window extension, on both instruct and base decoders. Results show that ARC-Encoder achieves state-of-the-art performance on several benchmarks while improving computational efficiency at inference. Finally, we demonstrate that our model
We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue framework. Current systems for spoken dialogue rely on pipelines of independent components, namely voice activity detection, speech recognition, textual dialogue and text-to-speech. Such frameworks cannot emulate the experience of real conversations. First, their complexity induces a latency of several seconds between interactions. Second, text being the intermediate modality for dialogue, non-linguistic information that modifies meaning -- such as emotion or non-speech sounds -- is lost in the interaction. Finally, they rely on a segmentation into speaker turns, which does not take into account overlapping speech, interruptions and interjections. Moshi solves these independent issues altogether by casting spoken dialogue as speech-to-speech generation. Starting from a text language model backbone, Moshi generates speech as tokens from the residual quantizer of a neural audio codec, while modeling separately its own speech and that of the user into parallel streams. This allows for the removal of explicit speaker turns, and the modeling of arbitrary conversational dynamics. We moreover extend the hie
The review presents rigorous results of the theory of fundamental equations of evolution of many-particle systems with collisions and also considers their connection with nonlinear kinetic equations describing the collective behavior of particles in scaling approximations. This work is dedicated to the 160th anniversary of the birth of Dmytro Oleksandrovych Grave, the first academician of the Ukraine Academy of Sciences in mathematics and the founder of the Institute of Mathematics in 1920.
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Scaling the size of language models to tens of billions of parameters has led to impressive performance on a wide range of tasks. At generation, these models are used auto-regressively, requiring a forward pass for each generated token, and thus reading the full set of parameters from memory. This memory access forms the primary bottleneck for generation and it worsens as the model size increases. Moreover, executing a forward pass for multiple tokens in parallel often takes nearly the same time as it does for just one token. These two observations lead to the development of speculative sampling, where a second smaller model is used to draft a few tokens, that are then validated or rejected using a single forward pass of the large model. Unfortunately, this method requires two models that share the same tokenizer and thus limits its adoption. As an alternative, we propose to use parallel decoding as a way to draft multiple tokens from a single model with no computational cost, nor the need for a second model. Our approach only requires an additional input token that marks the words that will be generated simultaneously. We show promising performance (up to $30\%$ speed-up) while re
The Diósi-Penrose model is explored in a relativistic context. Relativistic effects are considered within the recently proposed Grave de Peralta approach [L. Grave de Peralta, Results Phys. 18 (2020) 103318], which parametrizes the Schrödinger-like Hamiltonian so that the average kinetic energy of the system coincides with its relativistic kinetic energy. As a case study, the method is applied to a particle in a box with good results. In the Diósi-Penrose model, we observe that the width of a quantum matter field confined by its own gravitational field [L. Diósi, Phys. Lett. 105A (1984) 199] drops sharply to zero for a mass of the order of the Planck mass, indicating a breakdown of the model at the Planck scale.
The present note concerns the "graph of graphs" that has cubic graphs as vertices connected by edges represented by the so-called Whitehead moves. Here, we prove that the outer-conductance of the graph of graphs tends to zero as the number of vertices tends to infinity. This answers a question of K. Rafi in the negative.