Found 20 results
In this paper, we explicitly compute two kinds of algebraic Belyi functions on Bring's curve. One is related to a congruence subgroup of ${\rm SL}_2(\mathbb{Z})$ and the other is related to a congruence subgroup of the triangle group $\Delta(2,4,5)\subset {\rm SL}_2(\mathbb{R})$. To carry out the computation, we use elliptic cusp forms of weight 2 for the former case and the automorphism group of Bring's curve for the latter case. We also discuss a suitable base field (a number field) for describing isomorphisms between Hulek-Craig's curve, Bring's curve, and another algebraic model obtained as a modular curve.
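For reference, Bring's curve has a classical projective model (a standard fact, though not necessarily the model used in this paper): it is the smooth genus-4 curve in $\mathbb{P}^4$ cut out by the first three power sums.

```latex
% A classical model of Bring's curve in homogeneous coordinates
% (x_0 : x_1 : x_2 : x_3 : x_4) on P^4:
\sum_{i=0}^{4} x_i \;=\; \sum_{i=0}^{4} x_i^{2} \;=\; \sum_{i=0}^{4} x_i^{3} \;=\; 0 .
```

The symmetric group $S_5$ acts on this model by permuting the coordinates, which is presumably the automorphism group exploited in the triangle-group computation.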
Large Language Models (LLMs) exhibit strong multilingual capabilities, yet remain fundamentally constrained by the severe imbalance in global language resources. While over 7,000 languages are spoken worldwide, only a small subset (fewer than 100) has sufficient digital presence to meaningfully influence modern LLM training. This disparity leads to systematic underperformance, cultural misalignment, and limited accessibility for speakers of low-resource and extreme-low-resource languages. To address this gap, we introduce Bring Your Own Language (BYOL), a unified framework for scalable, language-aware LLM development tailored to each language's digital footprint. BYOL begins with a language resource classification that maps languages into four tiers (Extreme-Low, Low, Mid, High) using curated web-scale corpora, and uses this classification to select the appropriate integration pathway. For low-resource languages, we propose a full-stack data refinement and expansion pipeline that combines corpus cleaning, synthetic text generation, continual pretraining, and supervised finetuning. Applied to Chichewa and Maori, this pipeline yields language-specific LLMs that achieve approximately
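A minimal sketch of the four-tier resource classification idea, with illustrative token thresholds that are assumptions rather than the paper's actual values:

```python
# Hypothetical sketch of a BYOL-style resource-tier classifier
# (Extreme-Low / Low / Mid / High). The corpus-size thresholds below
# are illustrative assumptions, not the paper's values.

TIERS = [
    (1e10, "High"),        # >= 10B tokens of curated web text
    (1e8,  "Mid"),         # >= 100M tokens
    (1e6,  "Low"),         # >= 1M tokens
    (0.0,  "Extreme-Low"), # anything below
]

def classify_language(corpus_tokens: float) -> str:
    """Map a language's curated corpus size (in tokens) to a resource tier."""
    for threshold, tier in TIERS:
        if corpus_tokens >= threshold:
            return tier
    return "Extreme-Low"

if __name__ == "__main__":
    for lang, tokens in [("English", 5e12), ("Chichewa", 4e6), ("Maori", 2e7)]:
        print(lang, "->", classify_language(tokens))
```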
Generative recommendation (GR) has shown strong potential for sequential recommendation in an end-to-end generation paradigm. However, existing GR models suffer from severe cold-start collapse: their recommendation accuracy on cold-start items can drop to near zero. Current solutions typically rely on retraining with cold-start interactions, which is hindered by sparse feedback, high computational cost, and delayed updates, limiting practical utility in rapidly evolving recommendation catalogs. Inspired by model editing in NLP, which enables training-free knowledge injection into large language models, we explore how to bring this paradigm to generative recommendation. This, however, faces two key challenges: GR lacks the explicit subject-object binding common in natural language, making targeted edits difficult; and GR does not exhibit stable token co-occurrence patterns, making the injection of multi-token item representations unreliable. To address these challenges, we propose GenRecEdit, a model editing framework tailored for generative recommendation. GenRecEdit explicitly models the relationship between the full sequence context and next-token generation, adopts iterative tok
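For background, the NLP model-editing paradigm the authors draw on typically injects knowledge via a closed-form, training-free weight update. Below is a generic rank-one edit in that spirit (an illustration of the borrowed paradigm, not GenRecEdit's actual procedure, which the abstract does not fully specify):

```python
import numpy as np

# Generic rank-one weight edit in the spirit of locate-and-edit methods
# from NLP (e.g., ROME). Given a key vector k (a context representation)
# and a desired value v (the representation that should be produced for
# the new item), update W so the edited layer maps k to v while
# perturbing other inputs minimally.

def rank_one_edit(W: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Return W' with W' @ k == v via a minimal-norm rank-one update."""
    residual = v - W @ k              # what the layer currently gets wrong
    return W + np.outer(residual, k) / (k @ k)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
k = rng.normal(size=8)
v = rng.normal(size=8)
W_edited = rank_one_edit(W, k, v)
print(np.allclose(W_edited @ k, v))   # True: the new association is injected
```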
The construction of the High Intensity heavy ion Accelerator Facility (HIAF) has been completed, with current efforts focused on subsystem commissioning. Beam commissioning is scheduled for autumn 2025, marking a critical milestone in the HIAF project. This paper presents high-precision optics calculations for the Booster Ring (BRing) of HIAF, a key component for achieving stable heavy-ion beam acceleration. Leveraging high-precision magnetic field data, we divide each magnet into hundreds of slices, establishing a high-precision sliced optics model for BRing. Detailed calculations of BRing's optics are presented in this work. Critical parameters, including the tunes and betatron functions, are compared between the lattice based on the measured magnetic fields and the ideal lattice. The results highlight the impact of realistic magnetic fields on beam dynamics and provide essential insights for accelerator tuning and optimization. These findings serve as a fundamental reference for beam commissioning and long-term operation, ensuring beam stability and performance reproducibility at HIAF.
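A minimal sketch of the slicing idea (illustrative only; the BRing lattice and field maps are not reproduced here): a magnet is split into thin slices, each assigned its locally measured gradient, and the linear transfer matrices of the slices are composed.

```python
import numpy as np

# Sliced linear optics sketch: instead of one hard-edge matrix per magnet,
# compose many thin-slice matrices with the measured gradient k(s).
# The gradient profile below is made up for demonstration.

def quad_slice(k: float, L: float) -> np.ndarray:
    """Horizontal transfer matrix of a slice: focusing (k>0),
    defocusing (k<0), or drift (k=0)."""
    if k > 0:
        p = np.sqrt(k)
        return np.array([[np.cos(p*L),     np.sin(p*L)/p],
                         [-p*np.sin(p*L),  np.cos(p*L)]])
    if k < 0:
        p = np.sqrt(-k)
        return np.array([[np.cosh(p*L),    np.sinh(p*L)/p],
                         [p*np.sinh(p*L),  np.cosh(p*L)]])
    return np.array([[1.0, L], [0.0, 1.0]])  # pure drift

# Hypothetical measured gradient along a 1 m magnet, sampled in 100 slices
# (fringe fields make k(s) non-uniform; a hard-edge model uses one slice).
s = np.linspace(0, 1, 100)
k_profile = 2.0 * np.exp(-((s - 0.5) / 0.3) ** 2)   # k(s) in m^-2

M = np.eye(2)
for k in k_profile:
    M = quad_slice(k, 1.0 / len(k_profile)) @ M      # compose slice by slice
print(M)  # linear map from which tunes and betatron functions follow
```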
Animated characters often move in non-physical ways and have proportions that are far from a typical walking robot. This provides an ideal platform for innovation in both mechanical design and stylized motion control. In this paper, we bring Olaf to life in the physical world, relying on reinforcement learning guided by animation references for control. To create the illusion of Olaf's feet moving along his body, we hide two asymmetric legs under a soft foam skirt. To fit actuators inside the character, we use spherical and planar linkages in the arms, mouth, and eyes. Because the walk cycle results in harsh contact sounds, we introduce additional rewards that noticeably reduce impact noise. The large head, driven by small actuators in the character's slim neck, creates a risk of overheating, amplified by the costume. To keep actuators from overheating, we feed temperature values as additional inputs to the policies and introduce new rewards that keep temperatures within bounds. We validate the efficacy of our modeling in simulation and on hardware, demonstrating an unmatched level of believability for a costumed robotic character.
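A hypothetical sketch of a temperature-bound reward term of the kind described above; the constants and functional form are illustrative assumptions, not the paper's reward:

```python
import numpy as np

# Assumed-form reward term: the policy observes actuator temperatures and
# is penalized smoothly as any of them approaches a hard limit.

T_SOFT = 70.0   # degC: start penalizing above this (illustrative)
T_HARD = 90.0   # degC: maximum allowed actuator temperature (illustrative)

def temperature_reward(temps: np.ndarray) -> float:
    """Smooth penalty in [-1, 0]; zero while all actuators stay cool."""
    excess = np.clip((temps - T_SOFT) / (T_HARD - T_SOFT), 0.0, 1.0)
    return -float(np.mean(excess ** 2))

print(temperature_reward(np.array([55.0, 62.0, 68.0])))  # 0.0: all cool
print(temperature_reward(np.array([55.0, 85.0, 92.0])))  # negative: too hot
```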
The Tor network offers network anonymity to its users by routing their traffic through a sequence of relays. A group of nine directory authorities maintains information about all available relay nodes using a distributed directory protocol. We observe that the current protocol makes a strong synchrony assumption, which makes it vulnerable to natural as well as adversarial non-synchronous communication scenarios over the Internet. In this paper, we show that it is possible to cause a failure in the Tor directory protocol by targeting a majority of the authorities for only five minutes using a well-executed distributed denial-of-service (DDoS) attack. We demonstrate this attack in a controlled environment and show that it costs as little as \$53.28 per month to disrupt the protocol and to effectively bring down the entire Tor network. To mitigate this problem, we consider the popular partial synchrony assumption, which ensures protocol security even when network delays are large and initially unknown. We design a new Tor directory protocol that leverages a standard partially synchronous consensus protocol to solve this problem, while also proving its security. We have
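As background on why partial synchrony helps here (an illustrative sketch, not the paper's protocol): a replica that cannot distinguish a slow network from a failed peer can retry with exponentially growing timeouts, so once the unknown delay bound eventually holds, some round's timeout exceeds it and the protocol makes progress; there is no fixed time window whose disruption kills the protocol.

```python
# Minimal illustration of the standard exponential-timeout pattern used by
# partially synchronous consensus protocols. The base timeout is arbitrary.

def round_timeout(base_s: float, round_no: int) -> float:
    """Timeout for a given round: doubles until the network delay is covered."""
    return base_s * (2 ** round_no)

for r in range(6):
    print(f"round {r}: timeout {round_timeout(5.0, r):.0f} s")
```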
While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as "bring my cup," where the robot must act on one specific instance among visually similar objects. We study this setting of manipulating personal objects, in which a VLA must identify and control a user-specific object unseen during training using only a few reference images. To address this challenge, we propose Visual Attentive Prompting (VAP), a simple-yet-effective training-free perceptual adapter that equips frozen VLAs with top-down selective attention. VAP treats the reference images as a non-parametric visual memory, grounds the personal object in the scene through open-vocabulary detection and embedding-based matching, and then injects this grounding as a visual prompt by highlighting the object and rewriting the instruction. We construct two simulation benchmarks, Personalized-SIMPLER and Personalized-VLABench, and a real-world tabletop benchmark to evaluate personalized manipulation across multiple robots and tasks. Experiments show that VAP consistently outperforms generic policies and token-learning baselines in both success rate and correc
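An illustrative sketch of the VAP idea: match detected candidates against reference-image embeddings, then inject the grounding as a visual prompt plus a rewritten instruction. The detector and embedding model are abstracted away; the arrays below are stand-ins, not the paper's components.

```python
import numpy as np
from PIL import Image, ImageDraw

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ground_personal_object(ref_embs, cand_embs, cand_boxes):
    """Pick the candidate whose embedding best matches any reference image."""
    scores = [max(cosine(r, c) for r in ref_embs) for c in cand_embs]
    return cand_boxes[int(np.argmax(scores))]

rng = np.random.default_rng(0)
ref_embs = [rng.normal(size=64) for _ in range(3)]    # "my cup" references
cand_embs = [rng.normal(size=64) for _ in range(4)]   # detected cup candidates
cand_boxes = [(10, 10, 60, 60), (80, 15, 130, 70),
              (150, 20, 200, 75), (220, 25, 270, 80)]

box = ground_personal_object(ref_embs, cand_embs, cand_boxes)

img = Image.new("RGB", (320, 120), "white")                 # stand-in scene
ImageDraw.Draw(img).rectangle(box, outline="red", width=3)  # visual prompt
instruction = "bring the cup highlighted by the red box"    # rewritten command
```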
In this paper, we present FaceShot, a novel training-free portrait animation framework designed to bring any character to life from any driving video without fine-tuning or retraining. We achieve this by producing precise and robust reposed landmark sequences from an appearance-guided landmark matching module and a coordinate-based landmark retargeting module. Together, these components harness the robust semantic correspondences of latent diffusion models to produce facial motion sequences across a wide range of character types. The landmark sequences are then fed into a pre-trained landmark-driven animation model to generate the animated video. With this powerful generalization capability, FaceShot significantly extends the reach of portrait animation by removing the restriction of landmark detection to realistic portraits, supporting any stylized character and driving video. FaceShot is also compatible with any landmark-driven animation model, significantly improving overall performance. Extensive experiments on our newly constructed character benchmark CharacBench confirm that FaceShot consistently surpasses state-of-the-art (SOTA) approaches across any character domain. More re
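A hypothetical sketch of coordinate-based landmark retargeting (not FaceShot's exact module): per-frame displacements of the driving landmarks relative to a neutral driving frame are rescaled to the character's face size and added to the character's neutral landmarks.

```python
import numpy as np

def face_scale(lms: np.ndarray) -> np.ndarray:
    """Per-axis extent of an (N, 2) landmark set, used to normalize motion."""
    return lms.max(axis=0) - lms.min(axis=0)

def retarget(char_neutral, drive_neutral, drive_frame):
    """Transfer driving motion to the character, rescaled per axis."""
    scale = face_scale(char_neutral) / face_scale(drive_neutral)
    return char_neutral + (drive_frame - drive_neutral) * scale

rng = np.random.default_rng(0)
drive_neutral = rng.uniform(0, 100, size=(68, 2))             # frame 0
drive_frame = drive_neutral + rng.normal(0, 2, size=(68, 2))  # a later frame
char_neutral = drive_neutral * np.array([2.5, 1.2])  # wide, stylized face

char_frame = retarget(char_neutral, drive_neutral, drive_frame)
print(char_frame.shape)  # (68, 2) landmarks to feed the animation model
```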
Bringing AI to vehicles and enabling them as sensing platforms is key to transforming maintenance from reactive to proactive. Now is the time to integrate AI copilots that speak both languages: machine and driver. This article offers a conceptual and technical perspective intended to spark interdisciplinary dialogue and guide future research and development in intelligent vehicle systems, predictive maintenance, and AI-powered user interaction.
The High Intensity heavy ion Accelerator Facility (HIAF) successfully accelerated an $^{18}{\rm O}^{6+}$ beam on October 27, 2025. This paper presents a further simulation study of the high-precision optics, namely the sliced optics, of the Booster Ring (BRing) at HIAF based on measured magnetic fields, focusing on three aspects: (1) closed-orbit distortion (COD) and variations in optical parameters induced by errors; (2) closed-orbit correction; (3) dynamic aperture. Specifically, detailed investigations are conducted on the COD and optical parameter variations caused by magnet alignment errors and dipole field errors, alongside simulations of closed-orbit correction and detailed calculations of BRing's dynamic aperture. The results show that the sliced optics outperforms the original optics in COD control. Without chromaticity correction, its dynamic aperture is superior to that of the original optics; after chromaticity correction, it remains comparable. This study provides valuable insights for accelerator tuning and optimization.
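For context, closed-orbit correction is commonly done by inverting an orbit response matrix. A minimal sketch of this standard technique (the BRing study's actual correctors, BPMs, and response matrix are not reproduced here):

```python
import numpy as np

# Corrector kicks theta are chosen to cancel the measured orbit x at the
# BPMs by solving R @ theta ~ -x in the least-squares sense via the
# pseudo-inverse (SVD). All numbers below are synthetic.

rng = np.random.default_rng(1)
n_bpm, n_corr = 32, 16
R = rng.normal(size=(n_bpm, n_corr))        # orbit response matrix [mm/mrad]
x_cod = rng.normal(scale=3.0, size=n_bpm)   # measured closed-orbit distortion [mm]

theta = -np.linalg.pinv(R) @ x_cod          # corrector strengths [mrad]
x_residual = x_cod + R @ theta              # orbit after correction

print(f"rms COD before: {np.std(x_cod):.3f} mm, after: {np.std(x_residual):.3f} mm")
```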
The modular group $\Gamma$ (which is the Hecke group $H_3$) can be used to study triangular maps. Here we use the Hecke group $H_4$ to study the regular map that underlies Bring's surface of genus 4. Our main result is the determination of the 20-sided hyperbolic polygon that gives Bring's surface.
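For reference, the Hecke group $H_q$ acts on the upper half-plane and is generated by an inversion and a translation by $\lambda_q = 2\cos(\pi/q)$, so $H_3$ (with $\lambda_3 = 1$) is the modular group and $H_4$ translates by $\sqrt{2}$:

```latex
H_q = \left\langle
  S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix},\;
  T_q = \begin{pmatrix} 1 & \lambda_q \\ 0 & 1 \end{pmatrix}
\right\rangle,
\qquad
\lambda_q = 2\cos\!\frac{\pi}{q}, \quad \lambda_4 = \sqrt{2}.
```

Abstractly, $H_q$ is the $(2, q, \infty)$ triangle group, which is why it governs maps with $q$-valent structure such as the regular map on Bring's surface.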
Despite recent achievements, existing 3D interacting hands recovery methods have shown results mainly in motion capture (MoCap) environments, not in-the-wild (ITW) ones. This is because collecting 3D interacting hands data in the wild is extremely challenging, even for 2D data. We present InterWild, which brings MoCap and ITW samples to shared domains for robust 3D interacting hands recovery in the wild with a limited amount of ITW 2D/3D interacting hands data. 3D interacting hands recovery consists of two sub-problems: 1) 3D recovery of each hand and 2) 3D relative translation recovery between the two hands. For the first sub-problem, we bring MoCap and ITW samples to a shared 2D scale space. Although ITW datasets provide only a limited amount of 2D/3D interacting hands data, they contain large-scale 2D single-hand data. Motivated by this, we use a single-hand image as the input for the first sub-problem, regardless of whether the two hands are interacting. Hence, interacting hands from MoCap datasets are brought to the 2D scale space of single hands from ITW datasets. For the second sub-problem, we bring MoCap and ITW samples to a shared appearance-invariant space. Unlike the first sub-problem, 2
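A hypothetical sketch of the "shared 2D scale space" idea: regardless of the dataset and of whether two hands interact, each hand is cropped with its own bounding box and mapped to a fixed resolution, so all hands live at a common 2D scale before per-hand 3D recovery. The crop size and margin are assumed values.

```python
import numpy as np

CROP = 256  # shared output resolution (an assumed value)

def to_shared_scale(joints_2d: np.ndarray, margin: float = 1.25):
    """Affine parameters mapping one hand's box to a CROP x CROP patch."""
    center = (joints_2d.max(axis=0) + joints_2d.min(axis=0)) / 2
    size = margin * (joints_2d.max(axis=0) - joints_2d.min(axis=0)).max()
    scale = CROP / size
    offset = CROP / 2 - scale * center
    return scale, offset  # x' = scale * x + offset

rng = np.random.default_rng(0)
hand = rng.uniform(300, 500, size=(21, 2))  # a hand detected anywhere, any size
scale, offset = to_shared_scale(hand)
normalized = scale * hand + offset          # now in the shared 2D scale space
print(normalized.min(axis=0), normalized.max(axis=0))
```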
A single RGB camera or LiDAR is the mainstream sensor for the challenging scene flow task, which relies heavily on visual features to match motion features. Compared with using a single modality, existing methods adopt a fusion strategy to directly fuse the cross-modal complementary knowledge in motion space. However, these direct fusion methods may suffer from the modality gap caused by the intrinsic visual heterogeneity between RGB and LiDAR, thus deteriorating motion features. We discover that events are homogeneous with RGB and LiDAR in both the visual and motion spaces. In this work, we use events as a bridge between RGB and LiDAR, and propose a novel hierarchical visual-motion fusion framework for scene flow, which explores a homogeneous space to fuse the cross-modal complementary knowledge for physical interpretation. In visual fusion, we find that events are complementary (relative vs. absolute) to RGB in luminance space for high dynamic imaging, and complementary (local boundary vs. global shape) to LiDAR in scene structure space for structure integrity. In motion fusion, we figure out that RGB, events and LiDAR are complementary (spatial-dense, temporal-dense vs.
In the area of multi-objective evolutionary algorithms (MOEAs), there is a trend of using an archive to store non-dominated solutions generated during the search. This is because 1) MOEAs may easily end up with a final population containing inferior solutions that are dominated by solutions discarded during the search, and 2) maintaining a population whose size is commensurable with the problem's Pareto front is often impractical. In this paper, we theoretically show, for the first time, that using an archive can guarantee speed-ups for MOEAs. Specifically, we prove that for two well-established MOEAs (NSGA-II and SMS-EMOA) on two commonly studied problems (OneMinMax and LeadingOnesTrailingZeroes), using an archive brings a polynomial acceleration in the expected running time. The reason is that with an archive, the population size can be reduced to a small constant: there is no need for the population to keep all the Pareto-optimal solutions found. This contrasts with existing theoretical studies of MOEAs, where a population of size commensurable with the problem's Pareto front is needed. The findings in this paper not only provide a theoretical confirmation for an increasi
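For reference, the two benchmark problems in their standard bi-objective form (both objectives maximized, $x \in \{0,1\}^n$):

```latex
\mathrm{OneMinMax}(x) = \Bigl( \sum_{i=1}^{n} x_i,\; n - \sum_{i=1}^{n} x_i \Bigr),
\qquad
\mathrm{LOTZ}(x) = \Bigl( \sum_{i=1}^{n} \prod_{j=1}^{i} x_j,\;
                          \sum_{i=1}^{n} \prod_{j=i}^{n} (1 - x_j) \Bigr).
```

OneMinMax counts ones versus zeros (every search point is Pareto-optimal), while LeadingOnesTrailingZeroes (LOTZ) counts the leading ones versus the trailing zeros of the bit string.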
While pre-trained image autoencoders are increasingly utilized in computer vision, the application of inverse graphics in 2D latent spaces has been under-explored. Yet, besides reducing training and rendering complexity, applying inverse graphics in the latent space enables valuable interoperability with other latent-based 2D methods. The major challenge is that inverse graphics cannot be directly applied to such image latent spaces because they lack an underlying 3D geometry. In this paper, we propose an Inverse Graphics Autoencoder (IG-AE) that specifically addresses this issue. To this end, we regularize an image autoencoder with 3D geometry by aligning its latent space with jointly trained latent 3D scenes. We utilize the trained IG-AE to bring NeRFs to the latent space with a latent NeRF training pipeline, which we implement in an open-source extension of the Nerfstudio framework, thereby unlocking latent scene learning for its supported methods. We experimentally confirm that latent NeRFs trained with IG-AE present improved quality compared to a standard autoencoder, all while exhibiting training and rendering accelerations with respect to NeRFs trained in the image
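An illustrative sketch of the latent-alignment regularization in an assumed form (not IG-AE's exact losses or networks): for a batch of posed views, the encoder's latent of the RGB image is pulled toward the latent image rendered from a jointly trained latent 3D scene, while the decoder still reconstructs RGB.

```python
import torch
import torch.nn.functional as F

# Stand-in single-layer "autoencoder" and a precomputed latent render;
# real models would be a full encoder/decoder and a latent 3D scene.
B, C_RGB, C_LAT, H, W = 2, 3, 4, 64, 64
encoder = torch.nn.Conv2d(C_RGB, C_LAT, 3, padding=1)
decoder = torch.nn.Conv2d(C_LAT, C_RGB, 3, padding=1)

rgb = torch.rand(B, C_RGB, H, W)            # ground-truth posed views
latent_render = torch.rand(B, C_LAT, H, W)  # render of the latent 3D scene

z = encoder(rgb)
loss_align = F.mse_loss(z, latent_render)   # align latent space with 3D scenes
loss_recon = F.mse_loss(decoder(z), rgb)    # preserve autoencoding
loss = loss_recon + 0.5 * loss_align        # the weight is an assumption
loss.backward()
```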
5G brings many performance improvements to cellular networks, such as lower latency, improved network efficiency, and higher throughput, making it an attractive candidate for many applications. One such domain is industrial applications that may require real-time guarantees to transmit time-critical control messages. Given the immense number of devices exchanging data in support of Massive Machine-Type Communications (mMTC) applications, the capability of the cellular infrastructure to handle a large number of real-time transmissions may be inadequate. For such cases, there is an acute desire to reduce any overhead as much as possible in order to guarantee certain deadlines. One such target is the Domain Name System (DNS) service, whose queries precede almost every new network request. This adds communication delay proportional to the resolver's response time, which in turn depends on the proximity of the DNS server. While bringing the DNS service to the edge has been touted as a logical solution, its integration with 5G systems is still challenging. This is due to the inability to access the DNS query information at the application layer, since the User Eq
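A small measurement sketch of the motivating observation: resolution latency is dominated by the round trip to the resolver, so a nearby (edge) resolver shaves a fixed delay off almost every new network request. This uses the dnspython library; the resolver IPs are public examples, not an edge deployment.

```python
import time
import dns.resolver  # pip install dnspython

def resolve_latency_ms(server: str, name: str = "example.com") -> float:
    """Time one A-record lookup against a specific resolver."""
    res = dns.resolver.Resolver(configure=False)
    res.nameservers = [server]
    t0 = time.perf_counter()
    res.resolve(name, "A")
    return (time.perf_counter() - t0) * 1000.0

for server in ["1.1.1.1", "8.8.8.8"]:
    print(server, f"{resolve_latency_ms(server):.1f} ms")
```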
In this paper, we work to bring telepresence to every desktop. Unlike commercial systems, personal 3D video conferencing systems must render high-quality videos while remaining financially and computationally viable for the average consumer. To this end, we introduce a capturing and rendering system that only requires 4 consumer-grade RGBD cameras and synthesizes high-quality free-viewpoint videos of users as well as their environments. Experimental results show that our system renders high-quality free-viewpoint videos without using object templates or heavy pre-processing. While not real-time, our system is fast and does not require per-video optimizations. Moreover, our system is robust to complex hand gestures and clothing, and it can generalize to new users. This work provides a strong basis for further optimization, and it will help bring telepresence to every desk in the near future. The code and dataset will be made available on our website https://mcmvmc.github.io/PersonalTelepresence/.
Stars strongly impact their environment, and shape structures on all scales throughout the universe, in a process known as "feedback". Due to the complexity of both stellar evolution and the physics of larger astrophysical structures, there remain many unanswered questions about how feedback operates, and what we can learn about stars by studying their imprint on the wider universe. In this white paper, we summarize discussions from the Lorentz Center meeting "Bringing Stellar Evolution and Feedback Together" in April 2022, and identify key areas where further dialogue can bring about radical changes in how we view the relationship between stars and the universe they live in.
Text-rich VQA, namely Visual Question Answering based on text recognition in images, is a cross-modal task that requires both image comprehension and text recognition. In this work, we focus on investigating the advantages and bottlenecks of LLM-based approaches to this problem. To do so, we separate the vision and language modules: we leverage external OCR models to recognize text in the image and Large Language Models (LLMs) to answer the question given the recognized text. The whole framework is training-free, benefiting from the in-context learning ability of LLMs. This pipeline achieves superior performance compared to the majority of existing Multimodal Large Language Models (MLLMs) on four text-rich VQA datasets. Moreover, our ablation study shows that the LLM brings stronger comprehension ability and may introduce helpful knowledge for the VQA problem; the bottleneck for LLMs in addressing text-rich VQA problems appears to lie primarily in the visual part. We also combine the OCR module with MLLMs and find that this combination works as well. It is worth noting that not all MLLMs can comprehend the OCR information, which provides insig
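A minimal sketch of the training-free pipeline described above: OCR output is serialized into an LLM prompt. The OCR engine and LLM call are abstracted away; the prompt wording is an assumption, not the paper's template.

```python
# `ocr_results` is assumed to be (text, (x, y)) pairs from any external OCR
# model; the resulting prompt would be sent to any instruction-following LLM.

def build_prompt(question: str, ocr_results: list) -> str:
    # Sort top-to-bottom, left-to-right so reading order survives serialization.
    lines = [t for t, _ in sorted(ocr_results, key=lambda r: (r[1][1], r[1][0]))]
    return (
        "The following texts were recognized in an image "
        "(reading order, top-left to bottom-right):\n"
        + "\n".join(f"- {line}" for line in lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )

ocr = [("TOTAL  $12.50", (40, 300)), ("Joe's Diner", (60, 20)), ("2x Coffee", (40, 150))]
print(build_prompt("How much was the total?", ocr))
```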
By combining the security features of TLS with the reliability of TCP, QUIC opens new possibilities for many applications. We demonstrate the benefits that QUIC brings to routing protocols. Current Internet routing protocols use insecure transport protocols: BGP uses TCP, possibly with authentication, while OSPF uses its own transport protocol directly above IP. We design and implement a library that allows the transport protocols used by BGP and OSPF to be replaced with QUIC. We apply this library to the BIRD routing daemon and report preliminary results.
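A conceptual sketch of the design idea (not the authors' library or BIRD's API): the routing daemon talks to an abstract transport, so a BGP session can be bound to TCP today and to a QUIC stream tomorrow without touching the protocol logic. A real QUIC backend would wrap a QUIC library; it is stubbed out here.

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Byte-stream abstraction the routing protocol is written against."""
    @abstractmethod
    def send(self, data: bytes) -> None: ...
    @abstractmethod
    def recv(self, max_bytes: int) -> bytes: ...

class TCPTransport(Transport):
    def __init__(self, sock):               # an already-connected socket.socket
        self.sock = sock
    def send(self, data: bytes) -> None:
        self.sock.sendall(data)
    def recv(self, max_bytes: int) -> bytes:
        return self.sock.recv(max_bytes)

class QUICTransport(Transport):
    """Stub: would map send/recv onto one bidirectional QUIC stream,
    inheriting TLS security and loss recovery from the QUIC layer."""
    def send(self, data: bytes) -> None:
        raise NotImplementedError("wrap a QUIC library here")
    def recv(self, max_bytes: int) -> bytes:
        raise NotImplementedError("wrap a QUIC library here")

def bgp_open(transport: Transport) -> None:
    # BGP messages start with a 16-byte all-ones marker; the rest of the
    # OPEN framing is elided in this sketch.
    transport.send(b"\xff" * 16 + b"...")
```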