Search - ResearchTracker

In hybrid automatic speech recognition (ASR) systems, the vocabulary size is unambiguous: it is typically determined by the number of phones, bi-phones, or tri-phones in the language. In contrast, end-to-end ASR systems derive their vocabulary, often referred to as tokens, from the text corpus used for training. The choice of this vocabulary, and more importantly its size, is a critical hyper-parameter in training end-to-end ASR systems. Tokenization algorithms such as Byte Pair Encoding (BPE), WordPiece, and Unigram Language Model (ULM) take the vocabulary size as an input hyper-parameter and use it to generate the sub-words employed during ASR training. Popular toolkits like ESPnet provide fixed vocabulary sizes in their training recipes, but there is little documentation or discussion in the literature of how these values are determined. Recent work [1] formalized an approach to identifying the vocabulary size best suited to end-to-end ASR, introducing a cost-function framework that treats the tokenization process as a black box. In this paper, we build on that foundation by curve fitting the training data and applying the first- and second-derivative tests from calculus.
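The abstract does not spell out the procedure, so the following is only a minimal Python sketch of the general idea, assuming that a cost has already been measured for several candidate vocabulary sizes (for example with a black-box framework like the one in [1]), that a low-degree polynomial describes the cost curve, and that the preferred vocabulary size is a local minimum of that curve. The helper name `best_vocab_size`, the cubic degree, and the synthetic cost curve are hypothetical choices for illustration, not the authors' method or data. It uses NumPy's `polyfit`, `polyder`, `roots`, and `polyval`.

```python
import numpy as np


def best_vocab_size(vocab_sizes, costs, degree=3):
    """Hypothetical helper: fit a polynomial to (vocab size, cost) pairs and
    return the vocab size at the lowest local minimum inside the measured
    range, or None if the derivative tests find no local minimum there."""
    v = np.asarray(vocab_sizes, dtype=float)
    c = np.asarray(costs, dtype=float)
    scale = 1000.0                     # fit in thousands of tokens for better conditioning
    x = v / scale

    coeffs = np.polyfit(x, c, degree)  # least-squares polynomial fit of the cost curve
    d1 = np.polyder(coeffs, 1)         # first derivative f'(x)
    d2 = np.polyder(coeffs, 2)         # second derivative f''(x)

    minima = []
    for root in np.roots(d1):          # first-derivative test: critical points where f'(x) = 0
        if abs(root.imag) > 1e-9:
            continue                   # skip complex roots
        r = root.real
        if not (x.min() <= r <= x.max()):
            continue                   # only trust the fit inside the measured range
        if np.polyval(d2, r) > 0:      # second-derivative test: f''(x) > 0 means a local minimum
            minima.append(r)

    if not minima:
        return None
    best = min(minima, key=lambda r: np.polyval(coeffs, r))
    return int(round(best * scale))


# Usage with a synthetic cost curve (illustrative values only, not measured data).
sizes = [500, 1000, 2000, 3000, 4000, 6000, 8000]
x_k = np.array(sizes) / 1000.0
synthetic_costs = (x_k - 3) ** 2 + 0.1 * (x_k - 3) ** 3
print(best_vocab_size(sizes, synthetic_costs))  # → 3000
```

Restricting candidates to the measured range is a deliberate choice in this sketch: a polynomial fit can place spurious critical points far outside the data, where it says nothing reliable about the tokenizer.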

2nd Place Solution for CVPR2024 E2E Challenge: End-to-End Autonomous Driving Using Vision Language Model

arXiv, 2025-09-02. Authors: Zilong Guo, Yi Luo, Long Sha

End-to-end autonomous driving has drawn tremendous attention recently. Many works focus on using modular deep neural networks to construct the end-to-end architecture. However, whether powerful large language models (LLMs), especially multimodal Vision Language Models (VLMs), can benefit end-to-end driving tasks remains an open question. In our work, we demonstrate that combining an end-to-end architectural design with knowledgeable VLMs yields impressive performance on driving tasks. Notably, our method uses only a single camera and is the best camera-only solution on the leaderboard, demonstrating the effectiveness of the vision-based driving approach and its potential for end-to-end driving tasks.

Search results: End-to-end

A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR

2nd Place Solution for CVPR2024 E2E Challenge: End-to-End Autonomous Driving Using Vision Language Model

End-to-End Simulation of 5G NR Integrated Access and Backhaul Networks for Remote Maritime Connectivity

Make It Long, Keep It Fast: End-to-End 10K Long User Behavior Sequence Modeling for Billion-Scale Douyin Recommendation

FocalAD: Local Motion Planning for End-to-End Autonomous Driving

DriveE2E: Closed-Loop Benchmark for End-to-End Autonomous Driving through Real-to-Simulation

Generalized Trajectory Scoring for End-to-end Multimodal Planning

Agents in the Sandbox: End-to-End Crash Bug Reproduction for Minecraft

Exploring the Causality of End-to-End Autonomous Driving

SKGE-SWIN: End-To-End Autonomous Vehicle Waypoint Prediction and Navigation Using Skip Stage Swin Transformer

DSG: An End-to-End Document Structure Generator

Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation

Benchmarking End-To-End Performance of AI-Based Chip Placement Algorithms

End-to-end Autonomous Driving: Challenges and Frontiers

An Empirical Study of End-to-End Temporal Action Detection

End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies

End-to-End $n$-ary Relation Extraction for Combination Drug Therapies

End-to-End United Video Dehazing and Detection

End-to-end analysis using image classification

Multichannel End-to-end Speech Recognition