Search - ResearchTracker

Modern datacenters increasingly rely on low-power, single-slot inference accelerators to balance performance, energy efficiency, and rack density constraints. The NVIDIA T4 GPU has become widely deployed due to strong performance per watt and mature software support. Its successor, the NVIDIA L4 GPU, introduces improvements in Tensor Core throughput, cache capacity, memory bandwidth, and parallel execution capability. However, limited empirical evidence quantifies the practical inference performance gap between these two generations under controlled and reproducible conditions. This work introduces DEEP-GAP, a systematic evaluation extending the GDEV-AI methodology to GPU inference. Using identical configurations and workloads, we evaluate ResNet18, ResNet50, and ResNet101 across FP32, FP16, and INT8 precision modes using PyTorch and TensorRT. Results show that reduced precision significantly improves performance, with INT8 achieving up to 58x throughput improvement over CPU baselines. L4 achieves up to 4.4x higher throughput than T4 while reaching peak efficiency at smaller batch sizes between 16 and 32, improving latency-throughput tradeoffs for latency-sensitive workloads. T4 re

Detection of Performance Changes in MooBench Results Using Nyrkiö on GitHub Actions

arXiv, 2025-10-13. Authors: Shinhyung Yang, David Georg Reichelt, Henrik Ingo

GitHub hosts 518 million projects, and performance changes within these projects are highly relevant to their users. Although GitHub CI/CD supports performance measurement, detecting performance changes remains challenging. In this paper, we demonstrate how we integrated Nyrkiö into MooBench. Prior to this work, MooBench ran continuously on GitHub virtual machines, measuring the overhead of tracing agents, but without change detection. By uploading the measurements to the Nyrkiö change detection service, we made it possible to detect performance changes. We identified one major performance regression and examined it in depth. We report that (1) it is reproducible with GitHub Actions, and (2) the regression is caused by a Linux kernel version change.

Search results: Performance

DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance

Detection of Performance Changes in MooBench Results Using Nyrkiö on GitHub Actions

Mixed-Precision Performance Portability of FFT-Based GPU-Accelerated Algorithms for Block-Triangular Toeplitz Matrices

Performance Enhancement of the Ozaki Scheme on Integer Matrix Multiplication Unit

A Study of Performance Portability in Plasma Physics Simulations

Adaptation and synchronization -- basic mechanisms in music performance

Decomposing Docker Container Startup Performance: A Three-Tier Measurement Study on Heterogeneous Infrastructure

Performance Optimization in Stream Processing Systems: Experiment-Driven Configuration Tuning for Kafka Streams

Integrating High Performance In-Memory Data Streaming and In-Situ Visualization in Hybrid MPI+OpenMP PIC MC Simulations Towards Exascale

Performance Tuning for GPU-Embedded Systems: Machine-Learning-based and Analytical Model-driven Tuning Methodologies

Enhancing Trace Visualizations for Microservices Performance Analysis

PICO: Performance Insights for Collective Operations

Prediction of Performance and Power Consumption of GPGPU Applications

Automatic Microprocessor Performance Bug Detection

A Performance-Portable SYCL Implementation of CRK-HACC for Exascale

Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML

Enhancing Performance Monitoring in C/C++ Programs with EDPM: A Domain-Specific Language for Performance Monitoring

A Metric for Performance Portability

Validation of hardware events for successful performance pattern identification in High Performance Computing

Performance of Genetic Algorithms in the Context of Software Model Refactoring