Search results: commits

20 results found

Detecting Multiple Semantic Concerns in Tangled Code Commits

arXiv

Code commits in a version control system (e.g., Git) should be atomic, i.e., focused on a single goal, such as adding a feature or fixing a bug. In practice, however, developers often bundle multiple concerns into tangled commits, obscuring intent and complicating maintenance. Recent studies have used Conventional Commits Specification (CCS) and Language Models (LMs) to capture commit intent, demonstrating that Small Language Models (SLMs) can approach the performance of Large Language Models (LLMs) while maintaining efficiency and privacy. However, they do not address tangled commits involving multiple concerns, leaving the feasibility of using LMs for multi-concern detection unresolved. In this paper, we frame multi-concern detection in tangled commits as a multi-label classification problem and construct a controlled dataset of artificially tangled commits based on real-world data. We then present an empirical study using SLMs to detect multiple semantic concerns in tangled commits, examining the effects of fine-tuning, concern count, commit-message inclusion, and header-preserving truncation under practical token-budget limits. Our results show that a fine-tuned 14B-parameter S

How and Why Agents Can Identify Bug-Introducing Commits

arXiv, 2026-03-31. Authors: Niklas Risse, Marcel Böhme

Śliwerski, Zimmermann, and Zeller (SZZ) just won the 2026 ACM SIGSOFT Impact Award for asking: When do changes induce fixes? Their paper from 2005 served as the foundation for a wide array of approaches aimed at identifying bug-introducing changes (or commits) from fix commits in software repositories. But even after two decades of progress, the best-performing approach from 2025 yields a modest increase of 10 percentage points in F1-score on the most popular Linux kernel dataset. In this paper, we uncover how and why LLM-based agents can substantially advance the state-of-the-art in identifying bug-introducing commits from fix commits. We propose a simple agentic workflow based on searching a set of candidate commits and find that it raises the F1-score from 0.64 to 0.81 on the most popular Linux kernel dataset, a bigger jump than between the original 2005 method (0.54) and the previous SOTA (0.64). We also uncover why agents are so successful: They derive short greppable patterns from the fix commit diff and message and use them to effectively search and find bug-introducing commits in large candidate sets. Finally, we also discuss how these insights might enable further progress

A Purpose-oriented Study on Open-source Software Commits and Their Impacts on Software Quality

CLNX: Bridging Code and Natural Language for C/C++ Vulnerability-Contributing Commits Identification

From Commits to Confidence: Towards Stability-Informed Risk Assessment in Open Source Software

Exploring Security Commits in Python

PatchSeeker: Mapping NVD Records to their Vulnerability-fixing Commits with LLM Generated Commits and Embeddings

An Empirical Study of Token-based Micro Commits

CommitBART: A Large Pre-trained Model for GitHub Commits

An Exploratory Study of Bot Commits

A Fine-grained Data Set and Analysis of Tangling in Bug Fixing Commits

Utilizing Source Code Syntax Patterns to Detect Bug Inducing Commits using Machine Learning Models

Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval

CrossCommitVuln-Bench: A Dataset of Multi-Commit Python Vulnerabilities Invisible to Per-Commit Static Analysis

Analysis of Commit Signing on Github

KADEL: Knowledge-Aware Denoising Learning for Commit Message Generation

Evolution of repositories and privacy laws: commit activities in the GDPR and CCPA era

On the Prevalence and Usage of Commit Signing on GitHub: A Longitudinal and Cross-Domain Study

On the Informativeness of Security Commit Messages: A Large-scale Replication Study

LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery