Search: ResearchTracker

We introduce HackerSignal, a benchmark for temporal out-of-distribution cyber threat intelligence (CTI) and cross-source CVE linkage. HackerSignal aggregates 7.45 million exact-deduplicated documents from 64 public forum/source identifiers spanning eight source layers and a 36-year window (1990-2026). In contrast to other publicly accessible cybersecurity datasets, HackerSignal is among the first public benchmark datasets to map the full potential exploit-to-vulnerability trajectory across hacker community discourse, exploit databases with working and proof-of-concept exploits, vulnerability advisories, and software fix commits. HackerSignal creates these linkages through a shared CVE identifier space while preserving source-specific release modes to support a range of unique Artificial Intelligence (AI)-enabled cybersecurity analytics tasks. In this paper, we summarize HackerSignal and illustrate three selected benchmark tasks it uniquely supports: (1) CVE linkage retrieval (cross-source temporal out-of-distribution entity grounding); (2) exploit type classification (8-class vulnerability type prediction with temporal OOD evaluation); and (3) temporal generalization (prospective

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

arXiv, 2026-06-08. Authors: Ziqian Zhong, Ivgeni Segal, Ivan Bercovich

Agent benchmarks score submissions with outcome verifiers that are typically hand-written and brittle, leaving them open to reward hacking. We audit 1,968 tasks across five terminal-agent benchmarks and find 323 (16%) hackable by frontier models given only the task description. This corrupts both leaderboard rankings and RL training signal, yet the standard response is manual and reactive. We introduce the hacker-fixer loop, a method for building exploit-resistant verifiers without per-task manual patching. The loop alternates three LLM agents: a hacker tries to pass the verifier without solving the task, a fixer patches the verifier to reject each discovered exploit, and a solver confirms the patched verifier still admits legitimate solutions. The loop iterates: each patch reshapes what the verifier rewards, surfacing the next exploit. We further add verifier access, and let patches transfer across tasks, to broaden the exploits the loop discovers. On KernelBench, the loop drives the attack success rate from 62% to 0% on a held-out corpus of publicly reported exploits. We also find that weaker agents in the loop can defend against much stronger hackers: Gemini 3 Flash's loop drive

Search results: hacker

HackerSignal: A Large-Scale Multi-Source Dataset Linking Hacker Community Discourse to the CVE Vulnerability Lifecycle

Hardening Agent Benchmarks with Adversarial Hacker-Fixer Loops

Social Media Reactions to Open Source Promotions: AI-Powered GitHub Projects on Hacker News

Framing the Hacker: Media Representations and Public Discourse in Germany

EventHunter: Dynamic Clustering and Ranking of Security Events from Hacker Forum Discussions

Stochastic behavior of an n-node blockchain under cyber attacks from multiple hackers with random re-setting times

EUREKHA: Enhancing User Representation for Key Hackers Identification in Underground Forums

Launch-Day Diffusion: Tracking Hacker News Impact on GitHub Stars for AI Tools

Understanding Hackers' Work: An Empirical Study of Offensive Security Practitioners

Instantaneous and limiting behavior of an n-node blockchain under cyber attacks from a single hacker

A Small World of Bad Guys: Investigating the Behavior of Hacker Groups in Cyber-Attacks

The Professionalization of the Hacker Industry

Which programming languages do hackers use? A survey at the German Chaos Computer Club

How Professional Hackers Understand Protected Code while Performing Attack Tasks

Deep Learning Algorithm for Threat Detection in Hackers Forum (Deep Web)
