Semantic communication has emerged as a promising paradigm for improving transmission efficiency and task-level reliability. However, most existing reliability-enhancement approaches rely on retransmission strategies driven by semantic fidelity checking, which require additional check codewords solely to trigger retransmission and therefore incur substantial communication overhead. In this paper, we propose S3CHARQ, a Joint Source-Channel-Check Coding (JS3C) framework with hybrid automatic repeat request (HARQ) that fundamentally rethinks the role of check codewords in semantic communications. By integrating the check codeword into the joint source-channel coding (JSCC) process, S3CHARQ allows the check codeword to support both semantic fidelity verification and reconstruction enhancement. At the transmitter, a semantic fidelity-aware check encoder embeds auxiliary reconstruction information into the check codeword. At the receiver, a JS3C decoder jointly decodes the JSCC and check codewords, and the check codeword is additionally exploited for perceptual quality estimation. Moreover, because retransmission decisions are necessarily based on imperfect semantic quality estimation in the absence
Metaprogramming and effect handlers interact in unexpected, and sometimes undesirable, ways. One example is scope extrusion: the generation of ill-scoped code. Scope extrusion can either be prevented preemptively, via static type systems, or detected retroactively, via dynamic checks. Static type systems exist in theory but struggle with a range of implementation and usability problems in practice. In contrast, dynamic checks exist in practice (e.g., in MetaOCaml) but are understudied in theory. Designers of metalanguages thus have little guidance on the design and implementation of checks. We present the first formal study of dynamic scope extrusion checks, introducing a calculus ($λ_{\langle\langle\text{op}\rangle\rangle}$) for describing and evaluating checks. Further, we introduce a novel dynamic check, the "Cause-for-Concern" check, which we prove correct, characterise without reference to its implementation, and argue combines the advantages of existing dynamic checks. Finally, we extend our framework with refined environment classifiers, which statically prevent scope extrusion, and compare their expressivity with the dynamic ch
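To make scope extrusion concrete, here is a toy Python sketch; it is not the paper's calculus $λ_{\langle\langle\text{op}\rangle\rangle}$, not MetaOCaml, and not the Cause-for-Concern check. A code generator builds lambda terms with fresh variable names, and a side effect (a plain mutable list standing in for an effect handler) smuggles a generated variable out of its binder. The only check shown is the simplest post-hoc one: reject finished code that still has free variables. The names `lam`, `free_vars`, and `check_closed` are hypothetical.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


_fresh = count()


def lam(build):
    """Generate a lambda whose body is produced by `build` applied to a fresh variable."""
    name = f"x{next(_fresh)}"
    return Lam(name, build(Var(name)))


def free_vars(code, bound=frozenset()):
    """Names used in `code` that are not bound by an enclosing Lam."""
    if isinstance(code, Var):
        return set() if code.name in bound else {code.name}
    if isinstance(code, Lam):
        return free_vars(code.body, bound | {code.param})
    if isinstance(code, App):
        return free_vars(code.fn, bound) | free_vars(code.arg, bound)
    raise TypeError(f"unknown code node: {code!r}")


def check_closed(code):
    """Hypothetical post-hoc dynamic check: reject generated code with unbound variables."""
    leaked = free_vars(code)
    if leaked:
        raise NameError(f"scope extrusion: {sorted(leaked)} escaped their binders")
    return code


# Well-scoped: the variable is only used under its own binder.
ok = check_closed(lam(lambda x: App(x, x)))

# Ill-scoped: an effect (appending to outer state) leaks the variable out of its lambda.
cell = []


def stash(x):
    cell.append(x)  # the effect: store the code variable outside its scope
    return x


_inner = lam(stash)
leaky = App(cell[0], cell[0])  # uses the variable after its binder is closed
# check_closed(leaky) raises NameError
```

Checking only the final result cannot tell which effect caused the leak; deciding when and where to check is the design question the abstract raises.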
We study the decoding problem for quantum Tanner codes. We propose exploiting the underlying local code structure by grouping check nodes into more powerful generalized check nodes for enhanced iterative belief propagation (BP) decoding; in each decoding iteration, the check node processing decodes every generalized check with a maximum a posteriori (MAP) decoder. We mainly study the finite-length setting and show that the proposed enhanced generalized BP decoder for quantum Tanner codes significantly outperforms both the standard quaternary BP decoder with memory effects and the recently proposed Relay-BP decoder, in some cases even outperforming generalized bicycle (GB) codes with comparable parameters. For other classes of quantum low-density parity-check (qLDPC) codes, we propose a greedy algorithm for combining checks for generalized BP decoding. However, for GB codes, bivariate bicycle codes, hypergraph product codes, and lifted-product codes, combining simple checks into more powerful ones appears to yield limited gain. To support these findings, we also provide a theoretical cycle analysis of the considered qLDPC codes.
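As a rough illustration of what a MAP-decoded generalized check node computes, the sketch below works in a simplified classical binary setting (the paper decodes quantum Tanner codes, whose BP is quaternary). Given incoming probabilities that each bit is 1, it enumerates every word of a small local code defined by a parity-check matrix and returns bitwise extrinsic posteriors. Brute-force enumeration is an assumption that only scales to small local codes, and the function names are hypothetical.

```python
import itertools

import numpy as np


def local_codewords(H):
    """Enumerate all binary x with H x = 0 (mod 2); feasible only for small local codes."""
    H = np.asarray(H, dtype=int)
    n = H.shape[1]
    words = [x for x in itertools.product((0, 1), repeat=n)
             if not (H @ np.array(x) % 2).any()]
    return np.array(words)


def map_check_node(H, p_one):
    """Extrinsic bitwise MAP messages for one generalized check node.

    p_one[j] is the incoming probability that bit j equals 1. For each bit i the
    result is P(x_i = 1 | x lies in the local code, inputs from all other bits).
    """
    C = local_codewords(H)
    p_one = np.asarray(p_one, dtype=float)
    # Likelihood of each codeword's bit values under the incoming messages: (num_words, n)
    lik = np.where(C == 1, p_one, 1.0 - p_one)
    out = np.empty(len(p_one))
    for i in range(len(p_one)):
        others = np.prod(np.delete(lik, i, axis=1), axis=1)  # drop bit i's own input
        w1 = others[C[:, i] == 1].sum()
        w0 = others[C[:, i] == 0].sum()
        out[i] = w1 / (w0 + w1)
    return out


# A single parity check versus a [7,4] Hamming code used as one generalized check.
single = map_check_node([[1, 1, 1]], [0.1, 0.2, 0.3])
hamming_H = [[1, 1, 1, 0, 1, 0, 0],
             [1, 1, 0, 1, 0, 1, 0],
             [1, 0, 1, 1, 0, 0, 1]]
generalized = map_check_node(hamming_H, [0.1, 0.2, 0.3, 0.1, 0.05, 0.4, 0.2])
```

For a single parity check the output reduces to the familiar BP check-node rule $(1 - \prod_{j \ne i}(1 - 2p_j))/2$; a richer local code such as the Hamming code yields messages that account for all of its constraints jointly.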
Empirical researchers often use diagnostic checks to assess the plausibility of their modeling assumptions, such as testing for covariate balance in RCTs, pre-trends in event studies, or instrument validity in IV designs. While these checks are traditionally treated as external hurdles to estimation, we argue they should be integrated into the estimation process itself. In particular, we propose residualizing one's baseline estimator against the vector of diagnostic check statistics to remove the component of baseline sampling variation explained by the diagnostic checks. This residualized estimator offers researchers a "free lunch," delivering three properties simultaneously: (i) eliminating inference distortions from check-based selective reporting; (ii) reducing variance without changing the estimand when the baseline model is correctly specified; and (iii) minimizing worst-case bias under bounded local misspecification within the class of linear adjustments. We apply our method to the RCT in Kaur et al. (2024) and find that, even in a setting where all balance checks pass comfortably, residualization increases the magnitude of the baseline point estimate and reduces its standar
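A minimal sketch of the residualization idea, assuming its simplest linear-projection (control-variate) form $\tilde\theta = \hat\theta - \Sigma_{\theta c}\Sigma_{cc}^{-1} c$, where $c$ is the vector of check statistics with mean zero under correct specification. It uses a simulated two-arm RCT, a single covariate-balance check (the difference in covariate means), and plug-in covariance estimates; the paper's general estimator and its bias-minimization properties are not reproduced. All data are simulated and the helper names are hypothetical.

```python
import numpy as np


def residualize(theta_hat, checks, cov_theta_checks, cov_checks):
    """Subtract from theta_hat its linear projection on the check statistics.

    Assumes the checks have mean zero under correct specification, so the
    adjustment removes sampling noise without shifting the estimand.
    """
    checks = np.atleast_1d(np.asarray(checks, dtype=float))
    cov_checks = np.atleast_2d(np.asarray(cov_checks, dtype=float))
    cov_theta_checks = np.atleast_1d(np.asarray(cov_theta_checks, dtype=float))
    gamma = np.linalg.solve(cov_checks, cov_theta_checks)  # projection coefficients
    return float(theta_hat - gamma @ checks)


def diff_in_means_with_balance(y, x, d):
    """Difference-in-means estimate, balance-check statistic, and plug-in covariances."""
    t, c = d == 1, d == 0
    n1, n0 = t.sum(), c.sum()
    theta = y[t].mean() - y[c].mean()
    check = x[t].mean() - x[c].mean()  # balance check: mean zero under randomization
    cov_tc = np.cov(y[t], x[t])[0, 1] / n1 + np.cov(y[c], x[c])[0, 1] / n0
    var_c = x[t].var(ddof=1) / n1 + x[c].var(ddof=1) / n0
    return theta, check, cov_tc, var_c


# Monte Carlo comparison on simulated RCTs (true effect = 1).
rng = np.random.default_rng(0)
baseline, adjusted = [], []
for _ in range(1000):
    x = rng.normal(size=200)                         # baseline covariate
    d = rng.permutation(np.repeat([0, 1], 100))      # complete randomization
    y = 1.0 * d + 2.0 * x + rng.normal(size=200)
    theta, check, cov_tc, var_c = diff_in_means_with_balance(y, x, d)
    baseline.append(theta)
    adjusted.append(residualize(theta, check, cov_tc, var_c))

print(f"baseline:     mean {np.mean(baseline):.3f}, sd {np.std(baseline):.3f}")
print(f"residualized: mean {np.mean(adjusted):.3f}, sd {np.std(adjusted):.3f}")
```

Both estimators center on the true effect, but the residualized one has a visibly smaller spread because the covariate imbalance that the balance check measures is projected out.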
Fact-checking-specific search tools such as Google Fact Check are a promising way to combat misinformation on social media, especially during events with significant social impact, such as the COVID-19 pandemic and U.S. presidential elections. However, the usability of such tools has not been thoroughly studied. We evaluated the performance of Google Fact Check by analyzing the fact-checking results it retrieved for 1,000 false claims related to COVID-19, and found that it retrieved fact-checking results for 15.8% of the input claims and that the retrieved results are relatively reliable. We also found that false claims receiving different fact-checking verdicts (i.e., "False," "Partly False," "True," and "Unratable") tend to reflect different emotional tones, and that fact-checking sources tend to check claims that differ in length and in their use of dictionary words. Claim variants that address the same issue but are worded differently are likely to retrieve distinct fact-checking results. We suggest that the number of retrieved fact-checking results could be improved and that slightly adjusting the input wording may be the best practice for users t
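To measure coverage like the 15.8% figure on one's own claim list, one could query the Google Fact Check Tools API. The sketch below only builds request URLs and computes the share of claims with at least one result; no network call is made. The endpoint and the `query`, `languageCode`, `pageSize`, and `key` parameters are taken from the public v1alpha1 `claims:search` reference, and the top-level `claims` field of the JSON response is assumed, so both should be verified before use. The example claim wordings and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Google Fact Check Tools API search endpoint (v1alpha1); verify against the current reference.
CLAIM_SEARCH_ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def claim_search_url(query, api_key, language_code="en", page_size=10):
    """Build a claims:search request URL for one claim wording."""
    params = {"query": query, "languageCode": language_code,
              "pageSize": page_size, "key": api_key}
    return f"{CLAIM_SEARCH_ENDPOINT}?{urlencode(params)}"


def retrieval_rate(responses):
    """Share of claims whose decoded JSON response lists at least one fact-checked claim.

    `responses` maps each input claim to its decoded response, assumed to carry
    matches under a top-level "claims" key that is absent when nothing is found.
    """
    if not responses:
        return 0.0
    hits = sum(1 for body in responses.values() if body.get("claims"))
    return hits / len(responses)


# Differently worded variants of one claim each get their own query.
variants = ["5G spreads COVID-19", "5G towers cause coronavirus"]
urls = [claim_search_url(v, api_key="YOUR_API_KEY") for v in variants]
```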
AddressSanitizer (ASan) is a powerful tool for detecting memory safety violations, including temporal and spatial errors hidden in C/C++ programs, during execution. However, ASan incurs significant runtime overhead, which limits its efficiency when testing large software. The overhead stems mainly from sanitizer checks, which perform frequent and expensive shadow memory accesses. Over the past decade, many methods have been developed to speed up ASan by eliminating and accelerating sanitizer checks; however, they either fail to adequately eliminate redundant checks or compromise detection capabilities. To address this issue, this paper presents Tech-ASan, a technique based on two-stage checks that accelerates ASan with safety assurance. First, we propose a novel two-stage check algorithm for ASan, which leverages magic value comparison to avoid most of the costly shadow memory accesses. Second, we design an efficient optimizer to eliminate redundant checks, which integrates a novel algorithm for removing checks in loops. Third, we implement Tech-ASan as a memory safety tool based on the LLVM compiler infrastructure. Our evaluation using the SPEC CPU2006 benchmark shows that Tech-ASan outperforms
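The abstract does not detail the two-stage check, so the following Python simulation is only one plausible reading of "magic value comparison", under a stated assumption: every poisoned byte (redzone or freed memory) is filled with a known magic value. Stage 1 reads the accessed bytes themselves; if none equals the magic value, the access cannot touch poisoned memory and the shadow lookup is skipped. Only otherwise does stage 2 run the standard ASan shadow check (one shadow byte per 8-byte granule: 0 means fully addressable, k in 1 to 7 means only the first k bytes are, negative means poisoned). `MAGIC` and `ShadowedMemory` are hypothetical.

```python
MAGIC = 0xBE  # hypothetical fill byte written into every poisoned (redzone or freed) byte


class ShadowedMemory:
    """Toy heap with ASan-style shadow memory and a magic-value fast path."""

    def __init__(self, size):
        self.data = bytearray(size)
        # One shadow byte per 8-byte granule: 0 = fully addressable,
        # k in 1..7 = only the first k bytes addressable, negative = poisoned.
        self.shadow = [0] * ((size + 7) // 8)
        self.shadow_reads = 0  # counts the costly shadow lookups

    def poison(self, start, end):
        """Poison [start, end); granule-aligned bounds only, for simplicity."""
        assert start % 8 == 0 and end % 8 == 0
        self.data[start:end] = bytes([MAGIC]) * (end - start)  # maintain the fill invariant
        for granule in range(start // 8, end // 8):
            self.shadow[granule] = -1

    def classic_check(self, addr, size):
        """Standard ASan check for an access that stays within one 8-byte granule."""
        self.shadow_reads += 1
        k = self.shadow[addr >> 3]
        return k == 0 or (addr & 7) + size - 1 < k

    def two_stage_check(self, addr, size):
        # Stage 1: poisoned bytes always hold MAGIC, so an access whose bytes
        # contain no MAGIC value cannot touch poisoned memory.
        if MAGIC not in self.data[addr:addr + size]:
            return True
        # Stage 2: MAGIC may be genuine user data, so fall back to the shadow check.
        return self.classic_check(addr, size)
```

A legitimate value that happens to equal the magic byte simply falls through to the slow path, so under the fill invariant the fast path gives up no detection capability.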
The correct use of a Hardware Abstraction Layer (HAL) interface in embedded applications is crucial to prevent malfunctions, crashes, or even hardware damage. Software model checking has been successfully applied to check interface specifications in application programs, but its adoption in industrial practice is hindered by its unpredictability (whether or not it succeeds for a given application program). In this paper, we present a novel approach that addresses this problem by checking the HAL interface specification continuously, from the very start of development. That is, we develop an embedded application in several iterations, without a formal connection between the steps. The first step is a program skeleton that does nothing but call HAL functions; actual functionality is added incrementally. The HAL interface specification is checked at each step of the sequence. The idea of the approach is to exploit a specific feature of software model checking: its attempt to compute exactly the abstraction needed for the check to succeed may carry over from one step to the next, even if there is no formal connection between the steps. The experience from a preliminary
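For readers unfamiliar with interface specifications, the sketch below encodes a hypothetical HAL protocol as a typestate automaton (initialize before transmitting; wait for a transfer to finish before starting another) and checks one concrete call sequence against it. This is runtime monitoring of a single trace, far weaker than the approach described above, where a software model checker must prove that every path of the program respects the automaton. The HAL function names are illustrative, not taken from any real vendor HAL.

```python
# Hypothetical HAL interface specification: a typestate automaton over HAL calls.
# (state, call) -> next state; any pair not listed is a specification violation.
HAL_SPEC = {
    ("UNINIT", "hal_init"): "READY",
    ("READY", "hal_uart_start_tx"): "BUSY",
    ("BUSY", "hal_uart_wait_done"): "READY",
    ("READY", "hal_deinit"): "UNINIT",
}


def first_violation(calls, spec=HAL_SPEC, state="UNINIT"):
    """Return the index of the first call the specification forbids, or None."""
    for i, call in enumerate(calls):
        state = spec.get((state, call))
        if state is None:
            return i
    return None


# A skeleton-like call sequence, and one that transmits before initialization.
skeleton = ["hal_init", "hal_uart_start_tx", "hal_uart_wait_done", "hal_deinit"]
broken = ["hal_uart_start_tx", "hal_init"]
```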
Conformance checking is a sub-discipline of process mining that compares observed process traces with a process model to analyze whether process execution conforms with or deviates from the process design. Organizations can leverage this analysis, for example, to check whether their processes comply with internal or external regulations or to identify potential improvements. Gaining these insights requires suitable visualizations that make complex results accessible and actionable. So far, however, the development of conformance checking visualizations has largely been left to tool vendors. As a result, current tools offer a wide variety of visual representations for conformance checking, but the analytical purposes they serve often remain unclear. Without a systematic understanding of these purposes, it is difficult to evaluate the visualizations' usefulness. Such an evaluation hence requires a deeper understanding of conformance checking as an analysis domain. To this end, we propose a task taxonomy, which categorizes the tasks that can occur when conducting conformance checking analyses. This taxonomy supports researchers in determining the purpose of visualizati
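A toy version of what conformance checking computes, assuming the process model is given simply as a set of allowed successor activities (real conformance checkers typically use Petri nets with token replay or alignments). Each trace is replayed against the model; the output is the share of fitting traces plus a count of where deviations occur, which is the kind of result conformance checking visualizations must present. The model and activity names are hypothetical.

```python
from collections import Counter

# Hypothetical process model: allowed successor activities, with START/END markers.
MODEL = {
    "START": {"register"},
    "register": {"check", "reject"},
    "check": {"approve", "reject"},
    "approve": {"END"},
    "reject": {"END"},
}


def first_deviation(trace, model=MODEL):
    """Index of the first event the model forbids, len(trace) if the trace stops
    before a valid end, or None if the trace fits."""
    prev = "START"
    for i, event in enumerate(trace):
        if event not in model.get(prev, set()):
            return i
        prev = event
    return None if "END" in model.get(prev, set()) else len(trace)


def conformance_summary(log, model=MODEL):
    """Return (fraction of fitting traces, Counter of deviating activities)."""
    deviations = Counter()
    fitting = 0
    for trace in log:
        i = first_deviation(trace, model)
        if i is None:
            fitting += 1
        else:
            deviations[trace[i] if i < len(trace) else "<missing end>"] += 1
    return fitting / len(log), deviations


log = [
    ["register", "check", "approve"],  # conforms
    ["register", "approve"],           # skips the check
    ["register", "check"],             # stops before a final activity
]
fitness, deviations = conformance_summary(log)
```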
We study linear codes that maximize minimum distance subject to arbitrary support constraints on the parity-check matrix. Such constraints arise naturally in the design of LDPC codes, locally repairable codes, and hardware-constrained systems where each parity check must involve only a limited number of code symbols. They are also essential in quantum error correction, where sparse stabilizers reduce measurement noise and respect the connectivity constraints of physical qubit architectures. We derive the largest minimum distance possible under given support constraints on the parity-check matrix and show that it is attainable over sufficiently large fields. When this maximum distance coincides with the Singleton bound for unconstrained parity-check matrices, the dual GM-MDS construction yields generalized Reed–Solomon codes obeying the support constraints (the mask). In the generator-matrix setting, the GM-MDS theorem guarantees that the optimal distance can always be achieved by a subcode of a generalized Reed–Solomon code while satisfying arbitrary support constraints. We show that this is not true in the parity-check setting. We exhibit a set of support constraints, derived from the vertex-edge incidence of $K_{
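To make the setting concrete, the sketch below checks whether a parity-check matrix $H$ over GF($p$) respects a support mask and computes the code's minimum distance by brute force, which is feasible only for tiny examples. The first example over GF(5) has $n = 4$ and two independent rows, hence dimension $k = 2$; any two of its columns are linearly independent, so it meets the Singleton bound $d = n - k + 1 = 3$. The second mask forbids nonzero entries in two whole columns, so every matrix obeying it has zero columns and the code contains weight-1 words. Both matrices are hand-picked illustrations, not constructions from the paper.

```python
import itertools

import numpy as np


def respects_mask(H, mask):
    """True if H is zero wherever the support mask forbids a nonzero entry."""
    H, mask = np.asarray(H), np.asarray(mask, dtype=bool)
    return not np.any((H != 0) & ~mask)


def min_distance(H, p):
    """Minimum Hamming weight of a nonzero x in GF(p)^n with H x = 0 (brute force)."""
    H = np.asarray(H, dtype=int)
    n = H.shape[1]
    best = None
    for x in itertools.product(range(p), repeat=n):
        x = np.array(x)
        w = np.count_nonzero(x)
        if w and not (H @ x % p).any() and (best is None or w < best):
            best = w
    return best  # None if the code is {0}


H1 = [[1, 1, 1, 0], [0, 1, 2, 1]]
MASK1 = [[1, 1, 1, 0], [0, 1, 1, 1]]
H2 = [[1, 1, 0, 0], [1, 2, 0, 0]]
MASK2 = [[1, 1, 0, 0], [1, 1, 0, 0]]  # columns 2 and 3 may not appear in any check
```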
Tensions often arise between different datasets in cosmology, and consistency tests can serve as a powerful tool for diagnosing potential issues. The density-shear Baryon Acoustic Oscillations (GI BAO) are the imprint of the BAO feature on the shear field induced by the large-scale tidal field. We highlight that GI BAO can provide a robust consistency check for the density BAO, shear measurement, and alignment model; failure of this check would point to systematics in at least one of these components. As an illustration, we present the first GI BAO measurement on photometric data, using the DES Y3 dataset. We find the GI BAO constraint on the BAO scale dilation parameter $α$ to be $0.966 \pm 0.252$ (1$σ$), in good agreement with the density BAO constraint, $0.966 \pm 0.037$, thereby validating the density BAO, shear measurement, and linear alignment model. Furthermore, we argue that combining the density BAO with the GI BAO yields results that are more resilient to systematic effects. Thanks to the massive data volumes of Stage IV surveys, the GI BAO will play an even more prominent role as a consistency check.
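Under the simplifying assumption of Gaussian errors, the consistency check can be quantified as the difference between the two $α$ estimates in units of its standard deviation. The correlation `rho` between the GI and density BAO measurements is not given in the abstract; it defaults to zero below purely for illustration. This is a generic tension metric, not necessarily the statistic used in the paper, and `tension_sigma` is a hypothetical helper.

```python
import math


def tension_sigma(a1, s1, a2, s2, rho=0.0):
    """|a1 - a2| in units of the standard deviation of the difference.

    rho is the correlation between the two estimates. It defaults to 0 only for
    illustration: GI and density BAO from the same survey need not be independent.
    """
    var = s1**2 + s2**2 - 2.0 * rho * s1 * s2
    return abs(a1 - a2) / math.sqrt(var)


# DES Y3 values quoted in the abstract: GI BAO vs density BAO.
print(tension_sigma(0.966, 0.252, 0.966, 0.037))
```

With the quoted DES Y3 values the central estimates coincide, so the tension is exactly zero regardless of `rho`.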
Visualizations play a critical role in validating and improving statistical models. However, the design space of model check visualizations is not well understood, making it difficult for authors to explore and specify effective graphical model checks. We present VMC, which defines a model check visualization using four components: (1) samples of distributions of checkable quantities generated from the model, including predictive distributions for new data and distributions of model parameters; (2) transformations on observed data to facilitate comparison; (3) visual representations of distributions; and (4) layouts to facilitate comparing model samples and observed data. We contribute an implementation of VMC as an R package. We validate VMC by reproducing a set of canonical model check examples, and show that using VMC to generate model checks reduces the edit distance between visualizations relative to existing visualization toolkits. The findings of an interview study with three expert modelers who used VMC highlight challenges and opportunities for encouraging exploration of correct, effective model check visualizations.
Variants of the belief propagation (BP) algorithm have been applied to low-density parity-check (LDPC) codes. However, conventional decoders consume substantial resources because they gather messages from all neighboring variable nodes and/or check nodes through cumulative calculations. In this paper, a check-belief propagation (CBP) decoding algorithm is proposed. A check-belief is defined as the probability that the corresponding parity check is satisfied. All check-beliefs are iteratively increased in a sequential recursive order, and decoding succeeds once all check-beliefs are sufficiently large. Whereas previous algorithms employ a large number of cumulative calculations to gather all neighbor messages, CBP decoding renews each check-belief by propagating it from one check node to another through only one variable node, resulting in low-complexity decoding with no cumulative calculations. Simulation results and analyses show that the CBP algorithm incurs little error-rate performance loss compared with previous BP algorithms while requiring far fewer calculations and much less memory, a substantial benefit in terms of complexity.
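The check-belief described above has a closed form when the bits in a check are treated as independent: with $p_j$ the probability that bit $j$ is 1, the probability of even parity is $(1 + \prod_j (1 - 2p_j))/2$. The sketch computes only this quantity; it does not reproduce the paper's sequential CBP update, and `check_belief` is a hypothetical name.

```python
import math


def check_belief(p_one):
    """Probability that a parity check is satisfied (even parity), assuming
    independent bits with p_one[j] = P(bit j = 1)."""
    prod = math.prod(1.0 - 2.0 * p for p in p_one)
    return 0.5 * (1.0 + prod)


# Two fairly reliable bits: P(00) + P(11) = 0.9 * 0.8 + 0.1 * 0.2 = 0.74.
print(check_belief([0.1, 0.2]))
```

A single completely uncertain bit ($p_j = 0.5$) drives the product to zero and the check-belief to 0.5, which is why check-beliefs can only grow large once every bit in the check has become reliable.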
We introduce a machine learning approach to model checking temporal logic, with application to formal hardware verification. Model checking answers the question of whether every execution of a given system satisfies a desired temporal logic specification. Unlike testing, model checking provides formal guarantees. It is standard practice in silicon design, and the EDA industry has invested decades into the development of performant symbolic model checking algorithms. Our new approach combines machine learning and symbolic reasoning by using neural networks as formal proof certificates for linear temporal logic. We train our neural certificates from randomly generated executions of the system and then symbolically check their validity using satisfiability solving; an affirmative answer establishes that the system provably satisfies the specification. We leverage the expressive power of neural networks to represent proof certificates, as well as the fact that checking a certificate is much simpler than finding one. As a result, our machine learning procedure for model checking is entirely unsupervised, formally sound, and practically effective. We experiment
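A much-simplified illustration of why checking a certificate is easier than finding one: for the property "every execution eventually reaches a goal state" on a finite transition system, a ranking function that is non-negative and strictly decreases along every transition out of a non-goal state (with no non-goal deadlocks) is a proof. Below, the certificate is a hand-written function standing in for a neural network, and the check is exhaustive enumeration standing in for SAT solving; full LTL certificates, as in the paper, are considerably more involved. The countdown system and function names are hypothetical.

```python
def countdown_successors(x):
    """Hypothetical system: from state x the counter drops by 1 or 2 (never below 0)."""
    return [y for y in (x - 1, x - 2) if y >= 0]


def is_ranking_certificate(V, states, successors, goal):
    """Check that V proves 'every execution eventually reaches a goal state'.

    `states` must contain every reachable state. For each non-goal state we require
    a successor, V >= 0, and a strict decrease of V along every outgoing transition.
    Strict decrease rules out cycles among non-goal states, so on a finite system
    every path must eventually hit the goal.
    """
    for s in states:
        if goal(s):
            continue
        nxt = successors(s)
        if not nxt or V(s) < 0:
            return False
        if any(V(t) >= V(s) for t in nxt):
            return False
    return True


def at_zero(x):
    return x == 0


good = is_ranking_certificate(lambda x: x, range(10), countdown_successors, at_zero)
bad = is_ranking_certificate(lambda x: 0, range(10), countdown_successors, at_zero)
```

Checking is a single pass over the transitions, whereas finding a suitable `V` is the hard search problem that the paper delegates to training.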
The CheckThat! lab aims to advance the development of innovative technologies designed to identify and counteract online disinformation and manipulation efforts across various languages and platforms. The first five editions focused on key tasks in the information verification pipeline, including check-worthiness, evidence retrieval and pairing, and verification. Since the 2023 edition, the lab has expanded its scope to address auxiliary tasks that support research and decision-making in verification. In the 2025 edition, the lab revisits core verification tasks while also considering auxiliary challenges. Task 1 focuses on the identification of subjectivity (a follow-up from CheckThat! 2024), Task 2 addresses claim normalization, Task 3 targets fact-checking numerical claims, and Task 4 explores scientific web discourse processing. These tasks present challenging classification and retrieval problems at both the document and span levels, including multilingual settings.
Large language models (LLMs) show promise in healthcare, but hallucinations remain a major barrier to clinical use. We present CHECK, a continuous-learning framework that integrates structured clinical databases with a classifier grounded in information theory to detect both factual and reasoning-based hallucinations. Evaluated on 1,500 questions from 100 pivotal clinical trials, CHECK reduced Llama-3.3-70B-Instruct hallucination rates from 31% to 0.3%, making an open-source model state of the art. Its classifier generalized across medical benchmarks, achieving AUCs of 0.95–0.96, including on the MedQA (USMLE) benchmark and on HealthBench's realistic multi-turn medical questions. By leveraging hallucination probabilities to guide GPT-4o's refinement and to judiciously escalate compute, CHECK boosted its USMLE passing rate by 5 percentage points, achieving a state-of-the-art 92.1%. By suppressing hallucinations below accepted clinical error thresholds, CHECK offers a scalable foundation for safe LLM deployment in medicine and other high-stakes domains.
With flexible modeling software, such as the probabilistic programming language Stan, growing in popularity, quantities of interest (QOIs) calculated post-estimation are increasingly in demand and are often custom-implemented, by statistical software developers and applied scientists alike. Examples of QOIs include the marginal expectation of a multilevel model with a non-linear link function, or an ANOVA decomposition of a bivariate regression spline. To this end, we introduce the QOI-Check, a systematic approach to ensuring proper calibration and correct interpretation of QOIs. It contributes to the Bayesian workflow and aims to improve the interpretability of, and trust in, post-estimation conclusions based on QOIs. The QOI-Check builds upon Simulation-Based Calibration (SBC) and the Holdout Predictive Check (HPC). SBC verifies the computational reliability of Bayesian inference algorithms by checking the consistency of the posterior with the prior when the posterior is estimated on prior-predicted data, while HPC ensures robust inference by assessing the consistency of model predictions with holdout data. SBC and HPC are combined in QOI-Checking for validating post-estimation QOI calculation and interpretation in the con
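The SBC component can be sketched in a few lines for a conjugate normal model, where exact posterior draws stand in for a Stan fit: draw $θ$ from the prior, simulate data, draw from the posterior, and record the rank of the prior draw among the posterior draws. If inference is correct, the ranks are uniform. The `sd_scale` argument deliberately miscalibrates the posterior to show what a failed check looks like. This covers SBC only, not HPC or the QOI-Check itself, and all names are hypothetical.

```python
import numpy as np


def sbc_ranks(num_sims=1000, num_draws=19, n_obs=5, sd_scale=1.0, seed=0):
    """SBC ranks for theta ~ N(0, 1), y_i | theta ~ N(theta, 1), with exact posterior draws.

    The exact posterior is N(sum(y) / (n_obs + 1), 1 / (n_obs + 1)); sd_scale != 1
    deliberately distorts its spread to mimic a faulty inference algorithm.
    """
    rng = np.random.default_rng(seed)
    ranks = np.empty(num_sims, dtype=int)
    for s in range(num_sims):
        theta = rng.normal(0.0, 1.0)                      # draw from the prior
        y = rng.normal(theta, 1.0, size=n_obs)            # prior-predictive data
        post_mean = y.sum() / (n_obs + 1)
        post_sd = sd_scale / np.sqrt(n_obs + 1)
        draws = rng.normal(post_mean, post_sd, size=num_draws)  # the "inference" step
        ranks[s] = np.sum(draws < theta)                  # rank in 0..num_draws
    return ranks


def rank_chi2(ranks, num_draws=19):
    """Chi-square statistic of the rank histogram against the uniform distribution."""
    counts = np.bincount(ranks, minlength=num_draws + 1)
    expected = len(ranks) / (num_draws + 1)
    return float(((counts - expected) ** 2 / expected).sum())


print("calibrated chi2:   ", rank_chi2(sbc_ranks()))
print("overconfident chi2:", rank_chi2(sbc_ranks(sd_scale=0.5)))
```

A calibrated posterior gives a roughly flat rank histogram; an overconfident one (`sd_scale < 1`) piles ranks up at both ends and inflates the chi-square statistic.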
We consider a sender-receiver game in which the receiver's action is binary and the sender's preferences are state-independent. The state is multidimensional. The receiver can select one dimension of the state to check (i.e., observe) before choosing his action. We identify a class of influential equilibria in which the sender's message reveals which components of the state are highest, and the receiver selects one of these components to check. The sender can benefit from communication if and only if she prefers one of these equilibria to the no-communication outcome. Similar equilibria exist when the receiver can check multiple dimensions.
Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find that LLMs are inherently multi-task language checkers owing to their latent representations of natural and social knowledge. We present an interpretable, unified language checking (UniLC) method for both human- and machine-generated language that aims to check whether language input is factual and fair. While fairness and fact-checking tasks have been handled separately with dedicated models, we find that LLMs can achieve high performance on a combination of fact-checking, stereotype detection, and hate speech detection tasks with a simple, few-shot, unified set of prompts. With the "1/2-shot" multi-task language checking method proposed in this work, the GPT-3.5-turbo model outperforms fully supervised baselines on several language tasks. The simple approach and results suggest that, based on strong latent knowledge representations, an LLM can be an adaptive and explainable tool for detecting misinformation, stereotypes, and hate speech.
In this paper, we present a system called Checkbochs, a machine simulator that checks rules about its guest operating system and applications at the hardware level. The properties to be checked can be implemented as plugins in the Checkbochs simulator. Some of the properties checked using Checkbochs include null-pointer checks, format-string vulnerabilities, user/kernel pointer checks, and race conditions. Using these checks, we uncovered previously unknown bugs in widely used Linux distributions. We also applied our tools to undergraduate coursework and found numerous bugs.
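The plugin idea can be sketched as a simulator that notifies every registered checker on each memory access. The toy below is written in Python; its `Simulator` class, hook signature, and null-page threshold are hypothetical, and it is not Checkbochs's actual interface.

```python
NULL_PAGE = 4096  # hypothetical: treat accesses to the first page as null-pointer dereferences


class Simulator:
    """Toy machine simulator that calls checker plugins on every memory access."""

    def __init__(self, mem_size=1 << 16):
        self.memory = bytearray(mem_size)
        self.plugins = []   # callables: plugin(sim, kind, addr, pc)
        self.reports = []   # findings appended by plugins

    def register(self, plugin):
        self.plugins.append(plugin)

    def load(self, addr, pc):
        self._notify("read", addr, pc)
        return self.memory[addr]

    def store(self, addr, value, pc):
        self._notify("write", addr, pc)
        self.memory[addr] = value & 0xFF

    def _notify(self, kind, addr, pc):
        for plugin in self.plugins:
            plugin(self, kind, addr, pc)


def null_pointer_plugin(sim, kind, addr, pc):
    """Flag any access that falls inside the (hypothetical) null page."""
    if addr < NULL_PAGE:
        sim.reports.append(f"null-pointer {kind} at pc={pc:#x}, addr={addr:#x}")


sim = Simulator()
sim.register(null_pointer_plugin)
sim.store(0x5000, 7, pc=0x400000)   # ordinary access: no report
sim.load(0x10, pc=0x400008)         # dereference near address 0: reported
```

Because checks live in plugins rather than in the simulator core, new properties can be added without touching the instruction-execution path.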
This paper introduces a novel benchmark, EGE-Math Solutions Assessment Benchmark, for evaluating Vision-Language Models (VLMs) on their ability to assess handwritten mathematical solutions. Unlike existing benchmarks that focus on problem solving, our approach centres on understanding student solutions, identifying mistakes, and assigning grades according to fixed criteria. We compile 122 scanned solutions from the Russian Unified State Exam (EGE) together with official expert grades, and evaluate seven modern VLMs from Google, OpenAI, Arcee AI, and Alibaba Cloud in three inference modes. The results reveal current limitations in mathematical reasoning and human-rubric alignment, opening new research avenues in AI-assisted assessment. Code is available at https://github.com/Karifannaa/Auto-check-EGE-math