We propose a novel targetless extrinsic calibration method for wide-baseline and wide-angle fisheye cameras, which are mounted on a driving vehicle for surround view monitoring. Sequences of image frames from three fisheye cameras are obtained, and the object instances and depth around the vehicle are used for calibration. Thus, the proposed method can be applied to online vehicle camera calibration. Fisheye images are first transformed into the cylindrical coordinate system by considering the panoramic formation of the cameras. Then, state-of-the-art object detection and monocular depth estimation models are applied to the cylindrical images. Vehicle instances matched across different views are reconstructed into 3D point clouds, and their depths are scaled by employing the pose geometry of the front camera. The per-point depths and global scale are then jointly optimized to achieve accurate cross-view alignment and extrinsic calibration. Experiments on both real-world and synthetic video datasets show that the proposed method achieves higher accuracy than COLMAP and DUSt3R under challenging conditions such as wide baselines and low frame rates, without requiring an artificial calibration target.
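The fisheye-to-cylindrical transformation mentioned above can be illustrated with a minimal sketch. It assumes an ideal equidistant fisheye model (r = f·θ) and a unit-radius cylinder around the vertical axis; the abstract does not state the actual lens model, so the function name, the parameters `f`, `cx`, `cy`, and the model itself are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def cylindrical_to_fisheye(phi, h, f, cx, cy):
    """Map cylindrical coordinates to fisheye pixel coordinates.

    phi : azimuth on the cylinder in radians (0 = optical axis)
    h   : height on a unit-radius cylinder
    f, cx, cy : hypothetical focal length and principal point in pixels
    Assumes an equidistant fisheye model, r = f * theta.
    """
    phi = np.asarray(phi, dtype=np.float64)
    h = np.asarray(h, dtype=np.float64)
    # Viewing ray through the cylinder point (camera looks along +z).
    x, y, z = np.sin(phi), h, np.cos(phi)
    rho = np.hypot(x, y)                 # distance of the ray from the optical axis
    theta = np.arctan2(rho, z)           # angle between the ray and the optical axis
    # Guard against division by zero for the ray along the optical axis.
    scale = f * theta / np.where(rho > 0, rho, 1.0)
    return cx + scale * x, cy + scale * y
```

Evaluating this mapping for every pixel of the target cylindrical image gives the lookup table that a remapping routine would use to resample the fisheye frame.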
Acoustic enrichment (AE), the playback of ambient sound from healthy coral reefs, shows promise in attracting fish larvae to degraded or artificial reefs, but previous evaluations have used invasive diver-based sampling techniques, limiting most studies to short deployments in benign and accessible environments. This study used fully autonomous cameras to non-invasively evaluate AE efficacy in attracting settlement-stage fish larvae and mature fish over large fractions of a lunar cycle. We deployed an AE system in Kāne'ohe Bay (O'ahu, Hawai'i) over three spawning events (August 2023, June-July 2024). Treatment (active speaker) and control (inactive speaker) sites were built by placing artificial structures, autonomous cameras, and hydrophones on a sandy seabed at 5 m water depth, with sites separated by 42-65 m. Treatment and control designations were alternated between deployments to remove potential spatial bias. Manual and semi-automated image analysis found larval counts at both sites peaking around the new moon, but the treatment site attracted 4-14 times more larvae. Both sites encountered similar numbers of mature fish. These results demonstrate that autonomous camera systems can non-invasively study fish larval presence and provide further support that AE can enhance larval fish response.
Autonomous driving systems face critical perception failures in dense fog, where conventional RGB cameras suffer from severe degradation due to atmospheric scattering and reduced visibility. This paper presents an adaptive multi-modal fusion framework that synergistically integrates gated imaging with 3D LiDAR point clouds to achieve robust obstacle detection under visibility conditions as low as 50 m. Unlike standard cameras that passively capture scattered ambient light, gated cameras employ time-synchronized active illumination to physically filter backscattered photons, preserving structural features even in low-visibility scenarios. We propose a novel Adaptive Feature-Weighting Network (AFW-Net) that dynamically adjusts sensor modality contributions based on real-time environmental degradation assessment. The framework incorporates three key innovations: (1) a cross-modal feature extraction module that exploits the complementary physical properties of gated imaging and LiDAR, (2) an attention-based adaptive fusion mechanism that quantifies per-modality reliability through uncertainty estimation, and (3) a degradation-aware training strategy using weather-specific augmentation. Extensive experiments on the Princeton Automated Driving Dataset demonstrate that our approach maintains detection average precision (AP) above 82% under dense fog conditions (50 m visibility), representing a 23.7% improvement over state-of-the-art RGB-LiDAR fusion methods that exhibit substantial performance degradation to 58.4% AP. Ablation studies validate the necessity of each component, and cross-dataset evaluation confirms the generalization capability of the proposed framework. The adaptive weighting mechanism proves particularly effective, dynamically rebalancing modality contributions across the gated imaging and LiDAR branches while maintaining LiDAR geometric constraints. 
This work establishes a robust perception paradigm for safety-critical autonomous systems operating in low-visibility environmental conditions.
Extreme ultraviolet (EUV) thin-film filters are key optical components in space-based EUV imaging, employed to reject out-of-band visible and ultraviolet radiation. Currently, aluminum (Al) and zirconium (Zr) EUV filters are predominantly used due to their superior mechanical strength and stability. In contrast, indium (In) EUV filters had not been successfully deployed in spaceborne EUV cameras prior to this work. Their inherent fragility makes them highly susceptible to vibration and acoustic loads during the launch phase, ultimately resulting in structural failure. This study presents a comprehensive investigation into the structural protection strategies for indium EUV filters in space applications. Through systematic analysis of the photoenergy requirements and the mechanical characteristics of the indium filters, a robust filter assembly was developed, and vibration isolation as well as acoustic mitigation designs were implemented for the assembly. Finite element simulations and environmental tests confirmed that the indium filters can withstand vibration and acoustic loads. This technology has been successfully implemented in the 83.4 nm channel of the Extreme Ultraviolet Camera (EUVC) onboard the Queqiao-2 relay satellite for the Chang'E-7 mission. Subsequent in-orbit tests validated the structural integrity of the indium filters, providing a valuable technical reference for their successful application in future spaceborne EUV cameras.
Marker-based motion capture systems are considered the gold standard for biomechanical analysis of movements associated with anterior cruciate ligament (ACL) injury risk; however, their cost and technical requirements limit their use for large-scale athlete screening. Markerless motion capture has emerged as a potential alternative, using pose estimation algorithms or depth cameras to quantify movement without reflective markers. This systematic review evaluated the accuracy and validity of markerless motion capture systems for measuring lower-limb kinematics during jump-landing tasks commonly used in ACL injury screening. MEDLINE, Embase, and Web of Science were searched from 1990 to March 2025 for studies comparing markerless and marker-based systems in healthy participants. Extracted outcomes included Bland-Altman plots, root mean square error, mean absolute error, Pearson's correlation coefficient, coefficient of multiple correlation, and intraclass correlation coefficient. Across studies, markerless systems demonstrated moderate to high validity for several lower-limb kinematic measures, particularly in the sagittal plane, although validity varied across joints, movement phases, and task complexity. These findings suggest markerless motion capture shows potential for biomechanical assessment in ACL injury screening, but further validation is required before widespread implementation.
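Several of the agreement outcomes listed above (root mean square error, mean absolute error, Bland-Altman bias and limits of agreement, Pearson's correlation) can be computed from paired joint-angle samples as in the following minimal sketch. The function name and the 1.96 multiplier for 95% limits of agreement are conventional choices, not details taken from the reviewed studies.

```python
import numpy as np

def agreement_metrics(markerless, marker_based):
    """Agreement statistics between paired measurements (e.g. joint angles in degrees)."""
    a = np.asarray(markerless, dtype=np.float64)
    b = np.asarray(marker_based, dtype=np.float64)
    diff = a - b
    bias = diff.mean()                    # Bland-Altman mean difference
    sd = diff.std(ddof=1)                 # sample standard deviation of the differences
    return {
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "mae": float(np.mean(np.abs(diff))),
        "bias": float(bias),
        "limits_of_agreement": (float(bias - 1.96 * sd), float(bias + 1.96 * sd)),
        "pearson_r": float(np.corrcoef(a, b)[0, 1]),
    }
```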
Thermal infrared imaging is physically well-suited to industrial anomaly detection. However, calibrated thermal cameras cost thousands of dollars per device, making them inaccessible for most industrial environments. This paper addresses that barrier directly. Under the assumption of thermal equilibrium and Lambertian surface conditions, Kirchhoff's law of thermal radiation implies that thermal emissivity and optical reflectance are complementary quantities. Since luminance approximates reflectance for Lambertian surfaces, the synthetic thermal proxy ε = 1 - L follows directly from standard RGB input. This proxy is validated against real lock-in thermography images, achieving a polarity-corrected mean Pearson r = 0.4805, with r > 0.78 for Lambertian-compliant surfaces. This paper introduces ThermalCLIP, which combines this proxy with frozen CLIP ViT-B/16 patch features and PDE-based multiscale residuals. On highly repetitive textures, the physics-informed Laplacian residual provides a discriminative signal that may complement appearance-based methods. Across all 15 MVTec AD categories, ThermalCLIP achieves a mean image-level AUROC of 0.9441, and 0.8724 in zero-shot transfer to the VisA dataset.
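As a minimal sketch of the proxy ε = 1 - L, the snippet below computes luminance from an RGB image and inverts it. The Rec. 709 luma weights and the [0, 1] input range are assumptions made for illustration; the abstract does not specify which luminance formula is used.

```python
import numpy as np

# Rec. 709 luma weights: an assumption, the source does not name the luminance formula.
REC709_WEIGHTS = np.array([0.2126, 0.7152, 0.0722])

def thermal_proxy(rgb):
    """Return the emissivity proxy eps = 1 - L for an RGB image with values in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    luminance = rgb @ REC709_WEIGHTS          # weighted sum over the channel axis
    return 1.0 - np.clip(luminance, 0.0, 1.0)
```

Under this proxy, bright (highly reflective) pixels map to low emissivity and dark pixels to high emissivity.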
The use of image-sensing and real-time processing in Intelligent Transportation Systems (ITS) has introduced a sudden surge in transmitting and gathering high-resolution visual data from vehicle cameras, road infrastructures, and user devices. Such image data are, however, exceedingly susceptible to interception, tampering, and privacy breaches with regard to imminent quantum computing attacks that can break classical encryption algorithms. With such constraints in view, the paper presents a new Hybrid Quantum-Classical Image Encryption Framework that integrates chaos-based bit-level image encryption and quantum-resistant encryption measures to ensure high-security protection of image information in ITS infrastructures. The new framework integrates a customized bit-level chaotic permutation scheme using a Rearranged Arnold Cat Map (R-ACM) and 2D Logistic-Sine Chaotic Maps for confusion and diffusion, and the inclusion of a Quantum Key Distribution (QKD) or post-quantum lattice-based Kyber Key Encapsulation Mechanism (KEM) for secure key negotiation. The two-pyramidal security architecture enhances sensitivity to key and plaintext variations, offers chosen-plaintext, differential, noise, and occlusion attack immunity, and supports efficient encryption of RGB and grayscale image information without excessively large time overhead. Experimental results on representative ITS-relevant image data sets verify superior performance with mean NPCR > 99.60%, UACI ≈ 33.5%, entropy measures close to 8.0, and significantly suppressed correlation between neighboring pixels. Further, key space analysis demonstrates a combinatorial complexity of over 2²⁵⁶, making brute-force and quantum-type attacks computationally infeasible. 
The new framework is extremely suitable for real-time implementation in autonomous vehicles, roadside edge nodes, and intelligent traffic monitoring systems, thereby enabling secure, intelligent, and privacy-preserving ITS infrastructure in the post-quantum era.
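The NPCR and UACI figures quoted above follow standard definitions: NPCR is the percentage of pixel positions that differ between two cipher images, and UACI is the mean absolute intensity difference normalized by 255. A minimal sketch for 8-bit images follows; the function name is illustrative and the code is not taken from the paper.

```python
import numpy as np

def npcr_uaci(cipher_a, cipher_b):
    """Return (NPCR, UACI) in percent for two equally sized 8-bit cipher images."""
    a = np.asarray(cipher_a, dtype=np.float64)
    b = np.asarray(cipher_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("cipher images must have the same shape")
    npcr = 100.0 * np.mean(a != b)                 # share of positions that changed
    uaci = 100.0 * np.mean(np.abs(a - b)) / 255.0  # mean intensity change, normalized
    return float(npcr), float(uaci)
```

For two independent, uniformly random 8-bit images the expected values are about 99.61% and 33.46%, which is why results near those figures are read as evidence of strong diffusion.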
Cameras and IMUs on heavy mining trucks supply the visual signal that Advanced Driver Assistance Systems (ADASs) use in open-pit operations. Haul roads in a surface mine are unstructured and unmarked, so a perception model must be both accurate and fast. We address this with a video-based multitask pipeline for a mining Driver Support System (DSS): a single BiSeNetV1 network produces drivable-area segmentation and steering-direction classification in one forward pass. Training used only 100 frames sampled non-sequentially from in-cab recordings of a real open-pit mine; evaluation used two full onboard sequences. To exploit temporal redundancy without annotating video, we propose an Adaptive Clockwork (A-CW) inference scheme: the spatial path runs on every frame, while the context path is refreshed only on keyframes whose cadence is set by the classification output, the same signal shown to the driver as a steering hint. This classification-guided policy increases context updates on curved segments, where the scene changes more rapidly, and reduces them on straight sections, where semantic redundancy is higher. The selected A-CW configuration was evaluated on full temporal test sequences, including one route kept entirely outside the training source. On this unseen route, A-CW achieved 94.70% road-class IoU and 73.68% Top-1 Accuracy. GPU-only throughput increased from about 55 FPS with frame-by-frame inference to 168.01 FPS, and display-excluded end-to-end processing in the simulated ADAS pipeline remained at approximately 37.5 FPS.
Integration of LiDAR and thermal sensing has become increasingly important in robotics, infrastructure diagnostics, environmental monitoring, and autonomous perception systems. LiDAR sensors provide accurate three-dimensional geometric information but do not directly capture thermal properties of observed objects, whereas thermal cameras provide temperature distributions without explicit spatial structure. Fusion of both sensing modalities enables thermally augmented 3D scene reconstruction and spatial localization of temperature anomalies. This paper presents a practical LiDAR-thermal fusion framework for three-dimensional localization of heat sources using an Ouster OS1 LiDAR sensor and a FLIR A70 thermal camera. The proposed framework includes intrinsic thermal-camera calibration, extrinsic LiDAR-thermal calibration, multimodal data synchronization, projection of LiDAR points onto the thermal image plane, and assignment of temperature values to spatial points. Additionally, a dedicated thermally distinguishable calibration target is proposed to enable reliable multimodal feature extraction under low-contrast LWIR imaging conditions. The developed framework was experimentally validated using real radiometric thermal data and LiDAR point clouds acquired under laboratory conditions. Quantitative evaluation demonstrated reprojection errors below 1 pixel and a mean hottest-point localisation error of approximately 4.1 cm at a distance of 12.3 m. The results confirm that accurate spatial localisation of thermal anomalies can be achieved using a geometry-based multimodal fusion approach without relying on computationally expensive learning-based methods. The proposed framework emphasises practical deployment, deterministic calibration, and applicability in scenarios where limited training data or constrained computational resources make learning-based approaches difficult to apply. 
The proposed system may be applied to building energy diagnostics, industrial inspection, technical infrastructure monitoring, and robotic perception systems that require reliable spatial localisation of heat sources under real measurement conditions.
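The projection and temperature-assignment steps of such a framework can be sketched as follows. This is a simplified pinhole version that ignores lens distortion and timing offsets; the rotation `R`, translation `t`, and intrinsic matrix `K` stand for calibration results and are placeholders, not values from the paper.

```python
import numpy as np

def assign_temperatures(points_lidar, R, t, K, thermal_image):
    """Project LiDAR points into a thermal image and look up a temperature per point.

    points_lidar  : (N, 3) points in the LiDAR frame
    R, t          : hypothetical LiDAR-to-camera rotation (3x3) and translation (3,)
    K             : hypothetical 3x3 pinhole intrinsic matrix (distortion ignored)
    thermal_image : (H, W) array of temperatures
    Returns the points that land inside the image and their temperatures.
    """
    points_lidar = np.asarray(points_lidar, dtype=np.float64)
    pts_cam = points_lidar @ R.T + t               # LiDAR frame -> camera frame
    in_front = pts_cam[:, 2] > 0                   # drop points behind the camera
    pts_cam = pts_cam[in_front]
    uvw = pts_cam @ K.T                            # homogeneous pixel coordinates
    col = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    row = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    height, width = thermal_image.shape
    inside = (col >= 0) & (col < width) & (row >= 0) & (row < height)
    kept = points_lidar[in_front][inside]
    temps = thermal_image[row[inside], col[inside]]
    return kept, temps
```

The hottest 3D point is then `kept[np.argmax(temps)]`, which is one simple way to localize a heat source in space.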
Autonomous driving systems (ADSs) are reliable only when heterogeneous sensors, estimation algorithms, and safety mechanisms are engineered as a single coherent safety-critical measurement system rather than as loosely coupled modules. Production stacks integrate cameras, LiDAR, automotive radar, and GNSS/IMU, yet deployment remains constrained by modality-specific failure modes, calibration and synchronization drift, and out-of-distribution (OOD) conditions that violate modeling assumptions. These limitations induce overconfidence and downstream decision errors whenever planning assumes certainty sharper than sensing can justify. This survey introduces a sensor-centric framework linking measurement physics, uncertainty propagation, fusion integrity, safety assurance, and risk-aware planning and control. We formalize what each modality physically measures; unify probabilistic, evidential, and conformal uncertainty representations; analyze filtering, factor-graph, BEV, transformer, and state-space fusion architectures with an emphasis on robustness and graceful degradation; and generalize aviation-style integrity concepts (RAIM/ARAIM) to multi-modal autonomy. The distinctive contribution is a single sensor-to-assurance throughline in which every uncertainty representation is tied to its measurement physics, every fusion architecture is evaluated against an explicit integrity-monitoring requirement generalized from RAIM/ARAIM, and every safety-standard clause is mapped to a concrete architectural mechanism. We map these mechanisms onto ISO 26262, ISO 21448 (SOTIF), ISO/PAS 8800, ANSI/UL 4600, and the UNECE framework, and connect perception uncertainty to decision-making through chance-constrained MPC and formal safety filters (RSS, CBF). Industry case studies and emerging V2X and generative-simulation approaches close the loop to deployable safety arguments.
Objective: This study developed an artificial intelligence (AI)-based hand hygiene assessment system to improve low-supervision monitoring, enhance compliance in operating rooms, standardize procedures, and reduce the risk of intraoperative infections. Methods: High-definition cameras, faucets, and hand sanitizer sensors were installed in the operating room handwashing area to collect real-time data on handwashing videos, water flow sounds, duration of water usage, and hand sanitizer consumption. The system automatically identified health care personnel and monitored compliance with surgical handwashing protocols step by step, assessing whether procedures were standardized and duration requirements were met. Upon detecting omissions, sequence errors, or insufficient duration, the system provided immediate corrections through visual and auditory prompts on the screen. The system automatically recorded all data throughout the process without requiring on-site supervision, ensuring stable adaptation to both routine operating room scenarios and emergency surgical procedures. Results: Experimental findings demonstrated that the system achieved a handwashing step recognition rate of 94.57%, an assessment accuracy rate of 93.25%, and a handwashing duration compliance rate of 92.68%. When deployed in clinical environments, including surgical and emergency departments, the system significantly improved handwashing compliance, increasing the adherence rate to 94.2%. Additionally, the average handwashing duration was reduced to 45.7 seconds, accompanied by a substantial decrease in non-compliant behaviors. Conclusion: The AI-based hand hygiene assessment system substantially enhanced the standardization and efficiency of handwashing procedures in operating rooms, significantly improved hand hygiene compliance and standardized practice, and demonstrated strong clinical applicability. 
Future research should focus on optimizing the model and incorporating feedback mechanisms to further improve the accuracy and user experience of the system.
Multiparametric video-based aquaculture monitoring systems are becoming essential for observing and quantifying fish behavior in real time. These platforms enable high-frequency, continuous recording of underwater activity, supporting automation in feeding management and behavioral analytics. The Tilapia Feeding Behavior Image Dataset (TFBID-mini) was generated using controlled recirculating aquaculture tanks equipped with high-definition underwater cameras to capture feeding events of Oreochromis niloticus (Nile tilapia). The dataset consists of 4,000 expert-annotated images classified into four feeding-intensity levels (None, Weak, Medium, and Strong), each representing a distinct behavioral state observed across multiple feeding sessions. Images were extracted from 1080p video sequences recorded under varying illumination conditions and quality-checked to ensure visual clarity, color balance, and consistency. The resulting dataset provides a standardized benchmarking resource for evaluating computer vision and artificial intelligence models for automated recognition of feeding behavior and for optimizing intelligent feeding systems in aquaculture.
Infrared focal plane array detectors produce column stripe noise due to inter-detector response variations. Existing single-frame correction methods operate exclusively on the degraded infrared image and cannot reliably distinguish column noise from genuine vertical scene structures. With the increasing availability of co-registered visible-light cameras in modern electro-optical/infrared payloads, we propose to exploit the visible image as a structural guide for infrared destriping. Through a cross-modal correlation analysis, we show that the structural correspondence between RGB and infrared images is spatially non-uniform, motivating a selective rather than uniform fusion strategy. Based on this observation, we propose CMSP (Cross-Modal Scene Prior), a lightweight single-frame denoising architecture that selectively applies RGB guidance where it is beneficial. The proposed AdaptiveSPADE module blends RGB-guided modulation with standard instance normalization through a learned per-pixel confidence map, while a dual-path output head separately estimates pixel-wise residuals and column-constant stripe patterns. Evaluated on three public RGB-IR datasets, CMSP achieves 51.91 dB PSNR on M3FD, outperforming the best baseline by 5.79 dB with only 638 K parameters. A downstream evaluation on real stripe noise demonstrates that CMSP not only removes artifacts but also preserves the fine structures critical for infrared small target detection. Ablation studies confirm that adaptive gating more than doubles the benefit of RGB guidance compared to uniform modulation, and prevents degradation when cross-modal alignment is weak.
Deepfakes threaten sensor-based authentication systems, including biometric sensors, surveillance cameras, and IoT edge devices. Unimodal detectors remain vulnerable to modality-specific attacks. We propose a multimodal deepfake detection framework optimized for resource-constrained edge devices, featuring a novel cross-modal attention fusion mechanism with adaptive gating. The architecture combines enhanced Res2Net for audio, temporal 3D CNN with SE attention for video, and bidirectional cross-modal attention with quality-based gates. On our benchmark (5472 audio + 1842 video samples), the fusion model achieves 96.7% accuracy, 96.6% F1-score, 0.988 AUC-ROC, and 3.3% EER. Adversarial testing shows 92.3% accuracy under the Fast Gradient Sign Method (FGSM) attack. The model has a 30.3 MB footprint and runs at 20 FPS on edge hardware. Modality contribution analysis reveals adaptive weighting (72% audio for TTS forgery, 78% video for lip-synced attacks). Cross-dataset evaluation on FakeAVCeleb achieves 92.3% overall accuracy, confirming generalization.
We investigated whether bobcats (Lynx rufus) and domestic cats (Felis catus) exhibit distinct daily activity patterns or use different habitats in the Houston, Texas metropolitan area. Motion-activated cameras were deployed at 33 sites for 16 one-month sampling periods from 2020 to 2024. Bobcats exhibited primarily nocturnal activity wherever they were present. Domestic cats were primarily nocturnal at sites where no bobcats were detected. Bobcats and domestic cats overlapped at sites with a mixture of forest and developed land, where domestic cats shifted to more daytime activity. Both temporal and spatial niche partitioning appear to facilitate predator coexistence in urban landscapes.
This article examines the meaning-making processes involved in how non-human animals, particularly dogs, contribute to our understanding of the concept of "a ball-throwing exercise." Theoretically, the article aligns with and contributes to interspecies pragmatics by introducing cognitive dimensions to the field. Interspecies pragmatics is based on conversation analytic (CA) research; this article explores how CA findings on cognition can contribute to pragmatic questions on meaning-making processes in interaction. For this investigation, data were collected using GoPro action cameras attached to harnesses on three dogs, providing footage from their first-person perspective. The data, which total nearly 2 hours, were analyzed using CA methods. We focus on dog-walking sessions where the dog summons the human to pick up its ball. The human arrives and either picks up the ball or leaves it on the ground to kick it later. This sequence is a pre to the exercise that occurs when the human returns the ball to the dog: the human throws or kicks the ball for the dog to catch. We identified two sequential occurrences in pre-sequences where human and non-human animal meaning-making processes intertwine: (1) the moment after the human's expected answer in the second position arrives, allowing the dog to proceed to the third-position closing turn (dog runs) or directly to the main sequence (dog remains in place), and (2) the moment when the dog initiates a repair due to the lack of an expected answer by the human. Common to these occurrences is that they highlight the dog's expectations regarding the human's action in the second position of the pre-sequence. These findings suggest that cognition is sequentially situated within the relationship between the pre-sequence and the main sequence, indicating how occurrences in the pre-sequence materialize in the main sequence. 
Cognition is grounded in the sequential development that facilitates the entire exercise through cooperation with the other participant.
Medical infrared thermography, which involves the use of infrared thermal cameras for the non-invasive assessment of skin surface temperature distribution, has gained increasing interest in recent years as a tool supporting diagnosis and treatment monitoring. The aim of this article is to present the historical background and critically reassess the current role of infrared thermography in medicine, with particular emphasis on standardization as a key determinant of its clinical utility. This Perspective highlights the fundamental impact of methodological variability on diagnostic performance and reproducibility. A structured framework for standardization is proposed, encompassing patient preparation, environmental conditions, device parameters and calibration, image acquisition protocols, region-of-interest definition and analysis, as well as reporting and clinical interpretation. The analysis demonstrates how inconsistencies at each of these levels reduce measurement reliability, limit inter-study comparability, and weaken clinical confidence in infrared thermography. The article also addresses the growing availability of mobile thermal imaging systems and their integration with artificial intelligence, while emphasizing the need for stronger evidence-based support across all methodological domains. The presented analysis suggests that, despite existing limitations, medical infrared thermography holds considerable potential as a supportive clinical tool. However, its broader clinical implementation remains limited by several factors, with the lack of standardized protocols constituting a major and practically addressable translational barrier. Wider adoption will require standardization efforts alongside rigorous validation studies and application-specific interpretative guidelines. Addressing these challenges through technological advances and coordinated international standardization may facilitate meaningful progress over the next decade.
Polarimetric color cameras are a forefront technology that simultaneously captures polarimetric and color information by analyzing polarization states across different color channels, commonly red, green, and blue. In general, each of these color channels can carry different polarization information. Therefore, measuring the polarization Stokes vector at several discrete wavelengths simultaneously and with the highest possible resolution is of interest in multiple research areas. However, when a commercial color polarization sensor is used under simultaneous narrowband RGB illumination mode, its channels cannot be assumed to represent independent wavelength channels. Spectral overlap of the color filters introduces color crosstalk between wavelength-dependent analyzer intensities, which may bias the reconstructed Stokes parameters if it is not corrected before polarimetric inversion. Several methods have been proposed in the literature to address the color crosstalk problem but they typically assume that the polarization state is identical for all wavelengths. This assumption does not generally hold for real samples, which exhibit wavelength-dependent depolarization, retardance, and dichroism. To the best of our knowledge, this is the first work presenting a method that addresses the color crosstalk problem without assuming that the polarization state is identical across all wavelengths. In addition, Fourier domain demosaicking techniques are applied to interpolate the data and reconstruct the images. The present study demonstrates how the proposed method leads to an accurate recovery of chromatic and polarimetric information on both synthetic and real-world datasets. To test our approach, narrowband light beams at three wavelengths (470, 554, 630 nm), with different spatial polarization and degree of linear polarization distributions, have been simulated and validated with simulated and experimental data. 
The results demonstrate the feasibility of the method for accurate polarization measurements across three wavelength channels.
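To make the crosstalk problem concrete, the sketch below applies a generic linear unmixing to each analyzer orientation separately and then forms the linear Stokes parameters per wavelength. It is not the authors' method: the 3x3 crosstalk matrix is a hypothetical calibration result, and the linear mixing model (measured = M · true) is an assumption made for illustration. Because each analyzer orientation is unmixed independently, the sketch does not require the polarization state to be the same at all wavelengths.

```python
import numpy as np

def unmix_channels(measured, crosstalk):
    """Undo linear color crosstalk for one analyzer orientation.

    measured  : (..., 3) camera R, G, B intensities
    crosstalk : hypothetical 3x3 matrix M such that measured = M @ true
    """
    measured = np.asarray(measured, dtype=np.float64)
    return measured @ np.linalg.inv(crosstalk).T

def linear_stokes(i0, i45, i90, i135):
    """Linear Stokes parameters from intensities behind 0, 45, 90, 135 degree analyzers."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def degree_of_linear_polarization(s0, s1, s2):
    """DoLP = sqrt(S1^2 + S2^2) / S0."""
    return np.sqrt(s1 ** 2 + s2 ** 2) / s0
```

Usage: unmix the stack of four analyzer images with `unmix_channels`, then pass the four corrected images to `linear_stokes` to obtain one Stokes vector per wavelength channel.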
Head-mounted eye-tracking systems play a critical role in virtual reality, human-computer interaction, and clinical applications, yet achieving both high angular accuracy and precise 3D gaze position estimation with low-cost hardware remains challenging. This paper proposes a lightweight, training-free geometric framework for binocular 3D gaze tracking on consumer-grade hardware. The framework leverages stereo geometric triangulation and a simplified physiological eye model to achieve robust 3D gaze estimation, requiring only standard infrared cameras and dichroic mirrors without additional specialized hardware. The method was evaluated in controlled indoor conditions with 30 participants, where it achieved an angular error ranging from 1.1° to 2.82° and a 3D gaze position error below 13.24 mm. Compared to two state-of-the-art academic non-deep-learning methods, the proposed framework delivers competitive angular accuracy while significantly reducing 3D position error, outperforming the baselines by 34% to 56% in depth estimation precision. These results demonstrate that the proposed geometric framework is a practical and effective solution for high-precision 3D gaze tracking on low-cost hardware, suitable for both research and consumer applications.
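One common way to turn two gaze rays into a 3D gaze point is to take the midpoint of the shortest segment between them, since the rays rarely intersect exactly. The sketch below illustrates that generic triangulation step; the eye origins and directions are assumed inputs, and the paper's eye model and calibration are not reproduced here.

```python
import numpy as np

def gaze_point(origin_left, dir_left, origin_right, dir_right):
    """Midpoint of the shortest segment between the left and right gaze rays."""
    o1 = np.asarray(origin_left, dtype=np.float64)
    o2 = np.asarray(origin_right, dtype=np.float64)
    d1 = np.asarray(dir_left, dtype=np.float64)
    d2 = np.asarray(dir_right, dtype=np.float64)
    d1 = d1 / np.linalg.norm(d1)
    d2 = d2 / np.linalg.norm(d2)
    w = o1 - o2
    b = d1 @ d2
    d = d1 @ w
    e = d2 @ w
    denom = 1.0 - b * b                  # zero when the rays are parallel
    if denom < 1e-12:
        raise ValueError("gaze rays are parallel; no unique closest point")
    s = (b * e - d) / denom              # parameter along the left ray
    t = (e - b * d) / denom              # parameter along the right ray
    return 0.5 * ((o1 + s * d1) + (o2 + t * d2))
```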
Atypical gaze patterns are consistently reported in autism, reflecting differences in social attention and interest. Gaze-tracking paradigms provide an objective way to quantify these differences and may serve as early indicators of autism. This diagnostic test accuracy systematic review and meta-analysis evaluated the performance of eye-tracking-based gaze measures in children. Following Preferred Reporting Items for Systematic Reviews and Meta-Analyses of Diagnostic Test Accuracy (PRISMA-DTA) guidance, studies published between 2015 and 2025 that compared gaze-tracking paradigms with standardized autism diagnoses were synthesized. Pooled diagnostic odds ratio (DOR), sensitivity, and specificity were estimated using random-effects and hierarchical summary receiver operating characteristic models. Risk of bias was assessed with QUADAS-2 and funnel plots. Seventeen studies (n = 4,256) from six countries met the inclusion criteria. Tasks included social-geometric preference, motherese-nonsocial speech, and visual-orienting paradigms analyzed with rule-based or machine-learning methods. The pooled area under the hierarchical summary receiver operating characteristic curve (HSROC AUC) was 0.845; DOR 15.03 (95% CI 8.00-28.50); sensitivity 0.77 (95% CI 0.65-0.85); and specificity 0.80 (95% CI 0.75-0.84). Although heterogeneity was high (I² = 87.78%), effect directions were consistent. Dynamic social stimuli and higher-frequency tracking systems achieved the best performance. Gaze-tracking tests distinguished autistic and nonautistic children across diverse settings, supporting their potential role as a quantitative, observer-independent adjunct for early identification and clinical decision support.
Lay abstract: Autism is a form of neurodiversity characterized by differences in social communication, sensory processing, and patterns of attention and interest, which often shape how autistic people look at and interpret the world around them. 
Eye-tracking technology records where a person looks on a screen and how long their gaze remains on elements, such as people, faces, or objects. Because it is objective and does not rely on language or complex instructions, eye-tracking may support earlier identification of autism. This study reviewed 17 research papers published between 2015 and 2025 that explored how eye-tracking distinguishes autistic and nonautistic children. Together, these studies included over 4,000 participants and compared attention to social scenes, like people talking or playing, with attention to nonsocial or geometric patterns. On average, eye-tracking correctly identified autism about 77% of the time and nonautistic children about 80% of the time, with the best results achieved with dynamic social videos and high-quality tracking cameras. These findings suggest that gaze-based measures capture meaningful differences in social attention and could complement existing diagnostic approaches through earlier, more objective assessment.
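For readers less familiar with these accuracy measures, the sketch below shows how a diagnostic odds ratio and likelihood ratios follow from a single sensitivity and specificity pair. It is a worked illustration only: the pooled DOR of 15.03 reported above comes from its own random-effects model, so it need not equal the value obtained by plugging in the pooled sensitivity and specificity.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = odds of a positive test in cases / odds of a positive test in non-cases."""
    return (sensitivity / (1.0 - sensitivity)) * (specificity / (1.0 - specificity))

def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a given sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative

# Plugging in the pooled point estimates from the abstract (0.77, 0.80).
dor = diagnostic_odds_ratio(0.77, 0.80)
lr_pos, lr_neg = likelihood_ratios(0.77, 0.80)
```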