Accurate 3-D reconstruction of underwater objects is increasingly important for many applications, and a binocular laser scanner is a common way to obtain an underwater point cloud. However, the perspective-based binocular camera calibration methods developed for in-air imaging often cause large errors in underwater 3-D reconstruction because the camera rays are refracted at the water-glass-air interfaces. To improve the accuracy of the underwater binocular laser scanner, this study establishes a binocular camera calibration method based on virtual imaging that accounts for multilayer refraction and determines the refractive parameters and camera pose efficiently and accurately. The principles of the binocular laser scanning system are extended to the refractive geometry. The virtual point distribution constraint (VPDC) is employed for a closed-form solution in binocular camera calibration, estimating the axis, distance, and camera pose for accurate step-by-step initialization. The virtual optical center error (VOCE) is adopted for nonlinear optimization, achieving higher efficiency than forward projection without loss of accuracy. The camera calibration experiments demonstrate that the proposed method yields a more accurate closed-form solution (about fivefold and twofold improvements for the two configurations) and more efficient optimization (more than a tenfold improvement) compared with existing methods. The 3-D reconstruction experiments also show improved accuracy with the proposed binocular camera calibration method.
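The multilayer refraction at the heart of such a calibration can be illustrated with the vector form of Snell's law. Below is a minimal sketch assuming flat, parallel interfaces and illustrative refractive indices; the function and the two-interface setup are my own, not the paper's implementation.

```python
import numpy as np

def refract(d, n, eta):
    """Refract unit direction d at a flat interface with unit normal n.

    eta = n_incident / n_transmitted (vector form of Snell's law).
    Returns the refracted unit direction, or None on total internal reflection.
    """
    d = d / np.linalg.norm(d)
    n = n / np.linalg.norm(n)
    cos_i = -np.dot(d, n)
    if cos_i < 0:            # flip the normal to face the incoming ray
        n, cos_i = -n, -cos_i
    k = 1.0 - eta**2 * (1.0 - cos_i**2)
    if k < 0:
        return None          # total internal reflection
    return eta * d + (eta * cos_i - np.sqrt(k)) * n

# A camera ray leaving the housing crosses air -> glass -> water:
ray = np.array([0.2, 0.0, 1.0])
normal = np.array([0.0, 0.0, -1.0])      # interface normal toward the camera
in_glass = refract(ray, normal, 1.000 / 1.49)   # air into glass
in_water = refract(in_glass, normal, 1.49 / 1.333)  # glass into water
print(in_water)
```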
This paper presents a multimodal sensing approach for fine-grained soccer action recognition using synchronized mm-wave FMCW radar and multiview RGB cameras. A TI IWR1443BOOST FMCW radar and three Sony IMX296 global-shutter cameras were used to record seven soccer-related actions performed in different movement directions in an outdoor environment. Range-Doppler radar processing is applied to extract global mel features and CFAR-localized block representations of mel and radar spectrogram features, capturing both coarse and fine micro-Doppler characteristics. Camera features are derived from bounding boxes, HOG, optical flow, and pose estimation. Classification is performed using logistic regression as the classical baseline and various deep models, with performance evaluated using cross-validation. Radar alone achieved moderate performance (0.897 F1macro using a TCN), successfully identifying coarse motion but showing limited separability for dribbling-based actions. Camera-only models achieve near-perfect accuracy (≥0.997 F1macro using a 1D-CNN), with nearly perfectly diagonal confusion matrices. The best performance is obtained from a cross-modal transformer with multiple cameras (0.998 F1macro). These results demonstrate that cameras alone perform strongly on this action recognition task, but also that radar-camera fusion can improve robustness and enhance the discrimination of finer soccer player movements for outdoor analytics and player monitoring applications.
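Standard range-Doppler processing of an FMCW frame, as referenced above, amounts to a windowed 2D FFT over fast time (range) and slow time (Doppler). A minimal sketch on synthetic data follows; the TI device's actual capture format and the downstream mel/CFAR feature stages are not shown.

```python
import numpy as np

# Synthetic FMCW frame: rows = chirps (slow time), cols = ADC samples (fast time).
n_chirps, n_samples = 128, 256
rng = np.random.default_rng(0)
frame = rng.normal(size=(n_chirps, n_samples))   # stand-in for raw IF samples

# Range FFT along fast time (windowed), then Doppler FFT along slow time.
win_r = np.hanning(n_samples)
win_d = np.hanning(n_chirps)[:, None]
range_fft = np.fft.fft(frame * win_r, axis=1)
rd_map = np.fft.fftshift(np.fft.fft(range_fft * win_d, axis=0), axes=0)
rd_db = 20 * np.log10(np.abs(rd_map) + 1e-12)    # log-magnitude range-Doppler map
print(rd_db.shape)                               # (Doppler bins, range bins)
```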
Existing models for stereo matching in minimally invasive surgery (MIS) require calibrated stereo images, yet accurate calibration is often unavailable intraoperatively. Training an uncalibrated stereo model is thus attractive but challenging owing to the lack of disparity-labelled surgical images. We leverage the wealth of non-medical synthetic stereo image datasets. These data, however, were generated under ideal conditions (rectified, with centred principal points) and hence differ from real uncalibrated MIS images. We propose camera augmentation, a new type of image augmentation that alters the camera's orientation and intrinsic parameters via geometric parameters. We augment the idealised existing datasets, sampling the geometric augmentation parameters from distributions estimated through an in-depth analysis and modelling of stereo laparoscopes. This forms the camera augmentation training strategy (CATS), with which we retrain RAFTStereo and IGEV++ for zero-shot uncalibrated stereo matching in MIS. We evaluated on the SCARED, StereoMIS, and RIS2017 datasets and an in-house dataset. In the uncalibrated setting on the SCARED dataset, CATS-RAFTStereo and CATS-IGEV++ achieved end-point errors (EPE) of 1.42 and 1.41 pixels. This is a successful result, as the reference pretrained models obtained 1.23 and 1.21 pixels in the calibrated setting and failed in the uncalibrated setting. Camera augmentation bridges the gap between ideally conditioned datasets and the real surgical conditions of uncertain or unavailable calibration, enabling the retraining of state-of-the-art architectures. Beyond stereo, the proposed CATS is applicable to other tasks sensitive to camera geometry. Code and models will be released publicly.
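One plausible building block of such a camera augmentation is the homography induced by perturbing the camera's rotation and intrinsics, H = K2 R K^-1, applied as an image warp. The perturbation magnitudes below are illustrative, not the distributions estimated in the paper.

```python
import cv2
import numpy as np

def rotate_intrinsics_warp(img, K, yaw_deg=0.5, f_scale=1.02, cx_shift=3.0):
    """Warp an image as if the camera rotated slightly and its intrinsics changed.

    For a pure rotation R and perturbed intrinsics K2, the induced homography
    is H = K2 @ R @ inv(K). All perturbation magnitudes here are illustrative.
    """
    a = np.deg2rad(yaw_deg)
    R = np.array([[ np.cos(a), 0, np.sin(a)],
                  [ 0,         1, 0        ],
                  [-np.sin(a), 0, np.cos(a)]])
    K2 = K.copy()
    K2[0, 0] *= f_scale      # perturb focal length
    K2[1, 1] *= f_scale
    K2[0, 2] += cx_shift     # shift principal point off-centre
    H = K2 @ R @ np.linalg.inv(K)
    return cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))

img = np.zeros((480, 640, 3), np.uint8)
K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1]])
out = rotate_intrinsics_warp(img, K)
```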
Multi-line-scan camera systems provide high-frequency sampling and wide field-of-view coverage, making them valuable for three-dimensional measurement and dynamic reconstruction. However, their one-dimensional projection property introduces scale ambiguity and strong parameter coupling during calibration, which limits the consistency and stability of local optimization in multi-camera systems. To address this issue, this paper proposes a global calibration method based on physical constraints and hierarchical optimization. A unified imaging and motion model is constructed by incorporating physical scale constraints and structural priors, and geometric scale information is introduced into the joint optimization to reduce scale ambiguity and parameter coupling. Parameter normalization and staged optimization are further adopted to improve numerical stability for variables of different magnitudes and enable consistent estimation of multi-camera parameters within a unified framework. Simulation and experimental results show that the method achieves stable convergence under focal-length initialization perturbation, baseline deviation, and noise interference, with a three-dimensional reconstruction error below 0.67 mm and a convergence probability of at least 99.7%. These results indicate that the proposed method effectively reduces calibration uncertainty in multi-line-scan camera systems and supports high-precision online measurement and dynamic three-dimensional perception.
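The parameter-normalization idea can be illustrated with a toy least-squares problem mixing a large-magnitude focal-length-like parameter with small dimensionless terms; here scipy's x_scale plays the role of the normalization, and the residual model is a stand-in, not the paper's imaging model.

```python
import numpy as np
from scipy.optimize import least_squares

def model(p, x):
    """Toy projection-like model: large focal term, small distortion terms."""
    f, k1, k2 = p
    return f * x / (1 + k1 * x**2 + k2 * x**4)

x = np.linspace(-0.5, 0.5, 50)
p_true = np.array([8000.0, -0.12, 0.03])
y = model(p_true, x)

# x_scale normalizes each variable's step so parameters of very different
# magnitudes (pixels vs. dimensionless) are conditioned comparably.
p0 = np.array([6000.0, 0.0, 0.0])
fit = least_squares(lambda p: model(p, x) - y, p0, x_scale=[1e4, 0.1, 0.1])
print(fit.x)
```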
This study aimed to assess the diagnostic accuracy of digital intraoral photographs obtained using smartphones and a macro camera in evaluating oral health among adults. A total of 200 adult patients underwent clinical and radiographic examinations using the Decayed, Filled Teeth (DFT) Index, Caries Assessment Spectrum and Treatment (CAST) Index, Plaque Index (PI), and Modified Gingival Index (MGI). Intraoral photographs were taken using three devices: a Samsung S23 Ultra, an iPhone 14 Pro, and a Canon EOS 400D with a macro lens. Following the clinical recording of DFT, CAST, PI, and MGI scores by two calibrated examiners as the reference standard, intraoral photographs were captured by a third dentist and independently evaluated by two separate blinded examiners to compare the diagnostic accuracy of the devices against the clinical findings. Non-parametric analyses were conducted using the Friedman test with Dunn's post hoc test and the Wilcoxon test, and agreement between clinical and photographic methods was evaluated via the Bland-Altman method (p < 0.05). The macro camera demonstrated the highest inter-rater reliability for FT scores (ICC = 0.886), while iPhone-derived MGI scores showed the lowest reliability (ICC = 0.624). Statistically significant differences were found among all imaging devices for all indices (p < 0.001), except for MGI. Bland-Altman analysis showed that most values fell within the 95% limits of agreement, indicating good concordance with clinical data. Smartphone and macro camera photographs provided comparable diagnostic results for caries and restorations. However, limitations remain in the assessment of periodontal parameters via photographic methods. Smartphone-based intraoral photography can serve as a practical diagnostic tool in teledentistry.
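For reference, the Bland-Altman agreement analysis used here reduces to a bias and 95% limits of agreement computed from paired differences. A minimal sketch with invented scores, not the study's data:

```python
import numpy as np

def bland_altman(clinical, photographic):
    """Bias and 95% limits of agreement between two measurement methods."""
    clinical = np.asarray(clinical, float)
    photographic = np.asarray(photographic, float)
    diff = photographic - clinical
    bias = diff.mean()
    loa = 1.96 * diff.std(ddof=1)        # half-width of the 95% limits
    return bias, bias - loa, bias + loa

bias, low, high = bland_altman([4, 6, 3, 5, 7], [4, 5, 3, 6, 7])
print(f"bias={bias:.2f}, LoA=[{low:.2f}, {high:.2f}]")
```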
Infrared thermography is a non-contact tool for monitoring inflammatory processes in the diabetic foot, but quantitative bedside use of low-cost thermal infrared cameras remains challenging due to radiometric drift, non-uniformity (vignetting), geometric distortions, and visible-thermal parallax. This paper presents an end-to-end clinical and instrumental framework built around a low-cost thermal camera to ensure reproducible acquisition and physically consistent temperature estimation. The approach combines a standardized mobile acquisition setup and measurement protocol, extraction of embedded radiometric data from raw images, radiometric inversion with atmospheric correction, vignette correction performed in the radiometric domain, and geometric calibration of both visible and infrared sensors using dedicated (thermal) calibration targets. Accurate visible-infrared registration is obtained from hybrid heated markers, enabling reliable overlay and downstream analysis. The full processing chain yields quantitative thermograms with radiometric errors below 0.15 °C and sub-pixel multimodal alignment, supporting the detection of clinically relevant plantar temperature asymmetries and paving the way for routine calibrated low-cost thermography in diabetic foot care.
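Vignette correction in the radiometric domain is essentially a flat-field normalization. A minimal sketch, assuming a flat-field frame of a uniform scene is available; the synthetic radial falloff and function name below are illustrative, not the paper's exact correction model.

```python
import numpy as np

def flat_field_correct(radiance, flat):
    """Correct vignetting by normalizing with a flat-field (uniform-scene) frame.

    Both inputs are in the radiometric domain (after radiometric inversion);
    the flat frame acts as a per-pixel relative-response (gain) map.
    """
    gain = flat / flat.mean()
    return radiance / gain

rng = np.random.default_rng(1)
# Synthetic radial vignetting profile on a 120 x 160 sensor grid.
flat = 1.0 - 0.3 * np.hypot(*np.mgrid[-1:1:120j, -1:1:160j]) ** 2
scene = rng.uniform(290.0, 310.0, size=flat.shape)   # radiance proxy
corrected = flat_field_correct(scene * flat, flat)
```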
Video-based livestock monitoring offers a noninvasive, cost-effective, and scalable alternative both to direct human observation and to the collar or ear-tag devices commonly used on farms. It enables simultaneous real-time observation of multiple animals while avoiding the stress and injuries caused by physical devices. However, single-camera systems face challenges such as blind spots and limited individual tracking, especially in barns lacking corridor layouts. These limitations can be overcome using multi-camera, multi-cow tracking (MCMCT) systems that integrate deep learning and statistical techniques to enable continuous detection, identification, activity classification, and zone location of animals in the barn under commercial conditions, an environment characterized by high stocking density (low area in m² per cow), occlusions, and variable lighting. In this study, a commercial MCMCT system was tested over 31 d (May 2025) on 3 Holstein dairy farms in western France. Herd sizes ranged from 70 to 250 lactating cows, and all farms used automatic milking systems (AMS), which allowed identification of all animals when milked. The individual detection performance of the MCMCT system was then validated against official AMS records. A dedicated hybrid confusion matrix framework was developed to jointly assess detection and identification errors in the sequential process, allowing precise calculation of recall, precision, and F1-scores at both stages. Overall, the MCMCT system achieved over 90% detection recall and 87% to 93% precision, continuously detecting more than 9 out of 10 cows daily. Identification was more challenging, with recall varying from 69% to 78% and precision above 83%, resulting in F1-scores of 79% to 82%. Detection performance varied significantly between day and night on 2 of the 3 farms (H1 and H2), with recall rates dropping to 76% at night and exceeding 94% during peak daylight, underscoring the impact of lighting and activity patterns. Activity classification and zone location were robust, with F1-scores exceeding 87%, demonstrating the system's capacity to provide practical insights for herd management, such as monitoring individual behaviors, identifying high-density zones around resources, and supporting daily management decisions. This work confirms the system's practical viability as a scalable, noninvasive monitoring solution that remains effective under commercial farm complexities such as crowding, occlusion, and lighting variability. The integration of day-night performance analysis and the hybrid confusion matrix provides a rigorous and transparent framework for assessing system reliability, which is critical for deploying precision livestock farming technologies. Identification performance decreased under overcrowded conditions, defined here as less than 9 m² of surface area per cow or less than one cubicle per cow, following the recommendations of the EFSA Panel on Animal Health and Animal Welfare (2023). The system demonstrates significant potential to support and enhance herd management, early disease detection, and animal welfare monitoring.
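The stage-wise recall, precision, and F1 figures reported above follow from standard confusion-matrix counts applied separately to the detection and identification stages. A minimal sketch with invented counts, not the study's data:

```python
def prf(tp, fp, fn):
    """Recall, precision, and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return recall, precision, f1

# Illustrative counts: detection stage, then identification among detections.
detection = prf(tp=940, fp=90, fn=60)
identification = prf(tp=740, fp=120, fn=200)
print(detection, identification)
```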
On-orbit calibration of a space camera's optical parameters is key to guaranteeing imaging quality and navigation accuracy. Conventional on-orbit calibration methods are generally constructed from the invariance of star angular distances and have high computational complexity owing to the involved matrix operations; because the computational and storage resources of spacecraft are severely limited, such methods are hard to realize on board. This paper proposes an efficient calibration method for the space camera. The core step is to solve for the trace of the star matrix constructed from the observed star vectors. The computation is essentially the addition of a few scalars, making it easier to evaluate than other calibration models. Moreover, the traditional extended Kalman filter requires inverting a high-order matrix, which is difficult to realize autonomously on resource-constrained spacecraft. The sequential extended Kalman filter avoids this problem and can quickly and efficiently estimate the optical parameters. Simulation results demonstrate that the internal calibration eliminates most imaging distortion and provides an accurate mapping between the imaging points and the observational direction of the target, with high computational efficiency and calibration accuracy.
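The key trick of the sequential filter, processing one scalar measurement at a time so that the innovation-covariance inversion collapses to a scalar division, can be sketched generically as follows. This is a textbook sequential Kalman measurement update on toy data, not the paper's trace-based measurement model.

```python
import numpy as np

def sequential_update(x, P, z, H, r):
    """Sequential Kalman measurement update: one scalar measurement at a time.

    Each row h of H is processed independently, so the usual inversion of
    (H P H^T + R) reduces to division by the scalar s = h P h^T + r.
    """
    for h, zi in zip(H, z):
        s = h @ P @ h + r          # scalar innovation variance
        K = (P @ h) / s            # gain vector: no matrix inverse needed
        x = x + K * (zi - h @ x)
        P = P - np.outer(K, h @ P)
    return x, P

x = np.zeros(4)                    # e.g. focal length + distortion-like terms
P = np.eye(4) * 1e2
H = np.random.default_rng(2).normal(size=(6, 4))   # 6 star observations
z = H @ np.array([1.0, -0.5, 0.2, 0.05])           # noiseless toy measurements
x, P = sequential_update(x, P, z, H, r=1e-4)
print(x)
```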
The concept of piecewise-functional spectra is introduced as a tool for modeling autocorrelation statistics in the context of reflectance and color-signal spectra. By considering an infinite set of such spectra that satisfy any target statistics, a closed-form expression is obtained for the autocorrelation matrix. Significantly, the model contains only a single tuning parameter, which enables the degree of correlation to be adjusted. Interestingly, if the autocorrelation statistics do not vary as a function of wavelength, the model spectra become piecewise constant. The utility of the idea is demonstrated in the context of camera color characterization. Here, a model of the reflectance spectra can be characterized exactly, which is important because it is not possible to measure or numerically calculate all the reflectances that might be encountered in the real world.
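A Monte-Carlo flavour of the idea: generate piecewise-constant spectra and form the empirical autocorrelation matrix E[s sᵀ]. This sketch fixes one piece layout for simplicity and only approximates numerically what the paper derives in closed form with a single tuning parameter.

```python
import numpy as np

rng = np.random.default_rng(3)
n_bins, n_pieces, n_spectra = 31, 5, 10000   # e.g. 400-700 nm in 10 nm steps

# Random piecewise-constant spectra: one constant value per contiguous piece.
edges = np.sort(rng.choice(np.arange(1, n_bins), size=n_pieces - 1, replace=False))
pieces = np.split(np.arange(n_bins), edges)
S = np.empty((n_spectra, n_bins))
for p in pieces:
    S[:, p] = rng.uniform(0, 1, size=(n_spectra, 1))

A = S.T @ S / n_spectra     # empirical autocorrelation matrix
print(A.shape)
```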
We present an epi-illumination multi-camera array microscope (epi-MCAM) designed for wide-field reflective imaging of non-transparent samples. The epi-MCAM contains 24 tightly packed and synchronized epi-illumination microscope units, arranged in a 4 × 6 planar array at 18 mm spacing. Each unit contains its own 13-megapixel CMOS image sensor, an objective and tube lens pair, and a beamsplitter with an epi-illumination light path. An epi-MCAM capture cycle produces a stitched image covering 72 × 108 mm² at micrometer-scale resolution (down to 2.46 μm). To image samples exceeding this native field of view, we translate the entire array across the sample surface, enabling high-resolution coverage of large objects. We demonstrate the system's ability to image both flat and three-dimensionally structured reflective samples, such as semiconductor wafers and printed circuit boards, highlighting the epi-MCAM's strong potential for industrial inspection applications.
Cumulative misalignments in optical modules have become a major constraint on imaging quality, making precision alignment crucial in advanced manufacturing. However, traditional passive alignment is inefficient, and existing active alignment (AA) often relies on specialized equipment such as wavefront sensors and struggles to balance accuracy, speed, and practicality. This study introduces a sequential AA method using the modulation transfer function (MTF) as the key metric. The proposed method implements a sequential process: first, compensating for lens group decenters using a sensitivity matrix derived from defocus curves; second, optimizing lens group tilt via Bayesian optimization (BO); and finally, fine-tuning the image sensor by leveraging physical information from multiple fields of view (FoVs). The experimental results demonstrate that our method achieves alignment in merely 8.485 seconds, 59% faster than the traditional search-based method, while attaining a superior average MTF compared with mainstream solutions. This approach provides an accurate, efficient, and practical pathway for multi-degree-of-freedom AA of camera modules, with substantial potential for industrial application.
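The first stage, compensating decenters through a sensitivity matrix, is at its core a linear least-squares step. The sketch below uses an invented sensitivity matrix and MTF values, and omits how the matrix would be derived from defocus curves.

```python
import numpy as np

# Illustrative only: a sensitivity matrix S maps small decenters (dx, dy) of a
# lens group to per-field MTF changes; correcting the measured MTF deviation
# is then a pseudoinverse (least-squares) step.
S = np.array([[-0.020,  0.004],
              [ 0.006, -0.018],
              [-0.011, -0.009]])             # 3 fields of view x 2 decenter axes
mtf_target = np.array([0.62, 0.60, 0.55])
mtf_measured = np.array([0.55, 0.57, 0.50])

decenter_correction = np.linalg.pinv(S) @ (mtf_target - mtf_measured)
print(decenter_correction)                   # stage move to apply (units illustrative)
```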
Achieving a high frame rate and high dynamic range (HDR) under complex illumination remains a significant challenge for airborne push-broom visible-near-infrared (VNIR) hyperspectral cameras. Problematic scenarios typically include high-contrast scenes, such as ocean whitecaps alongside deep water or concurrently sunlit and shadowed urban surfaces. To address this, a real-time HDR acquisition system based on a dual-gain complementary metal-oxide-semiconductor (CMOS) image sensor is proposed. Specifically, a four-pixel HDR fusion method is developed, utilizing an optical calibration setup to accurately determine the fusion parameters and configure the spectral region of interest (ROI) for reduced data volume. The complete workflow, encompassing spectral-spatial four-pixel binning and piecewise dual-gain fusion, is implemented on a field-programmable gate array (FPGA) using a dual-port RAM-based buffering strategy and a low-latency five-stage pipeline. Experimental results demonstrate a minimal processing latency of 0.0183 ms and a maximum frame rate of 290 frames/s. By extending the output bit depth from 11 to 15 bits, the system achieves a digital dynamic range of 2.03 × 10⁴:1 in the final output, a 9.58-fold improvement over the original low-gain data. The fused HDR data maintain high linearity and good spectral fidelity, with spectral angle mapper (SAM) values at the 10⁻³ level. Featuring a compact and low-power design, this system provides a practical engineering solution for efficient airborne VNIR hyperspectral acquisition.
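Piecewise dual-gain fusion can be sketched as: keep high-gain pixels below saturation and substitute gain-ratio-scaled low-gain pixels elsewhere. The gain ratio and saturation level below are illustrative stand-ins for the optically calibrated fusion parameters, and the numpy sketch ignores the FPGA pipeline.

```python
import numpy as np

def fuse_dual_gain(high, low, gain_ratio, sat_level=2000):
    """Piecewise dual-gain fusion: high-gain pixels where unsaturated,
    gain-ratio-scaled low-gain pixels in the highlights.
    """
    return np.where(high < sat_level,
                    high.astype(np.float64),
                    low.astype(np.float64) * gain_ratio)

rng = np.random.default_rng(4)
scene = rng.uniform(0, 30000, size=(64, 256))       # "true" radiance proxy
gain_ratio = 16.0
high = np.clip(scene, 0, 2**11 - 1)                 # high gain saturates in highlights
low = np.clip(scene / gain_ratio, 0, 2**11 - 1)     # low gain covers the highlights
hdr = fuse_dual_gain(high, low, gain_ratio)         # extended-range output
```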
To address the demand for wide-swath, high-resolution short-wave infrared (SWIR) imaging on resource-constrained spaceborne platforms, this study presents the design and on-orbit validation of a compact dual-channel push-broom (line-scanning) imaging system. The system adopts a transmissive optical architecture and a centralized, compact electronic control unit (ECU) configuration. By interleaving and mosaicking sixteen InGaAs linear array detectors, the system achieves an imaging swath of approximately 187 km and a nominal ground sampling distance of about 24 m, while maintaining a total instrument mass of 10.62 kg and a power consumption of approximately 12 W, thereby demonstrating a high level of integration and efficient resource utilization. To address focal plane consistency issues arising from multi-detector mosaicking, a closed-loop leveling method was developed using the modulation transfer function (MTF) as the primary performance metric. Through defocus estimation and quantitative correction of protrusions on a SiC substrate, convergence toward a unified confocal focal plane among multiple detectors was achieved. On-orbit image quality assessment indicates that the full width at half maximum (FWHM) of the line spread function (LSF) for both channels is approximately 1.38 pixels, with favorable signal-to-noise ratio (SNR) performance. These results validate the effectiveness of the proposed focal plane leveling strategy as well as the opto-mechanical-thermal design of the system. The proposed approach provides a practical pathway for the engineering implementation and consistency control of multi-detector mosaicked SWIR payloads under stringent resource constraints.
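Defocus estimation from through-focus MTF measurements is commonly a parabolic fit whose vertex gives the best-focus offset. A minimal sketch with invented values; the paper's closed-loop procedure additionally applies quantitative correction of the SiC substrate protrusions.

```python
import numpy as np

# Through-focus MTF samples for one detector (illustrative values).
z = np.array([-60.0, -30.0, 0.0, 30.0, 60.0])   # focus offsets, micrometres
mtf = np.array([0.31, 0.44, 0.50, 0.47, 0.36])

a, b, c = np.polyfit(z, mtf, 2)                  # fit mtf ~ a z^2 + b z + c
z_best = -b / (2 * a)                            # vertex = estimated best focus
print(f"estimated defocus: {z_best:.1f} um")
```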
Multi-view pose estimation is essential for quantifying animal behavior in scientific research, yet current methods struggle to achieve accurate tracking with limited labeled data and suffer from poor uncertainty estimates. We address these challenges with a flexible framework that can operate with or without camera calibration, combining novel training and post-processing techniques with an uncertainty-aware pseudo-labeling distillation procedure. Our multi-view model processes all camera views jointly using a pretrained vision transformer backbone, and a simulated occlusion technique encourages the model to learn robust cross-view correspondences without requiring camera parameters. When camera parameters are available, 3D data augmentations and a triangulation-based loss further encourage geometric consistency. We extend the Ensemble Kalman Smoother (EKS) post-processor to the nonlinear case, leveraging camera geometry, and introduce a variance inflation technique that detects cross-view inconsistencies and corrects overconfident predictions. We validate our approach on five datasets spanning three species (fly, mouse, bird), including a multi-animal dataset with two visually distinct individuals; the proposed pipeline consistently outperforms existing methods across datasets. We demonstrate how these improvements translate to downstream scientific analyses using data from the International Brain Laboratory, showing improved unsupervised behavioral clustering and neural decoding of paw kinematics with just 200 labeled frames. To facilitate adoption, we developed a browser-based, cloud-compatible user interface that supports the full life cycle of multi-view pose estimation, from labeling and model training to post-processing with EKS and diagnostic visualizations.
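When camera parameters are available, the triangulation underlying such a geometric loss is the standard linear (DLT) construction. A self-contained sketch with a toy two-camera setup; the paper's loss and EKS post-processing build on this operation but are not shown.

```python
import numpy as np

def triangulate(points_2d, projections):
    """Linear (DLT) triangulation of one 3D point from >= 2 calibrated views.

    points_2d: list of (u, v); projections: list of 3x4 camera matrices.
    """
    A = []
    for (u, v), P in zip(points_2d, projections):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]                      # null vector = homogeneous 3D point
    return X[:3] / X[3]

# Two toy cameras looking down +z, offset along x (illustrative setup).
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-100.0], [0], [0]])])
X_true = np.array([50.0, 20.0, 400.0, 1.0])
uv = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
print(triangulate(uv, [P1, P2]))    # ~ [50, 20, 400]
```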
To address the need for measuring high-precision full-field deformation of large planar array synthetic aperture radar antennas in orbit, this paper proposes a measurement method that integrates visual imaging and laser ranging. First, a coordinate system involving the camera, a two-axis turntable, and a laser rangefinder is established for the measurement system, and a three-dimensional (3D) coordinate calculation model is developed based on the triangulation principle. Subsequently, using the world coordinate system as a unified reference, a calibration model is constructed to characterize the non-orthogonal and non-intersecting spatial relationship between the two rotational axes of the turntable and the laser beam. On this basis, the external parameters of the camera relative to the world coordinate system are determined via the collinearity equation. The world coordinate system then serves as an intermediary to achieve high-precision extrinsic calibration between the camera and the laser rangefinder through coordinate transformation. Furthermore, a nonlinear mapping model between the image coordinates of a target point and the rotation angles of the turntable is established. It is demonstrated that this mapping exhibits a local one-to-one correspondence under aiming conditions. Through a strict convexity analysis, an effective operational domain is identified to ensure the stability of the inverse solution. Building on this, an autonomous laser beam aiming model is developed, incorporating coarse, secondary, and fine aiming strategies to guarantee precise convergence of the laser beam and the camera's line of sight at the target point. Experimental results show that the root mean square errors of the coordinate measurements in the X, Y, and Z directions are 0.28 mm, 0.34 mm, and 0.59 mm, respectively. This performance meets the accuracy requirements for in-orbit high-precision deformation monitoring of space-borne antennas, thereby providing a feasible technical solution for their on-orbit measurement.
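In the idealized geometry, with orthogonal, intersecting turntable axes and the laser along the boresight, the 3D coordinate calculation reduces to a spherical-to-Cartesian conversion; the paper's calibration model exists precisely because the real system violates these assumptions. A sketch of the ideal case only:

```python
import numpy as np

def point_from_pan_tilt_range(pan, tilt, rng_m):
    """Ideal-geometry 3D point from turntable pan/tilt angles and laser range.

    Assumes orthogonal, intersecting rotation axes and a laser beam through
    their intersection; the paper calibrates the non-orthogonal,
    non-intersecting deviations that this sketch ignores.
    """
    x = rng_m * np.cos(tilt) * np.sin(pan)
    y = rng_m * np.sin(tilt)
    z = rng_m * np.cos(tilt) * np.cos(pan)
    return np.array([x, y, z])

p = point_from_pan_tilt_range(np.deg2rad(5.0), np.deg2rad(-2.0), 12.500)
print(p)
```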
Draft surveys are widely used to estimate cargo mass during bulk vessel loading and unloading; however, conventional procedures depend on manual draft readings that are episodic, labor-intensive, and sensitive to environmental conditions. Existing camera-based automated approaches rely on draft mark recognition or explicit waterline detection, which remain vulnerable to illumination variability, hull fouling, and wave-induced disturbances. This paper proposes a computer vision framework deployed at the Port of Santos, Brazil, using fixed quay-side cameras and a private 4G network infrastructure for continuous image transmission. Unlike prior methods, the framework estimates emergent hull height by segmenting vessel hull contours from bow and stern viewpoints using customized YOLOv8 instance-segmentation models, without relying on draft marks or waterline detection. Pixel measurements are converted to metric units using a nearby bollard of known height as a local physical reference. Field experiments monitor a Panamax bulk carrier over approximately 6.5 days, processing more than 34,000 images from each camera at an average rate of 3.7 images per minute. The bow and stern segmentation models achieve mAP50-95 mask scores of 0.980 and 0.965, respectively, confirming precise and stable hull boundary delineation. Hull height decreases from 8.27 m to 4.64 m at the bow and from 7.98 m to 3.98 m at the stern over the loading period, with coherent and temporally stable trends across independent viewpoints. The proposed approach delivers repeatable and continuous hull-height estimates under real operational conditions, including variable lighting, background clutter, and partial occlusions, offering a practical and non-intrusive complement to traditional draft surveys for continuous vessel loading monitoring in modern ports.
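The pixel-to-metric conversion via the bollard reference is a single scale factor. The numbers below are illustrative, not field measurements.

```python
# Pixel-to-metric conversion from a local reference of known height.
BOLLARD_HEIGHT_M = 1.10    # surveyed bollard height (assumed value)
bollard_px = 142.0         # bollard height measured in the image, pixels
hull_px = 1067.0           # segmented emergent hull height, pixels

scale_m_per_px = BOLLARD_HEIGHT_M / bollard_px
hull_height_m = hull_px * scale_m_per_px
print(f"emergent hull height ~ {hull_height_m:.2f} m")
```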
Real-time monitoring of high-energy propellant combustion is difficult. Extreme high dynamic range (HDR), microsecond-scale particle motion, and heavy smoke often occur together. These conditions drive saturation, motion blur, and unstable particle extraction in conventional imaging. We present a closed-loop Event-SVE measurement system that couples a spatially variant exposure (SVE) camera with a stereo pair of neuromorphic event cameras. The SVE branch produces HDR maps with an explicit smoke-aware fusion strategy. A multi-cue smoke-likelihood map is used to separate particle emission from smoke scattering, yielding calibrated intensity maps for downstream analysis. The resulting HDR maps also provide the absolute-intensity reference missing in event cameras. This reference is used to suppress smoke-driven event artifacts and to improve particle-state discrimination. Based on the cleaned event observations, a stereo event-based 3D pipeline estimates separation height and equivalent particle size through feature extraction and triangulation (maximum calibration error 0.56%). Experiments on boron-based propellants show multimodal equivalent-radius statistics. The system also captures fast separation transients that are difficult to observe with conventional sensors. Overall, the proposed framework provides a practical, calibration-consistent route to microsecond-resolved 3D combustion measurement under smoke-obscured HDR conditions.
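For rectified stereo event cameras, the triangulated depth behind the separation-height estimate follows the standard relation Z = f B / d. A minimal sketch with illustrative values; the event-feature extraction and smoke-artifact suppression stages are not shown.

```python
import numpy as np

# Rectified stereo: depth Z = f * B / d, with focal length f in pixels,
# baseline B in metres, and disparity d in pixels. Values are illustrative.
f_px, baseline_m = 1600.0, 0.12
disparity_px = np.array([38.4, 25.6, 19.2])   # matched particle disparities
depth_m = f_px * baseline_m / disparity_px
print(depth_m)                                # distances along the line of sight
```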
Background/Objectives: The incidence of pediatric pathological scars (PPS) has been gradually increasing due to various causes, highlighting the need for accurate scar assessment to monitor disease progression and therapeutic efficacy. The Vancouver Scar Scale (VSS) and other scar evaluation systems are relatively subjective methods that rely on physicians' or patients' own judgment. By contrast, a three-dimensional (3D) camera and dermoscopy may provide relatively objective, measurable parameters that avoid the subjective bias of such observer-dependent scales. This study aimed to compare the utility of traditional VSS evaluation with that of 3D cameras and dermoscopy in PPS evaluation. Methods: A total of 35 pediatric patients (aged 0-18 years) with PPS were involved, and their scars were assessed via the VSS, dermoscopy, and the Antera 3D® system. In addition, a subset of 18 patients (36 scar regions) was also evaluated for therapeutic efficacy after 3-6 months of treatment. Briefly, VSS scores were blindly evaluated by two independent dermatologists under standardized conditions. Quantitative assessment was also performed using dermoscopy and the Antera 3D® system. The former quantified chromatic parameters (pigmentation: L*, vascularity: a*, green value); the latter captured multispectral 3D images to analyze volume, pigmentation, and erythema. Data are presented as means ± standard deviation and analyzed using paired-sample t tests (one-tailed), the Wilcoxon signed-rank test, and standardized response means (SRMs) to assess therapeutic sensitivity, while baseline variability was evaluated using the standard deviation and coefficient of variation (CV). Results: The results showed that Antera 3D® detected significant reductions in pigmentation (p < 0.01, SRM = -0.46), vascularity (p < 0.001, SRM = -0.59), and volume (p < 0.0001, SRM = -0.83), while dermoscopy indicated similar moderate improvements in vascularity (Green value: p < 0.001, SRM = 0.57; a*: p < 0.0001, SRM = -0.68) and pigmentation (L*: p < 0.0001, SRM = 0.48) after treatments. VSS showed significant improvements in pliability (p < 0.0001, SRM = -1.13), height (p < 0.01, SRM = -0.54), and overall impression (p < 0.0001, SRM = -0.86), but minimal changes in pigmentation (p > 0.05, SRM = 0) or vascularity (p > 0.05, SRM = -0.21). At baseline, Antera 3D® showed the greatest variability in pigmentation (CV 43.41%) and volume (CV 91.21%), followed by VSS in vascularity (CV 52.95%), pliability (CV 34.05%), and overall impression (CV 31.76%). Dermoscopy presented the lowest variability, indicating limited discriminative power. Conclusions: In conclusion, Antera 3D® offers an objective, sensitive, and spatially precise approach for PPS assessment and may provide additional quantitative information for evaluating subtle and early changes alongside traditional scar assessment scales. Its integration into clinical practice will enhance treatment monitoring and support more accurate timing of therapeutic interventions.
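The standardized response mean used throughout is the mean paired change divided by the standard deviation of the changes. A minimal sketch with invented paired scores, not the study's data:

```python
import numpy as np

def srm(before, after):
    """Standardized response mean: mean of paired changes / SD of changes."""
    change = np.asarray(after, float) - np.asarray(before, float)
    return change.mean() / change.std(ddof=1)

# Illustrative paired sub-scores only (e.g., a 0-10 scale).
print(srm(before=[6, 7, 5, 8, 6, 7], after=[4, 5, 4, 6, 5, 5]))
```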
Determining the optical potential is one of the crucial aspects of the analysis and control of optical tweezers. It requires finding the scale factor of the displacements in an optical tweezer recording, which is typically hard to obtain directly from a photodetector and is instead usually found by fitting the statistical features of the data to the predictions of Langevin dynamics. We argue that this procedure is harder for measurements made using a Segmented Quadrant Detector, for which the scale factors can vary along different axes due to image distortions. This can lead to false recognition of anisotropy in the potential. We quantify this phenomenon by comparing simultaneous measurements from a Segmented Quadrant Detector and a camera, as well as from a 2D Lateral Effect Detector and a camera. We then discuss which experimental and data analysis techniques should be used to properly determine the geometry of the optical potential.
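A crude sketch of the calibration issue: a per-axis scale factor can be estimated by matching the spread of simultaneous camera and detector recordings, after which the equipartition theorem gives the trap stiffness. All signal parameters below are illustrative synthetic stand-ins, not the paper's data or procedure.

```python
import numpy as np

k_B, T = 1.380649e-23, 295.0             # Boltzmann constant (J/K), temperature (K)

rng = np.random.default_rng(5)
x_nm = rng.normal(0.0, 12.0, 100_000)    # camera displacement trace, nm (synthetic)
v_det = 0.008 * x_nm + rng.normal(0, 0.02, x_nm.size)   # detector signal, volts

# Per-axis scale factor (nm/V) by matching standard deviations of simultaneous
# recordings; a distorted detector yields different factors on different axes.
beta_nm_per_volt = x_nm.std() / v_det.std()

# Trap stiffness from the equipartition theorem: kappa = k_B T / <x^2>.
kappa = k_B * T / np.var(x_nm * 1e-9)    # N/m
print(beta_nm_per_volt, kappa)
```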
High-quality light field (LF) acquisition involves a trade-off between spatial and angular resolution. Hybrid camera systems offer a practical solution but present challenges for dense reconstruction due to sparse angular sampling and complex occlusions. To address these issues, this paper proposes a geometry-aware implicit neural representation (INR) framework for LF reconstruction. Distinct from traditional discrete representations or generic ray-space embeddings, we introduce a compact, continuous representation that combines a radiance field with a geometry module, where scalar disparity provides an ideal geometric interpretation and a ray-displacement field serves as the practical realization for occlusion-aware reconstruction. This framework leverages the global epipolar geometry of the 2D LF grid while utilizing the displacement field to correct local geometric inconsistencies caused by occlusions and non-Lambertian effects. By coupling these fields with a radiance network, our method enables end-to-end differentiable optimization from sparse, multi-resolution inputs without relying on large-scale external training datasets. Experiments on hybrid camera data fusion and spatial-angular super-resolution tasks demonstrate that our approach preserves high-frequency details and geometric consistency.
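The radiance-field half of such a representation is, at its simplest, an MLP mapping 4D light-field ray coordinates (u, v, s, t) to color. The sketch below (in PyTorch) omits the disparity/ray-displacement geometry module that the paper couples to the radiance network; architecture sizes are illustrative.

```python
import torch
import torch.nn as nn

class RayRadianceField(nn.Module):
    """Minimal implicit ray-space radiance field: (u, v, s, t) -> RGB."""

    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, rays):          # rays: (N, 4) light-field coordinates
        return self.net(rays)

model = RayRadianceField()
rgb = model(torch.rand(1024, 4))      # fitting would regress rgb against
print(rgb.shape)                      # sampled hybrid-camera measurements
```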