The growing demand for continuous physiological monitoring and human-machine interaction in real-world settings calls for wearable platforms that are flexible, low-power, and capable of on-device intelligence. This work presents BioGAP-Ultra, an advanced multimodal biosensing platform that supports synchronized acquisition of diverse electrophysiological and hemodynamic signals such as EEG, EMG, ECG, and PPG while enabling embedded AI processing at state-of-the-art energy efficiency. BioGAP-Ultra is a major extension of our previous BioGAP design aimed at meeting the rapidly growing requirements of wearable biosensing applications. It features (i) increased on-device storage ($\times$2 SRAM, $\times$4 FLASH), (ii) improved wireless connectivity (supporting up to 1.4 Mbit/s bandwidth, $\times$4 higher than BioGAP), and (iii) more signal modalities (from 3 to 5) and analog input channels ($\times$2). Further, it is accompanied by a real-time visualization and analysis software suite that supports the hardware design, providing access to raw data and real-time configurability on a mobile phone. Finally, we demonstrate the system's versatility through integration into various wearable form factors: an EEG-PPG headband consuming 32.8 mW, an EMG sleeve at 26.7 mW, and an ECG-PPG chestband requiring only 9.3 mW for continuous acquisition and streaming, tailored for diverse biosignal applications. To showcase its edge-AI capabilities, we further deploy two representative on-device applications: (1) ECG-PPG-based PAT estimation at 8.6 mW, and (2) EMG-ACC-based classification of reach-and-grasp motion phases, achieving 79.9 % $\pm$ 5.7 % accuracy at 23.6 mW. All hardware and software design files are released as open source under a permissive license.
Surface-bound electrochemical aptamer-based (E-AB) sensors are a promising approach for continuous in-vivo and in-vitro biomolecular monitoring because they offer high selectivity, sensitivity, and real-time detection. However, accurately co-simulating E-AB sensors with readout circuits remains challenging due to the redox reporter's position-dependent electron-transfer kinetics and the electrical double layer's (EDL) complex behavior at the electrode-electrolyte interface. Here, we present a compact, SPICE-compatible electrochemical cell model that combines a Verilog-A implementation of the Marcus-Hush-based electron-transfer (ET) kinetics with a fractional-order RC-ladder representation of the EDL's non-ideal capacitance. The conventional Butler-Volmer model is replaced by Marcus-Hush kinetics, which features bounded and quantum mechanically derived ET rate constants, improving not only the model's physical interpretability but also numerical stability in circuit simulations. The model was validated with two E-AB sensors using square-wave voltammetry (SWV) across a range of excitation frequencies and target concentrations to confirm that the simulated transient currents accurately capture ET kinetics, thermodynamics, and the Langmuir isotherm's concentration response. When co-simulated with a transimpedance amplifier constructed with the TI OPA4354, the model produced electronic noise spectra that more closely matched experimental data when compared with spectra simulated using the simplified Randles circuit model. These results demonstrate that the proposed model provides a physically grounded framework for simulating surface-bound redox-based electrochemical biosensors and enables accurate co-simulation with readout circuits.
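As a rough illustration of why bounded Marcus-Hush kinetics are better behaved in a circuit simulator than Butler-Volmer kinetics, the sketch below compares the two rate laws in dimensionless form. It is not the Verilog-A model described above: the function names, the reorganization energy (`lam = 20`, in units of kT), the transfer coefficient of 0.5, the sign convention (`drive` is the favorable overpotential in units of kT/e), and the trapezoidal integration are all assumptions chosen for illustration.

```python
import math

def fermi(x):
    """Fermi-Dirac occupancy 1 / (1 + e^x), written to avoid overflow."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def bv_rate(drive, alpha=0.5):
    """Butler-Volmer rate relative to k0; grows exponentially without bound."""
    return math.exp(alpha * drive)

def mhc_integral(drive, lam, n=4000):
    """Marcus-Hush-Chidsey integral (dimensionless), trapezoidal rule.

    The integrand is a Gaussian Marcus term centered at (lam - drive),
    weighted by the Fermi-Dirac occupancy of the electrode states.
    """
    center = lam - drive
    half_width = 12.0 * math.sqrt(2.0 * lam)  # +/- 12 standard deviations
    lo = center - half_width
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        gauss = math.exp(-((x - center) ** 2) / (4.0 * lam))
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * gauss * fermi(x)
    return total * h

def mhc_rate(drive, lam=20.0):
    """MHC rate relative to its value at zero overpotential."""
    return mhc_integral(drive, lam) / mhc_integral(0.0, lam)

if __name__ == "__main__":
    for d in (0.0, 10.0, 40.0, 60.0):
        print(d, bv_rate(d), mhc_rate(d))
```

Under these assumptions the Butler-Volmer rate grows by a factor of e^10 between drives of 40 and 60, while the Marcus-Hush-Chidsey rate has already saturated and barely changes, which is the bounded behavior the abstract credits for improved numerical stability.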
This manuscript provides a comprehensive review of the design, implementation, and advancements in integrated circuits (ICs) for electrochemical sensing, with a focus on biomedical and molecular applications. It begins by discussing the fundamental principles of electrochemical sensing and core modalities, including potentiometry, amperometry, impedimetry, and ISFET-based sensing, highlighting their unique requirements and challenges. A detailed analysis of state-of-the-art readout circuit architectures is presented, emphasizing strategies for achieving high dynamic range (DR), low noise, and enhanced stability while minimizing leakage currents. Both resistive and capacitive transimpedance amplifiers (TIAs) and current conveyor (CC)-based circuits are examined, exploring critical trade-offs between speed, power consumption, and noise performance. This review also discusses emerging applications such as DNA sequencing and molecular sensing, covering both ISFET and nanopore-based approaches, to showcase recent advancements in high-throughput, high-speed, and low-power interface circuit designs. By highlighting the challenges of readout-circuit miniaturization, integration, and scalability, as well as the limitations of existing approaches, this review provides a comprehensive synthesis of advancements in high-performance electrochemical readout architectures and their potential to address the evolving demands of modern biomedical applications.
Autonomous medical image segmentation enables critical applications, including urinary retention monitoring, prenatal fetal biometry, neuromodulation, and cardiovascular monitoring. Its deployment in wearable ultrasound patches demands on-device processing to preserve patient privacy and enable operation beyond clinical facilities. U-Net achieves state-of-the-art performance for biomedical segmentation, and recent binarized U-Nets retain high clinical accuracy with dramatically reduced computational cost. However, existing binary neural network (BNN) accelerators cannot support medical-grade segmentation due to missing accuracy-enhancing features, poor hardware utilization for compute-optimal layers, and memory bottlenecks requiring costly external DRAM. This work presents a 0.81 mm${}^{2}$ fully integrated U-Net processor in 28 nm featuring: 1) mixed-precision datapaths combining binary convolution with 4-bit skip connections for clinical accuracy; 2) systematic design space exploration across 9,390 configurations optimizing energy-latency tradeoffs; 3) interleaved memory representation and halo reuse for energy-efficient battery-powered operation; and 4) hardware-supported layer fusion and lossless compression eliminating external memory while reducing peak on-chip usage by 3.16× and 1.38×, respectively. Validated on bladder and fetal head segmentation datasets, the processor achieves 13.4 frames per second (fps) and 23 µJ per frame, enabling real-time autonomous monitoring in wearable medical devices.
Neuroprostheses capable of providing Somatotopic Sensory Feedback (SSF) enable the restoration of tactile sensations in amputees, thereby enhancing prosthesis embodiment, object manipulation, balance, and walking stability. Transcutaneous Electrical Nerve Stimulation (TENS) represents a primary non-invasive technique for eliciting somatotopic sensations. Devices commonly used to evaluate the effectiveness of TENS are often bulky and mains-powered, while current portable TENS devices frequently fall short of key functional requirements, particularly in terms of stimulation parameter ranges that are insufficient to reliably evoke somatotopic sensations in either upper or lower limb applications. Moreover, they typically do not support real-time independent channel programming and wireless communication. This work introduces a compact, wearable stimulator with a total weight of 64 g (including its external casing) and dimensions of 70 $\times$ 40 $\times$ 35 mm, designed to deliver SSF in both upper and lower limb applications. The device was validated through bench testing and human trials involving 20 healthy participants, by comparing the intensity, qualitative characteristics, and referred area of the elicited sensations with those produced by a benchmark device. The stimulator reliably delivered the required parameters on a skin-like capacitive-resistive load and elicited somatotopic sensations consistent with the benchmark device and prior somatotopic feedback studies. The proposed stimulator provides non-invasive somatotopic sensory feedback for both upper and lower limbs. Its portability and modular design address key limitations of current commercial and research-grade TENS systems, enabling future studies on the functional benefits of sensory feedback in prosthetic control.
Epilepsy affects over 50 million people worldwide, posing a significant clinical challenge, particularly for patients unresponsive to conventional treatments. Advances in neural implants with on-device algorithms are revolutionizing epilepsy management by enabling precise, real-time seizure detection and reducing the technical and financial burden of data transmission. The current trend advances towards the integration of a larger number of electrodes in neural implants, enhancing spatial resolution and broadening brain coverage. Consequently, the increasing data demands necessitate highly efficient processing to minimize transmission bandwidth and power consumption, ensuring the long-term viability of implantable systems. This work presents a novel approach using time-series segmentation (TSS) to extract labeled information from raw recordings. The algorithm explores multiple outlier detection methods with a heuristic low-complexity event classifier, and employs a multichannel consensus strategy to improve detection accuracy through multichannel agreement. This system enables high-performance seizure detection and segments local field potentials (LFP) into clinically relevant labels for interpretation and post-processing. Tested on microelectrode array (MEA) recordings from mouse hippocampus-cortex slices treated with 4-aminopyridine, the system demonstrated robust reliability. Implemented on a Pynq-Z2 board with a Zynq 7020 System-on-Chip, the algorithm requires minimal calibration, achieving 95% accuracy, 94% sensitivity, and a 0.03% FPR with a low power consumption of 128 mW for the best-performing outlier detector. By demonstrating the application of TSS to implantable device algorithms for on-device processing, this work advances towards more effective, personalized epilepsy treatments.
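The combination of per-channel outlier detection and multichannel consensus described above can be sketched as follows. The abstract does not say which outlier detectors were evaluated, so the median/MAD robust z-score used here, the threshold of 5, and the function names `mad_outliers` and `consensus` are assumptions for illustration only.

```python
import statistics

def mad_outliers(samples, threshold=5.0):
    """Flag samples whose robust z-score (median / MAD) exceeds `threshold`."""
    med = statistics.median(samples)
    mad = statistics.median([abs(s - med) for s in samples])
    if mad == 0:
        # Degenerate case: no spread, so nothing can be scored as an outlier.
        return [False] * len(samples)
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return [abs(s - med) / scale > threshold for s in samples]

def consensus(flags_per_channel, min_channels):
    """Per-sample event flag: True when at least `min_channels` channels agree."""
    return [sum(column) >= min_channels for column in zip(*flags_per_channel)]

if __name__ == "__main__":
    channels = [
        [0, 1, 0, -1, 0, 50, 1, 0, -1, 0],   # event at index 5
        [0, 1, 0, -1, 0, 40, 1, 0, -1, 0],   # same event at index 5
        [0, 1, 30, -1, 0, 0, 1, 0, -1, 0],   # isolated glitch at index 2
    ]
    flags = [mad_outliers(ch) for ch in channels]
    print(consensus(flags, min_channels=2))
```

Requiring agreement from at least two channels suppresses the isolated glitch on the third channel while keeping the event seen by the first two, which is the intuition behind using consensus to reduce false positives.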
Cardiovascular diseases (CVDs) are among the leading causes of mortality. Traditional diagnostic methods require hospital visits and professional medical personnel, but the timely detection of cardiac conditions can significantly improve survival rates. Therefore, wearable devices with edge-computing capabilities for real-time cardiovascular diagnosis are highly important. Heart sounds provide valuable information on valve closure; however, variations in heart rhythm or heart valve diseases (HVDs) can complicate the identification of affected valves and the interpretation of heart sound origins. Additionally, different disease classifications require distinct model architectures, posing significant challenges for implementation on wearable devices. This study addresses these challenges through three key contributions: an ECG-gating PCG algorithm, improved classification algorithms for arrhythmia and valvular heart disease, and a systolic array-based accelerator with an application-specific instruction-set processor (ASIP) capable of performing inference on multiple models. The algorithms achieve 97.8% and 99.3% accuracy on the MIT-BIH and heart murmur databases, respectively, with hardware quantization errors below 0.5%. The accelerator is fabricated in TSMC 180 nm CMOS technology, achieving an operating power of 414 µW at 1 MHz. The execution times for arrhythmia and valvular heart disease classification are 7.2 ms and 21 ms, respectively, and the energy efficiency normalized to 40 nm is 395.3 GOPS/W. These results show that the system can effectively classify arrhythmia and heart valve diseases.
Continuous neural signal acquisition during electrical stimulation is essential for neuromodulation; nevertheless, it is often hindered by high-amplitude stimulation artifacts (SAs). This study presents a neuromodulation system with an application-specific integrated circuit (ASIC) that implements 2.9$\times$ faster adaptation than a fixed-parameter method for the real-time recovery of neural signals fully overlapped with stimulation artifacts in both time and frequency domains, without any prior calibration. The on-chip SA removal module leverages an adaptive infinite impulse response (IIR)-based template-subtraction method with zero-multiplier operation and low computational complexity, enabling rapid template convergence and high accuracy under time-varying SAs while optimizing area and power efficiency. The stimulator incorporates a stimulation frequency dithering mechanism to minimize neural signal loss at the stimulation frequency and its harmonics during recovery. In vitro and in vivo experimental validation, including local field potential (LFP) and action potential (AP) recordings, demonstrated real-time SA removal, achieving a 40 dB reduction of the SA component while preserving neural signal integrity. The ASIC, fabricated using the TSMC 65 nm CMOS LP process, occupies a total die area of 1 mm${}^{2}$. The SA removal module including on-chip memory occupies 0.15 mm${}^{2}$ and consumes 1.3 µW. The presented system enables recovery of neural signals obscured by time-varying SAs in real time, without requiring prior calibration or external processing units.
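A multiplier-free IIR template update of the kind mentioned above can be sketched with integer arithmetic, where the filter coefficient is a power of two and is applied as a bit shift. This is only an illustration of the general idea, not the ASIC's algorithm: the fixed `shift` value, the fixed stimulation `period` in samples, and the function name `remove_artifact` are assumptions, and the adaptive adjustment of the coefficient described in the abstract is not modeled.

```python
def remove_artifact(samples, period, shift=3):
    """Subtract a running artifact template from integer `samples`.

    One template value is kept per sample position within the stimulation
    period. Each template entry is updated by a first-order IIR filter with
    gain 2**-shift, implemented as an arithmetic right shift (no multiplier).
    """
    template = [0] * period
    cleaned = []
    for n, x in enumerate(samples):
        i = n % period                     # position within the stimulation period
        residual = x - template[i]         # signal after template subtraction
        cleaned.append(residual)
        template[i] += residual >> shift   # shift-based IIR update
    return cleaned

if __name__ == "__main__":
    # Synthetic, perfectly periodic artifact with no neural signal on top.
    artifact = [1000, -800, 300, 0] * 100
    out = remove_artifact(artifact, period=4)
    print(out[:4], out[-4:])
```

A smaller `shift` makes the template converge faster but lets more of the underlying neural signal leak into the template, which is the trade-off an adaptive scheme would manage.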
Next-generation neurorehabilitation implants demand high-channel-count closed-loop systems with ultra-low-area and ultra-low-power readout and classification. This is essential in applications such as multi-type epileptic seizure detection, brain-machine interfaces, or brain-to-text conversion. Although recent designs achieve compactness and low power, they often cannot record neural signals during stimulation due to large, saturating artifacts. Conversely, artifact-tolerant solutions typically incur excessive area and power overhead to avoid saturation. We introduce a paradigm shift: enabling an ultra-compact, artifact-tolerant readout frontend by permitting brief saturation during stimulation pulses and applying backend interpolation to reconstruct the signals. High-fidelity neural features can thus be extracted with minimal error. To minimize the readout area footprint and to facilitate the routing from many electrodes, we reuse the whole frontend to read out 64 inputs in a time-multiplexed fashion. Implemented in a 40 nm CMOS process, our chip leverages the first published second-order fully time-based incremental analog-to-digital converter, achieving a state-of-the-art area of 290 $\mu$m${}^{2}$/ch and a power consumption of only 610 nW/ch. The proposed hybrid electrode offset compensation further minimizes the area overhead without significantly compromising the noise or common-mode/power rejection across the full cancellation range. Artifact tolerance is validated in saline using an external stimulator chip. We demonstrate that the error on a broad set of features extracted from interpolated local-field-potential data remains below $\pm$10%, even under harsh stimulation conditions.
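The idea of letting the frontend saturate briefly and reconstructing the missing samples in the backend can be sketched as below. The abstract does not specify the interpolation method, so linear interpolation, the saturation test `abs(x) >= sat_level`, and the function name `interpolate_saturated` are assumptions for illustration.

```python
def interpolate_saturated(samples, sat_level):
    """Replace saturated samples by linear interpolation between valid neighbours.

    A sample counts as saturated when abs(x) >= sat_level. Saturated runs at
    the start or end of the record are filled with the nearest valid value.
    """
    out = list(samples)
    valid = [i for i, x in enumerate(samples) if abs(x) < sat_level]
    if not valid:
        return out  # nothing to anchor the interpolation on
    for a, b in zip(valid, valid[1:]):
        gap = b - a
        for k in range(1, gap):  # only runs when samples between a and b are saturated
            out[a + k] = samples[a] + (samples[b] - samples[a]) * k / gap
    for i in range(valid[0]):
        out[i] = samples[valid[0]]
    for i in range(valid[-1] + 1, len(out)):
        out[i] = samples[valid[-1]]
    return out

if __name__ == "__main__":
    # Two samples clipped at the rail during a stimulation pulse.
    print(interpolate_saturated([1.0, 2.0, 100.0, 100.0, 5.0, 6.0], sat_level=100.0))
```

This works best for slowly varying content such as local field potentials, where a short gap carries little information that a smooth interpolant cannot recover.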
In ECG classification applications, binarized convolutional neural networks (bCNNs) show great potential to achieve extremely low power consumption through 1-bit quantization. Existing bCNN approaches typically extract spatial features from the full ECG image without leveraging its sparsity, thereby introducing unnecessary computations and hardware resources. Meanwhile, inter-patient variability of ECG features degrades the classification performance due to accuracy loss caused by the binarization operation. To address these challenges, this paper proposes an energy-efficient ECG classifier based on a bCNN with on-chip learning. A patch-by-patch computation approach is used to reduce both power consumption and memory usage. Instead of processing the entire image, the ECG image is divided into small patches, and only the patches containing valid data are involved in feature extraction. An on-chip learning method is employed to improve classification accuracy across patients by updating the model weights using both the acquired bCNN features and the R-peak interval data. In addition, a reconfigurable convolutional processing element array and a base-2 softmax structure are designed to further reduce the hardware resources. The proposed classifier is verified on an FPGA, achieving a classification accuracy of 97.55% and a specificity of 89.15%. Synthesized using a 55 nm CMOS process, the ECG classifier occupies an area of 0.43 mm${}^{2}$. With a supply voltage of 1.2 V, the classifier consumes an average energy of 0.12 $\mu$J per classification and 0.09 $\mu$J per on-chip learning step, making it suitable for wearable ECG classification applications.
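The patch-by-patch approach described above can be illustrated with a small sketch that splits a binary ECG image into square patches and forwards only those containing at least one non-zero pixel. The image representation (a list of rows of 0/1 values), the test for "valid data", and the function name `active_patches` are assumptions for illustration and are not taken from the paper.

```python
def active_patches(image, patch):
    """Yield ((row, col), block) for every patch that contains non-zero pixels.

    `image` is a list of equal-length rows of 0/1 values; `patch` is the
    patch side length. Empty patches are skipped entirely, so no feature
    extraction work is spent on them.
    """
    rows, cols = len(image), len(image[0])
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            block = [row[c:c + patch] for row in image[r:r + patch]]
            if any(any(line) for line in block):
                yield (r, c), block

if __name__ == "__main__":
    # 4x4 image with a short trace segment; only two of the four patches are active.
    image = [
        [0, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    for origin, block in active_patches(image, patch=2):
        print(origin, block)
```

Because an ECG trace occupies only a thin curve in the image, most patches are empty, and skipping them is where the savings in computation and memory would come from.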
A wireless application-specific integrated circuit (ASIC), operating with the MagSonic modality using one magnetoelectric (ME) transducer, is presented for neural stimulation and recording. The ASIC integrates a bridge circuit that serves as both power management and data transmitter, with voltage doubling, rectification, regulation, and overvoltage protection; a biphasic AC stimulator with high voltage tolerance and direct external control, simplifying downlink complexity and on-chip processing overhead; an active charge-balancing circuit that adjusts the duration of the second stimulation phase; and continuous neural recording with uplink communication. The prototype MagSonic ASIC was fabricated in a 180 nm standard CMOS process (2 $\times$ 1.75 mm${}^{2}$ total area) and requires only one ME transducer and an external storage capacitor to operate. In measurements, a bar-shaped millimeter-scale ME transducer (5.1 $\times$ 2.29 $\times$ 1.69 mm${}^{3}$) with length-mode operation at 330 kHz was used to power the ASIC, achieving up to 8.1 mW of received power at 40 mm depth. The biphasic AC stimulator, occupying only 0.027 mm${}^{2}$ of active chip area, provided 6.6 V (2 $\times$ VDD) tolerance (using 3.3 V transistors) with a residual electrode voltage of < 50 mV. The amplified signals were converted into time using an analog-to-time converter and transmitted at a data rate of 186.2 kbps (BER < $10^{-3}$) using the ME transducer's thickness-mode frequency (1.66 MHz). Animal experiment results demonstrate the feasibility of the ASIC's direct AC stimulation.
Gastric cancer remains a global health challenge with high mortality rates, underscoring the urgent need for advanced diagnostic tools. While conventional gastroscopy encounters patient reluctance due to procedural discomfort, wireless capsule endoscopy (WCE) provides a non-invasive alternative but faces challenges, including intricate motion control, constrained power supply, and restricted detection capability. This study presents a bimodal imaging WCE system that integrates near-infrared fluorescence and white-light imaging, enhanced with linear magnetic navigation and motion-robust wireless power transfer. The innovative geometrically polarized permanent magnet configuration enables sensorless adaptive and precise linear navigation (position accuracy: 0.29 mm; orientation accuracy: 0.97°). The axially self-aligning coil configuration achieves motion-robust power transfer, with capacity further enhanced by a novel internal magnet layout. Experimental validation demonstrates stable high-power reception (2 W), reduced operator dependency through the linear navigation, and improved lesion detection capability via bimodal imaging. This breakthrough addresses the fundamental limitations of current WCE systems, showcasing a mechatronic approach to advance gastric cancer diagnostics.
This article presents the first co-designed MRI and magnetic positioning system for real-time dynamic motion compensation, achieving sub-millimeter tracking accuracy while preserving diagnostic image quality. The core innovation lies in a system-level co-design of an MRI system and a magnetic localization system, featuring a customized receiver IC for processing magnetic signals coupled by the frontend RF coils, enabling artifact-free MRI in dynamic scenarios. This integration enables a median positioning accuracy of 0.66 mm across a 40 × 40 × 50 cm${}^{3}$ field-of-view with a total power consumption of 997 μW. The key innovations include: 1) a time-division multiplexing scheme to enable signal detection from different coils while achieving spectral isolation between 1.4 MHz positioning signals and MRI Larmor frequencies through FPGA-synchronized blanking; 2) a dynamic calibration algorithm fusing magnetic tracking data with multi-frame MRI, reducing spatial blur radius by 40% via weighted averaging; 3) an MRI-optimized Levenberg-Marquardt algorithm incorporating dynamic magnetic beacon weighting and spatial constraints, improving localization accuracy by 53% versus a conventional algorithm. The system utilizes planar magnetic beacons with dimensions of 3 × 3 cm${}^{2}$, reducing spatial occupancy compared to prior designs. This work bridges critical gaps between high-precision tracking and artifact-free MRI, enabling real-time imaging of non-autonomous motion and respiratory motion compensation, representing a paradigm shift for MRI-guided interventions.
This paper proposes a wireless power and data transfer (WPDT) system for implantable medical applications, featuring a simple structure, high data rate (DR), and efficient power transmission. To streamline the frequency-shift keying (FSK) data transmission link, the FSK modulator integrates merely one oscillator and one frequency divider, while the FSK demodulator requires only one D flip-flop and one delay unit. This minimalist design generates two FSK carrier signals with a large frequency difference, simultaneously enhancing the data transmission rate and reducing the bit error rate (BER). To meet the WPDT system's requirements for high power transmission under transient conditions and low coupling coefficients, a coupled network capacitive compensation technique is employed. This method significantly enhances the power transmission capability of one carrier frequency, enabling greater power delivery to the load (PDL) of that carrier frequency during non-data transmission periods. Relevant circuits were fabricated using the 180 nm BCD process, and a prototype WPDT system based on these circuits has been successfully developed. Test results show that under a low coupling scenario (17.5 mm coil spacing), the system achieves a PDL of 148 mW while maintaining a DR of 1.1 Mbps with a BER below $10^{-8}$. This work fully verifies the system's feasibility and provides an efficient, reliable technical solution for wireless power supply and data transmission in implantable medical devices.
AI-powered medical imaging devices are increasingly used in clinical workflows to support real-time, accurate diagnosis and decision-making. Recent advances in State-Space Models (SSMs) such as Mamba have shown remarkable performance in capturing long-range dependencies for medical image classification. However, their computational complexity and sequential data flow make them difficult to deploy on hardware, limiting real-time and energy-efficient applications at the edge. To address this challenge, we propose MedMambaLite-v2, a shared selective scan framework enabling effective acceleration on embedded edge platforms. To this end, we build upon our earlier MedMambaLite and extend it in MedMambaLite-v2 through a channel-only transition mechanism that achieves a 1.7× reduction in operations. We then optimize the Convolution (Conv) branch and apply knowledge distillation to retain accuracy in a compressed student model. The resulting model is 23× smaller than the MedMamba baseline, with only a 1.1% reduction in overall accuracy evaluated across 10 distinct MedMNIST datasets spanning several imaging modalities. The proposed LiteSS2D hardware design also leverages parallelism across scan directions to enable simultaneous state updates, thereby improving memory efficiency, and further incorporates 8-bit quantization to reduce computational overhead. The reconfigurable FPGA hardware prototype demonstrates a 9× reduction in latency for a parallel implementation compared with a serial baseline. Moreover, MedMambaLite-v2 is implemented and demonstrated through end-to-end inference on MedMNIST images on CPU and GPU platforms. Performance analysis of the proposed approach on NVIDIA Jetson Orin Nano and Raspberry Pi 5 shows up to 63% and 78% reductions in energy per inference, respectively, compared to the baseline.
Spiking Neural Networks are widely studied for their brain-inspired ability to process sequential information, yet their memory limitations often hinder the extraction of long-term dependencies. Reservoir computing, and in particular liquid state machines (LSM), has gained attention within this context for its ability to separate the recurrence and classification components into a recurrent reservoir liquid followed by a feedforward layer. However, existing LSM hardware suffers from significant design trade-offs, including large memory demands or performance degradation due to restrictive connectivity and weight precision. Motivated by these limitations, we introduce SPIRE, a compact, fully digital multi-reservoir LSM with online learning adaptation and a core area of 1.13 mm${}^{2}$. Implemented in TSMC 28 nm CMOS technology, SPIRE is a memory-efficient multi-reservoir LSM tailored for time-series classification and edge deployment. By organizing up to eight reservoir ensembles into four parallelized cores, SPIRE enhances synaptic density and computational efficiency. Furthermore, SPIRE leverages on-the-fly generation of reservoir weights, further reducing the memory footprint while supporting both sequential and parallelized dual operation modes. Benchmark results demonstrate that these design choices improve SPIRE's synaptic density by up to 18.46× over prior works. SPIRE achieves 3.56 GSOPs/mm${}^{2}$ with just 4.91 pJ/SOP in sequential inference and up to 76.05 GSOPs/mm${}^{2}$ with 0.1 pJ/SOP in parallel configurations running at 55 MHz and 0.55 V.
The detection of arrhythmias is crucial in monitoring cardiac health. However, electrocardiogram (ECG) signals obtained from wearable devices are often compromised by noise, including electrode motion artifacts, baseline wander, and muscle artifacts. This paper addresses these challenges by proposing a highly robust cardiac health monitoring processor featuring a cascaded triple-adaptive QRS detector and medically driven feature-fusion hybrid neural networks (HNN) for arrhythmia classification. The QRS detector uses a self-adaptive triple-threshold mechanism that dynamically correlates duration, RR interval, and error correction thresholds, allowing it to accurately identify QRS complex features in noisy signals, facilitated by event-driven sampling. The HNN arrhythmia classifier combines long short-term memory (LSTM) and artificial neural network architectures with the fusion of three medically driven pathological features, achieving improved computational efficiency. The prototype is fabricated using a 65-nm CMOS process. The results reveal three findings. First, the total and dynamic power are 2.53 $\mu$W and 0.072 $\mu$W, respectively, and the all-digital implementation occupies an area of 0.99 mm${}^{2}$. Second, the average R-peak detection sensitivity/precision rates exceed 97.38%/97.08% on the MIT-BIH Noise Stress Test Database, and inter-patient classification accuracy exceeds 90.1% on the MIT-BIH Arrhythmia Database under a 6 dB signal-to-noise ratio (SNR). Third, the system achieves low computational complexity with only 2063 parameters and 5.5 KB of SRAM.
Brain-computer interfaces rely on precise decoding of neural signals, where spike sorting is a critical step to extract individual neuronal activities from complex neural data. This work presents a spiking neural network (SNN) framework for efficient spike sorting, named SIFT-RSNN. In the SIFT-RSNN, raw neural signals are encoded into spike trains using a threshold-based temporal encoding strategy; a sparse-integrated filtering module then refines misfiring spikes, enhancing data sparsity for pattern learning. The RSNN module with a membrane shortcut structure ensures efficient feature transfer and improves the generalization performance of the overall system. The SIFT-RSNN achieves accuracies of 96.2% and 99.6% on the Difficult1 and Difficult2 subsets of the Leicester dataset, surpassing state-of-the-art methods. We also implement it on a compute-in-memory platform with 8k memristor cells using a quantization-free mapping method and propose two algorithm-hardware co-optimization strategies to mitigate non-ideal hardware effects: weight outlier pre-constraint (WOP) and noise adaptation training (NAT). After optimization, our algorithm continues to outperform existing spike sorting methods, achieving accuracies of 94.2% and 99.7%, while also demonstrating improved robustness. The memristor platform exhibits accuracy drops of only 2% and 1.5% compared to software results on the two difficult subsets. Additionally, it achieves 3.52 $\mu$J energy consumption and 0.5 ms latency per inference. This work offers promising solutions for future brain-computer interface systems and neural prosthesis applications.
Magnetic resonance imaging (MRI) exhibits rich and clinically useful endogenous contrast mechanisms, which can differentiate soft tissues and are sensitive to flow, diffusion, magnetic susceptibility, blood oxygenation level, and more. However, MRI sensitivity is ultimately constrained by Nuclear Magnetic Resonance (NMR) physics, and its spatiotemporal resolution is limited by SNR and spatial encoding. On the other hand, miniaturized implantable sensors offer highly localized physiological information, yet communication and localization can be challenging when multiple implants are present. This paper introduces MRDust, an active "contrast agent" that integrates active sensor implants with MRI, enabling the direct encoding of highly localized physiological data into MR images to augment the anatomical images. MRDust employs a micrometer-scale on-chip coil to actively modulate the local magnetic field, enabling MR signal amplitude and phase modulation for digital data transmission. Since MRI inherently captures the anatomical tissue structure, this method has the potential to enable simultaneous data communication, localization, and image registration with multiple implants. This paper presents the underlying physical principles, design tradeoffs, and design methodology for this approach. To validate the concept, a 900 $\times$ 990 $\mu$m${}^{2}$ chip was designed using TSMC 28 nm technology, with an on-chip coil measuring 630 $\mu$m in diameter. The chip was tested with custom hardware in a GE MR750W 3T MRI scanner. Successful voxel amplitude modulation is demonstrated with a Spin-Echo Echo-Planar-Imaging (SE-EPI) sequence, achieving a contrast-to-noise ratio (CNR) of 25.58 with a power consumption of 130 $\mu$W.
Multispeckle Diffuse Correlation Spectroscopy (mDCS) is an advanced optical technology used to measure microvascular blood flow in deep tissue. It has emerged as a promising tool for continuous, real-time monitoring in clinical studies. However, its adoption in wearable applications is limited by the high resource demand for autocorrelation computation. To address this, we propose a resource-efficient 1-bit autocorrelator for in-pixel computation that exploits the inherent sparsity of photon detection events in long source-detector separation scenarios. By binarizing the detected photon counts, this architecture eliminates the need for multi-bit multipliers and reduces the bitwidth of shift registers. The proposed design is implemented on an FPGA and validated using real Single Photon Avalanche Diode (SPAD) inputs against a conventional 5-bit baseline under identical experimental conditions. It achieves 79% and 29.5% reductions in Look-Up Table and Flip-Flop usage, respectively. Systematic characterization using a rotating diffuser experimentally identifies an empirical threshold of 0.7 for the photon hit probability per lag bin, below which the 1-bit system estimates the decay time constant with tolerable error, closely matching the 5-bit baseline. The practical viability of this architecture is also demonstrated through a qualitative cuff occlusion measurement. This work presents a proof-of-concept for the 1-bit autocorrelator as a highly scalable solution toward massively parallel, large-array mDCS systems-on-chip for wearable healthcare devices.
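The binarized autocorrelation described above can be sketched in software as follows. Each photon count is reduced to a single bit (photon detected or not), the delay line stores one bit per lag, and the multiply at each lag becomes a logical AND. This is a behavioural illustration only, not the FPGA design: the function name `one_bit_autocorrelation`, the binarization rule `count > 0`, and the unnormalized output are assumptions.

```python
def one_bit_autocorrelation(counts, max_lag):
    """Unnormalized autocorrelation of binarized photon counts for lags 0..max_lag.

    acc[lag] counts the time bins n where both bit[n] and bit[n - lag] are 1.
    """
    acc = [0] * (max_lag + 1)
    shift_reg = [0] * (max_lag + 1)  # 1-bit delay line; shift_reg[k] holds bit[n - k]
    for count in counts:
        bit = 1 if count > 0 else 0          # binarize the photon count
        shift_reg = [bit] + shift_reg[:-1]   # shift the new bit into the delay line
        if bit:
            # bit AND delayed_bit reduces to delayed_bit when bit == 1,
            # and contributes nothing when bit == 0 (sparse input, little work).
            for lag, delayed in enumerate(shift_reg):
                acc[lag] += delayed
    return acc

if __name__ == "__main__":
    print(one_bit_autocorrelation([1, 0, 2, 0, 1, 0, 0, 3], max_lag=3))
```

Because the accumulators are only touched in bins where a photon was detected, sparse detection at long source-detector separations translates directly into little switching activity, and binarization discards little information as long as bins rarely contain more than one photon.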