A minimalist vision system uses the smallest number of pixels needed to solve a vision task. While traditional cameras use a large grid of square pixels, a minimalist camera uses freeform pixels that can take on arbitrary shapes to increase their information content. We show that the hardware of a minimalist camera can be modeled as the first layer of a neural network, where the subsequent layers are used for inference. Training the network for any given task yields the shapes of the camera's freeform pixels, each of which is implemented using a photodetector and an optical mask. We have designed minimalist cameras for monitoring indoor spaces (with 8 pixels), measuring room lighting (with 8 pixels), and estimating traffic flow (with 8 pixels). The performance demonstrated by these systems is on par with that of a traditional camera with orders of magnitude more pixels. Minimalist vision has two major advantages. First, it naturally tends to preserve the privacy of individuals in the scene, since the captured information is inadequate for extracting visual details. Second, since the number of measurements made by a minimalist camera is very small, we show that it can be fully self-powered,
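The hardware-as-first-layer idea can be sketched as a forward pass. This is a minimal illustration based on our own assumptions, not the authors' code. Each freeform pixel's optical mask is modeled as a per-pixel transmittance in (0, 1), obtained by passing unconstrained parameters through a sigmoid so that the mask stays physically realizable. The photodetector sums the transmitted light. A single hypothetical linear layer stands in for the inference network, and end-to-end training is omitted.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def freeform_pixels(image, mask_params):
    """Simulate freeform pixels: each is a photodetector behind an optical mask.

    image: flat list of P non-negative scene intensities.
    mask_params: K lists of P unconstrained parameters; the sigmoid maps each to a
    transmittance in (0, 1), because a physical mask can only attenuate light.
    Returns K measurements, one scalar per freeform pixel.
    """
    return [sum(sigmoid(w) * x for w, x in zip(params, image)) for params in mask_params]


def linear_head(measurements, weights, bias):
    """One dense layer standing in for the subsequent inference layers (hypothetical)."""
    return [sum(w * m for w, m in zip(row, measurements)) + b for row, b in zip(weights, bias)]


random.seed(0)
P, K, C = 16 * 16, 8, 3  # 256 scene samples, 8 freeform pixels, 3 outputs (illustrative)
image = [random.random() for _ in range(P)]
masks = [[random.gauss(0.0, 1.0) for _ in range(P)] for _ in range(K)]
head_w = [[random.gauss(0.0, 0.1) for _ in range(K)] for _ in range(C)]
head_b = [0.0] * C

m = freeform_pixels(image, masks)
logits = linear_head(m, head_w, head_b)
print(len(m), len(logits))  # 8 3
```

The masks can only attenuate light, so every measurement is non-negative and bounded by the total scene intensity. A learned first layer must respect this constraint to be implementable in hardware.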
Mid-InfraRed spectroscopy is a promising label-free technique that can offer insights into morphological and pathological alterations in biological tissues at the molecular level. Owing to the development of the Fourier Transform InfraRed (FTIR) spectrometer, combined with scanning devices, FTIR images can be produced by simultaneously acquiring spectral data from multiple spatial points, generating comprehensive chemical maps. During data pre-processing, identifying the sample pixels and excluding the background pixels is important for effective downstream feature extraction from FTIR images. Here, we present three algorithms, using unsupervised and supervised approaches, for identifying the sample pixels. The algorithms accurately distinguish sample pixels from background pixels, and the supervised method further enables automatic detection. These findings provide thorough and robust solutions to the problem of detecting sample pixels in FTIR images, contributing to FTIR signal processing and to future research with FTIR images.
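The three algorithms are not detailed here. As a purely illustrative unsupervised baseline, sample pixels can be separated from background by clustering each pixel's integrated absorbance into two groups. This baseline is our assumption and is not one of the authors' methods. It rests on the premise that tissue absorbs more strongly than the bare substrate, and the function names are hypothetical.

```python
def integrated_absorbance(spectrum):
    """Sum of absorbance values over all wavenumbers for one pixel."""
    return sum(spectrum)


def two_means_threshold(values, max_iter=50):
    """1-D k-means with k = 2; returns the midpoint between the two cluster means."""
    lo, hi = min(values), max(values)
    for _ in range(max_iter):
        t = (lo + hi) / 2
        low = [v for v in values if v <= t]
        high = [v for v in values if v > t]
        if not low or not high:
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # cluster means no longer change
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def sample_mask(cube):
    """cube[row][col] is the absorbance spectrum of one pixel.

    Returns a mask of the same shape: True where the pixel is classified as sample.
    """
    scores = [[integrated_absorbance(s) for s in row] for row in cube]
    t = two_means_threshold([v for row in scores for v in row])
    return [[v > t for v in row] for row in scores]


# Toy 4 x 4 image: a 2 x 2 block of strongly absorbing "tissue" on a weak background.
cube = [[[0.05] * 10 for _ in range(4)] for _ in range(4)]
for r in (1, 2):
    for c in (1, 2):
        cube[r][c] = [1.0] * 10
mask = sample_mask(cube)
```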
We present a smart pixel prototype readout integrated circuit (ROIC) designed in a 28 nm bulk CMOS process, with an in-pixel implementation of an artificial intelligence (AI) / machine learning (ML) based data filtering algorithm, intended as a proof of principle for a Phase III upgrade of the Large Hadron Collider (LHC) pixel detector. The first version of the ROIC consists of two matrices of 256 smart pixels, each 25$\times$25 $μ$m$^2$ in size. Each pixel consists of a charge-sensitive preamplifier with leakage current compensation and three auto-zero comparators for a 2-bit flash-type ADC. The frontend is capable of synchronously digitizing the sensor charge within 25 ns. Measurement results show an equivalent noise charge (ENC) of $\sim$30e$^-$ and a total dispersion of $\sim$100e$^-$. The second version of the ROIC uses a fully connected two-layer neural network (NN) to process information from a cluster of 256 pixels and determine whether the pattern corresponds to highly desirable high-momentum particle tracks for selection and readout. The digital NN is embedded between the analog signal processing regions of the 256 pixels without increasing the pixel size and is implemented as fully comb
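The 2-bit flash ADC and the in-pixel filter can each be illustrated in a few lines. This is a behavioral sketch only. The comparator thresholds and network weights are hypothetical placeholders, not values from the chip, and the real NN is implemented in digital hardware rather than software.

```python
def flash_adc_2bit(charge_e, thresholds_e):
    """Digitize a pixel charge (in electrons) with three comparators.

    Each comparator fires when the charge exceeds its threshold. The resulting
    thermometer code, e.g. (True, True, False), is turned into a 2-bit value
    (0..3) by counting how many comparators fired.
    """
    if len(thresholds_e) != 3 or list(thresholds_e) != sorted(thresholds_e):
        raise ValueError("expected three ascending thresholds")
    return sum(charge_e > t for t in thresholds_e)


def two_layer_filter(codes, w1, b1, w2, b2):
    """Fully connected two-layer network over a cluster of 2-bit pixel codes.

    The hidden layer uses ReLU; a positive output score means 'keep this cluster'.
    """
    hidden = [max(0.0, sum(w * c for w, c in zip(row, codes)) + b)
              for row, b in zip(w1, b1)]
    score = sum(w * h for w, h in zip(w2, hidden)) + b2
    return score > 0


# Hypothetical thresholds in electrons, for illustration only.
THRESHOLDS = (800, 1600, 3200)
codes = [flash_adc_2bit(q, THRESHOLDS) for q in (500, 1000, 2000, 5000)]
print(codes)  # [0, 1, 2, 3]
```

Three comparators are enough for two bits because they split the charge axis into four intervals.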
To better understand the behavior of image classifiers, it is useful to visualize the contribution of individual pixels to the model prediction. In this study, we propose a method, MoXI ($\textbf{Mo}$del e$\textbf{X}$planation by $\textbf{I}$nteractions), that efficiently and accurately identifies a group of pixels with high prediction confidence. The proposed method employs game-theoretic concepts, Shapley values and interactions, taking into account both the effects of individual pixels and the cooperative influence of pixels on model confidence. Theoretical analysis and experiments demonstrate that our method identifies the pixels that contribute most to the model outputs better than widely used visualizations by Grad-CAM, Attention rollout, and Shapley values. While prior studies have suffered from the exponential cost of computing Shapley values and interactions, we show that this cost can be reduced to quadratic for our task. The code is available at https://github.com/KosukeSumiyasu/MoXI.
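The exponential cost mentioned above comes from enumerating every coalition of players. A brute-force sketch on a toy three-player game makes this concrete. The value function is a hypothetical stand-in for model confidence over pixel groups. The code computes the textbook Shapley value and Shapley interaction index; it is not MoXI's reduced-cost procedure.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    for r in range(len(items) + 1):
        yield from combinations(items, r)


def shapley_value(v, players, i):
    """Exact Shapley value of player i: weighted average of marginal contributions."""
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for S in subsets(others):
        s = frozenset(S)
        weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
        total += weight * (v(s | {i}) - v(s))
    return total


def shapley_interaction(v, players, i, j):
    """Shapley interaction index of the pair (i, j)."""
    n = len(players)
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for S in subsets(others):
        s = frozenset(S)
        weight = factorial(len(s)) * factorial(n - len(s) - 2) / factorial(n - 1)
        total += weight * (v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s))
    return total


def toy_confidence(S):
    """Hypothetical confidence: additive terms plus a bonus when groups 0 and 1 co-occur."""
    base = {0: 1.0, 1: 1.0, 2: 0.5}
    bonus = 2.0 if {0, 1} <= S else 0.0
    return sum(base[p] for p in S) + bonus


players = [0, 1, 2]
phis = [shapley_value(toy_confidence, players, i) for i in players]
print([round(p, 6) for p in phis])  # [2.0, 2.0, 0.5]
```

The bonus is split equally between groups 0 and 1 in their Shapley values. The pairwise interaction index isolates the bonus directly, which is the kind of cooperative effect that individual Shapley values blur.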
Can one perceive a video's content without seeing its pixels, just from the camera trajectory, the path it carves through space? This paper is the first to systematically investigate this seemingly implausible question. Towards this end, we propose a contrastive learning framework to train CamFormer, a dedicated encoder that projects camera pose trajectories into a joint embedding space, aligning them with natural language. We find that, contrary to its apparent simplicity, the camera trajectory is a remarkably informative signal for uncovering video content. In other words, "how you move" can indeed provide valuable cues about "what you are doing" (egocentric) or "observing" (exocentric). We demonstrate the versatility of our learned CamFormer embeddings on a diverse suite of downstream tasks, ranging from cross-modal alignment to classification and temporal analysis. Importantly, our representations are robust across diverse camera pose estimation methods, including both high-fidelity multi-sensor and standard RGB-only estimators. Our findings establish camera trajectory as a lightweight, robust, and versatile modality for perceiving video content.
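The abstract does not specify CamFormer's exact training objective. A common choice for aligning two modalities in a joint embedding space is the symmetric InfoNCE loss popularized by CLIP. The sketch below assumes that trajectory and caption embeddings have already been produced by some encoders. The temperature of 0.07 is an illustrative default.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def symmetric_info_nce(traj_emb, text_emb, temperature=0.07):
    """CLIP-style loss: the i-th trajectory should match the i-th caption and vice versa."""
    a = [l2_normalize(v) for v in traj_emb]
    b = [l2_normalize(v) for v in text_emb]
    n = len(a)
    # Cosine similarities scaled by the temperature.
    logits = [[sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
              for i in range(n)]
    traj_to_text = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    text_to_traj = sum(logsumexp([logits[i][j] for i in range(n)]) - logits[j][j]
                       for j in range(n)) / n
    return 0.5 * (traj_to_text + text_to_traj)
```

Matched pairs sit on the diagonal of the similarity matrix. Minimizing the loss pulls each trajectory toward its own caption and pushes it away from the other captions in the batch.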
ASTRONIRCAM is an infrared camera-spectrograph installed at the 2.5-meter telescope of the CMO SAI. The instrument is equipped with the HAWAII-2RG array. We propose a classification of the bad pixels of the ASTRONIRCAM detector, based on histograms of the differences between consecutive non-destructive readouts of a flat field. Bad pixels are classified into 5 groups: hot (saturated on the first readout), warm (the signal accumulation rate is above the mean value by more than 5 standard deviations), cold (the rate is below the mean value by more than 5 standard deviations), dead (no signal accumulation), and inverse (negative signal accumulation in the first readouts). Normal pixels of the ASTRONIRCAM detector account for 99.6% of the total. We investigated how the number of bad pixels depends on the number of cooldown cycles of the instrument. Hot pixels remain the same, while bad pixels of other types may migrate between groups; nevertheless, the number of pixels in each group stays roughly constant. We found that the mean and variance of the number of bad pixels in each group, and the transitions between groups, do not differ noticeably between normal and slow coo
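The five-way rule set above maps directly onto a small classifier. In this sketch, the readouts for each pixel are a list of non-destructive ramp samples. Several details are our own assumptions:

- the saturation level;
- the order in which the rules are checked;
- the number of initial readouts inspected for inverse pixels;
- computing the mean and standard deviation only over the pixels that remain after removing hot, dead, and inverse ones.

```python
import statistics


def classify_pixels(readouts, saturation, n_sigma=5.0, n_first=2):
    """Assign each pixel to hot, dead, inverse, warm, cold, or normal.

    readouts: dict mapping pixel id -> consecutive non-destructive readouts (ADU).
    """
    labels, rates = {}, {}
    for pid, r in readouts.items():
        diffs = [b - a for a, b in zip(r, r[1:])]  # signal accumulated between readouts
        if r[0] >= saturation:
            labels[pid] = "hot"                     # saturated on the first readout
        elif all(d == 0 for d in diffs):
            labels[pid] = "dead"                    # no signal accumulation
        elif any(d < 0 for d in diffs[:n_first]):
            labels[pid] = "inverse"                 # negative accumulation early on
        else:
            rates[pid] = statistics.fmean(diffs)    # mean accumulation rate
    values = list(rates.values())
    mu, sigma = statistics.fmean(values), statistics.pstdev(values)
    for pid, rate in rates.items():
        if rate > mu + n_sigma * sigma:
            labels[pid] = "warm"
        elif rate < mu - n_sigma * sigma:
            labels[pid] = "cold"
        else:
            labels[pid] = "normal"
    return labels
```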
The semantic information that an image carries for intelligent tasks is hidden behind its pixels, and slight changes in the pixels can affect the performance of those tasks. To preserve the semantic information behind pixels during wireless image transmission, we propose a joint source-channel coding method based on the semantics of pixels, which improves the performance of intelligent tasks on images at the receiver by retaining semantic information. Specifically, we first use the gradients of the intelligent task's perception results with respect to the pixels to represent the semantic importance of each pixel. Then, we extract the semantic distortion and train the deep joint source-channel coding network to minimize semantic distortion rather than pixel distortion. Experimental results demonstrate that the proposed method improves the performance of the intelligent classification task by 1.38% and 66% compared with the SOTA deep joint source-channel coding method and the traditional separate source-channel coding method, respectively, at the same transmission rate and signal-to-noise ratio.
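A minimal way to turn gradient-based importance into a training objective is a gradient-weighted squared error. This is our simplification and not necessarily the paper's exact definition of semantic distortion. The sketch uses a hypothetical linear classifier. Its gradient with respect to the pixels is simply its weight vector, so no autodiff library is needed.

```python
def semantic_weights(grad):
    """Turn per-pixel gradients of the task output into importance weights summing to 1."""
    mags = [abs(g) for g in grad]
    total = sum(mags) or 1.0
    return [m / total for m in mags]


def semantic_distortion(x, x_hat, weights):
    """Squared error weighted by semantic importance."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, x_hat))


def pixel_distortion(x, x_hat):
    """Plain mean squared error, blind to semantics."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


# Hypothetical linear score f(x) = w . x, so df/dx = w: only pixel 0 matters.
grad = [2.0, 0.0, 0.0, 0.0]
weights = semantic_weights(grad)
x = [1.0, 1.0, 1.0, 1.0]
recon_a = [0.5, 1.0, 1.0, 1.0]  # error on the important pixel
recon_b = [1.0, 1.0, 1.0, 0.5]  # same-size error on an unimportant pixel
```

Both reconstructions have identical pixel MSE. Only the one that damages the pixel the classifier relies on is penalized by the semantic distortion.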
In this paper, we present Mixels, programmable magnetic pixels that can be rapidly fabricated using an electromagnetic printhead mounted on an off-the-shelf 3-axis CNC machine. The ability to program magnetic material pixel-wise with varying magnetic force enables Mixels to create new tangible, tactile, and haptic interfaces. To facilitate the creation of interactive objects with Mixels, we provide a user interface that lets users specify the high-level magnetic behavior and then computes the underlying magnetic pixel assignments and fabrication instructions to program the magnetic surface. Our custom hardware add-on, based on an electromagnetic printhead and a Hall-effect sensor, clips onto a standard 3-axis CNC machine and can both write and read magnetic pixel values from magnetic material. Our evaluation shows that our system can reliably program and read magnetic pixels of various strengths, that we can predict the behavior of two interacting magnetic surfaces before programming them, that our electromagnet is strong enough to create pixels that utilize the maximum magnetic strength of the material being programmed, and that this material remains magnetized when removed from
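Predicting whether two programmed surfaces attract can be illustrated with a deliberately simplified model. Each magnetic pixel is +1 or -1 depending on which pole faces outward. Only directly facing pixels interact, and opposite poles contribute attraction. This toy model and the helper names are hypothetical, and it is not the Mixels predictor. Real interactions also depend on pixel strength, distance, and neighboring pixels.

```python
def net_attraction(a, b, dx=0, dy=0):
    """Toy score for surface b placed face-to-face over surface a, shifted by (dx, dy).

    a, b: 2-D lists of +1/-1 outward poles; b is given as seen from a.
    Each facing pair adds +1 if the poles differ (attract) and -1 if they match (repel).
    """
    score = 0
    for r in range(len(a)):
        for c in range(len(a[0])):
            rb, cb = r - dy, c - dx
            if 0 <= rb < len(b) and 0 <= cb < len(b[0]):
                score += -a[r][c] * b[rb][cb]
    return score


def complementary(a):
    """Pixel assignment that makes every facing pair attract when aligned."""
    return [[-p for p in row] for row in a]


surface_a = [[1, -1, 1, 1], [-1, -1, 1, -1], [1, 1, -1, 1], [-1, 1, 1, -1]]
surface_b = complementary(surface_a)
```

The complementary assignment gives the maximum score when the surfaces are aligned, and any shift can only lower it. This is the basic idea behind patterns that snap together in one position only.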
Image inpainting is an effective method for restoring distorted digital images. Many inpainting methods use the information of neighboring pixels to predict the values of missing pixels. Recently, deep neural networks have been used to learn structural and semantic details of images for inpainting purposes. In this paper, we propose a network for image inpainting that, similar to U-Net, extracts various features from images, leading to better results. We further improve the final results by replacing the damaged pixels of the input with the corresponding recovered pixels of the output image. Our experimental results show that this method produces higher-quality results than traditional methods.
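The final compositing step described above keeps every undamaged input pixel and takes the network output only where damage occurred. A minimal sketch follows, assuming a boolean damage mask that is True at damaged pixels. The function name is hypothetical.

```python
def composite(damaged, restored, mask):
    """Merge the network output into the input only where the mask marks damage.

    damaged, restored, mask: 2-D lists of equal shape; mask is True at damaged pixels.
    """
    return [[r if m else d for d, r, m in zip(drow, rrow, mrow)]
            for drow, rrow, mrow in zip(damaged, restored, mask)]


damaged = [[10, 0], [30, 40]]
restored = [[11, 19], [29, 41]]   # network output: close, but not exact, everywhere
mask = [[False, True], [False, False]]
result = composite(damaged, restored, mask)
```

Keeping the original pixels outside the mask guarantees the network cannot degrade regions that were never damaged.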
This work studies fingerprint image recognition when pixels are missing from the original image. The possibility of recovering the missing pixels is tested by applying the Compressive Sensing approach. Namely, different percentages of missing pixels are considered, and the image is reconstructed using a commonly used approach for sparse image reconstruction. The theory is verified by experiments, showing successful image reconstruction and subsequent person identification even when almost 90% of the image pixels are missing.
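The abstract does not name the sparse reconstruction algorithm it uses. As an illustrative stand-in, the sketch below applies iterative hard thresholding with an orthonormal DCT-II sparsity basis to a 1-D signal, which simplifies the 2-D image case. Missing samples are refilled from a K-sparse estimate while observed samples are held fixed. The signal, sparsity level, and missing ratio are hypothetical.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II basis: psi[i][k] is sample i of basis vector k, so x = Psi s."""
    psi = [[0.0] * n for _ in range(n)]
    for k in range(n):
        c = math.sqrt((1 if k == 0 else 2) / n)
        for i in range(n):
            psi[i][k] = c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
    return psi


def synth(psi, s):
    """x = Psi s (coefficients -> samples)."""
    return [sum(row[k] * s[k] for k in range(len(s))) for row in psi]


def analyze(psi, x):
    """s = Psi^T x (samples -> coefficients)."""
    n = len(x)
    return [sum(psi[i][k] * x[i] for i in range(n)) for k in range(n)]


def hard_threshold(s, k):
    """Keep the k largest-magnitude coefficients, zero the rest."""
    keep = set(sorted(range(len(s)), key=lambda i: abs(s[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(s)]


def reconstruct(observed, known, psi, sparsity, iters=200):
    """Iterative hard thresholding with observed samples held fixed."""
    x = [v if m else 0.0 for v, m in zip(observed, known)]  # start from zero-filling
    for _ in range(iters):
        estimate = synth(psi, hard_threshold(analyze(psi, x), sparsity))
        x = [v if m else e for v, m, e in zip(observed, known, estimate)]
    return x


random.seed(1)
n = 64
psi = dct_matrix(n)
s_true = [0.0] * n
s_true[3], s_true[11] = 10.0, 8.0                    # a 2-sparse signal in the DCT domain
x_true = synth(psi, s_true)
known = [random.random() > 0.25 for _ in range(n)]   # roughly 25% of samples missing
observed = [v if m else 0.0 for v, m in zip(x_true, known)]
x_rec = reconstruct(observed, known, psi, sparsity=2)
```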
Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when we attempt to scale the number of supported languages. Tackling this bottleneck results in a trade-off between what can be represented in the embedding matrix and computational issues in the output layer. This paper introduces PIXEL, the Pixel-based Encoder of Language, which suffers from neither of these issues. PIXEL is a pretrained language model that renders text as images, making it possible to transfer representations across languages based on orthographic similarity or the co-activation of pixels. PIXEL is trained to reconstruct the pixels of masked patches instead of predicting a distribution over tokens. We pretrain the 86M-parameter PIXEL model on the same English data as BERT and evaluate it on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts. We find that PIXEL substantially outperforms BERT on syntactic and semantic processing tasks on scripts that are not found in the pretraining data, but PIXEL is slightly weaker than BERT when working with Latin scripts. Furthermore, we find that PIXEL is more robust than BERT to orthog
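Reconstructing masked patches amounts to a regression loss evaluated only where the input was hidden. The sketch assumes a grayscale rendered-text image whose height and width are multiples of the patch size. The actual patch size and masking ratio used by PIXEL are not stated here.

```python
def patchify(image, p):
    """Split an H x W image (list of rows) into non-overlapping p x p patches, row-major."""
    h, w = len(image), len(image[0])
    return [[image[r + i][c + j] for i in range(p) for j in range(p)]
            for r in range(0, h, p) for c in range(0, w, p)]


def masked_patch_mse(pred_patches, target_patches, mask):
    """Mean squared error over masked patches only (mask[i] True = hidden from the encoder).

    Assumes at least one patch is masked.
    """
    errs = [sum((a - b) ** 2 for a, b in zip(pp, tp)) / len(tp)
            for pp, tp, m in zip(pred_patches, target_patches, mask) if m]
    return sum(errs) / len(errs)


img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
target = patchify(img, 2)
pred = [list(p) for p in target]
pred[1] = [v + 1 for v in pred[1]]  # prediction is wrong only on patch 1
```

Errors on visible patches do not count toward the loss. The model is therefore rewarded only for inferring content it could not see.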
We explore design principles for general pixel-level prediction problems, from low-level edge detection to mid-level surface normal estimation to high-level semantic segmentation. Convolutional predictors, such as the fully-convolutional network (FCN), have achieved remarkable success by exploiting the spatial redundancy of neighboring pixels through convolutional processing. Though such approaches are computationally efficient, we point out that they are not statistically efficient during learning, precisely because spatial redundancy limits the information learned from neighboring pixels. We demonstrate that stratified sampling of pixels allows one to (1) add diversity during batch updates, speeding up learning; (2) explore complex nonlinear predictors, improving accuracy; and (3) efficiently train state-of-the-art models tabula rasa (i.e., "from scratch") for diverse pixel-labeling tasks. Our single architecture produces state-of-the-art results for semantic segmentation on the PASCAL-Context dataset, surface normal estimation on the NYUDv2 depth dataset, and edge detection on BSDS.
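One reading of stratified sampling here is to treat each image in a batch as a stratum and draw a fixed number of pixel locations from each. Every image then contributes an equal number of diverse, less redundant samples. The function below is a hypothetical sketch of that interpretation. The number of pixels per image is an illustrative parameter, not the paper's setting.

```python
import random


def sample_pixels(batch_shapes, per_image, rng):
    """Draw per_image distinct pixel locations from every image in the batch.

    batch_shapes: list of (height, width) per image.
    Returns (image_index, row, col) triples; the loss is evaluated only at these pixels.
    """
    samples = []
    for b, (h, w) in enumerate(batch_shapes):
        idx = rng.sample(range(h * w), per_image)  # without replacement within an image
        samples.extend((b, i // w, i % w) for i in idx)
    return samples


rng = random.Random(0)
shapes = [(8, 8), (4, 6)]
pts = sample_pixels(shapes, 5, rng)
```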
This document presents novel algorithms for detecting noisy and flickering pixels in Burst Alert Telescope event data and for subsequently eliminating data from such pixels to create a filtered event file. The approach adopted for this purpose differs considerably from current practices: it focuses on the temporal variation of data in the detector pixels over long intervals of time, in contrast to current algorithms, which follow a pixel-based approach.
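The abstract does not describe the algorithms in detail. One simple way to use temporal variation, assumed here for illustration, is to bin each pixel's events in time. Pixels whose variance-to-mean ratio is far above the value of 1 expected for Poisson counts are then flagged, and their events are dropped. The dispersion limit of 3 and the event tuple layout are hypothetical.

```python
import statistics


def flag_pixels(lightcurves, dispersion_limit=3.0):
    """lightcurves: dict pixel -> list of event counts per time bin.

    Flags a pixel when its variance-to-mean ratio across time bins greatly exceeds
    the Poisson value of 1, i.e. its rate flickers over time.
    """
    flagged = set()
    for pix, counts in lightcurves.items():
        mean = statistics.fmean(counts)
        if mean == 0:
            continue  # silent pixel: nothing to judge
        if statistics.pvariance(counts) / mean > dispersion_limit:
            flagged.add(pix)
    return flagged


def filter_events(events, flagged):
    """events: list of (pixel, time) tuples; drops events from flagged pixels."""
    return [e for e in events if e[0] not in flagged]


curves = {"steady": [5, 4, 6, 5, 5, 4, 6, 5],
          "flicker": [0, 0, 40, 0, 0, 35, 0, 0],
          "silent": [0] * 8}
bad = flag_pixels(curves)
```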
A novel method to estimate the pixel simultaneous detection probability and the spatial resolution of pixelized detectors is proposed, based on determining the statistical correlations between neighboring detector pixels. The correlations are determined by measuring the noise variance of isolated pixels and of the difference between neighboring pixels. The method is validated using images from two different GE Senographe 2000D mammographic units. The pixelized detector was irradiated with x-rays over its entire surface. It is shown that the pixel simultaneous detection probabilities can be estimated with an accuracy of 0.001 to 0.003, with a systematic error estimated to be smaller than 0.005. The presampled two-dimensional point-spread function (PSF0) is determined using two approximations: a single Gaussian and a sum of two Gaussians. The results obtained for the presampled PSF0 show that the single-Gaussian approximation is not appropriate, while the sum-of-two-Gaussians approximation, which provides the best fit, predicts the existence of a large (~50%) narrow component. Further evidence for this is the latest simulation study of columnar indirect digital dete
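The variance-based estimate of neighbor correlation follows from a one-line identity. If two neighboring pixels have equal noise variance $\sigma^2$ and correlation coefficient $\rho$, then $\mathrm{Var}(x_i - x_j) = 2\sigma^2(1-\rho)$, so $\rho = 1 - \mathrm{Var}(x_i - x_j)/(2\sigma^2)$. The sketch below applies this identity to synthetic samples. Treating the samples as values from a uniformly irradiated detector, and assuming equal variance, are our choices. The further step from correlations to simultaneous detection probabilities is not reproduced.

```python
import random
import statistics


def neighbor_correlation(x_i, x_j):
    """Estimate the correlation coefficient of two neighboring pixels from noise variances.

    Assumes both pixels share the noise variance sigma^2, estimated as the mean of the
    two single-pixel variances; then rho = 1 - Var(x_i - x_j) / (2 sigma^2).
    """
    var_single = 0.5 * (statistics.pvariance(x_i) + statistics.pvariance(x_j))
    var_diff = statistics.pvariance([a - b for a, b in zip(x_i, x_j)])
    return 1.0 - var_diff / (2.0 * var_single)


# Synthetic check: a shared noise component with the same variance as each pixel's
# own noise gives a true correlation of 0.5.
rng = random.Random(42)
shared = [rng.gauss(0.0, 1.0) for _ in range(10000)]
x_i = [s + rng.gauss(0.0, 1.0) for s in shared]
x_j = [s + rng.gauss(0.0, 1.0) for s in shared]
rho_hat = neighbor_correlation(x_i, x_j)
```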
The square or rectangular shape of pixels in digital images used for sensing and display introduces several inaccuracies in image representation. The major disadvantage of square pixels is their inability to accurately capture and display details of objects whose edges, shapes, and regions have varying orientations. This effect can be observed in the inaccurate representation of diagonal edges in low-resolution square-pixel images. This paper explores the less-investigated idea of using variable-shaped pixels to improve the visual quality of image scans without increasing the square-pixel resolution. The proposed adaptive filtering technique yields an improvement in image PSNR.
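PSNR, the metric reported above, is computed from the mean squared error against a reference image. A minimal sketch for 8-bit images (peak value 255) on flat pixel lists:

```python
import math


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


# A uniform error of 10 gray levels gives MSE = 100, i.e. about 28.13 dB.
score = psnr([100] * 4, [110] * 4)
```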
Hybrid pixel detectors were invented for the LHC to make tracking and vertexing possible at all in the LHC's radiation-intense environment. The LHC pixel detectors have since fulfilled their promise very successfully, and R\&D for the planned HL-LHC upgrade is in full swing, targeting even higher ionising doses and non-ionising fluences. In terms of rate and radiation tolerance, hybrid pixels are unrivaled. But they have disadvantages as well, most notably material thickness, production complexity, and cost. Meanwhile, active pixel sensors (DEPFET, MAPS) have also become real pixel detectors, but they could by no means withstand the rates and radiation levels expected at the HL-LHC. New MAPS developments, so-called DMAPS (depleted MAPS), which are full CMOS pixel structures with charge collection in a depleted region, have come into the R\&D focus for pixels at high rate and radiation levels. This goal can perhaps be realised by exploiting HV technologies, high-ohmic substrates, and/or SOI-based technologies. The paper covers the main ideas and some encouraging results from prototyping R\&D, without hiding the difficulties.
Despite extensive research into adversarial attacks, it is still not well understood how they affect image pixels. Understanding how image pixels are affected by adversarial attacks could lead to better adversarial defenses. Motivated by instances we found where strong attacks do not transfer, we examine adversarial examples at the pixel level to scrutinize how adversarial attacks affect image pixel values. We consider several ImageNet architectures (InceptionV3, VGG19, and ResNet50) as well as several strong attacks. We find that attacks can have different effects at the pixel level depending on the classifier architecture. In particular, input pre-processing plays a previously overlooked role in the effect that attacks have on pixels. Based on the insights from this pixel-level examination, we find new ways to detect some of the strongest current attacks.
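Pixel-level examination can start from simple statistics of the perturbation. The sketch also shows one way a processing step can change what an attack does to pixels: rounding to valid 8-bit values removes sub-level changes. Quantization is our illustrative choice and not necessarily the pre-processing studied in the paper.

```python
import math


def perturbation_stats(clean, adv):
    """Pixel-level summary of an adversarial perturbation (flat lists of pixel values)."""
    diffs = [a - c for c, a in zip(clean, adv)]
    return {
        "linf": max(abs(d) for d in diffs),
        "l2": math.sqrt(sum(d * d for d in diffs)),
        "changed_fraction": sum(d != 0 for d in diffs) / len(diffs),
    }


def quantize_8bit(pixels):
    """Round to the nearest integer and clip to [0, 255], as when saving an 8-bit image."""
    return [min(255, max(0, round(p))) for p in pixels]


clean = [100, 100, 100, 100]
adv = [100.3, 100.6, 99.4, 100.0]
before = perturbation_stats(clean, adv)
after = perturbation_stats(clean, quantize_8bit(adv))
print(before["changed_fraction"], after["changed_fraction"])  # 0.75 0.5
```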
Developing small pixels for high-resolution image sensors poses many challenges. A high level of performance must be guaranteed even as the overall pixel size, and with it the freedom in design and process, is reduced. One key parameter in this continual improvement is knowing and controlling the crosstalk between pixels. In this paper, we present an advance in crosstalk characterization based on the design of specific color patterns and the measurement of quantum efficiency. In the first part, we describe the color patterns designed to isolate one pixel or to simulate un-patterned colored pixels. These patterns have been implemented on a test chip and characterized. The second part deals with the characterization setup for quantum efficiency. Indeed, spectral measurements allow us to discriminate pixels based on the color filter placed on top of them and to probe the crosstalk as a function of depth in silicon, thanks to the variation of the photon absorption length with wavelength. In the last part, results are presented showing the impact of color filter patterning, i.e., pixels in a Bayer pattern versus un-patterned pixels. The crosstalk di
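The depth probing relies on Beer-Lambert absorption. The fraction of photons absorbed in a silicon layer depends on the ratio of depth to the wavelength-dependent absorption length. That length is short in the blue and grows toward the red and near-infrared. In the minimal sketch below, the absorption lengths are inputs and not tabulated silicon values.

```python
import math


def absorbed_fraction(depth_um, absorption_length_um):
    """Fraction of incident photons absorbed within the first depth_um of silicon."""
    return 1.0 - math.exp(-depth_um / absorption_length_um)


def fraction_in_layer(z1_um, z2_um, absorption_length_um):
    """Fraction of incident photons absorbed between depths z1_um and z2_um."""
    return math.exp(-z1_um / absorption_length_um) - math.exp(-z2_um / absorption_length_um)


# Shorter absorption length (bluer light): most photons convert near the surface.
shallow = absorbed_fraction(1.0, 0.5)
deep = absorbed_fraction(1.0, 3.0)
```

Longer wavelengths deposit more charge deep in the silicon, where carriers can diffuse laterally into neighboring pixels. This is why a spectral sweep of quantum efficiency can separate crosstalk mechanisms at different depths.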
Hyperspectral image (HSI) clustering is gaining considerable attention owing to recent methods that overcome the inefficiency and misleading results caused by the absence of supervised information. Contrastive learning methods excel at both pixel-level and superpixel-level HSI clustering. The pixel-level contrastive learning method can effectively improve the model's ability to capture fine features of HSI but requires a large time overhead. The superpixel-level contrastive learning method exploits the homogeneity of HSI and reduces computing resources; however, it yields rough classification results. To exploit the strengths of both methods, we present a pixel-superpixel contrastive learning and pseudo-label correction (PSCPC) method for HSI clustering. PSCPC can reasonably capture domain-specific and fine-grained features through superpixels and contrastive learning on a small number of pixels within the superpixels. To improve the clustering performance of superpixels, this paper proposes a pseudo-label correction module that aligns the clustering pseudo-labels of pixels and superpixels. In addition, pixel-level clustering results are used to supervise s
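As a simple stand-in for aligning pixel and superpixel pseudo-labels, each superpixel can take the majority pseudo-label of its pixels. Pixels are then relabeled to agree with their superpixel. This is not the PSCPC module itself, and the function name and data layout are hypothetical.

```python
from collections import Counter


def align_pseudo_labels(pixel_labels, superpixel_of):
    """pixel_labels[i]: cluster id of pixel i; superpixel_of[i]: superpixel id of pixel i.

    Returns (superpixel -> majority label, corrected per-pixel labels).
    """
    members = {}
    for lbl, sp in zip(pixel_labels, superpixel_of):
        members.setdefault(sp, []).append(lbl)
    sp_label = {sp: Counter(lbls).most_common(1)[0][0] for sp, lbls in members.items()}
    return sp_label, [sp_label[sp] for sp in superpixel_of]


sp_label, corrected = align_pseudo_labels([0, 0, 1, 2, 2, 2], [0, 0, 0, 1, 1, 1])
```

Majority voting exploits the homogeneity of superpixels to overrule isolated, likely noisy pixel labels. It will also erase genuinely fine-grained structure inside a superpixel, a trade-off a learned correction module may handle better.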
Pixel antennas, based on discretizing a continuous radiation surface into small elements called pixels, are a flexible reconfigurable antenna technology. By controlling the connections between pixels via switches, the characteristics of pixel antennas can be adjusted to enhance the wireless channel. Inspired by this, we propose a novel technique, termed antenna coding, empowered by pixel antennas. We first derive a physics- and electromagnetics-based communication model for pixel antennas using microwave multiport network theory and beamspace channel representation. With the model, we optimize the antenna coding to maximize the channel gain in a single-input single-output (SISO) pixel antenna system and develop a codebook design for antenna coding to reduce the computational complexity. We analyze the average channel gain of the SISO pixel antenna system and derive the corresponding upper bound. In addition, we jointly optimize the antenna coding and transmit signal covariance matrix to maximize the channel capacity in a multiple-input multiple-output (MIMO) pixel antenna system. Simulation results show that using pixel antennas can enhance the average channel gain by up to 5.4 times and
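A toy search over switch states shows the cost of antenna coding and the motivation for a codebook. The channel model below is a hypothetical simplification, not the multiport-network model derived in the paper: each closed switch adds a fixed complex contribution to the effective channel. Exhaustive search visits all 2^Q configurations, while codebook search visits only the codebook entries.

```python
import itertools
import random


def channel_gain(code, base, couplings):
    """Toy model: closed switches (bit = 1) add complex terms to the channel; gain = |h|^2."""
    h = base + sum(g for bit, g in zip(code, couplings) if bit)
    return abs(h) ** 2


def exhaustive_search(n_switches, base, couplings):
    """Best code among all 2^n_switches configurations."""
    return max(itertools.product((0, 1), repeat=n_switches),
               key=lambda c: channel_gain(c, base, couplings))


def codebook_search(codebook, base, couplings):
    """Best code within a predefined (smaller) codebook."""
    return max(codebook, key=lambda c: channel_gain(c, base, couplings))


rng = random.Random(0)
Q = 10
couplings = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(Q)]
base = 1 + 0j
codebook = rng.sample(list(itertools.product((0, 1), repeat=Q)), 32)  # 32 of 1024 codes
best = exhaustive_search(Q, base, couplings)
cb_best = codebook_search(codebook, base, couplings)
```

The codebook can never beat exhaustive search. A good codebook design aims to lose little gain while cutting the search from exponential in Q to the codebook size.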