The burgeoning growth of open-sourced vision-language models (VLMs) has catalyzed a plethora of applications across diverse domains. Ensuring the transparency and interpretability of these models is critical for fostering trustworthy and responsible AI systems. In this study, our objective is to delve into the internals of VLMs to interpret the functions of individual neurons. We observe the activations of neurons with respect to the input visual tokens and text tokens, and reveal some interesting findings. In particular, we find that there are neurons responsible for only visual information, only text information, or both, which we refer to as visual neurons, text neurons, and multi-modal neurons, respectively. We build a framework that automates the explanation of neurons with the assistance of GPT-4o. For visual neurons, we also propose an activation simulator to assess the reliability of their explanations. Systematic statistical analyses on one representative VLM, LLaVA, uncover the behaviors and characteristics of the different categories of neurons.
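One way to picture the three neuron categories is the following minimal sketch. It is not the procedure used in the study: the function name `classify_neuron`, the top-k rule, and the `purity` threshold are all hypothetical choices made for illustration. The idea is simply to look at which kind of token (visual or text) dominates a neuron's strongest activations.

```python
from typing import List, Tuple


def classify_neuron(
    records: List[Tuple[float, bool]],
    top_k: int = 5,
    purity: float = 0.8,
) -> str:
    """Label one neuron from (activation, is_visual_token) pairs.

    Hypothetical rule: inspect the top_k strongest activations and
    check what share of them were triggered by visual tokens.
    """
    if not records:
        raise ValueError("need at least one activation record")
    # Keep the top_k records with the largest activation values.
    top = sorted(records, key=lambda r: r[0], reverse=True)[:top_k]
    visual_share = sum(1 for _, is_visual in top if is_visual) / len(top)
    if visual_share >= purity:
        return "visual"
    if visual_share <= 1.0 - purity:
        return "text"
    return "multi-modal"
```

Under this assumed rule, a neuron whose strongest activations come almost exclusively from image patches is labelled visual, one driven almost exclusively by text tokens is labelled text, and anything in between is labelled multi-modal.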
Neurons, as eukaryotic cells, have powerful internal computation capabilities. One neuron can have many distinct states, and brains can use this capability. Processes of neuron growth and maintenance use chemical signalling between cell bodies and synapses, ferrying chemical messengers over microtubules and actin fibres within cells. These processes are computations which, while slower than neural electrical signalling, could allow any neuron to change its state over intervals of seconds or minutes. Based on its state, a single neuron can selectively de-activate some of its synapses, sculpting a dynamic neural net from the static neural connections of the brain. Without this dynamic selection, the static neural networks in brains are too amorphous and dilute to do the computations of neural cognitive models. The use of multi-state neurons in animal brains is illustrated in hierarchical Bayesian object recognition. Multi-state neurons may support a design which is more efficient than two-state neurons, and scales better as object complexity increases. Brains could have evolved to use multi-state neurons. Multi-state neurons could be used in artificial neural networks, to use a kind
Neurons primarily communicate through the emission of action potentials, or spikes. To generate a spike, a neuron's membrane potential must cross a defined threshold. Does this spiking mechanism inherently prevent neurons from transmitting their subthreshold membrane potential fluctuations to other neurons? We prove that, in theory, it does not. The subthreshold membrane potential fluctuations of a presynaptic population of spiking neurons can be perfectly transmitted to a downstream population of neurons. Mathematically, this surprising result is an example of a concentration phenomenon in high dimensions.
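A toy illustration of how a population of threshold units can carry a subthreshold value is sketched below. It is not the construction proved in the paper. It assumes each neuron sees the shared potential `v` plus independent Gaussian noise and spikes when the sum exceeds `theta`; the names `decode_potential`, `theta`, and `sigma` are hypothetical. Because the spiking fraction concentrates around its expectation as the population grows, inverting that expectation recovers `v`.

```python
import random
from statistics import NormalDist


def decode_potential(v, n_neurons, theta=0.0, sigma=1.0, rng=None):
    """Estimate a shared subthreshold potential v from binary spikes.

    Assumed model: neuron i spikes if v + noise_i > theta, with
    noise_i drawn independently from N(0, sigma^2).
    """
    rng = rng or random.Random(0)
    spikes = sum(
        1 for _ in range(n_neurons) if v + rng.gauss(0.0, sigma) > theta
    )
    # Clamp the fraction so the inverse CDF stays finite.
    eps = 0.5 / n_neurons
    frac = min(max(spikes / n_neurons, eps), 1.0 - eps)
    # P(spike) = 1 - Phi((theta - v) / sigma); invert for v.
    return theta - sigma * NormalDist().inv_cdf(1.0 - frac)
```

In this sketch the estimation error shrinks roughly as one over the square root of `n_neurons`, which is the simplest form of the concentration effect the abstract alludes to.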
This work demonstrates how increasing the number of neurons in a network without increasing its total number of non-zero parameters improves performance. We show that this gain corresponds with a decrease in interference between multiple features that would otherwise share the same neurons. On symbolic Boolean tasks, splitting each neuron into sparser sub-neurons with knowledge of the clauses systematically reduces polysemanticity metrics and yields higher task accuracy. Notably, even random splits of neuron weights approximate these gains, indicating that reduced collisions, not precise assignment, are a primary driver. Consistent with the superposition hypothesis, the benefits of this framework grow with increasing interference: when polysemantic load is high, accuracy improvements are the largest. Transferring these insights to more realistic models, including classifiers over CLIP embeddings, convolutional neural networks, and deeper multilayer networks, we find that widening networks while maintaining a constant non-zero parameter count consistently increases accuracy. These results identify an interpretability-grounded mechanism to leverage width against superposition, improv
Brains can process sensory information from different modalities at astonishing speed; this is surprising as the integration of inputs through the membrane of each individual neuron already causes a delayed response. Neuronal recordings {\em in vitro} reveal a possible explanation for this fast processing, in terms of individual neurons advancing their output firing rates with respect to the input, a concept which we refer to as prospective coding. The underlying mechanisms of prospective coding, however, are not completely understood. We propose a mechanistic explanation for individual neurons advancing their output on the level of single action potentials and instantaneous firing rates. We show that the spike generation mechanism can be the source for prospective (advanced) or retrospective (delayed) responses. A simplified Hodgkin-Huxley model identifies sodium inactivation as a source for prospective firing, controlling the timing of the neuron's output as a function of the voltage and its temporal derivative. We further show that slow adaptation processes, such as spike-frequency adaptation or deactivating dendritic currents, represent mechanisms generating prospective firing
We address the problem of identifying functional interactions among stochastic neurons with variable-length memory from their spiking activity. The neuronal network is modeled by a stochastic system of interacting point processes with variable-length memory. Each chain describes the activity of a single neuron, indicating whether it spikes at a given time. One neuron's influence on another can be either excitatory or inhibitory. To identify the existence and nature of an interaction between a neuron and its postsynaptic counterpart, we propose a model selection procedure based on the observation of the spike activity of a finite set of neurons over a finite time. The proposed procedure is also based on the maximum likelihood estimator for the synaptic weight matrix of the neuronal network model. In this sense, we prove the consistency of the maximum likelihood estimator, followed by a proof of the consistency of the neighborhood interaction estimation procedure. The effectiveness of the proposed model selection procedure is demonstrated using simulated data, which validates the underlying theory. The method is also applied to analyze spike train data recorded from hippocampal neur
The minimal number of neurons required for a feedforward neural network to interpolate $n$ generic input-output pairs from $\mathbb{R}^d\times \mathbb{R}^{d'}$ is $\Theta(\sqrt{nd'})$. While previous results have shown that $\Theta(\sqrt{nd'})$ neurons are sufficient, they have been limited to sigmoid, Heaviside, and rectified linear unit (ReLU) as the activation function. Using a different approach, we prove that $\Theta(\sqrt{nd'})$ neurons are sufficient as long as the activation function is real analytic at a point and not a polynomial there. Thus, the only practical activation functions that our result does not apply to are piecewise polynomials. Importantly, this means that activation functions can be freely chosen in a problem-dependent manner without loss of interpolation power.
The generation of spikes by neurons is an energetically costly process, and evaluating the metabolic energy required to maintain the signalling activity of neurons is a challenge of practical interest. Neuron models are frequently used to represent the dynamics of real neurons but hardly ever to evaluate the electrochemical energy required to maintain those dynamics. This paper discusses the interpretation of a Hodgkin-Huxley circuit as an energy model for real biological neurons and uses it to evaluate the consumption of metabolic energy in the transmission of information between neurons coupled by electrical synapses, i.e. gap junctions. We show that for a single postsynaptic neuron, maximum energy efficiency, measured in bits of mutual information per ATP molecule consumed, requires maximum energy consumption. In contrast, for groups of parallel postsynaptic neurons we determine values of the synaptic conductance at which the energy efficiency of the transmission presents clear maxima at relatively low values of metabolic energy consumption. Contrary to what might be expected, the best performance occurs at low energy cost.
Cadieu et al. (Cadieu, 2014) reported that deep neural networks (DNNs) could rival the representation of primate inferotemporal cortex for object recognition. Lehky et al. (Lehky, 2011) provided a statistical analysis of neural responses to object stimuli in primate AIT cortex. They found that the intrinsic dimensionality of object representations in AIT cortex is around 100 (Lehky, 2014). Considering the outstanding performance of DNNs in object recognition, it is worthwhile investigating whether the responses of DNN neurons have similar response statistics to those of AIT neurons. Following Lehky et al.'s works, we analyze the response statistics to image stimuli and the intrinsic dimensionality of object representations of DNN neurons. Our findings show that, in terms of kurtosis and Pareto tail index, the response statistics on single-neuron selectivity and population sparseness of DNN neurons are fundamentally different from those of IT neurons, except in some special cases. Increasing the number of neurons and stimuli could alter the conclusions substantially. In addition, with the ascendancy of the convolutional layers of DNNs, the single-neuron selectivity and population sparseness of DNN
The collective dynamics of spiking networks of neurons has been of central interest to both computational neuroscience and network science. Over the past years, a new generation of neural population models based on exact reductions (ER) of spiking networks has been developed. However, most of these efforts have been limited to networks of neurons with simple dynamics (e.g. the quadratic integrate-and-fire models). Here, we present an extension of ER to conductance-based networks of two-dimensional Izhikevich neuron models. We employ an adiabatic approximation, which allows us to analytically solve the continuity equation describing the evolution of the state of the neural population and thus to reduce model dimensionality. We validate our results by showing that the reduced mean-field description we derived can qualitatively and quantitatively describe the macroscopic behaviour of populations of two-dimensional QIF neurons with different electrophysiological profiles (regular firing, adapting, resonator and type III excitable). Most notably, we apply this technique to develop an ER for networks of neurons with bursting dynamics.
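For readers unfamiliar with the two-dimensional Izhikevich model that the reduction starts from, the following minimal sketch integrates a single neuron with forward Euler. It covers only the standard single-neuron equations with a constant injected current, not the conductance-based network or the mean-field reduction described above. The function name `simulate_izhikevich`, the step size, and the regular-spiking parameter defaults are assumptions chosen for illustration.

```python
def simulate_izhikevich(current, a=0.02, b=0.2, c=-65.0, d=8.0,
                        t_max=1000.0, dt=0.5):
    """Return spike times (ms) of one Izhikevich neuron.

    Model: dv/dt = 0.04 v^2 + 5 v + 140 - u + I
           du/dt = a (b v - u)
    with reset v <- c, u <- u + d whenever v reaches 30 mV.
    """
    v = c          # membrane potential (mV)
    u = b * v      # recovery variable
    spike_times = []
    for step in range(int(t_max / dt)):
        if v >= 30.0:
            # Spike detected: record it and apply the reset rule.
            spike_times.append(step * dt)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
    return spike_times
```

With the assumed defaults, a constant current of 10 produces repetitive firing, while zero current leaves the neuron at rest. Changing `a`, `b`, `c`, and `d` switches between firing profiles such as adapting or bursting behaviour.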
We view a neural network as a distributed system whose neurons can fail independently, and we evaluate its robustness in the absence of any (recovery) learning phase. We give tight bounds on the number of neurons that can fail without harming the result of a computation. To determine our bounds, we leverage the fact that neural activation functions are Lipschitz-continuous. Our bound is on a quantity we call the \textit{Forward Error Propagation}, which captures how much error is propagated by a neural network when a given number of components fail. Computing this quantity only requires looking at the topology of the network, whereas experimentally assessing the robustness of a network requires the costly experiment of looking at all possible inputs and testing all possible configurations of the network corresponding to different failure situations, which faces a discouraging combinatorial explosion. We distinguish the case of neurons that can fail and stop their activity (crashed neurons) from the case of neurons that can fail by transmitting arbitrary values (Byzantine neurons). Interestingly, as we show in the paper, our bound can easily be extended to the case where synap
Can we use spiking neural networks (SNN) as generative models of multi-neuronal recordings, while taking into account that most neurons are unobserved? Modeling the unobserved neurons with large pools of hidden spiking neurons leads to severely underconstrained problems that are hard to tackle with maximum likelihood estimation. In this work, we use coarse-graining and mean-field approximations to derive a bottom-up, neuronally-grounded latent variable model (neuLVM), where the activity of the unobserved neurons is reduced to a low-dimensional mesoscopic description. In contrast to previous latent variable models, neuLVM can be explicitly mapped to a recurrent, multi-population SNN, giving it a transparent biological interpretation. We show, on synthetic spike trains, that a few observed neurons are sufficient for neuLVM to perform efficient model inversion of large SNNs, in the sense that it can recover connectivity parameters, infer single-trial latent population activity, reproduce ongoing metastable dynamics, and generalize when subjected to perturbations mimicking photo-stimulation.
As a follow-up to the tutorial article [29], in this paper we introduce the basic compositional units of the human brain, which will further illustrate the cell-level bio-structure of the brain. On average, the human brain contains about 100 billion neurons and many more neuroglia, which serve to support and protect the neurons. Each neuron may be connected to up to 10,000 other neurons, passing signals to each other via as many as 1,000 trillion synapses. In the nervous system, a synapse is a structure that permits a neuron to pass an electrical or chemical signal to another neuron or to the target effector cell. Such signals are accumulated as the membrane potential of the neuron, which triggers and passes a signal pulse (i.e., an action potential) to other neurons when the membrane potential is greater than a precisely defined threshold voltage. To be more specific, in this paper we discuss the concepts of neurons, synapses and the action potential in detail. Much of the material used in this paper is from Wikipedia and several other introductory neuroscience articles, which will be properly cited in this paper. This is the second of the three tutorial articles a
The mammalian brain is a complex organ that contains billions of neurons. These neurons form various neural circuits that control perception, cognition, emotion and behavior. Developing in vivo neuronal labeling and imaging techniques is crucial for studying the structure and function of neural circuits. In vivo techniques can provide true physiological information that cannot be provided by ex vivo methods. In this study, we describe a new strategy for in vivo neuronal labeling and quantification using MRI. To demonstrate the ability of this new method, we used a neurotropic virus to deliver the oatp1a1 gene to the target neural circuit. The OATP1A1 protein is expressed on the neuronal membrane and can increase the uptake of a specific MRI contrast agent (Gd-EOB-DTPA). When observed with T1-weighted images, labeled neurons "light up" on MRI. We further use a dynamic-contrast-enhancement-based method to obtain measures that provide quantitative information about labeled neurons in vivo.
With the goal of understanding the intricate behavior and dynamics of collections of neurons, we present superconducting circuits containing Josephson junctions that model biologically realistic neurons. These "Josephson junction neurons" reproduce many characteristic behaviors of biological neurons such as action potentials, refractory periods, and firing thresholds. They can be coupled together in ways that mimic electrical and chemical synapses. Using existing fabrication technologies, large interconnected networks of Josephson junction neurons would operate fully in parallel. They would be orders of magnitude faster than both traditional computer simulations and biological neural networks. Josephson junction neurons provide a new tool for exploring long-term large-scale dynamics for networks of neurons.
A major goal of neuroscience, statistical physics and nonlinear dynamics is to understand how brain function arises from the collective dynamics of networks of spiking neurons. This challenge has been chiefly addressed through large-scale numerical simulations. Alternatively, researchers have formulated mean-field theories to gain insight into macroscopic states of large neuronal networks in terms of the collective firing activity of the neurons, or the firing rate. However, these theories have not succeeded in establishing an exact correspondence between the firing rate of the network and the underlying microscopic state of the spiking neurons. This has largely constrained the range of applicability of such macroscopic descriptions, particularly when trying to describe neuronal synchronization. Here we provide the derivation of a set of exact macroscopic equations for a network of spiking neurons. Our results reveal that the spike generation mechanism of individual neurons introduces an effective coupling between two biophysically relevant macroscopic quantities, the firing rate and the mean membrane potential, which together govern the evolution of the neuronal network. The resul
Neurons are subject to various kinds of noise. In addition to synaptic noise, the stochastic opening and closing of ion channels represents an intrinsic source of noise that affects the signal processing properties of the neuron. In this paper, we studied the response of a stochastic Hodgkin-Huxley neuron to transient input subthreshold pulses. It was found that the average response time decreases but variance increases as the amplitude of channel noise increases. In the case of single pulse detection, we show that channel noise enables one neuron to detect the subthreshold signals and an optimal membrane area (or channel noise intensity) exists for a single neuron to achieve optimal performance. However, the detection ability of a single neuron is limited by large errors. Here, we test a simple neuronal network that can enhance the pulse detecting abilities of neurons and find dozens of neurons can perfectly detect subthreshold pulses. The phenomenon of intrinsic stochastic resonance is also found both at the level of single neurons and at the level of networks. At the network level, the detection ability of networks can be optimized for the number of neurons comprising the networ
A mechanism is proposed for increasing the selectivity of olfactory bulb projection neurons as compared to the olfactory receptor neurons, which could operate under low odor concentration, when the lateral inhibition mechanism becomes inefficient. The proposed mechanism is based on the threshold-type reaction of a projection neuron to the stimuli it receives from the receptor neurons, the stochastic nature of those stimuli, and electrical leakage in the projection neurons. The mechanism operates at the level of an individual projection neuron and does not require the involvement of other bulbar neurons. Keywords: olfactory receptor neuron; projection neuron; selectivity; stochastic process; theory
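The selectivity gain from a threshold-type reaction to stochastic inputs can be illustrated with a deliberately simplified calculation. This is a sketch under stated assumptions, not the model of the paper. It replaces leakage with a fixed coincidence window, treats each receptor neuron as firing independently with some probability inside that window, and lets the projection neuron fire when at least `threshold` inputs coincide. The function names and the numbers in the usage note are hypothetical.

```python
from math import comb


def firing_probability(p_receptor, n_receptors, threshold):
    """P(at least `threshold` of n independent receptor inputs arrive)."""
    return sum(
        comb(n_receptors, k)
        * p_receptor ** k
        * (1.0 - p_receptor) ** (n_receptors - k)
        for k in range(threshold, n_receptors + 1)
    )


def selectivity_gain(p_strong, p_weak, n_receptors, threshold):
    """Compare the odor discrimination ratio before and after thresholding."""
    input_ratio = p_strong / p_weak
    output_ratio = (
        firing_probability(p_strong, n_receptors, threshold)
        / firing_probability(p_weak, n_receptors, threshold)
    )
    return input_ratio, output_ratio
```

For example, `selectivity_gain(0.2, 0.1, 10, 3)` compares two odors whose receptor activation probabilities differ by a factor of two. In this toy setting the projection neuron's firing probabilities differ by a larger factor, because the threshold suppresses the weaker odor disproportionately.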
The full connectome of an adult Drosophila enables a search for novel neural structures in the insect brain. I describe a new neural structure, called a Parallel Neuron Group (PNG). Two neurons are called parallel if they share a significant number of input neurons and output neurons. Most pairs of neurons in the Drosophila brain have a very small parallel match. There are about twenty larger groups of neurons for which any pair of neurons in the group has a high match. These are the parallel groups. Parallel groups contain only about 1000 out of the 65,000 neurons in the brain, and have distinctive properties. There are groups in the right mushroom bodies, the antennal lobes, the lobula, and in two central neuropils (GNG and EB). Most parallel groups do not have lateral symmetry. A group usually has one major input neuron, which inputs to all the neurons in the group, and a small number of major output neurons. The major input and output neurons are laterally asymmetric. Parallel neuron groups present puzzles, such as: what does a group do that could not be done by one larger neuron? Do all neurons in a group fire in synchrony, or do they perform different functions? Why are they l
In the quest to model neuronal function amidst gaps in physiological data, a promising strategy is to develop a normative theory that interprets neuronal physiology as optimizing a computational objective. This study extends the current normative models, which primarily optimize prediction, by conceptualizing neurons as optimal feedback controllers. We posit that neurons, especially those beyond early sensory areas, act as controllers, steering their environment towards a specific desired state through their output. This environment comprises both synaptically interlinked neurons and external motor sensory feedback loops, enabling neurons to evaluate the effectiveness of their control via synaptic feedback. Utilizing the novel Direct Data-Driven Control (DD-DC) framework, we model neurons as biologically feasible controllers which implicitly identify loop dynamics, infer latent states and optimize control. Our DD-DC neuron model explains various neurophysiological phenomena: the shift from potentiation to depression in Spike-Timing-Dependent Plasticity (STDP) with its asymmetry, the duration and adaptive nature of feedforward and feedback neuronal filters, the imprecision in spike