共找到 20 条结果
The detection of irregularly spaced pulses of non-negligible width is a fascinating yet under-explored topic in signal processing. It sits adjacent to other core topics such as radar and symbol detection yet has its own distinctive challenges. Even modern techniques such as compressed sensing perform worse than may be expected on pulse processing problems. Real-world applications include nuclear spectroscopy, flow cytometry, seismic signal processing and neural spike sorting, and these in turn have applications to environmental radiation monitoring, surveying, diagnostic medicine, industrial imaging, biomedical imaging, top-down proteomics, and security screening, to name just a few. This overview paper endeavours to position the pulse processing problem in the context of signal processing. It also describes some current challenges in the field.
Our goal in this paper is to apply the topological signal processing (TSP) framework to the analysis of 3D Point Clouds (PCs) represented on simplicial complexes. Building on Discrete Exterior Calculus (DEC) theory for vector fields, we introduce higher-order Laplacian operators that enable the processing of signals over triangular meshes. Unlike traditional approaches, the proposed approach allows us to characterize both color attributes, modeled as 3D vectors on nodes, and geometry, modeled as 3D vectors on the barycenter of each triangle. Then, we show as TSP tools may efficiently be used to sample, recover and filter PCs attributes treating them as edge signals. Numerical results on synthetic PCs demonstrate accurate color reconstruction with robustness to sparse data and geometry refinement in the case of noisy PC coordinates. The proposed approach provides a topology-based representation to characterize the geometry and attributes of PCs.
The proliferation of capable and efficient machine learning (ML) models marks one of the strongest methodological shifts in signal processing (SP) in its nearly 100-year history. ML models support the development of SP systems that represent complex, nonlinear relationships with high predictive accuracy. Adapting these models often requires sequential inference, which differs both theoretically and methodologically from the usual paradigm of ML, where data are often assumed independent and identically distributed. Gaussian processes (GPs) are a flexible yet principled framework for modeling random functions, and they have become increasingly relevant to SP as statistical and ML methods assume a more prominent role. We provide a self-contained, tutorial-style overview of GPs, with a particular focus on recent methodological advances in sequential, incremental, or streaming inference. We introduce these techniques from a signal-processing perspective while bridging them to recent advances in ML. Many of the developments we survey have direct applications to state-space modeling, sequential regression and forecasting, anomaly detection in time series, sequential Bayesian optimization,
Brain foundation models (BFMs) have emerged as a transformative paradigm in computational neuroscience, offering a revolutionary framework for processing diverse neural signals across different brain-related tasks. These models leverage large-scale pre-training techniques, allowing them to generalize effectively across multiple scenarios, tasks, and modalities, thus overcoming the traditional limitations faced by conventional artificial intelligence (AI) approaches in understanding complex brain data. By tapping into the power of pretrained models, BFMs provide a means to process neural data in a more unified manner, enabling advanced analysis and discovery in the field of neuroscience. In this survey, we define BFMs for the first time, providing a clear and concise framework for constructing and utilizing these models in various applications. We also examine the key principles and methodologies for developing these models, shedding light on how they transform the landscape of neural signal processing. This survey presents a comprehensive review of the latest advancements in BFMs, covering the most recent methodological innovations, novel views of application areas, and challenges
Multi-channel acoustic signal processing is a well-established and powerful tool to exploit the spatial diversity between a target signal and non-target or noise sources for signal enhancement. However, the textbook solutions for optimal data-dependent spatial filtering rest on the knowledge of second-order statistical moments of the signals, which have traditionally been difficult to acquire. In this contribution, we compare model-based, purely data-driven, and hybrid approaches to parameter estimation and filtering, where the latter tries to combine the benefits of model-based signal processing and data-driven deep learning to overcome their individual deficiencies. We illustrate the underlying design principles with examples from noise reduction, source separation, and dereverberation.
Recent advances in deep neural networks (DNNs) have significantly improved various audio processing applications, including speech enhancement, synthesis, and hearing-aid algorithms. DNN-based closed-loop systems have gained popularity in these applications due to their robust performance and ability to adapt to diverse conditions. Despite their effectiveness, current DNN-based closed-loop systems often suffer from sound quality degradation caused by artifacts introduced by suboptimal sampling methods. To address this challenge, we introduce dCoNNear, a novel DNN architecture designed for seamless integration into closed-loop frameworks. This architecture specifically aims to prevent the generation of spurious artifacts-most notably tonal and aliasing artifacts arising from non-ideal sampling layers. We demonstrate the effectiveness of dCoNNear through a proof-of-principle example within a closed-loop framework that employs biophysically realistic models of auditory processing for both normal and hearing-impaired profiles to design personalized hearing-aid algorithms. We further validate the broader applicability and artifact-free performance of dCoNNear through speech-enhancement
One of the key challenges in many research fields is uncovering how different interconnected systems interact within complex networks, typically represented as multi-layer networks. Capturing the intra- and cross-layer interactions among different domains for analysis and processing calls for topological algebraic descriptors capable of localizing the homologies of different domains, at different scales, according to the learning task. Our first contribution in this paper is to introduce the Cell MultiComplexes (CMCs), which are novel topological spaces that enable the representation of higher-order interactions among interconnected cell complexes. We introduce cross-Laplacian operators as powerful algebraic descriptors of CMC spaces able to capture different topological invariants, whether global or local, at different resolutions. Using the eigenvectors of these operators as bases for the signal representation, we develop topological signal processing tools for signals defined over CMCs. Then, we focus on the signal spectral representation and on the filtering of noisy flows observed over the cross-edges between different layers of CMCs. We show that a local signal representation
Powerful artificial intelligence (AI) tools that have emerged in recent years -- including large language models, automated coding assistants, and advanced image and speech generation technologies -- are the result of monumental human achievements. These breakthroughs reflect mastery across multiple technical disciplines and the resolution of significant technological challenges. However, some of the most profound challenges may still lie ahead. These challenges are not purely technical but pertain to the fair and responsible use of AI in ways that genuinely improve the global human condition. This article explores one promising application aligned with that vision: the use of AI tools to facilitate and enhance education, with a specific focus on signal processing (SP). It presents two interrelated perspectives: identifying and addressing technical limitations, and applying AI tools in practice to improve educational experiences. Primers are provided on several core technical issues that arise when using AI in educational settings, including how to ensure fairness and inclusivity, handle hallucinated outputs, and achieve efficient use of resources. These and other considerations --
Neural networks have become ubiquitous in audio effects modelling, especially for guitar amplifiers and distortion pedals. One limitation of such models is that the sample rate of the training data is implicitly encoded in the model weights and therefore not readily adjustable at inference. Recent work explored modifications to recurrent neural network architecture to approximate a sample rate independent system, enabling audio processing at a rate that differs from the original training rate. This method works well for integer oversampling and can reduce aliasing caused by nonlinear activation functions. For small fractional changes in sample rate, fractional delay filters can be used to approximate sample rate independence, but in some cases this method fails entirely. Here, we explore the use of real-time signal resampling at the input and output of the neural network as an alternative solution. We investigate several resampling filter designs and show that a two-stage design consisting of a half-band IIR filter cascaded with a Kaiser window FIR filter can give similar or better results to the previously proposed model adjustment method with many fewer filtering operations per s
This monograph presents a theoretical background and a broad introduction to the Min-Max Framework for Majorization-Minimization (MM4MM), an algorithmic methodology for solving minimization problems by formulating them as min-max problems and then employing majorization-minimization. The monograph lays out the mathematical basis of the approach used to reformulate a minimization problem as a min-max problem. With the prerequisites covered, including multiple illustrations of the formulations for convex and non-convex functions, this work serves as a guide for developing MM4MM-based algorithms for solving non-convex optimization problems in various areas of signal processing. As special cases, we discuss using the majorization-minimization technique to solve min-max problems encountered in signal processing applications and min-max problems formulated using the Lagrangian. Lastly, we present detailed examples of using MM4MM in ten signal processing applications such as phase retrieval, source localization, independent vector analysis, beamforming and optimal sensor placement in wireless sensor networks. The devised MM4MM algorithms are free of hyper-parameters and enjoy the advantag
After nearly a century of specialized applications in optics, remote sensing, and acoustics, the near-field (NF) electromagnetic propagation zone is experiencing a resurgence in research interest. This renewed attention is fueled by the emergence of promising applications in various fields such as wireless communications, holography, medical imaging, and quantum-inspired systems. Signal processing within NF sensing and wireless communications environments entails addressing issues related to extended scatterers, range-dependent beampatterns, spherical wavefronts, mutual coupling effects, and the presence of both reactive and radiative fields. Recent investigations have focused on these aspects in the context of extremely large arrays and wide bandwidths, giving rise to novel challenges in channel estimation, beamforming, beam training, sensing, and localization. While NF optics has a longstanding history, advancements in NF phase retrieval techniques and their applications have lately garnered significant research attention. Similarly, utilizing NF localization with acoustic arrays represents a contemporary extension of established principles in NF acoustic array signal processing.
The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably. Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research.
This paper extends various theoretical results from stationary data processing to cyclostationary (CS) processes under a unified framework. We first derive their asymptotic eigenbasis, which provides a link between their Fourier and Karhunen-Loève (KL) expansions, through a unitary transformation dictated by the cyclic spectrum. By exploiting this connection and the optimalities offered by the KL representation, we study the asymptotic performance of smoothing, filtering and prediction of CS processes, without the need for deriving explicit implementations. We obtain minimum mean squared error expressions that depend on the cyclic spectrum and include classical limits based on the power spectral density as particular cases. We conclude this work by applying the results to a practical scenario, in order to quantify the achievable gains of synchronous signal processing.
This paper presents a novel approach to neuromorphic audio processing by integrating the strengths of Spiking Neural Networks (SNNs), Transformers, and high-performance computing (HPC) into the HPCNeuroNet architecture. Utilizing the Intel N-DNS dataset, we demonstrate the system's capability to process diverse human vocal recordings across multiple languages and noise backgrounds. The core of our approach lies in the fusion of the temporal dynamics of SNNs with the attention mechanisms of Transformers, enabling the model to capture intricate audio patterns and relationships. Our architecture, HPCNeuroNet, employs the Short-Time Fourier Transform (STFT) for time-frequency representation, Transformer embeddings for dense vector generation, and SNN encoding/decoding mechanisms for spike train conversions. The system's performance is further enhanced by leveraging the computational capabilities of NVIDIA's GeForce RTX 3060 GPU and Intel's Core i9 12900H CPU. Additionally, we introduce a hardware implementation on the Xilinx VU37P HBM FPGA platform, optimizing for energy efficiency and real-time processing. The proposed accelerator achieves a throughput of 71.11 Giga-Operations Per Sec
Advances in machine learning technology have enabled real-time extraction of semantic information in signals which can revolutionize signal processing techniques and improve their performance significantly for the next generation of applications. With the objective of a concrete representation and efficient processing of the semantic information, we propose and demonstrate a formal graph-based semantic language and a goal filtering method that enables goal-oriented signal processing. The proposed semantic signal processing framework can easily be tailored for specific applications and goals in a diverse range of signal processing applications. To illustrate its wide range of applicability, we investigate several use cases and provide details on how the proposed goal-oriented semantic signal processing framework can be customized. We also investigate and propose techniques for communications where sensor data is semantically processed and semantic information is exchanged across a sensor network.
Prompting has become a practical method for utilizing pre-trained language models (LMs). This approach offers several advantages. It allows an LM to adapt to new tasks with minimal training and parameter updates, thus achieving efficiency in both storage and computation. Additionally, prompting modifies only the LM's inputs and harnesses the generative capabilities of language models to address various downstream tasks in a unified manner. This significantly reduces the need for human labor in designing task-specific models. These advantages become even more evident as the number of tasks served by the LM scales up. Motivated by the strengths of prompting, we are the first to explore the potential of prompting speech LMs in the domain of speech processing. Recently, there has been a growing interest in converting speech into discrete units for language modeling. Our pioneer research demonstrates that these quantized speech units are highly versatile within our unified prompting framework. Not only can they serve as class labels, but they also contain rich phonetic information that can be re-synthesized back into speech signals for speech generation tasks. Specifically, we reformula
Devices and sensors generate streams of data across a diversity of locations and protocols. That data usually reaches a central platform that is used to store and process the streams. Processing can be done in real time, with transformations and enrichment happening on-the-fly, but it can also happen after data is stored and organized in repositories. In the former case, stream processing technologies are required to operate on the data; in the latter batch analytics and queries are of common use. This paper introduces a runtime to dynamically construct data stream processing topologies based on user-supplied code. These dynamic topologies are built on-the-fly using a data subscription model defined by the applications that consume data. Each user-defined processing unit is called a Service Object. Every Service Object consumes input data streams and may produce output streams that others can consume. The subscription-based programing model enables multiple users to deploy their own data-processing services. The runtime does the dynamic forwarding of data and execution of Service Objects from different users. Data streams can originate in real-world devices or they can be the outpu
Wireless communication technology has progressed dramatically over the past 25 years, in terms of societal adoption as well as technical sophistication. In 1998, mobile phones were still in the process of becoming compact and affordable devices that could be widely utilized in both developed and developing countries. There were "only" 300 million mobile subscribers in the world [1]. Cellular networks were among the first privatized telecommunication markets, and competition turned the devices into fashion accessories with attractive designs that could be individualized. The service was circumscribed to telephony and text messaging, but it was groundbreaking in that, for the first time, telecommunication was between people rather than locations. Wireless networks have changed dramatically over the past few decades, enabling this revolution in service provisioning and making it possible to accommodate the ensuing dramatic growth in traffic. There are many contributing components, including new air interfaces for faster transmission, channel coding for enhanced reliability, improved source compression to remove redundancies, and leaner protocols to reduce overheads. Signal processing
Recent studies have made some progress in refining end-to-end (E2E) speech recognition encoders by applying Connectionist Temporal Classification (CTC) loss to enhance named entity recognition within transcriptions. However, these methods have been constrained by their exclusive use of the ASCII character set, allowing only a limited array of semantic labels. We propose 1SPU, a 1-step Speech Processing Unit which can recognize speech events (e.g: speaker change) or an NL event (Intent, Emotion) while also transcribing vocal content. It extends the E2E automatic speech recognition (ASR) system's vocabulary by adding a set of unused placeholder symbols, conceptually akin to the <pad> tokens used in sequence modeling. These placeholders are then assigned to represent semantic events (in form of tags) and are integrated into the transcription process as distinct tokens. We demonstrate notable improvements on the SLUE benchmark and yields results that are on par with those for the SLURP dataset. Additionally, we provide a visual analysis of the system's proficiency in accurately pinpointing meaningful tokens over time, illustrating the enhancement in transcription quality through
Current methods of graph signal processing rely heavily on the specific structure of the underlying network: the shift operator and the graph Fourier transform are both derived directly from a specific graph. In many cases, the network is subject to error or natural changes over time. This motivated a new perspective on GSP, where the signal processing framework is developed for an entire class of graphs with similar structures. This approach can be formalized via the theory of graph limits, where graphs are considered as random samples from a distribution represented by a graphon. When the network under consideration has underlying symmetries, they may be modeled as samples from Cayley graphons. In Cayley graphons, vertices are sampled from a group, and the link probability between two vertices is determined by a function of the two corresponding group elements. Infinite groups such as the 1-dimensional torus can be used to model networks with an underlying spatial reality. Cayley graphons on finite groups give rise to a Stochastic Block Model, where the link probabilities between blocks form a (edge-weighted) Cayley graph. This manuscript summarizes some work on graph signal proc