共找到 20 条结果
A central question in vector- and function-valued learning is how to design kernels that capture both local and non-local interactions while remaining computationally tractable. Existing operator-valued kernels offer only partial answers: separable kernels are efficient but fail to model interactions across the function domain, while commutative kernels capture only pointwise structure. To address this, we propose spectral truncation kernels, a new class of positive definite kernels for vector- and function-valued learning based on spectral truncation and $C^*$-algebra. By allowing noncommutative products in the kernel construction, the proposed kernels induce interactions across the data function domain and fill the gap between existing separable and commutative kernels. In addition, by using the $C^*$-algebraic framework, we reduce the computational cost compared to the existing vector-valued RKHS framework with operator-valued kernels.
Quantum kernel methods are one of the most explored approaches to quantum machine learning. However, the structural properties and inductive bias of quantum kernels are not fully understood. In this work, we introduce the notion of entangled tensor kernels - a generalization of product kernels from classical kernel theory - and show that all embedding quantum kernels can be understood as an entangled tensor kernel. We discuss how this perspective allows one to gain novel insights into both the unique inductive bias of quantum kernels, and potential methods for their dequantization.
One of the most natural connections between quantum and classical machine learning has been established in the context of kernel methods. Kernel methods rely on kernels, which are inner products of feature vectors living in large feature spaces. Quantum kernels are typically evaluated by explicitly constructing quantum feature states and then taking their inner product, here called embedding quantum kernels. Since classical kernels are usually evaluated without using the feature vectors explicitly, we wonder how expressive embedding quantum kernels are. In this work, we raise the fundamental question: can all quantum kernels be expressed as the inner product of quantum feature states? Our first result is positive: Invoking computational universality, we find that for any kernel function there always exists a corresponding quantum feature map and an embedding quantum kernel. The more operational reading of the question is concerned with efficient constructions, however. In a second part, we formalize the question of universality of efficient embedding quantum kernels. For shift-invariant kernels, we use the technique of random Fourier features to show that they are universal within
This work introduces Adaptive Density Fields (ADF), a geometric attention framework that formulates spatial aggregation as a query-conditioned, metric-induced attention operator in continuous space. By reinterpreting spatial influence as geometry-preserving attention grounded in physical distance, ADF bridges concepts from adaptive kernel methods and attention mechanisms. Scalability is achieved via FAISS-accelerated inverted file indices, treating approximate nearest-neighbor search as an intrinsic component of the attention mechanism. We demonstrate the framework through a case study on aircraft trajectory analysis in the Chengdu region, extracting trajectory-conditioned Zones of Influence (ZOI) to reveal recurrent airspace structures and localized deviations.
In this paper we propose a family of tractable kernels that is dense in the family of bounded positive semi-definite functions (i.e. can approximate any bounded kernel with arbitrary precision). We start by discussing the case of stationary kernels, and propose a family of spectral kernels that extends existing approaches such as spectral mixture kernels and sparse spectrum kernels. Our extension has two primary advantages. Firstly, unlike existing spectral approaches that yield infinite differentiability, the kernels we introduce allow learning the degree of differentiability of the latent function in Gaussian process (GP) models and functions in the reproducing kernel Hilbert space (RKHS) in other kernel methods. Secondly, we show that some of the kernels we propose require fewer parameters than existing spectral kernels for the same accuracy, thereby leading to faster and more robust inference. Finally, we generalize our approach and propose a flexible and tractable family of spectral kernels that we prove can approximate any continuous bounded nonstationary kernel.
Audio captioning systems face a fundamental challenge: teacher-forcing training creates exposure bias that leads to caption degeneration during inference. While contrastive methods have been proposed as solutions, they typically fail to capture the crucial temporal relationships between acoustic and linguistic modalities. We address this limitation by introducing the unbiased sliced Wasserstein RBF (USW-RBF) kernel with rotary positional embedding, specifically designed to preserve temporal information across modalities. Our approach offers a practical advantage: the kernel enables efficient stochastic gradient optimization, making it computationally feasible for real-world applications. Building on this foundation, we develop a complete audio captioning framework that integrates stochastic decoding to further mitigate caption degeneration. Extensive experiments on AudioCaps and Clotho datasets demonstrate that our method significantly improves caption quality, lexical diversity, and text-to-audio retrieval accuracy. Furthermore, we demonstrate the generalizability of our USW-RBF kernel by applying it to audio reasoning tasks, where it enhances the reasoning capabilities of large a
This paper considers paired operators in the context of the Lebesgue Hilbert space on the unit circle and its subspace, the Hardy space $H^2$. The kernels of such operators, together with their analytic projections, which are generalizations of Toeplitz kernels, are studied. Results on near-invariance properties, representations, and inclusion relations for these kernels are obtained. The existence of a minimal Toeplitz kernel containing any projected paired kernel and, more generally, any nearly $S^*$-invariant subspace of $H^2$, is derived. The results are applied to describing the kernels of finite-rank asymmetric truncated Toeplitz operators.
As hardware architectures are evolving in the push towards exascale, developing Computational Science and Engineering (CSE) applications depend on performance portable approaches for sustainable software development. This paper describes one aspect of performance portability with respect to developing a portable library of kernels that serve the needs of several CSE applications and software frameworks. We describe Kokkos Kernels, a library of kernels for sparse linear algebra, dense linear algebra and graph kernels. We describe the design principles of such a library and demonstrate portable performance of the library using some selected kernels. Specifically, we demonstrate the performance of four sparse kernels, three dense batched kernels, two graph kernels and one team level algorithm.
A shortening method for large polarization kernels is presented, which results in shortened kernels with the highest error exponent if applied to kernels of size up to 32. It uses lower and upper bounds on partial distances for quick elimination of unsuitable shortening patterns. The proposed algorithm is applied to some kernels of sizes 16 and 32 to obtain shortened kernels of sizes from 9 to 31. These kernels are used in mixed-kernel polar codes of various lengths. Numerical results demonstrate the advantage of polar codes with shortened large kernels compared with shortened and punctured Arikan polar codes, and polar codes with small kernels.
We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lend themselves to a natural approximation that is appropriate for certain big data problems, allowing $O(N)$ inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.
The success of kernel-based learning methods depend on the choice of kernel. Recently, kernel learning methods have been proposed that use data to select the most appropriate kernel, usually by combining a set of base kernels. We introduce a new algorithm for kernel learning that combines a {\em continuous set of base kernels}, without the common step of discretizing the space of base kernels. We demonstrate that our new method achieves state-of-the-art performance across a variety of real-world datasets. Furthermore, we explicitly demonstrate the importance of combining the right dictionary of kernels, which is problematic for methods based on a finite set of base kernels chosen a priori. Our method is not the first approach to work with continuously parameterized kernels. However, we show that our method requires substantially less computation than previous such approaches, and so is more amenable to multiple dimensional parameterizations of base kernels, which we demonstrate.
This article studies constructions of reproducing kernel Banach spaces (RKBSs) which may be viewed as a generalization of reproducing kernel Hilbert spaces (RKHSs). A key point is to endow Banach spaces with reproducing kernels such that machine learning in RKBSs can be well-posed and of easy implementation. First we verify many advanced properties of the general RKBSs such as density, continuity, separability, implicit representation, imbedding, compactness, representer theorem for learning methods, oracle inequality, and universal approximation. Then, we develop a new concept of generalized Mercer kernels to construct $p$-norm RKBSs for $1\leq p\leq\infty$. The $p$-norm RKBSs preserve the same simple format as the Mercer representation of RKHSs. Moreover, the $p$-norm RKBSs are isometrically equivalent to the standard $p$-norm spaces of countable sequences. Hence, the $p$-norm RKBSs possess more geometrical structures than RKHSs including sparsity. The generalized Mercer kernels also cover many well-known kernels, for example, min kernels, Gaussian kernels, and power series kernels. Finally, we propose to solve the support vector machines in the $p$-norm RKBSs, which are to minim
Graph kernels have attracted a lot of attention during the last decade, and have evolved into a rapidly developing branch of learning on structured data. During the past 20 years, the considerable research activity that occurred in the field resulted in the development of dozens of graph kernels, each focusing on specific structural properties of graphs. Graph kernels have proven successful in a wide range of domains, ranging from social networks to bioinformatics. The goal of this survey is to provide a unifying view of the literature on graph kernels. In particular, we present a comprehensive overview of a wide range of graph kernels. Furthermore, we perform an experimental evaluation of several of those kernels on publicly available datasets, and provide a comparative study. Finally, we discuss key applications of graph kernels, and outline some challenges that remain to be addressed.
In this paper, we compare 5 different nonlinear kernels: min-max, RBF, fRBF (folded RBF), acos, and acos-$χ^2$, on a wide range of publicly available datasets. The proposed fRBF kernel performs very similarly to the RBF kernel. Both RBF and fRBF kernels require an important tuning parameter ($γ$). Interestingly, for a significant portion of the datasets, the min-max kernel outperforms the best-tuned RBF/fRBF kernels. The acos kernel and acos-$χ^2$ kernel also perform well in general and in some datasets achieve the best accuracies. One crucial issue with the use of nonlinear kernels is the excessive computational and memory cost. These days, one increasingly popular strategy is to linearize the kernels through various randomization algorithms. In our study, the randomization method for the min-max kernel demonstrates excellent performance compared to the randomization methods for other types of nonlinear kernels, measured in terms of the number of nonzero terms in the transformed dataset. Our study provides evidence for supporting the use of the min-max kernel and the corresponding randomized linearization method (i.e., the so-called "0-bit CWS"). Furthermore, the results motivate
We consider the problem of operator-valued kernel learning and investigate the possibility of going beyond the well-known separable kernels. Borrowing tools and concepts from the field of quantum computing, such as partial trace and entanglement, we propose a new view on operator-valued kernels and define a general family of kernels that encompasses previously known operator-valued kernels, including separable and transformable kernels. Within this framework, we introduce another novel class of operator-valued kernels called entangled kernels that are not separable. We propose an efficient two-step algorithm for this framework, where the entangled kernel is learned based on a novel extension of kernel alignment to operator-valued kernels. We illustrate our algorithm with an application to supervised dimensionality reduction, and demonstrate its effectiveness with both artificial and real data for multi-output regression.
In the Maximum Minimal Vertex Cover (MMVC) problem, we are given a graph $G$ and a positive integer $k$, and the objective is to decide whether $G$ contains a minimal vertex cover of size at least $k$. Motivated by the kernelization of MMVC with parameter $k$, our main contribution is to introduce a simple general framework to obtain kernelization lower bounds for a certain type of kernels for optimization problems, which we call lop-kernels. Informally, this type of kernels is required to preserve large optimal solutions in the reduced instance, and captures the vast majority of existing kernels in the literature. As a consequence of this framework, we show that the trivial quadratic kernel for MMVC is essentially optimal, answering a question of Boria et al. [Discret. Appl. Math. 2015], and that the known cubic kernel for Maximum Minimal Feedback Vertex Set is also essentially optimal. We present further applications for Tree Deletion Set and for Maximum Independent Set on $K_t$-free graphs. Back to the MMVC problem, given the (plausible) non-existence of subquadratic kernels for MMVC on general graphs, we provide subquadratic kernels on $H$-free graphs for several graphs $H$, su
Graph kernels have become an established and widely-used technique for solving classification tasks on graphs. This survey gives a comprehensive overview of techniques for kernel-based graph classification developed in the past 15 years. We describe and categorize graph kernels based on properties inherent to their design, such as the nature of their extracted graph features, their method of computation and their applicability to problems in practice. In an extensive experimental evaluation, we study the classification accuracy of a large suite of graph kernels on established benchmarks as well as new datasets. We compare the performance of popular kernels with several baseline methods and study the effect of applying a Gaussian RBF kernel to the metric induced by a graph kernel. In doing so, we find that simple baselines become competitive after this transformation on some datasets. Moreover, we study the extent to which existing graph kernels agree in their predictions (and prediction errors) and obtain a data-driven categorization of kernels as result. Finally, based on our experimental results, we derive a practitioner's guide to kernel-based graph classification.
The role of kernels is central to machine learning. Motivated by the importance of power-law distributions in statistical modeling, in this paper, we propose the notion of power-law kernels to investigate power-laws in learning problem. We propose two power-law kernels by generalizing Gaussian and Laplacian kernels. This generalization is based on distributions, arising out of maximization of a generalized information measure known as nonextensive entropy that is very well studied in statistical mechanics. We prove that the proposed kernels are positive definite, and provide some insights regarding the corresponding Reproducing Kernel Hilbert Space (RKHS). We also study practical significance of both kernels in classification and regression, and present some simulation results.
The term "CoRE kernel" stands for correlation-resemblance kernel. In many applications (e.g., vision), the data are often high-dimensional, sparse, and non-binary. We propose two types of (nonlinear) CoRE kernels for non-binary sparse data and demonstrate the effectiveness of the new kernels through a classification experiment. CoRE kernels are simple with no tuning parameters. However, training nonlinear kernel SVM can be (very) costly in time and memory and may not be suitable for truly large-scale industrial applications (e.g. search). In order to make the proposed CoRE kernels more practical, we develop basic probabilistic hashing algorithms which transform nonlinear kernels into linear kernels.
For a finite collection of connected graphs $\mathcal{F}$, the $\mathcal{F}$-MINOR-DELETION problem consists in, given a graph $G$ and an integer $\ell$, deciding whether $G$ contains a vertex set of size at most $\ell$ whose removal results in an $\mathcal{F}$-minor-free graph. We lift the existence of (approximate) polynomial kernels for $\mathcal{F}$-MINOR-DELETION by the solution size to (approximate) polynomial kernels parameterized by the vertex-deletion distance to graphs of bounded elimination distance to $\mathcal{F}$-minor-free graphs. This results in exact polynomial kernels for every family $\mathcal{F}$ that contains a planar graph, and an approximate polynomial kernel for PLANAR VERTEX DELETION. Moreover, combining our result with a previous lower bound, we obtain the following infinite set of dichotomies, assuming $NP ot\subseteq coNP/poly$: for any finite set $\mathcal{F}$ of biconnected graphs on at least three vertices containing a planar graph, and any minor-closed class of graphs $\mathcal{C}$, $\mathcal{F}$-MINOR-DELETION admits a polynomial kernel parameterized by the vertex-deletion distance to $\mathcal{C}$ if and only if $\mathcal{C}$ has bounded eliminati