Machine intelligence is very successful at standard recognition tasks when having high-quality training data. There is still a significant gap between machine-level pattern recognition and human-level concept learning. Humans can learn under uncertainty from only a few examples and generalize these concepts to solve new problems. The growing interest in explainable machine intelligence, requires experimental environments and diagnostic tests to analyze weaknesses in existing approaches to drive progress in the field. In this paper, we discuss existing diagnostic tests and test data sets such as CLEVR, CLEVERER, CLOSURE, CURI, Bongard-LOGO, V-PROM, and present our own experimental environment: The KANDINSKYPatterns, named after the Russian artist Wassily Kandinksy, who made theoretical contributions to compositivity, i.e. that all perceptions consist of geometrically elementary individual components. This was experimentally proven by Hubel &Wiesel in the 1960s and became the basis for machine learning approaches such as the Neocognitron and the even later Deep Learning. While KANDINSKYPatterns have computationally controllable properties on the one hand, bringing ground truth, t
Artificial intelligence (AI) enabled products and services are becoming a staple of everyday life. While governments and businesses are eager to enjoy the benefits of AI innovations, the mixed impact of these autonomous and intelligent systems on human well-being has become a pressing issue. This article introduces one of the first international standards focused on the social and ethical implications of AI: The Institute of Electrical and Electronics Engineering (IEEE) Standard (Std) 7010-2020 Recommended Practice for Assessing the Impact of Autonomous and Intelligent Systems on Human Well-being. Incorporating well-being factors throughout the lifecycle of AI is both challenging and urgent and IEEE 7010 provides key guidance for those who design, deploy, and procure these technologies. We begin by articulating the benefits of an approach for AI centered around well-being and the measurement of well-being data. Next, we provide an overview of IEEE 7010, including its key principles and how the standard relates to approaches and perspectives in place in the AI community. Finally, we indicate where future efforts are needed.
In this paper, we analyze the coexistence of a primary and a secondary (cognitive) network when both networks use the IEEE 802.11 based distributed coordination function for medium access control. Specifically, we consider the problem of channel capture by a secondary network that uses spectrum sensing to determine the availability of the channel, and its impact on the primary throughput. We integrate the notion of transmission slots in Bianchi's Markov model with the physical time slots, to derive the transmission probability of the secondary network as a function of its scan duration. This is used to obtain analytical expressions for the throughput achievable by the primary and secondary networks. Our analysis considers both saturated and unsaturated networks. By performing a numerical search, the secondary network parameters are selected to maximize its throughput for a given level of protection of the primary network throughput. The theoretical expressions are validated using extensive simulations carried out in the Network Simulator 2. Our results provide critical insights into the performance and robustness of different schemes for medium access by the secondary network. In p
We propose the first general and scalable framework to design certifiable algorithms for robust geometric perception in the presence of outliers. Our first contribution is to show that estimation using common robust costs, such as truncated least squares (TLS), maximum consensus, Geman-McClure, Tukey's biweight, among others, can be reformulated as polynomial optimization problems (POPs). By focusing on the TLS cost, our second contribution is to exploit sparsity in the POP and propose a sparse semidefinite programming (SDP) relaxation that is much smaller than the standard Lasserre's hierarchy while preserving empirical exactness, i.e., the SDP recovers the optimizer of the nonconvex POP with an optimality certificate. Our third contribution is to solve the SDP relaxations at an unprecedented scale and accuracy by presenting STRIDE, a solver that blends global descent on the convex SDP with fast local search on the nonconvex POP. Our fourth contribution is an evaluation of the proposed framework on six geometric perception problems including single and multiple rotation averaging, point cloud and mesh registration, absolute pose estimation, and category-level object pose and shape
Trajectory prediction is a fundamental problem in computer vision, vision-language-action models, world models, and autonomous systems, with broad impact on autonomous driving, robotics, and surveillance. However, most existing methods assume complete and clean observations, and therefore do not adequately handle out-of-sight agents or noisy sensing signals caused by limited camera coverage, occlusions, and the absence of ground-truth denoised trajectories. These challenges raise safety concerns and reduce robustness in real-world deployment. In this extended study, we introduce major improvements to Out-of-Sight Trajectory (OST), a task for predicting noise-free visual trajectories of out-of-sight objects from noisy sensor observations. Building on our prior work, we expand Out-of-Sight Trajectory Prediction (OOSTraj) from pedestrians to both pedestrians and vehicles, increasing its relevance to autonomous driving, robotics, and surveillance. Our improved Vision-Positioning Denoising Module exploits camera calibration to establish vision-position correspondence, mitigating the lack of direct visual cues and enabling effective unsupervised denoising of noisy sensor signals. Extensi
Generative Artificial Intelligence (AI) is enabling unprecedented automation in content creation and decision support, but it also raises novel risks. This paper presents a first-principles risk assessment framework underlying the IEEE P3396 Recommended Practice for AI Risk, Safety, Trustworthiness, and Responsibility. We distinguish between process risks (risks arising from how AI systems are built or operated) and outcome risks (risks manifest in the AI system's outputs and their real-world effects), arguing that generative AI governance should prioritize outcome risks. Central to our approach is an information-centric ontology that classifies AI-generated outputs into four fundamental categories: (1) Perception-level information, (2) Knowledge-level information, (3) Decision/Action plan information, and (4) Control tokens (access or resource directives). This classification allows systematic identification of harms and more precise attribution of responsibility to stakeholders (developers, deployers, users, regulators) based on the nature of the information produced. We illustrate how each information type entails distinct outcome risks (e.g. deception, misinformation, unsafe re
In the rapidly changing environments of disaster response, planning and decision-making for autonomous agents involve complex and interdependent choices. Although recent advancements have improved traditional artificial intelligence (AI) approaches, they often struggle in such settings, particularly when applied to agents operating outside their well-defined training parameters. To address these challenges, we propose an attention-based cognitive architecture inspired by Dual Process Theory (DPT). This framework integrates, in an online fashion, rapid yet heuristic (human-like) responses (System 1) with the slow but optimized planning capabilities of machine intelligence (System 2). We illustrate how a supervisory controller can dynamically determine in real-time the engagement of either system to optimize mission objectives by assessing their performance across a number of distinct attributes. Evaluated for trajectory planning in dynamic environments, our framework demonstrates that this synergistic integration effectively manages complex tasks by optimizing multiple mission objectives.
Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lies the tools and the methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical ML and scalable general-purpose GPU computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely-used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.
We introduce the Fourier Learning Machine (FLM), a neural network (NN) architecture designed to represent a multidimensional nonharmonic Fourier series. The FLM uses a simple feedforward structure with cosine activation functions to learn the frequencies, amplitudes, and phase shifts of the series as trainable parameters. This design allows the model to create a problem-specific spectral basis adaptable to both periodic and nonperiodic functions. Unlike previous Fourier-inspired NN models, the FLM is the first architecture able to represent a multidimensional Fourier series with a complete set of basis functions in separable form, doing so by using a standard Multilayer Perceptron-like architecture. A one-to-one correspondence between the Fourier coefficients and amplitudes and phase-shifts is demonstrated, allowing for the translation between a full, separable basis form and the cosine phase-shifted one. Additionally, we evaluate the performance of FLMs on several scientific computing problems, including benchmark Partial Differential Equations (PDEs) and a family of Optimal Control Problems (OCPs). Computational experiments show that the performance of FLMs is comparable, and oft
This document contains the outcome of the first Human behaviour and machine intelligence (HUMAINT) workshop that took place 5-6 March 2018 in Barcelona, Spain. The workshop was organized in the context of a new research programme at the Centre for Advanced Studies, Joint Research Centre of the European Commission, which focuses on studying the potential impact of artificial intelligence on human behaviour. The workshop gathered an interdisciplinary group of experts to establish the state of the art research in the field and a list of future research challenges to be addressed on the topic of human and machine intelligence, algorithm's potential impact on human cognitive capabilities and decision making, and evaluation and regulation needs. The document is made of short position statements and identification of challenges provided by each expert, and incorporates the result of the discussions carried out during the workshop. In the conclusion section, we provide a list of emerging research topics and strategies to be addressed in the near future.
Machine learning systems (MLSys) are emerging in the Internet of Things (IoT) to provision edge intelligence, which is paving our way towards the vision of ubiquitous intelligence. However, despite the maturity of machine learning systems and the IoT, we are facing severe challenges when integrating MLSys and IoT in practical context. For instance, many machine learning systems have been developed for large-scale production (e.g., cloud environments), but IoT introduces additional demands due to heterogeneous and resource-constrained devices and decentralized operation environment. To shed light on this convergence of MLSys and IoT, this paper analyzes the trade-offs by covering the latest developments (up to 2020) on scaling and distributing ML across cloud, edge, and IoT devices. We position machine learning systems as a component of the IoT, and edge intelligence as a socio-technical system. On the challenges of designing trustworthy edge intelligence, we advocate a holistic design approach that takes multi-stakeholder concerns, design requirements and trade-offs into consideration, and highlight the future research opportunities in edge intelligence.
The following briefly discusses possible difficulties in communication with and control of an AGI (artificial general intelligence), building upon an explanation of The Fermi Paradox and preceding work on symbol emergence and artificial general intelligence. The latter suggests that to infer what someone means, an agent constructs a rationale for the observed behaviour of others. Communication then requires two agents labour under similar compulsions and have similar experiences (construct similar solutions to similar tasks). Any non-human intelligence may construct solutions such that any rationale for their behaviour (and thus the meaning of their signals) is outside the scope of what a human is inclined to notice or comprehend. Further, the more compressed a signal, the closer it will appear to random noise. Another intelligence may possess the ability to compress information to the extent that, to us, their signals would appear indistinguishable from noise (an explanation for The Fermi Paradox). To facilitate predictive accuracy an AGI would tend to more compressed representations of the world, making any rationale for their behaviour more difficult to comprehend for the same r
Fitting regression models with many multivariate responses and covariates can be challenging, but such responses and covariates sometimes have tensor-variate structure. We extend the classical multivariate regression model to exploit such structure in two ways: first, we impose four types of low-rank tensor formats on the regression coefficients. Second, we model the errors using the tensor-variate normal distribution that imposes a Kronecker separable format on the covariance matrix. We obtain maximum likelihood estimators via block-relaxation algorithms and derive their computational complexity and asymptotic distributions. Our regression framework enables us to formulate tensor-variate analysis of variance (TANOVA) methodology. This methodology, when applied in a one-way TANOVA layout, enables us to identify cerebral regions significantly associated with the interaction of suicide attempters or non-attemptor ideators and positive-, negative- or death-connoting words in a functional Magnetic Resonance Imaging study. Another application uses three-way TANOVA on the Labeled Faces in the Wild image dataset to distinguish facial characteristics related to ethnic origin, age group and
Kinship recognition is a challenging problem with many practical applications. With much progress and milestones having been reached after ten years - we are now able to survey the research and create new milestones. We review the public resources and data challenges that enabled and inspired many to hone-in on the views of automatic kinship recognition in the visual domain. The different tasks are described in technical terms and syntax consistent across the problem domain and the practical value of each discussed and measured. State-of-the-art methods for visual kinship recognition problems, whether to discriminate between or generate from, are examined. As part of such, we review systems proposed as part of a recent data challenge held in conjunction with the 2020 IEEE Conference on Automatic Face and Gesture Recognition. We establish a stronghold for the state of progress for the different problems in a consistent manner. This survey will serve as the central resource for the work of the next decade to build upon. For the tenth anniversary, the demo code is provided for the various kin-based tasks. Detecting relatives with visual recognition and classifying the relationship is
Understanding how humans conceptualize and categorize natural objects offers critical insights into perception and cognition. With the advent of Large Language Models (LLMs), a key question arises: can these models develop human-like object representations from linguistic and multimodal data? In this study, we combined behavioral and neuroimaging analyses to explore the relationship between object concept representations in LLMs and human cognition. We collected 4.7 million triplet judgments from LLMs and Multimodal LLMs (MLLMs) to derive low-dimensional embeddings that capture the similarity structure of 1,854 natural objects. The resulting 66-dimensional embeddings were stable, predictive, and exhibited semantic clustering similar to human mental representations. Remarkably, the dimensions underlying these embeddings were interpretable, suggesting that LLMs and MLLMs develop human-like conceptual representations of objects. Further analysis showed strong alignment between model embeddings and neural activity patterns in brain regions such as EBA, PPA, RSC, and FFA. This provides compelling evidence that the object representations in LLMs, while not identical to human ones, share
Deep neural models in recent years have been successful in almost every field, including extremely complex problem statements. However, these models are huge in size, with millions (and even billions) of parameters, thus demanding more heavy computation power and failing to be deployed on edge devices. Besides, the performance boost is highly dependent on redundant labeled data. To achieve faster speeds and to handle the problems caused by the lack of data, knowledge distillation (KD) has been proposed to transfer information learned from one model to another. KD is often characterized by the so-called `Student-Teacher' (S-T) learning framework and has been broadly applied in model compression and knowledge transfer. This paper is about KD and S-T learning, which are being actively studied in recent years. First, we aim to provide explanations of what KD is and how/why it works. Then, we provide a comprehensive survey on the recent progress of KD methods together with S-T frameworks typically for vision tasks. In general, we consider some fundamental questions that have been driving this research area and thoroughly generalize the research progress and technical details. Additional
This article reviews the "Once learning" mechanism that was proposed 23 years ago and the subsequent successes of "One-shot learning" in image classification and "You Only Look Once - YOLO" in objective detection. Analyzing the current development of Artificial Intelligence (AI), the proposal is that AI should be clearly divided into the following categories: Artificial Human Intelligence (AHI), Artificial Machine Intelligence (AMI), and Artificial Biological Intelligence (ABI), which will also be the main directions of theory and application development for AI. As a watershed for the branches of AI, some classification standards and methods are discussed: 1) Human-oriented, machine-oriented, and biological-oriented AI R&D; 2) Information input processed by Dimensionality-up or Dimensionality-reduction; 3) The use of one/few or large samples for knowledge learning.
We present an innovative of artificial intelligence with column chromatography, aiming to resolve inefficiencies and standardize data collection in chemical separation and purification domain. By developing an automated platform for precise data acquisition and employing advanced machine learning algorithms, we constructed predictive models to forecast key separation parameters, thereby enhancing the efficiency and quality of chromatographic processes. The application of transfer learning allows the model to adapt across various column specifications, broadening its utility. A novel metric, separation probability ($S_p$), quantifies the likelihood of effective compound separation, validated through experimental verification. This study signifies a significant step forward int the application of AI in chemical research, offering a scalable solution to traditional chromatography challenges and providing a foundation for future technological advancements in chemical analysis and purification.
The existing matrix completion methods focus on optimizing the relaxation of rank function such as nuclear norm, Schatten-p norm, etc. They usually need many iterations to converge. Moreover, only the low-rank property of matrices is utilized in most existing models and several methods that incorporate other knowledge are quite time-consuming in practice. To address these issues, we propose a novel non-convex surrogate that can be optimized by closed-form solutions, such that it empirically converges within dozens of iterations. Besides, the optimization is parameter-free and the convergence is proved. Compared with the relaxation of rank, the surrogate is motivated by optimizing an upper-bound of rank. We theoretically validate that it is equivalent to the existing matrix completion models. Besides the low-rank assumption, we intend to exploit the column-wise correlation for matrix completion, and thus an adaptive correlation learning, which is scaling-invariant, is developed. More importantly, after incorporating the correlation learning, the model can be still solved by closed-form solutions such that it still converges fast. Experiments show the effectiveness of the non-convex
Deep reinforcement learning (RL) has been applied extensively to solve complex decision-making problems. In many real-world scenarios, tasks often have several conflicting objectives and may require multiple agents to cooperate, which are the multi-objective multi-agent decision-making problems. However, only few works have been conducted on this intersection. Existing approaches are limited to separate fields and can only handle multi-agent decision-making with a single objective, or multi-objective decision-making with a single agent. In this paper, we propose MO-MIX to solve the multi-objective multi-agent reinforcement learning (MOMARL) problem. Our approach is based on the centralized training with decentralized execution (CTDE) framework. A weight vector representing preference over the objectives is fed into the decentralized agent network as a condition for local action-value function estimation, while a mixing network with parallel architecture is used to estimate the joint action-value function. In addition, an exploration guide approach is applied to improve the uniformity of the final non-dominated solutions. Experiments demonstrate that the proposed method can effectiv