Search - ResearchTracker

Sequence and channel mixers, the core mechanism in sequence models, have become the de facto standard in time series analysis (TSA). However, recent studies have questioned the necessity of complex sequence mixers, such as attention mechanisms, demonstrating that simpler architectures can achieve comparable or even superior performance. This suggests that the benefits attributed to complex sequence mixers might instead emerge from other architectural or optimization factors. Based on this observation, we pose a central question: Are common sequence mixers necessary for time-series analysis? Therefore, we propose JustDense, an empirical study that systematically replaces sequence mixers in various well-established TSA models with dense layers. Grounded in the MatrixMixer framework, JustDense treats any sequence mixer as a mixing matrix and replaces it with a dense layer. This substitution isolates the mixing operation, enabling a clear theoretical foundation for understanding its role. Therefore, we conducted extensive experiments on 29 benchmarks covering five representative TSA tasks using seven state-of-the-art TSA models to address our research question. The results show that rep

Position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

arXiv, 2025-07-30. Authors: Tom Sühr, Florian E. Dorner, Olawale Salaudeen

Large Language Models (LLMs) have achieved remarkable results on a range of standardized tests originally designed to assess human cognitive and psychological traits, such as intelligence and personality. While these results are often interpreted as strong evidence of human-like characteristics in LLMs, this paper argues that such interpretations constitute an ontological error. Human psychological and educational tests are theory-driven measurement instruments, calibrated to a specific human population. Applying these tests to non-human subjects without empirical validation risks mischaracterizing what is being measured. Furthermore, a growing trend frames AI performance on benchmarks as measurements of traits such as "intelligence", despite known issues with validity, data contamination, cultural bias and sensitivity to superficial prompt changes. We argue that interpreting benchmark performance as measurements of human-like traits lacks sufficient theoretical and empirical justification. This leads to our position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead. We call for the development of principled, AI-specific evaluation frameworks t

Search results: instead

JustDense: Just using Dense instead of Sequence Mixer for Time Series analysis

Position: Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead

A Safer, Smaller, Cleaner Subcritical Thorium Fission - Deuteron Fusion Hybrid Reactor: DD Collider Instead of Muonic Fusion

A regular center instead of a black bounce

CoDream: Exchanging dreams instead of models for federated aggregation with heterogeneous models

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Can we only use guideline instead of shot in prompt?

Crystallization Instead of Amorphization in Collision Cascades in Gallium Oxide

Use Model Averaging instead of Model Selection in Pulsar Timing

Keep the Future Human: Why and How We Should Close the Gates to AGI and Superintelligence, and What We Should Build Instead

Make Explicit Calibration Implicit: Calibrate Denoiser Instead of the Noise Model

Facilitating method development for reverse fill/flush flow modulation by using a tunable auxiliary pressure source instead of a fixed bleed capillary

PromptFL: Let Federated Participants Cooperatively Learn Prompts Instead of Models -- Federated Learning in Age of Foundation Model

Can you recommend content to creatives instead of final consumers? A RecSys based on user's preferred visual styles

Stop using the elbow criterion for k-means and how to choose the number of clusters instead

Tuning Synaptic Connections instead of Weights by Genetic Algorithm in Spiking Policy Network

Applied electric field instead of pressure in H-based superconductors

Scalable Proof Producing Multi-Threaded SAT Solving with Gimsatul through Sharing instead of Copying Clauses

Bias in the representative volume element method: periodize the ensemble instead of its realizations

Proof-theoretic dilator and intermediate pointclasses