Recently, end-to-end neural diarization (EEND) was introduced and has achieved promising results in speaker-overlapped scenarios. In EEND, speaker diarization is formulated as a multi-label prediction problem, where speaker activities are estimated independently and their dependencies are not well considered. To overcome these disadvantages, we employ power set encoding to reformulate speaker diarization as a single-label classification problem and propose the overlap-aware EEND (EEND-OLA) model, in which speaker overlaps and dependencies can be modeled explicitly. Inspired by the success of two-stage hybrid systems, we further propose a novel Two-stage OverLap-aware Diarization framework (TOLD), which adds a speaker overlap-aware post-processing (SOAP) model to iteratively refine the diarization results of EEND-OLA. Experimental results show that, compared with the original EEND, the proposed EEND-OLA achieves a 14.39% relative improvement in diarization error rate (DER), and utilizing SOAP provides another 19.33% relative improvement. As a result, our method TOLD achieves a DER of 10.14% on the CALLHOME dataset, which is a new state-of-the-art result on this benchmark to th
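The power set encoding mentioned above can be illustrated with a minimal sketch. It assumes a full power set over a fixed number of speakers and a bit order in which speaker k contributes 2**k to the class index. Both choices, and the function names `powerset_encode` and `powerset_decode`, are illustrative assumptions rather than details taken from the paper.

```python
from itertools import product


def powerset_encode(activities):
    """Map per-speaker binary activities for one frame to a single class index.

    Assumption: speaker k contributes 2**k, so (1, 1, 0) -> 1 + 2 = 3.
    """
    return sum(bit << k for k, bit in enumerate(activities))


def powerset_decode(label, num_speakers):
    """Invert powerset_encode: recover the per-speaker activity tuple."""
    return tuple((label >> k) & 1 for k in range(num_speakers))


# Toy frames for three speakers: silence, speaker 0 alone,
# speakers 0 and 1 overlapping, speaker 2 alone.
frames = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1)]
labels = [powerset_encode(f) for f in frames]

# Every one of the 2**3 activity patterns gets its own class,
# so overlap patterns become ordinary single-label targets.
all_patterns = list(product((0, 1), repeat=3))
all_labels = sorted(powerset_encode(p) for p in all_patterns)
```

With this encoding a classifier predicts one class per frame, so an overlap such as (1, 1, 0) is a distinct class instead of two independent binary decisions.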
Authors: Bram M. A. van Dijk, Max J. van Duijn, Suzan Verberne
In this resource paper we release ChiSCor, a new corpus containing 619 fantasy stories, told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character perspectives, and unravelling language and cognition in development, with computational tools. Unlike existing resources, ChiSCor's stories were produced in natural contexts, in line with recent calls for more ecologically valid datasets. ChiSCor hosts text, audio, and annotations for character complexity and linguistic complexity. Additional metadata (e.g. education of caregivers) is available for one third of the Dutch children. ChiSCor also includes a small set of 62 English stories. This paper details how ChiSCor was compiled and shows its potential for future work with three brief case studies: i) we show that the syntactic complexity of stories is strikingly stable across children's ages; ii) we extend work on Zipfian distributions in free speech and show that ChiSCor obeys Zipf's law closely, reflecting its social context; iii) we show that even though ChiSCor is relatively small, the corpus is rich enough to train informative lemma vectors that allow us to analyse children's langu
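One simple way to check how closely a token stream follows Zipf's law is to fit a straight line to log frequency against log rank. A slope near -1 corresponds to the classical form of the law. The sketch below uses ordinary least squares on a toy token list. The function name `zipf_slope` and the toy data are assumptions for illustration, and this is not necessarily the fitting procedure used for ChiSCor.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct token types. This is a crude estimator
    and is shown only to make the idea concrete.
    """
    counts = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data whose frequencies are exactly 60 / rank (60, 30, 20, 15, 12).
toy_tokens = (["a"] * 60 + ["b"] * 30 + ["c"] * 20 + ["d"] * 15 + ["e"] * 12)
slope = zipf_slope(toy_tokens)
```

On real text the fit is usually restricted to a range of ranks, because the lowest-frequency words distort a plain least-squares line.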
Authors: Elaine Fehrman, Vincent Egan, Alexander N. Gorban
This is a preprint version of the first book in the series "Stories told by data". The book tells a story about the psychological traits associated with drug consumption. The book includes:
- A review of published works on the psychological profiles of drug users.
- Analysis of a new original database with information on 1885 respondents and usage of 18 drugs. (The database is available online.)
- An introductory description of the data mining and machine learning methods used for the analysis of this dataset.
- A demonstration that personality traits (five factor model, impulsivity, and sensation seeking), together with simple demographic data, make it possible to predict the risk of consumption of individual drugs with sensitivity and specificity above 70% for most drugs.
- An analysis of correlations in the use of different substances and a description of the groups of drugs with correlated use (correlation pleiades).
- Proof of significant differences between the personality profiles of users of different drugs. This is explicitly proved for benzodiazepines, ecstasy, and heroin.
- Tables of personality profiles for users and non-users of 18 substances.
The book is aime
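Since the abstract reports classifier quality as sensitivity and specificity, a short sketch of how the two are computed from binary labels may help. The labels below are made-up toy values, not data from the book, and the function name `sensitivity_specificity` is an illustrative assumption.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = user, 0 = non-user).

    Sensitivity = TP / (TP + FN): share of actual users that are flagged.
    Specificity = TN / (TN + FP): share of actual non-users that are cleared.
    Assumes both classes are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 4 actual users and 4 actual non-users.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(toy_true, toy_pred)
```

Reporting both numbers matters when classes are imbalanced, because a model can reach high accuracy while missing most users of a rarely consumed drug.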
We present MUSE observations in the core of the HFF galaxy cluster MACS J1149.5+2223, where the first magnified and spatially-resolved multiple images of SN 'Refsdal' at redshift 1.489 were detected. Thanks to a DDT program with the VLT and the extraordinary efficiency of MUSE, we measure 117 secure redshifts with just 4.8 hours of total integration time on a single target pointing. We spectroscopically confirm 68 galaxy cluster members, with redshift values ranging from 0.5272 to 0.5660, and 18 multiple images belonging to 7 background, lensed sources with redshifts between 1.240 and 3.703. Starting from the combination of our catalog with those obtained from extensive spectroscopic and photometric campaigns using the HST, we select a sample of 300 (164 spectroscopic and 136 photometric) cluster members, within approximately 500 kpc of the BCG, and a set of 88 reliable multiple images associated with 10 different background source galaxies and 18 distinct knots in the spiral galaxy hosting SN 'Refsdal'. We exploit this valuable information to build 6 detailed strong lensing models, the best of which reproduces the observed positions of the multiple images with an rms offs
Authors: Jenny G. Sorce, Stefan Gottlöber, Gustavo Yepes
Galaxy clusters can play a key role in modern cosmology provided their evolution is properly understood. However, observed clusters give us only a single snapshot of their dynamical state. Therefore, finding present-day observables of clusters that correlate well with their assembly history constitutes an invaluable tool for cosmology. Previous studies correlating environmental descriptors of clusters with their formation history are dominated by halo mass - environment relations. This paper presents a mass-free correlation between the present neighbor distribution of cluster-size halos and their mass assembly history. From the Big Multidark simulation, we extract two large samples of random halos with masses ranging from Virgo to Coma cluster sizes. Additionally, to find the main environmental culprit for the formation history of the Virgo cluster, we compare the Virgo-size halos to 200 Virgo-like halos extracted from simulations that resemble the local Universe. The number of neighbors at different cluster-centric distances permits discriminating between clusters with different mass accretion histories. Similarly to Virgo-like halos, clusters with numerous neighbors within
Authors: S. Thatikonda, F. N. De Oliveira-Lopes, A. Mustonen
We investigate the nonlinear formation of plasmoids in 2D low-beta current sheets through the interplay between the Kelvin-Helmholtz instability (KHI) and the lower-hybrid drift instability (LHDI). Using a hybrid kinetic-gyrokinetic model-based Super Simple Vlasov (ssV) code with fully kinetic ions and drift-kinetic electrons, we simulate Harris-type current sheets and velocity shear layers with strong cross-field density gradients. Our central hypothesis is that steep density gradients drive LHDI, which can grow faster than KHI and initiate an inverse cascade from kinetic to fluid scales, potentially suppressing KHI. Our simulations confirm that, in thin current sheets, LHDI develops rapidly at the sheet edges and nonlinearly merges into larger-scale magnetic islands before KHI can evolve. These LHDI-driven structures distort the velocity shear and suppress classical KH vortices. In contrast, for thicker current sheets or weaker density gradients, KHI dominates and produces the expected rolled-up vortices and associated plasmoids. These findings demonstrate that LHDI-induced turbulence can act as both a seed and a regulator of plasmoid-generating instabilities, mediating cross-sca
Optimizing the performance of magnetic confinement fusion devices is critical to achieving an attractive fusion reactor design. Negative triangularity (NT) scenarios have been shown to achieve excellent levels of energy confinement, while avoiding edge localized modes (ELMs). Modeling turbulent transport in the edge and scrape-off layer (SOL) is key to understanding the impact of NT on turbulence and extrapolating the results to future devices and regimes. Previous gyrokinetic turbulence studies have reported beneficial effects of NT across a broad range of parameters. However, most simulations have focused on the inner plasma region, neglecting the impact of NT on the outermost edge. In this work, we investigate the effect of NT in edge and SOL simulations, including the magnetic X-point and separatrix. For the first time, we employ a multi-fidelity approach, combining global, non-linear gyrokinetic simulations with drift-reduced fluid simulations, to gain a deeper understanding of the underlying physics at play. First-principles simulations using the GENE-X code demonstrate that in comparable NT and positive triangularity (PT) geometries, similar profiles are achieved, while the turbulent heat flux is re
Authors: Sreenivasa chary Thatikonda, F. N. De Oliveira-Lopes, A. Mustonen
The super simple Vlasov (ssV) code was developed to study instabilities, turbulence, and reconnection in weakly magnetized plasmas, such as the solar wind in the dissipation range and the edge of fusion plasmas. The ssV code overcomes the limitations of standard gyrokinetic theory by using a hybrid model that incorporates fully kinetic ions and gyrokinetic electrons. This hybrid gyrokinetic model enables accurate modeling in regimes characterized by steep gradients and high-frequency dynamics. To achieve this, ssV implements a set of semi-Lagrangian numerical schemes, including Positive Flux Conservative (PFC), Flux Conservative fifth-order (FCV), FCV with Umeda limiters, and a Semi-Lagrangian Monotonicity-Preserving fifth-order scheme (SLMP5). Benchmark problems such as Landau damping, ion-acoustic waves, ion Bernstein waves, and kinetic Alfvén waves were employed to evaluate the schemes. The SLMP5 scheme consistently delivered the best overall accuracy and numerical stability. The code also addresses well-known electromagnetic gyrokinetic simulation issues, such as the Ampère cancellation problem, using carefully chosen velocity-space resolutions and accurate integral
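To make the semi-Lagrangian idea concrete, here is a minimal sketch of one advection step on a periodic 1D grid with constant velocity. It uses plain linear interpolation at the departure point, which is a much simpler stand-in for the PFC, FCV, and SLMP5 schemes named above and is not taken from the ssV code. The function name `semi_lagrangian_step` and all parameter names are illustrative assumptions.

```python
import math


def semi_lagrangian_step(f, velocity, dt, dx):
    """Advance f by one step of df/dt + velocity * df/dx = 0 on a periodic grid.

    Each grid point traces its characteristic back by velocity * dt and
    takes the linearly interpolated value found at that departure point.
    """
    n = len(f)
    shift = velocity * dt / dx  # displacement measured in grid cells
    new_f = []
    for i in range(n):
        departure = i - shift          # departure point in index units
        j = math.floor(departure)      # grid point at or left of departure
        w = departure - j              # fractional distance past point j
        new_f.append((1.0 - w) * f[j % n] + w * f[(j + 1) % n])
    return new_f


# A single spike moved by exactly one cell is reproduced without error.
spike = [0.0, 1.0, 0.0, 0.0]
moved_one_cell = semi_lagrangian_step(spike, velocity=1.0, dt=1.0, dx=1.0)

# A half-cell shift spreads the spike over two cells (numerical diffusion).
moved_half_cell = semi_lagrangian_step(spike, velocity=0.5, dt=1.0, dx=1.0)
```

The smearing in the half-cell case is the kind of numerical diffusion that higher-order, monotonicity-preserving interpolation is designed to reduce.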