Peking Opera has been the most dominant form of Chinese performing art for around 200 years. A Peking Opera singer usually exhibits a very strong personal style by introducing improvisation and expressiveness on stage, which leads the actual rhythm and pitch contour to deviate significantly from the original music score. This inconsistency poses a great challenge for Peking Opera singing voice synthesis from a music score. In this work, we propose to deal with this issue and synthesize expressive Peking Opera singing from the music score based on the Duration Informed Attention Network (DurIAN) framework. To tackle the rhythm mismatch, a Lagrange multiplier is used to find the optimal output phoneme duration sequence under the constraint of the given note duration from the music score. As for the pitch contour mismatch, instead of directly inferring from the music score, we adopt a pseudo music score generated from the real singing and feed it as input during training. The experiments demonstrate that with the proposed system we can synthesize Peking Opera singing voice with high-quality timbre, pitch and expressiveness.
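As an illustration of the duration constraint described in this abstract, the sketch below shows one simple way such a problem can be posed; it is an assumption for illustration, not necessarily the formulation used in the paper. It assumes the objective is the squared deviation from model-predicted phoneme durations, with the durations required to sum to the note duration. The function name `fit_durations_to_note` and the example values are hypothetical.

```python
def fit_durations_to_note(predicted, note_duration):
    """Adjust predicted phoneme durations so that they sum to the note duration.

    Assumed objective (not taken from the paper):
        minimize   sum_i (d_i - p_i)^2
        subject to sum_i d_i = T
    """
    if not predicted:
        raise ValueError("need at least one predicted phoneme duration")
    n = len(predicted)
    # Lagrangian: L = sum_i (d_i - p_i)^2 + lam * (sum_i d_i - T).
    # Setting dL/dd_i = 0 gives d_i = p_i - lam / 2.
    # Substituting into the constraint gives lam = 2 * (sum_i p_i - T) / n.
    lam = 2.0 * (sum(predicted) - note_duration) / n
    return [p - lam / 2.0 for p in predicted]


# Hypothetical example: three phonemes predicted at 0.60 s in total,
# sung on a note that lasts 0.75 s in the score.
durations = fit_durations_to_note([0.12, 0.30, 0.18], 0.75)
```

Under this assumed objective the mismatch is spread evenly across the phonemes. The closed form can produce negative durations when the note is much shorter than the predicted total, so a practical system would need an additional non-negativity constraint or a weighted objective.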
This paper presents a method that generates expressive singing voice of Peking opera. The synthesis of expressive opera singing usually requires pitch contours to be extracted as training data, which relies on extraction techniques and cannot be done by manual labeling. With the Duration Informed Attention Network (DurIAN), this paper makes use of musical notes instead of pitch contours for expressive opera singing synthesis. The proposed method enables human annotation to be combined with automatically extracted features as training data, and thus gives extra flexibility in data collection for Peking opera singing synthesis. Compared with the expressive singing voice of Peking opera synthesised by a pitch contour based system, the proposed musical note based system produces comparable singing voice in Peking opera with expressiveness in various aspects.
The Peking University Integral Split Ring Radio Frequency Quadrupole (ISR RFQ) accelerator was constructed in 1999 with a high duty factor of 16.7% and a repetition frequency of 166 Hz, and it was able to accelerate N+, O+, O-, C+ and He+ from 1.6 keV/u to 65 keV/u. It was later upgraded as an injector of the Separated Function RFQ (SFRFQ). The experiments indicated that the maximum accelerated O+ beam current could exceed 3.2 mA with an energy of 1.03 MeV and an energy spread (FWHM) of 3.1%. The beam is then transported through a 1 m long magnetic triplet to the entrance of the SFRFQ and is finally accelerated to 1.64 MeV. The beam conditioning of the RFQ was carefully optimized to satisfy the requirements of the SFRFQ. The combined accelerator can eventually deliver a 0.53 mA O+ beam with an energy of 1.65 MeV, which has sufficiently demonstrated the feasibility of the SFRFQ structure.
Photoinjectors are widely used as electron sources for linear accelerators to generate high-brightness electron beams. The drive laser, which determines the timing structure and quality of the electron beam, is a crucial component of a photoinjector. A new drive laser system has been designed and constructed for the upgraded 3.5-cell DC-SRF photoinjector at Peking University. The drive laser system consists of a 1064 nm laser oscillator, a four-stage amplifier, second and fourth harmonic generators, an optical system to transfer the UV pulses to the photocathode, and a synchronization system. The drive laser system has been successfully applied in the stable operation of the DC-SRF photoinjector and its performance meets the requirements. A 266 nm laser beam with an average power close to 1 W can be delivered to illuminate the Cs2Te photocathode, and the instability is less than 5% in long-term operation. The design considerations for improving the UV laser quality, a detailed description of the laser system, and its performance are presented in this paper.
We introduce the LAMOST Stellar Parameter Pipeline at Peking University --- LSP3, developed and implemented for the determinations of radial velocity $V_{\rm r}$ and stellar atmospheric parameters (effective temperature $T_{\rm eff}$, surface gravity log\,$g$, metallicity [Fe/H]) for the LAMOST Spectroscopic Survey of the Galactic Anti-center (LSS-GAC). We describe the algorithms of LSP3 and examine the accuracy of parameters yielded by it. The precision and accuracy of parameters yielded are investigated by comparing results of multi-epoch observations and of candidate members of open and globular clusters, with photometric calibration, as well as with independent determinations available from a number of external databases, including the PASTEL archive, the APOGEE, SDSS and RAVE surveys, as well as those released in the LAMOST DR1. The uncertainties of LSP3 parameters are characterized and quantified as a function of the spectral signal-to-noise ratio (SNR) and stellar atmospheric parameters. We conclude that the current implementation of LSP3 has achieved an accuracy of 5.0\,km\,s$^{-1}$, 150\,K, 0.25\,dex, 0.15\,dex for the radial velocity, effective temperature, surface gravit
Hyperkalemia is a life-threatening electrolyte disorder that is common in patients with chronic kidney disease and heart failure, yet frequent monitoring remains difficult outside hospital settings. We developed and validated Pocket-K, a single-lead AI-ECG system initialized from the ECGFounder foundation model for non-invasive hyperkalemia screening and handheld deployment. In this multicentre observational study using routinely collected clinical ECG and laboratory data, 34,439 patients contributed 62,290 ECG--potassium pairs. Lead I data were used to fine-tune the model. Data from Peking University People's Hospital were divided into development and temporal validation sets, and data from The Second Hospital of Tianjin Medical University served as an independent external validation set. Hyperkalemia was defined as venous serum potassium > 5.5 mmol/L. Pocket-K achieved AUROCs of 0.936 in internal testing, 0.858 in temporal validation, and 0.808 in external validation. For KDIGO-defined moderate-to-severe hyperkalemia (serum potassium >= 6.0 mmol/L), AUROCs increased to 0.940 and 0.861 in the temporal and external sets, respectively. External negative predictive value exceed
Attention Deficit Hyperactivity Disorder (ADHD) is a highly prevalent neurodevelopmental condition; however, its neurobiological diagnosis remains challenging due to the lack of reliable imaging-based biomarkers, particularly anatomical markers. Structural MRI (sMRI) provides a non-invasive modality for investigating brain alterations associated with ADHD; nevertheless, most deep learning approaches function as black-box systems, limiting clinical trust and interpretability. In this work, we propose DuSCN-FusionNet, an interpretable sMRI-based framework for ADHD classification that leverages dual-channel Structural Covariance Networks (SCNs) to capture inter-regional morphological relationships. ROI-wise mean intensity and intra-regional variability descriptors are used to construct intensity-based and heterogeneity-based SCNs, which are processed through an SCN-CNN encoder. In parallel, auxiliary ROI-wise variability features and global statistical descriptors are integrated via late-stage fusion to enhance performance. The model is evaluated using stratified 10-fold cross-validation with a 5-seed ensemble strategy, achieving a mean balanced accuracy of 80.59% and an AUC of 0.778
Performance artforms like Peking opera face transmission challenges due to the extensive passive listening required to understand their nuance. To create engaging forms of experiencing auditory Intangible Cultural Heritage (ICH), we designed a spatial interaction-based segmented-audio (SISA) Virtual Reality system that transforms passive ICH experiences into active ones. We undertook: (1) a co-design workshop with seven stakeholders to establish design requirements, (2) prototyping with five participants to validate design elements, and (3) user testing with 16 participants exploring Peking Opera. We designed transformations of temporal music into spatial interactions by cutting sounds into short audio segments and applying the t-SNE algorithm to cluster the audio segments spatially. Users navigate through these sounds by their similarity in audio properties. Analysis revealed two distinct interaction patterns (Progressive and Adaptive), and demonstrated SISA's efficacy in facilitating active auditory ICH engagement. Our work illuminates the design process for enriching traditional performance artforms using spatially-tuned forms of listening.
Attention Deficit Hyperactivity Disorder (ADHD) is a prevalent neurodevelopmental disorder whose neuroimaging-based diagnosis remains challenging due to complex time-varying disruptions in brain connectivity. Functional MRI (fMRI) provides a powerful non-invasive modality for identifying functional alterations. Existing deep learning (DL) studies employ diverse neuroimaging features; however, static functional connectivity remains widely used, whereas dynamic connectivity modeling is comparatively underexplored. Moreover, many DL models lack interpretability. In this work, we propose D-GATNet, an interpretable temporal graph-based framework for automated ADHD classification using dynamic functional connectivity (dFC). Sliding-window Pearson correlation constructs sequences of functional brain graphs with regions of interest as nodes and connectivity strengths as edges. Spatial dependencies are learned via a multi-layer Graph Attention Network, while temporal dynamics are modeled using 1D convolution followed by temporal attention. Interpretability is achieved through graph attention weights revealing dominant ROI interactions, ROI importance scores identifying influential regions,
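The sliding-window construction of dynamic functional connectivity mentioned in this abstract can be sketched roughly as follows. This is a minimal pure-Python illustration, not the D-GATNet implementation: the function names, the window and stride values, the toy signals, and the choice to return 0.0 for a constant signal are all assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant signal as uncorrelated
    return cov / sqrt(var_x * var_y)


def sliding_window_graphs(roi_series, window, stride):
    """Return one ROI-by-ROI correlation matrix per time window.

    roi_series: list of ROI time series, all of the same length.
    Each matrix is a weighted graph: ROIs are nodes, correlations are edges.
    """
    length = len(roi_series[0])
    graphs = []
    for start in range(0, length - window + 1, stride):
        segments = [roi[start:start + window] for roi in roi_series]
        graphs.append([[pearson(a, b) for b in segments] for a in segments])
    return graphs


# Toy signals for three hypothetical ROIs (six time points each).
roi_signals = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
]
graphs = sliding_window_graphs(roi_signals, window=4, stride=2)
```

The resulting list is a time-ordered sequence of graphs, which is the kind of input a spatial encoder followed by a temporal model would consume. Real fMRI pipelines typically use array libraries and much longer series.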
Large Vision-Language Models (LVLMs) can reason from image-text inputs and perform well in various multimodal tasks. Despite this success, they are affected by language priors and often produce hallucinations. Hallucinations denote generated content that is grammatically and syntactically coherent, yet bears no match or direct relevance to visual input. To address this problem, we propose Residual Decoding (ResDec). It is a novel training-free method that uses historical information to aid decoding. The method relies on the internal implicit reasoning mechanism and token logits evolution mechanism of LVLMs to correct biases. Extensive experiments demonstrate that ResDec effectively suppresses hallucinations induced by language priors, significantly improves visual grounding, and reduces object hallucinations. In addition to mitigating hallucinations, ResDec also performs exceptionally well on comprehensive LVLM benchmarks, highlighting its broad applicability.
Dialect speech embodies rich cultural and linguistic diversity, yet building text-to-speech (TTS) systems for dialects remains challenging due to scarce data, inconsistent orthographies, and complex phonetic variation. To address these issues, we present DiaMoE-TTS, a unified IPA-based framework that standardizes phonetic representations and resolves grapheme-to-phoneme ambiguities. Built upon the F5-TTS architecture, the system introduces a dialect-aware Mixture-of-Experts (MoE) to model phonological differences and employs parameter-efficient adaptation with Low-Rank Adaptors (LoRA) and Conditioning Adapters for rapid transfer to new dialects. Unlike approaches dependent on large-scale or proprietary resources, DiaMoE-TTS enables scalable, open-data-driven synthesis. Experiments demonstrate natural and expressive speech generation, achieving zero-shot performance on unseen dialects and specialized domains such as Peking Opera with only a few hours of data.
Due to time constraints, mental health professionals in China are unable to offer patients prolonged talk therapy, leaving a gap in care for patients with psychological disorders, including aberrant sleep and eating patterns, maladaptive explanatory styles, and gastrointestinal dysfunction. To bridge this gap in care and address these problems in a large-scale manner, we built NeuroPal, a large language model (LLM) assistant that provides scalable, evidence-based interventions with three clinically validated modules: (1) a sleep chronotherapy planner to output personalized circadian rhythm correction protocols, (2) a cognitive-behavioral reframing engine grounded in CBT and humanistic principles to shift negative attributional biases, and (3) a biochemical regulation advisor to output phytotherapy formulations to regulate sleep-metabolism-gut-axis imbalances. In collaboration with Peking Union Medical College Hospital and Xiangya Hospital Central South University, we ran an RCT protocol with 513 participants with mood/anxiety disorders and showed statistically significant improvements towards primary endpoints (p<.01). The experiment shows a 37.2% drop in the Pittsburgh Sleep Qua
Foundational models have emerged as powerful tools for addressing various tasks in clinical settings. However, their potential for breast ultrasound analysis remains untapped. In this paper, we present BUSGen, the first foundational generative model specifically designed for breast ultrasound image analysis. Pretrained on over 3.5 million breast ultrasound images, BUSGen has acquired extensive knowledge of breast structures, pathological features, and clinical variations. With few-shot adaptation, BUSGen can generate repositories of realistic and informative task-specific data, facilitating the development of models for a wide range of downstream tasks. Extensive experiments highlight BUSGen's exceptional adaptability, significantly exceeding real-data-trained foundational models in breast cancer screening, diagnosis, and prognosis. In breast cancer early diagnosis, our approach outperformed all board-certified radiologists (n=9), achieving an average sensitivity improvement of 16.5% (P-value<0.0001). Additionally, we characterized the scaling effect of using generated data, which was as effective as the collected real-world data for training diagnostic models. Moreove
We investigate in this paper the so-called pointed Shafarevich problem for families of primitive symplectic varieties. More precisely, for any fixed pointed curve $(B, 0)$ and any fixed primitive symplectic variety $X$, among all locally trivial families of $\mathbb{Q}$-factorial and terminal primitive symplectic varieties over $B$ whose fiber over $0$ is isomorphic to $X$, we show that there are only finitely many isomorphism classes of generic fibers. Moreover, assuming semi-ampleness of isotropic nef divisors, which holds true for all hyper-Kähler manifolds of known deformation types, we show that there are only finitely many such projective families up to isomorphism. These results are optimal since we can construct infinitely many pairwise non-isomorphic (not necessarily projective) families of smooth hyper-Kähler varieties over some pointed curve $(B, 0)$ such that they are all isomorphic over the punctured curve $B\backslash \{0\}$ and have isomorphic fibers over the base point $0$.
Objective: This study introduces ChatSchema, an effective method for extracting and structuring information from unstructured data in medical paper reports using a combination of Large Multimodal Models (LMMs) and Optical Character Recognition (OCR) based on a schema. By integrating a predefined schema, we intend to enable LMMs to directly extract and standardize information according to the schema specifications, facilitating further data entry. Method: Our approach involves a two-stage process, including classification and extraction, for categorizing report scenarios and structuring information. We established and annotated a dataset to verify the effectiveness of ChatSchema, and evaluated key extraction using precision, recall, F1-score, and accuracy metrics. Based on key extraction, we further assessed value extraction. We conducted ablation studies on two LMMs to illustrate the improvement of structured information extraction with different input modalities and methods. Result: We analyzed 100 medical reports from Peking University First Hospital and established a ground truth dataset with 2,945 key-value pairs. We evaluated ChatSchema using GPT-4o and Gemini 1.5 Pro and found a h
Large language models (LLMs) like ChatGPT show excellent capabilities in various natural language processing tasks, especially for text generation. The effectiveness of LLMs in summarizing radiology report impressions remains unclear. In this study, we explore the capability of eight LLMs on radiology report impression summarization. Three types of radiology reports, i.e., CT, PET-CT, and Ultrasound reports, are collected from Peking University Cancer Hospital and Institute. We use the report findings to construct zero-shot, one-shot, and three-shot prompts with complete example reports to generate the impressions. Besides the automatic quantitative evaluation metrics, we define five human evaluation metrics, i.e., completeness, correctness, conciseness, verisimilitude, and replaceability, to evaluate the semantics of the generated impressions. Two thoracic surgeons (ZSY and LB) and one radiologist (LQ) compare the generated impressions with the reference impressions and score each impression under the five human evaluation metrics. Experimental results show that there is a gap between the generated impressions and reference impressions. Although the LLMs achieve comparable
This study investigates the utility of speech signals for AI-based depression screening across varied interaction scenarios, including psychiatric interviews, chatbot conversations, and text readings. Participants include depressed patients recruited from the outpatient clinics of Peking University Sixth Hospital and control group members from the community, all diagnosed by psychiatrists following standardized diagnostic protocols. We extracted acoustic and deep speech features from each participant's segmented recordings. Classifications were made using neural networks or SVMs, with aggregated clip outcomes determining final assessments. Our analysis across interaction scenarios, speech processing techniques, and feature types confirms speech as a crucial marker for depression screening. Specifically, human-computer interaction matches clinical interview efficacy, surpassing reading tasks. Segment duration and quantity significantly affect model performance, with deep speech features substantially outperforming traditional acoustic features.
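The step in which aggregated clip outcomes determine the final assessment could look something like the sketch below. The abstract does not state the aggregation rule, so averaging clip-level probabilities and thresholding at 0.5 are assumptions made for illustration, as are the function name `aggregate_clip_scores` and the example scores.

```python
def aggregate_clip_scores(clip_probs, threshold=0.5):
    """Combine clip-level probabilities into one participant-level decision.

    Assumption: the mean clip probability is compared with a fixed threshold.
    The study may use a different rule, such as majority voting.
    """
    if not clip_probs:
        raise ValueError("need at least one clip-level probability")
    mean_prob = sum(clip_probs) / len(clip_probs)
    return mean_prob, mean_prob >= threshold


# Hypothetical classifier outputs for three clips from one participant.
participant_score, screened_positive = aggregate_clip_scores([0.8, 0.6, 0.4])
```

Averaging makes the participant-level score less sensitive to a single noisy clip, which is one plausible reason segment quantity would affect performance.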
GWnext 2024 was a meeting held at the Kavli Institute for Astronomy and Astrophysics at Peking University on March $4^\text{th} - 8^\text{th}$, 2024. At the meeting, researchers at different career stages -- with a particular focus on early career scientists -- working on different aspects of gravitational wave (GW) astronomy gathered to discuss the current status as well as the prospects of the field. The meeting was divided into three core sessions: Astrophysics, GW Theory, and Detection. Each session consisted of introductory talks and extended discussion sessions. Moreover, there was a poster session where students could present their results. In this paper, we summarize the results presented during the meeting and present the most important outcomes.
We study the modulated Korteweg-de~Vries equation (KdV) on the circle with a time non-homogeneous modulation acting on the linear dispersion term. By adapting the normal form approach to the modulated setting, we prove sharp unconditional uniqueness of solutions to the modulated KdV in $L^2(\mathbb T)$ if a modulation is sufficiently irregular. For example, this result implies that if the modulation is given by a sample path of a fractional Brownian motion with Hurst index $0 < H < \frac 25$, the modulated KdV on the circle is unconditionally well-posed in $L^2(\mathbb T)$. Our normal form approach provides the construction of solutions to the modulated KdV (and the associated nonlinear Young integral) {\it without} assuming any positive regularity in time. As an interesting byproduct of our normal form approach, we extend the construction of the nonlinear Young integral to a much larger class of functions, and obtain an improved Euler approximation scheme as compared to the classical sewing lemma approach. We also establish analogous sharp unconditional uniqueness results for the modulated Benjamin-Ono equation and the modulated derivative nonlinear Schrödinger equation (NLS
The atmospheres within our Solar System can be categorized into four distinct climate regimes: "terrestrial", "Jovian", "condensable", and "exosphere". Beyond the three terrestrial planets (excluding Mercury) and the four giant planets, collisional atmospheres are also found on smaller celestial bodies such as Jupiter's moon Io, Saturn's moon Titan, Neptune's moon Triton, and Pluto. This article reviews the key characteristics of these atmospheres and the underlying physical and chemical processes that govern them. I focus on their thermal structures, chemical constituents, wind patterns, and the origins and losses of the atmospheres, and highlight the critical roles of surface ices and liquids, atmospheric hazes, and the space environments of their host planets in shaping these atmospheres. I dedicate this article to Prof. Zuo Xiao (1936-2024) at Peking University.