We study the statistics of record-breaking events in daily stock prices of 366 stocks from the Standard and Poor's 500 stock index. Both the record events in the daily stock prices themselves and the records in the daily returns are discussed. In both cases we try to describe the record statistics of the stock data with simple theoretical models. The daily returns are compared to i.i.d. random variables, and the stock prices are modeled using a biased random walk, for which the record statistics are known. These models agree partly with the behavior of the stock data, but we also identify several interesting deviations. Most importantly, the number of records in the stocks appears to be systematically decreased in comparison with the random walk model. Considering the autoregressive AR(1) process, we can predict the record statistics of the daily stock prices more accurately. We also compare the stock data with simulations of the record statistics of the more complicated GARCH(1,1) model, which, in combination with the AR(1) model, gives the best agreement with the observational data. To better understand our findings, we discuss the survival and first-passage times of stock prices on certain in
The longest increasing subsequence (LIS) of a random walk has been studied mainly for zero-mean, symmetric step increments. We numerically investigate the LIS of biased Gaussian random walks, with unit-variance increments and positive drift $μ_{p} = Φ^{-1}(p)$, where $p = P(ξ>0)$. In contrast with the symmetric case, we find that for every fixed $p>1/2$ the mean LIS length grows linearly, $\langle L_{n}(p)\rangle \sim a(p)n$, with $a(p)$ increasing from $0$ at $p=1/2$ to $1$ as $p \to 1$. The record count is also linear, with coefficient $λ(p)$ fixed by Spitzer's formula for the ascending ladder epoch, and the LIS becomes increasingly aligned with this record skeleton as $p$ grows. At the symmetric point $p=1/2$, the record skeleton collapses to the Sparre Andersen $\sqrt{n}$ scale, while the LIS returns to the finite-variance $\sqrt{n}\log{n}$ regime. Near this limit the record rate has the closed-form small-drift slope $λ(μ_{p}) \simeq \sqrt{2}\,μ_{p}$, whereas the excess $a(μ_{p})-λ(μ_{p})$ vanishes more slowly than linearly in the drift, although our data do not resolve a single power law. The empirical distribution of $L_{n}$ also changes across this point, from lognorma
We employ the novel theory of heterogeneous extreme value statistics to accurately estimate the ultimate world records for the 100-m running race, for men and for women. For this aim we collected data from 1991 through 2023 from thousands of top athletes, using multiple fast times per athlete. We consider the left endpoint of the probability distribution of the running times of a top athlete and define the ultimate world record as the minimum, over all top athletes, of all these endpoints. For men we estimate the ultimate world record to be 9.56 seconds. More prudently, employing this heterogeneous extreme value theory we construct an accurate asymptotic 95% lower confidence bound on the ultimate world record of 9.49 seconds, still quite close to the present world record of 9.58. For the women's 100-meter dash our point estimate of the ultimate world record is 10.34 seconds, somewhat lower than the world record of 10.49. The more prudent 95% lower confidence bound on the women's ultimate world record is 10.20.
For a zero-mean, unit-variance second-order stationary univariate Gaussian process we derive the probability that a record at the time $n$, say $X_n$, takes place and derive its distribution function. We study the joint distribution of the arrival time process of records and the distribution of the increments between the first and second record, and the third and second record and we compute the expected number of records. We also consider two consecutive and non-consecutive records, one at time $j$ and one at time $n$ and we derive the probability that the joint records $(X_j,X_n)$ occur as well as their distribution function. The probability that the records $X_n$ and $(X_j,X_n)$ take place and the arrival time of the $n$-th record, are independent of the marginal distribution function, provided that it is continuous. These results actually hold for a second-order stationary process with Gaussian copulas. We extend some of these results to the case of a multivariate Gaussian process. Finally, for a strictly stationary process satisfying some mild conditions on the tail behavior of the common marginal distribution function $F$ and the long-range dependence of the extremes of the p
This paper proves a conditional structural uniqueness theorem for induced weight on robust record sectors within an admissible Hilbert record layer. Its theorem target and additive carrier differ from those of the standard Born-rule routes: additivity is not placed on the full projector lattice, but on disjoint admissible continuation bundles through an extensive bundle valuation, from which the sector-level additive law is inherited under admissible refinement. Accordingly, the result is not a Gleason-type representation theorem in different language, but a distinct uniqueness theorem about induced sector weight inherited from bundle additivity on admissible continuation structure. Under two explicit structural conditions, internal equivalence of admissible binary refinement profiles and sufficient admissible refinement richness, the quadratic assignment is the only non-negative refinement-stable induced weight on robust record sectors. In the main theorem, refinement richness is secured by admissible binary saturation. A supplementary proposition shows that dense admissible saturation already suffices if continuity of the profile function is added. Under normalization, the result
In this paper we examine some relative orderings of upper and lower records. It is shown that if m > n, the mth upper record ages faster than the nth upper record, where the data sets come from a sequence of independent and identically distributed observations from a continuous distribution. Sufficient conditions are also obtained to determine whether the mth upper record arising from a continuous distribution ages faster in terms of the relative hazard rate than the nth upper record arising from another continuous distribution. It is also shown that the reversed hazard rate of the mth lower record decreases faster than the reversed hazard rate of the nth lower record, when m > n. The preservation property of the relative reversed hazard rate order at lower record values is investigated. Several examples are presented to examine the results.
Veterinary medical records represent a large data resource for application to veterinary and One Health clinical research efforts. Use of the data is limited by interoperability challenges including inconsistent data formats and data siloing. Clinical coding using standardized medical terminologies enhances the quality of medical records and facilitates their interoperability with veterinary and human health records from other sites. Previous studies, such as DeepTag and VetTag, evaluated the application of Natural Language Processing (NLP) to automate veterinary diagnosis coding, employing long short-term memory (LSTM) and transformer models to infer a subset of Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT) diagnosis codes from free-text clinical notes. This study expands on these efforts by incorporating all 7,739 distinct SNOMED-CT diagnosis codes recognized by the Colorado State University (CSU) Veterinary Teaching Hospital (VTH) and by leveraging the increasing availability of pre-trained language models (LMs). Thirteen freely available pre-trained LMs were fine-tuned on the free-text notes from 246,473 manually coded veterinary patient visits included in the CSU
Let $X_1,X_2,\dots$ be independent and identically distributed random variables on the real line with a joint continuous distribution function $F$. The stochastic behavior of the sequence of subsequent records is well known. Alternatively to that, we investigate the stochastic behavior of arbitrary $X_j,X_k,j<k$, under the condition that they are records, without knowing their orders in the sequence of records. The results are completely different. In particular it turns out that the distribution of $X_k$, being a record, is not affected by the additional knowledge that $X_j$ is a record as well. On the contrary, the distribution of $X_j$, being a record, is affected by the additional knowledge that $X_k$ is a record as well. If $F$ has a density, then the gain of this additional information, measured by the corresponding Kullback-Leibler distance, is $j/k$, independent of $F$. We derive the limiting joint distribution of two records, which is not a bivariate extreme value distribution. We extend this result to the case of three records. In a special case we also derive the limiting joint distribution of increments among records.
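The setting above rests on a classical distribution-free fact about i.i.d. continuous sequences: $X_k$ is a record with probability $1/k$, and the events that $X_j$ and $X_k$ are records are independent. The following minimal Python sketch checks this exactly by enumerating rank orderings. It illustrates only the record events, not the conditional distributions or the Kullback-Leibler result derived in the paper, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def is_record(seq, k):
    """True if the k-th entry (1-indexed) exceeds all earlier entries."""
    return all(seq[k - 1] > x for x in seq[:k - 1])


def record_probabilities(j, k):
    """Exact P(X_j is a record), P(X_k is a record), P(both), for j < k.

    For a continuous F only the relative ranks of X_1..X_k matter and all
    k! orderings are equally likely, so enumeration gives exact values.
    """
    assert 1 <= j < k
    total = n_j = n_k = n_both = 0
    for perm in permutations(range(k)):
        total += 1
        rj, rk = is_record(perm, j), is_record(perm, k)
        n_j += rj
        n_k += rk
        n_both += rj and rk
    return Fraction(n_j, total), Fraction(n_k, total), Fraction(n_both, total)


p_j, p_k, p_both = record_probabilities(2, 5)
print(p_j, p_k, p_both)  # 1/2 1/5 1/10
```

The joint probability factorizes as $1/(jk)$, so knowing that one of the two observations is a record carries no information about whether the other one is; the asymmetry described in the abstract concerns the values of the records, not their occurrence.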
Veterinary electronic health records (vEHRs) contain privacy-sensitive identifiers that limit secondary use. While PetEVAL provides a benchmark for veterinary de-identification, the domain remains low-resource. This study evaluates whether large language model (LLM)-generated synthetic narratives improve de-identification safety under distinct training regimes, emphasizing (i) synthetic augmentation and (ii) fixed-budget substitution. We conducted a controlled simulation using a PetEVAL-derived corpus (3,750 holdout/1,249 train). We generated 10,382 synthetic notes using a privacy-preserving "template-only" regime where identifiers were removed prior to LLM prompting. Three transformer backbones (PetBERT, VetBERT, Bio_ClinicalBERT) were trained under varying mixtures. Evaluation prioritized document-level leakage rate (the fraction of documents with at least one missed identifier) as the primary safety outcome. Results show that under fixed-sample substitution, replacing real notes with synthetic ones monotonically increased leakage, indicating synthetic data cannot safely replace real supervision. Under compute-matched training, moderate synthetic mixing matched real-only performa
We analyze the occurrence and the values of record-breaking temperatures in daily and monthly temperature observations. Our aim is to better understand and quantify the statistics of temperature records in the context of global warming. Similar to earlier work we employ a simple mathematical model of independent and identically distributed random variables with a linearly growing expectation value. This model proved to be useful in predicting the increase (decrease) in upper (lower) temperature records in a warming climate. Using both station and re-analysis data from Europe and the United States we further investigate the statistics of temperature records and the validity of this model. The most important new contribution in this article is an analysis of the statistics of record values for our simple model and European reanalysis data. We estimate how much the mean values and the distributions of record temperatures are affected by the large scale warming trend. In this context we consider both the values of records that occur at a certain time and the values of records that have a certain record number in the series of record events. We compare the observational data both to sim
Recently, with increasing interest in pet healthcare, the demand for computer-aided diagnosis (CAD) systems in veterinary medicine has increased. The development of veterinary CAD has stagnated due to a lack of sufficient radiology data. To overcome the challenge, we propose a generative active learning framework based on a variational autoencoder. This approach aims to alleviate the scarcity of reliable data for CAD systems in veterinary medicine. This study utilizes datasets comprising cardiomegaly radiograph data. After removing annotations and standardizing images, we employed a framework for data augmentation, which consists of a data generation phase and a query phase for filtering the generated data. The experimental results revealed that as the data generated through this framework was added to the training data of the generative model, the Fréchet inception distance consistently decreased from 84.14 to 50.75 on the radiograph. Subsequently, when the generated data were incorporated into the training of the classification model, the false positive of the confusion matrix also improved from 0.16 to 0.66 on the radiograph. The proposed framework has the potential to address t
In 1949, Captain Alberto Larraguibel and his horse Huaso set the world record for equestrian high jump in Viña del Mar, Chile, by clearing a height of 2.47 meters, a mark that remains unbeaten. This work proposes the use of this historical event as a teaching resource for physics, integrating perspectives from biomechanics and veterinary medicine. Based on the analysis of an audiovisual record of the jump, a kinematic model is developed using the \textit{Tracker} software, determining variables such as displacement, velocity, and acceleration of the horse--rider system. The results make it possible to reflect on the biomechanical and physiological factors involved in animal performance, thus linking physics with real biological processes. It is proposed that this interdisciplinary approach, based on authentic cultural and scientific contexts, may promote meaningful learning, motivation, and a more comprehensive understanding of natural phenomena in science education.
Record statistics is the study of how new highs or lows are created and sustained in any dynamical process. The study of the highest or lowest records constitutes the study of extreme values. This paper represents an exploration of record statistics for certain aspects of the classical and quantum standard map. For instance, the records of the momentum squared, or energy, are shown to behave like records in random walks when the classical standard map is in a regime of hard chaos. However, different power laws are observed for the mixed phase space regimes. The presence of accelerator modes is well known to create anomalous diffusion, and we notice here that the record statistics are very sensitive to their presence. We also discuss records in random vectors and use them to analyze the {\it quantum} standard map via records in its eigenfunction intensities, reviewing some recent results along the way.
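As a rough illustration of what an energy record means for the classical map, the sketch below iterates one trajectory and lists the times at which $p^2$ sets a new strict maximum. It assumes the common Chirikov convention $p_{t+1} = p_t + K\sin\theta_t$, $\theta_{t+1} = \theta_t + p_{t+1} \bmod 2\pi$ on the cylinder, which may differ from the normalization used in the paper; the function name and parameter values are hypothetical.

```python
import math


def energy_records(K, theta0, p0, n_steps):
    """Iterate the standard map and return the record times of p**2.

    Time 0 (the initial condition) is counted as the first record; a later
    time t is a record if p**2 strictly exceeds every earlier value.
    """
    theta, p = theta0, p0
    best = p * p
    times = [0]
    for t in range(1, n_steps + 1):
        p = p + K * math.sin(theta)            # kick (assumed convention)
        theta = (theta + p) % (2.0 * math.pi)  # free rotation on the cylinder
        energy = p * p
        if energy > best:
            best = energy
            times.append(t)
    return times


if __name__ == "__main__":
    # Illustrative parameters only; K = 10 is taken here as a strongly kicked case.
    times = energy_records(K=10.0, theta0=1.0, p0=0.0, n_steps=10_000)
    print(len(times), "records, last at t =", times[-1])
```

A statistical comparison with random-walk records would require averaging the record count over an ensemble of initial conditions, which this single-trajectory sketch does not do.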
Asymptotic theories on record values and times, including central limit theorems, make sense only if the sequence of record values (and of record times) is infinite. If not, such theories could not even be an option. In this paper, we give necessary and/or sufficient conditions for the finiteness of the number of records. We prove, for example, for \textsl{iid} real-valued random variables, that the strong upper record values are finite in number if and only if the upper endpoint is finite and is an atom of the common cumulative distribution function. The only asymptotic study left to us concerns the infinite sequence of hitting times of that upper endpoint, which, by the way, is the sequence of weak record times. The asymptotic characterizations are made using negative binomial random variables and the dimensional multinomial random variables. Asymptotic comparisons in terms of consistency bounds and confidence intervals on the different sequences of hitting times are provided. The example of a binomial random variable is given.
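The contrast between strong and weak records can be seen in a small simulation, sketched here in Python for an i.i.d. binomial sequence. The parameters $m = 3$, $p = 0.5$ and the helper names are illustrative assumptions, not taken from the paper. Because the upper endpoint $m$ is an atom, the number of strong records can never exceed $m + 1$, while the weak records keep accumulating each time the endpoint is hit again.

```python
import random


def count_records(values):
    """Return (strong, weak) upper-record counts for a sequence.

    A strong record strictly exceeds every earlier value; a weak record
    equals or exceeds the running maximum. The first value counts as both.
    """
    strong = weak = 0
    best = None
    for v in values:
        if best is None or v > best:
            strong += 1
            weak += 1
            best = v
        elif v == best:
            weak += 1
    return strong, weak


def binomial_sample(rng, m, p):
    """One Binomial(m, p) draw as a sum of m Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(m))


rng = random.Random(1)
m, p = 3, 0.5
for n in (100, 1000, 10000):
    xs = [binomial_sample(rng, m, p) for _ in range(n)]
    strong, weak = count_records(xs)
    print(n, strong, weak)  # strong stays <= m + 1; weak grows with n
```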
Manual count of mitotic figures, which is determined in the tumor region with the highest mitotic activity, is a key parameter of most tumor grading schemes. It can be, however, strongly dependent on the area selection due to uneven mitotic figure distribution in the tumor section. We aimed to assess the question, how significantly the area selection could impact the mitotic count, which has a known high inter-rater disagreement. On a data set of 32 whole slide images of H&E-stained canine cutaneous mast cell tumor, fully annotated for mitotic figures, we asked eight veterinary pathologists (five board-certified, three in training) to select a field of interest for the mitotic count. To assess the potential difference on the mitotic count, we compared the mitotic count of the selected regions to the overall distribution on the slide. Additionally, we evaluated three deep learning-based methods for the assessment of highest mitotic density: In one approach, the model would directly try to predict the mitotic count for the presented image patches as a regression task. The second method aims at deriving a segmentation mask for mitotic figures, which is then used to obtain a mitotic
In recent years there has been a surge of interest in the statistics of record-breaking events in stochastic processes. Along with that, many new and interesting applications of the theory of records were discovered and explored. The record statistics of uncorrelated random variables sampled from time-dependent distributions was studied extensively. The findings were applied in various areas to model and explain record-breaking events in observational data. Particularly interesting and fruitful was the study of record-breaking temperatures and their connection with global warming, but also records in sports, biology and some areas in physics were considered in the last years. Similarly, researchers have recently started to understand the record statistics of correlated processes such as random walks, which can be helpful to model record events in financial time series. This review is an attempt to summarize and evaluate the progress that was made in the field of record statistics throughout the last years.
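Two standard results make the contrast between uncorrelated and correlated records concrete. For $n$ i.i.d. continuous variables the mean record number is the harmonic number $H_n \approx \ln n$, whereas for a symmetric random walk with continuous steps it is $(2n+1)\binom{2n}{n}2^{-2n} \approx \sqrt{4n/\pi}$ when the starting point is counted as the first record. These formulas are well-known results from the literature and are not stated in the abstract itself. The Python sketch below evaluates both and adds a small Monte Carlo check with Gaussian steps; the function names are hypothetical.

```python
import random
from fractions import Fraction
from math import comb


def mean_records_iid(n):
    """Exact mean record count among n i.i.d. continuous variables (H_n)."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def mean_records_walk(n):
    """Exact mean record count after n steps of a symmetric continuous-step
    random walk, counting the starting point as the first record."""
    return Fraction((2 * n + 1) * comb(2 * n, n), 4 ** n)


def simulate_walk_records(n, trials, seed=0):
    """Monte Carlo estimate of the same mean, using unit Gaussian steps."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = best = 0.0
        count = 1  # the starting position counts as a record
        for _ in range(n):
            x += rng.gauss(0.0, 1.0)
            if x > best:
                best, count = x, count + 1
        total += count
    return total / trials


if __name__ == "__main__":
    n = 100
    print("i.i.d.:", float(mean_records_iid(n)))
    print("walk (exact):", float(mean_records_walk(n)))
    print("walk (simulated):", simulate_walk_records(n, trials=2000))
```

The slow logarithmic growth in the i.i.d. case against the square-root growth for the walk is the basic reason why correlated processes produce many more records over the same observation window.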
This paper presents a machine learning system that performs instance segmentation of cytological images in veterinary medicine. Eleven cell types were used directly and indirectly in the experiments, including damaged and unrecognized categories. The deep learning models employed in the system achieve high average precision and recall, 0.94 and 0.8 respectively, for the three selected tumor types. This variety of label types allowed us to conclude that there are relatively few mistakes for tumor cell types. Additionally, the model learned tumor cell features well enough to avoid misclassifying one tumor type as another. The experiments also revealed that the quality of the results improves with the dataset size (excluding the damaged cells). It is worth noting that all the experiments were done using a custom dedicated dataset provided by the cooperating veterinarians.
We study records generated by Brownian particles in one dimension. Specifically, we investigate an ordinary random walk and define the record as the maximal position of the walk. We compare the record of an individual random walk with the mean record, obtained as an average over infinitely many realizations. We term the walk "superior" if the record is always above average, and conversely, the walk is said to be "inferior" if the record is always below average. We find that the fraction of superior walks, S, decays algebraically with time, S ~ t^(-beta), in the limit t --> infty, and that the persistence exponent is nontrivial, beta=0.382258.... The fraction of inferior walks, I, also decays as a power law, I ~ t^(-alpha), but the persistence exponent is smaller, alpha=0.241608.... Both exponents are roots of transcendental equations involving the parabolic cylinder function. To obtain these theoretical results, we analyze the joint density of superior walks with given record and position, while for inferior walks it suffices to study the density as function of position.
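The definitions of superior and inferior walks can be made concrete with a small ensemble simulation, sketched below in Python. It is only an illustration under loud assumptions: discrete time, unit Gaussian steps, and the ensemble average of the running maximum used as a stand-in for the exact mean record. With a finite ensemble and short times it should not be expected to reproduce the exponents quoted above. The function name and default sizes are hypothetical.

```python
import random


def superior_inferior_fractions(n_walks=2000, n_steps=200, seed=0):
    """Return (S, I): lists giving, for each time step, the fraction of walks
    whose record has stayed strictly above (S) or strictly below (I) the
    ensemble-mean record at every step so far."""
    rng = random.Random(seed)
    records = []  # records[w][t] = running maximum of walk w after step t + 1
    for _ in range(n_walks):
        x = best = 0.0
        row = []
        for _ in range(n_steps):
            x += rng.gauss(0.0, 1.0)
            best = max(best, x)
            row.append(best)
        records.append(row)

    # Empirical mean record (assumption: replaces the exact average).
    mean_record = [sum(r[t] for r in records) / n_walks for t in range(n_steps)]

    sup_alive = [True] * n_walks
    inf_alive = [True] * n_walks
    superior, inferior = [], []
    for t in range(n_steps):
        for w in range(n_walks):
            if records[w][t] <= mean_record[t]:
                sup_alive[w] = False  # fell to or below the average once
            if records[w][t] >= mean_record[t]:
                inf_alive[w] = False  # reached or exceeded the average once
        superior.append(sum(sup_alive) / n_walks)
        inferior.append(sum(inf_alive) / n_walks)
    return superior, inferior


if __name__ == "__main__":
    S, I = superior_inferior_fractions()
    for t in (10, 50, 200):
        print(t, S[t - 1], I[t - 1])
```

Both fractions are non-increasing in time by construction, since a walk that has once crossed the average is excluded permanently; estimating the decay exponents would require much larger ensembles and a fit over several decades in time.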
While the global integration of artificial intelligence (AI) into veterinary medicine is accelerating, its adoption dynamics in major markets such as China remain uncharacterized. This paper presents the first exploratory analysis of AI perception and adoption among veterinary professionals in China, based on a cross-sectional survey of 455 practitioners conducted in mid-2025. We identify a distinct "adoption paradox": although 71.0% of respondents have incorporated AI into their workflows, 44.6% of these active users report low familiarity with the technology. In contrast to the administrative-focused patterns observed in North America, adoption in China is practitioner-driven and centers on core clinical tasks, such as disease diagnosis (50.1%) and prescription calculation (44.8%). However, concerns regarding reliability and accuracy remain the primary barrier (54.3%), coexisting with a strong consensus (93.8%) for regulatory oversight. These findings suggest a unique "inside-out" integration model in China, characterized by high clinical utility but restricted by an "interpretability gap," underscoring the need for specialized tools and robust regulatory frameworks to safely har
World record setting has long attracted public interest and scientific investigation. Extremal records summarize the limits of the space explored by a process, and the historical progression of a record sheds light on the underlying dynamics of the process. Existing analyses of prediction, statistical properties, and ultimate limits of record progressions have focused on particular domains. However, a broad perspective on how record progressions vary across different spheres of activity needs further development. Here we employ cross-cutting metrics to compare records across a variety of domains, including sports, games, biological evolution, and technological development. We find that these domains exhibit characteristic statistical signatures in terms of rates of improvement, "burstiness" of record-breaking time series, and the acceleration of the record breaking process. Specifically, sports and games exhibit the slowest rate of improvement and a wide range of rates of "burstiness." Technology improves at a much faster rate and, unlike other domains, tends to show acceleration in records. Many biological and technological processes are characterized by constant rates of improvem