As plots play a critical role in modern data visualization and analysis, Plot2API is launched to help non-experts and beginners create their desired plots by directly recommending graphical APIs from reference plot images by neural networks. However, previous works on Plot2API have primarily focused on the recommendation for standard plot images, while overlooking the hand-drawn plot images that are more accessible to non-experts and beginners. To make matters worse, both Plot2API models trained on standard plot images and powerful multi-modal large language models struggle to effectively recommend APIs for hand-drawn plot images due to the domain gap and lack of expertise. To facilitate non-experts and beginners, we introduce a hand-drawn plot dataset named HDpy-13 to improve the performance of graphical API recommendations for hand-drawn plot images. Additionally, to alleviate the considerable strain of parameter growth and computational resource costs arising from multi-domain and multi-language challenges in Plot2API, we propose Plot-Adapter that allows for the training and storage of separate adapters rather than requiring an entire model for each language and domain. In parti
Story plots, while short, carry most of the essential information of a full story that may contain tens of thousands of words. We study the problem of automatic generation of story plots, which includes story premise, character descriptions, plot outlines, etc. To generate a single engaging plot, existing plot generators (e.g., DOC (Yang et al., 2022a)) require hundreds to thousands of calls to LLMs (e.g., OpenAI API) in the planning stage of the story plot, which is costly and takes at least several minutes. Moreover, the hard-wired nature of the method makes the pipeline non-differentiable, blocking fast specialization and personalization of the plot generator. In this paper, we propose three models, $\texttt{OpenPlot}$, $\texttt{E2EPlot}$ and $\texttt{RLPlot}$, to address these challenges. $\texttt{OpenPlot}$ replaces expensive OpenAI API calls with LLaMA2 (Touvron et al., 2023) calls via careful prompt designs, which leads to inexpensive generation of high-quality training datasets of story plots. We then train an end-to-end story plot generator, $\texttt{E2EPlot}$, by supervised fine-tuning (SFT) using approximately 13000 story plots generated by $\texttt{OpenPlot}$. $\texttt{
Matrix-valued data, where each observation is represented as a matrix, frequently arises in various scientific disciplines. Modeling such data often relies on matrix-variate normal distributions, making matrix-variate normality testing crucial for valid statistical inference. Recently, the Distance-Distance (DD) plot has been introduced as a graphical tool for visually assessing matrix-variate normality. However, the Mahalanobis squared distances (MSD) used in the DD plot require vectorizing matrix observations, restricting its applicability to cases where the dimension of the vectorized data does not exceed the sample size. To address this limitation, we propose a novel graphical method called the Matrix Healy (MHealy) plot, an extension of the Healy plot for vector-valued data. This new plot is based on a more accurate matrix-based MSD that leverages the inherent structure of matrix data. Consequently, it offers a more reliable visual assessment. Importantly, the MHealy plot eliminates the sample size restriction of the DD plot and is hence more applicable to matrix-valued data. Empirical results demonstrate its effectiveness and practicality compared to the DD plot across various sce
A very common task in data visualization is to plot many data points with some measured y-value as a function of fixed x-values. Uncertainties on the y-values are typically presented as vertical error bars that represent either a Frequentist confidence interval or Bayesian credible interval for each data point. Most of the time, these error bars represent a 68\% confidence/credibility level, which leads to the intuition that a model fits the data reasonably well if its prediction lies within the error bars of roughly two thirds of the data points. Unfortunately, this and other intuitions no longer work when the uncertainties of the data points are correlated. If the error bars only show the square root of diagonal elements of some covariance matrix with non-negligible off-diagonal elements, we simply do not have enough information in the plot to judge whether a drawn model line agrees well with the data or not. In this paper we will demonstrate this problem and discuss ways to add more information to the plots to make it easier to judge the agreement between the data and some model prediction in the plot, as well as glean some insight where the model might be deficient. This is don
Preference learning in Large Language Models (LLMs) has advanced significantly, yet existing methods remain limited by modest performance gains, high computational costs, hyperparameter sensitivity, and insufficient modeling of global token-level relationships. We introduce PLOT, which enhances Preference Learning in fine-tuning-based alignment through a token-level loss derived from Optimal Transport. By formulating preference learning as an Optimal Transport Problem, PLOT aligns model outputs with human preferences while preserving the original distribution of LLMs, ensuring stability and robustness. Furthermore, PLOT leverages token embeddings to capture semantic relationships, enabling globally informed optimization. Experiments across two preference categories - Human Values and Logic & Problem Solving - spanning seven subpreferences demonstrate that PLOT consistently improves alignment performance while maintaining fluency and coherence. These results substantiate optimal transport as a principled methodology for preference learning, establishing a theoretically grounded framework that provides new insights for preference learning of LLMs.
In this paper we demonstrate a new advance in causal Bayesian graphical modelling combined with Adversarial Risk Analysis. This research aims to support strategic analyses of various defensive interventions to counter the threat arising from plots of an adversary. These plots are characterised by a sequence of preparatory phases that an adversary must necessarily pass through to achieve their hostile objective. To do this we first define a new general class of plot models. Then we demonstrate that this is a causal graphical family of models - albeit with a hybrid semantic. We show this continues to be so even in this adversarial setting. It follows that this causal graph can be used to guide a Bayesian decision analysis to counter the adversary's plot. We illustrate the causal analysis of a plot with details of a decision analysis designed to frustrate the progress of a planned terrorist attack.
Graphics play a crucial role in statistical analysis and data mining. This paper describes metrics developed to assist the use of lineups for making inferential statements. Lineups embed the plot of the data among a set of null plots, and engage a human observer to select the plot that is most different from the rest. If the data plot is selected it corresponds to the rejection of a null hypothesis. Metrics are calculated in association with lineups, to measure the quality of the lineup, and help to understand what people see in the data plots. The null plots represent a finite sample from a null distribution, and the selected sample potentially affects the ease or difficulty of a lineup. Distance metrics are designed to describe how close the true data plot is to the null plots, and how close the null plots are to each other. The distribution of the distance metrics is studied to learn how well this matches to what people detect in the plots, the effect of null generating mechanism and plot choices for particular tasks. The analysis was conducted on data that has already been collected from Amazon Turk studies conducted with lineups for studying an array of data analysis tasks.
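The lineup construction described above can be sketched in a few lines. This is a minimal illustration, not the code used in the studies: the function name `make_lineup`, the toy data, and the choice of a permutation null (shuffling y against x) are all assumptions made for the example.

```python
import random

def make_lineup(x, y, n_panels=20, seed=0):
    """Return (panels, data_position); each panel is a list of (x, y) pairs.

    Assumption: null plots are generated by permuting y, which breaks
    any association between x and y. Other null mechanisms are possible.
    """
    rng = random.Random(seed)
    data_position = rng.randrange(n_panels)
    panels = []
    for k in range(n_panels):
        if k == data_position:
            panels.append(list(zip(x, y)))          # the real data plot
        else:
            shuffled = list(y)
            rng.shuffle(shuffled)                   # one draw from the null
            panels.append(list(zip(x, shuffled)))
    return panels, data_position

# Hypothetical data with a strong linear trend.
x = list(range(10))
y = [2 * v + 1 for v in x]
panels, pos = make_lineup(x, y)
print(f"{len(panels)} panels, data plot hidden at position {pos}")
```

If the null hypothesis holds, a single observer picks the data plot with probability 1/`n_panels`, which is what makes a correct pick count as evidence against the null.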
Retrieving relevant plots from the book for a query is a critical task, which can improve the reading experience and efficiency of readers. Readers usually only give an abstract and vague description as the query based on their own understanding, summaries, or speculations of the plot, which requires the retrieval model to have a strong ability to estimate the abstract semantic associations between the query and candidate plots. However, existing information retrieval (IR) datasets cannot reflect this ability well. In this paper, we propose Plot Retrieval, a labeled dataset to train and evaluate the performance of IR models on the novel task Plot Retrieval. Text pairs in Plot Retrieval have less word overlap and more abstract semantic association, which can reflect the ability of the IR models to estimate the abstract semantic association, rather than just traditional lexical or semantic matching. Extensive experiments across various lexical retrieval, sparse retrieval, dense retrieval, and cross-encoder methods compared with human studies on Plot Retrieval show current IR models still struggle in capturing abstract semantic association between texts. Plot Retrieval can be the benc
There are plenty of excellent plotting libraries. Each excels at a different use case: one is good for printed 2D publication figures, another for interactive 3D graphics, and a third has excellent LaTeX integration or is good for creating dashboards on the web. The aim of Plots.jl is to enable the user to use the same syntax to interact with many different plotting libraries, such that it is possible to change the library "backend" without needing to touch the code that creates the content -- and without having to learn yet another application programming interface (API). This is achieved by the separation of the plot specification from the implementation of the actual graphical backend. These plot specifications may be extended by a "recipe" system, which allows package authors and users to define how to plot any new type (be it a statistical model, a map, a phylogenetic tree or the solution to a system of differential equations) and create new types of plots -- without depending on the Plots.jl package. This supports a modular ecosystem structure for plotting and yields a high reuse potential across the entire Julia package ecosystem. Plots.jl is publicly available at https://git
I highlight that there is a substantial number of papers (at least 11 published since 2024) which all refer to a specific type of plot as an "Allan variance" plot, when in fact they seem to be plotting the standard deviation of the residuals versus bin size. The Allan variance quantifies the stability of a time series by calculating the average squared difference between successive time-averaged segments over a specified interval; it is not equivalent to the standard deviation. This misattribution seems particularly prolific in the exoplanet transit spectroscopy community. However, I emphasize that it does not impact the scientific analyses presented in those works. I discuss where this confusion seems to stem from and encourage the community to ensure statistical measures are named correctly to avoid confusion.
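The distinction can be made concrete with a short sketch. The function names and the drifting test series below are illustrative assumptions, and the non-overlapping form of the Allan variance is used, with the conventional factor of 1/2.

```python
import math
import statistics

def bin_means(series, m):
    """Average consecutive, non-overlapping segments of length m."""
    n_bins = len(series) // m
    return [sum(series[k * m:(k + 1) * m]) / m for k in range(n_bins)]

def allan_variance(series, m):
    """Half the mean squared difference between successive bin means."""
    means = bin_means(series, m)
    diffs = [b - a for a, b in zip(means, means[1:])]
    return sum(d * d for d in diffs) / (2 * len(diffs))

def binned_std(series, m):
    """Standard deviation of the bin means (what the plots actually show)."""
    return statistics.pstdev(bin_means(series, m))

# Hypothetical series with a pure linear drift: the two measures diverge.
drift = [float(i) for i in range(1000)]
print(math.sqrt(allan_variance(drift, 10)))  # Allan deviation
print(binned_std(drift, 10))                 # std of binned values
```

For the drifting series the Allan deviation only sees the step between neighbouring bins, whereas the standard deviation of the bin means grows with the total range of the drift, so the two curves are not interchangeable.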
Recurrence is a fundamental property of dynamical systems, which can be exploited to characterise the system's behaviour in phase space. A powerful tool for their visualisation and analysis, called the recurrence plot, was introduced in the late 1980s. This report is a comprehensive overview covering recurrence based methods and their applications with an emphasis on recent developments. After a brief outline of the theory of recurrences, the basic idea of the recurrence plot with its variations is presented. This includes the quantification of recurrence plots, like the recurrence quantification analysis, which is highly effective for detecting, e.g., transitions in the dynamics of systems from time series. A main point is how to link recurrences to dynamical invariants and unstable periodic orbits. This and further evidence suggest that recurrences contain all relevant information about a system's behaviour. As the respective phase spaces of two systems change due to coupling, recurrence plots allow studying and quantifying their interaction. This fact also provides us with a sensitive tool for the study of synchronisation of complex systems. In the last part of the report several applic
A powerful tool in control and systems engineering is represented by Nyquist plots, for which a qualitative representation often gives a clearer visualization of the frequency response function that is typically not given by computer programs, especially if portions of the Nyquist plot extend to infinity. This letter addresses the graphical analysis of the frequency response function, with the objective of enhancing the procedure for the qualitative construction of Nyquist plots. Several results supported by analytical proofs are derived concerning the low and high frequency behavior, which improve the qualitative construction of Nyquist plots in the vicinity of the initial and final points.
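To illustrate what "low and high frequency behavior" means numerically, the sketch below evaluates a frequency response near the initial and final points of a Nyquist plot. The transfer function G(s) = 1/(s(s+1)) and the helper names are assumptions chosen for the example; they are not taken from the letter.

```python
def polyval(coeffs, s):
    """Evaluate a polynomial (highest power first) at complex s via Horner's rule."""
    result = 0j
    for c in coeffs:
        result = result * s + c
    return result

def freq_response(num, den, omega):
    """G(j*omega) for G(s) = num(s) / den(s)."""
    s = complex(0.0, omega)
    return polyval(num, s) / polyval(den, s)

# Hypothetical system G(s) = 1 / (s^2 + s), which has a pole at the origin.
num, den = [1.0], [1.0, 1.0, 0.0]
low = freq_response(num, den, 1e-4)    # near the initial point
high = freq_response(num, den, 1e4)    # near the final point
print(low.real, low.imag)
print(abs(high))
```

For this example G(jω) = -1/(ω²+1) - j/(ω(ω²+1)), so as ω tends to 0 the curve runs off to infinity along the vertical asymptote Re = -1, and as ω grows it collapses onto the origin. A computer-generated plot over a finite frequency range shows neither feature clearly, which is the motivation for a qualitative construction.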
In field electron emission (FE) studies, it is important to check and analyse the quality and validity of results experimentally obtained from samples, using suitably plotted current-voltage [I(V)] measurements. For the traditional plotting method, the Fowler-Nordheim (FN) plot, there exists a so-called "orthodoxy test" that can be applied to the FN plot, in order to check whether the FE device/system generating the results is "ideal". If it is not ideal, then emitter characterization parameters deduced from the FN plot are likely to be spurious. A new form of FE I(V) data plot, the so-called "Murphy-Good (MG) plot", has recently been introduced (R.G. Forbes, Roy. Soc. Open Sci. 6 (2019) 190912). This aims to improve the precision with which characterization-parameter values (particularly values of formal emission area) can be extracted from FE I(V) data. The present paper compares this new plotting form with the older FN and Millikan-Lauritsen (ML) forms, and makes an independent assessment of the consistency with which slope (and hence scaled-field) estimates can be extracted from an MG plot. It is shown that, by using a revised formula for the extraction of scaled-field values, the
Diagonal lines in symbolic recurrence plots are closely related to the identification and characterization of specific biprolongable words within a sequence. In this paper we focus on the recurrence plot of a fixed point of a uniform binary substitution. We show that, if the substitution is primitive and aperiodic, the set of all diagonal line lengths of the recurrence plot has zero density. However, if a line of a specific length exists in the recurrence plot, the density of (the set of starting points of) all diagonal lines with that length is strictly positive. On the other hand, we demonstrate that the recurrence plot of a non-primitive substitution contains lines of any given length. Nonetheless, for any given length, the density of lines with that length is zero.
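The objects involved can be explored with a small sketch. It uses the Thue-Morse substitution (0 → 01, 1 → 10) as an example of a primitive, aperiodic uniform binary substitution; the function names are assumptions, and the computation works on a finite prefix, so lines that touch the boundary of the window are truncated and the output only approximates the infinite recurrence plot.

```python
def substitute(word, rules, iterations):
    """Apply a substitution to every letter of `word`, `iterations` times."""
    for _ in range(iterations):
        word = "".join(rules[c] for c in word)
    return word

def diagonal_line_lengths(seq):
    """Sorted lengths of maximal diagonal runs above the main diagonal.

    The symbolic recurrence plot has a point at (i, j) when seq[i] == seq[j];
    a diagonal line at offset d is a run of consecutive i with seq[i] == seq[i + d].
    """
    n = len(seq)
    lengths = set()
    for d in range(1, n):
        run = 0
        for i in range(n - d):
            if seq[i] == seq[i + d]:
                run += 1
            else:
                if run:
                    lengths.add(run)
                run = 0
        if run:                      # run that reaches the window boundary
            lengths.add(run)
    return sorted(lengths)

thue_morse = substitute("0", {"0": "01", "1": "10"}, 8)   # prefix of 256 symbols
print(diagonal_line_lengths(thue_morse))
```

Increasing the number of iterations and watching which lengths appear gives an informal feel for the sparsity of the set of line lengths, though it is no substitute for the density results stated above.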
We propose Blue Noise Plots, two-dimensional dot plots that depict data points of univariate data sets. While often one-dimensional strip plots are used to depict such data, one of their main problems is visual clutter which results from overlap. To reduce this overlap, jitter plots were introduced, whereby an additional, non-encoding plot dimension is introduced, along which the dots representing the data points are randomly perturbed. Unfortunately, this randomness can suggest non-existent clusters, and often leads to visually unappealing plots, in which overlap might still occur. To overcome these shortcomings, we introduce Blue Noise Plots, where random jitter along the non-encoding plot dimension is replaced by optimizing all dots to keep a minimum distance in 2D, i.e., Blue Noise. We evaluate the effectiveness as well as the aesthetics of Blue Noise Plots through both a quantitative and a qualitative user study. The Python implementation of Blue Noise Plots is available here.
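The idea of replacing random jitter with a placement that keeps dots apart can be illustrated with a deliberately simple greedy stand-in. This is not the optimization used for Blue Noise Plots: the function names, the candidate grid, and the greedy strategy are all assumptions made for the sketch. The data value stays fixed on the encoding axis, and only the non-encoding coordinate is chosen.

```python
import math

def greedy_spread(values, height=1.0, steps=20):
    """Place each value at (value, y), choosing y to maximize clearance.

    Greedy stand-in for a real blue-noise optimization: points are placed
    one at a time and never moved again.
    """
    candidates = [height * k / steps for k in range(steps + 1)]
    placed = []
    for x in values:
        def clearance(y):
            # Distance to the nearest already-placed dot (infinite if none).
            return min((math.hypot(x - px, y - py) for px, py in placed),
                       default=math.inf)
        best_y = max(candidates, key=clearance)
        placed.append((x, best_y))
    return placed

def min_pairwise_distance(points):
    return min(math.hypot(ax - bx, ay - by)
               for i, (ax, ay) in enumerate(points)
               for bx, by in points[i + 1:])

# Three identical data values overlap completely in a strip plot.
dots = greedy_spread([1.0, 1.0, 1.0])
print(dots, min_pairwise_distance(dots))
```

In a strip plot these three dots would coincide, and random jitter gives no guarantee of separating them, whereas any minimum-distance criterion pushes them apart along the non-encoding axis without changing the encoded value.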
In this paper, we introduce the cyclic polygon plot, a representation based on a novel projection concept for multi-dimensional values. Cyclic polygon plots combine the typically competing requirements of quantitativeness, image-space efficiency, and readability. Our approach is complemented with a placement strategy based on its intrinsic features, resulting in a dimensionality reduction strategy that is consistent with our overall concept. As a result, our approach combines advantages from dimensionality reduction techniques and quantitative plots, supporting a wide range of tasks in multi-dimensional data analysis. We examine and discuss the overall properties of our approach, and demonstrate its utility with a user study and selected examples.
TOPCAT is a desktop GUI tool for working with tabular data such as source catalogues. Among other capabilities it provides a rich set of visualisation options suitable for interactive exploration of large datasets. The latest release introduces a Corner Plot window which displays a grid of linked scatter-plot-like and histogram-like plots for all pair and single combinations from a supplied list of coordinates.
The trace plot is seldom used in meta-analysis, yet it is a very informative plot. In this article we define and illustrate what the trace plot is, and discuss why it is important. The Bayesian version of the plot combines the posterior density of tau, the between-study standard deviation, and the shrunken estimates of the study effects as a function of tau. With a small or moderate number of studies, tau is not estimated with much precision, and parameter estimates and shrunken study effect estimates can vary widely depending on the correct value of tau. The trace plot allows visualization of the sensitivity to tau along with a plot that shows which values of tau are plausible and which are implausible. A comparable frequentist or empirical Bayes version provides similar results. The concepts are illustrated using examples in meta-analysis and meta-regression; implementation in R is facilitated in a Bayesian or frequentist framework using the bayesmeta and metafor packages, respectively.
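The curves drawn in a trace plot can be sketched as follows. This is an illustration in Python rather than the R packages named above, and it assumes the standard normal-normal random-effects model with a flat prior on the overall mean; the function name and the study data are hypothetical.

```python
def shrunken_estimates(y, s, tau):
    """Conditional on tau: overall mean and shrunken study effects.

    y: study effect estimates, s: their standard errors (all > 0).
    Assumes y_i ~ N(theta_i, s_i^2) and theta_i ~ N(mu, tau^2).
    """
    weights = [1.0 / (si ** 2 + tau ** 2) for si in s]
    mu = sum(w * yi for w, yi in zip(weights, y)) / sum(weights)
    # Shrinkage factor: 1 pulls fully to mu, 0 leaves the estimate untouched.
    shrink = [si ** 2 / (si ** 2 + tau ** 2) for si in s]
    theta = [b * mu + (1.0 - b) * yi for b, yi in zip(shrink, y)]
    return mu, theta

# Hypothetical effect estimates and standard errors for three studies.
y = [0.1, 0.5, 0.9]
s = [0.2, 0.2, 0.2]
for tau in (0.0, 0.1, 0.3, 1.0):
    mu, theta = shrunken_estimates(y, s, tau)
    print(tau, round(mu, 3), [round(t, 3) for t in theta])
```

Plotting each element of `theta` against tau gives the traces: at tau = 0 all studies collapse onto the common-effect estimate, and as tau grows each trace returns to its own unshrunken estimate. Overlaying the posterior density of tau then shows which part of that range matters.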
We propose a mean functional which exists for any probability distribution, and which characterizes the Pareto distribution within the set of distributions with finite left endpoint. This is in sharp contrast to the mean excess plot which is not meaningful for distributions without existing mean, and which has a nonstandard behaviour if the mean is finite, but the second moment does not exist. The construction of the plot is based on the so-called principle of a single huge jump, which differentiates between distributions with moderately heavy and super heavy tails. We present an estimator of the tail function based on $U$-statistics and study its large sample properties. The use of the new plot is illustrated by several loss datasets.
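For reference, the classical mean excess plot that the new functional is contrasted with can be computed as in the sketch below. It shows only the empirical mean excess function, not the proposed functional or its $U$-statistic estimator, and the function names and toy sample are assumptions.

```python
def mean_excess(sample, u):
    """Empirical mean excess: average of (x - u) over observations with x > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observations above threshold")
    return sum(excesses) / len(excesses)

def mean_excess_curve(sample):
    """Points (u, e(u)) with thresholds at the distinct observations below the maximum."""
    thresholds = sorted(set(sample))[:-1]
    return [(u, mean_excess(sample, u)) for u in thresholds]

# Hypothetical loss data.
losses = [1, 2, 3, 4, 5]
print(mean_excess_curve(losses))
```

For a Pareto distribution with finite mean the theoretical mean excess function is linear in the threshold, which is why the plot is used as a diagnostic. The empirical version relies on averages of exceedances, and those averages are exactly what becomes unreliable when the mean or the second moment does not exist.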
When a structure displays dependence on distance and azimuthal angle from a center (for example the spiral arms of galaxies or the diffraction spikes of stars), projecting the pixels to polar coordinates greatly simplifies their study. This projection from one pixel grid to another is known as a "polar plot". For this purpose, a new option has been added to the GNU Astronomy Utilities (Gnuastro) in version 0.23 to the "astscript-radial-profile" script, which we describe in this research note. The figures of this research note are reproducible with Maneage, on the Git commit 5d34243.
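The projection from a Cartesian pixel grid to a (radius, azimuth) grid can be sketched as below. This is a generic nearest-neighbour illustration and makes no claim about how Gnuastro implements its option; the function name, its arguments, and the toy image are assumptions.

```python
import math

def to_polar(image, center, n_radius, n_azimuth):
    """Resample `image` (list of rows) onto a polar grid.

    Output row r, column a holds the pixel nearest to radius r (in pixels)
    and azimuth 2*pi*a/n_azimuth around `center` = (row, column).
    Positions that fall outside the image are left as None.
    """
    cy, cx = center
    height, width = len(image), len(image[0])
    polar = [[None] * n_azimuth for _ in range(n_radius)]
    for r in range(n_radius):
        for a in range(n_azimuth):
            theta = 2.0 * math.pi * a / n_azimuth
            row = round(cy + r * math.sin(theta))
            col = round(cx + r * math.cos(theta))
            if 0 <= row < height and 0 <= col < width:
                polar[r][a] = image[row][col]
    return polar

# Hypothetical 5x5 image whose pixel value encodes its position (10*row + col).
image = [[10 * i + j for j in range(5)] for i in range(5)]
polar = to_polar(image, center=(2, 2), n_radius=4, n_azimuth=4)
for row in polar:
    print(row)
```

In the polar grid a feature at fixed radius becomes a horizontal line and a feature at fixed azimuth, such as a diffraction spike, becomes a vertical one, which is what makes the representation convenient.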