Let $Th(4,2n+1)=\widehat{β_{2n+1}}$ be the four-strand Turk's head knot, where $β_{2n+1}=(σ_1σ_2^{-1}σ_3)^{2n+1}$. In our earlier work the Alexander polynomial of this family was reduced, after the substitution $t=-z$, to the factorization \[ A_{2n+1}(z)=(1+z+\cdots+z^{2n})D_n(z)^2, \qquad D_n(z)=\prod_{r=1}^n \left(z^2+4\sin^2\frac{πr}{2n+1}\,z+1\right). \] The remaining difficulty was to prove log-concavity of the coefficient sequence of $D_n(z)$, and in this work we prove it. The main new ingredient is a four-block smoothing theorem for products of reciprocal quartics $(1+az+z^2)(1+bz+z^2)$ with $0\le a,b\le 4$ and $a+b\ge 4$. The smoothing theorem is proved by a finite exact positivity certificate using only integer arithmetic. Combining this smoothing theorem with the trigonometric pairing $r\leftrightarrow n+1-r$ proves that $D_n(z)$ is log-concave for all $n\ge1$. It follows that the absolute values of the coefficients of the Alexander polynomial of $Th(4,2n+1)$ form a trapezoidal sequence. Thus Fox's trapezoidal conjecture holds for the entire family of four-strand Turk's head knots.
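The factorization above can be spot-checked numerically for small $n$. The following is a minimal sketch, not the paper's method: the proof uses an exact integer certificate, whereas this code uses floating-point arithmetic with a small tolerance. The function names (`d_coeffs`, `alexander_abs_coeffs`, `is_log_concave`, `is_trapezoidal`) are hypothetical and introduced only for illustration.

```python
import math


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def d_coeffs(n):
    """Coefficients of D_n(z) = prod_{r=1}^n (z^2 + 4 sin^2(pi r/(2n+1)) z + 1)."""
    coeffs = [1.0]
    for r in range(1, n + 1):
        a = 4.0 * math.sin(math.pi * r / (2 * n + 1)) ** 2
        coeffs = poly_mul(coeffs, [1.0, a, 1.0])
    return coeffs


def alexander_abs_coeffs(n):
    """Coefficients of A_{2n+1}(z) = (1 + z + ... + z^{2n}) * D_n(z)^2."""
    d = d_coeffs(n)
    return poly_mul([1.0] * (2 * n + 1), poly_mul(d, d))


def is_log_concave(c, rel_tol=1e-9):
    """Check c_k^2 >= c_{k-1} c_{k+1} for interior k, up to a relative tolerance
    that absorbs floating-point error (an assumption of this sketch)."""
    return all(
        c[k] ** 2 >= c[k - 1] * c[k + 1] * (1.0 - rel_tol)
        for k in range(1, len(c) - 1)
    )


def is_trapezoidal(seq):
    """Strictly increasing, then constant, then strictly decreasing."""
    i, n = 0, len(seq)
    while i + 1 < n and seq[i] < seq[i + 1]:
        i += 1
    while i + 1 < n and seq[i] == seq[i + 1]:
        i += 1
    while i + 1 < n and seq[i] > seq[i + 1]:
        i += 1
    return i == n - 1


if __name__ == "__main__":
    for n in range(1, 6):
        d = d_coeffs(n)
        a = [round(c) for c in alexander_abs_coeffs(n)]
        print(n, [round(c) for c in d], is_log_concave(d), is_trapezoidal(a))
```

For $n=1$ the quadratic factor is $z^2+3z+1$, so $A_3(z)=(1+z+z^2)(z^2+3z+1)^2$ has coefficients $1, 7, 18, 23, 18, 7, 1$. Rounding to integers is only safe while the coefficients stay well within double precision, so the sketch is meant for small $n$.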
We collect and discuss various results on an important family of knots and links called Turk's head knots and links $Th(p,q)$. In the mathematical literature, they also appear under other names, such as rosette knots and links or weaving knots and links. Apart from the unknot and the alternating torus links $T(2,q)$, the Turk's head links $Th(p,q)$ are all known to be alternating, fibered, hyperbolic, invertible, non-split, periodic, and prime. They are also both positive and negative amphichiral if $p$ is odd. Moreover, we present several further results, focusing on Turk's head knots $Th(3,q)$. Finally, we list several open problems and conjectures for Turk's head knots and links, and we conclude with a short appendix on torus knots and links, which might be of independent interest.
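As a small illustration of when $Th(p,q)$ is a knot rather than a link, the sketch below counts components from a braid word. It assumes the common convention that $Th(p,q)$ is the closure of the alternating $p$-strand braid $(σ_1σ_2^{-1}σ_3\cdots)^q$. That convention, and the helper names `turks_head_braid_word` and `component_count`, are assumptions of this example and are not taken from the abstract. Crossing signs do not affect the count, since each generator acts on strand positions as a transposition.

```python
from math import gcd


def turks_head_braid_word(p, q):
    """Signed generator indices for (s_1 s_2^{-1} s_3 ...)^q on p strands.
    +i stands for sigma_i and -i for sigma_i^{-1} (assumed convention)."""
    period = [i if i % 2 == 1 else -i for i in range(1, p)]
    return period * q


def component_count(word, strands):
    """Number of components of the braid closure: the number of cycles of the
    permutation that the braid induces on the strand positions."""
    perm = list(range(strands))
    for g in word:
        i = abs(g) - 1  # sigma_i swaps positions i and i+1 (0-based: i-1 and i)
        perm[i], perm[i + 1] = perm[i + 1], perm[i]
    seen = [False] * strands
    cycles = 0
    for start in range(strands):
        if not seen[start]:
            cycles += 1
            t = start
            while not seen[t]:
                seen[t] = True
                t = perm[t]
    return cycles


if __name__ == "__main__":
    for p, q in [(3, 4), (3, 3), (4, 3), (4, 6)]:
        word = turks_head_braid_word(p, q)
        print(p, q, component_count(word, p), gcd(p, q))
```

One period of the word induces a $p$-cycle on the strands, so the closure has $\gcd(p,q)$ components and is a knot exactly when $p$ and $q$ are coprime.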
In this paper, we introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank), along with the guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process we employed was shaped and implemented by a team of four linguists and five Natural Language Processing (NLP) specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework as well as our recent efforts for unifying the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, BOUN Treebank is the largest Turkish treebank. It contains a total of 9,761 sentences from various topics including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser obtained over the BOUN Treebank as well as two other treebanks in Turkish. Our results demonstrate that the unification of the Turkish annotation scheme and the introduction of a more comprehensive treebank lead to improved performance with regard to dependency parsin
Building machine translation (MT) test sets is a relatively expensive task. As MT is needed for a growing number of language pairs and domains, it becomes necessary to build test sets for each case. In this paper, we investigate using Amazon's Mechanical Turk (MTurk) to make MT test sets cheaply. We find that MTurk can be used to make test sets at much lower cost than professionally produced test sets. More importantly, in experiments with multiple MT systems, we find that the MTurk-produced test sets yield essentially the same conclusions regarding system performance as the professionally produced test sets.
In this study, we investigate the attentiveness exhibited by participants sourced through Amazon Mechanical Turk (MTurk), thereby discovering a significant level of inattentiveness amongst the platform's top crowd workers (those classified as 'Master', with an 'Approval Rate' of 98% or more, and a 'Number of HITS approved' value of 1,000 or more). A total of 564 individuals from the United States participated in our experiment. They were asked to read a vignette outlining one of four hypothetical technology products and then complete a related survey. Three forms of attention check (logic, honesty, and time) were used to assess attentiveness. Through this experiment we determined that a total of 126 (22.3%) participants failed at least one of the three forms of attention check, with most (94) failing the honesty check - followed by the logic check (31), and the time check (27). Thus, we established that significant levels of inattentiveness exist even among the most elite MTurk workers. The study concludes by reaffirming the need for multiple forms of carefully crafted attention checks, irrespective of whether participant quality is presumed to be high according to MTurk criteria s
In this paper, we present our system design for conducting longitudinal daily-task studies with the same workers throughout on Amazon Mechanical Turk. We implement this system to conduct a study into touch dynamics, and present our experiences, challenges and lessons learned from doing so. Study participants installed our application on their Apple iOS phones and completed two tasks daily for 31 days. Each task involves performing a series of scrolling or swiping gestures, from which behavioral information such as movement speed or pressure is extracted. The completion of the daily tasks did not require extra interaction with the Mechanical Turk platform, yet paid workers through it. This differs somewhat from the typical rapid completion of one-off tasks that workers are used to on Amazon Mechanical Turk. This atypical use of the platform prompted us to evaluate aspects related to long-term worker retention and engagement over the study period, in particular the impacts of payment schedule (amount and structure over time) and reminder notifications. We also investigate the specific concern of reconciling informed consent with workers' desire to complete tasks quickly. We find that
Large language models like GPT-3 have the potential to enable HCI designers and researchers to create more human-like and helpful chatbots for specific applications. But evaluating the feasibility of these chatbots and designing prompts that optimize GPT-3 for a specific task is challenging. We present a case study in tackling these questions, applying GPT-3 to a brief 5-minute chatbot that anyone can talk to in order to better manage their mood. We report a randomized factorial experiment with 945 participants on Mechanical Turk that tests three dimensions of prompt design to initialize the chatbot (identity, intent, and behaviour), and present both quantitative and qualitative analyses of conversations and user perceptions of the chatbot. We hope other HCI designers and researchers can build on this case study for other applications of GPT-3-based chatbots to specific tasks, and can extend the methods we use for prompt design and for evaluating it.
Supervised machine learning provides the learner with a set of input-output examples of the target task. Humans, however, can also learn to perform new tasks from instructions in natural language. Can machines learn to understand instructions as well? We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity. These range from simple tasks, like retrieving the nth word of a sentence, to ones that require creativity, such as generating examples for SNLI and SQuAD in place of human intelligence workers ("turkers"). Despite our lenient evaluation methodology, we observe that a large pretrained language model performs poorly across all tasks. Analyzing the model's error patterns reveals that the model tends to ignore explicit instructions and often generates outputs that cannot be construed as an attempt to solve the task. While it is not yet clear whether instruction understanding can be captured by traditional language models, the sheer expressivity of instruction understanding makes it an appealing alternative to the rising few-shot inference paradigm.
Internal HITs on Mechanical Turk can be programmatically restrictive, and as a result, many requesters turn to external HITs as a more flexible alternative. However, creating such HITs can be redundant and time-consuming. We present MmmTurkey, a framework that enables researchers not only to create and manage external HITs quickly but also, more significantly, to capture and record detailed worker behavioral data characterizing how each worker completes a given task.
We compute the Kauffman bracket polynomial of the three-lead Turk's head, the chain sinnet, and the figure-eight chain shadow diagrams. Each of these knots can be constructed by repeatedly concatenating a single 3-tangle (a different one for each family) and then taking the closure. The bracket is then evaluated by expressing the state diagrams of the 3-tangle concerned in terms of the diagram elements of the Kauffman monoid.
Two clinical AI systems can score nearly identically on coverage-based rubrics yet behave radically differently when their patient inputs change: one updates its recommendations to match the new clinical signal, while the other produces the same output regardless. We introduce the Causal Sensitivity Score (CSS), a pre-registered interventional metric that mutates oncology tumor-board cases along five clinically meaningful dimensions - biomarker flips, prior-treatment failures, biomarker removals, surgery-status changes, and stage perturbations - and scores whether each model updates its recommendations in the pre-registered correct direction using a {0, 0.5, 1.0} scale. Benchmarked against the Consensus Match Score (CMS), a coverage-based weighted recall metric, six frontier models from three labs evaluated in single-shot inference across 224 cases rank in nearly opposite orders: all six models change rank, the CMS-worst model becomes CSS-best, and one upper-mid CMS model ranks last on CSS. We further surface a universal safety blind spot: every frontier model fails on surgery-status interventions (at most 17.2% CSS on Family D), a finding CMS does not expose. The metric also trans
A game of chess is a stream: a time-ordered sequence of moves, each carrying an engine evaluation, a measure of accuracy, a measure of position complexity, and a clock reading. We model a game as a multivariate path and apply the signature transform of rough-path theory to obtain a reparametrization-invariant, graded feature set that records the order and interaction of in-game events without a parametric likelihood. We show that a player's law of play is identifiable from the expected signature up to tree-like equivalence, construct a signature-kernel two-sample test on path space, and recast cheating detection as an anytime-valid sequential test: a signature conformance score becomes an e-process whose error is controlled for every sample size at once by Ville's inequality, with fluctuations calibrated on the moderate-deviation scale. The discriminating information lives in the signature's Levy areas, which measure whether accuracy rises precisely when positions become hard--the fingerprint of engine assistance that aggregate match-rate statistics discard. In a controlled study the test holds exact type-I control and detection power rises from negligible for subtle assistance to
Agreement attraction errors, in which a verb erroneously agrees with an intervening noun rather than its grammatical head, are amplified by morphological syncretism in some languages (English, German, Russian) but not others (Turkish, Armenian), a cross-linguistic pattern without a principled account. We use surprisal and attention entropy from large language models as processing proxies to investigate this variation across four languages. LLM-derived measures replicate behavioral findings in English and German (syncretism modulates attraction), align with Turkish null results (no modulation), and partially capture Russian patterns. We discuss further directions for better understanding why syncretism affects agreement attraction differently across languages.
In this paper, we present our system for SemEval-2026 Task 6 (CLARITY) on response clarity and evasion detection in question-answer pairs from U.S. presidential interviews, comparing fine-tuned encoders with prompt-based LLMs. Our LLM ensemble achieves 80 macro-F1 on the 3-class Task 1 (9th/41) and 59 on the 9-class Task 2 (3rd/33). Across 8 transformer encoders optimized through a four-stage pipeline, partial encoder layer unfreezing outperforms full fine-tuning by a wide margin. Combining English and multilingual encoders further improves ensemble performance over either family alone, despite multilingual models being individually weaker. Prompt-based LLMs, without any task-specific parameter updates, outperform fine-tuned encoders, particularly on minority classes; among open-weight LLMs, parameter count does not predict performance. Enriched input, concatenating the full interviewer turn, improves LLM performance but not that of encoders, an effect that persists with Longformer's extended context window, suggesting the divergence is not attributable to sequence-length capacity alone in our settings. The Clear Reply/Ambivalent boundary remains the dominant failure mode, mirrorin
Threat modelling is the process of identifying potential vulnerabilities in a system and prioritising them. Existing threat modelling tools focus primarily on technical systems and are less well suited to interpersonal threats. In this paper, we discuss traditional threat modelling methods and their shortcomings, and propose a new threat modelling framework (HARMS) to identify non-technical and human-factors harms. We also present a case study applying HARMS to IoT devices such as smart speakers with virtual assistants.
We present our submission to Task 3 (Discourse Relation Classification) of the DISRPT 2025 shared task. Task 3 introduces a unified set of 17 discourse relation labels across 39 corpora in 16 languages and six discourse frameworks, posing significant multilingual and cross-formalism challenges. We first benchmark the task by fine-tuning multilingual BERT-based models (mBERT, XLM-RoBERTa-Base, and XLM-RoBERTa-Large) with two argument-ordering strategies and progressive unfreezing ratios to establish strong baselines. We then evaluate prompt-based large language models (namely Claude Opus 4.0) in zero-shot and few-shot settings to understand how LLMs respond to the newly proposed unified labels. Finally, we introduce HiDAC, a Hierarchical Dual-Adapter Contrastive learning model. Results show that while larger transformer models achieve higher accuracy, the improvements are modest, and that unfreezing the top 75% of encoder layers yields performance comparable to full fine-tuning while training far fewer parameters. Prompt-based models lag significantly behind fine-tuned transformers, and HiDAC achieves the highest overall accuracy (67.5%) while remaining more parameter-efficient than
This paper investigates the relationship between Persuasion Techniques (PTs) and Discourse Relations (DRs) by leveraging Large Language Models (LLMs) and prompt engineering. Since no dataset annotated with both PTs and DRs exists, we took the SemEval 2023 Task 3 dataset labelled with 19 PTs as a starting point and developed LLM-based classifiers to label each instance of the dataset with one of the 22 PDTB 3.0 level-2 DRs. In total, four LLMs were evaluated using 10 different prompts, resulting in 40 unique DR classifiers. Ensemble models using different majority-pooling strategies were used to create 5 silver datasets of instances labelled with both persuasion techniques and level-2 PDTB senses. The silver dataset sizes vary from 1,281 instances to 204 instances, depending on the majority pooling technique used. Statistical analysis of these silver datasets shows that six discourse relations (namely Cause, Purpose, Contrast, Cause+Belief, Concession, and Condition) play a crucial role in persuasive texts, especially in the use of Loaded Language, Exaggeration/Minimisation, Repetition and to cast Doubt. This insight can contribute to detecting online propaganda and misinformation,
Lithium thiophosphate (LPS) is a promising solid electrolyte for next-generation lithium-ion batteries due to its superior energy storage, high ionic conductivity, and low-flammability components. Despite its potential, the high reactivity of LPS with common contaminants such as atmospheric water, preparation solvents, and electrode materials poses significant challenges for commercialization. The lack of understanding regarding the structure, morphology, and chemical behavior of LPS's surface slows down the search for solutions to these issues. Here, we utilize a machine learning interatomic potential to achieve a fundamental, atomistic understanding of the mechanical and chemical properties of the $β$-Li$_3$PS$_4$ surfaces. Employing molecular dynamics simulations, we identify relevant surface complexions formed by surface reconstructions, determine their surface energies and compute the Wulff shape of $β$-LPS. The most stable complexions exhibit properties distinctly different from the bulk, including amorphization, increased density, decreased conductivity and large deformation of the structure building blocks. We demonstrate that these surfaces are not static, but undergo sign
In the exascale computing era, handling and analyzing massive datasets have become extremely challenging. In situ analysis, which processes data during simulation runtime and bypasses costly intermediate disk input and output steps, offers a promising solution. We present libyt (https://github.com/yt-project/libyt), an open-source C library that enables astrophysical simulations to analyze and visualize data in parallel computation with yt or other Python packages. libyt can invoke Python routines automatically or provide interactive entry points via a Python prompt or a Jupyter Notebook. It requires minimal intervention in researchers' workflow, allowing users to reuse job submission scripts and Python routines. We describe libyt's architecture for parallel computing in high-performance computing environments, including its bidirectional connection between simulation codes and Python, and its integration into the Jupyter ecosystem. We detail its methods for reading patch-based adaptive mesh refinement (AMR) simulations and handling in-memory data with minimal overhead, and procedures for yielding data when requested by Python. We describe how libyt maps simulation data to yt front
The dynamics of self-propelled colloidal particles are strongly influenced by their environment through hydrodynamic and, in many cases, chemical interactions. We develop a theoretical framework to describe the motion of confined active particles by combining the Lorentz reciprocal theorem with a Galerkin discretisation of surface fields, yielding an equation of motion that efficiently captures self-propulsion without requiring an explicit solution for the bulk fluid flow. Applying this framework, we identify and characterise the long-time behaviours of a Janus particle near rigid, permeable, and fluid-fluid interfaces, revealing distinct motility regimes, including surface-bound skating, stable hovering, and chemo-hydrodynamic reflection. Our results demonstrate how the solute permeability and the viscosity contrast of the surface influence a particle's dynamics, providing valuable insights into experimentally relevant guidance mechanisms for autophoretic particles. The computational efficiency of our method makes it particularly well-suited for systematic parameter sweeps, offering a powerful tool for mapping the phase space of confined active particles and informing high-fidelit