20 results found
We collect and discuss various results on an important family of knots and links called Turk's head knots and links $Th (p,q)$. In the mathematical literature, they also appear under different names such as rosette knots and links or weaving knots and links. Apart from the unknot and the alternating torus links $T(2,q)$, the Turk's head links $Th (p,q)$ are all known to be alternating, fibered, hyperbolic, invertible, non-split, periodic, and prime. The Turk's head links $Th (p,q)$ are also both positive and negative amphichiral if $p$ is chosen to be odd. Moreover, we highlight and present several more results, focusing on Turk's head knots $Th (3,q)$. We finally list several open problems and conjectures for Turk's head knots and links. We conclude with a short appendix on torus knots and links, which might be of independent interest.
Let $Th(4,2n+1)=\widehat{β_{2n+1}}$ be the four-strand Turk's head knot, where $β_{2n+1}=(σ_1σ_2^{-1}σ_3)^{2n+1}$. In our earlier work the Alexander polynomial of this family was reduced, after the substitution $t=-z$, to the factorization \[ A_{2n+1}(z)=(1+z+\cdots+z^{2n})D_n(z)^2, \qquad D_n(z)=\prod_{r=1}^n \left(z^2+4\sin^2\frac{πr}{2n+1}\,z+1\right). \] The remaining difficulty was to prove log-concavity of the coefficient sequence of $D_n(z)$. In this work we prove that the coefficient sequence of $D_n(z)$ is log-concave. The main new ingredient is a four-block smoothing theorem for products of reciprocal quartics $(1+az+z^2)(1+bz+z^2), \: 0\le a,b\le 4,\: a+b\ge 4.$ The smoothing theorem is proved by a finite exact positivity certificate using only integer arithmetic. Combining this smoothing theorem with the trigonometric pairing $r\leftrightarrow n+1-r$ proves that $D_n(z)$ is log-concave for all $n\ge1.$ It follows that the absolute values of the coefficients of the Alexander polynomial of $Th(4,2n+1)$ form a trapezoidal sequence. Thus Fox's trapezoidal conjecture holds for the entire family of four-strand Turk's head knots.
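The log-concavity statement in the abstract above can be spot-checked numerically for small $n$. The following is a minimal sketch, not the paper's method: it expands $D_n(z)$ in floating point straight from the product formula and tests $c_k^2 \ge c_{k-1}c_{k+1}$. The function names `d_coeffs` and `is_log_concave` and the relative tolerance are illustrative assumptions; a numerical check of this kind is no substitute for the exact integer certificate the abstract describes.

```python
import math


def d_coeffs(n):
    """Coefficients of D_n(z), lowest degree first, expanded from
    prod_{r=1}^n (z^2 + 4 sin^2(pi r / (2n+1)) z + 1) in floating point."""
    coeffs = [1.0]
    for r in range(1, n + 1):
        a = 4.0 * math.sin(math.pi * r / (2 * n + 1)) ** 2
        nxt = [0.0] * (len(coeffs) + 2)
        # Multiply the running polynomial by (1 + a z + z^2).
        for i, c in enumerate(coeffs):
            nxt[i] += c
            nxt[i + 1] += a * c
            nxt[i + 2] += c
        coeffs = nxt
    return coeffs


def is_log_concave(seq, rel_tol=1e-9):
    """True if seq[k]^2 >= seq[k-1] * seq[k+1] for every interior k.
    rel_tol (an assumed value) absorbs floating-point rounding."""
    return all(
        seq[k] ** 2 >= seq[k - 1] * seq[k + 1] * (1.0 - rel_tol)
        for k in range(1, len(seq) - 1)
    )


if __name__ == "__main__":
    for n in range(1, 6):
        c = d_coeffs(n)
        print(n, [round(x, 6) for x in c], is_log_concave(c))
```

For $n=1$ the product is $z^2+3z+1$, and for $n=2$ it expands to $z^4+5z^3+7z^2+5z+1$; both coefficient sequences are log-concave, in line with the stated theorem.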
In this paper, we introduce the resources that we developed for Turkish dependency parsing, which include a novel manually annotated treebank (BOUN Treebank), along with the guidelines we adopted, and a new annotation tool (BoAT). The manual annotation process we employed was shaped and implemented by a team of four linguists and five Natural Language Processing (NLP) specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies (UD) framework as well as our recent efforts for unifying the Turkish UD treebanks through manual re-annotation. To the best of our knowledge, BOUN Treebank is the largest Turkish treebank. It contains a total of 9,761 sentences from various topics including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a state-of-the-art dependency parser obtained over the BOUN Treebank as well as two other treebanks in Turkish. Our results demonstrate that the unification of the Turkish annotation scheme and the introduction of a more comprehensive treebank lead to improved performance with regard to dependency parsin
Building machine translation (MT) test sets is a relatively expensive task. As MT becomes increasingly desired for more and more language pairs and more and more domains, it becomes necessary to build test sets for each case. In this paper, we investigate using Amazon's Mechanical Turk (MTurk) to make MT test sets cheaply. We find that MTurk can be used to make test sets much cheaper than professionally-produced test sets. More importantly, in experiments with multiple MT systems, we find that the MTurk-produced test sets yield essentially the same conclusions regarding system performance as the professionally-produced test sets yield.
In this study, we investigate the attentiveness exhibited by participants sourced through Amazon Mechanical Turk (MTurk), thereby discovering a significant level of inattentiveness amongst the platform's top crowd workers (those classified as 'Master', with an 'Approval Rate' of 98% or more, and a 'Number of HITS approved' value of 1,000 or more). A total of 564 individuals from the United States participated in our experiment. They were asked to read a vignette outlining one of four hypothetical technology products and then complete a related survey. Three forms of attention check (logic, honesty, and time) were used to assess attentiveness. Through this experiment we determined that a total of 126 (22.3%) participants failed at least one of the three forms of attention check, with most (94) failing the honesty check - followed by the logic check (31), and the time check (27). Thus, we established that significant levels of inattentiveness exist even among the most elite MTurk workers. The study concludes by reaffirming the need for multiple forms of carefully crafted attention checks, irrespective of whether participant quality is presumed to be high according to MTurk criteria s
In this paper, we present our system design for conducting longitudinal daily-task studies with the same workers throughout on Amazon Mechanical Turk. We implement this system to conduct a study into touch dynamics, and present our experiences, challenges and lessons learned from doing so. Study participants installed our application on their Apple iOS phones and completed two tasks daily for 31 days. Each task involves performing a series of scrolling or swiping gestures, from which behavioral information such as movement speed or pressure is extracted. The completion of the daily tasks did not require extra interaction with the Mechanical Turk platform, yet paid workers through it. This differs somewhat from the typical rapid completion of one-off tasks that workers are used to on Amazon Mechanical Turk. This atypical use of the platform prompted us to evaluate aspects related to long-term worker retention and engagement over the study period, in particular the impacts of payment schedule (amount and structure over time) and reminder notifications. We also investigate the specific concern of reconciling informed consent with workers' desire to complete tasks quickly. We find that
Large language models like GPT-3 have the potential to enable HCI designers and researchers to create more human-like and helpful chatbots for specific applications. But evaluating the feasibility of these chatbots and designing prompts that optimize GPT-3 for a specific task is challenging. We present a case study in tackling these questions, applying GPT-3 to a brief 5-minute chatbot that anyone can talk to in order to better manage their mood. We report a randomized factorial experiment with 945 participants on Mechanical Turk that tests three dimensions of prompt design to initialize the chatbot (identity, intent, and behaviour), and present both quantitative and qualitative analyses of conversations and user perceptions of the chatbot. We hope other HCI designers and researchers can build on this case study for other applications of GPT-3 based chatbots to specific tasks, and can extend the methods we use for prompt design and for evaluating the prompt design.
Internal HITs on Mechanical Turk can be programmatically restrictive, and as a result, many requesters turn to using external HITs as a more flexible alternative. However, creating such HITs can be redundant and time-consuming. We present MmmTurkey, a framework that enables researchers to not only quickly create and manage external HITs, but more significantly also capture and record detailed worker behavioral data characterizing how each worker completes a given task.
Supervised machine learning provides the learner with a set of input-output examples of the target task. Humans, however, can also learn to perform new tasks from instructions in natural language. Can machines learn to understand instructions as well? We present the Turking Test, which examines a model's ability to follow natural language instructions of varying complexity. These range from simple tasks, like retrieving the nth word of a sentence, to ones that require creativity, such as generating examples for SNLI and SQuAD in place of human intelligence workers ("turkers"). Despite our lenient evaluation methodology, we observe that a large pretrained language model performs poorly across all tasks. Analyzing the model's error patterns reveals that the model tends to ignore explicit instructions and often generates outputs that cannot be construed as an attempt to solve the task. While it is not yet clear whether instruction understanding can be captured by traditional language models, the sheer expressivity of instruction understanding makes it an appealing alternative to the rising few-shot inference paradigm.
We investigate the feasibility of obtaining highly trustworthy results using crowdsourcing on complex engineering tasks. Crowdsourcing is increasingly seen as a potentially powerful way of increasing the supply of labor for solving society's problems. While applications in domains such as citizen-science, citizen-journalism or knowledge organization (e.g., Wikipedia) have seen many successful applications, there have been fewer applications focused on solving engineering problems, especially those involving complex tasks. This may be in part because of concerns that low quality input into engineering analysis and design could result in failed structures leading to loss of life. We compared the quality of work of the anonymous workers of Amazon Mechanical Turk (AMT), an online crowdsourcing service, with the quality of work of expert engineers in solving the complex engineering task of evaluating virtual wind tunnel data graphs. On this representative complex engineering task, our results showed that there was little difference between expert engineers and crowdworkers in the quality of their work and explained reasons for these results. Along with showing that crowdworkers are effe
A growing number of people are working as part of on-line crowd work, which has been characterized by its low wages; yet, we know little about wage distribution and causes of low/high earnings. We recorded 2,676 workers performing 3.8 million tasks on Amazon Mechanical Turk. Our task-level analysis revealed that workers earned a median hourly wage of only ~\$2/h, and only 4% earned more than \$7.25/h. The average requester pays more than \$11/h, although lower-paying requesters post much more work. Our wage calculations are influenced by how unpaid work is included in our wage calculations, e.g., time spent searching for tasks, working on tasks that are rejected, and working on tasks that are ultimately not submitted. We further explore the characteristics of tasks and working patterns that yield higher hourly wages. Our analysis informs future platform design and worker tools to create a more positive future for crowd work.
Agreement attraction errors, in which a verb erroneously agrees with an intervening noun rather than its grammatical head, are amplified by morphological syncretism in some languages (English, German, Russian) but not others (Turkish, Armenian), a cross-linguistic pattern without a principled account. We use surprisal and attention entropy from large language models as processing proxies to investigate this variation across four languages. LLM-derived measures replicate behavioral findings in English and German (syncretism modulates attraction), align with Turkish null results (no modulation), and partially capture Russian patterns. We discuss further directions for better understanding why syncretism affects agreement attraction differently across languages.
Recent text generation research has increasingly focused on open-ended domains such as story and poetry generation. Because models built for such tasks are difficult to evaluate automatically, most researchers in the space justify their modeling choices by collecting crowdsourced human judgments of text quality (e.g., Likert scores of coherence or grammaticality) from Amazon Mechanical Turk (AMT). In this paper, we first conduct a survey of 45 open-ended text generation papers and find that the vast majority of them fail to report crucial details about their AMT tasks, hindering reproducibility. We then run a series of story evaluation experiments with both AMT workers and English teachers and discover that even with strict qualification filters, AMT workers (unlike teachers) fail to distinguish between model-generated text and human-generated references. We show that AMT worker judgments improve when they are shown model-generated output alongside human-generated references, which enables the workers to better calibrate their ratings. Finally, interviews with the English teachers provide deeper insights into the challenges of the evaluation process, particularly when rating model-
The $(m,n)$ Turk's Head Knot, $THK(m,n)$, is an "alternating $(m,n)$ torus knot." We prove the Harary-Kauffman conjecture for all $THK(m,n)$ except for the case where $m \geq 5$ is odd and $n \geq 3$ is relatively prime to $m$. We also give evidence in support of the conjecture in that case. Our proof rests on the observation that none of these knots have prime determinant except for $THK(m,2)$ when $P_m$ is a Pell prime.
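The Pell primes mentioned in the abstract above can be listed with a short script. This is a minimal sketch that assumes the common indexing $P_0=0$, $P_1=1$, $P_m=2P_{m-1}+P_{m-2}$; the abstract does not state its indexing convention, so how $P_m$ lines up with the knot parameter $m$ is an assumption here. The helper names are illustrative.

```python
def pell(m):
    """m-th Pell number under the assumed indexing P_0 = 0, P_1 = 1."""
    a, b = 0, 1
    for _ in range(m):
        a, b = b, 2 * b + a
    return a


def is_prime(k):
    """Trial-division primality test; adequate for small Pell numbers."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def pell_prime_indices(limit):
    """Indices m <= limit for which P_m is prime."""
    return [m for m in range(limit + 1) if is_prime(pell(m))]


if __name__ == "__main__":
    for m in pell_prime_indices(13):
        print(m, pell(m))
```

Trial division is used only because the numbers involved are small; for larger indices a probabilistic primality test would be the more practical choice.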
Online labor markets give people in poor countries direct access to buyers in rich countries. Economic theory and empirical evidence strongly suggest that this kind of access improves human welfare. However, critics claim that abuses are endemic in these markets and that employers exploit unprotected, vulnerable workers. I investigate part of this claim using a randomized, paired survey in which I ask workers in an online labor market (Amazon Mechanical Turk) how they perceive online employers and employers in their host country in terms of honesty and fairness. I find that, on average, workers perceive the collection of online employers as slightly fairer and more honest than offline employers, though the effect is not significant. Views are more polarized in the online employer case, with more respondents having very positive views of the online collection of employers.
The minimum number of colors is a challenging knot invariant since, by definition, its calculation requires taking the minimum over infinitely many minima. In this article we estimate and in some cases calculate the minimum number of colors for the Turk's head knots on three strands.
We compute the Kauffman bracket polynomial of the three-lead Turk's head, the chain sinnet and the figure-eight chain shadow diagrams. Each of these knots can in fact be constructed by repeatedly concatenating the same 3-tangle, respectively, then taking the closure. The bracket is then evaluated by expressing the state diagrams of the concerned 3-tangle by means of the Kauffman monoid diagram's elements.
A game of chess is a stream: a time-ordered sequence of moves, each carrying an engine evaluation, a measure of accuracy, a measure of position complexity, and a clock reading. We model a game as a multivariate path and apply the signature transform of rough-path theory to obtain a reparametrization-invariant, graded feature set that records the order and interaction of in-game events without a parametric likelihood. We show that a player's law of play is identifiable from the expected signature up to tree-like equivalence, construct a signature-kernel two-sample test on path space, and recast cheating detection as an anytime-valid sequential test: a signature conformance score becomes an e-process whose error is controlled for every sample size at once by Ville's inequality, with fluctuations calibrated on the moderate-deviation scale. The discriminating information lives in the signature's Levy areas, which measure whether accuracy rises precisely when positions become hard--the fingerprint of engine assistance that aggregate match-rate statistics discard. In a controlled study the test holds exact type-I control and detection power rises from negligible for subtle assistance to
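The Lévy area referred to in the abstract above is the antisymmetric part of the second level of the path signature. The following is a minimal sketch for a two-dimensional piecewise-linear path, not the paper's implementation: the function name `levy_area` is illustrative, and reading the two coordinates as, for example, accuracy and position complexity is an assumption made only for intuition.

```python
def levy_area(path):
    """Levy area of a piecewise-linear 2D path given as a list of (x, y) points:
    0.5 * integral of (x - x0) dy - (y - y0) dx, where (x0, y0) is the start."""
    x0, y0 = path[0]
    total = 0.0
    for (xa, ya), (xb, yb) in zip(path, path[1:]):
        # On a straight segment the integrand integrates exactly to this expression.
        total += (xa - x0) * (yb - ya) - (ya - y0) * (xb - xa)
    return 0.5 * total


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    diagonal = [(0, 0), (1, 1), (2, 2)]
    print(levy_area(square), levy_area(diagonal))
```

Two properties are visible in this toy setting. A path whose coordinates move in lockstep (the diagonal) has zero Lévy area, while a path in which one coordinate moves before the other (the counterclockwise square) encloses signed area. Inserting extra sample points along the same segments leaves the value unchanged, which illustrates the reparametrization invariance the abstract mentions.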
Two clinical AI systems can score nearly identically on coverage-based rubrics yet behave radically differently when their patient inputs change: one updates its recommendations to match the new clinical signal, while the other produces the same output regardless. We introduce the Causal Sensitivity Score (CSS), a pre-registered interventional metric that mutates oncology tumor-board cases along five clinically meaningful dimensions - biomarker flips, prior-treatment failures, biomarker removals, surgery-status changes, and stage perturbations - and scores whether each model updates its recommendations in the pre-registered correct direction using a {0, 0.5, 1.0} scale. Benchmarked against the Consensus Match Score (CMS), a coverage-based weighted recall metric, six frontier models from three labs evaluated in single-shot inference across 224 cases rank in nearly opposite orders: all six models change rank, the CMS-worst model becomes CSS-best, and one upper-mid CMS model ranks last on CSS. We further surface a universal safety blind spot: every frontier model fails on surgery-status interventions (at most 17.2% CSS on Family D), a finding CMS does not expose. The metric also trans
In the exascale computing era, handling and analyzing massive datasets have become extremely challenging. In situ analysis, which processes data during simulation runtime and bypasses costly intermediate disk input and output steps, offers a promising solution. We present libyt (https://github.com/yt-project/libyt), an open-source C library that enables astrophysical simulations to analyze and visualize data in parallel computation with yt or other Python packages. libyt can invoke Python routines automatically or provide interactive entry points via a Python prompt or a Jupyter Notebook. It requires minimal intervention in researchers' workflow, allowing users to reuse job submission scripts and Python routines. We describe libyt's architecture for parallel computing in high-performance computing environments, including its bidirectional connection between simulation codes and Python, and its integration into the Jupyter ecosystem. We detail its methods for reading patch-based adaptive mesh refinement (AMR) simulations and handling in-memory data with minimal overhead, and procedures for yielding data when requested by Python. We describe how libyt maps simulation data to yt front