20 results found
This article focuses on what Luis von Ahn called the "twofer," that is, a single solution that elegantly addresses two problems on a large scale. We describe two of von Ahn's creations: reCAPTCHA, which validates a human web presence while also digitizing hard-to-read words, and Duolingo, which teaches new languages while translating the web. We then consider how this approach can be applied to medical education. Embedding Wikipedia editing into educational settings is one such solution, which could both improve the quality of health information available to the public and enhance the learning of future health professionals.
CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are widespread security measures on the World Wide Web that prevent automated programs from abusing online services. They do so by asking humans to perform a task that computers cannot yet perform, such as deciphering distorted characters. Our research explored whether such human effort can be channeled into a useful purpose: helping to digitize old printed material by asking users to decipher scanned words from books that computerized optical character recognition failed to recognize. We showed that this method can transcribe text with a word accuracy exceeding 99%, matching the guarantee of professional human transcribers. Our apparatus is deployed in more than 40,000 Web sites and has transcribed over 440 million words.
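The control-word scheme this abstract describes can be sketched in a few lines: each challenge pairs a word the system already knows with a word OCR failed on, a user passes by matching the known word, and answers to the unknown word are aggregated as votes. The following is a minimal illustration only; the function names, vote threshold, and normalization are our assumptions, not the deployed system's rules.

```python
# Minimal sketch of a reCAPTCHA-style control-word scheme: passing the
# known ("control") word earns trust; the answer for the OCR-failed word
# is recorded as a vote, and a transcription is accepted once enough
# independent votes agree. All names and thresholds are illustrative.
from collections import Counter

VOTES_REQUIRED = 3  # assumed agreement threshold

votes: dict[str, Counter] = {}  # unknown-word image id -> answer tallies

def submit_challenge(control_answer: str, control_truth: str,
                     unknown_id: str, unknown_answer: str) -> bool:
    """Return True if the user passes (control word matched)."""
    if control_answer.strip().lower() != control_truth.strip().lower():
        return False  # failed the human test; discard the unknown-word vote
    votes.setdefault(unknown_id, Counter())[unknown_answer.strip().lower()] += 1
    return True

def transcription(unknown_id: str) -> str | None:
    """Return the agreed transcription once enough votes concur, else None."""
    tally = votes.get(unknown_id)
    if not tally:
        return None
    word, count = tally.most_common(1)[0]
    return word if count >= VOTES_REQUIRED else None
```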
Related: Google Cloud fraud defense, the next evolution of reCAPTCHA - https://news.ycombinator.com/item?id=48039362. Also: Google Cloud Fraud Defence is just WEI repackaged - https://news
Recent growth of online research has been accompanied by an increase in reports of fraudulent participants, which can significantly compromise research validity. Drawing from our experience using Qualtrics with open recruitment, existing literature, and emerging studies in eating disorders (ED), we outline the risk and provide simple, practical recommendations for preventing, detecting, and managing fraudulent participants in online ED research. During the conduct of a three-round Delphi consensus study with 138 English-speaking individuals aged 18 and older, we were inundated with fraudulent sign-ups between July and August 2024, despite implementing multiple fraud prevention strategies. In response, we introduced additional fraud mitigation strategies and established a three-step procedure for identifying and managing fraudulent participants. The additional fraud mitigation measures, including a second reCAPTCHA, a duplicate question for consistency checks, and modified attention check questions, potentially aided in preventing further fraudulent sign-ups. Our procedure, involving a manual comprehensive review of all incoming survey data and checks against a fraudulent-participant profile, enabled us to identify and withdraw suspected or likely fraudulent participants. With increasing rates of fraudulent participation and rapid technological advances such as artificial intelligence, all online studies are at risk, and researchers need to be proactive in their use of antifraud practices to safeguard online research. Our practical recommendations can assist future researchers in managing fraudulent participants.
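The duplicate-question and attention-check measures described above amount to simple row-level filters. Below is a minimal sketch assuming a pandas DataFrame of responses; the column names and the "correct" attention-check answer are hypothetical placeholders, not the study's instrument.

```python
# Sketch of duplicate-question consistency and attention-check screening.
# Column names ("age_q1", "age_q2", "attention_check") are illustrative.
import pandas as pd

def flag_suspect_rows(df: pd.DataFrame) -> pd.Series:
    """Return a boolean Series: True where a response looks fraudulent."""
    # The same question asked twice in the survey must agree.
    inconsistent = df["age_q1"] != df["age_q2"]
    # An attention-check item with a single instructed answer.
    failed_attention = df["attention_check"] != "strongly agree"
    return inconsistent | failed_attention

df = pd.DataFrame({
    "age_q1": [24, 31, 19],
    "age_q2": [24, 29, 19],
    "attention_check": ["strongly agree", "strongly agree", "neutral"],
})
print(df[flag_suspect_rows(df)])  # rows 1 and 2 would go to manual review
```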
Although screening for bots and/or using costly panel services to recruit participants online has become increasingly necessary, such efforts may no longer ensure the validity of data collected online. Newly released agentic AI models, such as the ChatGPT agent, can complete surveys in ways that are difficult to distinguish from human responses. The current paper outlines efforts that the body image, weight, and eating disorders (BIWED) lab has undertaken to screen for and detect AI-completed data reliably and validly. There are some tasks that ChatGPT agents do not perform identically to human responders (e.g., video tasks, online games, open-ended responses, and reCAPTCHA). We present the methods that have been most successful at identifying AI agent survey completion. We discuss potential solutions, field-wide concerns, and future directions for the field more broadly.
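The abstract does not detail its screening methods, but two generic signals consistent with the task types it names (open-ended responses and metadata such as timing) can be sketched as follows. The threshold and field names are assumptions, not the BIWED lab's actual criteria.

```python
# Two generic agent-screening signals: implausibly fast completion times
# and verbatim-duplicate open-ended answers across respondents.
import pandas as pd

MIN_SECONDS = 120  # assumed floor for a plausible human completion time

def screen(df: pd.DataFrame) -> pd.DataFrame:
    flags = pd.DataFrame(index=df.index)
    flags["too_fast"] = df["duration_sec"] < MIN_SECONDS
    # Identical free-text answers from different respondents are suspect.
    normalized = df["open_ended"].str.lower().str.strip()
    flags["duplicate_text"] = normalized.duplicated(keep=False)
    return flags
```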
To evaluate state-wide nutrition policies, valid tools are required to gather sufficient sample sizes. Remote data collection, including web-based dietary assessments, offers convenience for participants and researchers and enables faster and more diverse recruitment. However, it presents challenges, including the risk of bots compromising data integrity. This study describes the technical survey design of an ongoing longitudinal study evaluating a state-wide Supplemental Nutrition Assistance Program (SNAP) incentive program, discusses strategies to prevent and identify bots, duplicates, fraudulent entries, and implausible data, and provides recommendations to improve future public health nutrition research. From May to September 2023, SNAP participants from Rhode Island and Connecticut were recruited to complete an online food frequency questionnaire (FFQ) and a demographic survey. Given the large sample and online format, our interdisciplinary team designed the technical backend to optimize participants' convenience while ensuring data quality through an automated system that assessed FFQ responses. To prevent bots and duplicates, we created application programming interfaces (APIs) to detect duplicate entries, randomly called participants, and evaluated reCAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) results, geotags, and Internet Protocol (IP) addresses. Using a combination of text blasts and in-person recruitment, we enrolled 1367 participants, with text blasts proving the most effective strategy (∼60% of participants). Midway through recruitment, we identified 544 potential bots that completed the screener, with duplicate IP addresses and geotags from outside the recruitment area serving as strong indicators of bot activity. At baseline, 112 participants failed FFQ data quality checks, prompting follow-up by research assistants. Our automated duplicate-detection and FFQ APIs saved countless hours of staff time. Remote data collection tools were critical for meeting recruitment goals and ensuring the authenticity of our data. A combination of strategies is necessary to effectively guard against bots and ensure plausible responses. Widely available, built-in tools (e.g., reCAPTCHA) are helpful but insufficient alone. Customized solutions like our automated systems may be critical for future researchers seeking to maintain data integrity.
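As a rough sketch of the duplicate-IP, geotag, and reCAPTCHA checks described above: the study recruited in Rhode Island and Connecticut, so a geotag resolving outside that area is treated as a bot indicator. The study's actual APIs were custom-built; the column names and score cutoff here are assumptions.

```python
# Sketch of three automated bot indicators from the study description:
# duplicate IPs, geotags outside the recruitment area, low reCAPTCHA scores.
import pandas as pd

RECRUITMENT_STATES = {"RI", "CT"}  # the study's recruitment area

def bot_indicators(df: pd.DataFrame) -> pd.DataFrame:
    flags = pd.DataFrame(index=df.index)
    flags["duplicate_ip"] = df["ip_address"].duplicated(keep=False)
    flags["out_of_area"] = ~df["geotag_state"].isin(RECRUITMENT_STATES)
    flags["low_recaptcha"] = df["recaptcha_score"] < 0.5  # common cutoff
    return flags
```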
Background/Objectives: Robust, state-level LGBTQ+ health surveillance is scarce in Kentucky, limiting evidence-based healthcare planning and policy. We aimed to evaluate the feasibility and early public-health utility of a community-partnered annual survey and compare selected mental health stressors between Kentucky and non-Kentucky respondents. Methods: We conducted a cross-sectional online survey (13 April-15 July 2024) developed with a statewide LGBTQ+ nonprofit. Recruitment occurred via organizational channels and community events. A content warning preceded the survey, which was administered via Qualtrics. Data quality was screened using reCAPTCHA. We assessed feasibility metrics including recruitment and completion rates. Mental health stressors were captured with a six-item scale. Group differences were estimated with Welch's t-tests. Results: Of 3852 survey starts, 1559 were retained as analyzable completes (completion rate: 40.47%), with 78.7% residing in-state. Initial analysis revealed a significant divergence in mental health patterns: while Kentucky participants reported lower stress regarding their personal mental health, they reported significantly higher stress stemming from socio-political issues like homophobia and transphobia compared to out-of-state respondents. Conclusions: An annual, community-partnered surveillance platform is a feasible strategy for generating actionable mental health signals relevant to healthcare. These findings will inform targeted outreach and guide health system partnerships to enhance LGBTQ+-affirming care in Kentucky.
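The group comparison named above, Welch's t-test, can be reproduced directly with SciPy; the numbers below are illustrative, not study data.

```python
# Welch's t-test (unequal variances), as used to compare stressor scores
# between Kentucky and non-Kentucky respondents. Arrays are made up.
from scipy import stats

kentucky = [3.8, 4.1, 2.9, 4.6, 3.5]
out_of_state = [2.9, 3.2, 3.0, 2.4, 3.1]

# equal_var=False selects Welch's variant rather than Student's t-test.
t_stat, p_value = stats.ttest_ind(kentucky, out_of_state, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```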
Threats to data integrity have always existed in online human subjects research, but it appears these threats have become more common and more advanced in recent years. Researchers have proposed various techniques to address satisficers, repeat participants, bots, and fraudulent participants; yet, no synthesis of this literature has been conducted. This study undertakes a scoping review of recent methods and ethical considerations for addressing threats to data integrity in online research. A PubMed search was used to identify 90 articles published from 2020 to 2024 that were written in English, that discussed online human subjects research, and that had at least one paragraph dedicated to discussing threats to online data integrity. We cataloged 16 types of techniques for addressing threats to online data integrity. Techniques to authenticate personal information (eg, videoconferencing and mailing incentives to a physical address) appear to be very effective at deterring or identifying fraudulent participants. Yet such techniques also come with ethical considerations, including participant burden and increased threats to privacy. Other techniques, such as Completely Automated Public Turing test to tell Computers and Humans Apart (reCAPTCHA; Google LLC) scores and checking IP addresses, although very common, were also deemed by several researchers to be no longer sufficient protection against advanced threats to data integrity. Overall, this review demonstrates the importance of shifting online research protocols as bots and fraudulent participants become more sophisticated.
Historically, recruiting research participants through social media facilitated access to people who use opioids, capturing a range of drug use behaviors. The current rapidly changing online landscape, however, casts doubt on social media's continued usefulness for study recruitment. In this viewpoint paper, we assessed social media recruitment for people who use opioids and described challenges and potential solutions for effective recruitment. As part of a study on barriers to harm reduction health services, we recruited people who use opioids in New York City to complete a REDCap (Research Electronic Data Capture; Vanderbilt University) internet-based survey using Meta (Facebook and Instagram), X (formerly known as Twitter), Reddit, and Discord. Eligible participants must have reported using opioids (heroin, prescription opioids, or fentanyl) for nonprescription purposes in the past 90 days and live or work in New York City. Data collection took place from August 2023 to November 2023. Including study purpose, compensation, and inclusion criteria caused Meta's social media platforms and X to flag our ads as "discriminatory" and "spreading false information." Listing incentives increased bot traffic across all platforms despite bot prevention activities (eg, reCAPTCHA and counting items in an image). We instituted a rigorous post hoc data cleaning protocol (eg, investigating duplicate IP addresses, participants reporting use of a fictitious drug, invalid ZIP codes, and improbable drug use behaviors) to identify bot submissions and repeat participants. Participants received a US $20 gift card if still deemed eligible after post hoc data inspection. There were 2560 submissions, 93.2% (n=2387) of which were determined to be from bots or malicious responders. Of these, 23.9% (n=571) showed evidence of a duplicate IP or email address, 45.9% (n=1095) reported consuming a fictitious drug, 15.8% (n=378) provided an invalid ZIP code, and 9.4% (n=225) reported improbable drug use behaviors. The majority of responses deemed legitimate (n=173) were collected from Meta (n=79, 45.7%) and Reddit (n=48, 27.8%). X's ads were the most expensive (US $1.96/click) and yielded the fewest participants (3 completed surveys). Social media recruitment of hidden populations is challenging but not impossible. Rigorous data collection protocols and post hoc data inspection are necessary to ensure the validity of findings. These methods may counter previous best practices for researching stigmatized behaviors.
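The post hoc cleaning protocol enumerated above maps naturally onto a row-filtering pipeline. A minimal sketch follows, with hypothetical column names and a simplified five-digit ZIP-format rule standing in for the study's actual validity checks.

```python
# Sketch of the post hoc cleaning protocol: duplicate IP or email,
# endorsement of a fictitious (decoy) drug, malformed ZIP code, and an
# improbable drug-use frequency. Each criterion mirrors one named in the
# abstract; the exact rules and column names are illustrative.
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    suspect = (
        df["ip_address"].duplicated(keep=False)
        | df["email"].str.lower().duplicated(keep=False)
        | df["used_fictitious_drug"]                            # decoy item
        | ~df["zip_code"].astype(str).str.fullmatch(r"\d{5}")   # malformed ZIP
        | (df["days_used_past_90"] > 90)                        # impossible count
    )
    return df[~suspect]
```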
Bots are automated software programs that pose an ongoing, and increasingly sophisticated, threat to psychological research by invading online research studies. Despite this growing concern, research in this area has been limited to bot detection in existing data sets following an unexpected encounter with bots. The present three-condition, quasi-experimental study aimed to address this gap in the literature by examining the efficacy of three types of bot screening tools across three incentive conditions ($0, $1, and $5). Data were collected from 444 respondents via Twitter advertisements between July and September 2021. The efficacy of five task-based (i.e., anagrams, visual search), question-based (i.e., attention checks, ReCAPTCHA), and data-based (i.e., consistency, metadata) tools was examined with Bonferroni-adjusted univariate and multivariate logistic regression analyses. In general, study results suggest that bot screening tools function similarly for participants recruited across incentive conditions. Moreover, the present analyses revealed heterogeneity in the efficacy of bot screening tool subtypes. Notably, the present results suggest that the least effective bot screening tools were among the most commonly used tools in the existing literature (e.g., ReCAPTCHA). In sum, the study findings revealed both highly effective and highly ineffective bot screening tools. Study design and data integrity recommendations for researchers are provided.
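The analytic approach, univariate logistic regressions with a Bonferroni-adjusted threshold, can be sketched as follows with statsmodels. The simulated data and tool names are placeholders, not the study's variables.

```python
# Sketch of Bonferroni-adjusted univariate logistic regressions: each
# screening tool's pass/fail flag predicts bot status, and the alpha
# level is divided by the number of tools tested.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 400
bot = rng.integers(0, 2, n)  # 1 = classified as a bot by the gold standard
tools = {
    "recaptcha_fail": rng.integers(0, 2, n),
    "anagram_fail": (bot | rng.integers(0, 2, n)) & rng.integers(0, 2, n),
    "attention_fail": rng.integers(0, 2, n),
}

alpha = 0.05 / len(tools)  # Bonferroni adjustment across the tools tested
for name, flag in tools.items():
    X = sm.add_constant(flag.astype(float))
    result = sm.Logit(bot, X).fit(disp=0)
    p = result.pvalues[1]
    print(f"{name}: p = {p:.4f}, significant = {p < alpha}")
```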
Citizen science is often celebrated. We interrogate this position through exploration of socio-technoscientific phenomena that mirror citizen science yet are misaligned with its ideals. We term this 'Dark Citizen Science'. We identify five conceptual dimensions of citizen science: purpose, process, perceptibility, power and public effect. Dark citizen science mirrors traditional citizen science in purpose and process but diverges in perceptibility, power and public effect. We compare two Internet-based categorisation processes: the citizen science project Galaxy Zoo and the dark citizen science project Google's reCAPTCHA. We highlight that the reader has, likely unknowingly, provided unpaid technoscientific labour to Google. We apply insights from our analysis of dark citizen science to traditional citizen science. Linking citizen science as practice and normative democratic ideal ignores how some science-citizen configurations actively pit practice against ideal. Further, failure to fully consider the implications of citizen science for science and society allows exploitative elements of citizen science to evade the sociological gaze.
Increasingly, studies use social media to recruit, enroll, and collect data from participants. This introduces a threat to data integrity: efforts to produce fraudulent data to receive participant compensation, e.g., gift cards. MOMENT is an online symptom-monitoring and self-care study that implemented safeguards to protect data integrity. Facebook, Twitter, and patient organizations were used to recruit participants with chronic health conditions in four countries (USA, Italy, The Netherlands, Sweden). Links to the REDCap baseline survey were posted to social media accounts. The initial study launch, where participants completed the baseline survey and were automatically re-directed to the LifeData ecological momentary assessment app, was overwhelmed with fraudulent responses. In response, safeguards (e.g., reCAPTCHA, attention checks) were implemented and baseline data was manually inspected prior to LifeData enrollment. The initial launch resulted in 411 responses in 48 hours, 265 of which (64.5%) successfully registered for the LifeData app and were considered enrolled. Ninety-nine percent of these were determined to be fraudulent. Following implementation of safeguards, the re-launch yielded 147 completed baselines in 3.5 months. Eighteen cases (12.2%) were found fraudulent and not invited to enroll. Most fraudulent cases in the re-launch (15 of 18) were identified by a single attention check question. In total, 96.1% of fraudulent responses were to the USA-based survey. Data integrity safeguards are necessary for research studies that recruit online and should be reported in manuscripts. Three safeguard strategies were effective in preventing and removing most of the fraudulent data in the MOMENT study. Additional strategies were also used and may be necessary in other contexts.
Fraudulent participation is an escalating concern for online clinical trials and research studies and can have a significant negative impact on findings. We aim to shed light on the risk and to provide practical recommendations for detecting and managing such instances. The FREED-Mobile (FREED-M) study is an online, randomized controlled feasibility trial to assess a digital early intervention for young people (aged 16-25) in England or Wales with eating problems. The trial involved baseline (week 0), post-intervention (week 4), and follow-up (week 12) surveys, alongside weekly modules provided over 4 weeks on the study website. Study completers were compensated with £20 shopping vouchers. Despite the complexity of the trial design, two instances of fraudulent sign-ups occurred in January and March 2023. To counter this, we developed a "fraudulent participants protocol" following internal investigations and discussions with collaborators. The implementation of prevention measures such as reCAPTCHA updates, IP address review, and changes in reimbursement effectively halted further fraudulent sign-ups. Our protocol facilitated the systematic identification and withdrawal of suspected or clear fraudsters and was demonstrably robust at distinguishing between fraudsters and genuine responders. All remote, online trials or studies are at risk of fraudulent participation. Drawing from our experience and the existing literature, we offer practical recommendations for researchers considering online recruitment and data collection. Vigilance and the integration of deterrents and data quality checks into the study design from the outset are advised to safeguard research integrity. Fraudulent participation in digital research can have a significant impact on research findings, potentially leading to biased results and misinformed decisions. We developed an effective protocol for the prevention, identification, and management of fraudulent participants. By sharing our insights and recommendations, we hope to raise awareness of this issue and provide other researchers with the knowledge and strategies necessary to safeguard research integrity moving forward.
The standard approach for detecting and preventing bots from doing harm online involves CAPTCHAs. However, recent AI research, including our own in this manuscript, suggests that bots can complete many common CAPTCHAs with ease. The most effective methodology for identifying potential bots involves asking respondents to complete image-processing, causal-reasoning free-response questions that are then hand coded by human analysts. However, this approach is labor intensive, slow, and inefficient. Moreover, with the advent of generative AI such as GPT and Bard, it may soon be obsolete. Here, we develop and test various automated bot-screening questions, grounded in psychological research, to serve as a proactive screen against bots. Using hand-coded free-response questions in the naturalistic setting of MTurkers recruited for a Qualtrics survey, we identify 18.9% of our sample as potential bots, whereas Google's reCAPTCHA v3 identified only 1.7% as potential bots. We then examine the performance of these potential bots on our novel bot screeners, each of which has different strengths and weaknesses but all of which outperform CAPTCHAs.
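For reference, the reCAPTCHA v3 score the paper benchmarks against is obtained server-side from Google's documented siteverify endpoint. The secret, token, and 0.5 cutoff below are placeholders or common conventions, not values from the paper.

```python
# Server-side verification of a reCAPTCHA v3 token. The siteverify endpoint
# and the "success"/"score" response fields are Google's documented API.
import requests

def recaptcha_v3_score(secret: str, token: str) -> float:
    resp = requests.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={"secret": secret, "response": token},
        timeout=10,
    )
    body = resp.json()
    return body["score"] if body.get("success") else 0.0

# A score below ~0.5 is commonly treated as likely automated, though the
# paper above suggests this misses most suspect respondents.
```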
As technology continues to shape the landscape of health research, the utilization of web-based surveys for collecting sexual health information among adolescents and young adults has become increasingly prevalent. However, this shift toward digital platforms brings forth a new set of challenges, particularly the infiltration of automated bots that can compromise data integrity and the reliability of survey results. We aimed to outline the data verification process used in our study design, which employed survey programming and data cleaning protocols. A 26-item survey was developed and programmed with several data integrity functions, including reCAPTCHA scores, RelevantID fraud and duplicate scores, verification of IP addresses, and honeypot questions. Participants aged 15-24 years were recruited via social media advertisements over 7 weeks and received a US $15 incentive after survey completion. Data verification occurred through a 2-part cleaning process, which removed responses that were incomplete, flagged as spam by Qualtrics, or from duplicate IP addresses, or that did not meet the inclusion criteria. Final comparisons of reported age with date of birth and reported state with state inclusion criteria were performed. Participants who completed the study survey were linked to a second survey to receive their incentive. Responses without first and last names and full addresses were removed, as were those with duplicate IP addresses or the exact same longitude and latitude coordinates. Finally, IP addresses used to complete both surveys were compared, and consistent responses were eligible for an incentive. Over 7 weeks, online advertisements for the web-based survey reached 1.4 million social media users. Of the 20,585 survey responses received, 4589 (22.3%) were verified. Incentives were sent to 462 participants; of these, 14 responses were duplicates and 3 contained discrepancies, resulting in a final sample of 445 responses. Confidential web-based surveys are an appealing method for reaching populations, particularly adolescents and young adults, who may be reluctant to disclose sensitive information to family, friends, or clinical providers. Web-based surveys are a useful tool for researchers targeting hard-to-reach populations, given the difficulty of obtaining a representative sample. However, researchers face the ongoing threat of bots and fraudulent participants in a technology-driven world, necessitating the adoption of evolving bot detection software and tailored protocols for data collection in unique contexts.
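Among the listed safeguards, the honeypot question is the simplest to illustrate: a field hidden from human view (e.g., via CSS) that form-filling bots tend to complete anyway, so any non-empty value is a strong bot signal. A minimal sketch with an illustrative field name follows.

```python
# Honeypot check: humans never see the hidden field, so a filled value
# indicates an automated form-filler. The field name is illustrative.
def honeypot_tripped(response: dict) -> bool:
    """True if the hidden field was filled in."""
    return bool(response.get("hp_middle_name", "").strip())

assert honeypot_tripped({"hp_middle_name": "auto-filled"})
assert not honeypot_tripped({"hp_middle_name": ""})
```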
Using the internet to recruit participants into research trials is effective but can attract high numbers of fraudulent attempts, particularly via social media. We drew upon the previous literature to rigorously identify and remove fraudulent attempts when recruiting rural residents into a community-based health improvement intervention trial. Our objectives herein were to describe our dynamic process for identifying fraudulent attempts, quantify the fraudulent attempts identified by each action, and make recommendations for minimizing fraudulent responses. The analysis was descriptive. Validation methods occurred in four phases: (1) recruitment and screening for eligibility and validation; (2) investigative periods requiring greater scrutiny; (3) baseline data cleaning; and (4) validation during the first annual follow-up survey. A total of 19,665 attempts to enroll were recorded, 74.4% of which were considered fraudulent. Automated checks for IP addresses outside study areas (22.1%) and reCAPTCHA screening (10.1%) efficiently identified many fraudulent attempts. Active investigative procedures identified the most fraudulent cases (33.7%) but required time-consuming interaction between researchers and individuals attempting to enroll. Some automated validation was overly zealous: 32.1% of all consented individuals who provided an invalid birthdate at follow-up were actively contacted by researchers and could verify or correct their birthdate. We anticipate fraudulent responses will grow increasingly nuanced and adaptive given recent advances in generative artificial intelligence. Researchers will need to balance automated and active validation techniques adapted to the topic of interest, population being recruited, and acceptable participant burden.
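The birthdate finding above suggests routing mismatches to manual contact rather than automatic exclusion, since many flagged participants could verify or correct their birthdate when contacted. A minimal sketch of such a check, with hypothetical names:

```python
# Follow-up birthdate validation that defers to a human: a mismatch
# triggers participant contact instead of immediate exclusion.
from datetime import date

def birthdate_status(baseline: date, follow_up: date) -> str:
    """Compare the birthdate given at baseline with the one at follow-up."""
    if follow_up == baseline:
        return "valid"
    return "contact_participant"  # verify or correct before excluding

print(birthdate_status(date(1990, 5, 12), date(1990, 12, 5)))  # contact_participant
```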
Internet-based surveys are increasingly used for health research because they offer several advantages, including greater geographic reach, increased participant anonymity, and reduced financial and time burden. However, there is also a need to address inherent challenges, such as the likelihood of fraudulent responses and the greater difficulty of determining eligibility. We conducted an online nationwide survey of 18-29-year-olds living with HIV in the United States to assess willingness to participate in HIV cure research. To ensure that respondents met age and HIV serostatus inclusion criteria, we instituted screening procedures to identify ineligible respondents using tools built into the survey platform (eg, reCAPTCHA, geolocation) and required documentation of age and serostatus before providing access to the incentivized study survey. Of 1308 eligibility surveys, 569 were incomplete or ineligible because of reported age or serostatus. Of the remaining 739 potentially eligible respondents, we determined that 413 were from fraudulent, bot, or ineligible respondents. We sent individual study survey links to 326 participants (25% of all eligibility survey respondents) whose eligibility was reviewed and confirmed by our study team. Our multicomponent strategy was effective for identifying ineligible and fraudulent responses to our eligibility survey, allowing us to send the study survey link only to those whose eligibility we were able to confirm. Our findings suggest that proactive fraud prevention can be built into the screening phase of a study to prevent wasted resources related to data cleaning and unretrievable study incentives and, ultimately, to improve the quality of data.
In a major breakthrough, scientists have experimentally confirmed a universal growth law in two dimensions using a quantum system of fleeting light–matter particles. The finding strengthens the idea that wildly different processes, from crystals to living systems, may all follow the same hidden rules.
Scientists are using sunlight to turn plastic waste into clean fuels like hydrogen, offering a breakthrough solution to both pollution and energy challenges. While still in development, the approach could transform trash into a valuable resource for a low-carbon future.
For decades, psychologists have debated whether the human mind can be explained by one unified theory or must be broken into separate parts like memory and attention. A recent AI model called Centaur seemed to offer a breakthrough, claiming it could mimic human thinking across 160 different cognitive tasks. But new research is challenging that bold claim.