Search | ResearchTracker

Detecting toxic content using language models is important but challenging. While large language models (LLMs) have demonstrated strong performance in understanding Chinese, recent studies show that simple character substitutions in toxic Chinese text can easily confuse state-of-the-art (SOTA) LLMs. In this paper, we highlight the multimodal nature of the Chinese language as a key challenge for deploying LLMs in toxic Chinese detection. First, we propose a taxonomy of 3 perturbation strategies and 8 specific approaches in toxic Chinese content. Then, we curate a dataset based on this taxonomy and benchmark 9 SOTA LLMs (from both the US and China) to assess whether they can detect perturbed toxic Chinese text. Additionally, we explore cost-effective enhancement solutions such as in-context learning (ICL) and supervised fine-tuning (SFT). Our results reveal two important findings. (1) LLMs are less capable of detecting perturbed multimodal toxic Chinese content. (2) ICL or SFT with a small number of perturbed examples may cause LLMs to "overcorrect", misidentifying many normal Chinese texts as toxic.

CARE: Extracting Experimental Findings From Clinical Literature

arXiv, 2023-11-16. Authors: Aakanksha Naik, Bailey Kuehl, Erin Bransom

Extracting fine-grained experimental findings from the literature can be highly useful for scientific applications. Prior work has developed annotation schemas and datasets for limited aspects of this problem, but these fail to capture the real-world complexity and nuance the task requires. Focusing on biomedicine, this work presents CARE, a new information extraction (IE) dataset for the task of extracting clinical findings. We develop a new annotation schema that captures fine-grained findings as n-ary relations between entities and attributes, unifying in a single schema several phenomena that challenge current IE systems: discontinuous entity spans, nested relations, variable-arity n-ary relations, and numeric results. We collect extensive annotations for 700 abstracts from two sources: clinical trials and case reports. We also demonstrate that our schema generalizes to the computer science and materials science domains. We benchmark state-of-the-art IE systems on CARE and show that even models such as GPT-4 struggle. We release our resources to advance research on extracting and aggregating findings from the literature.

Search results: Findings

Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings

CARE: Extracting Experimental Findings From Clinical Literature

Potentials of Green Coding -- Findings and Recommendations for Industry, Education and Science -- Extended Paper

Consistent Intradecadal/Interdecadal Oscillations in the Surface Geomagnetic Observations and in the ΔLOD: New Findings and Unresolved Problems

Word Graph Guided Summarization for Radiology Findings

Learning Visual-Semantic Embeddings for Reporting Abnormal Findings on Chest X-rays

Self-Emotion-Mediated Exploration in Artificial Intelligence Mirrors: Findings from Cognitive Psychology

MedLayBench-V: A Large-Scale Benchmark for Expert-Lay Semantic Alignment in Medical Vision Language Models

Can LLMs Understand the Impact of Trauma? Costs and Benefits of LLMs Coding the Interviews of Firearm Violence Survivors

Fin-Bias: Comprehensive Evaluation for LLM Decision-Making under human bias in Finance Domain

One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them

From n-gram to Attention: How Model Architectures Learn and Propagate Bias in Language Modeling

CSTRL: Context-Driven Sequential Transfer Learning for Abstractive Radiology Report Summarization

You need to MIMIC to get FAME: Solving Meeting Transcript Scarcity with a Multi-Agent Conversations

Explaining Puzzle Solutions in Natural Language: An Exploratory Study on 6x6 Sudoku

Breaking Token Into Concepts: Exploring Extreme Compression in Token Representation Via Compositional Shared Semantics

ACCESS DENIED INC: The First Benchmark Environment for Sensitivity Awareness

Towards Better Understanding of Program-of-Thought Reasoning in Cross-Lingual and Multilingual Environments

Robust Bias Detection in MLMs and its Application to Human Trait Ratings

Measuring Sycophancy of Language Models in Multi-turn Dialogues