Search results: Turkish neurosurgery

20 results found


OCRTurk: A Comprehensive OCR Benchmark for Turkish

arXiv

Document parsing is now widely used in applications, such as large-scale document digitization, retrieval-augmented generation, and domain-specific pipelines in healthcare and education. Benchmarking these models is crucial for assessing their reliability and practical robustness. Existing benchmarks mostly target high-resource languages and provide limited coverage for low-resource settings, such as Turkish. Moreover, existing studies on Turkish document parsing lack a standardized benchmark that reflects real-world scenarios and document diversity. To address this gap, we introduce OCRTurk, a Turkish document parsing benchmark covering multiple layout elements and document categories at three difficulty levels. OCRTurk consists of 180 Turkish documents drawn from academic articles, theses, slide decks, and non-academic articles. We evaluate seven OCR models on OCRTurk using element-wise metrics. Across difficulty levels, PaddleOCR achieves the strongest overall results, leading most element-wise metrics except figures and attaining high Normalized Edit Distance scores in easy, medium, and hard subsets. We also observe performance variation by document type. Models perform well on

HUKUKBERT: Domain-Specific Language Model for Turkish Law

arXiv, 2026-04-06. Authors: Mehmet Utku Öztürk, Tansu Türkoğlu, Buse Buz-Yalug

Recent advances in natural language processing (NLP) have increasingly enabled LegalTech applications, yet existing studies specific to Turkish law remain limited due to the scarcity of domain-specific data and models. Although extensive models like LEGAL-BERT have been developed for English legal texts, the Turkish legal domain lacks a domain-specific high-volume counterpart. In this paper, we introduce HukukBERT, the most comprehensive legal language model for Turkish, trained on an 18 GB cleaned legal corpus using a hybrid Domain-Adaptive Pre-Training (DAPT) methodology integrating Whole-Word Masking, Token Span Masking, Word Span Masking, and targeted Keyword Masking. We systematically compared our 48K WordPiece tokenizer and DAPT approach against general-purpose and existing domain-specific Turkish models. Evaluated on a novel Legal Cloze Test benchmark (a masked legal term prediction task designed for Turkish court decisions), HukukBERT achieves state-of-the-art performance with 84.40% Top-1 accuracy, substantially outperforming existing models. Furthermore, we evaluated HukukBERT in the downstream task of structural segmentation of official Turkish court decision


OCRTurk: A Comprehensive OCR Benchmark for Turkish

HUKUKBERT: Domain-Specific Language Model for Turkish Law

TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval

SindBERT, the Sailor: Charting the Seas of Turkish NLP

TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish

Setting Standards in Turkish NLP: TR-MMLU for Large Language Model Evaluation

HeceTokenizer: A Syllable-Based Tokenization Approach for Turkish Retrieval

Context Aware Lemmatization and Morphological Tagging Method in Turkish

Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models

A Cross-Validation Study of Turkish Sentiment Analysis Datasets and Tools

TurkEmbed: Turkish Embedding Model on NLI & STS Tasks

Developing a Comprehensive Framework for Sentiment Analysis in Turkish

Automating Turkish Educational Quiz Generation Using Large Language Models

Turkronicles: Diachronic Resources for the Fast Evolving Turkish Language

Tokens with Meaning: A Hybrid Tokenization Approach for Turkish

TurBLiMP: A Turkish Benchmark of Linguistic Minimal Pairs

Investigating Gender Bias in Turkish Language Models

Turkish Delights: a Dataset on Turkish Euphemisms

TurkEmbed4Retrieval: Turkish Embedding Model for Retrieval Task

Introducing cosmosGPT: Monolingual Training for Turkish Language Models
