Search - ResearchTracker

In this paper, we introduce SaudiBERT, a monodialect Arabic language model pretrained exclusively on Saudi dialectal text. To demonstrate the model's effectiveness, we compared SaudiBERT with six different multidialect Arabic language models across 11 evaluation datasets, which are divided into two groups: sentiment analysis and text classification. SaudiBERT achieved average F1-scores of 86.15% and 87.86% in these groups respectively, significantly outperforming all other comparative models. Additionally, we present two novel Saudi dialectal corpora: the Saudi Tweets Mega Corpus (STMC), which contains over 141 million tweets in Saudi dialect, and the Saudi Forums Corpus (SFC), which includes 15.2 GB of text collected from five Saudi online forums. Both corpora are used in pretraining the proposed model, and they are the largest Saudi dialectal corpora ever reported in the literature. The results confirm the effectiveness of SaudiBERT in understanding and analyzing Arabic text expressed in Saudi dialect, achieving state-of-the-art results in most tasks and surpassing other language models included in the study. The SaudiBERT model is publicly available on \url{https://huggingface.co/

SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

arXiv, 2025-03-21. Authors: Lama Ayash, Hassan Alhuzali, Ashwag Alasmari

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing; however, they often struggle to accurately capture and reflect cultural nuances. This research addresses this challenge by focusing on Saudi Arabia, a country characterized by diverse dialects and rich cultural traditions. We introduce SaudiCulture, a novel benchmark designed to evaluate the cultural competence of LLMs within the distinct geographical and cultural contexts of Saudi Arabia. SaudiCulture is a comprehensive dataset of questions covering five major geographical regions (West, East, South, North, and Center), along with general questions applicable across all regions. The dataset encompasses a broad spectrum of cultural domains, including food, clothing, entertainment, celebrations, and crafts. To ensure a rigorous evaluation, SaudiCulture includes questions of varying complexity in open-ended, single-choice, and multiple-choice formats, with some requiring multiple correct answers. Additionally, the dataset distinguishes between common cultural knowledge and specialized regional aspects. We conduct extensive evaluations on five LLMs, such as GPT-4, Llama

Search results: Saudi

SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora

SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia

Saudi-Dialect-ALLaM: LoRA Fine-Tuning for Dialectal Arabic Generation

Generative AI in Saudi Arabia: A National Survey of Adoption, Risks, and Public Perceptions

Climate-based Pre-screening of Self-sustaining Regreening Opportunities in Drylands: A Case Study for Saudi Arabia

Leveraging Social Media Analytics for Sustainability Trend Detection in Saudi Arabia's Evolving Market

LLM-Based Financial Sentiment Analysis in Arabic: Evidence from Saudi Markets

From Words to Proverbs: Evaluating LLMs Linguistic and Cultural Competence in Saudi Dialects with Absher

Gender Stereotypes in Professional Roles Among Saudis: An Analytical Study of AI-Generated Images Using Language Models

From Expectation To Experience: A Before And After Survey Of Public Opinion On Autonomous Cars In Saudi Arabia

Determination of new national highpoints of five African and Asian countries, Saudi Arabia, Uzbekistan, Gambia, Guinea-Bissau, and Togo

Continuous Saudi Sign Language Recognition: A Vision Transformer Approach

Citizens' Contentment with e-Government Solutions and Services in Saudi Arabia

Saudi Arabian Perspective of Security, Privacy, and Attitude of Using Facial Recognition Technology

AI-Enhanced TOE Framework for Sustainable Industrial Performance in Fragile and Transforming Economies: Evidence from Yemen and Saudi Arabia

Language Shift or Maintenance? An Intergenerational Study of the Tibetan Community in Saudi Arabia

Saudi Sign Language Translation Using T5

The Saudi Privacy Policy Dataset

Understanding the Landscape of Leveraging IoT for Sustainable Growth in Saudi Arabia

Factors influencing e-commerce adoption by retailers in Saudi Arabia: Qual Analysis