In this paper, we introduce SaudiBERT, a monodialect Arabic language model pretrained exclusively on Saudi dialectal text. To demonstrate the model's effectiveness, we compared SaudiBERT with six different multidialect Arabic language models across 11 evaluation datasets, which are divided into two groups: sentiment analysis and text classification. SaudiBERT achieved average F1-scores of 86.15\% and 87.86\% in these groups respectively, significantly outperforming all other comparative models. Additionally, we present two novel Saudi dialectal corpora: the Saudi Tweets Mega Corpus (STMC), which contains over 141 million tweets in Saudi dialect, and the Saudi Forums Corpus (SFC), which includes 15.2 GB of text collected from five Saudi online forums. Both corpora are used in pretraining the proposed model, and they are the largest Saudi dialectal corpora ever reported in the literature. The results confirm the effectiveness of SaudiBERT in understanding and analyzing Arabic text expressed in Saudi dialect, achieving state-of-the-art results in most tasks and surpassing other language models included in the study. SaudiBERT model is publicly available on \url{https://huggingface.co/
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing; however, they often struggle to accurately capture and reflect cultural nuances. This research addresses this challenge by focusing on Saudi Arabia, a country characterized by diverse dialects and rich cultural traditions. We introduce SaudiCulture, a novel benchmark designed to evaluate the cultural competence of LLMs within the distinct geographical and cultural contexts of Saudi Arabia. SaudiCulture is a comprehensive dataset of questions covering five major geographical regions, such as West, East, South, North, and Center, along with general questions applicable across all regions. The dataset encompasses a broad spectrum of cultural domains, including food, clothing, entertainment, celebrations, and crafts. To ensure a rigorous evaluation, SaudiCulture includes questions of varying complexity, such as open-ended, single-choice, and multiple-choice formats, with some requiring multiple correct answers. Additionally, the dataset distinguishes between common cultural knowledge and specialized regional aspects. We conduct extensive evaluations on five LLMs, such as GPT-4, Llama
Large language models (LLMs) for Arabic are still dominated by Modern Standard Arabic (MSA), with limited support for Saudi dialects such as Najdi and Hijazi. This underrepresentation hinders their ability to capture authentic dialectal variation. Using a privately curated Saudi Dialect Instruction dataset (Hijazi and Najdi; 5,466 synthetic instruction-response pairs; 50/50 split), we LoRA-tune ALLaM-7B-Instruct-preview, the first foundation model developed in Saudi Arabia, for Saudi dialect generation. We investigate two variants: (i) Dialect-Token training, which prepends an explicit dialect tag to the instruction, and (ii) No-Token training, which omits the tag at formatting time. Evaluation on a held-out test set combines an external dialect classifier with text fidelity metrics (chrF++ and BERTScore) and diversity measures. The Dialect-Token model achieves the best control, raising the Saudi rate from 47.97% to 84.21% and reducing MSA leakage from 32.63% to 6.21%; fidelity also improves (chrF++ +3.53, BERTScore +0.059). Both LoRA variants outperform strong generic instruction models (Falcon-7B-Instruct, Llama-3.1-8B-Instruct, Qwen-2.5-7B-Instruct, AceGPT-v2-8B-Chat, JAIS-13B-C
Generative Artificial Intelligence (GenAI) is rapidly becoming embedded in Saudi Arabia's digital transformation under Vision 2030, yet public awareness, adoption, and concerns surrounding these tools remain underexplored. This study provides an early snapshot of GenAI engagement among Saudi nationals. Using a nationwide survey of 330 participants across regions, age groups, and employment sectors, we examine seven dimensions of GenAI use: awareness and understanding, adoption patterns, perceived impacts, training needs, risks and barriers, data-sharing behaviors, and future expectations. Findings show that 93% of respondents actively use GenAI primarily for text-based tasks, while more advanced uses such as programming or multimodal generation are less common. Despite the prevalence of use, overall awareness and conceptual understanding remain uneven, with many reporting limited technical knowledge. Participants recognize GenAI's benefits for productivity, work quality, and understanding complex information, yet caution that sustained reliance may undermine critical thinking and key professional skills. Trust in AI-generated outputs remains cautious, with widespread concerns about
Large-scale restoration in drylands is widely promoted to address land degradation and biodiversity loss, yet many efforts rely on long-term irrigation, limiting sustainability in water-scarce regions. A key challenge is identifying locations where native vegetation can persist without intensive management while minimizing costly field campaigns. A scalable pre-screening framework is presented that integrates climate and remote sensing data to enable cost-efficient site selection in arid environments using Saudi Arabia as a case study. A Climate Suitability Score (CSS), derived from machine learning models trained on expert-curated reference sites, captures complex climatic dependencies on vegetation persistence. Using multi-year ERA5-Land data for Saudi Arabia, national-scale prediction maps are generated and combined with vegetation indices to identify areas where climate is favorable, but vegetation remains underdeveloped. Multi-criteria screening reduces candidates to thirteen priority locations. Climatically analogous intact ecosystems provide benchmarks for restoration targets and indicate that an average 2.5 fold increase in vegetation coverage is a realistic target for rest
Saudi Arabias rapid economic growth and social evolution under Vision 2030 present a unique opportunity to track emerging trends in real time. Uncovering trends in real time can open up new avenues for business and investment opportunities. This paper explores how AI and social media analytics can uncover and monitor these trends across sectors like sustainability, construction, food beverages industry, tourism, technology, and entertainment. This paper focus on use of AI-driven methodology to identify sustainability trends across Saudi Arabia. We processed millions of social media posts, news, blogs in order to understand sustainability trends in the region. The paper presents an AI approach that can help economists, businesses, government to understand sustainability trends and make better decisions around them. This approach offers both sector-specific and cross-sector insights, giving decision-makers a reliable, up to date snapshot of Saudi Arabias market shifts. Beyond Saudi Arabia, this framework also shows potential for adapting to other regions. Overall, our findings highlight how by using AI-methodologies, give decision makers a reliable method to understand how initiative
Investor sentiment shapes financial markets, yet modeling sentiment in Arabic financial contexts remains challenging due to linguistic complexity and limited resources. We present an Arabic NLP framework for large-scale financial sentiment analysis tailored to the Saudi market, integrating official financial news and social media to capture institutional and public investor sentiment. The framework constructs a large Arabic financial corpus through a multi-stage pipeline encompassing data collection, cleaning, deduplication, entity linking, and sentiment annotation. Transformer-based NER combined with a curated company lexicon links textual mentions to canonical company identifiers, with sentiment labels assigned using a five-class scheme. The resulting dataset of 84K samples supports company-level sentiment aggregation and analysis of sentiment dynamics relative to stock market behavior on the Saudi Exchange. Experimental results demonstrate reliable and scalable Arabic financial sentiment analysis.
As large language models (LLMs) become increasingly central to Arabic NLP applications, evaluating their understanding of regional dialects and cultural nuances is essential, particularly in linguistically diverse settings like Saudi Arabia. This paper introduces Absher, a comprehensive benchmark specifically designed to assess LLMs performance across major Saudi dialects. \texttt{Absher} comprises over 18,000 multiple-choice questions spanning six distinct categories: Meaning, True/False, Fill-in-the-Blank, Contextual Usage, Cultural Interpretation, and Location Recognition. These questions are derived from a curated dataset of dialectal words, phrases, and proverbs sourced from various regions of Saudi Arabia. We evaluate several state-of-the-art LLMs, including multilingual and Arabic-specific models. We also provide detailed insights into their capabilities and limitations. Our results reveal notable performance gaps, particularly in tasks requiring cultural inference or contextual understanding. Our findings highlight the urgent need for dialect-aware training and culturally aligned evaluation methodologies to improve LLMs performance in real-world Arabic applications.
This study investigates the extent to which contemporary Text-to-Image artificial intelligence (AI) models perpetuate gender stereotypes and cultural inaccuracies when generating depictions of professionals in Saudi Arabia. We analyzed 1,006 images produced by ImageFX, DALL-E V3, and Grok for 56 diverse Saudi professions using neutral prompts. Two trained Saudi annotators evaluated each image on five dimensions: perceived gender, clothing and appearance, background and setting, activities and interactions, and age. A third senior researcher adjudicated whenever the two primary raters disagreed, yielding 10,100 individual judgements. The results reveal a strong gender imbalance, with ImageFX outputs being 85\% male, Grok 86.6\% male, and DALL-E V3 96\% male, indicating that DALL-E V3 exhibited the strongest overall gender stereotyping. This imbalance was most evident in leadership and technical roles. Moreover, cultural inaccuracies in clothing, settings, and depicted activities were frequently observed across all three models. Counter-stereotypical images often arise from cultural misinterpretations rather than genuinely progressive portrayals. We conclude that current models mirro
Autonomous vehicles (AVs) are emerging as a transformative innovation in transportation, offering potential benefits in safety, sustainability, and efficiency. Saudi Arabian adoption of AVs aligns with Vision 2030, emphasizing smart mobility through initiatives such as the Riyadh Autonomous Metro and self-driving cars. This study explores Saudi citizens perceptions of AVs before and after exposure to these technologies and examines whether demographic factors age, gender, education level, and driving habits affect acceptance. Using quantitative methods, the findings provide insights into the broader influences shaping AV adoption, highlighting the importance of trust, perceived safety, and convenience. These results can inform policymakers and industry stakeholders on strategies to facilitate successful integration of AVs into Saudi Arabian transportation ecosystem.
Not all nations on earth have previously been surveyed accurately enough to know for certain which peak is the national highpoint, the highest peak in the country. Knowledge of these peaks is important for understanding the physical geography of these countries in terms of natural resource availability, watershed management, and tourism potential. For this study, ground surveys were conducted between 2018-2025 with modern professional surveying equipment, including differential GPS units and Abney levels, to accurately determine the national highpoints in five African and Asian countries where uncertainty existed. New national highpoints were determined for Saudi Arabia (Jabal Ferwa), Uzbekistan (Alpomish), Gambia (Sare Firasu Hill), Guinea-Bissau (Mt Ronde), and Togo (Mt Atilakoutse). Elevations were measured with sub-meter vertical accuracy for candidate peaks in Saudi Arabia, Gambia, Guinea-Bissau, and Togo. Relative elevations were measured between contender peaks in Uzbekistan with sufficient accuracy to determine the highpoint.
Sign language (SL) is an essential communication form for hearing-impaired and deaf people, enabling engagement within the broader society. Despite its significance, limited public awareness of SL often leads to inequitable access to educational and professional opportunities, thereby contributing to social exclusion, particularly in Saudi Arabia, where over 84,000 individuals depend on Saudi Sign Language (SSL) as their primary form of communication. Although certain technological approaches have helped to improve communication for individuals with hearing impairments, there continues to be an urgent requirement for more precise and dependable translation techniques, especially for Arabic sign language variants like SSL. Most state-of-the-art solutions have primarily focused on non-Arabic sign languages, resulting in a considerable absence of resources dedicated to Arabic sign language, specifically SSL. The complexity of the Arabic language and the prevalence of isolated sign language datasets that concentrate on individual words instead of continuous speech contribute to this issue. To address this gap, our research represents an important step in developing SSL resources. To ad
Governments around the world have worked tirelessly to develop technological solutions, on the one hand to better serve their citizens and, on the other hand, to advance in the United Nations Electronic Government Development Index (EGDI). Thus, it is crucial to assess e-government solutions and services from different aspects. This study evaluates e-government solutions and services based on user expectations in four aspects: general satisfaction, satisfaction with features, trust, and pleasure. In this study, a questionnaire was developed to allow the evaluation of e-government solutions and services in Saudi Arabia, and could also be utilized to evaluate e-government in any other nation. The study included 276 valid participants, while the required sample size was calculated using a standard sample size estimation formula for large populations (95% confidence level, 5% margin of error). In addition, descriptive analysis was used to analyze participant responses. The results showed that e-government services in Saudi Arabia achieved a level of citizen contentment that is consistent with its score in the EGDI published in the year 2024.
Facial Recognition Technology (FRT) is a pioneering field of mass surveillance that sparks privacy concerns and is considered a growing threat in the modern world. FRT has been widely adopted in the Kingdom of Saudi Arabia to improve public services and surveillance. Accordingly, the following study aims to understand the privacy and security concerns, trust, and acceptance of FRT in Saudi Arabia. Validated Privacy Concerns (IUIPC-8), Security Attitudes (SA-6), and Security Behavior (SeBIS) scales are used along with replicate studies from Pew Research Center trust questions and government trust questions. In addition, we examine potential differences between Saudis and Americans. To gain insights into these concerns, we conducted an online survey involving 53 Saudi Arabia citizens who are residing in the USA. We have collected data in the US instead of Saudi Arabia to avoid the regulatory challenges of the Saudi Data & Artificial Intelligence Authority (SDAIA). Responses from closed-ended questions revealed that Saudis score much lower than Americans when it comes to security attitudes, whereas they score lower when it comes to privacy concerns. We found no significant differe
Using an integrated framework rooted in the TOE model enhanced with AI, this study looks at ways to improve industrial performance and environmental sustainability in fragile and rapidly transforming contexts such as those found in Yemen and Saudi Arabia. Data for the research are field-based and were obtained from a total of 600 SMEs operating in both countries. Based on the questionnaires' responses by 294 managers, results from the partial least squares structural equation modeling (PLS-SEM) have indicated significant positive effects of AI-TOE on environmental performance (beta = 0.487) and manufacturing performance (beta = 0.759). Results indicate that AI acts as a transformative force, though its impact differs based on the maturity of infrastructure and organizational readiness. The Saudi SMEs gain from their institutional support and advanced technologies, while those in Yemen are dependent on the low-cost adoption of AI and organizational flexibility to accept structural challenges. PLS-SEM analysis of the study showed that integrating AI into the TOE dimensions accelerates operational efficiency in order to support environmental performance. Industrial performance was fou
The present study provides the first-ever report on the language shift from Tibetan to Arabic among descendants of Tibetan families who migrated from the Tibet region to Saudi Arabia around 70 years ago. The aim of this study was to determine whether three age groups had adopted different practices in terms of maintaining Tibetan or shifting to Hijazi Arabic. To this end, 96 male and female members of the Tibetan community responded to a questionnaire in which they were asked about their code choice in different domains (home, neighbourhood, friends and relatives, expressing emotion, and performing religious rituals). The data revealed significant intergenerational differences between members of the community in terms of the extent of the shift to Arabic, with Tibetan rarely used by younger members and older members making only slightly more use of it. The difference between the three age groups was significant, at a p-value of .001.
This paper explores the application of T5 models for Saudi Sign Language (SSL) translation using a novel dataset. The SSL dataset includes three challenging testing protocols, enabling comprehensive evaluation across different scenarios. Additionally, it captures unique SSL characteristics, such as face coverings, which pose challenges for sign recognition and translation. In our experiments, we investigate the impact of pre-training on American Sign Language (ASL) data by comparing T5 models pre-trained on the YouTubeASL dataset with models trained directly on the SSL dataset. Experimental results demonstrate that pre-training on YouTubeASL significantly improves models' performance (roughly $3\times$ in BLEU-4), indicating cross-linguistic transferability in sign language models. Our findings highlight the benefits of leveraging large-scale ASL data to improve SSL translation and provide insights into the development of more effective sign language translation systems. Our code is publicly available at our GitHub repository.
This paper introduces the Saudi Privacy Policy Dataset, a diverse compilation of Arabic privacy policies from various sectors in Saudi Arabia, annotated according to the 10 principles of the Personal Data Protection Law (PDPL); the PDPL was established to be compatible with General Data Protection Regulation (GDPR); one of the most comprehensive data regulations worldwide. Data were collected from multiple sources, including the Saudi Central Bank, the Saudi Arabia National United Platform, the Council of Health Insurance, and general websites using Google and Wikipedia. The final dataset includes 1,000 websites belonging to 7 sectors, 4,638 lines of text, 775,370 tokens, and a corpus size of 8,353 KB. The annotated dataset offers significant reuse potential for assessing privacy policy compliance, benchmarking privacy practices across industries, and developing automated tools for monitoring adherence to data protection regulations. By providing a comprehensive and annotated dataset of privacy policies, this paper aims to facilitate further research and development in the areas of privacy policy analysis, natural language processing, and machine learning applications related to pr
The integration of Internet of Things (IoT) technologies in agriculture holds promise for transforming farming practices, particularly in the Kingdom of Saudi Arabia (KSA). This study explores the adoption of smart farming practices among KSA farmers. Due to the geographical location and nature of KSA, it faces significant challenges in agriculture. The objective of this research is to discuss how IoT will enhance agriculture in KSA and identify its current usage by conducting a study on Saudi farmers with varying ages, regions, and years of experience. The results indicate that 90% of the farmers encounter challenges in farming, and all of them express interest in adopting smart farming to address these issues. While 60% of farmers are currently utilizing IoT technologies, they encounter challenges in implementing smart farming practices. Thus, smart farming presents solutions to prevalent challenges including adverse weather, water scarcity, and labor shortages, though barriers include cost and educational challenges.
This paper presents the preliminary findings of a study researching the diffusion and the adoption of online retailing in Saudi Arabia. It reports new research that identifies and explores the key issues that positively and negatively influence retailers in Saudi Arabia regarding the adoption of electronic commerce. Retailers in Saudi Arabia have been reserved in their adoption of electronically delivered aspects of their business. Despite the fact that Saudi Arabia has the largest and fastest growth of ICT marketplaces in the Arab region, ecommerce activities are not progressing at the same speed. Only a tiny number of Saudi commercial organizations, mostly medium and large companies from the manufacturing sector, are involved in e-commerce implementation. Based on qualitative data, collected by conducting interviews with a sample population of retail sector decision makers in Saudi Arabia, both positive and negative issues influencing retailer adoption of electronic retailing systems in Saudi Arabia are identified. A number of impediments which include cultural, business and technical issues were reported. Facilitating factors include access to educational programs and awareness