We introduce the Meaning Intelligence Framework (MIF), a nine-dimension annotation and evaluation schema for Nigerian public discourse that separates surface sentiment from true communicative intent. Existing benchmarks for Nigerian languages, including NaijaSenti and AfriSenti, treat sentiment classification as a three-way polarity task. We argue that the dominant failure mode of AI systems on Nigerian discourse is not translation failure but context failure: the same utterance carries opposite pragmatic force depending on speaker, audience, and situation. The MIF operationalises this insight across nine scored dimensions: register, surface sentiment, true intent, irony, coded subtext, risk tier, annotator confidence, speaker emotion, and recommended communications action. We construct a 30-item calibration dataset spanning Standard English, Nigerian English, Nigerian Pidgin, and code-mixed registers, and evaluate three frontier language models (Gemini 2.5 Flash, GPT-5, and Gemini 2.5 Pro) under zero-shot and schema-informed prompting conditions. Two headline findings emerge. First, the Register Gap: zero-shot register classification accuracy is 33.3%, rising to 73.3% (+40 points)
Although modern multilingual Automatic Speech Recognition (ASR) systems support several Nigerian languages, their performance consistently lags behind high-resource languages like English and French. Nigerian languages present unique modelling hurdles, including acute data scarcity, inconsistent orthography, tonal diacritics, diverse accents, frequent code-switching, and localized named entities. To address these challenges, we developed a multilingual ASR framework utilizing a two-stage distillation process. First, we employ student-teacher knowledge distillation from existing monolingual models, conditioned on robust language-specific N-gram language models. Second, we perform iterative self improvement using pseudo-labelled data to further refine accuracy. Our method significantly bridges the performance gap, achieving on average a relative Word Error Rate (WER) reduction of 29 % over monolingual baselines. Our models also outperform state-of-the-art multilingual models across major benchmarks, including Common Voice and Fleurs. We introduce Sometin Beta Pass Notin (SBPN), a foundational multilingual ASR model covering Yorùbá, Hausa, Igbo, Nigerian Pidgin, and Nigerian English.
License Plate Recognition (LPR) systems are critical tools in traffic monitoring, security enforcement, and urban mobility management. Traditional LPR systems often rely on a multi-stage pipeline involving object detection using You Only Look Once (YOLO) and Optical Character Recognition (OCR), which suffer from limitations such as high resource demands, poor performance in unstructured environments, and the need for large annotated datasets. This study explores the potential of Vision-Language Models (VLMs) as a unified, zeroshot learning solution for Nigerian license plate recognition. Using a curated dataset of 88 challenging real-world images collected in Nigeria, we evaluate five selected VLMs: Gemini 2.0 Flash Exp (Google DeepMind), Qwen2.5-VL-7B-Instruct (Alibaba), GPT-4o (OpenAI), Claude 4 Sonnet (Anthropic), and Llama 3.2 Vision 90b (Meta). Results based on Character Error Rate (CER) reveal that Gemini and Qwen significantly outperform other models in both accuracy and robustness, on the challenging image scenarios. This work highlights the practical advantages of VLMs over YOLO+OCR, questions the claims by model providers, and compares the performances of the VLMs.
Speech translation for low-resource languages remains fundamentally limited by the scarcity of high-quality, diverse parallel speech data, a challenge that is especially pronounced in African linguistic contexts. To address this, we introduce NaijaS2ST, a parallel speech translation dataset spanning Igbo, Hausa, Yorùbá, and Nigerian Pidgin paired with English. The dataset comprises approximately 50 hours of speech per language and captures substantial variation in speakers and accents, reflecting realistic multilingual and multi-accent conditions. With NaijaS2ST, we conduct a comprehensive benchmark of cascaded, end-to-end (E2E), and AudioLLM-based approaches across bidirectional translation settings. Our results show that audio LLMs with few-shot examples are more effective for speech-to-text translation than cascaded and end-to-end methods trained on fine-tuned data. However, for speech-to-speech translation, the cascaded and audio LLM paradigms yield comparable performance, indicating that there is still considerable room for improvement in developing targeted, task-specific models for this setting. By providing both a high-quality dataset and a systematic benchmark, we hope tha
This study examines whether following popular Nigerian sports betting influencers on social media is a financially sound strategy. To avoid the survivorship bias that occurs when influencers only share their winning bets, we tracked 5,467 pre-match betting slips from three prominent tipsters on X (formerly Twitter) and Telegram. We verified the outcomes against official Stake.com records, resulting in a final dataset covering approximately $4.8 million in tracked bets. We analyzed raw performance, assessed risk based on odds sizes, and applied four common staking strategies (Flat, Inverse, Square Root, and Fixed Return) to simulate realistic follower outcomes. The results show a sharp contrast between the wealth these influencers display online and the actual financial results. The influencers themselves collectively lost 25.24% on their promoted bets, while a follower who staked the same amount on every tip would lose 38.27% on their investment. Across all tested strategies, following these influencers consistently led to significant financial losses. These findings raise serious consumer protection concerns in Nigeria's expanding gambling market.
In Nigeria, electoral behavior is often interpreted through ethno-religious views and regional allegiances, without empirically assessing the influence of socioeconomic indicators such as health, income, education, and other deprivations on voter behavior. This study investigates the voting pattern in the 2023 Nigerian presidential election and previous cycles using spatio-temporal and multivariate analysis. It examines whether support for some candidates was more substantial in states with higher Human Development Index (HDI) and whether this alignment impacts governance quality and macroeconomic performance post-election. Socioeconomic data were obtained from the Global Data Lab and the Nigerian Bureau of Statistics (NBS) for the year preceding each election, while presidential vote results were sourced from the Independent National Electoral Commission (INEC). The results show that the Labour Party (LP) dominated states with high socioeconomic indices, accounting for about 30-53% of the variance in voting patterns of LP and People's Democratic Party (PDP), while the All Progressive Congress (APC) variance is less explained by these factors. Multinomial logits based on HDI is use
Depression is a major contributor to the mental-health burden in Nigeria, yet screening coverage remains limited due to low access to clinicians, stigma, and language barriers. Traditional tools like the Patient Health Questionnaire-9 (PHQ-9) were validated in high-income countries but may be linguistically or culturally inaccessible for low- and middle-income countries and communities such as Nigeria where people communicate in Nigerian Pidgin and more than 520 local languages. This study presents a novel approach to automated depression screening using fine-tuned large language models (LLMs) adapted for conversational Nigerian Pidgin. We collected a dataset of 432 Pidgin-language audio responses from Nigerian young adults aged 18-40 to prompts assessing psychological experiences aligned with PHQ-9 items, performed transcription, rigorous preprocessing and annotation, including semantic labeling, slang and idiom interpretation, and PHQ-9 severity scoring. Three LLMs - Phi-3-mini-4k-instruct, Gemma-3-4B-it, and GPT-4.1 - were fine-tuned on this annotated dataset, and their performance was evaluated quantitatively (accuracy, precision and semantic alignment) and qualitatively (clari
Credit card fraud is a major cause of national concern in the Nigerian financial sector, affecting hundreds of transactions per second and impacting international ecommerce negatively. Despite the rapid spread and adoption of online marketing, millions of Nigerians are prevented from transacting in several countries with local credit cards due to bans and policies directed at restricting credit card fraud. Presently, a myriad of technologies exist to detect fraudulent transactions, a few of which are adopted by Nigerian financial institutions to proactively manage the situation. Fraud detection allows institutions to restrict offenders from networks and with a centralized banking identity management system, such as the Bank Verification Number used by the Central Bank of Nigeria, offenders who may have stolen other identities can be backtraced and their bank accounts frozen. This paper aims to compare the effectiveness of two fraud detection technologies that are projected to work fully independent of human intervention to possibly predict and detect fraudulent credit card transactions. Autoencoders as an unsupervised tensorflow based anomaly detection technique generally offers gr
To address the global issue of online hate, hate speech detection (HSD) systems are typically developed on datasets from the United States, thereby failing to generalize to English dialects from the Majority World. Furthermore, HSD models are often evaluated on non-representative samples, raising concerns about overestimating model performance in real-world settings. In this work, we introduce NaijaHate, the first dataset annotated for HSD which contains a representative sample of Nigerian tweets. We demonstrate that HSD evaluated on biased datasets traditionally used in the literature consistently overestimates real-world performance by at least two-fold. We then propose NaijaXLM-T, a pretrained model tailored to the Nigerian Twitter context, and establish the key role played by domain-adaptive pretraining and finetuning in maximizing HSD performance. Finally, owing to the modest performance of HSD systems in real-world conditions, we find that content moderators would need to review about ten thousand Nigerian tweets flagged as hateful daily to moderate 60% of all hateful content, highlighting the challenges of moderating hate speech at scale as social media usage continues to gr
This paper presents a unique driving dataset collected in Nigeria via mobile phone sensors to support a machine learning model for detecting alcohol-influenced driving behaviours, with the long-term aim of integrating this model into a mobile application that encourages safer driving behaviours. Driving under the influence of alcohol is a major public safety concern, particularly in low-income countries like Nigeria, where traditional enforcement mechanisms may be limited. The proposed model leverages smartphone sensors such as accelerometers, gyroscopes, and GPS to provide a non-invasive, continuous solution for detecting impaired driving patterns in real time. This study adapts existing data processing and pattern matching methodologies to label real-world driving data collected from Nigerian drivers, which are then used to train the model. A decision tree classifier is developed to detect alcohol influence, based on behavioural and temporal features, achieving a recall of 100%, a precision of 60%, and an F1 score of 75%. The model's overall accuracy was 90.91%, ensuring that no alcohol influenced trips were missed. Key predictive features included speed variability, course devia
Over a decade there has been a rapid growth in Nigerian educational system particularly higher education. Various institutions have come up both from public and private sector offering many of courses both under and post graduate students. Therefore, rates of students enroll for higher educational institutions in Nigeria have also increased. Hence it is very important to understand the roles play by data mining in analyzing the collected data of students and their academic progression. It is a concern for today's education system and this gap has to be identified and properly addressed to the learning community. Data Mining it helps in various ways to resolve issues face in predictions students and staff performances within Nigerian education system. This paperwork we discuss the roles of Data Mining tools and techniques which can be used effectively in resolving issues in some functional unit of Nigerian tertiary institutions.
The growing use of Internet of Things (IoT) technologies in Nigerian healthcare offers potential improvements in remote monitoring and data-driven care, but unsecured wireless communication in medical IoT (mIoT) devices exposes patient data to cyber threats. This study investigates such vulnerabilities through a real-time Man in the Middle (MITM) attack simulation and evaluates lightweight AES-128 encryption on low-cost devices. A prototype mIoT device was built with a NodeMCU ESP8266 and sensors for heart rate and temperature. In controlled lab conditions simulating local healthcare networks, unencrypted data transmissions were intercepted and altered using common tools (Bettercap, Wireshark). After AES-128 encryption was applied, all transmissions became unreadable and tamper attempts failed, demonstrating its effectiveness. Performance costs were modest, latency rose from 80 ms to 125 ms (56.25 percent increase) and CPU use from 30 percent to 45 percent, but system stability remained intact. Device cost stayed under 18,000 NGN (about 12 USD), making it feasible for Nigeria's resource constrained facilities. A survey of healthcare professionals showed moderate awareness of IoT-re
With over 500 languages in Nigeria, three languages -- Hausa, Yorùbá and Igbo -- spoken by over 175 million people, account for about 60% of the spoken languages. However, these languages are categorised as low-resource due to insufficient resources to support tasks in computational linguistics. Several research efforts and initiatives have been presented, however, a coherent understanding of the state of Natural Language Processing (NLP) - from grammatical formalisation to linguistic resources that support complex tasks such as language understanding and generation is lacking. This study presents the first comprehensive review of advancements in low-resource NLP (LR-NLP) research across the three major Nigerian languages (NaijaNLP). We quantitatively assess the available linguistic resources and identify key challenges. Although a growing body of literature addresses various NLP downstream tasks in Hausa, Igbo, and Yorùbá, only about 25.1% of the reviewed studies contribute new linguistic resources. This finding highlights a persistent reliance on repurposing existing data rather than generating novel, high-quality resources. Additionally, language-specific challenges, such as the
Nigerian Pidgin is an English-derived contact language and is traditionally an oral language, spoken by approximately 100 million people. No orthographic standard has yet been adopted, and thus the few available Pidgin datasets that exist are characterised by noise in the form of orthographic variations. This contributes to under-performance of models in critical NLP tasks. The current work is the first to describe various types of orthographic variations commonly found in Nigerian Pidgin texts, and model this orthographic variation. The variations identified in the dataset form the basis of a phonetic-theoretic framework for word editing, which is used to generate orthographic variations to augment training data. We test the effect of this data augmentation on two critical NLP tasks: machine translation and sentiment analysis. The proposed variation generation framework augments the training data with new orthographic variants which are relevant for the test set but did not occur in the training set originally. Our results demonstrate the positive effect of augmenting the training data with a combination of real texts from other corpora as well as synthesized orthographic variatio
Investors and stock market analysts face major challenges in predicting stock returns and making wise investment decisions. The predictability of equity stock returns can boost investor confidence, but it remains a difficult task. To address this issue, a study was conducted using a Long Short-term Memory (LSTM) model to predict future stock market movements. The study used a historical dataset from the Nigerian Stock Exchange (NSE), which was cleaned and normalized to design the LSTM model. The model was evaluated using performance metrics and compared with other deep learning models like Artificial and Convolutional Neural Networks (CNN). The experimental results showed that the LSTM model can predict future stock market prices and returns with over 90% accuracy when trained with a reliable dataset. The study concludes that LSTM models can be useful in predicting financial time-series-related problems if well-trained. Future studies should explore combining LSTM models with other deep learning techniques like CNN to create hybrid models that mitigate the risks associated with relying on a single model for future equity stock predictions.
This paper carries out a scientific study of the effects of strike on Nigerian Universities. A mathematical model is formulated to examine the behavior of Nigerian University System when Public Universities are on strike. The results show that if all State and Federal Universities are on strike with the exception of the Private Universities, the University System in Nigeria is locally asymptotically stable.
The inception of AI-based fraud detection systems has presented the banking sector across the globe the opportunity to enhance fraud prevention mechanisms. However, the extent of adoption in Nigeria has been slow, fragmented, and inconsistent due to high cost of implementation and lack of technical expertise. This study seeks to investigate extent of adoption and determinants of AI-driven fraud detection systems in Nigerian banks. This study adopted a cross-sectional survey research design. Data were extracted from primary sources through structured questionnaire based on 5-point Likert scale. The population of the study consist of 24 licensed banks in Nigeria. A purposive sampling technique was used to select 5 biggest banks based on market capitalization and customer base. The Ordered Logistic Regression (OLR) model was used to estimate the data. The results showed that top management support, IT infrastructure, regulatory compliance, staff competency and perceived effectiveness accelerate the uptake of AI-driven fraud detection systems adoption. However, high implementation cost discourages it. Therefore, the study recommended that banks should invest in modern and scalable IT s
Nigeria is a multilingual country with 500+ languages. Naija is a Nigerian Pidgin spoken by approximately 120M speakers and it is a mixed language (e.g., English, Portuguese, Yoruba, Hausa and Igbo). Although it has mainly been a spoken language until recently, there are some online platforms (e.g., Wikipedia), publishing in written Naija as well. West African Pidgin English (WAPE) is also spoken in Nigeria and it is used by BBC to broadcast news on the internet to a wider audience not only in Nigeria but also in other West African countries (e.g., Cameroon and Ghana). Through statistical analyses and Machine Translation experiments, our paper shows that these two pidgin varieties do not represent each other (i.e., there are linguistic differences in word order and vocabulary) and Generative AI operates only based on WAPE. In other words, Naija is underrepresented in Generative AI, and it is hard to teach LLMs with few examples. In addition to the statistical analyses, we also provide historical information on both pidgins as well as insights from the interviews conducted with volunteer Wikipedia contributors in Naija.
Despite attempts to make Large Language Models multi-lingual, many of the world's languages are still severely under-resourced. This widens the performance gap between NLP and AI applications aimed at well-financed, and those aimed at less-resourced languages. In this paper, we focus on Nigerian Pidgin (NP), which is spoken by nearly 100 million people, but has comparatively very few NLP resources and corpora. We address the task of Implicit Discourse Relation Classification (IDRC) and systematically compare an approach translating NP data to English and then using a well-resourced IDRC tool and back-projecting the labels versus creating a synthetic discourse corpus for NP, in which we translate PDTB and project PDTB labels, and then train an NP IDR classifier. The latter approach of learning a "native" NP classifier outperforms our baseline by 13.27\% and 33.98\% in f$_{1}$ score for 4-way and 11-way classification, respectively.
Musicians frequently use social media to express their opinions, but they often convey different messages in their music compared to their posts online. Some utilize these platforms to abuse their colleagues, while others use it to show support for political candidates or engage in activism, as seen during the #EndSars protest. There are extensive research done on offensive language detection on social media, the usage of offensive language by musicians has received limited attention. In this study, we introduce VocalTweets, a code-switched and multilingual dataset comprising tweets from 12 prominent Nigerian musicians, labeled with a binary classification method as Normal or Offensive. We trained a model using HuggingFace's base-Twitter-RoBERTa, achieving an F1 score of 74.5. Additionally, we conducted cross-corpus experiments with the OLID dataset to evaluate the generalizability of our dataset.