The accelerating militarization of artificial intelligence has transformed the ethics, politics, and governance of warfare. This article interrogates how AI-driven targeting systems function as epistemic infrastructures that classify, legitimize, and execute violence, using Israel's conduct in Gaza as a paradigmatic case. Through the lens of responsibility, the article examines three interrelated dimensions: (a) political responsibility, exploring how states exploit AI to accelerate warfare while evading accountability; (b) professional responsibility, addressing the complicity of technologists, engineers, and defense contractors in the weaponization of data; and (c) personal responsibility, probing the moral agency of individuals who participate in or resist algorithmic governance. This is complemented by an examination of the position and influence of those participating in public discourse, whose narratives often obscure or normalize AI-enabled violence. The Gaza case reveals AI not as a neutral instrument but as an active participant in the reproduction of colonial hierarchies and the normalization of atrocity. Ultimately, the paper calls for a reframing of technological agency
This study examines how different artificial intelligence architectures interpret sentiment in conflict-related media discourse, using the 2023 Gaza War as a case study. Drawing on a corpus of 10,990 Arabic news headlines (Eleraqi 2026), the research conducts a comparative analysis between three large language models and six fine-tuned Arabic BERT models. Rather than evaluating accuracy against a single human-annotated gold standard, the study adopts an epistemological approach that treats sentiment classification as an interpretive act produced by model architectures. To quantify systematic differences across models, the analysis employs information-theoretic and distributional metrics, including Shannon Entropy, Jensen-Shannon Distance, and a Variance Score measuring deviation from aggregate model behavior. The results reveal pronounced and non-random divergence in sentiment distributions. Fine-tuned BERT models, particularly MARBERT, exhibit a strong bias toward neutral classifications, while LLMs consistently amplify negative sentiment, with LLaMA-3.1-8B showing near-total collapse into negativity. Frame-conditioned analysis further demonstrates that GPT-4.1 adjusts sentiment j
Population size estimation from capture-recapture data is central for studying hard-to-reach populations, incorporating auxiliary covariates to account for heterogeneous capture probabilities and recapture dependencies. However, missing attributes pose a critical methodological challenge due to reluctance to share sensitive information, data collection limitations, and imperfect record linkage. Existing approaches either ignore missingness or rely on a priori imputation, potentially introducing substantial bias. In this work, we develop a novel nonparametric estimation framework using a Missing at Random assumption to identify capture probabilities under missing covariates. Using semiparametric efficiency theory, we construct one-step estimators that combine efficiency, robustness, and finite-sample validity: they approximately achieve the nonparametric efficiency bound, accommodate flexible machine learning methods through a doubly robust structure, and provide approximately valid inference for any sample size. Simulations demonstrate substantial improvements over naive imputation approaches, with our doubly robust ML estimators maintaining valid inference even at high missingness
Aerial bombardment of the Gaza Strip beginning October 7, 2023 is one of the most intense bombing campaigns of the twenty-first century, driving widespread urban damage. Characterizing damage over a geographically dynamic and protracted armed conflict requires active monitoring. Synthetic aperture radar (SAR) has precedence for mapping disaster-induced damage with bi-temporal methods but applications to active monitoring during sustained crises are limited. Using interferometric SAR data from Sentinel-1, we apply a long temporal-arc coherent change detection (LT-CCD) approach to track weekly damage trends over the first year of the 2023- Israel-Hamas War. We detect 92.5% of damage labels in reference data from the United Nations with a negligible (1.2%) false positive rate. The temporal fidelity of our approach reveals rapidly increasing damage during the first three months of the war focused in northern Gaza, a notable pause in damage during a temporary ceasefire, and surges of new damage as conflict hot-spots shift from north to south. Three-fifths (191,263) of all buildings are damaged or destroyed by the end of the study. With massive need for timely data on damage in armed con
October 7, 2023 marked the start of a war against Gaza, one of the most devastating conflicts in modern history, which quickly produced a stark global attitudinal divide. To examine the role of media bias in shaping public understanding of this asymmetrical war, we analyzed more than 14,000 news articles published during its first year across three major Western outlets (The New York Times, BBC, CNN) and one non-Western English-language outlet (Al Jazeera English). Focusing on media narratives surrounding Israeli and Palestinian victims, we identify three systematic biases in Western coverage: (1) Identifiable Victim Reporting: Israeli victims were substantially more likely to be depicted as identifiable individuals, whereas Palestinian victims were predominantly represented as undifferentiated collectives. (2) Equalization Bias: Despite the profound asymmetry in casualties, displacement, and other forms of suffering, Western reporting repeatedly invoked the October 7 attacks to equalize Israeli and Palestinian victimhood, even in the absence of new Israeli-casualty events. (3) One-sided Doubt Casting: Journalists disproportionately used language that casts doubt on the credibility
Accurately and swiftly assessing damage from conflicts is crucial for humanitarian aid and regional stability. In conflict zones, damaged zones often share similar architectural styles, with damage typically covering small areas and exhibiting blurred boundaries. These characteristics lead to limited data, annotation difficulties, and significant recognition challenges, including high intra-class similarity and ambiguous semantic changes. To address these issues, we introduce a pre-trained DINOv3 model and propose a multi-scale cross-attention difference siamese network (MC-DiSNet). The powerful visual representation capability of the DINOv3 backbone enables robust and rich feature extraction from bi-temporal remote sensing images. The multi-scale cross-attention mechanism allows for precise localization of subtle semantic changes, while the difference siamese structure enhances inter-class feature discrimination, enabling fine-grained semantic change detection. Furthermore, a simple yet powerful lightweight decoder is designed to generate clear detection maps while maintaining high efficiency. We also release a new Gaza-change dataset containing high-resolution satellite image pai
We report empirical evidence of web defacement and DDoS attacks carried out by low-level cybercrime actors in the Israel-Gaza conflict. Our quantitative measurements indicate an immediate increase in such cyberattacks following the Hamas-led assault and the subsequent declaration of war. However, the surges waned quickly after a few weeks, with patterns resembling those observed in the aftermath of the Russian invasion of Ukraine. The scale of attacks and discussions within the hacking community this time was both significantly lower than those during the early days of the Russia-Ukraine war, and attacks have been prominently one-sided: many pro-Palestinian supporters have targeted Israel, while attacks on Palestine have been much less significant. Beyond targeting these two, attackers also defaced sites of other countries to express their war support. Their broader opinions are also largely disparate, with far more support for Palestine and many objections expressed toward Israel.
Social media has become a crucial conduit for the swift dissemination of information during global crises. However, this also paves the way for the manipulation of narratives by malicious actors. This research delves into the interaction dynamics between coordinated (malicious) entities and organic (regular) users on Twitter amidst the Gaza conflict. Through the analysis of approximately 3.5 million tweets from over 1.3 million users, our study uncovers that coordinated users significantly impact the information landscape, successfully disseminating their content across the network: a substantial fraction of their messages is adopted and shared by organic users. Furthermore, the study documents a progressive increase in organic users' engagement with coordinated content, which is paralleled by a discernible shift towards more emotionally polarized expressions in their subsequent communications. These results highlight the critical need for vigilance and a nuanced understanding of information manipulation on social media platforms.
In remote sensing imagery, multi class change detection (MCD) is crucial for fine grained monitoring, yet it has long been constrained by complex scene variations and the scarcity of detailed annotations. To address this, we propose the Tripath DINO architecture, which adopts a three path complementary feature learning strategy to facilitate the rapid adaptation of pre trained foundation models to complex vertical domains. Specifically, we employ the DINOv3 pre trained model as the backbone feature extraction network to learn coarse grained features. An auxiliary path also adopts a siamese structure, progressively aggregating intermediate features from the siamese encoder to enhance the learning of fine grained features. Finally, a multi scale attention mechanism is introduced to augment the decoder network, where parallel convolutions adaptively capture and enhance contextual information under different receptive fields. The proposed method achieves optimal performance on the MCD task on both the Gaza facility damage assessment dataset (Gaza change) and the classic SECOND dataset. GradCAM visualizations further confirm that the main and auxiliary paths naturally focus on coarse gr
This short paper explores trends in extremist Facebook data from July 2023 to June 2024. We examined engagement, sentiment, and topics within Facebook groups categorized as anti-Israel/Semitic, anti-Palestine/Muslim, and anti-both, mapping these trends against five major events related to the recent Israel-Hamas conflict. Our findings support the hypothesis that shifts in trends correspond with these key events, showing varying patterns across different group categories. We observed decreased activity proportion in anti-both groups and increased activity proportion in the two one-sided hate groups at the conflict's onset. This pattern reversed after the Israeli troop withdrawal from Khan Yunis, Gaza. During the conflict, negative content proportion surged, and neutral content proportion fell in all the three group categories. Anti-Palestine/Muslim groups' discourses shifted from religious to social media activism and political/protest around the time the war began, while anti-Israel/Semitic groups moved from political/protest to religious topics a couple of weeks before the war.
Media bias detection has predominantly been framed as a classification task: assign a political label to an article or outlet. We argue this framing is too shallow: it identifies that bias exists but not where, how, or crucially, what is structurally omitted. We present NewsLens, a five-agent adversarial pipeline for structured news bias navigation. A Fact Verifier, Progressive Framing Analyst, Conservative Framing Analyst, Propaganda Detector, and Neutral Summarizer collaborate to deconstruct articles into interpretable framing maps, exposing ideological omissions, rhetorical manipulation, and framing boundaries. The system is evaluated on 15 articles across four geopolitical event clusters (India-Pakistan Kashmir, Gaza, Climate Policy, Ukraine) using Qwen2.5-3B-Instruct (4-bit quantised, Google Colab T4), with cross-model validation using Mistral 7B on the Kashmir cluster. Center outlets show the highest mean Perspective Divergence Score (PDS: Qwen 0.907, Mistral 0.729 on Kashmir subset); conservative-framing outlets show the highest mean Manipulation Index (MI: 0.600 across both models). Cross-model comparison shows high consistency for high-propaganda content (Republic World de
Social media has become a crucial arena for shaping public narratives during armed conflicts, providing space for both harmful and constructive communication. While hate speech and misinformation have been widely studied, expressions that promote resilience, solidarity, and optimism remain underexplored, particularly in Arabic contexts. This paper introduces AraHopeCorpus, the first annotated dataset of Arabic hope speech collected from ten thousand YouTube comments related to the war on Gaza between 2023 and 2024. Using a detailed annotation framework, comments were classified into three categories: hope speech, no hope speech, and neutral or unclear discourse. The dataset shows that hopeful language dominates, accounting for more than sixty four percent of all comments. These expressions of hope appear mainly as religious encouragement, collective solidarity, and optimism for endurance and justice. No hope speech, representing about thirteen percent, reflects despair and disillusionment, while the rest of the comments contain neutral or mixed content. Inter-Annotator Agreement reached substantial levels (Cohen's Kappa equals 0.71), though dialectal variation, sarcasm, and implici
Multilinguality is a core capability for modern foundation models, yet training high-quality multilingual models remains challenging due to uneven data availability across languages. A further challenge is the performance interference that can arise from joint multilingual training, commonly referred to as the "curse of multilinguality". We study multilingual data curation across thirteen languages and find that many reported regressions are not inherent to multilingual scaling but instead stem from correctable deficiencies in data quality and composition rather than fundamental capacity limits. In controlled bilingual experiments, improving data quality for any single language benefits others: curating English improves non-English performance in 12 of 13 languages, while curating non-English yields reciprocal improvements in English. Bespoke per-language curation produces substantially larger within-language improvements. Extending these findings to large-scale general-purpose training mixtures, we show that curated multilingual allocations comprising under 8% of total tokens remain remarkably effective. We operationalize this approach within an effort that produced a 20T-token pr
Empirical evaluation serves as the primary compass guiding research progress in foundation models. Despite a large body of work focused on training frontier vision-language models (VLMs), approaches to their evaluation remain nascent. To guide their maturation, we propose three desiderata that evaluations should satisfy: (1) faithfulness to the modality and application, (2) discriminability between models of varying quality, and (3) efficiency in compute. Through this lens, we identify critical failure modes that violate faithfulness and discriminability, misrepresenting model capabilities: (i) multiple-choice formats reward guessing, poorly reflect downstream use cases, and saturate early as models improve; (ii) blindly solvable questions, which can be answered without images, constitute up to 70% of some evaluations; and (iii) mislabeled or ambiguous samples compromise up to 42% of examples in certain datasets. Regarding efficiency, the computational burden of evaluating frontier models has become prohibitive: by some accounts, nearly 20% of development compute is devoted to evaluation alone. Rather than discarding existing benchmarks, we curate them via transformation and filter
Remote sensing change detection is pivotal for urban monitoring, disaster assessment, and environmental resource management. Yet, unimodal deep learning methods frequently confuse genuine semantic changes with visually similar but irrelevant variations. Recent multimodal approaches incorporate text as auxiliary supervision, but their descriptions are either semantically coarse and unstructured or model-generated and thus noisy. Critically, all of them overlook a simple fact: fine-grained change semantics are already implicitly encoded in the ground-truth mask labels that come standard with every change detection dataset. These masks know where the change happened, what the land-cover types were before and after, how the transition occurred, and how many objects were involved. In this paper, we propose S2M, a framework that obtains structured textual features directly from change labels at zero additional annotation cost. Specifically, each change region is automatically transcribed into a semantic quadruple (where, what, how, how many) and converted into several fixed-template text descriptions, providing precise, dense, and noise-free multimodal supervision. We adopts a two-stage
A growing body of work has shown that AI-assisted methods -- leveraging large language models, social choice methods, and collective dialogues -- can help navigate polarization and surface common ground in controlled lab settings. But what can these approaches contribute in real-world contexts? We present a case study applying these techniques to find common ground between Israeli and Palestinian peacebuilders in the period following October 7th, 2023. From April to July 2024 an iterative deliberative process combining LLMs, bridging-based ranking, and collective dialogues was conducted in partnership with the Alliance for Middle East Peace. Around 138 civil society peacebuilders participated including Israeli Jews, Palestinian citizens of Israel, and Palestinians from the West Bank and Gaza. The process resulted in a set of collective statements, including demands to world leaders, with at least 84% agreement from participants on each side. In this paper, we document the process, results, challenges, and important open questions.
Data curation has shifted the quality-compute frontier for language-model and contrastive image-text pretraining, but its role for vision-language models (VLMs) is far less established. We ask how far data curation alone can take VLM performance, holding architecture, training recipe, and compute fixed and varying only the training data. Our pipeline, applied to the MAmmoTH-VL single-image subset, lifts performance by +11.7pp on average across 20 public VLM benchmarks (spanning grounding, VQA, OCR/documents, captioning, spatial/3D, counting, charts, math, brand-ID, and multi-image reasoning) and by +11.3pp on average across all nine capability axes of DatBench, our high-fidelity VLM eval suite. At 2B, our curated model surpasses InternVL3.5-2B by 9.9pp at ~17x less training compute and closes the gap to Qwen3-VL-2B to within 1.8pp at ~87x less compute, from pretraining alone. Beyond accuracy, curation delivers four further properties: (1) Reliability: per-capability std across training seeds drops by ~67% and the lift survives a 4k-to-16k context-length sweep; (2) OOD generalization: the 9-eval OOD average rises by +7.2pp, and multi-image BLINK rises by +3.09pp despite single-image
The Israeli-Palestinian conflict started on 7 October 2023, have resulted thus far to over 48,000 people killed including more than 17,000 children with a majority from Gaza, more than 30,000 people injured, over 10,000 missing, and over 1 million people displaced, fleeing conflict zones. The infrastructure damage includes the 87\% of housing units, 80\% of public buildings and 60\% of cropland 17 out of 36 hospitals, 68\% of road networks and 87\% of school buildings damaged. This conflict has as well launched an online discussion across various social media platforms. Telegram was no exception due to its encrypted communication and highly involved audience. The current study will cover an analysis of the related discussion in relation to different participants of the conflict and sentiment represented in those discussion. To this end, we prepared a dataset of 125K messages shared on channels in Telegram spanning from 23 October 2025 until today. Additionally, we apply the same analysis in two publicly available datasets from Twitter containing 2001 tweets and from Reddit containing 2M opinions. We apply a volume analysis across the three datasets, entity extraction and then proce
This study investigates the classification of progressive rock music, a genre characterized by complex compositions and diverse instrumentation, distinct from other musical styles. Addressing this Music Information Retrieval (MIR) task, we extracted comprehensive audio features, including spectrograms, Mel-Frequency Cepstral Coefficients (MFCCs), chromagrams, and beat positions from song snippets using the Librosa library. A winner-take-all voting strategy was employed to aggregate snippet-level predictions into final song classifications. We conducted a comparative analysis of various machine learning techniques. Ensemble methods, encompassing Bagging (Random Forest, ExtraTrees, Bagging Classifier) and Boosting (XGBoost, Gradient Boosting), were explored, utilizing Principal Component Analysis (PCA) for dimensionality reduction to manage computational constraints with high-dimensional feature sets. Additionally, deep learning approaches were investigated, including the development of custom 1D Convolutional Neural Network (1D CNN) architectures (named "Zuck" and "Satya") featuring specific layer configurations, normalization, and activation functions. Furthermore, we fine-tuned a
Recent advances in large language model (LLM) pretraining have shown that simply scaling data quantity eventually leads to diminishing returns, hitting a data wall. In response, the use of synthetic data for pretraining has emerged as a promising paradigm for pushing the frontier of performance. Despite this, the factors affecting synthetic data quality remain poorly understood. In this work, we introduce BeyondWeb, a synthetic data generation framework that produces high-quality synthetic data for pretraining. BeyondWeb significantly extends the capabilities of traditional web-scale datasets, outperforming state-of-the-art synthetic pretraining datasets such as Cosmopedia and Nemotron-CC's high-quality synthetic subset (Nemotron-Synth) by up to 5.1 percentage points (pp) and 2.6pp, respectively, when averaged across a suite of 14 benchmark evaluations. It delivers up to 7.7x faster training than open web data and 2.7x faster than Nemotron-Synth. Remarkably, a 3B model trained for 180B tokens on BeyondWeb outperforms an 8B model trained for the same token budget on Cosmopedia. We also present several insights from BeyondWeb on synthetic data for pretraining: what drives its benefit