Music perception, a multi-sensory process based on the synesthesia effect, is an essential component of music aesthetic education. Understanding music structure helps both perception and aesthetic education. Music structure incorporates a range of information, the coordination of which forms the melody, just as different military actions cooperate to produce a military strategy. However, there are few ways to assess music perception from the perspectives of system operation and information management. In this paper, we explore the similarities between music structure and military strategy while creating the Music Clips Correlation Network (MCCN) based on Mel-frequency Cepstral Coefficients (MFCCs). The inspiration comes from the comparison between a concert conductor's musical score and a military commander's sand table exercise. Specifically, we create MCCNs for various kinds of war movie soundtracks, then relate military tactics (Sun Tzu's Art of War, etc.) and political institutions to military operations networks. Our primary findings suggest a few similarities, implying that music perception and aesthetic education can be approached from a military strategy and manag
Military Large Language Models (LLMs) must provide accurate information to the warfighter in time-critical and dangerous situations. However, today's LLMs are imbued with safety behaviors that cause the LLM to refuse many legitimate queries in the military domain, particularly those related to violence, terrorism, or military technology. Our gold benchmark for assessing refusal rates, which was developed by veterans of the US Army and special forces, is to our knowledge the first dataset of its kind. We present results for refusal and deflection rates on 31 public models and 3 military models. We observe hard rejection rates as high as 98.2% and soft deflection rates ranging from 0% to 21.3%. We also present results on two additional synthetic datasets and show their correlations with the gold dataset. Finally, we perform abliteration using the Heretic library on a military-tuned gpt-oss-20b model, showing an absolute increase in answer rate of 66.5 points but an average relative decrease of 2% on other military tasks. In our concluding remarks, we argue for deeper specialization, including with mid-training and end-to-end post-training, to achieve zero refusals and maximum militar
Military human-robot interaction (MHRI) presents a novel opportunity to blend the capabilities of autonomous and Artificial Intelligence (AI)-enabled systems with the skills and expertise of humans. The concept promises military advantages and greater operational effectiveness and efficiencies. However, the associated human-AI dynamics create challenges when attempting to design, implement, and operationalise the increasingly symbiotic relationship between humans and machines. Meaningful human control (MHC) is a popularised conceptualisation of what is deemed a responsible interaction among human and artificial agents; however, this notion falls short in military contexts and hinders the realisation of military advantages that could be achieved by advancing the adoption of responsible AI. This paper presents meaningful human command (MHC1) as a more operationally effective concept for advanced military command and control systems that embed AI-enabled autonomous systems. We introduce, explore, and unpack meaningful human command in the context of military human-robot interaction, presenting a vignette that offers a technologically feasible concept of an AI-enabled system within mil
We present EdgeRunner 20B, a fine-tuned version of gpt-oss-20b optimized for military tasks. EdgeRunner 20B was trained on 1.6M high-quality records curated from military documentation and websites. We also present four new test sets: (a) combat arms, (b) combat medic, (c) cyber operations, and (d) mil-bench-5k (general military knowledge). On these military test sets, EdgeRunner 20B matches or exceeds GPT-5 task performance with 95%+ statistical significance, except for the high reasoning setting on the combat medic test set and the low reasoning setting on the mil-bench-5k test set. Versus gpt-oss-20b, there is no statistically significant regression on general-purpose benchmarks like ARC-C, GPQA Diamond, GSM8k, IFEval, MMLU Pro, or TruthfulQA, except for GSM8k in the low reasoning setting. We also present analyses on hyperparameter settings, cost, and throughput. These findings show that small, locally hosted models are ideal solutions for data-sensitive operations such as in the military domain, allowing for deployment on air-gapped edge devices.
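The abstract above does not say which statistical test underlies the "95%+ statistical significance" claim. As a minimal sketch of one common way to compare two models' accuracies on the same test set, the following uses a pooled two-proportion z-test. The function name and all counts are hypothetical and are not taken from the paper.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided pooled z-test for the difference between two accuracies.

    Hypothetical helper for illustration; the paper's actual test is not stated.
    Returns (z, p_value).
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis that both models are equally accurate.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both models got everything right or everything wrong: no detectable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: model A answers 900 of 1000 items correctly, model B 800 of 1000.
z, p = two_proportion_z_test(900, 1000, 800, 1000)
significant_at_95 = p < 0.05
```

A claim of "matches or exceeds" is closer to a one-sided or non-inferiority question, so a stricter analysis might use a one-sided test or a paired test over per-item outcomes; the two-sided version here is only the simplest starting point.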
Object detection is one of the key target tasks of interest in the context of civil and military applications. In particular, the real-world deployment of target detection methods is pivotal in the decision-making process during military command and reconnaissance. However, current domain adaptive object detection algorithms consider adapting one domain to another similar one only within the scope of natural or autonomous driving scenes. Since military domains often deal with a mixed variety of environments, detecting objects from multiple varying target domains poses a greater challenge. Several studies for armored military target detection have made use of synthetic aperture radar (SAR) data due to its robustness to all weather, long range, and high-resolution characteristics. Nevertheless, the costs of SAR data acquisition and processing are still much higher than those of the conventional RGB camera, which is a more affordable alternative with significantly lower data processing time. Furthermore, the lack of military target detection datasets limits the use of such a low-cost approach. To mitigate these issues, we propose to generate RGB-based synthetic data using a photoreali
Relays are pivotal in military communication networks, expanding coverage and ensuring reliable connectivity in challenging operational environments. While traditional terrestrial relays (TR) are constrained by fixed locations and vulnerability to physical obstructions, unmanned aerial vehicle (UAV)-mounted aerial relays (AR) offer a dynamic and flexible alternative by operating above obstacles and adapting to changing battlefield conditions. This paper provides a comprehensive survey of AR systems in military communications, presenting a detailed comparison between AR and TR paradigms and examining two specific AR technologies: active aerial relays (AAR) and aerial reconfigurable intelligent surface (ARIS) relays. The survey delves into their operation, benefits, challenges, and military applications, supported by a qualitative analysis across metrics such as coverage, flexibility, security, and cost. A novel multi-dimensional metric, the mission-critical relay effectiveness score (MCRES), is introduced as a quantitative method for evaluating relay suitability based on mission-specific weights for critical attributes like mobility, jamming resilience, deployment speed, stealth, co
Model Medicine is the science of understanding, diagnosing, treating, and preventing disorders in AI models, grounded in the principle that AI models -- like biological organisms -- have internal structures, dynamic processes, heritable traits, observable symptoms, classifiable conditions, and treatable states. This paper introduces Model Medicine as a research program, bridging the gap between current AI interpretability research (anatomical observation) and the systematic clinical practice that complex AI systems increasingly require. We present five contributions: (1) a discipline taxonomy organizing 15 subdisciplines across four divisions -- Basic Model Sciences, Clinical Model Sciences, Model Public Health, and Model Architectural Medicine; (2) the Four Shell Model (v3.3), a behavioral genetics framework empirically grounded in 720 agents and 24,923 decisions from the Agora-12 program, explaining how model behavior emerges from Core--Shell interaction; (3) Neural MRI (Model Resonance Imaging), a working open-source diagnostic tool mapping five medical neuroimaging modalities to AI interpretability techniques, validated through four clinical cases demonstrating imaging, compari
This white paper underscores the critical importance of responsibly deploying Artificial Intelligence (AI) in military contexts, emphasizing a commitment to ethical and legal standards. The evolving role of AI in the military goes beyond mere technical applications, necessitating a framework grounded in ethical principles. The discussion within the paper delves into ethical AI principles, particularly focusing on the Fairness, Accountability, Transparency, and Ethics (FATE) guidelines. Noteworthy considerations encompass transparency, justice, non-maleficence, and responsibility. Importantly, the paper extends its examination to military-specific ethical considerations, drawing insights from the Just War theory and principles established by prominent entities. In addition to the identified principles, the paper introduces further ethical considerations specifically tailored for military AI applications. These include traceability, proportionality, governability, responsibility, and reliability. The application of these ethical principles is discussed on the basis of three use cases in the domains of sea, air, and land. Methods of automated sensor data analysis, eXplainable AI (XAI)
This paper investigates the heterogeneous effects of military spending news shocks on household income and wealth inequality for a large panel of advanced and emerging economies. Confirming prior literature, we find that military spending news shocks lead to persistent increases in aggregate output and Total Factor Productivity. Our primary contribution is documenting contrasting distributional impacts. We find that expansionary military spending is associated with a mitigation of income inequality, as income gains are disproportionately larger at the left tail of the distribution, primarily driven by a rise in labour income and employment in industry. Conversely, the shock is found to increase wealth inequality, particularly in high-income countries, by raising the wealth share of the top decile via effects on business asset holdings.
We show that the amount of foreign exchange reserves (FER) held worldwide in a given currency is highly correlated with the GDP and military spending of the issuing country for a set of western economies during the last 20 years. Taking into account multicollinearity, Ridge and Lasso regressions reveal that FER is better explained by military spending than by GDP for seven western currencies. For each year shown, military spending is statistically more significant than the monetary instrument M2. By comparison, the currency of the world's second-largest economy, the Chinese renminbi, is well beyond the western FER equilibrium, but yearly analysis shows a steady trend towards a new FER balance. Next, we define a complex geopolitical network model in which the probability of switching to an alternative FER currency depends on both economic and political factors. Military spending is introduced into the model as an average share of GDP observed within the data. As the GDP of a particular country grows, so does its military power. The nature of the creation of new currency networks initially depends only on geopolitical allegiance. As the volume of trade with a
The military environment generates a large amount of data of great importance, which makes necessary the use of machine learning for its processing. Its ability to learn and predict possible scenarios by analyzing the huge volume of information generated provides automatic learning and decision support. This paper aims to present a model of a machine learning architecture applied to a military organization, carried out and supported by a bibliometric study applied to an architecture model of a nonmilitary organization. For this purpose, a bibliometric analysis up to the year 2021 was carried out, making a strategic diagram and interpreting the results. The information used has been extracted from one of the main databases widely accepted by the scientific community, ISI WoS. No direct military sources were used. This work is divided into five parts: the study of previous research related to machine learning in the military world; the explanation of our research methodology using the SciMat, Excel and VosViewer tools; the use of this methodology based on data mining, preprocessing, cluster normalization, a strategic diagram and the analysis of its results to investigate machine lear
Artificial Intelligence (AI) plays a significant role in enhancing the capabilities of defense systems, revolutionizing strategic decision-making, and shaping the future landscape of military operations. Neuro-Symbolic AI is an emerging approach that leverages and augments the strengths of neural networks and symbolic reasoning. These systems have the potential to be more impactful and flexible than traditional AI systems, making them well-suited for military applications. This paper comprehensively explores the diverse dimensions and capabilities of Neuro-Symbolic AI, aiming to shed light on its potential applications in military contexts. We investigate its capacity to improve decision-making, automate complex intelligence analysis, and strengthen autonomous systems. We further explore its potential to solve complex tasks in various domains, in addition to its applications in military contexts. Through this exploration, we address ethical, strategic, and technical considerations crucial to the development and deployment of Neuro-Symbolic AI in military and civilian applications. Contributing to the growing body of research, this study represents a comprehensive exploration of the
Which factors determine AI's propensity to support military intervention? While the use of AI in high-stakes decision-making is growing exponentially, we still lack systematic analysis of the key drivers embedded in these models. This paper conducts a conjoint experiment in which large language models (LLMs) from leading providers (OpenAI, Anthropic, Google) are asked to decide on military intervention across 128 vignettes, with each vignette run 10 times. This design enables a systematic assessment of AI decision-making in military contexts. The results are remarkably consistent across models: all models place substantial weight on the probability of success and domestic support, prioritizing these factors over civilian casualties, economic shock, or international sanctions. The paper then tests whether LLMs are sensitive to context by introducing different motivations for intervention. The scoring is indeed context-dependent; however, probability of victory remains the most important factor in all scenarios. Finally, the paper evaluates numerical sensitivity and finds that models display some responsiveness to the scale of civilian casualties but no detectable sensitivity to the
In a time of rapidly evolving military threats and increasingly complex operational environments, the integration of AI into military operations provides significant advantages. At the same time, this implies various challenges and risks regarding building and deploying human-AI teaming systems in an effective and ethical manner. Currently, these challenges are often understood and addressed from an external perspective that treats the human-AI teaming system as a collective agent. Nevertheless, zooming into the dynamics inside the system makes it possible to address a broader palette of relevant multidimensional responsibility, safety, and robustness aspects. To this end, this research proposes the design of a trustworthy co-learning model for human-AI teaming in military operations that encompasses a continuous and bidirectional exchange of insights between the human and AI agents as they jointly adapt to evolving battlefield conditions. It does that by integrating four dimensions. First, adjustable autonomy for dynamically calibrating the autonomy levels of agents depending on aspects like mission state, system confidence, and environmental uncertainty. Second, multi-layered con
It is beyond dispute that the potential benefits of artificial intelligence (AI) in military intelligence are considerable. Nevertheless, it remains uncertain precisely how AI can enhance the analysis of military data. The aim of this study is to address this issue. To this end, the AI demonstrator deepCOM was developed in collaboration with the start-up Aleph Alpha. The AI functions include text search, automatic text summarization and Named Entity Recognition (NER). These are evaluated for their added value in military analysis. It is demonstrated that under time pressure, the use of AI functions results in assessments clearly superior to those of the control group. Nevertheless, despite the demonstrably superior analysis outcome in the experimental group, no increase in confidence in the accuracy of their own analyses was observed. Finally, the paper identifies the limitations of employing AI in military intelligence, particularly in the context of analyzing ambiguous and contradictory information.
In this paper, we consider military use cases and their implementation for natural language processing and large language models, which rose to prominence with the invention of the generative pre-trained transformer (GPT) and the extensive foundation model pretraining done by OpenAI for ChatGPT and others. First, we interrogate a GPT-based language model (viz. Microsoft Copilot) to make it reveal what it knows about the potential military applications of such models, and then critically assess the information. Second, we study how commercial cloud services (viz. Microsoft Azure) could be used readily to build such applications and assess which of them are feasible. We conclude that the summarization and generative properties of language models directly facilitate many applications at large, and other features may find particular uses.
AI has made significant strides recently, leading to various applications in both civilian and military sectors. The military sees AI as a solution for developing more effective and faster technologies. While AI offers benefits like improved operational efficiency and precision targeting, it also raises serious ethical and legal concerns, particularly regarding human rights violations. Autonomous weapons that make decisions without human input can threaten the right to life and violate international humanitarian law. To address these issues, we propose a three-stage framework (Design, In Deployment, and During/After Use) for evaluating human rights concerns in the design, deployment, and use of military AI. Each stage includes multiple components that address various concerns specific to that stage, ranging from bias and regulatory issues to violations of International Humanitarian Law. With this framework, we aim to balance the advantages of AI in military operations with the need to protect human rights.
With the increasing interest in deploying Artificial Intelligence in medicine, we previously introduced HAIM (Holistic AI in Medicine), a framework that fuses multimodal data to solve downstream clinical tasks. However, HAIM uses data in a task-agnostic manner and lacks explainability. To address these limitations, we introduce xHAIM (Explainable HAIM), a novel framework leveraging Generative AI to enhance both prediction and explainability through four structured steps: (1) automatically identifying task-relevant patient data across modalities, (2) generating comprehensive patient summaries, (3) using these summaries for improved predictive modeling, and (4) providing clinical explanations by linking predictions to patient-specific medical knowledge. Evaluated on the HAIM-MIMIC-MM dataset, xHAIM improves average AUC from 79.9% to 90.3% across chest pathology and operative tasks. Importantly, xHAIM transforms AI from a black-box predictor into an explainable decision support system, enabling clinicians to interactively trace predictions back to relevant patient data, bridging AI advancements with clinical utility.
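The headline metric in the abstract above is AUC (area under the ROC curve). As a minimal sketch of what that number measures, the following computes AUC for one binary task as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy labels and scores are illustrative assumptions and are not drawn from HAIM-MIMIC-MM.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Illustrative helper, not the paper's evaluation code.
    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; production code would typically use a rank-based formula or a library routine. An average AUC such as the one reported would then be the mean of per-task values computed this way.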
This study proposes a systematic non-kinetic deterrence path modeling framework based on strategic rare earth supply cut-off, aiming to assess the strategic effects of China's export control policy against the United States at the military system level. The model adopts a four-layer structure of "policy input -- resource node -- equipment system -- capability output" and integrates path dependency modeling, degradation function design, and capability lag prediction mechanisms to form a strategic simulation system. The study incorporates graph neural networks and LSTM-based time series methods to dynamically evaluate the impact of rare earth supply disruption on key U.S. military platforms such as the F-35 fighter, nuclear submarines, and AI combat systems, identifying critical path nodes and strategic timing windows. Results indicate that a ten-year zero-tolerance policy on rare earth exports would lead to a significant technological disconnect between years 3 to 5 and a systemic capability lag between years 8 to 12, with an estimated average annual economic impact of 35 to 40 billion USD. These findings demonstrate that rare earth export cut-offs can serve as a structural strategi
The success of precision medicine requires computational models that can effectively process and interpret diverse physiological signals across heterogeneous patient populations. While foundation models have demonstrated remarkable transfer capabilities across various domains, their effectiveness in handling individual-specific physiological signals - crucial for precision medicine - remains largely unexplored. This work introduces a systematic pipeline for rapidly and efficiently evaluating foundation models' transfer capabilities in medical contexts. Our pipeline employs a three-stage approach. First, it leverages physiological simulation software to generate diverse, clinically relevant scenarios, particularly focusing on data-scarce medical conditions. This simulation-based approach enables both targeted capability assessment and subsequent model fine-tuning. Second, the pipeline projects these simulated signals through the foundation model to obtain embeddings, which are then evaluated using linear methods. This evaluation quantifies the model's ability to capture three critical aspects: physiological feature independence, temporal dynamics preservation, and medical scenario d