The Open Radio Access Network (Open RAN) paradigm, and its reference architecture proposed by the O-RAN Alliance, is paving the way toward open, interoperable, observable and truly intelligent cellular networks. Crucial to this evolution is Machine Learning (ML), which will play a pivotal role by providing the necessary tools to realize the vision of self-organizing O-RAN systems. However, to be actionable, ML algorithms need to demonstrate high reliability, effectiveness in delivering high performance, and the ability to adapt to varying network conditions, traffic demands and performance requirements. To address these challenges, in this paper we propose a novel Deep Reinforcement Learning (DRL) agent design for O-RAN applications that can learn control policies under varying Service Level Agreements (SLAs) with heterogeneous minimum performance requirements. We focus on the case of RAN slicing and SLAs specifying maximum tolerable end-to-end latency levels. We use the OpenRAN Gym open-source environment to train a DRL agent that can adapt to varying SLAs and compare it against the state-of-the-art. We show that our agent maintains a low SLA violation rate that is 8.3x and 14.4x l
Serverless query processing has become increasingly popular due to its auto-scaling, high elasticity, and pay-as-you-go pricing. It allows cloud data warehouse (or lakehouse) users to focus on data analysis without the burden of managing systems and resources. Accordingly, in serverless query services, users become more concerned about cost-efficiency under acceptable performance than performance under fixed resources. This poses new challenges for serverless query engine design in providing flexible performance service-level agreements (SLAs) and cost-efficiency (i.e., prices). In this paper, we first define the problem of flexible performance SLAs and prices in serverless query processing and discuss its significance. Then, we envision the challenges and solutions for solving this problem and the opportunities it raises for other database research. Finally, we present PixelsDB, an open-source prototype with three service levels supported by dedicated architectural designs. Evaluations show that PixelsDB reduces resource costs by 65.5% for near-real-world workloads generated by Cloud Analytics Benchmark (CAB) while not violating the pending time guarantees.
This paper proposes a hierarchical autonomous vehicle navigation architecture, composed of a high-level speed and lane advisory system (SLAS) coupled with low-level trajectory generation and trajectory following modules. Specifically, we target a multi-lane highway driving scenario where an autonomous ego vehicle navigates in traffic. We propose a novel receding horizon mixed-integer optimization based method for SLAS with the objective to minimize travel time while accounting for passenger comfort. We further incorporate various modifications in the proposed approach to improve the overall computational efficiency and achieve real-time performance. We demonstrate the efficacy of the proposed approach in contrast to the existing methods, when applied in conjunction with state-of-the-art trajectory generation and trajectory following frameworks, in a CARLA simulation environment.
In this paper, we examine the problem of a single provider offering multiple types of service level agreements, and the implications thereof. In doing so, we propose a simple model for machine-readable service level agreements (SLAs) and outline specifically how these machine-readable SLAs can be constructed and injected into cloud infrastructures - important for next-generation cloud systems as well as customers. We then computationally characterize the problem, establishing the importance of both verification and solution, showing that in the general case injecting policies into cloud infrastructure is NP-Complete, though the problem can be made more tractable by further constraining SLA representations and using approximation techniques.
The evolution toward 6G networks increasingly relies on network slicing to provide tailored, End-to-End (E2E) logical networks over shared physical infrastructures. A critical challenge is effectively decomposing E2E Service Level Agreements (SLAs) into domain-specific SLAs, which current solutions handle through computationally intensive, iterative optimization processes that incur substantial latency and complexity. To address this, we introduce Casformer, a cascaded Transformer architecture designed for fast, optimization-free SLA decomposition. Casformer leverages historical domain feedback encoded through domain-specific Transformer encoders in its first layer, and integrates cross-domain dependencies using a Transformer-based aggregator in its second layer. The model is trained under a learning paradigm inspired by Domain-Informed Neural Networks (DINNs), incorporating risk-informed modeling and amortized optimization to learn a stable, forward-only SLA decomposition policy. Extensive evaluations demonstrate that Casformer achieves improved SLA decomposition quality against state-of-the-art optimization-based frameworks, while exhibiting enhanced scalability and robustness un
Next-generation networks increasingly rely on network slices - logical networks tailored to specific application requirements, each with distinct Service-Level Agreements (SLAs). Ensuring compliance with these SLAs requires continuous, real-time monitoring of end-to-end performance metrics for each slice, within a limited telemetry budget. However, we find that existing solutions face two fundamental limitations: they either lack end-to-end visibility (e.g., sketches, probabilistic sampling) or provide visibility but lack the control mechanisms to dynamically allocate monitoring resources according to slice SLAs. We address this through a formal framework that reframes slice monitoring as a closed-loop control problem, and defines the minimal data plane requirements for SLA-aware slice monitoring via a telemetry primitive contract. We then present SliceScope, a realization of this framework that combines: (1) a control strategy that dynamically allocates the monitoring resources across diverse slices according to their SLA criticality, and (2) a data-plane based on change-triggered INT that provides per-packet end-to-end visibility with tunable accuracy-overhead trade-offs, satisfy
The convergence of AI and 6G network automation introduces new challenges in maintaining transparency, fairness, and accountability across multivendor management systems. Although closed-loop AI orchestration improves adaptability and self-optimization, it also creates a responsibility gap, where violations of SLAs cannot be causally attributed to specific agents or vendors. This paper presents a hybrid responsible AI-stochastic learning framework that embeds fairness, robustness, and auditability directly into the network control loop. The framework integrates RAI games with stochastic optimization, enabling dynamic adversarial reweighting and probabilistic exploration across heterogeneous vendor domains. An RAAP continuously records AI-driven decision trajectories and produces dual accountability reports: user-level SLA summaries and operator-level responsibility analytics. Experimental evaluations on synthetic two-class multigroup datasets demonstrate that the proposed hybrid model improves the accuracy of the worst group by up to 10.5%. Specifically, hybrid RAI achieved a WGAcc of 60.5% and an AvgAcc of 72.7%, outperforming traditional RAI-GA (50.0%) and ERM (21.5%). The a
The integration of Non-Terrestrial Networks (NTN) with Terrestrial Networks (TN) is a key enabler for resilient 5G-Advanced and future 6G backhaul infrastructures. However, managing traffic across these highly asymmetric links remains a significant routing challenge, as systems must support heterogeneous network slices with conflicting service-level agreements (SLAs) while selectively utilizing costly NTN resources. This paper presents a computationally lightweight SLA-aware traffic-steering framework for a hybrid TN-NTN backhaul that models the load-balancing problem as an exact potential game. This mathematical foundation inherently enables decentralized coordination between uplink and downlink load-balancing agents without control-message overhead. By formulating traffic steering as a coupled optimization problem, per-slice (or per-user group) traffic fractions are dynamically distributed across terrestrial and satellite paths based on utility functions that capture throughput, latency, packet loss, and SLA penalties. The resulting game admits a pure Nash equilibrium, ensuring stable and predictable traffic adaptation under non-stationary load conditions. The framework is evalua
Service level agreements (SLAs) in data center colocation contracts define precise thresholds for power, temperature, and humidity, with tiered violation penalties expressed as credits against monthly recurring charges. Traditional reactive monitoring detects breaches only after they occur, limiting remediation opportunities. We present a framework that encodes SLA rules as structured JSON objects to generate training data without manual annotation. We train a per-customer multi-head transformer model in which each attention head specializes in one SLA rule, learning temporal dependencies that precede violations by 30 minutes. Post-training, the inference service emits structured prediction events transformed into three role-specific views: finance schemas exposing credit liability, operations schemas surfacing risk scores and recommended interventions, and compliance schemas bundling predictions with immutable telemetry signatures for audit. By aligning model architecture directly with contractual obligations, this framework enables operators to anticipate SLA breaches, prioritize corrective actions, and minimize financial penalties.
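The idea of deriving training labels directly from JSON-encoded SLA rules can be illustrated with a minimal sketch. The rule schema (`metric`, `max`, `lead_minutes`), the function `label_samples`, the 10-minute sampling step, and the temperature readings are all hypothetical and chosen for illustration; the abstract does not specify the actual schema or labeling procedure.

```python
import json

# Hypothetical rule schema; field names are illustrative, not taken from the paper.
RULE_JSON = '{"metric": "temperature_c", "max": 27.0, "lead_minutes": 30}'


def label_samples(readings, rule, step_minutes=10):
    """Label a sample 1 if the rule is breached within the lead window after it."""
    horizon = rule["lead_minutes"] // step_minutes  # number of future samples to inspect
    labels = []
    for i in range(len(readings)):
        future = readings[i + 1 : i + 1 + horizon]
        labels.append(int(any(value > rule["max"] for value in future)))
    return labels


rule = json.loads(RULE_JSON)
# Made-up telemetry sampled every 10 minutes; one reading exceeds the threshold.
readings = [24.0, 24.5, 25.0, 25.5, 26.0, 26.5, 27.5, 26.0]
labels = label_samples(readings, rule)
print(labels)
```

Under these assumptions, only the samples that fall within the 30 minutes before the breach receive a positive label, which is the kind of annotation-free supervision the abstract describes.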
The evolution of 6G envisions a wide range of applications and services characterized by highly differentiated and stringent Quality of Service (QoS) requirements. Open Radio Access Network (O-RAN) technology has emerged as a transformative approach that enables intelligent software-defined management of the RAN. A cornerstone of O-RAN is the RAN Intelligent Controller (RIC), which facilitates the deployment of intelligent applications (xApps and rApps) near the radio unit. In this context, QoS management through O-RAN has been explored using network slicing and machine learning (ML) techniques. Although prior studies have demonstrated the ability to optimize RAN resource allocation and prioritize slices effectively, they have not considered the critical integration of Service Level Agreements (SLAs) into the ML learning process. This omission can lead to suboptimal resource utilization and, in many cases, service outages when target Key Performance Indicators (KPIs) are not met. This work introduces RSLAQ, an innovative xApp designed to ensure robust QoS management for RAN slicing while incorporating SLAs directly into its operational framework. RSLAQ translates operator policies in
Cloud computing has been consolidated as a support for the vast majority of current and emerging technologies. However, there are some barriers that prevent the exploitation of the full potential of this technology. First, the major cloud providers currently put the onus of implementing the mechanisms that ensure compliance with the desired service levels on cloud consumers. However, consumers do not have the required expertise. Since each cloud provider exports a different set of low-level metrics, the strategies defined to ensure compliance with the established service-level agreement (SLA) are bound to a particular cloud provider. This fosters provider lock-in and prevents consumers from benefiting from the advantages of multi-cloud environments. This paper presents a solution to the problem of automatically translating SLAs into objectives expressed as metrics that can be measured across multiple cloud providers. First, we propose an intelligent knowledge-based system capable of automatically translating high-level SLAs defined by cloud consumers into a set of conditions expressed as vendor-neutral metrics, providing feedback to cloud consumers (intelligent tutoring system). Se
The emergence of the fifth generation (5G) technology has transformed mobile networks into multi-service environments, necessitating efficient network slicing to meet diverse Service Level Agreements (SLAs). SLA decomposition across multiple network domains, each potentially managed by different service providers, poses a significant challenge due to limited visibility into real-time underlying domain conditions. This paper introduces Risk-Aware Iterated Local Search (RAILS), a novel risk model-driven meta-heuristic framework designed to jointly address SLA decomposition and service provider selection in multi-domain networks. By integrating online risk modeling with iterated local search principles, RAILS effectively navigates the complex optimization landscape, utilizing historical feedback from domain controllers. We formulate the joint problem as a Mixed-Integer Nonlinear Programming (MINLP) problem and prove its NP-hardness. Extensive simulations demonstrate that RAILS achieves near-optimal performance, offering an efficient, real-time solution for adaptive SLA management in modern multi-domain networks.
Cloud computing offers on-demand resource access, regulated by Service-Level Agreements (SLAs) between consumers and Cloud Service Providers (CSPs). SLA violations can impact efficiency and CSP profitability. In this work, we propose an SLA-aware automated algorithm-selection framework for combinatorial optimization problems in resource-constrained cloud environments. The framework uses an ensemble of machine learning models to predict performance and rank algorithm-hardware pairs based on SLA constraints. We also apply our framework to the 0-1 knapsack problem. We curate a dataset comprising instance-specific features along with memory usage, runtime, and optimality gap for 6 algorithms. As an empirical benchmark, we evaluate the framework on both classification and regression tasks. Our ablation study explores the impact of hyperparameters and learning approaches, the effectiveness of large language models in regression, and SHAP-based interpretability.
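The ranking step, selecting algorithm-hardware pairs under SLA constraints, can be sketched as a filter followed by a sort. This sketch takes the predicted runtime, memory, and optimality gap as given inputs rather than producing them with an ML ensemble; the `Prediction` class, the `rank_pairs` function, the algorithm and hardware names, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Predicted behavior of one algorithm-hardware pair (hypothetical fields)."""
    algorithm: str
    hardware: str
    runtime_s: float
    memory_mb: float
    gap: float  # predicted optimality gap, 0.0 means optimal


def rank_pairs(preds, max_runtime_s, max_memory_mb):
    """Keep pairs that satisfy the SLA limits, then order by gap and runtime."""
    feasible = [
        p for p in preds
        if p.runtime_s <= max_runtime_s and p.memory_mb <= max_memory_mb
    ]
    return sorted(feasible, key=lambda p: (p.gap, p.runtime_s))


# Made-up predictions for a single knapsack instance.
preds = [
    Prediction("dp", "large", 2.0, 900.0, 0.0),
    Prediction("greedy", "small", 0.1, 50.0, 0.08),
    Prediction("bnb", "large", 12.0, 400.0, 0.0),
    Prediction("fptas", "small", 1.0, 120.0, 0.02),
]
ranked = rank_pairs(preds, max_runtime_s=5.0, max_memory_mb=512.0)
print([(p.algorithm, p.hardware) for p in ranked])
```

Here the two exact solvers are excluded because each violates one SLA limit, so the best feasible choice is the approximation scheme with the smaller predicted gap.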
Next-generation (NextG) cellular networks are designed to support emerging applications with diverse data rate and latency requirements, such as immersive multimedia services and large-scale Internet of Things deployments. A key enabling mechanism is radio access network (RAN) slicing, which dynamically partitions radio resources into virtual resource blocks to efficiently serve heterogeneous traffic classes, including enhanced mobile broadband (eMBB), massive machine-type communications (mMTC), and ultra-reliable low-latency communications (URLLC). In this paper, we study the impact of adversarial attacks on AI-driven RAN slicing decisions, where a budget-constrained adversary selectively jams slice transmissions to bias deep reinforcement learning (DRL)-based resource allocation, and quantify the resulting service level agreement (SLA) violations and post-attack recovery behavior. Our results indicate that budget-constrained adversarial jamming can induce severe and slice-dependent steady-state SLA violations. Moreover, the DRL agent's reward converges toward the clean baseline only after a non-negligible recovery period.
Network slicing plays a crucial role in realizing 5G/6G advances, enabling diverse Service Level Agreement (SLA) requirements related to latency, throughput, and reliability. Since network slices are deployed end-to-end (E2E), across multiple domains including access, transport, and core networks, it is essential to efficiently decompose an E2E SLA into domain-level targets, so that each domain can provision adequate resources for the slice. However, decomposing SLAs is highly challenging due to the heterogeneity of domains, dynamic network conditions, and the fact that the SLA orchestrator is oblivious to the domain's resource optimization. In this work, we propose Odin, a Bayesian Optimization-based solution that leverages each domain's online feedback for provably-efficient SLA decomposition. Through theoretical analyses and rigorous evaluations, we demonstrate that Odin's E2E orchestrator can achieve up to 45% performance improvement in SLA satisfaction when compared with baseline solutions whilst reducing overall resource costs even in the presence of noisy feedback from the individual domains.
A recent line of research on spoken language assessment (SLA) employs neural models such as BERT and wav2vec 2.0 (W2V) to evaluate speaking proficiency across linguistic and acoustic modalities. Although both models effectively capture features relevant to oral competence, each exhibits modality-specific limitations. BERT-based methods rely on ASR transcripts, which often fail to capture prosodic and phonetic cues for SLA. In contrast, W2V-based methods excel at modeling acoustic features but lack semantic interpretability. To overcome these limitations, we propose a system that integrates W2V with the Phi-4 multimodal large language model (MLLM) through a score fusion strategy. The proposed system achieves a root mean square error (RMSE) of 0.375 on the official test set of the Speak & Improve Challenge 2025, securing second place in the competition. For comparison, the RMSEs of the top-ranked, third-ranked, and official baseline systems are 0.364, 0.384, and 0.444, respectively.
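Score fusion and the RMSE metric can be illustrated with a short sketch. The abstract does not state which fusion rule is used, so a plain weighted average is assumed here; the functions `fuse` and `rmse` and the toy scores are hypothetical and unrelated to the challenge data.

```python
import math


def fuse(scores_a, scores_b, weight=0.5):
    """Weighted average of two systems' scores (assumed fusion rule)."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(scores_a, scores_b)]


def rmse(predicted, target):
    """Root mean square error between predictions and reference scores."""
    squared = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return math.sqrt(sum(squared) / len(squared))


# Toy scores: the two systems err in opposite directions on this made-up data.
w2v_scores = [3.0, 4.0, 5.0]
mllm_scores = [4.0, 4.0, 4.0]
reference = [3.5, 4.0, 4.5]

fused = fuse(w2v_scores, mllm_scores)
print(rmse(w2v_scores, reference), rmse(fused, reference))
```

On this toy data the fused scores have a lower RMSE than either system alone, which is the motivation for combining models whose errors are complementary.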
In Diffusion Transformer (DiT) models, particularly for video generation, attention latency is a major bottleneck due to the long sequence length and the quadratic complexity. We find that attention weights can be separated into two parts: a small fraction of large weights with high rank and the remaining weights with very low rank. This naturally suggests applying sparse acceleration to the first part and low-rank acceleration to the second. Based on this finding, we propose SLA (Sparse-Linear Attention), a trainable attention method that fuses sparse and linear attention to accelerate diffusion models. SLA classifies attention weights into critical, marginal, and negligible categories, applying O(N^2) attention to critical weights, O(N) attention to marginal weights, and skipping negligible ones. SLA combines these computations into a single GPU kernel and supports both forward and backward passes. With only a few fine-tuning steps using SLA, DiT models achieve a 20x reduction in attention computation, resulting in significant acceleration without loss of generation quality. Experiments show that SLA reduces attention computation by 95% without degrading end-to-end generation qua
Edge computing decentralizes computing resources, allowing for novel applications in domains such as the Internet of Things (IoT) in healthcare and agriculture by reducing latency and improving performance. This decentralization is achieved through the implementation of microservice architectures, which require low latencies to meet stringent service level agreements (SLAs) covering performance, reliability, and availability metrics. While cloud computing offers the large data storage and computation resources necessary to handle peak demands, a hybrid cloud and edge environment is required to ensure SLA compliance. This is achieved by sophisticated orchestration strategies such as Kubernetes, which help facilitate resource management. The orchestration strategies alone do not guarantee SLA adherence due to the inherent delay of scaling resources. Existing auto-scaling algorithms have been proposed to address these challenges, but they suffer from performance issues and configuration complexity. In this paper, a novel auto-scaling algorithm is proposed for SLA-constrained edge computing applications. This approach combines a Machine Learning (ML) based proactive auto-scaling algorith
Code Large Language Models (CodeLLMs) are increasingly integrated into modern software development workflows, yet efficiently serving them in resource-constrained, self-hosted environments remains a significant challenge. Existing LLM serving systems employ Continuous Batching to improve throughput. However, they rely on static batch size configurations that cannot adapt to fluctuating request rates or heterogeneous workloads, leading to frequent SLA (Service Level Agreement) violations and unstable performance. In this study, we propose SABER, a dynamic batching strategy that predicts per-request SLA feasibility and adjusts decisions in real time. SABER improves goodput by up to 26% over the best static configurations and reduces latency variability by up to 45%, all without manual tuning or service restarts. Our results demonstrate that SLA-aware, adaptive scheduling is key to robust, high-performance CodeLLM serving.
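The general idea of SLA-aware batch admission can be sketched as follows. This is not SABER's actual algorithm, which the abstract does not detail: the linear latency model in `predicted_latency_ms`, its coefficients, the greedy `admit` policy, and the request fields are all assumptions for illustration.

```python
def predicted_latency_ms(batch_size, tokens, base_ms=20.0,
                         per_token_ms=2.0, slowdown_per_req=0.1):
    """Hypothetical cost model: per-token time grows linearly with batch size."""
    return base_ms + tokens * per_token_ms * (1.0 + slowdown_per_req * (batch_size - 1))


def admit(queue, max_batch=8):
    """Grow the batch greedily while every admitted request still meets its SLA."""
    batch = []
    for request in queue:
        candidate = batch + [request]
        if len(candidate) > max_batch:
            break
        feasible = all(
            predicted_latency_ms(len(candidate), r["tokens"]) <= r["sla_ms"]
            for r in candidate
        )
        if feasible:
            batch = candidate  # otherwise the request waits for a later batch
    return batch


# Made-up requests: "b" has a deadline too tight to share a batch with "a".
queue = [
    {"id": "a", "tokens": 100, "sla_ms": 250.0},
    {"id": "b", "tokens": 100, "sla_ms": 230.0},
    {"id": "c", "tokens": 50, "sla_ms": 500.0},
]
batch = admit(queue)
print([r["id"] for r in batch])
```

Unlike a static batch size, the effective batch here depends on the deadlines of the requests currently queued, so it shrinks when tight deadlines arrive and grows when there is slack.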
An Edge-Cloud Continuum integrates edge and cloud resources to provide a flexible and scalable infrastructure. This paradigm can minimize latency by processing data closer to the source at the edge while leveraging the vast computational power of the cloud for more intensive tasks. In this context, module application placement requires strategic allocation plans that align user demands with infrastructure constraints, aiming for efficient resource use. Therefore, we propose Tetris, an application placement strategy that utilizes a heuristic algorithm to distribute computational services across edge and cloud resources efficiently. Tetris prioritizes services based on SLA urgencies and resource efficiency to avoid system overloading. Our results demonstrate that Tetris reduces SLA violations by approximately 76% compared to the baseline method, which serves as a reference point for benchmarking performance in this scenario. Therefore, Tetris offers an effective placement approach for managing latency-sensitive applications in Edge-Cloud Continuum environments, enhancing Quality of Service (QoS) for users.
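A placement heuristic that prioritizes services by SLA urgency can be sketched in a few lines. This is an illustrative greedy rule, not the Tetris algorithm itself, whose details the abstract does not give: the `place` function, the `slack_ms` and `cpu` fields, the single edge capacity, and the example services are all hypothetical.

```python
def place(services, edge_capacity):
    """Place the most urgent services on the edge first; overflow goes to the cloud."""
    # Smaller slack means a more urgent SLA; ties are broken by smaller CPU demand.
    order = sorted(services, key=lambda s: (s["slack_ms"], s["cpu"]))
    placement = {}
    free = edge_capacity
    for service in order:
        if service["cpu"] <= free:
            placement[service["name"]] = "edge"
            free -= service["cpu"]
        else:
            placement[service["name"]] = "cloud"
    return placement


# Made-up services with latency slack (ms) and CPU demand (cores).
services = [
    {"name": "video", "slack_ms": 10, "cpu": 2},
    {"name": "analytics", "slack_ms": 200, "cpu": 3},
    {"name": "sensor", "slack_ms": 10, "cpu": 1},
    {"name": "backup", "slack_ms": 1000, "cpu": 1},
]
placement = place(services, edge_capacity=4)
print(placement)
```

In this example the two latency-critical services take the edge first, the large analytics job no longer fits and is sent to the cloud, and the small backup job fills the remaining edge core.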