Model Context Protocol (MCP) servers enable AI applications to connect to external systems in a plug-and-play manner, but their rapid proliferation also introduces severe security risks. Unlike mature software ecosystems with rigorous vetting, MCP servers still lack standardized review mechanisms, giving adversaries opportunities to distribute malicious implementations. Despite this pressing risk, the security implications of MCP servers remain underexplored. To address this gap, we present the first systematic study that treats MCP servers as active threat actors and decomposes them into core components to examine how adversarial developers can implant malicious intent. Specifically, we investigate three research questions: (i) what types of attacks malicious MCP servers can launch, (ii) how vulnerable MCP hosts and Large Language Models (LLMs) are to these attacks, and (iii) how feasible it is to carry out MCP server attacks in practice. Our study proposes a component-based taxonomy comprising twelve attack categories. For each category, we develop Proof-of-Concept (PoC) servers and demonstrate their effectiveness across diverse real-world host-LLM settings. We further show that
Placement of edge servers is a prerequisite for provisioning edge computing services for the Internet of Vehicles (IoV). Fixed-site edge servers at Road Side Units (RSUs) or base stations are able to offer basic service coverage for end users, i.e., vehicles on the road. However, the server locations and capacity are fixed after deployment, rendering them inefficient in handling spatiotemporal user dynamics. Mobile servers such as buses, on the other hand, have the potential to add computation elasticity to such systems. To this end, this paper studies the feasibility of bus-mounted edge servers based on real traces. First, we investigate the coverage of the buses and base stations using the Shanghai bus/taxi/Telecom datasets, which shows great potential for bus-based edge servers, as they cover a large portion of the geographic area and demand points. Next, we build a mathematical model and design a simple greedy heuristic algorithm to select a limited number of buses (i.e., under a limited purchase budget) that maximizes the coverage of demand points. We perform trace-driven simulations to verify the performance of the proposed bus selection algorithm. The results show that our approach
The Language Server Protocol (LSP) has revolutionized the integration of code intelligence in modern software development. There are approximately 300 LSP server implementations for various languages and 50 editors offering LSP integration. However, the reliability of LSP servers is a growing concern, as crashes can disable all code intelligence features and significantly impact productivity, while vulnerabilities can put developers at risk even when editing untrusted source code. Despite the widespread adoption of LSP, no existing techniques specifically target LSP server testing. To bridge this gap, we present LSPFuzz, a grey-box hybrid fuzzer for systematic LSP server testing. Our key insight is that effective LSP server testing requires holistic mutation of source code and editor operations, as bugs often manifest from their combinations. To satisfy the sophisticated constraints of LSP and effectively explore the input space, we employ a two-stage mutation pipeline: syntax-aware mutations to source code, followed by context-aware dispatching of editor operations. We evaluated LSPFuzz on four widely used LSP servers. LSPFuzz demonstrated superior performance compared to baseline
With the rise of LLMs, a large number of Model Context Protocol (MCP) services have emerged since the end of 2024. However, the effectiveness and efficiency of MCP servers have not been well studied. To study these questions, we propose an evaluation framework called MCPBench. We selected several widely used MCP servers and conducted an experimental evaluation of their accuracy, time, and token usage. Our experiments showed that the most effective MCP, Bing Web Search, achieved an accuracy of 64%. Importantly, we found that the accuracy of MCP servers can be substantially enhanced by involving a declarative interface. This research paves the way for further investigations into optimized MCP implementations, ultimately leading to better AI-driven applications and data retrieval solutions.
A service system with multiple types of customers, arriving as Poisson processes, is considered. The system has an infinite number of servers, ranked by $1,2,3, \ldots$; a server rank is its ``location.'' Each customer has an independent exponentially distributed service time, with the mean determined by its type. Multiple customers (possibly of different types) can be placed for service into one server, subject to ``packing'' constraints. Service times of different customers are independent, even if served simultaneously by the same server. The large-scale asymptotic regime is considered, such that the mean number of customers $r$ goes to infinity. We seek algorithms with the underlying objective of minimizing the location (rank) $U$ of the right-most (highest ranked) occupied (non-empty) server. Therefore, this objective seeks to minimize the total number $Q$ of occupied servers {\em and} keep the set of occupied servers as far to the ``left'' as possible, i.e., keep $U$ close to $Q$. In previous work, versions of the {\em Greedy Random} (GRAND) algorithm have been shown to asymptotically minimize $Q/r$ as $r\to\infty$. In this paper we show that when these algorithms are combined with t
In most service systems, the servers are humans who desire to experience a certain level of idleness. In call centers, this manifests itself as call avoidance behavior, where servers strategically adjust their service rate to strike a balance between the idleness they receive and the effort of working harder. Moreover, being human, each server values this trade-off differently and has different capabilities. Drawing on ideas from mean-field games, we develop a novel framework relying on measure-valued processes to simultaneously address strategic server behavior and inherent server heterogeneity in service systems. This framework enables us to extend the recent literature on strategic servers in four new directions by: (i) incorporating individual choices of servers, (ii) incorporating individual abilities of servers, (iii) modeling the discomfort experienced by servers due to low levels of idleness, and (iv) considering more general routing policies. Using our framework, we are able to asymptotically characterize asymmetric Nash equilibria for many-server systems with strategic servers. In simpler cases, it has been shown that the purely quality-driven regime is asymptotically optimal. Ho
Private information retrieval (PIR) schemes (with or without colluding servers) have been proposed for realistic coded distributed data storage systems. Star product PIR schemes with colluding servers for general coded distributed storage systems were constructed over general finite fields by R. Freij-Hollanti, O. W. Gnilke, C. Hollanti and A. Karpuk in 2017. These star product PIR schemes with colluding servers are suitable for the storage of files over small fields and can be constructed for coded distributed storage systems with a large number of servers. In this paper, the problem of finding good retrieval codes for an efficient storage code is considered. If the storage code is a binary Reed-Muller code, the retrieval code need not be a binary Reed-Muller code in general. It is proved that when the storage code contains some special codewords, nonzero retrieval rate star product PIR schemes with colluding servers can only protect against a small number of colluding servers. We also give examples to show that when the storage code is a good cyclic code, the best choice of the retrieval code is not cyclic in general. Therefore in the design of star product PIR schemes with
We consider large fluctuations, namely overload of servers, in a network with dynamic routing of messages. The servers form a circle. The number of input flows is equal to the number of servers; the messages of each flow are distributed between two neighboring servers, and upon arrival a message is directed to the less loaded of these two servers. Under the condition that at least two servers are overloaded, the number of overloaded servers in such a network depends on the rate of the input flows. In particular, there exists a critical level of the input rate above which it is most probable that all servers are overloaded.
We study the capacity of quantum private information retrieval (QPIR) with multiple servers. In the QPIR problem with multiple servers, a user retrieves a classical file by downloading quantum systems from multiple servers, each of which contains a copy of a classical file set, while the identity of the downloaded file is not leaked to any server. The QPIR capacity is defined as the maximum rate of the file size over the whole dimension of the downloaded quantum systems. When the servers are assumed to share prior entanglement, we prove that the QPIR capacity with multiple servers is 1 regardless of the number of servers and files. We construct a rate-one protocol with only two servers. This capacity-achieving protocol outperforms its classical counterpart in the sense of capacity, server secrecy, and upload cost. The strong converse bound is derived concisely without using any secrecy condition. We also prove that the capacity of multi-round QPIR is 1.
Speed scaling for a tandem server setting is considered, where there is a series of servers, and each job has to be processed by each of the servers in sequence. Servers have a variable speed, their power consumption being a convex increasing function of the speed. We consider the worst case setting as well as the stochastic setting. In the worst case setting, the jobs are assumed to be of unit size with arbitrary (possibly adversarially determined) arrival instants. For this problem, we devise an online speed scaling algorithm that is constant competitive with respect to the optimal offline algorithm that has non-causal information. The proposed algorithm, at all times, uses the same speed on all active servers, such that the total power consumption equals the number of outstanding jobs. In the stochastic setting, we consider a more general tandem network, with a parallel bank of servers at each stage. In this setting, we show that random routing with a simple gated static speed selection is constant competitive. In both cases, the competitive ratio depends only on the power functions, and is independent of the workload and the number of servers.
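The speed rule stated in the abstract above (one common speed on all active servers, chosen so that total power equals the number of outstanding jobs) can be illustrated with a minimal sketch. It assumes a power function of the form $P(s) = s^\alpha$ with $\alpha > 1$, which is a common modeling choice but is not specified in the abstract; the function names `common_speed` and `total_power` are hypothetical.

```python
def common_speed(outstanding_jobs: int, active_servers: int, alpha: float = 2.0) -> float:
    """Return the speed s shared by all active servers such that
    active_servers * s**alpha == outstanding_jobs.

    Assumption: power function P(s) = s**alpha (not given in the abstract).
    """
    if outstanding_jobs <= 0 or active_servers <= 0:
        return 0.0  # nothing to process, so no server needs to run
    per_server_power = outstanding_jobs / active_servers
    # Invert P(s) = s**alpha to obtain the speed.
    return per_server_power ** (1.0 / alpha)


def total_power(speed: float, active_servers: int, alpha: float = 2.0) -> float:
    """Total power drawn when every active server runs at the same speed."""
    return active_servers * speed ** alpha


if __name__ == "__main__":
    s = common_speed(outstanding_jobs=8, active_servers=2)
    print(s, total_power(s, active_servers=2))
```

With any convex increasing power function the same idea applies by inverting $P$ numerically; the closed-form inverse here is specific to the assumed polynomial form.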
Renting servers in the cloud is a generalization of the bin packing problem, motivated by job allocation to servers in cloud computing applications. Jobs arrive in an online manner, and need to be assigned to servers; their duration and size are known at the time of arrival. There is an infinite supply of identical servers, each having one unit of computational capacity per unit of time. A server can be rented at any time and continues to be rented until all jobs assigned to it finish. The cost of an assignment is the sum of durations of rental periods of all servers. The goal is to assign jobs to servers to minimize the overall cost while satisfying server capacity constraints. We focus on analyzing two natural algorithms, NextFit and FirstFit, for the case of jobs of equal duration. It is known that the competitive ratios of NextFit and FirstFit are at most 3 and 4, respectively, for this case. We prove a tight bound of 2 on the competitive ratio of NextFit. For FirstFit, we establish a lower bound of 2.519 on the competitive ratio, even when jobs have only two distinct arrival times. For the case when jobs have arrival times 0 and 1 and duration 2, we show a lower bound of 1.89 and
Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable---e.g., for rapidly evaluating new model designs---they often come with significantly higher monetary costs due to sublinear scalability. In this paper, we investigate the feasibility of using training clusters composed of cheaper transient GPU servers to get the benefits of distributed training without the high costs. We conduct the first large-scale empirical analysis, launching more than a thousand GPU servers of various capacities, aimed at understanding the characteristics of transient GPU servers and their impact on distributed training performance. Our study demonstrates the potential of transient servers with a speedup of 7.7X with more than 62.9% monetary savings for some cluster configurations. We also identify a number of important challenges and opportunities for redesigning distributed training frameworks to be transient-aware. For example, the dynamic cost and availability characteristics of transient servers suggest the need for frameworks to dynamically change cl
This paper introduces a new resource allocation problem in distributed computing called distributed serving with mobile servers (DSMS). In DSMS, there are $k$ identical mobile servers residing at the processors of a network. At arbitrary points of time, any subset of processors can invoke one or more requests. To serve a request, one of the servers must move to the processor that invoked the request. Resource allocation is performed in a distributed manner since only the processor that invoked the request initially knows about it. All processors cooperate by passing messages to achieve correct resource allocation. They do this with the goal to minimize the communication cost. Routing servers in large-scale distributed systems requires a scalable location service. We introduce the distributed protocol GNN that solves the DSMS problem on overlay trees. We prove that GNN is starvation-free and correctly integrates locating the servers and synchronizing the concurrent access to servers despite asynchrony, even when the requests are invoked over time. Further, we analyze GNN for "one-shot" executions, i.e., all requests are invoked simultaneously. We prove that when running GNN on top o
We study the steady-state performance of parallel-server systems under an immediate routing architecture with two sources of heterogeneity: servers and job classes, subject to compatibility constraints. We focus on the weighted-workload-task-allocation (WWTA) policy, a load-balancing scheme known to be throughput-optimal for such systems. Under a relaxed complete-resource-pooling (CRP) condition, we prove a "strong form" of state-space collapse in heavy traffic and that the scaled workload of each server converges in distribution to an exponential random variable, whose parameter is explicitly given by system primitives. Our analysis yields three main insights. First, the conventional heavy-traffic requirement of a unique static allocation plan can be dropped; a relaxed CRP condition suffices. Second, the limiting workload distribution is shown to be independent of the local scheduling policy on the server side, allowing substantial flexibility. Third, the inefficient (non-basic) activities prescribed by the static allocation plan are proved to receive an asymptotically negligible fraction of routing and service, even though WWTA has no prior knowledge of which activities are basic, highlightin
To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management, a key concern in distributed training, but can induce severe communication overhead. To reduce communication overhead, distributed machine learning algorithms use techniques to increase parameter access locality (PAL), achieving up to linear speed-ups. We found, however, that existing parameter servers provide only limited support for PAL techniques and therefore prevent efficient training. In this paper, we explore whether and to what extent PAL techniques can be supported, and whether such support is beneficial. We propose to integrate dynamic parameter allocation into parameter servers, describe an efficient implementation of such a parameter server called Lapse, and experimentally compare its performance to existing parameter servers across a number of machine learning tasks. We found that Lapse provides near-linear scaling and can be orders of magnitude faster than existing parameter servers.
An optimal control problem with heterogeneous servers to minimize the average age of information (AoI) is considered. Each server maintains a separate queue, and each packet arriving to the system is randomly routed to one of the servers. Assuming Poisson arrivals and exponentially distributed service times, we first derive an exact expression of the average AoI for two heterogeneous servers. Next, to solve for the optimal average AoI, a close approximation, called the approximate AoI, is derived; it is shown to be useful for multi-server systems as well. We show that for the optimal approximate AoI, the server utilization (ratio of arrival rate to service rate) of each server should be the same as the optimal server utilization of a single-server queue. For two identical servers, it is shown that the average AoI is approximately 5/8 times the average AoI of a single server. Furthermore, the average AoI is shown to decrease considerably with the addition of more servers to the system.
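To make the notion of an optimal single-server utilization concrete, the following minimal sketch evaluates the average AoI of a single M/M/1 first-come-first-served queue and searches for the utilization that minimizes it. The closed form used, $\Delta = \frac{1}{\mu}\left(1 + \frac{1}{\rho} + \frac{\rho^2}{1-\rho}\right)$, is the standard expression from the AoI literature and is an assumption here: it is not quoted in the abstract, and the abstract's own exact and approximate multi-server expressions are not reproduced. The function names are hypothetical.

```python
def mm1_average_aoi(arrival_rate: float, service_rate: float) -> float:
    """Average AoI of a single M/M/1 FCFS queue.

    Assumption: standard closed form from the AoI literature,
    (1/mu) * (1 + 1/rho + rho**2 / (1 - rho)), with rho = lambda / mu.
    """
    rho = arrival_rate / service_rate
    if not 0.0 < rho < 1.0:
        raise ValueError("utilization must lie strictly between 0 and 1")
    return (1.0 / service_rate) * (1.0 + 1.0 / rho + rho ** 2 / (1.0 - rho))


def optimal_utilization(service_rate: float = 1.0, steps: int = 10_000):
    """Grid search over rho in (0, 1) for the utilization minimizing the AoI."""
    best_rho, best_aoi = None, float("inf")
    for k in range(1, steps):
        rho = k / steps
        aoi = mm1_average_aoi(rho * service_rate, service_rate)
        if aoi < best_aoi:
            best_rho, best_aoi = rho, aoi
    return best_rho, best_aoi


if __name__ == "__main__":
    rho_star, aoi_star = optimal_utilization()
    print(f"optimal utilization ~ {rho_star:.3f}, average AoI ~ {aoi_star:.3f}")
```

Under this formula the minimizing utilization is well below 1: pushing updates faster than the queue can absorb them makes each delivered update staler, while sending too rarely leaves long gaps between updates.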
Traditionally, research focusing on the design of routing and staffing policies for service systems has modeled servers as having fixed (possibly heterogeneous) service rates. However, service systems are generally staffed by people. Furthermore, people respond to workload incentives; that is, how hard a person works can depend both on how much work there is, and how the work is divided between the people responsible for it. In a service system, the routing and staffing policies control such workload incentives; and so the rate at which servers work will be impacted by the system's routing and staffing policies. This observation has consequences when modeling service system performance, and our objective is to investigate those consequences. We do this in the context of the M/M/N queue, which is the canonical model for large service systems. First, we present a model for "strategic" servers that choose their service rate in order to maximize a trade-off between an "effort cost", which captures the idea that servers exert more effort when working at a faster rate, and a "value of idleness", which assumes that servers value having idle time. Next, we characterize the symmetric Nash equilibriu
Today, location-based applications and services such as friend finders and geo-social networks are very popular. However, storing private position information on third-party location servers leads to privacy problems. In our previous work, we proposed a position sharing approach for secure management of positions on non-trusted servers, which distributes position shares of limited precision among servers of several providers. In this paper, we propose two novel contributions to improve the original approach. First, we optimize the placement of shares among servers by taking their trustworthiness into account. Second, we optimize the location update protocols to minimize the number of messages between mobile device and location servers.
The problem of online scheduling of multi-server jobs is considered, where there are a total of $K$ servers, and each job requires concurrent service from multiple servers for it to be processed. Each job on its arrival reveals its processing time and the number of servers from which it needs concurrent service, and an online algorithm has to make scheduling decisions using only causal information, with the goal of minimizing the response/flow time. The worst case input model is considered and the performance metric is the competitive ratio. For the case when all job processing times (sizes) are the same, we show that the competitive ratio of any deterministic/randomized algorithm is at least $\Omega(K)$ and propose an online algorithm whose competitive ratio is at most $K+1$. With equal job sizes, we also consider the resource augmentation regime where an online algorithm has access to more servers than an optimal offline algorithm. With resource augmentation, we propose a simple algorithm and show that it has a competitive ratio of $1$ when provided with $2K$ servers with respect to an optimal offline algorithm with $K$ servers. With unequal job sizes, we propose an online algorithm whose
We consider a model inspired by compatibility constraints that arise between tasks and servers in data centers, cloud computing systems and content delivery networks. The constraints are represented by a bipartite graph or network that interconnects dispatchers with compatible servers. Each dispatcher receives tasks over time and sends every task to a compatible server with the least number of tasks, or to a server with the least number of tasks among $d$ compatible servers selected uniformly at random. We focus on networks where the neighborhood of at least one server is skewed in a limiting regime. This means that a diverging number of dispatchers are in the neighborhood which are each compatible with a uniformly bounded number of servers; thus, the degree of the central server approaches infinity while the degrees of many neighboring dispatchers remain bounded. We prove that each server with a skewed neighborhood saturates, in the sense that the mean number of tasks queueing in front of it in steady state approaches infinity. Paradoxically, this pathological behavior can even arise in random networks where nearly all the servers have at most one task in the limit.
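The dispatching rule described in the abstract above can be sketched as follows. This is a minimal illustration only: the function name `dispatch`, the dictionary representation of queue lengths, sampling the $d$ candidates without replacement, and uniform random tie-breaking are all assumptions, not details given in the abstract.

```python
import random


def dispatch(compatible_servers, queue_lengths, d=None, rng=random):
    """Pick the server that receives an arriving task.

    compatible_servers: servers adjacent to the dispatcher in the bipartite graph.
    queue_lengths: mapping from server id to its current number of tasks.
    d: if given, only d compatible servers sampled uniformly at random are
       inspected (assumption: sampled without replacement); if None, all
       compatible servers are inspected.
    """
    candidates = list(compatible_servers)
    if not candidates:
        raise ValueError("dispatcher has no compatible server")
    if d is not None and d < len(candidates):
        candidates = rng.sample(candidates, d)
    least = min(queue_lengths[s] for s in candidates)
    # Assumption: ties among least-loaded candidates are broken uniformly at random.
    return rng.choice([s for s in candidates if queue_lengths[s] == least])


if __name__ == "__main__":
    queues = {0: 3, 1: 1, 2: 5}
    chosen = dispatch([0, 1, 2], queues)       # inspect all compatible servers
    queues[chosen] += 1                        # the task joins the chosen queue
    sampled = dispatch([0, 1, 2], queues, d=2) # inspect only two random servers
    print(chosen, sampled)
```

In a skewed neighborhood, a single server appears in the `compatible_servers` list of a diverging number of dispatchers that each have only a bounded number of alternatives, which is the structure the abstract identifies as leading to saturation.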