WebAssembly (Wasm for short) brings a new, powerful capability to the web as well as Edge, IoT, and embedded systems. Wasm is a portable, compact binary code format with high performance and robust sandboxing properties. As Wasm applications grow in size and importance, the complex performance characteristics of diverse Wasm engines demand robust, representative benchmarks for proper tuning. Stopgap benchmark suites, such as PolyBenchC and libsodium, continue to be used in the literature, though they are known to be unrepresentative. Porting of more complex suites remains difficult because Wasm lacks many system APIs and extracting real-world Wasm benchmarks from the web is difficult due to complex host interactions. To address this challenge, we introduce Wasm-R3, the first record and replay technique for Wasm. Wasm-R3 transparently injects instrumentation into Wasm modules to record an execution trace from inside the module, then reduces the execution trace via several optimizations, and finally produces a replay module that is executable sandalone without any host environment - on any engine. The benchmarks created by our approach are (i) realistic, because the approach records
WebAssembly (Wasm) is an emerging binary format that draws great attention from our community. However, Wasm binaries are weakly protected, as they can be read, edited, and manipulated by adversaries using either the officially provided readable text format (i.e., wat) or some advanced binary analysis tools. Reverse engineering of Wasm binaries is often used for nefarious intentions, e.g., identifying and exploiting both classic vulnerabilities and Wasm specific vulnerabilities exposed in the binaries. However, no Wasm-specific obfuscator is available in our community to secure the Wasm binaries. To fill the gap, in this paper, we present WASMixer, the first general-purpose Wasm binary obfuscator, enforcing data-level (string literals and function names) and code-level (control flow and instructions) obfuscation for Wasm binaries. We propose a series of key techniques to overcome challenges during Wasm binary rewriting, including an on-demand decryption method to minimize the impact brought by decrypting the data in memory area, and code splitting/reconstructing algorithms to handle structured control flow in Wasm. Extensive experiments demonstrate the correctness, effectiveness an
WebAssembly (Wasm), as a compact, fast, and isolation-guaranteed binary format, can be compiled from more than 40 high-level programming languages. However, vulnerabilities in Wasm binaries could lead to sensitive data leakage and even threaten their hosting environments. To identify them, symbolic execution is widely adopted due to its soundness and the ability to automatically generate exploitations. However, existing symbolic executors for Wasm binaries are typically platform-specific, which means that they cannot support all Wasm features. They may also require significant manual interventions to complete the analysis and suffer from efficiency issues as well. In this paper, we propose an efficient and fully-functional symbolic execution engine, named SeeWasm. Compared with existing tools, we demonstrate that SeeWasm supports full-featured Wasm binaries without further manual intervention, while accelerating the analysis by 2 to 6 times. SeeWasm has been adopted by existing works to identify more than 30 0-day vulnerabilities or security issues in well-known C, Go, and SGX applications after compiling them to Wasm binaries.
In recent years, stealthy Android malware has increasingly adopted sophisticated techniques to bypass automatic detection mechanisms and harden manual analysis. Adversaries typically rely on obfuscation, anti-repacking, steganography, poisoning, and evasion techniques to AI-based tools, and in-memory execution to conceal malicious functionality. In this paper, we investigate WebAssembly (Wasm) as a novel technique for hiding malicious payloads and evading traditional static analysis and signature-matching mechanisms. While Wasm is typically employed to render specific gaming activities and interact with the native components in web browsers, we provide an in-depth analysis on the mechanisms Android may employ to include Wasm modules in its execution pipeline. Additionally, we provide Proofs-of-Concept to demonstrate a threat model in which an attacker embeds and executes malicious routines, effectively bypassing IoC detection by industrial state-of-the-art tools, like VirusTotal and MobSF.
Most programs compiled to WebAssembly (Wasm) today are written in unsafe languages like C and C++. Unfortunately, memory-unsafe C code remains unsafe when compiled to Wasm -- and attackers can exploit buffer overflows and use-after-frees in Wasm almost as easily as they can on native platforms. Memory-Safe WebAssembly (MSWasm) proposes to extend Wasm with language-level memory-safety abstractions to precisely address this problem. In this paper, we build on the original MSWasm position paper to realize this vision. We give a precise and formal semantics of MSWasm, and prove that well-typed MSWasm programs are, by construction, robustly memory safe. To this end, we develop a novel, language-independent memory-safety property based on colored memory locations and pointers. This property also lets us reason about the security guarantees of a formal C-to-MSWasm compiler -- and prove that it always produces memory-safe programs (and preserves the semantics of safe programs). We use these formal results to then guide several implementations: Two compilers of MSWasm to native code, and a C-to-MSWasm compiler (that extends Clang). Our MSWasm compilers support different enforcement mechanis
Browser fingerprinting defenses have historically focused on detecting JavaScript(JS)-based tracking techniques. However, the widespread adoption of WebAssembly (WASM) introduces a potential blind spot, as adversaries can convert JS to WASM's low-level binary format to obfuscate malicious logic. This paper presents the first systematic evaluation of how such WASM-based obfuscation impacts the robustness of modern fingerprinting defenses. We develop an automated pipeline that translates real-world JS fingerprinting scripts into functional WASM-obfuscated variants and test them against two classes of defenses: state-of-the-art detectors in research literature and commercial, in-browser tools. Our findings reveal a notable divergence: detectors proposed in the research literature that rely on feature-based analysis of source code show moderate vulnerability, stemming from outdated datasets or a lack of WASM compatibility. In contrast, defenses such as browser extensions and native browser features remained completely effective, as their API-level interception is agnostic to the script's underlying implementation. These results highlight a gap between academic and practical defense str
The extended Berkeley Packet Filter (eBPF) is extensively utilized for observability and performance analysis in cloud-native environments. However, deploying eBPF programs across a heterogeneous cloud environment presents challenges, including compatibility issues across different kernel versions, operating systems, runtimes, and architectures. Traditional deployment methods, such as standalone containers or tightly integrated core applications, are cumbersome and inefficient, particularly when dynamic plugin management is required. To address these challenges, we introduce Wasm-bpf, a lightweight runtime on WebAssembly and the WebAssembly System Interface (WASI). Leveraging Wasm platform independence and WASI standardized system interface, with enhanced relocation for different architectures, Wasm-bpf ensures cross-platform compatibility for eBPF programs. It simplifies deployment by integrating with container toolchains, allowing eBPF programs to be packaged as Wasm modules that can be easily managed within cloud environments. Additionally, Wasm-bpf supports dynamic plugin management in WebAssembly. Our implementation and evaluation demonstrate that Wasm-bpf introduces minimal o
WebAssembly (Wasm) is a low-level bytecode language and virtual machine, intended as a compilation target for a wide range of programming languages, which is seeing increasing adoption across diverse ecosystems. As a young technology, Wasm continues to evolve -- it reached version 2.0 last year and another major update is expected soon. For a new feature to be standardised in Wasm, four key artefacts must be presented: a formal (mathematical) specification of the feature, an accompanying prose pseudocode description, an implementation in the official reference interpreter, and a suite of unit tests. This rigorous process helps to avoid errors in the design and implementation of new Wasm features, and Wasm's distinctive formal specification in particular has facilitated machine-checked proofs of various correctness properties for the language. However, manually crafting all of these artefacts requires expert knowledge combined with repetitive and tedious labor, which is a burden on the language's standardization process and authoring of the specification. This paper presents Wasm SpecTec, a technology to express the formal specification of Wasm through a domain-specific language. Th
While the Open Radio Access Network Alliance (O-RAN) architecture enables third-party applications to optimize radio access networks at multiple timescales, real-time distributed applications (dApps) that demand low latency, high performance, and strong isolation remain underexplored. Existing approaches propose colocating a new RAN Intelligent Controller (RIC) at the edge, or deploying dApps in bare metal along with RAN functions. While the former approach increases network complexity and requires additional edge computing resources, the latter raises serious security concerns due to the lack of native mechanisms to isolate dApps and RAN functions. Meanwhile, WebAssembly (Wasm) has emerged as a lightweight, fast technology for robust execution of external, untrusted code. In this work, we propose a new approach to executing dApps using Wasm to isolate applications in real-time in O-RAN. Results show that our lightweight and robust approach ensures predictable, deterministic performance, strong isolation, and low latency, enabling real-time control loops.
In this era, significant transformations in industries and tool utilization are driven by AI/Large Language Models (LLMs) and advancements in Machine Learning. There's a growing emphasis on Machine Learning Operations(MLOps) for managing and deploying these AI models. Concurrently, the imperative for richer smart contracts and on-chain computation is escalating. Our paper introduces an innovative framework that integrates blockchain technology, particularly the Cosmos SDK, to facilitate on-chain AI inferences. This system, built on WebAssembly (WASM), enables interchain communication and deployment of WASM modules executing AI inferences across multiple blockchain nodes. We critically assess the framework from feasibility, scalability, and model security, with a special focus on its portability and engine-model agnostic deployment. The capability to support AI on-chain may enhance and expand the scope of smart contracts, and as a result enable new use cases and applications.
WebAssembly (WASM) has emerged as a crucial technology in smart contract development for several blockchain platforms. Unfortunately, since their introduction, WASM smart contracts have been subject to several security incidents caused by contract vulnerabilities, resulting in substantial economic losses. However, existing tools for detecting WASM contract vulnerabilities have accuracy limitations, one of the main reasons being the coarse-grained emulation of the on-chain data APIs. In this paper, we introduce WACANA, an analyzer for WASM contracts that accurately detects vulnerabilities through fine-grained emulation of on-chain data APIs. WACANA precisely simulates both the structure of on-chain data tables and their corresponding API functions, and integrates concrete and symbolic execution within a coverage-guided loop to balance accuracy and efficiency. Evaluations on a vulnerability dataset of 133 contracts show WACANA outperforming state-of-the-art tools in accuracy. Further validation on 5,602 real-world contracts confirms WACANA's practical effectiveness.
WebAssembly is the fourth officially endorsed Web language. It is recognized because of its efficiency and design, focused on security. Yet, its swiftly expanding ecosystem lacks robust software diversification systems. We introduce WASM-MUTATE, a diversification engine specifically designed for WebAssembly. Our engine meets several essential criteria: 1) To quickly generate functionally identical, yet behaviorally diverse, WebAssembly variants, 2) To be universally applicable to any WebAssembly program, irrespective of the source programming language, and 3) Generated variants should counter side-channels. By leveraging an e-graph data structure, WASM-MUTATE is implemented to meet both speed and efficacy. We evaluate WASM-MUTATE by conducting experiments on 404 programs, which include real-world applications. Our results highlight that WASM-MUTATE can produce tens of thousands of unique and efficient WebAssembly variants within minutes. Significantly, WASM-MUTATE can safeguard WebAssembly binaries against timing side-channel attacks,especially those of the Spectre type.
The performance of large language models (LLMs) and large multimodal models (LMMs) depends heavily on the quality and scale of their pre-training datasets. Recent research shows that large multimodal models trained on natural documents where images and text are interleaved outperform those trained only on image-text pairs across a wide range of benchmarks, leveraging advanced pre-trained models to enforce semantic alignment, image-sequence consistency, and textual coherence. For Arabic, however, the lack of high-quality multimodal datasets that preserve document structure has limited progress. In this paper, we present our pipeline Wasm for processing the Common Crawl dataset to create a new Arabic multimodal dataset that uniquely provides markdown output. Unlike existing Arabic corpora that focus solely on text extraction, our approach preserves the structural integrity of web content while maintaining flexibility for both text-only and multimodal pre-training scenarios. We provide a comprehensive comparative analysis of our data processing pipeline against those used for major existing datasets, highlighting the convergences in filtering strategies and justifying our specific des
WebAssembly (WASM) is an immensely versatile and increasingly popular compilation target. It executes applications written in several languages (e.g., C/C++) with near-native performance in various domains (e.g., mobile, edge, cloud). Despite WASM's sandboxing feature, which isolates applications from other instances and the host platform, WASM does not inherently provide any memory safety guarantees for applications written in low-level, unsafe languages. To this end, we propose Cage, a hardware-accelerated toolchain for WASM that supports unmodified applications compiled to WASM and utilizes diverse Arm hardware features aiming to enrich the memory safety properties of WASM. Precisely, Cage leverages Arm's Memory Tagging Extension (MTE) to (i) provide spatial and temporal memory safety for heap and stack allocations and (ii) improve the performance of WASM's sandboxing mechanism. Cage further employs Arm's Pointer Authentication (PAC) to prevent leaked pointers from being reused by other WASM instances, thus enhancing WASM's security properties. We implement our system based on 64-bit WASM. We provide a WASM compiler and runtime with support for Arm's MTE and PAC. On top of that,
Performance debugging in WebAssembly (Wasm) runtimes is essential for ensuring the robustness of Wasm, especially since performance issues have frequently occurred in Wasm runtimes, which can significantly degrade the capabilities of hosted services. Many performance issues in Wasm runtimes result from suboptimal compilation of input Wasm programs, for which existing performance debugging methods primarily designed for application-level inefficiencies are not well-suited. In this paper, we present WarpL, a novel mutation-based approach that aims to identify the exact suboptimal instruction sequences responsible for the performance issues in Wasm runtimes, thereby narrowing down the root causes. Specifically, WarpL obtains a functionally similar mutant in which the performance issue does not manifest, and isolates the exact suboptimal instructions by comparing the machine code of the original and mutated programs. We implement WarpL as an open-source tool and evaluate it on 12 real-world performance issues across three widely used Wasm runtimes. WarpL identified the exact causes in 10 out of 12 issues. Notably, we have used WarpL to successfully diagnose six previously unknown perfo
WebAssembly (Wasm) is a low-level portable code format offering near native performance. It is intended as a compilation target for a wide variety of source languages. However, Wasm provides no direct support for non-local control flow features such as async/await, generators/iterators, lightweight threads, first-class continuations, etc. This means that compilers for source languages with such features must ceremoniously transform whole source programs in order to target Wasm. We present WasmFX, an extension to Wasm which provides a universal target for non-local control features via effect handlers, enabling compilers to translate such features directly into Wasm. Our extension is minimal and only adds three main instructions for creating, suspending, and resuming continuations. Moreover, our primitive instructions are type-safe providing typed continuations which are well-aligned with the design principles of Wasm whose stacks are typed. We present a formal specification of WasmFX and show that the extension is sound. We have implemented WasmFX as an extension to the Wasm reference interpreter and also built a prototype WasmFX extension for Wasmtime, a production-grade Wasm engi
WebAssembly (abbreviated as Wasm) was initially introduced for the Web but quickly extended its reach into various domains beyond the Web. To create Wasm applications, developers can compile high-level programming languages into Wasm binaries or manually convert equivalent textual formats into Wasm binaries. Regardless of whether it is utilized within or outside the Web, the execution of Wasm binaries is supported by the Wasm runtime. Such a runtime provides a secure, memory-efficient, and sandboxed execution environment designed explicitly for Wasm applications. This paper provides a comprehensive survey of research on WebAssembly runtimes. It covers 98 articles on WebAssembly runtimes and characterizes existing studies from two different angles, including the "internal" research of Wasm runtimes(Wasm runtime design, testing, and analysis) and the "external" research(applying Wasm runtimes to various domains). This paper also proposes future research directions about WebAssembly runtimes.
The growth in the adoption of the WebAssembly (WASM) standard has given rise to a rapidly increasing landscape of binary applications that are natively ported to the environment of websites. The flexibility of WASM has made it the preferred way to run fast and resource-heavy applications, replacing a field that JavaScript previously monopolized. Despite its success, researchers have raised concerns over the security implementations of WASM, demonstrating that binary vulnerabilities, such as Buffer Overflows and Use After Free, remain a present danger for WASM binaries. Our work aims to demonstrate that such vulnerabilities, when occurring on a WebAssembly module, can affect the behavior of a web application in unexpected ways, enabling an attacker to exploit vulnerabilities that are typical of the web security landscape. We provide several scenarios to provide examples of how each binary vulnerability might lead to a web security vulnerability, such as SQL Injections, XS-Leaks, and SSTI. Our results show that binary vulnerabilities can invalidate common security mechanisms that web developer implement in their applications, demonstrating how the security of WASM modules remains a p
Recently, the WebAssembly (or Wasm) technology has been rapidly evolving, with many runtimes actively under development, providing cross-platform secure sandboxes for Wasm modules to run as portable containers. Compared with Docker, which isolates applications at the operating system level, Wasm runtimes provide more security mechanisms, such as linear memory, type checking, and protected call stacks. Although Wasm is designed with security in mind and considered to be a more secure container runtime, various security challenges have arisen, and researchers have focused on the security of Wasm runtimes, such as discovering vulnerabilities or proposing new security mechanisms to achieve robust isolation. However, we have observed that the resource isolation is not well protected by the current Wasm runtimes, and attackers can exhaust the host's resources to interfere with the execution of other container instances by exploiting the WASI/WASIX interfaces. And the attack surface has not been well explored and measured. In this paper, we explore the resource isolation attack surface of Wasm runtimes systematically by proposing several static Wasm runtime analysis approaches. Based on t
The security of modern JavaScript (JS) engines is critical since they provide the primary defense mechanism for executing untrusted code on the web. The recent integration of WebAssembly (Wasm) has transformed these engines into complex polyglot environments, creating a novel attack surface at the JS-Wasm interaction boundary due to the distinct type systems and memory models of two languages. This boundary remains largely underexplored, as previous works mainly focus on testing JS and Wasm as two isolated entities rather than investigating the security implications of their cross-language interactions. This paper proposes Weaver, an effective greybox fuzzing framework specifically tailored to uncover vulnerabilities at the JS-Wasm boundary. To comply with the language constraints, Weaver uses a type-aware generation strategy, meticulously maintaining the dual-type representation for every generated variables. This allows fuzzer to validly utilize variables across the language boundary. Besides, Weaver leverages the UCB-1 algorithm to intelligently schedule mutators and generators to maximize the discovery of new code paths. We have implemented and evaluated Weaver on three JS engi