Modern video codecs have been extensively optimized to preserve perceptual quality, leveraging models of the human visual system. However, in split inference systems, where intermediate features from a neural network are transmitted instead of pixel data, these assumptions no longer apply. Intermediate features are abstract, sparse, and task-specific, making perceptual fidelity irrelevant. In this paper, we investigate the use of Versatile Video Coding (VVC) for compressing such features under the MPEG-AI Feature Coding for Machines (FCM) standard. We perform a tool-level analysis to understand the impact of individual coding components on compression efficiency and downstream vision task accuracy. Based on these insights, we propose three lightweight essential VVC profiles: Fast, Faster, and Fastest. The Fast profile provides a 2.96% BD-Rate gain while reducing encoding time by 21.8%. Faster achieves a 1.85% BD-Rate gain with a 51.5% speedup. Fastest reduces encoding time by 95.6% with only a 1.71% loss in BD-Rate.
Mainstream image and video coding standards -- including state-of-the-art codecs like H.266/VVC, AVS3, and AV1 -- adopt a block-based hybrid coding framework. While this framework facilitates straightforward optimization for Peak Signal-to-Noise Ratio (PSNR), it struggles to effectively optimize perceptually-aligned metrics such as Multi-Scale Structural Similarity (MS-SSIM). To address this challenge, this paper proposes a low-complexity method to enhance perceptual quality in VVC intra coding by transferring bit allocation knowledge from end-to-end image compression. We introduce a lightweight model trained with perceptual losses to generate a quantization step map. This map implicitly captures block-level perceptual importance, enabling efficient derivation of a QP map for VVC. Experiments on Kodak and CLIC datasets demonstrate significant advantages, both in execution time and perceptual metric performance, with more than 11% BD-rate reduction in terms of MS-SSIM. Our scheme provides an efficient, practical pathway for perceptual enhancement of traditional codecs.
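As a rough illustration of how a quantization step map could be turned into a QP map, the sketch below uses the approximate relation between quantization step and QP in HEVC/VVC-style codecs, Qstep ≈ 2^((QP - 4) / 6). The function name `qstep_to_qp`, the clipping range of 0 to 63, and the sample step values are assumptions made for illustration; the abstract does not state how the paper derives its QP map.

```python
from math import log2

QP_MIN, QP_MAX = 0, 63  # assumed QP range for VVC-style codecs


def qstep_to_qp(qstep: float) -> int:
    """Map a quantization step to the nearest integer QP (hypothetical helper).

    Inverts the approximate relation Qstep = 2 ** ((QP - 4) / 6)
    and clips the result to the assumed valid QP range.
    """
    if qstep <= 0:
        raise ValueError("quantization step must be positive")
    qp = round(4 + 6 * log2(qstep))
    return max(QP_MIN, min(QP_MAX, qp))


# A hypothetical 2x2 block-level step map and the QP map derived from it.
step_map = [[1.0, 2.0], [4.0, 8.0]]
qp_map = [[qstep_to_qp(s) for s in row] for row in step_map]
```

Under this relation, doubling the quantization step raises the QP by 6, so blocks the model considers perceptually less important (larger steps) receive coarser quantization.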
For the last few decades, the application of signal-adaptive transform coding to video compression has been stymied by the large computational complexity of matrix-based solutions. In this paper, we propose a novel parametric approach to greatly reduce the complexity without degrading the compression performance. In our approach, instead of following the conventional technique of identifying full transform matrices that yield the best compression efficiency, we look for the best transform parameters defining a new class of transforms, called HyGTs, which have low-complexity implementations that are easy to parallelize. The proposed HyGTs are implemented as an extension of High Efficiency Video Coding (HEVC), and our comprehensive experimental results demonstrate that they achieve an average bit rate reduction of 6% while using 6.8 times less memory than KLT matrices.
A code equivalence between index coding and network coding was established, which shows that any index-coding instance can be mapped to a network-coding instance, for which any index code can be translated to a network code with the same decoding-error performance, and vice versa. Also, any network-coding instance can be mapped to an index-coding instance with a similar code translation. In this paper, we extend the equivalence to secure index coding and secure network coding, where eavesdroppers are present in the networks, and any code construction needs to guarantee security constraints in addition to decoding-error performance.
The total energy consumption of today's video coding systems is globally significant and emphasizes the need for sustainable video coder applications. To develop such sustainable video coders, the knowledge of the energy consumption of state-of-the-art video coders is necessary. For that purpose, we need a dedicated setup that measures the energy of the encoding and decoding system. However, such measurements are costly and laborious. To this end, this paper presents an energy estimator that uses a subset of bit stream features to accurately estimate the energy consumption of the HEVC software encoding process. The proposed model reaches a mean estimation error of 4.88% when averaged over presets of the x265 encoder implementation. The results from this work help to identify properties of encoding energy-saving bit streams and, in turn, are useful for developing new energy-efficient video coding algorithms.
Consider a communication scenario over a noiseless channel where a sender is required to broadcast messages to multiple receivers, each having side information about some messages. In this scenario, the sender can leverage the receivers' side information during the encoding of messages in order to reduce the required transmissions. This type of encoding is called index coding. In this paper, we study index coding with two cooperative senders, each with some subset of messages, and multiple receivers, each requesting one unique message. The index coding in this setup is called two-sender unicast index coding (TSUIC). The main aim of TSUIC is to minimize the total number of transmissions required by the two senders. Based on graph-theoretic approaches, we prove that TSUIC is equivalent to single-sender unicast index coding (SSUIC) for some special cases. Moreover, we extend the existing schemes for SSUIC, viz., the cycle-cover scheme, the clique-cover scheme, and the local-chromatic scheme to the corresponding schemes for TSUIC.
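The side-information idea behind index coding can be seen in a toy single-sender instance, sketched below. It is not one of the TSUIC schemes from the paper: the two-receiver setup, the message values, and the function names are assumptions chosen for illustration. Receiver 1 wants `x1` and already knows `x2`; receiver 2 wants `x2` and already knows `x1`. One XOR broadcast then serves both receivers instead of two separate transmissions.

```python
def encode(x1: int, x2: int) -> int:
    """Sender: combine both messages into a single broadcast symbol."""
    return x1 ^ x2


def decode(broadcast: int, side_info: int) -> int:
    """Receiver: cancel the known message to recover the requested one."""
    return broadcast ^ side_info


x1, x2 = 0b1011, 0b0110          # hypothetical 4-bit messages
coded = encode(x1, x2)           # one transmission instead of two
recovered_by_r1 = decode(coded, side_info=x2)  # receiver 1 knows x2
recovered_by_r2 = decode(coded, side_info=x1)  # receiver 2 knows x1
```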
Polarization-adjusted convolutional (PAC) codes, a concatenated coding scheme based on polar codes, are able to approach the finite-length bound of the binary-input AWGN channel at short blocklengths. In this paper, we extend PAC codes to the fields of source coding and joint source-channel coding and show that they can also approach the corresponding finite-length bounds at short blocklengths.
Lossless and near-lossless image compression is of paramount importance to professional users in many technical fields, such as medicine, remote sensing, precision engineering and scientific research. But despite rapidly growing research interest in learning-based image compression, no published method offers both lossless and near-lossless modes. In this paper, we propose a unified and powerful deep lossy plus residual (DLPR) coding framework for both lossless and near-lossless image compression. In the lossless mode, the DLPR coding system first performs lossy compression and then lossless coding of residuals. We solve the joint lossy and residual compression problem within the framework of VAEs, and add autoregressive context modeling of the residuals to enhance lossless compression performance. In the near-lossless mode, we quantize the original residuals to satisfy a given $\ell_\infty$ error bound, and propose a scalable near-lossless compression scheme that works for variable $\ell_\infty$ bounds instead of training multiple networks. To expedite the DLPR coding, we increase the degree of algorithm parallelization by a novel design of coding context, and accelerate the entropy c
An efficient two-layer coding method that uses the histogram packing technique and is backward compatible with legacy JPEG is proposed in this paper. JPEG XT, the international standard for compressing HDR images, adopts a two-layer coding scheme for backward compatibility with legacy JPEG. However, this two-layer coding structure does not give better lossless performance than existing single-layer coding methods for HDR images. Moreover, JPEG XT has a problem in determining the lossless coding parameters: finding an appropriate combination of parameter values is necessary to achieve good lossless performance. The histogram sparseness of HDR images is discussed, and it is pointed out that a histogram packing technique that accounts for this sparseness can improve the lossless compression performance for HDR images. A novel two-layer coding method with the histogram packing technique is then proposed. The experimental results demonstrate that the proposed method not only has better lossless compression performance than JPEG XT, but also removes the need to determine image-dependent parameter values for good compression performance in spite of
This paper characterizes the second-order coding rates for lossy source coding with side information available at both the encoder and the decoder. We first provide non-asymptotic bounds for this problem and then specialize the non-asymptotic bounds for three different scenarios: discrete memoryless sources, Gaussian sources, and Markov sources. We obtain the second-order coding rates for these settings. It is interesting to observe that the second-order coding rate for Gaussian source coding with Gaussian side information available at both the encoder and the decoder is the same as that for Gaussian source coding without side information. Furthermore, regardless of the variance of the side information, the dispersion is $1/2$ nats squared per source symbol.
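To get a feel for what a dispersion of $1/2$ nats squared means at finite blocklength, the sketch below evaluates the usual Gaussian-approximation penalty $\sqrt{V/n}\,Q^{-1}(\epsilon)$, the rate overhead above the first-order limit at blocklength $n$ and excess-distortion probability $\epsilon$. This form of the second-order term, the function name, and the sample values of $n$ and $\epsilon$ are assumptions for illustration; the abstract itself only states the dispersion.

```python
from math import sqrt
from statistics import NormalDist


def second_order_penalty(n: int, eps: float, dispersion: float = 0.5) -> float:
    """Approximate rate overhead in nats per source symbol (hypothetical helper).

    Computes sqrt(V / n) * Q^{-1}(eps), where Q^{-1} is the inverse of the
    standard Gaussian tail function, i.e. the standard normal quantile at 1 - eps.
    """
    q_inv = NormalDist().inv_cdf(1.0 - eps)
    return sqrt(dispersion / n) * q_inv


# Hypothetical operating point: blocklength 1000, excess-distortion probability 1%.
penalty = second_order_penalty(n=1000, eps=0.01)  # roughly 0.052 nats per symbol
```

Because the dispersion does not depend on the variance of the side information, this overhead is the same for any side-information quality under the stated approximation; only the first-order term changes.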
The quantum advantage of dense coding is studied, considering general encoding quantum operations. Particular attention is devoted to the case of many senders, and it is shown that restrictions on the possible operations on the senders' side may make some quantum states useless for dense coding. It is shown, e.g., that some states are useful for dense coding if the senders can communicate classically (but not quantumly), yet they cannot be used for dense coding if classical communication is not allowed. These no-go results are actually independent of the particular quantification of the quantum advantage, being valid for any reasonable choice. It is further shown that the quantum advantage of dense coding satisfies a monogamy relation with the so-called entanglement of purification.
The chapter begins with a discussion of the constraints and needs of video coding systems. The lack of flexibility of traditional monolithic codec specifications, which are not suited to modeling commonalities among codecs or fostering reusability across successive codec generations and updates, was the main trigger for the development of a new standard initiative within the ISO/IEC MPEG committee, called reconfigurable video coding (RVC). The MPEG-RVC framework exploits the dataflow nature behind video coding to foster flexible and reconfigurable codec design, as well as to support dynamic reconfiguration. The chapter goes on to consider that the inherent resiliency of various functional blocks (like motion estimation in high-efficiency video coding, HEVC) and the varying levels of user perception make video coding suitable for approximate computing techniques. Approximate computing, if properly supported at design time, allows achieving run-time trade-offs, representing a new direction in hardware-software codesign research. The main assumption behind approximate computing, exploited within video coding, is that the degree of accuracy (in this case during codec execution) is not required
We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines and also propose an entropy model suitable for conditional coding with increased modelling capacity and similar tractability as previous work. We apply these methods to image reconstruction, using, in one instance, representations created for semantic segmentation on the Cityscapes dataset, and in another instance, representations created for object detection on the COCO dataset. In both experiments, we obtain similar performance between the conditional and residual methods, with the resulting rate-distortion curves contained within our baselines.
We study index-coding problems (one sender broadcasting messages to multiple receivers) where each message is requested by one receiver, and each receiver may know some messages a priori. This type of index-coding problem can be fully described by a directed graph. The aim is to find the minimum codelength that the sender needs to transmit in order to simultaneously satisfy all receivers' requests. For any directed graph, we show that if a maximum acyclic induced subgraph (MAIS) is obtained by removing two or fewer vertices from the graph, then the minimum codelength (i.e., the solution to the index-coding problem) equals the number of vertices in the MAIS, and linear codes are optimal for this index-coding problem. Our result increases the set of index-coding problems for which linear index codes are proven to be optimal.
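The MAIS size that appears in this result can be computed by brute force for small graphs, as in the sketch below. It is only an illustration of the quantity, not the paper's proof technique: the function names, the edge-list representation, and the example graphs are assumptions, and the exhaustive search is exponential in the number of vertices.

```python
from itertools import combinations


def is_acyclic(vertices, edges):
    """Check acyclicity of the induced subgraph by repeatedly peeling off sources."""
    remaining = set(vertices)
    induced = {(u, v) for (u, v) in edges if u in remaining and v in remaining}
    while remaining:
        # Sources are vertices with no incoming edge in the current subgraph.
        sources = {v for v in remaining if not any(head == v for (_, head) in induced)}
        if not sources:
            return False  # every remaining vertex has an incoming edge: a cycle exists
        remaining -= sources
        induced = {(u, v) for (u, v) in induced if u in remaining and v in remaining}
    return True


def mais_size(n, edges):
    """Size of a maximum acyclic induced subgraph on vertices 0..n-1 (exhaustive)."""
    for k in range(n, 0, -1):
        for subset in combinations(range(n), k):
            if is_acyclic(subset, edges):
                return k
    return 0


# Hypothetical example: a directed 3-cycle becomes acyclic after removing one vertex.
cycle_edges = {(0, 1), (1, 2), (2, 0)}
cycle_mais = mais_size(3, cycle_edges)
```

For the 3-cycle, one vertex is removed to reach the MAIS, so the condition of removing two or fewer vertices holds and, by the stated result, the minimum codelength equals the MAIS size.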
Algebraic space-time coding allows for reliable data exchange across fading multiple-input multiple-output channels. A powerful technique for decoding space-time codes is Maximum-Likelihood (ML) decoding, but well-performing and widely used codes such as the Golden code often suffer from high ML-decoding complexity. In this article, a recursive algorithm for decoding general algebraic space-time codes of arbitrary dimension is proposed, which reduces the worst-case decoding complexity from $O(|S|^{n^2})$ to $O(|S|^n)$.
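To give a sense of scale for this reduction, consider a hypothetical illustration with dimension $n = 2$, as for the Golden code. The constellation sizes are assumptions chosen for the example, not figures from the article, and constant factors hidden by the $O(\cdot)$ notation are ignored. With a 4-point constellation, $|S| = 4$:

$$|S|^{n^2} = 4^{4} = 256 \qquad \text{versus} \qquad |S|^{n} = 4^{2} = 16.$$

With a 16-point constellation, $|S| = 16$:

$$|S|^{n^2} = 16^{4} = 65536 \qquad \text{versus} \qquad |S|^{n} = 16^{2} = 256.$$

The gap widens quickly as either the constellation size or the code dimension grows, since the exponent drops from $n^2$ to $n$.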
This paper studies second-order coding rates for memoryless channels with a state sequence known non-causally at the encoder. In the case of finite alphabets, an achievability result is obtained using constant-composition random coding, and by using a small fraction of the block to transmit the type of the state sequence. For error probabilities less than 1/2, it is shown that the second-order rate improves on an existing one based on i.i.d. random coding. In the Gaussian case (dirty paper coding) with an almost-sure power constraint, an achievability result is obtained using random coding over the surface of a sphere, and using a small fraction of the block to transmit a quantized description of the state power. It is shown that the second-order asymptotics are identical to those of the single-user Gaussian channel of the same input power without a state.
Link failures in wide area networks are common and cause significant data losses. Mesh-based protection schemes offer high capacity efficiency, but they are slow and require complex signaling. Additionally, real-time reconfiguration of a cross-connect threatens their transmission integrity. On the other hand, coding-based protection schemes are proactive. Therefore, they have higher restoration speed, lower signaling complexity, and higher transmission integrity. This paper introduces a coding-based protection scheme, named Coded Path Protection (CPP). In CPP, a backup copy of the primary data is encoded with other data streams, resulting in capacity savings. This paper presents an optimal and simple capacity placement and coding group formation algorithm. The algorithm converts the sharing structure of any solution of a Shared Path Protection (SPP) technique into a coding structure with minimum extra capacity. We conducted quantitative and qualitative comparisons of our technique with SPP and with another technique known as p-cycle protection. Simulation results confirm that CPP is significantly faster than the SPP and p-cycle techniques. CPP incurs marginal extra capacity on
Polar codes are capacity-achieving error-correcting codes that can be decoded through the successive-cancellation algorithm. To improve the error-correction performance of this algorithm, a list-based version called successive-cancellation list (SCL) decoding has been proposed, which however substantially increases the number of time-steps in the decoding process. The simplified SCL (SSCL) decoding algorithm exploits constituent codes within the polar code structure to greatly reduce the required number of time-steps without introducing any error-correction performance loss. In this paper, we propose a faster approach to decoding one of these constituent codes, the Rate-1 node. We use this Rate-1 node decoder to develop Fast-SSCL. We demonstrate that only a list-size-bound number of bits needs to be estimated in Rate-1 nodes and that Fast-SSCL exactly matches the error-correction performance of SCL and SSCL. This technique can potentially greatly reduce the total number of time-steps needed for polar code decoding: analysis of a set of case studies shows that Fast-SSCL requires up to 66.6% fewer time-steps than SSCL and 88.1% fewer than SCL.
Most multipath multimedia streaming proposals use a Forward Error Correction (FEC) approach to protect against packet losses. However, FEC does not cope well with bursts of losses, even when packets from a given FEC block are spread over multiple paths. In this article, we propose an online multipath convolutional coding scheme for real-time multipath streaming based on an on-the-fly coding scheme called Tetrys. We evaluate the benefits brought by this coding scheme inside an existing FEC multipath load splitting proposal known as Encoded Multipath Streaming (EMS). We demonstrate that Tetrys consistently outperforms FEC under both uniform and burst losses with the EMS scheme. We also propose a modification of the standard EMS algorithm that greatly improves the performance in terms of packet recovery. Finally, we analyze different spreading policies of the Tetrys redundancy traffic between available paths and observe that the path with the longer propagation delay should preferably be used to carry repair packets.
Skele-Code is a natural-language and graph-based interface for building workflows with AI agents, designed especially for less technical or non-technical users. It supports incremental, interactive notebook-style development, and each step is converted to code with a required set of functions and behavior to enable incremental building of workflows. Agents are invoked only for code generation and error recovery, not orchestration or task execution. This agent-supported but code-first approach to workflows, along with the context engineering used in Skele-Code, can help reduce token costs compared to the multi-agent system approach to executing workflows. Skele-Code produces modular, easily extensible, and shareable workflows. The generated workflows can also be used as skills by agents, or as steps in other workflows.