COIN: A Revolutionary Image Compression Technique from Oxford
Chapter 1: Introduction to COIN
Researchers from the University of Oxford have unveiled an innovative image compression technique that significantly surpasses the traditional JPEG standard at low bitrates, without relying on entropy coding or on learning a distribution over network weights.
Recent advances in autoencoder-based lossy image compression have garnered significant interest in both the machine learning and image processing communities. These autoencoders operate on a straightforward concept: they encode an image, typically represented as a vector of pixel intensities, into a shorter quantized code, reducing the information needed to store or transmit the image.
Instead of saving the RGB values for each individual pixel, the new method stores the weights of a neural network trained specifically on that image. The researchers have named this approach 'COIN' (COmpression with Implicit Neural representations).
COIN functions by overfitting a small multilayer perceptron (MLP)—a variant of feedforward neural networks—to encode the image. This method maps pixel coordinates to RGB values (commonly referred to as an implicit neural representation) and subsequently transmits the MLP's weights. During the decoding phase, the transmitted MLP is evaluated at every pixel location, allowing for the reconstruction of the image.
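The decoding step described above can be sketched in a few lines of plain Python. The network below is a stand-in with random weights and a hypothetical 2-4-3 architecture (the real COIN MLP is larger and its weights are the compressed file itself); the point is only to show the coordinate-to-RGB mapping being evaluated at every pixel location.

```python
import math
import random

random.seed(0)

# Hypothetical tiny MLP: 2 inputs (x, y) -> hidden width 4 -> 3 outputs (R, G, B).
# In COIN, the trained weights *are* the compressed representation;
# here they are random stand-ins for illustration.
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]
b1 = [0.0] * 4
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
b2 = [0.0] * 3

def mlp_rgb(x, y):
    """Map a normalized pixel coordinate (x, y) to an RGB triple in [0, 1]."""
    h = [math.tanh(sum(w * v for w, v in zip(row, (x, y))) + b)
         for row, b in zip(W1, b1)]
    out = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(W2, b2)]
    # Squash outputs to [0, 1] so they are valid pixel intensities.
    return [1.0 / (1.0 + math.exp(-o)) for o in out]

# Decoding: evaluate the network at every pixel location of a toy 6x4 image.
H, W = 4, 6
image = [[mlp_rgb(x / (W - 1), y / (H - 1)) for x in range(W)] for y in range(H)]
```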
Section 1.1: Challenges in Image Compression
The primary challenge in this workflow is getting MLPs to overfit natural images, which is difficult because of their high-frequency detail. Recent methods address this with sinusoidal encodings and activations. The new research shows that MLPs with sine activations can represent large images (393k pixels) with remarkably compact networks (8k parameters).
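A quick parameter count makes the "393k pixels from 8k parameters" claim concrete. The width-28, 10-hidden-layer configuration below is a hypothetical example, not a configuration confirmed by the paper; it simply shows that a coordinate-to-RGB MLP of roughly that shape stays under 8k parameters while describing a full 768×512 Kodak image.

```python
def mlp_param_count(in_dim, hidden_width, num_hidden_layers, out_dim):
    """Total number of weights and biases in a fully connected MLP."""
    count = in_dim * hidden_width + hidden_width                    # input layer
    count += (num_hidden_layers - 1) * (hidden_width**2 + hidden_width)
    count += hidden_width * out_dim + out_dim                       # output layer
    return count

# Hypothetical configuration: (x, y) -> RGB, 10 hidden layers of width 28.
params = mlp_param_count(2, 28, 10, 3)
pixels = 768 * 512  # one Kodak image, ~393k pixels
```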
To minimize model size, the researchers combined architecture search with weight quantization: they swept the width and number of layers of the MLPs and quantized the weights from 32-bit to 16-bit precision, which proved sufficient to exceed the JPEG standard at low bitrates.
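The 32-to-16-bit quantization can be illustrated with Python's standard `struct` module, which supports IEEE half-precision via the `e` format character. The 7,479-parameter count used in the bits-per-pixel arithmetic is a hypothetical architecture, not a figure from the paper; it merely shows how a network of that size lands near 0.3 bpp on a 768×512 image.

```python
import struct

def to_float16_bytes(weights):
    """Pack 32-bit Python floats as IEEE 754 half-precision (2 bytes each)."""
    return struct.pack(f"{len(weights)}e", *weights)

weights = [0.12345, -0.9876, 3.14159]
half = to_float16_bytes(weights)
# Each weight now occupies 2 bytes instead of 4.

# Bits per pixel for a hypothetical 7,479-parameter MLP at 16 bits per weight,
# stored against one 768x512 Kodak image.
bpp = 7479 * 16 / (768 * 512)  # roughly 0.30 bpp
```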
Subsection 1.1.1: Decoding Flexibility
One of COIN's notable features is its adaptable decoding method. Images can be progressively decoded by evaluating functions at different pixel locations, a task that proves challenging for earlier autoencoder-based techniques.
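Because the decoder is just a function of pixel coordinates, progressive decoding amounts to choosing which coordinates to evaluate first. The sketch below (with a trivial stand-in function in place of the trained MLP) decodes a coarse preview grid before the full-resolution pass, something a fixed-output autoencoder decoder cannot do as naturally.

```python
def decode_at(f, coords):
    """Evaluate the implicit function at an arbitrary set of coordinates."""
    return {c: f(*c) for c in coords}

def grid(h, w, step):
    """Normalized (x, y) coordinates on an h x w image, subsampled by `step`."""
    return [(x / (w - 1), y / (h - 1))
            for y in range(0, h, step) for x in range(0, w, step)]

f = lambda x, y: (x, y, 0.5)  # stand-in for the trained coordinate-to-RGB MLP

coarse = decode_at(f, grid(8, 8, 4))  # quick low-resolution preview
fine = decode_at(f, grid(8, 8, 1))    # full-resolution pass
```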
Section 1.2: Experimental Results
To evaluate COIN's effectiveness, the researchers conducted tests using the Kodak image dataset, which consists of 24 images sized at 768×512 pixels. The model was benchmarked against three existing autoencoder-based neural compression models (BMS, MBT, and CST) as well as against the JPEG, JPEG2000, BPG, and VTM image codecs.
The researchers first identified optimal combinations of depth and width for the MLPs representing an image under specific parameter budgets, measured in bits per pixel (bpp), for instance 0.3 bpp with 16-bit weights, to establish the best architecture at each budget. The findings indicate that, at low bitrates, the proposed method outperforms the JPEG standard even without entropy coding.
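The budgeted architecture search can be sketched as a simple enumeration: compute the bpp cost of each (width, depth) pair and keep those under the target. The parameter-count formula and search ranges below are illustrative assumptions, not the paper's actual search space.

```python
def bpp(width, depth, bits=16, pixels=768 * 512):
    """Bits per pixel of an (x, y) -> RGB MLP with `depth` hidden layers."""
    params = (2 * width + width                      # input layer
              + (depth - 1) * (width**2 + width)     # hidden layers
              + 3 * width + 3)                       # output layer
    return params * bits / pixels

# Enumerate hypothetical architectures that fit a 0.3 bpp budget.
candidates = [(w, d) for w in range(10, 50, 2) for d in range(2, 12)
              if bpp(w, d) <= 0.3]
```

Among the surviving candidates, the pair that reconstructs the training image with the highest fidelity would be chosen as the architecture for that bitrate.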
Chapter 2: Performance Insights
The experiments reveal that COIN consistently outperforms the JPEG standard after 15,000 training iterations and continues to improve thereafter. Furthermore, compression quality depends on the chosen architecture, with different configurations optimal at different bpp values. The Oxford research team expresses optimism that continued exploration in this domain will lead to a new category of methods for neural data compression.
For further reading, the paper "COIN: Compression with Implicit Neural Representations" is available on arXiv.