SEAOTTER: Sensor Embedded Autoencoding with One-Time Transcode for Efficient Reconstruction
Dan Jacobellis, Neeraja J. Yadwadkar
- Published
- Jun 2, 2026 — 17:28 UTC
Problem
This work addresses the limitations of existing visual data compression methods in robotics, particularly the inefficiencies of conventional codecs like JPEG and newer codecs such as AV1/AVIF, which require excessive computational resources for encoding. The authors highlight the gap in the literature regarding the need for a compression framework that balances high-quality reconstruction with low power and bandwidth usage, while also maintaining compatibility with established JPEG infrastructure. This paper is a preprint and has not undergone peer review.
Method
The proposed framework, SEAOTTER (Sensor Embedded Autoencoder with One-Time Transcode for Efficient Reconstruction), leverages a two-stage approach. First, it employs a learned latent representation through a sensor-embedded autoencoder, which captures the essential features of the visual data. Second, it incorporates a one-time transcoding mechanism that utilizes a learnable JPEG color and quantization transform. This transform is designed to optimize the compression process, enhancing the accuracy of global, dense, and vision-language-based perception tasks. The authors train both general-purpose and task-aware transcoding pipelines using a pre-trained, frozen encoder, allowing for efficient adaptation to various tasks without the need for extensive retraining.
Results
SEAOTTER achieves a remarkable compression ratio of 200:1 while outperforming the AVIF codec in several key metrics. Specifically, it demonstrates a 7-fold increase in encoding speed and a 3.5-fold increase in decoding speed compared to AVIF. Additionally, SEAOTTER improves ImageNet top-1 accuracy by 8% over the baseline, showcasing its effectiveness in maintaining visual fidelity despite aggressive compression. These results indicate a significant advancement in the efficiency of visual data transmission in robotics applications.
Limitations
The authors acknowledge that while SEAOTTER improves upon existing methods, it may still face challenges in scenarios with extremely low bandwidth or power constraints, where even the proposed optimizations may not suffice. Furthermore, the reliance on a pre-trained encoder may limit the adaptability of the framework to novel tasks that require fine-tuning. The paper does not address potential issues related to the generalization of the learned transcoding transform across diverse datasets or the impact of varying sensor characteristics on performance.
Why it matters
The implications of SEAOTTER are substantial for the field of cloud robotics, where efficient data transmission is critical for real-time applications. By combining the advantages of learned representations with the compatibility of JPEG, this framework paves the way for more effective use of visual data in resource-constrained environments. The work encourages further exploration of hybrid approaches that leverage both traditional and modern compression techniques, as published in arXiv cs.LG.
By Turing Wire editorial staff · Jun 2, 2026 · Editorial standards →
Source: arXiv cs.LG