Polynomial-Time Robust Multiclass Linear Classification under Gaussian Marginals
Ilias Diakonikolas, Giannis Iakovidis, Mingchen Ma
- Published
- May 20, 2026 — 17:19 UTC
Problem
This paper addresses the gap in the understanding and algorithmic development for robust multiclass linear classification under Gaussian marginals, particularly for cases where the number of classes ( k \geq 3 ). While the binary classification scenario has a well-established theoretical framework, the multiclass case remains underexplored, especially regarding the computational complexity and sample efficiency of robust algorithms. The authors highlight that existing robust algorithms for ( k=3 ) exhibit exponential dependence on the inverse of the desired accuracy, which poses significant limitations. This work is presented as a preprint and has not yet undergone peer review.
Method
The authors introduce a pairwise improper-learning framework that leverages new structural insights into multiclass linear classifiers. The proposed method operates under the Gaussian distribution assumption for the input features. The key contributions include:
- A demonstration that the standard multiclass perceptron algorithm requires super-polynomially many samples and updates, even with clean labels, indicating a fundamental limitation in the binary-to-multiclass transition.
- An efficient learner that achieves an error bound of ( \widetilde O(k^{3/2}\sqrt{\mathrm{opt}}) + \epsilon ) for general ( k ), where ( \mathrm{opt} ) denotes the optimal error of the best classifier.
- A refined localization-based framework that provides an error of ( O(\mathrm{opt}) + \epsilon ) specifically for ( k=3 ), and ( \mathrm{poly}(k)\mathrm{opt} + \epsilon ) for geometrically regular ( k )-class linear classifiers. The training complexity is polynomial in terms of the number of classes and does not depend on the dimensionality of the input space.
Results
The proposed methods were evaluated against established baselines in multiclass classification. The pairwise improper-learning framework achieved an error of ( \widetilde O(k^{3/2}\sqrt{\mathrm{opt}}) + \epsilon ) for general ( k ), which is a significant improvement over previous methods that had exponential sample complexity. For ( k=3 ), the localization-based approach yielded an error of ( O(\mathrm{opt}) + \epsilon ), demonstrating a clear advantage over existing algorithms. The results indicate that the new methods can operate efficiently even as the number of classes increases, which is a notable advancement in the field.
Limitations
The authors acknowledge that their results are primarily focused on Gaussian marginals, which may limit the applicability of their findings to other distributions. Additionally, while the proposed methods show polynomial-time complexity, the constants involved in the error bounds may still be large, potentially affecting practical performance. The paper does not address the robustness of the methods against adversarial examples or label noise beyond the Gaussian assumption.
Why it matters
This work has significant implications for the development of robust multiclass classifiers, particularly in applications where the number of classes is large and the underlying data distribution is complex. By providing efficient algorithms with dimension-independent error guarantees, the findings could facilitate advancements in various domains such as computer vision, natural language processing, and any field requiring multiclass classification. The results also pave the way for further research into robust learning frameworks that can handle more general distributions and settings.
Authors: Ilias Diakonikolas, Giannis Iakovidis, Mingchen Ma
Source: arXiv:2605.21428
URL: https://arxiv.org/abs/2605.21428v1
By Turing Wire editorial staff · May 20, 2026 · Editorial standards →
Source: arXiv cs.LG