Notable training methods

Principled Algorithms for Optimizing Generalized Metrics in Multi-Label Learning

Mehryar Mohri, Yutao Zhong

Published: May 27, 2026 — 17:23 UTC

Problem
This paper addresses the gap in optimizing complex evaluation metrics, such as the $F$-measure and Jaccard index, in multi-label classification tasks. Existing theoretical frameworks, particularly the Empirical Utility Maximization (EUM), have primarily focused on asymptotic Bayes-consistency, which limits their applicability in practical scenarios. The authors propose a more robust approach by introducing the concept of $H$-consistency, which provides stronger guarantees for finite sample sizes. This work is presented as a preprint and has not yet undergone peer review.

Method
The authors develop novel surrogate loss functions specifically designed for multi-label learning that ensure provable $H$-consistency bounds. These surrogates allow for optimization with non-asymptotic guarantees tailored to the hypothesis class and finite samples. A key technical contribution is the combinatorial formulation of these surrogates, which enables exact decomposition and operates in strictly $O(l)$ time, where $l$ is the number of labels, without requiring approximations. Building on this foundation, the authors introduce the Multi-Label Metric Optimization (MMO) algorithm, which is capable of optimizing generalized linear-fractional metrics. The training process leverages large-scale datasets, including MS-COCO and Reuters-21578, to validate the proposed methods.

Results
The proposed MMO algorithms demonstrate superior performance compared to state-of-the-art continuous baselines across multiple benchmarks. On the MS-COCO dataset, the authors report a significant improvement in the $F$-measure, achieving a 5% increase over the best existing methods. For the Reuters-21578 dataset, the Jaccard index shows a 7% enhancement compared to traditional approaches. The experiments also highlight the scalability of the algorithms, effectively handling high-sparsity scenarios typical in deep learning applications.

Limitations
The authors acknowledge that while their approach provides theoretical guarantees, the practical implementation may still face challenges in extremely high-dimensional spaces or with highly imbalanced label distributions. Additionally, the focus on generalized linear-fractional metrics may limit applicability to other types of metrics not covered in this framework. The paper does not address the computational overhead associated with the initial setup of the surrogate loss functions, which may impact real-time applications.

Why it matters
This work has significant implications for multi-label learning, particularly in domains where complex evaluation metrics are critical, such as image tagging and text categorization. By providing a principled approach to optimizing generalized metrics, the authors pave the way for more effective learning algorithms that can be applied in real-world scenarios. The theoretical advancements in $H$-consistency and the practical performance improvements over existing methods suggest that future research can build upon these foundations to further enhance multi-label classification systems.

Authors: Mehryar Mohri, Yutao Zhong
Source: arXiv:2605.28767
URL: https://arxiv.org/abs/2605.28767v1

By Callan Zhang · May 27, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.LG