Ecologically-Constrained Task Arithmetic for Multi-Taxa Bioacoustic Classifiers Without Shared Data
Ragib Amin Nihal, Benjamin Yen, Runwu Shi, Takeshi Ashizawa, Kazuhiro Nakadai
- Published
- May 5, 2026 — 16:10 UTC
- Summary length
- 436 words
- Relevance score
- 70%
Problem
This preprint addresses the challenge of training bioacoustic classifiers across multiple taxa without the need for centralized data sharing, which is often impractical due to data fragmentation across institutions and regions. The authors highlight the lack of methodologies that allow for effective multi-taxa classification while maintaining data privacy, thus filling a significant gap in the literature regarding collaborative learning in bioacoustics.
Method
The core technical contribution is the introduction of a method for composing independently fine-tuned BEATs (Bioacoustic Encoder-Attention Transformers) encoders into a unified classifier for 661 species using task vector arithmetic. The authors demonstrate that bioacoustic task vectors exhibit near-orthogonality, with cosine similarities ranging from 0.01 to 0.09. This orthogonality aligns with the spectral distribution distance, supporting the acoustic niche hypothesis. The authors propose a simple averaging method for combining task vectors, which they find to be optimal, while also evaluating sign-conflict methods that result in a 1-6 percentage point drop in accuracy. They verify linear mode connectivity across taxonomic pairs and demonstrate zero-shot transfer capabilities to new regions. The study identifies domain negation as a boundary condition where the proposed composition method fails.
Results
The proposed method achieves significant performance metrics, with the unified classifier outperforming individual models in certain contexts. Specifically, species-rich groups experience a relative accuracy drop compared to joint training, while underrepresented taxa benefit from the composition, leading to a more equitable distribution of classification performance. The authors provide quantitative results that indicate the effectiveness of their approach, although specific numerical performance metrics against named baselines are not detailed in the abstract.
Limitations
The authors acknowledge that their method may not perform well in scenarios characterized by domain negation, which could limit its applicability in certain ecological contexts. Additionally, the reliance on task vector orthogonality may not hold in all bioacoustic environments, potentially affecting generalizability. The paper does not address the computational overhead associated with fine-tuning multiple encoders or the scalability of the approach when applied to a larger number of taxa.
Why it matters
This work has significant implications for the field of bioacoustics and collaborative machine learning. By enabling institutions to share only task vectors rather than raw data, the proposed method preserves data privacy while facilitating the development of comprehensive multi-taxa classifiers. This approach could lead to improved biodiversity monitoring and conservation efforts, as it allows for equitable representation of underrepresented taxa. Furthermore, the findings on task vector orthogonality and the implications for zero-shot learning could inspire future research on collaborative learning frameworks in other domains where data sharing is constrained.
Authors: Ragib Amin Nihal, Benjamin Yen, Runwu Shi, Takeshi Ashizawa, Kazuhiro Nakadai
Source: arXiv:2605.03914
URL: https://arxiv.org/abs/2605.03914v1