Zero-Shot to Full-Resource: Cross-lingual Transfer Strategies for Aspect-Based Sentiment Analysis
Problem
This preprint addresses the capability gap in Aspect-Based Sentiment Analysis (ABSA) for non-English languages, which persists despite significant advances in transformer-based models. The literature predominantly focuses on English, leaving a need for effective multilingual ABSA strategies. The authors evaluate state-of-the-art ABSA approaches across seven languages (English, German, French, Dutch, Russian, Spanish, and Czech) and four subtasks: Aspect Category Detection (ACD), Aspect Category Sentiment Analysis (ACSA), Target Aspect Sentiment Detection (TASD), and Aspect Sentiment Quad Prediction (ASQP).
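To make the subtask hierarchy concrete, the following minimal Python sketch shows the output structure each subtask expects for one hypothetical restaurant review. The sentence, labels, and SemEval-style category names (e.g. FOOD#QUALITY) are illustrative assumptions, not drawn from the paper's datasets.

```python
# One hypothetical review sentence; labels below are illustrative only.
sentence = "The pizza was great, but the waiter was rude."

# ACD: detect the set of aspect categories mentioned in the sentence.
acd_output = {"FOOD#QUALITY", "SERVICE#GENERAL"}

# ACSA: assign a sentiment polarity to each detected category.
acsa_output = {("FOOD#QUALITY", "positive"), ("SERVICE#GENERAL", "negative")}

# TASD: predict (target term, category, polarity) triples.
tasd_output = {
    ("pizza", "FOOD#QUALITY", "positive"),
    ("waiter", "SERVICE#GENERAL", "negative"),
}

# ASQP: predict full (target, category, opinion term, polarity) quads.
asqp_output = {
    ("pizza", "FOOD#QUALITY", "great", "positive"),
    ("waiter", "SERVICE#GENERAL", "rude", "negative"),
}
```

Each subtask strictly extends the previous one, which is why ASQP, requiring all four elements, is the most demanding and the most naturally generative of the four.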
Method
The authors systematically compare transformer architectures under three settings: zero-resource, data-only, and full-resource. They employ cross-lingual transfer, code-switching, and machine-translation techniques, assessing fine-tuned Large Language Models (LLMs) and few-shot models across the four ABSA subtasks. As a further contribution, they introduce two new German datasets: an adapted GERestaurant and the first German ASQP dataset (GERest). Training compute is not disclosed; the evaluation instead emphasizes architecture-specific strategies, showing that cross-lingual training on multiple non-target languages yields the best transfer performance for fine-tuned LLMs, while smaller encoder and sequence-to-sequence models benefit most from code-switching.
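As a rough illustration of code-switching augmentation, here is a minimal sketch, assuming a word-level bilingual lexicon and a uniform substitution probability; both are illustrative choices, since the paper's exact substitution scheme is not specified in this summary.

```python
import random

def code_switch(tokens, bilingual_lexicon, p=0.3, seed=None):
    """Code-switching augmentation: randomly replace source-language
    tokens with translations from a bilingual lexicon.

    Generic sketch of the technique; `bilingual_lexicon` and `p`
    are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    return [
        bilingual_lexicon.get(tok.lower(), tok) if rng.random() < p else tok
        for tok in tokens
    ]

# Hypothetical English->German lexicon entries for the restaurant domain.
lexicon = {"pizza": "Pizza", "waiter": "Kellner", "rude": "unhöflich"}
tokens = "The pizza was great , but the waiter was rude .".split()
print(code_switch(tokens, lexicon, p=0.5, seed=0))
```

In a real ABSA pipeline, labeled spans (e.g., TASD target terms) would also need to be remapped whenever a labeled token is substituted, so the augmented example keeps consistent annotations.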
Results
Fine-tuned LLMs achieved the highest overall scores across the ABSA subtasks, excelling in particular on the more complex generative tasks. Few-shot models approached this performance in simpler setups, and smaller encoder models remained competitive. Cross-lingual training significantly enhanced performance, with fine-tuned LLMs outperforming baselines on all tasks. The abstract does not report specific metrics, but the authors indicate substantial improvements over existing multilingual ABSA methods, particularly on the newly introduced datasets.
Limitations
The authors acknowledge that their work covers only seven languages and that the datasets used may carry biases. They do not address how their methods scale to additional languages or dialects, nor do they explore domain-specific language variation. In addition, the reliance on LLMs introduces computational overhead that could limit accessibility for practitioners with fewer resources.
Why it matters
This work has significant implications for advancing multilingual ABSA, particularly in non-English contexts. By releasing new datasets and demonstrating effective cross-lingual transfer strategies, the authors lay the groundwork for future research on sentiment analysis across diverse languages. Their architecture-specific insights can inform the design of more efficient models for multilingual applications, broadening the accessibility of sentiment-analysis tools in global markets.
Authors: Jakob Fehle, Nils Constantin Hellwig, Udo Kruschwitz, Christian Wolff
Source: arXiv:2604.26619
URL: https://arxiv.org/abs/2604.26619v1