WEQA: Wearable hEalth Question Answering with Query-Adaptive Agentic Reasoning

Yuwei Zhang, Tong Xia, Bianca Emmerich, Yu Yvonne Wu, Dimitris Spathis, Xin Liu

Published: Jun 16, 2026 — 16:45 UTC

Problem
The paper targets medical question answering over wearable health data, which arrives as continuous, high-dimensional, longitudinal streams. Current language models (LMs), trained on text-centric distributions, struggle to align with such data because sensor modalities and user intents vary widely. The work is a preprint and has not undergone peer review, so its claims still need independent validation.

Method
The authors propose WEQA, a query-adaptive agent framework that integrates large language model (LLM) reasoning with specialized analytical tools for wearable health data. The architecture consists of an LLM controller that synthesizes execution plans based on the nature of the query. This controller dynamically routes queries to the appropriate combination of sensor analysis and pretrained models, allowing for tailored responses. Additionally, the framework incorporates grounded response auditing, leveraging external knowledge to enhance the accuracy of the answers. The authors curated a benchmark comprising four open wearable datasets, which include analytic and predictive tasks across three health domains, to evaluate the framework’s performance.
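The routing idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword-based `plan` function stands in for the LLM controller, the two tools and the plausibility range in `audit` are hypothetical placeholders, and heart-rate samples are assumed as the only input signal.

```python
from statistics import mean
from typing import Callable, Dict, List


def summarize_heart_rate(samples: List[float]) -> str:
    """Hypothetical sensor-analysis tool: descriptive statistics only."""
    return f"mean heart rate {mean(samples):.1f} bpm over {len(samples)} samples"


def predict_stress(samples: List[float]) -> str:
    """Hypothetical stand-in for a pretrained model (fixed threshold, illustration only)."""
    level = "elevated" if mean(samples) > 90 else "typical"
    return f"predicted stress level: {level}"


# Hypothetical tool registry the controller can route to.
TOOLS: Dict[str, Callable[[List[float]], str]] = {
    "sensor_analysis": summarize_heart_rate,
    "pretrained_model": predict_stress,
}


def plan(query: str) -> List[str]:
    """Stand-in for the LLM controller: choose tools from keywords in the query."""
    q = query.lower()
    steps: List[str] = []
    if any(word in q for word in ("average", "mean", "trend")):
        steps.append("sensor_analysis")
    if any(word in q for word in ("stress", "risk", "predict")):
        steps.append("pretrained_model")
    # Fall back to plain sensor analysis when no keyword matches.
    return steps or ["sensor_analysis"]


def audit(samples: List[float]) -> bool:
    """Stand-in for grounded auditing: accept only readings in an assumed plausible range."""
    return all(30 <= s <= 220 for s in samples)


def answer(query: str, samples: List[float]) -> str:
    """Plan, run the selected tools, and return an audited answer."""
    if not audit(samples):
        return "readings outside the assumed plausible range; no answer given"
    findings = [TOOLS[step](samples) for step in plan(query)]
    return "; ".join(findings)


if __name__ == "__main__":
    print(answer("What was my average heart rate?", [60, 70, 80]))
    print(answer("Am I under stress, and what was my average?", [95, 100, 105]))
```

In a real system the plan would come from an LLM and the audit would consult external knowledge sources; the sketch only shows how one query can fan out to a different tool combination than another.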

Results
On the curated benchmark, WEQA reportedly improves accuracy by 24% over baselines, including standard LLMs and agentic systems. A blinded study with 12 medical experts and 8 users indicated substantial gains in the perceived usefulness and clinical soundness of WEQA's responses. These results suggest that pairing query-adaptive reasoning with wearable data can improve the quality of medical question answering.

Limitations
The authors acknowledge that the framework's performance may depend on the quality and diversity of the data in the wearable datasets. They also note that relying on external knowledge for response auditing could introduce variability depending on the sources used. The sample for the expert evaluation is also relatively small, which may limit the generalizability of the findings. Limitations the authors do not explicitly raise include whether the framework scales to other health domains and the computational overhead of dynamic query routing.

Why it matters
The work bears on health informatics and personalized medicine by offering an approach to using wearable health data in medical decision-making. By integrating LLMs with specialized analytical tools, WEQA could yield more accurate and contextually relevant health insights, which may in turn improve patient care and clinical outcomes. The preprint, posted on arXiv, adds to the growing literature on AI applications in healthcare.

By Callan Zhang · Jun 16, 2026

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: arXiv cs.AI