Notable foundation models

New AI model called "Count Anything" does exactly what it says, and that's harder than it sounds

Published: Jun 13, 2026 — 17:00 UTC

Problem — The paper addresses the gap in existing AI models that struggle with counting objects in diverse image contexts, such as crowded scenes or microscopic samples. Current systems often require specific configurations or fail to generalize across different object densities and types. This work is particularly relevant as it is a preprint and has not undergone peer review, indicating the need for further validation in the research community.

Method — “Count Anything” employs a novel architecture that integrates a transformer-based model with a text-to-image processing pipeline. The model is trained on a diverse dataset that includes various object types and densities, allowing it to generalize across different scenarios. The training process utilizes a combination of supervised learning with a custom loss function designed to minimize counting errors. The authors do not disclose specific training compute resources, but the model’s architecture suggests a significant computational investment to achieve its performance.

Results — In comparative evaluations, “Count Anything” demonstrates a 50% reduction in error rates compared to existing counting models on benchmark datasets. While specific baseline models are not named, the improvement is substantial enough to indicate a meaningful advancement in the field. The model’s performance is particularly notable in scenarios with moderate object density, although it still encounters challenges with extremely dense arrangements and ambiguous counting prompts.

Limitations — The authors acknowledge that “Count Anything” struggles with counting in scenarios involving very dense objects, where occlusion and overlapping can lead to inaccuracies. Additionally, the model’s reliance on text prompts introduces ambiguity, as the interpretation of terms can vary significantly. The paper does not address potential biases in the training dataset or the model’s performance across different demographic or contextual factors, which could impact its applicability in real-world scenarios.

Why it matters — The development of “Count Anything” has significant implications for various applications, including surveillance, biological research, and inventory management, where accurate object counting is critical. The ability to count objects using natural language prompts could streamline workflows and enhance user interaction with AI systems. This work lays the groundwork for future research into more robust counting algorithms and the integration of multimodal inputs in AI models, as published in The Decoder.

By Callan Zhang · Jun 13, 2026 · Editorial standards →

Summarised from the primary source with AI assistance under human editorial oversight. Turing Wire is not a primary source — read the original for the authoritative account.

Source: The Decoder