LENS: Learning to Segment Anything with Unified Reinforced Reasoning

1Huazhong University of Science and Technology   2vivo Mobile Communication Co., Ltd  

AAAI 2026 (Oral)

LENS: Reasoning & Segmentation


Abstract

Text-prompted image segmentation enables fine-grained visual understanding and is critical for applications such as human-computer interaction and robotics. However, existing supervised fine-tuning methods typically perform no explicit chain-of-thought (CoT) reasoning at test time, which limits their ability to generalize to unseen prompts and domains. To address this issue, we introduce LENS, a scalable reinforcement-learning framework that jointly optimizes the reasoning process and segmentation in an end-to-end manner. We propose unified reinforcement-learning rewards that span sentence-, box-, and segment-level cues, encouraging the model to generate informative CoT rationales while refining mask quality. Built on the publicly available 3-billion-parameter vision–language model Qwen2.5-VL-3B-Instruct, LENS achieves an average cIoU of 81.2% on the RefCOCO, RefCOCO+, and RefCOCOg benchmarks, outperforming the strong fine-tuned method GLaMM by up to 5.6%. These results demonstrate that RL-driven CoT reasoning serves as a robust prior for text-prompted segmentation and offers a practical path toward more generalizable Segment Anything models.
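The cIoU reported above is cumulative IoU, the standard metric on the RefCOCO benchmarks: intersections and unions are summed over the whole evaluation set before dividing, so larger objects contribute more than in a per-image mean IoU. A minimal NumPy sketch of the computation (the function name and binary-mask input format are illustrative, not from the paper's codebase):

```python
import numpy as np

def cumulative_iou(pred_masks, gt_masks):
    """Cumulative IoU (cIoU): sum all intersections and all unions
    across the dataset, then take a single ratio."""
    inter_total, union_total = 0, 0
    for pred, gt in zip(pred_masks, gt_masks):
        pred, gt = pred.astype(bool), gt.astype(bool)
        inter_total += np.logical_and(pred, gt).sum()
        union_total += np.logical_or(pred, gt).sum()
    # Guard against an empty benchmark; union_total > 0 in practice.
    return inter_total / max(union_total, 1)
```

Because totals are pooled before dividing, a single large, well-segmented object can offset several small failures, which is why cIoU is usually reported alongside per-image (mean) IoU.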


Model Structure

An overview of the LENS framework




More Results

ReasonSeg Reasoning Demos


Demo 1
Demo 2
Demo 3
Demo 4

Referring Expression Segmentation




Failure Cases

Because multi-object training data is scarce, the model tends to produce incomplete masks, overlapping targets, or missed objects on multi-object segmentation tasks, leading to suboptimal performance in that setting.


Acknowledgements

We would like to express our sincere gratitude to everyone who contributed to this work, including those who provided valuable discussions, technical support, and inspiration.

BibTeX


@misc{zhu2025lens,
    title={LENS: Learning to Segment Anything with Unified Reinforced Reasoning},
    author={Lianghui Zhu and Bin Ouyang and Yuxuan Zhang and Tianheng Cheng and Rui Hu and Haocheng Shen and Longjin Ran and Xiaoxin Chen and Li Yu and Wenyu Liu and Xinggang Wang},
    year={2025},
    eprint={2508.14153},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}