VISTA: View-Consistent Self-Verified Training for GUI Grounding

Teaching AI to click the right button on a screen, a task known as GUI grounding, sounds simple but is surprisingly brittle. A core training problem is that reinforcement learning often collapses: on hard instances every rollout fails, so there is no useful learning signal; on easy instances every rollout succeeds, which is equally uninformative. VISTA addresses this by generating multiple crops of the same GUI screenshot and comparing the model's predictions across these geometrically different but semantically equivalent views. A self-verification mechanism further stabilizes training by anchoring on cases where the model has already produced a correct answer. Results across five benchmarks show consistent accuracy improvements, with the strongest gains on the most challenging GUI grounding tasks. Applications include desktop automation agents, accessibility tools, and software testing frameworks.

Authors: Xinyu Qiu, Yunzhu Zhang, Heng Jia, Shuheng Shen, Changhua Meng, Linchao Zhu
Paper: https://arxiv.org/abs/2606.14579v1
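
To make the idea of "geometrically different but semantically equivalent views" concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes a binary reward (the click lands inside the target box or not) and assumes that a group of rollouts with identical rewards carries no learning signal, as the description above suggests. All names (`Box`, `to_crop`, `to_full`, `view_rewards`, `has_learning_signal`) are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def to_crop(box: Box, crop: Box) -> Box:
    """Express a full-screenshot box in the coordinate frame of a crop."""
    return Box(box.left - crop.left, box.top - crop.top,
               box.right - crop.left, box.bottom - crop.top)


def to_full(x: float, y: float, crop: Box) -> tuple[float, float]:
    """Map a click predicted inside a crop back to full-screenshot coordinates."""
    return x + crop.left, y + crop.top


def view_rewards(target: Box, crops: list[Box],
                 clicks: list[tuple[float, float]]) -> list[float]:
    """Assumed binary reward per view: 1.0 if the click hits the target."""
    rewards = []
    for crop, (x, y) in zip(crops, clicks):
        fx, fy = to_full(x, y, crop)
        rewards.append(1.0 if target.contains(fx, fy) else 0.0)
    return rewards


def has_learning_signal(rewards: list[float]) -> bool:
    """A group whose rewards are all equal (all fail or all succeed)
    gives nothing to contrast, so it is treated as uninformative here."""
    return pstdev(rewards) > 0.0


# The same button seen through two crops of one screenshot.
target = Box(400, 300, 480, 330)
crops = [Box(0, 0, 800, 600), Box(300, 200, 900, 700)]
clicks = [(440, 315), (20, 20)]  # hypothetical predictions, in crop coordinates

rewards = view_rewards(target, crops, clicks)
```

In this toy case the model hits the button in the first view and misses it in the second, so the group mixes a success with a failure and `has_learning_signal(rewards)` is true. A group such as `[0.0, 0.0]` or `[1.0, 1.0]` would be flat, which is the collapse the description refers to. How VISTA actually selects crops and combines predictions across views is not specified in this summary.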