AHs 2025 GazeLLM: Multimodal LLMs incorporating Human Visual Attention

About this title

Processing high-resolution video with AI requires massive computational resources. GazeLLM offers a solution inspired by human vision: use eye tracking to focus only on what matters. By cropping first-person video to a small region around the user's gaze point, the system reduces pixel input to just one-tenth of the original while achieving task comprehension equal to or better than full-resolution video. User evaluations across six real-world activities, including cooking, bike repair, first aid, and sports, showed that gaze-focused video produces higher-quality task descriptions than both full videos and center-cropped alternatives.
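To make the cropping idea concrete, here is a minimal sketch of how a gaze-centered crop covering roughly one-tenth of the pixels could be computed. It is not the paper's implementation: the function names `gaze_crop_box` and `crop_frame`, the `area_fraction` parameter, the choice to keep the frame's aspect ratio, and the clamping at the frame edges are all assumptions made for illustration.

```python
import math


def gaze_crop_box(width, height, gaze_x, gaze_y, area_fraction=0.1):
    """Return (left, top, right, bottom) for a crop around the gaze point.

    Assumption: the crop keeps the frame's aspect ratio and covers about
    `area_fraction` of the original pixels (0.1 for "one-tenth").
    """
    if not 0 < area_fraction <= 1:
        raise ValueError("area_fraction must be in (0, 1]")

    # Scaling each side by sqrt(f) scales the area by f.
    scale = math.sqrt(area_fraction)
    crop_w = max(1, round(width * scale))
    crop_h = max(1, round(height * scale))

    # Center the box on the gaze point, then clamp it so the whole box
    # stays inside the frame when the gaze is near an edge.
    left = min(max(int(gaze_x) - crop_w // 2, 0), width - crop_w)
    top = min(max(int(gaze_y) - crop_h // 2, 0), height - crop_h)
    return left, top, left + crop_w, top + crop_h


def crop_frame(frame, box):
    """Crop a frame stored as a list of pixel rows to the given box."""
    left, top, right, bottom = box
    return [row[left:right] for row in frame[top:bottom]]


if __name__ == "__main__":
    # Hypothetical 1920x1080 frame with the gaze at the frame center.
    box = gaze_crop_box(1920, 1080, 960, 540)
    kept = (box[2] - box[0]) * (box[3] - box[1]) / (1920 * 1080)
    print(box, f"{kept:.3f}")
```

In a real pipeline the same box would be applied to every video frame using that frame's own gaze sample from the eye tracker, and the cropped frames would then be passed to the multimodal model in place of the full-resolution video.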

Jun Rekimoto. 2025. GazeLLM: Multimodal LLMs incorporating Human Visual Attention. In Proceedings of the Augmented Humans International Conference 2025 (AHs '25). Association for Computing Machinery, New York, NY, USA, 10 pages. https://doi.org/10.1145/3745900.3746075
