Header

UZH-Logo

Maintenance Infos

A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues


Armitage, Jason; Impett, Leonardo; Sennrich, R (2023). A Priority Map for Vision-and-Language Navigation with Trajectory Plans and Feature-Location Cues. In: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2 January 2023 - 7 January 2023. Institute of Electrical and Electronics Engineers, online.

Abstract

In a busy city street, a pedestrian surrounded by distractions can pick out a single sign if it is relevant to their route. Artificial agents in outdoor Vision-and-Language Navigation (VLN) are also confronted with detecting supervisory signal on environment features and location in inputs. To boost the prominence of relevant features in transformer-based systems without costly preprocessing and pretraining, we take inspiration from priority maps - a mechanism described in neuropsychological studies. We implement a novel priority map module and pretrain on auxiliary tasks using low-sample datasets with high-level representations of routes and environment-related references to urban features. A hierarchical process of trajectory planning -with subsequent parameterised visual boost filtering on visual inputs and prediction of corresponding textual spans - addresses the core challenge of cross-modal alignment and feature-level localisation. The priority map module is integrated into a feature-location framework that doubles the task completion rates of standalone transformers and attains state-of-the-art performance for transformer-based systems on the Touchdown benchmark for VLN. We release code (https://github.com/JasonArmitage-res/PM-VLN) and data (https://zenodo.org/record/6891965.YtwoS3ZBxD8).

Abstract

In a busy city street, a pedestrian surrounded by distractions can pick out a single sign if it is relevant to their route. Artificial agents in outdoor Vision-and-Language Navigation (VLN) are also confronted with detecting supervisory signal on environment features and location in inputs. To boost the prominence of relevant features in transformer-based systems without costly preprocessing and pretraining, we take inspiration from priority maps - a mechanism described in neuropsychological studies. We implement a novel priority map module and pretrain on auxiliary tasks using low-sample datasets with high-level representations of routes and environment-related references to urban features. A hierarchical process of trajectory planning -with subsequent parameterised visual boost filtering on visual inputs and prediction of corresponding textual spans - addresses the core challenge of cross-modal alignment and feature-level localisation. The priority map module is integrated into a feature-location framework that doubles the task completion rates of standalone transformers and attains state-of-the-art performance for transformer-based systems on the Touchdown benchmark for VLN. We release code (https://github.com/JasonArmitage-res/PM-VLN) and data (https://zenodo.org/record/6891965.YtwoS3ZBxD8).

Statistics

Citations

Altmetrics

Downloads

3 downloads since deposited on 16 Mar 2023
3 downloads since 12 months
Detailed statistics

Additional indexing

Item Type:Conference or Workshop Item (Paper), not_refereed, original work
Communities & Collections:06 Faculty of Arts > Institute of Computational Linguistics
06 Faculty of Arts > Digital Visual Studies
06 Faculty of Arts > Zurich Center for Linguistics
Dewey Decimal Classification:000 Computer science, knowledge & systems
410 Linguistics
Scopus Subject Areas:Physical Sciences > Artificial Intelligence
Physical Sciences > Computer Science Applications
Physical Sciences > Computer Vision and Pattern Recognition
Language:English
Event End Date:7 January 2023
Deposited On:16 Mar 2023 11:33
Last Modified:30 Jan 2024 02:45
Publisher:Institute of Electrical and Electronics Engineers
Series Name:IEEE Workshop on Applications of Computer Vision (WACV)
Number:22597860
ISSN:1550-5790
ISBN:9781665493468
OA Status:Green
Publisher DOI:https://doi.org/10.1109/WACV56688.2023.00115
Official URL:https://ieeexplore.ieee.org/document/10030783
Related URLs:https://openaccess.thecvf.com/content/WACV2023/html/Armitage_A_Priority_Map_for_Vision-and-Language_Navigation_With_Trajectory_Plans_and_WACV_2023_paper.html