Multi-Perspective Vision-Based Navigation

Felipe Felix Arias, Victor Gonzalez
University of Illinois at Urbana-Champaign
Research Project for Computer Vision (CS 543)

[Technical Report] [Code **Coming Soon**]

We extend state-of-the-art vision-based navigation to take inputs from multiple agents and perspectives.


Visual navigation is the process by which camera-equipped robots find collision-free paths to desired locations relying only on their camera input. Although the problem is often studied, it remains difficult for deep-learning algorithms because of the size of the state space, partial observability, and the limited reliability of reinforcement learning algorithms. In this work, we extend popular visual navigation algorithms to include perspectives from two robots rather than one during learning. The problem is usually defined from the perspective of the robot that is trying to reach the goal observation (whether an explicit image or a semantic description). We propose instead that using the camera input of multiple robots could help the learning process because of the additional information a third-person perspective provides. In summary, we explore the use of a third-person perspective during visual navigation, propose a new, non-egocentric goal definition for visual navigation, and show that visual navigation from a third-person perspective is possible in the context of deep reinforcement learning.
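As a rough illustration of what "perspectives from two robots" could mean at the input level, the sketch below merges the camera observations of two robots into a single observation. It is not taken from the technical report: the function name `combine_perspectives`, the key names, and the nested-list image type are all hypothetical, and the actual method may combine the views differently.

```python
from typing import Dict, List

# Hypothetical type: an image as a nested list of pixel values (rows x columns).
Image = List[List[float]]

# Modalities named on this page; the key names themselves are assumptions.
MODALITIES = ("rgb", "segmentation", "depth")


def combine_perspectives(
    first_person: Dict[str, Image],
    third_person: Dict[str, Image],
) -> Dict[str, Image]:
    """Merge two robots' camera observations into one observation dict.

    Keys are prefixed with the perspective so a downstream policy can
    tell the views apart. This layout is illustrative only.
    """
    combined: Dict[str, Image] = {}
    views = (("first_person", first_person), ("third_person", third_person))
    for name, view in views:
        for modality in MODALITIES:
            if modality not in view:
                raise KeyError(f"{name} observation is missing '{modality}'")
            combined[f"{name}/{modality}"] = view[modality]
    return combined


# Tiny 1x2 placeholder images stand in for real camera frames.
ego = {"rgb": [[0.1, 0.2]], "segmentation": [[1.0, 2.0]], "depth": [[3.5, 4.0]]}
other = {"rgb": [[0.3, 0.4]], "segmentation": [[2.0, 2.0]], "depth": [[1.5, 2.0]]}
observation = combine_perspectives(ego, other)
print(sorted(observation))
```

Keeping the two views under separate keys, rather than blending them, leaves it to the learning algorithm to decide how much weight each perspective deserves.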

Sample RGB, segmentation, and depth images (respectively). Both rows show the same environment: the top row is the first-person perspective of the robot visible in the bottom row, and the bottom row is the third-person perspective of a second robot.

Website adapted from Unnat Jain, Jingxiang Lin, Richard Zhang and Deepak Pathak.