LIFE: Lighting Invariant Flow Estimation

arXiv 2021

Zhaoyang Huang¹,³, Xiaokun Pan², Runsen Xu², Yan Xu¹,
Ka Chun Cheung³, Guofeng Zhang², Hongsheng Li¹

¹CUHK-SenseTime Joint Laboratory, The Chinese University of Hong Kong
²State Key Lab of CAD & CG, Zhejiang University
³NVIDIA AI Technology Center, NVIDIA
* denotes equal contribution


Abstract

We tackle the problem of estimating flow between two images with large lighting variations. Recent learning-based flow estimation frameworks have shown remarkable performance on image pairs with small displacements and constant illumination, but they do not work well in cases with large viewpoint changes and lighting variations because pixel-wise flow annotations are lacking for such cases. We observe that Structure-from-Motion (SfM) techniques can easily estimate relative camera poses between image pairs with large viewpoint changes and lighting variations. Based on this observation, we propose LIFE, a novel weakly supervised framework that trains a neural network to estimate accurate lighting-invariant flows between image pairs. Sparse correspondences are conventionally established via feature matching with descriptors that encode local image content. However, local image content is inevitably ambiguous, which makes cross-image feature matching error-prone and hinders downstream tasks. We propose to guide feature matching with the flows predicted by LIFE, which resolves ambiguous matches by exploiting the abundant context information in the image pairs. We show that LIFE outperforms previous flow learning frameworks by large margins in challenging scenarios, consistently improves feature matching, and benefits downstream tasks.
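
The idea of guiding feature matching with a predicted flow can be illustrated with a minimal sketch. It is not the matching procedure used by LIFE: the function name `flow_guided_match`, the `radius` parameter, and the flow convention (an `(H, W, 2)` array on image A's grid holding `(dx, dy)` offsets into image B) are all assumptions made for illustration. Each keypoint in A only considers keypoints in B that lie near its flow-predicted location, and descriptor similarity decides among those candidates.

```python
import numpy as np


def flow_guided_match(kps_a, desc_a, kps_b, desc_b, flow_ab, radius=8.0):
    """Match keypoints from image A to image B, restricted by a flow prior.

    kps_a, kps_b: (N, 2) and (M, 2) arrays of (x, y) pixel coordinates.
    desc_a, desc_b: (N, D) and (M, D) L2-normalised descriptors.
    flow_ab: (H, W, 2) array; flow_ab[y, x] is the (dx, dy) offset that maps
        pixel (x, y) in A to its predicted location in B (assumed convention).
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    h, w = flow_ab.shape[:2]
    matches = []
    for i, (x, y) in enumerate(kps_a):
        # Look up the flow at the pixel nearest to the keypoint.
        xi = int(np.clip(np.rint(x), 0, w - 1))
        yi = int(np.clip(np.rint(y), 0, h - 1))
        predicted = np.array([x, y]) + flow_ab[yi, xi]
        # Keep only candidates in B that lie close to the predicted location.
        dist = np.linalg.norm(kps_b - predicted, axis=1)
        candidates = np.flatnonzero(dist <= radius)
        if candidates.size == 0:
            continue
        # Among the candidates, pick the most similar descriptor.
        scores = desc_b[candidates] @ desc_a[i]
        matches.append((i, int(candidates[np.argmax(scores)])))
    return matches
```

With identical descriptors on repeated structures, descriptor similarity alone cannot separate the candidates; the spatial restriction from the flow is what disambiguates them in this sketch.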

Replacing the picture in videos via predicted flows

We select the painting The Starry Night and print it on A4 paper. We record videos that capture the paper in different situations and extract frames from them. We then use LIFE to predict the flow from the painting to each frame individually and warp other images onto the frames via these flows. Note that the images are warped via the flows only; we do not use any other strategies. The excellent visual quality of the videos synthesized via LIFE demonstrates its high robustness and accuracy and suggests potential augmented reality (AR) applications.
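
Warping an image with a dense flow field can be sketched as follows. This is a minimal NumPy illustration and not the code behind the demo: the function name `warp_with_flow` is hypothetical, sampling is nearest-neighbour for brevity, and the flow is assumed to be an `(H, W, 2)` array on the target frame's pixel grid whose `(dx, dy)` entries point to the corresponding location in the source image. The convention LIFE uses for its flow output is not stated here, so the direction may need to be adapted.

```python
import numpy as np


def warp_with_flow(source, flow):
    """Backward-warp `source` onto the target grid using a dense flow.

    source: (Hs, Ws, C) image to paste into the frame.
    flow: (H, W, 2) array on the target frame's grid; flow[y, x] is the
        (dx, dy) offset from target pixel (x, y) to its source location
        (an assumed convention).
    Returns the warped (H, W, C) image and an (H, W) boolean validity mask.
    """
    h, w = flow.shape[:2]
    hs, ws = source.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Source coordinates sampled by each target pixel (nearest neighbour).
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    # Target pixels whose sample falls outside the source stay empty.
    valid = (src_x >= 0) & (src_x < ws) & (src_y >= 0) & (src_y < hs)
    warped = np.zeros((h, w) + source.shape[2:], dtype=source.dtype)
    warped[valid] = source[src_y[valid], src_x[valid]]
    return warped, valid
```

A replacement picture could then be composited into a frame with `frame[valid] = warped[valid]`, leaving pixels outside the mask untouched.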