Abstract:
Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement. To date, the dominant methods are CNN-based, leaving plenty of room for improvement. In this work, we propose TransFlow, a transformer architecture for optical flow estimation. Compared to dominant CNN-based methods, TransFlow demonstrates three advantages. First, it provides more accurate correlation and trustworthy matching in flow estimation by utilizing spatial self-attention and cross-attention mechanisms between adjacent frames to effectively capture global dependencies. Second, it recovers more of the information compromised by occlusion and motion blur through long-range temporal association in dynamic scenes. Third, it introduces a concise self-learning paradigm, eliminating the need for complex and laborious multi-stage pre-training procedures. The versatility of TransFlow extends seamlessly to 3D scene motion, yielding competitive results in 3D scene flow estimation. Our approach attains state-of-the-art results on benchmark datasets such as Sintel and KITTI-15, while also exhibiting strong performance on downstream tasks, including video object detection on the ImageNet VID dataset, video frame interpolation on the GoPro dataset, and video stabilization on the DeepStab dataset. We believe that the effectiveness of TransFlow positions it as a flexible baseline for both optical flow and scene flow estimation, offering promising avenues for future research and development.
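To illustrate the mechanism named in the abstract (per-frame self-attention plus cross-attention between adjacent frames for correlation/matching), here is a minimal, hypothetical PyTorch sketch. The module and parameter names (FramePairAttention, dim, heads) are illustrative assumptions, not the authors' TransFlow implementation.

```python
# Minimal sketch, NOT the authors' TransFlow code: self-attention within each
# frame's feature tokens, then cross-attention where frame-1 tokens query
# frame-2 tokens to produce matching features for flow decoding.
import torch
import torch.nn as nn

class FramePairAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat1, feat2):
        # feat1, feat2: (B, H*W, C) flattened feature tokens of two adjacent frames
        f1, _ = self.self_attn(feat1, feat1, feat1)  # global context within frame 1
        f2, _ = self.self_attn(feat2, feat2, feat2)  # global context within frame 2
        # Cross-attention: each frame-1 token attends over all frame-2 tokens,
        # yielding a dense, correlation-like matching signal.
        matched, attn = self.cross_attn(f1, f2, f2, need_weights=True)
        return matched, attn

# Usage with dummy 32x32 feature maps (128 channels) from two adjacent frames.
tokens1 = torch.randn(1, 32 * 32, 128)
tokens2 = torch.randn(1, 32 * 32, 128)
out, attn = FramePairAttention()(tokens1, tokens2)
print(out.shape, attn.shape)  # torch.Size([1, 1024, 128]) torch.Size([1, 1024, 1024])
```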
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 46, Issue: 12, December 2024)
Index Terms: Optical Flow, Temporal Association, Flow Estimation, CNN-based Methods, Motion Estimation, 3D Motion, 3D Scene, Adjacent Frames, Motion Blur, Dynamic Scenes, Dominant Method, 3D Flow, Transformer Architecture, Optical Flow Estimation, Global Dependencies, Complex Need, Laborious Procedures, Convolutional Neural Network, Decoding, Consistent Estimates, Cost Volume, Scene Depth, Occluded Regions, 3D Point, Current Frame, Target Domain, Object Motion, Geometric Consistency, KITTI Dataset, Position Embedding