Abstract:
Optical flow is an indispensable building block for various important computer vision tasks, including motion estimation, object tracking, and disparity measurement. To date, the dominant methods are CNN-based, leaving plenty of room for improvement. In this work, we propose TransFlow, a transformer architecture for optical flow estimation. Compared to dominant CNN-based methods, TransFlow demonstrates three advantages. First, it provides more accurate correlation and trustworthy matching in flow estimation by utilizing spatial self-attention and cross-attention mechanisms between adjacent frames to effectively capture global dependencies. Second, it recovers more of the information compromised by occlusion and motion blur through long-range temporal association in dynamic scenes. Third, it introduces a concise self-learning paradigm, eliminating the need for complex and laborious multi-stage pre-training procedures. The versatility of TransFlow extends seamlessly to 3D scene motion, yielding competitive results in 3D scene flow estimation. Our approach attains state-of-the-art results on benchmark datasets such as Sintel and KITTI-15, while also exhibiting strong performance on downstream tasks, including video object detection on the ImageNet VID dataset, video frame interpolation on the GoPro dataset, and video stabilization on the DeepStab dataset. We believe that the effectiveness of TransFlow positions it as a flexible baseline for both optical flow and scene flow estimation, offering promising avenues for future research and development.
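To illustrate the mechanism named in the abstract (per-frame self-attention plus cross-attention between adjacent frames for correlation/matching), here is a minimal, hypothetical PyTorch sketch. The module and parameter names (FramePairAttention, dim, heads) are illustrative assumptions, not the authors' TransFlow implementation.

```python
# Minimal sketch, NOT the authors' TransFlow code: self-attention within each
# frame's feature tokens, then cross-attention where frame-1 tokens query
# frame-2 tokens to produce matching features for flow decoding.
import torch
import torch.nn as nn

class FramePairAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat1, feat2):
        # feat1, feat2: (B, H*W, C) flattened feature tokens of two adjacent frames
        f1, _ = self.self_attn(feat1, feat1, feat1)  # global context within frame 1
        f2, _ = self.self_attn(feat2, feat2, feat2)  # global context within frame 2
        # Cross-attention: each frame-1 token attends over all frame-2 tokens,
        # yielding a dense, correlation-like matching signal.
        matched, attn = self.cross_attn(f1, f2, f2, need_weights=True)
        return matched, attn

# Usage with dummy 32x32 feature maps (128 channels) from two adjacent frames.
tokens1 = torch.randn(1, 32 * 32, 128)
tokens2 = torch.randn(1, 32 * 32, 128)
out, attn = FramePairAttention()(tokens1, tokens2)
print(out.shape, attn.shape)  # torch.Size([1, 1024, 128]) torch.Size([1, 1024, 1024])
```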
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 46, Issue: 12, December 2024)
Index Terms: Optical Flow, Temporal Association, Flow Estimation, CNN-based Methods, Motion Estimation, 3D Motion, 3D Scene, Adjacent Frames, Motion Blur, Dynamic Scenes, Dominant Method, 3D Flow, Transformer Architecture, Optical Flow Estimation, Global Dependencies, Complex Need, Laborious Procedures, Convolutional Neural Network, Decoding, Consistent Estimates, Cost Volume, Scene Depth, Occluded Regions, 3D Point, Current Frame, Target Domain, Object Motion, Geometric Consistency, KITTI Dataset, Position Embedding