
Enlighten Fusion Multiscale Network for Infrared and Visible Image Fusion in Dark Environments


Abstract:


Most infrared and visible image fusion algorithms achieve good performance under normal illumination, but struggle in dark environments where texture details in the visible image are largely obscured. To mitigate this issue, a novel Enlighten Fusion Multiscale Network (EFMN) is proposed in this letter, which incorporates enhanced features at different scales into the main fusion network to light up the contexts in the darkness. With a sub-network enhancing the low-light visible image, multi-scale features are progressively enhanced and extracted. A group of Fusion Modules (FMs) then fuses the features coarsely in multiple branches. Finally, the fused features are further refined by 1 × 1 convolution units to produce the resultant image. This strategy of coarse fusion followed by refinement at the feature level works effectively. Extensive experiments show that the proposed EFMN improves fusion performance in dark environments both subjectively and objectively. The improvements also benefit typical downstream vision tasks, such as object detection.
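The abstract's pipeline (enhance the low-light visible input, fuse coarsely at multiple scales, then refine) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the concrete operators here (gamma correction for enhancement, element-wise maximum as the Fusion Module, uniform weights standing in for the learned 1 × 1 convolution) are all hypothetical placeholders for EFMN's learned sub-networks.

```python
import numpy as np

def enhance(vis, gamma=0.5):
    """Stand-in for the low-light enhancement sub-network (gamma correction)."""
    return np.clip(vis, 0.0, 1.0) ** gamma

def fusion_module(ir_feat, vis_feat):
    """Stand-in Fusion Module (FM): coarse element-wise maximum."""
    return np.maximum(ir_feat, vis_feat)

def downsample(img):
    """Halve resolution by striding (stand-in for strided convolution)."""
    return img[::2, ::2]

def upsample_to(img, shape):
    """Nearest-neighbour upsampling back to the full resolution."""
    fy = -(-shape[0] // img.shape[0])  # ceiling division
    fx = -(-shape[1] // img.shape[1])
    out = np.repeat(np.repeat(img, fy, axis=0), fx, axis=1)
    return out[:shape[0], :shape[1]]

def efmn_sketch(ir, vis, scales=3):
    """Enhance the visible input, fuse coarsely per scale, then refine.

    The 1 x 1 convolution refinement is approximated here by a per-pixel
    weighted sum of the upsampled multi-scale branches.
    """
    vis = enhance(vis)
    branches = []
    ir_s, vis_s = ir, vis
    for _ in range(scales):
        branches.append(fusion_module(ir_s, vis_s))
        ir_s, vis_s = downsample(ir_s), downsample(vis_s)
    weights = np.full(scales, 1.0 / scales)  # placeholder for learned weights
    fused = sum(w * upsample_to(b, ir.shape) for w, b in zip(weights, branches))
    return np.clip(fused, 0.0, 1.0)
```

For example, fusing a dark visible frame with an infrared frame containing a hot target keeps the target bright in the output while the enhanced background texture is preserved.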
Published in: IEEE Signal Processing Letters ( Volume: 30)
Page(s): 1167 - 1171
Date of Publication: 28 August 2023

I. Introduction

Image fusion aims to integrate valuable information from different source images into one fused image [1]. Infrared images, which are formed by thermal radiation imaging, make thermal targets easy to distinguish because those targets appear as regions of high gray value, and their quality remains robust in harsh environments. However, they may lack rich textures due to the limits of the imaging sensors. In contrast, visible images capture abundant texture details, making them well suited to human visual perception. Merging these two modalities can produce a comprehensive and accurate description of the scene [2].

