I. Introduction
Camouflaged object detection (COD) aims to accurately identify and segment objects that blend closely into their surrounding environment. It has great potential for applications in areas such as healthcare [1], military [2], and agriculture [3]. With the rise of deep learning techniques, COD methods have made substantial progress. For example, Fan et al. [4] introduced the COD10K dataset and a simple but effective network, SINet. Similarly, C2FNet [5] built contextual connections by aggregating intermediate and high-level features. FDNet [6] employed feature grafting and interference sensing mechanisms to fine-tune COD tasks. In addition, FSANet [7] enhanced object features through multiplicative feature fusion, CamoFocus [8] improved COD performance through a Feature Splitting and Modulation module and a Context Refinement Module, DIRNet [9] improved COD accuracy by employing a Bilateral Interaction Module and an Adjacent Aggregation Interaction Module, and DINet [10] enhanced three-dimensional perception for RGB-D COD by fusing depth maps. Despite the advances made by these methods, the high similarity of visual features between camouflaged objects and their backgrounds remains challenging. This similarity leads to several problems in the detection process, including multi-target omission and small-target misjudgment.
To address multi-target omission and small-target misjudgment in COD, some methods based on frequency domain transformation have been explored in preliminary work. For instance, Zhong et al. [11] introduced frequency domain features as additional cues to help distinguish camouflaged objects from the background. He et al. [12] proposed a feature decomposition and edge reconstruction model. Similarly, Cong et al. [13] proposed frequency sensing and correction fusion modules based on octave convolution. Liang et al. [14] proposed an efficient Frequency Injection Module that injects frequency domain cues at different stages to enhance feature representation.