Fourier Transform to Group Feature on Generated Coarser Contours for Fast 2D Shape Matching

Fourier descriptors are classical global shape descriptors with high matching speed but low accuracy. To obtain higher accuracy, a novel framework for forming Fourier descriptors, named MSFDGF (multiscale Fourier descriptor using group feature), is proposed. MSFDGF achieves multiscale description by generating coarser contours. Then, a group of complementary features is extracted from the generated coarser contours. Finally, the Fourier transform is performed on the features. MSFDGF-SH is a new global descriptor using the MSFDGF framework and shape histograms. Experiments are conducted on four databases, namely MPEG-7 CE-1 Part B, Swedish Plant Leaf, Kimia 99 and Expanded Articulated Database, to evaluate the performance of MSFDGF-SH. The experimental results show that MSFDGF-SH is an effective and efficient global shape descriptor. The new descriptor achieves an accuracy of 87.76% on the MPEG-7 CE-1 Part B dataset, which exceeds that of the Shape Tree. This is the first Fourier descriptor that surpasses the Shape Tree method in terms of both accuracy and speed on this dataset.


I. INTRODUCTION
Shape is an important feature in plant leaf retrieval [1], trademark retrieval [2] and object recognition in blurred images. A shape descriptor is an important tool for extracting the shape features of objects in 2D images.
Although research on post-processing methods [3]-[13] in the field of shape retrieval has been extensive for years, many scholars are still working on designing better shape descriptors, because these provide the original dissimilarity/similarity between shapes, which is the basis of shape matching. An ineffective shape descriptor cannot obtain high accuracy in shape retrieval, no matter how advanced the post-processing method it is combined with. Therefore, the study of shape descriptors has never stopped, and a large number of excellent descriptors [14]-[34] have been proposed.
The associate editor coordinating the review of this manuscript and approving it for publication was Shenghong Li.
Local descriptors such as SC (shape context) [35], IDSC (inner-distance shape context) [36], TAR (triangle-area representation) [37] and Shape Tree [38] have achieved highly accurate experimental results on the MPEG-7 CE-1 Part B shape database, but they all perform poorly in terms of matching efficiency. The matching efficiency of the global descriptors MDM (multiscale distance matrix) [39], FD (Fourier descriptor) [40] and WD (wavelet descriptor) [41] is very high, but their accuracy is poor. The shape descriptor AP&BAP (angular pattern and binary angular pattern) [42] has thus been proposed to achieve both matching accuracy and efficiency with multiscale description and efficient distance metrics.
Inspired by AP&BAP, researchers have since focused more on designing global descriptors that are efficient in the matching process. However, the design of this type of descriptor is extremely difficult. Among the new global descriptors, only HSC (hierarchical string cuts) [43] is at the same level of discriminability as AP&BAP.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
To this end, a novel framework for forming global Fourier descriptors and a new descriptor based on this framework are proposed in our study. Using the Fourier transform, we propose several approaches to improve the discriminability of the global descriptor as much as possible. These approaches include constructing multiple scales and improving the structure of the spatial signature. The accuracy and speed of this descriptor are as good as those of the excellent global descriptors HSC [43] and AP&BAP [42]. It performs even better than Shape Tree [38] on MPEG-7 CE-1 Part B in terms of both accuracy and speed.
The rest of this paper is organized as follows. Section II discusses related work. In Section III, the new framework and descriptor are introduced in detail. In Section IV, the computational complexity of the proposed method is estimated. In Section V, a number of well-known databases are used to evaluate the performance of the new method in terms of effectiveness and efficiency. Finally, Section VI concludes the paper.

II. RELATED WORK
In the last fifteen years, shape representation methods based on contour sampling points have developed much faster than the area-based ones. Usually, a contour is a set of uniformly sampled points on the outline of a shape. In this section, these contour-based methods are discussed in detail.

A. LOCAL DESCRIPTORS
Most local descriptors calculate a feature for each contour point or segment. This feature is typically a vector or matrix. Ignoring the relative order between features, a shape is described as a feature set. The matching process of local descriptors finds the best correspondence between two sets of elements (features). An optimization algorithm is used to find the optimal correspondence between the two sets, and the matching cost under the optimal correspondence is the dissimilarity (distance) between the two shapes.
The SC [35] has been one of the most important descriptors in the field of shape matching. It sets each point in the contour as a reference point in turn, then calculates the distance of the other points relative to the reference point, and builds a shape histogram (distance histogram) to describe the corresponding reference point. Finally, N (the number of contour sampling points) shape histograms are obtained. These shape histograms are put together to form a set (SC feature), which describes the shape. The matching process of two shapes is to compute the distance between their SC features. Therefore, shape matching becomes the matching of two sets of shape histograms. The $\chi^2$ distance is used to measure the difference between two histograms. Matching two sets of shape histograms means calculating the minimum sum of $\chi^2$ distances between them.
Finally, the minimum sum (matching cost) is the dissimilarity between two shapes. Dynamic programming [3] can be used in the process to find the optimal correspondence between two sets of shape histograms.
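The histogram comparison used by SC can be illustrated with a minimal sketch. It assumes the commonly used form of the $\chi^2$ distance with a factor of 1/2 and skips bins that are empty in both histograms; the function name `chi2_distance` is hypothetical and the search for the optimal correspondence (for example by dynamic programming) is not shown.

```python
def chi2_distance(h1, h2):
    """Chi-squared distance between two histograms of equal length.

    Assumed form: 0.5 * sum((a - b)^2 / (a + b)) over all bins.
    """
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # bins empty in both histograms contribute nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


# Two small example histograms (illustrative values only).
print(chi2_distance([3, 1, 0, 2], [2, 2, 0, 2]))
```

The matching cost between two shapes would then be the sum of such distances over the corresponding pairs of histograms.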
SC has an enhanced version IDSC [36], which performs better than the original SC in representing articulated shapes. The IDSC uses inner-distance instead of Euclidean distance used in the original SC when describing the relationship between two contour points. Inner-distance refers to the shortest path connecting two points inside the shape. Another major advantage of IDSC is its strong compatibility. Many post-processing algorithms based on learning [3]- [9] use IDSC to obtain the matching results between shapes as basis of learning.
The SC and IDSC are two important local descriptors, as both achieve high retrieval rates on MPEG-7 CE-1 Part B and they are complementary, so that combining them yields higher retrieval rates. The complementarity between IDSC and SC is described in detail in [4]. However, SC and IDSC still have shortcomings. In practice, they run too slowly to meet practical requirements, as both use DP (dynamic programming).
In [38], Felzenszwalb et al. describe a hierarchical representation for shapes that captures shape information at multiple levels of resolution. This method is usually called Shape Tree, and it achieves a very high retrieval rate (87.70%) on MPEG-7 CE-1 Part B. This retrieval rate had not been surpassed by Fourier descriptors before our method was proposed. Overall, most local descriptors with DP perform effectively in terms of accuracy. However, all of them have high computational costs.

B. GLOBAL DESCRIPTORS
Generally, in a global descriptor, a shape is represented by a feature vector (or matrix) extracted from the whole contour, and matching is conducted by comparing such representation vectors (or matrices) [42]. In the matching process, a global descriptor is suitable for using efficient distance metrics such as Euclidean distance and city block distance.
FDs (Fourier descriptors) are classical global descriptors. FD-CCD (Fourier descriptor based on centroid contour distance) [40] is taken as an example to introduce the characteristics of such descriptors. The Euclidean distance from each contour point to the centroid point is put into a sequence in order. Then, Fourier transform is used on the Euclidean distance sequence and the transformed result is the FD-CCD feature. The dissimilarity is the city block distance between two FD-CCD features belonging to two shapes respectively. WDs (wavelet descriptors) [41], [44] are also global descriptors, and they have similar effectiveness and efficiency as FD-CCD.
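The FD-CCD pipeline described above can be sketched as follows. This is a minimal illustration, not the implementation of [40]: it assumes the centroid is approximated by the mean of the contour points, uses a naive DFT for self-containment, and keeps only the first few magnitude coefficients. The names `centroid_contour_distance`, `dft_magnitude`, `fd_ccd` and `city_block` are hypothetical.

```python
import cmath
import math


def centroid_contour_distance(contour):
    """Euclidean distance from each contour point to the centroid, in contour order."""
    n = len(contour)
    cx = sum(x for x, _ in contour) / n  # centroid approximated by the point mean
    cy = sum(y for _, y in contour) / n
    return [math.hypot(x - cx, y - cy) for x, y in contour]


def dft_magnitude(signal, num_coeffs):
    """Magnitudes of the first num_coeffs DFT coefficients (phase discarded)."""
    n = len(signal)
    mags = []
    for k in range(num_coeffs):
        acc = sum(signal[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        mags.append(abs(acc) / n)
    return mags


def fd_ccd(contour, num_coeffs=8):
    """FD-CCD style feature: DFT magnitudes of the centroid contour distance."""
    return dft_magnitude(centroid_contour_distance(contour), num_coeffs)


def city_block(a, b):
    """City block (L1) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
shifted = square[3:] + square[:3]  # same contour, different starting point
print(city_block(fd_ccd(square, 4), fd_ccd(shifted, 4)))  # numerically close to zero
```

Because only magnitudes are kept, the same contour traversed from a different starting point yields the same feature up to rounding error.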
MDM (multiscale distance matrix) [39], which captures the shape geometry while being invariant to translation, rotation, scaling, and bilateral symmetry, is an important global shape descriptor. It combines multiscale description and distance metrics to achieve high efficiency and effectiveness in plant leaf retrieval. However, MDM's discriminability is limited, and its accuracy on some important databases, such as MPEG-7 CE-1 Part B, does not reach the level of local descriptors.
In [42], Hu et al. propose two novel shape features, AP (angular pattern) and BAP (binary angular pattern), and a multiscale integration of them (AP&BAP) for shape matching. AP&BAP is a highly significant descriptor, which showed many scholars that global descriptors can be competitive in terms of accuracy. The earlier FD [40] and MDM [39] are still inaccurate relative to the local descriptors, yet AP&BAP is both fast and accurate. The retrieval rate of AP&BAP with $\chi^2$ distance (87.04%) even surpasses SC+DP (86.80%) and IDSC+DP (85.40%) on the MPEG-7 CE-1 Part B shape database. In terms of speed, AP&BAP retains the advantages of global descriptors.
The HSC (hierarchical string cuts) [43] method is proposed to partition a shape into multiple level curve segments of different lengths from a point moving around the contour to describe the shape gradually and completely from the global information to the finest details. HSC continues the great breakthrough of global descriptors. In the experiments, it gets a higher retrieval rate (87.31%) than AP&BAP (87.04%), on the MPEG-7 CE-1 Part B shape database with a faster speed.
Kaothanthong et al. [45] propose a shape signature named DIR (distance interior ratio) that utilizes intersection pattern of the distribution of line segments with the shape, and a histogram alignment method for adjusting the interval of the histogram according to the distance distribution. DIR is a recent attempt at global descriptors. Its speed is as fast as FD, which is faster than HSC. Its retrieval rate is 10% higher than FD on MPEG-7 CE-1 Part B shape database. However, it still does not reach the level of AP&BAP and HSC in terms of discriminability. This result shows how difficult it is to design an effective global descriptor.

A. GROUP FEATURE
The Fourier transform is a commonly used technique in fast shape matching. It usually transforms a spatial feature vector of a shape into a sequence of coefficients in the frequency domain. Each element of the spatial feature vector is determined by its corresponding contour point, and the elements are arranged in the same order as the contour points in the closed contour. An obvious problem is that different starting point positions in the closed contour result in different spatial feature vectors. In the Fourier transform, discarding the phase information solves this problem (see Eq. (1)). The resulting magnitude spectrum is invariant to the starting point position, which avoids the computation needed to find the best starting point. This is the reason why the Fourier transform is widely used for fast shape matching.
where $v_s$ is the spatial feature vector, $N$ is the length of the vector and $f$ is the output frequency sequence.
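The starting point invariance follows from the shift property of the discrete Fourier transform. The short derivation below is a sketch under the assumption that Eq. (1) takes the magnitude of a standard DFT of $v_s$; the symbols $V$, $\tilde{v}_s$, $\tilde{V}$ and the shift $s$ are introduced here only for illustration.

```latex
\begin{align}
\tilde{v}_s(n) &= v_s\big((n + s) \bmod N\big), \qquad n = 0, \ldots, N-1, \\
\tilde{V}(k) &= \sum_{n=0}^{N-1} v_s\big((n + s) \bmod N\big)\, e^{-j 2\pi k n / N}
             = \sum_{m=0}^{N-1} v_s(m)\, e^{-j 2\pi k (m - s) / N}
             = e^{\,j 2\pi k s / N}\, V(k), \\
\big|\tilde{V}(k)\big| &= \big|V(k)\big| .
\end{align}
```

A change of starting point by $s$ samples therefore only multiplies each coefficient by a unit-modulus phase factor, which disappears once the phase is discarded.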
Because the result of Eq. (1) does not depend on the starting point, a distance metric can be used directly to measure the difference between the frequency coefficient sequences of two shapes. This matching process has a very low computational cost. However, the discriminability of most previous Fourier descriptors is limited, mainly because they rely on a single spatial feature, as in FD-CCD (Fourier descriptor based on the centroid contour distance) [40] and FD-FPD (Fourier descriptor based on the furthest point distance) [46]. Therefore, a feasible way to improve the discriminability of a Fourier descriptor is to use more and better spatial features together in the transform.
Such a combination of spatial features is called a Group Feature (GF), and the GF-based Fourier descriptor is named FDGF. For example, CCD and FPD can form a GF, which is named GF-CCD&FPD. Experiments show that the discriminability of FDGF-CCD&FPD is better than that of FD-CCD and FD-FPD, but it still cannot reach the level of a local descriptor (such as TAR [37]). To reach the level of local descriptors in terms of discriminability, it is necessary to design an effective GF. The combination of CCD and FPD is a simple GF containing two feature units. An ideal GF should contain feature units that are highly complementary and weakly correlated. These feature units should preferably be able to describe any contour point in various shapes, much like an orthogonal basis in a Euclidean space.

B. BIN VECTOR
The shape histogram can be used as a GF, although it usually appears in local descriptors [35], [36]. A histogram is rarely used in FDs, probably because scholars are accustomed to using a single spatial feature vector. The histogram $h_s^{i_d}$ describes the distribution of the remaining points in the contour $c_d$ relative to $c_d(i_d)$,
where $h_s^{i_d}$ is the shape histogram of the $i_d$th contour point and $h_s^{i_d}(b)$ is the value of the $b$th bin in $h_s^{i_d}$. These $B$ bins uniformly divide the log-polar plane centered on $c_d(i_d)$. $c_d$ is a contour represented by a sequence of sampling points, and $c_d(i_d)$, $i_d \in \mathbb{Z}$, is the $i_d$th point in the contour $c_d$, which has $N$ sampling points. Since the contour is closed, $c_d(i_d - 1)$ and $c_d(i_d + 1)$ are the two adjacent points of $c_d(i_d)$ on the contour. A shape histogram is therefore a set of feature units that describe a contour point. The feature units in this set are highly complementary and weakly correlated. In other words, the shape histogram is an excellent GF.
The Fourier descriptor based on the GF of shape histograms is called FDGF-SH. In the process of extracting the FDGF-SH feature, a new feature, the Bin Vector (BV), is required, as the Fourier transform cannot deal with $N$ shape histograms directly. $v_b^{i_b}$ (a BV) is a column vector generated by Eq. (3). The BV is the key to transforming a local feature into a global feature. Setting $v_s = v_b^{i_b}$ in Eq. (1) yields the Fourier coefficient sequence $f^{i_b}$. This Fourier coefficient sequence is still a column vector, just like the previous Fourier descriptor. Subsequently, $f^1, f^2, \ldots, f^B$ are used to form a feature matrix $F$.
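The step from per-point histograms to bin vectors can be sketched as follows. This is a simplified illustration under stated assumptions: the log-polar layout (rings that halve in radius, distances normalized by the largest distance from the reference point, no tangent alignment) and the names `shape_histogram` and `bin_vectors` are hypothetical choices, not the exact definitions of Eqs. (2) and (3).

```python
import math


def shape_histogram(contour, i, n_radial=3, n_angular=4):
    """Log-polar histogram of all other contour points relative to contour[i]."""
    x0, y0 = contour[i]
    offsets = [(x - x0, y - y0) for j, (x, y) in enumerate(contour) if j != i]
    r_max = max(math.hypot(dx, dy) for dx, dy in offsets)
    hist = [0] * (n_radial * n_angular)
    for dx, dy in offsets:
        r = math.hypot(dx, dy)
        if r == 0:
            ring = n_radial - 1  # coincident points go to the innermost ring
        else:
            # Ring 0 is the outermost; each further ring halves the radius.
            ring = min(n_radial - 1, int(-math.log2(r / r_max)))
        theta = math.atan2(dy, dx) % (2 * math.pi)
        sector = min(n_angular - 1, int(theta / (2 * math.pi / n_angular)))
        hist[ring * n_angular + sector] += 1
    return hist


def bin_vectors(histograms):
    """Transpose N histograms of B bins into B bin vectors of length N."""
    return [list(column) for column in zip(*histograms)]


contour = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
histograms = [shape_histogram(contour, i) for i in range(len(contour))]
bvs = bin_vectors(histograms)
print(len(bvs), len(bvs[0]))  # B bin vectors, each of length N
```

Each bin vector is ordered along the contour like any other spatial signature, so it can be passed to the magnitude Fourier transform in place of $v_s$.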
Since the lower frequency components are more stable than the higher frequency ones in FDGF-SH, only a few low frequency coefficients are used in the matching process with the weighted city block distance. The weighted city block distance between two contours $C_1$ and $C_2$ in the FDGF-SH feature space is denoted $D(C_1, C_2)$ and given by Eq. (5), where $f^{i_b}_{C_1}$ and $f^{i_b}_{C_2}$ are the $f^{i_b}$ sequences of $C_1$ and $C_2$. In Eq. (5), $K$ is far smaller than $N$. Usually, $K < 3\log_2 N$.
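A minimal sketch of such a weighted city block distance is given below. It assumes that each feature matrix is stored as $B$ rows of $K$ low-frequency magnitudes and that the weights decrease linearly with the frequency index; the concrete weight formula `(K - k) / K` and the function names are assumptions for illustration, not the definitions used in the paper.

```python
def linear_weights(K):
    """Assumed weights that decrease linearly from 1 towards 0 as k increases."""
    return [(K - k) / K for k in range(K)]


def weighted_city_block(F1, F2, weights):
    """Weighted L1 distance between two feature matrices (B rows, K columns)."""
    return sum(
        w * abs(a - b)
        for row1, row2 in zip(F1, F2)          # one row per bin vector
        for w, a, b in zip(weights, row1, row2)  # one column per frequency
    )


F1 = [[1.0, 2.0], [3.0, 4.0]]
F2 = [[0.0, 2.0], [3.0, 2.0]]
print(weighted_city_block(F1, F2, linear_weights(2)))  # 1*1 + 0.5*2 = 2.0
```

The cost of one comparison is proportional to $BK$, which is independent of any search over starting points or correspondences.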
FDGF-SH can easily surpass local descriptors in terms of efficiency. However, in terms of accuracy, FDGF-SH may still fall short of local descriptors. Therefore, FDGF-SH needs further improvement.

C. FEATURE ON A GENERATED COARSER CONTOUR
When recognizing shapes, human eyes sometimes automatically filter out local details and preserve coarser features to reduce the interference caused by noise. This approach can be used in the design of shape descriptors. In our study, three approaches for obtaining variable-level coarser contours are proposed.

1) MEDIAN FILTERING TO A CONTOUR
Median filtering is used to generate the level $t$ coarser contour. It is characterized by a linear increase of the filter kernel size as $t$ increases. In this median filtering approach, the FDGF-SH feature on the level $t$ coarser contour is denoted FDGF-SH-MFtC. The level $t$ coarser contour $c_{mf}^t$ is generated from the level 0 coarser contour $c^0$, which is the original contour containing $N$ sampling points. The number of sampling points of the level $t$ coarser contour $c_{mf}^t$ is still $N$. The FDGF-SH-MFtC feature is extracted from the contour $c_{mf}^t$ using the FDGF-SH method. The weighted city block distance is still used in the matching process of FDGF-SH-MFtC. $D_{MFtC}(C_1, C_2)$ is the distance (dissimilarity) in the FDGF-SH-MFtC feature space between two shapes $C_1$ and $C_2$.
In practice, when $t$ is large, for example $t = N/4$, adjacent points in $c_{mf}^t$ may overlap each other. Even if they do not overlap, the sampling points are excessive, as shown in Fig. 1: a large number of sampling points are used to describe a very simple shape, and they are nonuniformly distributed. This does not help to reduce the computational cost of the large scale features in the extraction and matching processes. The advantage is that the variation of coarser contours between adjacent levels is small, so there are many levels to use. Having enough levels makes it easy to find a well-matched combination of levels for a more effective description. Theoretically, $0 \le t \le N - 1$ and $t \in \mathbb{Z}$. When $t = 0$, $c_{mf}^t$ is still $c^0$. When $t = N - 1$, $c_{mf}^t$ is only one point. The shapes at different scales are shown in Fig. 2.
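The behavior at the two extremes can be reproduced with a small sketch of circular median filtering. It assumes a window of $t + 1$ consecutive samples around each point and a median applied separately to the x and y coordinates; this choice is consistent with the stated boundary cases ($t = 0$ and $t = N - 1$) but is an assumption, and `median_filter_contour` is a hypothetical name.

```python
from statistics import median


def median_filter_contour(contour, t):
    """Level-t coarser contour via circular median filtering (assumed window t + 1)."""
    n = len(contour)
    lo, hi = -(t // 2), t - t // 2  # offsets lo..hi cover t + 1 samples
    out = []
    for i in range(n):
        # The contour is closed, so indices wrap around.
        window = [contour[(i + o) % n] for o in range(lo, hi + 1)]
        out.append((median(p[0] for p in window), median(p[1] for p in window)))
    return out


contour = [(0, 0), (4, 0), (4, 4), (0, 4), (1, 2), (3, 1)]
print(median_filter_contour(contour, 0) == contour)                # level 0 is the original
print(len(set(median_filter_contour(contour, len(contour) - 1))))  # all points coincide
```

The output always has $N$ points, which is why large levels describe a very simple shape with many, unevenly spread samples.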

2) DOWNSAMPLING TO A CONTOUR
In this downsampling version, $N$ is required to be a power of 2. The number of points in the downsampled contour decreases exponentially as $t$ increases, so this approach has fewer ($\log_2 N + 1$) levels of coarser contours. The level $t$ coarser contour $c_{ds}^{t,n_0}$ is generated by downsampling the original contour, and average shape histograms are computed, where $h_s^{i_d,n_0}$ is the shape histogram of $c_{ds}^{t,n_0}(i_d)$ in the contour $c_{ds}^{t,n_0}$. Finally, the FDGF-SH-DStC feature is calculated based on the set of these average shape histograms. The weighted city block distance is still used in the matching process. The distance between two shapes $C_1$ and $C_2$ in the FDGF-SH-DStC feature space is expressed as $D_{DStC}(C_1, C_2)$.
The number of contour sampling points differs from level to level: the larger $t$ is, the fewer the contour sampling points. In addition, there are fewer levels. Theoretically, in the downsampling approach, $0 \le t \le \log_2 N$ and $t \in \mathbb{Z}$. When $t = 0$, $c_{ds}^{t,n_0}$ is still $c^0$. When $t = \log_2 N - 1$, $c_{ds}^{t,n_0}$ is a line segment. When $t = \log_2 N$, $c_{ds}^{t,n_0}$ is a point. $c_{ds}^{t,n_0}$ at each level is shown in Fig. 5.
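The downsampling scheme can be sketched as follows, assuming that level $t$ keeps every $2^t$-th point, that $n_0$ indexes the $2^t$ possible starting offsets, and that the histograms of corresponding points are averaged over these offsets. The function names and the data layout `per_offset[n0][i][b]` are hypothetical.

```python
def downsampled_contours(contour, t):
    """All level-t downsampled contours, one per starting offset n0."""
    step = 2 ** t
    return [contour[n0::step] for n0 in range(step)]


def average_histograms(per_offset):
    """Average histograms over offsets: per_offset[n0][i][b] -> averaged[i][b]."""
    n_offsets = len(per_offset)
    return [
        [sum(values) / n_offsets for values in zip(*hists_at_i)]
        for hists_at_i in zip(*per_offset)
    ]


contour = [(i, 0) for i in range(8)]  # N = 8, so levels 0..3 exist
for t in range(4):
    subs = downsampled_contours(contour, t)
    print(t, len(subs), len(subs[0]))  # number of offsets, points per contour
```

With $N = 8$ the levels contain 8, 4, 2 and 1 points, matching the line segment and single point cases at the two highest levels.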

3) SPATIAL FILTERING TO A SHAPE
Inspired by [5], spatial filtering is also incorporated. The closing operation, which is defined as a dilation followed by an erosion using the same SE (structuring element), is used to generate coarser contours, as it can remove some finer features and preserve the coarser features of a shape. In this spatial filtering approach, the FDGF-SH feature of the level $t$ coarser contour is denoted FDGF-SH-SFtC. To generate the level $t$ coarser contour, the original image is processed by an SE of level $t$ size. The 'disk' SE is used in this approach.
The level $t$ coarser shape is extracted from $im^t$. Then, $N$ sampling points are uniformly extracted from the contour of the level $t$ shape to form $c_{sf}^t$. The FDGF-SH-SFtC feature is extracted from $c_{sf}^t$. The weighted city block distance is still used in the matching process. The distance between two shapes $C_1$ and $C_2$ in the FDGF-SH-SFtC feature space is expressed as $D_{SFtC}(C_1, C_2)$. Coarser shapes at different levels are shown in Fig. 6.
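The closing operation itself can be illustrated on a small binary image. The sketch below is a plain-Python illustration rather than the implementation used in the experiments: it assumes a binary image stored as nested lists, treats pixels outside the image as background, and builds the disk SE from integer offsets; all function names are hypothetical.

```python
def disk_offsets(radius):
    """Pixel offsets of a disk-shaped structuring element."""
    return [
        (dy, dx)
        for dy in range(-radius, radius + 1)
        for dx in range(-radius, radius + 1)
        if dy * dy + dx * dx <= radius * radius
    ]


def dilate(img, offsets):
    """Binary dilation: every foreground pixel stamps the SE onto the output."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy, dx in offsets:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] = 1
    return out


def erode(img, offsets):
    """Binary erosion: a pixel survives only if the whole SE fits in the foreground."""
    h, w = len(img), len(img[0])
    return [
        [
            int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                for dy, dx in offsets
            ))
            for x in range(w)
        ]
        for y in range(h)
    ]


def closing(img, radius):
    """Closing: dilation followed by erosion with the same disk SE."""
    offsets = disk_offsets(radius)
    return erode(dilate(img, offsets), offsets)


# A 5x5 square with a one-pixel hole, padded inside a 9x9 image.
img = [[0] * 9 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 7):
        img[y][x] = 1
img[4][4] = 0
closed = closing(img, 1)
print(closed[4][4], sum(map(sum, closed)))  # hole filled, square otherwise unchanged
```

Small holes and narrow gaps disappear while the overall outline is kept, which is the sense in which a larger SE gives a coarser shape.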

D. MULTISCALE FOURIER DESCRIPTOR
FDGF-SH features on multiple generated coarser contours can be used to generate the multiscale Fourier descriptor. It is obvious that the higher the level, the coarser the feature, and the lower the level, the finer the feature.
To achieve the goal of multiscale description, multiple FDGF-SH features are used together. The FDGF-SH-MFtC, FDGF-SH-DStC and FDGF-SH-SFtC features are used to form the multiscale features MSFDGF-SH-MF, MSFDGF-SH-DS and MSFDGF-SH-SF respectively. MSFDGF-SH-MF is used as an example to illustrate the integration. The minimum distance (MD) and the sum distance (SD) are two common approaches. In Eq. (10), $D_m^{mf}$ is the MD between the MSFDGF-SH-MF features of two shapes.
Since the range of the distance differs from level to level, $\alpha_m^t$ (which increases as $t$ increases) is used to normalize the distances. $S_v$ is the set of the used values of $t$, which denotes the level of the coarser contour. Only a few levels are used in the matching process. $D_s^{mf}$ is the SD, as shown in Eq. (11). Since the importance of the distance differs from level to level, $\alpha_s^t$ (which decreases as $t$ increases) is used to normalize the distances.
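The two integration rules can be sketched as a weighted minimum and a weighted sum over the selected levels. This assumes that Eqs. (10) and (11) have exactly this form; the dictionaries mapping a level $t$ to its distance and to its coefficient, as well as the function names, are hypothetical.

```python
def minimum_distance(level_distances, alpha_m):
    """MD: smallest normalized distance over the levels in S_v."""
    return min(alpha_m[t] * d for t, d in level_distances.items())


def sum_distance(level_distances, alpha_s):
    """SD: sum of the normalized distances over the levels in S_v."""
    return sum(alpha_s[t] * d for t, d in level_distances.items())


# Illustrative per-level distances for S_v = {0, 3}.
distances = {0: 2.0, 3: 1.0}
print(minimum_distance(distances, {0: 1.0, 3: 1.5}))  # min(2.0, 1.5) = 1.5
print(sum_distance(distances, {0: 1.0, 3: 0.5}))      # 2.0 + 0.5 = 2.5
```

In the MD case the coefficients grow with $t$ so that coarse levels, whose raw distances tend to be small, do not always win the minimum; in the SD case they shrink with $t$ so that fine levels dominate the sum.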
How to generate $S_v$ is a problem. For the sake of training speed, the Sequential Forward Selection method in [42] is used to select the combination of scales $S_v$. First, 30% of the images in the dataset are randomly selected as a training subset. The single scale with the highest accuracy is set as the starting point of the scale combination, and the remaining scales are set as candidates. Next, each candidate is added to the combination in turn in order to find the best candidate, that is, the one that gives the new combination the highest accuracy. Then, this best candidate is added to the combination and removed from the candidates. This process of finding the best candidate is performed iteratively until no new scale is added to the combination $S_v$, which means that integrating a larger scale combination would reduce the accuracy of the descriptor.
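The selection loop can be sketched as follows. The `accuracy` callback stands for an evaluation on the training subset and is a placeholder; stopping as soon as the best candidate does not strictly improve the accuracy is an assumption about how ties are handled.

```python
def sequential_forward_selection(candidates, accuracy):
    """Greedy scale selection: grow the combination while accuracy improves.

    candidates: iterable of scale levels t.
    accuracy:   callable taking a list of levels and returning a score
                (placeholder for an evaluation on the training subset).
    """
    remaining = list(candidates)
    # Start from the single scale with the highest accuracy.
    best_single = max(remaining, key=lambda s: accuracy([s]))
    selected = [best_single]
    remaining.remove(best_single)
    best_acc = accuracy(selected)
    while remaining:
        # Try each remaining candidate in turn and keep the best one.
        cand = max(remaining, key=lambda s: accuracy(selected + [s]))
        cand_acc = accuracy(selected + [cand])
        if cand_acc <= best_acc:
            break  # no candidate improves the combination
        selected.append(cand)
        remaining.remove(cand)
        best_acc = cand_acc
    return selected


# Toy scoring function: each level contributes a fixed gain (illustrative only).
gains = {0: 5, 2: 3, 5: 1, 9: -2}
print(sequential_forward_selection([9, 5, 2, 0], lambda s: sum(gains[t] for t in s)))
```

With the toy gains above, the loop picks level 0 first, then adds levels 2 and 5, and stops because adding level 9 would lower the score.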
When the MD approach is used with the three descriptors (MSFDGF-SH-MF+MD, MSFDGF-SH-DS+MD and MSFDGF-SH-SF+MD), $\alpha_m^t$ is computed by Eq. (12),
where $T_m$ is the maximum level of the generated coarser contour. In SD, Eq. (13) is used. In MD, $dif_m \ge 0$, whereas in SD, $dif_s \le 0$ always.
In the weighted city block distance, $w_k$ (computed by Eq. (14)) decreases linearly as $k$ increases in both MD and SD.
MSFDGF is a framework for forming a multiscale Fourier descriptor. In this framework, many GFs can be used, and SH is just one of them. Three GCC (generated coarser contour) approaches are used to implement the multiscale description. MSFDGF-SH-MF, MSFDGF-SH-DS and MSFDGF-SH-SF are three descriptors using SH (shape histogram) based on the MSFDGF framework.

IV. COMPUTATIONAL COMPLEXITY
Eqs. (15)-(16) are two commonly used formulas in this section. To extract the shape histograms of all coarser contours, MSFDGF-SH-MF, MSFDGF-SH-DS and MSFDGF-SH-SF spend $O(N^3)$, $O(N^2)$ (see Eq. (18)) and $O(n_s N^2)$ respectively, as the shape histograms of one contour cost $O(N^2)$ [36]. The Fourier transform of the $B$ bin vectors of one contour costs $O(BN \log N)$.
The computational complexity of the matching process plays a decisive role in online retrieval from large databases [43]. Therefore, the computational complexities of some state-of-the-art descriptors in the matching stage are compared with that of MSFDGF-SH, as shown in Table 1. In Table 1, $n_s$ is the number of levels used in MSFDGF-SH-SF, and it is smaller than 7 in the experiments. $B$ is the number of bins in the shape histogram, and it is usually smaller than $N$. $B_a$ is the number of bins in AP, and it is 24 in [42]. $M_b$ is the number of bits of the BAP feature at the largest scale; in the experiments in [42], $M_b = 12$. In HSC, $M_h \ll N$, and $M_h = 7$ in the experiments in [43].

V. EXPERIMENTAL RESULTS
MSFDGF-SH-MF, MSFDGF-SH-DS and MSFDGF-SH-SF are evaluated in terms of both effectiveness and efficiency. The evaluation databases include MPEG-7 CE-1 Part B, Kimia 99 [47], Swedish Plant Leaf [48] and Expanded Articulated Database [36]. All the algorithms are written in MATLAB and run on a PC with an Intel(R) Core(TM) i7-7700K 4.20 GHz CPU and 16 GB DDR4 RAM under Windows 10. As the DP part in SC+DP and IDSC+DP is computationally expensive, it is implemented in C in order to be comparable to global descriptors such as AP&BAP [42], HSC [43] and MSFDGF-SH. $N$ is always 512 in all the experiments. In MSFDGF-SH-SF, to make one SE suitable for all shapes, shapes are normalized so that the area of the convex hull is close to 5000 [5]. The size of the SE is $5t$ at the level $t$ coarser contour.
In the experiment on each dataset, the SFS technique is used to find a good combination of scales $S_v$. $K = 23$, $pwr_m = 1$, $pwr_s = 2$, and the values of $dif_m$ and $dif_s$ (as shown in Table 2) are set empirically.

A. RESULTS ON MPEG-7 CE-1 PART B SHAPE DATABASE
The MPEG-7 CE-1 Part B shape database [35], [36], [43] is widely used in shape matching research. This database contains 70 categories, each containing 20 different shapes, so it contains 1400 silhouette images. Two examples from each category are shown in Fig. 7.
The test method is called the "bull's-eye test" [35], [36], [43]. In the bull's-eye test, a shape in the database is set as the query in a retrieval and matched with all the shapes in the database. The correct matches (those in which the query shape and the retrieved shape belong to the same category) among the top 40 most similar (smallest dissimilarity) shapes are counted. The number of correct matches divided by 20 is the score of a retrieval. The retrieval rate of the bull's-eye test is the average score over all retrievals, where each shape is set as the query in turn.
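The scoring rule can be written down directly. The sketch below assumes a precomputed square dissimilarity matrix `dist` and a list of category labels; the parameter names `top` and `per_class` are hypothetical and default to the values used on this database.

```python
def bulls_eye_score(dist, labels, top=40, per_class=20):
    """Average fraction of same-category shapes among the top matches.

    dist[q][j] is the dissimilarity between query q and shape j; the query
    itself is included in the ranking, as it is matched with all shapes.
    """
    n = len(labels)
    total = 0.0
    for q in range(n):
        ranked = sorted(range(n), key=lambda j: dist[q][j])[:top]
        hits = sum(1 for j in ranked if labels[j] == labels[q])
        total += hits / per_class
    return total / n


# Toy example: four 1D "shapes" in two categories of two shapes each.
positions = [0, 1, 10, 11]
labels = [0, 0, 1, 1]
dist = [[abs(a - b) for b in positions] for a in positions]
print(bulls_eye_score(dist, labels, top=2, per_class=2))  # 1.0
```

In the toy example every query retrieves itself and its category mate in the top 2, so the score is 1.0.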
Matching time is used to test the performance of each algorithm in terms of efficiency. Matching time refers to the time it takes to match the feature of the query to features of all shapes in the database.
The Precision-Recall curves of some descriptors are shown in Fig. 8. For quantitative analysis, the area enclosed by each curve and the coordinate axes is used to determine which descriptor is best. Table 4 shows that MSFDGF-SH-SF+SD has the largest area, 0.847558.

B. RESULTS ON SWEDISH PLANT LEAF DATABASE
Examples from the Swedish Plant Leaf database [48] are shown in Fig. 9. This database is often used to test the classification ability of a shape descriptor. The test method in [36] is used to test the classification ability of the MSFDGF-SH descriptor: 25 images randomly selected from each species are used as models, and the remaining images are used as testing images [43].
Matching time is also used to evaluate the performance of each algorithm in terms of efficiency. In the recognition experiment, matching time refers to the time it takes to match the feature of a testing image to the features of all model images.

C. RESULTS ON KIMIA 99 DATABASE
The Kimia 99 [47] database is commonly used. This database contains 9 categories, each containing 11 shapes, as shown in Fig. 10. In this experiment, each shape is set as the query and matched to the remaining shapes. Then the correct matches among the top 10 most similar shapes of each query are counted. The post-processing algorithm LP (label propagation) [3] for shape retrieval performs well on this database when used in combination with IDSC+DP. To be fair, all algorithms are combined with LP.
The results for MSFDGF-SH-DS+MD+LP can be seen in Table 6.
In terms of matching time, the global descriptors MSFDGF-SH (less than 1 ms), HSC (2.89 ms) and AP&BAP (9.57 ms) consume less time than the local descriptors IDSC+DP (392.05 ms) and SC+DP (592.73 ms).

D. RESULTS ON EXPANDED ARTICULATED DATABASE
The Articulated database [36] is a database to test the articulation insensitivity of shape descriptors. It contains 8 categories, each containing 5 shapes. The Tools database [51], which has the same function, contains 7 categories, each containing 5 shapes. In our study, these two databases are merged into a new database Expanded Articulated Database. Obviously, Expanded Articulated Database contains 15 categories, each containing 5 shapes (see Fig. 11).
The test method is the same as that in the experiment on the Articulated Database in [36]. In this test method, each shape is set as the query and matched with the other shapes in the database. Then the correct matches among the top 4 most similar shapes of each query are counted. The combination (IDSC) of SC and ID (inner-distance) [36] performs well on the Articulated Database. ID can also be used in combination with MSFDGF-SH, so this database is also used to test the compatibility of MSFDGF-SH with ID.
In terms of matching time, the approaches using MSFDGF-SH (less than 2.5 ms), HSC (2.69 ms) and AP&BAP (7.52 ms) consume less time as global descriptors than IDSC+DP (308.56 ms) and SC+DP (467.21 ms) as local descriptors.

E. DISCUSSION
All three versions of MSFDGF-SH (MSFDGF-SH-MF, MSFDGF-SH-DS and MSFDGF-SH-SF) exceed the classical IDSC+DP in terms of both effectiveness and efficiency on the MPEG-7 CE-1 Part B shape database. Notably, one version, MSFDGF-SH-SF, exceeds HSC, AP&BAP and even Shape Tree. This is the first time that a Fourier descriptor exceeds Shape Tree on this dataset in terms of both accuracy and speed. MSFDGF-SH performs better than IDSC+DP, SC+DP and AP&BAP on Swedish Plant Leaf. These results show that MSFDGF-SH is robust across different application scenarios. MSFDGF-SH also performs effectively on the other datasets, especially on Kimia 99, on which all three versions of MSFDGF-SH achieve perfect performance. On the Expanded Articulated Database, all three versions of MSFDGF-SH exceed IDSC+DP, which is known for handling articulated shapes well.

VI. CONCLUSION
AP&BAP [42] is a milestone for global shape descriptors. Many researchers have attempted to design effective global descriptors; however, it is difficult to achieve both effectiveness and efficiency. The MSFDGF framework and the MSFDGF-SH descriptor proposed in this article are a new attempt.
Our experiments show that MSFDGF is a flexible and effective framework to form a global descriptor. The descriptors using MSFDGF, such as MSFDGF-SH, are both efficient and effective.