Discriminative Features for Texture Retrieval Using Wavelet Packets

Wavelet Packet (WP) bases are explored in search of new discriminative features for texture indexing. The task of WP feature design is formulated as a learning decision problem by selecting the filter-bank structure of a basis (within a WP family) that offers an optimal balance between estimation and approximation errors. To address this problem, a computationally efficient algorithm is adopted that uses the tree-structure of the WP collection and the Kullback-Leibler divergence as a discrimination criterion. The adaptive nature of the proposed solution is demonstrated in synthetic and real data scenarios. With synthetic data, we demonstrate that the proposed features can identify discriminative bands, which is not possible with the standard Wavelet decomposition. With real texture data, we show performance improvements with respect to the conventional Wavelet-based decomposition used under the same conditions and model assumptions.


I. INTRODUCTION
In the current information age, we have access to unprecedented sources of digital image content. Consequently, being able to index and organize these documents based solely on the content extracted from the signals, without relying on metadata or expensive human annotations, has become a central problem [1]-[21]. In this context, an important task in image processing is texture retrieval. This problem has been richly studied over the last two decades with different frameworks and approaches [3]-[21], including, more recently, deep learning approaches [22]-[29].
In a nutshell, the texture retrieval problem can be formulated in two stages. The first stage, feature extraction (FE), implies the creation of low-dimensional descriptions of the image (i.e., the dimensionality reduction phase) with the objective of capturing the semantic high-level information that discriminates relevant texture classes. The second stage proposes a similarity measure (SM) on the feature space to compare and organize the images in terms of their signal content.
The associate editor coordinating the review of this manuscript and approving it for publication was Gangyi Jiang.
For the FE stage, the Wavelet transform (WT) has been widely adopted as a tool to decompose and organize the signal content in sub-spaces associated with different levels of resolution (or scale) information [30], [31]. Based on this sub-space decomposition, energy features have been used as a signature that represents the salient texture attributes for texture indexing [32]. For the SM stage, numerous approaches have been proposed to compare images based on their feature representations using perceptual studies, heuristics and statistical principles [3]-[6], [9], [11], [14], [19], [33]-[35]. For statistical principles, we highlight the seminal work of Do and Vetterli [4] and its numerous extensions [12], [16], [19], [33]. They proposed a hypothesis testing (HT) formulation, where the optimal (joint) process for FE and SM is derived in closed form.
Convolutional neural networks (CNNs) have also been adopted for the task of image retrieval because of their great success in computer vision, including object detection and classification [37], [38], visual object tracking [39], [40], image segmentation [41], automatic image cropping [42] and saliency detection [39], [40], [42]-[45]. Deep learning models are often trained with large datasets. However, in the particular case of texture retrieval, the available databases are often limited in size. For that reason, Liu et al. [22] adopted a pre-trained CNN to obtain features for the texture retrieval task. Despite all the advances in deep learning, the lack of larger databases for texture retrieval means that traditional techniques remain important for texture retrieval tasks [46], [47].
In this paper, we follow the HT formulation presented in the seminal work by Do and Vetterli [4]. Motivated by the recent adoption of convolution-based deep representations (obtained from different layers), we revisit the FE phase by studying the rich family of Wavelet Packets (WPs). WPs are bases that offer a rich collection of decompositions of the image space in terms of space-scale components [30], [31]. This collection is efficiently constructed by several layers of convolution (linear) and down-sampling (non-linear) operations using a two-channel filter (TCF) as a basic building block. From this hierarchical construction, WPs can be organized and indexed by a collection of embedded trees of different depths and structures. A particular example of the WP family is the Wavelet basis, which is obtained by iterating the TCF in the low-frequency band, thereby producing a multiresolution partition of the image space. However, we can construct a rich collection of sub-space decompositions of the image space by iterating the TCF in different bands [31], gaining access to information from different layers in the context of the convolution-based network of representations that is used to create WP bases [30], [31]. This rich space-scale family of decompositions of the image space (organized in a network of quad-trees) can be used to find not only new convolution-based features that could be more effective for texture discrimination, but also new representations that can be adapted to the task (i.e., learned from some available data).

A. CONTRIBUTION
The contributions of this study include the following:
• We revisit the statistical framework in [4] to include a family of transform-based representations of the image space with a non-uniform number of transform coefficients per sub-band, and within this context, we introduce WPs for texture retrieval.
• We formulate the design of WP discriminative features (for texture retrieval) as a basis selection (BS) problem.
• BS is presented as a learning-decision task to identify the basis that offers the optimal tradeoff between approximation (feature discrimination) and estimation (complexity) errors.
• We show that BS is equivalent to a minimum cost-tree pruning problem with natural connections with the type of minimum cost-tree pruning algorithms used in classification and regression trees (CART) [48].
• A systematic experimental validation of the proposed WP solutions is presented covering synthetic data as well as four real datasets with different image sizes and texture classes. The results are promising and demonstrate the ability of our framework to obtain task-oriented discriminative features (learnt from data).
• Finally, we show that WP representations offer concrete improvements when compared with Wavelet-based features in several scenarios. The newly designed WP features explore new frequency bands that help to enhance the discrimination among texture classes.

B. ORGANIZATION
The rest of the paper is organized as follows: Section II introduces WPs and the setting of the HT problem. Section III presents the mathematical formulation of WP based texture retrieval and Section IV addresses the learning problem of WP basis selection. Section V summarizes the steps implemented in our methodology. Section VI presents the experimental validation with real and synthetic data. Section VII discusses the connection between our work and deep learning representations. Section VIII concludes our study providing final remarks and future directions for this research.

II. PRELIMINARIES
This section provides a brief background on WP and the statistical setting used for the indexing problem. Comprehensive expositions can be found in [30], [31], [49].
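Finally, a basis in the WP collection is indexed by […] Importantly, for each $(j_i, p_i)$, we can obtain an equivalent fil[…]
Here, we present a version of this result that considers a non-uniform rate of information (coefficients) per sub-band, which is needed for the case of WP texture analysis. Let us assume that $\mu_{\theta_i}$ is equipped with a density function $f_{\theta_i}(x)$ that is fully characterized by a vector of parameters $\theta_i \in \Theta$. Then, we use the log-likelihood $\log f_{\theta}(x)$ for each $\theta \in \bar{\Theta} = \{\theta_i : i = 1, \ldots, M\}$ to solve Eq. (3). Furthermore, we consider that $x = (x_1, \ldots, x_T)$, where each $x_t = (x_{t,1}, \ldots, x_{t,D_t})$ is a finite dimensional vector of dimension $D_t \geq 1$ and, consequently, $D = \sum_{t=1}^{T} D_t$. Consistent with this Cartesian product splitting of X, we assume that each $\theta_i = (\theta_i^1, \ldots, \theta_i^T)$ and
$$f_{\theta_i}(x) = \prod_{t=1}^{T} \prod_{k=1}^{D_t} f_{\theta_i^t}(x_{t,k}). \tag{4}$$
More specifically, Eq. (4) means that $x_1, \ldots, x_T$ is a set of independent but non-identically distributed vectors, where each component $x_t$ corresponds to i.i.d. realizations of the density $f_{\theta_i^t}$. We assume that the query image x comes from an underlying parametric model $\mu_{\theta_q} \in P(X)$ that has the same independent structure stated for the database models, i.e., $\theta_q = (\theta_q^1, \ldots, \theta_q^T)$ and $x_{t,1}, \ldots, x_{t,D_t}$ are i.i.d. realizations of $f_{\theta_q^t}$ for each $t \in \{1, \ldots, T\}$. In this context, if we fix t and we take $D_t \to \infty$ (and consequently $D \to \infty$), the law of large numbers implies that [50]
$$\lim_{D_t \to \infty} -\frac{1}{D_t} \sum_{k=1}^{D_t} \log f_{\theta_i^t}(x_{t,k}) = D_{\mathrm{KL}}(f_{\theta_q^t} \,\|\, f_{\theta_i^t}) + h(f_{\theta_q^t}), \tag{5}$$
where $D_{\mathrm{KL}}(\cdot \,\|\, \cdot)$ is the Kullback-Leibler divergence and $h(f_{\theta_q^t})$ is the differential entropy [51] of $f_{\theta_q^t}$. Finally, assuming a non-uniform rate of convergence, we define
$$w_t = \lim_{D \to \infty} \frac{D_t}{D} \tag{6}$$
for all $t \in \{1, \ldots, T\}$, such that $\sum_{t=1}^{T} w_t = 1$. Then, asymptotically (as the number of observations goes to infinity), the ML principle in Eq. (3) reduces to the minimum weighted divergence decision: the selected model index $i^*$ satisfies
$$\sum_{t=1}^{T} w_t \, D_{\mathrm{KL}}(f_{\theta_q^t} \,\|\, f_{\theta_{i^*}^t}) \leq \sum_{t=1}^{T} w_t \, D_{\mathrm{KL}}(f_{\theta_q^t} \,\|\, f_{\theta_i^t}), \quad \forall i = 1, \ldots, N. \tag{7}$$
We will show that the non-uniform rate assumption in Eq. (6) is important for the adoption of this HT framework in the context of WPs.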

III. WAVELET PACKET BASED TEXTURE RETRIEVAL
We contextualize the framework in Section II-B for texture indexing when images are represented in a WP domain. For that, we propose an extension of the inter-band independence texture model structure in [4], [16], [33] considering the sub-space partitions induced by WPs. First, we introduce some important notations and nomenclature to formalize the idea.

A. ROOTED TREE-REPRESENTATION
We adopt a rooted tree notation for the WP family. A tree-structure creates a particular WP basis by iterating the four-channel filter (FCF) as illustrated in Fig. 2b. Let J denote the maximum number of iterations of the sub-band and E be the collection of arcs on V × V that characterizes the full balanced rooted tree with root $v_{root} = (0, 0)$ in Fig. 1.
Instead of representing a tree as a collection of arcs in G, we use the convention used by Breiman et al. [48] in which sub-graphs are represented as a subset of nodes of the full graph. In particular, a rooted quad-tree $T = \{v_0, v_1, \ldots\} \subset V$ is defined as a collection of nodes: the root, internal nodes and leaf nodes. We define $L(T)$ as the set of leaves of T and $I(T)$ as the set of internal nodes, where consequently $T = L(T) \cup I(T)$. We say that a rooted quad-tree S is a subtree of T if $S \subset T$ and the roots of S and T are the same. Then, S is a pruned version of T, denoted by $S \ll T$. If the root of S is an internal node of T, then S is a branch of T. For any $v \in T$, we denote the largest branch of T rooted at v by $T_v$. The size of a rooted quad-tree T is the cardinality of $L(T)$ and is denoted by $|T|$. Finally, $T_{full} \equiv V$ denotes the full quad-tree and, consequently, the collection of WP bases is indexed by $\{T \subset V : T \ll T_{full}\}$.
Any pruned version of $T_{full}$ in Fig. 2b represents a particular WP basis by the iteration of the FCF. In particular, for an arbitrary rooted tree $T \ll T_{full}$ of size M with leaves $L(T) = \{(j_k, p_k) : k = 1, \ldots, M\}$, each of its leaf nodes represents a sub-space generated by the application of WP. Then, the sub-space decomposition produced by WP with tree-representation T is given by:
$$X = \bigoplus_{k=1}^{M} U_{j_k}^{p_k}.$$
Each of these sub-spaces is induced by a basis $B_{j_k}^{p_k}$ with $k = 1, \ldots, M$, so the WP basis induced by T is given by
$$B_T = \bigcup_{k=1}^{M} B_{j_k}^{p_k}.$$
Finally, for any $x \in X$, we can determine the transformed coefficients by projecting the image onto the elements of $B_T$ (see Eq. (20) in the Appendix). In particular, considering the realistic finite dimensional case, where $\dim(X) = 4^J$ (2D dyadic image),⁴ for any $(j_k, p_k)$, the projection of x in $U_{j_k}^{p_k}$ is determined by $4^{J - j_k}$ transformed coefficients that are obtained by Eq. (21). For simplicity, we map the index $(n_1, n_2) \in \{1, \ldots, 2^{J - j_k}\}^2$ to $n \in \{1, \ldots, 4^{J - j_k}\}$ to represent the transformed coefficients as a 1D vector. Thus, the trans[…] and the transformed coefficients for the basis $B_T$ are given and represented by […] It is worth noting that the number of transformed coefficients of the node $(j, p) \in V$ scales like $4^{J - j}$.
⁴ Without loss of generality the pixel based representation of x corresponds to the transformed coefficients of the trivial WP basis […]
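The node indexing and the coefficient scaling described above can be illustrated with a minimal Python sketch. It assumes (this is not stated in the text) that the four children of a node $(j, p)$ are indexed as $(j+1, 4p+i)$ for $i = 0, \ldots, 3$ and that $i = 0$ is the low-frequency child; the function names are illustrative only.

```python
def children(node):
    """Four children of node (j, p) in the full quad-tree (assumed indexing)."""
    j, p = node
    return [(j + 1, 4 * p + i) for i in range(4)]


def wavelet_leaves(depth):
    """Leaves of the Wavelet tree: only the low-frequency child is split again."""
    leaves, node = [], (0, 0)
    for _ in range(depth):
        kids = children(node)
        leaves.extend(kids[1:])  # the three detail sub-bands become leaves
        node = kids[0]           # keep iterating the (assumed) low-frequency band
    leaves.append(node)
    return leaves


def num_coefficients(node, J):
    """Number of transformed coefficients of node (j, p): 4^(J - j)."""
    j, _ = node
    return 4 ** (J - j)


J = 7                         # a 128 x 128 image has 4^7 pixels
leaves = wavelet_leaves(3)    # Wavelet tree with three iterations
total = sum(num_coefficients(v, J) for v in leaves)
print(len(leaves), total == 4 ** J)
```

Under these assumptions, the leaf counts of any pruned tree add up to the number of pixels, which is why deeper leaves necessarily carry fewer coefficients.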

B. TEXTURE RETRIEVAL
Let us consider an arbitrary tree T and assume that, for any $x \in X$, its pdf has the following product of marginals structure: […] (11)
In Eq. (11), the components of the image projected onto the sub-bands of the WP are independent and, within each sub-band, the transformed coefficients are i.i.d. and characterized by a pdf. For this pdf, we consider the simplest model adopted in [4], i.e., the Generalized Gaussian model (GGM) with zero mean, parametrized by $\theta = (\alpha, \beta) \in \mathbb{R}^2$ in the following way:
$$f_{\theta}(x) = \frac{\beta}{2 \alpha \Gamma(1/\beta)} \exp\left(-\left(|x| / \alpha\right)^{\beta}\right), \tag{12}$$
where $\Gamma(\cdot)$ is the Gamma function. Finally, it is important to note the scaling of the number of transformed coefficients for an arbitrary node $(j_k, p_k) \in L(T)$. If we denote the size of the image by $L = 4^J$ for some $J > 0$ (the dyadic case studied in Section III-A), then the size of the vector $D_{j_k}^{p_k}(x)$ is $L / 4^{j_k}$, which is exclusively a function of $j_k$ (the number of arcs that connect $(j_k, p_k)$ with the root (0, 0)). If T is not a balanced tree [48], [52], we have an asymmetric number of transformed coefficients per sub-band. This should be considered in the asymptotic connection derived between ML and the divergence principle in Eq. (7).
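For the zero-mean GGM adopted from [4], the Kullback-Leibler divergence between two sub-band models has a closed form, which is what makes the divergence-based similarity cheap to evaluate once $(\alpha, \beta)$ are known. The following Python sketch evaluates the GGM log-density and that closed form; the function names are illustrative and not taken from the paper.

```python
import math


def ggm_logpdf(x, alpha, beta):
    """Log of beta / (2 alpha Gamma(1/beta)) * exp(-(|x| / alpha)^beta)."""
    return (math.log(beta) - math.log(2.0 * alpha) - math.lgamma(1.0 / beta)
            - (abs(x) / alpha) ** beta)


def ggm_kl(a1, b1, a2, b2):
    """Closed-form KL divergence D(f_(a1,b1) || f_(a2,b2)) for zero-mean GGMs."""
    log_ratio = (math.log(b1) - math.log(b2) + math.log(a2) - math.log(a1)
                 + math.lgamma(1.0 / b2) - math.lgamma(1.0 / b1))
    # (a1 / a2)^b2 * Gamma((b2 + 1) / b1) / Gamma(1 / b1), computed in log-space
    cross = (a1 / a2) ** b2 * math.exp(
        math.lgamma((b2 + 1.0) / b1) - math.lgamma(1.0 / b1))
    return log_ratio + cross - 1.0 / b1


# With beta = 2 the GGM is a Gaussian with standard deviation alpha / sqrt(2),
# so the result can be compared against the Gaussian KL divergence.
sigma1, sigma2 = 1.0, 2.0
kl = ggm_kl(sigma1 * math.sqrt(2.0), 2.0, sigma2 * math.sqrt(2.0), 2.0)
print(kl)
```

The Gaussian special case is a convenient sanity check, since there the divergence reduces to $\log(\sigma_2/\sigma_1) + \sigma_1^2 / (2\sigma_2^2) - 1/2$.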
In the context of the texture indexing problem, we have M probability models representing the texture database, $\{\theta_i = (\theta_i^{(j_k, p_k)})_{k=1,\ldots,K} : i = 1, \ldots, M\}$, and a query model $\theta_q = (\theta_q^{(j_k, p_k)})_{k=1,\ldots,K}$ that produces a realization x. Then, the solution of Eq. (3), considering the regime $L \to \infty$ and $T \ll T_{full}$, reduces to selecting the database model i that minimizes the weighted (Kullback-Leibler) divergence
$$\sum_{k=1}^{K} 4^{-j_k} \, D_{\mathrm{KL}}\left(f_{\theta_q^{(j_k, p_k)}} \,\|\, f_{\theta_i^{(j_k, p_k)}}\right), \tag{13}$$
from Eqs. (7) and (12). Finally, to implement Eq. (13), a first stage (feature extraction) is conducted to obtain a sufficient statistic that summarizes the information of x in each of the sub-bands indexed by $(j_k, p_k)$. In particular, the ML criterion is used to estimate θ […]
Remark 1: It is worth noting the importance of the non-uniform scaling of the number of transformed coefficients considered in Section II-B, because it follows that $w_{(j_k, p_k)} = 4^{-j_k}$ for all $k = 1, \ldots, K$ in Eq. (13). Consequently, for the selection of the closest M models, the terminal nodes that are closer to the root are more significant in the decision than nodes that are deeper in the tree. This is because the number of coefficients of the former group is orders of magnitude greater than the number of coefficients of the latter group.
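The decision rule and Remark 1 can be sketched in a few lines of Python. The sketch assumes that every model is stored as a dictionary mapping a leaf $(j, p)$ to its GGM parameters $(\alpha, \beta)$; the toy tree, the model names and the parameter values are hypothetical.

```python
import math


def ggm_kl(a1, b1, a2, b2):
    """Closed-form KL divergence between zero-mean GGMs with parameters (alpha, beta)."""
    return (math.log((b1 * a2) / (b2 * a1))
            + math.lgamma(1.0 / b2) - math.lgamma(1.0 / b1)
            + (a1 / a2) ** b2
            * math.exp(math.lgamma((b2 + 1.0) / b1) - math.lgamma(1.0 / b1))
            - 1.0 / b1)


def weighted_divergence(query, model):
    """Sum over the leaves (j, p) of 4^(-j) * KL(query band || model band)."""
    return sum(4.0 ** (-j) * ggm_kl(*query[(j, p)], *model[(j, p)])
               for (j, p) in query)


def retrieve(query, database, n_closest):
    """Names of the n_closest models, ordered by increasing weighted divergence."""
    ranked = sorted(database,
                    key=lambda name: weighted_divergence(query, database[name]))
    return ranked[:n_closest]


# Hypothetical 7-leaf tree: three leaves at depth 1 and four leaves at depth 2.
leaves = [(1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (2, 2), (2, 3)]
weights_sum = sum(4.0 ** (-j) for j, _ in leaves)   # 3/4 + 4/16

query = {leaf: (1.0, 1.5) for leaf in leaves}
texture_a = dict(query)                             # same parameters as the query
texture_b = {leaf: (2.0, 0.8) for leaf in leaves}   # different in every band
ranking = retrieve(query, {"texture_b": texture_b, "texture_a": texture_a}, 2)
print(weights_sum, ranking)
```

In this toy tree the weights $4^{-j}$ add up to one, and each depth-1 leaf weighs four times as much as a depth-2 leaf, which is the effect described in Remark 1.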

IV. WAVELET PACKET BASIS SELECTION
Any tree $T \ll T_{full}$ in the family of WPs provides a valid representation for the indexing problem. This raises the problem of basis selection (BS). A clear objective for this task is to seek the basis that maximizes the discrimination among the texture classes considering the HT in Section III-B. However, texture discrimination is not the only criterion for this task. The complexity of the tree also needs to be considered, as a large tree (in terms of the number of leaves) implies deeper leaves with a reduced number of transformed coefficients to estimate the parameters in the FE phase of the indexing task (see Eq. (13)). This issue introduces a non-trivial estimation error in the FE phase that needs to be considered for BS. For that reason, BS can be posed as a statistical learning problem that finds an optimal balance between an estimation and an approximation error [48], [52]-[54]. In particular, we state the following regularization problem: […] (14)
where $\hat{R}(T)$ models the discrimination quality of the features induced by T, the penalty term represents its learning complexity, and λ is a regularization parameter that models the compromise between fidelity and cost in this context. In particular, we adopt the tree size $|T|$ as the complexity measure, as it has been used in CART [48] and other tree learning problems [52], [55]-[58] to model estimation errors. The assumption here is that the deviation of the estimated parameters in Eq. (13) from the true parameters is proportional to the size of the tree [48], [52]. For the fidelity measure $\hat{R}(T)$, we consider a global indicator of the pair-wise weighted divergence, used in Eq. (13), between classes given by […] (15)
where the models indexed by $c = 1, \ldots, M$ denote (in the simplest case) the selection of one model per class from the database. The use of the weighted divergence as an indicator of the discrimination capacity of the indexing task is justified by Stein's lemma [51, Th. 12.8.1], where the weighted divergence determines the error exponent of the type 2 error given a fixed type 1 error in a two-class (hypothesis testing) problem.

A. MINIMUM COST-TREE PRUNING ALGORITHM
The type of regularization problem stated in Eq. (14) has been addressed in the context of decision trees, for which efficient solutions are available [48], [52]. For the application of these results in our context, the fidelity measure $\hat{R}(T)$ must be additive [48], [52], which follows from its construction as
$$\hat{R}(T) = \sum_{(j_t, p_t) \in L(T)} \hat{R}((j_t, p_t)),$$
with $\hat{R}((j_t, p_t))$ defined as […] [48], [52].
[…] {t}
end if
end for
Set k = k + 3
end while
Ensure: Sequence of tree-structure decompositions $T^k$
Finally, solving Eq. (14) requires knowing the true tradeoff between the fidelity and cost functions, which we denote by λ*. In this work, the selection of λ* is done considering an empirical risk minimization (ERM) approach over the admissible set of tree solutions given by $\{R_1, \ldots, R_m\}$.
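To make the role of additivity concrete, the following Python sketch finds, for every admissible size k, the pruned quad-tree whose leaves maximize an additive fidelity. It is a brute-force dynamic program written for illustration and is not the pruning procedure of Algorithm 1; the child indexing $(j+1, 4p+i)$ and the toy fidelity values are assumptions.

```python
def best_by_size(node, fidelity, max_depth):
    """Map k -> (score, leaves) for the best subtree rooted at `node` with k leaves."""
    j, p = node
    table = {1: (fidelity[node], [node])}   # the node itself kept as a leaf
    if j == max_depth:
        return table
    combined = {0: (0.0, [])}               # merge the four child tables one by one
    for i in range(4):
        child = best_by_size((j + 1, 4 * p + i), fidelity, max_depth)
        merged = {}
        for k1, (s1, l1) in combined.items():
            for k2, (s2, l2) in child.items():
                cand = (s1 + s2, l1 + l2)   # additivity: scores of leaves add up
                if k1 + k2 not in merged or cand[0] > merged[k1 + k2][0]:
                    merged[k1 + k2] = cand
        combined = merged
    table.update(combined)
    return table


# Toy per-node fidelity on a depth-2 quad-tree; node (2, 9) is made discriminative.
fidelity = {(0, 0): 0.0}
fidelity.update({(1, p): 0.1 for p in range(4)})
fidelity.update({(2, p): 0.02 for p in range(16)})
fidelity[(1, 2)] = 0.5
fidelity[(2, 9)] = 0.9

family = best_by_size((0, 0), fidelity, max_depth=2)
for k in sorted(family):
    print(k, round(family[k][0], 2))
```

Each split replaces one leaf with four, so the admissible sizes grow in steps of three (1, 4, 7, ...), consistent with the update k = k + 3 above. In this toy example the size-7 solution splits the branch that contains the discriminative node.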

V. SUMMARY OF THE MODELING STAGES
The different stages presented in this work in Sections III and IV are summarized in Algorithm 2.

VI. EXPERIMENTAL ANALYSIS
In this section, we present the results for the family of adaptive trees obtained as the solutions of Eq. (17).
⁶ The complexity of this algorithm is $O(|T_{full}| \cdot \log(|T_{full}|))$. See details in [52] and references therein.

Algorithm 2 Wavelet Packet Texture Retrieval Procedure
Step 1: Get images from textures without overlap.
Step 2: Compute Wavelet Packet coefficients from a full decomposition.
Step 3: Wavelet Packet decomposition is represented as a tree-structure, where each node contains coefficients.
Step 4: Compute the parameters α and β of the GGM that represent the Wavelet Packet coefficients.
Step 5: Compute the best basis representation T * , using Algorithm 1.
Results will be presented for different datasets to evaluate the adaptive nature of the framework and its performance as a function of the tree size. In particular, we will evaluate the family of WP solutions indexed by $T^{k*}$, $k = 3l + 1$ with $l = 0, \ldots, (|T_{full}| - 1)/3$, in terms of the performance as well as the structure of their filter-bank decomposition and partition of the 2D frequency plane.

A. SYNTHETIC TEXTURE SCENARIO
Before presenting the results on real texture databases, we evaluate the adaptive capacity of our framework on a controlled (synthetic) two-texture indexing problem. For this purpose, WP statistical models (following the GGM presented in Section III) were selected by simply choosing a balanced tree of depth 3. In this context, the statistics of the GGMs for each of their leaves were considered the same except for one of the terminal nodes, which is the node that (by design) offers the discrimination power for the task (see the expression in Eq. (7)). The idea of this evaluation was to localize all the texture discrimination information in a specific sub-band to see if the solutions in Eq. (17) promote more resolution in this target frequency band as k grows. Synthetic samples were created by simulating the transformed coefficients of the two models. From these data, the fidelity measure was estimated using Eq. (15) considering the weighted divergence for 16 examples per class. Fig. 3 reports the estimation of the weighted divergence as well as the energy, which was considered in this analysis as a reference (non-discriminative) fidelity indicator. In particular, these figures plot the term in Eq. (15) associated with the additive contribution of the node $(j_t, p_t)$, indexed by row $j_t$ and column $p_t$. From these figures, it is possible to see how the weighted divergence appropriately captures the frequency bands that are more discriminative for the task. For illustration, we consider two contexts where the discriminative bands are indexed by the pairs (3, 21) and (3, 49), respectively. From the result, we see that our method captures the discrimination of this task in the right bands and, consequently, $T^{k*}$, $k = 3l + 1$ with $l = 0, \ldots, (|T_{full}| - 1)/3$, increases the resolution in the bands that are more informative.
In contrast, Figure 3 shows that the amount of energy in the leaves is not always mapped to the parents and also the modified node is not always the most energetic one. As expected, those results indicate that energy as a fidelity measure does not capture the most informative bands for the task.
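A minimal Python sketch of how such a synthetic pair of models could be simulated is given below. The GGM parameter values, the image size J and the function names are hypothetical; the sampler uses the fact that if G follows a Gamma distribution with shape $1/\beta$ and unit scale, then $\pm \alpha G^{1/\beta}$ (with a random sign) follows the zero-mean GGM with parameters $(\alpha, \beta)$.

```python
import random


def sample_ggm(alpha, beta, n, rng):
    """Draw n zero-mean GGM samples as sign * alpha * G^(1/beta), G ~ Gamma(1/beta, 1)."""
    return [rng.choice((-1.0, 1.0)) * alpha
            * rng.gammavariate(1.0 / beta, 1.0) ** (1.0 / beta)
            for _ in range(n)]


def synthetic_models(depth, target, base=(1.0, 1.5), modified=(2.0, 1.5)):
    """Two models on a balanced quad-tree that differ only at the `target` leaf."""
    leaves = [(depth, p) for p in range(4 ** depth)]
    model_a = {leaf: base for leaf in leaves}
    model_b = dict(model_a)
    model_b[target] = modified
    return model_a, model_b


rng = random.Random(0)
model_a, model_b = synthetic_models(depth=3, target=(3, 21))

J = 6  # hypothetical image with 4^6 pixels: each depth-3 leaf holds 4^(6-3) = 64 coefficients
example_b = {leaf: sample_ggm(a, b, 4 ** (J - 3), rng)
             for leaf, (a, b) in model_b.items()}
print(len(example_b), len(example_b[(3, 21)]))
```

Because the two models agree on every leaf except the target one, any fidelity measure that is sensitive to discrimination should concentrate its mass on that leaf and its ancestors.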

B. REAL TEXTURE SCENARIOS
For the analysis on real texture datasets, we consider subsets from the VisTex [59], Brodatz [60], and STex [61] datasets. We also consider the ALOT [62] dataset. The subset of the VisTex database corresponds to the 40 color textures (of 512 × 512 pixels) used in [4]. The subset of Brodatz consists of 30 gray-scale textures of 640 × 640 pixels each, which is the setting used in [32]. Moving on to larger sets, we extract two collections of images from the STex and ALOT databases, respectively, to create the full STex, full ALOT, reduced STex, and reduced ALOT sets. The full STex and full ALOT contain 436 color and 250 gray-scale textures of 1024 × 1024 pixels and 1536 × 1024 pixels, respectively. The reduced STex and reduced ALOT consist of 40 color and gray-scale textures of 1024 × 1024 pixels and 1536 × 1024 pixels, respectively. Each texture from the VisTex and Brodatz datasets is divided into 16 and 25 non-overlapping textures of 128 × 128 pixels, respectively. Each texture of the full STex, full ALOT, reduced STex, and reduced ALOT is divided into 16 non-overlapping textures of 256 × 256 pixels. After these divisions, the color images are transformed into gray-scale versions and, subsequently, all images in the collections are normalized to zero mean and unit variance, which is a standard normalization used in previous studies [4], [5], [8], [9], [63]. In total, we use six databases with different image sizes and numbers of textures, which offer a rich context to evaluate the potential of WP-based texture indexing.
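The splitting and normalization steps can be sketched as follows in plain Python, with images stored as lists of rows. The helper names are illustrative, and the toy 8 × 8 image with 2 × 2 patches stands in for the 512 × 512 VisTex images divided into 16 patches of 128 × 128 pixels.

```python
import math


def split_into_patches(image, size):
    """Split a 2D list into non-overlapping size x size patches (row-major order)."""
    rows, cols = len(image), len(image[0])
    return [[row[c:c + size] for row in image[r:r + size]]
            for r in range(0, rows - size + 1, size)
            for c in range(0, cols - size + 1, size)]


def normalize(patch):
    """Shift and scale a patch to zero mean and unit variance.

    A constant patch has zero variance and would need special handling.
    """
    values = [v for row in patch for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [[(v - mean) / std for v in row] for row in patch]


# Toy gray-scale image: 8 x 8 pixels split into 16 patches of 2 x 2 pixels.
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
patches = [normalize(p) for p in split_into_patches(image, 2)]
print(len(patches))
```

The same helper applied to a 640 × 640 image with size 128 yields the 25 patches per texture used for Brodatz.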
For each context, we divide the dataset into training and testing sets in a proportion of 3:7, where 30% of the data is used to find the tree-structure solving Eq. (17) and the remaining 70% is used for the evaluation of the texture retrieval. The training phase involves computing the fidelity measure in Eq. (15) to obtain the family $T^{k*}$. The testing phase computes the performance of this family as a function of the size of the tree. For the retrieval performance, we use one query example per class in the dataset and we use the standard average recall metric adopted in [4], given by
$$\frac{1}{C} \sum_{c=1}^{C} \frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\{k_{m,c} = c\},$$
where C is the total number of classes in the dataset (in our case $C \in \{40, 30, 436, 250\}$), M is the number of examples we have per class (16 or 25 depending on the dataset) and $k_{m,c}$ denotes the true class of the m-th closest retrieved image when the query image belongs to the class c. Finally, the complete analysis uses Daubechies 4 as the mother Wavelet basis because it is the one that shows the best performance in our analysis.
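The average recall described above can be computed with a few lines of Python. The sketch assumes that the retrieval stage has already produced, for the query of each class c, the list of true classes of its M closest images; the data structure and the toy values are hypothetical.

```python
def average_recall(retrieved_classes):
    """Average, over classes, of the fraction of retrieved images with the query's class.

    retrieved_classes[c] lists the true classes of the M images closest to the
    query of class c, ordered from the closest to the farthest.
    """
    per_class = [sum(1 for k in ranked if k == c) / len(ranked)
                 for c, ranked in retrieved_classes.items()]
    return sum(per_class) / len(per_class)


# Toy example with C = 2 classes and M = 4 retrieved images per query.
toy = {0: [0, 0, 1, 0], 1: [1, 0, 0, 1]}
print(average_recall(toy))
```

In the toy example the per-class recalls are 3/4 and 2/4, so the average recall is 0.625.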

1) ANALYSIS OF THE SUB-BAND MODEL FITTING
Considering that WPs offer a rich range of filter-bank decompositions, we numerically evaluate the fitting of the GGM considered in Eq. (12) to model the statistical dependencies of the transform coefficients in each of the induced sub-bands as considered in Section III. We use the ML estimator of its parameters for numerous sub-bands and textures in our rich collection of datasets. In general, it was observed that the GGM captures the marginal statistics of the transformed coefficients of any arbitrary band and, consequently, the modeling extension adopted in Section III seems reasonable. For illustration, we present some of these fittings in Fig. 4.
VOLUME 7, 2019
FIGURE 4. Histograms and ML fitting of the WP sub-band coefficients associated with the nodes indexed by (2,4), (2,5), (2,6), (2,7), (2,12), (2,13), (2,14), (2,15). The Generalized Gaussian Model in Eq. (12) is used for the histogram fitting. The image 20_c1l1 from the ALOT database is used for illustration.
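As a rough illustration of how $(\alpha, \beta)$ can be fitted to sub-band coefficients, the following Python sketch uses a moment-matching estimator. This is a simpler substitute for the ML estimator used in the analysis above, not the same procedure. It relies on the GGM identities $E|X| = \alpha \Gamma(2/\beta) / \Gamma(1/\beta)$ and $E[X^2] = \alpha^2 \Gamma(3/\beta) / \Gamma(1/\beta)$; the search interval for β and the function names are assumptions.

```python
import math
import random


def moment_ratio(beta):
    """(E|X|)^2 / E[X^2] for a zero-mean GGM; it increases with beta."""
    return math.exp(2.0 * math.lgamma(2.0 / beta)
                    - math.lgamma(1.0 / beta) - math.lgamma(3.0 / beta))


def fit_ggm_moments(coeffs, lo=0.1, hi=10.0, iters=60):
    """Moment-matching estimate of (alpha, beta); beta is found by bisection."""
    n = len(coeffs)
    m1 = sum(abs(x) for x in coeffs) / n
    m2 = sum(x * x for x in coeffs) / n
    target = m1 * m1 / m2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    alpha = m1 * math.exp(math.lgamma(1.0 / beta) - math.lgamma(2.0 / beta))
    return alpha, beta


# Laplacian samples correspond to a GGM with alpha = 1 and beta = 1.
rng = random.Random(0)
laplacian = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(50000)]
alpha_hat, beta_hat = fit_ggm_moments(laplacian)
print(alpha_hat, beta_hat)
```

Moment matching is often used to initialize an iterative ML solver; for deep nodes with few coefficients, either estimator becomes noisy, which is the estimation error that the basis selection accounts for.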

2) WAVELET PACKETS RETRIEVAL PERFORMANCES
We present the retrieval performance associated with the solutions of Eq. (17) for each of the datasets adopting the weighted divergence in Eq. (15). For completeness, we also consider the tree solutions obtained for the same regularization problem but adopting the non-weighted divergence and the energy per band as alternative fidelity measures. Fig. 5 shows the retrieval performance as a function of the size of the tree for the three fidelity measures. The dashed lines correspond to the Wavelet decomposition, which serves as our baseline. Overall, we observe the expected tradeoff between estimation and approximation errors in the evolution of each of the performance curves. At the beginning, more complex trees significantly improve the retrieval accuracy. Then, the estimation error dominates, implying a saturation, which leads to a drop in retrieval performance as k increases. It is important to note that these changes in the regime of the performance curves are a function of the complexity of the task. For the smallest (VisTex and Brodatz) and largest (STex and ALOT) image size datasets, these changes occur for tree sizes in the range of 10 to 34.
From these curves, we can determine the trees that offer the optimal balance between estimation and approximation errors and, consequently, the best performance for the tasks. In general, the solution obtained with the proposed weighted divergence shows one of the best performances in almost all the scenarios. There are three exceptions, which occur for the Full ALOT, Reduced ALOT and Reduced STex datasets. However, in these cases the performance deviates from the best solution (obtained with the energy fidelity and non-weighted divergence) by only 0.76%, 1.58%, and 0.31%, respectively. It is also interesting to note that WP solutions obtained with the energy as a fidelity measure present competitive results in all the scenarios. The exceptions are the VisTex and Reduced ALOT datasets, where the non-weighted and weighted divergences show the best performance, respectively.
Finally, we provide the performance of the Wavelet representation as a baseline for performance comparison (see dashed lines in Fig. 5). Most notably, we confirm our conjecture that the family of WPs and their richer sub-band decompositions offer representations with relevant improvements in indexing performance. This gain is more prominent in the case of the largest datasets, where alternative WP trees show more significant gains with respect to the conventional Wavelet solution. For the case of smaller databases (VisTex and Brodatz), the improvements are not significant, which can be explained by the observation that, in the context of smaller texture images, WPs do not have room to take advantage of the discrimination power of non-conventional (non-Wavelet-type) frequency bands, as we rapidly move to the regime where the estimation error dominates the performance curves. Interestingly in this context, the smaller-sized trees (sizes 4, 7 and 10) match the Wavelet solution and, consequently, we recover the conventional Wavelet solution [4] as part of our basis selection formulation. Table 1 summarizes the best performance obtained (optimal trees) for each fidelity measure and the performance of the Wavelet solution. The last column of Table 1 shows the gain in relative percentage with respect to the Wavelet representation, which shows that the best improvements in retrieval performance are achieved for the larger-size texture datasets.

3) ANALYSIS OF THE OPTIMAL TREE STRUCTURE OF WPS
The filter-bank decompositions of the best WP solutions for each database are presented in Fig. 6. We observe that the solutions follow a Wavelet type of path, as low frequencies are iterated at the beginning of the WP decomposition. However, for many of the scenarios, there are non-trivial deviations from the Wavelet structure as other frequency bands are iterated in the process of creating the optimal trees. These non-Wavelet type bands offer better discrimination than the recursive iteration of the low frequency that defines a Wavelet decomposition.

VII. DISCUSSION: CONNECTION WITH CNN
A CNN can be seen as a filter-bank strategy (convolution-based), equipped with a deep architecture and some nonlinear stages designed to learn from data representations that capture salient features in low dimensionality. In the standard and ideal use of a CNN, the representations are learned from supervised data in an end-to-end manner by minimizing a loss function (for example, the cross entropy or Kullback-Leibler divergence) [37], [41], [43].
The WP framework presented in this work offers some interesting connections with deep learning features obtained from a pre-trained CNN. In terms of structure, our WP-based features share both the convolutional (linear) stage and the nonlinear stage (attributed to a dimensionality reduction phase) present in CNNs. In our setting, however, a statistical reduction is performed for each of the WP sub-bands by estimating the parameters α and β of the GGMs to reduce dimensionality. This lossy reduction per sub-space comes from our model assumption, in the sense that the α and β parameters are sufficient statistics when the minimum probability of error (MPE) criterion is adopted. Much work remains to be done to fully explore the connection between our framework, constructed over the idea of WP basis selection with an MPE criterion, and the data-driven approaches used to adapt pre-trained CNNs for image indexing [22], [23], [26], [29].

VIII. CONCLUSION AND FUTURE WORK
This work shows performance improvements for texture retrieval from the design of new filter-bank discriminatory features using the rich collection of Wavelet Packet (WP) bases. The tree-indexed WP collection was used to find an adequate balance between feature discrimination and learning (over-fitting) complexity. It should be noted that the WP filter-bank structure was central to addressing the problem of optimal representation with a minimum cost-tree pruning algorithm, which is reminiscent of the solution proposed in the context of classification and regression trees (CART). We show how adaptive the proposed WP solution is to the nature of the problem in terms of the number of classes and the size of the image (aspects of the problem that are tightly related to over-fitting). Overall, WPs offer features that outperform the Wavelet representation, which is evidence that the exploration of different sub-bands offers better texture discrimination than the standard Wavelet filter-bank partition.
This work is focused on the signal representation aspects of the transform-based representation. For representation analysis, we adopted one of the basic model structures used in [16], [33]. We leave the investigation of more complex and sophisticated models for future work. Along these lines, we conjecture that the methodology presented will benefit from the adoption of more sophisticated texture models that consider non-trivial spatial intra-band dependencies, inter-band dependencies [13], [16] and more complex parametric distributions [12], [33], [64], [65]. Another interesting direction for future work would be to derive precise expressions for the parameter estimation error, taking into account the observation that deeper nodes have fewer data for the estimation of the model parameters.
Eq. (22) is linear but not time invariant. Therefore, it is inaccurate to talk about the frequency response associated with the process of projecting $x(t_1, t_2)$ into the WP sub-space $U_j^p$. Pavez and Silva [55] addressed this issue by considering only the equivalent filtering part in Eq. (22). This consideration offers a characterization of the frequency content associated with each sub-space, from which we can define the frequency decomposition achieved by a given WP basis.