The Structural Information Potential and Its Application to Document Triage

This article introduces structural information potential (SIP), a measure of information based on the potential of structures to be informative about their content. An example of this concept is the clustered appearance that typically characterizes the first page of scientific articles, which summarizes the article’s contents and provides additional data, yielding potentially the largest and most diverse amount of information from a single page in the shortest time with the least effort. This characteristic makes SIP particularly well-adapted to triage tasks (i.e., rapid decision-making under conditions of uncertainty and limited resources), an application illustrated by means of a case study on classifying document images. The SIP method consists in unifying the Shannon entropy, the Fourier transform, the fractal dimension, and the golden ratio into a single equation and several algorithmic components. While the application domain is document images, the concept has generic character. The method results in a mathematically and perceptually coherent pattern space, characterized by continuous transition between uniform, clustered, and regular configurations, and corresponding to a structural information potential with a well-defined maximum. The maximum SIP leads to the identification of shapes and patterns with minimal structural redundancy, termed “fluorescent objects” as a complement to regular graphs and the Platonic solids.


I. INTRODUCTION
This article introduces a new measure of information, the Structural Information Potential (SIP). SIP defines information as the potential of structures to be informative about their nature and utility. A readily available example is the clustered appearance typical of the first page of scientific articles, such as this one; these pages summarize the article's content and provide additional metadata, thus yielding potentially the largest and most diverse amount of information from a single page in the shortest time with the least effort. The optimality of the data transmission rate, as influenced by the structure of the communication channel represented by documents, is also a critical factor in document triage, the main application studied in this article to evaluate the SIP measurement method.
In the document domain, the above observation is supported by compelling empirical evidence from the practice [1] [2] [3] [4], theory [5] [6] [7], psychology [8] [9] [10] [11] [12] [13], and history [14] [15] of document design (these references are a small representative sample from a much larger bibliography). For example, much of the effort invested in microtypography [16] [17] (already an aesthetic quality and commercial factor in the early days of typography) concerns the spacing of characters (kerning) and words (justification), character ligatures, hyphenation rules (ladders), trailing paragraph lines (widows and orphans), and so on, for the purpose of producing visually homogeneous pages of text (a characteristic denoted as "page gray") [18] [19]. The principal motivations for this painstaking effort are the desire to keep semantic units visually grouped, as well as to prevent vertical streaks (rivers) from emerging due to chance alignments of spaces or similar letters, drawing the reader's attention towards spurious shapes devoid of content-related information (see the six lines above) [20]. Conversely, an intentionally clustered page pattern is the result of graphic designers arranging distinct informational units so as to augment hierarchical and typological distinctiveness; the goal is to improve legibility, speed up access to information, and guide the reader's gaze with minimal interference. This strategy for written communication is a product of evolution, a centuries-long shift away from homogeneous layouts driven by the increasing availability of written information. Clustered patterns can also emerge as a natural part of the document lifecycle; these may be introduced in the post-production stage either intentionally (e.g., by layers of annotations) or accidentally (e.g., due to physical degradation). In the absence of specific search goals or prior knowledge about the content, the most informative documents are those with clustered patterns.
It is possible to generalize beyond documents, given that uniform and regular signals, images, objects, and events are in general less informative than structured entities. The paradigm resulting from these insights postulates a correspondence between informativeness, structure, a uniform-clustered-regular pattern continuum, scale-space filling, and structural redundancy. This article accordingly aims to devise a quantitative pattern description method for ordering patterns along said continuum, and further develops a conceptual framework to aid in identifying structures with minimal redundancy.
What this approach cannot provide is an estimation of information potential in the absence of structure, or a semantic content analysis. SIP is no substitute for text recognition and visual scene interpretation. Instead, its purpose is to characterize information at the level of structural organization.
Great strides in characterizing structural informativeness have been made in various scientific fields. However, the application of the proposed solutions to the task of classifying images, particularly text-based document images, has been found to be insufficient, as no prior approach has been able to satisfy both mathematical and perceptual desiderata. The present article substantiates this claim and presents a solution.
In essence, the proposed method characterizes a distribution in the scale-space domain with respect to the degree of redundancy. This characterization is achieved by unifying a number of classical concepts from the fields of information theory (Shannon entropy), pattern analysis (Mandelbrot's fractals), signal processing (the Fourier transform), combinatorics (the golden ratio), and graph theory (the chromatic concept) into a single analytic formula and several algorithmic components.
The application domain of the method is restricted in this article to images, more specifically to text-based document images. Given the generic nature of some of the core concepts, the concluding section considers its application to other media, such as video and three-dimensional data.
The relevance of the present article stems from the status of information measurement as a fundamental theoretical and practical issue across a broad range of scientific and technical fields. Its foremost contribution consists of an operational method to measure informativeness from structure, for application to images and possibly other data types. The elaboration of the pattern phase space and the identification of fundamental shapes with minimal structural redundancy are further theoretical contributions.
This article demonstrates a practical application of SIP to document triage via a real-life case study. The task in question is a type of classification similar to triage in emergency medicine, defined as rapid decision-making for critical matters under conditions of uncertainty and with limited resources. While triage cannot yield optimal solutions due to the constraints under which it operates, it is a useful and sometimes necessary step before more sophisticated procedures (such as semantic document analysis) are implemented. The proposed method fulfills the task in an explainable way, with reduced algorithmic complexity and assumptions. Section 2, "Related work", presents advantageous and limiting factors of major approaches to information measurement, explaining the need for a new method, while introducing elements used in the proposed method. Section 3, "Method", is the theoretical core of the article, in which the analytical formula and algorithmic components of the Structural Information Potential measurement method are described and justified; the design of structures with minimal redundancy is also discussed. Section 4, "Experiments", integrates the proposed method into a cybernetic perspective, by demonstrating how it helps translate patterns to information in support of a decision-making process. Section 5, "Discussion", concerns technical matters and future work. Section 6, "Conclusions", summarizes the theoretical and practical significance of the proposed method and applications.

II. RELATED WORK

A. GENERIC QUANTIFICATION OF PATTERN IRREGULARITY
Information theory has a central place among the domains relating pattern structures to informativeness. It is rooted in the Shannon entropy, $H$ [21], which defines the amount of information, uncertainty, and choice, and quantifies it via the well-known equation $H = -\sum_{k=1}^{n} p_k \log_2 p_k$, where $p_k$ is the occurrence probability of data class $k$ from among $n$ classes [22, p. 393]. The normalized variant is given by the relative entropy, $H_r = H/H_{max}$, $H_{max} = \log_2 n$ [22, p. 398]. This formulation produces a data distribution in which the extrema are on one hand the uniform distribution of values over all possible classes ($H_r = 1$), and on the other hand the concentration of values into a single class ($H_r = 0$). In the context of grayscale images, these correspond respectively to an image in which the number of present gray-level values equals the number of pixels, and a uniformly monochromatic image. The application domain of this equation is nominal data, i.e., independent categories, such as the values of a fair die. Therefore, this equation (and many of its variants) is not directly applicable to ordinal and structural data; given that the measure is independent of the configuration of sampling points (such as pixels in an image), it is not suitable for the goal pursued in this article.
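For concreteness, the relative entropy of nominal data can be computed as follows. This is a minimal sketch; `relative_entropy` is a hypothetical helper written for illustration, not code from the article:

```python
import numpy as np

def relative_entropy(values, n_classes):
    """Relative Shannon entropy H_r = H / log2(n) of nominal data."""
    counts = np.bincount(values, minlength=n_classes).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]                        # 0 * log 0 is taken as 0
    h = -np.sum(p * np.log2(p))         # Shannon entropy H, in bits
    return h / np.log2(n_classes)       # H_max = log2(n)

# The two extremes for a 256-level grayscale image:
mono = np.zeros(256, dtype=int)   # every pixel in one class (monochromatic)
flat = np.arange(256)             # every gray level present equally often
print(relative_entropy(mono, 256))   # minimum (0)
print(relative_entropy(flat, 256))   # maximum (1)
```

Note that the measure depends only on the histogram of class counts, which is precisely why it cannot distinguish spatial configurations of the same pixel values.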
Nevertheless, it is worth mentioning some notable entropy definitions as a way to exemplify the extent and vitality of research in this field, to point to research directions, to underscore the interdisciplinary character, and to identify ideas related to this article's topic. The Rényi and Tsallis entropies are generalizations of the Shannon entropy and define a parametrized family of entropies [23], [24]. Aiming at quantifying biodiversity, the field of ecology has contributed to the research on entropy with rich theories of diversity, as well as with formalisms, such as allowing for weighted probability classes, which express the similarity between species [25], [26]. From the field of nuclear physics comes the strength of structure [27, pp. 137-144], defined in 1939 by Satosi Watanabe (then a student of Werner Heisenberg) as $J$, the difference between the sum of entropies of the $u$ parts of a system, each containing $v_i$ entities, and the entropy of the whole, containing $n = \sum_{i=1}^{u} v_i$ entities:

$J = \sum_{i=1}^{u} H_i - H.$

This method has been utilized in document layout segmentation [28]. The inclusion of both parts and the whole in the characterization of structures is also important to the concepts addressed in this article. Approximate Entropy (ApEn) is a measure of "irregularity" created to extend information entropy to structural data [29] [30] [31]. ApEn has been refined through many parametric [32] [33] and algorithmic variants [34] [35] [36] and compares well with other methods [37] [38]. From its inception onwards, it has been successfully applied to various biosignals [37] [38], and later to a broad range of other applications such as online signature verification [39], speaker recognition [40], radar jamming [41], earthquake prediction [42], and cryptography [31] [43].
In brief, ApEn is obtained by sliding a signal over itself, measuring the distance between the two within a window of given length as the maximum of the pointwise differences, and computing a logarithmic average; this process is carried out for two different window lengths, and the irregularity index is obtained as the difference between these partial results. For a finite discrete signal, there exist both a lower and an upper bound; the lower bound is zero for perfectly periodic patterns, with higher values indicating greater irregularity. One very interesting aspect of this measure is that it is possible to compute number sequences with maximal irregularity (this upper bound is neither white noise nor a deterministic fractal) [44] [45]. The method may be extended to the analysis of images using vectorized bidimensional windows [46] [47]. This approach, in addition to improved parametrization [32], was used in this article to process document images. The results are discussed in Section 4, "Experiments", and illustrated in Fig. 6. While clustered documents are classified at one end of the spectrum, as desired, a mix of uniform (predominantly empty) and homogeneous (predominantly text) patterns appears at the other end, which is undesirable, since these patterns differ both in terms of appearance and information potential. This counter-intuitive behavior has been explained in the past by the observation that ApEn is a measure of irregularity rather than complexity [48].
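A compact (and deliberately naive, quadratic-time) sketch of the ApEn procedure for one-dimensional signals follows; the window length m = 2 and tolerance r = 0.2 are common defaults, not the improved parametrization cited above:

```python
import numpy as np

def approx_entropy(x, m=2, r=0.2):
    """Approximate Entropy: Phi(m) - Phi(m+1), with window similarity
    judged by the maximum pointwise (Chebyshev) distance."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def phi(m):
        # all overlapping windows of length m (the signal "slid over itself")
        w = np.array([x[i:i + m] for i in range(n - m + 1)])
        # pairwise maximum pointwise difference between windows
        d = np.max(np.abs(w[:, None, :] - w[None, :, :]), axis=2)
        c = np.mean(d <= r, axis=1)     # fraction of windows similar to each one
        return np.mean(np.log(c))       # logarithmic average

    return phi(m) - phi(m + 1)

periodic = np.tile([0.0, 1.0], 50)      # perfectly periodic: ApEn near zero
noisy = np.random.default_rng(0).random(100)
print(approx_entropy(periodic))
print(approx_entropy(noisy))            # larger: more irregular
```

The two calls illustrate the lower bound: the alternating sequence scores near zero, while uniform noise scores substantially higher.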
Fractals [49] are deterministic or stochastic self-similar or self-affine mathematical objects with an interesting property from the point of view of information theory: their clustered structure, which results from their infinite filling of the scale-space, maximizes their information potential. This is one of the reasons why fractal-like structures abound in nature; for example, the energetic intake of plants is optimized via the organization of branches and leaves around stems according to a power law [50]. One of the most reliable methods for determining the fractal dimension, $D$, uses spectral analysis and defines it via the slope, $\beta$, of the linear fit of the log-power spectrum vs. log-frequency: $D = (c + \beta)/2$, where $c = 6$ for an image and $c = 4$ for a signal [51, pp. 97-114]. Two requirements must be satisfied if the measured entity is to be deemed a fractal: the phase must be uniformly random [51, p. 99] and the power spectrum must follow a power law, $1/f^\beta$ [52] [53]. By definition, this is not the case for non-fractal structures that are nevertheless clustered, and even less so for other pattern types along the uniform-clustered-regular continuum. The fractal dimension is therefore not appropriate for characterizing patterns across such a broad spectrum. However, the concept of fractality does provide a useful framework for thinking about SIP, especially as concerns its maximal value.
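The spectral estimate of D can be sketched as follows. The sign convention for β (here, the raw fitted slope) and the small guard against log-of-zero are assumptions of this sketch:

```python
import numpy as np

def spectral_fractal_dimension(signal, c=4):
    """D = (c + beta)/2, with beta the slope of the linear fit of
    log-power vs. log-frequency (c = 4 for signals, 6 for images)."""
    spectrum = np.fft.rfft(signal)
    power = np.abs(spectrum[1:]) ** 2          # drop the DC component
    freq = np.arange(1, len(spectrum))
    beta, _ = np.polyfit(np.log(freq), np.log(power + 1e-12), 1)
    return (c + beta) / 2

# White noise has a flat spectrum (beta ~ 0), so D should come out near 2.
rng = np.random.default_rng(1)
print(spectral_fractal_dimension(rng.standard_normal(4096)))
```

Note that the fitted slope alone does not verify the two fractality requirements stated above (uniformly random phase, power-law spectrum), which is why a slope-based D is meaningless for non-fractal patterns.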
Spectral analysis is useful for the characterization of clusteredness. This is because, unlike spatial entropy, the frequency domain captures spatial organization in a compact manner that is amenable to mathematical manipulation, and also captures scale variation (in a similar way to the fractals). Two popular spectral analysis methods are spectral entropy and spectral flatness.
Spectral entropy, $H_s$, is defined [54] as the Shannon entropy of the probabilities, $P$, associated with the frequency components of the power spectrum, $S$, given by the discrete Fourier transform, $\mathcal{F}$, of the data, $X$, of length $n$:

$S = |\mathcal{F}(X)|^2, \quad P_k = S_k \Big/ \sum_{j=1}^{n} S_j, \quad H_s = -\sum_{k=1}^{n} P_k \log_2 P_k.$

Variations of the definition include obtaining the power spectrum from the discrete cosine transform or by way of autocorrelation, along with using entropies other than Shannon's. Understanding the properties of these analytical expressions so as to be able to thoroughly explain their effects on empirical data remains an active research field [55]. This is especially critical for the analysis of biomedical data, a major application domain of spectral entropy [56]-[58], where it is used in particular for the clinical interpretation of EEG signals (e.g., for monitoring the depth of patient sedation during anesthesia [59]). Audio signal analysis (e.g., urban soundscape classification [60], dolphin whistle segmentation [61], abnormal milling sound detection [62]) and speech analysis (e.g., speaker identification [63], noise quality assessment [64]) are other common application domains. Spectral entropy has been less frequently applied in image processing, but is used for image quality assessment [65], scene saliency analysis [66], and camera focus estimation [67]. The common goal is to distinguish between regular and irregular patterns, where the latter are usually of interest, a task for which spectral entropy has been found useful. The problem remains that "irregularity" has many possible formal definitions, and means different things for different data types, tasks, and contexts; moreover, the spectral entropy equation has its own peculiar effects on the data, with the interaction between the two not always being well understood. When applied to image classification, for example, it results in a mix of homogeneous and uniform patterns (see online Supplemental Material).
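The spectral entropy definition can be sketched directly; normalizing the entropy by its maximum so the result lies in [0, 1], and excluding the DC component, are assumptions of this sketch:

```python
import numpy as np

def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum (DC excluded),
    divided by its maximum so the result lies in [0, 1]."""
    s = np.abs(np.fft.rfft(x)[1:]) ** 2          # power spectrum S, no DC
    p = s / s.sum()                              # probabilities P_k
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(len(s))

t = np.arange(256)
print(spectral_entropy(np.sin(2 * np.pi * 8 * t / 256)))       # low: one spectral line
print(spectral_entropy(np.random.default_rng(0).random(256)))  # high: flat spectrum
```

A pure sinusoid concentrates its power in a single frequency bin and scores near zero, while noise spreads power across the spectrum and scores near one.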
Spectral flatness, or SF [68, pp. 112-115] [69], [70], is a widely used measure of signal structuredness [71]-[73] (e.g., as an audio descriptor in the MPEG-4 file format [74]). It is defined as the ratio of the geometric and arithmetic means of a signal's power spectrum, $S$, taken over its $n$ frequency components:

$SF = \left(\prod_{k=1}^{n} S_k\right)^{1/n} \Bigg/ \left(\frac{1}{n} \sum_{k=1}^{n} S_k\right).$

Its principal utility comes from the bounds of the expression: zero for an impulse and one for a uniform distribution in the frequency domain. This method also results in a perceptually unsatisfactory classification of images (Fig. 6).

Point set analysis and spatial statistics are concerned with the characterization of the distribution of point-like objects and events in space [75] [76] [77], while the related field of discrepancy theory deals specifically with characterizing the irregularities of distribution [78] [79]. Geographical information systems, ecology, astrophysics, and material science are among the typical application domains. We will herein briefly focus on the latter, as it bears a direct relation to both the topic and methods of the present article. Based on the empirical observation of the structure of physical matter, the continuous pattern space extending from uniform to clustered to homogeneous has been identified as a useful classification concept in material science to characterize such properties as surface roughness and particle dispersion. Much effort has therefore been invested in quantifying these patterns, with some of the classical methods being based on the nearest-neighbor distribution, morphological operations (dilation followed by counting), and Dirichlet tessellation [80]. To date, some of the best-performing approaches rely on spectral fractal analysis, using variations of the methods described in the preceding paragraphs [81] [82, pp. 81-98] [51, p. 108].
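The spectral flatness ratio defined earlier in this subsection can be sketched as follows; the small additive guard against log-of-zero and the exclusion of the DC component are assumptions of this sketch:

```python
import numpy as np

def spectral_flatness(x):
    """Geometric mean over arithmetic mean of the power spectrum (DC excluded)."""
    s = np.abs(np.fft.rfft(x)[1:]) ** 2 + 1e-12   # guard against log(0)
    geometric = np.exp(np.mean(np.log(s)))
    return geometric / np.mean(s)

impulse = np.zeros(256)
impulse[128] = 1.0                                   # flat spectrum: SF near 1
tone = np.sin(2 * np.pi * 8 * np.arange(256) / 256)  # one spectral line: SF near 0
print(spectral_flatness(impulse))
print(spectral_flatness(tone))
```

The two test signals exercise the stated bounds: a time-domain impulse has a flat spectrum (SF near one), while a sinusoid concentrates all power in one bin (SF near zero).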
The utility of the fractal paradigm has however been called into question by practitioners on practical grounds, due to the difficulty of measuring fractality, as well as on theoretical grounds pertaining to its appropriateness as a model of the observed data [83] [51, p. 109]. In summary, we note that spectral analysis is a powerful method of characterizing structures, that better methods are required, and that methods and data must be compatible.
Graph theory can be applied to model discrete patterns, such as the individual pixels of digital document images, as well as the visual and logical entities of document layouts [84]. A dynamic sub-field studies irregular or color graphs, whose properties derive from the values of their edges; for example, a rainbow graph is one with distinct edge values [85] [86]. The topic relates to this article not only because of its focus on pattern irregularity (as opposed to regular structures, such as the Platonic solids), but also because it deals with determining maximal irregularity, which the approaches reviewed above have yet to achieve. After an extensive literature survey, however, it was not possible to find previous work on maximal irregularity applicable to patterns such as document images, even for basic shapes (such as the triangle). Furthermore, another well-known practical element makes the graph-theoretical approach to pattern classification problematic: its computational complexity becomes prohibitive for data with millions of sampling points, such as document images, especially in comparison with methods such as spectral analysis.
For the sake of completeness, it is worth mentioning other important fields that deal with irregularity (in the graph-theoretical sense) and redundancy (in the information-theoretical sense), such as combinatorics (particularly combinatorial geometry) [95].
As a concluding remark, it can be stated that a common goal of the reviewed perspectives is the characterization of structuredness, as distinct from uniformity, regularity, and randomness, and under the moniker "information," as a proxy for the utility that can be derived from pattern analysis. The goal is thus doubly defined in terms of methods and applications. The review has highlighted how a thorough understanding of the empirical application domain supports the development of successful theoretical methods. This insight is reflected in the work reported herein, particularly in the description of the design, psychology, history, and life cycle of documents that has shaped the theory of Structural Information Potential.

B. LAYOUT-BASED DOCUMENT INFORMATIVENESS TRIAGE
The classification of documents with respect to their informativeness is a task common to a number of applications, notably document overview, retrieval, summarization, and presentation. The problem becomes more critical as the number of documents to be processed increases (e.g., within organizations, libraries, and archives), but is equally relevant for a single document, such as when browsing a digital book. Informativeness is a relational quality, in that it depends on both the stimulus and the observer. In cases where prior knowledge about the user is lacking, or where there are many users with heterogeneous interests, a user-independent estimation of document informativeness is necessary. An additional constraint is the speed with which users may acquire the information presented to them. From a statistical perspective, the most commonly applied solutions rely on some form of data summarization and sampling.
Summarization consists in beginning with a given document set or item and synthesizing a new, more compact one that retains the characteristic features of the original. For text documents, this typically involves removing text chunks so as to reduce the overall redundancy and fit the text in a smaller spatial frame, producing something like a more or less extended abstract or even a title [96] [97]. Image summarization is conceptually similar but more difficult to realize; for example, representing a person by her or his face and a wood by a single tree [98] [99], removing empty areas from document pages [100], replacing a color image with a black-and-white line art sketch, or reducing an image to its dominant colors [101]. Documents representing three-dimensional data (e.g., buildings or landscapes) and multimodal documents (e.g., videos) are far more complex to represent compactly (a high-quality movie trailer goes beyond simple cut-and-paste; it is an artistic project in itself) [102] [103].
This article's contention that spatial organization is informative has also been applied to document summarization. For example, this author has introduced the Document Towers visualization paradigm, which represents the three-dimensional structure of bounding boxes of paragraphs, images, and other entities in paginated documents as architectural wire-mesh models that resemble buildings and cities [104]. The quantified Structural Information Potential is encoded as a color-coded "ribbon", which allows users to take stock of features such as document fragmentation, regularity, and outliers without opening the document itself. In addition to facilitating overview and navigation, this information enables document type, quality, and other insights to be inferred, often serendipitously.
Sampling differs from summarization in that it does not create new entities but instead aims to identify a limited amount of existing document parts that are representative of the whole. A further difference can be compared to the distinction between statistical expectation and the probability density function: while summarization often ends up presenting the average content or the most informative (e.g., the table of contents), sampling may provide the full range of content types (e.g., cover, text, figure, index). Semantic layout analysis is therefore a dominant method used to sample document images, as it benefits from not requiring character recognition (a substantial argument in support of its performance and quality, particularly given that noisy, historical, and/or handwritten documents still present challenges for this process) [105] [106]. Another sampling approach is pattern-driven; for example, a handwriting dataset can be compactly represented by a few "vantage point" samples [107]. This is a classical pattern classification and clustering problem, resulting in an ordering specific to an application domain and dataset. A good example of the complexity of defining interestingness is given in [9], where the interaction of factors as diverse as color, layout, content, reading ergonomics, and readership is analyzed. The contrast along multiple dimensions in terms of types and number of entities between neighboring document pages has also been used as a criterion for informativeness, in the framework of Shannon's information theoretical definition of information as the amount of "surprise" [108]. The work presented in [109] is the closest to a pure pattern-based measure of document informativeness such as that described in this article. Although it aims at being a fast, simple, and approximate method (in the spirit of the triage task), it does not address spatial organization. Instead, page informativeness is quantified as the degree of (chromatic) saturation and (achromatic) lightness computed over the connected components of binarized pages and weighted by their size.
This succinct survey reveals how the quantification of informativeness has been approached at various points on the spectrum, ranging from pattern to semantic to contextual analysis. However, the quantification of spatial organization and the derivation of insights therefrom remains a fruitful research direction for computational document analysis.

III. METHOD
The empirical insights expounded in the introduction suggest that pattern informativeness varies along a uniform-clusteredregular continuum. The goal of this section is to introduce a quantitative description method that facilitates the ordering of patterns according to their Structural Information Potential. A discussion of maximal SIP, and its correspondence with minimal structural redundancy, is also presented.

A. STRUCTURAL INFORMATION POTENTIAL
The core machinery of the Structural Information Potential measurement consists of the Shannon entropy of the logarithm of the power spectrum of a binary image's Fourier transform. Formally, the value $SIP \in [0, 1]$ is obtained through the following chain of equations and algorithmic steps:

$S = \log\!\left(1 + \frac{|\mathcal{F}(B(I))|^2}{n}\right), \quad SIP = H_r(V(S)), \quad dSIP = T(SIP),$

where $I$ is an intensity image, $B$ a binarization process, $\mathcal{F}$ the image's Fourier transform [110], $|\cdot|^2$ the power spectrum ($S$), $n$ the number of image pixels, $V$ a vectorization algorithm for the input matrix (described below), $H_r$ the relative Shannon entropy, $T$ a transfer function used for convenience purposes (also described below), and $dSIP \in [-1, +1]$ the divergence from maximum $SIP$. The utility of $dSIP$ is to concomitantly provide information about the intensity of the information potential, as given by the absolute value, $|dSIP|$, and to locate the pattern type on the uniform-clustered-regular continuum, made explicit by the sign, $\mathrm{sgn}(dSIP)$.
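As an illustration, the measurement chain just described can be sketched end-to-end. Everything beyond the text's explicit description is an assumption of this sketch: the base of the logarithm, the constant c (standing in for the minimal-redundancy entropy value discussed later), and the quadratic transfer function fitted through (0, -1), (c, 0), (1, +1) are hypothetical placeholders, not the article's definitions:

```python
import numpy as np

def sip(binary_image, c=0.7):
    """Sketch of the SIP chain: log power spectrum -> radial vectorization V
    -> relative entropy H_r -> transfer T. The constant c and the quadratic
    T are hypothetical placeholders."""
    b = np.asarray(binary_image, dtype=float)
    n = b.size
    s = np.log2(1 + np.abs(np.fft.fft2(b)) ** 2 / n)  # size-normalized log power

    # vectorization V: average over rounded integer radial frequencies, drop DC
    fy = np.fft.fftfreq(b.shape[0]) * b.shape[0]
    fx = np.fft.fftfreq(b.shape[1]) * b.shape[1]
    radius = np.rint(np.hypot(*np.meshgrid(fy, fx, indexing="ij"))).astype(int)
    nyquist = min(b.shape) // 2
    sums = np.bincount(radius.ravel(), weights=s.ravel())[1:nyquist + 1]
    card = np.bincount(radius.ravel())[1:nyquist + 1]
    v = sums / np.maximum(card, 1)

    # relative Shannon entropy H_r of the vectorized spectrum
    p = v / v.sum()
    p = p[p > 0]
    h_r = -np.sum(p * np.log2(p)) / np.log2(len(v))

    # transfer T: quadratic mapping [0, 1] -> [-1, +1] with T(c) = 0
    coeffs = np.polyfit([0.0, c, 1.0], [-1.0, 0.0, 1.0], 2)
    return np.polyval(coeffs, h_r)

rng = np.random.default_rng(0)
page = (rng.random((64, 64)) > 0.5).astype(float)   # a random binary "page"
print(sip(page))                                    # a dSIP value in [-1, +1]
```

The sketch is intended only to make the data flow concrete; the following paragraphs justify each stage.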
Rationales - The choice of the frequency domain for pattern analysis has a number of benefits [111]: it facilitates the integration of sample points across space, and thus the description of spatial structures; it enables characterization of the degree of clustering via the scale-space spectrum of frequencies; finally, it allows for a pattern phase space to be obtained for which the maxima correspond to regular and uniform patterns in the spatial domain.
The use of the logarithm of the power spectrum stems from the method (outlined above) for determining the fractal dimension and is intended to allow the characterization of clustered patterns. Normalizing the power in Eq. 5 makes it independent of image size; adding one to the squared magnitude avoids the logarithm of zero and negative output values. Put in a fractal perspective, SIP is a measure of how far a pattern is from potentially being a fractal.
In addition to having a dimensionality reduction effect, the role of the Shannon entropy is to help devise a linear space in which patterns are ordered from uniform to clustered to regular. This can be achieved if a transform is found such that the extrema of the pattern space correspond to a vector of zeros except for one value (i.e., an impulse) and a vector with equal values (i.e., uniform), respectively; in that case, the entropy will range from zero to one.
Enabling operation - A uniform signal in the time domain has as its pair in the frequency domain an impulse with frequency of zero [112, p. 33]. As this frequency is ignored in the SIP measurement method, the Shannon entropy value of the remaining power spectrum will also be zero, as desired. Considering that the Fourier transform uses sines and cosines as base functions, the sinusoid is the regular pattern with the fewest spectral artifacts (e.g., aliasing, harmonics, Gibbs effect [113, pp. 194-200, 218-222]). It becomes an impulse in the frequency domain [112, p. 29], and thus has zero entropy and represents an undesirable outcome for our goal. However, by binarizing the sinusoid, a rectangular pulse train is obtained, which corresponds in the frequency domain to a set of harmonics of the form $1/(f\pi)$, $f \in \mathbb{N}$ odd [114, pp. 102-113], the effect of which is to increase the signal's spectral entropy. As the spatial length of the pulses expands and the pattern becomes more uniform within the finite signal bounds, the power distribution is increasingly compressed towards the lower frequencies [113, p. 201] and the spectral entropy decreases, as desired.
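The enabling operation can be checked numerically: binarizing a sinusoid into a pulse train spreads power into odd harmonics and raises the spectral entropy. The helper below is a bare-bones normalized spectral entropy written for this demonstration, not the full SIP pipeline:

```python
import numpy as np

def norm_spectral_entropy(x):
    """Relative Shannon entropy of the power spectrum, DC excluded."""
    s = np.abs(np.fft.rfft(x)[1:]) ** 2
    p = s / s.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(len(s))

t = np.arange(512)
sine = np.sin(2 * np.pi * 8 * t / 512)   # impulse in frequency: entropy near 0
pulse = (sine >= 0).astype(float)        # binarized: rectangular pulse train
print(norm_spectral_entropy(sine))
print(norm_spectral_entropy(pulse))      # higher: odd harmonics appear
```

The sinusoid concentrates its power in a single spectral line, while its binarized counterpart distributes power over the harmonic series, which is exactly the behavior the binarization step exploits.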
In summary: (a) we translate the problem of devising an analytic formulation of the pattern-informativeness space in the frequency domain in order to be able to characterize spatial structures; (b) we employ the entropy because its extrema are the impulse and the uniform distributions; (c) we apply data binarization to transform the spectral representation of regular patterns from the impulse to the uniform distribution, so that the spectral entropy yields the desired uniform-clusteredregular continuum. Taking advantage of the spectral artifacts introduced by binarization is key to obtaining measurements of the spectral entropy that order patterns in a perceptually consistent manner.
The global Otsu binarization algorithm [115] has been used to produce the SIP measurements presented in the figures of this article. This general-purpose method was appropriate for our intention to preserve document noise, a particularly important aspect of the card images presented in the case study, as noise was found to have a direct impact on the quality of the optical character recognition. In the case of the head pictures of Fig. 10, however, the background shadow was impinging on the preservation of facial details that were the focus of interest during binarization; for this reason, the locally adaptive algorithm of Raleigh was chosen [116]. Conversion from color images to grayscale is realized by converting the images to the perceptual CIELAB color space and extracting the lightness channel, L* [117, pp. 30, 95, 200-212]. The binarization step of the SIP measurement is not necessary for data that is already binary.
Supporting elements - The reduction of the power spectrum dimensionality to a vector is performed to ensure the rotation-independent measurement of patterns. The step consists in averaging data points of identical frequency. Due to quantization, however, digital images have sparse spectral representations; for example, the lowest frequency is expressed only at two orientations, 0 and π radians, in the usual Cartesian representation of the Fourier transform. To guarantee sufficient data and avoid flattening the spectrum (which artificially increases entropy), the frequencies are therefore rounded to the nearest integer prior to averaging the corresponding power spectrum values. We also disregard the direct component (DC, at frequency 0) from the computation, since it represents the mean signal power and has no impact on the image pattern. The vectorization procedure can be formally expressed as

S̄(w_i) = (1 / card(w_i)) Σ_{j : round(ω_j) = w_i} S(ω_j), i = 1, …, n,

where the vector ω contains the frequencies corresponding to the power spectrum values S, the matrix into which the vector is indexed, with ω ∈ R≥0; w is defined over the integer frequencies, w ∈ N≥0; n is the index of the rounded Nyquist frequency into the unique values of w; and card(w_i) is the number (cardinality) of samples for a given integer frequency.
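The vectorization step can be sketched as follows (a numpy illustration under our reading of the procedure, not the article's code): power values are averaged over integer-rounded radial frequencies, the DC component is dropped, and frequencies beyond the Nyquist radius are ignored. A 90-degree rotation of the input leaves the vector unchanged, which is the rotation independence the step is designed to provide.

```python
import numpy as np

def vectorize_spectrum(image):
    """Rotation-independent spectrum: average power over integer radii."""
    S = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = S.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.indices(S.shape)
    radius = np.rint(np.hypot(yy - cy, xx - cx)).astype(int)  # integer frequency
    nyquist = min(cy, cx)
    out = np.zeros(nyquist)
    for f in range(1, nyquist + 1):        # f = 0 (DC) is disregarded
        out[f - 1] = S[radius == f].mean() # average samples of equal frequency
    return out

rng = np.random.default_rng(1)
img = rng.random((32, 32))
vec = vectorize_spectrum(img)
rot = vectorize_spectrum(np.rot90(img))    # 90-degree rotation of the input
```

Since a 90-degree rotation only permutes power values within each integer radius, the two vectors agree up to floating-point error.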
The transfer function T calibrates the Structural Information Potential value to facilitate easier mathematical and cognitive manipulation. In the next two sections, it is shown that the minimal value (= 0) of the SIP equation (eq. 5) inside the transfer function corresponds to a uniform pattern, and the maximum (= 1) to a regular pattern, while the clustered pattern with minimal redundancy has an intermediary value (−log2(Φ) ≈ 0.694). Since the latter pattern is the most informative, the SIP values are remapped using a polynomial of order two as transfer function, such that the value of the clustered pattern becomes the center of the distribution and its range is [−1, +1]. Similar to the approach used for the minimal redundancy scalar, we apply the logarithm to the entire spectrum. The implication is that the dSIP value is a measure of divergence from the maximum Structural Information Potential, with its sign indicating the pattern towards which it leans: positive for regular and negative for uniform.

Limitations - Two patterns generate aberrant SIP values. These are, however, peculiar enough not to be of concern in most practical cases. The first is the perfectly regular structure of checkerboards and stripes when aligned with the orthogonal image raster. At half-cycles of one and two pixels, their frequency spectra are impulses, hence the entropy has value zero rather than the expected value of one. At longer cycles, the entropy also remains close to zero, due to the strong harmonic components introduced by the sharp edges. The same issue affects the second problematic pattern, a single square aligned with the image raster; because its step edges have the sinc function, sinc(a) = sin(aπ)/(aπ), as frequency-domain pair [113, pp. 212-215], its harmonics increase entropy to nearly one, contrary to what would be expected from a pattern with a large uniform surface.

Fig. 1 shows the SIP classification of sample document images along a unidimensional pattern space. The progression from regular to clustered to uniform patterns is readily observable and conforms to the problem requirements. Note the parallel between the terminologies of various fields (preeminently typography, signal processing, and materials science). The conceptual system introduced here, and further discussed below, may be used as a model to describe the correspondence between the visual and functional organization of documents. At the paragraph level, the penmanship and typographical ideals are the production of a homogeneous pattern (appearing as gray when observed from a distance), while the functional hierarchy is reflected in a clustered layout (the technical term is "asymmetrical"). Post-production annotations and degradations disturb the intended (ir)regularity, introducing randomness. Blank pages ("blank" derives from the French for "white", another chromatic term in the field of documents) mark the end of a functional unit, as well as containing a reduced amount of semantic information.
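The calibration described above can be sketched in code. The exact polynomial is not reproduced in this excerpt; the sketch assumes the unique second-order polynomial through the three stated anchor points: a uniform pattern (SIP 0) maps to −1, the minimal-redundancy clustered pattern (SIP −log2(Φ)) to 0, and a regular pattern (SIP 1) to +1.

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2           # golden ratio, ≈ 1.6180
C = -np.log2(1 / PHI)              # SIP of the minimal-redundancy cluster, ≈ 0.694

# Unique degree-2 polynomial through (0, -1), (C, 0), (1, +1): maps
# uniform -> -1, maximal-SIP clustered -> 0, regular -> +1.
coeffs = np.polyfit([0.0, C, 1.0], [-1.0, 0.0, 1.0], deg=2)

def dsip(sip):
    """Divergence from maximal SIP; sign points toward regular (+) or uniform (-)."""
    return np.polyval(coeffs, sip)
```

Three points determine a quadratic uniquely, so the fit is exact; any monotone remapping with the same anchors would preserve the sign interpretation of dSIP.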

B. PATTERN PHASE SPACE
The uniformly random pattern is located between the clustered and the regular, since it does not comprise large uniform areas. Random patterns can occur due to signal noise or annotations (in the case of text documents). Note that other pattern spaces are possible: the uniform-random spectrum [118] is appropriate for patterns with statistically constant homogeneity, such as homogeneously distributed line segments of quasi-equal length and variable orientation. In this case, the spectrum simply represents the amount of randomness or redundancy of the variable feature, which is not location and hence does not describe structural information.
Patterns with semantic value, such as images of faces, are obviously very important pattern types for human beings, and can be located anywhere in the proposed pattern space. Because these patterns require semantic analysis and contextual information, they cannot be considered from a low-level computer vision perspective such as that described here, just as the Shannon entropy casts information in purely mathematical terms.

C. GRAPH STRUCTURES WITH MINIMAL REDUNDANCY
While previous sections dealt with the measurement of SIP, this section concerns the definition and design of mathematical objects with minimal structural redundancy; that is, those that maximize the Structural Information Potential.
While measurement may be employed independently of design, a discussion of the latter supports a firmer understanding of the former. In particular, such discussion provides a formal rationale for the calibration of SIP values, as well as visual evidence and quantitative characterization of objects with maximal SIP value. Furthermore, it reveals links between SIP and some interesting mathematical concepts, thereby creating an opening for generalizing SIP to structures other than images.
We will refer to objects with minimal structural redundancy as "fluorescent" objects, in reference to both "rainbow" graphs (which have edge values that are distinct, but not necessarily maximally distinct) and to the fact that fluorescence maximizes perceptual color discrimination. The proposed notation is F_{vV dD}{·}, where d is the object's embedding dimension, v the number of vertices, and the placeholder {·} may be used to specify edges. For example, F_{3V 2D}{e_12, e_13, e_23} : {x_1, y_1, ..., x_3, y_3} designates a fluorescent triangle and the vertices' coordinates.
In the following, the argument will proceed from one-, to two-, to three-dimensional objects, from shapes to patterns, and will conclude with some remarks of a more general order.
How might a whole be divided into two parts so as to maximize the difference between the parts, while concomitantly maximizing their respective sizes? The first condition of this problem is satisfied when either of the parts vanishes in the limit, while the second condition corresponds to the two parts being equal. The overall solution lies in between these extrema and is found by determining the value at the intersection of the functions representing the conditions, i.e., f(x) = 1 − x and f(x) = x/(1 − x) for x ∈ [0, 1/2], and f(x) = x and f(x) = (1 − x)/x for x ∈ [1/2, 1]; equivalently, by solving the equations x² − 3x + 1 = 0 and x² + x − 1 = 0. For these definition domains, the solutions are x_1 = (1 − √5)/2 + 1 ≈ 0.3819 and x_2 = (1 + √5)/2 − 1 ≈ 0.6180. These values are also known, respectively, as the conjugate of the reciprocal, Φ′, and the reciprocal, Φ, of the golden ratio, ϕ = (1 + √5)/2 = 1/Φ = 1/(1 − Φ′) ≈ 1.6180. A diagrammatic representation of these solutions (Fig. 2 (a)) allows us to state the criterion for determining the minimal structural redundancy (or maximal SIP) as the minimum of the maximum of the ratios of the parts and the whole: R = {a/(a + b), b/(a + b), min(a, b)/max(a, b)}, SIP_max = min(max(R)), where a + b = 1 and a, b ∈ [0, 1]. In other words, maximal SIP corresponds to the minimum of the range of the ratios of a system's components. The use of these ratios is equivalent to using the relative Shannon entropy, H, for each edge pair (Fig. 2 (b)). The max formulation can be avoided by combining ratios and entropy: SIP_max = max(H(−log2(R))) (Fig. 2 (c)). The similarity between this equation and Eq. 5 indicates that maximal SIP is expected for −log2(Φ), which is the reason why this value was used for calibration in the SIP measurement.

Remark - Note the inclusion of the whole as a third element in the measure of the sizing of the two segments.
This is explained by the whole representing the highest scale of the scale-space domain defined by the segments, thus making it part of the system. This is the case in many application domains, particularly those that are subject to human factors, such as document design and perception. For instance, the sizing according to the golden ratio of the width of a text column and a figure placed next to each other ensures maximal legibility for each while giving prominence to one of them.
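The two-part division criterion stated above can be checked numerically; the following sketch scans divisions of the unit whole and recovers the golden section as the minimizer of max(R).

```python
import numpy as np

PHI = (5 ** 0.5 - 1) / 2            # reciprocal golden ratio, ≈ 0.6180

def ratio_range_max(a):
    """max of R = {a, b, min(a, b)/max(a, b)} for a division a + b = 1."""
    b = 1.0 - a
    return max(a, b, min(a, b) / max(a, b))

# Scan divisions of the unit whole: min(max(R)) is attained at the
# golden section (a ≈ 0.3819 or, symmetrically, a ≈ 0.6180), with value PHI.
xs = np.linspace(0.01, 0.99, 9801)
best = xs[np.argmin([ratio_range_max(x) for x in xs])]
```

At the equal split a = b = 1/2 the part-to-part ratio reaches 1, so max(R) = 1; at the golden section all three ratios coincide at Φ, which is the minimum of the range.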
Let us now extend the problem to two dimensions and ask the following question: What is the triangle with the most dissimilar edges? Above, we identified the solution for three collinear points that define a degenerate triangle. While this triangle has minimal redundancy between its parts when taken pairwise, it also has zero area, unlike common triangles. More importantly, it has maximal redundancy between a part and a subset: the triangle base AC equals the sum of the other edges, AB and BC. Therefore, we introduce the ratio of the base and the sum of the other edges as part of the problem formalism, with the effect of increasing the size of the triangle to a certain equilibrium point below that of an equilateral triangle, for which redundancy is maximal. We now solve the equations AB/BC = BC/AC = AC/(AB + BC) for the coordinates x and y of the triangle apex, with AB = √(x² + y²), BC = √((1 − x)² + y²), and AC = 1. The solutions in the abscissa interval [0, 1/2] yield x ≈ 0.3774 and y ≈ 0.4269. This is the elemental shape with maximal SIP, as per our definition (Fig. 3). A remarkable trait of this triangle is that it extends an essential property of the golden ratio - that of minimal redundancy between parts and whole - to two dimensions.

Fig. 4 illustrates the extension of the fluorescence principle to three dimensions, here for a tetrahedron with minimal edge redundancy. This tetrahedron represents a pendant of the equilateral tetrahedron, one of the five regular Platonic solids. The locations of the vertices of this graph have been computationally determined by placing vertices v_1 and v_2 at two adjacent nodes of a unit cube, then placing vertices v_3 and v_4 in turn at all locations of an orthogonal grid with a resolution of 0.025 units. To avoid triangles with identical shape but different sizes and orientations (metamers), locations for which the length of edges ending in vertices v_1 or v_2 exceeds unity were excluded.
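The apex coordinates of the fluorescent triangle reported above can be reproduced with a coarse numerical search; the following sketch minimizes the squared mismatch of the three edge-ratio conditions (the search bounds and grid resolution are our choices, not the article's).

```python
import numpy as np

# Squared deviation from AB/BC = BC/AC = AC/(AB + BC), with AC = 1,
# base vertices at (0, 0) and (1, 0), and apex at (x, y).
xs = np.arange(0.30, 0.45, 0.0005)
ys = np.arange(0.35, 0.50, 0.0005)
X, Y = np.meshgrid(xs, ys, indexing="ij")
AB = np.hypot(X, Y)
BC = np.hypot(1.0 - X, Y)
r1, r2, r3 = AB / BC, BC, 1.0 / (AB + BC)    # the three ratios (AC = 1)
mismatch = (r1 - r2) ** 2 + (r2 - r3) ** 2
i, j = np.unravel_index(np.argmin(mismatch), mismatch.shape)
apex = (xs[i], ys[j])                        # expected near (0.3774, 0.4269)
```

At the minimizer the three ratios agree to within grid precision, matching the solution quoted in the text.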
The measure of redundancy was based on the ratio of pairwise edges, as well as the ratio of one edge taken at a time to its two adjacent edges. The latter criterion is adopted from the triangle case discussed above, and its effect is to avoid facets with zero area. This benefits the fluorescence measurement, as it neatly facilitates the quantification of redundancy across dimensions and structure types, such as mesh surfaces and solids. (This observation highlights the possibility of using criteria other than the ratio of edge subsets, such as the perimeter and the volume, to avoid shape degeneration; these would lead to different solutions and would be more loosely consistent with the problem definition in terms of edge ratios alone. Instead, they may serve special applications; maximizing the volume, for instance, generally results in convex solids.) The inquiry into minimal structural redundancy can be pursued for graphs with an arbitrary number of edges, such as polygons and mesh surfaces, or of arbitrary structure, such as circular, tree-like, or network-shaped graphs. In this article, the goal is to open a window into these possibilities. We conclude this section with two remarks: one on the link between fluorescence and SIP (a), and another on the human perception of fluorescence in relation to its mathematical definition (b).
a) The concept of Structural Information Potential posits a correlation between structure and information, based on the scale-space distribution of the structure, which defines a pattern spectrum ranging from compact to homogeneous via clustered and random. This (fractal) perspective on patterns is quantified by the SIP method in the frequency domain using the Fourier transform and the Shannon entropy. The concept of fluorescence attempts to characterize structures in terms of component redundancy, and to this end adopts a graph theoretical formalism of the ratio of combinations of edge subsets. The integration of these two perspectives is realized through the concept of scale-space redundancy, which subtends the pattern spectrum and has a direct information theoretical meaning (Fig. 1). Specifically, a pattern that maximizes both the number and the size of constituent entities will fill the scale-space in an optimal manner while having maximal Structural Information Potential and minimal structural redundancy. This is the fluorescent clustered pattern, which stands in opposition to the uniform pattern of a maximally sized single entity, as well as the homogeneous pattern of minimally sized and maximally numerous (equal to the Nyquist frequency) entities.
The partial identity of the SIP and fluorescence equations - the outer term H(log2(·)) is common to both - suggests an equivalence even at the algebraic level. The well-known Wiener-Khinchin theorem, which relates the power spectrum to autocorrelation [119], may provide a means of equating the inner SIP term (the power spectrum of an image) with the inner fluorescence term (the combinatorial edge ratios of a graph). From a performance point of view, however, the Fourier-based method of scale-space redundancy measurement is bound to outperform the graph-combinatorial method, due to the computational complexity of the latter for images of practical size. Nevertheless, the study of graph redundancy offers a powerful and versatile instrument for understanding the elemental levels from which fluorescent patterns emerge. The goal of this section is to provide precisely such a low-level explanation of the SIP method.
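The theorem's content is easily verified numerically: the inverse Fourier transform of the power spectrum equals the circular autocorrelation of the signal. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(256)

# Circular autocorrelation computed directly...
autocorr = np.array([np.dot(x, np.roll(x, -k)) for k in range(len(x))])

# ...and via the Wiener-Khinchin route: inverse FFT of the power spectrum.
power = np.abs(np.fft.fft(x)) ** 2
autocorr_wk = np.fft.ifft(power).real
```

The two computations agree to floating-point precision, which is the bridge between the spectral (Fourier) and structural (correlation) views of redundancy mentioned above.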
b) Human estimation of pattern redundancy differs from mathematically derived values, which is an important factor to consider when measuring man-made patterns (such as documents) or when psychophysically evaluating formal methods of redundancy measurement. A well-known example is the systematic bias in line bisection (Fig. 5 (a)) [120]. The location of the fluorescent line partition and the drawing of the fluorescent triangle exhibit similar variability and divergence from the mathematically defined shapes (Fig. 5 (b)-(c)). Psychological aspects, furthermore, interact with cultural and social ones, forming a much more complex and dynamic ecosystem of constraints upon informativeness than the basic, low-level uniform-clustered-regular pattern-informativeness spectrum. The subjective and contextual dimensions of human pattern perception may nevertheless be directly relevant to the evaluation of the SIP measurement method; for instance, to explain why observers might disagree about the redundancy of a given pattern. A converse issue is that of human-designed fluorescent configurations, such as the text/figure sizing in document layout mentioned above, whose divergence from the mathematical model may even be intentional. This is often the case with the uncritical application of rules, which artists, keen observers of form, tend to shun, as illustrated by the words of the illustrious French photographer Henri Cartier-Bresson: "I hope that we will never see the day when the merchants will sell [golden ratio] diagrams engraved on camera displays." [121, pp. 26-27]. The quote exemplifies the need to develop models of layout irregularity that integrate human factors. When applied to documents, this desideratum has been carried out in the design of the SIP formalism, which emerged from empirical observation and reflects low-level perception.
At such levels of complexity, cognitive automatisms are more amenable to modelization by the pattern-informativeness spectrum. The task of document triage, with its emphasis on fast temporal information processing, is a good case study for testing the model and shall be discussed next.

IV. EXPERIMENTS
In this section, we apply the method of document ordering developed above to a small set of representative real data, then compare the results to those of other methods. Next, we use a case study to test the robustness of our approach on a large dataset with respect to the triage task. Finally, we demonstrate that the proposed concept may be generalized to objects beyond text-based document images and applications other than triage. Fig. 6 juxtaposes document pages ordered using the major methods discussed in this article: Structural Information Potential, Approximate Entropy, spectral flatness, and the ratio of ink pixels to page area. The data are sampled from the 104 binarized pages of an issue of the New Yorker magazine (the online Supplemental Material shows all pages, as well as providing the quantitative evaluation). This particular dataset was chosen for display among the hundred analyzed owing to its diversity of text, drawing, and image patterns, which provides exemplary illustration of the entire uniform-clustered-regular spectrum against which we want to test our methods. We first observe that the ink/page ratio method leads to a mix of structurally unrelated patterns, and to a split of homogeneous clusters, such as the mostly empty pages (the third image and the third from last image of the top-most block in the figure). Approximate Entropy is very good at clustering multiscale patterns, such as the first three pages of the sequence in the second block, which depict either entities with different absolute sizes (e.g., people and robots in the street, the rays of the trilobite), or a diversity of sizes resulting from a perspective view (rows of bars in a prison hallway). However, the method fails to group together homogeneous patterns (text-only pages are interspersed with text and illustration pages) and empty pages. 
Spectral flatness (third block) groups empty pages and follows them with multiscale patterns, but fails to consistently group homogeneous patterns (the results of spectral entropy are similar, but inferior). Structural Information Potential (last block) achieves a perceptually gradual ordering of the pages from empty to multiscale to homogeneous.

B. CASE STUDY
This section presents a real-world application of the Structural Information Potential measurement method. The beneficiary is the Swiss National Library, and the goal is to merge the information of corresponding bibliographical records in electronic and analog formats [122]. The case study covers the pilot phase of an ongoing project, in which computational methods are assessed in view of deciding on the next steps.
The analog records are palm-sized paper cards, typewritten or printed, with handwritten annotations, crossings-out, stamps, bar codes, rulings, and other graphical elements (Fig. 7 (c)-(h)). A total of 1.2 million cards were scanned from high-contrast black-and-white films (accessible at http://siibns.ch/french/cat1_frame.htm), at a resolution of circa 500 by 300 pixels; this introduces various artifacts, such as background noise and an irregular border around the cards. Some cards contain no records, but only captions for a sequence of cards in the wooden trays that were accessed by library patrons searching the catalog. The texts often mix two or more of the four national languages of Switzerland (French, German, Italian, and Romansh), as well as English, Latin, and other languages. The majority describe monographs and serial publications, but also maps; the content is inconsistently structured semantically, is generally not made up of full sentences, abounds in entity names, alphanumeric shelf marks, ISBNs, price tags in various currencies, and other codified data, and is rich in typographical formatting of logical entities. Such a wide typological variety of information, brevity, and visual complexity presents a challenge for automatic recognition. Attempts to perform optical character recognition (OCR) on the whole dataset, using the open-source Tesseract software [123] and the commercial Google Vision [124], revealed that the obtained text is not directly exploitable in the library's public catalog (Fig. 7 (i)). It was therefore envisaged to provide users with digital images of the cards alongside the electronic records in the same graphical interface window. However, which analog record corresponds to which electronic record is unknown, and the two sets do not overlap. Consequently, the technical project objective of matching document images and electronic texts is preceded by a feasibility analysis.
This requires a fast and appropriate classification of the cards. Here, "fast" concerns readying the technical resources, the human interaction with the data, and the lax requirements for classification quality and sophistication -hence, a triage task.
The dSIP values of the dataset images were measured and the cards located in the pattern-informativeness space. The ordering of the 1.2 million items was checked visually for perceptual consistency and found to be satisfactory; a sample of the results is presented in Fig. 8. The patterns in this sample are representative (a) in semantic terms, as the sample covers an entire bibliographical index section, containing both various header cards and reference cards (section CDU 00000001), (b) in document typological terms, as it contains printed and handwritten text, noisy images, and other significant features, and (c) in statistical terms, as the sample is roughly uniformly distributed over the pattern spectrum (see the location of the three highlighted cards in the overall distribution of Fig. 7 (b)). The ordering is meaningful as a typological categorization of bibliographical cards and suggests several potential matching strategies between physical and electronic records. Cards with clustered visual patterns usually appear so because they exhibit greater typological information variety; in such cases, the matching algorithm could be tuned to privilege the logical structure as the matching criterion over linguistic matching. Cards with homogeneous patterns are more likely to contain longer coherent linguistic sequences, such as sentences, for the matching of which syntactic analysis would be preferable. Cards with low SIP values are either empty or non-record section headers, or contain very little text; as a result, they may yield very poor matching quality, and may therefore best be visually inspected and excluded from matching.
The classification of the cards using the measure of Structural Information Potential was appreciated by both software engineers and library managers for multiple reasons. It is independent of the OCR output; it produces useful results under conditions of uncertainty about the cards' content; moreover, it allows for a rapid overview of a large digital image collection that would otherwise remain largely invisible, and it further involves the human in the matching process. Thus, card triage by SIP became a complement to OCR in decision-making on tasks such as go/no-go for automatic record matching, selection of a card subset with an expected matching quality, estimation of expected matching quality, and evaluation of project resources (e.g., costs and duration of groundtruthing and quality control). Fig. 7 (a) shows the color-coded dSIP values for the 1.2 million bibliographical cards in their physical sequence in the original library card trays. Runs of similar dSIP values are apparent, such as the conspicuous blue (i.e., low dSIP) central column corresponding to author indices and mostly empty cards. The heterogeneously colored area on the right-hand side corresponds to cards indexing placenames, which have varying degrees of visual density and clustering due to the variety of information sources and the continuous updating of the card information, involving many writing technologies and annotation layers. The visualization's pixels are interactively linked to the card images, so that a visual investigation may confirm, for example, that the blue run represents author index cards, which can thus be removed from the matching process. This visualization is useful for a context-oriented analysis of the card dataset. If, for example, an equivalence has been established between dSIP values and expected matching quality, then it may be used to predict the matching quality of the various thematic classes of the cards. Fig. 7 (b) represents the dSIP values of the card dataset in histogram form.
This aggregated data yields several insights. By extracting sample cards along the distribution to determine the patterns to which the values correspond, a visual and quantitative estimation can be made regarding the amount of various card types and the expected matching quality. It can be observed that the bulk of the cards have a clustered appearance, while cards with homogeneous information distribution are not predominant. A further operationalizable insight enabled by the histogram pertains to outlier detection (Fig. 7 (g), (h)). In this case, a first observation concerns the extreme dSIP values: scanned images with the objects of interest partially out of frame are found at the lower values, while the higher values contain images with strong noise. Second, the left-most cluster of the distribution predominantly comprises section headers. While these do not contain bibliographical information, and their inclusion could thus have a detrimental impact on direct matching, these cards may nevertheless be useful, since they identify the topic to which the subsequent cards belong. This topic may then be related to those extracted from the electronic records, thereby increasing the probability of correct matches. The bivariate plot in Fig. 9 supports a finer analysis of the histogram, specifically showing that it is the result of a mixture of pattern classes characterized by different ink densities, as well as of a non-linear and loose covariation between the amount and the distribution of ink on the cards.
The question might be asked as to whether simply classifying the binarized card images by the amount of black pixels (i.e. ink) would not provide insights similar to the Structural Information Potential. Fig. 9 demonstrates that this is not the case: documents with identical ink density can have very different ink distributions, and vice-versa, as a result of distinct clusters with irregular shape and spread. May, then, a classification by the amount of recognized characters be a sufficient estimate of matching quality? As illustrated by Fig. 7 (i), this would not be effective either; even cards with rich linguistic contextual information can result in very few characters being identified.
From a machine learning perspective, the classification of document images according to SIP may be utilized to optimize the sampling of the training and test images. For example, a uniform random sampling of the card dataset (i.e., the typical sampling procedure) may result in an underrepresentation of less numerous but semantically important classes, such as scanning errors and cards with strong noise or that are largely empty, which are visible at the distribution extrema.

Fig. 9 (caption) - This graphic shows the density distribution of the Structural Information Potential plotted against the ink pixel density in the digital images of the library card dataset. An "ink pixel" represents an inked area of the card surface, as opposed to the non-inked writing substrate; in the cards illustrating this article, ink pixels appear in black; noise modifies the groundtruth value of pixels. For legibility purposes, pixels above the density level of 0.4 ink pixels have been slightly dilated. Note the two pixels above level 0.8. The 1.2 million data points were aggregated for visualization in a 1000-by-1000-pixel raster.
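The sampling strategy suggested here can be sketched as stratified sampling over dSIP bins; the bin count, per-bin quota, and synthetic distribution below are illustrative choices, not the dataset's values.

```python
import numpy as np

def stratified_sample(dsip_values, per_bin, n_bins=10, seed=0):
    """Sample up to per_bin items from each dSIP bin, so that rare
    classes at the distribution extrema remain represented."""
    rng = np.random.default_rng(seed)
    bins = np.linspace(-1.0, 1.0, n_bins + 1)
    labels = np.clip(np.digitize(dsip_values, bins) - 1, 0, n_bins - 1)
    picked = []
    for b in range(n_bins):
        idx = np.flatnonzero(labels == b)
        if idx.size:
            take = min(per_bin, idx.size)
            picked.extend(rng.choice(idx, size=take, replace=False))
    return np.array(sorted(picked))

# Skewed synthetic distribution: most values clustered, few extremes.
rng = np.random.default_rng(3)
dsip = np.clip(rng.normal(-0.1, 0.15, 10000), -1, 1)
sample = stratified_sample(dsip, per_bin=20)
```

Unlike uniform random sampling, the quota per bin guarantees that sparsely populated extremes of the dSIP distribution contribute items to the training and test sets.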
In conclusion, the measurement of Structural Information Potential, coupled with interactive data analysis, provided computer scientists with an inexpensive tool to support record matching, particularly for sample selection and expected quality estimation. It also aided library managers in making faster and more informed decisions regarding an information technology project on issues such as go/no go, expected quality, and resource estimation. This case study is an example of the usefulness of SIP for both quickly extracting information from patterns and acting upon this information.

C. GENERALIZATION TO NON-TEXTUAL DOCUMENTS
The generic nature of the Structural Information Potential concept as defining a pattern distribution space allows it to be applied to a variety of tasks and data types. Its main utility advocated in this article is as a quantitative measure of informativeness, which makes it appropriate for triage tasks, as discussed above. Notably, however, the above case study found that SIP was also useful as part of a decision-making process, involving the estimation of OCR quality and needed resources, and as a measure of image quality, given its ability to distinguish images with uniform noise and erroneously framed scans. Furthermore, SIP offers a simple computational solution for classifying document pages according to the predominance of text, the presence of illustrations, the number of post-production visual artifacts (e.g., annotations, stamps, signatures), and the amount of noise (i.e., traces of document degradation); these abilities may be translated into document navigation functionalities and implemented in document readers. The characterization of document layouts is also valuable to historians, as it supports a quantitative analysis of the evolution of written communication. Beyond documents, the classification into compact, clustered, and homogeneous distributions is of direct interest to materials science. Fig. 10 illustrates the application of SIP to data types other than documents and tasks other than triage. According to the SIP concept, the patterns with dSIP values closest to 0 are the most informative. The top row illustrates the case of image retrieval from a large set of very similar samples. The SIP-based automatic extraction of keyframes from videos facilitates the identification of frames with potentially high information content. The frame with minimal dSIP value indeed shows more visual details than the other frames, such as text, cables, and pistons.
In the middle row, SIP is used to determine the most informative point of view of a three-dimensional object: this is the three-quarter view for a head sculpture, which integrates anatomical elements of both face and profile in a single shot. The bottom row shows how SIP may be used for classifying aerial images of urban settlements on the uniform-clustered-regular continuum. The impact of natural topography and cultural-historical factors on hierarchical urban clustering comes readily to mind when contemplating this classification: for example, the meandering water channels of Venice; the spiderweb road network reflecting the commercial hub of medieval Strasbourg; and the plain of Los Angeles, which makes a flat urban grid affordable.

V. DISCUSSION
This section provides an in-depth discussion of some important rationales and implications of the SIP measurement. It thus helps better understand the behavior of this instrument, and opens directions for future research.
Binarization - It may be useful to develop a SIP measurement that does not require binarization. One reason is that it would remove a degree of uncontrollable data distortion inherent in the strong reduction of the data's dynamic range. Another reason is that it would facilitate the application of the SIP method to signals, for which binarization constitutes a drastic distortion. However, developing a method for data with large dynamic ranges would create the need for a second method for binary data (a common data type in many application domains, including document processing). Here we can observe an advantage of the binarization step: it enables both intensity and binary data to be handled with a single method.
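The unifying role of the binarization step can be sketched as follows; the `to_binary` helper and its median threshold are illustrative assumptions for this article's purposes, not the thresholding algorithm used by the SIP method:

```python
import numpy as np

def to_binary(data, threshold=None):
    # Reduce intensity data to binary form so that a single
    # measurement method can handle both intensity and binary inputs.
    data = np.asarray(data, dtype=float)
    if threshold is None:
        # A simple global threshold; Otsu's method or an adaptive
        # threshold would be equally valid choices here.
        threshold = np.median(data)
    return data > threshold

# Intensity data and already-binary data pass through the same pipeline.
assert to_binary([0.1, 0.9, 0.2, 0.8]).tolist() == [False, True, False, True]
assert to_binary([0, 1, 1, 0], threshold=0.5).tolist() == [False, True, True, False]
```

A measurement defined on the binary output then applies uniformly, at the cost of the dynamic-range distortion discussed above.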
Metamerism - Because of dimensionality reduction (images being reduced from two dimensions to a scalar), many differently patterned data will have an identical measured value (the metamerism effect). This is one reason why triage is a good application for the SIP method presented here: triage is tolerant to imprecision. One can imagine incorporating additional parameters into the SIP method to allow the description of various pattern features (for example, pixel density).
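The metamerism effect is easy to demonstrate with a toy scalar descriptor; pixel density stands in here for the SIP value itself, which is not reproduced:

```python
import numpy as np

def density(img):
    # Scalar descriptor: fraction of foreground pixels.
    return float(img.mean())

clustered = np.zeros((4, 4), dtype=int)
clustered[:2, :2] = 1        # one compact 2x2 block
regular = np.zeros((4, 4), dtype=int)
regular[::2, ::2] = 1        # four evenly spaced pixels

# Different spatial configurations, identical scalar value: metamerism.
assert density(clustered) == density(regular) == 0.25
```

Any reduction from a two-dimensional configuration to a single number necessarily collapses such distinctions, which is acceptable in triage but motivates the additional descriptive parameters mentioned above.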
Psychophysics - As the pattern ordering resulting from the SIP is primarily intended to have humans as end users, it is a natural candidate for psychophysical validation.
Phase - The reader may have noticed that the SIP measurement method disregards the phase information arising from the image transform from the spatial to the frequency domain. This may be inconsequential for speech, where information is largely carried by the frequency spectrum [113, pp. 355-358], [125], [126]; phase is critical for image processing, however, in that radically different patterns result from identical frequency spectra with different phases, and edge location and strength are, moreover, strongly phase-dependent [113, pp. 355-358], [127]-[130]. This difference may explain to some extent why the spectral methods discussed in the related work section are successful in audio processing but less commonly used for image classification. From a practical point of view, however, the examples presented in this article demonstrate the robustness of the method when phase is discarded. This surprising fact demands a theoretical explanation.
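The phase dependence of image patterns can be verified directly: combining an image's magnitude spectrum with a random phase preserves the magnitudes exactly while producing a radically different spatial pattern. The following is a minimal NumPy sketch, not part of the SIP method itself:

```python
import numpy as np

rng = np.random.default_rng(0)
img = (rng.random((32, 32)) < 0.2).astype(float)   # a sparse binary test image

mag = np.abs(np.fft.fft2(img))                     # keep only the magnitude
random_phase = np.exp(1j * rng.uniform(-np.pi, np.pi, img.shape))
scrambled = np.fft.ifft2(mag * random_phase)       # same magnitudes, random phase

# The two signals are indistinguishable in the magnitude spectrum...
assert np.allclose(np.abs(np.fft.fft2(scrambled)), mag)
# ...yet their spatial patterns differ substantially.
assert float(np.mean(np.abs(np.abs(scrambled) - img))) > 0.05
```

This is exactly the ambiguity that a magnitude-only measure accepts, and which the remarks below argue is tolerable for pattern ordering.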
FIGURE 11. Fluorescent pattern generated from a fluorescent triangle. Note the similarity with the naturally occurring patterns of seashells [132].
As a first remark, it has been experimentally demonstrated that some binary image types may be reconstructed from frequency magnitude only and zero or random phase, and that this may further depend on the presence or absence of a single pixel [131]. While this phenomenon has yet to be elucidated, it constitutes evidence in favor of magnitude-only image analysis. Second, the extrema of the uniform-clustered-regular pattern space are phase-independent [110, pp. 74-81, 106-107], which reduces the problem to one of explaining why clustered patterns occur around the center of the pattern space. We may recall that fractals (the patterns that maximize clustering) depend in their overall shape on the exponent of the power-law distribution of the frequency spectrum and a random phase. In fact, any clustered image pattern will exhibit some degree of power-law magnitude distribution. Moreover, the multitude of harmonics introduced by binarization creates a phase that is increasingly better modelled by a random distribution. Taken together, these aspects suggest that within the limits of finite and discrete images, patterns approach fractality at and in the vicinity of maximal SIP or minimal spatial redundancy. This may be an important reason why it is practically possible to robustly order images along the uniform-clustered-regular pattern space on the basis of magnitude alone.
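The dependence of fractal-like clustered patterns on the spectral exponent and a random phase can be illustrated by standard spectral synthesis; the function below is a generic power-law (1/f^beta) construction assumed here for illustration, not the article's SIP algorithm:

```python
import numpy as np

def spectral_pattern(n=64, beta=1.5, seed=0):
    # Spectral synthesis: a power-law magnitude spectrum (exponent beta)
    # combined with a random phase yields a clustered, fractal-like pattern.
    rng = np.random.default_rng(seed)
    f = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
    f[0, 0] = 1.0                          # avoid division by zero at DC
    mag = f ** (-beta)                     # power-law magnitude spectrum
    phase = np.exp(1j * rng.uniform(-np.pi, np.pi, (n, n)))
    img = np.real(np.fft.ifft2(mag * phase))
    return img > np.median(img)            # binarize to a clustered binary pattern

pattern = spectral_pattern()
assert pattern.shape == (64, 64)
```

Varying `beta` moves the result along the continuum: small exponents approach uniform noise, while larger ones produce increasingly coherent clusters, regardless of the particular random phase drawn.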
Minimal redundancy pattern - We investigated minimal structural redundancy for only the most basic graphs (line bisection, triangle, and tetrahedron), stopping short of identifying the fluorescent pattern(s) of arbitrary size. A family of fractals is the likely answer - but is there a fractal dimension more apt to produce fluorescence than another? Furthermore, considering deterministic fractals (such as the Koch curve [49, pp. 87-91], as opposed to Perlin noise [133]), it could be argued that there may exist a single fractal that minimizes structural redundancy. I propose that the design of such a pattern may consist in taking the simplest structure of a given dimension (e.g., the fluorescent triangle for two dimensions) and replicating it infinitely at a certain change rate (Fig. 11). The result is, in fact, an affine transform of the module of well-known fractals (Cantor dust, Sierpiński gasket and tetrahedron, and Julia set [49, pp. 65-79, 120-123]), so as to morph them into a fluorescent shape.
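The proposed design principle, replicating a simple base structure at a fixed change rate, can be sketched with the classic Cantor-dust recursion; this illustrates the replication mechanism only, not the fluorescent pattern of Fig. 11:

```python
def replicate(intervals, depth, ratio=1/3):
    # Replicate the simplest 1-D structure (an interval) at a fixed
    # change rate `ratio`, in the spirit of Cantor-dust-like
    # deterministic fractals.
    if depth == 0:
        return intervals
    out = []
    for a, b in intervals:
        w = (b - a) * ratio
        out.extend([(a, a + w), (b - w, b)])
    return replicate(out, depth - 1, ratio)

segments = replicate([(0.0, 1.0)], 3)
# Each level doubles the pieces and shrinks the total length by 2*ratio.
assert len(segments) == 8
assert abs(sum(b - a for a, b in segments) - (2 / 3) ** 3) < 1e-12
```

An affine transform of such a construction, applied to a fluorescent base structure instead of an interval, is the morphing operation suggested above.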
Epistemology - One fascinating mathematical aspect of minimal structural redundancy is how one might go about thinking about it; in other words, its epistemology. Specifically, it is richer to conceive the problem as the design of fluorescent structures than as their discovery. Asking "What might the definition of minimal structural redundancy be?", rather than "Which is the structure with the least redundancy?", explicitly embeds the possibility of multiple answers into the inquiry process and contextualizes the problem with respect to the questioner, the data, and the application. For example, a fluorescent graph becomes regular in terms of edge length when the number of vertices tends to infinity! If this behavior is not desired, a new definition of minimal redundancy may be created (for example, one stipulating only local fluorescence). This same experimental conceptualization of mathematical definitions was followed by Gary Chartrand, Paul Erdős and Ortrud Oellermann in their article "How to Define an Irregular Graph": "In research, the goal is not only to come up with a definition that seems natural but to arrive at a class of graphs with interesting, and perhaps even some surprising, properties." [85, p. 39].

VI. CONCLUSIONS
This article has introduced Structural Information Potential (SIP), a measure of information based on pattern configuration. Its utility was illustrated through a real-life case study for the task of document image triage. On this task, SIP performs better than other methods in both mathematical and perceptual terms.
The main theoretical significance of the work consists in (a) the development of a formalism that defines the uniform-clustered-regular pattern-informativeness space, which organizes fundamental pattern types in a mathematically and perceptually coherent fashion and relates them to an information potential, and (b) the development of a conceptual basis and analytical methods for the identification of shapes and patterns with minimal structural redundancy.
In practical terms, Structural Information Potential is a useful classification method for triage-like conditions, characterized by decision-making under conditions of uncertainty and time pressure, when it becomes efficient to generate information about content through the analysis of structures. The generic nature of SIP makes it appropriate for many other applications, such as image quality assessment (to detect noisy and erroneously imaged data), predicting OCR output quality before applying OCR, identifying informative keyframes in video streams, and as a document navigation functionality.

ACKNOWLEDGMENT
The author would like to thank Andreas Fischer of the University of Applied Sciences and Arts Western Switzerland (HES-SO), Fribourg, for sharing many thoughtful reflections, and for offering to work on the library card dataset. Irvin Schick, formerly with Bolt Beranek and Newman, Cambridge, MA, USA, now at the École des Hautes Études en Sciences Sociales, Paris, France, has been, over the years and continents, a reliable devil's advocate for testing the author's hypotheses.