Shadow Pattern-Enhanced Building Height Extraction Using Very-High-Resolution Image

Building height is valuable for a variety of foci in urban studies. Traditional field investigations are not practical for updating the heights of massive numbers of buildings across a large-scale urban area. Given the relationship between building structures and their shadow sizes, a building shadow becomes practical for estimating the corresponding building height when its geometrical shape is visible in newly emerging very-high-resolution (VHR) images. However, the shadow shapes of different buildings might vary significantly, posing a great challenge to determining the shadow edge that is useful for predicting building height. This study proposes a shadow pattern classification system (ShadowClass) to summarize the varied shadow shapes into a number of pattern categories and employs a cutting-edge CNN model to classify the extracted shadows into a pattern, so that the edge of a building shadow useful for building height estimation can be determined automatically. We integrated the proposed approach into two branches of the state-of-the-art approaches: shadow-based building height estimation with open cyberinfrastructure and shadow-based building height estimation with VHR image. The experimental results showed that the proposed method could be a practical solution for single and isolated buildings that have a complete shadow shape.


I. INTRODUCTION
BUILDING height is valuable for a variety of foci in urban studies, such as flight safety control, urban air pollution [1], local temperature prediction [2], and residential energy consumption [3]. Although a traditional field investigation can obtain accurate building height information, it is laborious and time-consuming work. Moreover, the elevation-related digital products (e.g., LiDAR data, DEM, DSM, and DTM) created for public use only cover selected places, as updating these data is costly. Given the relationship between building structures and their shadow sizes, the building shadow becomes practical for estimating its corresponding building height when its geometrical shape is visible in newly emerging very-high-resolution (VHR) images [4], [5]. The building shadow has been affirmed to be an alternative data source to support building height estimation where elevation-related digital products are not updated in a timely manner or where the cost of elevation-related data products is prohibitive [6], [7]. Generally speaking, building height estimation using shadow shape consists of two major parts: first, extracting building shadows and, second, calculating the geometrical relationship between the shadow of a building and its height. Currently, researchers have proposed well-developed approaches for shadow extraction from a VHR image [8], [9]. Teke et al. [10] presented a shadow detection method that employed the near-infrared waveband to remove noise, such as trees and grass. Elbakary and Iftekharuddin [11] improved the conventional geometric active contours to enhance shadow detection in a VHR image.
In comparison to building shadow extraction, building shadow-based height estimation has several issues to be addressed. Based on the early stage research efforts on calculating the height of a building using its shadow shape [12], [13], Comber et al. [6] proposed a rule-based classification to determine the stories of residential buildings based on shadow width. Their proposed model did not take the geometrical positions of the sun and the sensor into account. Massalabi et al. [14] proposed key parameters derived from shadows that are useful for building height prediction, including solar elevation, solar azimuth, sensor elevation, sensor azimuth, and the relative position of a building. Then, Wang and X. Wang [15], Shao et al. [16], Kim et al. [17], and Lee and Kim [18] proposed a variety of approaches involving linear function, shadow matching, etc., to develop the geometrical relationship among these key parameters. Furthermore, several investigations have created shadow delineations to extend the geometrical relationship involving sun position, sensor position, and building position to the three-dimensional (3-D) space [5], [19], [20]. To further simplify the calculation process, a number of approaches have been proposed, including using the relative geometrical position of the sun, sensor, and building [21] and employing the 3-D geometrical relationship [22]. More recently, Qi et al. [5] used the information accessed from Google Earth to calculate the slope angle between the solar elevation and azimuth, and the satellite elevation and azimuth. In addition, Qi and Wang [23] proposed a method called corner-shadow-length ratio to calculate building height, considering both the geometrical relationships between the building roof structure and solar, sensor, and building positions. This method supported measuring building height differently for flat roofs and pitched (sloping) roofs.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Although previous works have proved that shadows provide useful information to characterize the building structure, their shapes might vary significantly, posing a great challenge to determining the edge of shadow useful for predicting building height. Shadow shapes are always influenced by a variety of factors, such as building roof, building structure, sun azimuth, sensor azimuth, and neighboring land cover. Especially, buildings of contemporary architecture, such as skyscrapers and landmarks, always comprise varying and complex structures, thus making it impossible to assign their shadow shapes to a simple pattern category. Moreover, slope roofs might not be visually recognizable in the geometrically corrected VHR image.
Therefore, we propose a shadow pattern classification system (ShadowClass) to summarize the varied shadow shapes into a number of pattern categories, and employ a cutting-edge CNN model to classify the extracted shadows into a pattern defined by ShadowClass for automatically determining the edge of a building shadow being useful for building height estimation. We integrated the proposed approach into the state-of-the-art approaches regarding building height estimation.
The rest of this article is organized as follows. Section II discusses those works that can help in building height estimation from VHR images. Section III reports the proposed methodological framework. Section IV presents the results of height estimation of multiple-story buildings. Finally, Section V concludes this article.

A. Geometrical Relationships Among Building, Building Shadow, Collection Sensor, and Solar Position
Shadows are always observed from elevated artificial architectures, elevated hills and mountains, and clouds in a VHR image when these elevated objects block the visible light emitted by the sun. According to the way that the shadow was generated, Arevalo et al. [24], [25] divided shadows into self-shadows and cast shadows. A self-shadow is the dark area of an object itself, where light is not available. Conversely, a cast shadow is the dark area that is approximately similar to the projection of an object's shape, where illumination is blocked by this object. The shadow visible in a VHR image is a cast shadow, which makes it possible to estimate the height of an elevated building based on the projection of its shadow shape. Fig. 1 provides the plan view [see Fig. 1(a) and (b)] and the profile view [see Fig. 1(c)] to present the geometrical relationship among building height, solar and sensor elevation, and solar and sensor azimuth. α and β, respectively, denote the solar elevation and the sensor elevation, and θ_1 and θ_2, respectively, refer to the solar azimuth and the sensor azimuth. In Fig. 1(a), the sun and the sensor are located on the same side. Conversely, the sun and the sensor are located on opposite sides in Fig. 1(b). In these two subgraphs, the shadow edges visible from a VHR image are the line segments L_{AB} and L_{AO_2}, respectively. Fig. 1(c) illustrates the plan view of the geometrical relationships among building position, sensor azimuth, and solar azimuth. As shown in Fig. 1(a)-(c), solar elevation, sensor elevation, solar azimuth, and sensor azimuth may affect the shadow shape of a building. Moreover, Fig. 1(d) illustrates the geometrical relationship of the solar and sensor azimuth, the solar and sensor elevation, and the building position in a 3-D space. In this subgraph, the visible shadow edge is the line segment AO_2.

B. Two Main Branches of State-of-the-Art Shadow-Based Building Height Estimation
Based on a review of the related works and the availability of image metadata, image spatial resolution, and other factors, we divided the state-of-the-art approaches for building height estimation into two main branches: shadow-based building height estimation with open cyberinfrastructure and shadow-based building height estimation with VHR image. Table I comparatively lists the details of these two branches of methods.
Method 1: Shadow-based building height estimation with open cyberinfrastructure.
Open cyberinfrastructure, such as Google Earth Pro, generally provides free VHR images and the essential parameters for building height estimation, namely the solar and sensor azimuth and the lengths of the line segments AO_2 and BO_2 in Fig. 1(d). Qi et al. [5] and Qi and Wang [23] have reported this type of method by using the information derived from Google Earth to predict building height.
Method 2: Shadow-based building height estimation with VHR image.
A VHR image generally provides the metadata, including spatial resolution and solar elevation, which are useful for estimating building height with shadow shapes [4], [6], [16]. In general, the workflow of these approaches includes the following steps. 1) extracting building shadows from the VHR image; 2) calculating the shadow length; 3) converting the pixel-unit shadow length into its meter-unit counterpart; 4) predicting the building height based on the shadow length and solar elevation. This type of method could generate a result more precise than Method 1, but it might be challenging for a large-scale urban area due to data storage load and data acquisition cost.
Above all, both Method 1 and Method 2 need a user to manually determine the line segments L_{AB} and L_{AO_2}, as shown in Fig. 1(d), based on various building shadows. Thus, an approach that automatically determines the line segments L_{AB} and L_{AO_2} would advance the state-of-the-art methodology for building height estimation.

Fig. 2 shows the workflow of shadow pattern-enhanced height estimation with open cyberinfrastructure (Google Earth Pro) and VHR image. The details of the orange, purple, and blue parts are reported in Section III-A, Section III-B, and Section III-C, respectively.

A. Shadow Pattern Classification and Useful Shadow Edge Determination
1) Shadow Pattern Classification System (ShadowClass): Fig. 3 shows our proposed ShadowClass, including seven basic types of shadow pattern and complex patterns that mix multiple basic patterns.
2) CNN-Based Shadow Pattern Classification: a) Data augmentation for the pattern samples in ShadowClass: To enhance the robustness and transferability of CNN for shadow pattern classification, we employed data augmentation to increase the diversity of intraclass training samples and the similarity of interclass training samples [26], [27]. The data augmentation used three strategies: flipping, rotation, and scaling. Altogether, these three data augmentation strategies generated an additional 876 shadow pattern samples for every original shadow pattern sample (original shadow sample + 2 flipped shadow samples + (1+2)×72 rotated shadow samples + (1+2+(1+2)×72)×4 scaled shadow samples).
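As a quick arithmetic check, the parenthesized sample-count expression for the data augmentation can be evaluated term by term. The sketch below assumes, as the expression implies, that the strategies are applied cumulatively (each rotation is applied to the original and flipped samples, and each scaling to everything produced so far). The variable names are illustrative only.

```python
# Sample counts per original shadow pattern sample, following the
# parenthesized expression in the text. Variable names are illustrative.
original = 1
flipped = 2                                      # 2 flipped copies
rotated = (original + flipped) * 72              # 72 rotations of each existing sample
scaled = (original + flipped + rotated) * 4      # 4 scalings of each existing sample

total = original + flipped + rotated + scaled
print(rotated, scaled, total)  # 216 876 1095
```

Under this reading, the 876 figure matches the scaled term, and the full expression sums to 1095 samples including the original one.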
b) CNN-reinforced shadow pattern classification: We employ a state-of-the-art CNN model called Inception_ResNet_V2 to classify the extracted shadow regions into a pattern defined in the ShadowClass. This CNN has achieved state-of-the-art accuracy in image scene classification in the prestigious benchmark dataset called ILSVRC.
Inception_ResNet_V2 is a neural network integrating the architecture of two CNNs, namely Inception V3 and ResNet, to take advantage of network depth in the classification while reducing the load in terms of time and computation. The dimension of the input image is 299 × 299 × 3. In the first stage, Inception_ResNet_V2 processes the input image with the Stem of Inception V4 to generate a feature map with dimensions of 35 × 35 × 256. The feature map is then processed consecutively by five independent blocks of Inception-ResNet-A and compressed by the block of Reduction-A. The results obtained by Reduction-A are further processed consecutively by ten independent blocks of Inception-ResNet-B and compressed by the block of Reduction-B. The results generated by Reduction-B are processed consecutively by ten independent blocks of Inception-ResNet-C and compressed by average pooling. Furthermore, the 1792 features generated by average pooling are processed with dropout. Finally, the output of the Softmax classifier is a label that identifies the shadow pattern to which the extracted shadow belongs.
We fine-tuned the pretrained Inception_ResNet_V2 with the original shadow pattern samples in ShadowClass and the extended samples obtained by data augmentation. Then, we used the fine-tuned Inception_ResNet_V2 to classify the shadow regions extracted from open cyberinfrastructure (Google Earth Pro) or VHR image into a basic shadow pattern or a complex pattern predefined by ShadowClass, as shown in Fig. 3.
c) Shadow pattern-based useful shadow edge determination: Fig. 4 shows the seven basic classes of shadow patterns and the patterns that mix multiple basic classes of shadow patterns. Gray polygons represent the shadow regions, and the red and orange dotted lines with round nodes denote the length of a shadow-derived line useful for building height estimation.
Once a pattern classification result is obtained as described in Section III-A2, we can determine which shadow edge is useful for building height estimation.

B. Shadow Pattern-Enhanced Height Estimation With Open Cyberinfrastructure (Google Earth Pro)
1) Solar Declination: Solar declination is the angle that specifies the solar position between the incident orientation of sunlight and the earth's equator. Cooper [28] and Bourges [29] proposed an algorithm to calculate solar declination based on the date when the VHR image was produced. The algorithm is expressed as follows:

$$\omega = 0.3723 + 23.2567\sin\delta + 0.1149\sin 2\delta - 0.1712\sin 3\delta - 0.758\cos\delta + 0.3656\cos 2\delta + 0.0201\cos 3\delta \tag{1}$$

where ω refers to the solar declination, and δ is calculated by (2), in which Y is the year when the VHR image was created, t is the nth day of the year Y, and IP(*) is the function that returns the integral part of *.
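The trigonometric series in (1) can be evaluated directly once δ is known. The following is a minimal sketch, assuming that δ is supplied in radians and that the coefficients yield the declination ω in degrees (their magnitude, about 23.26, suggests so). The function name is hypothetical, and the computation of δ itself from the year and day number is not included here.

```python
import math

def solar_declination_deg(delta_rad):
    """Evaluate the series in (1) for the angle delta (radians, assumed).

    Returns the solar declination omega, assumed to be in degrees.
    """
    return (0.3723
            + 23.2567 * math.sin(delta_rad)
            + 0.1149 * math.sin(2 * delta_rad)
            - 0.1712 * math.sin(3 * delta_rad)
            - 0.758 * math.cos(delta_rad)
            + 0.3656 * math.cos(2 * delta_rad)
            + 0.0201 * math.cos(3 * delta_rad))

# delta = 0 gives a declination of about zero; delta = pi/2 gives about 23.43.
print(round(solar_declination_deg(0.0), 4))
print(round(solar_declination_deg(math.pi / 2), 4))
```

The two sample values are consistent with the expected behaviour of a declination curve: near zero at an equinox-like argument and close to the obliquity of the ecliptic a quarter cycle later.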
2) Solar Elevation: Solar elevation, which is the angle β in Fig. 1(d), specifies the angle of sunlight above the horizontal plane of the ground surface. Solar elevation is calculated by (3) [5], [23], where σ is the latitude of a building, δ is obtained by (2), and φ is the solar hour angle. The solar hour angle is obtained by (4), where min() and −min() are used in the morning and afternoon, respectively. Moreover, a, b, and c in (4) are defined with respect to θ_2, the sensor azimuth shown in Fig. 1(d). The details of accessing the sensor azimuth are presented in Section III-B3.

3) Solar and Sensor Azimuth Calculation:
Based on the type of shadow pattern and useful shadow edge obtained by Section III-A, we then access their attributes using the ruler tool in google earth pro, as shown in Fig. 5 [5], [23].
For example, in Fig. 3, the shadow is classified into a complex pattern that includes Pattern 1, Pattern 2, and Pattern 4 in ShadowClass. Then, we can determine the edge of this shadow that is useful for estimating building height by Fig. 4. Thus, we can determine that the useful shadow edge is the yellow line in Fig. 5(c). Moreover, Fig. 5(b) illustrates the heading and the ground length of the yellow line drawn on the VHR image, corresponding to the sensor azimuth and the length of BO_2 in Fig. 1(d), respectively. Fig. 5(c) shows the heading and the ground length of another yellow line drawn on the VHR image, corresponding to the solar azimuth and the length of AO_2 in Fig. 1(d), respectively.

4) Building Height Estimation Using Google Earth Pro: As shown in Fig. 1(a), when the sun and the sensor are on the same side, line segment AB (L_{AB}) is the shadow visible from the VHR image. The length of this line segment is measured by the following equation:
As shown in Fig. 1(b), when the sun and the sensor are on opposite sides, line segment AO_2 is the shadow visible from the VHR image. The length of this line segment is measured by the following equation:
The solar elevation (β) can be obtained by (3). Then, we estimated the building height based on the visibility of the building wall.
1) When the building wall is visible in the VHR image.
As shown in Fig. 5(b), we can obtain the sensor azimuth and the length of BO_2 when the building wall is visible in the VHR image. On the basis of the geometrical relationship, as shown in Fig. 5(d), there are two approaches to calculate the building height. The first approach uses solar elevation, which was introduced in (3), and the length of AO_2, which is expressed as follows:

$$H = L_{AO_2}\tan\beta \tag{8}$$

where H is the building height.
Another approach uses the solar and sensor azimuth and the lengths of AO_2 and BO_2. The following equations are based on the geometrical relationship, as shown in Fig. 5(d):
In (9), tan α and L_{AB} are the unknown parameters. By consolidating the three expressions in (9), building height is expressed as follows:
2) When the building wall is not visible in the VHR image.
As shown in Fig. 4(b), the sensor azimuth and the length of BO_2 are not available when the building wall is invisible in the remote sensing image. Therefore, the expression in (10) is the only approach that can be used to calculate building height from shadow length.
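The first approach, which uses only the solar elevation and the length of AO_2, can be illustrated with a short sketch. It assumes the usual right-triangle relation between a vertical building, its ground shadow, and the sun ray, that is, H = L_{AO_2} · tan β, with the shadow length already in meters (for example, as read from the ruler tool). The function name and the sample values are hypothetical.

```python
import math

def building_height_from_shadow(shadow_length_m, solar_elevation_deg):
    """Height of a vertical building from its shadow length on flat ground.

    Assumes H = L * tan(beta), where L is the length of AO_2 in meters
    and beta is the solar elevation in degrees.
    """
    if not 0.0 < solar_elevation_deg < 90.0:
        raise ValueError("solar elevation must lie strictly between 0 and 90 degrees")
    return shadow_length_m * math.tan(math.radians(solar_elevation_deg))

# Illustrative values only: a 30 m shadow under a 60-degree solar elevation.
print(round(building_height_from_shadow(30.0, 60.0), 2))
```

Because tan β grows quickly at high solar elevations, the same error in the measured shadow length translates into a larger height error when the sun is high.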

C. Shadow Pattern-Enhanced Building Height Estimation With VHR Image
The shadow pattern-enhanced approach for building height estimation consists of four sections. The first section converts an RGB color image into a grayscale image and then performs image enhancement using an algorithm called adaptive gamma correction with a weighted distribution (AGCWD) [30], [31]. Grayscale conversion and contrast enhancement have been reported to make shadows distinct from other land cover features [4], [8], facilitating efficient shadow extraction. The second section exploits an algorithm called simple linear iterative clustering (SLIC) to segment the image preprocessed by the previous section into multiple superpixels and then classifies these superpixels into shadow and nonshadow regions. The third section detects and simplifies the contour of each shadow region. The last section calculates the shadow length defined in ShadowClass. We predicted the building height by considering the shadow length and the relationship among the solar position, sensor position, and building position.
1) Image Preprocessing: AGCWD enhances the contrast of an image by dynamically applying the parameters derived from the whole image content. We used AGCWD to adjust the contrast of VHR images and make shadow regions distinct from other land covers through three steps [30]: histogram analysis, weighting distribution adjustment, and gamma correction.
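The three AGCWD steps (histogram analysis, weighting distribution adjustment, and gamma correction) can be sketched on a flat list of grayscale intensities. This is a minimal sketch of the commonly cited AGCWD formulation (weighted pdf, weighted cdf, and a per-level gamma of 1 − cdf_w), not necessarily the exact variant used in this study; in particular, the low-/high-contrast branch is not modeled. The function name is hypothetical, and the `alpha` parameter is an assumption standing in for the user-defined weighting parameter.

```python
def agcwd(pixels, alpha=0.5, levels=256):
    """Adaptive gamma correction with weighting distribution (sketch).

    pixels: flat list of integer intensities in [0, levels - 1].
    alpha:  weighting exponent (assumed name for the user-defined parameter).
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    # Step 1: histogram analysis.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    pdf = [h / len(pixels) for h in hist]
    pdf_min, pdf_max = min(pdf), max(pdf)
    # Step 2: weighting distribution adjustment.
    span = pdf_max - pdf_min
    if span == 0:
        pdf_w = pdf[:]  # flat histogram: nothing to reweight
    else:
        pdf_w = [pdf_max * ((p - pdf_min) / span) ** alpha for p in pdf]
    total_w = sum(pdf_w)
    cdf_w, running = [], 0.0
    for w in pdf_w:
        running += w
        cdf_w.append(min(1.0, running / total_w))
    # Step 3: gamma correction with a per-level gamma = 1 - cdf_w.
    l_max = levels - 1
    lut = [0] * levels  # level 0 stays at 0
    for level in range(1, levels):
        gamma = 1.0 - cdf_w[level]
        lut[level] = round(l_max * (level / l_max) ** gamma)
    return [lut[p] for p in pixels]

# Illustrative dark-dominated "image": dark levels are lifted, white stays white.
sample = [10] * 50 + [60] * 30 + [200] * 15 + [255] * 5
enhanced = agcwd(sample)
print(enhanced[0], enhanced[50], enhanced[-1])
```

Since every gamma lies between 0 and 1, the mapping never darkens a pixel, which is the behaviour that helps separate shadow regions from other dark land covers.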
The probability density function (pdf) and the cumulative density function (cdf) of a VHR image are expressed as follows:

$$\mathrm{pdf}(i) = \frac{num_i}{num_{all}}, \qquad \mathrm{cdf}(i) = \sum_{k=0}^{i} \mathrm{pdf}(k) \tag{11}$$

where num_i is the frequency of intensity i, num_all is the total number of pixels in the image, and pdf(i) and cdf(i) are the pdf and the cdf of intensity i, respectively. Based on the cdf and the pdf in (11), the adaptive gamma correction (AGC) converts the original intensity i into a new value i_agc by the following expression, where pdf_max is the maximal probability density function:
where σ is the user-defined parameter to control the distribution of histogram statistics, and pdf_min is the minimal probability density function. Accordingly, the cdf with weighting distribution (cdf_wd) is expressed as follows:
where C_low and C_high are the low-contrast and high- (or moderate-) contrast images, respectively, and τ is the threshold used for a binary contrast classification. Based on (12) and (13), the original intensity i in the VHR image becomes i_agcwd after AGCWD by the following expression:
2) Shadow Region Extraction: a) Superpixel-based segmentation: Compared with the graph-based segmentation approaches, superpixel-based segmentation, such as SLIC [32], is more efficient for grouping the connected pixels into meaningful subregions. Moreover, SLIC speeds up the process of clustering multiple pixels by measuring the distance over space and intensity (color) differences between two connected pixels. In SLIC, the image space, including intensity and spatial space, is represented as (L, A, B, X, Y), where L, A, and B denote the three channels of image color space, and X and Y denote the distance over the horizontal and vertical dimensions, respectively.
As the image enhanced by AGCWD only contains one channel, the intensity space (D_intensity) and the spatial space (D_spatial) in the AGCWD-enhanced VHR image are represented as (I, X, Y), where I is the intensity of one channel. Then, every pixel in an image joins the nearest cluster center pixel. The "nearest" is measured by the distance of image space, which is expressed in (15), where θ is the ratio between spatial distance and intensity difference. A higher θ generates a result that contains superpixels of a larger size, and vice versa. N denotes the approximate number of superpixels after segmentation. Moreover, for a pixel located at position (x_0, y_0), the image gradient is computed by the L_2 norm, which is shown in the following equation:
where L_2() is the L_2 norm, and I(x, y) is the intensity vector of a pixel at the coordinate (x, y).
b) Nonshadow area removal: In a VHR image covering urban areas, the shadow of trees may overlap with artificial architectures. Moreover, the near-infrared waveband is not available in the majority of VHR images for calculating the normalized difference vegetation index. Instead, we applied an algorithm called the triangular greenness index (TGI) to detect vegetated areas with RGB channels [33]. The following equation shows the TGI value of a pixel:
where w_green, w_red, and w_blue refer to the intensity of the green, red, and blue wavelengths of a pixel, respectively. Then, we set a threshold (0.2) to select the pixels with a high TGI value and removed these pixels and their connected shadow regions from the original result of the shadow region extraction. However, noise still exists after the processing with TGI due to the high level of detail available in the VHR image. Thus, we manually removed the remaining noise according to the land cover shown in the VHR image.
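The vegetation screening step can be sketched with the general triangular greenness index, TGI = −0.5[(λ_r − λ_b)(R − G) − (λ_r − λ_g)(R − B)]. The center wavelengths used below (670, 550, and 480 nm) are assumptions that are common in the TGI literature and are not stated in the text, and the function names are hypothetical. The scaling of the band values under which the 0.2 threshold applies is also not stated, so the threshold is left as a parameter.

```python
def tgi(red, green, blue, lam_r=670.0, lam_g=550.0, lam_b=480.0):
    """Triangular greenness index of one pixel (general wavelength form).

    The default center wavelengths (nm) are assumptions, not values
    taken from the text.
    """
    return -0.5 * ((lam_r - lam_b) * (red - green)
                   - (lam_r - lam_g) * (red - blue))

def vegetation_mask(rgb_pixels, threshold=0.2):
    """Flag pixels whose TGI exceeds the threshold (True = likely vegetation)."""
    return [tgi(r, g, b) > threshold for (r, g, b) in rgb_pixels]

# Illustrative pixels: a greenish pixel and a neutral gray pixel.
print(vegetation_mask([(0.2, 0.6, 0.2), (0.4, 0.4, 0.4)]))  # [True, False]
```

A neutral gray pixel always has a TGI of zero, so any positive threshold keeps asphalt, concrete, and shadow pixels out of the vegetation mask regardless of their brightness.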
c) Contour detection and edge simplification: The originally extracted shadow region always encompasses rough edges and might carry complicated shapes. Therefore, raw shadow regions fail to precisely characterize the form of a building body, let alone model the relationship between shadow shape and building height. To address the challenges mentioned above, this study performed contour detection and edge simplification to smooth and straighten the rough shadow edges.
We employed the marching squares algorithm (MSA) to detect the contour from the rough shadow edges in every 2 × 2 pixels' window based on the raw shadow extraction result. Fig. 6(a) shows all 16 possible configurations observed in the 2 × 2 pixels' window. Each circle denotes a pixel that only has either of two binary values-namely, shadow or nonshadow-which are represented by black and white, respectively. The red line is the edge interpolated in the 2 × 2 pixels' window to divide the shadow and the nonshadow regions-that is, the edge of the shadow region. Except for cases 1 and 2, in which no edge exists, an edge can be drawn from other cases. All 16 cases were used to generate an approximate contour from the rough shadow edges, maintaining the tradeoff between the number of vertices remaining and the similarity of the approximate contour and the original shadow shape.
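The classification of each 2 × 2 pixels' window into one of the 16 configurations can be sketched as a 4-bit index built from the four corner values. The bit ordering below is one common convention and an assumption; it may not match the case numbering of Fig. 6(a). The function names are hypothetical. The two uniform configurations (all nonshadow or all shadow) are the ones that carry no edge.

```python
def cell_cases(mask):
    """Return the marching-squares case index (0..15) of every 2 x 2 window.

    mask: 2-D list of 0/1 values, where 1 marks a shadow pixel.
    Bit order (assumed convention): top-left=8, top-right=4,
    bottom-right=2, bottom-left=1.
    """
    cases = []
    for r in range(len(mask) - 1):
        row = []
        for c in range(len(mask[0]) - 1):
            top_left, top_right = mask[r][c], mask[r][c + 1]
            bottom_left, bottom_right = mask[r + 1][c], mask[r + 1][c + 1]
            row.append(8 * top_left + 4 * top_right
                       + 2 * bottom_right + bottom_left)
        cases.append(row)
    return cases

def boundary_cells(mask):
    """Windows crossed by the shadow edge, i.e., all non-uniform cases."""
    return [(r, c)
            for r, row in enumerate(cell_cases(mask))
            for c, case in enumerate(row)
            if case not in (0, 15)]

# A single shadow pixel in the middle of a 3 x 3 patch.
patch = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]
print(cell_cases(patch))       # [[2, 1], [4, 8]]
print(boundary_cells(patch))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

A full contour tracer would additionally interpolate an edge segment inside every boundary window and link the segments into closed polylines; only the case lookup is shown here.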
Then, we used the Ramer-Douglas-Peucker algorithm to simplify the contours generated by the MSA to represent the shadow shape with approximate line segments containing fewer vertices, which includes the following steps.
Assume that the original curve is {v_1, v_2, ..., v_k}, where v_k is the kth sequentially numbered vertex in this curve.
Step 1: Create a line segment connecting the starting point v_1 and the ending point v_k and then define the distance to generate a buffer zone around the line segment. The buffer size is given based on the specific application.
Step 2: Remove all vertices inside the buffer zone and keep the remaining vertices.
Step 3: Create a new line segment connecting v_1 and v_2 and then define the distance to generate a buffer zone around the line segment.
Step 4: Examine whether v_2 is located in the buffer zone and remove or maintain v_2 by following the rules defined in Step 2.
Step 5: Repeat Steps 3 and 4 using other vertices until the line segment is connected to the end point.
d) Building height estimation using shadow region: After generating the simplified edge contour of a shadow region, we used the method presented in Section III-A to determine the type of shadow pattern for this shadow region extracted from a VHR image. Then, based on the shadow pattern of this shadow region, we could determine the shadow segment useful for building height estimation.
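The Ramer-Douglas-Peucker simplification outlined in Steps 1-5 can be sketched in its textbook recursive form, in which the buffer distance plays the role of the tolerance `epsilon`. This is a minimal sketch under that assumption and not necessarily the exact stepwise procedure used in this study; the function names are hypothetical, and the distance is measured to the chord segment.

```python
import math

def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:          # degenerate segment (closed contour)
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))        # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def rdp(points, epsilon):
    """Simplify a polyline; vertices within epsilon of the chord are dropped."""
    if len(points) < 3:
        return list(points)
    dists = [_point_segment_distance(p, points[0], points[-1])
             for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] <= epsilon:
        return [points[0], points[-1]]   # every vertex lies inside the buffer
    left = rdp(points[:idx + 1], epsilon)
    right = rdp(points[idx:], epsilon)
    return left[:-1] + right             # avoid duplicating the split vertex

# A jagged, nearly straight shadow edge collapses to its two end points.
print(rdp([(0, 0), (1, 0.05), (2, -0.05), (3, 0)], 0.1))  # [(0, 0), (3, 0)]
```

A larger tolerance yields straighter shadow edges with fewer vertices, at the cost of losing small corners that may belong to the real building outline.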
Assume that the length of an extracted shadow region is p_num pixels in the VHR image and that the spatial resolution is sr. The length of this shadow (L_{AO_2}) is computed by the following equation:

$$L_{AO_2} = p_{num} \times sr \tag{18}$$

Then, we substitute the L_{AO_2} obtained by (18) into (8) to obtain the height of the building associated with this shadow.
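The pixel-to-meter conversion and its use in the height relation can be illustrated as follows. The sketch assumes that the shadow length is the pixel count multiplied by the spatial resolution (in meters per pixel) and that the height then follows from the tangent of the solar elevation. The function name and the sample values are hypothetical.

```python
import math

def shadow_length_m(p_num, sr):
    """Convert a shadow length from pixels to meters.

    p_num: shadow length in pixels; sr: spatial resolution in meters per pixel.
    """
    if p_num < 0 or sr <= 0:
        raise ValueError("p_num must be non-negative and sr must be positive")
    return p_num * sr

# Illustrative values only: a 40-pixel shadow edge at 0.5 m per pixel,
# observed under a 45-degree solar elevation.
length_m = shadow_length_m(40, 0.5)
height_m = length_m * math.tan(math.radians(45.0))
print(length_m, round(height_m, 2))
```

At a fixed solar elevation, a one-pixel error in the extracted shadow edge changes the estimated height by sr · tan β, which is why a finer spatial resolution limits the effect of imprecise shadow extraction.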

A. Experimental Dataset
We collected 18 test images covering Los Angeles, San Diego, and Las Vegas from Google Earth Pro, which are shown in Fig. 7. The shadows in these collected test images varied in scale, size, orientation, and shape and were located in different landscape scenarios and contexts, including downtown, dense residential, sparse residential, and industrial areas.
In addition, we created the training dataset for shadow pattern classification. First, we created five original training images labeled by each basic pattern of ShadowClass, as shown in Fig. 2, and generated other training samples by data augmentation mentioned in Section III-A.

B. Shadow Pattern Classification
On the basis of the results of shadow polygon simplification, we fine-tuned an ImageNet-pretrained Inception_ResNet_V2 model, which was accessed from the TensorFlow GitHub repository, with the dataset prepared in ShadowClass. The fine-tuned CNN model was used to classify every simplified shadow polygon into a predefined basic shadow pattern of ShadowClass. Table II presents the result of shadow pattern classification. The classification accuracy varied in a range between 90% and 95%.
Fig. 8. Illustration of the buildings visualized by VHR images, the results of shadow extraction, the shadow pattern to which an extracted shadow was assigned, and the selected shadow length for building height estimation.
The results, as listed in Table II, indicate that the shadow shape categories defined in ShadowClass are well recognizable by machine learning approaches and can be learned and processed automatically.
In Fig. 8, the gray polygons refer to the basic pattern in ShadowClass. Then, in the shadow pattern demonstrations and simplified shadow polygons, we drew red dotted lines that identify the position of the line segment from which we calculated the length for building height estimation. The lengths of these line segments were calculated using the method expressed in (8) and (18).

C. Shadow Length Extraction From Google Earth Pro
We accessed the solar and sensor azimuth, image date, geographical coordinates, and altitude using the ruler tool in Google Earth Pro and measured the length of a selected line segment useful for building height estimation on Google Earth Pro. Then, we used the methods described in Section III-B to calculate the building height. Fig. 9 shows the line segments we selected from Google Earth for building height estimation. The red line gives the length of BO_2 and the sensor azimuth (θ_2), as shown in Fig. 5(b), and the yellow line gives the length of AO_2 and the solar azimuth (θ_1) measured with the ruler tool, as shown in Fig. 5(c).
Fig. 9. Illustration of buildings visualized by the VHR image, the line segment for accessing the length of AO_2 and the solar azimuth (θ_1), as shown in Fig. 5(c), and the length of BO_2 and the sensor azimuth (θ_2), as shown in Fig. 5(b).
Fig. 10. Illustration of the selected building shadow extraction by the Gabor filter, histogram equalization, traditional AGC, and AGCWD [30], [31].
Fig. 11. Illustration of the selected buildings visualized by VHR images, images enhanced by AGCWD, and results of raw shadow extraction, shadow contour extraction, and shadow polygon simplification.

D. Shadow Region Extraction From VHR Image
The result of shadow extraction plays a decisive role in building height estimation. Fig. 10 compares the image enhancement results obtained with a Gabor filter, histogram equalization, AGC, and AGCWD. The result generated by histogram equalization could not support shadow extraction because the objects remained clearly visible in the shadow regions. The Gabor filter outperformed histogram equalization in differentiating between shadows and other land covers. However, the distribution of grayscale, or the intensity histogram, seemed imbalanced in the results, which might make it difficult to distinguish the shadow regions from other dark land covers, such as asphalt. Both AGC and AGCWD could make the shadow regions much more distinguishable. Only a few differences were observed in the results processed by AGC and AGCWD. AGCWD can effectively generate an image in which shadows and other land covers are distinguishable. Fig. 11 shows the results of raw shadow extraction, shadow contour detection, and shadow region extraction after polygon simplification of the image processed by AGCWD. As the VHR image presented the detailed shapes of a majority of land covers, the results of the raw shadow extraction contained tree shadows, roads, and other dark land covers within the RGB channels. Moreover, a variety of objects visible in the shadow regions made these shadows incomplete and fragmented. Thus, raw shadow extraction from VHR images could not obtain a precise shadow region for building height estimation.
Furthermore, we removed the tree shadows using the method introduced above. As roads were connected to building shadow regions in some cases, we had to visually remove these roads. In the result of primary contour detection, the contours of shadow shapes still carried rough edges for precisely measuring the length of a line segment. We further simplified the contours of shadow shapes using the Ramer-Douglas-Peucker algorithm. In the results of shadow contour simplification, many rough shadow edges were smoothed and became straight.
Then, we classified the contour-simplified shadow into a pattern defined by ShadowClass and calculated the length of the useful shadow edge with spatial resolution. Table III lists the detailed information related to the testing of VHR images accessed from Google Earth Pro, including image data, the city where the building is located, approximate geographical coordinates, and altitude. The 3-D buildings layer in Google Earth Pro enables the measurement of the height of buildings through the 3-D path function of the ruler. For buildings lower than around eight floors, for which the corresponding 3-D models were not created, Google Street View photographs were used to measure their height, and the ground-truth building height was then compared with the predicted building height.

E. Comparison of Building Height Estimation With Google Earth Pro and VHR Image
We found that precisely predicting the height of buildings lower than five floors was difficult, since the building wall, or the line segment BO 2, of these buildings was generally invisible in the VHR image. Thus, for these buildings, we only provided the results of building height estimation by Method 2.
As listed in Table III, precision might be an issue for the results calculated by Method 1 (shadow-based building height estimation using open cyberinfrastructure). The errors mainly came from the position of the line segment selected in Google Earth Pro. We asked a couple of people to select and measure the line segment and found that a tiny offset in the line selection could lead to a large difference in the estimated building height. However, we also found that the gap among the line segments selected by different people was not large, since the line segment could be visually recognized in a VHR image. Consequently, the influence of line segment selection errors was lower than 5 m.
Method 2 (shadow-based building height estimation using VHR image) could generate a more precise result. The errors observed in the results of this second approach mainly resulted from imprecise shadow region extraction. In particular, objects touching a shadow with a similar intensity were the major challenge in precise shadow region extraction. However, the offset of the shadow region had less influence on the final building height estimate, as the pixel length was relatively small in the VHR image.
Overall, the results of building height estimation by Method 1 and Method 2 were close to the ground truth values. Without the support of elevation products (e.g., LiDAR, DTM, etc.), the shadow in both oblique and orthorectified VHR images could support the height prediction for low-, mid-, and high-floor buildings. However, we have to acknowledge that Method 1 and Method 2 are practical mainly for single and isolated buildings, since these buildings can have their complete shadow shape.
When the shadow of a building was complete and did not overlap with other shadows, the results in Table III also support the conclusion that shadow-based building height estimation has good transferability in predicting the height of various types of buildings, such as apartments, houses, stores, tanks, and skyscrapers. The results in Table III and the illustrations in Fig. 11 confirm the practicality of the shadow patterns defined in ShadowClass in dealing with a variety of shadows of different shapes, orientations, scales, and sizes to calculate their length for building height prediction.

V. CONCLUSION
Shadows visible in a VHR image have been found to offer an economical solution to support large-scale building height estimation. Previous work proposed a number of approaches to represent the geometrical relationship among building positions, shadow shapes, and the sun and solar positions. In addition, various solutions adapted to the availability of data sources are essential to promote shadow-based building height estimation. However, how to automatically determine which parts of shadow edges are useful has not yet been thoroughly investigated.
To improve the performance of the state-of-the-art approaches for shadow-based building height estimation, this study proposes a classification system called ShadowClass to characterize building shadow patterns. The patterns defined in ShadowClass are valuable in determining the shadow length useful for building height estimation. For single and isolated buildings that have their complete shadow shape, the proposed method could be a practical solution.
In the future, efforts to generate accurate shadow extraction from VHR images may be valuable. Moreover, a framework that integrates state-of-the-art CNNs into the process of shadow extraction and building height estimation is worthy of substantial attention.