An Improved Method for Individual Tree Segmentation in Complex Urban Scenes Using Multispectral LiDAR and Deep Learning


Abstract—Urban trees, as a characteristic element of the urban ecosystem, exert significant influences on climate regulation. Therefore, the extraction of individual trees in urban areas holds significant research value. However, the complexity of features in urban areas poses challenges to existing single tree segmentation algorithms, as they may be influenced by other nontree features. In this study, to reduce the influence of nontree categories, enhance the identification of edge features between adjacent tree crowns, and achieve precise delineation of single urban trees, an improved multistage method was proposed for tree points extraction and individual tree segmentation in urban scenes using multispectral LiDAR. First, the original three single-channel point clouds were preprocessed by intensity interpolation to generate a three-wavelength multispectral point cloud. Second, the Point Transformer deep learning network was employed for extracting urban tree points. Third, an improved tree mapping algorithm was introduced for individual tree segmentation in urban scenes, utilizing the extracted tree points. Finally, manual individual tree labeling and the high-resolution digital orthophoto map of the region were incorporated to measure the delineation precision of individual trees. The results show that the intersection over union of the tree category in the urban scene reaches 96.0%. Moreover, the F1-score for overall individual tree segmentation attains 92.8%. A comparison with existing algorithms reveals that the proposed method outperforms the traditional raster-based watershed method and the point cloud clustering-based layer-stacking approach in the urban scene, improving the overall accuracy of single tree segmentation by 21.9% and 16.0%, respectively. These results highlight the enhanced applicability of the proposed multistage algorithm for urban scenes.

Index Terms—Individual tree crown (ITC) segmentation, multispectral LiDAR, point cloud deep learning, tree points extraction, urban scene.

I. INTRODUCTION
Recently, rapid urbanization has brought about the emergence of various environmental challenges and urban issues [1], such as air pollution, the heat island effect, and declining biodiversity [2], [3]. Urban trees, as a distinctive element of the urban ecosystem [4], fulfill vital ecological roles in climate regulation [5], [6]. Consequently, conducting comprehensive investigations and assessments of urban tree resources becomes crucial [7]. Accurate identification of individual urban trees, along with a precise estimation of their biomass, is a significant research endeavor within this context [8], [9]. The airborne LiDAR system (ALS) has emerged as a prominent remote sensing technology, enabling rapid acquisition of high-precision and high-resolution 3-D point cloud data depicting surface contours [10]. Notably, LiDAR exhibits exceptional penetration capabilities through tree canopies. Its ability to capture large-scale, precise 3-D data cost-effectively has addressed the limitations of manual measurements. Consequently, LiDAR finds extensive utilization in land cover classification [11], individual tree crown (ITC) segmentation [12], and forest structure estimation [13]. In response to evolving application needs and increased accuracy requirements, LiDAR technology has advanced from single-channel systems to multispectral and even hyperspectral systems [14], [15]. Multispectral LiDAR captures synchronous spatial and spectral information of targets [16], resulting in improved visualization and enhanced feature recognition capabilities [17] compared to traditional single-channel laser point clouds. However, it is worth noting that, as with multi- and hyperspectral remote sensing imagery, an increased number of spectral channels leads to higher spectral resolution but greater data redundancy [18], [19]. This also presents challenges in the development and production of point cloud scanning systems with more channels. Thus, two- or three-band LiDAR is more practical and more suitable for wide-scale scientific research [20]. Among the various multispectral LiDAR systems, the Titan three-wavelength LiDAR, manufactured by Teledyne Optech, Inc., has gained significant attention [21], [22], [23], [24]. Renowned for its maturity and performance, this system has been widely applied in the domains of ITC detection and tree species identification [25].
In the field of forestry, the delineation of ITCs using LiDAR point clouds has received considerable attention. Traditional algorithms for ITC segmentation can be broadly categorized into two types [26]. First, canopy height model (CHM)-based methods, which involve rasterizing the point cloud into two dimensions. These methods identify local maxima of the raster pixels as tree vertex locations and then delineate individual tree canopies through regional growth from these vertices based on specific conditions. Common algorithms within this category include a marker-controlled watershed approach [27], a tree mapping algorithm [28], and an energy function minimization-based approach [29]. However, these algorithms can suffer segmentation accuracy issues when dealing with data involving tree crowns that are cross-stacked or have uneven height distributions [30]. Second, point cloud clustering-based approaches, which involve performing extensive clustering of the 3-D point cloud to segment individual trees. Compared to CHM-based methods, these approaches avoid the information loss resulting from the conversion of 3-D point cloud data to 2-D raster format [31]. Common algorithms within this category include K-means clustering [32], a point cloud segmentation approach [33], and the layer stacking algorithm [34]. Although both types of traditional algorithms exhibit favorable performance in ITC segmentation, their input data are restricted to a specific format. In cases where nontree points exist, these algorithms may lead to over- or undersegmentation issues to a certain extent. Therefore, directly applying these algorithms for single tree extraction in complex urban scenes containing multiple feature types is not suitable.
To improve segmentation accuracy, some ITC segmentation methods specifically targeting urban scenes have been proposed. Many scholars have noticed the value of multispectral LiDAR in feature recognition [35], [36] and proposed multistage extraction methods that first extract tree points among the various feature types in urban scenes and then combine them with traditional single tree segmentation algorithms. Such methods usually use the multiple spectral channels of multispectral point clouds to construct spectral feature indices (e.g., NDVIs), and then use thresholding for masking or machine learning approaches such as SVM [37] and Random Forest (RF) [20] for point cloud classification and tree point extraction. With the advancements in deep learning, networks such as PointNet [38], RandLA-Net [39], and DGCNN [40] have emerged, enabling direct semantic segmentation of point clouds and offering new avenues for ITC extraction in urban scenes. Some algorithms use the PointNet network combined with voxel segmentation [41], [42] to extract tree points and obtain better results. However, this method suffers from a limitation of the network, which learns only spatial information and may not fully utilize the spectral features of multispectral LiDAR. Based on the aforementioned analysis, there is an urgent need to design an approach that excludes the influence of nontree points in urban scenes, achieves precise tree point extraction, and improves the accuracy of ITC segmentation.
In this study, we propose an improved multistage approach for extracting tree points and performing ITC segmentation in urban scenes based on multispectral point clouds, leveraging deep learning techniques. This method employs the Point Transformer deep learning network, enabling improved learning and utilization of both the spatial and spectral information present in multispectral LiDAR. This enables the extraction of tree points while minimizing the influence of other features within the urban scene. In addition, the original tree mapping algorithm is improved to enhance the edge features between adjacent canopies in the CHM by morphological opening and closing by reconstruction, and a point threshold is set to filter out sparse single-tree point sets to reduce oversegmentation. Finally, to highlight the performance of our method, we also compare it with different types of existing algorithms for the extraction of single trees in the urban scene.
The contributions and innovations of this article include the following. First, we utilize multispectral point cloud data, which differ from ordinary single-channel point clouds, in conjunction with the state-of-the-art Point Transformer network to achieve accurate extraction of tree points in urban scenes. Second, we introduce morphological opening and closing by reconstruction techniques to improve the original tree mapping algorithm [28], rectifying the local maxima and minima within the CHM and enhancing both tree vertex recognition accuracy and the edge features of adjacent tree canopies. In addition, a postprocessing method based on a single-tree point number threshold is proposed to reduce issues related to oversegmentation of ITCs.

II. STUDY AREA AND DATA

A. Study Area
The dataset employed in this experiment comprises three-wavelength multispectral LiDAR data obtained using the airborne Optech Titan system. The flight area is located in and around the University of Houston, Texas, United States [20]. Fig. 1 illustrates the high-resolution digital orthophoto map (DOM) image of the entire study area.
Our study area consists of 17 plots of size 300 m × 300 m, which is a subset of the flight dataset. Three of these plots were selected as test sets for the tree points extraction and ITC segmentation experiments, while the remaining plots were exclusively utilized to train the deep learning network, as indicated by the highlighted region in Fig. 1. This area represents a complex urban scene characterized by abundant urban tree resources, varying tree distribution density, diverse tree species, and significant variations in tree height between tall and short trees. In addition, common urban features such as grass, buildings, vehicles, and electrical equipment are present. Therefore, it is of great research value for the ITC segmentation of urban scenes.

B. Titan Multispectral LiDAR Point Cloud Dataset
The Optech Titan, an ALS operating at three different wavelengths, is not strictly classified as a multispectral LiDAR, as its three laser channels disperse beams independently. It is a single-sensor system that employs three active lasers with wavelengths of 532 nm (green), 1064 nm (near-infrared), and 1550 nm (mid-infrared). Each laser beam is sampled at a frequency of 300 kHz. Specific parameters for the Optech Titan can be found in Table I.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

TABLE I DETAILED PARAMETERS OF OPTECH TITAN
The final production dataset was in LASer file (LAS) format (http://dase.grss-ieee.org/). In this study, we divide the point cloud data of urban scenes into six categories (instead of tree and nontree categories, to obtain refined tree point extraction results): road, grass, building, tree, car, and powerline. The sample point cloud data of the three single bands and the classification results according to the ground truth are shown in Fig. 2.

III. METHODS
The flowchart in Fig. 3 illustrates the proposed algorithm for delineating single canopies in urban scenes using multispectral LiDAR combined with the Point Transformer deep learning network. First, by interpolating the three single-intensity point clouds, the original data are fused into a three-channel multispectral point cloud. After applying ground filtering, the total multispectral data within the study area are regionally divided, and semantic labels are assigned to generate training, validation, and testing sets. Here, all semantic labels are manually assigned to each point cloud. These sets are then utilized to train the Point Transformer deep learning network, which extracts tree points from the whole urban scene data. Finally, after building the CHM from the extracted tree points, an improved tree mapping approach is introduced to segment ITCs.

A. Preprocessing
The original Titan dataset includes point cloud files for its three independent intensity channels, and the point cloud data of each channel contain only the spectral intensity value of that channel. Previous studies [20], [43], [44] have shown that the fusion of spectral channels in three-band multispectral point cloud data enhances its information extraction capabilities compared to single-channel point cloud data. To take advantage of the multispectral information in the point cloud, just as the multiple spectral channels of each pixel aid classification in multispectral raster images, we first fused the three single-intensity-channel point clouds of the same region in the original Titan dataset into the corresponding true three-channel multispectral point cloud data of the region through intensity interpolation.
Our intensity interpolation method is based on [43]. We use the point cloud data from the 1550 nm channel as a reference. For each point in this channel, we traverse the dataset, searching for the k points from each of the two adjacent channels that are nearest to the current point. Subsequently, we employ the inverse distance-weighted interpolation method to allocate spectral intensities from the point clouds of the 1064 nm channel and the 532 nm channel to the 1550 nm channel's point cloud. The result is a true multispectral point cloud, matching the count of the 1550 nm channel, where each point contains information from all three spectral channels. Specifically, let the current point in the 1550 nm channel, denoted p_i^1550, be the central point. Then, taking the 1064 nm channel as an illustrative instance, the intensity value I_i^1064 of p_i^1550 within this channel can be computed using formula (1), with a parallel approach employed for the 532 nm channel:

I_i^1064 = (Σ_{n=1}^{k} I_{i,n}^1064 / d_i^n) / (Σ_{n=1}^{k} 1 / d_i^n)   (1)

where d_i^n represents the Euclidean distance between p_i^1550 and p_{i,n}^1064, and I_{i,n}^1064 represents the intensity value corresponding to p_{i,n}^1064.
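The inverse distance-weighted channel fusion of (1) can be sketched as follows; `fuse_channel` is a hypothetical helper (not from the original code base) that uses a k-d tree to find the k nearest source-channel points and weights their intensities by inverse distance:

```python
import numpy as np
from scipy.spatial import cKDTree

def fuse_channel(ref_xyz, src_xyz, src_intensity, k=3):
    """IDW-interpolate intensities from one source channel onto the
    reference (1550 nm) channel's points, using the k nearest source points."""
    dist, idx = cKDTree(src_xyz).query(ref_xyz, k=k)
    w = 1.0 / np.maximum(dist, 1e-9)              # inverse-distance weights
    return (w * src_intensity[idx]).sum(axis=1) / w.sum(axis=1)

# Toy example: one reference point, three source points at distances 1, 2, 3.
ref = np.array([[0.0, 0.0, 0.0]])
src = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
fused = fuse_channel(ref, src, np.array([10.0, 20.0, 30.0]))
```

Running the same helper once per adjacent channel (1064 nm and 532 nm) yields the three-channel point cloud described above.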
After that, the cloth simulation filtering method [45] is used to differentiate above-ground points from ground points, facilitating our subsequent manual annotation of semantic labels in the point cloud data. Subsequently, we manually assign semantic labels to the point cloud data, categorizing them into six distinct classes: road, grass, building, tree, car, and powerline. The detailed split of the training set, validation set, and test set is illustrated in Fig. 1. We also convert the aforementioned point cloud datasets into ASCII format to suit the subsequent Point Transformer network. Each ASCII point cloud file in this experiment comprises a matrix of n × 7, where n denotes the number of points within the region and 7 represents the features, encompassing spatial location (x, y, z), spectral intensity (I_1550, I_1064, I_532), and label. In addition, we choose three regions with different tree distribution densities as test sites to verify the generalizability of our approach. We categorized tree density levels based on the number of trees and stem density (number of trees per unit area). Because the distribution of trees in urban scenes differs from that in traditional forested areas, we calculated tree coverage only in regions with concentrated tree distribution, excluding areas devoid of tree growth such as buildings and main roads. In the three selected test plots, stem density exhibits a gradient increase from low to high, demonstrating a certain level of distinctiveness. The specific descriptions are presented in Table II.
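As a concrete illustration of the n × 7 ASCII layout described above (the file name, values, and label encoding are hypothetical), each row stores spatial location, the three spectral intensities, and the semantic label:

```python
import numpy as np

# Hypothetical per-point attributes for two points.
xyz   = np.array([[0.0, 0.0, 2.5], [1.0, 0.0, 3.1]])   # x, y, z
i1550 = np.array([120.0, 98.0])
i1064 = np.array([200.0, 180.0])
i532  = np.array([60.0, 75.0])
label = np.array([3, 3])                                # e.g., 3 = tree

# n x 7 matrix: x, y, z, I1550, I1064, I532, label
mat = np.column_stack([xyz, i1550, i1064, i532, label])
np.savetxt("plot_sample.txt", mat,
           fmt="%.3f %.3f %.3f %.1f %.1f %.1f %d")
```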

B. Urban Tree Points Extracting by Point Transformer Network
Owing to the potential influence of nontree elements on the accuracy of ITC segmentation, we initially conducted semantic segmentation to extract points that belong only to the tree category. In comparison to traditional RF algorithms, the Point Transformer [46] network demonstrates an enhanced capability to learn and leverage the spectral channel features of multispectral point cloud data, thereby improving the accuracy of urban tree points extraction.

Point clouds can be perceived as sets of irregular points in a multidimensional space. The self-attention mechanism, functioning as a set operator, is highly suitable for 3-D point cloud processing. Therefore, the Point Transformer is built upon the foundation of the self-attention mechanism.

The Point Transformer network adopts a typical encoder-decoder structure, consisting of three fundamental modules: transition down, transition up, and the point transformer block. The transformer layer serves as the core of the point transformer block. It is a vector self-attention and position encoding layer. During the encoding stage, by utilizing the transformer layer, the network can learn local feature information at different levels of the point cloud. For more details, refer to the original paper [46].
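The vector self-attention idea behind the transformer layer can be illustrated with a much-simplified sketch (single layer, brute-force neighbor search, hypothetical class name); it is not the full Point Transformer block with transition down/up modules:

```python
import torch
import torch.nn as nn

class VectorSelfAttention(nn.Module):
    """Toy Point Transformer-style layer: vector attention over the k
    nearest neighbors of each point, with a learned relative-position
    encoding added to both the attention logits and the values."""
    def __init__(self, dim, k=8):
        super().__init__()
        self.k = k
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.pos = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.attn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, xyz, feat):                 # xyz: (n,3), feat: (n,dim)
        d = torch.cdist(xyz, xyz)                 # pairwise distances (n,n)
        idx = d.topk(self.k, largest=False).indices   # (n,k) neighbor indices
        q = self.to_q(feat)[:, None, :]               # (n,1,dim)
        kf = self.to_k(feat)[idx]                     # (n,k,dim)
        v = self.to_v(feat)[idx]                      # (n,k,dim)
        pe = self.pos(xyz[:, None, :] - xyz[idx])     # relative-position encoding
        w = torch.softmax(self.attn(q - kf + pe), dim=1)  # per-channel weights
        return (w * (v + pe)).sum(dim=1)              # (n,dim)
```

Unlike scalar dot-product attention, the weights here are vectors (one per feature channel), which is the key design choice of the Point Transformer layer.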

C. Individual Tree Crown Segmentation Using Improved Tree Mapping Algorithm
Based on the tree points extracted from the whole point cloud data of the urban scenes, we propose an improved tree mapping algorithm to perform ITC segmentation. First, we generate a digital terrain model (DTM) from the ground points and construct a digital surface model (DSM) using the tree points extracted in the previous step. Then, we obtain the CHM by (2), taking the intersection of the two models. It is important to note that the pixel resolution for all three models is set at 1 m × 1 m:

CHM = DSM − DTM.   (2)

The original tree mapping algorithm [28] employs Gaussian median filtering for preprocessing and a circular moving window to identify local maxima within the CHM. Subsequently, utilizing these local maxima as initial seed points, ITCs are delineated by growing the area outward from the seed points under the constraints of four manually set threshold parameters, according to the tree crown-growing method [47]. Then, the initial ITC points are extracted by masking the corresponding point cloud data with the vector polygons representing the single-canopy divisions in the CHM.
CHM images exhibit a characteristic gradient from tree crown vertices down to the crown edges, with pixel values gradually decreasing outward. This structure aids in extracting tree crown vertices and in setting specific threshold conditions for region growing to delineate the pixels of individual crown layers.
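The CHM construction (eq. 2) and vertex search can be sketched as below; `find_tree_tops` is an illustrative stand-in that uses a square moving window (the original algorithm uses a circular one) and a hypothetical minimum-height threshold:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def find_tree_tops(dsm, dtm, window=3, min_height=2.0):
    """CHM = DSM - DTM; mark pixels that are local maxima of the CHM
    within the moving window and taller than min_height."""
    chm = dsm - dtm
    is_max = maximum_filter(chm, size=window) == chm
    return chm, is_max & (chm >= min_height)

# Toy 3 x 3 rasters: a single 7 m crown apex over flat 2 m terrain.
dsm = np.array([[5.0, 5.0, 5.0], [5.0, 9.0, 5.0], [5.0, 5.0, 5.0]])
dtm = np.full((3, 3), 2.0)
chm, tops = find_tree_tops(dsm, dtm)
```

The detected vertices would then serve as the seed points for the crown-growing step.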
Due to the linear scanning pattern of airborne multispectral LiDAR sensors, there is approximately a 0.5 m gap between each scan line in the acquired point cloud data. This results in an incomplete representation of the structure of ITCs. Consequently, when converting tree points to the CHM, there are gaps corresponding to the scan lines between pixels representing the same ITC (see Fig. 4). The incomplete representation of tree crown pixels disrupts the gradient characteristics of ITCs and the gradual decrease in pixel values from tree vertices to edges. This significantly impacts the accuracy of subsequent tree vertex search and crown region growth. Based on this, we introduce morphological opening and closing by reconstruction techniques to improve the original tree mapping algorithm. As one of the methods of multiscale morphology, morphological opening and closing by reconstruction has the advantage of not introducing new edges or edge displacement, meeting the requirements for edge localization. This makes it well-suited for processing the CHM, as it reconstructs the pixel gradient inside a single tree crown while avoiding changes at the tree crown edge. To mitigate noise, enhance tree vertex recognition accuracy, and improve the representation of canopy contours, we reconstruct the CHM with morphological gradients using the morphological opening and closing by reconstruction operation, based on the Gaussian median filtering results, as demonstrated in the following formulas [48]:

γ^(rec)(g) = δ_ρ^(rec)(g ∘ B)   (3)

φ^(rec)(g) = ε_ρ^(rec)(g • B)   (4)

where γ^(rec) and φ^(rec) are morphological opening and closing by reconstruction, respectively; ∘ and • represent the morphological opening and closing operations; g is the input raster image and B is the structuring element, i.e., the convolution kernel, typically adopting a rectangular shape with dimensions of k × l; reconstruction by dilation is denoted by δ^(rec) and reconstruction by erosion by ε^(rec). Furthermore, the reconstruction by dilation and erosion operations can be defined as follows:

δ_ρ^(rec)(g) = δ_ρ^(i)(g),   with δ_ρ^(1)(g) = min(δ(g), ρ), iterated until stability   (5)

ε_ρ^(rec)(g) = ε_ρ^(i)(g),   with ε_ρ^(1)(g) = max(ε(g), ρ), iterated until stability   (6)

where ρ represents the reference function and is set to be the same as the input raster image g in the morphological opening and closing by reconstruction operations.
To elaborate further, morphological opening by reconstruction reconstructs an image by performing a reconstruction-by-dilation operation after an initial opening operation. Conversely, morphological closing by reconstruction reconstructs an image by applying a reconstruction-by-erosion operation after a closing operation. We first perform opening and then closing by reconstruction on the CHM. By integrating both morphological reconstruction techniques, it becomes feasible to eliminate both the bright and the dark details present in the gradient images. This process also helps rectify the local maxima and minima within the CHM. Consequently, oversegmentation resulting from noise and intricate details can be mitigated effectively.
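The opening- then closing-by-reconstruction step can be sketched with off-the-shelf operators; `smooth_chm` and the 3 × 3 structuring element are assumptions for illustration, not the paper's exact parameters:

```python
import numpy as np
from scipy.ndimage import grey_opening, grey_closing
from skimage.morphology import reconstruction

def smooth_chm(chm, size=3):
    """Opening by reconstruction (dilation-reconstruct the opened CHM under
    the original), then closing by reconstruction (erosion-reconstruct the
    closed result): removes bright/dark detail without shifting crown edges."""
    opened = reconstruction(grey_opening(chm, size=size), chm,
                            method='dilation')
    return reconstruction(grey_closing(opened, size=size), opened,
                          method='erosion')

# An isolated one-pixel spike (noise) is removed; flat regions are untouched.
noisy = np.zeros((5, 5))
noisy[2, 2] = 5.0
```

Because the reconstruction propagates the marker only under (or over) the reference image, crown edges stay in place while spurious local extrema are flattened.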
In addition, for our proposed multistage individual tree segmentation method, the quality of the tree point extraction results in the first stage directly influences the accuracy of the subsequent individual tree segmentation algorithm. Specifically, the tree point extraction results usually contain some points belonging to categories such as buildings and power lines that are oversegmented as tree points. These points are often discrete and sparse. After converting them into CHM pixels, some are treated as outliers and removed during morphological opening and closing by reconstruction. The remaining points are identified as independent individual trees, leading to a certain degree of oversegmentation in the final individual tree segmentation results. Based on the sparsity of each point set in these oversegmented individual trees, meaning they contain relatively few points, we set an individual tree point count threshold. We traverse each point set, and if the current point set's count of tree points is less than the threshold, we consider it an oversegmented sparse individual tree and exclude it. We conducted a detailed statistical analysis of the point counts of the individual tree point sets in the initial individual tree segmentation results. Through iterative experiments, we identified the optimal point number threshold that effectively removes oversegmented trees while preserving correctly segmented trees. According to our screening, a point number threshold of ten points is reasonably suitable for our dataset.
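The sparse-tree filtering step can be sketched as follows; `drop_sparse_trees`, the per-point tree-id array, and the use of 0 for unassigned points are illustrative assumptions:

```python
import numpy as np

def drop_sparse_trees(tree_ids, min_points=10):
    """Relabel segments with fewer than min_points points as unassigned (0),
    treating them as oversegmented fragments (e.g., misclassified building
    or powerline points)."""
    ids, counts = np.unique(tree_ids, return_counts=True)
    sparse = ids[(counts < min_points) & (ids != 0)].tolist()
    return np.where(np.isin(tree_ids, sparse), 0, tree_ids)

# Segment 1 has 12 points (kept); segment 2 has 3 points (removed).
ids = np.array([1] * 12 + [2] * 3)
filtered = drop_sparse_trees(ids)
```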

D. Accuracy Evaluation Method

1) Evaluation of Tree Point Extraction: Intersection over Union (IoU) score is a standard performance metric for object category segmentation problems [49]. Therefore, we utilize IoU to evaluate the accuracy of tree point extraction after semantic segmentation by machine learning or deep learning networks.

The calculation formula is as follows:

IoU = TP_sem / (TP_sem + FP_sem + FN_sem)   (7)

where TP_sem is the number of points correctly classified as belonging to the positive class, FP_sem is the number of points incorrectly classified as belonging to the positive class, and FN_sem is the number of points incorrectly classified as belonging to the negative class. The abovementioned parameters are used solely during the accuracy assessment stages of semantic segmentation and tree point extraction.
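In code, eq. (7) for a single semantic class amounts to the following (the integer label encoding is a hypothetical choice):

```python
def class_iou(pred, truth, label):
    """IoU of one semantic class over per-point labels, per eq. (7)."""
    tp = sum(p == label and t == label for p, t in zip(pred, truth))
    fp = sum(p == label and t != label for p, t in zip(pred, truth))
    fn = sum(p != label and t == label for p, t in zip(pred, truth))
    return tp / (tp + fp + fn)

# 2 correct tree points, 1 false tree, 1 missed tree -> IoU = 2/4 = 0.5
iou = class_iou([3, 3, 3, 1], [3, 3, 1, 3], label=3)
```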
2) Evaluation of Individual Tree Crown Segmentation: We obtained the ground truth for ITCs by manually segmenting and labeling the point cloud data, serving as the basis for evaluating the accuracy of individual tree segmentation, and we used the high-resolution DOM image of the respective area as supplementary material. The manual labeling of individual trees in the three test plots was conducted using the CloudCompare software (https://www.cloudcompare.org/), resulting in the delineation of 306, 138, and 149 single trees in the respective plots. The labeling process took a total duration of about 8 h. The ground truth for ITCs can be found in the visual results of the ablation experiments and comparisons with other methods in Sections IV-C and IV-D. We compared the experimental results with the true values and calculated recall, precision, and F-score values as accuracy evaluation indexes to assess the performance of the urban scene ITC segmentation algorithm in this study. The formulas are given as follows [26]:

recall = TP / (TP + FN)   (8)

precision = TP / (TP + FP)   (9)

F-score = 2 × recall × precision / (recall + precision)   (10)

where true positive (TP) is the number of trees segmented correctly, false negative (FN) represents the number of trees that were not detected, and false positive (FP) is the number of segmented trees that do not exist in reality [41]. Note that the parameters here refer to the number of trees, not the point count mentioned in Section III-D1.
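Equations (8)-(10) over tree-level detection counts can be computed as follows (the counts below are illustrative, not the paper's results):

```python
def detection_scores(tp, fn, fp):
    """Recall (eq. 8), precision (eq. 9), and F-score (eq. 10) from
    tree-level detection counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_score = 2 * recall * precision / (recall + precision)
    return recall, precision, f_score

# e.g., 9 correctly segmented trees, 1 missed, 1 spurious
recall, precision, f_score = detection_scores(tp=9, fn=1, fp=1)
```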

IV. RESULTS

A. Results of Tree Points Extraction in Urban Scene by Point Transformer
In this experiment, the training of the Point Transformer network was conducted on an NVIDIA RTX 3090 GPU. During the training stage, we set the batch size to 6, with a point number of 4096 × 16 per batch. The number of epochs is set to 250, the learning rate to 0.001, and we use the Adam optimizer with a dropout rate of 0.5. Other parameters, such as network depth, remain consistent with the original paper of the network [46].
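The reported training configuration can be expressed as follows; the tiny stand-in model is purely illustrative (the actual network is the Point Transformer [46]):

```python
import torch

# Stand-in model: 6 per-point input features (x, y, z, three intensities)
# mapped to the 6 semantic classes used in this study.
model = torch.nn.Sequential(
    torch.nn.Linear(6, 64), torch.nn.ReLU(),
    torch.nn.Dropout(p=0.5),                  # dropout rate from the text
    torch.nn.Linear(64, 6),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

batch_size = 6
points_per_batch = 4096 * 16                  # points fed to each batch
epochs = 250
```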
As shown in Fig. 5, the trend curves of the loss and the mean accuracy (mAcc) illustrate an overall decrease and increase, respectively. During the training process, a significant increase in the loss value occurred at the 109th epoch. For our three-channel fused data, the OA and mIoU of all three study regions reached more than 88% and 81%, respectively. In particular, the tree class demonstrated the highest IoU value among all classes, exceeding 92% in all three test plots (quantitative details can be found in Section IV-C). This indicates that tree points were extracted with minimal misclassifications and omissions, providing a solid foundation for accurate subsequent ITC segmentation. To highlight the advantages of the Point Transformer deep learning network, we compared the tree point extraction results with those of the mainstream RF classifier in this research area, using spatial location, spectral intensity, and six spectral indices [20].
In addition, besides the machine learning-based RF, we tested the tree point extraction ability of common deep learning networks on this dataset. The split of the dataset for the comparative experiments is consistent with our own experiment, as illustrated in Fig. 1. During the training process, we set the batch size to 6, with a point number of 4096 × 16 per batch. The number of epochs is set to 250, the learning rate to 0.001, and we use the Adam optimizer with a dropout rate of 0.5. These parameters remain consistent with those used in training the Point Transformer. Other parameters, such as network depth, k-nearest neighbors, and so on, are kept consistent with the optimal parameters proposed in the references corresponding to each network. The results are summarized in Fig. 6 and Table III.
Notably, all deep learning networks, except for PointNet, outperformed the RF classifier in terms of tree point extraction accuracy. Since the Titan dataset exhibits relatively low point cloud density, PointNet was found to be less suitable. The remaining four networks demonstrated robust spatial and spectral feature learning capabilities. Among these, the DGCNN network achieved a balanced tree point extraction performance across the three test sets, with its tree category IoU ranging from 93.63% to 95.17%. However, as seen in Fig. 6, the phenomenon of oversegmentation of tree points in DGCNN is quite pronounced. Some fragmented building and power line points are segmented as tree points, which exacerbates the issue of oversegmentation in subsequent individual tree segmentation. On the other hand, RandLA-Net and the Point Transformer network, which involve a downsampling process, experienced a slight impact on tree point IoU in Area 2-1, where the point cloud density is relatively low. However, they exhibited higher tree point extraction accuracy in the other two areas. In specific terms, RandLA-Net and Point Transformer achieved tree category IoU values of 90.17% and 92.85%, respectively, in Area 2-1. In the other two regions, the two networks achieved tree category IoU values exceeding 94% and 97%, respectively. From the visual results, it can be observed that these two downsampling networks may not capture the completeness of individual tree point clouds as effectively as DGCNN. However, they exhibit less oversegmentation of tree points, which is more favorable for the subsequent accuracy of individual tree segmentation. Missing points in smaller parts of an individual tree point cloud do not significantly impact the overall identification of a single tree. In contrast, the oversegmentation of fragmented tree points would significantly increase the number of oversegmented trees during individual tree segmentation. Overall, the Point Transformer network demonstrated the best tree point extraction performance across the three test sets, boasting an overall tree point IoU of 95.97%. This is the primary reason why we ultimately selected this network for our study.

B. Results of Urban Scene Individual Tree Crown Segmentation
To overcome the limitations of traditional ITC segmentation algorithms when nontree points are included, we developed an improved multistage method for extracting ITC in urban scenes. The performance of tree point extraction in the previous step therefore has a direct impact on the accuracy of ITC segmentation. Based on the analysis of the experimental results, we observed that this impact primarily manifests as oversegmentation in the extraction of ITC. The Point Transformer network, with its superior ability to capture detailed information for above-ground category segmentation, ensures that the point cloud of each tree as a whole is correctly classified. There are no instances where the point cloud of an entire tree is erroneously assigned to other classes, so the impact on undersegmentation is minimal. However, some individual points belonging to other features (e.g., buildings) may be misclassified as the tree category. During segmentation, these misclassified points may be treated as individual trees, leading to a certain degree of oversegmentation in the extraction of ITC. To address this issue, we apply a filtering process to the CHM prior to segmentation. Performing morphological opening and closing by reconstruction on the CHM enhances, to some extent, the edge features between adjacent canopies, as shown in Fig. 7, and mitigates oversegmentation. By setting a point threshold, we filter out sparsely segmented trees, thereby mitigating the oversegmentation caused by these misclassified fragments. The results of ITC segmentation for the three study areas with different tree distribution densities are illustrated in Fig. 8 and Table IV.
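The morphological opening and closing by reconstruction applied to the CHM can be sketched as follows. This is a minimal NumPy/SciPy implementation; the 3×3 structuring element and the iterative reconstruction scheme are our assumptions, not the paper's exact parameters:

```python
import numpy as np
from scipy import ndimage

FOOTPRINT = np.ones((3, 3))  # assumed structuring element, not the paper's exact choice

def reconstruct_by_dilation(seed, mask):
    """Grayscale reconstruction: grow the seed by dilation, clipped under the mask."""
    prev = seed
    while True:
        grown = np.minimum(ndimage.grey_dilation(prev, footprint=FOOTPRINT), mask)
        if np.array_equal(grown, prev):
            return grown
        prev = grown

def opening_by_reconstruction(chm):
    """Remove bright specks smaller than the footprint while keeping crown shapes."""
    return reconstruct_by_dilation(ndimage.grey_erosion(chm, footprint=FOOTPRINT), chm)

def closing_by_reconstruction(chm):
    """Fill dark gaps (e.g., between scan lines) via the complemented image."""
    top = chm.max()
    return top - opening_by_reconstruction(top - chm)
```

Opening by reconstruction suppresses isolated bright pixels (such as misclassified nontree points rasterized into the CHM), while closing by reconstruction fills the dark gaps between scan lines within a crown, without blurring crown edges the way ordinary opening and closing would.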
We assign a random color to each individual tree (adjacent trees are never given the same color) to differentiate between single trees. The proposed algorithm performs exceptionally well on the Titan dataset, achieving an overall accuracy of 92.8% for ITC segmentation. In Area 3-3, where the tree density is relatively low, the algorithm achieves the highest accuracy of 95.6%. Although the accuracy of ITC segmentation decreases slightly as tree density increases, even in Area 2-1, where the density is the highest, the overall accuracy still reaches a commendable 91.2%. The overall recall rate across the three study areas stands at 93.8%, with minimal variation across tree distribution densities and few undersegmentation cases.
The main cause of undersegmentation of single trees is the close proximity of two trees with a significant height difference. In such cases, local maxima are sought at tree vertices, and the pixel values at the tall tree's vertex are much larger than those at the short tree's vertex; consequently, the pixels corresponding to the short tree's vertex may not be identified as local maxima. In addition, individual low trees of very small height may be filtered out as noise during the segmentation stage. In contrast, the accuracy of the algorithm is more strongly influenced by the density of the tree distribution: ITC segmentation accuracy increases from 89.1% to 97.0% when transitioning from high-density to low-density regions. In medium- and low-density areas, where single trees are more sparsely arranged, the edge features between adjacent canopies exhibit a certain degree of differentiation, resulting in fewer instances of oversegmentation. Conversely, in high-density areas, where trees are closely spaced, the edge features of two adjacent intertwined canopies become fused during the region growing process, leading to oversegmentation.
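The suppression of short tree vertices described above can be illustrated with a fixed-window local-maximum detector. The window size and minimum-height threshold below are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from scipy import ndimage

def find_tree_tops(chm, window, min_height):
    """Return (row, col) of pixels that equal the maximum of their
    window x window neighbourhood and exceed min_height (noise threshold)."""
    local_max = ndimage.maximum_filter(chm, size=window)
    tops = (chm == local_max) & (chm >= min_height)
    return np.argwhere(tops)
```

With a tall vertex and a nearby short vertex two pixels apart, a 5×5 window suppresses the short one (its neighbourhood contains the taller value), whereas a 3×3 window detects both, at the cost of more spurious maxima elsewhere.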

C. Ablation Experiment
To affirm the validity and benefits of our preprocessed three-channel multispectral data, we conducted training on the single-channel raw datasets with the Point Transformer network, maintaining identical network parameters and procedures. The sole distinction lies in the input features: our fused data simultaneously incorporate three spectral intensity values, whereas the raw data employ a single-channel intensity value. The quantitative results are detailed in Table V. A visual comparison of tree point extraction results between the three-band fused data and the single-band data (using the 1550 nm channel as a reference) is depicted in Fig. 6.
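The fused input assigns one intensity feature per wavelength to each point. The paper's exact interpolation scheme is not restated here; a nearest-neighbour sketch, with names of our own choosing, might look like:

```python
import numpy as np
from scipy.spatial import cKDTree

def fuse_intensities(base_xyz, channel_clouds):
    """Attach one intensity per wavelength to each base point by
    nearest-neighbour lookup into the corresponding single-channel cloud."""
    features = []
    for xyz, intensity in channel_clouds:
        _, idx = cKDTree(xyz).query(base_xyz, k=1)
        features.append(intensity[idx])
    return np.column_stack(features)  # shape: (n_points, n_channels)
```

The resulting (n_points, 3) feature matrix is what distinguishes the fused input from the single-channel intensity column of the raw data.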
It is noteworthy that the prediction results obtained from our fused data outperform those of the three single-band raw datasets. Specifically, the mIoU and the IoU for the tree category of the fused data exhibit the highest values across all three test regions. Notably, the IoU for the tree category improves by 2% to 5% compared with the raw data. This substantiates the efficacy of our fused data for the task of extracting tree points from urban scenes.
Furthermore, to validate the effectiveness of the components of our proposed multistage segmentation method, we established three comparison groups: RF with the original tree mapping algorithm, Point Transformer with the original tree mapping algorithm, and our proposed algorithm. These three comparison groups were set up to assess the performance of the Point Transformer and the improved tree mapping algorithm on our multispectral point cloud dataset in urban scenes. The visual results and quantitative evaluations are shown in Fig. 9 and Table VI, respectively.
It can be observed that the oversegmentation phenomenon is more severe in the ITC segmentation results based on the RF classifier. In contrast, the precision value and F-score based on Point Transformer with the original tree mapping algorithm improve by 12.2% and 9.3%, respectively, compared with the former. This is mainly because, compared with the RF classifier, the Point Transformer has better tree point extraction capability, resulting in fewer cases where buildings, power lines, and other objects are mistakenly classified as the tree category; consequently, oversegmentation in the final single-tree segmentation results is also reduced. On this basis, our proposed algorithm further improves the accuracy of ITC delineation and filters out some sparsely segmented single-tree point sets caused by errors in the tree point extraction stage. Our improved tree mapping algorithm shows an increase of 6% in the precision value and 2.9% in the F-score compared with the original algorithm, demonstrating the effectiveness of the improvement.
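The point-number thresholding used to filter out sparsely segmented single-tree point sets can be sketched as follows (the threshold value is an illustrative assumption):

```python
import numpy as np

def filter_sparse_trees(tree_labels, min_points):
    """Merge segments with fewer than min_points points into background (0)."""
    out = tree_labels.copy()
    ids, counts = np.unique(out[out > 0], return_counts=True)
    for tree_id, n in zip(ids, counts):
        if n < min_points:
            out[out == tree_id] = 0
    return out
```

Segments spawned by a handful of misclassified building or power line points rarely exceed a few dozen points, so a modest threshold removes them while leaving real crowns intact.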

D. Comparison With Existing ITC Segmentation Methods
We also compared the performance of our algorithm with existing ITC segmentation algorithms. For a fair comparison, we selected three distinct types of ITC segmentation methods. For the traditional algorithms, the raster-based watershed method [51] and layer stacking [34], the input data consist of the raw data without tree point extraction. For the multistage segmentation methods, PointNet with gradient-based clustering proposed by Chen et al. [41] and DGCNN with our enhanced tree mapping algorithm, we employ the tree points extracted by the deep learning network in the preceding stage as input data for the subsequent segmentation of ITC. Regarding the two instance segmentation algorithms based on image deep learning, Faster R-CNN proposed by Windrim and Bryson [52] and RandLA-Net with YOLO-v3 proposed by Chang et al. [53], the former directly utilizes feature maps composed of the raw data as input to the image deep learning network, whereas the latter, also a multistage method, utilizes the tree points extracted by RandLA-Net to construct the feature maps for the subsequent instance segmentation. The visual results and quantitative evaluations are shown in Fig. 10 and Table VII.
Compared with the traditional algorithms, the multistage algorithms and the instance segmentation algorithms based on image deep learning networks have significant advantages in urban scenes. This is mainly because nontree objects such as buildings and power lines in the input data strongly affect traditional algorithms, leading to severe oversegmentation. In contrast, the multistage algorithms use deep learning networks to extract accurate tree points before ITC segmentation, mitigating the influence of nontree objects to some extent. In addition, instance segmentation algorithms based on image deep learning are also capable of filtering out nontree targets.
Among the multistage methods based on deep learning and point cloud clustering, a degree of oversegmentation is observed in the latter-stage ITC instance segmentation, attributable to the limitations of the point cloud semantic segmentation network employed. DGCNN with our improved tree mapping algorithm achieves higher accuracy than PointNet with gradient-based clustering, owing to the comparatively less satisfactory tree point extraction performance of PointNet. Our proposed algorithm, in turn, outperforms the former by 3.7% and 7.8% in overall accuracy and precision value, respectively, which demonstrates that using Point Transformer as the tree point extractor provides better data conditions for the subsequent ITC segmentation.
Both comparison groups of instance segmentation algorithms based on image deep learning exhibit relatively favorable results. However, their performance in urban scenes does not match their original performance in forested areas. In the case of the Faster R-CNN-based algorithm, the construction of the input feature map is relatively simple; without prior extraction of tree points, some real tree individuals are susceptible to being overlooked owing to the similarity of their features to those of nontree entities. For RandLA-Net with YOLO-v3, which also belongs to the multistage methods, the segmentation accuracy is further improved. This improvement stems from more accurate tree point extraction and the incorporation of additional tree structure information during feature map construction. Nevertheless, compared with forested areas, the greater variability in tree height and canopy scale within urban scenes still affects the training of image instance segmentation to some extent. Our proposed method outperforms the former by 1.3% in the final F-score of ITC segmentation. In addition, our approach eliminates the need for extensive preprocessing before image deep learning network training and avoids prolonged training periods. This underscores the robust applicability and value of our research in this field.

V. CONCLUSION
This article discussed the potential and effectiveness of multispectral LiDAR combined with point cloud deep learning for urban scene ITC segmentation tasks. An improved multistage ITC segmentation method was proposed that efficiently reduces the influence of nontree categories and enhances the identification of edge features between adjacent tree crowns. First, we employ the Point Transformer deep learning network to extract tree points precisely from the original urban scene data. On this dataset, its overall IoU for the tree category reaches 96.0%, which is 16.2% higher than that of the mainstream RF algorithm. Then, an improved tree mapping algorithm was proposed to achieve ITC segmentation based on the tree points extracted by Point Transformer, with an overall accuracy of 92.8%, reducing the oversegmentation of individual trees to a certain extent. In addition, compared with traditional CHM-based and point cloud clustering-based algorithms, our method exhibits a remarkable improvement in overall accuracy, with enhancements of 21.9% and 16.0%, respectively. Thus, our method demonstrates strong potential for the extraction of single trees in urban scenes.
In summary, the contributions of this article are the realization of precise tree point extraction in urban scenes using the Point Transformer deep learning network and the introduction of morphological opening and closing by reconstruction to improve the tree mapping algorithm. In addition, a postprocessing method based on single-tree point number thresholding is proposed to filter out sparse single trees.
However, this study does not take into account the completeness of each extracted individual tree. In future work, we will further investigate this problem and explore the inversion of urban tree biomass based on the existing findings. In addition, considering the advancements in hyperspectral LiDAR, we will also explore its potential for urban scene feature classification and biomass inversion.

Fig. 1. DOM image of the study area. The region marked by the red box represents the training plots, the area designated by the blue box serves as the validation plots, and the region outlined in yellow indicates the test plots.

Fig. 2. Sample point cloud displayed by the single intensity of (a) 1550 nm channel, (b) 1064 nm channel, (c) 532 nm channel, and (d) sample point cloud rendered by ground truth.

Fig. 4. (a) Top-down view of tree point cloud. The point cloud data are composed of scanning strips, and red circles indicate the gaps between different scan lines within the same tree crown. (b) CHM image constructed from the point cloud in (a). Red circles highlight the gaps between pixels representing the same tree crown caused by the gaps between scan lines. (c) CHM after morphological opening and closing by reconstruction processing from (b). Its pixel values decrease gradually from the tree crown vertex toward the surroundings, reaching the minimum value at the tree crown edge.

Fig. 5. Trend curves of (a) training loss and (b) training mAcc for the Point Transformer network in this experiment.

Fig. 6. Comparison of tree point extraction results among different machine learning classifiers and deep learning networks on the multispectral point cloud dataset for (a) Area 2-1 (high density), (b) Area 3-3 (low density), and (c) Area 5-3 (medium density). The red circular boxes highlight the significant disparities in the tree point extraction results between the different methods. To substantiate the advantages of employing multispectral fused data, we also present the training results for the Point Transformer using single-band raw data (using the 1550 nm band as a reference).

Fig. 7. (a) Original CHM without preprocessing. (b) CHM after morphological opening and closing by reconstruction. The gaps between individual crown layer pixels caused by scanning lines are eliminated, and the reconstructed pixel values decrease gradually from the tree top to the surrounding area.

Fig. 9. Results of the ablation experiment for (a) Area 2-1 (high density), (b) Area 3-3 (low density), and (c) Area 5-3 (medium density). The red boxes emphasize the significant segmentation differences between the different components. Here, RF represents the random forest classifier; PT represents the Point Transformer network; OTM represents the original tree mapping algorithm.

Fig. 10. Results of comparison with existing ITC segmentation methods for (a) Area 2-1 (high density), (b) Area 3-3 (low density), and (c) Area 5-3 (medium density). Here, GBC represents gradient-based clustering; ITM represents our improved tree mapping algorithm. The red circular boxes highlight the significant disparities in the results between the different methods.
An Improved Method for Individual Tree Segmentation in Complex Urban Scenes Based on Using Multispectral LiDAR by Deep Learning Jian Yang , Ruilin Gan , Binhan Luo , Ao Wang, Shuo Shi , Member, IEEE, and Lin Du

TABLE II DETAILED PARAMETERS OF TEST REGIONS FOR ITC SEGMENTATION

Fig. 3. Flowchart of the methodology proposed in this article.

TABLE III COMPARISON OF TREE POINTS EXTRACTION RESULTS BETWEEN DIFFERENT MACHINE LEARNING OR DEEP LEARNING METHODS

TABLE V COMPARISON OF TREE POINT EXTRACTION PERFORMANCE BETWEEN MULTIBAND FUSED DATA AND SINGLE-BAND RAW DATA