Web-SpikeSegNet: Deep Learning Framework for Recognition and Counting of Spikes From Visual Images of Wheat Plants

Computer vision with deep learning is emerging as a significant approach for non-invasive and non-destructive plant phenotyping. Spikes are the reproductive organs of wheat plants. Detection and counting of spikes, the grain-bearing organs, are of great importance in phenomics studies of large sets of germplasm. In the present study, we developed an online platform, "Web-SpikeSegNet," based on a deep-learning framework for spike detection and counting from visual images of wheat plants. The architecture of Web-SpikeSegNet consists of two layers. The first layer, the Client-Side Interface Layer, manages end users' requests and the corresponding responses, whereas the second layer, the Server-Side Application Layer, comprises spike detection and spike counting modules. The backbone of the spike detection module is a deep encoder-decoder network with an hourglass network for spike segmentation. The spike counting module applies the "Analyze Particles" function of ImageJ to count the segmented spikes. To evaluate the performance of Web-SpikeSegNet, we acquired visual images of wheat plants and obtained satisfactory segmentation performance: Type I error 0.00159, Type II error 0.0586, accuracy 99.65%, precision 99.59%, and F1 score 99.65%. As spike detection and counting in wheat phenotyping are closely related to yield, Web-SpikeSegNet is a significant step forward in the field of wheat phenotyping and will be very useful to researchers and students working in this domain.


Background
Wheat is one of the major food crops, grown yearly on 215 million hectares globally [Wheat in the world, CGIAR: https://wheat.org/wheat-in-the-world/]. It supersedes maize and rice as a protein source in low- and middle-income nations. Climate change and associated abiotic stresses are the key factors behind yield loss in wheat. Genetic improvement in yield and climate resilience is critical for sustaining food security. One of the key aspects of genetic improvement is the determination of complex genome × environment × management interactions [1]. High-dimensional plant phenotyping is needed to bridge the genotype-phenotype gap in plant breeding and for plant health monitoring in precision farming. Visual imaging is the most commonly used cost-effective method for quantitative study of plant growth, yield, and adaptation to biotic and abiotic stresses. Moreover, it is strongly argued that the imminent trend in plant phenotyping will depend on the combination of imaging sensors and machine-learning tools [2]. Yield estimation in wheat has received significant attention from researchers. The number of spikes/ears determines the grain number per unit area and thus the yield. Counting spikes by traditional naked-eye methods is a tedious and time-consuming job. Presently, non-destructive, image-analysis-based phenotyping is gaining momentum and has proved to be a faster and less laborious method. A cluster of research works is available in the area of computer vision for detecting and characterizing spikes and spikelets in wheat plants [3,4,5,6,7,8]. In computer vision, the problem of spike detection lies in the domain of pixel-wise segmentation of objects. [4], [5], and [7] used manually defined color intensities and textures for spike segmentation. Pound et al. (2017) [6] and Hasan et al.
(2018) [8] used an autoencoder [9] and a Region-based Convolutional Neural Network (R-CNN) [9], respectively, as deep-learning techniques to detect and characterize spikes with greater than 90 percent accuracy. Recently, Misra et al. (2020) [3] developed a deep-learning model known as SpikeSegNet, which was reported as an effective and robust approach for spike detection (accuracy: 99.91 percent) and counting (accuracy: 95 percent) from visual images irrespective of illumination conditions. In this paper, a web solution called "Web-SpikeSegNet" is presented for spike segmentation and counting from visual images of wheat plants, for easy accessibility and quick reference. The developed web solution has wide application in the plant phenomics domain and will be useful to researchers and students working in the field of wheat plant phenotyping. Web-SpikeSegNet is platform independent and is readily accessible at the URL http://spikesegnet.iasri.res.in/.

Implementation
Web-SpikeSegNet is developed based on the approach given by Misra et al. (2020) [3]. The approach is based on a convolutional encoder-decoder deep-learning technique for pixel-wise segmentation of spikes from visual images of the wheat plant. The architecture of the network was inspired by UNet [10], SegNet [11], and PixISegNet [12], which are popularly used in various sectors for pixel-wise segmentation of objects. SpikeSegNet consists of two modules, viz., the Local Patch extraction Network (LPNet) and the Global Mask Refinement Network (GMRNet), in sequential order. The details of the approach are given in [3]. Input images are divided into patches before entering the LPNet module, which facilitates learning local features more effectively than from the whole input image. LPNet extracts and learns the contextual and local features at the patch level. Output images of LPNet are further refined by GMRNet for better segmentation of the spikes, as shown in Fig. 1. The SpikeSegNet network was trained using visual images of the wheat plant and the corresponding ground-truth segmented mask images with class labels (i.e., spike regions of the plant image). Details of the dataset preparation for training the network are given in [3]. SpikeSegNet provides significant pixel-level segmentation performance in spike detection and counting, and has also proved to be a robust approach when tested under the different illumination levels that may occur in field conditions.
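The two-stage flow described above (patch extraction, patch-level segmentation by LPNet, merging, and global refinement by GMRNet) can be sketched as follows. This is a minimal illustration, not the published implementation: the `lpnet` and `gmrnet` arguments stand in for the trained models, and patching is shown without overlap for simplicity (the actual software uses overlapping patches, as described in the Results section).

```python
import numpy as np

def extract_patches(image, patch=256):
    """Split an image into non-overlapping square patches
    (simplified; the real pipeline uses overlapping patches)."""
    h, w = image.shape[:2]
    return [image[i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, patch)
            for j in range(0, w - patch + 1, patch)]

def merge_patches(patches, shape, patch=256):
    """Reassemble patch-level masks into a full-size mask."""
    out = np.zeros(shape, dtype=patches[0].dtype)
    k = 0
    for i in range(0, shape[0] - patch + 1, patch):
        for j in range(0, shape[1] - patch + 1, patch):
            out[i:i + patch, j:j + patch] = patches[k]
            k += 1
    return out

def spikesegnet_pipeline(image, lpnet, gmrnet, patch=256):
    """Two-stage flow: patch-level segmentation (LPNet),
    merging, then global refinement (GMRNet)."""
    masks = [lpnet(p) for p in extract_patches(image, patch)]
    coarse = merge_patches(masks, image.shape[:2], patch)
    return gmrnet(coarse)

# Identity stand-ins for the trained models, just to exercise the flow.
identity = lambda x: x
img = np.random.rand(512, 512)
mask = spikesegnet_pipeline(img, identity, identity)
assert mask.shape == (512, 512)
```

In the actual system, `lpnet` and `gmrnet` would be the trained Keras models described below, and the merged coarse mask would be refined by GMRNet before counting.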
Architecture of the proposed software "Web-SpikeSegNet": Web-SpikeSegNet is web-based software for the detection and counting of spikes from visual images of the wheat plant. It was developed and implemented on a Linux operating system with 32 GB RAM and an NVIDIA GeForce GTX 1080 Ti graphics card (with 11 GB of memory). The PyCharm version 5.0 integrated development environment by JetBrains [https://www.jetbrains.com/] was used for development of the software. The software architecture consists of two layers, namely the Client-Side Interface Layer (CSIL) and the Server-Side Application Layer (SSAL). The architecture of Web-SpikeSegNet is given in Fig. 2. Hyper Text Markup Language [13], Cascading Style Sheets [14], and JavaScript [15] were used as the building blocks of CSIL, which manages the end user's requests and the corresponding responses. SSAL consists of two modules: the spike detection module and the spike counting module. The spike detection module was developed using Python libraries such as TensorFlow [16], Keras [17], NumPy [18], SciPy [19], Matplotlib [20], and OpenCV [21] for constructing and implementing the deep-learning model. A convolutional encoder network [9] (Encoder SpikeSegNet), a decoder network [9] (Decoder SpikeSegNet), and a bottleneck network ([9], [12]) using stacked hourglasses (Bottleneck SpikeSegNet) form the backbone of LPNet, GMRNet, and hence of SpikeSegNet. The numbers of encoders, decoders, and stacked hourglasses were estimated empirically, as given in [3], to produce the best results by considering the optimum performances. Encoder SpikeSegNet consists of 3 encoder blocks, and the output feature maps of each encoder block are forwarded to the next encoder block for further feature extraction. Each encoder block consists of two convolution layers, each with a square filter of size 3*3 [22] and a varying number of filters (16, 64, 128), followed by ReLU [23] and a max-pooling layer with a window size of 2*2 [24]. Square filters
are popularly used in state-of-the-art methods [25], and the mentioned window size is considered standard [10,26]. Batch normalization, a statistical procedure, is applied to improve the performance as well as the stability of the network. The Decoder SpikeSegNet network uses a special operation called transposed convolution [27], which up-samples the incoming features to regenerate or decode them. The resulting up-sampled feature maps are then concatenated/merged with the corresponding encoded feature maps of Encoder SpikeSegNet. The merge operation helps transfer spatial information across the network for better localization of the segmented masks. Decoder SpikeSegNet contains three decoder blocks, and each decoder block consists of two convolution layers (with filter size 3*3) with a varying number of filters (128, 64, 16), mirroring each encoder block of Encoder SpikeSegNet, followed by a ReLU operation to decode the features. The output of the final decoder is fed into a "SoftMax" (liu2016large) activation layer for classifying the objects (spikes). The Bottleneck SpikeSegNet network contains three hourglasses, which provide more confident segmentation by concentrating the most essential features captured at various occlusions, scales, and viewpoints [10,8]. Each hourglass comprises a sequence of residual blocks containing three convolution layers of filter sizes 1*1, 3*3, and 1*1 sequentially, with depths (numbers of filters) of 128, 128, and 256, respectively, estimated empirically on the basis of optimal performance. Algorithms for implementing Encoder SpikeSegNet, Decoder SpikeSegNet, Bottleneck SpikeSegNet, LPNet, and GMRNet are presented in Algorithms 1, 2, 3, 4, and 5, respectively. The spike counting module is integrated with the output of the spike detection module in SSAL. For this purpose, the "Analyze Particles" function of ImageJ [28] is applied to the output image of GMRNet, which is a segmented mask (binary) image containing spike regions only. The "Analyze Particles" function implements a flood-fill technique [29] for counting objects.
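The flood-fill counting step can be illustrated with a small, self-contained sketch. This is not ImageJ's implementation; it is a minimal 4-connected flood fill over a binary mask that counts connected foreground regions, which is the essence of what "Analyze Particles" does (ImageJ additionally supports size and circularity filters, which are omitted here).

```python
import numpy as np

def count_objects(mask):
    """Count connected foreground regions in a binary mask via an
    iterative flood fill (4-connectivity), analogous in spirit to
    ImageJ's "Analyze Particles" function."""
    mask = mask.astype(bool).copy()
    h, w = mask.shape
    count = 0
    for si in range(h):
        for sj in range(w):
            if mask[si, sj]:
                count += 1              # new, unvisited region found
                stack = [(si, sj)]      # flood-fill (erase) this region
                while stack:
                    i, j = stack.pop()
                    if 0 <= i < h and 0 <= j < w and mask[i, j]:
                        mask[i, j] = False
                        stack += [(i + 1, j), (i - 1, j),
                                  (i, j + 1), (i, j - 1)]
    return count

# Toy mask with two separate "spike" blobs:
m = np.zeros((8, 8), dtype=np.uint8)
m[1:3, 1:3] = 1
m[5:7, 4:7] = 1
print(count_objects(m))  # → 2
```

Applied to the binary output of GMRNet, each connected white region corresponds to one detected spike, so the region count is the spike count.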

Performance measurement of Web-SpikeSegNet
For evaluating the segmentation performance in detecting spikes, 100 visual images of wheat plants were captured and tested. Images were acquired using the LemnaTec facility installed at the Nanaji Deshmukh Plant Phenomics Centre, New Delhi, India. For this purpose, the segmented images (I_pred) produced by the Web-SpikeSegNet software are compared with the corresponding ground-truth mask images (I_grtr), which were prepared by following the steps mentioned in [3]. Segmentation performances are calculated using the following statistical parameters [Eq. (1) to Eq. (10)] [30,31,32]:

Type I error (E1): For any r-th test image, an exclusive-OR (XOR) operation is applied to compute the pixel-wise classification error (PixErr_r) between I_pred and the corresponding I_grtr image of size p×q:

PixErr_r = (1/(p·q)) Σ_{i=1}^{p} Σ_{j=1}^{q} [I_pred(i,j) ⊕ I_grtr(i,j)]

E1 is then computed by averaging PixErr_r over all n test images:

E1 = (1/n) Σ_{r=1}^{n} PixErr_r

where n is the total number of test images. E1 lies within [0, 1]. A value of E1 close to 0 indicates minimum error, whereas a value close to 1 signifies large error.
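The Type I error above is straightforward to compute with NumPy. The sketch below follows the definition directly (pixel-wise XOR between predicted and ground-truth binary masks, averaged per image and then over images); function and variable names are illustrative, not from the published code.

```python
import numpy as np

def type_i_error(preds, truths):
    """Type I error E1: mean pixel-wise XOR disagreement between
    predicted and ground-truth binary masks, averaged over all
    test images."""
    errs = [np.mean(np.logical_xor(p.astype(bool), g.astype(bool)))
            for p, g in zip(preds, truths)]
    return float(np.mean(errs))

# Toy example: one 4x4 mask pair disagreeing on 1 of 16 pixels.
pred = np.zeros((4, 4), dtype=int); pred[0, 0] = 1
grtr = np.zeros((4, 4), dtype=int)
print(type_i_error([pred], [grtr]))  # → 0.0625
```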
Type II error (E2): For any r-th test image, the error rate E2_r is computed as the average of the false-positive rate (FPR) and the false-negative rate (FNR) at the pixel level:

FPR = FP / (FP + TN),  FNR = FN / (FN + TP),  E2_r = (FPR + FNR) / 2

E2 is computed by averaging these errors over all n input test images:

E2 = (1/n) Σ_{r=1}^{n} E2_r

The following parameters are also used for measuring the segmentation performance of Web-SpikeSegNet at the pixel level in identifying/detecting spikes: • True positive (TP): number of pixels correctly classified as spikes.
• True Negative (TN): number of pixels correctly classified as non-spikes (other than spike pixels).
• False Positive (FP): number of non-spike pixels classified as spikes pixels.
• False Negative (FN): number of spike pixels classified as non-spike pixels. Then Precision, Recall, F-measure, and Accuracy can be defined as:

Precision = TP / (TP + FP): measures the percentage of detected spike pixels that are actually spikes

Recall = TP / (TP + FN): measures the percentage of actual spike pixels that are detected

F-measure (F1) = 2 × Precision × Recall / (Precision + Recall): measures the robustness of Web-SpikeSegNet in detecting or identifying spikes

Accuracy = (TP + TN) / (TP + TN + FP + FN): measures the overall performance of Web-SpikeSegNet
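The confusion counts and the derived scores above can be computed for a single mask pair as in the following sketch. It implements the standard definitions given in the text (including the Type II error as the mean of FPR and FNR); names are illustrative, not from the published code.

```python
import numpy as np

def pixel_metrics(pred, grtr):
    """Pixel-level confusion counts and the derived scores used in
    the paper: Type II error (mean of FPR and FNR), precision,
    recall, F1, and accuracy, for one predicted/ground-truth pair."""
    p, g = pred.astype(bool), grtr.astype(bool)
    tp = np.sum(p & g)        # spike pixels correctly detected
    tn = np.sum(~p & ~g)      # non-spike pixels correctly rejected
    fp = np.sum(p & ~g)       # non-spike pixels flagged as spike
    fn = np.sum(~p & g)       # spike pixels missed
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "E2": (fpr + fnr) / 2,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Toy 2x2 example: one TP, one TN, one FP, one FN.
m = pixel_metrics(np.array([[1, 0], [1, 0]]),
                  np.array([[1, 1], [0, 0]]))
print(m["accuracy"])  # → 0.5
```

Averaging `E2` over all test-image pairs gives the overall Type II error reported in Table 1.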

Results
To demonstrate the working environment of Web-SpikeSegNet, a case study is presented here. For this purpose, sample images of a wheat plant grown in a pot were collected using the LemnaTec imaging sensor installed at the Nanaji Deshmukh Plant Phenomics Centre, ICAR-IARI, New Delhi, India. The architecture of Web-SpikeSegNet is described in the Implementation section, and the design of the software consists of 5 sections, namely "Home page", "Spike Detection and Counting", "Help", "Contact Us", and "Sample Data set". The "Home page" contains basic information about SpikeSegNet and the flow diagram of the steps to be followed to recognize and count the spikes of the uploaded wheat plant image (Fig. 3). The "Sample Data set" section provides sample visual images of wheat plants for experimentation. The "Spike Detection and Counting" module is the center of attention of the software. The user has to follow these steps to detect and count the spikes:

1. Select and upload a visual image of a wheat plant of size 1656*1356 consisting of above-ground parts only (Fig. 4), as discussed in [3].

2. Click on the "Generate Patches" button to divide the whole image into patches (Fig. 5). Here, the visual image is divided into overlapping patches (each of size 256*256) extracted at a stride of 100 pixels, which serve as input to the LPNet module. Therefore, from one visual image of size 1656*1356, 180 patches of size 256*256 are generated, as shown in Fig. 6.

3. Click on "Run LPNet" to run the LPNet module for extracting contextual and spatial features at the patch level (Fig. 7). The outputs of LPNet are segmented images of size 256*256 corresponding to the patch images.

4. The outputs of LPNet are merged to generate a segmented image of size 1656*1356, which may contain some inaccurate segmentation of spikes and is further refined at the global level by clicking on the "Run GMRNet" button (Fig. 8).

5. For counting the wheat spikes, click on the "Count" button, and the corresponding spike count will be displayed in the next window (Fig. 9).
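The patch count in step 2 follows directly from the image and patch geometry. The short sketch below reproduces the arithmetic, assuming a 100-pixel stride between overlapping patch origins (a stride inferred from the stated 180-patch figure, not taken from the published code).

```python
def patch_starts(length, patch=256, stride=100):
    """Top-left offsets of overlapping patches along one image
    dimension, stepping by `stride` pixels."""
    return list(range(0, length - patch + 1, stride))

h, w = 1356, 1656          # input wheat-plant image size
rows = patch_starts(h)     # 12 vertical offsets
cols = patch_starts(w)     # 15 horizontal offsets
print(len(rows) * len(cols))  # → 180 patches of 256*256
```

That is, (1356 − 256)/100 + 1 = 12 rows and (1656 − 256)/100 + 1 = 15 columns of patch origins, giving 12 × 15 = 180 patches, matching the number shown in Fig. 6.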

Performance analysis of Web-SpikeSegNet
Segmentation performances of Web-SpikeSegNet for the mentioned statistical parameters (Eq. 1 to Eq. 10) were computed, and the average values are presented in Table 1. As the performance of spike detection is calculated at the pixel level, the value of E1 (= 0.00159) indicates that, on average, only about 104 pixels are misclassified among the 65,536 (256*256) pixels of one patch image. The accuracy of the approach, as well as of the developed software, is around 99.65%. The average precision value reflects that 99.59% of the detected spike pixels are actually spike pixels, and the robustness of the approach is also ∼100%.

Conclusions
Recognition and counting of spikes for large sets of germplasm in a non-destructive way is an enormously challenging task. This study developed the web-based software "Web-SpikeSegNet" using the robust SpikeSegNet approach, which is based on digital image analysis and deep-learning techniques. The software is freely available to researchers and students working particularly in the field of wheat plant phenotyping. Further, it is a useful tool in automated phenomics facilities for automating phenology-based treatments. Web-SpikeSegNet is a significant step toward wheat crop yield phenotyping and can be extended to other cereal crops.

Figure 4: Select and upload a visual image of a wheat plant

Figure 5: The "Generate Patches" button is used for dividing the uploaded visual image into overlapping patches

Figure 6: Overlapping patch generation from the uploaded visual image after clicking on "Generate Patches"

Figure 7: The outputs of LPNet are the segmented images corresponding to the patch images

Figure 8: The outputs of LPNet are merged and further refined at the global level by clicking on the "Run GMRNet" button

Figure 9: By clicking on the "Count" button, the corresponding spike count is displayed
Figure 3: The home page of Web-SpikeSegNet contains basic information about SpikeSegNet and the flow diagram of the steps to be followed to recognize and count the spikes of the uploaded wheat plant image


Table 1: Segmentation performance analysis of Web-SpikeSegNet

Figure 1: Flow diagram of SpikeSegNet. Here, the input is a visual image of a wheat plant of size 1656*1356. The input image is divided into patches of size 256*256 before entering LPNet. The outputs of LPNet are patch-by-patch segmented mask images, which are then combined to form a mask image of the same size as the input visual image. This image may contain some inaccurate segmentation of the objects (spikes) and is refined at the global level using the GMRNet network. The output of the GMRNet network is the refined mask image containing spike regions only.

Figure 2: Architecture of Web-SpikeSegNet. The software architecture consists of two layers, namely the Client-Side Interface Layer (CSIL) and the Server-Side Application Layer (SSAL). CSIL manages the end user's requests and the corresponding responses. SSAL consists of two modules: the spike detection module and the spike counting module.