Automatic Calcification Morphology and Distribution Classification for Breast Mammograms With Multi-Task Graph Convolutional Neural Network

The morphology and distribution of microcalcifications are the most important descriptors for radiologists to diagnose breast cancer based on mammograms. However, it is very challenging and time-consuming for radiologists to characterize these descriptors manually, and there is also a lack of effective automatic solutions for this problem. We observed that the distribution and morphology descriptors are determined by the radiologists based on the spatial and visual relationships among calcifications. Thus, we hypothesize that this information can be effectively modelled by learning a relationship-aware representation using graph convolutional networks (GCNs). In this study, we propose a multi-task deep GCN method for automatic characterization of both the morphology and distribution of microcalcifications in mammograms. Our proposed method transforms morphology and distribution characterization into node and graph classification problems, respectively, and learns the representations concurrently. We trained and validated the proposed method on an in-house dataset and the public DDSM dataset with 195 and 583 cases, respectively. The proposed method reaches good and stable results, with distribution AUC of 0.812 $\pm$ 0.043 and 0.873 $\pm$ 0.019 and morphology AUC of 0.663 $\pm$ 0.016 and 0.700 $\pm$ 0.044 on the in-house and public datasets, respectively. On both datasets, our proposed method demonstrates statistically significant improvements compared to the baseline models.
The performance improvements brought by our proposed multi-task mechanism can be attributed to the association between the distribution and morphology of calcifications in mammograms, which is interpretable using graphical visualizations and consistent with the definitions of descriptors in the standard BI-RADS guideline. In short, we explore, for the first time, the application of GCNs to microcalcification characterization, which suggests the potential of graph learning for more robust understanding of medical images.


I. INTRODUCTION
According to Global Cancer Statistics 2020, breast cancer has overtaken lung cancer as the most common cancer around the world [1]. Even so, the good news is that the 5-year survival rate for breast cancer can be as high as 90% if it is detected early, before it progresses to metastatic cancer [2]. Mammography is currently the most effective tool for early detection of breast cancer, and it is widely adopted in breast cancer screening [3]. Mammography images commonly have high resolution, which enables the detection of microcalcifications (MCs) at an early stage. MC clusters are important early signs of breast cancer, accounting for approximately 50% of the diagnosed cases [4], [5]. An MC cluster contains at least 3 individual MCs, where each MC is a small calcium deposit in breast tissue that appears as a small bright spot in mammograms [6].
Different types of MCs are associated with different probabilities of malignancy [7]. Formally, the American College of Radiology Breast Imaging Reporting and Data System (ACR BI-RADS) classifies calcifications into either the 'typically benign' or 'suspicious' category based on the morphology and distribution of calcifications [8]. Morphology describes the form of calcifications based on shape, size, brightness, roughness, etc. Distribution describes how calcifications spread throughout the breast tissue. The morphology and distribution of calcifications, illustrated in Fig. 1, are the most important characteristics considered by radiologists to provide appropriate follow-up recommendations.

Fig. 1. Examples of morphology and distribution types. Types of suspicious morphology include coarse heterogeneous, fine pleomorphic, amorphous and fine linear (fine-linear branching). The types of distribution include diffuse, regional, cluster (grouped), linear and segmental. All morphology and distribution descriptors are listed in the order of increasing risk of malignancy from left to right.
Recently, numerous deep learning based computer-aided diagnosis (CADx) methods have been developed in medical imaging, especially mammography [9], [10], [11], [12], [13], [14], [15], [16]. Lotter et al. developed a convolutional neural network based CADx system to perform malignancy classification of mammograms and digital breast tomosynthesis [16]. The system outperformed five breast-imaging specialists on datasets from the U.K., the USA and China [16]. Liu et al. introduced an anatomy-aware graph convolutional network into the mammogram mass detection task [13]. Their model showed statistically significant improvements over the state-of-the-art performance. More specifically, for calcifications, many CADx methods have been proposed to classify calcification clusters as benign or malignant [17], [18], [19], [20], [21], [22], [23], [24], [25]. Alam et al. [17], [19] selected calcification density, distances from cluster centroids, cluster areas and calcification sizes to discriminate between benign and malignant calcification clusters. Singh et al. [18] utilized shape and texture features to determine malignancy. Although the effectiveness of these features has been proven, existing CADx methods are unable to characterize MCs in terms of the morphology and distribution descriptors recommended by ACR BI-RADS [8]. Automatic characterization of calcifications is important to reproduce the chain of reasoning for mammogram interpretation, leading to more accurate and robust understanding of mammograms.
To address this challenge, we formulate the characterization of calcifications in mammograms as a multi-task classification problem and propose a graph neural network framework. Firstly, we transform the calcifications in mammography images to graphical data to represent the spatial and visual information. That is, each calcification is represented as a node, and nodes are connected according to their geometric relationships with their nearby calcifications. Following the transformation, we formulate morphology classification as a 'node classification' task and distribution classification as a 'graph classification' task. We propose a multi-task model with graph convolutional neural networks (GCNs) to solve both tasks. GCN is a deep learning based method that extends convolutional operations to graphical data. It is designed to aggregate each vertex's features with the features of the neighboring vertices to learn relationship-aware representations for graph or node classification tasks. By employing GCNs [26], [27], [28], we incorporate both local patch features and topological structures. We developed a multi-task learning framework to automatically abstract data representations that are applicable to both the morphology and distribution classification tasks. This supports the generalizability of the proposed model.
Our main contributions are as follows:
1) We transform the calcification information in mammography images into graphical representations.
2) We propose a deep GCN based framework to model the node and graph embeddings for both morphology and distribution tasks.
3) We develop a multi-task GCN-based solution to characterize both the morphology and distribution descriptors simultaneously. We demonstrated with extensive experiments that the proposed multi-task training strategy leads to better and more robust performance compared to models trained on a single task and other baseline models.

A. Problem Definitions
The proposed model is divided into two stages: graph construction and a multi-task GCN. In the first stage, we transform the calcifications in mammography images into graphical data by using a convolutional neural network (CNN) based feature extractor and graph transformation functions. Following graph construction, the proposed GCN jointly learns representations for node and graph classification with the multi-task training strategy. The end-to-end framework is illustrated in Fig. 2.
Let $x^I$ be a mammography image and $x^c$ be the set of calcifications in the image. A set of $N$ mammography images is denoted as $X = \{(x^I_i, x^c_i)\}_{i=1}^{N}$. We transform the image set $X$ into a graph set $\mathcal{G}$ with $G_i \in \mathcal{G}$ and $G_i = (V_i, E_i)$, where $V_i$ and $E_i$ are the vertices and edges of the $i$-th graph. There are two tasks to investigate: (1) node (morphology) classification, where each vertex $v$ has a label $y_v$ and we aim to learn a function $f$ and a representation $r_v$ such that the vertex label can be predicted as $y_v = f(r_v)$; (2) graph (distribution) classification, where the graph has a label $y_g$ and we aim to learn a function $g$ and a representation vector $r_g$ to predict the label of the graph as $y_g = g(r_g)$.
The focus of this study is developing a multi-task GCN to effectively learn the node and graph embeddings for morphology and distribution classification of calcifications. The calcification locations, $x^c$, are annotated by radiologists. If deployed as a real-world CADx application, the proposed model should be equipped with a detection module that automatically detects calcifications. The detection module is not included in this study so as not to dilute its focus. Such detection modules can be developed based on several existing studies which achieved accuracy and ROC-AUC over 95% [29], [30], [31]. The impact of integrating a calcification detection module can be investigated in future work.

B. Graph Construction
Graph construction is illustrated in Fig. 2(a). For each mammography image with calcifications $(x^I_i, x^c_i)$, we define a set of patches $P = \{p_1, p_2, \ldots, p_n\}$, where each $p$ is an image patch of dimension $M \times M$ centered on a calcification. We extract high-level features from the patches $P$ using a convolutional neural network (CNN) as a feature extractor, and concatenate the extracted features with the normalized coordinates of the patches to form the node features $h_v$. The edge features $h_e$ are defined as the relative Cartesian coordinates of linked nodes. Following node and edge feature extraction, we construct two types of graphs based on the spatial connectivity between calcifications:
1) K-nearest neighbor (KNN) graph $G_{knn}$: Creates an edge from each node to each of its $k$ nearest neighbors. KNN graphs have been widely adopted in point cloud classification and segmentation [32], [33], [34], image classification [35], etc. However, a KNN graph may lose information from neighbors left unconnected in dense calcification clusters, or introduce noise when a node is an outlier from the calcification cluster.
2) Radius graph $G_{radius}$: Creates edges from each node to all other nodes within a given distance. The radius graph addresses the limitations of the KNN graph described above. However, it depends on a constant distance threshold, which may cause information loss for vertices beyond the threshold.
The process of graph construction is shown in Algorithm 1.
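The two graph types can be illustrated with a minimal sketch in plain Python. It is not the authors' Algorithm 1: the function names (`knn_edges`, `radius_edges`, `edge_features`, `node_features`) are hypothetical, edges are treated as directed pairs, and the CNN patch features are assumed to be given as plain lists.

```python
import math

def knn_edges(coords, k):
    """Directed edges (i, j) where j is one of the k nearest neighbours of i."""
    edges = set()
    for i, (xi, yi) in enumerate(coords):
        # Rank every other node by Euclidean distance to node i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        edges.update((i, j) for j in others[:k])
    return edges

def radius_edges(coords, radius):
    """Directed edges (i, j) for every pair of distinct nodes within `radius`."""
    edges = set()
    for i, (xi, yi) in enumerate(coords):
        for j, (xj, yj) in enumerate(coords):
            if i != j and math.hypot(xj - xi, yj - yi) <= radius:
                edges.add((i, j))
    return edges

def edge_features(coords, edges):
    """Relative Cartesian coordinates (dx, dy) of node j with respect to node i."""
    return {
        (i, j): (coords[j][0] - coords[i][0], coords[j][1] - coords[i][1])
        for i, j in edges
    }

def node_features(patch_features, coords, width, height):
    """Concatenate patch features with coordinates normalized by image size."""
    return [
        list(feat) + [x / width, y / height]
        for feat, (x, y) in zip(patch_features, coords)
    ]

# Three clustered calcifications and one distant outlier (pixel coordinates).
coords = [(0, 0), (10, 0), (20, 0), (500, 500)]
g_knn = knn_edges(coords, k=1)
g_radius = radius_edges(coords, radius=15)
```

In this toy example the KNN graph links the outlier (node 3) to the cluster, while the radius graph leaves it isolated, which mirrors the trade-off between the two graph types described above.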

C. Deep Graph Convolutional Network
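The constructed multi-graph inputs are then fed into the deep GCN, as illustrated in Fig. 2(b). The weights of the proposed GCN are shared across the multi-graph inputs. Weight sharing aims to learn the common features that describe the characteristics of both graphs. Following [28] and [36], we used GCN blocks with the ordering Normalization → ReLU → GraphConv → Addition and GENeralized Aggregation Networks (GENconv) as the GraphConv backbone. In GENconv, the message construction function $\rho^{(l)}$ is applied to the vertex feature $h_v^{(l)}$, the neighbor feature $h_u^{(l)}$ and the edge feature $h_{e_{vu}}^{(l)}$:

$$m_{vu}^{(l)} = \rho^{(l)}\big(h_v^{(l)}, h_u^{(l)}, h_{e_{vu}}^{(l)}\big) = \mathrm{ReLU}\big(h_u^{(l)} + \mathbb{1}(h_{e_{vu}}^{(l)}) \cdot h_{e_{vu}}^{(l)}\big) + \epsilon, \quad u \in \mathcal{N}(v),$$

where $\mathrm{ReLU}(\cdot)$ represents the rectified linear unit activation function [37], $\mathbb{1}(\cdot)$ is an indicator function which equals 1 when edge features exist and 0 otherwise, and $\epsilon$ is a small positive constant. $\mathrm{SoftMaxAgg}_\beta$ is then used as the message aggregation function and is defined as:

$$m_v^{(l)} = \mathrm{SoftMaxAgg}_\beta(\cdot) = \sum_{u \in \mathcal{N}(v)} \frac{\exp\big(\beta m_{vu}^{(l)}\big)}{\sum_{i \in \mathcal{N}(v)} \exp\big(\beta m_{vi}^{(l)}\big)} \cdot m_{vu}^{(l)},$$

where $\mathcal{N}(v)$ is the set of neighbors of vertex $v$ and $\beta$ is a hyper-parameter which controls the aggregation function. Message normalization (MsgNorm) is then introduced to address the over-smoothing and gradient vanishing problems in training deep GCNs. MsgNorm normalizes the features of the aggregated message $m_v^{(l)}$ during the vertex update:

$$h_v^{(l+1)} = \mathrm{MLP}\Big(h_v^{(l)} + s \cdot \big\|h_v^{(l)}\big\|_2 \cdot \frac{m_v^{(l)}}{\big\|m_v^{(l)}\big\|_2}\Big),$$

where $s$ is a learnable scaling factor. The aggregated message $m_v^{(l)}$ is first normalized by its $\ell_2$ norm and then scaled by the $\ell_2$ norm of $h_v^{(l)}$ by a factor of $s$. The scaling factor $s$ is set to be a learnable scalar with an initialized value of 1.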
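The message construction, softmax aggregation and message normalization steps can be sketched in plain Python for a single vertex. This is an illustrative sketch rather than the GENconv implementation used in the experiments: the helper names are hypothetical, feature vectors are plain lists, the aggregation is applied channel-wise, and the MLP and residual update are omitted.

```python
import math

def message(h_u, h_e=None, eps=1e-7):
    """ReLU(h_u + h_e) + eps; the edge term is skipped when no edge feature exists."""
    if h_e is None:
        h_e = [0.0] * len(h_u)
    return [max(a + b, 0.0) + eps for a, b in zip(h_u, h_e)]

def softmax_agg(messages, beta=1.0):
    """Channel-wise softmax-weighted sum of the neighbour messages."""
    aggregated = []
    for c in range(len(messages[0])):
        vals = [m[c] for m in messages]
        top = max(beta * v for v in vals)  # subtract the max for numerical stability
        weights = [math.exp(beta * v - top) for v in vals]
        total = sum(weights)
        aggregated.append(sum(w / total * v for w, v in zip(weights, vals)))
    return aggregated

def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))

def msg_norm(h_v, m_v, s=1.0):
    """Rescale the aggregated message m_v to have norm s * ||h_v||_2."""
    norm_m = l2_norm(m_v)
    if norm_m == 0.0:
        return list(m_v)  # nothing to rescale for an all-zero message
    scale = s * l2_norm(h_v) / norm_m
    return [scale * x for x in m_v]
```

With `beta=0` the aggregation reduces to the mean of the neighbor messages, and for large `beta` it approaches the maximum, which is why $\beta$ is described as controlling the aggregation function.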

D. Multi-Task Learning
In this study, the proposed multi-task GCN is trained to jointly perform morphology and distribution classification. The model was trained with a multi-task loss $L_{MT} = w_m L_m + w_d L_d$, where $w_m L_m$ and $w_d L_d$ are the weighted cross-entropy losses for morphology and distribution classification, respectively. In the ACR BI-RADS guideline, the morphology and distribution of calcifications are equally important. Therefore, we introduced GradNorm [38] to learn both tasks at an equal pace. To explain GradNorm in the proposed method, we define the necessary quantities below:
• $W$: The subset of the full network weights, $W \subset \mathcal{W}$, on which GradNorm is applied. The weights of the last shared layer are generally chosen as $W$.
• $G_W^{(i)}(t) = \|\nabla_W w_i(t) L_i(t)\|_2$: The $\ell_2$ norm of the gradient of the weighted loss of task $i$ with respect to $W$ at training step $t$.
• $\bar{G}_W(t) = E_{task}[G_W^{(i)}(t)]$: The average value of gradient norms over all tasks for training step $t$.
• $\tilde{L}_i(t) = L_i(t) / L_i(0)$: The loss ratio, used as the inverse training rate of task $i$ at step $t$.
• $r_i(t) = \tilde{L}_i(t) / E_{task}[\tilde{L}_i(t)]$: The relative inverse training rate of task $i$ at step $t$.
In order to balance the gradient magnitudes $G_W^{(i)}(t)$ of the tasks, the mean gradient norm across all tasks, $\bar{G}_W(t)$, is set as the common scale target. The relative inverse training rate of task $i$, $r_i(t)$, is used to balance the learning pace of all tasks. The target gradient norm for task $i$ is:

$$G_W^{(i)}(t) \mapsto \bar{G}_W(t) \times [r_i(t)]^{\alpha}, \tag{4}$$

where $\alpha$ controls the strength of the restoring force which pulls tasks back to a common training rate. A higher value of $\alpha$ enforces balanced training rates more strongly. Equation (4) provides the target gradient norm for task $i$. At each training step $t$, we update the loss weights $w_i(t)$ to bring the gradient norm of each task close to its target. The $L_1$ loss between the actual gradient norm and the target at each time step for each task is introduced as $L_{grad}$, and we summed $L_{grad}$ across the morphology and distribution classification tasks: $L_{grad}(t; w_i(t)) = \sum_i \big| G_W^{(i)}(t) - \bar{G}_W(t) \times [r_i(t)]^{\alpha} \big|_1$.
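The arithmetic behind the GradNorm targets can be sketched as follows. This is a simplified illustration under stated assumptions, not the training code: the function names are hypothetical, the per-task gradient norms are passed in as plain numbers instead of being computed by automatic differentiation, and the update of the loss weights $w_i(t)$ is left out.

```python
def gradnorm_targets(grad_norms, losses, initial_losses, alpha=1.5):
    """Target gradient norm per task: mean norm times (relative inverse rate)^alpha."""
    # Loss ratio L_i(t) / L_i(0): a smaller ratio means the task has trained faster.
    ratios = [loss / loss0 for loss, loss0 in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / len(ratios)
    # Relative inverse training rate r_i(t).
    rel_rates = [ratio / mean_ratio for ratio in ratios]
    mean_norm = sum(grad_norms) / len(grad_norms)
    return [mean_norm * rate ** alpha for rate in rel_rates]

def gradnorm_loss(grad_norms, targets):
    """L1 distance between actual and target gradient norms, summed over tasks."""
    return sum(abs(g - t) for g, t in zip(grad_norms, targets))

# Two tasks (morphology, distribution) with illustrative numbers.
targets = gradnorm_targets(
    grad_norms=[2.0, 4.0], losses=[0.25, 0.75], initial_losses=[1.0, 1.0]
)
l_grad = gradnorm_loss([2.0, 4.0], targets)
```

In this example the second task has reduced its loss less than the first, so it receives the larger target gradient norm. Minimizing `l_grad` with respect to the loss weights would then increase its weight relative to the faster task.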

III. EXPERIMENTAL RESULTS AND DISCUSSION
A. Datasets
1) TMU Dataset: We collected a full-field digital mammogram dataset for this study from Wan Fang Hospital, Taipei Medical University (TMU), between June 2010 and October 2018. The dataset contains 387 mammography images from 200 patients who were classified as ACR BI-RADS category 4 or 5 with documented calcifications in the original radiological reports. All cases were confirmed as breast cancer by biopsy.
Descriptors of morphology and distribution were annotated by a senior radiological technologist and carefully reviewed by a panel of two senior radiologists in a joint meeting. Our clinical annotators are breast imaging experts, which supports the reliability of the ground truths produced in the annotation process. The radiological technologist has 15 years of experience in mammogram reading. The review panel consists of a professor in the Department of Radiology, Taipei Medical University, and the chief of breast imaging at Wan Fang Hospital, with 32 and 20 years of experience, respectively. To assess the impact of inter-observer variability on the ground truths, we evaluated the agreement between the radiological technologist and one of the senior radiologists in the review panel using Cohen's kappa [39]. The inter-observer kappa values are 0.978 and 0.992 for annotating distribution and morphology descriptors, respectively. These values indicate a high degree of agreement between annotators (kappa 0.81-1.00: almost perfect agreement [40]); thus, inter-observer variability has little impact on the obtained ground truths.
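For reference, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance from their label frequencies. The sketch below is a minimal plain-Python version with a hypothetical function name and toy labels; it does not reproduce the study's annotations.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Toy example: the annotators disagree on one of four distribution labels.
reader_1 = ["cluster", "cluster", "linear", "linear"]
reader_2 = ["cluster", "linear", "linear", "linear"]
kappa = cohens_kappa(reader_1, reader_2)
```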
The study was jointly approved by the NUS Institutional Review Board (NUS-IRB) (Approval No. 2019/00159) and the Joint Institutional Review Board of Taipei Medical University (TMU-JIRB) (Approval No. N202006039). We excluded 5 cases with no biopsy confirmation, malignant phyllodes tumor or low image quality. The basic characteristics of the final cohort are shown in online Appendix Table 1 [41].
2) CBIS-DDSM Dataset: We validated our proposed method on the CBIS-DDSM (Curated Breast Imaging Subset of DDSM) dataset. CBIS-DDSM [42] is an updated and standardized version of the Digital Database for Screening Mammography (DDSM) dataset [43]. The DDSM dataset is a publicly available database of 2,620 scanned film mammography studies. The cases were annotated with regions of interest (ROIs) for calcifications and masses, and with BI-RADS descriptors for calcification morphology, calcification distribution, mass shape, mass margin and breast density. Following the same inclusion criteria as for the TMU dataset, we included cases which were classified as ACR BI-RADS category 4 or 5 with annotated calcifications. We excluded cases which contained calcifications with more than one morphology type, because CBIS-DDSM does not provide separate ROI annotations for multiple morphology descriptors. The number of such cases is relatively small (<10%). As a result, we extracted 583 mammography images from CBIS-DDSM for this study.

1) Implementation Details:
The experiments were implemented with the PyTorch framework and the PyTorch Geometric package [44], [45]. The dimension of calcification patches was set at 14 × 14 (1.32 mm × 1.32 mm), as the size of calcifications is generally less than 14 pixels in mammograms [46]. Hyper-parameters were selected through grid search over candidate values. Empirically, the hidden size of the proposed network was set at 128, α was set at 1.5, k for the KNN graph was set at 4 and the distance threshold for the radius graph was set at 112. The initial learning rate was set at $10^{-3}$ and decayed by 1/10 every 10 epochs. The models were trained with the Adam optimizer on an Ubuntu server with 4 NVIDIA V100 GPU cards for 100 epochs. The models were trained and validated independently on the TMU and DDSM datasets using 5-fold cross-validation. The folds were split at the patient level such that mammograms from the same patient never appeared in both the training and testing folds. No statistically significant differences were found between training and testing folds in any demographic variable (details in online Appendix Tables 2-6 [41]). We also conducted ablation studies to evaluate the model's performance after removing each proposed module, in order to understand each module's contribution to the overall model. To ensure reproducibility, our implementations of the experiments are publicly available on GitHub.
2) Performance Comparison: To the best of our knowledge, there are no state-of-the-art models for characterizing the morphology and distribution of calcifications in mammography images. In order to establish baselines for comparison, we employed multiple popular CNN and GCN models that have been widely and successfully applied in medical imaging. For the CNN baseline models, we regarded both the distribution and morphology classification tasks as multi-class classification problems.
Distribution baseline models take mammography images $x^I$ as input to predict the types of distribution. For morphology baseline models, the set of patches P defined in Section II-B is used as input. Each patch p is centered on a calcification and considered as an independent input to the baseline models. Similar to vertices in the constructed calcification graphs, each patch is associated with a morphology label. The baseline models classify the patches into morphology categories. For GCN baseline models, we evaluate the models' performance for graph and node classification tasks separately. The employed baseline models include: 1) ResNet [47]: ResNet has been one of the most successful and popular network architectures in the computer vision field since it was proposed in 2015. Residual blocks with skip connections were proposed to solve the problem of gradient vanishing in training deep neural networks. ResNet and its variants have been successfully adopted in many applications such as medical image classification, segmentation, synthesis, etc. [48], [49], [50], [51]. [56], [57]. We used MobileNetV2 [58] in this study. 4) EfficientNet [59]: EfficientNet was proposed using network architecture search and performs compound scaling in depth, width and resolution. EfficientNets achieved state-of-the-art performance on various benchmark datasets with significantly fewer parameters than other models. We adopted EfficientNet-B0 in the experiments of this study. 5) GCN (vanilla) [27]: GCN was proposed to generalize convolution operations to non-Euclidean graphical data. GCN has been successfully applied in medical tasks such as COVID-19 classification [60], drug discovery [61] and brain fMRI analysis [62]. 6) Graph attention network (GAT) [63]: GAT is one of the most successful variants of the vanilla GCN. GAT introduced masked self-attention into graph convolution operations to weight the information propagated from neighbouring vertices.
GAT has demonstrated its potential in medical tasks such as Alzheimer's disease analysis [64], identification of bipolar disorder [65] and medical image enhancement [66]. We addressed several image quality issues to ensure a fair comparison with the baseline models. The CBIS-DDSM dataset was collected from scanned analog films. As a result, the image quality is much poorer compared to digital mammograms. We applied preprocessing techniques including CLAHE enhancement and lesion segmentation [67]. For the TMU dataset, a small number of the collected mammography images (<10%) were overexposed, showing bright white areas in the breasts. This overexposure does not affect the performance of the proposed model, because the overexposed areas do not overlap with calcifications and the proposed model takes calcification patches as inputs. However, the overexposure may affect the performance of the baseline models, which take entire mammography images as inputs. To help the baseline models overcome this issue, we preprocessed the images by removing the overexposed regions from the affected mammograms.
In our experiments, both distribution and morphology classification tasks are formulated as multi-class classification tasks. Following the standard medical guideline BI-RADS fifth edition [8], the number of classes is 5 and 4 for distribution and morphology descriptors, respectively. Examples of the descriptors are shown in the Introduction. We used the multi-class AUC as the primary evaluation metric [68]. AUC was evaluated at the node and graph level for morphology and distribution classification, respectively. In addition, we evaluated precision, recall, F1-score and accuracy for comparative purposes. All performance metrics were evaluated with the weighted average method across multiple classes [69].
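One common way to obtain a weighted multi-class AUC is to compute a one-vs-rest AUC per class and average the results weighted by class support. The sketch below assumes that formulation, which may differ in detail from the definition in [68]. The function names are hypothetical, class labels are assumed to be integer column indices into the probability rows, and every class is assumed to appear at least once alongside at least one sample of another class.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a positive outranks a negative (ties count half)."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def weighted_ovr_auc(probs, labels):
    """One-vs-rest AUC per class, averaged with weights equal to class support."""
    total = 0.0
    for c in sorted(set(labels)):
        scores = [row[c] for row in probs]      # predicted probability of class c
        flags = [y == c for y in labels]        # class c against all other classes
        total += binary_auc(scores, flags) * sum(flags)
    return total / len(labels)

# Toy example with three classes and four samples.
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
]
labels = [0, 0, 1, 2]
auc = weighted_ovr_auc(probs, labels)
```

For morphology the rows would correspond to nodes, and for distribution to graphs, matching the node-level and graph-level evaluation described above.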
95% confidence intervals (95% CI) and statistical tests were used for performance comparison. Confidence intervals were computed with 1000 bootstrap resamples [70]. Randomized permutation tests were used to test for statistically significant differences [71]. To account for multiple comparisons, the significance level was adjusted to 0.008 using Bonferroni correction [72].
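The percentile bootstrap and the Bonferroni adjustment can be sketched as follows. Several details are assumptions made for illustration: the function names are hypothetical, the interval uses the simple percentile method, the resampled quantity is a list of per-case values rather than full model predictions, and the adjusted level of 0.008 is taken to arise from a nominal level of 0.05 divided by six baseline comparisons.

```python
import random

def bootstrap_ci(values, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(values)."""
    rng = random.Random(seed)
    n = len(values)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    stats = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int(round(n_boot * (1 - level) / 2))]
    upper = stats[int(round(n_boot * (1 + level) / 2)) - 1]
    return lower, upper

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under Bonferroni correction."""
    return alpha / n_comparisons

def mean(xs):
    return sum(xs) / len(xs)

# Toy example: per-case correctness indicators (1 = correct, 0 = incorrect).
outcomes = [0.0] * 50 + [1.0] * 50
ci_low, ci_high = bootstrap_ci(outcomes, mean)
adjusted_alpha = bonferroni_alpha(0.05, 6)
```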

3) Results:
As Tables I and II show, our proposed model demonstrated leading performance across both tasks on the two datasets. For the distribution classification task, compared with the baseline models ResNet, DenseNet, MobileNet, EfficientNet, vanilla GCN and GAT, our proposed model demonstrated mean ROC-AUC improvements of 0.152, 0.238, 0.124, 0.138, 0.189 and 0.187 on the TMU dataset, respectively. In addition, our proposed model achieved mean improvements of 0.077, 0.067, 0.113 and 6.658 in precision, recall, F1-score and accuracy, respectively, compared to the best results among the baseline models. For the morphology classification task, the improvements in ROC-AUC, precision, recall, F1-score and accuracy were 0.069, 0.091, 0.018, 0.058 and 1.821 on the TMU dataset, respectively, compared to the best results among the baseline models [Table I]. Similarly, the proposed model outperformed all baseline models in mean ROC-AUC on the CBIS-DDSM dataset, with maximum improvements of 0.268 and 0.148 in the distribution and morphology classification tasks, respectively [Table II]. ROC-AUC results for each type of distribution and morphology descriptor are shown in Appendix Tables 7 and 8 [41].
The improvements on the distribution classification task can be attributed to the design of the GCN, which captures the geometrical relationships between calcifications, thereby improving the ability to distinguish distribution types. For morphology, the improvements can be attributed to the message propagation from neighboring vertices with the same morphology type. Calcifications with the same morphology tend to be located in a nearby region or cluster. Therefore, the feature propagation from neighbors enhances the model's ability to distinguish morphology types.

TABLE II. The ROC-AUC comparison on the CBIS-DDSM dataset between baseline models and the proposed model on distribution and morphology classification.

C. Ablation Study
1) Ablation Experiments for Multi-Task Network: We separately trained task-specific models by removing the distribution or the morphology branch [Tables III and IV]. Although statistical significance was not reached due to the limited sample size, the multi-task model outperformed the task-specific models in both tasks. For the distribution classification task, the multi-task architecture demonstrated 0.009 and 0.022 higher ROC-AUC than the task-specific architecture on the TMU and CBIS-DDSM datasets, respectively. For the morphology classification task, the proposed multi-task model achieved mean ROC-AUC improvements of 0.020 and 0.062 compared to the task-specific architectures on the two datasets. The improvements can be attributed to the fact that distribution and morphology are associated and jointly affect the radiologists' decision-making on malignancy diagnosis. For example, in ductal carcinoma in situ and invasive ductal carcinoma, fine linear or linear branching calcifications often have a segmental ductal distribution [73]. Fine pleomorphic and linear branching calcifications in a segmental distribution are highly suspicious for malignancy [73]. The multi-task network learns a shared representation from the morphology and distribution labels, thus achieving improvements on both tasks.
2) Ablation Experiments for Depth of Deep GCNs: To investigate the effect of the depth of the deep GCN, we compared different numbers of graph convolutional layers in the proposed network. The experimental results showed that a relatively larger number of GCN layers improves the performance, though no statistical significance was found due to the limited study sample size. On the TMU dataset, when the number of GCN layers increased from 2 to 8, the mean ROC-AUC of the distribution and morphology classification tasks increased by 0.009 and 0.016, respectively. When the number of GCN layers was further increased to 16, the performance of the two tasks dropped by 0.011 and 0.028, respectively. A similar trend was also observed in the experiments on the CBIS-DDSM dataset.
In GCNs, a single GCN layer considers only the nearest neighbors, while networks with multiple GCN layers perform message propagation and fusion from multi-hop neighbors. As mentioned, calcifications with the same morphology are located in a nearby region or cluster, and distribution describes how calcifications spread over the breast. To a certain extent, when the depth of the GCN increases, message propagation from more hops of neighbors enhances the network's ability to classify nodes and graphs. However, when the network depth increases further, message propagation from more distant nodes may be harmful for morphology classification, because those nodes may not have the same type of morphology. Deeper GCNs in this study may also suffer from over-smoothing and gradient vanishing problems, which could be investigated in future studies.

3) Ablation Experiments for Multi-Graph Fusion:
To investigate the effectiveness of multi-graph fusion, we compared the proposed model with multi-task models that take a single radius or KNN graph as input to the GCN. The experimental results showed that multi-graph fusion improves the robustness of the model. The improvements were statistically significant compared to the GCN model with the KNN graph, but not compared to the GCN model with the radius graph. Compared with the single-graph GCN models, the proposed multi-graph model achieved maximum ROC-AUC improvements of 0.096 and 0.078 for the distribution classification task, and 0.069 and 0.109 for the morphology classification task, on the two datasets. As mentioned in Section II-B, each individual graph has limitations in either the morphology or the distribution classification task. The design of multi-graph fusion enhances the model's ability to learn representations from two graphs, thereby improving on both classification tasks.

TABLE III. Ablation study: the performance comparison on the TMU dataset between the ablation models and the proposed model on distribution and morphology classification.

D. Discussion
In this study, we proposed a multi-task GCN model to jointly classify morphology and distribution descriptors of calcifications in mammography images. The proposed model demonstrated improved performance compared to multiple representative baseline models. The improvements were statistically significant across two datasets, suggesting the model has the potential to generalize well across different demographics and image qualities. Compared to the recent application of GCN on mammograms by Liu et al. [13], our study is focused on the classification of morphology and distribution descriptors of calcifications, rather than mass detection in mammograms. In addition, the proposed model was designed to model the morphology and distribution descriptors simultaneously, which is, to the best of our knowledge, the first application of a multi-task mechanism to the characterization of calcifications in mammograms.
As discussed in Experiments (Section III-B) and Ablation Study (Section III-C), the findings in the experimental results can be explained with clinical guidelines that characterize distribution and morphology descriptors of calcifications. We further introduced GNNExplainer [74] to enhance the interpretability of the proposed model and to support our findings. GNNExplainer generates explanations by identifying the subgraphs of the computational graphs and the node feature subsets that have the greatest impact on the GNN's predictions. Two case studies are shown in Fig. 3 with the original mammography image, radiological annotations and explanation graphs generated by GNNExplainer. In case study (A), the calcifications follow a cluster distribution, and the cluster marked with the green outline is identified as coarse heterogeneous morphology. GNNExplainer highlights the edges with crucial roles in node and graph prediction. For graph prediction, more edges between calcifications in the cluster are displayed, indicating that these nodes and edges are more influential in classifying the graph as cluster distribution. For node classification, GNNExplainer generated a crucial subgraph for node 6 which contains nodes from the calcification cluster and highlights the edges between calcifications in this cluster. In case study (B), crucial edges connect the calcification nodes and form the segmental distribution (Fig. 3 (B1)). In addition, there are two kinds of morphology in case study (B): fine pleomorphic and fine linear. As shown in Fig. 3 (B2), the classification of node 24 is based on feature propagation from neighboring nodes 21, 23, 25 and 26, which all have the same morphology. The classification of node 12 with fine linear morphology is explained in Fig. 3 (B3). The information propagation from neighboring nodes with fine linear morphology plays a crucial role in the node classification. The results from GNNExplainer support our interpretation of the experimental results. With the development of interpretation tools for graph networks, we believe that more explanations and insights could be achieved in future research.
Moreover, we only included malignant cases with ACR BI-RADS category 4 or 5 in this study. This inclusion criterion is based on the consideration that the classification of distribution and morphology descriptors is more important in malignant cases for patient care. Therefore, the effectiveness of the proposed method on benign cases has not been assessed in this study. Benign cases will be extracted and annotated to enrich the calcification dataset for future studies.

IV. CONCLUSION
We proposed a multi-task GCN model to tackle the challenging problem of characterizing calcification morphology and distribution in mammography images, which is an essential task for any effective computer-aided detection tool for mammography. Through experiments, we demonstrated that our proposed model outperformed the baseline models and the single-task models.