
Benchmarking Self-Supervised Learning on Diverse Pathology Datasets



Abstract:

Computational pathology can help save human lives, but its models are annotation-hungry and pathology images are notoriously expensive to annotate. Self-supervised learning (SSL) has been shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently outperforms ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show lead to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements. We release the pre-trained model weights at https://lunit-io.github.io/research/publications/pathology_ssl.
Date of Conference: 17-24 June 2023
Date Added to IEEE Xplore: 22 August 2023

Conference Location: Vancouver, BC, Canada

1. Introduction

The computational analysis of microscopic images of human tissue – also known as computational pathology – has emerged as an important research topic, as its clinical applications can save human lives by improving cancer diagnosis [49] and treatment [42]. Deep learning and computer vision methods in pathology enable objectivity [15], large-scale analysis [20], and triaging [5], but often require large amounts of annotated data [52]. However, annotating pathology images requires specialists with many years of clinical residency [37], resulting in scarce labeled public datasets and a need for methods that train effectively on them.

Figure: Self-supervised pre-training on pathology data improves performance on pathology downstream tasks compared to ImageNet-supervised baselines. The y-axes show absolute differences in downstream task performance (Top-1 Acc. or mPQ Score). Linear evaluation (left) is performed on 4 classification tasks (BACH, CRC, PatchCamelyon, and MHIST) and 1 nuclei instance segmentation task (CoNSeP). Label-efficiency (right) is assessed by fine-tuning using small fractions of labeled data from the CoNSeP dataset.
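The linear evaluation protocol mentioned above trains only a linear classifier on top of frozen pre-trained features. The following is a minimal, self-contained sketch of that idea; the synthetic features, dimensions, and training loop are illustrative stand-ins and are not taken from the paper's actual implementation.

```python
# Hypothetical sketch of linear evaluation ("linear probing"):
# the backbone is frozen, so we only fit a logistic-regression head
# on fixed feature vectors. Features here are synthetic stand-ins
# for SSL embeddings of pathology patches.
import numpy as np

rng = np.random.default_rng(0)

# Simulated frozen-backbone features: 2 classes, 512-dim embeddings.
n_per_class, dim = 100, 512
class_means = rng.normal(0.0, 1.0, size=(2, dim))
X = np.vstack([rng.normal(class_means[c], 1.0, size=(n_per_class, dim))
               for c in range(2)])
y = np.repeat([0, 1], n_per_class)

# Linear probe: train a logistic-regression head by gradient descent
# while the (simulated) backbone features stay fixed.
w = np.zeros(dim)
b = 0.0
lr = 0.1
for _ in range(200):
    logits = X @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

# Top-1 accuracy of the linear head on the synthetic features.
acc = np.mean(((X @ w + b) > 0) == (y == 1))
```

Fine-tuning evaluation differs only in that the backbone's weights are also updated; the label-efficiency experiments in the figure repeat that fine-tuning with progressively smaller fractions of labeled data.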

