Abstract:
The earth sciences research community has an unprecedented opportunity to exploit the vast amount of data available from earth observation (EO) satellites and earth system models (ESMs). The rise and application of artificial intelligence foundation models (FMs) can be attributed to the availability of large volumes of curated data, access to extensive computing resources, and the maturity of deep learning techniques. Vision transformer (ViT) architectures have been adapted for image and image-like data, such as EO data and ESM simulation output. Pretraining foundation models is a compute-intensive process, often requiring 10^5 to 10^7 GPU hours for large-scale scientific applications. The body of knowledge on compute-optimal pretraining methods is limited, necessitating a trial-and-error process. We have performed a series of experiments using ViT backbones at different scales to understand optimal and cost-effective ways to improve scientific throughput. This preliminary benchmark provides an assessment of which architectures and model configurations are favorable in a given scientific context.
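To see why pretraining lands in the 10^5 to 10^7 GPU-hour range, a rough estimate can be made with the widely used ~6*N*D FLOPs approximation for transformer training (N = parameter count, D = training tokens). This is a generic back-of-envelope sketch, not the paper's methodology; the GPU throughput and utilization figures below are illustrative assumptions.

```python
def pretrain_gpu_hours(n_params, n_tokens, gpu_flops=1e14, utilization=0.4):
    """Rough GPU-hour estimate for one pretraining run.

    Uses the common ~6*N*D training-FLOPs approximation. gpu_flops is the
    peak throughput of a single GPU (here an assumed ~100 TFLOP/s device)
    and utilization is the assumed fraction of peak actually achieved.
    """
    total_flops = 6 * n_params * n_tokens
    seconds = total_flops / (gpu_flops * utilization)
    return seconds / 3600

# A hypothetical 1B-parameter ViT trained on 1e12 tokens lands in the
# tens of thousands of GPU hours, consistent with the 10^5-10^7 range
# once model scale, data volume, or number of trial runs grows.
hours = pretrain_gpu_hours(1e9, 1e12)
```

Varying n_params and n_tokens by an order of magnitude each moves the estimate across the quoted range, which is why trial-and-error pretraining at scale is so costly.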
Date of Conference: 07-12 July 2024
Date Added to IEEE Xplore: 05 September 2024