Toward Reliable Biodiversity Information Extraction From Large Language Models


Abstract:

In this paper, we develop a method for extracting information from Large Language Models (LLMs) with associated confidence estimates. We propose that effective confidence models may be designed using a large number of uncertainty measures (i.e., variables that are only weakly predictive of, but positively correlated with, information correctness) as inputs. We trained a confidence model that uses 20 handcrafted uncertainty measures to predict GPT-4's ability to reproduce species occurrence data from iDigBio and found that, if we consider only the occurrence claims placed in the top 30% of confidence estimates, prediction accuracy rises from 57% to 88% for species absence predictions and from 77% to 86% for species presence predictions. Using the same confidence model, we then applied GPT-4 to extract new data that extrapolate beyond the occurrence records in iDigBio and used the results to visualize geographic distributions for four individual species. More generally, this represents a novel use case for LLMs: generating credible pseudo data for applications in which high-quality curated data are unavailable or inaccessible.
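
The abstract does not specify the form of the confidence model, so the following is only a minimal sketch of the general approach it describes: a binary classifier is trained on the 20 uncertainty measures to predict whether an occurrence claim is correct, and only claims falling in the top 30% of predicted confidence are retained. The choice of scikit-learn's GradientBoostingClassifier, the function names, and the feature-matrix layout are assumptions for illustration, not details taken from the paper.

    # Illustrative sketch only; the classifier and names below are assumed.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    def train_confidence_model(uncertainty_features, is_correct):
        # uncertainty_features: (n_claims, 20) array, one column per uncertainty measure.
        # is_correct: (n_claims,) binary labels -- whether GPT-4 reproduced the iDigBio record.
        model = GradientBoostingClassifier()
        model.fit(uncertainty_features, is_correct)
        return model

    def top_confidence_claims(model, uncertainty_features, keep_fraction=0.30):
        # Score each claim with the model's predicted probability of correctness,
        # then return the indices of claims in the top `keep_fraction` of scores.
        confidence = model.predict_proba(uncertainty_features)[:, 1]
        threshold = np.quantile(confidence, 1.0 - keep_fraction)
        return np.where(confidence >= threshold)[0]

Under this reading, the accuracy figures reported in the abstract would be computed only over the claims returned by the top-30% filter.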
Date of Conference: 16-20 September 2024
Date Added to IEEE Xplore: 20 September 2024
Conference Location: Osaka, Japan
