To quantify text-based unstructured information, we propose a method called the direct scoring algorithm (DSA). DSA uses keywords in the document, subjectively determined numerical weights, and subjectively designed grammar rules to score individual sentences. We apply the method to the Beige Books produced by the U.S. Federal Reserve, which contain subjective text-based commentary on the state of the economy. To assess whether our scores have predictive value, we use them to construct a linear regression model of future growth in U.S. gross domestic product (GDP). We then compare the performance of this model with that of a similar model based on scores of the same documents produced through subjective reading by professional economists. The comparison demonstrates that the DSA model using the Beige Book contributes significantly to the prediction of GDP growth, explaining as much as 69% of the variance compared to the scores created by economic experts. We also add the extracted section scores to a GDP time series prediction model that uses only structured data as input. The results of this experiment suggest that the unstructured information in the Beige Books has predictive value beyond that of the structured information used in the time series model, and that our approach has some potential as a means of extracting this information in a semi-automated fashion.
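As a rough illustration of the kind of keyword-and-rule scoring described above, the following is a minimal sketch, not the DSA itself. The keyword list, the weights, the single negation rule, and the function names `score_sentence` and `score_document` are all assumptions made for this example; the actual lexicon and grammar rules are not given here.

```python
import re

# Hypothetical keyword weights (illustrative only, not the paper's lexicon).
KEYWORD_WEIGHTS = {
    "strong": 1.0,
    "improved": 1.0,
    "growth": 0.5,
    "slow": -0.5,
    "weak": -1.0,
    "declined": -1.0,
}

# Hypothetical grammar rule: a negator flips the sign of the next keyword
# that follows it in the same sentence.
NEGATORS = {"not", "no", "never"}


def score_sentence(sentence):
    """Sum the weights of the keywords in one sentence, applying negation."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    negate = False
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
        elif tok in KEYWORD_WEIGHTS:
            weight = KEYWORD_WEIGHTS[tok]
            score += -weight if negate else weight
            negate = False
    return score


def score_document(text):
    """Average the sentence scores over a document (0.0 if it is empty)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    return sum(score_sentence(s) for s in sentences) / len(sentences)
```

Under these assumptions, document-level scores computed this way for each report could then serve as the explanatory variable in a regression on subsequent GDP growth.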