Protein design aims to identify sequences that are compatible with a given protein fold but incompatible with any alternative fold. A design scoring function is critically important both for selecting the correct sequences and for guiding the search process. It is also important that a design scoring function can characterize the global fitness landscape of many proteins simultaneously. We describe how finding optimal design scoring functions can be understood from two geometric viewpoints, and propose a formulation using a mixture of Gaussian kernel functions. We give results on distinguishing native sequences from a large number of alternative decoy sequences for a major portion of representative protein structures. We succeeded in deriving a nonlinear scoring function that perfectly discriminates a set of 440 representative native proteins of known structure from 14 million sequence decoys. We show that no linear scoring function can achieve perfect discrimination. In an independent blind test using 194 unrelated proteins, our scoring function misclassifies only 13 native proteins. This compares favorably with the 37 or 51 misclassifications obtained when optimal linear functions reported in the literature are used.
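To illustrate how a mixture of Gaussian kernels can separate points that no linear function can, the following is a minimal sketch on toy data. It is not the scoring function derived in this work. The function names (`gaussian_kernel`, `kernel_score`), the two-dimensional feature vectors, the uniform weights, the kernel width `sigma`, and the sign convention (negative score for native-like points) are all assumptions made for illustration. In practice the weights would be obtained by optimization over training natives and decoys.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_score(c, natives, decoys, w_native, w_decoy, sigma=1.0):
    """Weighted kernel sum centered on decoys minus the sum centered on natives.

    Assumed convention: a negative score means the point looks native-like.
    """
    decoy_term = sum(
        w * gaussian_kernel(c, d, sigma) for w, d in zip(w_decoy, decoys)
    )
    native_term = sum(
        w * gaussian_kernel(c, n, sigma) for w, n in zip(w_native, natives)
    )
    return decoy_term - native_term


# Toy XOR-style layout (hypothetical data): the two "native" points sit on one
# diagonal of the unit square and the two "decoy" points on the other, so no
# single linear function of the features can separate them.
natives = [(0.0, 0.0), (1.0, 1.0)]
decoys = [(0.0, 1.0), (1.0, 0.0)]

# Uniform weights chosen by hand for the sketch, not fitted.
w_native = [1.0, 1.0]
w_decoy = [1.0, 1.0]
sigma = 0.5

native_scores = [
    kernel_score(c, natives, decoys, w_native, w_decoy, sigma) for c in natives
]
decoy_scores = [
    kernel_score(c, natives, decoys, w_native, w_decoy, sigma) for c in decoys
]

print("native scores:", native_scores)
print("decoy scores:", decoy_scores)
```

In this toy setting every native point receives a negative score and every decoy point a positive one, which is the kind of separation a linear function cannot produce for this layout.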