Adaptation and use of four-body statistical potential to examine thermodynamic properties of proteins
While most proteins in biological systems are inherently stable, as their functions require, a small number of normally well-behaved proteins can aggregate and eventually form an insoluble material known as amyloid. The details of the aggregation process are not fully understood, but for some model proteins it can be initiated under known destabilizing conditions. Although no sequence or structural similarities have been observed among these proteins, structural instability associated with a characteristic motif could be a common thread. The proposed strategy for finding such a feature employs a knowledge-based tool that examines the sequence-structure relationship in a specific target protein against similar relationships drawn from a large representative sample of proteins. The tool uses a computational structural analysis known as tessellation to identify small geometric elements, each containing four neighboring amino acid residues, and builds a potential score for the protein from a statistical analysis of how often these quadruplets appear in the reference set. Components of the protein potential score can be assigned to the residues in the primary sequence, yielding a potential profile, or vector, that characterizes the local compatibility of the protein structure with its sequence. The aim of this effort was to demonstrate a relationship between tessellation-derived potentials and thermodynamic measurements of protein stability. A major part of the study was improving the representation of the protein environment by computationally hydrating the proteins used in the analysis. Several strategies were investigated for including the surrounding hydration water in the statistical analysis of the reference proteins.
The resulting model has been used to correlate tessellation-derived potentials with the measured stability of several model proteins and to discriminate native proteins from large groups of decoy structures. Machine learning tools were also employed to search for information content in the potential profile vectors and to seek an association between the potential profiles of transthyretin mutants and their amyloidogenic behavior.
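
The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work. It assumes that the tessellation has already been carried out and supplies each quadruplet as a 4-tuple, and it assumes a commonly used log-likelihood form, log10(observed / expected), in which the expected frequency of a quadruplet composition is the product of the individual residue frequencies multiplied by the number of distinct orderings of that composition. The abstract does not state the exact formula, the logarithm base, or how scores are distributed over residues, so those choices, along with the hypothetical names `quad_key`, `train_potential`, and `potential_profile`, are illustrative only. Hydration water is not modeled here.

```python
import math
from collections import Counter


def quad_key(residues):
    """Order-independent composition key for a quadruplet of residue types."""
    return tuple(sorted(residues))


def train_potential(reference_quads, residue_freq):
    """Assumed log-likelihood score for each quadruplet composition.

    reference_quads: iterable of 4-tuples of residue letters taken from the
        tessellated reference proteins (tessellation itself is not done here).
    residue_freq: dict mapping residue letter -> fraction in the reference set.
    """
    counts = Counter(quad_key(q) for q in reference_quads)
    total = sum(counts.values())
    potential = {}
    for key, n in counts.items():
        observed = n / total
        # Number of distinct orderings of this composition (multinomial factor).
        orderings = math.factorial(4)
        for multiplicity in Counter(key).values():
            orderings //= math.factorial(multiplicity)
        expected = orderings * math.prod(residue_freq[r] for r in key)
        potential[key] = math.log10(observed / expected)
    return potential


def potential_profile(sequence, simplices, potential):
    """Return (total score, per-residue profile) for one target protein.

    simplices: 4-tuples of residue indices from the target's tessellation.
    Assumption: each quadruplet's score is added to all four of its residues,
    and compositions never seen in the reference set contribute 0.
    """
    profile = [0.0] * len(sequence)
    total = 0.0
    for simplex in simplices:
        score = potential.get(quad_key(sequence[i] for i in simplex), 0.0)
        total += score
        for i in simplex:
            profile[i] += score
    return total, profile


# Toy data, invented purely for illustration.
toy_reference = [
    ("A", "A", "G", "G"),
    ("G", "A", "G", "A"),
    ("A", "A", "A", "A"),
    ("G", "G", "G", "G"),
]
toy_freq = {"A": 0.5, "G": 0.5}
toy_potential = train_potential(toy_reference, toy_freq)
toy_total, toy_profile = potential_profile(
    "AAGGA", [(0, 1, 2, 3), (1, 2, 3, 4)], toy_potential
)
```

Under this assignment rule every quadruplet contributes to four residues, so the profile sums to four times the total score. A real analysis would obtain the quadruplets from a tessellation of the residue coordinates and would need a considered treatment of rare or unseen compositions.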