This distribution compiles numerous physical properties, computed from coarse-grained molecular dynamics simulations, for 2,585 intrinsically disordered proteins (IDPs). Together, these data constitute "Dataset A" as reported in "Featurization strategies for polymer sequence or composition design by machine learning" by Roshan A. Patel, Carlos H. Borca, and Michael A. Webb (DOI: 10.1039/D1ME00160D). The IDP sequences were taken from version 9.0 of the DisProt database. The simulations were performed with the LAMMPS molecular dynamics engine, using interaction parameters from R. M. Regy, J. Thompson, Y. C. Kim and J. Mittal, "Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins," Protein Sci., 2021, 1371–1379.
This distribution contains experimental measurements of how much enzyme activity is retained after thermal stress for three distinct enzymes: glucose oxidase, lipase, and horseradish peroxidase. The data were used to draw conclusions and to train machine learning models, as reported in "Machine Learning on a Robotic Platform for the Design of Polymer-Protein Hybrids" by Matthew Tamasi, Roshan Patel, Carlos Borca, Shashank Kosuri, Heloise Mugnier, Rahul Upadhya, N. Sanjeeva Murthy, Michael Webb*, and Adam Gormley. The experimental protocols are described in detail in that paper and summarized briefly in the README.
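As an illustration of what "retained activity" means, the sketch below expresses it as the mean activity of thermally stressed replicates divided by the mean activity of unstressed replicates, times 100. This is an assumption for illustration only: the exact normalization, blank subtraction, and replicate handling used for this dataset are defined in the paper and README, and the helper name `retained_activity` is hypothetical.

```python
from statistics import mean


def retained_activity(stressed, unstressed):
    """Hypothetical helper: percent activity retained after thermal stress.

    Assumes both arguments are non-empty lists of replicate activity
    readings (e.g., assay rates) in the same units, and that retained
    activity is the ratio of mean stressed to mean unstressed activity.
    """
    if not stressed or not unstressed:
        raise ValueError("need at least one reading in each group")
    baseline = mean(unstressed)
    if baseline == 0:
        raise ValueError("unstressed activity must be nonzero")
    # Ratio of average post-stress activity to average pre-stress activity, in percent
    return 100.0 * mean(stressed) / baseline


# Example with made-up readings: stressed replicates average about half the baseline
print(retained_activity([0.42, 0.38, 0.40], [0.80, 0.82, 0.78]))
```

Averaging replicates before taking the ratio is one simple choice; computing a ratio per paired replicate and then averaging is an equally plausible alternative.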