Data for Coarse-grained Intrinsically Disordered Proteins

Webb, Michael; Patel, Roshan; Borca, Carlos
Issue date: 6 May 2022
This distribution compiles numerous physical properties for 2,585 intrinsically disordered proteins (IDPs), computed by coarse-grained molecular dynamics simulation. The collection constitutes "Dataset A" as reported in "Featurization strategies for polymer sequence or composition design by machine learning" by Roshan A. Patel, Carlos H. Borca, and Michael A. Webb (DOI: 10.1039/D1ME00160D). The IDP sequences are taken from version 9.0 of the DisProt database. The simulations were performed with the LAMMPS molecular dynamics engine, using the interaction parameters reported by R. M. Regy, J. Thompson, Y. C. Kim, and J. Mittal, "Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins," Protein Sci., 2021, 1371–1379.

#  Filename                    Filesize
1  README                      2.67 KB
2  dataset_a_sequences.txt     655 KB
3  dataset_a_encodings.csv     114 KB
4  dataset_a_labels.csv        89.8 KB
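As an illustration, the data files listed above could be read together with a short Python sketch like the following. It uses only the standard library and rests on assumptions that this record does not state: that dataset_a_sequences.txt holds one sequence per non-blank line, and that both CSV files have a header row followed by one row per sequence, in the same order as the sequence file. The function name load_dataset_a is hypothetical; the README should be consulted for the actual file layouts.

```python
import csv
from pathlib import Path


def load_dataset_a(data_dir):
    """Hypothetical loader for the Dataset A files.

    ASSUMES (not confirmed by the dataset record):
    - dataset_a_sequences.txt holds one IDP sequence per non-blank line;
    - each CSV has a header row, then one row per sequence,
      in the same order as the sequence file.
    """
    data_dir = Path(data_dir)

    # Read sequences, skipping blank lines (assumed one-per-line format).
    with open(data_dir / "dataset_a_sequences.txt", encoding="utf-8") as fh:
        sequences = [line.strip() for line in fh if line.strip()]

    def read_csv(name):
        # Each row becomes a dict keyed by the (assumed) header row.
        with open(data_dir / name, newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))

    encodings = read_csv("dataset_a_encodings.csv")
    labels = read_csv("dataset_a_labels.csv")

    # Sanity check: all three files should describe the same proteins.
    if not (len(sequences) == len(encodings) == len(labels)):
        raise ValueError(
            f"row counts differ: {len(sequences)} sequences, "
            f"{len(encodings)} encodings, {len(labels)} labels"
        )
    return sequences, encodings, labels
```

For example, `sequences, encodings, labels = load_dataset_a("path/to/download")` would return the three collections. Given the description above, each should contain 2,585 entries if the assumed layout holds; a mismatch in row counts raises a ValueError rather than silently misaligning sequences and properties.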