Data for Coarse-grained Intrinsically Disordered Proteins

Webb, Michael; Patel, Roshan; Borca, Carlos
Issue date: 6 May 2022
Cite as:
Webb, Michael, Patel, Roshan, & Borca, Carlos. (2022). Data for Coarse-grained Intrinsically Disordered Proteins [Data set]. Princeton University.
  author      = {Webb, Michael and
                Patel, Roshan and
                Borca, Carlos},
  title       = {{Data for Coarse-grained Intrinsically Disordered Proteins}},
  publisher   = {{Princeton University}},
  year        = 2022,
  url         = {}

This distribution compiles numerous physical properties of 2,585 intrinsically disordered proteins (IDPs), computed from coarse-grained molecular dynamics simulations. The compilation comprises "Dataset A" as reported in "Featurization strategies for polymer sequence or composition design by machine learning" by Roshan A. Patel, Carlos H. Borca, and Michael A. Webb (DOI: 10.1039/D1ME00160D). The IDP sequences are sourced from version 9.0 of the DisProt database. The simulations were performed with the LAMMPS molecular dynamics engine. The interaction model used in the simulations is taken from R. M. Regy, J. Thompson, Y. C. Kim and J. Mittal, Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins, Protein Sci., 2021, 1371–1379.

#  Filename                  Description  Filesize
1  README                                 2.67 KB
2  dataset_a_sequences.txt                655 KB
3  dataset_a_encodings.csv                114 KB
4  dataset_a_labels.csv                   89.8 KB
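
The files listed above can be inspected before any analysis with a short script. The following is a minimal sketch in Python, not part of the distribution. It assumes only that the three data files are UTF-8 text and have been downloaded to the current directory. The helper names `summarize_file` and `read_csv_rows` are hypothetical, and nothing is assumed about column names, headers, or row order; consult the README for the actual layout.

```python
import csv
from pathlib import Path


def summarize_file(path, n_preview=3):
    """Return (line_count, first n_preview lines) of a text file.

    No schema is assumed; this only counts lines and keeps a short preview.
    """
    preview = []
    count = 0
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            if count < n_preview:
                preview.append(line.rstrip("\r\n"))
            count += 1
    return count, preview


def read_csv_rows(path):
    """Read a CSV file into a list of rows (each a list of strings).

    Whether the first row is a header is left for the caller to decide.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    # File names as given in the listing; their internal layout is not assumed.
    for name in (
        "dataset_a_sequences.txt",
        "dataset_a_encodings.csv",
        "dataset_a_labels.csv",
    ):
        if Path(name).exists():
            n_lines, head = summarize_file(name)
            print(f"{name}: {n_lines} lines")
            for line in head:
                print("   ", line)
        else:
            print(f"{name}: not found in the current directory")
```

Comparing the reported line counts against the 2,585 IDPs stated in the description is one quick consistency check, keeping in mind that a file may or may not include a header line.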