Thermodynamic and Dynamics Data for Coarse-grained Intrinsically Disordered Proteins Generated by Active Learning

Webb, Michael; Jacobs, William; An, Yaxin; Oliver, Wesley
Issue date: 6 June 2023
Cite as:
Webb, Michael, Jacobs, William, An, Yaxin, & Oliver, Wesley. (2023). Thermodynamic and Dynamics Data for Coarse-grained Intrinsically Disordered Proteins Generated by Active Learning [Data set]. Princeton University. https://doi.org/10.34770/6tnm-7b56
@electronic{webb_michael_2023,
  author      = {Webb, Michael and
                Jacobs, William and
                An, Yaxin and
                Oliver, Wesley},
  title       = {{Thermodynamic and Dynamics Data for Coarse-grained Intrinsically Disordered Proteins Generated by Active Learning}},
  publisher   = {{Princeton University}},
  year        = 2023,
  url         = {https://doi.org/10.34770/6tnm-7b56}
}
Abstract:

This distribution compiles thermodynamic and (where available) dynamic properties of short protein sequences obtained from coarse-grained molecular dynamics simulations. The dataset comprises 2114 protein sequences with lengths ranging from N = 20 to N = 50 amino acids. The simulation and analysis of these sequences are described in "Active learning of the thermodynamics–dynamics tradeoff in protein condensates" by Yaxin An, Michael A. Webb*, and William M. Jacobs* (https://doi.org/10.48550/arXiv.2306.03696). Of the 2114 sequences, 80 are homomeric polypeptides (a single amino acid repeated for N = 20, 30, 40, and 50), 1266 are sourced from version 9.0 of the DisProt database, and the remaining 768 are novel sequences generated during the active learning campaign described in the aforementioned manuscript. The simulations were performed using the LAMMPS molecular dynamics engine. The interaction model is taken from R. M. Regy, J. Thompson, Y. C. Kim, and J. Mittal, "Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins," Protein Sci., 2021, 1371–1379. The distribution includes second virial coefficients, pressure-density data, the expected phase behavior at 300 K, estimated condensed-phase densities at 300 K (where they exist), and condensed-phase self-diffusion coefficients at 300 K (where they exist).

#  Filename                  Description  Filesize
1  README.txt                             9.16 KB
2  EOS_heteromeric.csv                    315 KB
3  EOS_homomeric.csv                      16 KB
4  features_heteromeric.csv               168 KB
5  features_homomeric.csv                 6.98 KB
6  labels_heteromeric.csv                 108 KB
7  labels_homomeric.csv                   4.86 KB
8  seq_heteromeric.txt                    65.8 KB
9  seq_homomeric.txt                      2.88 KB
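
As a starting point for working with the sequence files listed above, the following Python sketch reads a sequence file and reports basic statistics that can be compared against the abstract (sequence count, length range, and number of homomeric sequences). It assumes, without confirmation from this record, that each non-empty line of seq_homomeric.txt and seq_heteromeric.txt holds exactly one amino-acid sequence; README.txt is the authoritative description of the file layout. The function names read_sequences, is_homomeric, and summarize are hypothetical helpers introduced only for this illustration.

```python
import io
from typing import Dict, Iterable, List, Optional


def read_sequences(lines: Iterable[str]) -> List[str]:
    """Return the stripped, non-empty lines of a text source.

    ASSUMPTION: one sequence per line. Check README.txt for the real layout.
    """
    return [line.strip() for line in lines if line.strip()]


def is_homomeric(seq: str) -> bool:
    """True if the sequence repeats a single amino-acid letter."""
    return len(set(seq)) == 1


def summarize(seqs: List[str]) -> Dict[str, Optional[int]]:
    """Count sequences and report the length range and homomeric count."""
    lengths = [len(s) for s in seqs]
    return {
        "count": len(seqs),
        "min_len": min(lengths) if lengths else None,
        "max_len": max(lengths) if lengths else None,
        "homomeric": sum(is_homomeric(s) for s in seqs),
    }


if __name__ == "__main__":
    # In-memory stand-in for a sequence file; these sequences are made up.
    demo = io.StringIO("A" * 20 + "\n" + "G" * 30 + "\n" + "MKVLAAGIVGSEQ" + "\n")
    print(summarize(read_sequences(demo)))
    # For the real data (assumed layout), replace the stand-in with e.g.:
    #   with open("seq_homomeric.txt") as handle:
    #       print(summarize(read_sequences(handle)))
```

If the one-sequence-per-line assumption holds, summarizing seq_homomeric.txt and seq_heteromeric.txt together should be consistent with the totals stated in the abstract (2114 sequences, lengths between 20 and 50); a mismatch would suggest the files carry headers or extra columns that need to be handled first.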