Link, A. James; Carson, Drew V.; So, Larry; Cheung-Lee, Wai Ling
Abstract:
This entry contains the raw NMR spectra used to determine the structure of the lasso peptide achromonodin-1. A single file includes the following five spectra: COSY, TOCSY, NOESY (150 ms mixing time), NOESY (700 ms mixing time), and C,H HSQC. The file requires MestReNova software to open. These spectra were used to develop the 3D structural models of achromonodin-1 that are deposited in the Protein Data Bank (PDB) as entry 8SVB.
Bhattacharjee, Tapomoy; Amchin, Daniel; Alert, Ricard; Ott, Jenna; Datta, Sujit
Abstract:
Collective migration -- the directed, coordinated motion of many self-propelled agents -- is a fascinating emergent behavior exhibited by active matter that has key functional implications for biological systems. Extensive studies have elucidated the different ways in which this phenomenon may arise. Nevertheless, how collective migration can persist when a population is confronted with perturbations, which inevitably arise in complex settings, is poorly understood. Here, by combining experiments and simulations, we describe a mechanism by which collectively migrating populations smooth out large-scale perturbations in their overall morphology, enabling their constituents to continue to migrate together. We focus on the canonical example of chemotactic migration of Escherichia coli, in which fronts of cells move via directed motion, or chemotaxis, in response to a self-generated nutrient gradient. We identify two distinct modes in which chemotaxis influences the morphology of the population: cells in different locations along a front migrate at different velocities due to spatial variations in (i) the local nutrient gradient and in (ii) the ability of cells to sense and respond to the local nutrient gradient. While the first mode is destabilizing, the second mode is stabilizing and dominates, ultimately driving smoothing of the overall population and enabling continued collective migration. This process is autonomous, arising without any external intervention; instead, it is a population-scale consequence of the manner in which individual cells transduce external signals. Our findings thus provide insights to predict, and potentially control, the collective migration and morphology of cell populations and diverse other forms of active matter.
This dataset contains example input files, training datasets, and potential files related to the publication "First-principles-based Machine Learning Models for Phase Behavior and Transport Properties of CO2" by Mathur et al. (2023). In this work, we developed machine learning models for CO2 based on different exchange-correlation DFT functionals and assessed their performance for liquid densities, vapor-liquid equilibrium, and transport properties.
Numerical data are tabulated for all plots (Figures 2, 3a-b, 4-89, S1, S4a-b,d, S5a-b,d, S6-S156) and included as separate spreadsheets, organized by figure, in a .zip file in the Supplementary Material. Error bars in Figure 4 show the spread of data observed over 4 and 5 trials on independent samples for MIL-101 and MOF-235, respectively. Figure 6a shows the average of triplicate filtrate-test conversions, with error propagated from this spread. Error bars on rate constants in Figures 6b and S165 are determined from the propagated conversion uncertainty for independent trials and from the standard deviations of pseudo-first-order rate constants extracted from linearized plots. Error bars on all other plots represent propagated experimental uncertainty for single trials.
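The linearized pseudo-first-order analysis and error propagation described above can be sketched as follows. The sketch assumes the linearized form -ln(1 - X) = k t + b for conversion X, and it assumes that independent uncertainty contributions are combined in quadrature. The function names are hypothetical, and this is not necessarily the exact procedure used for the published figures.

```python
import math

def linearized_rate_constant(times, conversions):
    """Fit -ln(1 - X) = k*t + b by ordinary least squares.

    Returns (k, s_k), where s_k is the standard error of the fitted slope k.
    """
    n = len(times)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the slope error")
    y = [-math.log(1.0 - x) for x in conversions]
    t_mean = sum(times) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    k = sxy / sxx
    b = y_mean - k * t_mean
    # Residual variance with n - 2 degrees of freedom (slope and intercept fitted).
    ss_res = sum((yi - (k * t + b)) ** 2 for t, yi in zip(times, y))
    s_k = math.sqrt(ss_res / (n - 2) / sxx)
    return k, s_k

def log_term_uncertainty(x, sigma_x):
    """Propagate sigma_X through y = -ln(1 - X): sigma_y = sigma_X / (1 - X)."""
    return sigma_x / (1.0 - x)

def combined_uncertainty(*sigmas):
    """Combine independent uncertainty contributions in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas))
```

For example, `combined_uncertainty(s_k, sigma_from_trials)` would merge the slope error of one fit with a spread estimated from independent trials. Here `sigma_from_trials` is a placeholder for whatever trial-to-trial uncertainty is available.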
This distribution compiles numerous physical properties for 2,585 intrinsically disordered proteins (IDPs), obtained by coarse-grained molecular dynamics simulation. This collection constitutes "Dataset A" as reported in "Featurization strategies for polymer sequence or composition design by machine learning" by Roshan A. Patel, Carlos H. Borca, and Michael A. Webb (DOI: 10.1039/D1ME00160D). The IDP sequences are sourced from version 9.0 of the DisProt database. The simulations were performed using the LAMMPS molecular dynamics engine. The interaction parameters used in the simulations are taken from R. M. Regy, J. Thompson, Y. C. Kim, and J. Mittal, "Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins," Protein Sci., 2021, 1371-1379.
This item provides access to all configurations of single-chain nanoparticles analyzed in the manuscript "Sequence Patterning, Morphology, and Dispersity in Single-Chain Nanoparticles: Insights from Simulation and Machine Learning" by Roshan A. Patel, Sophia Colmenares, and Michael A. Webb (DOI: 10.1021/acspolymersau.3c00007). The single-chain nanoparticles derive from 320 unique precursor chains. These chains are distinguished by the fraction of linker beads that decorate a fixed-length polymer backbone and by the distribution, or blockiness, of those linker beads. The data are provided as a serialized object created with the Python `pickle` module. The data were compiled using Python 3.8.8 and Clang 10.0.0. The object loaded from the .pkl file is a nested list whose first dimension has 7,680 entries, one for each of the 7,680 unique single-chain nanoparticles produced in the aforementioned paper. Each of these entries is itself a list of 20 entries, representing 20 different simulation snapshots of that nanoparticle. Each snapshot is a list of two entries. The first is a numpy.ndarray containing the x, y, z coordinates of all beads in the nanoparticle. The second is a numpy.ndarray with a numerical encoding indicating whether each bead is a backbone bead ('0') or a linker bead ('1'). Altogether, this provides 153,600 configurations of single-chain nanoparticles.
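The following is a minimal sketch of how the nested list described above might be loaded and traversed. It assumes numpy is installed, since numpy is needed to unpickle the arrays. The file name `scnp_configurations.pkl` and all helper names are hypothetical. The radius-of-gyration calculation assumes equal bead masses and is shown only as an example of a per-snapshot analysis.

```python
import pickle
import numpy as np

def load_scnps(path):
    """Load the nested list of single-chain nanoparticle configurations.

    Only unpickle files from trusted sources: pickle can execute arbitrary code.
    """
    with open(path, "rb") as fh:
        return pickle.load(fh)

def count_configurations(scnps):
    """Total number of snapshots across all nanoparticles."""
    return sum(len(snapshots) for snapshots in scnps)

def radius_of_gyration(coords):
    """Radius of gyration of one snapshot, assuming equal bead masses."""
    coords = np.asarray(coords, dtype=float)
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

def summarize(scnps):
    """Return (mean Rg, linker fraction) per nanoparticle, averaged over snapshots."""
    summary = []
    for snapshots in scnps:                    # one entry per nanoparticle
        rgs, linker_fracs = [], []
        for coords, bead_types in snapshots:   # [coordinates, bead-type codes]
            rgs.append(radius_of_gyration(coords))
            # Bead-type encoding: 0 = backbone bead, 1 = linker bead.
            linker_fracs.append(float(np.mean(np.asarray(bead_types) == 1)))
        summary.append((float(np.mean(rgs)), float(np.mean(linker_fracs))))
    return summary
```

If the file matches the description above, `count_configurations(load_scnps("scnp_configurations.pkl"))` should return 7,680 × 20 = 153,600.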
This dataset contains input files, training data, and other files related to the machine learning models developed in the work by Muniz et al. In this work, we construct machine learning models based on the MB-pol many-body model. We find that the training set should include cluster configurations as well as liquid-phase configurations to accurately represent both liquid and vapor-liquid equilibrium (VLE) properties. The results attest to the ability of machine learning models to accurately represent many-body potentials and provide an efficient avenue for water simulations.
This dataset contains all data relevant to a forthcoming publication in which we used molecular simulation methods to study the phase behavior of supercooled water. The dataset contains simulation input and output files, processed data files, and image files used to create all plots in the manuscript. Python analysis scripts are also included, along with instructions for regenerating all plots in the manuscript.
Gartner, Thomas III; Zhang, Linfeng; Piaggi, Pablo; Car, Roberto; Panagiotopoulos, Athanassios; Debenedetti, Pablo
Abstract:
This dataset contains all data related to the publication "Signatures of a liquid-liquid transition in an ab initio deep neural network model for water" by Gartner et al., 2020. In this work, we used neural networks to generate a computational model for water from high-accuracy quantum chemistry calculations. We then used advanced molecular simulations to present evidence suggesting that this model exhibits a liquid-liquid transition, a phenomenon that can explain many of water's anomalous properties. This dataset contains links to all software used, all data generated in this work, and scripts to generate and analyze the data and to produce the plots reported in the publication.
This dataset comprises data associated with the publication "Transferability of data-driven, many-body models for CO2 simulations in the vapor and liquid phases", available at https://doi.org/10.1063/5.0080061. The data include many-body decomposition calculations, virial coefficient calculations, orientational molecular scan energies, potential energy fields, correlation plots of training and testing data, vapor-liquid equilibrium simulations, liquid density simulations, and solid cell simulations.
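For reference, a many-body decomposition is commonly written as the standard many-body expansion below. In this expansion, E(i, j, ...) is the energy of the subsystem containing only the listed molecules. This general textbook form is assumed here for illustration. The reference energies used in the dataset are defined in the publication; for example, in practice the one-body term is often measured relative to an isolated-molecule reference.

```latex
\begin{aligned}
E_N(1,\dots,N) &= \sum_{i=1}^{N} \varepsilon^{1\mathrm{B}}(i)
  + \sum_{i<j} \varepsilon^{2\mathrm{B}}(i,j)
  + \sum_{i<j<k} \varepsilon^{3\mathrm{B}}(i,j,k)
  + \cdots + \varepsilon^{N\mathrm{B}}(1,\dots,N) \\
\varepsilon^{1\mathrm{B}}(i) &= E(i) \\
\varepsilon^{2\mathrm{B}}(i,j) &= E(i,j) - \varepsilon^{1\mathrm{B}}(i) - \varepsilon^{1\mathrm{B}}(j) \\
\varepsilon^{3\mathrm{B}}(i,j,k) &= E(i,j,k)
  - \left[\varepsilon^{2\mathrm{B}}(i,j) + \varepsilon^{2\mathrm{B}}(i,k) + \varepsilon^{2\mathrm{B}}(j,k)\right]
  - \left[\varepsilon^{1\mathrm{B}}(i) + \varepsilon^{1\mathrm{B}}(j) + \varepsilon^{1\mathrm{B}}(k)\right]
\end{aligned}
```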
Muniz, Maria Carolina; Gartner III, Thomas E.; Riera, Marc; Knight, Christopher; Yue, Shuwen; Paesani, Francesco; Panagiotopoulos, Athanassios Z.
Abstract:
This dataset contains all data (including input files, simulation trajectories, and other data files and analysis scripts) related to the publication "Vapor-liquid equilibrium of water with the MB-pol many-body potential" by Muniz et al. (in preparation, 2021). In this work, we assessed the performance of the MB-pol many-body potential for water's vapor-liquid equilibrium properties. Using direct-coexistence molecular dynamics, we calculated properties such as coexistence densities, surface tension, vapor pressures, and enthalpy of vaporization. We found that MB-pol predicts these properties in good agreement with experimental data. The results attest to the chemical accuracy of MB-pol and its wide applicability across water's phase diagram.
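Two standard post-processing relations for direct-coexistence (slab) simulations are sketched below. The first gives the surface tension from the diagonal pressure-tensor components; it assumes the slab normal lies along z and that the box contains two interfaces. The second gives the enthalpy of vaporization from the energies and densities of the coexisting phases. The function names are hypothetical, all inputs must be in consistent units, and the analysis scripts in this dataset may use different conventions.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values)

def surface_tension(p_xx, p_yy, p_zz, box_lz):
    """gamma = (L_z / 2) * (<P_zz> - (<P_xx> + <P_yy>) / 2).

    p_xx, p_yy, p_zz: time series of diagonal pressure-tensor components.
    The factor 1/2 accounts for the two liquid-vapor interfaces in a slab
    geometry whose interface normal points along z.
    """
    p_normal = _mean(p_zz)
    p_tangential = 0.5 * (_mean(p_xx) + _mean(p_yy))
    return 0.5 * box_lz * (p_normal - p_tangential)

def enthalpy_of_vaporization(u_vap, u_liq, pressure, rho_vap, rho_liq):
    """Delta H_vap = (u_vap - u_liq) + p * (1/rho_vap - 1/rho_liq).

    u_*: average energy per molecule (or per mole) in each phase.
    rho_*: number (or molar) densities, so 1/rho is the volume per molecule.
    pressure: the coexistence (vapor) pressure.
    """
    return (u_vap - u_liq) + pressure * (1.0 / rho_vap - 1.0 / rho_liq)
```

With pressures in bar and L_z in Å, the surface tension comes out in bar·Å. Multiplying by 0.01 converts it to mN/m.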
This distribution contains experimentally measured data on retained enzyme activity after thermal stress for three distinct enzymes: glucose oxidase, lipase, and horseradish peroxidase. The data are used to draw conclusions and develop machine learning models, as reported in the publication "Machine Learning on a Robotic Platform for the Design of Polymer-Protein Hybrids" by Matthew Tamasi, Roshan Patel, Carlos Borca, Shashank Kosuri, Heloise Mugnier, Rahul Upadhya, N. Sanjeeva Murthy, Michael Webb*, and Adam Gormley. The experimental protocols are detailed in that paper and summarized briefly in the README.
The materials include code and example input/output files for grand canonical Monte Carlo simulations of lattice chains, used to determine phase behavior, critical points, and aggregate formation.
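The sketch below illustrates the basic grand canonical Monte Carlo idea on a two-dimensional lattice gas of single-site particles with nearest-neighbor attraction ε. Each move toggles the occupancy of a randomly chosen site and is accepted with the Metropolis criterion min(1, exp(-(ΔE - μΔN)/kT)). This is a simplified, hypothetical example. Chains occupy several connected sites, and inserting or deleting them generally requires additional machinery (for example, configurational-bias moves) that is not shown here.

```python
import math
import random

def _neighbors(i, j, L):
    """Nearest neighbors on an L x L square lattice with periodic boundaries."""
    return [((i + 1) % L, j), ((i - 1) % L, j), (i, (j + 1) % L), (i, (j - 1) % L)]

def gcmc_lattice_gas(L, mu, eps, kT, n_sweeps, seed=0):
    """Grand canonical Metropolis sampling of a lattice gas.

    Energy: E = -eps * (number of occupied nearest-neighbor pairs).
    Returns the occupied fraction of sites after n_sweeps sweeps.
    """
    rng = random.Random(seed)
    occ = [[0] * L for _ in range(L)]
    for _ in range(n_sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        nn = sum(occ[a][b] for a, b in _neighbors(i, j, L))
        d_n = 1 - 2 * occ[i][j]   # +1 for an insertion, -1 for a deletion
        d_e = -eps * nn * d_n     # energy change from gained or lost bonds
        # Picking a site uniformly and toggling it is a symmetric proposal,
        # so this acceptance rule satisfies detailed balance for E - mu*N.
        arg = -(d_e - mu * d_n) / kT
        if arg >= 0 or rng.random() < math.exp(arg):
            occ[i][j] = 1 - occ[i][j]
    return sum(map(sum, occ)) / (L * L)
```

One common way to look for phase coexistence is to scan μ at fixed kT/ε and watch for abrupt jumps or strongly fluctuating densities.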
These GROMACS trajectories demonstrate the existence of a critical point in deeply supercooled WAIL water. Also included is the code needed to reproduce the figures in the corresponding paper from these trajectories. From these data, the critical temperature, pressure, and density of the model can be determined, and critical fluctuations in the deeply supercooled liquid can be observed directly (in a computer-simulation sense).
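As one example of extracting critical fluctuations from NPT trajectories, the sketch below converts per-frame box volumes into mass densities. The volumes could come, for instance, from the Volume term that `gmx energy` can write to an .xvg file. The sketch also estimates the isothermal compressibility from volume fluctuations, κ_T = ⟨δV²⟩/(k_B T ⟨V⟩). The function names are hypothetical, the default molar mass assumes water (18.015 g/mol), and this is not necessarily the analysis performed in the corresponding paper.

```python
import numpy as np

K_B = 1.380649e-23       # Boltzmann constant, J/K
N_A = 6.02214076e23      # Avogadro constant, 1/mol

def read_xvg(path):
    """Read numeric rows of a GROMACS .xvg file, skipping '#' and '@' lines."""
    rows = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "@")):
                continue
            rows.append([float(x) for x in line.split()])
    return np.array(rows)

def mass_density(volumes_nm3, n_molecules, molar_mass_g_per_mol=18.015):
    """Instantaneous mass density in kg/m^3 for each frame."""
    v_m3 = np.asarray(volumes_nm3, dtype=float) * 1e-27
    return n_molecules * molar_mass_g_per_mol * 1e-3 / (N_A * v_m3)

def isothermal_compressibility(volumes_nm3, temperature_k):
    """kappa_T = <dV^2> / (k_B T <V>) from NPT volume fluctuations, in 1/Pa."""
    v_m3 = np.asarray(volumes_nm3, dtype=float) * 1e-27
    return v_m3.var() / (K_B * temperature_k * v_m3.mean())
```

Near a critical point, κ_T grows and the density histogram broadens. Below the critical temperature, the histogram may become bimodal. Comparing `np.histogram(mass_density(vol, n_molecules))` across state points therefore gives a quick qualitative check.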
O'Neill, Eric; Lark, Tyler; Xie, Yanhua; Basso, Bruno
Abstract:
Collection of the underlying spatially explicit data for "Available Land for Cellulosic Biofuel Production: A Supply Chain Centered Comparison." Includes raw biomass yield data and soil carbon sequestration potential data at the field level, including field areas, for three types of marginal land in the U.S. Midwest. The collection also includes raw land rasters for the three types of marginal land, parameters for the mixed-integer linear programming (MILP) model used in the study, and the results used to generate the figures in the paper.
Webb, Michael; Jacobs, William; An, Yaxin; Oliver, Wesley
Abstract:
This distribution compiles thermodynamic and (where available) dynamic properties of short protein sequences obtained from coarse-grained molecular dynamics simulations. The dataset features 2,114 protein sequences with lengths ranging from N = 20 to N = 50 amino acids. The simulation and analysis of these sequences are described in "Active learning of the thermodynamics-dynamics tradeoff in protein condensates" by Yaxin An, Michael A. Webb*, and William M. Jacobs* (https://doi.org/10.48550/arXiv.2306.03696). Of the 2,114 protein sequences, 80 are homomeric polypeptides (each of the 20 amino acids repeated for N = 20, 30, 40, and 50). Another 1,266 are sourced from version 9.0 of the DisProt database, and the remaining 768 are novel sequences generated during an active learning campaign described in the aforementioned manuscript. The simulations were performed using the LAMMPS molecular dynamics engine. The interaction parameters are taken from R. M. Regy, J. Thompson, Y. C. Kim, and J. Mittal, "Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins," Protein Sci., 2021, 1371-1379. The properties provided include second virial coefficients, pressure-density data, the expected phase behavior at 300 K, estimated condensed-phase densities at 300 K (where a condensed phase exists), and condensed-phase self-diffusion coefficients at 300 K (where a condensed phase exists).
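The following is a minimal sketch of one way to obtain a second virial coefficient for a pair of chains. It assumes that an orientation-averaged potential of mean force w(r) between the chains' centers of mass has already been tabulated, and it evaluates B2 = -2π ∫ [exp(-w(r)/kT) - 1] r² dr with the trapezoidal rule. The function name and input format are hypothetical. The procedure actually used to compute the B2 values in this distribution is described in the manuscript.

```python
import numpy as np

def second_virial_from_pmf(r, w, kT):
    """B2 = -2*pi * integral of (exp(-w/kT) - 1) * r^2 dr (trapezoidal rule).

    r:  increasing center-of-mass separations, starting at or near 0.
    w:  potential of mean force at each r, in the same energy units as kT;
        use np.inf where the chains cannot overlap.
    Returns B2 in units of r**3.
    """
    r = np.asarray(r, dtype=float)
    w = np.asarray(w, dtype=float)
    mayer = np.exp(-w / kT) - 1.0      # orientation-averaged Mayer function
    integrand = mayer * r ** 2
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
    return -2.0 * np.pi * integral
```

A negative B2 indicates net attraction between chains. It is often used as a qualitative indicator of a sequence's propensity to phase separate. As a sanity check, hard spheres of diameter σ give B2 = 2πσ³/3.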