Mondal, Shanka Subhra; Webb, Taylor; Cohen, Jonathan
A dataset of Raven’s Progressive Matrices (RPM)-like problems using realistically rendered 3D shapes, based on source code from CLEVR, a popular visual-question-answering dataset (Johnson, J., Hariharan, B., Van Der Maaten, L., Fei-Fei, L., Lawrence Zitnick, C., & Girshick, R. (2017). CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2901-2910)).
This dataset encompasses two distinct sets of data analyzed in the study, namely Asian American Scholar Forum survey data and Microsoft Academic Graph bibliometrics data:
Yu Xie, Xihong Lin, Ju Li, Qian He, Junming Huang, Caught in the Crossfire: Fears of Chinese-American Scientists, Proceedings of the National Academy of Sciences, in press (2023).
This dataset contains example input files, training data sets and potential files related to the publication "First-principles-based Machine Learning Models for Phase Behavior and Transport Properties of CO2" by Mathur et al. (2023). In this work, we developed machine learning models for CO2 based on different exchange-correlation DFT functionals. We assessed their performance on liquid densities, vapor-liquid equilibrium and transport properties.
This item provides access to all configurations of single-chain nanoparticles analyzed in the manuscript "Sequence Patterning, Morphology, and Dispersity in Single-Chain Nanoparticles: Insights from Simulation and Machine Learning" by Roshan A. Patel, Sophia Colmenares, and Michael A. Webb (DOI: 10.1021/acspolymersau.3c00007). The single-chain nanoparticles derive from 320 unique precursor chains that are distinguished by the fraction of linker beads that decorate a fixed-length polymer backbone and by the distribution, or blockiness, of those linker beads. The data is provided as a serialized object created with Python's pickle module. The data was compiled using Python version 3.8.8 and Clang 10.0.0. The Python object loaded from the .pkl file is a nested list whose first dimension has 7,680 entries, one for each of the 7,680 unique single-chain nanoparticles produced in the aforementioned paper. Each of those 7,680 entries is itself a list with 20 entries, representing the 20 different simulation snapshots of the given single-chain nanoparticle. Each of the 20 entries is another list with two entries: the first is a numpy.ndarray containing the x, y, z coordinates of all the beads comprising the single-chain nanoparticle, and the second is a numpy.ndarray with a numerical encoding that indicates whether each bead is a backbone bead (indicated as '0') or a linker bead (indicated as '1'). Altogether, this provides 153,600 configurations of single-chain nanoparticles.
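The nested layout described above can be explored with a short Python sketch. This is only an illustration under stated assumptions: the file name toy_scnp.pkl, the helper names, and the toy stand-in data (plain lists rather than numpy arrays, with one coordinate row per bead) are hypothetical and not part of the distribution.

```python
import os
import pickle
import tempfile


def load_configurations(path):
    """Unpickle the nested list. NumPy must be installed to load the real file.

    Only unpickle files from a trusted source: pickle can execute code on load.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


def describe(data):
    """Return (particles, snapshots of first particle, total configurations)."""
    n_particles = len(data)
    n_snapshots = len(data[0])
    n_total = sum(len(snapshots) for snapshots in data)
    return n_particles, n_snapshots, n_total


def linker_fraction(bead_types):
    """Fraction of beads encoded as linker (1) rather than backbone (0)."""
    return sum(int(t) == 1 for t in bead_types) / len(bead_types)


def make_toy_data(n_particles=2, n_snapshots=20, n_beads=4):
    """Hypothetical stand-in mimicking [particle][snapshot][coords, types]."""
    bead_types = [i % 2 for i in range(n_beads)]
    coords = [[0.0, 0.0, float(i)] for i in range(n_beads)]
    return [
        [[list(coords), list(bead_types)] for _ in range(n_snapshots)]
        for _ in range(n_particles)
    ]


# Round-trip the toy data through a temporary .pkl file.
toy = make_toy_data()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_scnp.pkl")  # hypothetical file name
    with open(path, "wb") as handle:
        pickle.dump(toy, handle)
    loaded = load_configurations(path)

coords, bead_types = loaded[0][0]  # first particle, first snapshot
print(describe(loaded))            # (2, 20, 40)
print(len(coords), linker_fraction(bead_types))  # 4 0.5
```

For the real file, describe would be expected to report 7,680 particles with 20 snapshots each, consistent with the 153,600 configurations stated above (7,680 × 20).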
Piaggi, Pablo M; Gartner, Thomas E; Car, Roberto; Debenedetti, Pablo G
The possible existence of a liquid-liquid critical point in deeply supercooled water has been a subject of debate in part due to the challenges associated with providing definitive experimental evidence. Pioneering work by Mishima and Stanley [Nature 392, 164 (1998) and Phys. Rev. Lett. 85, 334 (2000)] sought to shed light on this problem by studying the melting curves of different ice polymorphs and their metastable continuation in the vicinity of the expected location of the liquid-liquid transition and its associated critical point. Based on the continuous or discontinuous changes in slope of the melting curves, Mishima suggested that the liquid-liquid critical point lies between the melting curves of ice III and ice V. Here, we explore this conjecture using molecular dynamics simulations with a purely-predictive machine learning model based on ab initio quantum-mechanical calculations. We study the melting curves of ices III, IV, V, VI, and XIII using this model and find that the melting lines of all the studied ice polymorphs are supercritical and do not intersect the liquid-liquid transition locus. We also find a pronounced, yet continuous, change in slope of the melting lines upon crossing of the locus of maximum compressibility of the liquid. Finally, we critically analyze the literature in light of our findings, and conclude that the scenario in which melting curves are supercritical is favored by the most recent computational and experimental evidence. Thus, although the preponderance of experimental and computational evidence is consistent with the existence of a second critical point in water, the behavior of the melting lines of ice polymorphs does not provide strong evidence in support of this viewpoint, according to our calculations.
The materials include codes and example input / output files for Monte Carlo simulations of lattice chains in the grand canonical ensemble, for determining phase behavior, critical points, and formation of aggregates.
Griffies, Stephen M; Beadling, Rebecca L; Krasting, John P; Hurlin, William J
This output was produced in coordination with the Southern Ocean Freshwater release model experiments Initiative (SOFIA) and is the Tier 1 experiment where freshwater is delivered in a spatially and temporally uniform pattern at the surface of the ocean at sea surface temperature in a 1-degree latitude band extending from Antarctica’s coastline. The total additional freshwater flux imposed as a monthly freshwater flux entering the ocean is 0.1 Sv. Users are referred to the methods section of Beadling et al. (2022) for additional details on the meltwater implementation in CM4 and ESM4. The datasets in this collection contain model output from the coupled global climate model, CM4, and Earth System Model, ESM4, both developed at the Geophysical Fluid Dynamics Laboratory (GFDL) of the National Oceanic and Atmospheric Administration (NOAA). The ocean_monthly_z and ocean_annual_z output are provided on z depth levels in meters, as opposed to the models' native hybrid vertical ocean coordinate, which consists of z* (quasi-geopotential) coordinates in the upper ocean through the mixed layer, transitioning to isopycnal (referenced to 2000 dbar) in the ocean interior. Please see README for further details.
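To put the 0.1 Sv freshwater flux in more familiar units, the following Python sketch converts it to a volume and mass per year. The definition 1 Sv = 10^6 m^3/s is standard; the 365-day year and the freshwater density of 1000 kg/m^3 are simplifying assumptions made here, not values taken from the experiment protocol.

```python
SV_IN_M3_PER_S = 1.0e6                # 1 Sverdrup = 10^6 cubic meters per second
SECONDS_PER_YEAR = 365.0 * 24 * 3600  # assumption: 365-day year
FRESHWATER_DENSITY = 1000.0           # kg/m^3, assumption
KG_PER_GT = 1.0e12                    # 1 gigatonne = 10^12 kg


def sv_to_m3_per_year(flux_sv):
    """Convert a volume flux in Sv to cubic meters per year."""
    return flux_sv * SV_IN_M3_PER_S * SECONDS_PER_YEAR


def sv_to_gt_per_year(flux_sv):
    """Convert a freshwater volume flux in Sv to gigatonnes per year."""
    return sv_to_m3_per_year(flux_sv) * FRESHWATER_DENSITY / KG_PER_GT


print(f"{sv_to_m3_per_year(0.1):.4e} m^3/yr")  # 3.1536e+12 m^3/yr
print(f"{sv_to_gt_per_year(0.1):.1f} Gt/yr")   # 3153.6 Gt/yr
```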
Microscopy images are part of a paper entitled "Structured foraging of soil predators unveils functional responses to bacterial defenses" by Fernando Rossine, Gabriel Vercelli, Corina Tarnita, and Thomas Gregor. For detailed acquisition methods, see the paper. Experiments were performed between 2019 and 2020 at Princeton University. Two types of images are provided: macroscopic and microscopic widefield images. Macroscopic images all show Petri dishes covered in fluorescent bacteria being consumed by amoebae. Images are shown for D. discoideum, P. violaceum, and A. castellanii. Images depicting drug treatments (Nystatin and Fluorouracil) were obtained using D. discoideum. Images used for the creation of a profile were all taken within 30 minutes of each other. Within each directory, numbered images are independent replicates. The raw video directory contains time series for dishes under drug treatments. Each numbered folder is a sequence of photos (taken 30 minutes apart) of a single dish. Microscopic images all show amoebae consuming bacteria on a Petri dish. The 45-minute videos show either edge cells (located at the edge of amoebae colonies) or inner cells (located 2.5 millimeters from the edge, towards the center of the colony). Videos are confocal stacks, with bacteria showing in green and amoebae appearing as black holes within the bacterial lawn. As with the macroscopic images, images are shown for D. discoideum, P. violaceum, and A. castellanii. Images depicting drug treatments (Nystatin and Fluorouracil) were obtained using D. discoideum.
This distribution compiles numerous physical properties for 2,585 intrinsically disordered proteins (IDPs) obtained by coarse-grained molecular dynamics simulation. This combination comprises "Dataset A" as reported in "Featurization strategies for polymer sequence or composition design by machine learning" by Roshan A. Patel, Carlos H. Borca, and Michael A. Webb (DOI: 10.1039/D1ME00160D). The specific IDP sequences are sourced from version 9.0 of the DisProt database. The simulations were performed using the LAMMPS molecular dynamics engine. The interactions used for simulation are obtained from R. M. Regy, J. Thompson, Y. C. Kim and J. Mittal, Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins, Protein Sci., 2021, 1371-1379.
The dataset contains the model file for the Global Adjoint Tomography Model 25 (GLAD-M25). The model file contains parameters defined on the spectral-element mesh and is recommended for use in SPECFEM3D GLOBE for seismic wave simulation at the global scale.
There has been considerable recent interest in the high-pressure behavior of silicon carbide, a potential major constituent of carbon-rich exoplanets. In this work, the atomic-level structure of SiC was determined through in situ X-ray diffraction under laser-driven ramp compression up to 1.5 TPa, stresses more than seven times greater than those reached in previous static and shock data. Here we show that the B1-type structure persists over this stress range and we have constrained its equation of state (EOS). Using these data, we have determined the first experimentally based mass-radius curves for a hypothetical pure SiC planet. Interior structure models are constructed for planets consisting of a SiC-rich mantle and iron-rich core. Carbide planets are found to be ~10% less dense than corresponding terrestrial planets.
Geyman, Emily C.; Wu, Ziman; Nadeau, Matthew D.; Edmonsond, Stacey; Turner, Andrew; Purkis, Sam J.; Howes, Bolton; Dyer, Blake; Ahm, Anne-Sofie C.; Yao, Nan; Deutsch, Curtis A.; Higgins, John A.; Stolper, Daniel A.; Maloof, Adam C.
Carbonate mud represents one of the most important geochemical archives for reconstructing ancient climatic, environmental, and evolutionary change from the rock record. Mud also represents a major sink in the global carbon cycle. Yet, there remains no consensus about how and where carbonate mud is formed. In this contribution, we present new geochemical data that bear on this problem, including stable isotope and minor and trace element data from carbonate sources in the modern Bahamas such as ooids, corals, foraminifera, and green algae.
This dataset contains input and output files to reproduce the results of the manuscript "Homogeneous ice nucleation in an ab initio machine learning model" by Pablo M. Piaggi, Jack Weis, Athanassios Z. Panagiotopoulos, Pablo G. Debenedetti, and Roberto Car (arXiv preprint https://arxiv.org/abs/2203.01376). In this work, we studied the homogeneous nucleation of ice from supercooled liquid water using a machine learning model trained on ab initio energies and forces. Since nucleation takes place over times much longer than the simulation times that can be afforded using molecular dynamics simulations, we make use of the seeding technique that is based on simulating an ice cluster embedded in liquid water. The key quantity provided by the seeding technique is the size of the critical cluster (i.e., a size such that the cluster has equal probabilities of growing or shrinking at the given supersaturation). Using data from the seeding simulations and the equations of classical nucleation theory we compute nucleation rates that can be compared with experiments.
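As a rough illustration of how a critical cluster size from seeding can be turned into a rate, the Python sketch below uses expressions commonly quoted in the seeding literature: a barrier of N_c|Δμ|/2, a Zeldovich factor of sqrt(|Δμ| / (6π k_B T N_c)), and a rate J = ρ_l Z f+ exp(-ΔG_c / k_B T). Whether these match the exact expressions and conventions used in the manuscript is an assumption, and every numerical input below is a hypothetical placeholder rather than a value from the dataset.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def barrier(n_critical, delta_mu):
    """Free energy barrier in J: N_c * |delta_mu| / 2 (delta_mu per molecule)."""
    return 0.5 * n_critical * abs(delta_mu)


def zeldovich(n_critical, delta_mu, temperature):
    """Dimensionless Zeldovich factor for a spherical critical cluster."""
    return math.sqrt(
        abs(delta_mu) / (6.0 * math.pi * K_B * temperature * n_critical)
    )


def nucleation_rate(n_critical, delta_mu, temperature,
                    number_density, attachment_rate):
    """Rate in m^-3 s^-1: rho_l * Z * f+ * exp(-barrier / k_B T)."""
    exponent = -barrier(n_critical, delta_mu) / (K_B * temperature)
    return (number_density
            * zeldovich(n_critical, delta_mu, temperature)
            * attachment_rate
            * math.exp(exponent))


# Hypothetical placeholder inputs, for illustration only.
temperature = 230.0                 # K
n_critical = 300                    # molecules in the critical cluster
delta_mu = 0.4 * K_B * temperature  # J per molecule
number_density = 3.3e28             # liquid molecules per m^3
attachment_rate = 1.0e11            # attachments per second

print(barrier(n_critical, delta_mu) / (K_B * temperature))  # barrier in k_B T
print(nucleation_rate(n_critical, delta_mu, temperature,
                      number_density, attachment_rate))
```

Because the rate depends exponentially on the barrier, small uncertainties in the critical cluster size or in Δμ translate into changes of orders of magnitude in the estimated rate.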
This dataset contains all data relevant to a forthcoming publication in which we used molecular simulation methods to study the phase behavior of supercooled water. The dataset contains simulation input and output files, processed data files, and image files used to create all plots in the manuscript. Python analysis scripts are also included, along with instructions for how to re-generate all plots in the manuscript.
This dataset comprises data associated with the publication "Transferability of data-driven, many-body models for CO2 simulations in the vapor and liquid phases", which can be found at https://doi.org/10.1063/5.0080061. The data includes calculations for a Many-Body decomposition, virial coefficient calculations, orientational molecular scan energies, potential energy fields, correlation plots of training and testing data, vapor-liquid equilibrium simulations, liquid density simulations, and solid cell simulations.
This distribution contains experimentally measured data for the extent of retained enzyme activity after thermal stressing for three distinct enzymes: glucose oxidase, lipase, and horseradish peroxidase. The data is used to form conclusions and develop machine learning models as reported in the publication "Machine Learning on a Robotic Platform for the Design of Polymer-Protein Hybrids" by Matthew Tamasi, Roshan Patel, Carlos Borca, Shashank Kosuri, Heloise Mugnier, Rahul Upadhya, N. Sanjeeva Murthy, Michael Webb, and Adam Gormley. Details regarding the experimental protocols are reported in the aforementioned paper but are briefly discussed in the README.
Data set corresponding to "NAPS: Integrating pose estimation and tag-based tracking." This dataset contains the corresponding videos, tracking scripts, and SLEAP models along with SLEAP, NAPS, and ArUco tracking results.
Petsev, Nikolai D.; Nikoubashman, Arash; Latinwo, Folarin
Source code for our genetic algorithm optimization investigation of conglomerate and racemic chiral crystals. In this work, we address challenges in determining the stable structures formed by chiral molecules by applying the framework of genetic algorithms to predict the ground state crystal lattices formed by a chiral tetramer model. Using this code, we explore the relative stability and structures of the model’s conglomerate and racemic crystals, and extract a structural phase diagram for the stable Bravais crystal types in the zero-temperature limit.