This repository contains the raw photon-by-photon single-molecule FRET (smFRET) trajectories, SAXS data, MD simulation trajectories, multi-sequence alignment, and gel images for the paper titled "Sub-Domain Dynamics Enables Chemical Chain Reactions in Nonribosomal Peptide Synthetases."
This dataset contains supplementary materials for Chapter 4 and Chapter 5 of Yiheng Tao's PhD dissertation (2022). The dissertation’s abstract is provided here:
Carbon capture, utilization, and storage (CCUS) mitigates climate change by capturing carbon dioxide (CO2) emissions from large point sources, or CO2 from the ambient air, and subsequently reusing the captured CO2 or injecting it into deep geological formations for long-term and secure storage. Almost all current decarbonization pathways include large-scale CCUS, on the order of a billion tonnes (Gt) of CO2 captured and stored each year globally starting in 2030, yet the actual deployment has lagged far behind (around 0.04 Gt CO2 was captured in 2021). In this dissertation, I contribute to several aspects of large-scale deployment of CCUS by (1) developing and applying efficient numerical models to simulate geological CO2 storage and (2) identifying key policies to address the bottlenecks of overall CCUS deployment. This dissertation concerns the United States, China, and the Belt and Road Initiative (BRI) region through research projects that are consistent with each location’s current development stage of CCUS.
Chapters 2 and 3 contain computational modeling studies. In Chapter 2, I develop a new series of vertical-equilibrium (VE) models in the dual-continuum modeling framework to simulate CO2 injection and migration in fractured geological formations. Those models are shown to be effective and efficient when properties of the formation allow for the VE assumption. In Chapter 3, I apply a VE model to simulate basin-scale CO2 injection in the Junggar Basin of Northwestern China. The results show that current regional emissions of more than 100 million tonnes of CO2 per year can be stored effectively, thereby confirming the great potential of the Junggar Basin for early CCUS deployment.
Chapters 4 and 5 contain policy analyses. In Chapter 4, I propose a dynamic system consisting of new CO2 pipelines and novel Allam-cycle power plants in the Central United States, and examine how government policies, including an extended Section 45Q tax credit, may improve the economic feasibility of this system. Lastly, in Chapter 5, I investigate and quantify CO2 emissions implications of power plant projects associated with the BRI. I also propose a “greenness ratio” to measure the level of environmental sustainability of BRI in the power sector.
Taylor, Jenny A.; Bratton, Benjamin P.; Sichel, Sophie R.; Blair, Kris M.; Jacobs, Holly M.; DeMeester, Kristen E.; Kuru, Erkin; Gray, Joe; Biboy, Jacob; VanNieuwenhze, Michael S.; Vollmer, Waldemar; Grimes, Catherine L.; Shaevitz, Joshua W.; Salama, Nina R.
Abstract:
Helical cell shape is necessary for efficient stomach colonization by Helicobacter pylori, but the molecular mechanisms for generating helical shape remain unclear. We show that the helical centerline pitch and radius of wild-type H. pylori cells dictate surface curvatures of considerably higher positive and negative Gaussian curvatures than those present in straight- or curved-rod bacteria. Quantitative 3D microscopy analysis of short pulses with either N-acetylmuramic acid or D-alanine metabolic probes showed that cell wall growth is enhanced at both sidewall curvature extremes. Immunofluorescence revealed MreB is most abundant at negative Gaussian curvature, while the bactofilin CcmA is most abundant at positive Gaussian curvature. Strains expressing CcmA variants with altered polymerization properties lose helical shape and associated positive Gaussian curvatures. We thus propose a model where CcmA and MreB promote PG synthesis at positive and negative Gaussian curvatures, respectively, and that this patterning is one mechanism necessary for maintaining helical shape.
The bitKlavier Grand consists of sample collections of a new Steinway D grand piano from nine different stereo mic images, with: 16 velocity layers, at every minor 3rd (starting at A0); Hammer release samples; Release resonance samples; Pedal samples. Release packages at 96k/24bit, 88.2k/24bit, 48k/24bit, 44.1k/16bit are available for various applications.
This dataset contains all the model output used to generate the figures and data reported in the article "Climate, soil organic layer, and nitrogen jointly drive forest development after fire in the North American boreal zone". The data was generated during spring 2015 using a modified version of the Ecosystem Demography model version 2, provided as a supplement accompanying the article. The data was generated using the computational resources supported by the PICSciE OIT High Performance Computing Center and Visualization Laboratory at Princeton University. The dataset contains a PDF Readme file that explains in detail how the data can be used. Users are advised to read this file before using the data.
Vecchi, Gabriel A.; Landsea, Christopher; Zhang, Wei; Villarini, Gabriele; Knutson, Thomas
Abstract:
These are the data and scripts supporting the manuscript: Vecchi, Landsea, Zhang, Villarini and Knutson (2021): Changes in Atlantic Major Hurricane Frequency Since the Late-19th Century. Nature Communications.
Wang, Rui; Guo, Xuehui; Pan, Da; Kelly, James; Bash, Jesse; Sun, Kang; Paulot, Fabien; Clarisse, Lieven; Van Damme, Martin; Whitburn, Simon; Coheur, Pierre-François; Clerbaux, Cathy; Zondlo, Mark
Abstract:
Monthly, high-resolution (~2 km) ammonia (NH3) column maps from the Infrared Atmospheric Sounding Interferometer (IASI) were developed across the contiguous United States and adjacent areas. Ammonia hotspots (95th percentile of the column distribution) were highly localized with a characteristic length scale of 12 km and median area of 152 km2. Five seasonality classes were identified with k-means++ clustering. The Midwest and eastern United States had a broad spring maximum of NH3 (67% of hotspots in this cluster). The western United States, in contrast, showed a narrower mid-summer peak (32% of hotspots). IASI spatiotemporal clustering was consistent with that from the Ammonia Monitoring Network. CMAQ and GFDL-AM3 modeled NH3 columns had some success replicating the seasonal patterns but did not capture the regional differences. The high spatial-resolution monthly NH3 maps serve as a constraint for model simulations and as a guide for the placement of future ground-based network sites.
This distribution compiles numerous physical properties for 2,585 intrinsically disordered proteins (IDPs) obtained by coarse-grained molecular dynamics simulation. This compilation comprises "Dataset A" as reported in "Featurization strategies for polymer sequence or composition design by machine learning" by Roshan A. Patel, Carlos H. Borca, and Michael A. Webb (DOI: 10.1039/D1ME00160D). The specific IDP sequences are sourced from version 9.0 of the DisProt database. The simulations were performed using the LAMMPS molecular dynamics engine. The interactions used for simulation are obtained from R. M. Regy, J. Thompson, Y. C. Kim and J. Mittal, Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins, Protein Sci., 2021, 1371–1379.
This item provides access to all configurations of single-chain nanoparticles analyzed in the manuscript "Sequence Patterning, Morphology, and Dispersity in Single-Chain Nanoparticles: Insights from Simulation and Machine Learning" by Roshan A. Patel, Sophia Colmenares, and Michael A. Webb (DOI: 10.1021/acspolymersau.3c00007). The single-chain nanoparticles derive from 320 unique precursor chains that are distinguished by the fraction of linker beads that decorate a fixed-length polymer backbone and the distribution or blockiness of those linker beads. The data is provided as a serialized object created with the Python 'pickle' module. The data was compiled using Python version 3.8.8 and Clang 10.0.0. The Python object loaded from the .pkl file is a nested list, with the first dimension having 7,680 entries for the 7,680 unique single-chain nanoparticles produced in the aforementioned paper. Each of those 7,680 entries is itself a list with 20 entries, representing the 20 different simulation snapshots of the given single-chain nanoparticle. Each of the 20 entries is another list with two entries, with the first being a numpy.ndarray containing the x,y,z coordinates of all the beads comprising the single-chain nanoparticle and the second being a numpy.ndarray with a numerical encoding to indicate whether the beads are backbone (indicated as '0') or linker beads (indicated as '1'). Altogether, this provides 153,600 configurations of single-chain nanoparticles.
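As a rough illustration of the nested layout described above, the following Python sketch loads the file and reports the size of each nesting level. The helper names (load_configurations, summarize) and the file name in the comments are hypothetical and are not part of the distribution. Unpickling the real file also assumes that NumPy is installed, since the innermost entries are numpy.ndarray objects, and the bead count assumes one row of coordinates per bead.

```python
import pickle


def load_configurations(path):
    """Unpickle the nested list of configurations (only unpickle trusted files)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def summarize(data):
    """Report the sizes of the nesting levels described in the text."""
    n_particles = len(data)          # described as 7,680 in the full dataset
    n_snapshots = len(data[0])       # described as 20 snapshots per particle
    coords, bead_types = data[0][0]  # [coordinates, backbone/linker encoding]
    return {
        "n_particles": n_particles,
        "n_snapshots": n_snapshots,
        "n_configurations": sum(len(snapshots) for snapshots in data),
        "n_beads_first": len(coords),      # assumes one row per bead
        "n_types_first": len(bead_types),  # one 0/1 label per bead
    }


# Example usage (the file name is a placeholder, not the real one):
# data = load_configurations("configurations.pkl")
# print(summarize(data))
```

For the full dataset, the reported number of configurations should match the 153,600 stated above (7,680 nanoparticles times 20 snapshots).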
This distribution contains experimentally measured data for the extent of retained enzyme activity post thermal stressing for three distinct enzymes: glucose oxidase, lipase, and horseradish peroxidase. The data is used to form conclusions and develop machine learning models as reported in the publication "Machine Learning on a Robotic Platform for the Design of Polymer-Protein Hybrids" by Matthew Tamasi, Roshan Patel, Carlos Borca, Shashank Kosuri, Heloise Mugnier, Rahul Upadhya, N. Sanjeeva Murthy, Michael Webb*, and Adam Gormley. Details regarding the experimental protocols are reported in the aforementioned paper but are briefly discussed in the README.
Webb, Michael; Jacobs, William; An, Yaxin; Oliver, Wesley
Abstract:
This distribution compiles thermodynamic and (where available) dynamic properties of short protein sequences as obtained from coarse-grained molecular dynamics simulations. The dataset features 2114 protein sequences with sequence lengths ranging from N=20 up to N=50 amino acids. The simulation and analysis of these sequences is described in "Active learning of the thermodynamics-dynamics tradeoff in protein condensates" by Yaxin An, Michael A. Webb*, and William M. Jacobs* (https://doi.org/10.48550/arXiv.2306.03696). Of the 2114 protein sequences, 80 are homomeric polypeptides (replicating a single amino acid for N = 20, 30, 40, and 50), 1266 are sourced from version 9.0 of the DisProt database, and the remaining 768 sequences are novel sequences generated during an active learning campaign described in the aforementioned manuscript. The simulations were performed using the LAMMPS molecular dynamics engine. The interactions used for simulation are obtained from R. M. Regy, J. Thompson, Y. C. Kim and J. Mittal, Improved coarse-grained model for studying sequence dependent phase separation of disordered proteins, Protein Sci., 2021, 1371–1379. This distribution includes second virial coefficients, pressure-density data, the expected phase behavior at 300 K, estimated condensed-phase densities at 300 K (where they exist), and condensed-phase self-diffusion coefficients at 300 K (where they exist).
Extrapolation -- the ability to make inferences that go beyond the scope of one's experiences -- is a hallmark of human intelligence. By contrast, the generalization exhibited by contemporary neural network algorithms is largely limited to interpolation between data points in their training corpora. In this paper, we consider the challenge of learning representations that support extrapolation. We introduce a novel visual analogy benchmark that allows the graded evaluation of extrapolation as a function of distance from the convex domain defined by the training data. We also introduce a simple technique, context normalization, that encourages representations that emphasize the relations between objects. We find that this technique enables a significant improvement in the ability to extrapolate, considerably outperforming a number of competitive techniques.
These GROMACS trajectories show the existence of a critical point in deeply supercooled WAIL water. Also included is the code necessary to reproduce the figures in the corresponding paper from these trajectories. From this data the critical temperature, pressure, and density of the model can be found, and critical fluctuations in the deeply supercooled liquid can be directly observed (in a computer-simulation sense).
Data set corresponding to "NAPS: Integrating pose estimation and tag-based tracking." This dataset contains the corresponding videos, tracking scripts, and SLEAP models along with SLEAP, NAPS, and ArUco tracking results.
This dataset encompasses two distinct sets of data analyzed in the study, namely Asian American Scholar Forum survey data and Microsoft Academic Graph bibliometrics data:
Yu Xie, Xihong Lin, Ju Li, Qian He, Junming Huang, Caught in the Crossfire: Fears of Chinese-American Scientists, Proceedings of the National Academy of Sciences, in press (2023).
Explosive volcanic eruptions have large climate impacts, and can serve as observable tests of the climatic response to radiative forcing. Using a high-resolution climate model, we contrast the climate responses to Pinatubo, with symmetric forcing, and those to Santa Maria and Agung, which had meridionally asymmetric forcing. Although Pinatubo had larger global-mean forcing, asymmetric forcing strongly shifts the latitude of tropical rainfall features, leading to larger local precipitation/TC changes. For example, North Atlantic TC activity is enhanced/reduced by SH-forcing (Agung)/NH-forcing (Santa Maria), but changes little in response to the Pinatubo forcing. Moreover, the transient climate sensitivity estimated from the response to Santa Maria is 20% larger than that from Pinatubo or Agung. This spread in climatic impacts of volcanoes needs to be considered when evaluating the role of volcanoes in global and regional climate, and serves to contextualize the well-observed response to Pinatubo.
Yang, Yuan; Pan, Ming; Beck, Hylke; Fisher, Colby; Beighley, R. Edward; Kao, Shih-Chieh; Hong, Yang; Wood, Eric
Abstract:
Conventional basin-by-basin approaches to calibrate hydrologic models are limited to gauged basins and typically result in spatially discontinuous parameter fields. Moreover, the consequent low calibration density in space falls far short of the needs of present-day applications like high-resolution river hydrodynamic modeling. In this study we calibrated three key parameters of the Variable Infiltration Capacity (VIC) model at every 1/8° grid-cell using machine learning-based maps of four streamflow characteristics for the conterminous United States (CONUS), with a total of 52,663 grid-cells. This new calibration approach, as an alternative to parameter regionalization, can also be applied to ungauged regions. A key difference is that we regionalized physical variables (streamflow characteristics) instead of model parameters, whose behavior may often be less well understood. The resulting parameter fields no longer presented any spatial discontinuities and the patterns corresponded well with climate characteristics, such as aridity and runoff ratio. The calibrated parameters were evaluated against observed streamflow from 704/648 (calibration/validation period) small-to-medium-sized catchments used to derive the streamflow characteristics, 3941/3809 (calibration/validation period) small-to-medium-sized catchments not used to derive the streamflow characteristics, as well as five large basins. Comparisons indicated marked improvements in bias and Nash-Sutcliffe efficiency. Model performance was still poor in arid and semiarid regions, which is mostly due to both model structural and forcing deficiencies. Although the performance gain was limited by the relatively small number of parameters to calibrate, the study and results here served as a proof-of-concept for a promising new approach for fine-scale hydrologic model calibrations.
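The Nash-Sutcliffe efficiency mentioned above is commonly defined as one minus the ratio of the summed squared model error to the summed squared deviation of the observations from their mean. A minimal Python sketch of that standard definition follows. The function name and the sample values are illustrative only and are not taken from the study's code or data.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    obs_spread = sum((o - mean_obs) ** 2 for o in observed)
    if obs_spread == 0:
        raise ValueError("efficiency is undefined when observations are constant")
    return 1.0 - squared_error / obs_spread


# Illustrative streamflow values (made up for this sketch).
observed = [1.0, 2.0, 3.0]
print(nash_sutcliffe(observed, [1.0, 2.0, 3.0]))  # perfect match -> 1.0
print(nash_sutcliffe(observed, [2.0, 2.0, 2.0]))  # predicting the mean -> 0.0
```

A value of 1 indicates a perfect match, 0 means the simulation is no better than the observed mean, and negative values mean it is worse than the mean.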
Small changes in word choice can lead to dramatically different interpretations of narratives. How does the brain accumulate and integrate such local changes to construct unique neural representations for different stories? In this study we created two distinct narratives by changing only a few words in each sentence (e.g. “he” to “she” or “sobbing” to “laughing”) while preserving the grammatical structure across stories. We then measured changes in neural responses between the two stories. We found that the differences in neural responses between the two stories gradually increased along the hierarchy of processing timescales. For areas with short integration windows, such as early auditory cortex, the differences in neural responses between the two stories were relatively small. In contrast, in areas with the longest integration windows at the top of the hierarchy, such as the precuneus, temporal parietal junction, and medial frontal cortices, there were large differences in neural responses between stories. Furthermore, this gradual increase in neural difference between the stories was highly correlated with an area’s ability to integrate information over time. Amplification of neural differences did not occur when changes in words did not alter the interpretation of the story (e.g. “sobbing” to “crying”). Our results demonstrate how subtle differences in words are gradually accumulated and amplified along the cortical hierarchy as the brain constructs a narrative over time.
Data from the 2007 Developmental Idealism survey conducted in Gansu province in China's northwestern borderlands reveal that Muslims of the Hui and Dongxiang ethnicities reported much higher rates of cohabitation experience than the secular majority Han. Based on follow-up qualitative interviews, we found the answer to lie in the interplay between the highly interventionist Chinese state and the robust cultural resilience of local Islamic communities. Using the 2000 census data and the 2010 China Family Panel Studies data, we further show that women in almost all ten Muslim ethnic groups have higher percentages of underage births and premarital births than Han women, both nationally and in the northwest where most Chinese Muslims live. As the once-outlawed behavior of cohabitation became more socially acceptable during the reform and opening-up era, young Muslim Chinese often found themselves in “arranged cohabitations” as de facto marriages formed at younger-than-legal ages.
This dataset encompasses three distinct sets of data analyzed in the study, namely the survey data on favorability to the US, the survey data on trust in Americans, and the social media data.
The first part of the dataset supports the analysis in Study 1 and Study 3 and is drawn from three surveys: the Social Attitude Questionnaire of Urban and Rural Residents (SAQURR) in 2019 and 2020, the COVID-19 Multi-Wave Study (CMWS) between 2020 and 2022, and the Survey on Living Conditions (SLC) in 2023.
The second part of the dataset provides information used in Study 4, involving the 2018 and 2020 waves of the CFPS, Baidu Index data, and the COVID-19 cases and deaths data.
The third part of the dataset depicts trends in attitudes toward the US in Study 2.
This dataset comprises data associated with the publication "Transferability of data-driven, many-body models for CO2 simulations in the vapor and liquid phases", which can be found at https://doi.org/10.1063/5.0080061. The data includes calculations for a many-body decomposition, virial coefficient calculations, orientational molecular scan energies, potential energy fields, correlation plots of training and testing data, vapor-liquid equilibrium simulations, liquid density simulations, and solid cell simulations.
Zhou, Mi; Peng, Liqun; Zhang, Lin; Mauzerall, Denise L.
Abstract:
This dataset was created for the paper titled 'Environmental Benefits and Household Costs of Clean Heating Options in Northern China', published in Nature Sustainability. Based on a 2015 regional anthropogenic emission inventory (base case), we propose seven counterfactual scenarios in which all 2015 residential solid fuel heating in northern China switches to one of the following non-district heating options: clean coal with improved stoves (CCIS), natural gas heaters (NGH), resistance heaters (RH), or air-to-air heat pumps (AAHP). This dataset provides the following gridded information for the base case and each clean heating scenario: (1) annual residential heating emissions for PM2.5/NOx/SO2; (2) monthly mean surface PM2.5 concentrations from the WRF-Chem model; (3) annual PM2.5-related premature deaths calculated by the GEMM model; (4) 2015 population in China; (5) mask for provinces in China; (6) longitude and latitude of each grid center.
The Magnetospheric Multiscale (MMS) mission has given us unprecedented access to high-cadence particle and field data of magnetic reconnection at Earth's magnetopause. MMS first passed very near an X-line on 16 October 2015, the Burch event, and has since observed multiple X-line crossings. Subsequent 3D particle-in-cell (PIC) modeling of, and comparisons with, the Burch event have revealed a host of novel physical insights concerning magnetic reconnection, turbulence-induced particle mixing, and secondary instabilities. In this study, we employ the Gkeyll simulation framework to study the Burch event with different classes of extended, multi-fluid magnetohydrodynamics (MHD), including models that incorporate important kinetic effects, such as the electron pressure tensor, with physics-based closure relations designed to capture linear Landau damping. Such fluid modeling approaches are able to capture different levels of kinetic physics in global simulations and are generally less costly than fully kinetic PIC. We focus on the additional physics one can capture with increasing levels of fluid closure refinement via comparison with MMS data and existing PIC simulations. In particular, we find that the ten-moment model captures well the agyrotropic structure of the pressure tensor in the vicinity of the X-line and the magnitude of anisotropic electron heating observed in MMS and PIC simulations. However, the ten-moment model has difficulty resolving the lower hybrid drift instability, which has been observed to play a fundamental role in heating and mixing electrons in the current layer.
Complete dataset of pore water chemical parameters measured at the Marsh Resource Meadowlands Mitigation Bank, a tidal marsh within the New Jersey Meadowlands, from March 2011 to April 2012. Analytes measured include dissolved methane, sulfate, dissolved organic carbon, temperature, salinity, and pH. Measurements were conducted using porewater dialysis samplers, and water was sampled from the surface to a depth of 60 cm.
This dataset contains all the data, the model, and the MATLAB code used to generate the figures and data reported in the article (DOI: 10.1002/2014JD022278). The data was generated during September 2013 and February 2014 using the Ocean-Land-Atmosphere Model, which is also provided with this package. The data was generated using the computational resources supported by the PICSciE OIT High Performance Computing Center and Visualization Laboratory at Princeton University. The dataset contains a PDF Readme file that explains in detail how the data can be used. Users are advised to read this file before using the data.
A subset of the Fermi-LAT public data for use with NPTFit:
https://github.com/bsafdi/NPTFit
The data here is for use with the Jupyter example notebooks provided with the
main code. Details of the files provided are given below. All files are provided
as numpy arrays binned as nside=128 HEALPix maps.
For the full public data, see:
http://fermi.gsfc.nasa.gov/ssc/data/access/
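For orientation, a HEALPix map at resolution nside has 12 * nside**2 pixels, so each nside=128 map should hold 196,608 values. The Python sketch below checks an array against that count. The function names are hypothetical, the file name in the comments is a placeholder, and the check assumes a one-dimensional map with one value per pixel.

```python
def healpix_npix(nside):
    """Number of pixels in a HEALPix map: 12 * nside**2."""
    if nside < 1:
        raise ValueError("nside must be a positive integer")
    return 12 * nside * nside


def has_expected_length(values, nside=128):
    """True if a one-dimensional map has one value per HEALPix pixel."""
    return len(values) == healpix_npix(nside)


print(healpix_npix(128))  # 196608

# Example with NumPy (the file name is a placeholder, not a real file name):
# import numpy as np
# counts = np.load("some_map.npy")
# print(has_expected_length(counts))
```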