Physical accessibility defines the “active” genome

The cell faces a spectacular topological challenge in packing meters of chromosomal DNA inside a ~5 micron nucleus. The cell's solution to this challenge is the hierarchical folding of genomic DNA into regulated structures, the most basic and important of which is the nucleosome. Only a small fraction of the 6 billion DNA bases comprising the genome is accessible to the machinery of transcription within each cell, while the remainder is compacted into chromatin and sequestered away. Because changes in chromatin structure require the concerted actions of a host of specialized complexes, the specifics of this physical accessibility encode a durable physical memory of biological state, defining the regulatory landscape of the cell.

Major insights into this regulatory landscape have come from genome-wide methods such as MNase-seq, ChIP-seq, and DNase-seq, which allow identification of the positions of transcription factor (TF) binding, active transcription start sites, nucleosomes and nucleosome modifications, enhancers, and insulators in a wide variety of cell lines and tissue samples. However, current methods for assaying chromatin accessibility, nucleosome positions, TF occupancy, or higher-order annotations of the biological “state” of chromatin generally require multiple assays and tens or hundreds of millions of cells as input material, averaging out heterogeneity in cellular populations. These large sample requirements have precluded analysis of the chromatin regulatory landscape of rare, phenotypically homogeneous cellular sub-types.

ATAC-seq allows rapid, sensitive, and integrative “regulomic” analysis

To address these problems, we have developed a new, integrative, multidimensional epigenomic assay that relies on in vitro transposition of sequencing tags into areas of accessible chromatin. This assay for transposase-accessible chromatin (ATAC-seq) rapidly generates multimodal data defining the regulatory landscape of chromatin using as few as ~500 cells, 3 to 5 orders of magnitude fewer than other methods. We have generated preliminary data demonstrating that the ATAC-seq assay provides simultaneous, genome-wide information on the positions of 1) open chromatin, 2) transcription factor binding, and 3) nucleosomes in regulatory regions, as well as 4) chromatin state annotation.

Beyond Ensemble Measurements: Toward Single-cell “Regulomics”

Ensemble methods of measuring the genomic characteristics of a population “average out” large-scale cell-to-cell heterogeneities and “drown out” rare variation within the population. As a simple analogy for this problem, let us consider all the shipping routes from the East Coast to the West Coast of the US. Some of these routes pass through the Panama Canal, while others navigate the Strait of Magellan. However, the ensemble average of these paths travels through the heart of Brazil, a path not taken by any boat; in fact, this path is impossible! Similarly, almost all of our understanding of large-scale cellular epigenomics comes from such ensemble measurements. We hope to move beyond the ensemble by sorting out phenotypically distinct subpopulations of cells, as well as by pushing toward single-cell resolution of “regulomic” state using our ATAC-seq methodology.
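The shipping-route analogy can be made concrete with a toy calculation. The sketch below is purely illustrative and uses no real data: the per-cell accessibility scores (0.0 for closed, 1.0 for open at a single locus) and the helper name ensemble_summary are hypothetical, introduced only to show how a population mean can describe a state that no individual cell occupies, and how a rare subpopulation nearly vanishes in the average.

```python
from statistics import mean


def ensemble_summary(scores):
    """Return the population mean and how many cells actually sit at it.

    `scores` is a hypothetical list of per-cell accessibility values at
    one locus (0.0 = closed, 1.0 = open); this is not real ATAC-seq output.
    """
    average = mean(scores)
    cells_at_average = sum(1 for score in scores if score == average)
    return average, cells_at_average


# Two equally sized subpopulations: half the cells closed, half open.
bimodal = [0.0] * 5 + [1.0] * 5
print(ensemble_summary(bimodal))

# A rare subpopulation: one open cell among 99 closed cells.
rare = [0.0] * 99 + [1.0]
print(ensemble_summary(rare))
```

In the bimodal case the mean is 0.5 although every cell is either fully closed or fully open, which parallels the route through Brazil. In the rare case the single open cell shifts the mean only to 0.01, which is how rare variation gets drowned out.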