

How can we build quantitative and predictive understanding of how sequence variation affects macromolecular function?

Recent genome-wide investigations of regulatory elements (e.g., the ENCODE project) suggest that the majority of the genome interacts in some manner with trans-acting factors, and a large number of disease-implicated polymorphisms lie not in protein-coding regions but in regulatory regions of DNA, which interact directly with DNA-binding factors to regulate gene expression or chromatin structure. These observations underscore the importance of a quantitative understanding of how affinity for trans-acting factors depends on sequence, so that deleterious polymorphisms can be predicted. Furthermore, RNA-protein interactions are fundamental to gene expression, alternative splicing, RNA transport, and chromatin remodeling. RNA structures at nearly all scales, from long noncoding RNAs to short microRNAs, are implicated in fundamental regulatory roles in basic biological processes such as RNA expression control, chromatin remodeling, and differentiation, as well as in pathologies such as cancer. However, because the combinatorial space of possible RNA sequences is astronomical, high-throughput methods for quantitative biochemical investigation of RNA are necessary to ground our understanding of RNA stability and RNA-protein interactions.

Recently, pioneering work from the Burge lab (MIT) demonstrated the use of a high-throughput sequencing instrument as a post-hoc DNA array. By extending the sequencing quantitation software, this method quantified the binding affinity of fluorescently tagged DNA-binding proteins to millions of DNA sequences directly on a high-throughput sequencing chip. We have extended the technological innovations at the heart of these high-throughput, quantitative measurements of DNA-binding proteins to the vastly more complex landscape of interactions between RNA and RNA-binding proteins (RBPs). Building on our previous work, we use E. coli RNA polymerase to synthesize RNAs from previously sequenced DNA “clusters” on the Illumina high-throughput sequencing flow cell. The DNA is transcribed such that the synthesized RNA remains physically associated with its template, thereby generating a massive diversity of known RNA sequences directly on a high-throughput sequencing instrument. This platform allows the creation of an RNA array of up to ~10^8 unique RNA features, and parallel measurements of the binding of fluorescently labeled RBPs to all of these structures. By allowing increasing concentrations of RBP to bind to the RNA array, we can construct a binding curve for each sequence, enabling quantification of binding energetics on a massive scale. By generating quantitative energetic maps of protein-RNA interactions over millions of RNA variants, we will contribute a unique and powerful dataset for understanding the constraints and requirements for high-affinity RNA-protein interactions.
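To make the binding-curve step concrete, the following is a minimal sketch of how a dissociation constant and binding free energy might be extracted for a single RNA cluster. It is an illustration under stated assumptions, not the analysis pipeline used in this work: it assumes single-site equilibrium binding, protein in excess over RNA, and fluorescence that is linear in occupancy. All function names, the grid-search fitting strategy, and the data are hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_bound(conc, kd):
    """Single-site (Langmuir) isotherm: occupancy at free protein concentration conc."""
    return conc / (conc + kd)


def fit_binding_curve(concs, signals, kd_grid):
    """Fit signal = background + fmax * conc / (conc + Kd).

    For each trial Kd the model is linear in (fmax, background), so those two
    are solved by ordinary least squares; Kd is chosen by grid search.
    """
    n = len(concs)
    mean_y = sum(signals) / n
    best = None
    for kd in kd_grid:
        x = [fraction_bound(c, kd) for c in concs]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signals))
        fmax = sxy / sxx
        background = mean_y - fmax * mean_x
        sse = sum((background + fmax * xi - yi) ** 2 for xi, yi in zip(x, signals))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax, background)
    return best[1], best[2], best[3]


def delta_g_kcal(kd_molar, temp_k=298.15):
    """Binding free energy, dG = RT ln(Kd / 1 M); more negative means tighter binding."""
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_molar)


# Synthetic, noise-free data for one hypothetical cluster (not real measurements).
concs_nm = [0.1, 0.3, 1, 3, 10, 30, 100, 300]
signals = [50 + 1000 * fraction_bound(c, 5.0) for c in concs_nm]

# Trial Kd values, log-spaced from 0.01 nM to 10 uM.
kd_grid_nm = [10 ** (-2 + 0.01 * i) for i in range(601)]

kd_nm, fmax, background = fit_binding_curve(concs_nm, signals, kd_grid_nm)
dg = delta_g_kcal(kd_nm * 1e-9)
print(f"Kd ~ {kd_nm:.2f} nM, dG ~ {dg:.2f} kcal/mol")
```

Repeating such a fit independently for every cluster would, in principle, yield one free-energy estimate per RNA variant. Real data would also need noise handling and quality filtering, which this sketch omits.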



High-Throughput, Single-Molecule Biochemistry

Rare or short-lived nucleic-acid conformations are often hidden in “bulk” biochemical methods: these heterogeneous states, which are often of significant interest both for biological mechanism and for an accurate physical understanding of these biopolymers, are averaged out by the ensemble nature of the experiments. Single-molecule fluorescence methods have lifted the veil of the ensemble average, allowing direct observation of rare and stochastic biomolecular states and affording substantial mechanistic insight into foundational biological processes. Arguably, nowhere has the impact of single-molecule methods been more acutely felt than in understanding the rich diversity of DNA and especially RNA structural dynamics. However, single-molecule methods are currently extremely cumbersome to apply to large numbers of diverse molecules, making systematic, large-scale investigations of different molecular configurations impossible. In short, the high-throughput revolution in the biosciences has left single-molecule biophysics behind. We are working to eliminate this bottleneck, bringing single-molecule investigations into the realm of high-throughput biological investigation.