Transcriptome-scale RNase-footprinting of RNA-protein complexes

Zhe Ji, Ruisheng Song, Hailiang Huang, Aviv Regev, and Kevin Struhl. 2016. “Transcriptome-scale RNase-footprinting of RNA-protein complexes.” Nat Biotechnol, 34, 4, Pp. 410-3.

Ribosome profiling is widely used to study translation in vivo, but not all sequence reads correspond to ribosome-protected RNA. Here we describe Rfoot, a computational pipeline that analyzes ribosomal profiling data and identifies native, nonribosomal RNA-protein complexes. We use Rfoot to precisely map RNase-protected regions within small nucleolar RNAs, spliceosomal RNAs, microRNAs, tRNAs, long noncoding (lnc)RNAs and 3′ untranslated regions of mRNAs in human cells. We show that RNAs of the same class can show differential complex association. Although only a subset of lncRNAs show RNase footprints, many of these have multiple footprints, and the protected regions are evolutionarily conserved, suggestive of biological functions.

Precision and recall estimates for two-hybrid screens

Hailiang Huang and Joel S Bader. 2009. “Precision and recall estimates for two-hybrid screens.” Bioinformatics, 25, 3, Pp. 372-8.

MOTIVATION: Yeast two-hybrid screens are an important method to map pairwise protein interactions. This method can generate spurious interactions (false discoveries), and true interactions can be missed (false negatives). Previously, we reported a capture-recapture estimator for bait-specific precision and recall. Here, we present an improved method that better accounts for heterogeneity in bait-specific error rates. RESULT: For yeast, worm and fly screens, we estimate the overall false discovery rates (FDRs) to be 9.9%, 13.2% and 17.0% and the false negative rates (FNRs) to be 51%, 42% and 28%. Bait-specific FDRs and the estimated protein degrees are then used to identify protein categories that yield more (or fewer) false positive interactions and more (or fewer) interaction partners. While membrane proteins have been suggested to have elevated FDRs, the current analysis suggests that intrinsic membrane proteins may actually have reduced FDRs. Hydrophobicity is positively correlated with decreased error rates and fewer interaction partners. These methods will be useful for future two-hybrid screens, which could use ultra-high-throughput sequencing for deeper sampling of interacting bait-prey pairs. AVAILABILITY: All software (C source) and datasets are available as supplemental files and at under the Lesser GPL v. 3 license.
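The link between the error rates quoted above and the precision and recall in the title follows from the standard definitions: precision = 1 - FDR and recall = 1 - FNR. The short sketch below applies those identities to the abstract's numbers. It assumes the percentages are listed in the same order as the organisms (yeast, worm, fly); the dictionary layout and function name are illustrative and are not taken from the authors' C software.

```python
# Overall rates quoted in the abstract, ASSUMED to be in the order yeast, worm, fly.
rates = {
    "yeast": {"fdr": 0.099, "fnr": 0.51},
    "worm": {"fdr": 0.132, "fnr": 0.42},
    "fly": {"fdr": 0.170, "fnr": 0.28},
}


def precision_recall(fdr, fnr):
    """Convert a false discovery rate and a false negative rate
    into (precision, recall) using the standard identities."""
    return 1.0 - fdr, 1.0 - fnr


# Hypothetical summary table: organism -> (precision, recall)
summary = {organism: precision_recall(**r) for organism, r in rates.items()}

for organism, (precision, recall) in summary.items():
    print(f"{organism}: precision={precision:.3f}, recall={recall:.2f}")
```

Under that ordering assumption, a lower FDR corresponds to higher precision and a lower FNR to higher recall, which is why the two pairs of quantities can be reported interchangeably.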

HistoneHits: a database for histone mutations and their phenotypes

Hailiang Huang, Alexandra M Maertens, Edel M Hyland, Junbiao Dai, Anne Norris, Jef D Boeke, and Joel S Bader. 2009. “HistoneHits: a database for histone mutations and their phenotypes.” Genome Res, 19, 4, Pp. 674-81.

Histones are the basic protein components of nucleosomes. They are among the most conserved proteins and are subject to a plethora of post-translational modifications. Specific histone residues are important in establishing chromatin structure, regulating gene expression and silencing, and responding to DNA damage. Here we present HistoneHits, a database of phenotypes for systematic collections of histone mutants. This database combines assay results (phenotypes) with information about sequences, structures, post-translational modifications, and evolutionary conservation. The web interface presents the information through dynamic tables and figures. It calculates the availability of data for specific mutants and for nucleosome surfaces. The database currently includes 42 assays on 677 mutants multiply covering 405 of the 498 residues across yeast histones H3, H4, H2A, and H2B. We also provide an interface with an extensible controlled vocabulary for research groups to submit new data. Preliminary analyses confirm that mutations at highly conserved residues and modifiable residues are more likely to generate phenotypes. Buried residues and residues on the lateral surface tend to generate more phenotypes, while tail residues generate significantly fewer phenotypes than other residues. Yeast mutants are cross referenced with known human histone variants, identifying a position where a yeast mutant causes loss of ribosomal silencing and a human variant increases breast cancer susceptibility. All data sets are freely available for download.

Where have all the interactions gone? Estimating the coverage of two-hybrid protein interaction maps

Hailiang Huang, Bruno M Jedynak, and Joel S Bader. 2007. “Where have all the interactions gone? Estimating the coverage of two-hybrid protein interaction maps.” PLoS Comput Biol, 3, 11, Pp. e214.

Yeast two-hybrid screens are an important method for mapping pairwise physical interactions between proteins. The fraction of interactions detected in independent screens can be very small, and an outstanding challenge is to determine the reason for the low overlap. Low overlap can arise from either a high false-discovery rate (interaction sets have low overlap because each set is contaminated by a large number of stochastic false-positive interactions) or a high false-negative rate (interaction sets have low overlap because each misses many true interactions). We extend capture-recapture theory to provide the first unified model for false-positive and false-negative rates for two-hybrid screens. Analysis of yeast, worm, and fly data indicates that 25% to 45% of the reported interactions are likely false positives. Membrane proteins have higher false-discovery rates on average, and signal transduction proteins have lower rates. The overall false-negative rate ranges from 75% for worm to 90% for fly, which arises from a roughly 50% false-negative rate due to statistical undersampling and a 55% to 85% false-negative rate due to proteins that appear to be systematically lost from the assays. Finally, statistical model selection conclusively rejects the Erdös-Rényi network model in favor of the power law model for yeast and the truncated power law for worm and fly degree distributions. Much as genome sequencing coverage estimates were essential for planning the human genome sequencing project, the coverage estimates developed here will be valuable for guiding future proteomic screens. All software and datasets are available in and , -, and -, and are also available from our Web site,
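As background for the capture-recapture idea mentioned above, the sketch below shows the classical two-sample Lincoln-Petersen estimator, which infers the total number of interactions from the overlap between two independent screens. It is a textbook simplification and not the paper's unified model: it assumes every reported interaction is true (no false positives) and that all interactions are equally detectable. The function name and the toy interaction sets are hypothetical.

```python
def lincoln_petersen(screen_a, screen_b):
    """Classical two-sample capture-recapture estimate.

    Returns (estimated total interactions, coverage of screen_a,
    coverage of screen_b). Assumes no false positives and equal
    detectability, which the abstract indicates does NOT hold for
    real two-hybrid data.
    """
    overlap = len(screen_a & screen_b)
    if overlap == 0:
        raise ValueError("no shared interactions; the estimate is undefined")
    total = len(screen_a) * len(screen_b) / overlap
    return total, len(screen_a) / total, len(screen_b) / total


# Toy data (hypothetical): 40 and 50 reported pairs, 10 of them shared.
screen_a = {(f"P{i}", f"Q{i}") for i in range(40)}
screen_b = {(f"P{i}", f"Q{i}") for i in range(30, 80)}

total, coverage_a, coverage_b = lincoln_petersen(screen_a, screen_b)
print(f"estimated total: {total:.0f}")
print(f"coverage A: {coverage_a:.2f}, coverage B: {coverage_b:.2f}")
```

In this toy case a small overlap implies low coverage for both screens. Once false positives are allowed, the same small overlap could instead reflect contaminated interaction sets, which is the ambiguity the abstract describes.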