We undertake ORF verification and cloning projects for the entire genomes of multiple organisms. In the ORF-cloning pipeline:
- Predicted ORFs are PCR-amplified precisely from the annotated initiation codon to the termination codon, using either a cDNA library or RT-PCR as the template source, with ORF-specific primers 5’-tailed with Gateway recombinational cloning sites,
- Resulting PCR products are recombined directionally into a Gateway Donor vector to create Entry clones,
- ORF sequence tags (OSTs) are obtained from the Entry clones, providing experimental evidence for the existence and intron-exon structure of the corresponding coding isoform.
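The first step of the pipeline (ORF-specific primers carrying 5’ Gateway tails) can be illustrated with a minimal Python sketch. It is not the CCSB primer-design procedure. Its assumptions are: the commonly published Invitrogen attB1/attB2 tails with a four-G spacer; a fixed-length gene-specific annealing region (the hypothetical `anneal_len` parameter, where real designs usually tune this to a target melting temperature); and retention of the stop codon, as the description above implies. The helper names `check_orf`, `reverse_complement` and `gateway_primers` are hypothetical.

```python
# Sketch only: Gateway-tailed primer design for a predicted ORF.
# ASSUMPTIONS (not taken from the CCSB pipeline): standard attB1/attB2 tails
# with a GGGG spacer, a fixed-length annealing region, stop codon retained.

ATTB1_TAIL = "GGGG" + "ACAAGTTTGTACAAAAAAGCAGGCT"  # forward primer tail, 5' of the ATG
ATTB2_TAIL = "GGGG" + "ACCACTTTGTACAAGAAAGCTGGGT"  # reverse primer tail, 5' of the ORF's 3' end (rev. comp.)

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_orf(orf: str) -> None:
    """Raise ValueError unless orf runs from ATG to a single in-frame stop codon."""
    if not set(orf) <= set("ACGT"):
        raise ValueError("ORF contains non-ACGT characters")
    if len(orf) < 6 or len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 and at least 6 nt")
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    if codons[0] != "ATG":
        raise ValueError("ORF does not start with ATG")
    if codons[-1] not in STOP_CODONS:
        raise ValueError("ORF does not end with a stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        raise ValueError("ORF contains an internal in-frame stop codon")


def gateway_primers(orf: str, anneal_len: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers, 5'->3', each with a Gateway tail."""
    if anneal_len < 1:
        raise ValueError("anneal_len must be positive")
    orf = orf.upper()
    check_orf(orf)
    # Forward primer: attB1 tail + first bases of the ORF (starting at ATG).
    fwd = ATTB1_TAIL + orf[:anneal_len]
    # Reverse primer: attB2 tail + reverse complement of the ORF's last bases,
    # so amplification ends exactly at the termination codon.
    rev = ATTB2_TAIL + reverse_complement(orf[-anneal_len:])
    return fwd, rev


if __name__ == "__main__":
    # Toy 15-nt ORF (ATG GCT GCT AAA TAA); a short annealing length keeps it readable.
    fwd, rev = gateway_primers("ATGGCTGCTAAATAA", anneal_len=9)
    print(fwd)  # GGGGACAAGTTTGTACAAAAAAGCAGGCTATGGCTGCT
    print(rev)  # GGGGACCACTTTGTACAAGAAAGCTGGGTTTATTTAGC
```

Validating the ORF before designing primers mirrors the requirement that amplification span exactly the annotated initiation-to-termination interval. A sequence with an internal stop or a broken frame is rejected rather than silently tailed.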
The Gateway recombinational cloning strategy we use provides a robust platform for large-scale automated cloning for any genome of interest (Rual et al, Curr Opin Chem Biol 2004; Brasch et al, Genome Res 2004).
This well-defined ORFeome pipeline has:
- provided experimental evidence for the existence of at least 17,300 genes in C. elegans (Reboul et al, Nat Genet 2001),
- supported the first genome-wide attempt to clone all predicted ORFs of a multicellular organism, yielding ~12,000 verified and cloned worm ORFs (Reboul et al, Nat Genet 2003),
- been used to generate genome-wide collections of cloned C. elegans promoter sequences (Dupuy et al, Genome Res 2004),
- underpinned a large-scale approach to the proactive experimental definition of the C. elegans ORFeome, in which 5’ and 3’ RACE were applied at genome scale to refine full-length worm ORFs (Salehi-Ashtiani et al, Genome Res 2009),
- been used to generate genome-wide ORF collections for the pathogenic bacterium Brucella melitensis (Dricot et al, Genome Res 2004) and the yeast Saccharomyces cerevisiae (Yu et al, Science 2008); a Xenopus ORFeome project is ongoing.
The ORFeome pipeline can be readily adapted to include proto-genes, intermediates in the process of de novo gene origination (Carvunis et al, Proto-genes and de novo gene birth, Nature 2012).
Human ORFeome projects
With the necessary concepts and technologies for genome-wide experimental ORFeome verification in place, we turned to the human ORFeome. For hORFeome v1.1, we used directed PCR on an earlier set of cDNAs from the Mammalian Gene Collection to clone 8,107 ORFs into the Gateway Entry vector (Rual et al, Genome Res 2004). In our second iteration of the human ORFeome effort, we attempted to clone ORFs from an additional 6,027 cDNAs to generate hORFeome v3.1, which contains 12,212 ORFs corresponding to 10,214 genes (Lamesch et al, Genomics 2007).
In collaboration with the Broad Institute, we reported the human ORFeome version 8.1 (hORFeome v8.1) Entry clone collection, the most extensive collection of human ORFs at the time of publication (Yang et al, Nat Methods 2011). Unlike earlier human ORF collections, v8.1 is clonal and fully sequenced: each ORF plasmid is derived from a single bacterial colony, and every clone has been fully sequenced. In total, hORFeome v8.1 includes 16,172 clonal ORFs mapping to 13,833 human genes, making it the most fully sequenced, flexible and annotated version of the human ORFeome to date. The entire collection of source (Entry) clones is available for research use without restriction through the ORFeome Collaboration (Temple et al, Hum Mol Genet 2006).
Current efforts are concentrated on the comprehensive genome-wide delineation of the human isoform space, using our high-throughput Code-seq splicing isoform discovery pipeline (see Figure), which combines next-generation parallel sequencing platforms with the necessary computational analysis pipeline (Salehi-Ashtiani et al, Nat Methods 2008).