Identification of tens of thousands of novel gene loci in the human and mouse genomes through targeted full-length long-read RNA sequencing
ABSTRACT: The GENCODE project is a long-term international effort to produce a comprehensive and accurate map of genes and transcripts for the human and mouse genomes. While the annotation of protein-coding genes is nearly complete, long non-coding RNAs (lncRNAs) remain poorly characterized, with existing catalogs lacking consistency and experimental support. To address this, GENCODE used a targeted RNA sequencing approach to capture RNA from various human and mouse tissues, employing advanced sequencing technologies (ONT, PacBio, and Illumina). This resulted in the prediction of around half a million transcript models for both species. GENCODE then re-engineered its curation pipeline to handle this data, leading to the annotation of 16,817 new human genes (132,049 transcripts) and 22,210 new mouse genes (131,546 transcripts)—a significant increase in lncRNA annotations. The newly identified genes and transcripts have similar features to previously annotated lncRNAs and are linked to human phenotypes through GWAS and evolutionary conservation. Furthermore, the project has expanded the map of lncRNA orthology between humans and mice, especially for disease-associated lncRNAs. These updates enhance the functional interpretation of the human genome, connecting millions of previously unassigned omics data points (e.g., CAGE tags, ChIP-Seq peaks, genetic variants) to specific transcriptional units and regulatory regions. This marks a significant advancement toward a complete lncRNA catalog for human and mouse genomes.
INSTRUMENT(S): Illumina HiSeq 2500, MinION, Sequel II
ORGANISM(S): Homo sapiens
SUBMITTER: Sílvia Carbonell-Sala
PROVIDER: E-MTAB-14562 | biostudies-arrayexpress
REPOSITORIES: biostudies-arrayexpress