Dataset Information

Genome-wide identification of human functional DNA using a neutral indel model.

ABSTRACT: It has become clear that a large proportion of functional DNA in the human genome does not code for protein. Identification of this non-coding functional sequence using comparative approaches is proving difficult and has previously been thought to require deep sequencing of multiple vertebrates. Here we introduce a new model and comparative method that, instead of nucleotide substitutions, uses the evolutionary imprint of insertions and deletions (indels) to infer the past consequences of selection. The model predicts the distribution of indels under neutrality, and shows an excellent fit to human-mouse ancestral repeat data. Across the genome, many unusually long ungapped regions are detected that are unaccounted for by the neutral model, and which we predict to be highly enriched in functional DNA that has been subject to purifying selection with respect to indels. We use the model to determine the proportion under indel-purifying selection to be between 2.56% and 3.25% of human euchromatin. Since annotated protein-coding genes comprise only 1.2% of euchromatin, these results lend further weight to the proposition that more than half the functional complement of the human genome is non-protein-coding. The method is surprisingly powerful at identifying selected sequence using only two or three mammalian genomes. Applying the method to the human, mouse, and dog genomes, we identify 90 Mb of human sequence under indel-purifying selection, at a predicted 10% false-discovery rate and 75% sensitivity. As expected, most of the identified sequence represents unannotated material, while the recovered proportions of known protein-coding and microRNA genes closely match the predicted sensitivity of the method. The method's high sensitivity to functional sequence such as microRNAs suggests that as yet unannotated microRNA genes are enriched among the sequences identified. Furthermore, its independence of substitutions allowed us to identify sequence that has been subject to heterogeneous selection, that is, sequence subject to both positive selection with respect to substitutions and purifying selection with respect to indels. The ability to identify elements under heterogeneous selection enables, for the first time, the genome-wide investigation of positive selection on functional elements other than protein-coding genes.
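
The percentages quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' method: the function names are hypothetical, it assumes that all annotated protein-coding sequence falls inside the indel-constrained fraction, and it treats the predicted false-discovery rate and sensitivity as exact values.

```python
def noncoding_share(coding_pct, selected_pct):
    """Share of indel-constrained sequence that is not protein-coding.

    Assumption (not stated in the abstract): every coding base is
    counted within the constrained fraction.
    """
    return 1.0 - coding_pct / selected_pct


def implied_totals(identified_mb, fdr, sensitivity):
    """Back-of-envelope totals implied by an FDR and a sensitivity.

    Returns (true-positive Mb among the identified sequence,
    total Mb under selection implied by the sensitivity).
    """
    true_positive_mb = identified_mb * (1.0 - fdr)
    total_selected_mb = true_positive_mb / sensitivity
    return true_positive_mb, total_selected_mb


# Figures taken from the abstract: 1.2% coding, 2.56%-3.25% constrained.
low = noncoding_share(1.2, 2.56)
high = noncoding_share(1.2, 3.25)

# 90 Mb identified at a predicted 10% FDR and 75% sensitivity.
tp_mb, total_mb = implied_totals(90.0, 0.10, 0.75)

print(f"non-coding share of constrained sequence: {low:.1%} to {high:.1%}")
print(f"true positives: {tp_mb:.0f} Mb; implied total: {total_mb:.0f} Mb")
```

Under these assumptions the non-coding share lies between roughly 53% and 63%, which is consistent with the abstract's claim that more than half of the functional complement is non-protein-coding.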

SUBMITTER: Lunter G

PROVIDER: S-EPMC1326222 | biostudies-literature

REPOSITORIES: biostudies-literature


Similar Datasets

Project description: Noroviruses (NoVs) are a leading cause of gastroenteritis worldwide, yet host factors that restrict NoV replication are not well understood. Here, we use a CRISPR activation genome-wide screening to identify host genes that can inhibit murine norovirus (MNoV) replication in human cells. Our screens identified with high confidence 49 genes that can inhibit MNoV infection when overexpressed. A significant number of these genes are in interferon and immune regulation signaling networks, but surprisingly, the majority of the genes identified are neither associated with innate or adaptive immunity nor associated with any antiviral activity. Confirmatory studies of eight of the genes validate the initial screening data. Mechanistic studies on TRIM7 demonstrated a conserved role of the molecule in mouse and human cells in restricting MNoV in a step of infection after viral entry. Furthermore, we demonstrate that two isoforms of TRIM7 have differential antiviral activity. Taken together, these data provide a resource for understanding norovirus biology and demonstrate a robust methodology for identifying new antiviral molecules.

IMPORTANCE: Norovirus is one of the leading causes of food-borne illness worldwide. Despite its prevalence, our understanding of norovirus biology is limited due to the difficulty in growing human norovirus in vitro and a lack of an animal model. Murine norovirus (MNoV) is a model norovirus system because MNoV replicates robustly in cell culture and in mice. To identify host genes that can restrict norovirus replication when overexpressed, we performed genome-wide CRISPR activation screens to induce gene overexpression at the native locus through recruitment of transcriptional activators to individual gene promoters. We found 49 genes that could block murine norovirus replication in human cells. Several of these genes are associated with classical immune signaling pathways, while many of the molecules we identified have not been previously associated with antiviral activity. Our data are a resource for those studying noroviruses, and we provide a robust approach to identify novel antiviral genes.

