A Robust and Rapid Candidate Gene Mapping Pipeline Based on M2 Populations.
ABSTRACT: Whole-genome sequencing-based bulked segregant analysis (WGS-BSA) has facilitated the mapping of candidate causal variations for cloning target plant genes. Here, we report an improved WGS-BSA method, termed M2-seq, that expedites the mapping of candidate mutant loci by studying only the M2 generation. It is an efficient and rapid mutant gene mapping tool, comparable to previously reported approaches such as Mutmap and Mutmap+, which require studying M3 or more advanced selfed generations. In M2-seq, background variations among the M2 populations can be removed efficiently without knowledge of the variations of the wild-type progenitor plant. Furthermore, the use of absolute delta single-nucleotide polymorphism (SNP) index values can effectively remove the background variation caused by repulsion-phase linkages of adjacent mutant alleles, thereby facilitating the identification of the causal mutation in target genes. We demonstrate the application of M2-seq by successfully mapping the genomic regions harboring causal mutations for mutant phenotypes in 10 independent M2 populations of soybean. Mapping candidate mutant genes in the M2 generation with the M2-seq method should be particularly useful in expediting gene cloning, especially in plant species with long generation times.
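The absolute delta SNP index mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula used by M2-seq, so the code below assumes the conventional bulked segregant analysis definition: the SNP index of a bulk is the fraction of reads carrying the alternative allele, and the delta SNP index is the difference between the two bulks. The function names and the read counts in the usage lines are hypothetical.

```python
def snp_index(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the alternative allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def abs_delta_snp_index(mut_alt: int, mut_total: int,
                        wt_alt: int, wt_total: int) -> float:
    """Absolute difference between the SNP indices of the two bulks.

    Assumption: 'mut' is the mutant-phenotype bulk and 'wt' is the
    wild-type-phenotype bulk; this naming is illustrative only.
    """
    return abs(snp_index(mut_alt, mut_total) - snp_index(wt_alt, wt_total))


# Hypothetical read counts at two sites.
coupled = abs_delta_snp_index(30, 30, 10, 30)    # allele fixed in the mutant bulk
repulsion = abs_delta_snp_index(0, 30, 20, 30)   # allele absent from the mutant bulk
```

Under this assumed definition, the signed delta values at the two sites have opposite signs but the same magnitude, so taking the absolute value places both on the same scale.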
Project description:Background: The plant architecture traits of maize determine the yield. Plant height, ear position, leaf angle above the primary ear and internode length above the primary ear together determine the canopy structure and photosynthetic efficiency of maize and at the same time affect lodging and disease resistance. A flat and tall plant architecture confers an obvious advantage in the yield of a single plant but is not conducive to dense planting and results in high rates of lodging; thus, it has been gradually eliminated in production. Although using plants that are too compact, short and density tolerant can increase the yield per unit area to a certain extent, the photosynthetic efficiency of such plants is low, ultimately limiting yield increases. Genetic mapping is an effective way to identify candidate genes that regulate plant architecture traits and thereby to improve plant architecture. Results: To find the best balance between the yield per plant and the yield per unit area of maize, in this study, an F2:3 pedigree population and a RIL population with the same male parent were used to identify QTL for plant height (PH), ear height (EH), leaf angle and internode length above the primary ear (LAE and ILE) in Changchun and Gongzhuling for 5 consecutive years (2016-2020). A total of 11, 13, 23 and 13 QTL were identified for PH, EH, LAE, and ILE, respectively. A pleiotropic consistent QTL for PH overlapped with that for EH on chromosome 3, explaining 6.809% to 21.96% of the phenotypic variation. In addition, there were major consistent QTL for LAE and ILE, and the maximum phenotypic contribution rates were 24.226% and 30.748%, respectively. Three candidate genes were mined from the three consistent QTL regions and were involved in the gibberellin-activated signal pathway, brassinolide signal transduction pathway and auxin-activated signal pathway, respectively. 
Analysis of the expression levels of the three genes showed that they were actively expressed during the jointing stage of vigorous maize growth. Conclusions: In this study, three consistent major QTL related to plant type traits were identified and three candidate genes were screened. These results lay a foundation for the cloning of related functional genes and for marker-assisted breeding.
Project description:Nanobodies are single-domain antibodies derived from the variable regions of Camelidae atypical immunoglobulins. They show promise as high-affinity reagents for research, diagnostics and therapeutics owing to their high specificity, small size (∼15 kDa) and straightforward bacterial expression. However, identification of repertoires with sufficiently high affinity has proven time consuming and difficult, hampering nanobody implementation. Our approach generates large repertoires of readily expressible recombinant nanobodies with high affinities and specificities against a given antigen. We demonstrate the efficacy of this approach through the production of large repertoires of nanobodies against two antigens, GFP and mCherry, with Kd values into the subnanomolar range. After mapping diverse epitopes on GFP, we were also able to design ultrahigh-affinity dimeric nanobodies with Kd values as low as ∼30 pM. The approach presented here is well suited for the routine production of high-affinity capture reagents for various biomedical applications.
Project description:Endoscopic content area refers to the informative area enclosed by the dark, non-informative, border regions present in most endoscopic footage. The estimation of the content area is a common task in endoscopic image processing and computer vision pipelines. Despite the apparent simplicity of the problem, several factors make reliable real-time estimation surprisingly challenging. The lack of rigorous investigation into the topic combined with the lack of a common benchmark dataset for this task has been a long-lasting issue in the field. In this paper, we propose two variants of a lean GPU-based computational pipeline combining edge detection and circle fitting. The two variants differ by relying on handcrafted features, and learned features respectively to extract content area edge point candidates. We also present a first-of-its-kind dataset of manually annotated and pseudo-labelled content areas across a range of surgical indications. To encourage further developments, the curated dataset, and an implementation of both algorithms, has been made public (https://doi.org/10.7303/syn32148000, https://github.com/charliebudd/torch-content-area). We compare our proposed algorithm with a state-of-the-art U-Net-based approach and demonstrate significant improvement in terms of both accuracy (Hausdorff distance: 6.3 px versus 118.1 px) and computational time (Average runtime per frame: 0.13 ms versus 11.2 ms).
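The circle-fitting step described above can be sketched in a few lines. The abstract does not say which fitting method the authors use, so the code below swaps in one common choice, an algebraic least-squares (Kåsa) fit, written in plain Python instead of on the GPU. The function names and the sample points are hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle(points):
    """Fit a circle to (x, y) edge points; returns (cx, cy, r).

    Solves, in the least-squares sense, x^2 + y^2 = a*x + b*y + c,
    where a = 2*cx, b = 2*cy and c = r^2 - cx^2 - cy^2.
    """
    if len(points) < 3:
        raise ValueError("at least three points are required")
    sxx = sxy = syy = sx = sy = sxz = syz = sz = 0.0
    for x, y in points:
        z = x * x + y * y
        sxx += x * x
        sxy += x * y
        syy += y * y
        sx += x
        sy += y
        sxz += x * z
        syz += y * z
        sz += z
    # Normal equations of the linear least-squares problem.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    rhs = [sxz, syz, sz]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("points are collinear or degenerate")
    # Cramer's rule: replace one column at a time with the right-hand side.
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = rhs[i]
        solution.append(_det3(mc) / det)
    a, b, c = solution
    cx, cy = a / 2.0, b / 2.0
    return cx, cy, math.sqrt(c + cx * cx + cy * cy)


# Hypothetical edge points sampled from a circle centred at (3, 4) with radius 5.
samples = [(3 + 5 * math.cos(t), 4 + 5 * math.sin(t)) for t in (0, 1, 2, 3, 4, 5)]
center_x, center_y, radius = fit_circle(samples)
```

An algebraic fit is convenient because it reduces to a small linear system, but it is sensitive to outlier edge points, so a real pipeline would need some form of outlier handling on top of it.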
Project description:We applied a simple and efficient two-step method to analyze a family-based association study of gene expression quantitative trait loci (eQTL) in a mixed model framework. This two-step method produces very similar results to the full mixed model method, with our method being significantly faster than the full model. Using the Genetic Analysis Workshop 15 (GAW15) Problem 1 data, we demonstrated the value of data filtering for reducing the number of tests and controlling the number of false positives. Specifically, we showed that removing non-expressed genes by filtering on expression variability effectively reduced the number of tests by nearly 50%. Furthermore, we demonstrated that filtering on genotype counts substantially reduced spurious detection. Finally, we restricted our analysis to the markers and transcripts that were closely located. We found five times more signals in close proximity (cis-) to transcripts than in our genome-wide analysis. Our results suggest that careful pre-filtering and partitioning of data are crucial for controlling false positives and allowing detection of genuine effects in genetic analysis of gene expression.
Project description:Aims/hypothesis: Hyperglycaemia disproportionately affects African-Americans (AfAs). We tested the transferability of 18 single-nucleotide polymorphisms (SNPs) associated with glycaemic traits identified in European ancestry (EuA) populations in 5,984 non-diabetic AfAs. Methods: We meta-analysed SNP associations with fasting glucose (FG) or insulin (FI) in AfAs from five cohorts in the Candidate Gene Association Resource. We: (1) calculated allele frequency differences, variations in linkage disequilibrium (LD), fixation indices (F_ST values) and integrated haplotype scores (iHSs); (2) tested EuA SNPs in AfAs; and (3) interrogated within ± 250 kb around each EuA SNP in AfAs. Results: Allele frequency differences ranged from 0.6% to 54%. F_ST exceeded 0.15 at 6/16 loci, indicating modest population differentiation. All iHSs were <2, suggesting no recent positive selection. For 18 SNPs, all directions of effect were the same and 95% CIs of association overlapped when comparing EuA with AfA. For 17 of 18 loci, at least one SNP was nominally associated with FG in AfAs. Four loci were significantly associated with FG (GCK, p = 5.8 × 10⁻⁸; MTNR1B, p = 8.5 × 10⁻⁹; and FADS1, p = 2.2 × 10⁻⁴) or FI (GCKR, p = 5.9 × 10⁻⁴). At GCK and MTNR1B the EuA and AfA SNPs represented the same signal, while at FADS1 and GCKR, the EuA and best AfA SNPs were weakly correlated (r² <0.2), suggesting allelic heterogeneity for association with FG at these loci. Conclusions/interpretation: Few glycaemic SNPs showed strict evidence of transferability from EuA to AfAs. Four loci were significantly associated in both AfAs and those with EuA after accounting for varying LD across ancestral groups, with new signals emerging to aid fine-mapping.
Project description:Anthracnose, caused by Colletotrichum lindemuthianum, is an important fungal disease of common bean (Phaseolus vulgaris). Alleles at the Co-4 locus confer resistance to a number of races of C. lindemuthianum. A population of 94 F4:5 recombinant inbred lines of a cross between resistant black bean genotype B09197 and susceptible navy bean cultivar Nautica was used to identify markers associated with resistance in bean chromosome 8 (Pv08) where Co-4 is localized. Three SCAR markers with known linkage to Co-4 and a panel of single nucleotide markers were used for genotyping. A refined physical region on Pv08 with significant association with anthracnose resistance identified by markers was used in BLAST searches with the genomic sequence of common bean accession G19833. Thirty-two unique annotated candidate genes were identified that spanned a physical region of 936.46 kb. A majority of the annotated genes identified had functional similarity to leucine rich repeats/receptor like kinase domains. Three annotated genes had similarity to 1,3-β-glucanase domains. There were sequence similarities between some of the annotated genes found in the study and the genes associated with phosphoinositide-specific phospholipases C associated with the Co-x and COK-4 loci found in previous studies. It is possible that the Co-4 locus is structured as a group of genes with functional domains dominated by protein tyrosine kinase along with leucine rich repeats/nucleotide binding site, phospholipases C as well as β-glucanases.
Project description:Recently, crop breeders have widely adopted a new biotechnology-based process, termed Seed Production Technology (SPT), to produce hybrid varieties. The SPT does not produce nuclear male-sterile lines, and instead utilizes transgenic SPT maintainer lines to pollinate male-sterile plants for propagation of nuclear-recessive male-sterile lines. A late-stage pollen-specific promoter is an essential component of the pollen-inactivating cassette used by the SPT maintainers. While a number of plant pollen-specific promoters have been reported so far, their usefulness in SPT has remained limited. To increase the repertoire of pollen-specific promoters for the maize community, we conducted a comprehensive comparative analysis of transcriptome profiles of mature pollen and mature anthers against other tissue types. We found that maize pollen has far fewer expressed genes (>1 FPKM) than other tissue types, but the pollen grain has a large set of distinct genes, called pollen-specific genes, which are expressed exclusively or at much higher levels (100-fold) in pollen than in other tissue types. Utilizing transcript abundance and correlation coefficient analysis, 1215 mature pollen-specific (MPS) genes and 1009 mature anther-specific (MAS) genes were identified in the B73 transcriptome. These two gene sets had similar GO term and KEGG pathway enrichment patterns, indicating that their members share similar functions in the maize reproductive process. Of the genes, 623 were shared between the two sets, called mature anther- and pollen-specific (MAPS) genes, which represent the late-stage pollen-specific genes of the maize genome. Functional annotation analysis of MAPS showed that 447 MAPS genes (71.7% of MAPS) encode pollen allergen proteins. 
Their 2-kb promoters were analyzed for cis-element enrichment and six well-known pollen-specific cis-elements (AGAAA, TCCACCA, TGTGGTT, [TA]AAAG, AAATGA, and TTTCT) were found to be highly enriched in the promoters of MAPS. Interestingly, JA-responsive cis-element GCC box (GCCGCC) and ABA-responsive cis-element-coupling element1 (ABRE-CE1, CCACC) were also found enriched in the MAPS promoters, indicating that JA and ABA signaling likely regulate pollen-specific MAPS expression. This study describes a robust and straightforward pipeline to discover pollen-specific promoters from publicly available data while providing maize breeders and the maize industry a number of late-stage (mature) pollen-specific promoters for use in SPT for hybrid breeding and seed production.
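Counting cis-element occurrences in a promoter sequence, the first step of an enrichment analysis like the one described above, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it scans only the forward strand, counts overlapping matches with a regular-expression lookahead, and does not perform any statistical enrichment test. The motif strings come from the abstract; the function name and the example sequence are hypothetical.

```python
import re

# Motifs listed in the abstract; "[TA]AAAG" is already a valid regex character class.
MOTIFS = ["AGAAA", "TCCACCA", "TGTGGTT", "[TA]AAAG",
          "AAATGA", "TTTCT", "GCCGCC", "CCACC"]


def count_motifs(sequence, motifs=MOTIFS):
    """Count overlapping forward-strand occurrences of each motif."""
    seq = sequence.upper()
    counts = {}
    for motif in motifs:
        # A zero-width lookahead lets matches overlap, e.g. GCCGCC in GCCGCCGCC.
        counts[motif] = len(re.findall(f"(?=({motif}))", seq))
    return counts


# Hypothetical short promoter fragment.
example_counts = count_motifs("agaaagtttctGCCGCCGCC")
```

To test enrichment, such counts would be compared against counts from a background set of promoters, a step omitted here.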
Project description:We report here a PCR-based cloning methodology that requires no post-PCR modifications such as restriction digestion and phosphorylation of the amplified DNA. The advantage of the present method is that it yields only recombinant clones, thus eliminating the need for screening. Two DNA amplification reactions by PCR are performed: the first reaction amplifies the gene of interest from a source template, and the second reaction fuses it with the designed expression vector fragments. These vector fragments carry the essential elements that are required for the fusion product selection. The entire process can be completed in less than 8 hours. Furthermore, ligation of the amplified DNA by a DNA ligase is not required before transformation, although the procedure yields more colonies upon transformation if ligation is carried out. As a proof-of-concept, we show the cloning and expression of GFP, adh, and rho genes. Using GFP production as an example, we further demonstrate that the E. coli T7 express strain can be used directly in our methodology for protein expression immediately after PCR. The expressed protein carries a 6xHistidine tag at either terminus or no tag, depending upon the chosen vector fragments. We believe that our method will find tremendous use in molecular and structural biology.
Project description:Background: The extracellular domain of matrix protein 2 (M2e) of influenza A virus is a promising target for the development of a universal vaccine against influenza because M2e sequences are highly conserved among human influenza A strains. Native M2e is poorly immunogenic, but its immunogenicity can be increased by delivery in combination with adjuvants or carrier particles. It was previously shown that fusion of M2e to bacterial flagellin, the ligand for Toll-like receptor (TLR) 5 and a powerful mucosal adjuvant, significantly increases the immunogenicity and protective capacity of M2e. Results: In this study, we report for the first time the transient expression in plants of a recombinant protein Flg-4M comprising flagellin of Salmonella typhimurium fused to four tandem copies of the M2e peptide. The chimeric construct was expressed in Nicotiana benthamiana plants using either the self-replicating potato virus X (PVX) based vector, pA7248AMV-GFP, or the cowpea mosaic virus (CPMV)-derived expression vector, pEAQ-HT. The highest expression level, up to 30% of total soluble protein (about 1 mg/g of fresh leaf tissue), was achieved with the PVX-based expression system. Intranasal immunization of mice with purified Flg-4M protein induced high levels of M2e-specific serum antibodies and provided protection against lethal challenge with influenza virus. Conclusions: This study confirms the usefulness of flagellin as a carrier of M2e and its relevance for the production of M2e-based candidate influenza vaccines in plants.
Project description:Metagenomics is the study of genomic DNA recovered from a microbial community. Both assembly-based and mapping-based methods have been used to analyze metagenomic data. When appropriate gene catalogs are available, mapping-based methods are preferred over assembly-based approaches, especially for analyzing the data at the functional level. In this study, we introduce CAMAMED, a composition-aware mapping-based metagenomic data analysis pipeline. This pipeline can analyze metagenomic samples at both taxonomic and functional profiling levels. Using this pipeline, metagenome sequences can be mapped to non-redundant gene catalogs and the gene frequencies in the samples are obtained. Due to the highly compositional nature of metagenomic data, the cumulative sum-scaling method is used at both taxa and gene levels for compositional data analysis in our pipeline. Additionally, by mapping the genes to the KEGG database, annotations related to each gene can be extracted at different functional levels such as KEGG ortholog groups, enzyme commission numbers and reactions. Furthermore, the pipeline enables the user to identify potential biomarkers in case-control metagenomic samples by investigating functional differences. The source code for this software is available from https://github.com/mhnb/camamed. Ready-to-use Docker images are also available at https://hub.docker.com.