Project description:Advances in Third-Generation Sequencing (TGS) have increased read lengths to several kilobases, making it possible to capture complete alternative splicing (AS) events and full-length isoform expression. In recent years, many computational methods for isoform detection from long-read sequencing data have been developed and published. However, no prior comparative study has systematically evaluated the performance of software implementing these different algorithms. Here we benchmarked nine methods implemented in seven computational tools that identify isoform structures from TGS RNA sequencing data, and analyzed their performance from various aspects using both simulated datasets produced by an in-house simulator and previously published experimental data. Our results demonstrate the relative effectiveness of the approaches and provide guidance and recommendations for future research on AS analysis and for further improvement of isoform detection tools using TGS data.
Project description:Alternative cleavage and polyadenylation (APA) is emerging as an important mechanism of gene regulation in eukaryotes and plays important regulatory roles in human development and disease. Despite the widespread application of Second-Generation Sequencing (SGS) technology for polyadenylation site identification, matching each identified polyadenylation site within a gene to the isoform from which it derives remains a major challenge. To achieve isoform-resolved APA analysis, we developed a tool termed “IDP-APA” that constructs truly expressed isoforms and identifies polyadenylation sites by integrating the respective strengths of Third-Generation Sequencing (TGS) long reads and SGS short reads. Compared to existing tools, IDP-APA demonstrated superior performance in both isoform reconstruction and polyadenylation site identification. Applications to human embryonic stem cells, breast cancer cells and brain tissue from a patient with Alzheimer’s disease revealed prevalent APA events and cell-/tissue-specific APA patterns at isoform resolution.
Project description:We applied Single Molecule Real-Time long-read whole-genome sequencing to a Dux knockout mouse and confirmed the success of our Dux knockout mouse model.
Project description:Microsatellites are short tandem repeats (STRs) of a motif of 1 to 6 nucleotides that are ubiquitous in almost all genomes and widely used in many biomedical applications. However, despite two decades of development in next-generation sequencing (NGS) and the arrival of new technologies on the market, accurately sequencing and genotyping STRs, particularly homopolymers, remains very challenging because of several technical limitations. In many cases this leads to erroneous allele calls and difficulty in identifying the genuine allele distribution in a sample. In the present study, we assessed the capability of several second- and third-generation sequencing approaches to correctly determine the length of microsatellites, using plasmids containing A/T homopolymers and AC/TG or AT/TA dinucleotide STRs of variable length. Standard PCR-free and PCR-containing protocols, as well as single Unique Molecular Index (UMI) and dual UMI ‘duplex sequencing’ protocols, were evaluated using Illumina short-read sequencing, and two PCR-free protocols were evaluated using PacBio and nanopore long-read sequencing. Several bioinformatics algorithms were developed to correctly identify microsatellite alleles from sequencing data, including four modes for generating standard consensus alleles and two modes for generating combined consensus alleles. We provide a detailed analysis and comparison of these approaches and make several recommendations for the accurate determination of microsatellite allele length.