Project description: Sarcomas are a heterogeneous group of rare malignancies, with more than 50 recognized subtypes. Advances in next-generation sequencing technology have led to the discovery of genetic events in these mesenchymal tumors that, in addition to enhancing understanding of their biology, have opened avenues for molecularly targeted therapy and immunotherapy. This review focuses on how the incorporation of next-generation sequencing has affected drug development in sarcomas and on strategies for optimizing precision oncology for these rare cancers. Specific driver molecular abnormalities have been identified in a significant percentage of soft tissue sarcomas, which represent up to 40% of all sarcomas. Evaluating these mutations across rare cancer subtypes requires careful characterization of the genetic alterations to further define compelling drivers with therapeutic implications. Novel models of clinical trial design are also needed. This shift would entail sustained efforts by the sarcoma community to move from one-size-fits-all trials, in which all sarcomas are treated similarly, to divide-and-conquer, subtype-specific strategies.
Project description: Background: In environmental sequencing projects, a mix of DNA from a whole microbial community is fragmented and sequenced, with one of the possible goals being to reconstruct partial or complete genomes of members of the community. In communities with high diversity of species, a significant proportion of the sequences do not overlap any other fragment in the sample. This problem will arise not only in situations with a relatively even distribution of many species, but also when the community in a particular environment is routinely dominated by the same few species. In the former case, no genomes may be assembled at all, while in the latter case a few dominant species in an environment will always be sequenced at high coverage to the detriment of coverage of the greater number of sparse species. Methods and results: Here we show that, with the same global sequencing effort, separating the species into two or more sub-communities prior to sequencing can yield a much higher proportion of sequences that can be assembled. We first use the Lander-Waterman model to show that, if the expected percentage of singleton sequences is higher than 25%, then, under the uniform distribution hypothesis, splitting the community is always a wise choice. We then construct simulated microbial communities to show that the results hold for highly non-uniform distributions. We also show that, for the distributions considered in the experiments, it is possible to estimate quite accurately the relative diversity of the two sub-communities. Conclusion: Given that several methods exist to split microbial communities based on physical properties such as size, density, surface biochemistry, or optical properties, we strongly suggest that groups involved in environmental sequencing who expect high diversity consider splitting their communities in order to maximize the information content of their sequencing effort.
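To make the coverage argument concrete, the sketch below uses the classical Lander-Waterman expectation that the fraction of singleton reads is roughly exp(-2cσ), where c is per-species coverage and σ is one minus the minimum detectable overlap fraction. The two-species community, the genome sizes, the read counts, and the 50/50 effort split after a hypothetically perfect physical separation are all invented illustrative numbers; this is not the paper's derivation of the 25% rule.

```python
import math

def singleton_fraction(coverage, sigma=0.8):
    """Classical Lander-Waterman expectation: the fraction of reads that overlap
    no other read is exp(-2 * coverage * sigma), where sigma is one minus the
    minimum overlap fraction needed to detect an overlap."""
    return math.exp(-2.0 * coverage * sigma)

def per_species(n_reads, read_len, genome_sizes, abundances, sigma=0.8):
    """Coverage and singleton fraction per species when reads are drawn in
    proportion to each species' share of the community DNA."""
    out = []
    for g, a in zip(genome_sizes, abundances):
        cov = n_reads * a * read_len / g
        out.append((cov, singleton_fraction(cov, sigma)))
    return out

N, L = 200_000, 100                        # hypothetical sequencing effort
genomes = [5e6, 5e6]                       # hypothetical genome sizes (bp)

# Pooled run: one dominant and one sparse species share the same library.
pooled = per_species(N, L, genomes, [0.9, 0.1])

# Perfect physical separation, same global effort split 50/50:
# each sub-community now contains a single species.
split = [per_species(N // 2, L, [g], [1.0])[0] for g in genomes]

for label, stats in (("pooled", pooled), ("split ", split)):
    for i, (cov, frac) in enumerate(stats, 1):
        print(f"{label} species {i}: {cov:4.1f}x coverage, {frac:5.1%} singletons")
```

With these made-up numbers the sparse species is mostly singletons in the pooled run, while after separation both species end up at a coverage where almost every read overlaps another, which is the effect the abstract describes.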
Project description: We propose a computationally and statistically efficient divide-and-conquer (DAC) algorithm to fit sparse Cox regression to massive datasets where the sample size $n_0$ is exceedingly large and the covariate dimension $p$ is not small but $n_0\gg p$. The proposed algorithm achieves computational efficiency through a one-step linear approximation followed by a least squares approximation to the partial likelihood (PL). This sequence of linearizations enables us to maximize the PL using only a small subset of the data and to perform penalized estimation via a fast approximation to the PL. The algorithm is applicable to the analysis of both time-independent and time-dependent survival data. Simulations suggest that the proposed DAC algorithm substantially outperforms the full-sample estimators and the existing DAC algorithm in computational speed, while achieving statistical efficiency similar to that of the full-sample estimators. The proposed algorithm was applied to extraordinarily large survival datasets for the prediction of heart failure-specific readmission within 30 days among Medicare heart failure patients.
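As a rough illustration of the divide-and-conquer idea, here is a minimal numpy sketch of a generic one-step DAC estimator for Cox regression: Newton iterations on a single block provide the initial value, and one pooled Newton step over the block-wise scores and information matrices refines it. It omits the least squares approximation and the sparsity penalty described in the abstract, uses the unpenalized Breslow partial likelihood without tie handling, and all data sizes and effects are hypothetical; it is not the authors' algorithm.

```python
import numpy as np

def cox_score_info(X, time, event, beta):
    """Breslow partial-likelihood score vector and information matrix for one
    block of right-censored survival data with time-independent covariates."""
    risk = np.exp(X @ beta)
    U = np.zeros_like(beta)
    I = np.zeros((len(beta), len(beta)))
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        w, Xr = risk[at_risk], X[at_risk]
        s0 = w.sum()
        xbar = (w @ Xr) / s0
        U += X[i] - xbar
        I += (Xr.T @ (Xr * w[:, None])) / s0 - np.outer(xbar, xbar)
    return U, I

def dac_one_step(blocks, n_newton=5):
    """Divide-and-conquer estimate: Newton iterations on the first block give an
    initial value; one pooled Newton step over all blocks then refines it."""
    X0, t0, d0 = blocks[0]
    beta = np.zeros(X0.shape[1])
    for _ in range(n_newton):                     # fit on a small subset only
        U, I = cox_score_info(X0, t0, d0, beta)
        beta += np.linalg.solve(I, U)
    U_tot = np.zeros_like(beta)
    I_tot = np.zeros((len(beta), len(beta)))
    for X, t, d in blocks:                        # block-wise score/information;
        U, I = cox_score_info(X, t, d, beta)      # blocks could run in parallel
        U_tot += U
        I_tot += I
    return beta + np.linalg.solve(I_tot, U_tot)   # single pooled Newton update

# Simulated example with hypothetical sizes and effects.
rng = np.random.default_rng(0)
true_beta = np.array([0.5, -0.5, 0.0, 0.25, 0.0])

def make_block(n):
    X = rng.normal(size=(n, len(true_beta)))
    t_event = rng.exponential(1.0 / np.exp(X @ true_beta))
    t_cens = rng.exponential(2.0, size=n)
    return X, np.minimum(t_event, t_cens), (t_event <= t_cens).astype(int)

blocks = [make_block(2000) for _ in range(10)]
print("true:     ", true_beta)
print("estimated:", np.round(dac_one_step(blocks), 3))
```

The point of the construction is that the expensive partial-likelihood quantities are only ever computed one block at a time, so the full dataset never needs to be processed jointly.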
Project description: The ability to perform ab initio electronic structure calculations that scale linearly with system size is one of the central aims in theoretical chemistry. In this study, the implementation of the divide-and-conquer (DC) algorithm, an approach with the potential to achieve true linear scaling within Hartree-Fock (HF) theory, is revisited. Standard HF calculations solve the Roothaan-Hall equations for the whole system; in the DC-HF approach, the diagonalization of the Fock matrix is carried out on smaller subsystems. In this work, the DC algorithm for HF calculations was validated on polyglycines, polyalanines, and eleven real three-dimensional proteins of up to 608 atoms. We also found that a fragment-based initial guess using the molecular fractionation with conjugated caps (MFCC) method significantly reduces the number of SCF cycles and is even capable of achieving convergence for some globular proteins where the simple superposition of atomic densities (SAD) initial guess fails.
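The subsystem-diagonalization idea can be sketched on a toy problem. The snippet below applies a generic divide-and-conquer density-matrix assembly (overlapping core-plus-buffer fragments, partition weights, and a common Fermi level found by bisection) to a one-dimensional tight-binding matrix standing in for the Fock matrix, with an orthonormal basis so the overlap matrix is the identity. The fragment sizes, buffer width, and smearing parameter are arbitrary; this illustrates the DC construction in general, not the DC-HF implementation described in the abstract.

```python
import numpy as np

def dc_density(F, cores, n_buffer, n_elec, beta=200.0):
    """Divide-and-conquer assembly of the density matrix: each subsystem
    (core + buffer orbitals) is diagonalized separately and the pieces are
    stitched together with partition weights and a common Fermi level."""
    n = F.shape[0]
    subs = []
    for core in cores:
        lo, hi = max(core[0] - n_buffer, 0), min(core[-1] + n_buffer + 1, n)
        idx = np.arange(lo, hi)
        eps, C = np.linalg.eigh(F[np.ix_(idx, idx)])     # small diagonalization
        in_core = np.isin(idx, core).astype(float)
        # Partition weights: 1 for core-core pairs, 1/2 for core-buffer pairs.
        P = 0.5 * (in_core[:, None] + in_core[None, :])
        subs.append((idx, eps, C, P))

    def density(mu):
        D = np.zeros_like(F)
        for idx, eps, C, P in subs:
            x = np.clip(beta * (eps - mu), -50.0, 50.0)
            occ = 2.0 / (1.0 + np.exp(x))                # Fermi occupations
            D[np.ix_(idx, idx)] += P * ((C * occ) @ C.T)
        return D

    mu_lo, mu_hi = -10.0, 10.0
    for _ in range(60):                                  # bisect the Fermi level
        mu = 0.5 * (mu_lo + mu_hi)
        if np.trace(density(mu)) < n_elec:
            mu_lo = mu
        else:
            mu_hi = mu
    return density(mu)

# Toy stand-in for a Fock matrix: a gapped 1-D tight-binding chain in an
# orthonormal basis (so the overlap matrix is the identity).
n = 60
F = np.diag(np.tile([1.0, -1.0], n // 2)) - np.eye(n, k=1) - np.eye(n, k=-1)
cores = np.arange(n).reshape(6, 10)                      # six disjoint core regions
D_dc = dc_density(F, cores, n_buffer=8, n_elec=n)

# Reference: exact density matrix from diagonalizing the full matrix.
eps, C = np.linalg.eigh(F)
D_ref = 2.0 * C[:, :n // 2] @ C[:, :n // 2].T
print("max |D_dc - D_ref| =", round(float(np.abs(D_dc - D_ref).max()), 4))
```

For this gapped toy system the assembled density matrix closely agrees with the one obtained by diagonalizing the full matrix, which is the locality that the DC approximation relies on.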
Project description: The aggressive peripheral T-cell lymphomas (PTCLs) are a heterogeneous group of uncommon lymphomas of mature T lymphocytes dominated by 3 subtypes: systemic anaplastic large-cell lymphoma, both anaplastic lymphoma kinase positive and negative; nodal PTCL with T-follicular helper phenotype; and PTCL, not otherwise specified. Although the accurate diagnosis of T-cell lymphoma and the subtyping of these lymphomas may be challenging, there is growing evidence that knowledge of the subtype of disease can aid in prognostication and in the selection of optimal treatments, in both the front-line and the relapsed or refractory settings. This report focuses on the 3 most common subtypes of aggressive PTCL, examining how current knowledge may dictate choices of therapy and consultative referrals and inform rational targets and correlative studies in the development of future clinical trials. Finally, I note that clinical-pathologic correlation, especially in cases of T-cell lymphomas that may present with an extranodal component, is essential to the accurate diagnosis and subsequent treatment of our patients.
Project description: Many trait measurements are size-dependent, and while we often divide these traits by size before fitting statistical models to control for the effect of size, this approach does not account for allometry and the intermediate outcome problem. We describe these problems and outline potential solutions.
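One commonly used alternative to dividing by size is to model the allometric relationship directly, for example by regressing log(trait) on log(size) plus the covariates of interest. The sketch below, with made-up data and an arbitrary scaling exponent, shows how a size ratio can manufacture an apparent group difference when groups differ in body size, while the log-log regression recovers the true null group effect; it illustrates only the allometry issue, not the intermediate outcome problem, and is not necessarily the solution the authors propose.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
group = rng.integers(0, 2, n)                    # two hypothetical groups
# Groups differ in body size but NOT in relative trait investment.
size = rng.lognormal(mean=3.0 + 0.5 * group, sigma=0.3, size=n)
b_true = 0.75                                    # allometric scaling exponent
trait = np.exp(0.2) * size ** b_true * np.exp(rng.normal(0.0, 0.1, n))

# Naive approach: divide by size and compare group means of the ratio.
ratio = trait / size
print("ratio means:", ratio[group == 0].mean().round(3),
      ratio[group == 1].mean().round(3))

# Allometric approach: regress log(trait) on log(size) and group membership.
X = np.column_stack([np.ones(n), np.log(size), group])
coef, *_ = np.linalg.lstsq(X, np.log(trait), rcond=None)
print("intercept, exponent, group effect:", np.round(coef, 3))
```

Because the trait scales with size to a power below one, the ratio systematically shrinks in the larger-bodied group even though the underlying group effect is zero, which the regression correctly reports.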
Project description: Motivation: Next-generation sequencing (NGS) provides a great opportunity to investigate genome-wide variation at nucleotide resolution. Due to the huge amount of data, NGS applications require very fast and accurate alignment algorithms. Most existing read-mapping algorithms adopt a seed-and-extend strategy, which is sequential in nature and takes much longer on longer reads. Results: We develop a divide-and-conquer algorithm, called Kart, which can process long reads as fast as short reads by dividing a read into small fragments that can be aligned independently. Our experimental results indicate that the average size of fragments requiring the more time-consuming gapped alignment is around 20 bp regardless of the original read length. Furthermore, Kart can tolerate much higher error rates. The experiments show that Kart spends much less time on longer reads than other aligners and still produces reliable alignments even when the error rate is as high as 15%. Availability and implementation: Kart is available at https://github.com/hsinnan75/Kart/. Contact: hsu@iis.sinica.edu.tw. Supplementary information: Supplementary data are available at Bioinformatics online.
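The divide-and-conquer step can be caricatured as follows: anchor exactly matching fragments of the read against an indexed reference and leave only the short unanchored stretches for gapped alignment. The toy sketch below uses a plain k-mer dictionary and naive exact-match extension on a made-up reference and read; it is not Kart's actual algorithm or data structures, only an illustration of why the pieces that still need gapped alignment stay short.

```python
import random
from collections import defaultdict

def index_reference(ref, k=12):
    """Toy k-mer index of the reference: k-mer -> list of positions."""
    idx = defaultdict(list)
    for i in range(len(ref) - k + 1):
        idx[ref[i:i + k]].append(i)
    return idx

def split_and_anchor(read, ref, idx, k=12):
    """Divide the read into exactly matching fragments (seeded by k-mer hits
    and extended while the bases agree); whatever is left between anchors is
    the short material that would still need gapped alignment."""
    anchors, i = [], 0
    while i <= len(read) - k:
        hits = idx.get(read[i:i + k])
        if not hits:
            i += 1
            continue
        j = hits[0]                        # toy choice: take the first hit
        length = k
        while (i + length < len(read) and j + length < len(ref)
               and read[i + length] == ref[j + length]):
            length += 1                    # extend the exact match
        anchors.append((i, j, length))
        i += length
    return anchors

random.seed(0)
ref = "".join(random.choice("ACGT") for _ in range(2000))
read = ref[500:580] + "NN" + ref[582:700]  # a long read with one small difference
for q, r, m in split_and_anchor(read, ref, index_reference(ref)):
    print(f"read[{q}:{q + m}] aligns exactly to ref[{r}:{r + m}]")
```

In this toy example the read decomposes into two long, exactly matching fragments, and only the two-base discrepancy between them would need the slower gapped treatment.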
Project description: Identifying the determinants of cumulative cultural evolution is a key issue in the interdisciplinary field of cultural evolution. A widely held view is that large and well-connected social networks facilitate cumulative cultural evolution because they promote the spread of useful cultural traits and prevent the loss of cultural knowledge through factors such as drift. This view stems from models that focus on the transmission of cultural information without considering how new cultural traits actually arise. In this paper, we review literature from various fields suggesting that, under some circumstances, increased connectedness can decrease cultural diversity and reduce innovation rates. Incorporating this idea into an agent-based model, we explore the effect of population fragmentation on cumulative culture and show that, for a given population size, there exists an intermediate level of population fragmentation that maximizes the rate of cumulative cultural evolution. This result is explained by the fact that fully connected, non-fragmented populations are able to maintain complex cultural traits but produce insufficient variation, and so lack the cultural diversity required to produce highly complex cultural traits. Conversely, highly fragmented populations produce a variety of cultural traits but cannot maintain complex ones. In populations with intermediate levels of fragmentation, cultural loss and cultural diversity are balanced in a way that maximizes cultural complexity. Our results suggest that population structure needs to be taken into account when investigating the relationship between demography and cumulative culture. This article is part of the theme issue 'Bridging cultural gaps: interdisciplinary studies in human cultural evolution'.
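A bare-bones skeleton of such an island model is sketched below: agents carry a single cumulative skill level, learn from island-mates with a success probability that decays with complexity, occasionally innovate, are sometimes replaced by naive newcomers, and migrate between fragments. The parameters are arbitrary and the single skill dimension is a drastic simplification of the trait diversity central to the paper's argument, so the sketch illustrates the ingredients of the model rather than reproducing the paper's model or results.

```python
import numpy as np

def simulate(n_agents=60, n_islands=4, steps=600, innovate=0.005,
             learn_base=0.9, migrate=0.01, turnover=0.02, seed=0):
    """Toy island model: agents hold one cumulative skill level, learn the
    next level from more skilled island-mates, occasionally innovate, are
    replaced by naive newcomers, and occasionally switch islands."""
    rng = np.random.default_rng(seed)
    level = np.zeros(n_agents, dtype=int)
    island = rng.integers(0, n_islands, n_agents)
    for _ in range(steps):
        for a in range(n_agents):
            mates = np.flatnonzero(island == island[a])
            model = rng.choice(mates)
            p_learn = learn_base ** (level[a] + 1)    # harder at higher levels
            if level[model] > level[a] and rng.random() < p_learn:
                level[a] += 1                         # social learning
            if rng.random() < innovate:
                level[a] += 1                         # independent innovation
        level[rng.random(n_agents) < turnover] = 0    # naive replacement (loss)
        movers = rng.random(n_agents) < migrate
        island[movers] = rng.integers(0, n_islands, movers.sum())
    return level.mean(), level.max()

for frag in (1, 2, 4, 8):
    mean_skill, max_skill = simulate(n_islands=frag)
    print(f"{frag} island(s): mean skill {mean_skill:.1f}, max skill {max_skill}")
```

Varying the number of islands while holding total population size fixed is the kind of sweep that would be needed to probe how fragmentation trades off trait maintenance against diversity.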
Project description: Motivation: Surface generation and visualization are among the most important tasks in biomolecular modeling and computation. The Eulerian solvent excluded surface (ESES) software provides the analytical solvent excluded surface (SES) on a Cartesian grid, which is necessary for simulating many biomolecular electrostatic and ion channel models. However, large biomolecules and/or fine grid resolutions give rise to excessively large memory requirements in ESES construction. We introduce an out-of-core and parallel algorithm to improve the ESES software. Results: The present approach drastically improves the spatial and temporal efficiency of ESES. The memory footprint and time complexity are analyzed and empirically verified through extensive tests with a large collection of biomolecule examples. Our results show that our algorithm can successfully reduce the memory footprint through a straightforward divide-and-conquer strategy, allowing the calculation of arbitrarily large proteins on a typical commodity personal computer. On multi-core computers or clusters, our algorithm can reduce the execution time by parallelizing most of the calculation as disjoint subproblems. Various comparisons with state-of-the-art Cartesian grid-based SES calculations were made to validate the present method and demonstrate its improved efficiency. This approach makes ESES a robust software package for the construction of analytical solvent excluded surfaces. Availability and implementation: http://weilab.math.msu.edu/ESES.
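The out-of-core strategy can be illustrated independently of the SES mathematics: split the Cartesian grid into slabs, treat each slab as a disjoint subproblem, and stream the results into an on-disk array so the full grid never has to fit in memory. In the sketch below the per-slab computation is a simple distance-to-nearest-atom-surface stand-in rather than the actual analytical SES construction, and the atom positions, grid size, and file name are hypothetical.

```python
import numpy as np

def grid_slabs(shape, slab=16):
    """Split the z-axis of a 3-D grid into slabs so that only one slab's worth
    of values ever has to be held in memory at a time."""
    for z0 in range(0, shape[2], slab):
        yield z0, min(z0 + slab, shape[2])

def slab_values(atoms, radii, origin, h, shape, z0, z1):
    """Distance from every grid point in the slab to the nearest atomic sphere
    surface (a simple stand-in for the real analytical SES construction)."""
    x = origin[0] + h * np.arange(shape[0])
    y = origin[1] + h * np.arange(shape[1])
    z = origin[2] + h * np.arange(z0, z1)
    X, Y, Z = np.meshgrid(x, y, z, indexing="ij")
    d = np.full(X.shape, np.inf)
    for (ax, ay, az), r in zip(atoms, radii):
        d = np.minimum(d, np.sqrt((X - ax)**2 + (Y - ay)**2 + (Z - az)**2) - r)
    return d

# Hypothetical input: a handful of "atoms" and a grid written straight to disk.
rng = np.random.default_rng(0)
atoms = rng.uniform(5.0, 25.0, size=(50, 3))
radii = rng.uniform(1.2, 2.0, size=50)
shape, h, origin = (128, 128, 128), 0.25, (0.0, 0.0, 0.0)

out = np.memmap("surface_grid.dat", dtype=np.float32, mode="w+", shape=shape)
for z0, z1 in grid_slabs(shape):
    # Each slab is a disjoint subproblem; on a multi-core machine or cluster
    # these iterations could be handed to separate worker processes.
    out[:, :, z0:z1] = slab_values(atoms, radii, origin, h, shape, z0, z1)
out.flush()
print("grid written to disk:", out.shape, out.dtype)
```

The peak memory use here is set by a single slab rather than the whole grid, which is the property that lets the divide-and-conquer version handle arbitrarily large systems on a commodity machine.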
Project description: Among proteins, orthologs are defined as those derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching, which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a "divide and conquer" approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at the species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation and makes ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes, containing more than 4 million coding sequences, in 5 weeks and predicted more than 700 million ortholog pairs, clustered into 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and, in so doing, have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
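The pan-genome-as-proxy idea can be sketched with toy data: cluster each species' coding sequences into gene families, keep one representative per family, and compare only representatives at the next level of the tree, so the number of pairwise comparisons stops scaling with the number of genomes. In the snippet below a crude character-matching ratio stands in for BLAST, the sequences and cutoff are invented, and the two-level hierarchy is a stand-in for climbing a full taxonomy; it is not the xBASE-Orth pipeline.

```python
from difflib import SequenceMatcher

def similar(a, b, cutoff=0.8):
    """Crude stand-in for a BLAST comparison: fraction of matching characters."""
    return SequenceMatcher(None, a, b).ratio() >= cutoff

def greedy_cluster(seqs, cutoff=0.8):
    """Greedily group sequences into families; the first member of each family
    becomes the representative carried up to the next taxonomic level."""
    clusters = []                       # each entry: [rep_name, rep_seq, members]
    for name, seq in seqs:
        for cluster in clusters:
            if similar(seq, cluster[1], cutoff):
                cluster[2].append(name)
                break
        else:
            clusters.append([name, seq, [name]])
    return clusters

# Invented toy data: two "species", a few short protein sequences each.
species_a = [("A_geneX1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
             ("A_geneX2", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEAQ"),
             ("A_geneY1", "MSLNNLVVIGLDGSSPQARAFAQWAEKYATQLH")]
species_b = [("B_geneX1", "MKTAYIAKQRQISFVKSHFSRQLDERLGLIEVQ"),
             ("B_geneZ1", "MADEQKLRDYLKRVTAELHETRQRLREAESGSR")]

# Level 1: build a pan-genome (one representative per gene family) per species.
pan_a = greedy_cluster(species_a)
pan_b = greedy_cluster(species_b)

# Level 2: compare only the representatives across species, not every gene pair.
for rep_a, seq_a, members_a in pan_a:
    for rep_b, seq_b, members_b in pan_b:
        if similar(seq_a, seq_b):
            print("putative orthologous group:", members_a + members_b)
```

Because only one representative per family is passed upward, adding another closely related genome enlarges the member lists but barely increases the number of cross-level comparisons, which is the trade of accuracy for speed the abstract describes.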