Dataset Information

Probabilistically sampled and spectrally clustered plant species using phenotypic characteristics.

ABSTRACT: The phenotypic characteristics of a plant species are its physical properties, as cataloged by plant biologists at research centers around the world. Clustering species by their phenotypic characteristics yields diverse sets of parents that are useful in breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard for clustering phenotypic data, but it suffers from low accuracy and high computational complexity. To address the accuracy challenge, we propose using the Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically Pivotal Sampling, which is probability based. Since the application of sampling to phenotypic data has not been explored much, we also adapt another sampling technique, Vector Quantization (VQ), to this data for an effective comparison. VQ has recently generated promising results for genotypic data. The novelty of our SC with Pivotal Sampling algorithm lies in constructing the crucial similarity matrix for the clustering algorithm and in defining the probabilities for the sampling technique. Although our algorithm can be applied to any plant species, we tested it on phenotypic data obtained from about 2,400 Soybean species. SC with Pivotal Sampling achieves substantially higher accuracy (in terms of Silhouette Values) than the other competing clustering-with-sampling algorithms proposed here (i.e., SC with VQ, HC with Pivotal Sampling, and HC with VQ). The complexities of our SC with Pivotal Sampling algorithm and of these three variants are almost the same because all of them involve sampling. In addition, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We show experimentally that it is up to 45% more accurate than HC in terms of clustering accuracy, and that its computational complexity is more than an order of magnitude lower than that of HC.
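As an illustration of the sampling step named in the abstract, the following is a minimal sketch of sequential pivotal sampling in plain Python. It is not the authors' implementation: the function name `pivotal_sample`, the tolerance `eps`, and the equal inclusion probabilities in the usage lines are assumptions made for this example, and the paper's own definition of the probabilities is not given on this page. The sketch only shows the generic technique, in which pairs of units "duel" until every inclusion probability has been resolved to 0 or 1.

```python
import random


def pivotal_sample(probs, rng=None, eps=1e-9):
    """Sequential pivotal sampling (illustrative sketch, hypothetical helper).

    probs: inclusion probabilities in [0, 1]; if they sum to an integer n,
    the returned sample contains exactly n indices.
    """
    rng = rng or random.Random()
    p = list(probs)
    pivot = None  # index of the unit whose probability is still fractional
    for j in range(len(p)):
        # Units that are already decided never enter a duel.
        if p[j] <= eps:
            p[j] = 0.0
            continue
        if p[j] >= 1.0 - eps:
            p[j] = 1.0
            continue
        if pivot is None:
            pivot = j
            continue
        a, b = p[pivot], p[j]
        s = a + b
        if s < 1.0:
            # One unit is rejected; the other carries the combined mass.
            if rng.random() < b / s:
                p[pivot], p[j] = 0.0, s
                pivot = j
            else:
                p[pivot], p[j] = s, 0.0
        else:
            # One unit is selected; the other keeps the remainder s - 1.
            if rng.random() < (1.0 - b) / (2.0 - s):
                p[pivot], p[j] = 1.0, s - 1.0
                pivot = j
            else:
                p[pivot], p[j] = s - 1.0, 1.0
        # The surviving unit may itself have reached 0 or 1.
        if p[pivot] <= eps:
            p[pivot] = 0.0
            pivot = None
        elif p[pivot] >= 1.0 - eps:
            p[pivot] = 1.0
            pivot = None
    # If the probabilities did not sum to an integer, settle the leftover unit.
    if pivot is not None:
        p[pivot] = 1.0 if rng.random() < p[pivot] else 0.0
    return [k for k, v in enumerate(p) if v == 1.0]


# Usage: draw 2 of 8 units with equal inclusion probabilities (an assumption).
sample = pivotal_sample([0.25] * 8, rng=random.Random(0))
```

Each duel preserves the expected inclusion probability of both units, so a unit with probability 0.25 still ends up in the sample a quarter of the time. In a pipeline like the one the abstract describes, the selected indices would then be passed to the clustering step.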

SUBMITTER: Shastri AA

PROVIDER: S-EPMC8432307 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

Project description: Genetically modified (GM) crops are one of the most valuable tools of modern biotechnology, securing the yield potential needed to sustain global agricultural demand for food, feed, fiber, and energy. Crossing single GM events through conventional breeding has proven to be an effective way to pyramid GM traits from individual events and to increase yield protection in the resulting combined products. Even though years of research and commercialization of GM crops show that these organisms are safe and raise no additional biosafety concerns, some regulatory agencies still require risk assessments for these products. We set out to investigate whether stacking single GM events would have a significant impact on agronomic and phenotypic plant characteristics in soybean, maize, and cotton. Several replicated field trials designed as randomized complete blocks were conducted by the Monsanto Regulatory Department from 2008 to 2017 at field sites representative of cultivation regions in Brazil. In total, twenty-one single and stacked GM materials currently approved for in-country commercial use were grown with the corresponding conventional counterparts and commercially available GM/non-GM references. The generated data were presented over the years to the Brazilian regulatory agency CTNBio (National Biosafety Technical Committee) to request regulatory approvals for the single and stacked products, in compliance with the existing regulatory norms. Data were submitted to analysis of variance, and differences between GM and control materials were assessed using a t-test at a 5% significance level. The data indicated a predominance of similarities, and only negligible differences, between single and stacked GM crops when compared to the conventional counterpart. Our results support the conclusion that combining GM events through conventional breeding does not alter agronomic or phenotypic plant characteristics in these stacked crops. This is compatible with a growing weight of evidence indicating that this long-adopted strategy does not increase the risks associated with GM materials. It also provides evidence to support the review and modernization of the existing regulatory norms so that they no longer require additional risk assessments of GM stacks composed of previously approved single events for biotechnology-derived crops. The data analyzed confirm that the risk assessment of the individual events is sufficient to demonstrate the safety of the stacked products, which deliver significant benefits to growers and to the environment.

