Project description:This paper considers the analysis of continuous time gradient-based optimization algorithms through the lens of nonlinear contraction theory. It demonstrates that in the case of a time-invariant objective, most elementary results on gradient descent based on convexity can be replaced by much more general results based on contraction. In particular, gradient descent converges to a unique equilibrium if its dynamics are contracting in any metric, with convexity of the cost corresponding to the special case of contraction in the identity metric. More broadly, contraction analysis provides new insights for the case of geodesically-convex optimization, wherein non-convex problems in Euclidean space can be transformed to convex ones posed over a Riemannian manifold. In this case, natural gradient descent converges to a unique equilibrium if it is contracting in any metric, with geodesic convexity of the cost corresponding to contraction in the natural metric. New results using semi-contraction provide additional insights into the topology of the set of optimizers in the case when multiple optima exist. Furthermore, they show how semi-contraction may be combined with specific additional information to reach broad conclusions about a dynamical system. The contraction perspective also easily extends to time-varying optimization settings and allows one to recursively build large optimization structures out of simpler elements. Extensions to natural primal-dual optimization and game-theoretic contexts further illustrate the potential reach of these new perspectives.
Project description:This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest x♮∈ℝn from m quadratic equations/samples yi=(ai⊤x♮)2,1≤i≤m . This problem, also dubbed as phase retrieval, spans multiple domains including physical sciences and machine learning. We investigate the efficacy of gradient descent (or Wirtinger flow) designed for the nonconvex least squares problem. We prove that under Gaussian designs, gradient descent - when randomly initialized - yields an ϵ-accurate solution in O(log n + log(1/ϵ)) iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee concerning vanilla gradient descent for phase retrieval, without the need of (i) carefully-designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of these are achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data.
Project description:We show analytically that training a neural network by conditioned stochastic mutation or neuroevolution of its weights is equivalent, in the limit of small mutations, to gradient descent on the loss function in the presence of Gaussian white noise. Averaged over independent realizations of the learning process, neuroevolution is equivalent to gradient descent on the loss function. We use numerical simulation to show that this correspondence can be observed for finite mutations, for shallow and deep neural networks. Our results provide a connection between two families of neural-network training methods that are usually considered to be fundamentally different.
Project description:Recent research in machine learning pointed to the core problem of state-of-the-art models which impedes their widespread adoption in different domains. The models' inability to differentiate between noise and subtle, yet significant variation in data leads to their vulnerability to adversarial perturbations that cause wrong predictions with high confidence. The study is aimed at identifying whether the algorithms inspired by biological evolution may achieve better results in cases where brittle robustness properties are highly sensitive to the slight noise. To answer this question, we introduce the new robust gradient descent inspired by the stability and adaptability of biological systems to unknown and changing environments. The proposed optimization technique involves an open-ended adaptation process with regard to two hyperparameters inherited from the generalized Verhulst population growth equation. The hyperparameters increase robustness to adversarial noise by penalizing the degree to which hardly visible changes in gradients impact prediction. The empirical evidence on synthetic and experimental datasets confirmed the viability of the bio-inspired gradient descent and suggested promising directions for future research. The code used for computational experiments is provided in a repository at https://github.com/yukinoi/bio_gradient_descent.
Project description:BackgroundGene Regulatory Networks (GRNs) have become a major focus of interest in recent years. Elucidating the architecture and dynamics of large scale gene regulatory networks is an important goal in systems biology. The knowledge of the gene regulatory networks further gives insights about gene regulatory pathways. This information leads to many potential applications in medicine and molecular biology, examples of which are identification of metabolic pathways, complex genetic diseases, drug discovery and toxicology analysis. High-throughput technologies allow studying various aspects of gene regulatory networks on a genome-wide scale and we will discuss recent advances as well as limitations and future challenges for gene network modeling. Novel approaches are needed to both infer the causal genes and generate hypothesis on the underlying regulatory mechanisms.MethodologyIn the present article, we introduce a new method for identifying a set of optimal gene regulatory pathways by using structural equations as a tool for modeling gene regulatory networks. The method, first of all, generates data on reaction flows in a pathway. A set of constraints is formulated incorporating weighting coefficients. Finally the gene regulatory pathways are obtained through optimization of an objective function with respect to these weighting coefficients. The effectiveness of the present method is successfully tested on ten gene regulatory networks existing in the literature. A comparative study with the existing extreme pathway analysis also forms a part of this investigation. The results compare favorably with earlier experimental results. The validated pathways point to a combination of previously documented and novel findings.ConclusionsWe show that our method can correctly identify the causal genes and effectively output experimentally verified pathways. The present method has been successful in deriving the optimal regulatory pathways for all the regulatory networks considered. The biological significance and applicability of the optimal pathways has also been discussed. Finally the usefulness of the present method on genetic engineering is depicted with an example.
Project description:Stein variational gradient descent (SVGD) is a particle-based inference algorithm that leverages gradient information for efficient approximate inference. In this work, we enhance SVGD by leveraging preconditioning matrices, such as the Hessian and Fisher information matrix, to incorporate geometric information into SVGD updates. We achieve this by presenting a generalization of SVGD that replaces the scalar-valued kernels in vanilla SVGD with more general matrix-valued kernels. This yields a significant extension of SVGD, and more importantly, allows us to flexibly incorporate various preconditioning matrices to accelerate the exploration in the probability landscape. Empirical results show that our method outperforms vanilla SVGD and a variety of baseline approaches over a range of real-world Bayesian inference tasks.
Project description:Overparametrized deep networks predict well, despite the lack of an explicit complexity control during training, such as an explicit regularization term. For exponential-type loss functions, we solve this puzzle by showing an effective regularization effect of gradient descent in terms of the normalized weights that are relevant for classification.
Project description:MotivationThe increasing availability of complete genomes demands for models to study genomic variability within entire populations. Pangenome graphs capture the full genomic similarity and diversity between multiple genomes. In order to understand them, we need to see them. For visualization, we need a human readable graph layout: A graph embedding in low (e.g. two) dimensional depictions. Due to a pangenome graph's potential excessive size, this is a significant challenge.ResultsIn response, we introduce a novel graph layout algorithm: the Path-Guided Stochastic Gradient Descent (PG-SGD). PG-SGD uses the genomes, represented in the pangenome graph as paths, as an embedded positional system to sample genomic distances between pairs of nodes. This avoids the quadratic cost seen in previous versions of graph drawing by Stochastic Gradient Descent (SGD). We show that our implementation efficiently computes the low dimensional layouts of gigabase-scale pangenome graphs, unveiling their biological features.AvailabilityWe integrated PG-SGD in ODGI which is released as free software under the MIT open source license. Source code is available at https://github.com/pangenome/odgi.
Project description:Photoacoustic microscopy uses multiple wavelengths to measure concentrations of different absorbers. The speed of sound limits the shortest wavelength switching time to sub-microseconds, which is a bottleneck for high-speed broad-spectrum imaging. Via computational separation of overlapped signals, we can break the sound-speed limit on the wavelength switching time. This paper presents a new signal unmixing algorithm named two-step proximal gradient descent. It is advantageous in separating multiple wavelengths with long overlapping and high noise. In the simulation, we can unmix up to nine overlapped signals and successfully separate three overlapped signals with 12-ns delay and 15.9-dB signal-to-noise ratio. We apply this technique to separate three-wavelength photoacoustic images in microvessels. In vivo results show that the algorithm can successfully unmix overlapped multi-wavelength photoacoustic signals, and the unmixed data can improve accuracy in oxygen saturation imaging.