Dataset Information

A Manifold Learning Perspective on Representation Learning: Learning Decoder and Representations without an Encoder.

ABSTRACT: Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward method to map n-dimensional data in input space to a lower m-dimensional representation space and back. The decoder itself defines an m-dimensional manifold in input space. Inspired by manifold learning, we showed that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and similarly for other loss functions. We derived expressions for the number of samples needed to specify the encoder and decoder and showed that the decoder generally requires much fewer training samples to be well-specified compared to the encoder. We discuss the training of autoencoders in this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrated that the decoder is much better suited to learn a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further showed that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented will also contribute to an improved conceptual understanding of representation learning.
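The decoder-only training described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear decoder without bias, a one-dimensional representation (m = 1), two-dimensional inputs (n = 2), and noise-free toy data lying on a line. The function names `decoder_loss` and `train_decoder_only`, the learning rate, and the step count are hypothetical choices made for illustration. Each training sample gets its own representation `z[i]`, which is updated by gradient descent together with the decoder weights `w` to minimize the sum-of-squares loss.

```python
def decoder_loss(z, w, data):
    """Sum-of-squares distance between each sample and its decoded point z_i * w."""
    return sum(
        (z_i * w_j - x_ij) ** 2
        for z_i, x_i in zip(z, data)
        for w_j, x_ij in zip(w, x_i)
    )


def train_decoder_only(data, w_init, lr=0.02, steps=2000):
    """Jointly learn one scalar representation per sample and the decoder weights.

    Assumption: the decoder is linear, x_hat_i = z_i * w, so it defines a
    one-dimensional manifold (a line through the origin) in input space.
    """
    n_samples = len(data)
    z = [0.0] * n_samples  # learned representations, one per training sample
    w = list(w_init)       # decoder weights
    for _ in range(steps):
        # Residuals between decoded points and training samples.
        residual = [
            [z_i * w_j - x_ij for w_j, x_ij in zip(w, x_i)]
            for z_i, x_i in zip(z, data)
        ]
        # Gradients of the sum-of-squares loss with respect to z and w.
        grad_z = [2.0 * sum(r * w_j for r, w_j in zip(r_i, w)) for r_i in residual]
        grad_w = [
            2.0 * sum(residual[i][j] * z[i] for i in range(n_samples))
            for j in range(len(w))
        ]
        # Plain gradient descent step on both sets of parameters.
        z = [z_i - lr * g for z_i, g in zip(z, grad_z)]
        w = [w_j - lr * g for w_j, g in zip(w, grad_w)]
    return z, w


# Toy data: five points on the line spanned by (0.6, 0.8).
data = [[0.6 * t, 0.8 * t] for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
initial_loss = decoder_loss([0.0] * len(data), [1.0, 0.5], data)
z, w = train_decoder_only(data, w_init=[1.0, 0.5])
final_loss = decoder_loss(z, w, data)
print(initial_loss, final_loss, w[1] / w[0])
```

In this toy setting the learned line aligns with the direction of the data, so the ratio `w[1] / w[0]` approaches 0.8 / 0.6. No encoder is involved: a representation for an unseen sample would have to be found by the same kind of optimization with `w` held fixed.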

SUBMITTER: Schuster V

PROVIDER: S-EPMC8625121 | biostudies-literature

REPOSITORIES: biostudies-literature

Similar Datasets

DxFormer: a decoupled automatic diagnostic system based on decoder-encoder transformer with dense symptom representations.

Project description: Motivation: A symptom-based automatic diagnostic system queries the patient's potential symptoms through continuous interaction with the patient and makes predictions about possible diseases. A few studies use reinforcement learning (RL) to learn the optimal policy from the joint action space of symptoms and diseases. However, existing RL (or non-RL) methods focus on disease diagnosis while ignoring the importance of symptom inquiry. Although these systems have achieved considerable diagnostic accuracy, they are still far below their performance upper bound due to few turns of interaction with patients and insufficient performance of symptom inquiry. To address this problem, we propose a new automatic diagnostic framework called DxFormer, which decouples symptom inquiry and disease diagnosis, so that these two modules can be independently optimized. The transition from symptom inquiry to disease diagnosis is parametrically determined by the stopping criteria. In DxFormer, we treat each symptom as a token, and formalize the symptom inquiry and disease diagnosis to a language generation model and a sequence classification model, respectively. We use the inverted version of Transformer, i.e. the decoder-encoder structure, to learn the representation of symptoms by jointly optimizing the reinforce reward and cross-entropy loss. Results: We conduct experiments on three real-world medical dialogue datasets, and the experimental results verify the feasibility of increasing diagnostic accuracy by improving symptom recall. Our model overcomes the shortcomings of previous RL-based methods. By decoupling symptom query from the process of diagnosis, DxFormer greatly improves the symptom recall and achieves the state-of-the-art diagnostic accuracy. Availability and implementation: Both code and data are available at https://github.com/lemuria-wchen/DxFormer. Supplementary information: Supplementary data are available at Bioinformatics online.

| S-EPMC9825744 | biostudies-literature

Investigation of chemical structure recognition by encoder-decoder models in learning progress.

Project description:Descriptor generation methods using latent representations of encoder-decoder (ED) models with SMILES as input are useful because of the continuity of descriptor and restorability to the structure. However, it is not clear how the structure is recognized in the learning progress of ED models. In this work, we created ED models of various learning progress and investigated the relationship between structural information and learning progress. We showed that compound substructures were learned early in ED models by monitoring the accuracy of downstream tasks and input-output substructure similarity using substructure-based descriptors, which suggests that existing evaluation methods based on the accuracy of downstream tasks may not be sensitive enough to evaluate the performance of ED models with SMILES as descriptor generation methods. On the other hand, we showed that structure restoration was time-consuming, and in particular, insufficient learning led to the estimation of a larger structure than the actual one. It can be inferred that determining the endpoint of the structure is a difficult task for the model. To our knowledge, this is the first study to link the learning progress of SMILES by ED model to chemical structures for a wide range of chemicals.

| S-EPMC10100163 | biostudies-literature

Learning Semantic Graphics Using Convolutional Encoder-Decoder Network for Autonomous Weeding in Paddy.

Project description: Weeds in agricultural farms are aggressive growers which compete for nutrition and other resources with the crop and reduce production. The increasing use of chemicals to control them has inadvertent consequences for human health and the environment. In this work, a novel neural network training method combining semantic graphics for data annotation and an advanced encoder-decoder network for (a) automatic crop line detection and (b) weed (wild millet) detection in paddy fields is proposed. The detected crop lines act as a guiding line for an autonomous weeding robot for inter-row weeding, whereas the detection of weeds enables autonomous intra-row weeding. The proposed data annotation method, semantic graphics, is intuitive, and the desired targets can be annotated easily with minimal labor. Also, the proposed "extended skip network" is an improved deep convolutional encoder-decoder neural network for efficient learning of semantic graphics. Quantitative evaluations of the proposed method demonstrated an increment of 6.29% and 6.14% in mean intersection over union (mIoU) over the baseline network on the tasks of paddy line detection and wild millet detection, respectively. The proposed method also leads to a 3.56% increment in mIoU and a significantly higher recall compared to a popular bounding box-based object detection approach on the task of wild millet detection.

| S-EPMC6837080 | biostudies-literature

Encoder-decoder optimization for brain-computer interfaces.

Project description: Neuroprosthetic brain-computer interfaces are systems that decode neural activity into useful control signals for effectors, such as a cursor on a computer screen. It has long been recognized that both the user and decoding system can adapt to increase the accuracy of the end effector. Co-adaptation is the process whereby a user learns to control the system in conjunction with the decoder adapting to learn the user's neural patterns. We provide a mathematical framework for co-adaptation and relate co-adaptation to the joint optimization of the user's control scheme ("encoding model") and the decoding algorithm's parameters. When the assumptions of that framework are respected, co-adaptation cannot yield better performance than that obtainable by an optimal initial choice of fixed decoder, coupled with optimal user learning. For a specific case, we provide numerical methods to obtain such an optimized decoder. We demonstrate our approach in a model brain-computer interface system using an online prosthesis simulator, a simple human-in-the-loop psychophysics setup which provides a non-invasive simulation of the BCI setting. These experiments support two claims: that users can learn encoders matched to fixed, optimal decoders and that, once learned, our approach yields expected performance advantages.

| S-EPMC4451011 | biostudies-literature

Neuron segmentation using 3D wavelet integrated encoder-decoder network.

Project description: Motivation: 3D neuron segmentation is a key step for the neuron digital reconstruction, which is essential for exploring brain circuits and understanding brain functions. However, the fine line-shaped nerve fibers of neuron could spread in a large region, which brings great computational cost to the neuron segmentation. Meanwhile, the strong noises and disconnected nerve fibers bring great challenges to the task. Results: In this article, we propose a 3D wavelet and deep learning-based 3D neuron segmentation method. The neuronal image is first partitioned into neuronal cubes to simplify the segmentation task. Then, we design 3D WaveUNet, the first 3D wavelet integrated encoder-decoder network, to segment the nerve fibers in the cubes; the wavelets could assist the deep networks in suppressing data noises and connecting the broken fibers. We also produce a Neuronal Cube Dataset (NeuCuDa) using the biggest available annotated neuronal image dataset, BigNeuron, to train 3D WaveUNet. Finally, the nerve fibers segmented in cubes are assembled to generate the complete neuron, which is digitally reconstructed using an available automatic tracing algorithm. The experimental results show that our neuron segmentation method could completely extract the target neuron in noisy neuronal images. The integrated 3D wavelets can efficiently improve the performance of 3D neuron segmentation and reconstruction. Availability and implementation: The data and codes for this work are available at https://github.com/LiQiufu/3D-WaveUNet. Supplementary information: Supplementary data are available at Bioinformatics online.

| S-EPMC8756182 | biostudies-literature

Deep residual inception encoder-decoder network for amyloid PET harmonization.

Project description: Introduction: Multiple positron emission tomography (PET) tracers are available for amyloid imaging, posing a significant challenge to consensus interpretation and quantitative analysis. We accordingly developed and validated a deep learning model as a harmonization strategy. Method: A Residual Inception Encoder-Decoder Neural Network was developed to harmonize images between amyloid PET image pairs made with Pittsburgh Compound-B and florbetapir tracers. The model was trained using a dataset with 92 subjects with 10-fold cross validation and its generalizability was further examined using an independent external dataset of 46 subjects. Results: Significantly stronger between-tracer correlations (P < .001) were observed after harmonization for both global amyloid burden indices and voxel-wise measurements in the training cohort and the external testing cohort. Discussion: We proposed and validated a novel encoder-decoder based deep model to harmonize amyloid PET imaging data from different tracers. Further investigation is ongoing to improve the model and apply to additional tracers.

| S-EPMC9360199 | biostudies-literature

Locally Embedding Autoencoders: A Semi-Supervised Manifold Learning Approach of Document Representation.

Project description:Topic models and neural networks can discover meaningful low-dimensional latent representations of text corpora; as such, they have become a key technology of document representation. However, such models presume all documents are non-discriminatory, resulting in latent representation dependent upon all other documents and an inability to provide discriminative document representation. To address this problem, we propose a semi-supervised manifold-inspired autoencoder to extract meaningful latent representations of documents, taking the local perspective that the latent representation of nearby documents should be correlative. We first determine the discriminative neighbors set with Euclidean distance in observation spaces. Then, the autoencoder is trained by joint minimization of the Bernoulli cross-entropy error between input and output and the sum of the square error between neighbors of input and output. The results of two widely used corpora show that our method yields at least a 15% improvement in document clustering and a nearly 7% improvement in classification tasks compared to comparative methods. The evidence demonstrates that our method can readily capture more discriminative latent representation of new documents. Moreover, some meaningful combinations of words can be efficiently discovered by activating features that promote the comprehensibility of latent representation.

| S-EPMC4718658 | biostudies-literature

REDfold: accurate RNA secondary structure prediction using residual encoder-decoder network.

Project description: Background: As the RNA secondary structure is highly related to its stability and functions, the structure prediction is of great value to biological research. The traditional computational prediction for RNA secondary prediction is mainly based on the thermodynamic model with dynamic programming to find the optimal structure. However, the prediction performance based on the traditional approach is unsatisfactory for further research. Besides, the computational complexity of the structure prediction using dynamic programming is [Formula: see text]; it becomes [Formula: see text] for RNA structure with pseudoknots, which is computationally impractical for large-scale analysis. Results: In this paper, we propose REDfold, a novel deep learning-based method for RNA secondary prediction. REDfold utilizes an encoder-decoder network based on CNN to learn the short and long range dependencies among the RNA sequence, and the network is further integrated with symmetric skip connections to efficiently propagate activation information across layers. Moreover, the network output is post-processed with constrained optimization to yield favorable predictions even for RNAs with pseudoknots. Experimental results based on the ncRNA database demonstrate that REDfold achieves better performance in terms of efficiency and accuracy, outperforming the contemporary state-of-the-art methods.

| S-EPMC10044938 | biostudies-literature

Multi-level pooling encoder-decoder convolution neural network for MRI reconstruction.

Project description: MRI reconstruction is one of the critical processes of MRI machines, along with acquisition. Because signal acquisition is slow, parallel imaging and reconstruction techniques are applied for acceleration. To accelerate the acquisition process, fewer raw data are sampled simultaneously with all RF coils acquisition. Then, the reconstruction uses under-sampled data from all RF coils to restore the final MR image that resembles the fully sampled MR image. These processes have been a traditional procedure inside the MRI system since the invention of the multi-coil MRI machine. This paper proposes a deep learning technique with a lightweight network. The deep neural network is capable of generating a high-quality reconstructed MR image with a high peak signal-to-noise ratio (PSNR). This also enables a high acceleration factor for MR data acquisition. The lightweight network is called Multi-Level Pooling Encoder-Decoder Net (MLPED Net). The proposed network outperforms the traditional encoder-decoder networks on 4-fold acceleration by a significant margin on every evaluation metric. The network can be trained end-to-end, and it is a lightweight structure that can reduce training time significantly. Experimental results are based on a publicly available MRI Knee dataset from the fastMRI competition.

| S-EPMC9044365 | biostudies-literature

Efficient attention-based deep encoder and decoder for automatic crack segmentation.

Project description: Recently, crack segmentation studies have been investigated using deep convolutional neural networks. However, significant deficiencies remain in the preparation of ground truth data, consideration of complex scenes, development of an object-specific network for crack segmentation, and use of an evaluation method, among other issues. In this paper, a novel semantic transformer representation network (STRNet) is developed for crack segmentation at the pixel level in complex scenes in real time. STRNet is composed of a squeeze and excitation attention-based encoder, a multi head attention-based decoder, coarse upsampling, a focal-Tversky loss function, and a learnable swish activation function to design the network concisely by keeping its fast-processing speed. A method for evaluating the level of complexity of image scenes was also proposed. The proposed network is trained with 1203 images with further extensive synthesis-based augmentation, and it is evaluated on 545 testing images (1280 × 720, 1024 × 512); it achieves 91.7%, 92.7%, 92.2%, and 92.6% in terms of precision, recall, F1 score, and mIoU (mean intersection over union), respectively. Its performance is compared with those of recently developed advanced networks (Attention U-net, CrackSegNet, Deeplab V3+, FPHBN, and Unet++), with STRNet showing the best performance in the evaluation metrics; it achieves the fastest processing at 49.2 frames per second.

| S-EPMC9411784 | biostudies-literature
