This directory contains sets of molecules used to train chemical language models in the paper, "Learning generative models of molecules from limited training examples." Between 1,000 and 500,000 molecules were sampled from each of four chemical databases (ChEMBL, COCONUT, GDB, and ZINC). These molecules were represented in one of three formats: SMILES, DeepSMILES, or SELFIES. For molecules in the SMILES format, data augmentation was also performed by enumerating non-canonical SMILES, with augmentation factors of 3x, 10x, or 30x. For each training dataset size, ten independent samples were drawn to assess variability.
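The repeated-sampling design described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `draw_independent_samples`, the seeding scheme, the toy database, and the choice to sample without replacement within each replicate are all assumptions made for illustration.

```python
import random


def draw_independent_samples(molecules, sample_size, n_samples=10, base_seed=0):
    """Draw n_samples independent random subsets of sample_size molecules each.

    Hypothetical helper: each replicate is drawn separately from the full
    pool, so two replicates may share molecules, but no molecule appears
    twice within a single replicate.
    """
    if sample_size > len(molecules):
        raise ValueError("sample_size exceeds the number of available molecules")
    samples = []
    for i in range(n_samples):
        # One seed per replicate keeps every draw reproducible (assumed scheme).
        rng = random.Random(base_seed + i)
        samples.append(rng.sample(molecules, sample_size))
    return samples


# Toy stand-in for a chemical database: 100 distinct straight-chain alcohols.
toy_database = ["C" + "C" * i + "O" for i in range(100)]

# Ten replicates of 20 molecules each; the real datasets range from
# 1,000 to 500,000 molecules per sample.
replicates = draw_independent_samples(toy_database, sample_size=20)
```

Comparing models trained on the different replicates of the same size then gives an estimate of how much the results vary with the particular molecules sampled.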
Pretraining foundation models that adapt to a wide range of molecule tasks have been long pursued by...
Molecular design is a critical aspect of various scientific and industrial fields, where the propert...
Computer-based de-novo design of functional molecules is one of the most prominent challenges in che...
Recent applications of recurrent neural networks (RNN) enable training models that sample the chemic...
A Recurrent Neural Network (RNN) trained with a set of molecules represented as SMILES strings can g...
Recent applications of Recurrent Neural Networks enable training models that sample the chemical spa...
Herein find the molecular datasets from "SMILES-Based Deep Generative Scaffold Decorator for De-Novo...
Deep generative models of molecules have grown immensely in popularity, trained on relevant datasets...
Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) S...
Research in chemistry increasingly requires interdisciplinary work prompted by, among other things, ...
A key component of automated molecular design is the generation of compound ideas for subsequent fil...
The number of 'small' molecules that may be of interest to chemical biologists - chemical space - is...
Molecule generation is a challenging open problem in cheminformatics. Currently, deep generative app...