A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. To improve the quantity and quality of metadata, we propose a novel metadata prediction framework that learns associations from existing metadata and uses them to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type, and organism) from over 1.3 million GEO records. We examined the quality of well-supported rules from each algorithm and visualized the dependencies among metadata elements. Fin...
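To make the idea of learning associations from existing metadata concrete, here is a minimal sketch of pairwise association rule mining with support and confidence thresholds. It is not one of the four algorithms evaluated in the abstract. The records, field names, thresholds, and the helpers `mine_pair_rules` and `predict` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: the values are invented, not taken from GEO.
RECORDS = [
    {"organism": "Homo sapiens", "molecule": "total RNA", "label": "biotin"},
    {"organism": "Homo sapiens", "molecule": "total RNA", "label": "biotin"},
    {"organism": "Homo sapiens", "molecule": "total RNA", "label": "Cy3"},
    {"organism": "Mus musculus", "molecule": "total RNA", "label": "biotin"},
    {"organism": "Mus musculus", "molecule": "genomic DNA", "label": "Cy5"},
]


def mine_pair_rules(records, min_support=0.3, min_confidence=0.6):
    """Mine single-item rules (lhs -> rhs) as (lhs, rhs, support, confidence)."""
    n = len(records)
    item_counts = Counter()
    pair_counts = Counter()
    for rec in records:
        # Sort the (field, value) items so each pair has one canonical key.
        items = sorted(rec.items())
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of records containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: how often rhs holds among records where lhs holds.
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest-confidence rules first, ties broken by support.
    return sorted(rules, key=lambda r: (-r[3], -r[2]))


def predict(rules, known, target_field):
    """Return (value, confidence) from the best rule matching the known fields."""
    for lhs, rhs, _support, confidence in rules:
        if rhs[0] == target_field and known.get(lhs[0]) == lhs[1]:
            return rhs[1], confidence
    return None, 0.0


RULES = mine_pair_rules(RECORDS)
PREDICTION = predict(RULES, {"label": "biotin"}, "molecule")
print(PREDICTION)
```

On this toy data, a record whose only known field is a biotin label is assigned the molecule value that most confidently co-occurs with it. A real application would need multi-item antecedents and held-out evaluation, which this sketch omits.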
Current approaches to metadata discovery depend on manual curation, which is time-consuming ...
Background: Global meta-analysis (GMA) of microarray data to identify genes with highly simi...
Mining gene expression databases for association rules. Chad Creighton and Samir Hanash. Motiva...
Background: The ability to efficiently search and filter datasets depends on access to high ...
While there exists an abundance of open biomedical data, the lack of high-quality metadata makes it ...
In biomedicine, good metadata is crucial to finding experimental datasets, to understand how expe...
Background: NCBI’s Gene Expression Omnibus (GEO) is a rich community resource containing mil...
There is a great deal of interest in analyzing very large data sets in the biomedical sciences. This...
High-quality metadata annotations for data hosted in large public repositories are essential for res...
Modern computational biology is awash in large-scale data mining problems. Several high-throughput t...
This study explores metadata practices in relation to data reuse in biology. Metadata has long b...
The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional gen...