Machine-learning-based entity extraction requires a large corpus of annotated training data to achieve acceptable results. However, the cost of expert annotation of relevant data, coupled with inter-annotator variability, makes creating the necessary corpora expensive and time-consuming. We report here on a simple method for automatically creating large quantities of imperfect training data for a biological entity (gene or protein) extraction system. We used resources available in the FlyBase model organism database; these resources include curated lists of genes and the articles from which the entries were drawn, together with a synonym lexicon. We applied simple pattern matching to identify gene names in the associated abstracts ...
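The pattern-matching step can be pictured with a minimal sketch. It assumes a synonym lexicon shaped as a dictionary from gene identifier to known names, and assumes that, for each abstract, only the synonyms of the genes curated for that article are matched. The identifiers, the tokenizer, the longest-match rule, the case-insensitive comparison, and the BIO label names are all illustrative choices, not the authors' implementation or the FlyBase data format.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon shape: gene identifier -> list of names/synonyms.
Lexicon = Dict[str, List[str]]

# Word-like tokens (allowing internal hyphens/apostrophes) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into word-like tokens and single punctuation marks."""
    return TOKEN_RE.findall(text)


def build_patterns(lexicon: Lexicon, gene_ids: Set[str]) -> List[Tuple[List[str], str]]:
    """Tokenize the synonyms of the genes curated for one article.

    Patterns are sorted longest first so a multi-word name wins over
    a shorter synonym that matches only part of it.
    """
    patterns = []
    for gid in gene_ids:
        for name in lexicon.get(gid, []):
            toks = [t.lower() for t in tokenize(name)]
            if toks:
                patterns.append((toks, gid))
    patterns.sort(key=lambda p: len(p[0]), reverse=True)
    return patterns


def tag_abstract(text: str, lexicon: Lexicon, gene_ids: Set[str]) -> List[Tuple[str, str]]:
    """Label each token B-GENE, I-GENE, or O by greedy longest dictionary match.

    Matching is case-insensitive (an assumption), so gene names that are also
    common words will be over-tagged; this is one source of the noise that
    makes the resulting training data imperfect.
    """
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    patterns = build_patterns(lexicon, gene_ids)

    i = 0
    while i < len(tokens):
        for toks, _gid in patterns:
            n = len(toks)
            if lowered[i:i + n] == toks:
                labels[i] = "B-GENE"
                for j in range(i + 1, i + n):
                    labels[j] = "I-GENE"
                i += n  # skip past the matched name
                break
        else:
            i += 1  # no pattern starts here
    return list(zip(tokens, labels))


if __name__ == "__main__":
    # Illustrative entries only; not real FlyBase identifiers.
    lexicon = {
        "GENE_A": ["hedgehog", "hh"],
        "GENE_B": ["engrailed", "en"],
        "GENE_C": ["wingless", "wg"],
    }
    # Only GENE_A and GENE_B are curated for this hypothetical article.
    for token, label in tag_abstract(
        "Hedgehog signaling induces engrailed and wingless.", lexicon, {"GENE_A", "GENE_B"}
    ):
        print(token, label)
```

In this sketch "wingless" stays `O` because its gene is not listed for the article. Restricting the candidate names to the curated genes is one way to trade recall for precision when the labels are generated without a human annotator.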
Wu, Cathy H.; Shanker, Vijay K. Biomedical researchers usually describe their experimental results in r...
Automatically extracting information from biomedical text holds the promise of easily consolidating ...
Protein name extraction, one of the basic tasks in automatic extraction of information from ...
Biology has now become an information science, and researchers are increasingly dependent on...
As the pace of biological research accelerates, biologists are becoming increasingly reliant...
The recognition and normalization of gene mentions in biomedical literature are crucial steps in bio...
A large volume of protein data has been generated as a result of biological research. This vast amou...
The breadth and depth of biomedical literature are increasing year upon year. To keep abreast of the...
Background: Significant parts of biological knowledge are available only as unstructured text in art...
Advances in genome sequencing techniques have given rise to an overwhelming increase in the literature on di...
We studied contrast and variability in a corpus of gene names to identify potential heuristics for u...
We report the results of a study into the use of a linear interpolating hidden Markov model (HMM) fo...