OAGK is a keyword extraction/generation dataset consisting of 2.2 million abstracts, titles and keyword strings from scientific articles. Texts were lowercased and tokenized with the Stanford CoreNLP tokenizer. No other preprocessing steps were applied in this release version. Dataset records (samples) are stored as JSON lines in each text file. This data is derived from the OAG data collection (https://aminer.org/open-academic-graph), which was released under the ODC-BY licence. This data (OAGK Keyword Generation Dataset) is released under the CC-BY licence (https://creativecommons.org/licenses/by/4.0/). If using it, please cite the following paper: Çano, Erion and Bojar, Ondřej, 2019, Keyphrase Generation: A Text Summarization Struggle, 2019 An...
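Since the records are stored as JSON lines, each line of a data file can be parsed on its own, without loading the whole file into memory. The following is a minimal sketch of such a reader, not part of the dataset release. The function name `read_jsonl` is hypothetical, and the field names inside each record are not stated in the description above, so inspect the keys of the first record before relying on any of them.

```python
import json
from typing import Any, Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-empty line of a JSON Lines file.

    Hypothetical helper: it assumes every non-empty line holds exactly
    one JSON object, as the JSON Lines convention requires.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                # Tolerate blank lines, e.g. a trailing newline at end of file.
                continue
            yield json.loads(line)
```

A typical first step would be to call `next(read_jsonl(path))` on one of the dataset files and print the keys of the resulting dictionary to see which fields the release actually provides.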
Keyphrases that efficiently summarize a document’s content are used in various document processing a...
Doctor of Philosophy. Department of Computer Science. Cornelia Caragea, Doina Caragea. Scholarly digital lib...
Keyphrase generation is the task of predicting a set of lexical units that con...
OAGKX is a keyword extraction/generation dataset consisting of 22674436 abstracts, titles and keywor...
OAGSX is a title generation dataset consisting of 34408509 abstracts and titles from scientific arti...
OAGS is a title generation dataset consisting of 34993700 abstracts and titles from scientific artic...
Recent developments in sequence-to-sequence learning with neural networks have considerably improved...
OAGT is a paper topic dataset consisting of 6942930 records which comprise various scientific public...
OAGL is a paper length prediction dataset consisting of 17528680 records which comprise various scie...
The keyphrases of a document are the textual units that characterize its content such as the topics ...
LDkp (Long Document keyphrase) dataset is the first benchmark corpus of 1.3M documents for identifyi...
The article discusses the evaluation of automatic keyword extraction algorithms (AKEA) and points ou...
The keyphrase extraction task is a fundamental and challenging task designed to automatically extrac...
Automatically assigning keyphrases to documents has a great variety of applications. Here we focus o...
Context: Scientific papers, as well as other types of documents, can be identified by a set of keywo...