As the demand for sophisticated Natural Language Processing (NLP) models continues to grow, so does the need for efficient pre-training techniques, since current NLP models undergo resource-intensive pre-training. In response, we introduce $FastDoc$ (Fast Pre-training Technique using Document-Level Metadata and Taxonomy), a novel approach designed to significantly reduce computational demands. $FastDoc$ leverages document metadata and domain-specific taxonomy as supervision signals: it continually pre-trains an open-domain transformer encoder using sentence-level embeddings, and then fine-tunes it using token-level embeddings. We evaluate $FastDoc$ on six tasks across nine datasets spanning three distinct domains. Remarkably, $FastDo...
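To make the pre-training recipe above concrete, the sketch below shows one plausible way to supervise an encoder at the sentence/document level using metadata labels. This is a minimal illustration, not the authors' exact method: the triplet objective, mean pooling, the `bert-base-uncased` checkpoint, and the toy `triplets` data are all assumptions made for the example; downstream fine-tuning would then reuse the same encoder with token-level representations.

```python
# Hedged sketch: metadata-supervised continual pre-training at the
# sentence/document level. Documents sharing a metadata/taxonomy label
# act as positives; documents from a different category act as negatives.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").to(device)
optimizer = torch.optim.AdamW(encoder.parameters(), lr=2e-5)

def embed(texts):
    """Sentence-level embeddings via mean pooling over token states."""
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=128, return_tensors="pt").to(device)
    hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # (B, H)

# Toy triplets (anchor, positive, negative); purely illustrative data.
triplets = [
    ("Install the compressor before sealing the unit.",   # anchor
     "Mount the compressor onto the base plate.",         # same taxonomy label
     "Quarterly revenue grew by twelve percent."),        # different label
]

encoder.train()
for anchor, positive, negative in triplets:
    a, p, n = embed([anchor]), embed([positive]), embed([negative])
    loss = F.triplet_margin_loss(a, p, n, margin=1.0)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"triplet loss: {loss.item():.4f}")
```

Because the supervision comes from document-level metadata rather than token-level masking, each training step processes whole-document embeddings, which is one way such an approach can cut pre-training compute relative to standard masked-language-model objectives.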
Recent work has demonstrated that pre-training in-domain language models can boost performance when ...
Despite achieving state-of-the-art performance on many NLP tasks, the high energy cost and long infe...
The remarkable success of large language models has been driven by dense models trained on massive u...
Pre-training on larger datasets with ever-increasing model size is now a proven recipe for increased...
Language models (LMs) such as BERT and GPT have revolutionized natural language processing (NLP). Ho...
Pretrained language models have shown success in various areas of natural language processing, inclu...
Recently, the development of pre-trained language models has brought natural language processing (NL...
Large language models (LLMs) have demonstrated remarkable open-domain capabilities. Traditionally, L...
Recent deep learning models for tabular data now compete with the traditional ML models based ...
Pretrained language models have become the standard approach for many NLP tasks due to strong perfor...
In this paper, we introduce DOCmT5, a multilingual sequence-to-sequence language model pretrained wi...
With the advancement of deep learning technologies, general-purpose large models such as GPT-4 have ...
Thesis (Ph.D.), University of Washington, 2022. A robust language processing machine should be able to...
Current pre-trained language models (PLMs) are typically trained with static data, ignoring that in r...
The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural La...