A sentence is typically treated as the minimal syntactic unit used to extract valuable information from long text. However, in written Thai, there are no explicit sentence markers. Some prior works use machine learning; however, a deep learning approach has never been employed. We propose a deep learning model for sentence segmentation that includes three main contributions. First, we integrate n-gram embedding as a local representation to capture word groups near sentence boundaries. Second, to focus on the keywords of dependent clauses, we combine the model with a distant representation obtained from self-attention modules. Finally, due to the scarcity of labeled data, for which annotation is difficult and time-consuming, we also investig...
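The abstract above combines a local n-gram representation with a distant representation from self-attention. A minimal numpy sketch of that fusion, with hypothetical vocabulary size, embedding width, and window length (the actual model's layers and training objective are not specified here):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    # Distant representation: single-head scaled dot-product attention
    # lets each token attend to keywords anywhere in the sequence.
    d = X.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def ngram_embed(ids, table, n=2):
    # Local representation: average the embeddings inside each token's
    # trailing n-gram window, capturing word groups near boundaries.
    return np.stack([table[ids[max(0, t - n + 1):t + 1]].mean(axis=0)
                     for t in range(len(ids))])

# Toy setup (all sizes are illustrative assumptions).
vocab, d = 50, 16
table = rng.standard_normal((vocab, d))
ids = rng.integers(0, vocab, size=8)       # one 8-token "sentence"

local = ngram_embed(ids, table)            # (8, 16) local n-gram features
distant = self_attention(table[ids])       # (8, 16) distant attention features
features = np.concatenate([local, distant], axis=-1)
print(features.shape)                      # per-token fused features: (8, 32)
```

A boundary classifier would then score each fused per-token vector as boundary / non-boundary.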
© 2020. Published by ACL. This is an open access article available under a Creative Commons licence...
Myanmar sentences are written as contiguous sequences of syllables with no characters delimiting the w...
In this work we address the problems of sentence segmentation and tokenization. Informally the task ...
Word segmentation is a problem in several Asian languages that have no explicit word boundary delimi...
Thai is a low-resource language, so it is often the case that data is not available in sufficient qu...
© 2021 The Authors. Published by ACL. This is an open access article available under a Creative Com...
The aim of this thesis is to design and implement a computational linguistic module for analysing Th...
Unlike English, Thai has no explicit sentence markers. Conventionally, a space is pl...
For languages without word boundary delimiters, dictionaries are needed for segmenting running texts...
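The entry above notes that dictionaries are needed to segment running text in languages without word delimiters. The classic dictionary-based approach is greedy longest matching; a minimal sketch, using a toy English-without-spaces example in place of a real Thai lexicon:

```python
def longest_match_segment(text, dictionary):
    """Greedy longest-matching segmentation against a word dictionary."""
    words, i = [], 0
    while i < len(text):
        match = None
        # Try the longest candidate substring first, shrinking toward length 1.
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # out-of-vocabulary character: emit it alone
        words.append(match)
        i += len(match)
    return words

dictionary = {"thai", "text", "has", "no", "spaces"}
print(longest_match_segment("thaitexthasnospaces", dictionary))
# → ['thai', 'text', 'has', 'no', 'spaces']
```

Greedy longest matching fails when an early long match swallows the start of the next word, which is why the literature above moves to statistical and neural segmenters.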
A Thai written text is a string of symbols without explicit word boundary markup. A method for a dev...
The sentence segmentation task is the task of segmenting a text corpus into sentences. Segmenting we...
In Thai, word boundaries are not explicitly marked; therefore, word segmentation is needed ...
The Thai written language is one of the languages that do not have word boundaries. In order to di...
Tokenization is widely regarded as a solved problem due to the high accuracy t...
Abstract This paper discusses a Thai corpus, TaLAPi, fully annotated with word segmentation (WS), pa...