Parallel corpora are crucial for statistical machine translation (SMT); however, they are scarce for most language pairs and domains. Because comparable corpora are far more widely available, many studies have extracted either parallel sentences or parallel fragments from them for SMT. In this article, we propose an integrated system that extracts both parallel sentences and parallel fragments from comparable corpora. We first apply parallel sentence extraction to identify parallel sentences among comparable sentences. We then extract parallel fragments from the comparable sentences. Parallel sentence extraction is based on a parallel sentence candidate filter and a classifier for parallel sentence identification. We improve it by proposing a novel filte...
Thesis (Master's)--University of Washington, 2018. In an emergency, machine translation systems can be...
This paper describes a method for extracting parallel sentences from comparable texts. We present th...
In this article, we present a simple and effective approach for extracting bil...
Although parallel sentences rarely exist in quasi-comparable corpora, there could be parallel fragme...
Parallel sentences are a relatively scarce but extremely useful resource for many applicatio...
Parallel sentences are crucial for statistical machine translation (SMT). However, they are quite sc...
Achieving accurate translation, especially in multiple domain documents with statistical machine tra...
We report on methods to create the largest publicly available parallel corpora by crawling the web, ...
Building parallel resources for corpus based machine translation, especially Statistical Machine Tra...
We present a simple and effective method for extracting parallel sentences from comparable corpora. ...
This paper proposes a novel method for exploiting comparable documents to generate parallel data fo...
We present a novel method to detect parallel fragments within noisy parallel corpora. Isolating the...