Recent advances in natural language generation have introduced powerful language models with high-quality output text. However, this raises concerns about the potential misuse of such models for malicious purposes. In this paper, we study natural language watermarking as a defense to help mark and trace the provenance of text. We introduce the Adversarial Watermarking Transformer (AWT), a jointly trained encoder-decoder with adversarial training that, given an input text and a binary message, generates an output text unobtrusively encoded with that message. We further study different training and inference strategies to achieve minimal changes to the semantics and correctness of the input text. AWT is the first end-t...
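The abstract above describes AWT's architecture only at a high level. The following is a minimal sketch of that idea, assuming PyTorch: a transformer that conditions on both the input tokens and a binary message, with a head that recovers the message from the output representation. All module names, sizes, and the mean-pooling choice are illustrative; this is not the published AWT implementation, it collapses the encoder-decoder into a single transformer stack, and it omits the adversarial discriminator and training losses.

```python
# Minimal sketch of an AWT-style message-hiding transformer (illustrative only).
import torch
import torch.nn as nn

class MessageHider(nn.Module):
    def __init__(self, vocab_size=10000, d_model=128, msg_bits=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.msg_proj = nn.Linear(msg_bits, d_model)      # inject the binary message
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_vocab = nn.Linear(d_model, vocab_size)    # distribution over output tokens
        self.msg_decoder = nn.Linear(d_model, msg_bits)   # recover the hidden bits

    def forward(self, tokens, message):
        # tokens: (batch, seq) token ids; message: (batch, msg_bits) in {0, 1}
        x = self.embed(tokens) + self.msg_proj(message).unsqueeze(1)
        h = self.encoder(x)
        logits = self.to_vocab(h)               # watermarked text logits
        msg_hat = self.msg_decoder(h.mean(1))   # decoded message logits (mean-pooled)
        return logits, msg_hat

model = MessageHider()
tokens = torch.randint(0, 10000, (2, 16))
message = torch.randint(0, 2, (2, 4)).float()
logits, msg_hat = model(tokens, message)
print(logits.shape, msg_hat.shape)  # torch.Size([2, 16, 10000]) torch.Size([2, 4])
```

In training, a reconstruction loss on `logits` keeps the output close to the input text while a message loss on `msg_hat` enforces recoverability; the paper's adversarial objective (omitted here) additionally pushes the watermarked text to be indistinguishable from unmodified text.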
Recently, generating adversarial examples has become an impor...
Transformer-based text classifiers like BERT, RoBERTa, T5, and GPT-3 have shown impressive performan...
Deep learning models are vulnerable to backdoor attacks. The success rate of textual backdoor attack...
Text content created by humans or language models is often stolen or misused by adversaries. Tracing...
We present a text watermarking scheme that embeds a bitstream watermark W_i in a text document P pres...
We propose a methodology for planting watermarks in text from an autoregressive language model that ...
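The truncated abstract above does not say which mechanism the paper uses, so as background, here is a hedged toy illustration of one well-known approach to watermarking autoregressive generation: a hash-seeded "green list" bias in the spirit of Kirchenbauer et al. All names, the toy vocabulary, and the parameters are illustrative assumptions, and a multiplicative sampling weight stands in for the usual additive logit bonus.

```python
# Toy green-list watermark: bias sampling toward a context-dependent
# token subset that a detector can recompute from a shared hash.
import hashlib
import random

VOCAB = list(range(50))  # toy vocabulary of token ids
GREEN_FRACTION = 0.5     # fraction of the vocabulary favored at each step
BIAS = 2.0               # sampling weight boost for green-list tokens

def green_list(prev_token: int) -> set:
    # Seed a PRNG with a hash of the previous token so the detector,
    # knowing the hash, can recompute the same vocabulary partition.
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return set(rng.sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def generate(n: int, start: int = 0) -> list:
    # Toy "language model": uniform over VOCAB, with extra weight on green tokens.
    out = [start]
    for _ in range(n):
        greens = green_list(out[-1])
        weights = [BIAS if t in greens else 1.0 for t in VOCAB]
        out.append(random.choices(VOCAB, weights=weights)[0])
    return out

def detect(tokens: list) -> float:
    # Fraction of tokens falling in their context's green list;
    # watermarked text scores well above GREEN_FRACTION.
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    return hits / max(len(tokens) - 1, 1)

print(detect(generate(200)))                  # well above 0.5 (watermarked)
print(detect(random.choices(VOCAB, k=200)))   # near 0.5 (unwatermarked)
```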
Adversarial Text Generation Frameworks (ATGFs) aim to cause a Natural Language Processing (NLP) ma...
Contributing our own creativity (in the form of text, image, audio, and video) to the pool of online...
We construct the first provable watermarking scheme for language models with public detectability or...
The vulnerability of deep neural networks to adversarial attacks has posed significant threats to re...
We propose a method of text watermarking and hashing based on natural-language semantic structures. ...
Despite their promising performance across various natural language processing (NLP) tasks, current ...
The monumental achievements of deep learning (DL) systems seem to guarantee the absolute superiority...
Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alt...