Maximum likelihood estimation (MLE) is the predominant algorithm for training text generation models. This paradigm relies on direct supervision examples, which makes it inapplicable to many applications, such as generating adversarial attacks or generating prompts to control language models. Reinforcement learning (RL), on the other hand, offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward. Yet previous RL algorithms for text generation, such as policy gradient (on-policy RL) and Q-learning (off-policy RL), are notoriously inefficient or unstable to train due to the large sequence space and the sparse reward received only at the end of sequences. In this paper, we introduce a new RL formulation for...
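A minimal, hypothetical sketch of the setup this abstract describes: on-policy RL (REINFORCE-style policy gradient) for sequence generation, where the reward is observed only once the full sequence has been produced. The toy policy, vocabulary, and reward function below are assumptions for illustration only, not any cited paper's actual method.

```python
# Sketch: policy-gradient text generation with a sparse, sequence-level reward.
# Assumes PyTorch; the ToyPolicy, VOCAB, and reward_fn are illustrative stand-ins.
import torch
import torch.nn as nn

VOCAB = ["<eos>", "a", "b", "c"]   # toy vocabulary; index 0 ends the sequence
MAX_LEN = 8

class ToyPolicy(nn.Module):
    """A tiny GRU language model acting as the RL policy pi(token | prefix)."""
    def __init__(self, vocab_size=len(VOCAB), hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, state=None):
        x = self.embed(tokens)
        out, state = self.rnn(x, state)
        return self.head(out), state

def reward_fn(tokens):
    """Sparse reward computed only after the whole sequence is generated.
    Here: +1 per 'a' token, a stand-in for an arbitrary task metric."""
    return float(sum(1 for t in tokens if VOCAB[t] == "a"))

policy = ToyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    # Roll out one sequence from the current policy (on-policy sampling).
    tokens, log_probs, state = [0], [], None
    for _ in range(MAX_LEN):
        logits, state = policy(torch.tensor([[tokens[-1]]]), state)
        dist = torch.distributions.Categorical(logits=logits[0, -1])
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        tokens.append(tok.item())
        if tok.item() == 0:        # <eos> terminates generation
            break

    # The reward arrives only at the end; intermediate steps get no signal,
    # which is the sparsity issue the abstract refers to.
    R = reward_fn(tokens[1:])
    loss = -R * torch.stack(log_probs).sum()   # REINFORCE: maximize E[R]

    opt.zero_grad()
    loss.backward()
    opt.step()
```

With only a terminal reward, every token in the sequence receives the same credit, which is one reason such estimators are high-variance and slow to train over large sequence spaces.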
Large-scale language models often learn behaviors that are misaligned with user expectations. Genera...
Different from classic Supervised Learning, Reinforcement Learning (RL) is fundamentally interactiv...
Neural language models often fail to generate diverse and informative texts, limiting their applicab...
Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large Language Models...
Controlling the generative model to adapt to a new domain with limited samples is a difficult challenge...
Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform...
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based an...
We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If...
Text-based environments enable RL agents to learn to converse and perform interactive tasks through ...
In many sequential decision-making problems (e.g., robotics control, game playing, sequential predic...
This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach ...
Reinforcement learning (RL) algorithms solve a wide range of problems we face. The topic of RL ...
In the past few years, generative adversarial networks (GANs) have become increasingly important in ...
Natural language modeling with limited training data is a challenging problem, and many algorithms m...
Reinforcement learning (RL) has been widely used to aid training in language generation. This is ach...