Reinforcement learning (RL) has been widely used to aid training in language generation. This is achieved by enhancing standard maximum likelihood objectives with user-specified reward functions that encourage global semantic consistency. We propose a principled approach to address the difficulties associated with RL-based solutions, namely, high-variance gradients, uninformative rewards and brittle training. By leveraging the optimal transport distance, we introduce a regularizer that significantly alleviates the above issues. Our formulation emphasizes the preservation of semantic features, enabling end-to-end training instead of ad-hoc fine-tuning, and when combined with RL, it controls the exploration space for more efficient model upda...
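The abstract does not say how its optimal transport (OT) regularizer is computed. The sketch below is illustrative only and makes several assumptions that are not in the source. It takes token embeddings for a generated sequence and a reference sequence, uses a cosine cost, gives every token uniform weight, and solves entropy-regularized OT with Sinkhorn iterations. It then adds the resulting distance to an MLE loss. The authors' actual solver, cost, and weighting may differ. The names `cosine_cost`, `sinkhorn_ot`, `regularized_loss`, `eps`, and `gamma` are all hypothetical.

```python
import numpy as np


def cosine_cost(x, y):
    """Pairwise cost C[i, j] = 1 - cos(x_i, y_j) between two sets of embeddings."""
    x_n = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_n = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x_n @ y_n.T


def sinkhorn_ot(cost, eps=0.1, n_iters=200):
    """Entropy-regularized OT between uniform marginals via Sinkhorn scaling.

    Assumption: uniform token weights and a fixed iteration count.
    Returns the transport plan T and the transport cost <T, C>.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform mass on generated tokens
    b = np.full(m, 1.0 / m)  # uniform mass on reference tokens
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns to match marginal b exactly
    T = u[:, None] * K * v[None, :]
    return T, float(np.sum(T * cost))


def regularized_loss(mle_loss, gen_emb, ref_emb, gamma=0.1):
    """Hypothetical combined objective: MLE loss plus gamma times the OT distance."""
    _, ot_dist = sinkhorn_ot(cosine_cost(gen_emb, ref_emb))
    return mle_loss + gamma * ot_dist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(5, 16))                      # toy reference embeddings
    gen_close = ref + 0.01 * rng.normal(size=(5, 16))  # near-copy of the reference
    gen_far = rng.normal(size=(6, 16))                  # unrelated sequence
    print("close:", sinkhorn_ot(cosine_cost(gen_close, ref))[1])
    print("far:  ", sinkhorn_ot(cosine_cost(gen_far, ref))[1])
```

Because OT matches tokens softly, without regard to position, a generated sentence whose embeddings are close to the reference is penalized little even when the token order differs. This is one way to read the abstract's emphasis on "preservation of semantic features". In a real model, the embeddings would come from a differentiable framework so the OT term can be backpropagated.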
This paper studies the use of Reinforcement Learning (RL) policies for optimizing the sequencing of...
There is a long-standing discrepancy between the training and testing processes of most generative models ...
Reinforcement Learning (RL) has seen exponential performance improvements over the past decade, achi...
Natural language generation (NLG) is an important task with various applications like neural machine...
Although Neural Machine Translation (NMT) models have advanced state-of-the-art performance in machi...
Reinforcement learning (RL) has been widely used, for example, in robotics, recommendation systems, ...
This paper introduces TRUncated ReinForcement Learning for Language (TrufLL), an original approach ...
The use of reinforcement learning (RL) in natural language processing (NLP) tasks has gained momentum ...
Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform...
We tackle the problem of aligning pre-trained large language models (LMs) with human preferences. If...
We present a study on reinforcement learning (RL) from human bandit feedback for sequence-to-sequenc...
Curriculum reinforcement learning (CRL) allows solving complex tasks by generating a tailored sequen...
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Comput...
Reinforcement Learning (RL) represents a very promising field under the umbrella of Machine Learning (M...