Reliably controlling the behavior of large language models is a pressing open problem. Existing methods include supervised finetuning, reinforcement learning from human feedback, prompt engineering, and guided decoding. We instead investigate activation engineering: modifying activations at inference time to predictably alter model behavior. In particular, we bias the forward pass with an added 'steering vector' implicitly specified through natural language. Unlike past work, which learned these steering vectors, our Activation Addition (ActAdd) method computes them by taking the activation differences that result from pairs of prompts. We demonstrate ActAdd on GPT-2, evaluating on OpenWebText and ConceptNet. Our inference-time approach yields control...
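
A minimal sketch of the steering-vector idea this abstract describes, assuming GPT-2 via Hugging Face transformers and a PyTorch forward hook. The layer index, coefficient, and the " Love"/" Hate" prompt pair are illustrative assumptions, not values taken from the paper.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER = 6    # transformer block to steer -- an assumed, illustrative choice
COEFF = 4.0  # scaling coefficient for the steering vector -- likewise assumed

def residual_stream(prompt):
    """Residual-stream activations entering block LAYER for a prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        hidden = model(ids, output_hidden_states=True).hidden_states
    return hidden[LAYER]  # shape (1, seq_len, d_model)

# Steering vector: activation difference from a contrasting prompt pair,
# truncated to the shorter prompt in case token counts differ.
h_plus, h_minus = residual_stream(" Love"), residual_stream(" Hate")
n_pair = min(h_plus.shape[1], h_minus.shape[1])
steer = COEFF * (h_plus[:, :n_pair] - h_minus[:, :n_pair])

def add_steering(module, inputs, output):
    # Bias the forward pass: add the vector to the leading positions only.
    hidden = output[0]
    n = min(steer.shape[1], hidden.shape[1])
    hidden[:, :n, :] += steer[:, :n, :].to(hidden.dtype)
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
ids = tok("I went up to my friend and said", return_tensors="pt").input_ids
# use_cache=False re-runs the full sequence each step, so the addition
# stays confined to the leading positions rather than every new token.
out = model.generate(ids, max_new_tokens=30, do_sample=False, use_cache=False)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))

Running the same prompt with and without the hook registered shows the qualitative shift in completions that the abstract alludes to.
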
Recent advancements in Large Language Models (LLMs) have enabled the development of a single model c...
Natural Language Inference (NLI) models are known to learn from biases and artefacts within their tr...
Recent advances in Transformer-based large language models (LLMs) have led to significant performanc...
Prior work on controllable text generation has focused on learning how to control language models th...
Prefix-tuning is a powerful lightweight technique for adapting a large pre-trained language model to...
In recent years, language models (LMs) have made remarkable progress in advancing the field of natu...
This paper explores the effectiveness of prompt programming in the fine-tuning process of a Hungaria...
Reinforcement learning (RL) has emerged as a powerful paradigm for fine-tuning Large Language Models...
Language model fine-tuning is essential for modern natural language processing, but is computational...
Pretrained large language models (LLMs) are strong in-context learners that are able to perform few-...
In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension a...
Language models, given their black-box nature, often exhibit sensitivity to input perturbations, lea...
Fine-tuning large language models for different tasks can be costly and inefficient, and even method...