The practice of transferring knowledge from a sophisticated, proprietary large language model (LLM) to a compact, open-source LLM has garnered considerable attention. Previous works have focused on unidirectional knowledge distillation, aligning the student model's responses to a set of instructions with those of the teacher model. Nevertheless, they overlooked the possibility of incorporating reciprocal "feedback" (identifying challenging instructions where the student model's performance falls short) to iteratively boost the student model's proficiency. To this end, we propose a novel adversarial distillation framework for more efficient knowledge transfer. Leveraging the versatile role adaptability of LLMs, we prompt t...
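As a rough illustration of the feedback loop sketched in this abstract, the following is a minimal sketch of one distillation round: the student imitates the teacher's responses, instructions where the student falls short are flagged, and new instructions targeting those weaknesses seed the next round. All helper functions here (teacher_respond, student_respond, score_gap, generate_harder_instructions) are hypothetical placeholders for illustration, not the paper's implementation.

```python
# Minimal sketch of one feedback-driven distillation round, assuming
# placeholder helpers in place of real LLM calls and fine-tuning.
from typing import List, Tuple
import random


def teacher_respond(instruction: str) -> str:
    # Placeholder: in practice, query the proprietary teacher LLM.
    return f"[teacher answer to: {instruction}]"


def student_respond(instruction: str) -> str:
    # Placeholder: in practice, query the compact student LLM.
    return f"[student answer to: {instruction}]"


def score_gap(teacher_answer: str, student_answer: str) -> float:
    # Placeholder quality gap in [0, 1): in practice, a referee model
    # rates how far the student's answer falls short of the teacher's.
    return random.random()


def generate_harder_instructions(hard_examples: List[str], n: int) -> List[str]:
    # Placeholder: in practice, prompt the teacher to produce new
    # instructions similar to those the student struggled with.
    return [f"harder variant {i} of: {hard_examples[0]}" for i in range(n)]


def distillation_round(instructions: List[str], gap_threshold: float = 0.5
                       ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """One imitation-plus-feedback round.

    Returns (imitation training pairs for the student, new hard instructions).
    """
    training_pairs: List[Tuple[str, str]] = []   # (instruction, teacher response)
    hard_instructions: List[str] = []            # where the student falls short
    for inst in instructions:
        teacher_out = teacher_respond(inst)
        student_out = student_respond(inst)
        training_pairs.append((inst, teacher_out))
        if score_gap(teacher_out, student_out) > gap_threshold:
            hard_instructions.append(inst)
    # Feedback step: expand the instruction pool with items targeted at the
    # student's weaknesses, to be used in the next round.
    new_instructions = (generate_harder_instructions(hard_instructions, n=4)
                        if hard_instructions else [])
    return training_pairs, new_instructions


if __name__ == "__main__":
    pairs, next_round = distillation_round(["Explain quicksort.", "Prove 2+2=4."])
    print(len(pairs), "imitation pairs;", len(next_round), "new hard instructions")
```

One plausible instantiation would replace the placeholder scorer with a referee prompt to the teacher model and fine-tune the student on the accumulated pairs between rounds, but those details are assumptions beyond what the abstract states.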
Knowledge distillation (KD), best known as an effective method for model compression, aims at transf...
Distillation efforts have led to language models that are more compact and efficient without serious...
Recently, large-scale pre-trained models have shown their advantages in many tasks. However, due to ...
Recently, multi-modal content generation has attracted considerable attention from researchers by investi...
Deep and large pre-trained language models (e.g., BERT, GPT-3) are state-of-the-art for various natu...
This work investigates large language models (LLMs) as teachable agents for learning by teaching (LB...
Large language models (LLMs) have shown incredible performance in completing various real-world task...
Large-scale pretrained language models have led to significant improvements in Natural Language Proc...
Deploying large language models (LLMs) is challenging because they are memory inefficient and comput...
Instruction tuning is instrumental in enabling Large Language Models (LLMs) to follow user instructi...
High-quality instruction-tuning data is critical to improving LLM capabilities. Existing data collec...
Large language models (LLMs) are instruction followers, but it can be challenging to find the best i...
Large language models have become a vital component in modern NLP, achieving state-of-the-art perfor...
Although remarkable progress has been achieved in preventing large language model (LLM) hallucinatio...
In this work, we evaluate 10 open-source instructed LLMs on four representative code comprehension a...