The collection and curation of high-quality training data is crucial for developing text classification models with superior performance, but it often comes with significant costs and time investment. Researchers have recently explored using large language models (LLMs) to generate synthetic datasets as an alternative approach. However, the effectiveness of LLM-generated synthetic data in supporting model training is inconsistent across different classification tasks. To better understand the factors that moderate this effectiveness, in this study we examine how the performance of models trained on such synthetic data varies with the subjectivity of the classification task. Our results indicate that subjectivity, at both the task level and the instance level, is negatively associated with the performance of models trained on synthetic data.