Big gains have come from language models pretrained on internet corpora. We now have a paradigm: take a large pretrained model and fine-tune it on a downstream task.
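The pretrain-then-fine-tune paradigm can be sketched in miniature. This is a toy illustration in plain Python, not a real training recipe: the "pretrained" body (here just a frozen multiplier, `BODY_W`) stays fixed, and only a small task head is fitted to downstream data; all names and numbers are invented for the example.

```python
# Toy sketch of pretrain-then-fine-tune: freeze the pretrained
# body, train only a small head on the downstream task.

# Frozen "pretrained" parameter (stands in for the big model body).
BODY_W = 2.0

def body(x: float) -> float:
    """Frozen pretrained feature extractor."""
    return BODY_W * x

def fine_tune(data, lr=0.01, epochs=50) -> float:
    """Fit only the task head by SGD on squared error."""
    head_w = 0.0  # freshly initialized task head
    for _ in range(epochs):
        for x, y in data:
            pred = head_w * body(x)
            grad = 2.0 * (pred - y) * body(x)  # d(loss)/d(head_w)
            head_w -= lr * grad
    return head_w

# Downstream task: y = 6x. Since the frozen body already
# multiplies by 2.0, the head should learn roughly 3.0.
head_w = fine_tune([(1.0, 6.0), (2.0, 12.0), (3.0, 18.0)])
```

In a real system the frozen body would be a large transformer and the head a task-specific layer (or the whole model would be tuned at a small learning rate), but the division of labor is the same.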
Language models are now being used to generate new synthetic data, and results even suggest the models are good at recovering facts. Since the quality of natural language is hard to evaluate (quantify), we have arrived at a dream:
Train language models to evaluate natural language against high-quality human judgments, then use these trained models as an evaluation metric.
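In practice this "model as metric" idea usually means prompting or fine-tuning a judge model to emit a score, then parsing that score out of its reply. The sketch below is a hypothetical shape of such a pipeline, not any particular system's API: `call_model` is a stub standing in for a real LLM call, and the prompt template and 1-5 scale are invented for illustration.

```python
import re

# Hypothetical judging prompt; a deployed metric would tune this
# (and possibly the model itself) against human judgments.
JUDGE_TEMPLATE = (
    "Rate the following summary from 1 to 5 for fluency and "
    "faithfulness to the source. Reply with a single integer.\n\n"
    "Source: {source}\n\nSummary: {summary}\n\nScore:"
)

def call_model(prompt: str) -> str:
    # Stub for a real LLM API call; returns a canned reply here
    # so the sketch is self-contained.
    return " 4"

def llm_score(source: str, summary: str) -> int:
    """Ask the judge model for a score and parse the integer reply."""
    reply = call_model(JUDGE_TEMPLATE.format(source=source, summary=summary))
    match = re.search(r"\d+", reply)
    if match is None:
        raise ValueError(f"no score found in model reply: {reply!r}")
    return int(match.group())

score = llm_score("The cat sat on the mat.", "A cat sat on a mat.")
```

The parsing step matters: judge models often wrap the score in extra text, so robust metrics constrain the output format or extract the number defensively, as above.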
We are getting very close to human-level evaluation! However, LLMs break in surprising ways: