Best Large Language Models

In the last few years, large language models have captured the public imagination. People who aren’t natural language processing or artificial intelligence researchers have seen the outputs of these models and used them firsthand. Many think the models can be used to write articles and power the next generation of business chatbots. Perhaps with more human-like chatbots, customers will be less inclined to boycott businesses that use them. In this guide, we’ll give you a rundown of some recent large language models and some services that you can use to access them.

This robot might be able to talk one day if it has a large language model in its brain.

Large Language Models

We first cover some recent research advances in large language models. All of these models take existing neural network architectures and scale them up to hundreds of billions of parameters. They are then trained on very large corpora that include a large amount of text from the internet. Some of the training sets also include the text of academic papers.

GPT-3

GPT-3 has 175 billion parameters and was trained on thousands of GPUs for weeks. In the paper describing the model, the authors include graphs indicating that benchmark performance had not yet plateaued as the model grew. This suggests that a model with more parameters, trained for longer on more data, would be even better than GPT-3.

When the model came out in 2020, many public intellectuals thought that the AI revolution was nigh. They thought it was only a matter of years before a rogue AI turned all humans into machines. This reaction was likely because OpenAI, the creator of GPT-3, initially released only a handful of examples, many of which were cherry-picked. The polished examples led some people to think that GPT-3 was sentient.

However, when the public got access to this model, it became clear, even to people who are not machine learning researchers, that this model is not a real-life version of the Terminator. Instead, it is just the next iteration of decades of natural language processing research.

One big problem with GPT-3 is that OpenAI declined to release the model weights, citing fears that people would use the model for nefarious purposes such as creating fake news or perpetuating racial bias. Of course, humans can do both of those things without GPT-3. It now seems plausible that OpenAI also withheld the weights because it wanted to monetize the model.

Megatron-Turing

Megatron-Turing is a model developed by researchers at NVIDIA and Microsoft. It has 530 billion parameters, roughly three times as many as GPT-3. The model has not been publicly released. It appears to be primarily a way for both companies to show off their ability to scale up by another factor of 3. According to the paper on Megatron-Turing, the model outperforms GPT-3 on several tasks. These tasks are intended as objective proxies for the model’s ability to reason like a human. Because the benchmarks were created in advance, it is possible that the model is overfit to them.

OPT-175B

Meta researchers realized that it was not possible for people to advance the field of natural language processing without access to model weights. So they released several models of varying sizes to the public. The models with up to 66 billion parameters are available for download. The 175-billion-parameter model is available only to select researchers and appears to be comparable to GPT-3 on many tasks.

Meta kindly asks those who download the model to not use it for nefarious purposes.

BLOOM

A group of researchers from organizations smaller than OpenAI and Meta, working together as the BigScience collaboration, has publicly released the weights of a 176-billion-parameter model called BLOOM. Like Meta, they kindly ask users not to use the model for nefarious purposes.

Commercial Services

Several companies are trying to commercialize models like the ones above by giving users access to them via an inference API. It is difficult for a user to run a model of this size on a consumer GPU, so the APIs below can make inference much easier.
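
To see why a consumer GPU falls short, it helps to estimate the memory needed just to store the weights. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes 16-bit weights (2 bytes per parameter) and a hypothetical consumer card with 24 GB of memory, and it ignores activations and other runtime overhead. The parameter counts are the ones quoted earlier in this guide.

```python
import math

# Parameter counts quoted in this guide.
MODEL_PARAMS = {
    "GPT-3": 175e9,
    "Megatron-Turing": 530e9,
    "OPT-175B": 175e9,
    "BLOOM": 176e9,
}

BYTES_PER_PARAM = 2    # assumption: weights stored as 16-bit floats
CONSUMER_GPU_GB = 24   # assumption: memory of a hypothetical high-end consumer GPU


def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Gigabytes needed to hold the weights alone (no activations or caches)."""
    return num_params * bytes_per_param / 1e9


def gpus_needed(num_params, gpu_gb=CONSUMER_GPU_GB):
    """Minimum number of GPUs of the given size needed just to fit the weights."""
    return math.ceil(weight_memory_gb(num_params) / gpu_gb)


if __name__ == "__main__":
    for name, params in MODEL_PARAMS.items():
        print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights, "
              f"at least {gpus_needed(params)} GPUs with {CONSUMER_GPU_GB} GB each")
```

Under these assumptions, a 175-billion-parameter model needs about 350 GB for its weights alone, far more than a single consumer card can hold.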

OpenAI

OpenAI provides an API to what seems to be an improved version of GPT-3 at a cost of a few cents per several hundred words generated. The model is fairly good at producing convincing text, but the interface is painful to use.
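
If you want a feel for what "a few cents per several hundred words" adds up to, a rough estimate is easy to script. The sketch below assumes billing by token, a rule-of-thumb ratio of about four tokens for every three English words, and a placeholder price of $0.02 per 1,000 tokens. The function name and both constants are illustrative assumptions, not quoted prices, so substitute the figures from the provider’s current price list.

```python
TOKENS_PER_WORD = 4 / 3          # assumption: rough rule of thumb for English text
PRICE_PER_1K_TOKENS_USD = 0.02   # hypothetical placeholder, not a quoted price


def estimate_cost_usd(num_words,
                      price_per_1k_tokens=PRICE_PER_1K_TOKENS_USD,
                      tokens_per_word=TOKENS_PER_WORD):
    """Approximate cost in dollars of generating num_words words of text."""
    tokens = num_words * tokens_per_word
    return tokens / 1000 * price_per_1k_tokens


if __name__ == "__main__":
    # Print estimates for a short post, a long article, and a small ebook chapter.
    for words in (500, 1500, 10_000):
        print(f"{words} words: about ${estimate_cost_usd(words):.2f}")
```

The estimate scales linearly with word count, so a few cents per article only becomes a meaningful expense when you generate text in bulk.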

Jasper.AI

Jasper.AI provides a nicer interface than OpenAI’s. However, its underlying language model seems to be worse. If you are willing to do more of the writing yourself, you may still find your productivity boosted by this company’s product.

We hope you found this list of large language models useful. Maybe one day there will be an AI that can help you count your calories.
