“That’s something that, you know, we can’t comment on right now,” OpenAI chief scientist Ilya Sutskever said when I spoke to the GPT-4 team on a video call an hour after the announcement. “It’s pretty competitive out there.”
Access to GPT-4 will be available via a waitlist and to subscribers of the premium paid ChatGPT Plus, in a limited, text-only capacity.
GPT-4 is a multimodal large language model, which means that it can respond to both text and images. Give it a photo of the contents of your fridge and ask it what you could make, and GPT-4 will try to come up with recipes that use the ingredients in the photo.
“The continued improvements across many dimensions are remarkable,” says Oren Etzioni of the Allen Institute for AI. “GPT-4 is now the standard by which all foundation models will be evaluated.”
“A good multimodal model has been the holy grail of many large tech labs over the last two years,” says Thomas Wolf, co-founder of Hugging Face, the AI startup behind the open-source large language model BLOOM. “But it has remained elusive.”
In theory, the combination of text and images could allow multimodal models to better understand the world. “It could address the traditional pain points of language models, such as spatial reasoning,” Wolf says.
It is not yet clear if that is true for GPT-4. The new OpenAI model seems to be better at some basic reasoning than ChatGPT, solving simple puzzles such as summarizing blocks of text using words that start with the same letter. In my demo, GPT-4 was asked to summarize the announcement blurb from OpenAI’s website using words beginning with g: “GPT-4, groundbreaking generational growth, gains greater grades. Guardrails, guidance, and gains garnered. Gigantic, groundbreaking, and globally gifted.” In another demonstration, GPT-4 took in a tax document and answered questions about it, giving reasons for its answers.