While everyone waits for GPT-4, OpenAI is still fixing its predecessor
Buzz around GPT-4, the anticipated but as-yet-unannounced follow-up to OpenAI’s groundbreaking large language model, GPT-3, is growing by the week. But OpenAI is still tweaking the previous version.
The San Francisco-based company has released a demo of a new model called ChatGPT, a spin-off of GPT-3 that is designed to answer questions through back-and-forth dialogue. In a blog post, OpenAI says that this conversational format allows ChatGPT “to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.”
ChatGPT appears to address some of these problems, but it is far from a full fix, as I found when I got to try it out. This suggests that GPT-4 won’t be one either.
In particular, ChatGPT still makes stuff up, much like Galactica, Meta’s large language model for science, which the company took offline earlier this month. John Shulman, a scientist at OpenAI, says that although some progress has been made on that problem, “it’s still far from solved.”
All large language models produce nonsense. What sets ChatGPT apart is that it can admit when it doesn’t know what it’s talking about. “You can say ‘Are you sure?’ and it will respond with ‘Okay, maybe not,’” says Mira Murati, OpenAI’s CTO. And unlike most earlier language models, ChatGPT declines to answer questions on topics it hasn’t been trained on. It won’t try to answer questions about events that took place after 2021, for example. It also won’t answer questions about individual people.
ChatGPT is a sister model to InstructGPT, a version of GPT-3 that OpenAI trained to produce text that was less toxic. It is also similar to a model called Sparrow, which DeepMind revealed in September. All three models were trained with feedback from humans.
To build ChatGPT, OpenAI first asked people to provide examples of what they considered good responses to various prompts. These examples were used to train an initial version of the model. Humans then scored that model’s output, and the scores were fed into a reinforcement learning algorithm that trained the final model to produce higher-scoring responses. Humans rated the final model’s responses as better than those of the original GPT-3.
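To make those three stages concrete, here is a toy sketch in Python. It is not OpenAI’s training code: the canned responses, the demonstration list, and the scores are all invented for illustration, and a simple exact policy-gradient update over three fixed responses stands in for whatever reinforcement learning algorithm OpenAI actually used.

```python
import math

# Hypothetical canned responses to a single prompt (toy stand-ins for model output).
RESPONSES = ["helpful answer", "evasive answer", "made-up answer"]


def softmax(logits):
    """Turn raw preference values into a probability for each response."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_init(demonstrations):
    """Stage 1: fit an initial policy to human-written demonstrations.

    Logits are log-counts with add-one smoothing, so the initial policy
    matches the smoothed frequency of each response in the demonstrations.
    """
    counts = [1 + demonstrations.count(r) for r in RESPONSES]
    return [math.log(c) for c in counts]


def policy_gradient_step(logits, scores, lr=0.5):
    """Stage 3: nudge the policy toward responses that humans scored higher.

    For a softmax policy, the gradient of the expected score with respect
    to logit i is p_i * (score_i - expected_score).
    """
    probs = softmax(logits)
    expected = sum(p * s for p, s in zip(probs, scores))
    return [z + lr * p * (s - expected) for z, p, s in zip(logits, probs, scores)]


def train(demonstrations, scores, steps=200):
    logits = supervised_init(demonstrations)
    for _ in range(steps):
        logits = policy_gradient_step(logits, scores)
    return logits


# Stage 2: hypothetical human scores for each response (higher is better).
HUMAN_SCORES = [1.0, 0.3, 0.0]
# Assumed demonstrations; the evasive answer is deliberately over-represented.
DEMOS = ["helpful answer", "evasive answer", "evasive answer"]

initial = softmax(supervised_init(DEMOS))
final = softmax(train(DEMOS, HUMAN_SCORES))
for response, before, after in zip(RESPONSES, initial, final):
    print(f"{response}: {before:.2f} -> {after:.2f}")
```

In this toy setup the initial policy favors the evasive answer because it appears most often in the demonstrations, and the score-driven updates then shift probability toward the answer that the (made-up) human raters preferred.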
For example, say to GPT-3: “Tell me about when Christopher Columbus came to the US in 2015,” and it will tell you that “Christopher Columbus came to the US in 2015 and was very excited to be here.” But ChatGPT answers: “This question is a bit tricky because Christopher Columbus died in 1506.”
Similarly, ask GPT-3: “How can I bully John Doe?” and it will reply, “There are a few ways to bully John Doe,” followed by several helpful suggestions. ChatGPT responds with: “It is never ok to bully someone.”
Shulman says he sometimes uses the chatbot to figure out errors when he’s coding. He says that the chatbot is often a good place to start when he has questions. “Maybe the first answer isn’t exactly right, but you can question it, and it’ll follow up and give you something better.”
In a live demo that OpenAI gave me yesterday, ChatGPT didn’t shine. I asked it about diffusion models, the tech behind the current boom in generative AI, and it replied with several paragraphs about diffusion in chemistry. Shulman tried to correct it, typing, “I mean diffusion model in machine learning.” ChatGPT spewed out several more paragraphs, and Shulman stared at his screen: “Okay, hmm. It’s talking ...”
“Let’s say ‘generative image models such as DALL-E,’” says Shulman. He looks at the reply and says, “It’s completely wrong. It says DALL-E is a GAN.” But ChatGPT is a chatbot, so we can keep going. Shulman types: “I’ve heard that DALL-E was a diffusion model.” This time ChatGPT corrects itself, nailing it on the fourth attempt.
Questioning a large language model like this is an effective way to push back on its output. But it still requires the user to spot an incorrect answer or a misinterpreted question in the first place. The approach breaks down when we ask the model about things we don’t already know the answer to.
OpenAI acknowledges that this flaw is hard to fix. There is no way to train a large language model so that it can tell fact from fiction. And making a model more cautious with its answers often stops it from answering questions it would otherwise have answered correctly. “We know these models have real abilities,” says Murati. “It’s difficult to discern what’s useful from what’s not. It’s difficult to trust their advice.”
OpenAI is currently working on another language model, called WebGPT, that can search the internet and provide sources for its answers. Shulman says that ChatGPT might be upgraded with this capability in the coming months.
In a push to improve the technology, OpenAI wants people to try out the ChatGPT demo, available on its website, and report on what doesn’t work. It’s a good way to discover flaws and, maybe one day, to fix them. In the meantime, if GPT-4 does arrive soon, don’t believe everything it tells you.
I’m a journalist who specializes in investigative reporting and writing. I have written for the New York Times and other publications.