The film ‘Her’ is now a reality

OpenAI CEO Sam Altman has said that his favourite film is ‘Her’, directed by Spike Jonze. Now, with GPT-4o, Altman is making his favourite film a reality.

OpenAI recently announced GPT-4o, its new flagship artificial intelligence model, which can reason across audio, vision and text in real time. Only hours after the announcement, what the new model can do is already quite striking. According to the company, GPT-4o can read your facial expressions and translate spoken language in real time, and it can also mimic different kinds of emotion. Film buffs will immediately associate these claims with Spike Jonze's ‘Her’, and the comparison is exactly right.

For those who don't know, in Spike Jonze's 2013 film ‘Her’, Joaquin Phoenix plays a heartbroken man going through a divorce who falls in love with an AI virtual assistant named Samantha, voiced by Scarlett Johansson. Towards the end of 2023, Sam Altman said at an event that ‘Her’ was one of his favourite films, that he loved the way it imagined people using AI, and that the film was extremely prescient.

‘Her’ is now real

Last night, OpenAI introduced GPT-4o in a live broadcast and asked it to tell a story about robots and love. OpenAI engineers and CTO Mira Murati repeatedly interrupted GPT-4o, asking it to retell the story in different tones, and the AI simply picked up where it left off each time, as if it were another person in the room.

Interestingly, Sam Altman made a post on X after the event that simply read: ‘Her’. Of course, GPT-4o in its current form is not as capable and advanced as Samantha in the film, but it comes surprisingly close. And when we look at how OpenAI's own models have evolved, the size of the step taken with GPT-4o becomes clearer.

As we noted in our earlier coverage of GPT-4o's details, GPT-4o is not like the company's other models. Previous GPT models handled voice by chaining several separate models together; a single spoken query, for example, activated three models in turn: one to transcribe the audio, one to generate a text reply and one to turn that reply back into speech. With GPT-4o, everything is combined in a single model: audio, text and visual information are analysed and turned into output by one and the same model. The result is a faster and more capable system.
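
To make the difference concrete, here is a minimal sketch of the older, pipelined approach using OpenAI's Python SDK; the model names (whisper-1, gpt-4, tts-1), voice and file names are illustrative assumptions for this example, not the exact internals of ChatGPT's old voice mode.

# A minimal sketch of the older, pipelined approach: three separate models
# handle speech-to-text, reasoning and text-to-speech in sequence.
from openai import OpenAI

client = OpenAI()

# 1) Transcribe the user's spoken question (speech -> text).
with open("question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2) Generate a reply with a text-only language model (text -> text).
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3) Convert the reply back into speech (text -> speech).
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.write_to_file("answer.mp3")

Each hop in that chain adds latency and loses information such as tone of voice; GPT-4o collapses all three steps into a single model that takes in and produces audio directly.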

For example, the previous version also had a voice mode, but when you asked it something you had to wait for it to finish speaking before you could respond. With GPT-4o, you can now interrupt it mid-sentence and steer it in a new direction. It can also now see the world through your camera and describe what it sees with striking accuracy.
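
As a rough illustration of the camera side, here is a minimal sketch of sending a single photo to GPT-4o through the API and asking what it shows; the file name and prompt are placeholders, and the live camera demo in the presentation runs through the ChatGPT app rather than a one-off call like this.

# A minimal sketch: send one saved camera frame to GPT-4o and ask what it sees.
import base64
from openai import OpenAI

client = OpenAI()

# Encode a saved camera frame as a base64 data URL.
with open("snapshot.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what you see in this picture."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }],
)

print(response.choices[0].message.content)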

The new voice features will be available in a limited ‘alpha’ release in the coming weeks and will reach ChatGPT Plus subscribers first, before a wider rollout begins. Some of GPT-4o's other advanced features have also been added to the free version and other paid tiers starting today.

Let's close with Sam Altman's words from the blog post he published yesterday: ‘The new voice (and video) mode is the best computer interface I've ever used. It feels like the AI in the films, and it's still a little surprising to me that it's real. Getting to human-level response times and expressiveness turns out to be a big change.’
