How well can OpenAI's o1-preview code? It aced my 4 tests - and showed its work in surprising detail


Normally, when a software company pushes out a major new release in May, it doesn't try to top it with another major new release four months later. But there's nothing typical about the pace of innovation in the AI business.

Although OpenAI dropped its new omni-powerful GPT-4o model in mid-May, the company has been busy. As far back as last November, Reuters reported a rumor that OpenAI was working on a next-generation language model, then known as Q*. Reuters doubled down on that report in May, stating that Q* was being developed under the code name Strawberry.

Strawberry, as it turns out, is actually a model called o1-preview, which is available now as an option to ChatGPT Plus subscribers. You can choose it from the model selection dropdown.

As you might imagine, if there's a new ChatGPT model available, I'm going to put it through its paces. And that's what I'm doing here.

The new Strawberry model focuses on reasoning, breaking down prompts and problems into steps. OpenAI showcases this approach through a reasoning summary that can be displayed before each answer.

When o1-preview is asked a question, it does some thinking and then displays how long that thinking took. If you toggle the dropdown, you'll see some of the reasoning. In one of my coding tests, for example, the reasoning summary included a step for adding error handling.

It's good that the AI knew enough to add error handling, but I find it interesting that o1-preview categorized that step under “Regulatory compliance”.

I also discovered that the o1-preview model provides more exposition after the code. In my first test, which created a WordPress plugin, the model provided explanations of the header, class structure, admin menu, admin page, logic, security measures, compatibility, installation instructions, operating instructions, and even test data. That's a lot more information than previous models provided.

But really, the proof is in the pudding. Let's put this new model through our standard tests and see how well it works.

1. Writing a WordPress plugin

This simple coding test requires knowledge of the PHP programming language and the WordPress framework. The challenge asks the AI to write both interface code and functional logic. The twist is that instead of removing duplicate entries, it has to separate them so they don't appear next to each other.
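The plugin o1-preview produced was written in PHP and isn't reproduced here. As a rough illustration of the logic the challenge calls for, here is a minimal Python sketch that reshuffles a list until no two identical entries are adjacent. The function name `randomize_lines`, the `max_attempts` limit, and the sample list are hypothetical choices made for this sketch. They are not taken from the plugin.

```python
import random


def randomize_lines(lines, max_attempts=1000, rng=None):
    """Shuffle lines so that no two identical entries sit next to each other.

    Hypothetical helper for illustration only: it reshuffles until the order
    is valid and returns None if it gives up after max_attempts tries.
    """
    if rng is None:
        rng = random.Random()
    shuffled = list(lines)  # copy so the caller's list is not modified
    for _ in range(max_attempts):
        rng.shuffle(shuffled)
        # Accept this order only if every pair of neighbors differs.
        if all(a != b for a, b in zip(shuffled, shuffled[1:])):
            return shuffled
    return None


# Made-up sample input containing one duplicated entry.
names = ["Abigail Williams", "Entry B", "Abigail Williams", "Entry C"]
print(randomize_lines(names))
```

Retrying random shuffles keeps every valid ordering equally likely. The drawback is that when valid orderings are rare or impossible, for example when a list is mostly one name, the function gives up and returns None. A count-based greedy rearrangement would handle those cases deterministically, at the cost of a less uniform shuffle.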

The o1-preview model excelled. It first presented the UI as just the entry field.

Once the data was entered and Randomize Lines was clicked, the AI generated an output field with properly randomized output data. Abigail Williams is duplicated in the input, and in compliance with the test instructions, the two entries are not listed side by side.

In my tests of other LLMs, only four of the 10 models passed this test. The o1-preview model completed this test perfectly.

2. Rewriting a string function

Our second test asks the AI to fix a bug, reported by a user, in a string regular expression. The original code was designed to check whether an entered number was valid for dollars and cents. Unfortunately, the code allowed only integers (so 5 was accepted, but not 5.25).
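The article doesn't show the original expression or o1-preview's fix. As an illustration only, the Python sketch below contrasts a hypothetical integer-only pattern with one that also accepts a decimal point followed by one or two cents digits. Both patterns are assumptions, and so are the exact rules they encode, such as whether ".25" or "5." should count as valid.

```python
import re

# Hypothetical stand-in for the buggy check: whole numbers only,
# so "5" passes but "5.25" fails.
INTEGER_ONLY = re.compile(r"[0-9]+")

# Hypothetical fix: whole dollars with optional cents (one or two digits
# after the point), or cents alone such as ".25". [0-9] is used instead
# of \d so that non-ASCII digits are not accepted.
DOLLARS_AND_CENTS = re.compile(r"(?:[0-9]+(?:\.[0-9]{1,2})?|\.[0-9]{1,2})")


def is_valid_amount(text: str) -> bool:
    """Return True if the entire string is a valid dollars-and-cents amount."""
    # fullmatch requires the pattern to cover the whole string.
    return DOLLARS_AND_CENTS.fullmatch(text) is not None


for sample in ["5", "5.25", ".25", "5.", "5.255", "abc"]:
    old_ok = INTEGER_ONLY.fullmatch(sample) is not None
    print(f"{sample!r}: integer-only={old_ok}, fixed={is_valid_amount(sample)}")
```

Using `fullmatch` rather than `match` or `search` avoids a common mistake with validation regexes: accepting input like "5.255" because a valid prefix happens to match.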

The o1-preview LLM rewrote the code successfully, joining four of the LLMs from my previous tests in the winners' circle.

3. Finding an annoying bug

This test was created from a real-world bug I had difficulty resolving. Identifying the root cause requires knowledge of the programming language (in this case, PHP) and the nuances of the WordPress API.

The error messages weren't technically accurate. They referenced the beginning and the end of the calling sequence I was running, but the bug was in the middle part of the code.

I wasn't alone in struggling to solve the problem. Three of the other LLMs I tested couldn't identify the root cause and recommended the more obvious (but wrong) solution of changing the beginning and end of the calling sequence.

The o1-preview model provided the correct solution. In its explanation, the model also pointed to the WordPress API documentation for the functions I had used incorrectly, giving me an added resource for understanding why it made its recommendation. Very helpful.

4. Writing a script

This challenge requires the AI to combine knowledge of three separate coding spheres: the AppleScript language, the Chrome DOM (how a web page is structured internally), and Keyboard Maestro (a specialty programming tool from a single programmer).

Answering this question requires an understanding of all three technologies, as well as how they have to work together.

Once again, o1-preview succeeded, joining only three of the other 10 LLMs that have solved this problem.

A very chatty chatbot

The new reasoning approach in o1-preview certainly doesn't diminish ChatGPT's ability to ace our programming tests. The output from my initial WordPress plugin test, in particular, seemed to function as a more sophisticated piece of software than the output from previous versions.

It's great that ChatGPT provides reasoning steps at the beginning of its work and some explanatory information at the end. However, the explanations can be chatty. I asked both GPT-4o and o1-preview to write “Hello world” in C#, the canonical test line in programming, and compared their responses.

Compared with GPT-4o's answer, o1-preview's was a lot of chat. I mean, wow, right? You can also flip open the reasoning dropdown and get even more information.

All of this information is great, but it's a lot of text to filter through. I'd prefer a concise explanation, with the additional information moved out of the main answer and into dropdowns.

Still, ChatGPT's o1-preview model performed excellently. I look forward to seeing how well it works once it's integrated more fully with GPT-4o features such as file analysis and web access.

Have you tried coding with o1-preview? What were your experiences? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
