OpenAI's o1-preview model aced my coding tests, and showed its work (in surprising detail)

2024-11-21

Usually, when a software company pushes out a major new release in May, it doesn't try to top it with another major new release four months later. But there's nothing usual about the pace of innovation in the AI business.

Although OpenAI dropped its new omni-powerful GPT-4o model in mid-May, the company has been busy. As far back as last November, Reuters published a rumor that OpenAI was working on a next-generation language model, then known as Q*. Reuters doubled down on that report in May, stating that Q* was being worked on under the code name Strawberry.

Strawberry, as it turns out, is actually a model called o1-preview, which is available now as an option to ChatGPT Plus subscribers. You can choose the model from the model selection dropdown.

As you might imagine, if there's a new ChatGPT model available, I'm going to put it through its paces. And that's what I'm doing here.

The new Strawberry model focuses on reasoning, breaking down prompts and problems into steps. OpenAI showcases this approach through a reasoning summary that can be displayed before each answer.

When o1-preview is asked a question, it does some thinking and then displays how long that thinking took. If you toggle the dropdown, you'll see some of the reasoning. Here's an example from one of my coding tests:

It's good that the AI knew enough to add error handling, but I find it interesting that o1-preview categorizes that step under "Regulatory compliance".

I also discovered that the o1-preview model provides more exposition after the code. In my first test, which created a WordPress plugin, the model provided explanations of the header, class structure, admin menu, admin page, logic, security measures, compatibility, installation instructions, operating instructions, and even test data. That's a lot more information than was provided by previous models.

But really, the proof is in the pudding. Let's put this new model through our standard tests and see how well it works.

1. Writing a WordPress plugin

This simple coding test requires knowledge of the PHP programming language and the WordPress framework. The challenge asks the AI to write both interface code and functional logic, with the twist being that instead of removing duplicate entries, it has to separate the duplicate entries so they're not next to each other.

The o1-preview model excelled. It presented the UI first as just the entry field:

Once the data was entered and Randomize Lines was clicked, the AI generated an output field with properly randomized output data. You can see how Abigail Williams is duplicated and, in compliance with the test instructions, the two entries are not listed side by side:

In my tests of other LLMs, only four of the 10 models passed this test. The o1-preview model completed this test perfectly.
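To make the twist in this test concrete, here is a minimal Python sketch of one way to shuffle lines while keeping duplicates apart. It is not the plugin o1-preview produced (that was PHP inside WordPress, and its code isn't shown here). The function name `randomize_lines` and every sample name other than Abigail Williams are made up for illustration.

```python
import random
from collections import Counter


def randomize_lines(lines, rng=None):
    """Return the lines in shuffled order, keeping identical entries apart when possible."""
    if rng is None:
        rng = random.Random()
    remaining = Counter(lines)  # how many copies of each entry are still unplaced
    result = []
    while remaining:
        previous = result[-1] if result else None
        # Prefer any entry that differs from the one just placed.
        candidates = [item for item in remaining if item != previous]
        if not candidates:
            # Only copies of the previous entry are left, so adjacency is unavoidable.
            candidates = list(remaining)
        # Place the entry with the most copies left so duplicates get spread out,
        # breaking ties at random to keep the order shuffled.
        top = max(remaining[item] for item in candidates)
        choice = rng.choice([item for item in candidates if remaining[item] == top])
        result.append(choice)
        remaining[choice] -= 1
        if remaining[choice] == 0:
            del remaining[choice]
    return result


# Hypothetical sample data; only "Abigail Williams" comes from the test described above.
sample = ["Abigail Williams", "John Smith", "Abigail Williams", "Mary Jones"]
print(randomize_lines(sample))
```

The greedy choice (always place the most frequent remaining entry that differs from the last one) is just one reasonable approach. When a single entry makes up more than half the list, no arrangement can separate every copy, and this sketch falls back to placing them together.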

2. Rewriting a string function

Our second test fixes a regular expression that was the source of a bug reported by a user. The original code was designed to test whether an entered number was valid for dollars and cents. Unfortunately, the code only allowed integers (so 5 was allowed, but not 5.25).

The o1-preview LLM rewrote the code successfully. The model joined four of my previous LLM tests in the winners' circle.
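For readers who want to see the shape of the problem, here is a minimal Python sketch of a dollars-and-cents check. It is not the original code or o1-preview's rewrite, neither of which is reproduced here. Both patterns and the function name `is_valid_amount` are assumptions made for illustration, including the choice to require at least one digit before the decimal point and to allow one or two digits after it.

```python
import re

# Hypothetical integer-only pattern, similar in spirit to the reported bug:
# it accepts "5" but rejects "5.25".
INTEGER_ONLY = re.compile(r"[0-9]+")

# Assumed fix: digits, optionally followed by a decimal point and one or two digits.
DOLLARS_AND_CENTS = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def is_valid_amount(text):
    """Return True if text looks like a dollars-and-cents amount under the assumed rule."""
    return DOLLARS_AND_CENTS.fullmatch(text.strip()) is not None


for value in ["5", "5.25", "5.255", "abc"]:
    print(value, is_valid_amount(value))
```

Using `fullmatch` rather than `search` means the whole input has to fit the pattern, so stray characters before or after the number are rejected.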

3. Discovering an annoying bug

This test was created from a real-world bug I had difficulty resolving. Identifying the root cause requires knowledge of the programming language (in this case PHP) and the nuances of the WordPress API.

The error messages provided were not technically accurate. They referenced the beginning and the end of the calling sequence I was running, but the bug was in the middle part of the code.

I wasn't alone in struggling to solve the problem. Three of the other LLMs I tested couldn't identify the root cause and recommended the more obvious (but wrong) solution of changing the beginning and ending of the calling sequence.

The o1-preview model provided the correct solution. In its explanation, the model also pointed to the WordPress API documentation for the functions I had used incorrectly, providing an added resource for learning why it had made its recommendation. Very helpful.

4. Writing a script

This challenge requires the AI to integrate knowledge of three separate coding spheres: the AppleScript language, the Chrome DOM (how a web page is structured internally), and Keyboard Maestro (a specialty programming tool from a single programmer).

Answering this question requires an understanding of all three technologies, as well as how they need to work together.

Once again, o1-preview succeeded, joining only three of the other 10 LLMs that have solved this problem.

A very chatty chatbot

The new reasoning approach for o1-preview certainly doesn't diminish ChatGPT's ability to ace our programming tests. The output from my initial WordPress plugin test, in particular, seemed to function as a more sophisticated piece of software than previous versions.

It's great that ChatGPT provides reasoning steps at the beginning of its work and some explanatory data at the end. However, the explanations can be chatty. I asked o1-preview to write "Hello world" in C#, the canonical test line in programming. This is how GPT-4o responded:

And this is how o1-preview responded to the same test:

I mean, wow, right? That's a lot of chat from ChatGPT. You can also flip the reasoning dropdown and get even more information:

All of this information is great, but it's a lot of text to filter through. I prefer a concise explanation, with additional information tucked into dropdowns and kept out of the main answer.

Yet ChatGPT's o1-preview model performed excellently. I look forward to seeing how well it will work when integrated more fully with the GPT-4o features, such as file analysis and web access.

Have you tried coding with o1-preview? What were your experiences? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, and on YouTube at YouTube.com/DavidGewirtzTV.
