I put DeepSeek AI's coding skills to the test

2025-01-29

DeepSeek exploded into the world’s consciousness this past weekend. It stands out for three powerful reasons:

It is an AI chatbot from China, rather than the US.
It is open source.
It uses vastly less infrastructure than the big AI tools we have been looking at.

Given the US government’s concerns over TikTok and possible Chinese government involvement in that code, a new AI emerging from China is bound to generate attention. ZDNET’s Radhika Rajkumar did a deep dive into these issues in her article Why China’s DeepSeek could burst our AI bubble.

In this article, we’re avoiding politics. Instead, I’m putting DeepSeek R1 through the same set of AI coding tests I’ve thrown at 10 other large language models.

The short answer is this: impressive, but not perfect. Let’s dig in.

Test 1: Writing a WordPress plugin

This test was actually my first test of ChatGPT’s programming prowess, way back in the day. My wife needed a plugin for WordPress that would help her run an involvement device for her online group.

Her needs were fairly simple. It needed to take in a list of names, one name per line. It then had to sort the names, and if there were duplicate names, separate them so they weren’t listed side-by-side.

I didn’t really have time to code it for her, so I decided to give the AI the challenge on a whim. To my huge surprise, it worked.

Since then, it has been my first test for AIs when evaluating their programming skills. It requires the AI to know how to set up code for the WordPress framework and follow prompts clearly enough to create both the user interface and program logic.

Only about half of the AIs I’ve tested can fully pass this test. Now, however, we can add one more to the winner’s circle.

DeepSeek created both the user interface and program logic exactly as specified. So far, DeepSeek has passed one of four tests.
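To make the requirement in Test 1 concrete, here is a minimal Python sketch of the name-handling logic only. It is not the plugin from the test (a WordPress plugin would be written in PHP), and the article does not say how the real plugin reconciles sorting with separating duplicates, so the greedy strategy and the function name separate_duplicates are assumptions made for illustration.

```python
from collections import Counter


def separate_duplicates(raw: str) -> list[str]:
    """Order names so that identical names are not adjacent when avoidable.

    Hypothetical helper: takes one name per line and greedily places the
    name with the most copies left that differs from the previous one.
    Ties are broken alphabetically.
    """
    # One name per line; ignore blank lines and surrounding whitespace.
    names = [line.strip() for line in raw.splitlines() if line.strip()]
    remaining = Counter(names)
    result: list[str] = []

    while remaining:
        previous = result[-1] if result else None
        candidates = [name for name in remaining if name != previous]
        if not candidates:
            # Only copies of the previous name are left, so adjacency
            # cannot be avoided for the rest of the list.
            candidates = list(remaining)
        # Most copies remaining first, then alphabetical order.
        pick = min(candidates, key=lambda name: (-remaining[name], name))
        result.append(pick)
        remaining[pick] -= 1
        if remaining[pick] == 0:
            del remaining[pick]

    return result
```

Calling separate_duplicates("Bob\nAlice\nBob\nCarol") returns Bob, Alice, Bob, Carol, so the two copies of Bob are kept apart. A stricter reading of "sort the names" might instead keep the list alphabetical and only move the repeated entries.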

Test 2: Rewriting a string function

A user complained that he was unable to enter dollars and cents into a donation entry field. As written, my code only allowed dollars. So, the test involves giving the AI the routine that I wrote and asking it to rewrite it to allow for both dollars and cents.

Usually, this results in the AI generating some regular expression validation code. DeepSeek did generate code that works, although there is room for improvement. The code that DeepSeek wrote was unnecessarily long and repetitious. My biggest concern is that while the DeepSeek code validates input to two decimal places, if a number with a very long fractional part is entered (like 0.30000000000000004), the use of parseFloat doesn’t apply any explicit rounding.

I’ll give this one to DeepSeek because neither of these issues would cause the program to break when run by a user, and the code would generate the expected results.

And that gives DeepSeek two wins out of four.
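The rounding concern in Test 2 can be illustrated with a short Python sketch. This is neither the original routine nor DeepSeek's output, neither of which appears in the article. The pattern, the function name parse_donation, and the choice of Decimal are assumptions. The point it shows is that binary floating point (what parseFloat returns in JavaScript, and what Python's float uses) cannot represent most cent values exactly, so validating the text first and rounding explicitly is the safer route.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical rule: whole dollars, optionally followed by 1 or 2 cent digits.
AMOUNT_PATTERN = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def parse_donation(text: str) -> Optional[Decimal]:
    """Return the amount rounded to cents, or None if the input is invalid."""
    cleaned = text.strip()
    if not AMOUNT_PATTERN.fullmatch(cleaned):
        return None
    # Decimal keeps the digits exactly as typed; quantize fixes two places.
    return Decimal(cleaned).quantize(Decimal("0.01"))


# Binary floats show why explicit rounding matters:
float_sum = 0.1 + 0.2        # 0.30000000000000004, not 0.3
exact_sum = Decimal("0.1") + Decimal("0.2")  # exactly 0.3
```

With this sketch, parse_donation("25.5") gives Decimal("25.50"), while "0.30000000000000004" and "abc" are rejected with None instead of being passed through to a float conversion.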

Test 3: Finding an annoying bug

This is a test created when I had a very annoying bug that I had difficulty tracking down. Once again, I decided to see if ChatGPT could handle it, which it did.

The challenge is that the answer isn’t obvious. Actually, the challenge is that there is an obvious answer, based on the error message. But the obvious answer is the wrong answer. This not only caught me, but it regularly catches some of the AIs.

Solving this bug requires understanding how specific API calls within WordPress work, being able to see beyond the error message to the code itself, and then knowing where to find the bug.

DeepSeek passed this one as well, bringing us to three out of four wins. That already puts DeepSeek ahead of Gemini, Copilot, Claude, and Meta.

Will DeepSeek score a home run? Let’s find out.

Test 4: Writing a script

And another one bites the dust. This is a challenging test because it requires the AI to understand the interplay between three environments: AppleScript, the Chrome object model, and a Mac scripting tool called Keyboard Maestro.

I would have called this an unfair test because Keyboard Maestro is not a mainstream programming tool. But ChatGPT handled the test easily, understanding exactly what part of the problem is handled by each tool.

Unfortunately, DeepSeek didn’t have this level of knowledge. It did not know that it needed to split the task between instructions to Keyboard Maestro and Chrome. It also had fairly weak knowledge of AppleScript, writing custom routines for operations that are native to the language.

This leaves DeepSeek with three correct tests and one fail.

Final thoughts

I found that DeepSeek’s insistence on using a public cloud email address like gmail.com (rather than my normal email address with my corporate domain) was annoying. It also had a number of responsiveness fails that made doing these tests take longer than I would have liked.

I wasn’t sure I’d be able to write this article because, for most of the day, I got this error when trying to sign up:

DeepSeek’s online services have recently faced large-scale malicious attacks. To ensure continued service, registration is temporarily limited to +86 phone numbers. Existing users can log in as usual. Thanks for your understanding and support.

Then, I got in and was able to run the tests.

DeepSeek seems to be overly loquacious in terms of the code it generates. The AppleScript code in Test 4 was both wrong and excessively long. The regular expression code in Test 2 was correct, but it could have been written in a way that made it much more maintainable.

I’m definitely impressed that DeepSeek beat out Gemini, Copilot, and Meta. But it appears to be at the old GPT-3.5 level, which means there’s definitely room for improvement.

For a brand-new tool running on much lower infrastructure than the other tools, this could be an AI to watch.

What do you think? Have you tried DeepSeek? Are you using any AIs for programming help? Let us know in the comments below.

You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.
