Why Engineers Shouldn’t Trust AI to Do Their Math

We gave the GPTs a freshman physics problem. Most of them messed up.

I was overjoyed to see that Claude.ai, which in the latest test roundly beat all the other LLMs tested, including ChatGPT 4, could solve math problems. So I gave it one from my physics book.

A stone is dropped from a tall cliff. One second later, a second stone is thrown vertically downward with a velocity of 20 meters per second. How far from the top of the cliff will the 2nd stone pass the first?

The answer came back in two seconds: 59.6 m. That's a pretty tall cliff. Surely my physics book would not suggest a cliff so tall. But the much-photographed White Cliffs of Dover rise to 110 meters, so maybe it was right. Let's give Claude the benefit of the doubt and examine its solution. Here is Claude's solution, word for word:

To solve this problem, we need to consider the motion of both stones and find the distance traveled by each stone when the second stone passes the first.

Given information:

– The first stone is dropped from the top of the cliff (initial velocity = 0 m/s).

– The second stone is thrown vertically downward with an initial velocity of 20 m/s.

– The time between dropping the first stone and throwing the second stone is 1 second.

We can use the kinematic equations of motion to calculate the distances traveled by each stone.

For the first stone (dropped from rest):

- Initial velocity (u) = 0 m/s

- Acceleration due to gravity (g) = 9.8 m/s^2 (downward)

- Time (t) = 1 second (for the first stone) + time taken for the second stone to pass the first stone

The distance traveled by the first stone (s1) after time t is given by:

s1 = 1/2 × g × t^2  [equation #1]

For the second stone (thrown vertically downward):

- Initial velocity (u) = 20 m/s

- Acceleration due to gravity (g) = 9.8 m/s^2 (downward)

- Time (t) = time taken for the second stone to pass the first stone

The distance traveled by the second stone (s2) after time t is given by:

s2 = u × t + 1/2 × g × t^2  [equation #2]

At the instant when the second stone passes the first stone, their distances traveled are equal:

s1 = s2 [equation #3]

Substituting the equations and simplifying, we get:

1/2 × 9.8 × t^2 = 20t + 1/2 × 9.8 × t^2

Solving for t, we get:

t = 2 seconds

Now, we can calculate the distance traveled by the second stone in 2 seconds:

s2 = 20 × 2 + 1/2 × 9.8 × 2^2

s2 = 40 + 19.6

s2 = 59.6 meters

Therefore, the second stone will pass the first stone at a distance of 59.6 meters from the top of the cliff.

If you were following along, you might have spotted the error in Claude's reasoning.
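Claude's answer does not even pass its own test. The minimal Python sketch below (the function names `claude_s1` and `claude_s2` are illustrative, not from the original) plugs Claude's t = 2 s back into its equations #1 and #2. If the stones really met at that moment, the two distances would be equal, as equation #3 requires. They are not: with the same t on both sides, the ½ × 9.8 × t^2 terms cancel and the equation reduces to 20t = 0, whose only solution is t = 0.

```python
# Claude's equations as quoted, with the same t used for both stones (its mistake).
G = 9.8   # m/s^2, the value Claude used
U = 20.0  # m/s, initial downward speed of the second stone


def claude_s1(t: float) -> float:
    """Claude's equation #1: distance fallen by the first stone."""
    return 0.5 * G * t**2


def claude_s2(t: float) -> float:
    """Claude's equation #2: distance traveled by the second stone."""
    return U * t + 0.5 * G * t**2


t = 2.0  # Claude's claimed meeting time
print(f"s1 = {claude_s1(t):.1f} m, s2 = {claude_s2(t):.1f} m")
# prints: s1 = 19.6 m, s2 = 59.6 m  (not equal, so equation #3 fails at t = 2)

# The difference s2 - s1 is just U * t, which is zero only at t = 0.
print(f"s2 - s1 = {claude_s2(t) - claude_s1(t):.1f} m")
# prints: s2 - s1 = 40.0 m
```

Substituting an answer back into the equations that produced it is a cheap sanity check, and here it exposes the error immediately.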

Claude used the same time (t) in both equations. But the time for the second stone, let's call it t2, is one second less than the time for the first stone, t1. Using equations and hand calculations (admittedly an old-school approach) and setting t2 equal to t1 - 1, I got an answer of 10.74 meters. ChatGPT 4 (the paid version), on the other hand, got the correct answer. Here is ChatGPT 4's solution, cut and pasted:

Wolfram's solution of 10.76 m agrees with mine, so I have to give the Wolfram plug-in full credit.
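Written out, the hand calculation with t2 = t1 - 1 might look like the following. This is a reconstruction of the standard kinematics (u = 20 m/s, g = 9.8 m/s^2), not a copy of the original worksheet. Because the ½gt1^2 terms cancel, the equation is linear in t1 rather than quadratic:

```latex
\begin{align*}
s_1 &= \tfrac{1}{2} g t_1^2, \qquad s_2 = u t_2 + \tfrac{1}{2} g t_2^2, \qquad t_2 = t_1 - 1 \\
\tfrac{1}{2} g t_1^2 &= u (t_1 - 1) + \tfrac{1}{2} g \left(t_1^2 - 2 t_1 + 1\right) \\
0 &= u t_1 - u - g t_1 + \tfrac{1}{2} g \\
t_1 &= \frac{u - g/2}{u - g} = \frac{20 - 4.9}{20 - 9.8} = \frac{15.1}{10.2} \approx 1.4804\ \text{s} \\
s_1 &= \tfrac{1}{2}\,(9.8)\,(1.4804)^2 \approx 10.74\ \text{m}
\end{align*}
```

The second stone has then been in flight for t2 ≈ 0.48 s, a reasonable catch-up time given that it closes on the first stone at a constant 20 - 9.8 = 10.2 m/s.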

The difference between 10.74 m and 10.76 m is minor and due entirely to my using 9.8 instead of 9.81 for g.
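To confirm that the gap really comes down to g, a small Python sketch can solve the same equations for both values. The function `meeting_point` and its `delay` parameter are hypothetical, introduced only for illustration; `delay` generalizes the one-second wait between the two stones.

```python
def meeting_point(u: float = 20.0, delay: float = 1.0, g: float = 9.8) -> tuple[float, float]:
    """Return (t1, distance) where the thrown stone passes the dropped one.

    t1 is measured from the moment the first stone is dropped; the second
    stone is thrown `delay` seconds later, so its own clock is t2 = t1 - delay.
    """
    # s1 = g*t1^2/2 and s2 = u*t2 + g*t2^2/2 with t2 = t1 - delay.
    # Setting s1 = s2 cancels the g*t1^2/2 terms, leaving a linear equation in t1.
    if u <= g * delay:
        # The thrown stone never moves faster than the dropped one, so no crossing.
        raise ValueError("second stone never catches the first")
    t1 = (u * delay - 0.5 * g * delay**2) / (u - g * delay)
    return t1, 0.5 * g * t1**2


for g in (9.8, 9.81):
    t1, dist = meeting_point(g=g)
    print(f"g = {g}: t1 = {t1:.3f} s, distance = {dist:.2f} m")
# prints:
# g = 9.8: t1 = 1.480 s, distance = 10.74 m
# g = 9.81: t1 = 1.481 s, distance = 10.76 m
```

The guard reflects the physics: relative to the first stone, the second one moves at a constant u - g × delay, so it can only catch up if that quantity is positive.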

We have to give ChatGPT 4 full credit.

This exercise is not an endorsement of ChatGPT, nor a criticism of Claude. Several other LLMs also got the wrong answer, and fewer got it right. LLMs are proliferating, and of course I could not test them all. Each, however, looked as if it were solving the problem correctly. That all of them knew which equations to use is impressive. But it's scary to think how many of their answers, given how believable they appear, will be accepted without question.

As to whether any LLM, including ChatGPT, should be used for freshman physics or engineering problems, it seems that having to check every answer with hand calculations would negate any time advantage they provide. What good is an answer delivered in 2 seconds if you then have to spend as much time checking it as you used to spend solving the problem? You could give the same problem to several LLMs, but unless they all give the same answer, there will be doubt about all of them. No engineer is going to query three LLMs and settle on the answer two of them agree on.

For now, proceed with caution. AI has not yet proven that it can reliably solve any but the most trivial math problems.