Conversation
I've been experimenting with having Claude finish an unfinished coding project. You've really got to watch it like a hawk. It does some inexplicable things, like removing important comments, "fixing" non-existent buffer overflows, and breaking protocol parsing. What am I missing?

@jmorris Have you tried making it write the tests for the parsing first?
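One way to apply the tests-first suggestion: pin down the parser's expected behavior in a handful of assertions before letting the model near the implementation, so any unrequested "fix" that changes parsing semantics fails loudly. The `parse_frame` function and its length-prefixed framing below are hypothetical, just to show the shape of such tests.

```python
# Hypothetical toy framing: <length byte><payload>. The point is to lock in
# parsing behavior with tests before an LLM is allowed to edit the parser.

def parse_frame(data: bytes) -> dict:
    """Parse a length-prefixed frame and reject malformed input."""
    if not data:
        raise ValueError("empty frame")
    length = data[0]
    if len(data) - 1 < length:
        raise ValueError("truncated frame")
    return {"length": length, "payload": data[1:1 + length]}

def test_valid_frame():
    assert parse_frame(b"\x03abc")["payload"] == b"abc"

def test_truncated_frame_rejected():
    try:
        parse_frame(b"\x05ab")
    except ValueError:
        pass
    else:
        raise AssertionError("truncated frame should raise")
```

With tests like these in place, you can tell the model "make the tests pass without editing the tests" and treat any diff to the test files as a red flag.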


bert hubert πŸ‡ΊπŸ‡¦πŸ‡ͺπŸ‡ΊπŸ‡ΊπŸ‡¦

@jmorris you are missing that other users do not care as much as you do about quality.


@jmorris It's been trained on a metric boatload of humans "fixing other things while I'm in here" but I've found it generally amenable to strict instructions not to deviate from its assigned task.

Breaking it up into small units of work just like a human should do anyway can help the stochastic parrot avoid scope creep by keeping the goals tight.

I've done things like "assuming the code is correct, fix all the failing tests" and then reviewed the changes to the tests. Yesterday I had a change that was much easier to make in code than to explain confidently to the model, but it left me facing dozens of tests with broken mocks and the brain-melting prospect of fixing all those trivially broken tests by hand. So I had it fix the tests, then reviewed all the changes as a form of indirect review of my own code. A test changing in an unexpected way would have sent me back with new ideas about what was wrong with the code. And in fact my first attempt did exactly that: it helped me very quickly recognize a missed assumption, fix it, and start over from scratch. Again I wrote the code by hand, let the robot adapt the tests, and then reviewed whether every test change made sense as an adaptation to my code change.

But also a good CLAUDE.md that instructs it to always color within the lines has been quite helpful in my experience for reducing this unhelpful behavior. I've also reinforced this in prompts, especially where I think it is likely to go astray.
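For reference, the "color within the lines" instructions in a CLAUDE.md can be quite short. The wording below is just illustrative, not any standard or recommended text:

```markdown
# Working agreements

- Do only the task you were given; do not fix unrelated issues you notice.
- Never remove or reword existing comments.
- Do not change tests unless the task explicitly says to.
- If work outside the assigned scope seems necessary, stop and ask first.
```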

@mcdanlj Thanks for the info! I'm expecting to need to provide a lot of structure & prompting, which is the real aim of the learning exercise for me here, but I was surprised it did some very random things that did not have any internal logical consistency. This is with Haiku, so perhaps it will be better with one of the current models. At some point I gather we will have an agent which also does the prompt development & managing of the coding AI?

@jmorris I currently have Claude set to choose its own model, and it uses multiple models, and sometimes it spins off sub-agents to do work; it is therefore actually acting as an agent which is prompting sub-agents.

For simple, obvious tasks, the cheaper models are sometimes better because they, in anthropomorphizing terms, can be "less imaginative." 🀣 But also sometimes they just aren't as good. Worktrees and branches and trying things a few different ways apply to choosing between LLMs just as they do to solving a problem a few different ways "by hand."

I do run it in a container with very limited access. I used https://github.com/gherlein/localdev as a starting point, but then practically rewrote it. While Greg Herlein uses it to make --dangerously-skip-permissions relatively safe, I use it because running something that is designed to act as if it has agency exceeds my trust for running outside of fairly strict isolation. ☺

Another tip I've learned: If the LLM starts going off the rails, don't re-prompt that session more than twice trying to redirect. Tell it to write a plan as a markdown file, then exit that session. Review and possibly edit the plan, and then start a new session. You can try other models on the same plan, or iterate on changing the plan. Even before the context window gets all the way full, the LLM "gets stupid" β€” I think of it kind of like a filesystem, where allocation tends to get hard long before the filesystem is actually 100% full.

Even without the LLM going off the rails, it's often a good idea to tell it to write a plan before doing anything, so you can review the plan and change it before it starts.
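The plan file doesn't need to be elaborate. Something like the sketch below (the structure and the task named in it are just my own illustration, not any required format) gives you a reviewable, editable artifact that survives across sessions and models:

```markdown
# Plan: migrate config parsing to TOML   <!-- hypothetical task -->

## Context
- Current state, relevant files, and constraints discovered so far.

## Steps
1. One small, verifiable change per step.
2. Which tests prove each step worked.

## Out of scope
- Things the session must NOT touch (comments, unrelated modules, test semantics).
```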

I hope some of that is useful...

@mcdanlj very useful, thanks. I think you may be describing the context window? I've seen it in other non-gen AI areas.

@jmorris Yes, exactly.

Also if the LLM tells you that it is compacting, that's another hint to have it write a plan and exit and start fresh. Again vaguely like a filesystem, where you often get higher performance if you can back up, wipe, and restore than just "defragmenting" β€” I don't know how strongly the analogy works but it's how I remember it... 🀣

And of course all this might be obsolete tomorrow. πŸ™„

@mcdanlj yep, I assume they will abstract all of this away somehow. BTW, it is doing some impressive things already. I said "you are a very senior developer and keen to impress with clean, bug-free code" and that seemed to have a big impact on its attitude. This is for controlling a radio via CI-V over serial, a messy undertaking, especially with a cross-platform UI and hardware in the mix. I told it to build a radio simulator that listens on a local UDP port for testing, and it saved me probably hours just on that.

@jmorris Heh! I've been thinking of trying to use Claude to write a touch screen control for my QMX and maybe KX3 to run on an ESP32 "cheap yellow display" so I can have exactly the controls I want. I like the idea of asking it to write a radio simulator for testing β€” I'll keep that in mind if I follow through on this project!

Good luck with the project. πŸ™‚

@mcdanlj tried a simpler project from scratch with a newer model. Took about an hour, then each new feature took 5-10 mins each. Saved many hours of work. https://github.com/xjamesmorris/pskreporter-tool