A Research Preview of Codex

(openai.com)

511 points meetpateltech | 3 comments | 16 May 25 15:02 UTC

prhn ◴[16 May 25 15:32 UTC] No.44006680[source]▶

Is anyone using any of these tools to write non-boilerplate code?

I'm very interested.

In my experience ChatGPT and Gemini are absolutely terrible at these types of things. They are constantly wrong. I know I'm not saying anything new, but I'm waiting to personally experience an LLM that does something useful with any of the code I give it.

These tools aren't useless. They're great as search engines and at pointing me in the right direction. They write dumb bash scripts that save me time here and there. That's it.

And it's hilarious to me how these people present these tools. It generates a bunch of code, and then you spend all your time auditing and fixing what is expected to be wrong.

That's not the type of code I'm putting in my company's code base, and I could probably write the damn code more correctly in less time than it takes to review for expected errors.

What am I missing?

replies(16): >>44006706 #>>44006751 #>>44006766 #>>44006808 #>>44006858 #>>44006868 #>>44006872 #>>44007014 #>>44007038 #>>44007115 #>>44007288 #>>44007383 #>>44007699 #>>44009108 #>>44012169 #>>44014213 #

1. IXCoach ◴[16 May 25 16:26 UTC] No.44007288[source]▶

>>44006680 #

Hey there!

Lots missing here, but I had the same issues. It takes iteration and practice. I use Claude Code in terminal windows, and a text expander to save explicit reminders that I have to inject super regularly because Anthropic obscures access to system prompts.

For example, I have instructions 3 to 8 paragraphs long that I paste in regularly about not assuming, checking deterministically, etc., and for most things I have the agents write a report following a specific instruction set.

I pop the instructions into the text expander, so I just type - docs when telling it to go figure something out and give me the path to the report when done.

They come back with a path, and I copy it and search for it in VS Code.

It opens as a Markdown file and I use preview mode; it's similar to a Google Doc.

And I'll review it. Always, things will be wrong: tons of assumptions, failures to check deterministically, etc. But I see that in the doc and have it fix them: correct misunderstandings, update the doc until it's perfect.

From there I'll say: add a plan in a table with a status for each task based on this (another text expander snippet with instructions).

And WHEN that's 100% right, I'll say: implement and update as you go. The "update as you go" forces it to recognize and remember the scope of the task.
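For readers who want something concrete, here is a minimal sketch of what a plan table with a per-task status could look like. The task names, the `Status` values, and the helper functions are all hypothetical; the comment does not say what format the agent actually writes.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the thread does not specify any.
    TODO = "todo"
    IN_PROGRESS = "in progress"
    DONE = "done"


@dataclass
class Task:
    name: str
    status: Status = Status.TODO


def render_plan(tasks):
    """Render the plan as a Markdown table, one row per task."""
    lines = ["| # | Task | Status |", "|---|------|--------|"]
    for i, task in enumerate(tasks, start=1):
        lines.append(f"| {i} | {task.name} | {task.status.value} |")
    return "\n".join(lines)


def update_status(tasks, name, status):
    """Set the status of the named task; raise if it is not in the plan."""
    for task in tasks:
        if task.name == name:
            task.status = status
            return
    raise KeyError(f"task not in plan: {name}")


# Illustrative task names only.
plan = [Task("Write failing test"), Task("Implement handler"), Task("Update report")]
update_status(plan, "Write failing test", Status.DONE)
print(render_plan(plan))
```

Rejecting unknown task names in `update_status` is one way to keep "update as you go" from silently growing the scope of the plan.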

The greatest point of failure in the system is misalignment. Ethics teams got that right. It compounds FAST if allowed. You let them assume things, they state assumptions as facts, that becomes what other agents read, and unchecked you get true chaos.

I literally started rebuilding Claude Code from scratch because they block us from accessing system prompts, and I NEED these agents to stop lying to me about things that are not done or are merely assumed. That highlights the true chaos possible when this is applied to system-critical operations in governance or at scale.

I also built my own tool like Codex for managing agent tasks and making this simpler, but getting them to use it without getting confused is still a gap.

Let me know if you have any other questions. I am performing the work of 20 engineers as of today. In 4 weeks, by myself, I rewrote 2 years of back-end code that had required a team of 2 engineers working full time, using this system... so I am, I guess, quite good at it.

I need to push my edges further into this latest tech; I have not tried Codex CLI or the new tool yet.

replies(1): >>44007336 #

2. IXCoach ◴[16 May 25 16:30 UTC] No.44007336[source]▶

>>44007288 (TP) #

It's a total of about 30 snippets, averaging 6 paragraphs long, that I have to inject. For each role switch it goes through, I have to re-inject them.

It's a pain but it works.
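A rough sketch of the re-injection idea, for illustration only: reminders are stored per role and prepended to every request, so each role switch gets them again. The role names, the reminder texts, and `build_prompt` are placeholders, not the actual snippets or tooling described above.

```python
# Placeholder reminder texts; the real snippets are several paragraphs each
# and are not given in the thread.
REMINDERS = {
    "all": ["Do not state assumptions as facts."],
    "research": ["Write findings to a report and return its path."],
    "implement": ["Update the plan table as you complete each task."],
}


def build_prompt(role, request, reminders=REMINDERS):
    """Prepend the shared reminders plus the role-specific ones to a request."""
    parts = reminders.get("all", []) + reminders.get(role, [])
    return "\n\n".join(parts + [request])


# Each role switch rebuilds the prompt, so the reminders are re-injected.
for role, request in [("research", "Figure out how auth works."),
                      ("implement", "Implement the plan.")]:
    print(build_prompt(role, request))
    print("---")
```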

Even with TDD it will hallucinate the mocks without management, and hallucinate the requirements. Each layer has to be checked atomically, but the text expander snippets done right can get it close to 75% right.

My main project faces 5000 users, so I can't let the agents run freely, whereas with isolated projects in separate repos I can let them run more freely, then review in GitKraken before committing.

replies(1): >>44008428 #

3. Rudybega ◴[16 May 25 18:24 UTC] No.44008428[source]▶

>>44007336 #

You could just use something like Roo Code with custom modes rather than manually injecting them. The orchestrator mode can decide on the other appropriate modes to use for subtasks.

You can customize the system prompts, baseline prompts, and models used for every single mode and have as many or as few as you want.
