
97 points by jay-baleine | 2 comments
sublinear:
This may produce some successes, but it's so much more work than just writing the code yourself that it's pointless. This structured way of working with generative AI is so strict that there is no scaling it up either. It feels like years since this was established to be a waste of time.

If the goal is to start writing code without knowing much, it may be a good way to learn and to establish a similar discipline in yourself for tackling projects? I think there's been research showing that training wheels don't work either, though. Whatever works and gets people learning to write code for real can't be bad, right?

jay-baleine:
What tends to get overlooked is the actual development speeds these projects achieve.

Take the PhiCode runtime, for example: a complete programming language with code conversion, performance optimization, and security validation, built in 14 days. The commit history provides trackable evidence; manual development of comparable functionality would take a solo developer months.

The "more work" claim doesn't hold up to measurement. AI generates code faster than manual typing while systematic constraints prevent the architectural debt that creates expensive refactoring cycles later. The 5-minute setup phase establishes foundations that enable consistent development throughout the project.

On scalability, the runtime demonstrates 70+ modules maintaining architectural consistency. The 150-line constraint forced modularization that made managing these components feasible - each remains comprehensible and testable in isolation. The approach scales by sharing core context (main entry points, configuration, constants, benchmarks) rather than managing entire codebases.

Teams can collaborate effectively under shared architectural constraints without coordination overhead.

This isn't about training wheels or learning syntax. The methodology treats AI as a systematic development partner focused on architectural thinking rather than ad-hoc prompting. AI handles syntax perfectly - the challenge lies in directing it toward maintainable, scalable solutions at production speed.

Previous attempts at structured AI collaboration may have failed, but this approach addresses specific failure modes through empirical measurement rather than theoretical frameworks.

The perceived 'strictness' provides flexibility within proven constraints. Developers retain complete freedom in implementation approaches, but the constraints prevent common pitfalls like monolithic files or tangled dependencies - like guardrails that keep you on the road.

The project examples and commit histories provide concrete evidence for these development speeds and architectural outcomes.

gravypod:
> The PhiCode runtime for example - a complete programming language with code conversion, performance optimization, and security validation. It was built in 14 days. The commit history provides trackable evidence; manual development of comparable functionality would require months of work as a solo developer.

I've been looking at the docs and something I don't fully understand is what PhiCode Runtime does? It seems like:

1. Mapping of ligatures -> keywords (ex: ƒ -> def).

2. Caching of 4 types (source content, python parsing, module imports, and python bytecode).

3. Call into phirust-transpiler which seems to try and convert things into rust code?

4. An http api for requesting these operations.

A lot of this seems to be done with regexes. Was there a motivation for doing string replace instead of python -> ast -> conversion -> new ast -> source? What is this code being used for?
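
Roughly, by the AST route I mean something along these lines (illustrative only, not your code; it assumes the input is already valid Python, since name-level symbols like π parse as identifiers but keyword-level ones like ƒ would not):

    import ast

    # Hypothetical mapping; only name-level symbols can be handled this way.
    SYMBOL_TO_NAME = {"π": "print"}

    class SymbolRenamer(ast.NodeTransformer):
        def visit_Name(self, node: ast.Name) -> ast.AST:
            node.id = SYMBOL_TO_NAME.get(node.id, node.id)
            return node

    def ast_round_trip(source: str) -> str:
        tree = ast.parse(source)              # python -> ast
        tree = SymbolRenamer().visit(tree)    # conversion -> new ast
        ast.fix_missing_locations(tree)
        return ast.unparse(tree)              # new ast -> source

    # ast.unparse (Python 3.9+) regenerates source but drops comments
    # and original formatting along the way.
    print(ast_round_trip("π( 'hi' )  # this comment does not survive"))
    # -> print('hi')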

jay-baleine:
Your four points are correct:

1. Symbol mapping: Yes - ƒ → def, ∀ → for, λ → lambda, π → print, etc. Custom mappings are configurable.

2. Multi-layer caching: Confirmed - source content cache, transpiled Python cache, module import specs, and optimized bytecode with batch writes.

3. PhiRust acceleration: Clarification - it's a Rust-based transpiler that handles the symbol-to-Python conversion for performance, not converting Python to Rust. When files exceed 300KB, the system delegates transpilation to the Rust binary instead of using Python regex processing.

4. HTTP API: Yes - provides endpoints for transpilation, symbol mapping queries, and engine info to enable IDE integration.
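
As a rough sketch of how points 1 and 3 fit together (hypothetical names, threshold constant, and CLI invocation; the runtime's actual mapping table and PhiRust interface live in the repo):

    import re
    import subprocess

    # Hypothetical mapping table; the runtime's table is configurable.
    SYMBOL_MAP = {"ƒ": "def", "∀": "for", "λ": "lambda", "π": "print"}

    # One alternation compiled once, longest symbols first so multi-character
    # symbols are never split by shorter ones.
    _PATTERN = re.compile(
        "|".join(re.escape(s) for s in sorted(SYMBOL_MAP, key=len, reverse=True))
    )

    PHIRUST_THRESHOLD = 300 * 1024  # delegate large files to the Rust binary

    def transpile(source: str) -> str:
        if len(source.encode("utf-8")) > PHIRUST_THRESHOLD:
            # Hypothetical CLI shape for the phirust-transpiler binary.
            return subprocess.run(
                ["phirust-transpiler"], input=source,
                capture_output=True, text=True, check=True,
            ).stdout
        # Regex pass: comments, whitespace, and formatting pass through as-is.
        # (A production pass would also need to skip string literals.)
        return _PATTERN.sub(lambda m: SYMBOL_MAP[m.group(0)], source)

    print(transpile("ƒ greet(name):\n    π(name)  # formatting preserved\n"))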

The technical decision to use string replacement over AST manipulation came down to measured performance differences.

The benchmarks show 3,000,000+ chars/sec throughput on extreme stress tests and 1,200,000+ chars/sec on typical workloads, whereas AST parsing, transformation, and regeneration introduce overhead that makes real-time symbol conversion impractical for large codebases.
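
That kind of comparison is easy to reproduce in rough form with a stdlib-only harness like the one below (illustrative; the numbers depend entirely on your machine, and this is not the project's benchmark suite):

    import ast
    import re
    import time

    SYMBOL_MAP = {"π": "print", "λ": "lambda"}
    PATTERN = re.compile("|".join(map(re.escape, SYMBOL_MAP)))

    def regex_pass(src: str) -> str:
        return PATTERN.sub(lambda m: SYMBOL_MAP[m.group(0)], src)

    def ast_pass(src: str) -> str:
        # Round trip only; a real transform would also rewrite nodes.
        return ast.unparse(ast.parse(src))

    def chars_per_sec(fn, src: str, repeats: int = 20) -> float:
        start = time.perf_counter()
        for _ in range(repeats):
            fn(src)
        return repeats * len(src) / (time.perf_counter() - start)

    sample = "x = π(1 + 2)  # comment\n" * 5000  # synthetic workload
    print(f"regex: {chars_per_sec(regex_pass, sample):,.0f} chars/sec")
    print(f"ast:   {chars_per_sec(ast_pass, sample):,.0f} chars/sec")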

The string replacement preserves exact formatting, comments, and whitespace while maintaining compatibility with any Python syntax, including future language features that AST parsers might not support yet. Each symbol maps directly to its Python equivalent without intermediate representations that can introduce transformation errors.

The cache system includes integrity validation to detect corrupted cache entries and automatic cleanup of temporary files. Cache invalidation occurs when source files change, preventing stale transpilation results. Batch write operations with atomic file replacement ensure cache consistency under concurrent access.
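
A minimal sketch of that caching pattern, with hypothetical paths and key scheme (the runtime's actual layout and validation are its own):

    import hashlib
    import os
    import tempfile
    from pathlib import Path

    CACHE_DIR = Path(".phicache")  # hypothetical location

    def _key(source: str) -> str:
        # Content-addressed key: any change to the source invalidates the entry.
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def cache_get(source: str) -> str | None:
        entry = CACHE_DIR / _key(source)
        try:
            return entry.read_text(encoding="utf-8")
        except (FileNotFoundError, UnicodeDecodeError):
            return None  # a missing or unreadable entry is treated as a miss

    def cache_put(source: str, transpiled: str) -> None:
        CACHE_DIR.mkdir(exist_ok=True)
        # Write to a temp file, then atomically replace, so readers never see
        # a partially written entry even under concurrent access.
        fd, tmp = tempfile.mkstemp(dir=CACHE_DIR)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(transpiled)
        os.replace(tmp, CACHE_DIR / _key(source))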

The runtime is aimed at cognitive improvements for domain-specific development. Mathematical algorithms become more readable when written with actual mathematical notation rather than verbose keywords. It can also help in game development, where certain functions benefit from different naming (e.g., def → skill, def → special, def → equipment).

The gradual adoption path matters for production environments. Teams can introduce custom syntax incrementally without rewriting existing codebases since the transpiled output remains standard Python. The multi-layer caching system ensures that symbol conversion overhead doesn't impact execution performance.

This extends to domain-specific languages for mathematics, finance, education, or any field where visual clarity improves comprehension. The system maintains full Python compatibility while enabling those improvements through customizable syntax.

UncleEntity:
> ... whereas AST parsing, transformation, and regeneration introduce overhead that makes real-time symbol conversion impractical for large codebases.

I don't really understand why you would need to do anything different with a parser than with the regex method; there's no real reason to parse to an AST (with all the Python goodness involved in that) at all when the parser can just do the string replacement the same as whatever PhiRust is doing.

I have this PEG VM (based on the LPeg papers) that I've been poking at for a little while now; while admittedly I haven't actually tested its speed, I'd be amazed if it couldn't do 3 MB/s. In fact, the main limiting factor seems to be getting bytes off the disk, and the parser runtime is just noise compared to that, with all the 'musttail' shenanigans going on.

And even that is overkill for simple keyword replacement, given all the work done over the years on making macro systems blazing fast. Not that I've looked into how they do their magic, beyond a brief peek at C's macro rules, which are, let's just say, complicated.