I studied Aider's code and prompts quite a bit in the early stages of building Plandex. I'm grateful to Paul for building it and making it open source.
PS. I've gathered a list of LLM agents (for coding and general purpose) https://docs.google.com/spreadsheets/d/1M3cQmuwhpJ4X0jOw5XWT...
I see the power of LLMs. I use GH Copilot, I use ChatGPT, but I crave deeper integration in my existing toolset. I need to force myself to try in-IDE Copilot Chat. My habit is to go to ChatGPT for anything of that nature and I'm not sure why that is. It's the same way I break my searches down into things I know I can find and then put the results together: I break the problem into small pieces and have ChatGPT write them individually, or sometimes additively.
1. Its dependencies will conflict with your code requirements.
2. If you don't install it within the code environment, you can use `aider run` where you can run local commands and pipe their outputs.
3. You will need to install all its dependencies even in the prod environment, which increases the attack surface.
So until they introduce a global binary install, I suggest using Plandex, which is written in Go and can work across any environment on the system.
You can install aider with pipx to avoid this. There's a FAQ entry that explains how:
https://aider.chat/docs/faq.html#how-to-use-pipx-to-avoid-py...
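The short version: something along the lines of `pipx install aider-chat` gives aider its own isolated virtualenv, so its dependencies never touch your project's requirements.txt.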
Also, why would you want to install aider in a production environment? It's a development tool, I wouldn't expect anyone to use it in prod. But maybe there's a use case I'm not thinking of?
I don't want aider in the prod environment. I'm saying it's hard to keep it out of prod if we can't isolate it from the code's dependencies, since maintaining multiple requirements.txt files for different envs is a pain.
The aider install instructions have more info:
https://aider.chat/docs/install.html#add-aider-to-your-edito...
That said, counting isn't necessarily required to use line numbers. If line numbers are included in the file when it's sent to the model, it becomes a text analysis task rather than a counting task. Here are the relevant prompts: https://github.com/plandex-ai/plandex/blob/main/app/server/m...
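For example (a hypothetical helper, not Plandex's actual code), the file can be sent with explicit prefixes so the model only has to read the numbers back rather than count:

    # hypothetical illustration, not Plandex's actual code: prefix each line
    # with its number so the model can cite line numbers instead of counting
    def with_line_numbers(source: str) -> str:
        return "\n".join(
            f"{i}: {line}" for i, line in enumerate(source.splitlines(), start=1)
        )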
I just have copilot in my editor and switch into my editor with C-x C-e for AI completion. I use neovim in that example, but you can use whatever you like.
EDIT: Oh never mind. I see what it is now. It's a terminal-based flow for editing code. Mine is for writing command lines live.
We are interested in integrating Aider as a tool for Dosu https://dosu.dev/ to help it navigate and modify a codebase on issues like this https://github.com/langchain-ai/langchain/issues/8263#issuec...
> OpenAI just released GPT-4 Turbo with Vision and it performs worse on aider’s benchmark suites than all the previous GPT-4 models. In particular, it seems much more prone to “lazy coding” than the GPT-4 Turbo preview models.
I did start out with just the CLI running locally, but it reached a point where I needed a database and thus a client-server model. Plandex is designed for working on many 'plans' at different levels of the project hierarchy (some users on cloud have 50+ after using it for a week), and there's also a fair amount of concurrency, so it got to be too much for a local filesystem or even something like a local SQLite db.
Plandex also has the ability to send tasks to the background, which I think will start to play a more and more important role as models get better and more capable of running autonomously for longer periods, and I want to add sharing and collaboration features in the future as well, so all-in-all I thought a client-server model was the best base to build from.
I understand where you're coming from though. That local-only simplicity is definitely a nice aspect of Aider.
I'm trying to deploy the server right now so I can try Plandex, it would be easier if I hadn't forgotten my Postgres password...
As a tip, self-hosting would be much easier (though maybe that's not something you want to encourage) if you provided a plain Docker image; then it would just be "pull the Docker image, specify the local directory, specify the DB URL, done".
By the way, why does it need a local directory if it has a database? What's stored in the directory?
I do want to make self-hosting as easy as possible. In my experience, there will still be enough folks who prefer cloud to make it work :)
There's a local .plandex directory in the project which just stores the project id, and a $HOME/.plandex-home directory that stores some local metadata on each project--so far just the current plan and current branch.
It runs on Groq (the company I work for), so it's super snappy.
Nonetheless, I'm particularly curious in which cases the AI tool can find things that aren't easy to find via find & grep, e.g. URLs that are built via string concatenation and never appear as a string literal in the source code (something like the snippet below).
Perhaps a larger question there, what's the overall false negative rate of a tool like this? Are there places where it is particularly good and/or particularly poor?
edits: brevity & clarity
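To make the "grep-resistant" case above concrete, I mean constructions roughly like this (names are made up), where the full URL never appears as a literal:

    # made-up example: nothing here matches a grep for the final URL
    def build_url(host: str, api_version: int, resource: str) -> str:
        return "https://" + host + "/v" + str(api_version) + "/" + resource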
Copilot is pretty good but I like the split context of declaring what you are working on in the CLI.
It still suffers from ChatGPT laziness sometimes; you can see it retrying several times to get a correct output before giving up.
It was able to rewrite (partially, some didn't get fully done) 10 files before I hit my budget limits from Vue 2 Class Component syntax to Vue 3 Composition API. It would have needed another iteration or so to iron out the issues (plus some manual clean up/checking from me) but that's within spitting distance of being worth it. For now I'll use ChatGPT/Claude (which I pay for) to do this work but I will keep a close eye on this project, it's super cool!
OpenInterpreter is another project you could check out that is more focused on code/script execution: https://github.com/OpenInterpreter/open-interpreter
If you're worried about the changes getting it wrong, just show a prompt with all the batched changes.
me > build my jar, move it to the last folder I copied it to, and run it.
LLM > built jar xyz.jar; moving jar to x/y/z
me > yes.
me > redo last command.
Provide rollback/log for these features if need be.
I really don't think you even need an LLM for this. I feel like I could do it with a simple classifier. It just needs to be hooked into the OS, so that it can scan what you were doing and replicate it.
For example if I keep opening up folder x and dropping a file called build.jar to folder y, a program should be able to easily understand "copy the new jar over"
I imagine at some point this is going to be done at the OS level.
I love how everyone always leaves PHP off these lists of "popular languages" despite the fact that 80% of the web runs on PHP.
I had similar ideas when I started on Plandex. I wanted it to be able to install dependencies when needed, move files around, etc., but I quickly realized that there's just so much the model needs to know about the system and its state to even have a chance of getting it right. That's not to say it's impossible. It's just a really hard problem and I'd guess the first projects/products to nail it will either come from the OS vendors themselves, or else from people focusing very specifically on that challenge.
I would however be curious to know what percentage of the 80% (or so) is WordPress et al., since those largely don't involve folks actually writing code. I suspect a very small amount of PHP code is being run a lot.
I hear you on the API costs. You should see my OpenAI bills from building Plandex :-/
Lsp-mode will schedule one request per keypress but then cancel that request at the next keypress. But since the Python LSP server doesn't do async, it handles cancel requests by ignoring them.
Cursor is a fork of VSCode focused on AI. I'd prefer to use something totally open-source, but Cursor is free, gets regular updates, and I can use my OpenAI API key.
The diff view works well with AI coding assistants. I end up parallelizing more: I let Cursor do its thing while I'm already looking at the next file.
I love aider too! Have used it to automate things such as maintaining a translated version of the page in a git pre-commit hook.
A lot of my successful projects have been rewritten later in nodejs. But for getting something up and running to test a concept, PHP is great if you're comfortable with its idiosyncrasies.
I'd say Python is just as idiosyncratic, and its packaging system is just too much of a pain point. And Node doesn't ship with mature database interfaces, its dependencies are scary, there's more concern about runaway scripts, crashes are harder to recover from, and a lot of times all you really want from a router is to serve your file structure with some access rules.
I think PHP is still the best choice for prototyping dynamic HTML and logic fast, without any packages or plug-ins. A lotta times I still even use it for short CLI scripts and cron tasks that do database ops.
I had a largish website with a few thousand static webpages. Over time it grew to around 100K pages with some server-side features. Over the course of 10 years I did and redid this site in multiple technologies: React, Angular, Spring Boot + Freemarker, etc.
However, the PHP-powered version of it remains the best for SEO, has near-zero downtime and no maintenance whatsoever, and runs on a VM shared with about 10 other websites. Traffic served is around 100K visits a day.
I don't use very much PHP, but I would be remiss if I left my opinion of it as dated as whisper-campaign rumors based on interpretation and preference.
Frameworks like Laravel and especially technologies like Hotwire are nothing to overlook.
Standardized and capable frameworks that have large workforces can be quite valuable at time of valuation and due diligence. Specialized and brittle techs can be a challenge.
There is still some ambiguity there because cases might slightly differ, you're right.
For rm/mv: mv is easily reversible, no? You just need to store some context. Same with rm, just copy it to a temp directory (see the sketch below). But again, with a confirmation prompt it's a non-issue either way.
me > build a jar.
LLM > I can build a jar with x, y, z; which do you want?
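A rough sketch of the reversible-rm idea (names are made up): instead of deleting, move the file into a trash directory and log where it came from, so undo is just a move back.

    # sketch only: "rm" becomes a move into a trash dir plus a log entry,
    # so the last operation can be rolled back
    import json, shutil, time
    from pathlib import Path

    TRASH = Path.home() / ".assistant-trash"

    def safe_rm(path):
        TRASH.mkdir(exist_ok=True)
        src = Path(path).resolve()
        dest = TRASH / f"{time.time_ns()}-{src.name}"
        shutil.move(str(src), str(dest))
        with open(TRASH / "log.jsonl", "a") as log:
            log.write(json.dumps({"original": str(src), "trashed": str(dest)}) + "\n")

    def undo_last():
        log_file = TRASH / "log.jsonl"
        entries = [json.loads(line) for line in log_file.read_text().splitlines()]
        last = entries.pop()
        shutil.move(last["trashed"], last["original"])
        log_file.write_text("".join(json.dumps(e) + "\n" for e in entries))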
I'm sure I'll have to eat these words, but: This just doesn't feel like the right interface to me. LLMs are incredible at generating "inroads" to a problem, but terrible at execution. Worse yet at anticipating future problems.
All this might very well change. But until it does, I just want my LLMs to help me brainstorm and help me with syntax. I think there's a sweet spot somewhere between this tool and Copilot, but I'm not sure where.
I think I used it for 10-15 rounds of iteration on my latest project and it generated about 50% of the code of a web app with Python backend. Pretty sweet and costs nothing on top of the web subscription. The funny part is that I was using this AI coding tool to build another AI tool to manage a collection of prompts and demonstrations including automatic prompt evaluation, so I was using an AI tool to make another AI tool.
- Review GitHub PR and suggest fixes.
- Improve the readability of code with a single command (devs suck at naming variables).
- context aware autocomplete for real
It's not as slick as SQL on an RDBMS, but very close, and integrates well into e.g. vim, so I can directly pull in output from the tools and add notes when I'm building up my reports. Finding partial URLs, suspicious strings like API keys, SQL query concatenation and the like is usually trivial.
For me to switch to another toolset there would have to be very strong guarantees that the output is correct, deterministic and the full set of results, since this is the core basis for correctness in my risk assessments and value estimations.
> when exploring new solutions and "unknown territory"
If it’s something I have no idea how to do I might describe the problem and just look at the code it spits out; not even copy pasting but just reading for a basic idea.
> how do you compare it with "regular search" via Google/Bing
Much worse if there’s a blog post or example in documentation that’s exactly what I’m looking for, but, if it’s something novel, much better.
An example:
Recently asked how I could convert pressure and temperature data to “skew T” coordinates for a meteorological plot. Not something easy to Google, and the answers the AI gave were slightly wrong, but it gave me a foot in the door.
I asked it to rename a global variable. It broke the application and failed to understand scoping rules.
Perhaps it is bad luck, or perhaps my Go code is weird, but I don't understand how y'all wanna trust this.
I do agree though, these basic examples do seem quite pointless, if you already know what you’re doing. It’s just as pointless as telling another developer to “add a name param to ‘greeting’ function, add all types”, which you’d then have to review.
I think it comes down to your level of experience though. If you have years and years of experience and have honed your search skills and are perfectly comfortable, then I suspect there isn’t a lot that an LLM is going to do when it comes to writing chunks of code. That’s how I’ve felt about all these “write a chunk of code” tools.
In my case, apart from automating the kind of repetitive, mindless work I mentioned, it’s just been a glorified autocomplete. It works -really- well for that, especially with comments. Oftentimes I find myself adding a little comment that explains what I’m about to do, and then boop, I’ve got the next few lines autocompleted with no surprises.
I had to work without an internet connection a few days ago and it really, really hit me how much I’ve come to use that autocomplete - I barely ever type anything to completion anymore, it was jarring, having to type everything by hand. I didn’t realise how lazy my typing had become.
Why is it recommended to not quickly review the changes (git status, git diff) before committing?
Can you explain more how "checking everything is right" takes just a few seconds as well? A code review can't happen in "just a few seconds", so maybe I don't understand what the process you're describing really is.
We live in a world with everything from macro systems and code generation to higher-order functions and types... if you find yourself writing the same "boilerplate" enough times that you find it annoying, just automate it, the same way you can automate anything else we do using software. I have found myself writing very little "boilerplate" in my decades of software development, as I'd rather at the extreme (and it almost never comes to this) throw together a custom compiler than litter my code with a bunch of hopefully-the-same-every-time difficult-to-adjust-later "boilerplate".
There's zero technical challenge, almost no logic, super tedious for a human to do, not quite automatable since there could be any kind of code in those views, and it's very very unlikely that the LLM gets it wrong. I give it a quick look over, it looks right, the tests pass, it's not really a big deal.
And one nice thing I did as well was ask it to "move all logic to the top of the file", which makes it -very- easy to clean up all the "quick fix" cruft that's built up over years that needs to be cleaned up or refactored out.
In those cases the file might indeed need more time dedicated to it, but it would've needed it either way.
Nah these things are all stupid as hell. Any back and forth between a human and an LLM in terms of problem solving coding tasks is an absolute disaster.
People here and certainly in the mainstream population see some knowledge and just naturally expect intelligence to go with it. But it doesn't. Wikipedia has knowledge. Books have knowledge. LLMs are just the latest iteration of how humans store knowledge. That's about it, everything else is a hyped up bubble. There's nothing in physics that stops us from creating an artificial, generally intelligent being, but it's NEVER going to be with auto-regressive next-token prediction.
The current wave of coding assistants target junior programmers who don't know how to even start approaching a task. LLMs are quite good at spitting out code that will create a widget or instantiate a client for a given API, figuring out all the parameters and all the incantations that you'd otherwise need to copy paste from a documentation. In a way they are documentation "search and digest" tools.
While that's also useful for senior developers when they need to work outside of their particular focus area, it's not that useful to help you work on a mature codebase where you have your own abstractions and all sorts of custom things that have good reasons to be there but are project specific.
Sure, we could eventually have LLMs that can be fine tuned to your specific projects, company or personal style.
But there is also another area where we can use intelligent assistants: editors.
Right now editors offer powerful tools to move around and replace text, often in ways that respects the syntax of the language. But it's cumbersome to use and learn, relying on key bindings or complicated "refactoring" commands.
I wish there was a way for me to have a smarter editor. Something that understands the syntax and a bit of the semantics of the code, but also the general intent of the local change I'm working on and the wider context, so it can help me apply the right edits.
For example, right now I'm factoring out a part of a larger function into its own function so it can be called independently.
I know there are editor features that predate AI that can do this work, but for various reasons I can't use them. For example, you may have started to do it manually because it seemed simple, and then you realize you have to factor out 5 parameters and it becomes a boring exercise of copy-paste. Another example is that the function extraction refactoring tool of your IDE just can't handle the case. For example, given func A(a Foo) { b := a.GetBar(); Baz(b.X, b.Y, c, d) } you'd want to extract a function func _A(b Bar) { Baz(b.X.... and have A call that. In some simple cases the IDE can do that; in others you need to do it manually.
I want an editor extension that can help me with the boring parts of shuffling parameters around, moving them into structures, etc., all while I stay in control of the shape of the code, and where I don't have to remember the advanced editor commands but can instead augment my actions with some natural-language comments (written or even spoken!).
I paste the table definition into a comment and let the LLM generate the model (if the ORM doesn't automate it), the list of validation rules, custom type casts, whatever specifics your project has (roughly the shape of thing sketched below). None of it is new or technically challenging; it's just autocompleting stuff I was going to write anyway.
It's not that you're writing "too much" boilerplate; this is a tiny part of my work as well. This is just the one part where I've actually found an LLM useful. Any time I feel like "yeah this doesn't require thought, just needs doing", I chuck it over to an LLM to do.
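To make it concrete, this is roughly the shape of thing I let the LLM fill in (illustrative only: a made-up table, and SQLAlchemy-style Python here; your ORM and language will differ):

    # Illustrative sketch with a made-up table; paste the DDL as a comment
    # and let the LLM produce the matching model.
    #
    #   CREATE TABLE invoices (
    #       id          BIGINT PRIMARY KEY,
    #       customer_id BIGINT NOT NULL,
    #       total_cents INTEGER NOT NULL,
    #       paid_at     TIMESTAMP NULL
    #   );
    from datetime import datetime
    from typing import Optional

    from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

    class Base(DeclarativeBase):
        pass

    class Invoice(Base):
        __tablename__ = "invoices"

        id: Mapped[int] = mapped_column(primary_key=True)
        customer_id: Mapped[int]
        total_cents: Mapped[int]
        paid_at: Mapped[Optional[datetime]]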
Statistical prediction has its limitations - who knew.
core programming hasn't really changed over the past years with good reason: you need. to. understand what you do. this is the bottleneck. not writing it.
Telling it what I want to do in broader terms and asking for code examples is a lot better, especially for something I don't know how to do.
Otherwise the autocomplete/suggestions in the editor are great for the minutiae and tedious crap and utility functions. Probably saves me about 20% of my typing, which is great on hands that have been typing for 20-odd years.
It's also good for finding tools and libraries (when it doesn't hallucinate) since https://libs.garden disappeared inexplicably (dunno what to do on Friday nights now that I can't browse through that wonderful site till 2am)
I have trouble understanding the "boilerplate" thing, because 1) avoiding writing boilerplate was already a solved "problem" long before AI, and 2) is it really a "problem"?
The first point: * If you find yourself writing the same piece of code over and over again in the same codebase, it's an indication that you should abstract it away as a function / class / library.
* IDEs have had snippets / code completion for a long time to save you from writing the same pieces of code.
* Large pieces of recycled functionality are generally abstracted away in libraries or frameworks.
* Things like "writing similar static websites a million times" are the reason why solutions like WordPress exist: to take away the boilerplate part of writing websites. This of course applies to solutions / technologies / services that make "avoid writing boilerplate code" their core business
* The only type of real boilerplate that comes to my mind is things like "start a new React application", but that's something you do once per project, and it's the reason bootstrappers exist: you only really have to type "npx create-react-app my-app" once and the boilerplate part is taken care of.
The second point: Some mundane refactoring / translation of pieces of code from one technology to another can actually be automated by AI (I think that's what you're talking about here, but how often does one really do such tasks?), but... do you really want to? Automate it, I mean?
I mean, yes, "let AI do the boring stuff so that I can concentrate on the most interesting parts" makes sense, but it's not something I want to do. Maybe it's because I'm aging, but I don't have it in me to concentrate on demanding, difficult, tiring tasks 8 hours straight a day. It's not something I can do, and it's also something I don't want to do.
I much prefer alternating hard stuff that requires 100% of my attention with lighter tasks that I can do while listening to a podcast, to let off steam and rest my brain before going back to a harder task. Honestly, I don't think anyone is supposed to be concentrating on demanding stuff all day long, all week long. That's a recipe for burnout.
I’ll admit some of that might be from me being used to what I get from GH Copilot but basic stuff like initializing a variable called “count” with `0` or “++”-ing it in a loop were both things it didn’t auto-complete. I switched back to Copilot and it did exactly what I expected.
The polish is lacking with Cody and the errors are completely unacceptable in a paid product. I’ve seen 2 Copilot outages the entire time I’ve been using it (since before GA) so to have Cody barf up stupid errors multiple times in a 3-day period is just ridiculous.
I actually like completions more, it feels more natural. I’m fine to go to ChatGPT/Opus to chat if needed.
- create migration files locally, run statements against a containerized local Postgres instance
- use a custom data extractor script in IntelliJ's data tool to generate r2dbc DAO files with a commented-out CSV table containing column_name, data_type, kotlin_type, is_nullable as headers
- let the AI assistant handle the rest
I find it is faster in lots of cases where the solution is 'simple' but long and a bit fiddly. As a concrete example from earlier today, I needed a function that took a polygon and returned a list of its internal angles. Could I write it myself? Sure. Did Copilot generate the code (and unit tests) for me in a fraction of the time it would have taken me to do it? Absolutely.
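Roughly the shape of function I mean; this is my own sketch rather than Copilot's exact output, and it assumes a convex polygon given as (x, y) vertices (reflex vertices would need an extra sign check via the cross product):

    import math

    def interior_angles(points):
        # Interior angle in degrees at each vertex of a convex polygon,
        # given as a list of (x, y) tuples in order.
        n = len(points)
        angles = []
        for i in range(n):
            px, py = points[(i - 1) % n]   # previous vertex
            cx, cy = points[i]             # current vertex
            nx, ny = points[(i + 1) % n]   # next vertex
            v1 = (px - cx, py - cy)
            v2 = (nx - cx, ny - cy)
            dot = v1[0] * v2[0] + v1[1] * v2[1]
            norm = math.hypot(*v1) * math.hypot(*v2)
            angles.append(math.degrees(math.acos(dot / norm)))
        return angles

    # e.g. a unit square: interior_angles([(0, 0), (1, 0), (1, 1), (0, 1)])
    # -> [90.0, 90.0, 90.0, 90.0]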
It is capable of handling complex tasks like feature development and refactoring across multiple files, but it doesn't try to generate diffs and apply them automatically.
Instead, you get a response from the LLM that is easy to read and allows you, as a developer, to quickly apply it to your existing codebase.
You can check it out here: https://prompt.16x.engineer/
That demo GIF is just showing a toy example. To see what it's like to work with aider on more complex changes you can check out the examples page [0].
The demo GIF was just intended to convey the general workflow that aider provides: you ask for some changes and aider shares your existing code base with the LLM, collects back the suggested code edits, applies them to your code and git commits with a sensible commit message.
This workflow is generally a big improvement over manually cutting and pasting bits of code back and forth between the ChatGPT UI and your IDE.
Beyond just sending the code that needs to be edited, aider also sends GPT a "repository map" [1] that gives it the overall context of your codebase. This makes aider more effective when working in larger code bases.
In particular, this is one of the most important tips: Large changes are best performed as a sequence of thoughtful bite sized steps, where you plan out the approach and overall design. Walk GPT through changes like you might with a junior dev. Ask for a refactor to prepare, then ask for the actual change. Spend the time to ask for code quality/structure improvements.
Not sure if this was a factor in your attempts? I'd be happy to help if you'd like to open a GitHub issue [1] or jump into our Discord [2].
[0] https://github.com/paul-gauthier/aider#tips
[1] https://github.com/paul-gauthier/aider/issues/new/choose
Me: Can you give me code for a simple image viewer in python? It should be able to open images via a file open dialog as well as show the previous and next image in the folder
GPT: [code doing that with tkinter]
Me: That code has a bug because the path handling is wrong on windows
GPT: [tries to convince me that the code isn't broken, fixes it regardless]
Me: Can you add keyboard shortcuts for the previous and next buttons
GPT: [adds keyboard shortcuts]
After that I did all the development the old-fashioned way, but that alone saved me a good chunk of time. Since it was just internal tooling for myself, code quality didn't matter, and I wasn't too upset about the questionable error handling choices.
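For reference, the result was something along these lines (reconstructed from memory, not GPT's exact output); pathlib takes care of the Windows path issue it initially argued about:

    # Reconstructed sketch, not GPT's exact code. Requires Pillow.
    import tkinter as tk
    from tkinter import filedialog
    from pathlib import Path
    from PIL import Image, ImageTk

    EXTS = {".png", ".jpg", ".jpeg", ".gif", ".bmp"}

    class Viewer:
        def __init__(self, root):
            self.root = root
            self.label = tk.Label(root)
            self.label.pack()
            tk.Button(root, text="Open", command=self.open).pack(side=tk.LEFT)
            tk.Button(root, text="Prev", command=self.prev).pack(side=tk.LEFT)
            tk.Button(root, text="Next", command=self.next).pack(side=tk.LEFT)
            root.bind("<Left>", lambda e: self.prev())    # keyboard shortcuts
            root.bind("<Right>", lambda e: self.next())
            self.files, self.idx = [], 0

        def open(self):
            chosen = filedialog.askopenfilename()
            if not chosen:
                return
            chosen = Path(chosen)
            # Collect all images in the same folder so prev/next can walk them.
            self.files = sorted(p for p in chosen.parent.iterdir()
                                if p.suffix.lower() in EXTS)
            self.idx = self.files.index(chosen) if chosen in self.files else 0
            self.show()

        def prev(self):
            if self.files:
                self.idx = (self.idx - 1) % len(self.files)
                self.show()

        def next(self):
            if self.files:
                self.idx = (self.idx + 1) % len(self.files)
                self.show()

        def show(self):
            img = ImageTk.PhotoImage(Image.open(self.files[self.idx]))
            self.label.configure(image=img)
            self.label.image = img  # keep a reference or tkinter drops the image

    root = tk.Tk()
    Viewer(root)
    root.mainloop()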
'typeofchange(scopeofchange): reason for change'
It sort of helps force devs to type out more meaningful commit messages.
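For example, something like "fix(parser): handle empty input files" or "feat(api): add pagination to the users endpoint".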
A "marketingy" demo video: https://www.youtube.com/watch?v=DXunbNBpgZg&ab_channel=Wasp
I like the idea of something like `plandex load some-dir --defs` to load definitions with tree-sitter. I don't think I'd load the whole repo's defs by default like Aider does (I believe?), because that could potentially use a lot of tokens in a large repo and include a lot of irrelevant definitions. One of Plandex's goals is to give the user granular control over what's in context.
But for now if you wanted to do something where definitions across the whole repo would be helpful (vs. loading in specific files or directories) then Aider is better at that.
Understanding a codebase, along with the in/outs between the calls is pretty vital to any codebase, especially the larger a codebase gets.
I'm not attached to the way Aider or Plandex does anything, but I'm still not clear on which scenarios make it worth considering compared to Aider, or vice versa. Aider seems pretty unique and stands alone on a number of things. I'll still install Plandex and try it out.
Without details, it's a little surprising a post like this could get upvoted so much.
Like I said, I think Aider's use of tree-sitter is a great concept and something I'd like to incorporate in some way. I'm not at all trying to claim that Plandex is 'better' than Aider for every use case. I think they are suited to different kinds of tasks.
I actually agree in the general case, but for specific applications these tools can be seriously awesome. Case in point - this repo of mine, which I think it's fair to say was 80% written by GPT-4 via Aider.
https://github.com/epiccoleman/scrapio
Now of course this is a very simple project, which is obviously going to have better results. And if you read through the commit history [1], you can see that I had to have a pretty good idea of what had to be done to get useful output from the LLM. There are places where I had to figure out something that the LLM was never going to get on its own, places where I made manual changes because directing the AI to do it would have been more trouble than it was worth, etc.
But to me, the cool thing about this project was that I just wouldn't have bothered to do it if I had to do all the work myself. Realistically I just wanted to download and process a list of like 15 urls, and I don't think the time invested in writing a scraper would have made sense for the level of time I would have saved if I had to figure it all out myself. But because I knew specifically what needed to happen, and was able to provide detailed requirements, I saved a ton of time and labor and wound up with something useful.
I've tried to use these sorts of tools for tasks in bigger and more complicated repos, and I agree that in those cases they really tend to swing and miss more often than not. But if you're smart enough to use it as the tool it is and recognize the limitations, LLM-aided dev can be seriously great.
[1]: https://github.com/epiccoleman/scrapio/commits/master/?befor...
If you are running nixos, an example of using it can be found here: https://github.com/breakds/nixos-machines/blob/main/flake.ni...
Usually you do this with a human as an investment in their future performance, with the understanding that this is the least efficient way to get the job done in the short term.
Having to take a product that is already supposed to "grok code" and make a similar investment doesn't make any sense to me.
I wish every language just came with a good ctags solution that worked with all IDEs. When this is set up properly I rarely need more power than a shortcut to look up tags.
LLM foo is very much a real thing. They are surprisingly difficult to use well, but can be very powerful.