116 points by rohansood15

Hi HN! We’re Asankhaya and Rohan and we are building Patchwork.

Patchwork tackles development gruntwork—like reviews, docs, linting, and security fixes—through customizable, code-first 'patchflows' using LLMs and modular code management steps, all in Python. Here's a quick overview video: https://youtu.be/MLyn6B3bFMU

From our time building DevSecOps tools, we experienced first-hand the frustrations our users faced as they built complex delivery pipelines. Almost a third of developer time is spent on code management tasks[1], yet backlogs remain.

Patchwork lets you combine well-defined prompts with effective workflow orchestration to automate as much as 80% of these gruntwork tasks using LLMs[2]. For instance, the AutoFix patchflow can resolve 82% of issues flagged by Semgrep using GPT-4 (or 68% with Llama-3.1-8B) without fine-tuning or specialized context[3]. Success rates are higher for text-based patchflows like PR Review and Generate Docstring, but lower for more complex tasks like Dependency Upgrades.

We are not a coding assistant or a black-box GitHub bot. Our automation workflows run outside your IDE via the CLI or CI scripts without your active involvement.

We are also not an ‘AI agent’ framework. In our experience, LLM agents struggle with planning and rarely identify the right execution path. Instead, Patchwork requires explicitly defined workflows, which deliver higher success rates and keep you in full control.
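
To illustrate the difference, here is a minimal sketch of what an explicit workflow looks like in plain Python. The step names here are hypothetical (this is not Patchwork's actual API); the point is that the execution path is declared up front rather than planned by an agent at runtime:

    from typing import Callable

    # One patchflow step: take a context dict, return an updated one.
    Step = Callable[[dict], dict]

    def run_patchflow(steps: list[Step], context: dict) -> dict:
        # Steps run in the declared order; no LLM decides what happens next.
        for step in steps:
            context = step(context)
        return context

    def scan(ctx: dict) -> dict:
        # Stand-in for a static-analysis step (e.g. a Semgrep scan).
        ctx["findings"] = ["possible SQL injection in db.py:42"]
        return ctx

    def generate_fix(ctx: dict) -> dict:
        # Stand-in for one scoped LLM call per finding.
        ctx["patches"] = [f"patch for: {f}" for f in ctx["findings"]]
        return ctx

    def open_pr(ctx: dict) -> dict:
        # Hand the result to a human reviewer.
        print(f"Opening PR with {len(ctx['patches'])} patch(es)")
        return ctx

    run_patchflow([scan, generate_fix, open_pr], {})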

Patchwork is open-source so you can build your own patchflows, integrate your preferred LLM endpoints, and fully self-host, ensuring privacy and compliance for large teams.

As devs, we prefer to build our own ‘AI-enabled automation’ given how easy it is to consume LLM APIs. If you do, try Patchwork via a simple 'pip install patchwork-cli' or find us on GitHub[4].
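
For reference, getting started looks roughly like this; the install command is quoted from above, and the AutoFix invocation follows the pattern in the repo's README (check there for the exact argument names):

    pip install patchwork-cli
    patchwork AutoFix openai_api_key=<YOUR_OPENAI_KEY> github_api_token=<YOUR_GITHUB_TOKEN>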

Sources:

[1] https://blog.tidelift.com/developers-spend-30-of-their-time-...

[2] https://www.patched.codes/blog/patched-rtc-evaluating-llms-f...

[3] https://www.patched.codes/blog/how-good-are-llms

[4] https://github.com/patched-codes/patchwork

[Sample PRs] https://github.com/patched-demo/sample-injection/pulls

meiraleal:
PR reviews are the one thing you sure don't want an LLM doing.
Carrok:
Please elaborate.

While obviously an LLM might miss functional problems, it feels extremely well suited for catching “stupid mistakes”.

I don’t think anyone is advocating for LLMs merging and approving PRs on their own, but they can certainly provide value to the human reviewer.

spartanatreyu:
> LLM [...] feels extremely well suited for catching “stupid mistakes”.

No.

Linters are extremely well suited for catching stupid mistakes.

LLMs are extremely well suited for the appearance of catching stupid mistakes.

Linters will catch things like this, because they actually parse and evaluate the code:

    if (
      isValid(primaryValue, "strict") || isValid(secondaryValue, "strict") ||
      isValid(primaryValue, "loose" || isValid(secondaryValue, "loose"))
      //...............................^^^^ Did we forget a closing ')'?
    ) {
      ...
    }

LLMs will only highlight exact problems they've seen before, miss other problems that linters would immediately find, and hallucinate new problems altogether.

luckilydiscrete:
While true for a subset of problems, linters will also miss stupid mistakes, because not every mistake is syntactic.

An LLM, for example, can catch that `phone.match(/\d{10}/)` might break on input containing spaces, while a linter has no concept of a "correct" regex as long as the pattern is syntactically valid.
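
To make that concrete, here is the same pitfall reproduced in Python (the snippet above is JavaScript, but the failure mode is identical):

    import re

    phone = "415 555 0100"              # plausible user input with spaces

    # The naive pattern silently fails: there is no run of 10 digits.
    print(re.search(r"\d{10}", phone))  # -> None

    # Normalizing first makes the check robust to spaces and dashes.
    digits = re.sub(r"\D", "", phone)
    print(re.fullmatch(r"\d{10}", digits))  # -> matches '4155550100'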

I don't think anyone is arguing that replacing linters with AI is the answer; rather, a combination of both is useful.