
176 points GavCo | 3 comments

The new Gemini 3 Pro Image model (aka Nano Banana) is incredible at generating slides, so I thought it would be fun to build a CLI tool that lets you edit PDF presentations using plain English. The tool converts the page you want to edit into an image, sends it to the model API together with your prompt to generate an edited image, then converts the updated image back into a PDF page and stitches it into the original document.
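
The flow described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: `edit_page`, `render`, `edit_image`, and `to_pdf_page` are hypothetical names, pages are modelled as plain bytes, and page numbers are assumed to be 1-based as in the CLI examples.

```python
from typing import Callable, List


def edit_page(
    pages: List[bytes],
    page_number: int,
    prompt: str,
    render: Callable[[bytes], bytes],
    edit_image: Callable[[bytes, str], bytes],
    to_pdf_page: Callable[[bytes], bytes],
) -> List[bytes]:
    """Return a new page list with one page replaced by its edited version.

    All three callables are hypothetical stand-ins:
      render      -- rasterises a PDF page to an image
      edit_image  -- sends the image and prompt to the model, returns the edited image
      to_pdf_page -- converts the edited image back into a PDF page
    """
    index = page_number - 1  # assumption: the CLI counts pages from 1
    if not 0 <= index < len(pages):
        raise IndexError(f"page {page_number} is out of range")

    image = render(pages[index])        # page -> image
    edited = edit_image(image, prompt)  # image + prompt -> edited image
    new_page = to_pdf_page(edited)      # edited image -> PDF page

    # Stitch: every other page is carried over untouched.
    return pages[:index] + [new_page] + pages[index + 1:]
```

Passing the three steps in as callables keeps the sketch independent of any particular PDF library or model client.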

Examples:

- `nano-pdf edit deck.pdf 5 "Update the revenue chart to show Q3 at $2.5M"`

- `nano-pdf add deck.pdf 15 "Create an executive summary slide with 5 bullet points"`

Features:

- Edit multiple pages in parallel

- Add entirely new slides that match your deck's style

- Google Search enabled by default so the model can look up current data

- Preserves text layer for copy/paste and search
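
One way parallel page edits like those listed above could be arranged is with a thread pool, since each edit is mostly waiting on a network call. This is a guess at the general shape, not the tool's implementation: `edit_pages_in_parallel` and `edit_one` are hypothetical names, and page numbers are again assumed to be 1-based.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def edit_pages_in_parallel(
    pages: List[bytes],
    prompts: Dict[int, str],
    edit_one: Callable[[bytes, str], bytes],
    max_workers: int = 4,
) -> List[bytes]:
    """Apply edit_one to each requested page concurrently.

    prompts maps a 1-based page number to its prompt. edit_one is a
    hypothetical stand-in for the full render/edit/convert step for one page.
    """
    for number in prompts:
        if not 1 <= number <= len(pages):
            raise IndexError(f"page {number} is out of range")

    result = list(pages)  # copy, so the input list is left unchanged
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every edit first so they run concurrently.
        futures = {
            number: pool.submit(edit_one, pages[number - 1], prompt)
            for number, prompt in prompts.items()
        }
        # Collect results; each page goes back into its original slot.
        for number, future in futures.items():
            result[number - 1] = future.result()
    return result
```

Writing each result back by page number means the completion order of the edits does not affect the page order of the output.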

It can work with any kind of PDF but I expect it would be most useful for a quick edit to a deck or something similar.

GitHub: https://github.com/gavrielc/Nano-PDF

tecoholic No.46090986
> Converts an image to a single-page PDF with a hidden text layer using Tesseract. This is the 'State Preservation' step.

Does this mean the text-only PDF page is transformed into an image that covers the full page, but the text is still under there? So any machine-based extraction would still get the text, but would probably lose all the bounding box information, and regular users can no longer select text with their mouse?

replies(1): >>46091899 #
kumarm No.46091899
Seems true. I really wish the project included some sample PDF output.

My text-to-speech app uses bounding boxes to display which text in the PDF is being read, and it would not work well with PDFs from this project.

replies(1): >>46095291 #
1. GavCo No.46095291
OP here. I added a sample PDF output to the project assets and put screenshots in the README. The text is selectable after rehydration. Would this work with your app?
replies(2): >>46095998 #>>46099381 #
2. tecoholic No.46095998
Wait! What? This is incredible. Amazing work.
3. kumarm No.46099381
Amazing. Worked really well. Thank you.