An amateur historian has discovered a long-lost short story by Bram Stoker

(www.bbc.com)

323 points lermontov | 3 comments | 21 Oct 24 16:14 UTC


mmastrac ◴[21 Oct 24 17:25 UTC] No.41906276[source]▶

>>41905664 (OP) #

I started a quick transcription here -- not enough time to complete more than half the first column, but some scans and very rough OCR are here if anyone is interested in contributing:

https://github.com/mmastrac/gibbet-hill

Top and bottom halves of the page in the repo here:

https://github.com/mmastrac/gibbet-hill/blob/main/scan-1.png https://github.com/mmastrac/gibbet-hill/blob/main/scan-2.png

EDIT: If you have access to a multi-modal LLM, the rough transcription + the column scan and the instruction to "OCR this text, keep linebreaks" gives a _very good_ result.

EDIT 2: Rough draft, needs some proofreading and corrections:

https://github.com/mmastrac/gibbet-hill/blob/main/story.md

replies(6): >>41906561 #>>41907098 #>>41907235 #>>41908097 #>>41908454 #>>41918290 #

1. simonw ◴[21 Oct 24 17:53 UTC] No.41906561[source]▶

>>41906276 #

I tried extracting the content with Google Gemini 1.5 Pro 002 via https://aistudio.google.com/ - the first page (scan-2) worked fantastically well, the second page not so much. Here's what I got so far: https://gist.github.com/simonw/ba87f507ef5c11d3335959c055533...

replies(1): >>41906687 #

2. mmastrac ◴[21 Oct 24 18:05 UTC] No.41906687[source]▶

>>41906561 (TP) #

I cropped the columns out into six files -- it might have an easier time with these:

https://github.com/mmastrac/gibbet-hill/blob/main/col-1-a.pn...
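
For anyone who wants to reproduce the cropping, here is a rough sketch of how a page could be split into six crop boxes. It assumes three equal-width columns, each cut into a top and bottom half; the real column boundaries in the scans are almost certainly not this regular, and the function name `column_boxes` and the page size in the example are hypothetical. The boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
def column_boxes(width, height, columns=3, halves=2):
    """Return crop boxes as (left, upper, right, lower), column by column.

    Assumption: columns are equal width and each is split into equal
    vertical parts. Real newspaper scans will need hand-tuned boundaries.
    """
    boxes = []
    for c in range(columns):
        # Integer division keeps adjacent boxes flush with no gaps.
        left = c * width // columns
        right = (c + 1) * width // columns
        for h in range(halves):
            upper = h * height // halves
            lower = (h + 1) * height // halves
            boxes.append((left, upper, right, lower))
    return boxes


if __name__ == "__main__":
    # Hypothetical page size, purely for illustration.
    for box in column_boxes(3000, 4000):
        print(box)
```

Smaller crops like these keep each image to a single column of text, which may be why they are easier for an OCR model to follow.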

replies(2): >>41907087 #>>41907203 #

3. reaperducer ◴[21 Oct 24 18:56 UTC] No.41907203[source]▶

>>41906687 #

…and my wife's Halloween present has been printed.

Tip: Load the pngs into Preview, hit "Auto Levels," and crank up "Sharpness" on each one. Looks pretty good!
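
If you are not on a Mac, a rough stand-in for the levels step is a linear contrast stretch on the grayscale values. This is only a sketch of the general idea, not what Preview actually does internally; the function name `auto_levels` is hypothetical, and it works on a flat list of 0-255 pixel values rather than an image file.

```python
def auto_levels(pixels, low=0, high=255):
    """Linearly stretch grayscale values so they span low..high.

    Assumption: a plain min/max stretch. Real "auto levels" tools often
    clip a small percentage of outliers first, which this does not do.
    """
    darkest, brightest = min(pixels), max(pixels)
    if darkest == brightest:
        # Flat image: nothing to stretch.
        return list(pixels)
    span = brightest - darkest
    # Integer arithmetic keeps results exact and inside low..high.
    return [low + (p - darkest) * (high - low) // span for p in pixels]


if __name__ == "__main__":
    # Faded scan values clustered in the mid-grays.
    print(auto_levels([50, 100, 150]))
```

On a faded scan, this pushes the paper toward white and the ink toward black, which is the effect the tip is after.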
