Launch HN: Midship (YC S24) – Turn PDFs, docs, and images into usable data

1. hubraumhugo ◴[06 Nov 24 18:47 UTC] No.42067242[source]▶

>>42066500 (OP) #

Congrats on the launch! A quick search in the YC startup directory brought up 5-10 companies doing pretty much the same thing:

- https://www.ycombinator.com/companies/tableflow

- https://www.ycombinator.com/companies/reducto

- https://www.ycombinator.com/companies/mindee

- https://www.ycombinator.com/companies/omniai

- https://www.ycombinator.com/companies/trellis

At the same time, accurate document extraction is becoming a commodity with powerful VLMs. Are you planning to focus on a specific industry, or how do you plan to differentiate?

replies(8): >>42067424 #>>42067521 #>>42067529 #>>42067560 #>>42067808 #>>42068776 #>>42071352 #>>42073283 #

2. maxmaio ◴[06 Nov 24 18:58 UTC] No.42067424[source]▶

>>42067242 (TP) #

Yes, there is definitely a boom in document-related startups. We see our niche as focusing on non-technical users. We have focused on making it easy to build schemas, providing an audit and review experience, and integrating into downstream applications.

3. ◴[06 Nov 24 19:03 UTC] No.42067521[source]▶

>>42067242 (TP) #

4. ◴[06 Nov 24 19:04 UTC] No.42067529[source]▶

>>42067242 (TP) #

5. tlofreso ◴[06 Nov 24 19:06 UTC] No.42067560[source]▶

>>42067242 (TP) #

"accurate document extraction is becoming a commodity with powerful VLMs"

Agree.

The capability is fairly trivial for orgs with decent technical talent. The tech / processes all look similar:

User uploads file --> Azure prebuilt-layout returns .MD --> prompt + .MD + schema sent to LLM --> JSON returned. Do whatever you want with it.
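A minimal sketch of that flow, for illustration only. It calls no real Azure or LLM SDK: `to_markdown` and `complete` are hypothetical callables standing in for the layout/OCR step and the model call, and the schema is assumed to be a JSON-Schema-like dict with a `required` list.

```python
import json
from typing import Callable


def build_prompt(markdown: str, schema: dict) -> str:
    """Combine the instructions, the schema, and the document Markdown."""
    return (
        "Extract the fields described by this JSON schema from the document.\n"
        "Respond with a single JSON object and nothing else.\n\n"
        f"Schema:\n{json.dumps(schema, indent=2)}\n\n"
        f"Document (Markdown):\n{markdown}"
    )


def extract(
    file_bytes: bytes,
    schema: dict,
    to_markdown: Callable[[bytes], str],  # hypothetical: wraps the layout/OCR service
    complete: Callable[[str], str],       # hypothetical: wraps the LLM call
) -> dict:
    """file -> Markdown -> prompt + Markdown + schema -> JSON."""
    markdown = to_markdown(file_bytes)
    raw = complete(build_prompt(markdown, schema))
    data = json.loads(raw)  # raises ValueError if the model did not return JSON
    # Cheap sanity check: every required field must be present.
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data
```

Injecting the two callables keeps the sketch independent of any vendor and makes it easy to test with fakes; a real version would also need retries and stricter schema validation.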

replies(2): >>42068181 #>>42069425 #

6. mitchpatin ◴[06 Nov 24 19:22 UTC] No.42067808[source]▶

>>42067242 (TP) #

TableFlow co-founder here - I don't want to distract from the Midship launch (congrats!) but did want to add my 2 cents.

We see a ton of industries/use-cases still bogged down by manual workflows that start with data extraction. These are often large companies throwing many people at the issue ($$). The vast majority of these companies lack technical teams required to leverage VLMs directly (or at least the desire to manage their own software). There’s a ton of room for tailored solutions here, and I don't think it's a winner-take-all space.

replies(1): >>42067996 #

7. maxmaio ◴[06 Nov 24 19:35 UTC] No.42067996[source]▶

>>42067808 #

+1 to what Mitch said. We believe there is a large market for non-technical users who can now automate extraction tasks but do not know how to interact with APIs. Midship is another option for them that requires 0 programming!

8. kietay ◴[06 Nov 24 19:45 UTC] No.42068181[source]▶

>>42067560 #

Totally agree that this is becoming the standard "reference architecture" for this kind of pipeline. The only thing that complicates this a lot today is complex inputs. For simple 1-2 page PDFs, what you describe works quite well out of the box, but for 100+ page docs it starts to fall over in ways I described in another comment.

replies(1): >>42068732 #

9. tlofreso ◴[06 Nov 24 20:21 UTC] No.42068732{3}[source]▶

>>42068181 #

Are really large inputs solved at Midship? If so, I'd consider that a differentiator (at least today). The demo's limited to 15 pages, and I don't see any marketing around long-context or complex inputs on the site.

I suspect this problem gets solved in the next iteration or two of commodity models. In the meantime, being smart about how the context gets divvied up works OK.
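One simple way to divide the context, sketched under assumptions: the document is already split into per-page text, chunks overlap by a page so values spanning a page break are not lost, and per-chunk results are merged with a "first non-null value wins" rule. The function names and the merge rule are illustrative, not anything Midship or a specific library is known to do.

```python
def chunk_pages(pages: list[str], size: int = 10, overlap: int = 1) -> list[list[str]]:
    """Split pages into windows of `size` pages sharing `overlap` pages."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(pages), step):
        chunks.append(pages[start:start + size])
        if start + size >= len(pages):
            break  # this window already reached the last page
    return chunks


def merge_results(results: list[dict]) -> dict:
    """Merge per-chunk extractions; the first non-null value for a field wins."""
    merged: dict = {}
    for result in results:
        for key, value in result.items():
            if value is not None and merged.get(key) is None:
                merged[key] = value
    return merged
```

"First non-null wins" is the crudest possible merge; fields that repeat across chunks (line items, for example) would need concatenation or de-duplication instead.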

I do like the UI you appear to have for citing information. Drawing the polygons around the data, and then where they appear in the PDF. Nice.

10. erulabs ◴[06 Nov 24 20:24 UTC] No.42068776[source]▶

>>42067242 (TP) #

Execution is everything. Not to drop a link in someone else’s HN launch, but I’m building https://therapy-forms.com and these guys are way ahead of me on UI, polish, and probably overall quality. I do think there are plenty of slightly different niches here, but even if there were not, execution is everything. Heck, it’s likely I’ll wind up as a Midship customer; my spare time to fiddle with OCR models is desperately limited and all I want to do is sell to clinics.

replies(1): >>42072611 #

11. Kiro ◴[06 Nov 24 21:09 UTC] No.42069425[source]▶

>>42067560 #

Why all those steps? Why not just file + prompt to JSON directly?

replies(1): >>42070089 #

12. tlofreso ◴[06 Nov 24 21:55 UTC] No.42070089{3}[source]▶

>>42069425 #

Having the text (for now) is still pretty important for quality output. The vision models are quite good, but not a replacement for a quality OCR step. A combination of Text + Vision is compelling too.
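A rough sketch of what a combined Text + Vision request might look like, assuming the OCR text and a rendered page image are both on hand. The payload shape below is hypothetical and does not match any particular vendor's API; it would need to be mapped onto whatever multimodal format the chosen model expects.

```python
import base64


def build_multimodal_request(ocr_text: str, page_png: bytes, instructions: str) -> dict:
    """Bundle OCR text and the page image so the model can cross-check both."""
    return {
        "instructions": instructions,
        "parts": [
            # The OCR text gives the model exact characters and digits.
            {"type": "text", "text": ocr_text},
            # The image preserves layout cues (tables, checkboxes, stamps).
            {
                "type": "image",
                "media_type": "image/png",
                "data_base64": base64.b64encode(page_png).decode("ascii"),
            },
        ],
    }
```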

13. hermitcrab ◴[06 Nov 24 23:41 UTC] No.42071352[source]▶

>>42067242 (TP) #

Do you know if there are any good (pref C++) libraries for extracting data tables from PDFs?

14. _hfqa ◴[07 Nov 24 02:18 UTC] No.42072611[source]▶

>>42068776 #

Just a heads up: I tried to sign up, but the button doesn't seem to work.

replies(1): >>42078221 #

15. themanmaran ◴[07 Nov 24 04:05 UTC] No.42073283[source]▶

>>42067242 (TP) #

Hey we're on that list! Congrats on the launch Max & team!

I could definitely point to minor differences between all the platforms, but you're right that everyone is tackling the same unstructured data problem.

In general, I think it will be a couple of years before anyone really butts heads in the market. The problem space is just that big. I'm constantly blown away by how big the document problem is at these mid-sized businesses. And most of these companies don't have any engineers on staff, so no attempt has ever been made to fix it.

16. erulabs ◴[07 Nov 24 16:39 UTC] No.42078221{3}[source]▶

>>42072611 #

See what I mean about execution?