
321 points laserduck | 1 comments
lolinder No.42157485
One of the consistent problems I'm seeing over and over again with LLMs is people forgetting that they're limited by the training data.

Software engineers get hyped when they see the progress in AI coding and immediately begin to extrapolate to other fields—if Copilot can reduce the burden of coding so much, think of all the money we can make selling a similar product to XYZ industries!

The problem with this extrapolation is that the software industry is pretty much unique in the amount of information about its inner workings that is publicly available for training on. We've spent the last 20+ years writing millions and millions of lines of code that we published on the internet, not to mention answering questions on Stack Overflow (which still has 3x as many answers as all other Stack Exchanges combined [0]), writing technical blogs, hundreds of thousands of emails in public mailing lists, and so on.

Nearly every other industry (with the possible exception of Law) produces publicly-visible output at a tiny fraction of the rate that we do. Ethics of the mass harvesting aside, it's simply not possible for an LLM to have the same skill level in ${insert industry here} as it does with software, so you can't extrapolate from Copilot to other domains.

[0] https://stackexchange.com/sites?view=list#answers

replies(4): >>42157535 #>>42157654 #>>42157924 #>>42164051 #
1. steveBK123 No.42157924
Yes this is EXACTLY it, and I was discussing this a bit at work (financial services).

In software, we've all self-taught, improved, and posted Q&A all over the web. Plus there's all the open source code out there. Just mountains and mountains of free training data.

However software is unique in being both well paying and something with freely available, complete information online.

A lot of the rest of the world remains far more closed, almost an apprenticeship system. In my domain, that means things like company fundamental analysis, algo/quant trading, etc. There are lots of books you can buy from the likes of Dalio, but no real (good) step-by-step research and investment process information online.

Likewise, I'd imagine heavily patented/regulated/IP industries like chip design, drug design, etc. are substantially as closed. Maybe companies using an LLM on their own data internally could make something of it, but it's also quite likely there is no 'data' so much as tacit knowledge handed down over time.