I have this idea that a tiny LM would be good at canonicalizing entered real estate addresses. We currently buy a data set and software from Experian, but it feels like something an LM might be very good at. There are lots of weirdnesses in address entry that regexes have a hard time with. We know the bulk of addresses a user might be entering, unless it's a totally new property, so we should be able to train it on that.
replies(1):