
685 points georgemandis | 2 comments
1. stogot ◴[] No.44379057[source]
Love this idea, but the accuracy section is lacking. Couldn't you do a simple diff of the outputs and see how many differences there are? Is it 0.5% or 5%?
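For what it's worth, a simple diff along these lines could be sketched in Python with the standard library's `difflib`. This is only a rough illustration: the function name `word_difference_rate` is made up here, the comparison is word-level and case-insensitive by assumption, and the number it returns is a similarity-based difference rate, not a formal word error rate.

```python
import difflib


def word_difference_rate(reference: str, candidate: str) -> float:
    """Return a rough fraction of words that differ between two transcripts.

    Hypothetical helper: 0.0 means the word sequences match exactly,
    1.0 means they share nothing in common.
    """
    # Normalize case and split on whitespace (an assumption; punctuation is kept).
    ref_words = reference.lower().split()
    cand_words = candidate.lower().split()
    if not ref_words and not cand_words:
        return 0.0
    # autojunk=False so long transcripts are not subject to the "popular
    # element" heuristic, which can skew results on repetitive text.
    matcher = difflib.SequenceMatcher(a=ref_words, b=cand_words, autojunk=False)
    # ratio() is 2 * matches / total words, so 1 - ratio() is the share that differs.
    return 1.0 - matcher.ratio()


if __name__ == "__main__":
    original = "the quick brown fox"
    sped_up = "the quick brown dog"
    print(f"{word_difference_rate(original, sped_up):.1%} of words differ")
```

Running this on the normal-speed and sped-up transcripts would give a single percentage to compare, though it says nothing about whether the differing words matter for the summary.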
replies(1): >>44379143 #
2. georgemandis ◴[] No.44379143[source]
Yeah, I'd like to do a more formal analysis of the outputs if I can carve out the time.

I don't think a simple diff is the way to go, at least for what I'm interested in. What I care about more is the overall accuracy of the summary—not the word-for-word transcription.

The test I want to set up would use LLMs to evaluate the summarized output and see whether the primary themes/topics persist. That's more interesting and useful to me for this exercise.
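A minimal sketch of how such a check might be structured, assuming the themes from the baseline summary have already been listed and that some judge function decides whether a theme appears in a summary. Everything here is hypothetical: `theme_persistence` and `keyword_judge` are illustrative names, and `keyword_judge` is only a stand-in for where an LLM call would go.

```python
from typing import Callable, Iterable


def theme_persistence(
    themes: Iterable[str],
    summary: str,
    judge: Callable[[str, str], bool],
) -> float:
    """Return the fraction of baseline themes that the judge finds in `summary`.

    `judge(theme, summary)` is a pluggable callable; in the setup described
    above it would wrap an LLM prompt, which is not shown here.
    """
    theme_list = list(themes)
    if not theme_list:
        return 1.0  # nothing to lose, so treat as fully preserved
    kept = sum(1 for theme in theme_list if judge(theme, summary))
    return kept / len(theme_list)


def keyword_judge(theme: str, summary: str) -> bool:
    """Placeholder judge: case-insensitive substring match, NOT an LLM."""
    return theme.lower() in summary.lower()


if __name__ == "__main__":
    baseline_themes = ["audio speed", "cost"]
    sped_up_summary = "Speeding up audio cuts cost."
    score = theme_persistence(baseline_themes, sped_up_summary, keyword_judge)
    print(f"{score:.0%} of themes persisted")
```

Keeping the judge as a parameter means the scoring logic can be tested with a cheap stand-in before wiring in a real model.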