Love this idea but the accuracy section is lacking. Couldnt you do a simple diff of the outputs and see how many differences there are? .5% or 5%?
replies(1):
I don't think a simple diff is the way to go, at least for what I'm interested in. What I care about more is the overall accuracy of the summary—not the word-for-word transcription.
The test I want to setup is using LLMs to evaluate the summarized output and see if the primary themes/topics persist. That's more interesting and useful to me for this exercise.