421 points sohkamyung | 1 comments
roguecoder No.45670387
I am curious whether LLM evangelists understand how off-putting it is when they knee-jerk rationalize how badly these tools perform. It makes it seem like the issue isn't technological capability: it's a religious belief that "competence" is too much to ask of either them or their software tools.
replies(7): >>45670776 #>>45670799 #>>45670830 #>>45671500 #>>45671741 #>>45672916 #>>45673109 #
1. lyu07282 No.45671500
I partially agree; it seems many have shifted the argument to criticism of the news media or something else. But this study is also questionable, as should be immediately obvious to anyone who reads actual academic studies. I don't understand why the bar is so low for a paid Ipsos study compared to a peer-reviewed paper in an IEEE journal.

For a study like this I'd expect, as a bare minimum: clearly stated model variants, R@k recall numbers measuring retrieval quality, and something like BLEU or ROUGE measuring summarization accuracy against a baseline, all on top of their human evaluation metrics. If this is useless for the field itself, I don't understand how it can be useful for anyone outside the field.
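For concreteness, the two metrics named above can be sketched in a few lines. This is a toy illustration with made-up data, not how a real evaluation pipeline would be built (real work would use standard tooling and proper tokenization):

```python
from collections import Counter

def recall_at_k(retrieved, relevant, k):
    """R@k: fraction of relevant documents appearing in the top-k results."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

def rouge_1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams covered by the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0

# Toy data: doc IDs and summaries are illustrative only.
retrieved = ["doc3", "doc1", "doc7", "doc2"]
relevant = ["doc1", "doc2", "doc5"]
print(recall_at_k(retrieved, relevant, k=3))  # only doc1 is in the top 3 -> 1/3

summary = "the cat sat on the mat"
reference = "the cat is on the mat"
print(rouge_1_recall(summary, reference))  # 5 of 6 reference unigrams matched
```

The point of reporting numbers like these alongside human ratings is that they are cheap, reproducible, and let readers compare against published baselines.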