
229 points | geetee | 1 comment
thicknavyrain No.45100075
I know it's a reductive take to point to a single mistake and act like the whole project might be a bit futile (maybe it's a rarity), but this example from their sample is really quite awful if the idea is to give AI better epistemics:

    {
        "causal_relation": {
            "cause": {
                "concept": "vaccines"
            },
            "effect": {
                "concept": "autism"
            }
        }
    },
... seriously? Then again, they do say these are just "causal beliefs" expressed on the internet, but it seems like stronger filtering of which beliefs to adopt ought to be exercised for any downstream use case.
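
To give a sense of what that filtering could look like, here's a rough sketch against the record shape above (the blocklist and helper are made up for illustration, not anything the project actually ships):

    # Hypothetical filter: drop extracted causal edges whose (cause, effect)
    # pair matches well-documented misinformation. The blocklist is
    # illustrative only.
    DEBUNKED_CLAIMS = {
        ("vaccines", "autism"),
    }

    def keep_relation(record: dict) -> bool:
        """Return False for causal relations known to be misinformation."""
        cause = record["causal_relation"]["cause"]["concept"]
        effect = record["causal_relation"]["effect"]["concept"]
        return (cause.lower(), effect.lower()) not in DEBUNKED_CLAIMS

    records = [
        {"causal_relation": {"cause": {"concept": "vaccines"},
                             "effect": {"concept": "autism"}}},
        {"causal_relation": {"cause": {"concept": "smoking"},
                             "effect": {"concept": "lung cancer"}}},
    ]
    filtered = [r for r in records if keep_relation(r)]
    print(len(filtered))  # 1

Obviously a static blocklist doesn't scale, but even something this crude would have caught the example above.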
replies(2): >>45100168, >>45100176
kolektiv No.45100168
Oh, ouch, yeah. We already know that misinformation tends to get amplified, the last thing we need is a starting point full of harmful misinformation. There are lots of "causal beliefs" on the internet that should have no place in any kind of general dataset.
replies(1): >>45100731
Amadiro No.45100731
It's even worse than that, because the way they extract the causal link is just a regex, so they get

"vaccines > autism"

from

"Even though the article was fraudulent and was retracted, 1 in 4 parents still believe vaccines can cause autism."

I think this could be done much better by using even a modestly powerful LLM for the causal extraction... The website claims "an estimated extraction precision of 83%", but I doubt that's an even remotely sensible estimate.
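
To illustrate the failure mode (a hypothetical pattern, not their actual regex):

    import re

    # A naive "<X> cause(s) <Y>" pattern of the kind described above. It has
    # no notion of negation or belief attribution, so it happily emits
    # "vaccines > autism" from a sentence reporting on a retracted claim.
    sentence = (
        "Even though the article was fraudulent and was retracted, "
        "1 in 4 parents still believe vaccines can cause autism."
    )
    pattern = re.compile(r"(\w+)\s+(?:can\s+)?causes?\s+(\w+)", re.IGNORECASE)

    match = pattern.search(sentence)
    if match:
        cause, effect = match.group(1), match.group(2)
        print(f"{cause} > {effect}")  # prints: vaccines > autism

Anything that only sees the surface pattern will keep making this class of mistake, which is exactly where an LLM-based pass would help.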