
248 points | doener | 1 comment
miros_love No.43691616
>European versions of ARC

But ARC is an image-like benchmark. Has anyone read the EU-ARC paper and can say what the difference is? Why can't they evaluate on the regular one?

I skimmed the paper and didn't find it right away, but judging by their tokenizer, they are training from scratch. In general, I don't like this approach for the task at hand. For widely spoken languages, there are already good models that they don't want to compare against. And for low-resource languages, it is very important to include more languages from the same language group, which are not necessarily EU languages.

1. whiplash451 No.43691644
You might be confusing ARC-AGI with EU-ARC, which is a language benchmark [1].

[1] https://arxiv.org/pdf/2410.08928