So if you give a human and a system the same 10 tasks, and the human completes 3 correctly, 5 incorrectly, and fails to complete 3 altogether… and then the software does 9 correctly and fails to complete 1, what does that mean? In general I'd say the tasks need to be defined, because right now I can give people plenty of tasks that language models can solve and they can't, but language models aren't "AGI" in my opinion.
Any cognitive task. Not "9 out of the 10 you were able to think of right now".
"Any" is very hard to benchmark, and it's also not how humans are tested.