Leaderboard

Want to evaluate on CRoW?

Please check the Getting Started page for instructions on how to make a submission to this leaderboard.


Global

Only submissions evaluated on at least the 5 core tasks (Dialogue, Summarization, Intent Detection, Stance Classification, and Safety Detection) are shown in the Global Leaderboard.
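The eligibility rule above can be sketched as a simple set check. This is only an illustration: the task labels and the function name `is_globally_listed` are hypothetical, and it assumes that "at least 5 core tasks" means all five named tasks must be covered. The leaderboard's actual filtering logic is not shown on this page.

```python
# Hypothetical task labels; the leaderboard's internal identifiers may differ.
CORE_TASKS = frozenset({
    "Dialogue",
    "Summarization",
    "Intent Detection",
    "Stance Classification",
    "Safety Detection",
})


def is_globally_listed(evaluated_tasks):
    """Return True if a submission covers every core task (assumed rule)."""
    return CORE_TASKS.issubset(evaluated_tasks)


# A submission that also ran Machine Translation still qualifies.
full = set(CORE_TASKS) | {"Machine Translation"}
# A submission missing Safety Detection does not.
partial = set(CORE_TASKS) - {"Safety Detection"}

print(is_globally_listed(full))     # True
print(is_globally_listed(partial))  # False
```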

CRoW Score [-MT] = average score across all tasks except Machine Translation

SA = Situational Accuracy
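As a rough illustration of the two aggregate scores, the sketch below computes an unweighted mean of per-task scores, with and without Machine Translation. Several things here are assumptions rather than the leaderboard's documented method: the function name `crow_score`, the task labels, the treatment of Machine Translation as a single entry, and the unweighted averaging. The numbers are made up for the example and are not real results.

```python
from statistics import mean

# Hypothetical label for the MT entry; assumed to be a single combined score.
MT_TASK = "Machine Translation"


def crow_score(task_scores, include_mt=True):
    """Unweighted mean of per-task scores, optionally leaving out MT."""
    selected = [
        score
        for task, score in task_scores.items()
        if include_mt or task != MT_TASK
    ]
    if not selected:
        raise ValueError("no task scores to average")
    return mean(selected)


# Made-up per-task scores (e.g. macro-F1), for illustration only.
example = {
    "Open-domain Dialogue": 60.0,
    "Dialogue Summarization": 50.0,
    "Intent Detection": 70.0,
    "Safety Detection": 40.0,
    "Stance Classification": 80.0,
    "Machine Translation": 30.0,
}

print(crow_score(example, include_mt=False))  # CRoW Score [-MT]: 60.0
print(crow_score(example))                    # CRoW Score: 55.0
```

The same function would apply to either metric column (macro-F1 or SA), since both aggregates are described as averages over tasks.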

Contributor | Model | Model Size | CRoW Score [-MT] (macro-F1) | CRoW Score [-MT] (SA) | CRoW Score (macro-F1) | CRoW Score (SA)


By Task

Open-domain Dialogue

Contributor | Model | Model Size | Date | Macro-F1 | SA


Dialogue Summarization

Contributor | Model | Model Size | Date | Macro-F1 | SA


Intent Detection

Contributor | Model | Model Size | Date | Macro-F1 | SA


Safety Detection

Contributor | Model | Model Size | Date | Macro-F1 | SA


Stance Classification

Contributor | Model | Model Size | Date | Macro-F1 | SA


Machine Translation

Only submissions evaluated on all MT tasks (zh-en, en-de, en-fr, en-ru) are shown in this leaderboard.

Contributor | Model | Model Size | Macro-F1 | SA


Machine Translation (zh-en)

Contributor | Model | Model Size | Date | Macro-F1 | SA


Machine Translation (en-de)

Contributor | Model | Model Size | Date | Macro-F1 | SA


Machine Translation (en-fr)

Contributor | Model | Model Size | Date | Macro-F1 | SA


Machine Translation (en-ru)

Contributor | Model | Model Size | Date | Macro-F1 | SA