The collection and release of this data is described in the following paper:
Constructing Parallel Corpora for Six Indian Languages via Crowdsourcing
Matt Post, Chris Callison-Burch, and Miles Osborne
WMT 2012
Below are the best translation scores (case-insensitive BLEU-4) that have been reported on the provided test sets. The Google results were recorded in fall 2011 and are described in Post et al. (2012). Google does not have a Malayalam system, so no Google score is given for ML.
Citation | Bengali (BN) | Hindi (HI) | Malayalam (ML) | Tamil (TA) | Telugu (TE) | Urdu (UR) |
---|---|---|---|---|---|---|
Google | 20.01 | 25.21 | – | 13.51 | 16.03 | 23.09 |
Post et al. (2012) | 13.53 | 17.29 | 13.72 | 9.81 | 12.46 | 19.53 |
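For readers unfamiliar with the metric, here is a minimal sketch of corpus-level, case-insensitive BLEU-4 in Python. It makes two simplifying assumptions: each segment has exactly one reference, and tokenization is plain whitespace splitting after lowercasing. The function names `ngrams` and `corpus_bleu4` are hypothetical. This is not necessarily the scorer behind the numbers above, and tokenization differences alone can shift scores, so it should not be expected to reproduce them exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Corpus-level, case-insensitive BLEU-4 (hypothetical helper).

    hypotheses, references: lists of strings, aligned segment by segment.
    Assumes one reference per segment and whitespace tokenization after
    lowercasing; real scorers apply their own tokenization.
    """
    max_n = 4
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h = hyp.lower().split()  # lowercasing makes the score case-insensitive
        r = ref.lower().split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts = ngrams(h, n)
            r_counts = ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions, uniform weights 1/4.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, `corpus_bleu4(["The Cat sat on the mat"], ["the cat sat on the mat"])` returns 100.0 because case is ignored. The n-gram counts are pooled over the whole test set before the geometric mean is taken, which is what makes the score corpus-level rather than an average of per-sentence scores.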