On AutoML Translate and BLEU
Vertex AI
Evaluating a new test set with AutoML Translate can only be done through the UI, not with the API. Refer to [here] for how BLEU works. Note that the calculation may differ from the open-source tool NLTK and its method nltk.translate.bleu_score.corpus_bleu, because normalization and tokenization may differ.
So, to avoid discrepancies between the open-source NLTK tool and the internal BLEU score calculation in AutoML, always use NLTK to evaluate.
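A quick way to see the tokenization effect described above (the sentences here are made up for illustration, not AutoML output): scoring the same string pair with naive whitespace tokens versus punctuation-split tokens gives different corpus_bleu values.

```python
import re

from nltk.translate.bleu_score import corpus_bleu

reference = "The quick brown fox jumps over the lazy dog."
hypothesis = "The quick brown fox jumps over the lazy dog ."  # space before "."

# Scheme 1: naive whitespace split -- the reference keeps "dog." as one
# token while the hypothesis has "dog" and ".", so n-gram matches are lost.
score_ws = corpus_bleu([[reference.split()]], [hypothesis.split()])

# Scheme 2: split punctuation into its own token before scoring, so both
# sides normalize to the identical token sequence.
def tokenize(s):
    return re.findall(r"\w+|[^\w\s]", s)

score_tok = corpus_bleu([[tokenize(reference)]], [tokenize(hypothesis)])

print(score_ws)   # below 1.0: differs only because of tokenization
print(score_tok)  # 1.0: identical token sequences
```

The translations are effectively identical, yet the whitespace scheme reports a noticeably lower BLEU, which is why the same tool and the same preprocessing should be used consistently.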
Code example
import nltk
hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']
references = [reference] # list of references for 1 sentence.
list_of_references = [references] # list of references for all sentences in corpus.
list_of_hypotheses = [hypothesis] # list of hypotheses that corresponds to list of references.
nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
# 0.6025286104785453
nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
# 0.6025286104785453
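One caveat with this example: a 3-token hypothesis has no 4-grams and no matching trigrams against the reference, and how NLTK treats those zero counts has varied across versions (recent versions emit a warning without smoothing), so the numbers above may not reproduce exactly. A hedged sketch using NLTK's SmoothingFunction (Chen & Cherry method1, which adds a small epsilon to zero numerators) makes the behavior explicit:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# method1 replaces zero n-gram precision numerators with a small epsilon,
# so short hypotheses get a well-defined score instead of a warning and a
# version-dependent result.
smoother = SmoothingFunction()
score = corpus_bleu([[reference]], [hypothesis],
                    smoothing_function=smoother.method1)
print(score)
```

When comparing against AutoML's reported BLEU, pin the NLTK version and state the smoothing method used, since both change the reported value.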