Sunday, August 26, 2007

Testing translations

A recent bug in Task Coach was caused by one of the translations being incorrect. So, I decided it was time to unittest the translations. For each translated string I wanted to check that certain conditions hold. For example, if the original string has a formatting operator (e.g. '%d' for integers), then the translated string should contain the same formatting operator the same number of times. These tests are relatively simple:

for formatter in '%s', '%d', '%.2f':
    self.assertEqual(self.englishString.count(formatter),
                     self.translatedString.count(formatter))

The challenge is how to create one unittest for each (language, string)-pair. This is not a good solution:

def testMatchingFormatting(self):
    for language in getLanguages():
        for english, translated in language.dictionary():
            ...

because this unittest stops as soon as one translation is incorrect.

My first thought was that I could use decorators to unfold the loop, but after a few feeble attempts I decided I am not smart enough to wrap my head around decorators. After some more experimenting I ended up with the code below. I put the loop outside the test class and explicitly create a new TestCase class for each (language, string)-pair. This generates a lot of unittests (over 7600 for the current version of Task Coach), but they run in less than 0.5 seconds, so that's a small price to pay for increased test coverage.


import test, i18n, meta, string

class TranslationIntegrityTests(object):
    ''' Unittests for translations. This class is
        subclassed below for each translated string
        in each language. '''

    def testMatchingFormatting(self):
        for formatter in '%s', '%d', '%.2f':
            self.assertEqual(self.englishString.count(formatter),
                             self.translatedString.count(formatter))

    def testMatchingAccelerators(self):
        # snipped
        pass


def getLanguages():
    return [language for language in
            meta.data.languages.values()
            if language is not None]


def createTestCaseClassName(language, englishString,
                            prefix='TranslationIntegrityTest'):
    ''' Generate a class name for the test case class based
        on the language and the English string. '''

    # Make sure we only use characters allowed in Python
    # identifiers:
    englishString = englishString.replace(' ', '_')
    allowableCharacters = string.ascii_letters + \
                          string.digits + '_'
    englishString = ''.join([char for char in englishString
                             if char in allowableCharacters])
    className = '%s_%s_%s'%(prefix, language, englishString)
    count = 0
    while className in globals(): # Make sure className is unique
        count += 1
        className = '%s_%s_%s_%d'%(prefix, language,
                                   englishString, count)
    return className


def createTestCaseClass(className, language, englishString,
                        translatedString):
    class_ = type(className,
                  (TranslationIntegrityTests, test.TestCase),
                  {})
    class_.language = language
    class_.englishString = englishString
    class_.translatedString = translatedString
    return class_


for language in getLanguages():
    translation = __import__('i18n.%s'%language,
                             fromlist=['dict'])
    for english, translated in translation.dict.iteritems():
        className = createTestCaseClassName(language, english)
        class_ = createTestCaseClass(className, language,
                                     english, translated)
        globals()[className] = class_
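
For readers who want to try the technique without the Task Coach modules, here is a minimal, self-contained sketch written for Python 3. The sample translations and the `buildTestCaseClasses` helper are made up for illustration; unlike the code above, the sketch collects the generated classes in a dict instead of writing them into `globals()`, and it uses plain `unittest.TestCase` in place of `test.TestCase`.

```python
import string
import unittest

# Hypothetical stand-in for the i18n modules: language -> {english: translated}.
SAMPLE_TRANSLATIONS = {
    'nl': {'%d tasks': '%d taken', 'Save %s': 'Bewaar'},
    'fr': {'%d tasks': '%d taches'},
}


class TranslationIntegrityTests(object):
    def testMatchingFormatting(self):
        for formatter in '%s', '%d', '%.2f':
            self.assertEqual(self.englishString.count(formatter),
                             self.translatedString.count(formatter))


def buildTestCaseClasses(translations):
    ''' Return a dict mapping class names to generated TestCase classes. '''
    allowed = string.ascii_letters + string.digits + '_'
    classes = {}
    for language, dictionary in sorted(translations.items()):
        for english, translated in sorted(dictionary.items()):
            # Keep only characters that are valid in a Python identifier.
            identifier = ''.join(char for char in english.replace(' ', '_')
                                 if char in allowed)
            baseName = 'TranslationIntegrityTest_%s_%s' % (language, identifier)
            className, count = baseName, 0
            while className in classes:  # Make sure className is unique
                count += 1
                className = '%s_%d' % (baseName, count)
            classes[className] = type(
                className,
                (TranslationIntegrityTests, unittest.TestCase),
                dict(language=language, englishString=english,
                     translatedString=translated))
    return classes


generated = buildTestCaseClasses(SAMPLE_TRANSLATIONS)
suite = unittest.TestSuite(
    unittest.defaultTestLoader.loadTestsFromTestCase(class_)
    for class_ in generated.values())
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.failures))
```

Each (language, string) pair becomes its own test, so the one broken sample translation shows up as a single failure while the other two tests still run and pass.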

4 comments:

Calvin Spealman said...

I think it would have been a lot simpler to create an empty TestCase, and then after it doing a loop over the language and formatter combinations and for each combination doing something like this:

setattr(TranslationFormattingTestCase,
        'test_%s_%s_%s' % (fromLang, toLang, formatter),
        lambda self, formatter=formatter: self.assertEqual(
            self.englishString.count(formatter),
            self.translatedString.count(formatter)))

Yes, using this technique you can even use the formatter directly in the method name, and unittest can pick it up just fine. A TestCase is more of a dictionary than a class in this fashion.

Frank said...

Hi Calvin,

Thanks for your comments. I agree that your solution is a bit simpler as you don't need to create new classes, but just add methods to an existing class. However, what I like about my solution is that the code that does the construction of the TestCase subclasses is separate from the test code. That way the tests are all in one place and the TestCase construction code could even be moved to a different module.

Cheers, Frank

Joey Weng Blog said...

Hi, I just want to let you know that I have created a Traditional Chinese translation for Task Coach on launchpad.net. Please check it at https://translations.launchpad.net/taskcoach/trunk/+pots/taskcoach/

Thanks.

Frank said...

Hi Joey, thanks for the new translation. I'll add it to the next release.

Thanks, Frank