2010-09-19

CLI tool to review PO files

If one thing is annoying about reviewing PO files, it is that it is practically impossible. When a PO file contains two hundred messages, how are you supposed to know which ones changed? That is how it currently works in Transifex, but there is good news: first, a review board is already available, which is a good step forward, and second, it is going to get a good push to make it awesome. Until that happens, I have written two scripts to make such a review possible.

A shell script msgdiff.sh

Pros: tools available on every system
Cons: ugly output, needs template file

#!/bin/sh
PO_ORIG=$1
PO_REVIEW=$2
PO_TEMPL=$3

MSGMERGE=msgmerge
DIFF=diff
PAGER=more
RM=/bin/rm
MKTEMP=mktemp

# Usage
if [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "Usage: $0 orig.po review.po template.pot"
    exit 1
fi

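# Merge both files against the template so they share the same layout
# (temporary files are created in the current directory)
TMP_ORIG=`$MKTEMP po-orig.XXXXXX`
TMP_REVIEW=`$MKTEMP po-review.XXXXXX`
$MSGMERGE "$PO_ORIG" "$PO_TEMPL" > "$TMP_ORIG" 2> /dev/null
$MSGMERGE "$PO_REVIEW" "$PO_TEMPL" > "$TMP_REVIEW" 2> /dev/null

# Diff
$DIFF -u "$TMP_ORIG" "$TMP_REVIEW" | $PAGER

# Clean up files
$RM "$TMP_ORIG" "$TMP_REVIEW"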

Example:
$ ./msgdiff.sh fr.po fr.review.po thunar.pot
[...]
 #: ../thunar-vcs-plugin/tvp-git-action.c:265
-#, fuzzy
 msgid "Menu|Bisect"
-msgstr "Différences détaillées"
+msgstr "Menu|Couper en deux"
 
 #: ../thunar-vcs-plugin/tvp-git-action.c:265
 msgid "Bisect"
-msgstr ""
+msgstr "Couper en deux"
[...]
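
As a rough sketch of what the diff step does, the same unified output can be produced with Python's standard difflib module. The helper name unified_po_diff and the sample strings are made up for illustration, and the inputs are assumed to have already been normalized with msgmerge, as in the script above.

```python
import difflib


def unified_po_diff(orig_text, review_text, context=3):
    """Return a unified diff between two PO file contents as one string."""
    diff = difflib.unified_diff(
        orig_text.splitlines(),
        review_text.splitlines(),
        fromfile="orig.po",
        tofile="review.po",
        n=context,
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    # Hypothetical sample contents, modeled on the example above
    orig = 'msgid "Bisect"\nmsgstr ""\n'
    review = 'msgid "Bisect"\nmsgstr "Couper en deux"\n'
    print(unified_po_diff(orig, review))
```

Without the msgmerge normalization, differences in line wrapping or entry order would show up as noise in the diff.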

A Python script podiff.py

Pros: programmable output
Cons: external dependency

The script depends on polib, which can be installed with the setuptools scripts. Make sure setuptools is installed and then run the command sudo easy_install polib.

#!/usr/bin/env python
import polib

def podiff(path_po_orig, path_po_review):
    po_orig = polib.pofile(path_po_orig)
    po_review = polib.pofile(path_po_review)
    po_diff = polib.POFile()
    po_diff.header = "PO Diff Header"
    for entry in po_review:
        if entry.obsolete:
            continue
        orig_entry = po_orig.find(entry.msgid)
        # Keep entries that are new in the review, or whose translation
        # or fuzzy flag changed
        if orig_entry is None \
        or orig_entry.msgstr != entry.msgstr \
        or ("fuzzy" in orig_entry.flags) != ("fuzzy" in entry.flags):
            po_diff.append(entry)
    return po_diff


if __name__ == "__main__":
    import sys
    import os.path

    # Usage
    if len(sys.argv) != 3 \
      or not os.path.isfile(sys.argv[1]) \
      or not os.path.isfile(sys.argv[2]):
        print("Usage: %s orig.po review.po" % sys.argv[0])
        sys.exit(1)

    # Retrieve diff
    path_po_orig = sys.argv[1]
    path_po_review = sys.argv[2]
    po_diff = podiff(path_po_orig, path_po_review)

    # Print out orig v. review messages
    po = polib.pofile(path_po_orig)
    for entry in po_diff:
        orig_entry = po.find(entry.msgid)
        if orig_entry is None:
            # Message does not exist in the original file
            orig_fuzzy, orig_msgstr = "missing", ""
        elif "fuzzy" in orig_entry.flags:
            orig_fuzzy, orig_msgstr = "fuzzy", orig_entry.msgstr
        else:
            orig_fuzzy, orig_msgstr = "not fuzzy", orig_entry.msgstr
        review_fuzzy = "fuzzy" if "fuzzy" in entry.flags else "not fuzzy"
        print("'%s' was %s is %s\n\tOriginal => '%s'\n\tReviewed => '%s'\n"
              % (entry.msgid, orig_fuzzy, review_fuzzy, orig_msgstr, entry.msgstr))

Example:
$ ./podiff.py fr.po fr.review.po
'Menu|Bisect' was fuzzy is not fuzzy
 Original => 'Différences détaillées'
 Reviewed => 'Menu|Couper en deux'

'Bisect' was not fuzzy is not fuzzy
 Original => ''
 Reviewed => 'Couper en deux'
[...]
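
Where installing polib is not an option, the comparison can be approximated with the standard library alone. This is a minimal sketch, not a replacement for polib: parse_po and changed_entries are hypothetical helper names, and the parser deliberately ignores plural forms, msgctxt, obsolete entries and escape sequences.

```python
import re


def _unquote(token):
    """Strip the surrounding double quotes from a PO string token."""
    token = token.strip()
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    return token


def parse_po(text):
    """Map each msgid to a (msgstr, is_fuzzy) tuple (simplified parser)."""
    entries = {}
    # Entries in a PO file are separated by blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        parts = {"msgid": None, "msgstr": None}
        fuzzy = False
        current = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                # Flag line, e.g. "#, fuzzy, c-format"
                flags = [flag.strip() for flag in line[2:].split(",")]
                fuzzy = fuzzy or "fuzzy" in flags
            elif line.startswith("#"):
                current = None  # other comments, including obsolete "#~" lines
            elif line.startswith("msgid ") or line.startswith("msgstr "):
                current, rest = line.split(" ", 1)
                parts[current] = _unquote(rest)
            elif line.startswith('"') and current is not None:
                # Continuation line of a multi-line string
                parts[current] += _unquote(line)
            else:
                current = None  # unsupported keyword (msgctxt, plurals, ...)
        if parts["msgid"] is not None and parts["msgstr"] is not None:
            entries[parts["msgid"]] = (parts["msgstr"], fuzzy)
    return entries


def changed_entries(orig_text, review_text):
    """Yield (msgid, original, reviewed) for each entry that differs."""
    orig = parse_po(orig_text)
    review = parse_po(review_text)
    for msgid, reviewed in review.items():
        if msgid == "":
            continue  # skip the header entry
        # A message absent from the original is treated as ("", False)
        original = orig.get(msgid, ("", False))
        if original != reviewed:
            yield msgid, original, reviewed


if __name__ == "__main__":
    # Hypothetical sample contents, modeled on the example above
    orig = '#, fuzzy\nmsgid "Menu|Bisect"\nmsgstr "Old"\n\nmsgid "Bisect"\nmsgstr ""\n'
    review = ('msgid "Menu|Bisect"\nmsgstr "Menu|Couper en deux"\n\n'
              'msgid "Bisect"\nmsgstr "Couper en deux"\n')
    for msgid, original, reviewed in changed_entries(orig, review):
        print("%r: %r => %r" % (msgid, original, reviewed))
```

Each yielded tuple pairs the translation with its fuzzy state, so a changed flag is reported even when the msgstr itself is identical.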

4 comments:

  1. Per, Danish translator, pasted the following link: https://edge.launchpad.net/pyg3t.

    PyG3T consists of these parts:

    - gtgrep: perform grep-like operations on po-files
    - podiff: generate diffs of po-files, such that each differing entry is printed completely
    - poabc: check for common translation errors, such as missing capitalization or punctuation
    - gtxml: check xml in translations

  2. What about this tool:
    http://gramps.svn.sourceforge.net/viewvc/gramps/trunk/po/check_po
    ?

    I've found it useful while I was still doing translations.

  3. I am currently translating po files with https://poeditor.com and to me it seems it is the best on the market. It has many helpful features that keep on adding up and a nice UI. I recommend it sincerely.

  4. A very informative tutorial Mike. Thanks for posting. We make some Python tutorials as well that may benefit your readers at http://www.fireboxtraining.com/
