7 Manipulating PO Files

Sometimes it is necessary to manipulate PO files in a way that is better performed automatically than by hand. GNU gettext includes a complete set of tools for this purpose.

When merging two packages into a single package, the resulting POT file will be the concatenation of the two packages' POT files. Thus the maintainer must concatenate the two existing package translations into a single translation catalog, for each language. This is best performed using `msgcat´. It is then the translators' duty to deal with any possible conflicts that arose during the merge.

When a translator takes over the translation job from another translator, but uses a different character encoding in her locale, she needs to convert the catalog to her character encoding. This is best done through the `msgconv´ program.

When a maintainer takes a source file with tagged messages from another package, he should also take the existing translations for this source file (and not let the translators do the same job twice). One way to do this is through `msggrep´, another is to create a POT file for that source file and use `msgmerge´.

When a translator wants to adjust some translation catalog for a special dialect or orthography -- for example, German as written in Switzerland versus German as written in Germany -- she needs to apply some text processing to every message in the catalog. The tool for doing this is `msgfilter´.

Another use of msgfilter is to produce an approximation of the POT file for which a given PO file was made. This can be done through a filter command like `msgfilter sed -e d | sed -e '/^# /d'´. Note that the original POT file may have had different comments and different plural message counts; for that reason it is better to use the original POT file if it is available.

When a translator wants to check her translations, for example according to orthography rules or using a non-interactive spell checker, she can do so using the `msgexec´ program.

When third party tools create PO or POT files, sometimes duplicates cannot be avoided. But the GNU gettext tools give an error when they encounter duplicate msgids in the same file and in the same domain. To merge duplicates, the `msguniq´ program can be used.

`msgcomm´ is a more general tool for keeping or throwing away duplicates that occur in different files.

`msgcmp´ can be used to check whether a translation catalog is completely translated.

`msgattrib´ can be used to select and extract only the fuzzy or untranslated messages of a translation catalog.

`msgen´ is useful as a first step for preparing English translation catalogs. It copies each message's msgid to its msgstr.

7.1 Invoking the msgcat Program

msgcat [option] [inputfile]...

The msgcat program concatenates and merges the specified PO files. It finds messages which are common to two or more of the specified PO files. By using the --more-than option, greater commonality may be requested before messages are printed. Conversely, the --less-than option may be used to specify less commonality before messages are printed (i.e. `--less-than=2´ will only print the unique messages). Translations, comments and extracted comments will be cumulated, except that if --use-first is specified, they will be taken from the first PO file to define them. File positions from all PO files will be cumulated.

7.1.1 Input file location

`inputfile ...´
Input files.
`-f file´
`--files-from=file´
Read the names of the input files from file instead of getting them from the command line.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If inputfile is `-´, standard input is read.

7.1.2 Output file location

`-o file´
`--output-file=file´
Write output to specified file.

The results are written to standard output if no output file is specified or if it is `-´.

7.1.3 Message selection

`-< number´
`--less-than=number´
Print messages with less than number definitions, defaults to infinite if not set.
`-> number´
`--more-than=number´
Print messages with more than number definitions, defaults to 0 if not set.
`-u´
`--unique´
Shorthand for `--less-than=2´. Requests that only unique messages be printed.

7.1.4 Output details

`-t name´
`--to-code=name´
Specify encoding for output.
`--use-first´
Use first available translation for each message. Don't merge several translations into one.
`--force-po´
Always write an output file even if it contains no message.
`-i´
`--indent´
Write the .po file using indented style.
`--no-location´
Do not write `#: filename:line´ lines.
`-n´
`--add-location´
Generate `#: filename:line´ lines (default).
`--strict´
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
`-w number´
`--width=number´
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
`--no-wrap´
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
`-s´
`--sort-output´
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.
`-F´
`--sort-by-file´
Sort output by file location.

7.1.5 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.2 Invoking the msgconv Program

msgconv [option] [inputfile]

The msgconv program converts a translation catalog to a different character encoding.

7.2.1 Input file location

`inputfile´
Input PO file.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If no inputfile is given or if it is `-´, standard input is read.

7.2.2 Output file location

`-o file´
`--output-file=file´
Write output to specified file.

The results are written to standard output if no output file is specified or if it is `-´.

7.2.3 Conversion target

`-t name´
`--to-code=name´
Specify encoding for output.

The default encoding is the current locale's encoding.

7.2.4 Output details

`--force-po´
Always write an output file even if it contains no message.
`-i´
`--indent´
Write the .po file using indented style.
`--no-location´
Do not write `#: filename:line´ lines.
`--add-location´
Generate `#: filename:line´ lines (default).
`--strict´
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
`-w number´
`--width=number´
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
`--no-wrap´
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
`-s´
`--sort-output´
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.
`-F´
`--sort-by-file´
Sort output by file location.

7.2.5 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.3 Invoking the msggrep Program

msggrep [option] [inputfile]

The msggrep program extracts all messages of a translation catalog that match a given pattern or belong to some given source files.

7.3.1 Input file location

`inputfile´
Input PO file.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If no inputfile is given or if it is `-´, standard input is read.

7.3.2 Output file location

`-o file´
`--output-file=file´
Write output to specified file.

The results are written to standard output if no output file is specified or if it is `-´.

7.3.3 Message selection

  [-N sourcefile]... [-M domainname]...
  [-K msgid-pattern] [-T msgstr-pattern] [-C comment-pattern]

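A message is selected if it comes from one of the specified source files, or if it comes from one of the specified domains, or if -K is given and its msgid matches msgid-pattern, or if -T is given and its msgstr matches msgstr-pattern, or if -C is given and its comment matches comment-pattern.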

When more than one selection criterion is specified, the set of selected messages is the union of the selected messages of each criterion.

msgid-pattern or msgstr-pattern syntax:

  [-E | -F] [-e pattern | -f file]...

Patterns are basic regular expressions by default, or extended regular expressions if -E is given, or fixed strings if -F is given.

`-N sourcefile´
`--location=sourcefile´
Select messages extracted from sourcefile. sourcefile can be either a literal file name or a wildcard pattern.
`-M domainname´
`--domain=domainname´
Select messages belonging to domain domainname.
`-K´
`--msgid´
Start of patterns for the msgid.
`-T´
`--msgstr´
Start of patterns for the msgstr.
`-E´
`--extended-regexp´
Specify that pattern is an extended regular expression.
`-F´
`--fixed-strings´
Specify that pattern is a set of newline-separated strings.
`-e pattern´
`--regexp=pattern´
Use pattern as a regular expression.
`-f file´
`--file=file´
Obtain pattern from file.
`-i´
`--ignore-case´
Ignore case distinctions.

7.3.4 Output details

`--force-po´
Always write an output file even if it contains no message.
`--indent´
Write the .po file using indented style.
`--no-location´
Do not write `#: filename:line´ lines.
`--add-location´
Generate `#: filename:line´ lines (default).
`--strict´
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
`-w number´
`--width=number´
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
`--no-wrap´
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
`--sort-output´
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.
`--sort-by-file´
Sort output by file location.

7.3.5 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.4 Invoking the msgfilter Program

msgfilter [option] filter [filter-option]

The msgfilter program applies a filter to all translations of a translation catalog.

7.4.1 Input file location

`-i inputfile´
`--input=inputfile´
Input PO file.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If no inputfile is given or if it is `-´, standard input is read.

7.4.2 Output file location

`-o file´
`--output-file=file´
Write output to specified file.

The results are written to standard output if no output file is specified or if it is `-´.

7.4.3 The filter

The filter can be any program that reads a translation from standard input and writes a modified translation to standard output. A frequently used filter is `sed´.

Note: It is your responsibility to ensure that the filter can cope with input encoded in the translation catalog's encoding. If the filter wants input in a particular encoding, you can first convert the translation catalog to that encoding using the `msgconv´ program, before invoking `msgfilter´. If the filter wants input in the locale's encoding, but you want to avoid the locale's encoding, then you can first convert the translation catalog to UTF-8 using the `msgconv´ program and then make `msgfilter´ work in a UTF-8 locale, by using the LC_ALL environment variable.

Note: Most translations in a translation catalog don't end with a newline character. For this reason, it is important that the filter recognizes its last input line even if it ends without a newline, and that it doesn't add an undesired trailing newline at the end. The `sed´ program on some platforms is known to ignore the last line of input if it is not terminated with a newline. You can use GNU sed instead; it does not have this limitation.

7.4.4 Useful filter-options when the filter is `sed´

`-e script´
`--expression=script´
Add script to the commands to be executed.
`-f scriptfile´
`--file=scriptfile´
Add the contents of scriptfile to the commands to be executed.
`-n´
`--quiet´
`--silent´
Suppress automatic printing of pattern space.

7.4.5 Output details

`--force-po´
Always write an output file even if it contains no message.
`--indent´
Write the .po file using indented style.
`--keep-header´
Keep the header entry, i.e. the message with `msgid ""´, unmodified, instead of filtering it. By default, the header entry is subject to filtering like any other message.
`--no-location´
Do not write `#: filename:line´ lines.
`--add-location´
Generate `#: filename:line´ lines (default).
`--strict´
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
`-w number´
`--width=number´
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
`--no-wrap´
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
`-s´
`--sort-output´
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.
`-F´
`--sort-by-file´
Sort output by file location.

7.4.6 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.5 Invoking the msguniq Program

msguniq [option] [inputfile]

The msguniq program unifies duplicate translations in a translation catalog. It finds duplicate translations of the same message ID. Such duplicates are invalid input for other programs like msgfmt, msgmerge or msgcat. By default, duplicates are merged together. When using the `--repeated´ option, only duplicates are output, and all other messages are discarded. When using the `--unique´ option, duplicates are discarded. Comments and extracted comments will be cumulated, except that if `--use-first´ is specified, they will be taken from the first translation. File positions will be cumulated.

7.5.1 Input file location

`inputfile´
Input PO file.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If no inputfile is given or if it is `-´, standard input is read.

7.5.2 Output file location

`-o file´
`--output-file=file´
Write output to specified file.

The results are written to standard output if no output file is specified or if it is `-´.

7.5.3 Message selection

`-d´
`--repeated´
Print only duplicates.
`-u´
`--unique´
Print only unique messages, discard duplicates.

7.5.4 Output details

`-t name´
`--to-code=name´
Specify encoding for output.
`--use-first´
Use first available translation for each message. Don't merge several translations into one.
`--force-po´
Always write an output file even if it contains no message.
`-i´
`--indent´
Write the .po file using indented style.
`--no-location´
Do not write `#: filename:line´ lines.
`-n´
`--add-location´
Generate `#: filename:line´ lines (default).
`--strict´
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
`-w number´
`--width=number´
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
`--no-wrap´
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
`-s´
`--sort-output´
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.
`-F´
`--sort-by-file´
Sort output by file location.

7.5.5 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.6 Invoking the msgcomm Program

msgcomm [option] [inputfile]...

The msgcomm program finds messages which are common to two or more of the specified PO files. By using the --more-than option, greater commonality may be requested before messages are printed. Conversely, the --less-than option may be used to specify less commonality before messages are printed (i.e. `--less-than=2´ will only print the unique messages). Translations, comments and extracted comments will be preserved, but only from the first PO file to define them. File positions from all PO files will be cumulated.

7.6.1 Input file location

`inputfile ...´
Input files.
`-f file´
`--files-from=file´
Read the names of the input files from file instead of getting them from the command line.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If inputfile is `-´, standard input is read.

7.6.2 Output file location

`-o file´
`--output-file=file´
Write output to specified file.

The results are written to standard output if no output file is specified or if it is `-´.

7.6.3 Message selection

`-< number´
`--less-than=number´
Print messages with less than number definitions, defaults to infinite if not set.
`-> number´
`--more-than=number´
Print messages with more than number definitions, defaults to 1 if not set.
`-u´
`--unique´
Shorthand for `--less-than=2´. Requests that only unique messages be printed.

7.6.4 Output details

`--force-po´
Always write an output file even if it contains no message.
`-i´
`--indent´
Write the .po file using indented style.
`--no-location´
Do not write `#: filename:line´ lines.
`-n´
`--add-location´
Generate `#: filename:line´ lines (default).
`--strict´
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
`-w number´
`--width=number´
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
`--no-wrap´
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
`-s´
`--sort-output´
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.
`-F´
`--sort-by-file´
Sort output by file location.
`--omit-header´
Don't write header with `msgid ""´ entry.

7.6.5 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.7 Invoking the msgcmp Program

msgcmp [option] def.po ref.pot

The msgcmp program compares two Uniforum style .po files to check that both contain the same set of msgid strings. The def.po file is an existing PO file with the translations. The ref.pot file is the last created PO file, or a PO Template file (generally created by xgettext). This is useful for checking that you have translated each and every message in your program. Where an exact match cannot be found, fuzzy matching is used to produce better diagnostics.
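A completeness check along these lines could be scripted as shown below. The template and catalog are hypothetical; the sketch relies only on msgcmp exiting with a non-zero status when it finds a message that is used but not defined.

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical template with two messages.
cat > "$tmp/ref.pot" <<'EOF'
#: main.c:3
msgid "Hello"
msgstr ""

#: main.c:9
msgid "Goodbye"
msgstr ""
EOF

# Hypothetical French catalog that lacks "Goodbye".
cat > "$tmp/fr.po" <<'EOF'
#: main.c:3
msgid "Hello"
msgstr "Bonjour"
EOF

# Diagnostics go to standard error; they are collected in report.txt.
if msgcmp "$tmp/fr.po" "$tmp/ref.pot" 2> "$tmp/report.txt"; then
  status=complete
else
  status=incomplete
fi

echo "fr.po is $status"
cat "$tmp/report.txt"
```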

7.7.1 Input file location

`def.po´
Translations.
`ref.pot´
References to the sources.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories.

7.7.2 Operation modifiers

`-m´
`--multi-domain´
Apply ref.pot to each of the domains in def.po.

7.7.3 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.8 Invoking the msgattrib Program

msgattrib [option] [inputfile]

The msgattrib program filters the messages of a translation catalog according to their attributes, and manipulates the attributes.

7.8.1 Input file location

`inputfile´
Input PO file.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If no inputfile is given or if it is `-´, standard input is read.

7.8.2 Output file location

`-o file´
`--output-file=file´
Write output to specified file.

The results are written to standard output if no output file is specified or if it is `-´.

7.8.3 Message selection

`--translated´
Keep translated messages, remove untranslated messages.
`--untranslated´
Keep untranslated messages, remove translated messages.
`--no-fuzzy´
Remove `fuzzy´ marked messages.
`--only-fuzzy´
Keep `fuzzy´ marked messages, remove all other messages.
`--no-obsolete´
Remove obsolete #~ messages.
`--only-obsolete´
Keep obsolete #~ messages, remove all other messages.

7.8.4 Attribute manipulation

Attributes are modified after the message selection/removal has been performed.

`--set-fuzzy´
Set all messages `fuzzy´.
`--clear-fuzzy´
Set all messages non-`fuzzy´.
`--set-obsolete´
Set all messages obsolete.
`--clear-obsolete´
Set all messages non-obsolete.
`--fuzzy´
Synonym for `--only-fuzzy --clear-fuzzy´: It keeps only the fuzzy messages and removes their `fuzzy´ mark.
`--obsolete´
Synonym for `--only-obsolete --clear-obsolete´: It keeps only the obsolete messages and makes them non-obsolete.

7.8.5 Output details

`--force-po´
Always write an output file even if it contains no message.
`-i´
`--indent´
Write the .po file using indented style.
`--no-location´
Do not write `#: filename:line´ lines.
`-n´
`--add-location´
Generate `#: filename:line´ lines (default).
`--strict´
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
`-w number´
`--width=number´
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
`--no-wrap´
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
`-s´
`--sort-output´
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.
`-F´
`--sort-by-file´
Sort output by file location.

7.8.6 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.9 Invoking the msgen Program

msgen [option] inputfile

The msgen program creates an English translation catalog. The input file is the last created English PO file, or a PO Template file (generally created by xgettext). Untranslated entries are assigned a translation that is identical to the msgid, and are marked fuzzy.

Note: `msginit --no-translator --locale=en´ performs a very similar task. The main difference is that msginit cares specially about the header entry, whereas msgen doesn't.

7.9.1 Input file location

`inputfile´
Input PO or POT file.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If inputfile is `-´, standard input is read.

7.9.2 Output file location

`-o file´
`--output-file=file´
Write output to specified file.

The results are written to standard output if no output file is specified or if it is `-´.

7.9.3 Output details

`--force-po´
Always write an output file even if it contains no message.
`-i´
`--indent´
Write the .po file using indented style.
`--no-location´
Do not write `#: filename:line´ lines.
`--add-location´
Generate `#: filename:line´ lines (default).
`--strict´
Write out a strict Uniforum conforming PO file. Note that this Uniforum format should be avoided because it doesn't support the GNU extensions.
`-w number´
`--width=number´
Set the output page width. Long strings in the output files will be split across multiple lines in order to ensure that each line's width (= number of screen columns) is less than or equal to the given number.
`--no-wrap´
Do not break long message lines. Message lines whose width exceeds the output page width will not be split into several lines. Only file reference lines which are wider than the output page width will be split.
`-s´
`--sort-output´
Generate sorted output. Note that using this option makes it much harder for the translator to understand each message's context.
`-F´
`--sort-by-file´
Sort output by file location.

7.9.4 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

7.10 Invoking the msgexec Program

msgexec [option] command [command-option]

The msgexec program applies a command to all translations of a translation catalog. The command can be any program that reads a translation from standard input. It is invoked once for each translation. Its output becomes msgexec's output. msgexec's return code is the maximum return code across all invocations.

A special builtin command called `0´ outputs the translation, followed by a null byte. The output of `msgexec 0´ is suitable as input for `xargs -0´.

During each command invocation, the environment variable MSGEXEC_MSGID is bound to the message's msgid, and the environment variable MSGEXEC_LOCATION is bound to the location in the PO file of the message.

Note: It is your responsibility to ensure that the command can cope with input encoded in the translation catalog's encoding. If the command wants input in a particular encoding, you can first convert the translation catalog to that encoding using the `msgconv´ program, before invoking `msgexec´. If the command wants input in the locale's encoding, but you want to avoid the locale's encoding, then you can first convert the translation catalog to UTF-8 using the `msgconv´ program and then make `msgexec´ work in a UTF-8 locale, by using the LC_ALL environment variable.

7.10.1 Input file location

`-i inputfile´
`--input=inputfile´
Input PO file.
`-D directory´
`--directory=directory´
Add directory to the list of directories. Source files are searched relative to this list of directories. The resulting `.po´ file will be written relative to the current directory, though.

If no inputfile is given or if it is `-´, standard input is read.

7.10.2 Informative output

`-h´
`--help´
Display this help and exit.
`-V´
`--version´
Output version information and exit.

