While the presentation of gettext focuses mostly on C and implicitly
applies to C++ as well, its scope is far broader than that: Many
programming languages, scripting languages and other textual data like
GUI resources or package descriptions can make use of the gettext
approach.
All programming and scripting languages that have the notion of strings
are eligible to support gettext. Supporting gettext means the following:
You should add to the language a syntax for translatable strings. In
principle, a function call of gettext would do, but a shorthand syntax
helps keep internationalized programs legible. For example, in C we use
the syntax _("string"), in bash we use the syntax $"string", and in GNU
awk we use the shorthand _"string".
You should arrange that evaluation of such a translatable string at
runtime calls the gettext function, or performs equivalent processing.
Similarly, you should make the functions ngettext, dcgettext, dcngettext
available from within the language. These functions are less often used,
but are nevertheless necessary for particular purposes: ngettext for
correct plural handling, and dcgettext and dcngettext for obeying locale
environment variables other than LC_MESSAGES, such as LC_TIME or
LC_MONETARY. For these latter functions, you need to make the LC_*
constants, available in the C header <locale.h>, referenceable from
within the language, usually either as enumeration values or as strings.
You should allow the programmer to designate a message domain, either by
making the textdomain function available from within the language, or by
introducing a magic variable called TEXTDOMAIN. Similarly, you should
allow the programmer to designate where to search for message catalogs,
by providing access to the bindtextdomain function.
You should either perform a setlocale (LC_ALL, "") call during the
startup of your language runtime, or allow the programmer to do so.
Remember that gettext will act as a no-op if the LC_MESSAGES and
LC_CTYPE locale facets are not both set.
A programmer should have a way to extract translatable strings from a
program into a PO file. The GNU xgettext program is being extended to
support very different programming languages. Please contact the GNU
gettext maintainers to help them do this. If the string extractor is
best integrated into your language's parser, GNU xgettext can function
as a front end to your string extractor.
If the language has more than one implementation, and not all of the
implementations use gettext, but the programs should be portable across
implementations, you should provide a no-i18n emulation that makes the
other implementations accept programs written for yours, without
actually translating the strings.
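To help the programmer in the task of marking translatable strings,
which is usually performed using the Emacs PO mode, you are welcome to
contact the GNU gettext maintainers, so they can add support for your
language to `po-mode.el´.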
On the implementation side, three approaches are possible, with different effects on portability and copyright:
You may integrate the GNU gettext's `intl/´ directory in your package,
as described in section 12 The Maintainer's View. This allows you to
have internationalization on all kinds of platforms. Note that when you
then distribute your package, it legally falls under the GNU General
Public License, and the GNU project will be glad about your contribution
to the Free Software pool.
You may link against GNU gettext functions if they are found in the C
library. For example, an autoconf test for gettext() and ngettext() will
detect this situation. For the moment, this test will succeed on GNU
systems and not on other platforms. No severe copyright restrictions
apply.
You may emulate or reimplement the GNU gettext functionality. This has
the advantage of full portability and no copyright restrictions, but
also the drawback that you have to reimplement the GNU gettext features
(such as the LANGUAGE environment variable, the locale aliases database,
the automatic charset conversion, and plural handling).
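To give a sense of the work the third approach involves, here is a deliberately simplified sketch of just one of those features, the LANGUAGE priority list. The function name language_priority is hypothetical, and the sketch ignores codesets, modifiers and the locale aliases database, all of which a real reimplementation would have to handle.

```python
def language_priority(language_value):
    """Expand a LANGUAGE-style value into an ordered list of catalog names.

    Simplified sketch: it splits the colon-separated list and, for entries
    such as "de_DE", also tries the bare language "de". It ignores codesets,
    modifiers and the locale aliases database.
    """
    result = []
    for entry in language_value.split(":"):
        entry = entry.strip()
        if not entry:
            continue
        candidates = [entry]
        if "_" in entry:
            # Fall back from language_TERRITORY to the bare language code.
            candidates.append(entry.split("_", 1)[0])
        for name in candidates:
            if name not in result:
                result.append(name)
    return result


print(language_priority("sv:de_DE"))
```

A caller would then look for a catalog under each returned name in order and use the first one that exists.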
For the programmer, the general procedure is the same as for the C
language. The Emacs PO mode supports other languages, and the GNU
xgettext string extractor recognizes other languages based on the file
extension or a command-line option. In some languages, setlocale is not
needed because it is already performed by the underlying language
runtime.
The translator works exactly as in the C language case. The only difference is that when translating format strings, she has to be aware of the language's particular syntax for positional arguments in format strings.
C format strings are described in POSIX (IEEE P1003.1 2001), section XSH 3 fprintf(), http://www.opengroup.org/onlinepubs/007904975/functions/fprintf.html. See also the fprintf(3) manual page, http://www.linuxvalley.it/encyclopedia/ldp/manpage/man3/printf.3.php, http://informatik.fh-wuerzburg.de/student/i510/man/printf.html.
Python format strings are described in Python Library reference / 2. Built-in Types, Exceptions and Functions / 2.2. Built-in Types / 2.2.6. Sequence Types / 2.2.6.2. String Formatting Operations. http://www.python.org/doc/2.2.1/lib/typesseq-strings.html.
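As a small illustration of why the argument syntax matters to the translator, the sketch below uses Python's named directives. The message text, the sample translation and the variable names are invented for this example.

```python
# The msgid as the programmer wrote it, using named arguments.
msgid = "%(count)d files in %(directory)s"

# A hypothetical translation that mentions the arguments in the opposite
# order; named directives make this reordering safe.
msgstr = "In %(directory)s: %(count)d files"

values = {"count": 3, "directory": "/tmp"}

print(msgid % values)
print(msgstr % values)
```

Because each directive names its argument, the translator can move the directives freely as long as the names and conversion types are kept intact.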
Lisp format strings are described in the Common Lisp HyperSpec, chapter 22.3 Formatted Output, http://www.lisp.org/HyperSpec/Body/sec_22-3.html.
Emacs Lisp format strings are documented in the Emacs Lisp reference, section Formatting Strings, http://www.gnu.org/manual/elisp-manual-21-2.8/html_chapter/elisp_4.html#SEC75. Note that as of version 21, XEmacs supports numbered argument specifications in format strings while FSF Emacs doesn't.
librep format strings are documented in the librep manual, section Formatted Output, http://librep.sourceforge.net/librep-manual.html#Formatted%20Output, http://www.gwinnup.org/research/docs/librep.html#SEC122.
Smalltalk format strings are described in the GNU Smalltalk documentation, class CharArray, methods `bindWith:´ and `bindWithArguments:´. http://www.gnu.org/software/smalltalk/gst-manual/gst_68.html#SEC238. In summary, a directive starts with `%´ and is followed by `%´ or a nonzero digit (`1´ to `9´).
Java format strings are described in the JDK documentation for class java.text.MessageFormat, http://java.sun.com/j2se/1.4/docs/api/java/text/MessageFormat.html. See also the ICU documentation http://oss.software.ibm.com/icu/apiref/classMessageFormat.html.
awk format strings are described in the gawk documentation, section Printf, http://www.gnu.org/manual/gawk/html_node/Printf.html#Printf.
Where is this documented?
YCP sformat strings are described in the libycp documentation file:/usr/share/doc/packages/libycp/YCP-builtins.html. In summary, a directive starts with `%´ and is followed by `%´ or a nonzero digit (`1´ to `9´).
Tcl format strings are described in the `format.n´ manual page, http://www.scriptics.com/man/tcl8.3/TclCmd/format.htm.
For the maintainer, the general procedure differs from the C language case in two ways.
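For those languages that don't use GNU gettext, the `intl/´ directory is
not needed and can be omitted. This means that the maintainer calls the
gettextize program without the `--intl´ option, and invokes the
AM_GNU_GETTEXT autoconf macro via `AM_GNU_GETTEXT([external])´.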
If only a single programming language is used, the XGETTEXT_OPTIONS
variable in `po/Makevars´ (see section 12.4.3 `Makefile´ pieces in
`po/´) should be adjusted to match the xgettext options for that
particular programming language.
If the package uses more than one programming language with gettext
support, it becomes necessary to change the POT file construction rule
in `po/Makefile.in.in´. It is recommended to make one xgettext
invocation per programming language, each with the options appropriate
for that language, and to combine the resulting files using msgcat.
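The shape of such a multi-language rule can be sketched as a small helper that only assembles the command lines, without running them. The helper name pot_commands, the file names and the package name are hypothetical; the options used (--language, -k, -o) are the usual xgettext and msgcat ones, but the exact set a package needs depends on its sources.

```python
def pot_commands(sources, pot_name):
    """Build one xgettext command per language plus a final msgcat command.

    `sources` maps an xgettext language name to (extra options, file list).
    The commands are returned as argument lists and are not executed here.
    """
    commands = []
    partial_pots = []
    for language, (options, files) in sources.items():
        partial = language.lower() + ".pot"
        commands.append(
            ["xgettext", "--language=" + language, *options, "-o", partial, *files]
        )
        partial_pots.append(partial)
    # Merge the per-language POT files into the package's single POT file.
    commands.append(["msgcat", "-o", pot_name, *partial_pots])
    return commands


# Hypothetical package with C and Tcl sources.
example = pot_commands(
    {
        "C": (["-k_"], ["src/main.c"]),
        "Tcl": (["-k_"], ["lib/ui.tcl"]),
    },
    "mypackage.pot",
)
for command in example:
    print(" ".join(command))
```

In a real package the same sequence would live in the POT construction rule of `po/Makefile.in.in´ rather than in a script.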
c, h.
C, c++, cc, cxx, cpp, hpp.
m.
"abc"
_("abc")
gettext, dgettext, dcgettext, ngettext, dngettext, dcngettext
textdomain function
bindtextdomain function
setlocale (LC_ALL, "")
#include <libintl.h>
#include <locale.h>
#define _(string) gettext (string)
xgettext -k_
fprintf "%2$d %1$d" (POSIX but not C 99)
sh
"abc", 'abc', abc
"`gettext "abc"`"
gettext, ngettext programs
TEXTDOMAIN
TEXTDOMAINDIR
sh
"abc", 'abc', abc
$"abc"
gettext, ngettext programs
TEXTDOMAIN
TEXTDOMAINDIR
bash --dump-po-strings
py
'abc', u'abc', r'abc', ur'abc',
"abc", u"abc", r"abc", ur"abc",
'''abc''', u'''abc''', r'''abc''', ur'''abc''',
"""abc""", u"""abc""", r"""abc""", ur"""abc"""
_('abc') etc.
gettext.gettext, gettext.dgettext, also ugettext
gettext.textdomain function, or gettext.install(domain) function
gettext.bindtextdomain function, or gettext.install(domain,localedir) function
import gettext
xgettext
'...%(ident)d...' % { 'ident': value }
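A minimal sketch of the gettext.install route follows. It assumes Python 3, where gettext already returns Unicode strings and ugettext no longer exists; the domain name and the empty temporary catalog directory are hypothetical, so the strings come back untranslated.

```python
import builtins
import gettext
import tempfile

# Hypothetical domain and (empty) catalog directory for this sketch.
gettext.install("hello-sketch", tempfile.mkdtemp())

# install() has placed _ in the builtins namespace, so modules that use it
# need neither an import nor their own definition of _.
print(_("abc"))
print(_("%(ident)d items") % {"ident": 5})
```

The named directive in the second message is the form shown above, which lets a translator reorder arguments.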
lisp
"abc"
(_ "abc"), (ENGLISH "abc")
i18n:gettext, i18n:ngettext
i18n:textdomain
i18n:textdomaindir
xgettext -k_ -kENGLISH
format "~1@*~D ~0@*~D"
d
"abc"
ENGLISH ? "abc" : ""
GETTEXT("abc")
GETTEXTL("abc")
clgettext, clgettextl
#include "lispbibl.c"
clisp-xgettext
fprintf "%2$d %1$d" (POSIX but not C 99)
el
"abc"
(_"abc")
gettext, dgettext (xemacs only)
domain special form (xemacs only)
bind-text-domain function (xemacs only)
xgettext
format "%2$d %1$d"
Without I18N3 defined at build time, no translation.
jl
"abc"
(_"abc")
gettext
textdomain function
bindtextdomain function
(require 'rep.i18n.gettext)
xgettext
format "%2$d %1$d"
st
"abc"
NLS? "abc"
self? "abc"
LcMessagesDomain>>#at:, LcMessagesDomain>>#at:plural:with:
LcMessages>>#? (returns a LcMessagesDomain object). Locale default messages ? 'gettext'
LcMessages>>#domain:directory: (returns a LcMessagesDomain object)
Locale object from Locale class methods such as #fromString: or #default. Locale default messages gives the LcMessages object for the default locale.
'%1 %2' bindWith: 'Hello' with: 'world'
java
GettextResource.gettext, GettextResource.ngettext
ResourceBundle.getResource instead
xgettext -k_
MessageFormat.format "{1,number} {0,number}"
Before marking strings as internationalizable, uses of the string
concatenation operator need to be converted to MessageFormat
applications. For example, "file "+filename+" not found" becomes
MessageFormat.format("file {0} not found", new Object[] { filename }).
Only after this is done can the strings be marked and extracted.
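The same conversion can be shown in Python, whose str.format uses a {0} placeholder syntax that resembles MessageFormat without being identical to it (quoting rules differ, for instance). This is an analogue for illustration only, and the file name is a hypothetical value.

```python
import gettext

# No catalog is installed in this sketch, so _ returns its argument unchanged.
_ = gettext.gettext

filename = "notes.txt"  # hypothetical value

# Before: the sentence is split into fragments that cannot be translated
# as a unit.
concatenated = "file " + filename + " not found"

# After: one complete format string, which can then be marked with _.
formatted = _("file {0} not found").format(filename)

print(formatted)
```

The extractor then sees the whole sentence as a single msgid, and the translator can place {0} wherever the target language requires.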
GNU gettext uses the native Java internationalization mechanism, namely
ResourceBundles. To convert a PO file to a ResourceBundle, the msgfmt
program can be used with the option --java or --java2. To convert a
ResourceBundle back to a PO file, the msgunfmt program can be used with
the option --java.
Two different programmatic APIs can be used to access ResourceBundles.
Note that both APIs work with all kinds of ResourceBundles, whether GNU
gettext generated classes, or other .class or .properties files.
The java.util.ResourceBundle API. In particular, its getString function
returns a string translation. Note that a missing translation yields a
MissingResourceException. This has the advantage of being the standard
API, and it does not require any additional libraries, only the
msgfmt-generated .class files. But it cannot do plural handling, even if
the resource was generated from a PO file with plural handling.
The gnu.gettext.GettextResource API. Reference documentation in Javadoc
1.1 style format is in the javadoc1 directory and in Javadoc 2 style
format in the javadoc2 directory. Its gettext function returns a string
translation. Note that when a translation is missing, the msgid argument
is returned unchanged. This has the advantage of having the ngettext
function for plural handling. To use this API, one needs the libintl.jar
file which is part of the GNU gettext package and distributed under the
LGPL.
awk
"abc"
_"abc"
dcgettext, missing dcngettext in gawk-3.1.0
TEXTDOMAIN variable
bindtextdomain function
setlocale (LC_MESSAGES, "") in gawk-3.1.0
xgettext
printf "%2$d %1$d" (GNU awk only)
dcgettext, dcngettext and bindtextdomain yourself.
pp, pas
'abc'
ResourceString data type instead
TranslateResourceStrings function instead
TranslateResourceStrings function instead
{$mode delphi} or {$mode objfpc}
uses gettext;
ppc386 followed by xgettext or rstconv
uses sysutils;
format "%1:d %0:d"
The Pascal compiler has special support for the ResourceString data
type. It generates a .rst file. This is then converted to a .pot file by
use of xgettext or rstconv. At runtime, a .mo file corresponding to
translations of this .pot file can be loaded using the
TranslateResourceStrings function in the gettext unit.
cpp
"abc"
_("abc")
wxLocale::GetString, wxGetTranslation
wxLocale::AddCatalog
wxLocale::AddCatalogLookupPathPrefix
wxLocale::Init, wxSetLocale
#include <wx/intl.h>
include/wx/intl.h and src/common/intl.cpp
xgettext
ycp
"abc"
_("abc")
_() with 1 or 3 arguments
textdomain statement
xgettext
sformat "%2 %1"
tcl
"abc"
[_ "abc"]
::msgcat::mc
::msgcat::mcload instead
package require msgcat
proc _ {s} {return [::msgcat::mc $s]}
xgettext -k_
format "%2\$d %1\$d"
Before marking strings as internationalizable, substitutions of
variables into the string need to be converted to format applications.
For example, "file $filename not found" becomes
[format "file %s not found" $filename]. Only after this is done can the
strings be marked and extracted. After marking, this example becomes
[format [_ "file %s not found"] $filename] or
[msgcat::mc "file %s not found" $filename]. Note that the msgcat::mc
function implicitly calls format when more than one argument is given.
pl, PL
"abc"
gettext, dgettext, dcgettext
textdomain function
bindtextdomain function
setlocale (LC_ALL, "");
use POSIX;
use Locale::gettext;
php, php3, php4
"abc"
_("abc")
gettext, dgettext, dcgettext
textdomain function
bindtextdomain function
setlocale function
pike
"abc"
gettext, dgettext, dcgettext
textdomain function
bindtextdomain function
setlocale function
import Locale.Gettext;
Here is a list of other data formats which can be internationalized using GNU gettext.
pot, po
xgettext
rst
xgettext, rstconv
glade
xgettext, libglade-xgettext