This is libc.info, produced by makeinfo version 4.6 from libc.texinfo.

INFO-DIR-SECTION GNU libraries
START-INFO-DIR-ENTRY
* Libc: (libc).                 C library.
END-INFO-DIR-ENTRY

   This file documents the GNU C library.

   This is Edition 0.10, last updated 2001-07-06, of `The GNU C Library
Reference Manual', for Version 2.3.x.

   Copyright (C) 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2001, 2002
Free Software Foundation, Inc.

   Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or
any later version published by the Free Software Foundation; with the
Invariant Sections being "Free Software Needs Free Documentation" and
"GNU Lesser General Public License", the Front-Cover texts being (a)
(see below), and with the Back-Cover Texts being (b) (see below).  A
copy of the license is included in the section entitled "GNU Free
Documentation License".

   (a) The FSF's Front-Cover Text is:

   A GNU Manual

   (b) The FSF's Back-Cover Text is:

   You have freedom to copy and modify this GNU Manual, like GNU
software.  Copies published by the Free Software Foundation raise
funds for GNU development.


File: libc.info,  Node: GUI program problems,  Next: Using gettextized software,  Prev: Charset conversion in gettext,  Up: Message catalogs with gettext

How to use `gettext' in GUI programs
....................................

One place where the `gettext' functions, if used normally, have big
problems is within programs with graphical user interfaces (GUIs).  The
problem is that many of the strings which have to be translated are very
short.  They have to appear in pull-down menus, which restricts their
length.  But strings which do not contain entire sentences, or at least
large fragments of a sentence, may appear in more than one situation in
the program and might need different translations in each.  This is
especially true for the one-word strings which are frequently used in
GUI programs.

   As a consequence many people say that the `gettext' approach is
wrong and that `catgets' should be used instead, which indeed does not
have this problem.  But there is a very simple and powerful method to
handle this kind of problem with the `gettext' functions.

As an example consider the following fictional situation.  A GUI program
has a menu bar with the following entries:

     +------------+------------+--------------------------------------+
     | File       | Printer    |                                      |
     +------------+------------+--------------------------------------+
     | Open     | | Select   |
     | New      | | Open     |
     +----------+ | Connect  |
                  +----------+

   To have the strings `File', `Printer', `Open', `New', `Select', and
`Connect' translated there has to be at some point in the code a call
to a function of the `gettext' family.  But in two places the string
passed into the function would be `Open'.  The translations might not
be the same and therefore we are in the dilemma described above.

   One solution to this problem is to artificially lengthen the strings
to make them unambiguous.  But what would the program do if no
translation is available?  The lengthened string is not what should be
printed.  So we should use a slightly modified version of the
functions.

   To lengthen the strings a uniform method should be used.  E.g., in
the example above the strings could be chosen as

     Menu|File
     Menu|Printer
     Menu|File|Open
     Menu|File|New
     Menu|Printer|Select
     Menu|Printer|Open
     Menu|Printer|Connect

   Now all the strings are different, and if the following little
wrapper function is used instead of `gettext', everything works just
fine:

       char *
       sgettext (const char *msgid)
       {
         char *msgval = gettext (msgid);
         if (msgval == msgid)
           msgval = strrchr (msgid, '|') + 1;
         return msgval;
       }

   What this little function does is to recognize the case when no
translation is available.  This can be done very efficiently by a
pointer comparison since the return value is the input value.  If there
is no translation we know that the input string is in the format we used
for the Menu entries and therefore contains a `|' character.  We simply
search for the last occurrence of this character and return a pointer
to the character following it.  That's it!
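   The wrapper above relies on every untranslated string containing a
`|'.  As a minimal sketch of a more defensive variant, the following
function (the name `sgettext_safe' is hypothetical and not part of any
library) returns the string unchanged when no separator is present
instead of dereferencing the result of a failed `strrchr' call:

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical variant of the sgettext wrapper.  It behaves like
   sgettext, but falls back to the whole msgid when the string
   carries no '|' context prefix.  */
const char *
sgettext_safe (const char *msgid)
{
  const char *msgval = gettext (msgid);

  if (msgval == msgid)
    {
      /* No translation found: strip the context prefix, if any.  */
      const char *sep = strrchr (msgid, '|');

      if (sep != NULL)
        msgval = sep + 1;
    }
  return msgval;
}
```

   With no message catalog installed, `sgettext_safe ("Menu|File|Open")'
yields `Open', and a plain string such as `Quit' is returned as is.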

   If one now consistently uses the lengthened string form and
replaces the `gettext' calls with calls to `sgettext' (this is normally
limited to very few places in the GUI implementation) then it is
possible to produce a program which can be internationalized.

   With advanced compilers (such as GNU C) one can write the `sgettext'
function as an inline function or as a macro like this:

     #define sgettext(msgid) \
       ({ const char *__msgid = (msgid);            \
          char *__msgval = gettext (__msgid);       \
          if (__msgval == __msgid)                  \
            __msgval = strrchr (__msgid, '|') + 1;  \
          __msgval; })

   The other `gettext' functions (`dgettext', `dcgettext' and the
`ngettext' equivalents) can and should have corresponding functions as
well which look almost identical, except for the parameters and the
call to the underlying function.
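   As an illustration of how similar these variants look, here is a
sketch of a `dgettext' counterpart.  The name `sdgettext' is only an
assumption made for this example; the C library does not provide such
a function.

```c
#include <libintl.h>
#include <string.h>

/* Hypothetical dgettext counterpart of sgettext.  Only the extra
   DOMAINNAME parameter and the underlying call differ.  As with
   sgettext, every msgid is assumed to contain a '|'.  */
char *
sdgettext (const char *domainname, const char *msgid)
{
  char *msgval = dgettext (domainname, msgid);

  if (msgval == msgid)
    msgval = strrchr (msgid, '|') + 1;
  return msgval;
}
```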

   Now there is of course the question why such functions do not exist
in the GNU C library.  There are two parts to the answer to this
question.

   * They are easy to write and therefore can be provided by the
     project they are used in.  This is not an answer by itself and
     must be seen together with the second part which is:

   * There is no way the C library can contain a version which can work
     everywhere.  The problem is the selection of the character to
     separate the prefix from the actual string in the lengthened
     string.  The examples above used `|' which is a quite good choice
     because it resembles a notation frequently used in this context
     and it also is a character not often used in message strings.

     But what if the character is used in message strings?  Or what if
     the chosen character is not available in the character set on the
     machine one compiles on?  (E.g., `|' is not required to exist for
     ISO C; this is why the `iso646.h' file exists in ISO C programming
     environments.)

   There is only one more comment left to make.  The wrapper function
above requires that the translated strings are not lengthened
themselves.  This is only logical.  There is no need to disambiguate
the strings (since they are never used as keys for a search) and one
also saves quite some memory and disk space by doing this.


File: libc.info,  Node: Using gettextized software,  Prev: GUI program problems,  Up: Message catalogs with gettext

User influence on `gettext'
...........................

The last sections described what the programmer can do to
internationalize the messages of the program.  But it is finally up to
the user to select the messages s/he wants to see.  S/He must understand
them.

   The POSIX locale model uses the environment variables `LC_COLLATE',
`LC_CTYPE', `LC_MESSAGES', `LC_MONETARY', `LC_NUMERIC', and `LC_TIME' to
select the locale which is to be used.  This way the user can influence
lots of functions.  As we mentioned above the `gettext' functions also
take advantage of this.

   To understand how this happens it is necessary to take a look at the
various components of the filename which gets computed to locate a
message catalog.  It is composed as follows:

     DIR_NAME/LOCALE/LC_CATEGORY/DOMAIN_NAME.mo

   The default value for DIR_NAME is system specific.  It is computed
from the value given as the prefix while configuring the C library.
This value normally is `/usr' or `/'.  For the former the complete
DIR_NAME is:

     /usr/share/locale

   We can use `/usr/share' since the `.mo' files containing the message
catalogs are system independent, so all systems can use the same files.
If the program executed the `bindtextdomain' function for the message
domain that is currently handled, the DIR_NAME component is exactly
the value which was given to the function as the second parameter.
I.e., `bindtextdomain' allows overriding the only system dependent and
fixed value to make it possible to address files anywhere in the
filesystem.

   The LC_CATEGORY component is the name of the locale category which
was selected in the program code.  For `gettext' and `dgettext' this is
always `LC_MESSAGES', for `dcgettext' this is selected by the value of
the third parameter.  As said above, one should avoid ever using a
category other than `LC_MESSAGES'.

   The LOCALE component is computed based on the category used.  Just
as for the `setlocale' function, this is where the user's selection
comes into play.  Some environment variables are examined in a fixed
order and the first environment variable set determines the return
value of the lookup process.  In detail, for the category `LC_xxx' the
following variables are examined in this order:

`LANGUAGE'

`LC_ALL'

`LC_xxx'

`LANG'

   This looks very familiar.  With the exception of the `LANGUAGE'
environment variable this is exactly the lookup order the `setlocale'
function uses.  But why introduce the `LANGUAGE' variable?

   The reason is that the syntax of the values these variables can have
is different from what is expected by the `setlocale' function.  If we
set `LC_ALL' to a value following the extended syntax, the `setlocale'
function would never be able to use the value of this variable as
well.  An additional variable removes this problem, plus we can select
the language independently of the locale setting, which sometimes is
useful.

   While for the `LC_xxx' variables the value should consist of exactly
one specification of a locale, the `LANGUAGE' variable's value can
consist of a colon separated list of locale names.  The attentive
reader will realize that this is the way we manage to implement one of
our additional demands above: we want to be able to specify an ordered
list of languages.
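   To make the list syntax concrete, the following sketch picks one
entry out of a colon separated value such as `de_AT:de:en'.  The helper
`language_entry' is hypothetical and written only for this
illustration; it is not how the library itself walks the list.

```c
#include <string.h>

/* Copy the INDEX-th entry (counting from 0) of the colon separated
   LIST into BUF.  Returns 1 on success, 0 if there is no such entry,
   the entry is empty, or it does not fit into BUFLEN bytes.
   Hypothetical helper, for illustration only.  */
int
language_entry (const char *list, size_t index, char *buf, size_t buflen)
{
  const char *start = list;

  for (;;)
    {
      /* Length of the current entry, up to ':' or end of string.  */
      size_t len = strcspn (start, ":");

      if (index == 0)
        {
          if (len == 0 || len >= buflen)
            return 0;
          memcpy (buf, start, len);
          buf[len] = '\0';
          return 1;
        }
      if (start[len] == '\0')
        return 0;               /* Ran past the end of the list.  */
      start += len + 1;         /* Skip the entry and the colon.  */
      --index;
    }
}
```

   Entry 0 is the most preferred language; later entries are the
fallbacks in the order the user listed them.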

   Back to the constructed filename we have only one component missing.
The DOMAIN_NAME part is the name which was either registered using the
`textdomain' function or which was given to `dgettext' or `dcgettext'
as the first parameter.  Now it becomes obvious that a good choice for
the domain name in the program code is a string which is closely
related to the program/package name.  E.g., for the GNU C Library the
domain name is `libc'.

A little piece of example code should show how the programmer is supposed
to work:

     {
       setlocale (LC_ALL, "");
       textdomain ("test-package");
       bindtextdomain ("test-package", "/usr/local/share/locale");
       puts (gettext ("Hello, world!"));
     }

   At the program start the default domain is `messages', and the
default locale is "C".  The `setlocale' call sets the locale according
to the user's environment variables; remember that correct functioning
of `gettext' relies on the correct setting of the `LC_MESSAGES' locale
(for looking up the message catalog) and of the `LC_CTYPE' locale (for
the character set conversion).  The `textdomain' call changes the
default domain to `test-package'.  The `bindtextdomain' call specifies
that the message catalogs for the domain `test-package' can be found
below the directory `/usr/local/share/locale'.

   If the user now sets the variable `LANGUAGE' to `de' in her/his
environment, the `gettext' function will try to use the translations
from the file

     /usr/local/share/locale/de/LC_MESSAGES/test-package.mo

   From the above descriptions it should be clear which component of
this filename is determined by which source.
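   The composition of the filename can be sketched in a few lines.  The
function `catalog_path' below is a hypothetical helper that merely
joins the four components in the way described above; the library
performs this computation internally and does not export such a
function.

```c
#include <stdio.h>

/* Build DIR_NAME/LOCALE/LC_CATEGORY/DOMAIN_NAME.mo into BUF.
   Returns 1 if the result fit into BUFLEN bytes, 0 otherwise.
   Illustration only.  */
int
catalog_path (char *buf, size_t buflen, const char *dir_name,
              const char *locale, const char *category,
              const char *domain_name)
{
  int n = snprintf (buf, buflen, "%s/%s/%s/%s.mo",
                    dir_name, locale, category, domain_name);

  /* snprintf reports the length it would have needed.  */
  return n >= 0 && (size_t) n < buflen;
}
```

   Called with the directory from the `bindtextdomain' call, the locale
`de', the category `LC_MESSAGES' and the domain `test-package', it
produces the filename shown above.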

   In the above example we assumed that the `LANGUAGE' environment
variable is set to `de'.  This might be an appropriate selection but
what happens if the user wants to use `LC_ALL' because of the wider
usability and here the required value is `de_DE.ISO-8859-1'?  We
already mentioned above that a situation like this is not infrequent.
E.g., a person might prefer reading a dialect and if this is not
available fall back on the standard language.

   The `gettext' functions know about situations like this and can
handle them gracefully.  The functions recognize the format of the value
of the environment variable.  They can split the value into different
pieces and by leaving out one or the other part they can construct new
values.  This happens of course in a predictable way.  To understand
this one must know the format of the environment variable value.  There
is one more or less standardized form, originally from the X/Open
specification:

   `language[_territory[.codeset]][@modifier]'

   Less specific locale names are constructed by stripping off parts in
the order of the following list:

  1. `codeset'

  2. `normalized codeset'

  3. `territory'

  4. `modifier'

   The `language' field will never be dropped for obvious reasons.
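   Before any part can be left out, the value has to be split into its
fields.  The following is a minimal sketch of such a split, assuming
fixed-size buffers and the hypothetical names `struct locale_parts' and
`split_locale_name'.  It is slightly more lenient than the format above
(it accepts a codeset without a territory) and it does not reproduce
the fallback logic of the library.

```c
#include <string.h>

/* The four fields of language[_territory[.codeset]][@modifier].
   Missing fields are left as empty strings.  */
struct locale_parts
{
  char language[32];
  char territory[32];
  char codeset[32];
  char modifier[32];
};

/* Copy LEN bytes from SRC into DST, truncating to fit DSTLEN.  */
static void
copy_part (char *dst, size_t dstlen, const char *src, size_t len)
{
  if (len >= dstlen)
    len = dstlen - 1;
  memcpy (dst, src, len);
  dst[len] = '\0';
}

/* Hypothetical helper: split NAME into its fields.  */
void
split_locale_name (const char *name, struct locale_parts *out)
{
  size_t len;

  memset (out, 0, sizeof *out);

  /* The language runs up to the first '_', '.' or '@'.  */
  len = strcspn (name, "_.@");
  copy_part (out->language, sizeof out->language, name, len);
  name += len;

  if (*name == '_')
    {
      ++name;
      len = strcspn (name, ".@");
      copy_part (out->territory, sizeof out->territory, name, len);
      name += len;
    }
  if (*name == '.')
    {
      ++name;
      len = strcspn (name, "@");
      copy_part (out->codeset, sizeof out->codeset, name, len);
      name += len;
    }
  if (*name == '@')
    copy_part (out->modifier, sizeof out->modifier, name + 1,
               strlen (name + 1));
}
```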

   The only new thing is the `normalized codeset' entry.  This is
another goodie which is introduced to help reduce the chaos which
derives from the inability of people to standardize the names of
character sets.  Instead of ISO-8859-1 one can often see 8859-1, 88591,
iso8859-1, or iso_8859-1.  The `normalized codeset' value is generated
from the user-provided character set name by applying the following
rules:

  1. Remove all characters besides numbers and letters.

  2. Fold letters to lowercase.

  3. If the name contains only digits, prepend the string `"iso"'.

So all of the above names will be normalized to `iso88591'.  This allows
the program user to choose the locale name much more freely.
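   The three rules can be sketched directly in C.  The function
`normalize_codeset' below is a hypothetical helper written for this
illustration from the rules as stated; it is not the code the library
uses internally.

```c
#include <ctype.h>
#include <string.h>

/* Write the normalized form of CODESET into BUF (BUFLEN bytes).
   Returns 1 on success, 0 if BUF is too small.  Illustration only.  */
int
normalize_codeset (const char *codeset, char *buf, size_t buflen)
{
  size_t n = 0;
  int only_digits = 1;
  const char *p;

  /* Rule 3 needs to know up front whether any letter is present.  */
  for (p = codeset; *p != '\0'; ++p)
    if (isalpha ((unsigned char) *p))
      only_digits = 0;

  if (only_digits)
    {
      if (buflen < 4)
        return 0;
      memcpy (buf, "iso", 3);
      n = 3;
    }

  /* Rules 1 and 2: keep letters and digits, folded to lowercase.  */
  for (p = codeset; *p != '\0'; ++p)
    if (isalnum ((unsigned char) *p))
      {
        if (n + 1 >= buflen)
          return 0;
        buf[n++] = (char) tolower ((unsigned char) *p);
      }

  if (n >= buflen)
    return 0;
  buf[n] = '\0';
  return 1;
}
```

   Applied to `ISO-8859-1', `8859-1', `88591' and `iso_8859-1' this
sketch yields `iso88591' in every case.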

   Even this extended functionality still does not help to solve the
problem that completely different names can be used to denote the same
locale (e.g., `de' and `german').  To be of help in this situation the
locale implementation and also the `gettext' functions know about
aliases.

   The file `/usr/share/locale/locale.alias' (replace `/usr' with
whatever prefix you used for configuring the C library) contains a
mapping of alternative names to more regular names.  The system manager
is free to add new entries to fill her/his own needs.  The selected
locale from the environment is compared with the entries in the first
column of this file ignoring the case.  If they match the value of the
second column is used instead for the further handling.

   In the description of the format of the environment variables we
already mentioned the character set as a factor in the selection of the
message catalog.  In fact, only catalogs which contain text written
using the character set of the system/program can be used (directly;
there will come a solution for this some day).  This means that the
user will always have to take care of this.  If the collection of
message catalogs contains files for the same language but coded using
different character sets, the user has to be careful.


File: libc.info,  Node: Helper programs for gettext,  Prev: Message catalogs with gettext,  Up: The Uniforum approach

Programs to handle message catalogs for `gettext'
-------------------------------------------------

The GNU C Library does not contain the source code for the programs to
handle message catalogs for the `gettext' functions.  As part of the
GNU project the GNU gettext package contains everything the developer
needs.  The functionality provided by the tools in this package by far
exceeds the abilities of the `gencat' program described above for the
`catgets' functions.

   There is a program `msgfmt' which is the equivalent program to the
`gencat' program.  It generates from the human-readable and -editable
form of the message catalog a binary file which can be used by the
`gettext' functions.  But there are several more programs available.

   The `xgettext' program can be used to automatically extract the
translatable messages from a source file.  I.e., the programmer need not
take care of the translations and the list of messages which have to be
translated.  S/He will simply wrap the translatable strings in calls to
`gettext' et al. and the rest will be done by `xgettext'.  This program
has a lot of options which help to customize the output or help it to
understand the input better.

   Other programs help to manage the development cycle when new
messages appear in the source files or when a new translation of the
messages appears.  Here it should only be noted that using all the tools
in GNU gettext it is possible to _completely_ automate the handling of
message catalogs.  Besides marking the translatable strings in the
source code and generating the translations the developers do not have
anything to do themselves.


File: libc.info,  Node: Searching and Sorting,  Next: Pattern Matching,  Prev: Message Translation,  Up: Top

Searching and Sorting
*********************

This chapter describes functions for searching and sorting arrays of
arbitrary objects.  You pass the appropriate comparison function to be
applied as an argument, along with the size of the objects in the array
and the total number of elements.

* Menu:

* Comparison Functions::        Defining how to compare two objects.
				 Since the sort and search facilities
                                 are general, you have to specify the
                                 ordering.
* Array Search Function::       The `bsearch' function.
* Array Sort Function::         The `qsort' function.
* Search/Sort Example::         An example program.
* Hash Search Function::        The `hsearch' function.
* Tree Search Function::        The `tsearch' function.


File: libc.info,  Node: Comparison Functions,  Next: Array Search Function,  Up: Searching and Sorting

Defining the Comparison Function
================================

In order to use the sorted array library functions, you have to describe
how to compare the elements of the array.

   To do this, you supply a comparison function to compare two elements
of the array.  The library will call this function, passing as arguments
pointers to two array elements to be compared.  Your comparison function
should return a value the way `strcmp' (*note String/Array
Comparison::) does: negative if the first argument is "less" than the
second, zero if they are "equal", and positive if the first argument is
"greater".

   Here is an example of a comparison function which works with an
array of numbers of type `double':

     int
     compare_doubles (const void *a, const void *b)
     {
       const double *da = (const double *) a;
       const double *db = (const double *) b;
     
       return (*da > *db) - (*da < *db);
     }

   The header file `stdlib.h' defines a name for the data type of
comparison functions.  This type is a GNU extension.

     int comparison_fn_t (const void *, const void *);
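   The same pattern works for other element types.  As a sketch, here is
a comparison function for `int' elements (the name `compare_ints' is
chosen for this example).  It uses the same two-comparison idiom as
`compare_doubles' rather than returning the difference of the two
values, since a subtraction can overflow for operands of large
magnitude and opposite sign.

```c
/* Compare two int elements the way strcmp compares strings.
   Returning `*ia - *ib' instead could overflow, so the result is
   built from two comparisons.  */
int
compare_ints (const void *a, const void *b)
{
  const int *ia = (const int *) a;
  const int *ib = (const int *) b;

  return (*ia > *ib) - (*ia < *ib);
}
```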


File: libc.info,  Node: Array Search Function,  Next: Array Sort Function,  Prev: Comparison Functions,  Up: Searching and Sorting

Array Search Function
=====================

Generally searching for a specific element in an array means that
potentially all elements must be checked.  The GNU C library contains
functions to perform linear search.  The prototypes for the following
two functions can be found in `search.h'.

 - Function: void * lfind (const void *KEY, void *BASE, size_t *NMEMB,
          size_t SIZE, comparison_fn_t COMPAR)
     The `lfind' function searches in the array with `*NMEMB' elements
     of SIZE bytes pointed to by BASE for an element which matches the
     one pointed to by KEY.  The function pointed to by COMPAR is used
     to decide whether two elements match.

     The return value is a pointer to the matching element in the array
     starting at BASE if it is found.  If no matching element is
     available `NULL' is returned.

     The mean runtime of this function is proportional to `*NMEMB'/2.
     This function should only be used if elements often get added to or
     deleted from the array, in which case it might not be useful to
     sort the array before searching.

 - Function: void * lsearch (const void *KEY, void *BASE, size_t
          *NMEMB, size_t SIZE, comparison_fn_t COMPAR)
     The `lsearch' function is similar to the `lfind' function.  It
     searches the given array for an element and returns it if found.
     The difference is that if no matching element is found the
     `lsearch' function adds the object pointed to by KEY (with a size
     of SIZE bytes) at the end of the array and it increments the value
     of `*NMEMB' to reflect this addition.

     This means for the caller that if it is not sure that the array
     contains the element one is searching for, the memory allocated
     for the array starting at BASE must have room for at least SIZE
     more bytes.  If one is sure the element is in the array it is
     better to use `lfind', so having more room in the array is always
     necessary when calling `lsearch'.
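     The difference between the two functions can be sketched with a
     small table of `int' values.  The helper names `find_int' and
     `find_or_add_int' are made up for this example; the caller of
     `find_or_add_int' is assumed to have reserved room for one more
     element.

```c
#include <search.h>
#include <stddef.h>

/* Linear search only needs "match" (zero) or "no match" (non-zero).  */
static int
int_differs (const void *a, const void *b)
{
  return *(const int *) a != *(const int *) b;
}

/* Return a pointer to VALUE in TABLE, or NULL if it is not there.
   The table is never modified.  */
int *
find_int (int value, int *table, size_t count)
{
  return lfind (&value, table, &count, sizeof (int), int_differs);
}

/* Return a pointer to VALUE in TABLE, appending it first if it is
   missing.  TABLE must have room for *COUNT + 1 elements.  */
int *
find_or_add_int (int value, int *table, size_t *count)
{
  return lsearch (&value, table, count, sizeof (int), int_differs);
}
```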

   To search a sorted array for an element matching the key, use the
`bsearch' function.  The prototype for this function is in the header
file `stdlib.h'.

 - Function: void * bsearch (const void *KEY, const void *ARRAY, size_t
          COUNT, size_t SIZE, comparison_fn_t COMPARE)
     The `bsearch' function searches the sorted array ARRAY for an
     object that is equivalent to KEY.  The array contains COUNT
     elements, each of which is of size SIZE bytes.

     The COMPARE function is used to perform the comparison.  This
     function is called with two pointer arguments and should return an
     integer less than, equal to, or greater than zero corresponding to
     whether its first argument is considered less than, equal to, or
     greater than its second argument.  The elements of the ARRAY must
     already be sorted in ascending order according to this comparison
     function.

     The return value is a pointer to the matching array element, or a
     null pointer if no match is found.  If the array contains more
     than one element that matches, the one that is returned is
     unspecified.

     This function derives its name from the fact that it is implemented
     using the binary search algorithm.
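     As a short sketch of a typical call, the hypothetical helper
     `index_of' below looks up a value in an already sorted array of
     `int' and converts the returned pointer into an array index.

```c
#include <stdlib.h>

/* Ordering used both for sorting and for searching.  */
static int
cmp_int (const void *a, const void *b)
{
  const int *ia = (const int *) a;
  const int *ib = (const int *) b;

  return (*ia > *ib) - (*ia < *ib);
}

/* Return the index of KEY in SORTED, which must be in ascending
   order according to cmp_int, or -1 if KEY is not present.  */
long
index_of (int key, const int *sorted, size_t count)
{
  const int *hit = bsearch (&key, sorted, count, sizeof (int), cmp_int);

  return hit != NULL ? (long) (hit - sorted) : -1L;
}
```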


File: libc.info,  Node: Array Sort Function,  Next: Search/Sort Example,  Prev: Array Search Function,  Up: Searching and Sorting

Array Sort Function
===================

To sort an array using an arbitrary comparison function, use the
`qsort' function.  The prototype for this function is in `stdlib.h'.

 - Function: void qsort (void *ARRAY, size_t COUNT, size_t SIZE,
          comparison_fn_t COMPARE)
     The `qsort' function sorts the array ARRAY.  The array contains
     COUNT elements, each of which is of size SIZE.

     The COMPARE function is used to perform the comparison on the
     array elements.  This function is called with two pointer
     arguments and should return an integer less than, equal to, or
     greater than zero corresponding to whether its first argument is
     considered less than, equal to, or greater than its second
     argument.

     *Warning:* If two objects compare as equal, their order after
     sorting is unpredictable.  That is to say, the sorting is not
     stable.  This can make a difference when the comparison considers
     only part of the elements.  Two elements with the same sort key
     may differ in other respects.

     If you want the effect of a stable sort, you can get this result by
     writing the comparison function so that, lacking other reason to
     distinguish between two elements, it compares them by their
     addresses.  Note that doing this may make the sorting algorithm
     less efficient, so do it only if necessary.
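     A different technique that gives the same effect, and that does
     not depend on where the elements happen to be stored while the
     sort is running, is to record each element's original position
     and use it as the final tie-breaker.  The sketch below assumes the
     element type can carry such a field; `struct record' and the
     function names are hypothetical.

```c
#include <stdlib.h>

struct record
{
  int key;          /* The sort key.  */
  size_t seq;       /* Original position, used as a tie-breaker.  */
};

/* Order by key first; equal keys keep their original order.  */
static int
compare_records (const void *a, const void *b)
{
  const struct record *ra = a;
  const struct record *rb = b;

  if (ra->key != rb->key)
    return (ra->key > rb->key) - (ra->key < rb->key);
  return (ra->seq > rb->seq) - (ra->seq < rb->seq);
}

/* Number the records, then sort them with qsort.  */
void
stable_sort_records (struct record *recs, size_t count)
{
  size_t i;

  for (i = 0; i < count; ++i)
    recs[i].seq = i;
  qsort (recs, count, sizeof recs[0], compare_records);
}
```

     Since no two records compare as equal any more, the result is
     fully determined by the comparison function.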

     Here is a simple example of sorting an array of doubles in
     numerical order, using the comparison function defined above
     (*note Comparison Functions::):

          {
            double *array;
            int size;
            ...
            qsort (array, size, sizeof (double), compare_doubles);
          }

     The `qsort' function derives its name from the fact that it was
     originally implemented using the "quick sort" algorithm.

     The implementation of `qsort' in this library might not be an
     in-place sort and might thereby use an extra amount of memory to
     store the array.


File: libc.info,  Node: Search/Sort Example,  Next: Hash Search Function,  Prev: Array Sort Function,  Up: Searching and Sorting

Searching and Sorting Example
=============================

Here is an example showing the use of `qsort' and `bsearch' with an
array of structures.  The objects in the array are sorted by comparing
their `name' fields with the `strcmp' function.  Then, we can look up
individual objects based on their names.

     #include <stdlib.h>
     #include <stdio.h>
     #include <string.h>
     
     /* Define an array of critters to sort. */
     
     struct critter
       {
         const char *name;
         const char *species;
       };
     
     struct critter muppets[] =
       {
         {"Kermit", "frog"},
         {"Piggy", "pig"},
         {"Gonzo", "whatever"},
         {"Fozzie", "bear"},
         {"Sam", "eagle"},
         {"Robin", "frog"},
         {"Animal", "animal"},
         {"Camilla", "chicken"},
         {"Sweetums", "monster"},
         {"Dr. Strangepork", "pig"},
         {"Link Hogthrob", "pig"},
         {"Zoot", "human"},
         {"Dr. Bunsen Honeydew", "human"},
         {"Beaker", "human"},
         {"Swedish Chef", "human"}
       };
     
     int count = sizeof (muppets) / sizeof (struct critter);
     
     
     
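     /* This is the comparison function used for sorting and searching.
        It takes `const void *' arguments so that its type matches what
        `qsort' and `bsearch' expect. */
     
     int
     critter_cmp (const void *v1, const void *v2)
     {
       const struct critter *c1 = v1;
       const struct critter *c2 = v2;
     
       return strcmp (c1->name, c2->name);
     }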
     
     
     /* Print information about a critter. */
     
     void
     print_critter (const struct critter *c)
     {
       printf ("%s, the %s\n", c->name, c->species);
     }
     
     
     /* Do the lookup into the sorted array. */
     
     void
     find_critter (const char *name)
     {
       struct critter target, *result;
       target.name = name;
       result = bsearch (&target, muppets, count, sizeof (struct critter),
                         critter_cmp);
       if (result)
         print_critter (result);
       else
         printf ("Couldn't find %s.\n", name);
     }
     
     /* Main program. */
     
     int
     main (void)
     {
       int i;
     
       for (i = 0; i < count; i++)
         print_critter (&muppets[i]);
       printf ("\n");
     
       qsort (muppets, count, sizeof (struct critter), critter_cmp);
     
       for (i = 0; i < count; i++)
         print_critter (&muppets[i]);
       printf ("\n");
     
       find_critter ("Kermit");
       find_critter ("Gonzo");
       find_critter ("Janice");
     
       return 0;
     }

   The output from this program looks like:

     Kermit, the frog
     Piggy, the pig
     Gonzo, the whatever
     Fozzie, the bear
     Sam, the eagle
     Robin, the frog
     Animal, the animal
     Camilla, the chicken
     Sweetums, the monster
     Dr. Strangepork, the pig
     Link Hogthrob, the pig
     Zoot, the human
     Dr. Bunsen Honeydew, the human
     Beaker, the human
     Swedish Chef, the human
     
     Animal, the animal
     Beaker, the human
     Camilla, the chicken
     Dr. Bunsen Honeydew, the human
     Dr. Strangepork, the pig
     Fozzie, the bear
     Gonzo, the whatever
     Kermit, the frog
     Link Hogthrob, the pig
     Piggy, the pig
     Robin, the frog
     Sam, the eagle
     Swedish Chef, the human
     Sweetums, the monster
     Zoot, the human
     
     Kermit, the frog
     Gonzo, the whatever
     Couldn't find Janice.


File: libc.info,  Node: Hash Search Function,  Next: Tree Search Function,  Prev: Search/Sort Example,  Up: Searching and Sorting

The `hsearch' function.
=======================

The functions mentioned so far in this chapter search in a sorted or
unsorted array.  There are other methods to organize information which
later should be searched.  The costs of insert, delete and search
differ.  One possible implementation is using hashing tables.  The
following functions are declared in the header file `search.h'.

 - Function: int hcreate (size_t NEL)
     The `hcreate' function creates a hashing table which can contain at
     least NEL elements.  There is no possibility to grow this table so
     it is necessary to choose the value for NEL wisely.  The methods
     used to implement this function might make it necessary to make
     the number of elements in the hashing table larger than the
     expected maximal number of elements.  Hashing tables usually work
     inefficiently if they are filled 80% or more.  The constant access
     time guaranteed by hashing can only be achieved if few collisions
     exist.  See Knuth's "The Art of Computer Programming, Volume 3:
     Sorting and Searching" for more information.

     The weakest aspect of this function is that there can be at most
     one hashing table used through the whole program.  The table is
     allocated in local memory out of control of the programmer.  As an
     extension the GNU C library provides an additional set of
     functions with a reentrant interface which provide similar
     functionality but which allow keeping arbitrarily many hashing
     tables.

     It is possible to use more than one hashing table in the program
     run if the former table is first destroyed by a call to `hdestroy'.

     The function returns a non-zero value if successful.  If it returns
     zero something went wrong.  This could either mean there is
     already a hashing table in use or the program ran out of memory.

 - Function: void hdestroy (void)
     The `hdestroy' function can be used to free all the resources
     allocated in a previous call of `hcreate'.  After a call to this
     function it is again possible to call `hcreate' and allocate a new
     table with possibly different size.

     It is important to remember that the elements contained in the
     hashing table at the time `hdestroy' is called are _not_ freed by
     this function.  It is the responsibility of the program code to
     free those strings (if necessary at all).  Freeing all the element
     memory is not possible without extra, separately kept information
     since there is no function to iterate through all available
     elements in the hashing table.  If it is really necessary to free
     a table and all elements the programmer has to keep a list of all
     table elements and before calling `hdestroy' s/he has to free all
     elements' data using this list.  This is a very unpleasant
     mechanism and it also shows that this kind of hashing table is
     mainly meant for tables which are created once and used until the
     end of the program run.

   Entries of the hashing table and keys for the search are defined
using this type:

 - Data type: struct ENTRY
     Both elements of this structure are pointers to zero-terminated
     strings.  This is a limiting restriction of the functionality of
     the `hsearch' functions.  They can only be used for data sets
     which use the NUL character always and solely to terminate the
     records.  It is not possible to handle general binary data.

    `char *key'
          Pointer to a zero-terminated string of characters describing
          the key for the search or the element in the hashing table.

    `char *data'
          Pointer to a zero-terminated string of characters describing
          the data.  If the functions will be called only for searching
          an existing entry this element might stay undefined since it
          is not used.

 - Function: ENTRY * hsearch (ENTRY ITEM, ACTION ACTION)
     To search in a hashing table created using `hcreate' the `hsearch'
     function must be used.  This function can perform a simple search
     for an element (if ACTION has the value `FIND') or it can
     alternatively insert the key element into the hashing table.
     Entries are never replaced.

     The key is denoted by a pointer to an object of type `ENTRY'.  For
     locating the corresponding position in the hashing table only the
     `key' element of the structure is used.

     If an entry with matching key is found the ACTION parameter is
     irrelevant.  The found entry is returned.  If no matching entry is
     found and the ACTION parameter has the value `FIND' the function
     returns a `NULL' pointer.  If no entry is found and the ACTION
     parameter has the value `ENTER' a new entry is added to the
     hashing table which is initialized with the parameter ITEM.  A
     pointer to the newly added entry is returned.
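
   A minimal sketch of the `ENTER' and `FIND' actions follows.  The
helper name `lookup_color' and the table contents are hypothetical.
All keys and data are string literals, so nothing has to be freed
before `hdestroy' is called; this relies on the GNU behavior described
above, where `hdestroy' leaves the elements alone.

```c
#include <search.h>
#include <stddef.h>

/* Hypothetical helper: build a small global table and look up NAME.
   Returns the stored data string, or NULL if NAME is unknown or the
   table could not be set up.  */
const char *
lookup_color (const char *name)
{
  static char *keys[] = { "red", "green", "blue" };
  static char *values[] = { "#ff0000", "#00ff00", "#0000ff" };
  ENTRY item, *found;
  const char *result = NULL;
  size_t i;

  if (hcreate (16) == 0)
    return NULL;

  for (i = 0; i < 3; ++i)
    {
      item.key = keys[i];
      item.data = values[i];
      if (hsearch (item, ENTER) == NULL)
        {
          hdestroy ();
          return NULL;
        }
    }

  /* Only the `key' member matters for a search.  */
  item.key = (char *) name;
  item.data = NULL;
  found = hsearch (item, FIND);
  if (found != NULL)
    result = found->data;

  hdestroy ();
  return result;
}
```

   For example, `lookup_color ("green")' yields the string stored with
that key, while a name that was never entered yields a null pointer.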

   As mentioned before the hashing table used by the functions
described so far is global and there can be at any time at most one
hashing table in the program.  A solution is to use the following
functions which are a GNU extension.  All have in common that they
operate on a hashing table which is described by the content of an
object of the type `struct hsearch_data'.  This type should be treated
as opaque; none of its members should be changed directly.

 - Function: int hcreate_r (size_t NEL, struct hsearch_data *HTAB)
     The `hcreate_r' function initializes the object pointed to by HTAB
     to contain a hashing table with at least NEL elements.  So this
     function is equivalent to the `hcreate' function except that the
     initialized data structure is controlled by the user.

     This allows having more than one hashing table at one time.  The
     memory necessary for the `struct hsearch_data' object can be
     allocated dynamically.  It must be initialized with zero before
     calling this function.

     The return value is non-zero if the operation was successful.  If
     the return value is zero, something went wrong, which probably
     means the program ran out of memory.

 - Function: void hdestroy_r (struct hsearch_data *HTAB)
     The `hdestroy_r' function frees all resources allocated by the
     `hcreate_r' function for this very same object HTAB.  As for
     `hdestroy' it is the program's responsibility to free the strings
     for the elements of the table.

 - Function: int hsearch_r (ENTRY ITEM, ACTION ACTION, ENTRY **RETVAL,
          struct hsearch_data *HTAB)
     The `hsearch_r' function is equivalent to `hsearch'.  The meaning
     of the first two arguments is identical.  But instead of operating
     on a single global hashing table the function works on the table
     described by the object pointed to by HTAB (which is initialized
     by a call to `hcreate_r').

     Another difference to `hsearch' is that the pointer to the found
     entry in the table is not the return value of the function.  It is
     returned by storing it in a pointer variable pointed to by the
     RETVAL parameter.  The return value of the function is an integer
     value indicating success if it is non-zero and failure if it is
     zero.  In the latter case the global variable ERRNO signals the
     reason for the failure.

    `ENOMEM'
          The table is filled and `hsearch_r' was called with a so far
          unknown key and ACTION set to `ENTER'.

    `ESRCH'
          The ACTION parameter is `FIND' and no corresponding element
          is found in the table.
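
   The reentrant functions can be combined as in the following sketch,
which keeps two independent tables alive at the same time.  The helper
names `table_contains' and `two_tables_demo' are hypothetical.  Since
these functions are GNU extensions, the sketch assumes that
`_GNU_SOURCE' is defined before any header is included.

```c
#define _GNU_SOURCE   /* hcreate_r and friends are GNU extensions */
#include <errno.h>
#include <search.h>
#include <string.h>

/* Hypothetical helper: returns 1 if KEY is in the table, 0 if it is
   absent (errno is ESRCH), and -1 on any other failure.  */
int
table_contains (struct hsearch_data *htab, const char *key)
{
  ENTRY item, *found;

  item.key = (char *) key;
  item.data = NULL;
  if (hsearch_r (item, FIND, &found, htab) != 0)
    return 1;
  return errno == ESRCH ? 0 : -1;
}

/* Build two independent tables and query both.  Returns 0 if
   everything behaved as described, nonzero otherwise.  */
int
two_tables_demo (void)
{
  struct hsearch_data fruits, tools;
  ENTRY item, *slot;
  int ok;

  /* The objects must be zeroed before hcreate_r is called.  */
  memset (&fruits, 0, sizeof fruits);
  memset (&tools, 0, sizeof tools);

  if (hcreate_r (8, &fruits) == 0)
    return 1;
  if (hcreate_r (8, &tools) == 0)
    {
      hdestroy_r (&fruits);
      return 1;
    }

  item.key = "apple";
  item.data = "fruit";
  ok = hsearch_r (item, ENTER, &slot, &fruits) != 0;

  item.key = "hammer";
  item.data = "tool";
  ok = ok && hsearch_r (item, ENTER, &slot, &tools) != 0;

  /* Each key is visible only in the table it was entered into.  */
  ok = ok
       && table_contains (&fruits, "apple") == 1
       && table_contains (&fruits, "hammer") == 0
       && table_contains (&tools, "hammer") == 1;

  hdestroy_r (&tools);
  hdestroy_r (&fruits);
  return ok ? 0 : 1;
}
```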


File: libc.info,  Node: Tree Search Function,  Prev: Hash Search Function,  Up: Searching and Sorting

The `tsearch' function.
=======================

Another common form to organize data for efficient search is to use
trees.  The `tsearch' function family provides a nice interface to
functions to organize possibly large amounts of data by providing a mean
access time proportional to the logarithm of the number of elements.
The GNU C library implementation guarantees that this bound is never
exceeded, even for input data which cause problems for simple binary
tree implementations.

   Unless noted otherwise, the functions described in this section are
specified in the System V and X/Open specifications and are therefore
quite portable.

   In contrast to the `hsearch' functions the `tsearch' functions can
be used with arbitrary data and not only zero-terminated strings.

   The `tsearch' functions have the advantage that no function to
initialize data structures is necessary.  A simple pointer of type
`void *' initialized to `NULL' is a valid tree and can be extended or
searched.  The prototypes for these functions can be found in the
header file `search.h'.

 - Function: void * tsearch (const void *KEY, void **ROOTP,
          comparison_fn_t COMPAR)
     The `tsearch' function searches in the tree pointed to by `*ROOTP'
     for an element matching KEY.  The function pointed to by COMPAR is
     used to determine whether two elements match.  *Note Comparison
     Functions::, for a specification of the functions which can be
     used for the COMPAR parameter.

     If the tree does not contain a matching entry the KEY value will
     be added to the tree.  `tsearch' does not make a copy of the object
     pointed to by KEY (how could it, since the size is unknown?).
     Instead it adds a reference to this object which means the object
     must be available as long as the tree data structure is used.

     The tree is represented by a pointer to a pointer since it is
     sometimes necessary to change the root node of the tree.  So it
     must not be assumed that the variable pointed to by ROOTP has the
     same value after the call.  This also shows that it is not safe to
     call the `tsearch' function more than once at the same time using
     the same tree.  It is no problem to run it more than once at a
     time on different trees.

     The return value is a pointer to the matching element in the tree.
     If a new element was created the pointer points to the new data
     (which is in fact KEY).  If an entry had to be created and the
     program ran out of space `NULL' is returned.

 - Function: void * tfind (const void *KEY, void *const *ROOTP,
          comparison_fn_t COMPAR)
     The `tfind' function is similar to the `tsearch' function.  It
     locates an element matching the one pointed to by KEY and returns
     a pointer to this element.  But if no matching element is
     available no new element is entered (note that the ROOTP parameter
     points to a constant pointer).  Instead the function returns
     `NULL'.
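
   A short sketch of `tsearch' and `tfind' working on a tree of strings
follows.  The helper names `add_word' and `find_word' are hypothetical.
The sketch relies on the fact that, in the GNU implementation, the
pointer returned by these functions can be dereferenced once to obtain
the pointer originally passed as KEY.

```c
#include <search.h>
#include <stddef.h>
#include <string.h>

/* Comparison function: the tree elements are C strings.  */
static int
compare_strings (const void *a, const void *b)
{
  return strcmp (a, b);
}

/* Hypothetical helper: add WORD to the tree unless an equal string is
   already present.  Returns 1 if WORD was added, 0 if it was already
   there, and -1 if no memory was available.  The tree keeps only a
   reference, so WORD must outlive the tree.  */
int
add_word (void **rootp, const char *word)
{
  if (tfind (word, rootp, compare_strings) != NULL)
    return 0;
  return tsearch (word, rootp, compare_strings) != NULL ? 1 : -1;
}

/* Hypothetical helper: return the string stored in the tree that
   matches WORD, or NULL if there is none.  */
const char *
find_word (void *const *rootp, const char *word)
{
  void *node = tfind (word, rootp, compare_strings);
  return node != NULL ? *(const char *const *) node : NULL;
}
```

   An empty tree is simply a `void *' variable set to `NULL' whose
address is passed as ROOTP.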

   Another advantage of the `tsearch' functions in contrast to the
`hsearch' functions is that there is an easy way to remove elements.

 - Function: void * tdelete (const void *KEY, void **ROOTP,
          comparison_fn_t COMPAR)
     To remove a specific element matching KEY from the tree `tdelete'
     can be used.  It locates the matching element using the same
     method as `tfind'.  The corresponding element is then removed and
     a pointer to the parent of the deleted node is returned by the
     function.  If there is no matching entry in the tree nothing can be
     deleted and the function returns `NULL'.  If the root of the tree
     is deleted `tdelete' returns some unspecified value not equal to
     `NULL'.
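
   Because the tree stores only references, removing an element whose
data was allocated dynamically takes two steps: fetch the stored
pointer, then delete the node.  The following sketch illustrates this;
the helper names `insert_copy' and `remove_word' are hypothetical.

```c
#include <search.h>
#include <stdlib.h>
#include <string.h>

static int
compare_strings (const void *a, const void *b)
{
  return strcmp (a, b);
}

/* Hypothetical helper: store a heap-allocated copy of WORD in the
   tree.  Returns 0 on success (or if WORD is already present) and -1
   if memory ran out.  */
int
insert_copy (void **rootp, const char *word)
{
  char *copy;

  if (tfind (word, rootp, compare_strings) != NULL)
    return 0;
  copy = malloc (strlen (word) + 1);
  if (copy == NULL)
    return -1;
  strcpy (copy, word);
  if (tsearch (copy, rootp, compare_strings) == NULL)
    {
      free (copy);
      return -1;
    }
  return 0;
}

/* Hypothetical helper: remove WORD and free the stored copy.
   Returns 1 if an element was removed and 0 if none matched.  */
int
remove_word (void **rootp, const char *word)
{
  void *node = tfind (word, rootp, compare_strings);
  char *stored;

  if (node == NULL)
    return 0;
  /* Remember the stored string before the node goes away.  */
  stored = *(char **) node;
  tdelete (word, rootp, compare_strings);
  free (stored);
  return 1;
}
```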

 - Function: void tdestroy (void *VROOT, __free_fn_t FREEFCT)
     If the complete search tree has to be removed one can use
     `tdestroy'.  It frees all resources allocated by the `tsearch'
     function to generate the tree pointed to by VROOT.

     For the data in each tree node the function FREEFCT is called.
     The pointer to the data is passed as the argument to the function.
     If no such work is necessary FREEFCT must point to a function
     doing nothing.  It is called in any case.

     This function is a GNU extension and not covered by the System V or
     X/Open specifications.

   In addition to the functions to create and destroy the tree data
structure, there is another function which allows you to apply a
function to all elements of the tree.  The function must have this type:

     void __action_fn_t (const void *nodep, VISIT value, int level);

   NODEP points to the current node; dereferencing it yields the data
pointer stored in that node (once given as the KEY argument to
`tsearch').  LEVEL is a numeric value which corresponds to the depth of
the current node in the tree.  The root node has the depth 0 and its
children have a depth of 1 and so on.  The `VISIT' type is an
enumeration type.

 - Data Type: VISIT
     The `VISIT' value indicates the status of the current node in the
     tree and how the function is called.  The status of a node is
     either `leaf' or `internal node'.  For each leaf node the function
     is called exactly once, for each internal node it is called three
     times: before the first child is processed, after the first child
     is processed and after both children are processed.  This makes it
     possible to handle all three methods of tree traversal (or even a
     combination of them).

    `preorder'
          The current node is an internal node and the function is
          called before the first child was processed.

    `postorder'
          The current node is an internal node and the function is
          called after the first child was processed.

    `endorder'
          The current node is an internal node and the function is
          called after the second child was processed.

    `leaf'
          The current node is a leaf.

 - Function: void twalk (const void *ROOT, __action_fn_t ACTION)
     For each node in the tree whose root node is pointed to by ROOT,
     the `twalk' function calls the function provided by the parameter
     ACTION.  For leaf nodes the function is called exactly once with
     VALUE set to `leaf'.  For internal nodes the function is called
     three times, setting the VALUE parameter of ACTION to the
     appropriate value.  The LEVEL argument for the ACTION function is
     computed while descending the tree by increasing the value by one
     for each descent to a child, starting with the value 0 for the
     root node.

     Since the functions used for the ACTION parameter to `twalk' must
     not modify the tree data, it is safe to run `twalk' in more than
     one thread at the same time, working on the same tree.  It is also
     safe to call `tfind' in parallel.  Functions which modify the tree
     must not be used, otherwise the behavior is undefined.
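
   As an illustration, the sketch below uses `twalk' to list the
strings of a tree in sorted order.  Recording a node when the action is
called with `postorder' (between the two subtrees) or `leaf' yields an
in-order traversal.  The names `sort_words', `collect_in_order',
`sorted_words' and `MAX_WORDS' are hypothetical; a file-scope buffer is
used because the action function receives no user-supplied argument.

```c
#include <search.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 16

static const char *sorted_words[MAX_WORDS];
static size_t sorted_count;

static int
compare_strings (const void *a, const void *b)
{
  return strcmp (a, b);
}

/* Action for twalk: NODEP is dereferenced once to reach the string
   that was given to tsearch.  */
static void
collect_in_order (const void *nodep, VISIT which, int depth)
{
  (void) depth;
  if ((which == postorder || which == leaf) && sorted_count < MAX_WORDS)
    sorted_words[sorted_count++] = *(const char *const *) nodep;
}

/* Hypothetical helper: put the N strings of WORDS into a tree, walk
   it, and leave the distinct strings in `sorted_words'.  Returns the
   number of strings collected, or (size_t) -1 if memory ran out.  */
size_t
sort_words (const char *const *words, size_t n)
{
  void *root = NULL;
  size_t result, i;

  for (i = 0; i < n; ++i)
    if (tsearch (words[i], &root, compare_strings) == NULL)
      break;

  sorted_count = 0;
  if (i == n)
    twalk (root, collect_in_order);
  result = i == n ? sorted_count : (size_t) -1;

  /* Release the nodes by deleting the root until the tree is empty.  */
  while (root != NULL)
    tdelete (*(const char *const *) root, &root, compare_strings);
  return result;
}
```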


File: libc.info,  Node: Pattern Matching,  Next: I/O Overview,  Prev: Searching and Sorting,  Up: Top

Pattern Matching
****************

The GNU C Library provides pattern matching facilities for two kinds of
patterns: regular expressions and file-name wildcards.  The library also
provides a facility for expanding variable and command references and
parsing text into words in the way the shell does.

* Menu:

* Wildcard Matching::    Matching a wildcard pattern against a single string.
* Globbing::             Finding the files that match a wildcard pattern.
* Regular Expressions::  Matching regular expressions against strings.
* Word Expansion::       Expanding shell variables, nested commands,
			    arithmetic, and wildcards.
			    This is what the shell does with shell commands.


File: libc.info,  Node: Wildcard Matching,  Next: Globbing,  Up: Pattern Matching

Wildcard Matching
=================

This section describes how to match a wildcard pattern against a
particular string.  The result is a yes or no answer: does the string
fit the pattern or not.  The symbols described here are all declared in
`fnmatch.h'.

 - Function: int fnmatch (const char *PATTERN, const char *STRING, int
          FLAGS)
     This function tests whether the string STRING matches the pattern
     PATTERN.  It returns `0' if they do match; otherwise, it returns
     the nonzero value `FNM_NOMATCH'.  The arguments PATTERN and STRING
     are both strings.

     The argument FLAGS is a combination of flag bits that alter the
     details of matching.  See below for a list of the defined flags.

     In the GNU C Library, `fnmatch' cannot experience an "error"--it
     always returns an answer for whether the match succeeds.  However,
     other implementations of `fnmatch' might sometimes report "errors".
     They would do so by returning nonzero values that are not equal to
     `FNM_NOMATCH'.

   These are the available flags for the FLAGS argument:

`FNM_FILE_NAME'
     Treat the `/' character specially, for matching file names.  If
     this flag is set, wildcard constructs in PATTERN cannot match `/'
     in STRING.  Thus, the only way to match `/' is with an explicit
     `/' in PATTERN.

`FNM_PATHNAME'
     This is an alias for `FNM_FILE_NAME'; it comes from POSIX.2.  We
     don't recommend this name because we don't use the term "pathname"
     for file names.

`FNM_PERIOD'
     Treat the `.' character specially if it appears at the beginning of
     STRING.  If this flag is set, wildcard constructs in PATTERN
     cannot match `.' as the first character of STRING.

     If you set both `FNM_PERIOD' and `FNM_FILE_NAME', then the special
     treatment applies to `.' following `/' as well as to `.' at the
     beginning of STRING.  (The shell uses the `FNM_PERIOD' and
     `FNM_FILE_NAME' flags together for matching file names.)

`FNM_NOESCAPE'
     Don't treat the `\' character specially in patterns.  Normally,
     `\' quotes the following character, turning off its special meaning
     (if any) so that it matches only itself.  When quoting is enabled,
     the pattern `\?' matches only the string `?', because the question
     mark in the pattern acts like an ordinary character.

     If you use `FNM_NOESCAPE', then `\' is an ordinary character.

`FNM_LEADING_DIR'
     Ignore a trailing sequence of characters starting with a `/' in
     STRING; that is to say, test whether STRING starts with a
     directory name that PATTERN matches.

     If this flag is set, either `foo*' or `foobar' as a pattern would
     match the string `foobar/frobozz'.

`FNM_CASEFOLD'
     Ignore case in comparing STRING to PATTERN.

`FNM_EXTMATCH'
     Besides the normal patterns, also recognize the extended patterns
     introduced in `ksh'.  The patterns are written in the form
     explained in the following table where PATTERN-LIST is a `|'
     separated list of patterns.

    `?(PATTERN-LIST)'
          The pattern matches if zero or one occurrences of any of the
          patterns in the PATTERN-LIST allow matching the input string.

    `*(PATTERN-LIST)'
          The pattern matches if zero or more occurrences of any of the
          patterns in the PATTERN-LIST allow matching the input string.

    `+(PATTERN-LIST)'
          The pattern matches if one or more occurrences of any of the
          patterns in the PATTERN-LIST allow matching the input string.

    `@(PATTERN-LIST)'
          The pattern matches if exactly one occurrence of any of the
          patterns in the PATTERN-LIST allows matching the input string.

    `!(PATTERN-LIST)'
          The pattern matches if the input string cannot be matched
          with any of the patterns in the PATTERN-LIST.
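
   The effect of the file name flags can be tried out with a sketch
like the following.  The helper name `matches_like_shell' is
hypothetical.  The sketch uses the POSIX spelling `FNM_PATHNAME' of the
`FNM_FILE_NAME' flag on the assumption that it is available without
any feature test macro.

```c
#include <fnmatch.h>

/* Hypothetical helper: match NAME against PATTERN the way a shell
   matches file names.  Returns 1 for a match, 0 for no match, and -1
   if the implementation reports an error.  */
int
matches_like_shell (const char *pattern, const char *name)
{
  int rc = fnmatch (pattern, name, FNM_PATHNAME | FNM_PERIOD);

  if (rc == 0)
    return 1;
  if (rc == FNM_NOMATCH)
    return 0;
  /* Not expected from the GNU C Library, but possible elsewhere.  */
  return -1;
}
```

   With these flags `*.c' matches `main.c' but not `src/main.c', and
`*' does not match `.hidden'.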


File: libc.info,  Node: Globbing,  Next: Regular Expressions,  Prev: Wildcard Matching,  Up: Pattern Matching

Globbing
========

The archetypal use of wildcards is for matching against the files in a
directory, and making a list of all the matches.  This is called
"globbing".

   You could do this using `fnmatch', by reading the directory entries
one by one and testing each one with `fnmatch'.  But that would be slow
(and complex, since you would have to handle subdirectories by hand).

   The library provides a function `glob' to make this particular use
of wildcards convenient.  `glob' and the other symbols in this section
are declared in `glob.h'.

* Menu:

* Calling Glob::             Basic use of `glob'.
* Flags for Globbing::       Flags that enable various options in `glob'.
* More Flags for Globbing::  GNU specific extensions to `glob'.


File: libc.info,  Node: Calling Glob,  Next: Flags for Globbing,  Up: Globbing

Calling `glob'
--------------

The result of globbing is a vector of file names (strings).  To return
this vector, `glob' uses a special data type, `glob_t', which is a
structure.  You pass `glob' the address of the structure, and it fills
in the structure's fields to tell you about the results.

 - Data Type: glob_t
     This data type holds a pointer to a word vector.  More precisely,
     it records both the address of the word vector and its size.  The
     GNU implementation contains some more fields which are non-standard
     extensions.

    `gl_pathc'
          The number of elements in the vector, excluding the initial
          null entries if the `GLOB_DOOFFS' flag is used (see `gl_offs'
          below).

    `gl_pathv'
          The address of the vector.  This field has type `char **'.

    `gl_offs'
          The offset of the first real element of the vector, from its
          nominal address in the `gl_pathv' field.  Unlike the other
          fields, this is always an input to `glob', rather than an
          output from it.

          If you use a nonzero offset, then that many elements at the
          beginning of the vector are left empty.  (The `glob' function
          fills them with null pointers.)

          The `gl_offs' field is meaningful only if you use the
          `GLOB_DOOFFS' flag.  Otherwise, the offset is always zero
          regardless of what is in this field, and the first real
          element comes at the beginning of the vector.

    `gl_closedir'
          The address of an alternative implementation of the `closedir'
          function.  It is used if the `GLOB_ALTDIRFUNC' bit is set in
          the flag parameter.  The type of this field is
          `void (*) (void *)'.

          This is a GNU extension.

    `gl_readdir'
          The address of an alternative implementation of the `readdir'
          function used to read the contents of a directory.  It is
          used if the `GLOB_ALTDIRFUNC' bit is set in the flag
          parameter.  The type of this field is
          `struct dirent *(*) (void *)'.

          This is a GNU extension.

    `gl_opendir'
          The address of an alternative implementation of the `opendir'
          function.  It is used if the `GLOB_ALTDIRFUNC' bit is set in
          the flag parameter.  The type of this field is
          `void *(*) (const char *)'.

          This is a GNU extension.

    `gl_stat'
          The address of an alternative implementation of the `stat'
          function to get information about an object in the
          filesystem.  It is used if the `GLOB_ALTDIRFUNC' bit is set
          in the flag parameter.  The type of this field is
          `int (*) (const char *, struct stat *)'.

          This is a GNU extension.

    `gl_lstat'
          The address of an alternative implementation of the `lstat'
          function to get information about an object in the
          filesystem, not following symbolic links.  It is used if the
          `GLOB_ALTDIRFUNC' bit is set in the flag parameter.  The type
          of this field is `int (*) (const char *, struct stat *)'.

          This is a GNU extension.

   For use in the `glob64' function `glob.h' contains another
definition for a very similar type.  `glob64_t' differs from `glob_t'
only in the types of the members `gl_readdir', `gl_stat', and
`gl_lstat'.

 - Data Type: glob64_t
     This data type holds a pointer to a word vector.  More precisely,
     it records both the address of the word vector and its size.  The
     GNU implementation contains some more fields which are non-standard
     extensions.

    `gl_pathc'
          The number of elements in the vector, excluding the initial
          null entries if the `GLOB_DOOFFS' flag is used (see `gl_offs'
          below).

    `gl_pathv'
          The address of the vector.  This field has type `char **'.

    `gl_offs'
          The offset of the first real element of the vector, from its
          nominal address in the `gl_pathv' field.  Unlike the other
          fields, this is always an input to `glob', rather than an
          output from it.

          If you use a nonzero offset, then that many elements at the
          beginning of the vector are left empty.  (The `glob' function
          fills them with null pointers.)

          The `gl_offs' field is meaningful only if you use the
          `GLOB_DOOFFS' flag.  Otherwise, the offset is always zero
          regardless of what is in this field, and the first real
          element comes at the beginning of the vector.

    `gl_closedir'
          The address of an alternative implementation of the `closedir'
          function.  It is used if the `GLOB_ALTDIRFUNC' bit is set in
          the flag parameter.  The type of this field is
          `void (*) (void *)'.

          This is a GNU extension.

    `gl_readdir'
          The address of an alternative implementation of the
          `readdir64' function used to read the contents of a
          directory.  It is used if the `GLOB_ALTDIRFUNC' bit is set in
          the flag parameter.  The type of this field is
          `struct dirent64 *(*) (void *)'.

          This is a GNU extension.

    `gl_opendir'
          The address of an alternative implementation of the `opendir'
          function.  It is used if the `GLOB_ALTDIRFUNC' bit is set in
          the flag parameter.  The type of this field is
          `void *(*) (const char *)'.

          This is a GNU extension.

    `gl_stat'
          The address of an alternative implementation of the `stat64'
          function to get information about an object in the
          filesystem.  It is used if the `GLOB_ALTDIRFUNC' bit is set
          in the flag parameter.  The type of this field is
          `int (*) (const char *, struct stat64 *)'.

          This is a GNU extension.

    `gl_lstat'
          The address of an alternative implementation of the `lstat64'
          function to get information about an object in the
          filesystem, not following symbolic links.  It is used if the
          `GLOB_ALTDIRFUNC' bit is set in the flag parameter.  The type
          of this field is `int (*) (const char *, struct stat64 *)'.

          This is a GNU extension.

 - Function: int glob (const char *PATTERN, int FLAGS, int (*ERRFUNC)
          (const char *FILENAME, int ERROR-CODE), glob_t *VECTOR-PTR)
     The function `glob' does globbing using the pattern PATTERN in the
     current directory.  It puts the result in a newly allocated
     vector, and stores the size and address of this vector into
     `*VECTOR-PTR'.  The argument FLAGS is a combination of bit flags;
     see *Note Flags for Globbing::, for details of the flags.

     The result of globbing is a sequence of file names.  The function
     `glob' allocates a string for each resulting word, then allocates
     a vector of type `char **' to store the addresses of these
     strings.  The last element of the vector is a null pointer.  This
     vector is called the "word vector".

     To return this vector, `glob' stores both its address and its
     length (number of elements, not counting the terminating null
     pointer) into `*VECTOR-PTR'.

     Normally, `glob' sorts the file names alphabetically before
     returning them.  You can turn this off with the flag `GLOB_NOSORT'
     if you want to get the information as fast as possible.  Usually
     it's a good idea to let `glob' sort them--if you process the files
     in alphabetical order, the users will have a feel for the rate of
     progress that your application is making.

     If `glob' succeeds, it returns 0.  Otherwise, it returns one of
     these error codes:

    `GLOB_ABORTED'
          There was an error opening a directory, and you used the flag
          `GLOB_ERR' or your specified ERRFUNC returned a nonzero value.
          *Note Flags for Globbing::, for an explanation of the
          `GLOB_ERR' flag and ERRFUNC.

    `GLOB_NOMATCH'
          The pattern didn't match any existing files.  If you use the
          `GLOB_NOCHECK' flag, then you never get this error code,
          because that flag tells `glob' to _pretend_ that the pattern
          matched at least one file.

    `GLOB_NOSPACE'
          It was impossible to allocate memory to hold the result.

     In the event of an error, `glob' stores information in
     `*VECTOR-PTR' about all the matches it has found so far.

     It is important to notice that the `glob' function will not fail if
     it encounters directories or files which cannot be handled without
     the LFS interfaces.  The implementation of `glob' is supposed to
     use these functions internally.  This at least is the assumption
     made by the Unix standard.  The GNU extension of allowing the user
     to provide their own directory handling and `stat' functions
     complicates things a bit.  If these callback functions are used
     and a large file or directory is encountered `glob' _can_ fail.
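
   A basic call of `glob' might look like the following sketch.  The
helper name `list_matches' is hypothetical.  The sketch also calls
`globfree', a standard function declared in `glob.h' that releases the
word vector; it is not described in this section.

```c
#include <glob.h>
#include <stdio.h>

/* Hypothetical helper: print every file name matching PATTERN to OUT,
   one per line.  Returns the number of matches, 0 if nothing matched,
   and -1 on any other error.  */
int
list_matches (const char *pattern, FILE *out)
{
  glob_t results;
  size_t i;
  int count;
  int rc = glob (pattern, 0, NULL, &results);

  if (rc == GLOB_NOMATCH)
    count = 0;
  else if (rc != 0)
    count = -1;               /* GLOB_ABORTED or GLOB_NOSPACE */
  else
    {
      for (i = 0; i < results.gl_pathc; ++i)
        fprintf (out, "%s\n", results.gl_pathv[i]);
      count = (int) results.gl_pathc;
    }

  /* Partial results may be stored even after an error.  */
  globfree (&results);
  return count;
}
```

   For example, `list_matches ("*.c", stdout)' prints the C source
files of the current directory in sorted order.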

 - Function: int glob64 (const char *PATTERN, int FLAGS, int (*ERRFUNC)
          (const char *FILENAME, int ERROR-CODE), glob64_t *VECTOR-PTR)
     The `glob64' function was added as part of the Large File Summit
     extensions but is not part of the original LFS proposal.  The
     reason for this is simple: it is not necessary there.  The need
     for a `glob64' function arises from the extensions of the GNU
     `glob' implementation, which allow the user to provide their own
     directory handling and `stat' functions.  The `readdir' and `stat'
     functions do depend on the choice of `_FILE_OFFSET_BITS' since the
     definition of the types `struct dirent' and `struct stat' will
     change depending on the choice.

     Besides this difference, `glob64' works just like `glob' in all
     aspects.

     This function is a GNU extension.


File: libc.info,  Node: Flags for Globbing,  Next: More Flags for Globbing,  Prev: Calling Glob,  Up: Globbing

Flags for Globbing
------------------

This section describes the flags that you can specify in the FLAGS
argument to `glob'.  Choose the flags you want, and combine them with
the C bitwise OR operator `|'.

`GLOB_APPEND'
     Append the words from this expansion to the vector of words
     produced by previous calls to `glob'.  This way you can
     effectively expand several words as if they were concatenated with
     spaces between them.

     In order for appending to work, you must not modify the contents
     of the word vector structure between calls to `glob'.  And, if you
     set `GLOB_DOOFFS' in the first call to `glob', you must also set
     it when you append to the results.

     Note that the pointer stored in `gl_pathv' may no longer be valid
     after you call `glob' the second time, because `glob' might have
     relocated the vector.  So always fetch `gl_pathv' from the
     `glob_t' structure after each `glob' call; *never* save the
     pointer across calls.

`GLOB_DOOFFS'
     Leave blank slots at the beginning of the vector of words.  The
     `gl_offs' field says how many slots to leave.  The blank slots
     contain null pointers.

`GLOB_ERR'
     Give up right away and report an error if there is any difficulty
     reading the directories that must be read in order to expand
     PATTERN fully.  Such difficulties might include a directory in
     which you don't have the requisite access.  Normally, `glob' tries
     its best to keep on going despite any errors, reading whatever
     directories it can.

     You can exercise even more control than this by specifying an
     error-handler function ERRFUNC when you call `glob'.  If ERRFUNC
     is not a null pointer, then `glob' doesn't give up right away when
     it can't read a directory; instead, it calls ERRFUNC with two
     arguments, like this:

          (*ERRFUNC) (FILENAME, ERROR-CODE)

     The argument FILENAME is the name of the directory that `glob'
     couldn't open or couldn't read, and ERROR-CODE is the `errno'
     value that was reported to `glob'.

     If the error handler function returns nonzero, then `glob' gives up
     right away.  Otherwise, it continues.

`GLOB_MARK'
     If the pattern matches the name of a directory, append `/' to the
     directory's name when returning it.

`GLOB_NOCHECK'
     If the pattern doesn't match any file names, return the pattern
     itself as if it were a file name that had been matched.
     (Normally, when the pattern doesn't match anything, `glob' returns
     that there were no matches.)

`GLOB_NOSORT'
     Don't sort the file names; return them in no particular order.
     (In practice, the order will depend on the order of the entries in
     the directory.)  The only reason _not_ to sort is to save time.

`GLOB_NOESCAPE'
     Don't treat the `\' character specially in patterns.  Normally,
     `\' quotes the following character, turning off its special meaning
     (if any) so that it matches only itself.  When quoting is enabled,
     the pattern `\?' matches only the string `?', because the question
     mark in the pattern acts like an ordinary character.

     If you use `GLOB_NOESCAPE', then `\' is an ordinary character.

     `glob' does its work by calling the function `fnmatch' repeatedly.
     It handles the flag `GLOB_NOESCAPE' by turning on the
     `FNM_NOESCAPE' flag in calls to `fnmatch'.
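
   The flags `GLOB_DOOFFS' and `GLOB_APPEND' are typically combined to
build an argument vector for another program.  The following sketch
reserves two leading slots and fills them after both expansions are
done.  The helper name `build_ls_argv' is hypothetical, and `globfree'
is the standard function from `glob.h' that releases the vector.
`GLOB_NOCHECK' is used so that a pattern without matches is passed on
unchanged.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: fill *RESULTS with a vector of the form
   { "ls", "-l", matches of PAT1..., matches of PAT2..., NULL }.
   Returns 0 on success or the nonzero code reported by glob.  On
   success the caller releases the vector with globfree.  */
int
build_ls_argv (const char *pat1, const char *pat2, glob_t *results)
{
  int rc;

  results->gl_offs = 2;       /* reserve two leading slots */
  rc = glob (pat1, GLOB_DOOFFS | GLOB_NOCHECK, NULL, results);
  if (rc == 0)
    /* GLOB_DOOFFS must be repeated when appending.  */
    rc = glob (pat2, GLOB_DOOFFS | GLOB_NOCHECK | GLOB_APPEND,
               NULL, results);
  if (rc != 0)
    {
      globfree (results);
      return rc;
    }

  /* Fetch gl_pathv only now: the second call may have moved it.  */
  results->gl_pathv[0] = "ls";
  results->gl_pathv[1] = "-l";
  return 0;
}
```

   On success `results->gl_pathv' can be handed to a function such as
`execvp' as the argument vector.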


File: libc.info,  Node: More Flags for Globbing,  Prev: Flags for Globbing,  Up: Globbing

More Flags for Globbing
-----------------------

Besides the flags described in the last section, the GNU implementation
of `glob' allows a few more flags which are also defined in the
`glob.h' file.  Some of the extensions implement functionality which is
available in modern shell implementations.

`GLOB_PERIOD'
     The `.' character (period) is not treated specially: a leading
     period can be matched by wildcards.  (Without this flag, wildcard
     constructs cannot match a leading period.)  *Note Wildcard
     Matching::, `FNM_PERIOD'.

`GLOB_MAGCHAR'
     The `GLOB_MAGCHAR' value is not to be given to `glob' in the FLAGS
     parameter.  Instead, `glob' sets this bit in the GL_FLAGS element
     of the GLOB_T structure provided as the result if the pattern used
     for matching contains any wildcard character.

`GLOB_ALTDIRFUNC'
     Instead of using the normal functions for accessing the filesystem
     the `glob' implementation uses the user-supplied functions
     specified in the structure pointed to by the VECTOR-PTR parameter.
     For more information about the functions refer to the sections
     about directory handling; see *Note Accessing Directories::, and
     *Note Reading Attributes::.

`GLOB_BRACE'
     If this flag is given the handling of braces in the pattern is
     changed.  It is now required that braces appear correctly grouped.
     I.e., for each opening brace there must be a closing one.  Braces
     can be used recursively.  So it is possible to define one brace
     expression in another one.  It is important to note that the range
     of each brace expression is completely contained in the outer
     brace expression (if there is one).

     The string between the matching braces is separated into single
     expressions by splitting at `,' (comma) characters.  The commas
     themselves are discarded.  Please note what we said above about
     recursive brace expressions.  The commas used to separate the
     subexpressions must be at the same level.  Commas in brace
     subexpressions are not matched.  They are used during expansion of
     the brace expression of the deeper level.  The following example
     illustrates this.  The call

          glob ("{foo/{,bar,biz},baz}", GLOB_BRACE, NULL, &result)

     is equivalent to the sequence

          glob ("foo/", GLOB_BRACE, NULL, &result)
          glob ("foo/bar", GLOB_BRACE|GLOB_APPEND, NULL, &result)
          glob ("foo/biz", GLOB_BRACE|GLOB_APPEND, NULL, &result)
          glob ("baz", GLOB_BRACE|GLOB_APPEND, NULL, &result)

     if we leave aside error handling.

`GLOB_NOMAGIC'
     If the pattern contains no wildcard constructs (it is a literal
     file name), return it as the sole "matching" word, even if no file
     exists by that name.

`GLOB_TILDE'
     If this flag is used, the character `~' (tilde) is handled
     specially if it appears at the beginning of the pattern.  Instead
     of being taken verbatim, it is used to represent the home
     directory of a known user.

     If `~' is the only character in the pattern or it is followed by
     a `/' (slash), the home directory of the process owner is
     substituted.  Using `getlogin' and `getpwnam' the information is
     read from the system databases.  As an example take user `bart'
     with his home directory at `/home/bart'.  For him a call like

          glob ("~/bin/*", GLOB_TILDE, NULL, &result)

     would return the contents of the directory `/home/bart/bin'.
     Instead of referring to one's own home directory, it is also
     possible to name the home directory of another user.  To do so,
     append the user name to the tilde character.  So the contents of
     user `homer''s `bin' directory can be retrieved by

          glob ("~homer/bin/*", GLOB_TILDE, NULL, &result)

     If the user name is not valid or the home directory cannot be
     determined for some reason, the pattern is left untouched and
     itself used as the result.  I.e., if in the last example `homer'
     is not available, the tilde expansion yields `"~homer/bin/*"' and
     `glob' does not look for a directory named `~homer'.

     This functionality is equivalent to what is available in C-shells
     if the `nonomatch' flag is set.

`GLOB_TILDE_CHECK'
     If this flag is used, `glob' behaves as if `GLOB_TILDE' were
     given.  The only difference is that if the user name is not
     available or the home directory cannot be determined for other
     reasons, this leads to an error.  `glob' will return
     `GLOB_NOMATCH' instead of using the pattern itself as the name.

     This functionality is equivalent to what is available in C-shells
     if the `nonomatch' flag is not set.

`GLOB_ONLYDIR'
     If this flag is used, the globbing function takes this as a
     *hint* that the caller is only interested in directories matching
     the pattern.  If the information about the type of the file is
     easily available, non-directories will be rejected, but no extra
     work will be done to determine the information for each file.
     I.e., the caller must still be prepared to filter out
     non-directories itself.

     This functionality is only available with the GNU `glob'
     implementation.  It is mainly used internally to increase the
     performance but might be useful for a user as well and therefore is
     documented here.
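
   As a minimal sketch of how the GNU-specific flags above are passed
to `glob', the following helper counts the names a pattern matches
under a given set of flags.  The name `count_matches' is made up for
this illustration and is not part of the library; `_GNU_SOURCE' is
defined on the assumption that the extension flags are wanted.

```c
#define _GNU_SOURCE 1           /* GLOB_BRACE etc. are GNU extensions */
#include <assert.h>
#include <glob.h>
#include <string.h>

/* Hypothetical helper: return how many names PATTERN matches when
   `glob' is called with FLAGS, 0 if nothing matches, or -1 on any
   other error.  */
int
count_matches (const char *pattern, int flags)
{
  glob_t result;
  int status, count;

  assert (pattern != NULL);
  /* Start from a zeroed object so that `globfree' is harmless even
     if `glob' fails before storing anything.  */
  memset (&result, 0, sizeof result);
  status = glob (pattern, flags, NULL, &result);

  if (status == 0)
    count = (int) result.gl_pathc;
  else if (status == GLOB_NOMATCH)
    count = 0;
  else
    count = -1;

  globfree (&result);
  return count;
}
```

   For instance, `count_matches ("{.,..}", GLOB_BRACE)' should report
two names, because both alternatives exist in every directory, while a
`GLOB_TILDE_CHECK' lookup of a user name that does not exist reports no
match instead of returning the pattern itself.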

   Calling `glob' will in most cases allocate resources which are used
to represent the result of the function call.  If the same object of
type `glob_t' is used in multiple calls to `glob', the resources are
freed or reused so that no leaks appear.  But the resources still held
after the last `glob' call are not released automatically; that is the
job of `globfree'.

 - Function: void globfree (glob_t *PGLOB)
     The `globfree' function frees all resources allocated by previous
     calls to `glob' associated with the object pointed to by PGLOB.
     This function should be called whenever the `glob_t' object in
     use is no longer needed.

 - Function: void globfree64 (glob64_t *PGLOB)
     This function is equivalent to `globfree' but it frees records of
     type `glob64_t' which were allocated by `glob64'.
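
   The life cycle described above can be sketched as follows: several
`glob' calls accumulate their results in one `glob_t' object, and a
single `globfree' call releases everything at the end.  The helper
name `count_all' is an assumption made for this illustration.

```c
#include <assert.h>
#include <glob.h>
#include <string.h>

/* Hypothetical helper: run `glob' on each of the N patterns in
   PATTERNS, accumulating the matches in one `glob_t' object, and
   return the total number of names found (or -1 on error).  */
int
count_all (const char *const *patterns, size_t n)
{
  glob_t result;
  size_t i;
  int started = 0, failed = 0, total;

  assert (patterns != NULL);
  memset (&result, 0, sizeof result);

  for (i = 0; i < n; i++)
    {
      /* Append only once an earlier call has produced a list.  */
      int status = glob (patterns[i], started ? GLOB_APPEND : 0,
                         NULL, &result);
      if (status == 0)
        started = 1;
      else if (status != GLOB_NOMATCH)
        {
          failed = 1;
          break;
        }
    }

  total = failed ? -1 : (int) result.gl_pathc;

  /* One `globfree' call releases what all the calls allocated.  */
  globfree (&result);
  return total;
}
```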


File: libc.info,  Node: Regular Expressions,  Next: Word Expansion,  Prev: Globbing,  Up: Pattern Matching

Regular Expression Matching
===========================

The GNU C library supports two interfaces for matching regular
expressions.  One is the standard POSIX.2 interface, and the other is
what the GNU system has had for many years.

   Both interfaces are declared in the header file `regex.h'.  If you
define `_POSIX_C_SOURCE', then only the POSIX.2 functions, structures,
and constants are declared.

* Menu:

* POSIX Regexp Compilation::    Using `regcomp' to prepare to match.
* Flags for POSIX Regexps::     Syntax variations for `regcomp'.
* Matching POSIX Regexps::      Using `regexec' to match the compiled
				   pattern that you get from `regcomp'.
* Regexp Subexpressions::       Finding which parts of the string were matched.
* Subexpression Complications:: Fine points of which parts were matched.
* Regexp Cleanup::		Freeing storage; reporting errors.


File: libc.info,  Node: POSIX Regexp Compilation,  Next: Flags for POSIX Regexps,  Up: Regular Expressions

POSIX Regular Expression Compilation
------------------------------------

Before you can actually match a regular expression, you must "compile"
it.  This is not true compilation--it produces a special data
structure, not machine instructions.  But it is like ordinary
compilation in that its purpose is to enable you to "execute" the
pattern fast.  (*Note Matching POSIX Regexps::, for how to use the
compiled regular expression for matching.)

   There is a special data type for compiled regular expressions:

 - Data Type: regex_t
     This type of object holds a compiled regular expression.  It is
     actually a structure.  It has just one field that your programs
     should look at:

    `re_nsub'
          This field holds the number of parenthetical subexpressions
          in the regular expression that was compiled.

     There are several other fields, but we don't describe them here,
     because only the functions in the library should use them.

   After you create a `regex_t' object, you can compile a regular
expression into it by calling `regcomp'.

 - Function: int regcomp (regex_t *COMPILED, const char *PATTERN, int
          CFLAGS)
     The function `regcomp' "compiles" a regular expression into a data
     structure that you can use with `regexec' to match against a
     string.  The compiled regular expression format is designed for
     efficient matching.  `regcomp' stores it into `*COMPILED'.

     It's up to you to allocate an object of type `regex_t' and pass its
     address to `regcomp'.

     The argument CFLAGS lets you specify various options that control
     the syntax and semantics of regular expressions.  *Note Flags for
     POSIX Regexps::.

     If you use the flag `REG_NOSUB', then `regcomp' omits from the
     compiled regular expression the information necessary to record
     how subexpressions actually match.  In this case, you might as well
     pass `0' for the MATCHPTR and NMATCH arguments when you call
     `regexec'.

     If you don't use `REG_NOSUB', then the compiled regular expression
     does have the capacity to record how subexpressions match.  Also,
     `regcomp' tells you how many subexpressions PATTERN has, by
     storing the number in `COMPILED->re_nsub'.  You can use that value
     to decide how long an array to allocate to hold information about
     subexpression matches.

     `regcomp' returns `0' if it succeeds in compiling the regular
     expression; otherwise, it returns a nonzero error code (see the
     table below).  You can use `regerror' to produce an error message
     string describing the reason for a nonzero value; see *Note Regexp
     Cleanup::.


   Here are the possible nonzero values that `regcomp' can return:

`REG_BADBR'
     There was an invalid `\{...\}' construct in the regular
     expression.  A valid `\{...\}' construct must contain either a
     single number, or two numbers in increasing order separated by a
     comma.

`REG_BADPAT'
     There was a syntax error in the regular expression.

`REG_BADRPT'
     A repetition operator such as `?' or `*' appeared in a bad
     position (with no preceding subexpression to act on).

`REG_ECOLLATE'
     The regular expression referred to an invalid collating element
     (one not defined in the current locale for string collation).
     *Note Locale Categories::.

`REG_ECTYPE'
     The regular expression referred to an invalid character class name.

`REG_EESCAPE'
     The regular expression ended with `\'.

`REG_ESUBREG'
     There was an invalid number in the `\DIGIT' construct.

`REG_EBRACK'
     There were unbalanced square brackets in the regular expression.

`REG_EPAREN'
     An extended regular expression had unbalanced parentheses, or a
     basic regular expression had unbalanced `\(' and `\)'.

`REG_EBRACE'
     The regular expression had unbalanced `\{' and `\}'.

`REG_ERANGE'
     One of the endpoints in a range expression was invalid.

`REG_ESPACE'
     `regcomp' ran out of memory.
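
   A minimal sketch of compiling a pattern and reacting to the error
codes listed above might look like this.  The helper name
`try_compile' and the message buffer size are assumptions made for the
illustration, not part of the library.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper: compile PATTERN with CFLAGS, report any
   error on stderr, and return the status from `regcomp'.  */
int
try_compile (const char *pattern, int cflags)
{
  regex_t compiled;
  int status;

  assert (pattern != NULL);
  status = regcomp (&compiled, pattern, cflags);
  if (status != 0)
    {
      char message[128];        /* arbitrary size for this sketch */
      regerror (status, &compiled, message, sizeof message);
      fprintf (stderr, "cannot compile `%s': %s\n", pattern, message);
      return status;
    }

  /* `re_nsub' is the one field programs are meant to read.  */
  printf ("`%s' has %lu subexpression(s)\n",
          pattern, (unsigned long) compiled.re_nsub);
  regfree (&compiled);
  return 0;
}
```

   With the GNU implementation, a basic regular expression such as
`a\(b' is expected to yield `REG_EPAREN', and one that ends in a
backslash `REG_EESCAPE'.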


File: libc.info,  Node: Flags for POSIX Regexps,  Next: Matching POSIX Regexps,  Prev: POSIX Regexp Compilation,  Up: Regular Expressions

Flags for POSIX Regular Expressions
-----------------------------------

These are the bit flags that you can use in the CFLAGS operand when
compiling a regular expression with `regcomp'.

`REG_EXTENDED'
     Treat the pattern as an extended regular expression, rather than
     as a basic regular expression.

`REG_ICASE'
     Ignore case when matching letters.

`REG_NOSUB'
     Don't bother storing the contents of the MATCHPTR array.

`REG_NEWLINE'
     Treat a newline in STRING as dividing STRING into multiple lines,
     so that `$' can match before the newline and `^' can match after.
     Also, don't permit `.' to match a newline, and don't permit
     `[^...]' to match a newline.

     Otherwise, newline acts like any other ordinary character.


File: libc.info,  Node: Matching POSIX Regexps,  Next: Regexp Subexpressions,  Prev: Flags for POSIX Regexps,  Up: Regular Expressions

Matching a Compiled POSIX Regular Expression
--------------------------------------------

Once you have compiled a regular expression, as described in *Note
POSIX Regexp Compilation::, you can match it against strings using
`regexec'.  A match anywhere inside the string counts as success,
unless the regular expression contains anchor characters (`^' or `$').

 - Function: int regexec (regex_t *COMPILED, char *STRING, size_t
          NMATCH, regmatch_t MATCHPTR [], int EFLAGS)
     This function tries to match the compiled regular expression
     `*COMPILED' against STRING.

     `regexec' returns `0' if the regular expression matches;
     otherwise, it returns a nonzero value.  See the table below for
     what nonzero values mean.  You can use `regerror' to produce an
     error message string describing the reason for a nonzero value;
     see *Note Regexp Cleanup::.

     The argument EFLAGS is a word of bit flags that enable various
     options.

     If you want to get information about what part of STRING actually
     matched the regular expression or its subexpressions, use the
     arguments MATCHPTR and NMATCH.  Otherwise, pass `0' for NMATCH,
     and `NULL' for MATCHPTR.  *Note Regexp Subexpressions::.

   You must match the regular expression with the same set of current
locales that were in effect when you compiled the regular expression.

   The function `regexec' accepts the following flags in the EFLAGS
argument:

`REG_NOTBOL'
     Do not regard the beginning of the specified string as the
     beginning of a line; more generally, don't make any assumptions
     about what text might precede it.

`REG_NOTEOL'
     Do not regard the end of the specified string as the end of a
     line; more generally, don't make any assumptions about what text
     might follow it.

   Here are the possible nonzero values that `regexec' can return:

`REG_NOMATCH'
     The pattern didn't match the string.  This isn't really an error.

`REG_ESPACE'
     `regexec' ran out of memory.


File: libc.info,  Node: Regexp Subexpressions,  Next: Subexpression Complications,  Prev: Matching POSIX Regexps,  Up: Regular Expressions

Match Results with Subexpressions
---------------------------------

When `regexec' matches parenthetical subexpressions of PATTERN, it
records which parts of STRING they match.  It returns that information
by storing the offsets into an array whose elements are structures of
type `regmatch_t'.  The first element of the array (index `0') records
the part of the string that matched the entire regular expression.
Each other element of the array records the beginning and end of the
part that matched a single parenthetical subexpression.

 - Data Type: regmatch_t
     This is the data type of the MATCHPTR array that you pass to
     `regexec'.  It contains two structure fields, as follows:

    `rm_so'
          The offset in STRING of the beginning of a substring.  Add
          this value to STRING to get the address of that part.

    `rm_eo'
          The offset in STRING of the end of the substring.

 - Data Type: regoff_t
     `regoff_t' is an alias for another signed integer type.  The
     fields of `regmatch_t' have type `regoff_t'.

   The `regmatch_t' elements correspond to subexpressions positionally;
the element at index `1' records where the first subexpression
matched, the element at index `2' records the second subexpression,
and so on.  The order of the subexpressions is the order in which they
begin.

   When you call `regexec', you specify how long the MATCHPTR array is,
with the NMATCH argument.  This tells `regexec' how many elements to
store.  Because the element at index `0' describes the whole match,
this leaves room for NMATCH - 1 subexpressions; if the regular
expression has more than that, you won't get offset information about
the rest of them.  But this doesn't alter whether the pattern matches a
particular string or not.

   If you don't want `regexec' to return any information about where
the subexpressions matched, you can either supply `0' for NMATCH, or
use the flag `REG_NOSUB' when you compile the pattern with `regcomp'.
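
   The following sketch shows one way to turn the recorded offsets into
a string.  It sizes the array from `re_nsub' as suggested earlier.  The
helper name `matched_text' is made up for this illustration.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: match the basic regular expression PATTERN
   against STRING and return a newly allocated copy of the text
   matched by subexpression INDEX (0 means the whole match).
   Returns NULL if there is no match, the subexpression was not
   used, INDEX is out of range, or memory runs out.  */
char *
matched_text (const char *pattern, const char *string, size_t index)
{
  regex_t compiled;
  regmatch_t *found;
  char *copy = NULL;

  assert (pattern != NULL && string != NULL);
  if (regcomp (&compiled, pattern, 0) != 0)
    return NULL;

  /* One element for the whole match plus one per subexpression.  */
  found = malloc ((compiled.re_nsub + 1) * sizeof *found);
  if (found != NULL
      && index <= compiled.re_nsub
      && regexec (&compiled, string, compiled.re_nsub + 1, found, 0) == 0
      && found[index].rm_so != -1)
    {
      size_t length = (size_t) (found[index].rm_eo - found[index].rm_so);
      copy = malloc (length + 1);
      if (copy != NULL)
        {
          /* Adding `rm_so' to STRING gives the start of the part.  */
          memcpy (copy, string + found[index].rm_so, length);
          copy[length] = '\0';
        }
    }

  free (found);
  regfree (&compiled);
  return copy;
}
```

   The caller is responsible for passing the returned string to `free'.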


File: libc.info,  Node: Subexpression Complications,  Next: Regexp Cleanup,  Prev: Regexp Subexpressions,  Up: Regular Expressions

Complications in Subexpression Matching
---------------------------------------

Sometimes a subexpression matches a substring of no characters.  This
happens when `f\(o*\)' matches the string `fum'.  (It really matches
just the `f'.)  In this case, both of the offsets identify the point in
the string where the null substring was found.  In this example, the
offsets are both `1'.

   Sometimes the entire regular expression can match without using some
of its subexpressions at all--for example, when `ba\(na\)*' matches the
string `ba', the parenthetical subexpression is not used.  When this
happens, `regexec' stores `-1' in both fields of the element for that
subexpression.

   Sometimes matching the entire regular expression can match a
particular subexpression more than once--for example, when `ba\(na\)*'
matches the string `bananana', the parenthetical subexpression matches
three times.  When this happens, `regexec' usually stores the offsets
of the last part of the string that matched the subexpression.  In the
case of `bananana', these offsets are `6' and `8'.

   But the last match is not always the one that is chosen.  It's more
accurate to say that the last _opportunity_ to match is the one that
takes precedence.  What this means is that when one subexpression
appears within another, then the results reported for the inner
subexpression reflect whatever happened on the last match of the outer
subexpression.  For an example, consider `\(ba\(na\)*s \)*' matching
the string `bananas bas '.  The last time the inner expression actually
matches is near the end of the first word.  But it is _considered_
again in the second word, and fails to match there.  `regexec' reports
nonuse of the "na" subexpression.

   Another place where this rule applies is when the regular expression

     \(ba\(na\)*s \|nefer\(ti\)* \)*

matches `bananas nefertiti '.  The "na" subexpression does match in the
first word, but it doesn't match in the second word because the other
alternative is used there.  Once again, the second repetition of the
outer subexpression overrides the first, and within that second
repetition, the "na" subexpression is not used.  So `regexec' reports
nonuse of the "na" subexpression.


File: libc.info,  Node: Regexp Cleanup,  Prev: Subexpression Complications,  Up: Regular Expressions

POSIX Regexp Matching Cleanup
-----------------------------

When you are finished using a compiled regular expression, you can free
the storage it uses by calling `regfree'.

 - Function: void regfree (regex_t *COMPILED)
     Calling `regfree' frees all the storage that `*COMPILED' points
     to.  This includes various internal fields of the `regex_t'
     structure that aren't documented in this manual.

     `regfree' does not free the object `*COMPILED' itself.

   You should always free the space in a `regex_t' structure with
`regfree' before using the structure to compile another regular
expression.

   When `regcomp' or `regexec' reports an error, you can use the
function `regerror' to turn it into an error message string.

 - Function: size_t regerror (int ERRCODE, regex_t *COMPILED, char
          *BUFFER, size_t LENGTH)
     This function produces an error message string for the error code
     ERRCODE, and stores the string in LENGTH bytes of memory starting
     at BUFFER.  For the COMPILED argument, supply the same compiled
     regular expression structure that `regcomp' or `regexec' was
     working with when it got the error.  Alternatively, you can supply
     `NULL' for COMPILED; you will still get a meaningful error
     message, but it might not be as detailed.

     If the error message can't fit in LENGTH bytes (including a
     terminating null character), then `regerror' truncates it.  The
     string that `regerror' stores is always null-terminated even if it
     has been truncated.

     The return value of `regerror' is the minimum length needed to
     store the entire error message.  If this is no greater than
     LENGTH, then the error message was not truncated, and you can use
     it.  Otherwise, you should call `regerror' again with a larger
     buffer.

     Here is a function which uses `regerror', but always dynamically
     allocates a buffer for the error message:

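          char *get_regerror (int errcode, regex_t *compiled)
          {
            /* A first call with a zero-length buffer returns the size
               needed, including the terminating null character.  */
            size_t length = regerror (errcode, compiled, NULL, 0);
            /* `xmalloc' stands for a `malloc' wrapper that does not
               return on failure; it is not declared in `regex.h'.  */
            char *buffer = xmalloc (length);
            (void) regerror (errcode, compiled, buffer, length);
            return buffer;
          }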


File: libc.info,  Node: Word Expansion,  Prev: Regular Expressions,  Up: Pattern Matching

Shell-Style Word Expansion
==========================

"Word expansion" means the process of splitting a string into "words"
and substituting for variables, commands, and wildcards just as the
shell does.

   For example, when you write `ls -l foo.c', this string is split into
three separate words--`ls', `-l' and `foo.c'.  This is the most basic
function of word expansion.

   When you write `ls *.c', this can become many words, because the
word `*.c' can be replaced with any number of file names.  This is
called "wildcard expansion", and it is also a part of word expansion.

   When you use `echo $PATH' to print your path, you are taking
advantage of "variable substitution", which is also part of word
expansion.

   Ordinary programs can perform word expansion just like the shell by
calling the library function `wordexp'.

* Menu:

* Expansion Stages::            What word expansion does to a string.
* Calling Wordexp::             How to call `wordexp'.
* Flags for Wordexp::           Options you can enable in `wordexp'.
* Wordexp Example::             A sample program that does word expansion.
* Tilde Expansion::             Details of how tilde expansion works.
* Variable Substitution::       Different types of variable substitution.


File: libc.info,  Node: Expansion Stages,  Next: Calling Wordexp,  Up: Word Expansion

The Stages of Word Expansion
----------------------------

When word expansion is applied to a sequence of words, it performs the
following transformations in the order shown here:

  1. "Tilde expansion": Replacement of `~foo' with the name of the home
     directory of `foo'.

  2. Next, three different transformations are applied in the same step,
     from left to right:

        * "Variable substitution": Environment variables are
          substituted for references such as `$foo'.

        * "Command substitution": Constructs such as ``cat foo`' and
          the equivalent `$(cat foo)' are replaced with the output from
          the inner command.

        * "Arithmetic expansion": Constructs such as `$(($x-1))' are
          replaced with the result of the arithmetic computation.

  3. "Field splitting": subdivision of the text into "words".

  4. "Wildcard expansion": The replacement of a construct such as `*.c'
     with a list of `.c' file names.  Wildcard expansion applies to an
     entire word at a time, and replaces that word with 0 or more file
     names that are themselves words.

  5. "Quote removal": The deletion of string-quotes, now that they have
     done their job by inhibiting the above transformations when
     appropriate.

   For the details of these transformations, and how to write the
constructs that use them, see `The BASH Manual' (to appear).
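
   To see several of these stages at work, one can feed `wordexp' a
string and print each resulting word, as in the sketch below.  The
helper names `show_expansion' and `demo_stages' and the variable
`GREETING' are made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for setenv */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORDS, print each resulting word on
   its own line, and return the number of words (or -1 on error).  */
int
show_expansion (const char *words)
{
  wordexp_t result;
  size_t i;
  int status, count;

  assert (words != NULL);
  status = wordexp (words, &result, 0);
  if (status != 0)
    {
      if (status == WRDE_NOSPACE)
        wordfree (&result);     /* part of the result may exist */
      return -1;
    }

  for (i = 0; i < result.we_wordc; i++)
    printf ("[%s]\n", result.we_wordv[i]);
  count = (int) result.we_wordc;
  wordfree (&result);
  return count;
}

/* Variable substitution, arithmetic expansion, field splitting and
   quote removal applied to one input string.  */
int
demo_stages (void)
{
  setenv ("GREETING", "hello world", 1);
  return show_expansion ("$GREETING $((1+2)) 'a b'");
}
```

   With the default field separators, the unquoted value of `GREETING'
is split into two words, the arithmetic construct becomes `3', and the
quoted text stays one word with its quotes removed, so four words are
expected in total.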


File: libc.info,  Node: Calling Wordexp,  Next: Flags for Wordexp,  Prev: Expansion Stages,  Up: Word Expansion

Calling `wordexp'
-----------------

All the functions, constants and data types for word expansion are
declared in the header file `wordexp.h'.

   Word expansion produces a vector of words (strings).  To return this
vector, `wordexp' uses a special data type, `wordexp_t', which is a
structure.  You pass `wordexp' the address of the structure, and it
fills in the structure's fields to tell you about the results.

 - Data Type: wordexp_t
     This data type holds a pointer to a word vector.  More precisely,
     it records both the address of the word vector and its size.

    `we_wordc'
          The number of elements in the vector.

    `we_wordv'
          The address of the vector.  This field has type `char **'.

    `we_offs'
          The offset of the first real element of the vector, from its
          nominal address in the `we_wordv' field.  Unlike the other
          fields, this is always an input to `wordexp', rather than an
          output from it.

          If you use a nonzero offset, then that many elements at the
          beginning of the vector are left empty.  (The `wordexp'
          function fills them with null pointers.)

          The `we_offs' field is meaningful only if you use the
          `WRDE_DOOFFS' flag.  Otherwise, the offset is always zero
          regardless of what is in this field, and the first real
          element comes at the beginning of the vector.

 - Function: int wordexp (const char *WORDS, wordexp_t
          *WORD-VECTOR-PTR, int FLAGS)
     Perform word expansion on the string WORDS, putting the result in
     a newly allocated vector, and store the size and address of this
     vector into `*WORD-VECTOR-PTR'.  The argument FLAGS is a
     combination of bit flags; see *Note Flags for Wordexp::, for
     details of the flags.

     You shouldn't use any of the characters `|&;<>' in the string
     WORDS unless they are quoted; likewise for newline.  If you use
     these characters unquoted, you will get the `WRDE_BADCHAR' error
     code.  Don't use parentheses or braces unless they are quoted or
     part of a word expansion construct.  If you use quotation
     characters `'"`', they should come in pairs that balance.

     The results of word expansion are a sequence of words.  The
     function `wordexp' allocates a string for each resulting word, then
     allocates a vector of type `char **' to store the addresses of
     these strings.  The last element of the vector is a null pointer.
     This vector is called the "word vector".

     To return this vector, `wordexp' stores both its address and its
     length (number of elements, not counting the terminating null
     pointer) into `*WORD-VECTOR-PTR'.

     If `wordexp' succeeds, it returns 0.  Otherwise, it returns one of
     these error codes:

    `WRDE_BADCHAR'
          The input string WORDS contains an unquoted invalid character
          such as `|'.

    `WRDE_BADVAL'
          The input string refers to an undefined shell variable, and
          you used the flag `WRDE_UNDEF' to forbid such references.

    `WRDE_CMDSUB'
          The input string uses command substitution, and you used the
          flag `WRDE_NOCMD' to forbid command substitution.

    `WRDE_NOSPACE'
          It was impossible to allocate memory to hold the result.  In
          this case, `wordexp' can store part of the results--as much
          as it could allocate room for.

    `WRDE_SYNTAX'
          There was a syntax error in the input string.  For example,
          an unmatched quoting character is a syntax error.

 - Function: void wordfree (wordexp_t *WORD-VECTOR-PTR)
     Free the storage used for the word-strings and vector that
     `*WORD-VECTOR-PTR' points to.  This does not free the structure
     `*WORD-VECTOR-PTR' itself--only the other data it points to.
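
   A minimal sketch of calling `wordexp' and interpreting its return
value might look like this.  Both helper names, `wordexp_status_name'
and `count_words', are assumptions made for the illustration, as are
the message strings.

```c
#include <assert.h>
#include <stddef.h>
#include <wordexp.h>

/* Hypothetical helper: return a short description of a `wordexp'
   status code.  */
const char *
wordexp_status_name (int status)
{
  switch (status)
    {
    case 0:            return "success";
    case WRDE_BADCHAR: return "unquoted special character";
    case WRDE_BADVAL:  return "undefined variable";
    case WRDE_CMDSUB:  return "command substitution not allowed";
    case WRDE_NOSPACE: return "out of memory";
    case WRDE_SYNTAX:  return "syntax error";
    default:           return "unknown status";
    }
}

/* Hypothetical helper: expand WORDS, store the number of resulting
   words in *COUNT on success, and return the `wordexp' status.  */
int
count_words (const char *words, size_t *count)
{
  wordexp_t result;
  int status;

  assert (words != NULL && count != NULL);
  status = wordexp (words, &result, 0);
  if (status == 0)
    {
      *count = result.we_wordc;
      wordfree (&result);
    }
  else if (status == WRDE_NOSPACE)
    wordfree (&result);         /* part of the result may exist */
  return status;
}
```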


File: libc.info,  Node: Flags for Wordexp,  Next: Wordexp Example,  Prev: Calling Wordexp,  Up: Word Expansion

Flags for Word Expansion
------------------------

This section describes the flags that you can specify in the FLAGS
argument to `wordexp'.  Choose the flags you want, and combine them
with the C operator `|'.

`WRDE_APPEND'
     Append the words from this expansion to the vector of words
     produced by previous calls to `wordexp'.  This way you can
     effectively expand several words as if they were concatenated with
     spaces between them.

     In order for appending to work, you must not modify the contents
     of the word vector structure between calls to `wordexp'.  And, if
     you set `WRDE_DOOFFS' in the first call to `wordexp', you must also
     set it when you append to the results.

`WRDE_DOOFFS'
     Leave blank slots at the beginning of the vector of words.  The
     `we_offs' field says how many slots to leave.  The blank slots
     contain null pointers.

`WRDE_NOCMD'
     Don't do command substitution; if the input requests command
     substitution, report an error.

`WRDE_REUSE'
     Reuse a word vector made by a previous call to `wordexp'.  Instead
     of allocating a new vector of words, this call to `wordexp' will
     use the vector that already exists (making it larger if necessary).

     Note that the vector may move, so it is not safe to save an old
     pointer and use it again after calling `wordexp'.  You must fetch
     `we_wordv' anew after each call.

`WRDE_SHOWERR'
     Do show any error messages printed by commands run by command
     substitution.  More precisely, allow these commands to inherit the
     standard error output stream of the current process.  By default,
     `wordexp' gives these commands a standard error stream that
     discards all output.

`WRDE_UNDEF'
     If the input refers to a shell variable that is not defined,
     report an error.
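
   The sketch below combines `WRDE_DOOFFS', `WRDE_APPEND' and
`WRDE_NOCMD': it reserves one empty slot at the front of the vector,
expands two strings into the same vector, and refuses command
substitution.  The helper name `expand_pair' is made up for this
illustration.

```c
#include <assert.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand FIRST and then SECOND into *WE,
   leaving one empty slot at the front of the vector.  Returns the
   `wordexp' status; on success the caller must call `wordfree'.  */
int
expand_pair (const char *first, const char *second, wordexp_t *we)
{
  const int flags = WRDE_DOOFFS | WRDE_NOCMD;
  int status;

  assert (first != NULL && second != NULL && we != NULL);
  memset (we, 0, sizeof *we);
  we->we_offs = 1;              /* input: reserve one leading slot */

  status = wordexp (first, we, flags);
  if (status != 0)
    {
      if (status == WRDE_NOSPACE)
        wordfree (we);
      return status;
    }

  /* WRDE_DOOFFS must be given again when appending.  */
  status = wordexp (second, we, flags | WRDE_APPEND);
  if (status != 0)
    wordfree (we);
  return status;
}
```

   After a successful call with the strings `a b' and `c', `we_wordc'
is `3', the slot at index `0' holds a null pointer, and the words
occupy indices `1' through `3'.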


File: libc.info,  Node: Wordexp Example,  Next: Tilde Expansion,  Prev: Flags for Wordexp,  Up: Word Expansion

`wordexp' Example
-----------------

Here is an example of using `wordexp' to expand several strings and use
the results to run a program.  It also shows the use of `WRDE_APPEND'
to concatenate the expansions and of `wordfree' to free the space
allocated by `wordexp'.

     int
     expand_and_execute (const char *program, const char **options)
     {
       wordexp_t result;
       pid_t pid;
       int status, i;
     
       /* Expand the string for the program to run.  */
       switch (wordexp (program, &result, 0))
         {
         case 0:			/* Successful.  */
           break;
         case WRDE_NOSPACE:
           /* If the error was `WRDE_NOSPACE',
              then perhaps part of the result was allocated.  */
           wordfree (&result);
           /* Fall through.  */
         default:                    /* Some other error.  */
           return -1;
         }
     
       /* Expand the strings specified for the arguments.  */
       for (i = 0; options[i] != NULL; i++)
         {
           if (wordexp (options[i], &result, WRDE_APPEND))
             {
               wordfree (&result);
               return -1;
             }
         }
     
       pid = fork ();
       if (pid == 0)
         {
           /* This is the child process.  Execute the command. */
           execv (result.we_wordv[0], result.we_wordv);
           exit (EXIT_FAILURE);
         }
       else if (pid < 0)
         /* The fork failed.  Report failure.  */
         status = -1;
       else
         /* This is the parent process.  Wait for the child to complete.  */
         if (waitpid (pid, &status, 0) != pid)
           status = -1;
     
       wordfree (&result);
       return status;
     }


File: libc.info,  Node: Tilde Expansion,  Next: Variable Substitution,  Prev: Wordexp Example,  Up: Word Expansion

Details of Tilde Expansion
--------------------------

It's a standard part of shell syntax that you can use `~' at the
beginning of a file name to stand for your own home directory.  You can
use `~USER' to stand for USER's home directory.

   "Tilde expansion" is the process of converting these abbreviations
to the directory names that they stand for.

   Tilde expansion applies to the `~' plus all following characters up
to whitespace or a slash.  It takes place only at the beginning of a
word, and only if none of the characters to be transformed is quoted in
any way.

   Plain `~' uses the value of the environment variable `HOME' as the
proper home directory name.  `~' followed by a user name uses
`getpwnam' to look up that user in the user database, and uses
whatever directory is recorded there.  Thus, `~' followed by your own
name can give different results from plain `~', if the value of `HOME'
is not really your home directory.
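
   The dependence of plain `~' on `HOME' can be observed with a sketch
like the following.  The helper names `expand_first' and `demo_tilde'
and the directory name are made up for this illustration; note that
the demonstration changes `HOME' for the calling process.

```c
#define _POSIX_C_SOURCE 200809L   /* for setenv and strdup */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand WORD and return a newly allocated copy
   of the first resulting word, or NULL on error or if the expansion
   is empty.  */
char *
expand_first (const char *word)
{
  wordexp_t result;
  char *copy = NULL;
  int status;

  assert (word != NULL);
  status = wordexp (word, &result, WRDE_NOCMD);
  if (status == 0)
    {
      if (result.we_wordc > 0)
        copy = strdup (result.we_wordv[0]);
      wordfree (&result);
    }
  else if (status == WRDE_NOSPACE)
    wordfree (&result);
  return copy;
}

/* Return 1 if plain `~' follows HOME while a quoted tilde stays
   literal, 0 otherwise.  */
int
demo_tilde (void)
{
  char *plain, *quoted;
  int ok;

  setenv ("HOME", "/tmp/example-home", 1);   /* made-up directory */
  plain = expand_first ("~/bin");
  quoted = expand_first ("'~/bin'");
  ok = plain != NULL && quoted != NULL
       && strcmp (plain, "/tmp/example-home/bin") == 0
       && strcmp (quoted, "~/bin") == 0;
  free (plain);
  free (quoted);
  return ok;
}
```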


File: libc.info,  Node: Variable Substitution,  Prev: Tilde Expansion,  Up: Word Expansion

Details of Variable Substitution
--------------------------------

Part of ordinary shell syntax is the use of `$VARIABLE' to substitute
the value of a shell variable into a command.  This is called "variable
substitution", and it is one part of doing word expansion.

   There are two basic ways you can write a variable reference for
substitution:

`${VARIABLE}'
     If you write braces around the variable name, then it is completely
     unambiguous where the variable name ends.  You can concatenate
     additional letters onto the end of the variable value by writing
     them immediately after the close brace.  For example, if the value
     of `foo' is `tractor', `${foo}s' expands into `tractors'.

`$VARIABLE'
     If you do not put braces around the variable name, then the
     variable name consists of all the alphanumeric characters and
     underscores that follow the `$'.  The next punctuation character
     ends the variable name.  Thus, `$foo-bar' refers to the variable
     `foo' and expands into `tractor-bar'.

   When you use braces, you can also use various constructs to modify
the value that is substituted, or test it in various ways.

`${VARIABLE:-DEFAULT}'
     Substitute the value of VARIABLE, but if that is empty or
     undefined, use DEFAULT instead.

`${VARIABLE:=DEFAULT}'
     Substitute the value of VARIABLE, but if that is empty or
     undefined, use DEFAULT instead and set the variable to DEFAULT.

`${VARIABLE:?MESSAGE}'
     If VARIABLE is defined and not empty, substitute its value.

     Otherwise, print MESSAGE as an error message on the standard error
     stream, and consider word expansion a failure.

`${VARIABLE:+REPLACEMENT}'
     Substitute REPLACEMENT, but only if VARIABLE is defined and
     nonempty.  Otherwise, substitute nothing for this construct.

`${#VARIABLE}'
     Substitute a numeral which expresses in base ten the number of
     characters in the value of VARIABLE.  `${#foo}' stands for `7',
     because `tractor' is seven characters.

   These variants of variable substitution let you remove part of the
variable's value before substituting it.  The PREFIX and SUFFIX are not
mere strings; they are wildcard patterns, just like the patterns that
you use to match multiple file names.  But in this context, they match
against parts of the variable value rather than against file names.

`${VARIABLE%%SUFFIX}'
     Substitute the value of VARIABLE, but first discard from that
     variable any portion at the end that matches the pattern SUFFIX.

     If there is more than one alternative for how to match against
     SUFFIX, this construct uses the longest possible match.

     Thus, `${foo%%r*}' substitutes `t', because the largest match for
     `r*' at the end of `tractor' is `ractor'.

`${VARIABLE%SUFFIX}'
     Substitute the value of VARIABLE, but first discard from that
     variable any portion at the end that matches the pattern SUFFIX.

     If there is more than one alternative for how to match against
     SUFFIX, this construct uses the shortest possible alternative.

     Thus, `${foo%r*}' substitutes `tracto', because the shortest match
     for `r*' at the end of `tractor' is just `r'.

`${VARIABLE##PREFIX}'
     Substitute the value of VARIABLE, but first discard from that
     variable any portion at the beginning that matches the pattern
     PREFIX.

     If there is more than one alternative for how to match against
     PREFIX, this construct uses the longest possible match.

     Thus, `${foo##*t}' substitutes `or', because the largest match for
     `*t' at the beginning of `tractor' is `tract'.

`${VARIABLE#PREFIX}'
     Substitute the value of VARIABLE, but first discard from that
     variable any portion at the beginning that matches the pattern
     PREFIX.

     If there is more than one alternative for how to match against
     PREFIX, this construct uses the shortest possible alternative.

     Thus, `${foo#*t}' substitutes `ractor', because the shortest match
     for `*t' at the beginning of `tractor' is just `t'.
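
   The constructs above can be tried from C with the `wordexp' function.
The following is only a minimal sketch: `expand_first' and
`show_substitutions' are hypothetical helper names invented for this
illustration, and the variable `foo' is assumed to hold `tractor' as in
the examples above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wordexp.h>

/* Expand WORDS with `wordexp' and copy the first resulting word into
   OUT (at most OUTSIZE bytes, always null-terminated).  Returns 0 on
   success, or a nonzero value if the expansion failed or produced no
   word.  `expand_first' is a hypothetical helper, not a library
   function.  */
int
expand_first (const char *words, char *out, size_t outsize)
{
  wordexp_t result;
  int status = wordexp (words, &result, 0);

  if (status != 0)
    return status;
  if (result.we_wordc == 0 || outsize == 0)
    {
      wordfree (&result);
      return -1;
    }
  strncpy (out, result.we_wordv[0], outsize - 1);
  out[outsize - 1] = '\0';
  wordfree (&result);
  return 0;
}

/* Print the expansion of several of the constructs described above,
   with `foo' set to `tractor' in the environment.  */
void
show_substitutions (void)
{
  static const char *const samples[] =
    { "${foo}s", "${foo:-none}", "${#foo}",
      "${foo%%r*}", "${foo%r*}", "${foo##*t}", "${foo#*t}" };
  char buf[64];
  size_t i;

  setenv ("foo", "tractor", 1);
  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    if (expand_first (samples[i], buf, sizeof buf) == 0)
      printf ("%-14s -> %s\n", samples[i], buf);
}
```

   Calling `show_substitutions ()' prints one line per construct, so the
results can be compared with the expansions stated in the text.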



File: libc.info,  Node: I/O Overview,  Next: I/O on Streams,  Prev: Pattern Matching,  Up: Top

Input/Output Overview
*********************

Most programs need to do either input (reading data) or output (writing
data), or most frequently both, in order to do anything useful.  The GNU
C library provides such a large selection of input and output functions
that the hardest part is often deciding which function is most
appropriate!

   This chapter introduces concepts and terminology relating to input
and output.  Other chapters relating to the GNU I/O facilities are:

   * *Note I/O on Streams::, which covers the high-level functions that
     operate on streams, including formatted input and output.

   * *Note Low-Level I/O::, which covers the basic I/O and control
     functions on file descriptors.

   * *Note File System Interface::, which covers functions for
     operating on directories and for manipulating file attributes such
     as access modes and ownership.

   * *Note Pipes and FIFOs::, which includes information on the basic
     interprocess communication facilities.

   * *Note Sockets::, which covers a more complicated interprocess
     communication facility with support for networking.

   * *Note Low-Level Terminal Interface::, which covers functions for
     changing how input and output to terminals or other serial devices
     are processed.

* Menu:

* I/O Concepts::       Some basic information and terminology.
* File Names::         How to refer to a file.


File: libc.info,  Node: I/O Concepts,  Next: File Names,  Up: I/O Overview

Input/Output Concepts
=====================

Before you can read or write the contents of a file, you must establish
a connection or communications channel to the file.  This process is
called "opening" the file.  You can open a file for reading, writing,
or both.

   The connection to an open file is represented either as a stream or
as a file descriptor.  You pass this as an argument to the functions
that do the actual read or write operations, to tell them which file to
operate on.  Certain functions expect streams, and others are designed
to operate on file descriptors.

   When you have finished reading from or writing to the file, you can
terminate the connection by "closing" the file.  Once you have closed a
stream or file descriptor, you cannot do any more input or output
operations on it.

* Menu:

* Streams and File Descriptors::    The GNU Library provides two ways
			             to access the contents of files.
* File Position::                   The number of bytes from the
                                     beginning of the file.


File: libc.info,  Node: Streams and File Descriptors,  Next: File Position,  Up: I/O Concepts

Streams and File Descriptors
----------------------------

When you want to do input or output to a file, you have a choice of two
basic mechanisms for representing the connection between your program
and the file: file descriptors and streams.  File descriptors are
represented as objects of type `int', while streams are represented as
`FILE *' objects.

   File descriptors provide a primitive, low-level interface to input
and output operations.  Both file descriptors and streams can represent
a connection to a device (such as a terminal), or a pipe or socket for
communicating with another process, as well as a normal file.  But, if
you want to do control operations that are specific to a particular kind
of device, you must use a file descriptor; there are no facilities to
use streams in this way.  You must also use file descriptors if your
program needs to do input or output in special modes, such as
nonblocking (or polled) input (*note File Status Flags::).

   Streams provide a higher-level interface, layered on top of the
primitive file descriptor facilities.  The stream interface treats all
kinds of files pretty much alike--the sole exception being the three
styles of buffering that you can choose (*note Stream Buffering::).

   The main advantage of using the stream interface is that the set of
functions for performing actual input and output operations (as opposed
to control operations) on streams is much richer and more powerful than
the corresponding facilities for file descriptors.  The file descriptor
interface provides only simple functions for transferring blocks of
characters, but the stream interface also provides powerful formatted
input and output functions (`printf' and `scanf') as well as functions
for character- and line-oriented input and output.

   Since streams are implemented in terms of file descriptors, you can
extract the file descriptor from a stream and perform low-level
operations directly on the file descriptor.  You can also initially open
a connection as a file descriptor and then make a stream associated with
that file descriptor.
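
   As a rough illustration of moving between the two representations,
the sketch below makes a stream for an already open descriptor with
`fdopen' and gets the descriptor back from the stream with `fileno'.
The function name `stream_over_descriptor' and the use of a pipe are
assumptions made only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Send MSG through a pipe: write it with stream functions on a stream
   made from the write descriptor, add one byte with `write' on the
   descriptor extracted from that stream, then read everything back
   with `read' on the raw read descriptor.  Stores the text in OUT and
   returns the number of bytes read, or -1 on failure.
   `stream_over_descriptor' is a hypothetical name.  */
long
stream_over_descriptor (const char *msg, char *out, size_t outsize)
{
  int fds[2];
  FILE *writer;
  ssize_t n;
  int ok = 1;

  if (outsize == 0 || pipe (fds) != 0)
    return -1;

  /* Make a stream associated with an already open descriptor.  */
  writer = fdopen (fds[1], "w");
  if (writer == NULL)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  /* High-level output, flushed so that it reaches the pipe first,
     followed by low-level output on the stream's own descriptor.  */
  if (fputs (msg, writer) == EOF
      || fflush (writer) != 0
      || write (fileno (writer), "!", 1) != 1)
    ok = 0;
  fclose (writer);              /* This closes fds[1] as well.  */

  n = ok ? read (fds[0], out, outsize - 1) : -1;
  close (fds[0]);
  if (n < 0)
    return -1;
  out[n] = '\0';
  return (long) n;
}
```

   The explicit `fflush' matters here: without it the buffered stream
output could reach the pipe after the byte written directly to the
descriptor.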

   In general, you should stick with using streams rather than file
descriptors, unless there is some specific operation you want to do that
can only be done on a file descriptor.  If you are a beginning
programmer and aren't sure what functions to use, we suggest that you
concentrate on the formatted input functions (*note Formatted Input::)
and formatted output functions (*note Formatted Output::).

   If you are concerned about portability of your programs to systems
other than GNU, you should also be aware that file descriptors are not
as portable as streams.  You can expect any system running ISO C to
support streams, but non-GNU systems may not support file descriptors at
all, or may only implement a subset of the GNU functions that operate on
file descriptors.  Most of the file descriptor functions in the GNU
library are included in the POSIX.1 standard, however.


File: libc.info,  Node: File Position,  Prev: Streams and File Descriptors,  Up: I/O Concepts

File Position
-------------

One of the attributes of an open file is its "file position" that keeps
track of where in the file the next character is to be read or written.
In the GNU system, and all POSIX.1 systems, the file position is
simply an integer representing the number of bytes from the beginning
of the file.

   The file position is normally set to the beginning of the file when
it is opened, and each time a character is read or written, the file
position is incremented.  In other words, access to the file is normally
"sequential".

   Ordinary files permit read or write operations at any position within
the file.  Some other kinds of files may also permit this.  Files which
do permit this are sometimes referred to as "random-access" files.  You
can change the file position using the `fseek' function on a stream
(*note File Positioning::) or the `lseek' function on a file descriptor
(*note I/O Primitives::).  If you try to change the file position on a
file that doesn't support random access, you get the `ESPIPE' error.
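
   The two behaviors just described can be observed with a short
sketch.  The function names below are hypothetical; the example assumes
that `tmpfile' can create a scratch file and that a pipe is a suitable
example of a file without random access.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Write COUNT characters to a scratch stream and return the file
   position reported afterwards, or -1 on failure.  */
long
position_after_writing (int count)
{
  FILE *fp = tmpfile ();
  long pos;
  int i;

  if (fp == NULL)
    return -1;
  for (i = 0; i < count; i++)
    putc ('x', fp);
  pos = ftell (fp);             /* One step per character written.  */
  fclose (fp);
  return pos;
}

/* Try to set the file position of a pipe.  Returns the `errno' value
   reported by `lseek', 0 if the call unexpectedly succeeded, or -1 if
   the pipe could not be created.  */
int
seek_error_on_pipe (void)
{
  int fds[2];
  int err = 0;

  if (pipe (fds) != 0)
    return -1;
  if (lseek (fds[0], 0, SEEK_SET) == (off_t) -1)
    err = errno;
  close (fds[0]);
  close (fds[1]);
  return err;
}
```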

   Streams and descriptors that are opened for "append access" are
treated specially for output: output to such files is _always_ appended
sequentially to the _end_ of the file, regardless of the file position.
However, the file position is still used to control where in the file
reading is done.
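
   A small sketch of this rule, under the assumption that a scratch
file can be created in `/tmp' (the function name `append_demo' and the
file name template are made up for the illustration):

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create a scratch file containing "abc", reopen it for append
   access, move the file position to the beginning, and write `Z'.
   The resulting contents are stored in OUT.  Returns 0 on success
   and -1 on failure.  */
int
append_demo (char *out, size_t outsize)
{
  char path[] = "/tmp/append-demo-XXXXXX";
  int fd = mkstemp (path);
  FILE *fp;
  size_t n;

  if (fd < 0 || outsize == 0)
    return -1;
  if (write (fd, "abc", 3) != 3)
    {
      close (fd);
      unlink (path);
      return -1;
    }
  close (fd);

  fp = fopen (path, "a+");
  if (fp == NULL)
    {
      unlink (path);
      return -1;
    }
  fseek (fp, 0L, SEEK_SET);     /* Position is now the beginning...  */
  putc ('Z', fp);               /* ...but output still goes to the end.  */
  fseek (fp, 0L, SEEK_SET);     /* The position does control reading.  */
  n = fread (out, 1, outsize - 1, fp);
  out[n] = '\0';
  fclose (fp);
  unlink (path);
  return 0;
}
```

   With a large enough buffer, OUT holds `abcZ' rather than `Zbc'.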

   If you think about it, you'll realize that several programs can read
a given file at the same time.  In order for each program to be able to
read the file at its own pace, each program must have its own file
position, which is not affected by anything the other programs do.

   In fact, each opening of a file creates a separate file position.
Thus, if you open a file twice even in the same program, you get two
streams or descriptors with independent file positions.

   By contrast, if you open a descriptor and then duplicate it to get
another descriptor, these two descriptors share the same file position:
changing the file position of one descriptor will affect the other.
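
   The difference between a duplicated descriptor and a second opening
can be checked with a sketch like the following.  `compare_positions'
is a hypothetical name, and the scratch file in `/tmp' is an assumption
of the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Open a scratch file, duplicate the descriptor, and open the same
   file a second time.  After four bytes have been read through the
   original descriptor, store the position seen through the duplicate
   in *DUP_POS and the position seen through the second opening in
   *REOPEN_POS.  Returns 0 on success and -1 on failure.  */
int
compare_positions (off_t *dup_pos, off_t *reopen_pos)
{
  char path[] = "/tmp/position-demo-XXXXXX";
  char buf[4];
  int status = -1;
  int fd = mkstemp (path);              /* First opening.  */
  int fd_dup, fd_again;

  if (fd < 0)
    return -1;
  fd_dup = dup (fd);                    /* Shares the position of fd.  */
  fd_again = open (path, O_RDONLY);     /* Separate opening.  */

  if (fd_dup >= 0 && fd_again >= 0
      && write (fd, "0123456789", 10) == 10
      && lseek (fd, 0, SEEK_SET) == 0
      && read (fd, buf, sizeof buf) == (ssize_t) sizeof buf)
    {
      *dup_pos = lseek (fd_dup, 0, SEEK_CUR);
      *reopen_pos = lseek (fd_again, 0, SEEK_CUR);
      status = 0;
    }

  if (fd_dup >= 0)
    close (fd_dup);
  if (fd_again >= 0)
    close (fd_again);
  close (fd);
  unlink (path);
  return status;
}
```

   The duplicate reports position 4, following the reads done through
the original descriptor, while the second opening still reports 0.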


File: libc.info,  Node: File Names,  Prev: I/O Concepts,  Up: I/O Overview

File Names
==========

In order to open a connection to a file, or to perform other operations
such as deleting a file, you need some way to refer to the file.  Nearly
all files have names that are strings--even files which are actually
devices such as tape drives or terminals.  These strings are called
"file names".  You specify the file name to say which file you want to
open or operate on.

   This section describes the conventions for file names and how the
operating system works with them.

* Menu:

* Directories::                 Directories contain entries for files.
* File Name Resolution::        A file name specifies how to look up a file.
* File Name Errors::            Error conditions relating to file names.
* File Name Portability::       File name portability and syntax issues.


File: libc.info,  Node: Directories,  Next: File Name Resolution,  Up: File Names

Directories
-----------

In order to understand the syntax of file names, you need to understand
how the file system is organized into a hierarchy of directories.

   A "directory" is a file that contains information to associate other
files with names; these associations are called "links" or "directory
entries".  Sometimes, people speak of "files in a directory", but in
reality, a directory only contains pointers to files, not the files
themselves.

   The name of a file contained in a directory entry is called a "file
name component".  In general, a file name consists of a sequence of one
or more such components, separated by the slash character (`/').  A
file name which is just one component names a file with respect to its
directory.  A file name with multiple components names a directory, and
then a file in that directory, and so on.

   Some other documents, such as the POSIX standard, use the term
"pathname" for what we call a file name, and either "filename" or
"pathname component" for what this manual calls a file name component.
We don't use this terminology because a "path" is something completely
different (a list of directories to search), and we think that
"pathname" used for something else will confuse users.  We always use
"file name" and "file name component" (or sometimes just "component",
where the context is obvious) in GNU documentation.  Some macros use
the POSIX terminology in their names, such as `PATH_MAX'.  These macros
are defined by the POSIX standard, so we cannot change their names.

   You can find more detailed information about operations on
directories in *Note File System Interface::.


File: libc.info,  Node: File Name Resolution,  Next: File Name Errors,  Prev: Directories,  Up: File Names

File Name Resolution
--------------------

A file name consists of file name components separated by slash (`/')
characters.  On the systems that the GNU C library supports, multiple
successive `/' characters are equivalent to a single `/' character.

   The process of determining what file a file name refers to is called
"file name resolution".  This is performed by examining the components
that make up a file name in left-to-right order, and locating each
successive component in the directory named by the previous component.
Of course, each of the files that are referenced as directories must
actually exist, be directories instead of regular files, and have the
appropriate permissions to be accessible by the process; otherwise the
file name resolution fails.

   If a file name begins with a `/', the first component in the file
name is located in the "root directory" of the process (usually all
processes on the system have the same root directory).  Such a file name
is called an "absolute file name".

   Otherwise, the first component in the file name is located in the
current working directory (*note Working Directory::).  This kind of
file name is called a "relative file name".

   The file name components `.' ("dot") and `..' ("dot-dot") have
special meanings.  Every directory has entries for these file name
components.  The file name component `.' refers to the directory
itself, while the file name component `..' refers to its "parent
directory" (the directory that contains the link for the directory in
question).  As a special case, `..' in the root directory refers to the
root directory itself, since it has no parent; thus `/..' is the same
as `/'.

   Here are some examples of file names:

`/a'
     The file named `a', in the root directory.

`/a/b'
     The file named `b', in the directory named `a' in the root
     directory.

`a'
     The file named `a', in the current working directory.

`/a/./b'
     This is the same as `/a/b'.

`./a'
     The file named `a', in the current working directory.

`../a'
     The file named `a', in the parent directory of the current working
     directory.

   A file name that names a directory may optionally end in a `/'.  You
can specify a file name of `/' to refer to the root directory, but the
empty string is not a meaningful file name.  If you want to refer to
the current working directory, use a file name of `.' or `./'.
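
   One way to check that two file names resolve to the same file is to
compare the device and file serial numbers that `stat' reports for
them.  The helper `same_file' below is a hypothetical name used only
for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Return 1 if the file names A and B resolve to the same file, 0 if
   they resolve to different files, and -1 if either name cannot be
   resolved.  */
int
same_file (const char *a, const char *b)
{
  struct stat sa, sb;

  if (stat (a, &sa) != 0 || stat (b, &sb) != 0)
    return -1;
  /* Two names refer to the same file when both the device and the
     file serial number agree.  */
  return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

   For instance, `same_file ("/", "/..")' and `same_file (".", "./")'
both return 1 under the rules described in this section.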

   Unlike some other operating systems, the GNU system doesn't have any
built-in support for file types (or extensions) or file versions as part
of its file name syntax.  Many programs and utilities use conventions
for file names--for example, files containing C source code usually
have names suffixed with `.c'--but there is nothing in the file system
itself that enforces this kind of convention.


File: libc.info,  Node: File Name Errors,  Next: File Name Portability,  Prev: File Name Resolution,  Up: File Names

File Name Errors
----------------

Functions that accept file name arguments usually detect these `errno'
error conditions relating to the file name syntax or trouble finding
the named file.  These errors are referred to throughout this manual as
the "usual file name errors".

`EACCES'
     The process does not have search permission for a directory
     component of the file name.

`ENAMETOOLONG'
     This error is used when either the total length of a file name is
     greater than `PATH_MAX', or when an individual file name component
     has a length greater than `NAME_MAX'.  *Note Limits for Files::.

     In the GNU system, there is no imposed limit on overall file name
     length, but some file systems may place limits on the length of a
     component.

`ENOENT'
     This error is reported when a file referenced as a directory
     component in the file name doesn't exist, or when a component is a
     symbolic link whose target file does not exist.  *Note Symbolic
     Links::.

`ENOTDIR'
     A file that is referenced as a directory component in the file name
     exists, but it isn't a directory.

`ELOOP'
     Too many symbolic links were resolved while trying to look up the
     file name.  The system has an arbitrary limit on the number of
     symbolic links that may be resolved in looking up a single file
     name, as a primitive way to detect loops.  *Note Symbolic Links::.
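
   Two of these conditions are easy to provoke deliberately.  The
following sketch uses hypothetical helper names, and it assumes that a
scratch file can be created in `/tmp'.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Try to open FILENAME for reading.  Returns 0 on success, or the
   `errno' value that `fopen' reported.  */
int
open_error (const char *filename)
{
  FILE *fp = fopen (filename, "r");

  if (fp == NULL)
    return errno;
  fclose (fp);
  return 0;
}

/* Create a scratch regular file and then use its name as though it
   were a directory.  Returns the resulting `errno' value, or -1 if
   the scratch file could not be created.  */
int
regular_file_as_directory (void)
{
  char path[64] = "/tmp/name-demo-XXXXXX";
  char child[80];
  int fd = mkstemp (path);
  int err;

  if (fd < 0)
    return -1;
  close (fd);
  snprintf (child, sizeof child, "%s/x", path);
  err = open_error (child);     /* PATH exists but is not a directory.  */
  unlink (path);
  return err;
}
```

   A directory component that does not exist at all gives `ENOENT',
while a component that exists as a regular file gives `ENOTDIR'.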


File: libc.info,  Node: File Name Portability,  Prev: File Name Errors,  Up: File Names

Portability of File Names
-------------------------

The rules for the syntax of file names discussed in *Note File Names::,
are the rules normally used by the GNU system and by other POSIX
systems.  However, other operating systems may use other conventions.

   There are two reasons why it can be important for you to be aware of
file name portability issues:

   * If your program makes assumptions about file name syntax, or
     contains embedded literal file name strings, it is more difficult
     to get it to run under other operating systems that use different
     syntax conventions.

   * Even if you are not concerned about running your program on
     machines that run other operating systems, it may still be
     possible to access files that use different naming conventions.
     For example, you may be able to access file systems on another
     computer running a different operating system over a network, or
     read and write disks in formats used by other operating systems.

   The ISO C standard says very little about file name syntax, only that
file names are strings.  In addition to varying restrictions on the
length of file names and what characters can validly appear in a file
name, different operating systems use different conventions and syntax
for concepts such as structured directories and file types or
extensions.  Some concepts such as file versions might be supported in
some operating systems and not by others.

   The POSIX.1 standard allows implementations to put additional
restrictions on file name syntax, concerning what characters are
permitted in file names and on the length of file name and file name
component strings.  However, in the GNU system, you do not need to worry
about these restrictions; any character except the null character is
permitted in a file name string, and there are no limits on the length
of file name strings.


File: libc.info,  Node: I/O on Streams,  Next: Low-Level I/O,  Prev: I/O Overview,  Up: Top

Input/Output on Streams
***********************

This chapter describes the functions for creating streams and performing
input and output operations on them.  As discussed in *Note I/O
Overview::, a stream is a fairly abstract, high-level concept
representing a communications channel to a file, device, or process.

* Menu:

* Streams::                     About the data type representing a stream.
* Standard Streams::            Streams to the standard input and output
                                 devices are created for you.
* Opening Streams::             How to create a stream to talk to a file.
* Closing Streams::             Close a stream when you are finished with it.
* Streams and Threads::         Issues with streams in threaded programs.
* Streams and I18N::            Streams in internationalized applications.
* Simple Output::               Unformatted output by characters and lines.
* Character Input::             Unformatted input by characters and words.
* Line Input::                  Reading a line or a record from a stream.
* Unreading::                   Peeking ahead/pushing back input just read.
* Block Input/Output::          Input and output operations on blocks of data.
* Formatted Output::            `printf' and related functions.
* Customizing Printf::          You can define new conversion specifiers for
                                 `printf' and friends.
* Formatted Input::             `scanf' and related functions.
* EOF and Errors::              How you can tell if an I/O error happens.
* Error Recovery::		What you can do about errors.
* Binary Streams::              Some systems distinguish between text files
                                 and binary files.
* File Positioning::            About random-access streams.
* Portable Positioning::        Random access on peculiar ISO C systems.
* Stream Buffering::            How to control buffering of streams.
* Other Kinds of Streams::      Streams that do not necessarily correspond
                                 to an open file.
* Formatted Messages::          Print strictly formatted messages.


File: libc.info,  Node: Streams,  Next: Standard Streams,  Up: I/O on Streams

Streams
=======

For historical reasons, the type of the C data structure that represents
a stream is called `FILE' rather than "stream".  Since most of the
library functions deal with objects of type `FILE *', sometimes the
term "file pointer" is also used to mean "stream".  This leads to
unfortunate confusion over terminology in many books on C.  This
manual, however, is careful to use the terms "file" and "stream" only
in the technical sense.

   The `FILE' type is declared in the header file `stdio.h'.

 - Data Type: FILE
     This is the data type used to represent stream objects.  A `FILE'
     object holds all of the internal state information about the
     connection to the associated file, including such things as the
     file position indicator and buffering information.  Each stream
     also has error and end-of-file status indicators that can be
     tested with the `ferror' and `feof' functions; see *Note EOF and
     Errors::.
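
   As a minimal sketch of how the two status indicators are typically
consulted, the hypothetical function `count_chars' below reads a stream
to its end and then asks whether the loop stopped because of an error.

```c
#include <stdio.h>

/* Read STREAM until `getc' returns EOF, counting the characters.
   Returns the count, or -1 if reading stopped because of an error
   rather than because the end of the file was reached.  */
long
count_chars (FILE *stream)
{
  long count = 0;
  int c;

  while ((c = getc (stream)) != EOF)
    count++;

  /* EOF alone does not say why reading stopped; the indicators kept
     in the FILE object do.  */
  if (ferror (stream))
    return -1;
  return count;
}
```

   After a successful call, `feof (stream)' is nonzero, because the
end-of-file indicator stays set until it is cleared or the stream is
repositioned.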

   `FILE' objects are allocated and managed internally by the
input/output library functions.  Don't try to create your own objects of
type `FILE'; let the library do it.  Your programs should deal only
with pointers to these objects (that is, `FILE *' values) rather than
the objects themselves.


File: libc.info,  Node: Standard Streams,  Next: Opening Streams,  Prev: Streams,  Up: I/O on Streams

Standard Streams
================

When the `main' function of your program is invoked, it already has
three predefined streams open and available for use.  These represent
the "standard" input and output channels that have been established for
the process.

   These streams are declared in the header file `stdio.h'.

 - Variable: FILE * stdin
     The "standard input" stream, which is the normal source of input
     for the program.

 - Variable: FILE * stdout
     The "standard output" stream, which is used for normal output from
     the program.

 - Variable: FILE * stderr
     The "standard error" stream, which is used for error messages and
     diagnostics issued by the program.

   In the GNU system, you can specify what files or processes
correspond to these streams using the pipe and redirection facilities
provided by the shell.  (The primitives shells use to implement these
facilities are described in *Note File System Interface::.)  Most other
operating systems provide similar mechanisms, but the details of how to
use them can vary.

   In the GNU C library, `stdin', `stdout', and `stderr' are normal
variables which you can set just like any others.  For example, to
redirect the standard output to a file, you could do:

     fclose (stdout);
     stdout = fopen ("standard-output-file", "w");

   Note however, that in other systems `stdin', `stdout', and `stderr'
are macros that you cannot assign to in the normal way.  But you can
use `freopen' to get the effect of closing one and reopening it.  *Note
Opening Streams::.

   The three streams `stdin', `stdout', and `stderr' are unoriented
at program start (*note Streams and I18N::).


File: libc.info,  Node: Opening Streams,  Next: Closing Streams,  Prev: Standard Streams,  Up: I/O on Streams

Opening Streams
===============

Opening a file with the `fopen' function creates a new stream and
establishes a connection between the stream and a file.  This may
involve creating a new file.

   Everything described in this section is declared in the header file
`stdio.h'.

 - Function: FILE * fopen (const char *FILENAME, const char *OPENTYPE)
     The `fopen' function opens a stream for I/O to the file FILENAME,
     and returns a pointer to the stream.

     The OPENTYPE argument is a string that controls how the file is
     opened and specifies attributes of the resulting stream.  It must
     begin with one of the following sequences of characters:

    `r'
          Open an existing file for reading only.

    `w'
          Open the file for writing only.  If the file already exists,
          it is truncated to zero length.  Otherwise a new file is
          created.

    `a'
          Open a file for append access; that is, writing at the end of
          file only.  If the file already exists, its initial contents
          are unchanged and output to the stream is appended to the end
          of the file.  Otherwise, a new, empty file is created.

    `r+'
          Open an existing file for both reading and writing.  The
          initial contents of the file are unchanged and the initial
          file position is at the beginning of the file.

    `w+'
          Open a file for both reading and writing.  If the file
          already exists, it is truncated to zero length.  Otherwise, a
          new file is created.

    `a+'
          Open or create file for both reading and appending.  If the
          file exists, its initial contents are unchanged.  Otherwise,
          a new file is created.  The initial file position for reading
          is at the beginning of the file, but output is always
          appended to the end of the file.

     As you can see, `+' requests a stream that can do both input and
     output.  The ISO standard says that when using such a stream, you
     must call `fflush' (*note Stream Buffering::) or a file positioning
     function such as `fseek' (*note File Positioning::) when switching
     from reading to writing or vice versa.  Otherwise, internal buffers
     might not be emptied properly.  The GNU C library does not have
     this limitation; you can do arbitrary reading and writing
     operations on a stream in whatever order.
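
     For code that must also run on other systems, the switching rule
     from the ISO standard can be followed as in this sketch.  The
     function name `read_then_overwrite' is hypothetical, and FP is
     assumed to be a stream opened with a `+' mode on a file holding
     at least two characters.

```c
#include <stdio.h>

/* Read the first character of the update stream FP, then overwrite
   the second character with REPLACEMENT.  Returns the character read,
   or EOF on failure.  */
int
read_then_overwrite (FILE *fp, int replacement)
{
  int first;

  rewind (fp);
  first = getc (fp);
  if (first == EOF)
    return EOF;

  /* Switching from reading to writing: call a file positioning
     function first.  Seeking zero bytes from the current position
     keeps the position where it is.  */
  if (fseek (fp, 0L, SEEK_CUR) != 0)
    return EOF;
  if (putc (replacement, fp) == EOF)
    return EOF;

  /* Before any later input on FP, flush (or reposition) again.  */
  if (fflush (fp) != 0)
    return EOF;
  return first;
}
```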

     Additional characters may appear after these to specify flags for
     the call.  Always put the mode (`r', `w+', etc.) first; that is
     the only part you are guaranteed will be understood by all systems.

     The GNU C library defines one additional character for use in
     OPENTYPE: the character `x' insists on creating a new file--if a
     file FILENAME already exists, `fopen' fails rather than opening
     it.  If you use `x' you are guaranteed that you will not clobber
     an existing file.  This is equivalent to the `O_EXCL' option to
     the `open' function (*note Opening and Closing Files::).
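
     The effect of `x' can be seen by trying to create the same file
     twice, as in the sketch below.  The helper names and the scratch
     file name in `/tmp' are assumptions of this example; the error
     value for the refused creation is the one `open' reports for
     `O_EXCL'.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Create FILENAME for writing, but only if it does not exist yet.
   Returns 0 on success, or the `errno' value from `fopen'.  */
int
create_exclusively (const char *filename)
{
  FILE *fp = fopen (filename, "wx");

  if (fp == NULL)
    return errno;
  fclose (fp);
  return 0;
}

/* Attempt the same exclusive creation twice, storing the two results
   in *FIRST and *SECOND.  Returns 0 if the experiment could be run,
   -1 otherwise.  */
int
exclusive_demo (int *first, int *second)
{
  char path[] = "/tmp/excl-demo-XXXXXX";
  int fd = mkstemp (path);

  if (fd < 0)
    return -1;
  close (fd);
  unlink (path);                /* Keep only the unique name.  */

  *first = create_exclusively (path);   /* Absent: file is created.  */
  *second = create_exclusively (path);  /* Present: `fopen' fails.  */
  unlink (path);
  return 0;
}
```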

     The character `b' in OPENTYPE has a standard meaning; it requests
     a binary stream rather than a text stream.  But this makes no
     difference in POSIX systems (including the GNU system).  If both
     `+' and `b' are specified, they can appear in either order.  *Note
     Binary Streams::.

     If the OPENTYPE string contains the sequence `,ccs=STRING' then
     STRING is taken as the name of a coded character set and `fopen'
     will mark the stream as wide-oriented, with appropriate conversion
     functions in place to convert from and to the character set
     STRING.  Any other stream is opened initially unoriented and the
     orientation is decided with the first file operation.  If the
     first operation is a wide character operation, the stream is not
     only marked as wide-oriented; the conversion functions to convert
     to the coded character set used for the current locale are also
     loaded.  This will not change anymore from this point on even if
     the locale selected for the `LC_CTYPE' category is changed.
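
     The orientation of a stream can be queried with `fwide'; a mode
     argument of zero asks without changing anything.  The function
     name `orientation' in the following sketch is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Return 1 if FP is wide-oriented, -1 if it is byte-oriented, and 0
   if it has no orientation yet.  Calling `fwide' with a mode of zero
   only queries the orientation.  */
int
orientation (FILE *fp)
{
  int o = fwide (fp, 0);

  return (o > 0) - (o < 0);
}
```

     A freshly opened stream gives 0.  After `fputs' has been used on
     it the result is -1, and after `fputwc' has been used on another
     fresh stream the result for that stream is 1.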

     Any other characters in OPENTYPE are simply ignored.  They may be
     meaningful in other systems.

     If the open fails, `fopen' returns a null pointer.

     When the sources are compiled with `_FILE_OFFSET_BITS == 64' on a
     32 bit machine this function is in fact `fopen64' since the LFS
     interface transparently replaces the old interface.

   You can have multiple streams (or file descriptors) pointing to the
same file open at the same time.  If you do only input, this works
straightforwardly, but you must be careful if any output streams are
included.  *Note Stream/Descriptor Precautions::.  This is equally true
whether the streams are in one program (not usual) or in several
programs (which can easily happen).  It may be advantageous to use the
file locking facilities to avoid simultaneous access.  *Note File
Locks::.

 - Function: FILE * fopen64 (const char *FILENAME, const char *OPENTYPE)
     This function is similar to `fopen' but the stream it returns a
     pointer to is opened using `open64'.  Therefore this stream can be
     used even on files larger than 2^31 bytes on 32 bit machines.

     Please note that the return type is still `FILE *'.  There is no
     special `FILE' type for the LFS interface.

     If the sources are compiled with `_FILE_OFFSET_BITS == 64' on a 32
     bit machine this function is available under the name `fopen' and
     so transparently replaces the old interface.

 - Macro: int FOPEN_MAX
     The value of this macro is an integer constant expression that
     represents the minimum number of streams that the implementation
     guarantees can be open simultaneously.  You might be able to open
     more than this many streams, but that is not guaranteed.  The
     value of this constant is at least eight, which includes the three
     standard streams `stdin', `stdout', and `stderr'.  In POSIX.1
     systems this value is determined by the `OPEN_MAX' parameter;
     *note General Limits::.  In BSD and GNU, it is controlled by the
     `RLIMIT_NOFILE' resource limit; *note Limits on Resources::.

 - Function: FILE * freopen (const char *FILENAME, const char
          *OPENTYPE, FILE *STREAM)
     This function is like a combination of `fclose' and `fopen'.  It
     first closes the stream referred to by STREAM, ignoring any errors
     that are detected in the process.  (Because errors are ignored,
     you should not use `freopen' on an output stream if you have
     actually done any output using the stream.)  Then the file named by
     FILENAME is opened with mode OPENTYPE as for `fopen', and
     associated with the same stream object STREAM.

     If the operation fails, a null pointer is returned; otherwise,
     `freopen' returns STREAM.

     `freopen' has traditionally been used to connect a standard stream
     such as `stdin' with a file of your own choice.  This is useful in
     programs in which use of a standard stream for certain purposes is
     hard-coded.  In the GNU C library, you can simply close the
     standard streams and open new ones with `fopen'.  But other
     systems lack this ability, so using `freopen' is more portable.

     When the sources are compiled with `_FILE_OFFSET_BITS == 64' on a
     32 bit machine this function is in fact `freopen64' since the LFS
     interface transparently replaces the old interface.
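
     As a minimal sketch of the traditional use described above, the
     following helper reconnects an existing stream to a named file.
     The name `redirect_stream' and its return codes are assumptions
     made for illustration; they are not part of the library.  Pending
     output is flushed first because `freopen' ignores errors while
     closing the old stream.

```c
#include <stdio.h>

/* Hypothetical helper, not a library function.
   Reconnect STREAM to the file FILENAME, opened for writing.
   Returns 0 on success, -1 if pending output could not be flushed
   (STREAM is left untouched), and -2 if freopen failed (STREAM is
   then closed and must not be used again).  */
int
redirect_stream (FILE *stream, const char *filename)
{
  /* Report write errors now; freopen would silently drop them.  */
  if (fflush (stream) == EOF)
    return -1;

  if (freopen (filename, "w", stream) == NULL)
    return -2;

  return 0;
}
```

     A program with hard-coded output to `stdout' could call
     `redirect_stream (stdout, "log.txt")' once at startup.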

 - Function: FILE * freopen64 (const char *FILENAME, const char
          *OPENTYPE, FILE *STREAM)
     This function is similar to `freopen'.  The only difference is that
     on a 32 bit machine the stream returned is able to read beyond the
     2^31 byte limit imposed by the normal interface.  It should be
     noted that the stream pointed to by STREAM need not be opened
     using `fopen64' or `freopen64' since its mode is not important for
     this function.

     If the sources are compiled with `_FILE_OFFSET_BITS == 64' on a 32
     bit machine this function is available under the name `freopen'
     and so transparently replaces the old interface.

   In some situations it is useful to know whether a given stream is
available for reading or writing.  This information is normally not
available and would have to be remembered separately.  Solaris
introduced a few functions to get this information from the stream
descriptor and these functions are also available in the GNU C library.

 - Function: int __freadable (FILE *STREAM)
     The `__freadable' function determines whether the stream STREAM
     was opened to allow reading.  In this case the return value is
     nonzero.  For write-only streams the function returns zero.

     This function is declared in `stdio_ext.h'.

 - Function: int __fwritable (FILE *STREAM)
     The `__fwritable' function determines whether the stream STREAM
     was opened to allow writing.  In this case the return value is
     nonzero.  For read-only streams the function returns zero.

     This function is declared in `stdio_ext.h'.
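
   A short sketch of how these two functions might be combined is shown
here.  The helper `stream_access' and the two flag names are
hypothetical and exist only for this illustration; the code assumes a
system that provides `stdio_ext.h'.

```c
#include <stdio.h>
#include <stdio_ext.h>

/* Hypothetical flag values, used only by this example.  */
enum { STREAM_READABLE = 1, STREAM_WRITABLE = 2 };

/* Hypothetical helper: return a bit mask describing the directions
   in which STREAM was opened.  */
int
stream_access (FILE *stream)
{
  int mask = 0;

  if (__freadable (stream))
    mask |= STREAM_READABLE;
  if (__fwritable (stream))
    mask |= STREAM_WRITABLE;
  return mask;
}
```

   A function that receives a stream from its caller can use such a
check to reject a read-only stream before attempting any output.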

   For a slightly different kind of problem there are two more functions.
They provide even finer-grained information.

 - Function: int __freading (FILE *STREAM)
     The `__freading' function determines whether the stream STREAM was
     last read from or whether it is opened read-only.  In this case
     the return value is nonzero, otherwise it is zero.  Determining
     whether a stream opened for reading and writing was last used for
     writing allows one to draw conclusions about the content of the
     buffer, among other things.

     This function is declared in `stdio_ext.h'.

 - Function: int __fwriting (FILE *STREAM)
     The `__fwriting' function determines whether the stream STREAM was
     last written to or whether it is opened write-only.  In this case
     the return value is nonzero, otherwise it is zero.

     This function is declared in `stdio_ext.h'.
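
   The following sketch shows one way to ask how a stream opened for
both reading and writing was used most recently.  The enumeration and
the helper `stream_last_use' are hypothetical names chosen for this
example, and the code assumes a system that provides `stdio_ext.h'.

```c
#include <stdio.h>
#include <stdio_ext.h>

/* Hypothetical result type, used only by this example.  */
enum last_use { LAST_USE_UNKNOWN, LAST_USE_READ, LAST_USE_WRITE };

/* Hypothetical helper: report whether STREAM was last written to
   or last read from.  */
enum last_use
stream_last_use (FILE *stream)
{
  if (__fwriting (stream))
    return LAST_USE_WRITE;
  if (__freading (stream))
    return LAST_USE_READ;
  return LAST_USE_UNKNOWN;
}
```

   If the result is `LAST_USE_WRITE' the buffer may still hold output
that has not reached the file, which is the kind of conclusion about
the buffer content mentioned above.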


File: libc.info,  Node: Closing Streams,  Next: Streams and Threads,  Prev: Opening Streams,  Up: I/O on Streams

Closing Streams
===============

When a stream is closed with `fclose', the connection between the
stream and the file is canceled.  After you have closed a stream, you
cannot perform any additional operations on it.

 - Function: int fclose (FILE *STREAM)
     This function causes STREAM to be closed and the connection to the
     corresponding file to be broken.  Any buffered output is written
     and any buffered input is discarded.  The `fclose' function returns
     a value of `0' if the file was closed successfully, and `EOF' if
     an error was detected.

     It is important to check for errors when you call `fclose' to close
     an output stream, because real, everyday errors can be detected at
     this time.  For example, when `fclose' writes the remaining
     buffered output, it might get an error because the disk is full.
     Even if you know the buffer is empty, errors can still occur when
     closing a file if you are using NFS.

     The function `fclose' is declared in `stdio.h'.
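
     A minimal sketch of checking the result of `fclose' on an output
     stream follows.  The helper name `write_text_file' is an
     assumption made for this example and is not part of the library.

```c
#include <stdio.h>

/* Hypothetical helper: write TEXT to the file FILENAME.
   Returns 0 on success and -1 if opening, writing, or closing
   the file failed.  */
int
write_text_file (const char *filename, const char *text)
{
  FILE *fp = fopen (filename, "w");
  int status = 0;

  if (fp == NULL)
    return -1;

  if (fputs (text, fp) == EOF)
    status = -1;

  /* The remaining buffered output is written here, so a full disk
     may show up only as an error from fclose.  */
  if (fclose (fp) == EOF)
    status = -1;

  return status;
}
```

     Note that `fclose' is called even when `fputs' failed, so the
     stream is always released.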

   To close all streams currently available the GNU C Library provides
another function.

 - Function: int fcloseall (void)
     This function causes all open streams of the process to be closed
     and the connection to corresponding files to be broken.  All
     buffered data is written and any buffered input is discarded.  The
     `fcloseall' function returns a value of `0' if all the files were
     closed successfully, and `EOF' if an error was detected.

     This function should be used only in special situations, e.g.,
     when an error occurred and the program must be aborted.  Normally
     each single stream should be closed separately so that problems
     with individual streams can be identified.  It is also problematic
     since the standard streams (*note Standard Streams::) will also be
     closed.

     The function `fcloseall' is declared in `stdio.h'.

   If the `main' function to your program returns, or if you call the
`exit' function (*note Normal Termination::), all open streams are
automatically closed properly.  If your program terminates in any other
manner, such as by calling the `abort' function (*note Aborting a
Program::) or from a fatal signal (*note Signal Handling::), open
streams might not be closed properly.  Buffered output might not be
flushed and files may be incomplete.  For more information on buffering
of streams, see *Note Stream Buffering::.


File: libc.info,  Node: Streams and Threads,  Next: Streams and I18N,  Prev: Closing Streams,  Up: I/O on Streams

Streams and Threads
===================

Streams can be used in multi-threaded applications in the same way they
are used in single-threaded applications.  But the programmer must be
aware of the possible complications.  It is important to know about
these even if the program one writes never uses threads since the design
and implementation of many stream functions is heavily influenced by the
requirements added by multi-threaded programming.

   The POSIX standard requires that by default the stream operations are
atomic.  I.e., issuing two stream operations for the same stream in two
threads at the same time will cause the operations to be executed as if
they were issued sequentially.  The buffer operations performed while
reading or writing are protected from other uses of the same stream.  To
do this each stream has an internal lock object which has to be
(implicitly) acquired before any work can be done.

   But there are situations where this is not enough and there are also
situations where this is not wanted.  The implicit locking is not enough
if the program requires more than one stream function call to happen
atomically.  One example would be if an output line a program wants to
generate is created by several function calls.  The functions by
themselves would ensure only atomicity of their own operation, but not
atomicity over all the function calls.  For this it is necessary to
perform the stream locking in the application code.

 - Function: void flockfile (FILE *STREAM)
     The `flockfile' function acquires the internal locking object
     associated with the stream STREAM.  This ensures that no other
     thread can lock the stream, either explicitly through
     `flockfile'/`ftrylockfile' or implicitly through a call of a
     stream function.  The thread will block until the lock is
     acquired.  An explicit call to `funlockfile' has to be used to
     release the lock.

 - Function: int ftrylockfile (FILE *STREAM)
     The `ftrylockfile' function tries to acquire the internal locking
     object associated with the stream STREAM just like `flockfile'.
     But unlike `flockfile' this function does not block if the lock is
     not available.  `ftrylockfile' returns zero if the lock was
     successfully acquired.  Otherwise the stream is locked by another
     thread.

 - Function: void funlockfile (FILE *STREAM)
     The `funlockfile' function releases the internal locking object of
     the stream STREAM. The stream must have been locked before by a
     call to `flockfile' or a successful call of `ftrylockfile'.  The
     implicit locking performed by the stream operations does not count.
     The `funlockfile' function does not return an error status and the
     behavior of a call for a stream which is not locked by the current
     thread is undefined.

   The following example shows how the functions above can be used to
generate an output line atomically even in multi-threaded applications
(yes, the same job could be done with one `fprintf' call but it is
sometimes not possible):

     FILE *fp;
     {
        ...
        flockfile (fp);
        fputs ("This is test number ", fp);
        fprintf (fp, "%d\n", test);
        funlockfile (fp);
     }

   Without the explicit locking it would be possible for another thread
to use the stream FP after the `fputs' call returns and before `fprintf'
was called with the result that the number does not follow the word
`number'.

   From this description it might already be clear that the locking
objects in streams are no simple mutexes.  Since locking the same
stream twice in the same thread is allowed the locking objects must be
equivalent to recursive mutexes.  These mutexes keep track of the owner
and the number of times the lock is acquired.  The same number of
`funlockfile' calls by the same thread is necessary to unlock the
stream completely.  For instance:

     void
     foo (FILE *fp)
     {
       ftrylockfile (fp);
       fputs ("in foo\n", fp);
       /* This is very wrong!!!  */
       funlockfile (fp);
     }

   It is important here that the `funlockfile' function is only called
if the `ftrylockfile' function succeeded in locking the stream.  It is
therefore always wrong to ignore the result of `ftrylockfile'.  And it
makes no sense since otherwise one would use `flockfile'.  The result
of code like that above is that either `funlockfile' tries to free a
stream that hasn't been locked by the current thread or it frees the
stream prematurely.  The code should look like this:

     void
     foo (FILE *fp)
     {
       if (ftrylockfile (fp) == 0)
         {
           fputs ("in foo\n", fp);
           funlockfile (fp);
         }
     }
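
   The recursive nature of the stream lock described earlier can be
sketched as follows.  The functions `write_entry' and `write_pair' are
hypothetical names used only for this illustration.  Each
`flockfile' call is matched by exactly one `funlockfile' call in the
same thread.

```c
#include <stdio.h>

/* Hypothetical helper: write one "KEY=VALUE" line atomically.
   It takes the lock itself so it is safe when called on its own.  */
void
write_entry (FILE *fp, const char *key, int value)
{
  flockfile (fp);
  fputs (key, fp);
  fprintf (fp, "=%d\n", value);
  funlockfile (fp);
}

/* Hypothetical helper: write two entries as one atomic unit.
   The lock is acquired a second time inside write_entry, which is
   allowed because the same thread already owns it.  */
void
write_pair (FILE *fp, int x, int y)
{
  flockfile (fp);
  write_entry (fp, "x", x);
  write_entry (fp, "y", y);
  funlockfile (fp);   /* Only this call releases the stream.  */
}
```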

   Now that we have covered why this locking is necessary, we need to
talk about situations where locking is unwanted and what can be done.
The locking operations (explicit or implicit) don't come for
free.  Even if a lock is not taken the cost is not zero.  The operations
which have to be performed require memory operations that are safe in
multi-processor environments.  With the many local caches involved in
such systems this is quite costly.  So it is best to avoid the locking
completely if it is not needed - because the code in question is never
used in a context where two or more threads may use a stream at a time.
This can be determined most of the time for application code; for
library code which can be used in many contexts one should default to
being conservative and use locking.

   There are two basic mechanisms to avoid locking.  The first is to use
the `_unlocked' variants of the stream operations.  The POSIX standard
defines quite a few of those and the GNU library adds a few more.
These variants of the functions behave just like the functions with the
name without the suffix except that they do not lock the stream.  Using
these functions is very desirable since they are potentially much
faster.  This is not only because the locking operation itself is
avoided.  More importantly, functions like `putc' and `getc' are very
simple and traditionally (before the introduction of threads) were
implemented as macros which are very fast if the buffer is not empty.
With the addition of locking requirements these functions are no longer
implemented as macros since they would expand to too much code.
But these macros are still available with the same functionality under
the new names `putc_unlocked' and `getc_unlocked'.  This possibly huge
difference of speed also suggests the use of the `_unlocked' functions
even if locking is required.  The difference is that the locking then
has to be performed in the program:

     void
     foo (FILE *fp, char *buf)
     {
       flockfile (fp);
       while (*buf != '/')
         putc_unlocked (*buf++, fp);
       funlockfile (fp);
     }

   If in this example the `putc' function were used and the explicit
locking were missing, the `putc' function would have to acquire the
lock in every call, potentially many times depending on when
the loop terminates.  Writing it the way illustrated above allows the
`putc_unlocked' macro to be used which means no locking and direct
manipulation of the buffer of the stream.

   A second way to avoid locking is by using a non-standard function
which was introduced in Solaris and is available in the GNU C library
as well.

 - Function: int __fsetlocking (FILE *STREAM, int TYPE)
     The `__fsetlocking' function can be used to select whether the
     stream operations will implicitly acquire the locking object of the
     stream STREAM.  By default this is done but it can be disabled and
     reinstated using this function.  There are three values defined
     for the TYPE parameter.

    `FSETLOCKING_INTERNAL'
          The stream STREAM will from now on use the default internal
          locking.  Every stream operation with exception of the
          `_unlocked' variants will implicitly lock the stream.

    `FSETLOCKING_BYCALLER'
          After the `__fsetlocking' function returns the user is
          responsible for locking the stream.  None of the stream
          operations will implicitly do this anymore until the state is
          set back to `FSETLOCKING_INTERNAL'.

    `FSETLOCKING_QUERY'
          `__fsetlocking' only queries the current locking state of the
          stream.  The return value will be `FSETLOCKING_INTERNAL' or
          `FSETLOCKING_BYCALLER' depending on the state.

     The return value of `__fsetlocking' is either
     `FSETLOCKING_INTERNAL' or `FSETLOCKING_BYCALLER' depending on the
     state of the stream before the call.

     This function and the values for the TYPE parameter are declared
     in `stdio_ext.h'.

   This function is especially useful when program code has to be used
which is written without knowledge about the `_unlocked' functions (or
if the programmer was too lazy to use them).
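
   As a sketch of this use, the following helper switches the implicit
locking off around a loop of ordinary `putc' calls and then restores
the previous state.  The name `write_without_locking' is hypothetical,
and the code is only safe under the assumption that no other thread
uses the stream in the meantime.

```c
#include <stdio.h>
#include <stdio_ext.h>

/* Hypothetical helper: write the LEN bytes at DATA to FP with the
   implicit locking disabled.  Returns the number of bytes written.
   Assumes no other thread uses FP during the call.  */
size_t
write_without_locking (FILE *fp, const char *data, size_t len)
{
  /* The return value is the locking state before the call.  */
  int old_state = __fsetlocking (fp, FSETLOCKING_BYCALLER);
  size_t i;

  for (i = 0; i < len; ++i)
    if (putc (data[i], fp) == EOF)
      break;

  /* Put the stream back into the state we found it in.  */
  __fsetlocking (fp, old_state);
  return i;
}
```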


File: libc.info,  Node: Streams and I18N,  Next: Simple Output,  Prev: Streams and Threads,  Up: I/O on Streams

Streams in Internationalized Applications
=========================================

ISO C90 introduced the new type `wchar_t' to allow handling larger
character sets.  What was missing was a possibility to output strings
of `wchar_t' directly.  One had to convert them into multibyte strings
using `wcstombs' (there was no `wcsrtombs' yet) and then use the normal
stream functions.  While this is doable it is very cumbersome since
performing the conversions is not trivial and greatly increases program
complexity and size.

   The Unix standard early on (I think in XPG4.2) introduced two
additional format specifiers for the `printf' and `scanf' families of
functions.  Printing and reading of single wide characters was made
possible using the `%C' specifier and wide character strings can be
handled with `%S'.  These specifiers behave just like `%c' and `%s'
except that they expect the corresponding argument to have the wide
character type and that the wide character and string are transformed
into/from multibyte strings before being used.

   This was a beginning but it is still not good enough.  First, it is
not always desirable to use `printf' and `scanf'; the other, smaller
and faster functions cannot handle wide characters.  Second, it is not
possible to have a format string for `printf' and `scanf' consisting of
wide characters.  The result is that format strings would have to be
generated if they have to contain non-basic characters.

   In the Amendment 1 to ISO C90 a whole new set of functions was added
to solve the problem.  Most of the stream functions got a counterpart
which takes a wide character or wide character string instead of a
character or string respectively.  The new functions operate on the
same streams (like `stdout').  This is different from the model of the
C++ runtime library where separate streams for wide and normal I/O are
used.

   Being able to use the same stream for wide and normal operations
comes with a restriction: a stream can be used either for wide
operations or for normal operations.  Once it is decided there is no
way back.  Only a call to `freopen' or `freopen64' can reset the
"orientation".  The orientation can be decided in three ways:

   * If any of the normal character functions is used (this includes the
     `fread' and `fwrite' functions) the stream is marked as not wide
     oriented.

   * If any of the wide character functions is used the stream is
     marked as wide oriented.

   * The `fwide' function can be used to set the orientation either way.

   It is important to never mix the use of wide and not wide operations
on a stream.  There are no diagnostics issued.  The application behavior
will simply be strange or the application will simply crash.  The
`fwide' function can help avoid this.

 - Function: int fwide (FILE *STREAM, int MODE)
     The `fwide' function can be used to set and query the state of the
     orientation of the stream STREAM.  If the MODE parameter has a
     positive value the stream becomes wide oriented, for negative values
     narrow oriented.  It is not possible to overwrite previous
     orientations with `fwide'.  I.e., if the stream STREAM was already
     oriented before the call nothing is done.

     If MODE is zero the current orientation state is queried and
     nothing is changed.

     The `fwide' function returns a negative value, zero, or a positive
     value if the stream is narrow, not at all, or wide oriented
     respectively.

     This function was introduced in Amendment 1 to ISO C90 and is
     declared in `wchar.h'.
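
     The rule that an existing orientation cannot be overwritten can
     be sketched as follows.  The helper `make_wide' is a hypothetical
     name used only for this example.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: try to make STREAM wide oriented.
   Returns 1 if the stream is wide oriented afterwards and 0 if it
   was already narrow oriented and therefore could not be changed.  */
int
make_wide (FILE *stream)
{
  /* fwide reports the orientation in effect after the call.  */
  return fwide (stream, 1) > 0;
}
```

     Calling `make_wide' on a freshly opened stream succeeds, while
     calling it after a narrow function such as `fputs' has been used
     on the stream returns 0.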

   It is generally a good idea to orient a stream as early as possible.
This can prevent surprise especially for the standard streams `stdin',
`stdout', and `stderr'.  If some library function in some situations
uses one of these streams and this use orients the stream in a
different way than the rest of the application expects, one might end up
with hard to reproduce errors.  Remember that no errors are signaled if
the streams are used incorrectly.  Leaving a stream unoriented after
creation is normally only necessary for library functions which create
streams which can be used in different contexts.

   When writing code which uses streams and which can be used in
different contexts it is important to query the orientation of the
stream before using it (unless the rules of the library interface
demand a specific orientation).  The following little, silly function
illustrates this.

     void
     print_f (FILE *fp)
     {
       if (fwide (fp, 0) > 0)
         /* Positive return value means wide orientation.  */
         fputwc (L'f', fp);
       else
         fputc ('f', fp);
     }

   Note that in this case the function `print_f' decides about the
orientation of the stream if it was unoriented before (will not happen
if the advice above is followed).

   The encoding used for the `wchar_t' values is unspecified and the
user must not make any assumptions about it.  For I/O of `wchar_t'
values this means that it is impossible to write these values directly
to the stream.  This is not what follows from the ISO C locale model
either.  What happens instead is that the bytes read from or written to
the underlying media are first converted into the internal encoding
chosen by the implementation for `wchar_t'.  The external encoding is
determined by the `LC_CTYPE' category of the current locale or by the
`ccs' part of the mode specification given to `fopen', `fopen64',
`freopen', or `freopen64'.  How and when the conversion happens is
unspecified and it happens invisibly to the user.

   Since a stream is created in the unoriented state it has at that
point no conversion associated with it.  The conversion which will be
used is determined by the `LC_CTYPE' category selected at the time the
stream is oriented.  If the locales are changed at runtime this
might produce surprising results unless one pays attention.  This is
just another good reason to orient the stream explicitly as soon as
possible, perhaps with a call to `fwide'.


File: libc.info,  Node: Simple Output,  Next: Character Input,  Prev: Streams and I18N,  Up: I/O on Streams

Simple Output by Characters or Lines
====================================

This section describes functions for performing character- and
line-oriented output.

   The narrow stream functions are declared in the header file
`stdio.h' and the wide stream functions in `wchar.h'.

 - Function: int fputc (int C, FILE *STREAM)
     The `fputc' function converts the character C to type `unsigned
     char', and writes it to the stream STREAM.  `EOF' is returned if a
     write error occurs; otherwise the character C is returned.

 - Function: wint_t fputwc (wchar_t WC, FILE *STREAM)
     The `fputwc' function writes the wide character WC to the stream
     STREAM.  `WEOF' is returned if a write error occurs; otherwise the
     character WC is returned.

 - Function: int fputc_unlocked (int C, FILE *STREAM)
     The `fputc_unlocked' function is equivalent to the `fputc'
     function except that it does not implicitly lock the stream.

 - Function: wint_t fputwc_unlocked (wchar_t WC, FILE *STREAM)
     The `fputwc_unlocked' function is equivalent to the `fputwc'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: int putc (int C, FILE *STREAM)
     This is just like `fputc', except that most systems implement it as
     a macro, making it faster.  One consequence is that it may
     evaluate the STREAM argument more than once, which is an exception
     to the general rule for macros.  `putc' is usually the best
     function to use for writing a single character.

 - Function: wint_t putwc (wchar_t WC, FILE *STREAM)
     This is just like `fputwc', except that it can be implemented as a
     macro, making it faster.  One consequence is that it may evaluate
     the STREAM argument more than once, which is an exception to the
     general rule for macros.  `putwc' is usually the best function to
     use for writing a single wide character.

 - Function: int putc_unlocked (int C, FILE *STREAM)
     The `putc_unlocked' function is equivalent to the `putc' function
     except that it does not implicitly lock the stream.

 - Function: wint_t putwc_unlocked (wchar_t WC, FILE *STREAM)
     The `putwc_unlocked' function is equivalent to the `putwc'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: int putchar (int C)
     The `putchar' function is equivalent to `putc' with `stdout' as
     the value of the STREAM argument.

 - Function: wint_t putwchar (wchar_t WC)
     The `putwchar' function is equivalent to `putwc' with `stdout' as
     the value of the STREAM argument.

 - Function: int putchar_unlocked (int C)
     The `putchar_unlocked' function is equivalent to the `putchar'
     function except that it does not implicitly lock the stream.

 - Function: wint_t putwchar_unlocked (wchar_t WC)
     The `putwchar_unlocked' function is equivalent to the `putwchar'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: int fputs (const char *S, FILE *STREAM)
     The function `fputs' writes the string S to the stream STREAM.
     The terminating null character is not written.  This function does
     _not_ add a newline character, either.  It outputs only the
     characters in the string.

     This function returns `EOF' if a write error occurs, and otherwise
     a non-negative value.

     For example:

          fputs ("Are ", stdout);
          fputs ("you ", stdout);
          fputs ("hungry?\n", stdout);

     outputs the text `Are you hungry?' followed by a newline.

 - Function: int fputws (const wchar_t *WS, FILE *STREAM)
     The function `fputws' writes the wide character string WS to the
     stream STREAM.  The terminating null character is not written.
     This function does _not_ add a newline character, either.  It
     outputs only the characters in the string.

     This function returns `WEOF' if a write error occurs, and otherwise
     a non-negative value.

 - Function: int fputs_unlocked (const char *S, FILE *STREAM)
     The `fputs_unlocked' function is equivalent to the `fputs'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: int fputws_unlocked (const wchar_t *WS, FILE *STREAM)
     The `fputws_unlocked' function is equivalent to the `fputws'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: int puts (const char *S)
     The `puts' function writes the string S to the stream `stdout'
     followed by a newline.  The terminating null character of the
     string is not written.  (Note that `fputs' does _not_ write a
     newline as this function does.)

     `puts' is the most convenient function for printing simple
     messages.  For example:

          puts ("This is a message.");

     outputs the text `This is a message.' followed by a newline.

 - Function: int putw (int W, FILE *STREAM)
     This function writes the word W (that is, an `int') to STREAM.  It
     is provided for compatibility with SVID, but we recommend you use
     `fwrite' instead (*note Block Input/Output::).


File: libc.info,  Node: Character Input,  Next: Line Input,  Prev: Simple Output,  Up: I/O on Streams

Character Input
===============

This section describes functions for performing character-oriented
input.  The narrow stream functions are declared in the header file
`stdio.h' and the wide character functions are declared in `wchar.h'.

   These functions return an `int' or `wint_t' value (for narrow and
wide stream functions respectively) that is either a character of
input, or the special value `EOF'/`WEOF' (usually -1).  For the narrow
stream functions it is important to store the result of these functions
in a variable of type `int' instead of `char', even when you plan to
use it only as a character.  Storing `EOF' in a `char' variable
truncates its value to the size of a character, so that it is no longer
distinguishable from the valid character `(char) -1'.  So always use an
`int' for the result of `getc' and friends, and check for `EOF' after
the call; once you've verified that the result is not `EOF', you can be
sure that it will fit in a `char' variable without loss of information.
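
   The rule about using an `int' for the result can be illustrated with
a short sketch.  The helper `count_bytes' is a hypothetical name used
only here.  With a `char' variable the loop could stop early at a byte
with value 255 or, where `char' is unsigned, never stop at all.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining in STREAM.  */
long
count_bytes (FILE *stream)
{
  long count = 0;
  int c;   /* Must be int, not char, so EOF stays distinguishable.  */

  while ((c = getc (stream)) != EOF)
    ++count;
  return count;
}
```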

 - Function: int fgetc (FILE *STREAM)
     This function reads the next character as an `unsigned char' from
     the stream STREAM and returns its value, converted to an `int'.
     If an end-of-file condition or read error occurs, `EOF' is
     returned instead.

 - Function: wint_t fgetwc (FILE *STREAM)
     This function reads the next wide character from the stream STREAM
     and returns its value.  If an end-of-file condition or read error
     occurs, `WEOF' is returned instead.

 - Function: int fgetc_unlocked (FILE *STREAM)
     The `fgetc_unlocked' function is equivalent to the `fgetc'
     function except that it does not implicitly lock the stream.

 - Function: wint_t fgetwc_unlocked (FILE *STREAM)
     The `fgetwc_unlocked' function is equivalent to the `fgetwc'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: int getc (FILE *STREAM)
     This is just like `fgetc', except that it is permissible (and
     typical) for it to be implemented as a macro that evaluates the
     STREAM argument more than once.  `getc' is often highly optimized,
     so it is usually the best function to use to read a single
     character.

 - Function: wint_t getwc (FILE *STREAM)
     This is just like `fgetwc', except that it is permissible for it to
     be implemented as a macro that evaluates the STREAM argument more
     than once.  `getwc' can be highly optimized, so it is usually the
     best function to use to read a single wide character.

 - Function: int getc_unlocked (FILE *STREAM)
     The `getc_unlocked' function is equivalent to the `getc' function
     except that it does not implicitly lock the stream.

 - Function: wint_t getwc_unlocked (FILE *STREAM)
     The `getwc_unlocked' function is equivalent to the `getwc'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: int getchar (void)
     The `getchar' function is equivalent to `getc' with `stdin' as the
     value of the STREAM argument.

 - Function: wint_t getwchar (void)
     The `getwchar' function is equivalent to `getwc' with `stdin' as
     the value of the STREAM argument.

 - Function: int getchar_unlocked (void)
     The `getchar_unlocked' function is equivalent to the `getchar'
     function except that it does not implicitly lock the stream.

 - Function: wint_t getwchar_unlocked (void)
     The `getwchar_unlocked' function is equivalent to the `getwchar'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

   Here is an example of a function that does input using `fgetc'.  It
would work just as well using `getc' instead, or using `getchar ()'
instead of `fgetc (stdin)'.  The code would also work the same for the
wide character stream functions.

     int
     y_or_n_p (const char *question)
     {
       fputs (question, stdout);
       while (1)
         {
           int c, answer;
           /* Write a space to separate answer from question. */
           fputc (' ', stdout);
           /* Read the first character of the line.
              This should be the answer character, but might not be. */
           c = tolower (fgetc (stdin));
           answer = c;
           /* Discard rest of input line. */
           while (c != '\n' && c != EOF)
             c = fgetc (stdin);
           /* Obey the answer if it was valid. */
           if (answer == 'y')
             return 1;
           if (answer == 'n')
             return 0;
           /* Answer was invalid: ask for valid answer. */
           fputs ("Please answer y or n:", stdout);
         }
     }

 - Function: int getw (FILE *STREAM)
     This function reads a word (that is, an `int') from STREAM.  It's
     provided for compatibility with SVID.  We recommend you use
     `fread' instead (*note Block Input/Output::).  Unlike `getc', any
     `int' value could be a valid result.  `getw' returns `EOF' when it
     encounters end-of-file or an error, but there is no way to
     distinguish this from an input word with value -1.


File: libc.info,  Node: Line Input,  Next: Unreading,  Prev: Character Input,  Up: I/O on Streams

Line-Oriented Input
===================

Since many programs interpret input on the basis of lines, it is
convenient to have functions to read a line of text from a stream.

   Standard C has functions to do this, but they aren't very safe: null
characters and even (for `gets') long lines can confuse them.  So the
GNU library provides the nonstandard `getline' function that makes it
easy to read lines reliably.

   Another GNU extension, `getdelim', generalizes `getline'.  It reads
a delimited record, defined as everything through the next occurrence
of a specified delimiter character.

   All these functions are declared in `stdio.h'.

 - Function: ssize_t getline (char **LINEPTR, size_t *N, FILE *STREAM)
     This function reads an entire line from STREAM, storing the text
     (including the newline and a terminating null character) in a
     buffer and storing the buffer address in `*LINEPTR'.

     Before calling `getline', you should place in `*LINEPTR' the
     address of a buffer `*N' bytes long, allocated with `malloc'.  If
     this buffer is long enough to hold the line, `getline' stores the
     line in this buffer.  Otherwise, `getline' makes the buffer bigger
     using `realloc', storing the new buffer address back in `*LINEPTR'
     and the increased size back in `*N'.  *Note Unconstrained
     Allocation::.

     If you set `*LINEPTR' to a null pointer, and `*N' to zero, before
     the call, then `getline' allocates the initial buffer for you by
     calling `malloc'.

     In either case, when `getline' returns, `*LINEPTR' is a `char *'
     which points to the text of the line.

     When `getline' is successful, it returns the number of characters
     read (including the newline, but not including the terminating
     null).  This value enables you to distinguish null characters that
     are part of the line from the null character inserted as a
     terminator.

     This function is a GNU extension, but it is the recommended way to
     read lines from a stream.  The alternative standard functions are
     unreliable.

     If an error occurs or end of file is reached without any bytes
     read, `getline' returns `-1'.
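
     A typical read loop might look like the following sketch.  The
     function name `longest_line' is an illustrative assumption; the
     `_GNU_SOURCE' definition is included only on the assumption that
     some systems need it to declare `getline'.

```c
#define _GNU_SOURCE   /* Assumed to be needed on some systems.  */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Return the length of the longest line in STREAM, counting the
   newline, and store the number of lines read in *COUNT.  */
size_t
longest_line (FILE *stream, size_t *count)
{
  char *line = NULL;    /* Let `getline' allocate the buffer.  */
  size_t size = 0;
  size_t longest = 0;
  ssize_t len;

  *count = 0;
  while ((len = getline (&line, &size, stream)) != -1)
    {
      ++*count;
      if ((size_t) len > longest)
        longest = (size_t) len;
    }
  free (line);          /* The buffer belongs to the caller.  */
  return longest;
}
```

     The same buffer is reused for every line and freed once at the
     end, which is the usual pattern with `getline'.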

 - Function: ssize_t getdelim (char **LINEPTR, size_t *N, int
          DELIMITER, FILE *STREAM)
     This function is like `getline' except that the character which
     tells it to stop reading is not necessarily newline.  The argument
     DELIMITER specifies the delimiter character; `getdelim' keeps
     reading until it sees that character (or end of file).

     The text is stored in the buffer at `*LINEPTR', including the
     delimiter character and a terminating null.  Like `getline',
     `getdelim' enlarges that buffer if it isn't big enough.

     `getline' is in fact implemented in terms of `getdelim', just like
     this:

          ssize_t
          getline (char **lineptr, size_t *n, FILE *stream)
          {
            return getdelim (lineptr, n, '\n', stream);
          }
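
     As a further sketch, `getdelim' can split input on any
     character.  The helper name `count_records' below is
     hypothetical, and `_GNU_SOURCE' is defined on the assumption
     that some systems require it for the declaration.

```c
#define _GNU_SOURCE   /* Assumed to be needed on some systems.  */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count the records in STREAM that are terminated by DELIMITER.
   The last record may end at end of file instead.  */
size_t
count_records (FILE *stream, int delimiter)
{
  char *record = NULL;
  size_t size = 0;
  size_t count = 0;

  while (getdelim (&record, &size, delimiter, stream) != -1)
    ++count;
  free (record);
  return count;
}
```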

 - Function: char * fgets (char *S, int COUNT, FILE *STREAM)
     The `fgets' function reads characters from the stream STREAM up to
     and including a newline character and stores them in the string S,
     adding a null character to mark the end of the string.  You must
     supply COUNT characters worth of space in S, but the number of
     characters read is at most COUNT - 1.  The extra character space
     is used to hold the null character at the end of the string.

     If the system is already at end of file when you call `fgets', then
     the contents of the array S are unchanged and a null pointer is
     returned.  A null pointer is also returned if a read error occurs.
     Otherwise, the return value is the pointer S.

     *Warning:*  If the input data has a null character, you can't tell.
     So don't use `fgets' unless you know the data cannot contain a
     null.  Don't use it to read files edited by the user because, if
     the user inserts a null character, you should either handle it
     properly or print a clear error message.  We recommend using
     `getline' instead of `fgets'.

 - Function: wchar_t * fgetws (wchar_t *WS, int COUNT, FILE *STREAM)
     The `fgetws' function reads wide characters from the stream STREAM
     up to and including a newline character and stores them in the
     string WS, adding a null wide character to mark the end of the
     string.  You must supply COUNT wide characters worth of space in
     WS, but the number of characters read is at most COUNT - 1.  The
     extra character space is used to hold the null wide character at
     the end of the string.

     If the system is already at end of file when you call `fgetws',
     then the contents of the array WS are unchanged and a null pointer
     is returned.  A null pointer is also returned if a read error
     occurs.  Otherwise, the return value is the pointer WS.

     *Warning:* If the input data contains a null wide character (which
     appears as null bytes in the input stream), you can't tell.  So
     don't use `fgetws' unless you know the data cannot contain a null.
     Don't use it to read files edited by the user because, if the user
     inserts a null character, you should either handle it properly or
     print a clear error message.

 - Function: char * fgets_unlocked (char *S, int COUNT, FILE *STREAM)
     The `fgets_unlocked' function is equivalent to the `fgets'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: wchar_t * fgetws_unlocked (wchar_t *WS, int COUNT, FILE
          *STREAM)
     The `fgetws_unlocked' function is equivalent to the `fgetws'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Deprecated function: char * gets (char *S)
     The function `gets' reads characters from the stream `stdin' up to
     the next newline character, and stores them in the string S.  The
     newline character is discarded (note that this differs from the
     behavior of `fgets', which copies the newline character into the
     string).  If `gets' encounters a read error or end-of-file, it
     returns a null pointer; otherwise it returns S.

     *Warning:* The `gets' function is *very dangerous* because it
     provides no protection against overflowing the string S.  The GNU
     library includes it for compatibility only.  You should *always*
     use `fgets' or `getline' instead.  To remind you of this, the
     linker (if using GNU `ld') will issue a warning whenever you use
     `gets'.


File: libc.info,  Node: Unreading,  Next: Block Input/Output,  Prev: Line Input,  Up: I/O on Streams

Unreading
=========

In parser programs it is often useful to examine the next character in
the input stream without removing it from the stream.  This is called
"peeking ahead" at the input because your program gets a glimpse of the
input it will read next.

   Using stream I/O, you can peek ahead at input by first reading it and
then "unreading" it (also called "pushing it back" on the stream).
Unreading a character makes it available to be input again from the
stream, by the next call to `fgetc' or other input function on that
stream.

* Menu:

* Unreading Idea::              An explanation of unreading with pictures.
* How Unread::                  How to call `ungetc' to do unreading.


File: libc.info,  Node: Unreading Idea,  Next: How Unread,  Up: Unreading

What Unreading Means
--------------------

Here is a pictorial explanation of unreading.  Suppose you have a
stream reading a file that contains just six characters, the letters
`foobar'.  Suppose you have read three characters so far.  The
situation looks like this:

     f  o  o  b  a  r
              ^

so the next input character will be `b'.

   If instead of reading `b' you unread the letter `o', you get a
situation like this:

     f  o  o  b  a  r
              |
           o--
           ^

so that the next input characters will be `o' and `b'.

   If you unread `9' instead of `o', you get this situation:

     f  o  o  b  a  r
              |
           9--
           ^

so that the next input characters will be `9' and `b'.


File: libc.info,  Node: How Unread,  Prev: Unreading Idea,  Up: Unreading

Using `ungetc' To Do Unreading
------------------------------

The function to unread a character is called `ungetc', because it
reverses the action of `getc'.

 - Function: int ungetc (int C, FILE *STREAM)
     The `ungetc' function pushes back the character C onto the input
     stream STREAM.  So the next input from STREAM will read C before
     anything else.

     If C is `EOF', `ungetc' does nothing and just returns `EOF'.  This
     lets you call `ungetc' with the return value of `getc' without
     needing to check for an error from `getc'.

     The character that you push back doesn't have to be the same as
     the last character that was actually read from the stream.  In
     fact, it isn't necessary to actually read any characters from the
     stream before unreading them with `ungetc'!  But that is a strange
     way to write a program; usually `ungetc' is used only to unread a
     character that was just read from the same stream.  The GNU C
     library supports this even on files opened in binary mode, but
     other systems might not.

     The GNU C library only supports one character of pushback--in other
     words, it does not work to call `ungetc' twice without doing input
     in between.  Other systems might let you push back multiple
     characters; then reading from the stream retrieves the characters
     in the reverse order that they were pushed.

     Pushing back characters doesn't alter the file; only the internal
     buffering for the stream is affected.  If a file positioning
     function (such as `fseek', `fseeko' or `rewind'; *note File
     Positioning::) is called, any pending pushed-back characters are
     discarded.

     Unreading a character on a stream that is at end of file clears the
     end-of-file indicator for the stream, because it makes the
     character of input available.  After you read that character,
     trying to read again will encounter end of file.

 - Function: wint_t ungetwc (wint_t WC, FILE *STREAM)
     The `ungetwc' function behaves just like `ungetc' except that it
     pushes back a wide character.

   Here is an example showing the use of `getc' and `ungetc' to skip
over whitespace characters.  When this function reaches a
non-whitespace character, it unreads that character to be seen again on
the next read operation on the stream.

     #include <stdio.h>
     #include <ctype.h>
     
     void
     skip_whitespace (FILE *stream)
     {
       int c;
       do
         /* No need to check for `EOF' because it is not
            `isspace', and `ungetc' ignores `EOF'.  */
         c = getc (stream);
       while (isspace (c));
       ungetc (c, stream);
     }


File: libc.info,  Node: Block Input/Output,  Next: Formatted Output,  Prev: Unreading,  Up: I/O on Streams

Block Input/Output
==================

This section describes how to do input and output operations on blocks
of data.  You can use these functions to read and write binary data, as
well as to read and write text in fixed-size blocks instead of by
characters or lines.

   Binary files are typically used to read and write blocks of data in
the same format as is used to represent the data in a running program.
In other words, arbitrary blocks of memory--not just character or string
objects--can be written to a binary file, and meaningfully read in
again by the same program.

   Storing data in binary form is often considerably more efficient than
using the formatted I/O functions.  Also, for floating-point numbers,
the binary form avoids possible loss of precision in the conversion
process.  On the other hand, binary files can't be examined or modified
easily using many standard file utilities (such as text editors), and
are not portable between different implementations of the language, or
different kinds of computers.

   These functions are declared in `stdio.h'.

 - Function: size_t fread (void *DATA, size_t SIZE, size_t COUNT, FILE
          *STREAM)
     This function reads up to COUNT objects of size SIZE into the
     array DATA, from the stream STREAM.  It returns the number of
     objects actually read, which might be less than COUNT if a read
     error occurs or the end of the file is reached.  This function
     returns a value of zero (and doesn't read anything) if either SIZE
     or COUNT is zero.

     If `fread' encounters end of file in the middle of an object, it
     returns the number of complete objects read, and discards the
     partial object.  Therefore, the stream remains at the actual end
     of the file.

 - Function: size_t fread_unlocked (void *DATA, size_t SIZE, size_t
          COUNT, FILE *STREAM)
     The `fread_unlocked' function is equivalent to the `fread'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

 - Function: size_t fwrite (const void *DATA, size_t SIZE, size_t
          COUNT, FILE *STREAM)
     This function writes up to COUNT objects of size SIZE from the
     array DATA, to the stream STREAM.  The return value is normally
     COUNT, if the call succeeds.  Any other value indicates some sort
     of error, such as running out of space.

 - Function: size_t fwrite_unlocked (const void *DATA, size_t SIZE,
          size_t COUNT, FILE *STREAM)
     The `fwrite_unlocked' function is equivalent to the `fwrite'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.
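
   As a minimal sketch of block I/O, the following helpers write an
array of structures and read it back.  The type `struct point' and
the names `save_points' and `load_points' are illustrative
assumptions.  As noted above, the resulting file is only meaningful
to a program using the same data layout.

```c
#include <stdio.h>

struct point { double x, y; };

/* Write COUNT points to STREAM.  Returns 0 on success, -1 if
   `fwrite' wrote fewer objects than requested.  */
int
save_points (FILE *stream, const struct point *pts, size_t count)
{
  return fwrite (pts, sizeof *pts, count, stream) == count ? 0 : -1;
}

/* Read up to MAX points from STREAM; return the number of complete
   points read.  A short count means end of file or an error; use
   `feof' and `ferror' to tell them apart.  */
size_t
load_points (FILE *stream, struct point *pts, size_t max)
{
  return fread (pts, sizeof *pts, max, stream);
}
```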


File: libc.info,  Node: Formatted Output,  Next: Customizing Printf,  Prev: Block Input/Output,  Up: I/O on Streams

Formatted Output
================

The functions described in this section (`printf' and related
functions) provide a convenient way to perform formatted output.  You
call `printf' with a "format string" or "template string" that
specifies how to format the values of the remaining arguments.

   Unless your program is a filter that specifically performs line- or
character-oriented processing, using `printf' or one of the other
related functions described in this section is usually the easiest and
most concise way to perform output.  These functions are especially
useful for printing error messages, tables of data, and the like.

* Menu:

* Formatted Output Basics::     Some examples to get you started.
* Output Conversion Syntax::    General syntax of conversion
                                 specifications.
* Table of Output Conversions:: Summary of output conversions and
                                 what they do.
* Integer Conversions::         Details about formatting of integers.
* Floating-Point Conversions::  Details about formatting of
                                 floating-point numbers.
* Other Output Conversions::    Details about formatting of strings,
                                 characters, pointers, and the like.
* Formatted Output Functions::  Descriptions of the actual functions.
* Dynamic Output::              Functions that allocate memory for the output.
* Variable Arguments Output::   `vprintf' and friends.
* Parsing a Template String::   What kinds of args does a given template
                                 call for?
* Example of Parsing::          Sample program using `parse_printf_format'.


File: libc.info,  Node: Formatted Output Basics,  Next: Output Conversion Syntax,  Up: Formatted Output

Formatted Output Basics
-----------------------

The `printf' function can be used to print any number of arguments.
The template string argument you supply in a call provides information
not only about the number of additional arguments, but also about their
types and what style should be used for printing them.

   Ordinary characters in the template string are simply written to the
output stream as-is, while "conversion specifications" introduced by a
`%' character in the template cause subsequent arguments to be
formatted and written to the output stream.  For example,

     int pct = 37;
     char filename[] = "foo.txt";
     printf ("Processing of `%s' is %d%% finished.\nPlease be patient.\n",
             filename, pct);

produces output like

     Processing of `foo.txt' is 37% finished.
     Please be patient.

   This example shows the use of the `%d' conversion to specify that an
`int' argument should be printed in decimal notation, the `%s'
conversion to specify printing of a string argument, and the `%%'
conversion to print a literal `%' character.

   There are also conversions for printing an integer argument as an
unsigned value in octal, decimal, or hexadecimal radix (`%o', `%u', or
`%x', respectively); or as a character value (`%c').

   Floating-point numbers can be printed in normal, fixed-point notation
using the `%f' conversion or in exponential notation using the `%e'
conversion.  The `%g' conversion uses either `%e' or `%f' format,
depending on what is more appropriate for the magnitude of the
particular number.

   You can control formatting more precisely by writing "modifiers"
between the `%' and the character that indicates which conversion to
apply.  These slightly alter the ordinary behavior of the conversion.
For example, most conversion specifications permit you to specify a
minimum field width and a flag indicating whether you want the result
left- or right-justified within the field.

   The specific flags and modifiers that are permitted and their
interpretation vary depending on the particular conversion.  They're all
described in more detail in the following sections.  Don't worry if this
all seems excessively complicated at first; you can almost always get
reasonable free-format output without using any of the modifiers at all.
The modifiers are mostly used to make the output look "prettier" in
tables.
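
   To make the radix conversions and the field-width modifier
concrete, here is a small sketch.  It uses `snprintf' so that the
result can be inspected as a string; the helper name
`format_radices' is hypothetical, and the `%c' result assumes an
ASCII-compatible character set.

```c
#include <stdio.h>

/* Format VALUE as unsigned decimal, octal, hexadecimal, and as a
   character, then right- and left-justified in 5-column fields.
   Returns what `snprintf' returns.  */
int
format_radices (char *buf, size_t size, unsigned int value)
{
  return snprintf (buf, size, "%u %o %x %c|%5u|%-5u|",
                   value, value, value, (int) value, value, value);
}
```

   For the value 65 this should produce `65 101 41 A|   65|65   |'.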


File: libc.info,  Node: Output Conversion Syntax,  Next: Table of Output Conversions,  Prev: Formatted Output Basics,  Up: Formatted Output

Output Conversion Syntax
------------------------

This section provides details about the precise syntax of conversion
specifications that can appear in a `printf' template string.

   Characters in the template string that are not part of a conversion
specification are printed as-is to the output stream.  Multibyte
character sequences (*note Character Set Handling::) are permitted in a
template string.

   The conversion specifications in a `printf' template string have the
general form:

     % [ PARAM-NO $] FLAGS WIDTH [ . PRECISION ] TYPE CONVERSION

or

     % [ PARAM-NO $] FLAGS WIDTH . * [ PARAM-NO $] TYPE CONVERSION

   For example, in the conversion specification `%-10.8ld', the `-' is
a flag, `10' specifies the field width, the precision is `8', the letter
`l' is a type modifier, and `d' specifies the conversion style.  (This
particular specification says to print a `long int' argument in
decimal notation, with a minimum of 8 digits left-justified in a field
at least 10 characters wide.)

   In more detail, output conversion specifications consist of an
initial `%' character followed in sequence by:

   * An optional specification of the parameter used for this format.
     Normally the parameters to the `printf' function are assigned to
     the formats in the order of appearance in the format string.  But
     in some situations (such as message translation) this is not
     desirable and this extension allows an explicit parameter to be
     specified.

     The PARAM-NO parts of the format must be integers in the range of
     1 to the maximum number of arguments passed to the function call.
     Some implementations limit this number to a certain upper bound.
     The exact limit can be retrieved by the following constant.

      - Macro: NL_ARGMAX
          The value of `NL_ARGMAX' is the maximum value allowed for the
          specification of a positional parameter in a `printf' call.
          The actual value in effect at runtime can be retrieved by
          calling `sysconf' with the `_SC_NL_ARGMAX' parameter (*note
          Sysconf Definition::).

          Some systems have a quite low limit, such as 9 for System V
          systems.  The GNU C library has no real limit.

     If any of the formats has a specification for the parameter
     position, all of them in the format string must have one;
     otherwise the behavior is undefined.

   * Zero or more "flag characters" that modify the normal behavior of
     the conversion specification.

   * An optional decimal integer specifying the "minimum field width".
     If the normal conversion produces fewer characters than this, the
     field is padded with spaces to the specified width.  This is a
     _minimum_ value; if the normal conversion produces more characters
     than this, the field is _not_ truncated.  Normally, the output is
     right-justified within the field.

     You can also specify a field width of `*'.  This means that the
     next argument in the argument list (before the actual value to be
     printed) is used as the field width.  The value must be an `int'.
     If the value is negative, this means to set the `-' flag (see
     below) and to use the absolute value as the field width.

   * An optional "precision" to specify the number of digits to be
     written for the numeric conversions.  If the precision is
     specified, it consists of a period (`.') followed optionally by a
     decimal integer (which defaults to zero if omitted).

     You can also specify a precision of `*'.  This means that the next
     argument in the argument list (before the actual value to be
     printed) is used as the precision.  The value must be an `int',
     and is ignored if it is negative.  If you specify `*' for both the
     field width and precision, the field width argument precedes the
     precision argument.  Other C library versions may not recognize
     this syntax.

   * An optional "type modifier character", which is used to specify the
     data type of the corresponding argument if it differs from the
     default type.  (For example, the integer conversions assume a type
     of `int', but you can specify `h', `l', or `L' for other integer
     types.)

   * A character that specifies the conversion to be applied.

   The exact options that are permitted and how they are interpreted
vary between the different conversion specifiers.  See the descriptions
of the individual conversions for information about the particular
options that they use.

   With the `-Wformat' option, the GNU C compiler checks calls to
`printf' and related functions.  It examines the format string and
verifies that the correct number and types of arguments are supplied.
There is also a GNU C syntax to tell the compiler that a function you
write uses a `printf'-style format string.  *Note Declaring Attributes
of Functions: (gcc.info)Function Attributes, for more information.
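
   The `*' field width and precision and the PARAM-NO syntax
described above can be sketched as follows.  The helper names
`format_field' and `format_swapped' are hypothetical, and the second
one assumes a library that supports positional parameters.

```c
#include <stdio.h>

/* Format VALUE in a field of WIDTH characters with PRECISION digits
   after the decimal point.  A negative WIDTH left-justifies.  */
int
format_field (char *buf, size_t size, int width, int precision,
              double value)
{
  return snprintf (buf, size, "[%*.*f]", width, precision, value);
}

/* Format two strings in swapped order using positional parameters,
   as a translated message might need to do.  */
int
format_swapped (char *buf, size_t size, const char *first,
                const char *second)
{
  return snprintf (buf, size, "%2$s %1$s", first, second);
}
```

   Note that every conversion in the second template carries a
parameter position, as required.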


File: libc.info,  Node: Table of Output Conversions,  Next: Integer Conversions,  Prev: Output Conversion Syntax,  Up: Formatted Output

Table of Output Conversions
---------------------------

Here is a table summarizing what all the different conversions do:

`%d', `%i'
     Print an integer as a signed decimal number.  *Note Integer
     Conversions::, for details.  `%d' and `%i' are synonymous for
     output, but are different when used with `scanf' for input (*note
     Table of Input Conversions::).

`%o'
     Print an integer as an unsigned octal number.  *Note Integer
     Conversions::, for details.

`%u'
     Print an integer as an unsigned decimal number.  *Note Integer
     Conversions::, for details.

`%x', `%X'
     Print an integer as an unsigned hexadecimal number.  `%x' uses
     lower-case letters and `%X' uses upper-case.  *Note Integer
     Conversions::, for details.

`%f'
     Print a floating-point number in normal (fixed-point) notation.
     *Note Floating-Point Conversions::, for details.

`%e', `%E'
     Print a floating-point number in exponential notation.  `%e' uses
     lower-case letters and `%E' uses upper-case.  *Note Floating-Point
     Conversions::, for details.

`%g', `%G'
     Print a floating-point number in either normal or exponential
     notation, whichever is more appropriate for its magnitude.  `%g'
     uses lower-case letters and `%G' uses upper-case.  *Note
     Floating-Point Conversions::, for details.

`%a', `%A'
     Print a floating-point number in a hexadecimal fractional notation
     with the exponent to base 2 represented in decimal digits.  `%a'
     uses lower-case letters and `%A' uses upper-case.  *Note
     Floating-Point Conversions::, for details.

`%c'
     Print a single character.  *Note Other Output Conversions::.

`%C'
     This is an alias for `%lc' which is supported for compatibility
     with the Unix standard.

`%s'
     Print a string.  *Note Other Output Conversions::.

`%S'
     This is an alias for `%ls' which is supported for compatibility
     with the Unix standard.

`%p'
     Print the value of a pointer.  *Note Other Output Conversions::.

`%n'
     Get the number of characters printed so far.  *Note Other Output
     Conversions::.  Note that this conversion specification never
     produces any output.

`%m'
     Print the string corresponding to the value of `errno'.  (This is
     a GNU extension.)  *Note Other Output Conversions::.

`%%'
     Print a literal `%' character.  *Note Other Output Conversions::.

   If the syntax of a conversion specification is invalid, unpredictable
things will happen, so don't do this.  If there aren't enough function
arguments provided to supply values for all the conversion
specifications in the template string, or if the arguments are not of
the correct types, the results are unpredictable.  If you supply more
arguments than conversion specifications, the extra argument values are
simply ignored; this is sometimes useful.
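
   Because `%n' produces no output, its effect is easiest to see in
a small sketch.  The helper below (the name `format_with_offset' is
hypothetical) records how many characters precede the number:

```c
#include <stdio.h>

/* Format "NAME = VALUE" into BUF and store in *OFFSET the number of
   characters written before VALUE starts.  */
int
format_with_offset (char *buf, size_t size, const char *name,
                    int value, int *offset)
{
  return snprintf (buf, size, "%s = %n%d", name, offset, value);
}
```

   The stored offset could be used, for example, to align a marker
under the value on a following line.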


File: libc.info,  Node: Integer Conversions,  Next: Floating-Point Conversions,  Prev: Table of Output Conversions,  Up: Formatted Output

Integer Conversions
-------------------

This section describes the options for the `%d', `%i', `%o', `%u',
`%x', and `%X' conversion specifications.  These conversions print
integers in various formats.

   The `%d' and `%i' conversion specifications both print an `int'
argument as a signed decimal number; while `%o', `%u', and `%x' print
the argument as an unsigned octal, decimal, or hexadecimal number
(respectively).  The `%X' conversion specification is just like `%x'
except that it uses the characters `ABCDEF' as digits instead of
`abcdef'.

   The following flags are meaningful:

`-'
     Left-justify the result in the field (instead of the normal
     right-justification).

`+'
     For the signed `%d' and `%i' conversions, print a plus sign if the
     value is positive.

` '
     For the signed `%d' and `%i' conversions, if the result doesn't
     start with a plus or minus sign, prefix it with a space character
     instead.  Since the `+' flag ensures that the result includes a
     sign, this flag is ignored if you supply both of them.

`#'
     For the `%o' conversion, this forces the leading digit to be `0',
     as if by increasing the precision.  For `%x' or `%X', this
     prefixes a leading `0x' or `0X' (respectively) to the result.
     This doesn't do anything useful for the `%d', `%i', or `%u'
     conversions.  Using this flag produces output which can be parsed
     by the `strtoul' function (*note Parsing of Integers::) and
     `scanf' with the `%i' conversion (*note Numeric Input
     Conversions::).

`''
     Separate the digits into groups as specified by the locale
     specified for the `LC_NUMERIC' category; *note General Numeric::.
     This flag is a GNU extension.

`0'
     Pad the field with zeros instead of spaces.  The zeros are placed
     after any indication of sign or base.  This flag is ignored if the
     `-' flag is also specified, or if a precision is specified.

   If a precision is supplied, it specifies the minimum number of
digits to appear; leading zeros are produced if necessary.  If you
don't specify a precision, the number is printed with as many digits as
it needs.  If you convert a value of zero with an explicit precision of
zero, then no characters at all are produced.

   Without a type modifier, the corresponding argument is treated as an
`int' (for the signed conversions `%i' and `%d') or `unsigned int' (for
the unsigned conversions `%o', `%u', `%x', and `%X').  Recall that
since `printf' and friends are variadic, any `char' and `short'
arguments are automatically converted to `int' by the default argument
promotions.  For arguments of other integer types, you can use these
modifiers:

`hh'
     Specifies that the argument is a `signed char' or `unsigned char',
     as appropriate.  A `char' argument is converted to an `int' or
     `unsigned int' by the default argument promotions anyway, but the
     `hh' modifier says to convert it back to a `char' again.

     This modifier was introduced in ISO C99.

`h'
     Specifies that the argument is a `short int' or `unsigned short
     int', as appropriate.  A `short' argument is converted to an `int'
     or `unsigned int' by the default argument promotions anyway, but
     the `h' modifier says to convert it back to a `short' again.

`j'
     Specifies that the argument is an `intmax_t' or `uintmax_t', as
     appropriate.

     This modifier was introduced in ISO C99.

`l'
     Specifies that the argument is a `long int' or `unsigned long
     int', as appropriate.  Two `l' characters are like the `L'
     modifier, below.

     If used with `%c' or `%s' the corresponding parameter is
     considered as a wide character or wide character string
     respectively.  This use of `l' was introduced in Amendment 1 to
     ISO C90.

`L'
`ll'
`q'
     Specifies that the argument is a `long long int'.  (This type is
     an extension supported by the GNU C compiler.  On systems that
     don't support extra-long integers, this is the same as `long int'.)

     The `q' modifier is another name for the same thing, which comes
     from 4.4 BSD; a `long long int' is sometimes called a "quad" `int'.

`t'
     Specifies that the argument is a `ptrdiff_t'.

     This modifier was introduced in ISO C99.

`z'
`Z'
     Specifies that the argument is a `size_t'.

     `z' was introduced in ISO C99.  `Z' is a GNU extension predating
     this addition and should not be used in new code.
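
   As a sketch of how the type modifiers pair with argument types,
the hypothetical helper `format_sized' below passes one argument of
each kind.  It assumes a library that supports the ISO C99
modifiers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format one value of each type with its matching modifier:
   `z' for `size_t', `j' for `intmax_t', `t' for `ptrdiff_t',
   `l' for `long int', and `hh' for `unsigned char'.  */
int
format_sized (char *buf, size_t size, size_t len, intmax_t big,
              ptrdiff_t diff, long int wide, unsigned char small)
{
  return snprintf (buf, size, "%zu %jd %td %ld %hhu",
                   len, big, diff, wide, small);
}
```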

   Here is an example.  Using the template string:

     "|%5d|%-5d|%+5d|%+-5d|% 5d|%05d|%5.0d|%5.2d|%d|\n"

to print numbers using the different options for the `%d' conversion
gives results like:

     |    0|0    |   +0|+0   |    0|00000|     |   00|0|
     |    1|1    |   +1|+1   |    1|00001|    1|   01|1|
     |   -1|-1   |   -1|-1   |   -1|-0001|   -1|  -01|-1|
     |100000|100000|+100000|+100000| 100000|100000|100000|100000|100000|

   In particular, notice what happens in the last case where the number
is too large to fit in the minimum field width specified.
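
   Each row of that table comes from passing the same number nine
times.  A sketch of a helper that produces one row (the name
`format_row' is hypothetical) could look like this:

```c
#include <stdio.h>

/* Format N once for each `%d' variant in the template above.  */
int
format_row (char *buf, size_t size, int n)
{
  return snprintf (buf, size,
                   "|%5d|%-5d|%+5d|%+-5d|% 5d|%05d|%5.0d|%5.2d|%d|\n",
                   n, n, n, n, n, n, n, n, n);
}
```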

   Here are some more examples showing how unsigned integers print under
various format options, using the template string:

     "|%5u|%5o|%5x|%5X|%#5o|%#5x|%#5X|%#10.8x|\n"

     |    0|    0|    0|    0|    0|    0|    0|  00000000|
     |    1|    1|    1|    1|   01|  0x1|  0X1|0x00000001|
     |100000|303240|186a0|186A0|0303240|0x186a0|0X186A0|0x000186a0|


File: libc.info,  Node: Floating-Point Conversions,  Next: Other Output Conversions,  Prev: Integer Conversions,  Up: Formatted Output

Floating-Point Conversions
--------------------------

This section discusses the conversion specifications for floating-point
numbers: the `%f', `%e', `%E', `%g', `%G', `%a', and `%A' conversions.

   The `%f' conversion prints its argument in fixed-point notation,
producing output of the form [`-']DDD`.'DDD, where the number of digits
following the decimal point is controlled by the precision you specify.

   The `%e' conversion prints its argument in exponential notation,
producing output of the form [`-']D`.'DDD`e'[`+'|`-']DD.  Again, the
number of digits following the decimal point is controlled by the
precision.  The exponent always contains at least two digits.  The `%E'
conversion is similar but the exponent is marked with the letter `E'
instead of `e'.

   The `%g' and `%G' conversions print the argument in the style of
`%e' or `%E' (respectively) if the exponent would be less than -4 or
greater than or equal to the precision; otherwise they use the `%f'
style.  A precision of `0' is taken as 1.  Trailing zeros are removed
from the fractional portion of the result and a decimal-point
character appears only if it is followed by a digit.
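
   The following sketch shows the three styles side by side and lets
you watch `%g' switch between them.  The helper names
`format_styles' and `format_general' are hypothetical.

```c
#include <stdio.h>

/* Format VALUE with `%.2f', `%.3e', and the default `%g'.  */
int
format_styles (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "%.2f %.3e %g", value, value, value);
}

/* Format VALUE with `%g' and the default precision of 6.  */
int
format_general (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "%g", value);
}
```

   With the default precision, `%g' should print 100000.0 as
`100000' but 1000000.0 as `1e+06', and 0.0001 as `0.0001' but
0.00001 as `1e-05'.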

   The `%a' and `%A' conversions are meant for representing
floating-point numbers exactly in textual form so that they can be
exchanged as texts between different programs and/or machines.  The
numbers are represented in the form [`-']`0x'H`.'HHH`p'[`+'|`-']DD.  To
the left of the decimal-point character exactly one digit is printed.
This digit is `0' only if the number is zero or denormalized.  Otherwise
the value is unspecified; it is implementation dependent how many bits
are used.  The number of hexadecimal digits on the right side of the
decimal-point character is equal to the precision.  If the precision is
omitted it is determined to be large enough to provide an exact
representation of the number (or it is large enough to distinguish two
adjacent values if the `FLT_RADIX' is not a power of 2, *note Floating
Point Parameters::).  For the `%a' conversion lower-case characters are
used to represent the hexadecimal number and the prefix and exponent
sign are printed as `0x' and `p' respectively.  Otherwise upper-case
characters are used and `0X' and `P' are used for the representation of
prefix and exponent string.  The exponent to the base of two is printed
as a decimal number using at least one digit but at most as many digits
as necessary to represent the value exactly.
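
   As a minimal sketch of the exchange use case described above, the
following function writes a `double' with `%a' and reads the text back
with `strtod'.  The name `roundtrip_hex' and the buffer size are
assumptions made for this illustration, and the sketch assumes a C99
library whose `strtod' accepts hexadecimal floating-point input.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print VALUE with `%a' and parse the text back.
   On a radix-2 machine the hexadecimal form written without an
   explicit precision loses no bits, so the result should compare
   equal to VALUE.  */
double
roundtrip_hex (double value)
{
  char text[64];   /* assumed to be large enough for any `double' */

  snprintf (text, sizeof text, "%a", value);
  return strtod (text, NULL);
}
```

   With this helper, `roundtrip_hex (0.1) == 0.1' is expected to hold,
which is not guaranteed when a short decimal form such as `%g' is used
for the exchange.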

   If the value to be printed represents infinity or a NaN, the output
is [`-']`inf' or `nan' respectively if the conversion specifier is
`%a', `%e', `%f', or `%g' and it is [`-']`INF' or `NAN' respectively if
the conversion is `%A', `%E', or `%G'.
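
   The following sketch shows the lower-case and upper-case spellings
side by side.  The helper name `show_special' is hypothetical; the
`INFINITY' macro comes from `math.h' in ISO C99.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical helper: format VALUE with the lower-case conversions
   first and the upper-case conversions last, separated by `|'.  */
int
show_special (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "|%f|%e|%g|%E|%G|",
                   value, value, value, value, value);
}
```

   Called with `INFINITY' this stores `|inf|inf|inf|INF|INF|', and with
`-INFINITY' it stores `|-inf|-inf|-inf|-INF|-INF|'.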

   The following flags can be used to modify the behavior:

`-'
     Left-justify the result in the field.  Normally the result is
     right-justified.

`+'
     Always include a plus or minus sign in the result.

` '
     If the result doesn't start with a plus or minus sign, prefix it
     with a space instead.  Since the `+' flag ensures that the result
     includes a sign, this flag is ignored if you supply both of them.

`#'
     Specifies that the result should always include a decimal point,
     even if no digits follow it.  For the `%g' and `%G' conversions,
     this also forces trailing zeros after the decimal point to be left
     in place where they would otherwise be removed.

`''
     Separate the digits of the integer part of the result into groups
     as specified by the locale specified for the `LC_NUMERIC' category;
     *note General Numeric::.  This flag is a GNU extension.

`0'
     Pad the field with zeros instead of spaces; the zeros are placed
     after any sign.  This flag is ignored if the `-' flag is also
     specified.
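
   As an illustration of the flags listed above, here is a small sketch
that formats one value with several of them.  The helper name
`show_float_flags' and the particular template are choices made for
this example only.

```c
#include <stdio.h>

/* Hypothetical helper: format VALUE with the `-', `+', ` ', `0' and
   `#' flags in turn, separated by `|' characters.  */
int
show_float_flags (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "|%-8.2f|%+.2f|% .2f|%08.2f|%#.0f|",
                   value, value, value, value, value);
}
```

   For the value `3.14159' this stores
`|3.14    |+3.14| 3.14|00003.14|3.|', and for `-3.14159' it stores
`|-3.14   |-3.14|-3.14|-0003.14|-3.|'.  Note that the zeros from the
`0' flag follow the sign.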

   The precision specifies how many digits follow the decimal-point
character for the `%f', `%e', and `%E' conversions.  For these
conversions, the default precision is `6'.  If the precision is
explicitly `0', this suppresses the decimal point character entirely.
For the `%g' and `%G' conversions, the precision specifies how many
significant digits to print.  Significant digits are the first digit
before the decimal point, and all the digits after it.  If the
precision is `0' for `%g' or `%G', it is treated like a value of `1';
if it is not specified, the default of `6' applies.  If the value being
printed cannot be expressed accurately in the specified number of
digits, the value is rounded to the nearest number that fits.
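
   The effect of the precision can be sketched as follows.  The helper
name `show_precision' is hypothetical and the template is only one
possible selection.

```c
#include <stdio.h>

/* Hypothetical helper: format VALUE with the default precision and
   with several explicit precisions, separated by `|' characters.  */
int
show_precision (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "|%f|%.2f|%.0f|%.3e|%.3g|%.0g|",
                   value, value, value, value, value, value);
}
```

   For the value `3.14159265' this stores
`|3.141593|3.14|3|3.142e+00|3.14|3|'.  The precision counts digits
after the decimal point for `%f' and `%e', but significant digits for
`%g'.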

   Without a type modifier, the floating-point conversions use an
argument of type `double'.  (By the default argument promotions, any
`float' arguments are automatically converted to `double'.)  The
following type modifier is supported:

`L'
     An uppercase `L' specifies that the argument is a `long double'.

   Here are some examples showing how numbers print using the various
floating-point conversions.  All of the numbers were printed using this
template string:

     "|%13.4a|%13.4f|%13.4e|%13.4g|\n"

   Here is the output:

     |  0x0.0000p+0|       0.0000|   0.0000e+00|            0|
     |  0x1.0000p-1|       0.5000|   5.0000e-01|          0.5|
     |  0x1.0000p+0|       1.0000|   1.0000e+00|            1|
     | -0x1.0000p+0|      -1.0000|  -1.0000e+00|           -1|
     |  0x1.9000p+6|     100.0000|   1.0000e+02|          100|
     |  0x1.f400p+9|    1000.0000|   1.0000e+03|         1000|
     | 0x1.3880p+13|   10000.0000|   1.0000e+04|        1e+04|
     | 0x1.81c8p+13|   12345.0000|   1.2345e+04|    1.234e+04|
     | 0x1.86a0p+16|  100000.0000|   1.0000e+05|        1e+05|
     | 0x1.e240p+16|  123456.0000|   1.2346e+05|    1.235e+05|

   Notice how the `%g' conversion drops trailing zeros.
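
   To complement the table, this sketch contrasts `%g' with and without
the `#' flag, which keeps the trailing zeros.  The helper name
`show_general' is an assumption of the example.

```c
#include <stdio.h>

/* Hypothetical helper: format VALUE with four significant digits,
   first without and then with the `#' flag.  */
int
show_general (char *buf, size_t size, double value)
{
  return snprintf (buf, size, "|%.4g|%#.4g|", value, value);
}
```

   The value `100.0' yields `|100|100.0|', `0.0001' yields
`|0.0001|0.0001000|', and `0.00001' yields `|1e-05|1.000e-05|', since
an exponent below -4 selects the `%e' style.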


File: libc.info,  Node: Other Output Conversions,  Next: Formatted Output Functions,  Prev: Floating-Point Conversions,  Up: Formatted Output

Other Output Conversions
------------------------

This section describes miscellaneous conversions for `printf'.

   The `%c' conversion prints a single character.  If there is no `l'
modifier, the `int' argument is first converted to an `unsigned char'.
Then, if used in a wide stream function, the character is converted
into the corresponding wide character.  The `-' flag can be used to
specify left-justification in the field, but no other flags are
defined, and no precision can be given.  For example:

     printf ("%c%c%c%c%c", 'h', 'e', 'l', 'l', 'o');

prints `hello'.

   If there is an `l' modifier present the argument is expected to be of
type `wint_t'.  If used in a multibyte function the wide character is
converted into a multibyte character before being added to the output.
In this case more than one output byte can be produced.

   The `%s' conversion prints a string.  If no `l' modifier is present
the corresponding argument must be of type `char *' (or `const char
*').  If used in a wide stream function the string is first converted
into a wide character string.  A precision can be specified to indicate
the maximum number of characters to write; otherwise characters in the
string up to but not including the terminating null character are
written to the output stream.  The `-' flag can be used to specify
left-justification in the field, but no other flags are defined for
this conversion.  For example:

     printf ("%3s%-6s", "no", "where");

prints ` nowhere '.
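
   The precision of `%s' can also be supplied at run time with `*'.
The following is only a sketch; the helper name `format_prefix' and
the field width of 8 are chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: store at most MAX_CHARS characters of S,
   left-justified in a field of width 8 between square brackets.  */
int
format_prefix (char *buf, size_t size, const char *s, int max_chars)
{
  return snprintf (buf, size, "[%-8.*s]", max_chars, s);
}
```

   With the string `"testing"' and a limit of 4 this stores
`[test    ]'; with a limit of 20 the whole string fits and the result
is `[testing ]'.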

   If there is an `l' modifier present the argument is expected to be of
type `wchar_t *' (or `const wchar_t *').

   If you accidentally pass a null pointer as the argument for a `%s'
conversion, the GNU library prints it as `(null)'.  We think this is
more useful than crashing.  But it's not good practice to pass a null
argument intentionally.

   The `%m' conversion prints the string corresponding to the error
code in `errno'.  *Note Error Messages::.  Thus:

     fprintf (stderr, "can't open `%s': %m\n", filename);

is equivalent to:

     fprintf (stderr, "can't open `%s': %s\n", filename, strerror (errno));

The `%m' conversion is a GNU C library extension.

   The `%p' conversion prints a pointer value.  The corresponding
argument must be of type `void *'.  In practice, you can use any type
of pointer.

   In the GNU system, non-null pointers are printed as unsigned
integers, as if a `%#x' conversion were used.  Null pointers print as
`(nil)'.  (Pointers might print differently in other systems.)

   For example:

     printf ("%p", "testing");

prints `0x' followed by a hexadecimal number--the address of the string
constant `"testing"'.  It does not print the word `testing'.

   You can supply the `-' flag with the `%p' conversion to specify
left-justification, but no other flags, precision, or type modifiers
are defined.

   The `%n' conversion is unlike any of the other output conversions.
It uses an argument which must be a pointer to an `int', but instead of
printing anything it stores the number of characters printed so far by
this call at that location.  The `h' and `l' type modifiers are
permitted to specify that the argument is of type `short int *' or
`long int *' instead of `int *', but no flags, field width, or
precision are permitted.

   For example,

     int nchar;
     printf ("%d %s%n\n", 3, "bears", &nchar);

prints:

     3 bears

and sets `nchar' to `7', because `3 bears' is seven characters.

   The `%%' conversion prints a literal `%' character.  This conversion
doesn't use an argument, and no flags, field width, precision, or type
modifiers are permitted.


File: libc.info,  Node: Formatted Output Functions,  Next: Dynamic Output,  Prev: Other Output Conversions,  Up: Formatted Output

Formatted Output Functions
--------------------------

This section describes how to call `printf' and related functions.
Prototypes for these functions are in the header file `stdio.h'.
Because these functions take a variable number of arguments, you _must_
declare prototypes for them before using them.  Of course, the easiest
way to make sure you have all the right prototypes is to just include
`stdio.h'.

 - Function: int printf (const char *TEMPLATE, ...)
     The `printf' function prints the optional arguments under the
     control of the template string TEMPLATE to the stream `stdout'.
     It returns the number of characters printed, or a negative value
     if there was an output error.

 - Function: int wprintf (const wchar_t *TEMPLATE, ...)
     The `wprintf' function prints the optional arguments under the
     control of the wide template string TEMPLATE to the stream
     `stdout'.  It returns the number of wide characters printed, or a
     negative value if there was an output error.

 - Function: int fprintf (FILE *STREAM, const char *TEMPLATE, ...)
     This function is just like `printf', except that the output is
     written to the stream STREAM instead of `stdout'.

 - Function: int fwprintf (FILE *STREAM, const wchar_t *TEMPLATE, ...)
     This function is just like `wprintf', except that the output is
     written to the stream STREAM instead of `stdout'.

 - Function: int sprintf (char *S, const char *TEMPLATE, ...)
     This is like `printf', except that the output is stored in the
     character array S instead of written to a stream.  A null
     character is written to mark the end of the string.

     The `sprintf' function returns the number of characters stored in
     the array S, not including the terminating null character.

     The behavior of this function is undefined if copying takes place
     between objects that overlap--for example, if S is also given as
     an argument to be printed under control of the `%s' conversion.
     *Note Copying and Concatenation::.

     *Warning:* The `sprintf' function can be *dangerous* because it
     can potentially output more characters than can fit in the
     allocation size of the string S.  Remember that the field width
     given in a conversion specification is only a _minimum_ value.

     To avoid this problem, you can use `snprintf' or `asprintf',
     described below.
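
     To see why the field width offers no protection, consider this
     sketch.  The helper name `width5_length' is hypothetical; it
     relies on the ISO C99 rule that `snprintf' accepts a null buffer
     when the size is zero and still returns the length that would
     have been written.

```c
#include <stdio.h>

/* Hypothetical helper: return how many characters `%5d' produces for
   VALUE, without storing anything.  */
int
width5_length (int value)
{
  return snprintf (NULL, 0, "%5d", value);
}
```

     `width5_length (42)' returns 5, but `width5_length (1234567)'
     returns 7, so a six-byte array sized for the field width alone
     would overflow if `sprintf' were used on the second value.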

 - Function: int swprintf (wchar_t *S, size_t SIZE, const wchar_t
          *TEMPLATE, ...)
     This is like `wprintf', except that the output is stored in the
     wide character array S instead of written to a stream.  A null
     wide character is written to mark the end of the string.  The SIZE
     argument specifies the maximum number of characters to produce.
     The trailing null character is counted towards this limit, so you
     should allocate at least SIZE wide characters for the string S.

     The return value is the number of characters generated for the
     given input, excluding the trailing null.  If not all output fits
     into the provided buffer a negative value is returned.  You should
     try again with a bigger output string.  _Note:_ this is different
     from how `snprintf' handles this situation.

     Note that the corresponding narrow stream function takes fewer
     parameters.  `swprintf' in fact corresponds to the `snprintf'
     function.  Since the `sprintf' function can be dangerous and should
     be avoided, the ISO C committee refused to make the same mistake
     again and decided not to define a function exactly corresponding
     to `sprintf'.
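
     The negative return value on overflow can be checked as in this
     sketch.  The helper name `fits_wide' is an assumption of the
     example, not a library function.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: return nonzero if VALUE, printed with `%d',
   fits in BUF, which holds SIZE wide characters including the
   terminating null.  */
int
fits_wide (wchar_t *buf, size_t size, int value)
{
  return swprintf (buf, size, L"%d", value) >= 0;
}
```

     With a buffer of four wide characters, `fits_wide' succeeds for
     `123' but fails for `12345'.  After a failure the buffer contents
     should not be relied upon.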

 - Function: int snprintf (char *S, size_t SIZE, const char *TEMPLATE,
          ...)
     The `snprintf' function is similar to `sprintf', except that the
     SIZE argument specifies the maximum number of characters to
     produce.  The trailing null character is counted towards this
     limit, so you should allocate at least SIZE characters for the
     string S.

     The return value is the number of characters which would be
     generated for the given input, excluding the trailing null.  If
     this value is greater than or equal to SIZE, not all characters from
     the result have been stored in S.  You should try again with a
     bigger output string.  Here is an example of doing this:

          /* Construct a message describing the value of a variable
             whose name is NAME and whose value is VALUE. */
          char *
          make_message (char *name, char *value)
          {
            /* Guess we need no more than 100 chars of space. */
            int size = 100;
            char *buffer = (char *) xmalloc (size);
            int nchars;
            if (buffer == NULL)
              return NULL;
          
            /* Try to print in the allocated space. */
            nchars = snprintf (buffer, size, "value of %s is %s",
                               name, value);
            if (nchars >= size)
              {
                /* Reallocate buffer now that we know
                   how much space is needed. */
                size = nchars + 1;
                buffer = (char *) xrealloc (buffer, size);
          
                if (buffer != NULL)
                  /* Try again with the enlarged size. */
                  snprintf (buffer, size, "value of %s is %s",
                            name, value);
              }
            /* The last call worked, return the string. */
            return buffer;
          }

     In practice, it is often easier just to use `asprintf', below.

     *Attention:* In versions of the GNU C library prior to 2.1 the
     return value is the number of characters stored, not including the
     terminating null; unless there was not enough space in S to store
     the result in which case `-1' is returned.  This was changed in
     order to comply with the ISO C99 standard.


File: libc.info,  Node: Dynamic Output,  Next: Variable Arguments Output,  Prev: Formatted Output Functions,  Up: Formatted Output

Dynamically Allocating Formatted Output
---------------------------------------

The functions in this section do formatted output and place the results
in dynamically allocated memory.

 - Function: int asprintf (char **PTR, const char *TEMPLATE, ...)
     This function is similar to `sprintf', except that it dynamically
     allocates a string (as with `malloc'; *note Unconstrained
     Allocation::) to hold the output, instead of putting the output in
     a buffer you allocate in advance.  The PTR argument should be the
     address of a `char *' object, and `asprintf' stores a pointer to
     the newly allocated string at that location.

     The return value is the number of characters stored in the newly
     allocated string, not including the terminating null, or less than
     zero if an error occurred.  Usually this means that the buffer
     could not be allocated.

     Here is how to use `asprintf' to get the same result as the
     `snprintf' example, but more easily:

          /* Construct a message describing the value of a variable
             whose name is NAME and whose value is VALUE. */
          char *
          make_message (char *name, char *value)
          {
            char *result;
            if (asprintf (&result, "value of %s is %s", name, value) < 0)
              return NULL;
            return result;
          }

 - Function: int obstack_printf (struct obstack *OBSTACK, const char
          *TEMPLATE, ...)
     This function is similar to `asprintf', except that it uses the
     obstack OBSTACK to allocate the space.  *Note Obstacks::.

     The characters are written onto the end of the current object.  To
     get at them, you must finish the object with `obstack_finish'
     (*note Growing Objects::).
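
     A minimal sketch of this pattern follows.  The function name
     `make_message_obstack' is hypothetical, and the sketch assumes
     that `obstack_printf' does not itself append a terminating null,
     so one is added explicitly before the object is finished.  It
     also assumes that `malloc' and `free' are acceptable chunk
     allocators.

```c
#define _GNU_SOURCE
#include <obstack.h>
#include <stdio.h>
#include <stdlib.h>

/* The obstack macros need to be told how to get and release chunks.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: build the message in the growing object of OB,
   terminate it, and return the finished object.  */
char *
make_message_obstack (struct obstack *ob, const char *name,
                      const char *value)
{
  if (obstack_printf (ob, "value of %s is %s", name, value) < 0)
    return NULL;
  obstack_1grow (ob, '\0');     /* terminate the string ourselves */
  return obstack_finish (ob);
}
```

     The caller is expected to have initialized the obstack with
     `obstack_init' and releases the string later with `obstack_free'.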


File: libc.info,  Node: Variable Arguments Output,  Next: Parsing a Template String,  Prev: Dynamic Output,  Up: Formatted Output

Variable Arguments Output Functions
-----------------------------------

The functions `vprintf' and friends are provided so that you can define
your own variadic `printf'-like functions that make use of the same
internals as the built-in formatted output functions.

   The most natural way to define such functions would be to use a
language construct to say, "Call `printf' and pass this template plus
all of my arguments after the first five."  But there is no way to do
this in C, and it would be hard to provide a way, since at the C
language level there is no way to tell how many arguments your function
received.

   Since that method is impossible, we provide alternative functions,
the `vprintf' series, which lets you pass a `va_list' to describe "all
of my arguments after the first five."

   When it is sufficient to define a macro rather than a real function,
the GNU C compiler provides a way to do this much more easily.  For
example:

     #define myprintf(a, b, c, d, e, rest...) \
                 printf (mytemplate , ## rest)

*Note Macros with Variable Numbers of Arguments: (gcc.info)Macro
Varargs, for details.  But this is limited to macros, and does not
apply to real functions at all.
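
   The `rest...' syntax shown above is specific to GNU C.  ISO C99
provides `__VA_ARGS__' for a similar purpose.  The following is a
sketch only: the macro name `tagged_snprintf', the `myprog: ' prefix
and the function `report_count' are invented for the example, and the
trick relies on string literal concatenation, so the template passed
to the macro must itself be a string literal.

```c
#include <stdio.h>

/* Hypothetical macro: like `snprintf', but prepends a fixed prefix to
   the template.  The first variable argument must be a string
   literal so that it concatenates with the prefix.  */
#define tagged_snprintf(buf, size, ...) \
  snprintf ((buf), (size), "myprog: " __VA_ARGS__)

/* Example use of the macro with one additional argument.  */
int
report_count (char *buf, size_t size, int count)
{
  return tagged_snprintf (buf, size, "%d files", count);
}
```

   Calling `report_count' with a count of 3 stores `myprog: 3 files'.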

   Before calling `vprintf' or the other functions listed in this
section, you _must_ call `va_start' (*note Variadic Functions::) to
initialize a pointer to the variable arguments.  Then you can call
`va_arg' to fetch the arguments that you want to handle yourself.  This
advances the pointer past those arguments.

   Once your `va_list' pointer is pointing at the argument of your
choice, you are ready to call `vprintf'.  That argument and all
subsequent arguments that were passed to your function are used by
`vprintf' along with the template that you specified separately.

   In some other systems, the `va_list' pointer may become invalid
after the call to `vprintf', so you must not use `va_arg' after you
call `vprintf'.  Instead, you should call `va_end' to retire the
pointer from service.  However, you can safely call `va_start' on
another pointer variable and begin fetching the arguments again through
that pointer.  Calling `vprintf' does not destroy the argument list of
your function, merely the particular pointer that you passed to it.

   GNU C does not have such restrictions.  You can safely continue to
fetch arguments from a `va_list' pointer after passing it to `vprintf',
and `va_end' is a no-op.  (Note, however, that subsequent `va_arg'
calls will fetch the same arguments which `vprintf' previously used.)
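
   Code that must also run on systems with the restriction described
above, and that needs to walk the arguments twice, can duplicate the
list with the ISO C99 macro `va_copy' instead of relying on the GNU
behavior.  The following is a sketch under that assumption; the
function name `format_alloc' is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a newly allocated string holding the
   formatted output, or NULL on failure.  The caller frees it.  */
char *
format_alloc (const char *tmpl, ...)
{
  va_list ap, ap2;
  char *result = NULL;
  int len;

  va_start (ap, tmpl);
  va_copy (ap2, ap);            /* keep a second, untouched list */

  /* First pass: measure only.  C99 permits a null buffer of size 0.  */
  len = vsnprintf (NULL, 0, tmpl, ap);
  va_end (ap);

  if (len >= 0)
    {
      result = malloc ((size_t) len + 1);
      if (result != NULL)
        /* Second pass: consume the copy, not the used-up list.  */
        vsnprintf (result, (size_t) len + 1, tmpl, ap2);
    }
  va_end (ap2);
  return result;
}
```

   For example, `format_alloc ("%s=%d", "x", 42)' returns a string
containing `x=42'.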

   Prototypes for these functions are declared in `stdio.h'.

 - Function: int vprintf (const char *TEMPLATE, va_list AP)
     This function is similar to `printf' except that, instead of taking
     a variable number of arguments directly, it takes an argument list
     pointer AP.

 - Function: int vwprintf (const wchar_t *TEMPLATE, va_list AP)
     This function is similar to `wprintf' except that, instead of
     taking a variable number of arguments directly, it takes an
     argument list pointer AP.

 - Function: int vfprintf (FILE *STREAM, const char *TEMPLATE, va_list
          AP)
     This is the equivalent of `fprintf' with the variable argument list
     specified directly as for `vprintf'.

 - Function: int vfwprintf (FILE *STREAM, const wchar_t *TEMPLATE,
          va_list AP)
     This is the equivalent of `fwprintf' with the variable argument
     list specified directly as for `vwprintf'.

 - Function: int vsprintf (char *S, const char *TEMPLATE, va_list AP)
     This is the equivalent of `sprintf' with the variable argument list
     specified directly as for `vprintf'.

 - Function: int vswprintf (wchar_t *S, size_t SIZE, const wchar_t
          *TEMPLATE, va_list AP)
     This is the equivalent of `swprintf' with the variable argument
     list specified directly as for `vwprintf'.

 - Function: int vsnprintf (char *S, size_t SIZE, const char *TEMPLATE,
          va_list AP)
     This is the equivalent of `snprintf' with the variable argument
     list specified directly as for `vprintf'.

 - Function: int vasprintf (char **PTR, const char *TEMPLATE, va_list
          AP)
     The `vasprintf' function is the equivalent of `asprintf' with the
     variable argument list specified directly as for `vprintf'.

 - Function: int obstack_vprintf (struct obstack *OBSTACK, const char
          *TEMPLATE, va_list AP)
     The `obstack_vprintf' function is the equivalent of
     `obstack_printf' with the variable argument list specified directly
     as for `vprintf'.

   Here's an example showing how you might use `vfprintf'.  This is a
function that prints error messages to the stream `stderr', along with
a prefix indicating the name of the program (*note Error Messages::,
for a description of `program_invocation_short_name').

     #include <stdio.h>
     #include <stdarg.h>
     
     void
     eprintf (const char *template, ...)
     {
       va_list ap;
       extern char *program_invocation_short_name;
     
       fprintf (stderr, "%s: ", program_invocation_short_name);
       va_start (ap, template);
       vfprintf (stderr, template, ap);
       va_end (ap);
     }

You could call `eprintf' like this:

     eprintf ("file `%s' does not exist\n", filename);

   In GNU C, there is a special construct you can use to let the
compiler know that a function uses a `printf'-style format string.
Then it can check the number and types of arguments in each call to the
function, and warn you when they do not match the format string.  For
example, take this declaration of `eprintf':

     void eprintf (const char *template, ...)
             __attribute__ ((format (printf, 1, 2)));

This tells the compiler that `eprintf' uses a format string like
`printf' (as opposed to `scanf'; *note Formatted Input::); the format
string appears as the first argument; and the arguments to satisfy the
format begin with the second.  *Note Declaring Attributes of Functions:
(gcc.info)Function Attributes, for more information.


File: libc.info,  Node: Parsing a Template String,  Next: Example of Parsing,  Prev: Variable Arguments Output,  Up: Formatted Output

Parsing a Template String
-------------------------

You can use the function `parse_printf_format' to obtain information
about the number and types of arguments that are expected by a given
template string.  This function permits interpreters that provide
interfaces to `printf' to avoid passing along invalid arguments from
the user's program, which could cause a crash.

   All the symbols described in this section are declared in the header
file `printf.h'.

 - Function: size_t parse_printf_format (const char *TEMPLATE, size_t
          N, int *ARGTYPES)
     This function returns information about the number and types of
     arguments expected by the `printf' template string TEMPLATE.  The
     information is stored in the array ARGTYPES; each element of this
     array describes one argument.  This information is encoded using
     the various `PA_' macros, listed below.

     The argument N specifies the number of elements in the array
     ARGTYPES.  This is the maximum number of elements that
     `parse_printf_format' will try to write.

     `parse_printf_format' returns the total number of arguments
     required by TEMPLATE.  If this number is greater than N, then the
     information returned describes only the first N arguments.  If you
     want information about additional arguments, allocate a bigger
     array and call `parse_printf_format' again.
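
     The retry described above might be sketched like this.  The
     function name `all_arg_types' and the initial guess of four
     elements are assumptions of the example.

```c
#define _GNU_SOURCE
#include <printf.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd array describing every
   argument expected by TMPL and store the argument count in *COUNT.
   Returns NULL if memory cannot be allocated.  */
int *
all_arg_types (const char *tmpl, size_t *count)
{
  size_t n = 4;                 /* initial guess */
  int *types = malloc (n * sizeof *types);

  while (types != NULL)
    {
      size_t needed = parse_printf_format (tmpl, n, types);
      if (needed <= n)
        {
          *count = needed;
          break;
        }
      /* The array was too small: grow it and ask again.  */
      free (types);
      n = needed;
      types = malloc (n * sizeof *types);
    }
  return types;
}
```

     For the template `"%d %s %f %c %p %d"' the first call reports six
     arguments, so the array is reallocated once and the second call
     fills in all six entries.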

   The argument types are encoded as a combination of a basic type and
modifier flag bits.

 - Macro: int PA_FLAG_MASK
     This macro is a bitmask for the type modifier flag bits.  You can
     write the expression `(argtypes[i] & PA_FLAG_MASK)' to extract
     just the flag bits for an argument, or `(argtypes[i] &
     ~PA_FLAG_MASK)' to extract just the basic type code.

   Here are symbolic constants that represent the basic types; they
stand for integer values.

`PA_INT'
     This specifies that the base type is `int'.

`PA_CHAR'
     This specifies that the base type is `int', cast to `char'.

`PA_STRING'
     This specifies that the base type is `char *', a null-terminated
     string.

`PA_POINTER'
     This specifies that the base type is `void *', an arbitrary
     pointer.

`PA_FLOAT'
     This specifies that the base type is `float'.

`PA_DOUBLE'
     This specifies that the base type is `double'.

`PA_LAST'
     You can define additional base types for your own programs as
     offsets from `PA_LAST'.  For example, if you have data types `foo'
     and `bar' with their own specialized `printf' conversions, you
     could define encodings for these types as:

          #define PA_FOO  PA_LAST
          #define PA_BAR  (PA_LAST + 1)

   Here are the flag bits that modify a basic type.  They are combined
with the code for the basic type using inclusive-or.

`PA_FLAG_PTR'
     If this bit is set, it indicates that the encoded type is a
     pointer to the base type, rather than an immediate value.  For
     example, `PA_INT|PA_FLAG_PTR' represents the type `int *'.

`PA_FLAG_SHORT'
     If this bit is set, it indicates that the base type is modified
     with `short'.  (This corresponds to the `h' type modifier.)

`PA_FLAG_LONG'
     If this bit is set, it indicates that the base type is modified
     with `long'.  (This corresponds to the `l' type modifier.)

`PA_FLAG_LONG_LONG'
     If this bit is set, it indicates that the base type is modified
     with `long long'.  (This corresponds to the `L' type modifier.)

`PA_FLAG_LONG_DOUBLE'
     This is a synonym for `PA_FLAG_LONG_LONG', used by convention with
     a base type of `PA_DOUBLE' to indicate a type of `long double'.
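
   The split into basic type and flag bits can be sketched as follows.
The helper name `first_arg_is' is hypothetical.

```c
#define _GNU_SOURCE
#include <printf.h>

/* Hypothetical helper: return nonzero if the first argument expected
   by TMPL has basic type BASE and exactly the modifier flags FLAGS.  */
int
first_arg_is (const char *tmpl, int base, int flags)
{
  int type;

  if (parse_printf_format (tmpl, 1, &type) < 1)
    return 0;                   /* the template takes no arguments */
  return (type & ~PA_FLAG_MASK) == base
         && (type & PA_FLAG_MASK) == flags;
}
```

   With this helper, `first_arg_is ("%ld", PA_INT, PA_FLAG_LONG)' and
`first_arg_is ("%n", PA_INT, PA_FLAG_PTR)' are both expected to return
nonzero, while `first_arg_is ("%d", PA_INT, PA_FLAG_LONG)' returns zero.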

   For an example of using these facilities, see *Note Example of
Parsing::.


File: libc.info,  Node: Example of Parsing,  Prev: Parsing a Template String,  Up: Formatted Output

Example of Parsing a Template String
------------------------------------

Here is an example of decoding argument types for a format string.  We
assume this is part of an interpreter which contains arguments of type
`NUMBER', `CHAR', `STRING' and `STRUCTURE' (and perhaps others which
are not valid here).

     /* Test whether the NARGS specified objects
        in the vector ARGS are valid
        for the format string FORMAT:
        if so, return 1.
        If not, return 0 after printing an error message.  */
     
     int
     validate_args (char *format, int nargs, OBJECT *args)
     {
       int *argtypes;
       int nelts;
       int nwanted;
       int i;
     
       /* Get the information about the arguments.
          Each conversion specification must be at least two characters
          long, so there cannot be more specifications than half the
          length of the string.  */
     
       nelts = strlen (format) / 2;
       argtypes = (int *) alloca (nelts * sizeof (int));
       nwanted = parse_printf_format (format, nelts, argtypes);
     
       /* Check the number of arguments.  */
       if (nwanted > nargs)
         {
           error ("too few arguments (at least %d required)", nwanted);
           return 0;
         }
     
       /* Check the C type wanted for each argument
          and see if the object given is suitable.  */
       for (i = 0; i < nwanted; i++)
         {
           int wanted;
     
           if (argtypes[i] & PA_FLAG_PTR)
             wanted = STRUCTURE;
           else
             switch (argtypes[i] & ~PA_FLAG_MASK)
               {
               case PA_INT:
               case PA_FLOAT:
               case PA_DOUBLE:
                 wanted = NUMBER;
                 break;
               case PA_CHAR:
                 wanted = CHAR;
                 break;
               case PA_STRING:
                 wanted = STRING;
                 break;
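               case PA_POINTER:
                 wanted = STRUCTURE;
                 break;
               default:
                 /* Any other type has no counterpart in this interpreter.  */
                 error ("unsupported type for arg number %d", i);
                 return 0;
               }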
           if (TYPE (args[i]) != wanted)
             {
               error ("type mismatch for arg number %d", i);
               return 0;
             }
         }
       return 1;
     }


File: libc.info,  Node: Customizing Printf,  Next: Formatted Input,  Prev: Formatted Output,  Up: I/O on Streams

Customizing `printf'
====================

The GNU C library lets you define your own custom conversion specifiers
for `printf' template strings, to teach `printf' clever ways to print
the important data structures of your program.

   The way you do this is by registering the conversion with the
function `register_printf_function'; see *Note Registering New
Conversions::.  One of the arguments you pass to this function is a
pointer to a handler function that produces the actual output; see
*Note Defining the Output Handler::, for information on how to write
this function.

   You can also install a function that just returns information about
the number and type of arguments expected by the conversion specifier.
*Note Parsing a Template String::, for information about this.

   The facilities of this section are declared in the header file
`printf.h'.

* Menu:

* Registering New Conversions::         Using `register_printf_function'
                                         to register a new output conversion.
* Conversion Specifier Options::        The handler must be able to get
                                         the options specified in the
                                         template when it is called.
* Defining the Output Handler::         Defining the handler and arginfo
                                         functions that are passed as arguments
                                         to `register_printf_function'.
* Printf Extension Example::            How to define a `printf'
                                         handler function.
* Predefined Printf Handlers::          Predefined `printf' handlers.

   *Portability Note:* The ability to extend the syntax of `printf'
template strings is a GNU extension.  ISO standard C has nothing
similar.


File: libc.info,  Node: Registering New Conversions,  Next: Conversion Specifier Options,  Up: Customizing Printf

Registering New Conversions
---------------------------

The function to register a new output conversion is
`register_printf_function', declared in `printf.h'.

 - Function: int register_printf_function (int SPEC, printf_function
          HANDLER-FUNCTION, printf_arginfo_function ARGINFO-FUNCTION)
     This function defines the conversion specifier character SPEC.
     Thus, if SPEC is `'Y'', it defines the conversion `%Y'.  You can
     redefine the built-in conversions like `%s', but flag characters
     like `#' and type modifiers like `l' can never be used as
     conversions; calling `register_printf_function' for those
     characters has no effect.  It is advisable not to use lowercase
     letters, since the ISO C standard warns that additional lowercase
     letters may be standardized in future editions of the standard.

     The HANDLER-FUNCTION is the function called by `printf' and
     friends when this conversion appears in a template string.  *Note
     Defining the Output Handler::, for information about how to define
     a function to pass as this argument.  If you specify a null
     pointer, any existing handler function for SPEC is removed.

     The ARGINFO-FUNCTION is the function called by
     `parse_printf_format' when this conversion appears in a template
     string.  *Note Parsing a Template String::, for information about
     this.

     *Attention:* In the GNU C library versions before 2.0 the
     ARGINFO-FUNCTION function did not need to be installed unless the
     user used the `parse_printf_format' function.  This has changed.
     Now a call to any of the `printf' functions will call this
     function when this format specifier appears in the format string.

     The return value is `0' on success, and `-1' on failure (which
     occurs if SPEC is out of range).

     You can redefine the standard output conversions, but this is
     probably not a good idea because of the potential for confusion.
     Library routines written by other people could break if you do
     this.


File: libc.info,  Node: Conversion Specifier Options,  Next: Defining the Output Handler,  Prev: Registering New Conversions,  Up: Customizing Printf

Conversion Specifier Options
----------------------------

If you define a meaning for `%A', what if the template contains `%+23A'
or `%-#A'?  To implement a sensible meaning for these, the handler
needs access to the options specified in the template when it is called.

   Both the HANDLER-FUNCTION and ARGINFO-FUNCTION accept an argument
that points to a `struct printf_info', which contains information about
the options appearing in an instance of the conversion specifier.  This
data type is declared in the header file `printf.h'.

 - Type: struct printf_info
     This structure is used to pass information about the options
     appearing in an instance of a conversion specifier in a `printf'
     template string to the handler and arginfo functions for that
     specifier.  It contains the following members:

    `int prec'
          This is the precision specified.  The value is `-1' if no
          precision was specified.  If the precision was given as `*',
          the `printf_info' structure passed to the handler function
          contains the actual value retrieved from the argument list.
          But the structure passed to the arginfo function contains a
          value of `INT_MIN', since the actual value is not known.

    `int width'
          This is the minimum field width specified.  The value is `0'
          if no width was specified.  If the field width was given as
          `*', the `printf_info' structure passed to the handler
          function contains the actual value retrieved from the
          argument list.  But the structure passed to the arginfo
          function contains a value of `INT_MIN', since the actual
          value is not known.

    `wchar_t spec'
          This is the conversion specifier character specified.  It's
          stored in the structure so that you can register the same
          handler function for multiple characters, but still have a
          way to tell them apart when the handler function is called.

    `unsigned int is_long_double'
          This is a boolean that is true if the `L', `ll', or `q' type
          modifier was specified.  For integer conversions, this
          indicates `long long int', as opposed to `long double' for
          floating point conversions.

    `unsigned int is_char'
          This is a boolean that is true if the `hh' type modifier was
          specified.

    `unsigned int is_short'
          This is a boolean that is true if the `h' type modifier was
          specified.

    `unsigned int is_long'
          This is a boolean that is true if the `l' type modifier was
          specified.

    `unsigned int alt'
          This is a boolean that is true if the `#' flag was specified.

    `unsigned int space'
          This is a boolean that is true if the ` ' flag was specified.

    `unsigned int left'
          This is a boolean that is true if the `-' flag was specified.

    `unsigned int showsign'
          This is a boolean that is true if the `+' flag was specified.

    `unsigned int group'
          This is a boolean that is true if the `'' flag was specified.

    `unsigned int extra'
          This flag has a special meaning depending on the context.  It
          could be used freely by the user-defined handlers but when
          called from the `printf' function this variable always
          contains the value `0'.

    `unsigned int wide'
          This flag is set if the stream is wide oriented.

    `wchar_t pad'
          This is the character to use for padding the output to the
          minimum field width.  The value is `'0'' if the `0' flag was
          specified, and `' '' otherwise.
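
   As a minimal sketch of how these members can be used, the following
code registers one handler for two conversions and uses `info->spec',
`info->width' and `info->left' to decide what to print.  The
conversions `%Y' and `%O' and all function names are hypothetical and
chosen only for illustration; they are not part of the library.

```c
#include <stdio.h>
#include <printf.h>

/* Hypothetical handler shared by `%Y' (yes/no) and `%O' (on/off).
   The argument is an int that is treated as a truth value.  */
static int
print_truth (FILE *stream, const struct printf_info *info,
             const void *const *args)
{
  int value = *(const int *) args[0];
  const char *text;

  /* info->spec tells the shared handler which conversion was used.  */
  if (info->spec == 'Y')
    text = value ? "yes" : "no";
  else
    text = value ? "on" : "off";

  /* Honor the minimum field width and the `-' flag.  */
  return fprintf (stream, "%*s",
                  info->left ? -info->width : info->width, text);
}

/* Both conversions consume exactly one int argument.  */
static int
print_truth_arginfo (const struct printf_info *info, size_t n,
                     int *argtypes)
{
  (void) info;
  if (n > 0)
    argtypes[0] = PA_INT;
  return 1;
}

/* Register both conversions; returns 0 on success, -1 on failure.  */
int
register_truth_conversions (void)
{
  if (register_printf_function ('Y', print_truth, print_truth_arginfo) != 0)
    return -1;
  return register_printf_function ('O', print_truth, print_truth_arginfo);
}
```

   After `register_truth_conversions' succeeds, a call such as
`printf ("[%5Y|%-5O]\n", 0, 1)' should pad the first field on the left
and the second on the right.  The other members, such as `alt' or
`pad', can be consulted in the same way.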


File: libc.info,  Node: Defining the Output Handler,  Next: Printf Extension Example,  Prev: Conversion Specifier Options,  Up: Customizing Printf

Defining the Output Handler
---------------------------

Now let's look at how to define the handler and arginfo functions which
are passed as arguments to `register_printf_function'.

   *Compatibility Note:* The interface changed in GNU libc version 2.0.
Previously the third argument was of type `va_list *'.

   You should define your handler functions with a prototype like:

     int FUNCTION (FILE *stream, const struct printf_info *info,
                         const void *const *args)

   The STREAM argument passed to the handler function is the stream to
which it should write output.

   The INFO argument is a pointer to a structure that contains
information about the various options that were included with the
conversion in the template string.  You should not modify this structure
inside your handler function.  *Note Conversion Specifier Options::, for
a description of this data structure.

   The ARGS argument is a vector of pointers to the argument data.  The
number of arguments is determined by calling the argument information
function provided by the user.

   Your handler function should return a value just like `printf' does:
it should return the number of characters it has written, or a negative
value to indicate an error.

 - Data Type: printf_function
     This is the data type that a handler function should have.

   You must also define a function to pass as the ARGINFO-FUNCTION
argument for each new conversion you install with
`register_printf_function'.  Both `parse_printf_format' and the
`printf' functions themselves call it.

   You have to define these functions with a prototype like:

     int FUNCTION (const struct printf_info *info,
                         size_t n, int *argtypes)

   The return value from the function should be the number of arguments
the conversion expects.  The function should also fill in no more than
N elements of the ARGTYPES array with information about the types of
each of these arguments.  This information is encoded using the various
`PA_' macros.  (You will notice that this is the same calling
convention `parse_printf_format' itself uses.)

 - Data Type: printf_arginfo_function
     This type is used to describe functions that return information
     about the number and type of arguments used by a conversion
     specifier.
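
   The two prototypes above can be sketched together as follows.  The
`%R' conversion, which prints a `double' ratio as a percentage, and
every function name here are assumptions made for this illustration
only.  The handler returns the character count from `fprintf', and the
arginfo function reports one `PA_DOUBLE' argument.

```c
#include <stdio.h>
#include <printf.h>

/* Handler for the hypothetical `%R' conversion: print a double ratio
   as a percentage, so 0.25 appears as "25%".  */
static int
print_percent (FILE *stream, const struct printf_info *info,
               const void *const *args)
{
  /* args[0] points to the argument described by the arginfo function.  */
  double ratio = *(const double *) args[0];
  int prec = info->prec < 0 ? 0 : info->prec;

  /* Like printf, return the number of characters written,
     or a negative value on error.  */
  return fprintf (stream, "%.*f%%", prec, ratio * 100.0);
}

/* Arginfo function: `%R' consumes exactly one double.  */
static int
print_percent_arginfo (const struct printf_info *info, size_t n,
                       int *argtypes)
{
  (void) info;
  if (n > 0)
    argtypes[0] = PA_DOUBLE;
  return 1;
}

/* Install the conversion; returns 0 on success, -1 on failure.  */
int
register_percent_conversion (void)
{
  return register_printf_function ('R', print_percent,
                                   print_percent_arginfo);
}

/* Ask parse_printf_format how many arguments FMT expects, storing up
   to N argument types in ARGTYPES.  */
size_t
count_template_args (const char *fmt, int *argtypes, size_t n)
{
  return parse_printf_format (fmt, n, argtypes);
}
```

   Once the conversion is registered, `count_template_args' applied to
the template `"%d %R"' should report two arguments, the second of type
`PA_DOUBLE', because `parse_printf_format' consults the arginfo
function for `%R'.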


File: libc.info,  Node: Printf Extension Example,  Next: Predefined Printf Handlers,  Prev: Defining the Output Handler,  Up: Customizing Printf

`printf' Extension Example
--------------------------

Here is an example showing how to define a `printf' handler function.
This program defines a data structure called a `Widget' and defines the
`%W' conversion to print information about `Widget *' arguments,
including the pointer value and the name stored in the data structure.
The `%W' conversion supports the minimum field width and
left-justification options, but ignores everything else.

     #include <stdio.h>
     #include <stdlib.h>
     #include <printf.h>
     
     typedef struct
     {
       char *name;
     }
     Widget;
     
     int
     print_widget (FILE *stream,
                   const struct printf_info *info,
                   const void *const *args)
     {
       const Widget *w;
       char *buffer;
       int len;
     
       /* Format the output into a string. */
       w = *((const Widget **) (args[0]));
       len = asprintf (&buffer, "<Widget %p: %s>", w, w->name);
       if (len == -1)
         return -1;
     
       /* Pad to the minimum field width and print to the stream. */
       len = fprintf (stream, "%*s",
                      (info->left ? -info->width : info->width),
                      buffer);
     
       /* Clean up and return. */
       free (buffer);
       return len;
     }
     
     
     int
     print_widget_arginfo (const struct printf_info *info, size_t n,
                           int *argtypes)
     {
       /* We always take exactly one argument and this is a pointer to the
          structure.  */
       if (n > 0)
         argtypes[0] = PA_POINTER;
       return 1;
     }
     
     
     int
     main (void)
     {
       /* Make a widget to print. */
       Widget mywidget;
       mywidget.name = "mywidget";
     
       /* Register the print function for widgets. */
       register_printf_function ('W', print_widget, print_widget_arginfo);
     
       /* Now print the widget. */
       printf ("|%W|\n", &mywidget);
       printf ("|%35W|\n", &mywidget);
       printf ("|%-35W|\n", &mywidget);
     
       return 0;
     }

   The output produced by this program looks like:

     |<Widget 0xffeffb7c: mywidget>|
     |      <Widget 0xffeffb7c: mywidget>|
     |<Widget 0xffeffb7c: mywidget>      |


File: libc.info,  Node: Predefined Printf Handlers,  Prev: Printf Extension Example,  Up: Customizing Printf

Predefined `printf' Handlers
----------------------------

The GNU libc also contains a concrete and useful application of the
`printf' handler extension.  There are two functions available which
implement a special way to print floating-point numbers.

 - Function: int printf_size (FILE *FP, const struct printf_info *INFO,
          const void *const *ARGS)
     Print a given floating point number as for the format `%f' except
     that there is a postfix character indicating the divisor for the
     number to make this less than 1000.  There are two possible
     divisors: powers of 1024 or powers of 1000.  Which one is used
     depends on the format character specified when registering this
     handler.  If the character is lower case, 1024 is used.  For
     upper case characters, 1000 is used.

     The postfix tag corresponds to bytes, kilobytes, megabytes,
     gigabytes, etc.  The full table is:

     +-------+--------------+--------+--------+---------------+
     | Lower | Multiplier   | From   | Upper  | Multiplier    |
     +-------+--------------+--------+--------+---------------+
     | ' '   | 1            |        | ' '    | 1             |
     +-------+--------------+--------+--------+---------------+
     | k     | 2^10 (1024)  | kilo   | K      | 10^3 (1000)   |
     +-------+--------------+--------+--------+---------------+
     | m     | 2^20         | mega   | M      | 10^6          |
     +-------+--------------+--------+--------+---------------+
     | g     | 2^30         | giga   | G      | 10^9          |
     +-------+--------------+--------+--------+---------------+
     | t     | 2^40         | tera   | T      | 10^12         |
     +-------+--------------+--------+--------+---------------+
     | p     | 2^50         | peta   | P      | 10^15         |
     +-------+--------------+--------+--------+---------------+
     | e     | 2^60         | exa    | E      | 10^18         |
     +-------+--------------+--------+--------+---------------+
     | z     | 2^70         | zetta  | Z      | 10^21         |
     +-------+--------------+--------+--------+---------------+
     | y     | 2^80         | yotta  | Y      | 10^24         |
     +-------+--------------+--------+--------+---------------+

     The default precision is 3, i.e., 1024 is printed with a lower-case
     format character as if it were `%.3fk' and will yield `1.000k'.

   Due to the requirements of `register_printf_function' we must also
provide the function which returns information about the arguments.

 - Function: int printf_size_info (const struct printf_info *INFO,
          size_t N, int *ARGTYPES)
     This function will return in ARGTYPES the information about the
     used parameters in the way the `vfprintf' implementation expects
     it.  The format always takes one argument.

   To use these functions, both must be registered with a call like

     register_printf_function ('B', printf_size, printf_size_info);

   Here we register the functions to print numbers as powers of 1000
since the format character `'B'' is an upper-case character.  If we
additionally registered `'b'' with a line like

     register_printf_function ('b', printf_size, printf_size_info);

we could also print using a power of 1024.  Please note that all that is
different in these two lines is the format specifier.  The
`printf_size' function knows about the difference between lower and
upper case format specifiers.

   The use of `'B'' and `'b'' is no coincidence.  Rather it is the
preferred way to use this functionality since it is available on some
other systems which also use format specifiers.
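
   Putting the two registrations together, a small sketch might look
like this.  The helper names `register_size_conversions' and
`format_sizes' are hypothetical; only `printf_size',
`printf_size_info' and `register_printf_function' come from the
library.

```c
#include <stdio.h>
#include <printf.h>

/* Register `%B' (powers of 1000) and `%b' (powers of 1024).
   Returns 0 on success, -1 on failure.  */
int
register_size_conversions (void)
{
  if (register_printf_function ('B', printf_size, printf_size_info) != 0)
    return -1;
  return register_printf_function ('b', printf_size, printf_size_info);
}

/* Format BYTES into OUT twice: first with binary units (`%b'),
   then with decimal units (`%B').  The argument must be a double.  */
int
format_sizes (char *out, size_t outsize, double bytes)
{
  return snprintf (out, outsize, "%b %B", bytes, bytes);
}
```

   With both conversions registered, formatting the value 1024 this way
should give `1.000k' for `%b' and `1.024K' for `%B', using the default
precision of 3.  The compiler may warn about these conversions, since
it knows nothing about handlers installed at run time.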


File: libc.info,  Node: Formatted Input,  Next: EOF and Errors,  Prev: Customizing Printf,  Up: I/O on Streams

Formatted Input
===============

The functions described in this section (`scanf' and related functions)
provide facilities for formatted input analogous to the formatted
output facilities.  These functions provide a mechanism for reading
arbitrary values under the control of a "format string" or "template
string".

* Menu:

* Formatted Input Basics::      Some basics to get you started.
* Input Conversion Syntax::     Syntax of conversion specifications.
* Table of Input Conversions::  Summary of input conversions and what they do.
* Numeric Input Conversions::   Details of conversions for reading numbers.
* String Input Conversions::    Details of conversions for reading strings.
* Dynamic String Input::	String conversions that `malloc' the buffer.
* Other Input Conversions::     Details of miscellaneous other conversions.
* Formatted Input Functions::   Descriptions of the actual functions.
* Variable Arguments Input::    `vscanf' and friends.


File: libc.info,  Node: Formatted Input Basics,  Next: Input Conversion Syntax,  Up: Formatted Input

Formatted Input Basics
----------------------

Calls to `scanf' are superficially similar to calls to `printf' in that
arbitrary arguments are read under the control of a template string.
While the syntax of the conversion specifications in the template is
very similar to that for `printf', the interpretation of the template
is oriented more towards free-format input and simple pattern matching,
rather than fixed-field formatting.  For example, most `scanf'
conversions skip over any amount of "white space" (including spaces,
tabs, and newlines) in the input file, and there is no concept of
precision for the numeric input conversions as there is for the
corresponding output conversions.  Ordinarily, non-whitespace
characters in the template are expected to match characters in the
input stream exactly, but a matching failure is distinct from an input
error on the stream.

   Another area of difference between `scanf' and `printf' is that you
must remember to supply pointers rather than immediate values as the
optional arguments to `scanf'; the values that are read are stored in
the objects that the pointers point to.  Even experienced programmers
tend to forget this occasionally, so if your program is getting strange
errors that seem to be related to `scanf', you might want to
double-check this.

   When a "matching failure" occurs, `scanf' returns immediately,
leaving the first non-matching character as the next character to be
read from the stream.  The normal return value from `scanf' is the
number of values that were assigned, so you can use this to determine if
a matching error happened before all the expected values were read.
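
   As a brief sketch of checking the return value, the hypothetical
function below uses `sscanf', the variant of `scanf' that reads from a
string, so that the example needs no input stream.  The name
`parse_two_ints' is an assumption made for this illustration.

```c
#include <stdio.h>

/* Parse two integers from TEXT.  Returns the number of values
   assigned: 2 on success, fewer if a matching failure stopped the
   scan early, or EOF if the input ended before anything could be
   converted.  */
int
parse_two_ints (const char *text, int *first, int *second)
{
  return sscanf (text, "%d %d", first, second);
}
```

   For the input `12 abc' this returns 1: the first value is assigned,
and the matching failure at `a' prevents the second assignment.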

   The `scanf' function is typically used for things like reading in
the contents of tables.  For example, here is a function that uses
`scanf' to initialize an array of `double':

     void
     readarray (double *array, int n)
     {
       int i;
       for (i=0; i<n; i++)
         if (scanf (" %lf", &(array[i])) != 1)
           invalid_input_error ();
     }

   The formatted input functions are not used as frequently as the
formatted output functions.  Partly, this is because it takes some care
to use them properly.  Another reason is that it is difficult to recover
from a matching error.

   If you are trying to read input that doesn't match a single, fixed
pattern, you may be better off using a tool such as Flex to generate a
lexical scanner, or Bison to generate a parser, rather than using
`scanf'.  For more information about these tools, see *Note Top:
(flex.info)Top, and *Note Top: (bison.info)Top.


File: libc.info,  Node: Input Conversion Syntax,  Next: Table of Input Conversions,  Prev: Formatted Input Basics,  Up: Formatted Input

Input Conversion Syntax
-----------------------

A `scanf' template string is a string that contains ordinary multibyte
characters interspersed with conversion specifications that start with
`%'.

   Any whitespace character (as defined by the `isspace' function;
*note Classification of Characters::) in the template causes any number
of whitespace characters in the input stream to be read and discarded.
The whitespace characters that are matched need not be exactly the same
whitespace characters that appear in the template string.  For example,
write ` , ' in the template to recognize a comma with optional
whitespace before and after.

   Other characters in the template string that are not part of
conversion specifications must match characters in the input stream
exactly; if this is not the case, a matching failure occurs.

   The conversion specifications in a `scanf' template string have the
general form:

     % FLAGS WIDTH TYPE CONVERSION

   In more detail, an input conversion specification consists of an
initial `%' character followed in sequence by:

   * An optional "flag character" `*', which says to ignore the text
     read for this specification.  When `scanf' finds a conversion
     specification that uses this flag, it reads input as directed by
     the rest of the conversion specification, but it discards this
     input, does not use a pointer argument, and does not increment the
     count of successful assignments.

   * An optional flag character `a' (valid with string conversions only)
     which requests allocation of a buffer long enough to store the
     string in.  (This is a GNU extension.)  *Note Dynamic String
     Input::.

   * An optional decimal integer that specifies the "maximum field
     width".  Reading of characters from the input stream stops either
     when this maximum is reached or when a non-matching character is
     found, whichever happens first.  Most conversions discard initial
     whitespace characters (those that don't are explicitly
     documented), and these discarded characters don't count towards
     the maximum field width.  String input conversions store a null
     character to mark the end of the input; the maximum field width
     does not include this terminator.

   * An optional "type modifier character".  For example, you can
     specify a type modifier of `l' with integer conversions such as
     `%d' to specify that the argument is a pointer to a `long int'
     rather than a pointer to an `int'.

   * A character that specifies the conversion to be applied.

   The exact options that are permitted and how they are interpreted
vary between the different conversion specifiers.  See the descriptions
of the individual conversions for information about the particular
options that they allow.

   With the `-Wformat' option, the GNU C compiler checks calls to
`scanf' and related functions.  It examines the format string and
verifies that the correct number and types of arguments are supplied.
There is also a GNU C syntax to tell the compiler that a function you
write uses a `scanf'-style format string.  *Note Declaring Attributes
of Functions: (gcc.info)Function Attributes, for more information.
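
   The pieces of a conversion specification described above can be
sketched with two small functions.  Both names are hypothetical, and
`sscanf' is used so that the input can be given as a string.

```c
#include <stdio.h>

/* Parse a date written as YYYY-MM-DD.  The maximum field widths stop
   each %d after at most 4, 2 and 2 characters, and the `-' characters
   in the template must match the input exactly.  Returns 1 on
   success and 0 otherwise.  */
int
parse_iso_date (const char *text, int *year, int *month, int *day)
{
  return sscanf (text, "%4d-%2d-%2d", year, month, day) == 3;
}

/* Store the second of three whitespace-separated integers in VALUE.
   The `*' flag makes sscanf read and discard the first and third
   fields without using a pointer argument or counting them.  */
int
second_of_three (const char *text, int *value)
{
  return sscanf (text, "%*d %d %*d", value) == 1;
}
```

   Note that `second_of_three' compares the result with 1, not 3,
because suppressed conversions do not increment the count of
successful assignments.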


File: libc.info,  Node: Table of Input Conversions,  Next: Numeric Input Conversions,  Prev: Input Conversion Syntax,  Up: Formatted Input

Table of Input Conversions
--------------------------

Here is a table that summarizes the various conversion specifications:

`%d'
     Matches an optionally signed integer written in decimal.  *Note
     Numeric Input Conversions::.

`%i'
     Matches an optionally signed integer in any of the formats that
     the C language defines for specifying an integer constant.  *Note
     Numeric Input Conversions::.

`%o'
     Matches an unsigned integer written in octal radix.  *Note Numeric
     Input Conversions::.

`%u'
     Matches an unsigned integer written in decimal radix.  *Note
     Numeric Input Conversions::.

`%x', `%X'
     Matches an unsigned integer written in hexadecimal radix.  *Note
     Numeric Input Conversions::.

`%e', `%f', `%g', `%E', `%G'
     Matches an optionally signed floating-point number.  *Note Numeric
     Input Conversions::.

`%s'
     Matches a string containing only non-whitespace characters.  *Note
     String Input Conversions::.  The presence of the `l' modifier
     determines whether the output is stored as a wide character string
     or a multibyte string.  If `%s' is used in a wide character
     function the string is converted as with multiple calls to
     `wcrtomb' into a multibyte string.  This means that the buffer
     must provide room for `MB_CUR_MAX' bytes for each wide character
     read.  In case `%ls' is used in a multibyte function the result is
     converted into wide characters as with multiple calls of `mbrtowc'
     before being stored in the user provided buffer.

`%S'
     This is an alias for `%ls' which is supported for compatibility
     with the Unix standard.

`%['
     Matches a string of characters that belong to a specified set.
     *Note String Input Conversions::.  The presence of the `l' modifier
     determines whether the output is stored as a wide character string
     or a multibyte string.  If `%[' is used in a wide character
     function the string is converted as with multiple calls to
     `wcrtomb' into a multibyte string.  This means that the buffer
     must provide room for `MB_CUR_MAX' bytes for each wide character
     read.  In case `%l[' is used in a multibyte function the result is
     converted into wide characters as with multiple calls of `mbrtowc'
     before being stored in the user provided buffer.

`%c'
     Matches a string of one or more characters; the number of
     characters read is controlled by the maximum field width given for
     the conversion.  *Note String Input Conversions::.

     If the `%c' is used in a wide stream function the read value is
     converted from a wide character to the corresponding multibyte
     character before storing it.  Note that this conversion can
     produce more than one byte of output and therefore the provided
     buffer must be large enough for up to `MB_CUR_MAX' bytes for each
     character.  If `%lc' is used in a multibyte function the input is
     treated as a multibyte sequence (and not bytes) and the result is
     converted as with calls to `mbrtowc'.

`%C'
     This is an alias for `%lc' which is supported for compatibility
     with the Unix standard.

`%p'
     Matches a pointer value in the same implementation-defined format
     used by the `%p' output conversion for `printf'.  *Note Other
     Input Conversions::.

`%n'
     This conversion doesn't read any characters; it records the number
     of characters read so far by this call.  *Note Other Input
     Conversions::.

`%%'
     This matches a literal `%' character in the input stream.  No
     corresponding argument is used.  *Note Other Input Conversions::.

   If the syntax of a conversion specification is invalid, the behavior
is undefined.  If there aren't enough function arguments provided to
supply addresses for all the conversion specifications in the template
strings that perform assignments, or if the arguments are not of the
correct types, the behavior is also undefined.  On the other hand, extra
arguments are simply ignored.
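
   As an illustration of the `%%' and `%n' conversions from the table,
here is a small sketch.  The function name `parse_percent' is
hypothetical.

```c
#include <stdio.h>

/* Parse a percentage such as "75%".  Returns the number of characters
   consumed, or -1 if TEXT does not start with an integer followed by
   a percent sign.  */
int
parse_percent (const char *text, int *value)
{
  int consumed = -1;

  /* %% matches a literal `%'; %n stores how many characters have been
     read so far and does not count as an assignment.  */
  if (sscanf (text, "%d%%%n", value, &consumed) != 1)
    return -1;

  /* If the `%' was missing, %n was never reached and CONSUMED
     still holds -1.  */
  return consumed;
}
```

   Initializing `consumed' to -1 is what lets the caller distinguish
`75%' from `75': the return value of `sscanf' is 1 in both cases.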


File: libc.info,  Node: Numeric Input Conversions,  Next: String Input Conversions,  Prev: Table of Input Conversions,  Up: Formatted Input

Numeric Input Conversions
-------------------------

This section describes the `scanf' conversions for reading numeric
values.

   The `%d' conversion matches an optionally signed integer in decimal
radix.  The syntax that is recognized is the same as that for the
`strtol' function (*note Parsing of Integers::) with the value `10' for
the BASE argument.

   The `%i' conversion matches an optionally signed integer in any of
the formats that the C language defines for specifying an integer
constant.  The syntax that is recognized is the same as that for the
`strtol' function (*note Parsing of Integers::) with the value `0' for
the BASE argument.  (You can print integers in this syntax with
`printf' by using the `#' flag character with the `%x', `%o', or `%d'
conversion.  *Note Integer Conversions::.)

   For example, any of the strings `10', `0xa', or `012' could be read
in as integers under the `%i' conversion.  Each of these specifies a
number with decimal value `10'.
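
   The difference between `%i' and `%d' can be sketched as follows.
The function names are hypothetical, and `sscanf' is used so that the
input can be given as a string.

```c
#include <stdio.h>

/* Read one integer from TEXT using the C integer-constant syntax
   (decimal, 0x... hexadecimal, or 0... octal).  Returns 1 on success
   and 0 on a matching failure.  */
int
parse_c_integer (const char *text, int *value)
{
  return sscanf (text, "%i", value) == 1;
}

/* Read one integer from TEXT, always in decimal radix.  */
int
parse_decimal (const char *text, int *value)
{
  return sscanf (text, "%d", value) == 1;
}
```

   Here `parse_c_integer' stores 10 for each of `10', `0xa' and `012',
while `parse_decimal' reads `012' as twelve.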

   The `%o', `%u', and `%x' conversions match unsigned integers in
octal, decimal, and hexadecimal radices, respectively.  The syntax that
is recognized is the same as that for the `strtoul' function (*note
Parsing of Integers::) with the appropriate value (`8', `10', or `16')
for the BASE argument.

   The `%X' conversion is identical to the `%x' conversion.  They both
permit either uppercase or lowercase letters to be used as digits.

   The default type of the corresponding argument for the `%d' and `%i'
conversions is `int *', and `unsigned int *' for the other integer
conversions.  You can use the following type modifiers to specify other
sizes of integer:

`hh'
     Specifies that the argument is a `signed char *' or `unsigned char
     *'.

     This modifier was introduced in ISO C99.

`h'
     Specifies that the argument is a `short int *' or `unsigned short
     int *'.

`j'
     Specifies that the argument is an `intmax_t *' or `uintmax_t *'.

     This modifier was introduced in ISO C99.

`l'
     Specifies that the argument is a `long int *' or `unsigned long
     int *'.  Two `l' characters is like the `L' modifier, below.

     If used with `%c' or `%s' the corresponding parameter is
     considered as a pointer to a wide character or wide character
     string respectively.  This use of `l' was introduced in
     Amendment 1 to ISO C90.

`ll'
`L'
`q'
     Specifies that the argument is a `long long int *' or `unsigned
     long long int *'.  (The `long long' type is an extension supported
     by the GNU C compiler.  For systems that don't provide extra-long
     integers, this is the same as `long int'.)

     The `q' modifier is another name for the same thing, which comes
     from 4.4 BSD; a `long long int' is sometimes called a "quad" `int'.

`t'
     Specifies that the argument is a `ptrdiff_t *'.

     This modifier was introduced in ISO C99.

`z'
     Specifies that the argument is a `size_t *'.

     This modifier was introduced in ISO C99.
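
   A short sketch of the integer type modifiers follows.  The function
name and the meaning of its three fields are assumptions made for this
illustration; the point is only that each modifier must agree with the
type the corresponding pointer refers to.

```c
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

/* Parse "COUNT OFFSET LEVEL" into objects of three different integer
   types: `z' for size_t, `j' for intmax_t and `hh' for signed char.
   Returns 1 on success and 0 otherwise.  */
int
parse_sized_ints (const char *text, size_t *count, intmax_t *offset,
                  signed char *level)
{
  return sscanf (text, "%zu %jd %hhd", count, offset, level) == 3;
}
```

   Passing a pointer whose type does not match the modifier gives
undefined behavior, so it is worth compiling such code with
`-Wformat'.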

   All of the `%e', `%f', `%g', `%E', and `%G' input conversions are
interchangeable.  They all match an optionally signed floating point
number, in the same syntax as for the `strtod' function (*note Parsing
of Floats::).

   For the floating-point input conversions, the default argument type
is `float *'.  (This is different from the corresponding output
conversions, where the default type is `double'; remember that `float'
arguments to `printf' are converted to `double' by the default argument
promotions, but `float *' arguments are not promoted to `double *'.)
You can specify other sizes of float using these type modifiers:

`l'
     Specifies that the argument is of type `double *'.

`L'
     Specifies that the argument is of type `long double *'.
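
   As a sketch, the hypothetical function below reads one value of each
floating-point type.  Note the `l' in `%lf', which is required for a
`double *' argument on input even though `printf' needs no modifier
for `double'.

```c
#include <stdio.h>

/* Read a float, a double and a long double from TEXT.
   Returns 1 if all three were assigned and 0 otherwise.  */
int
parse_reals (const char *text, float *f, double *d, long double *ld)
{
  return sscanf (text, "%f %lf %Lf", f, d, ld) == 3;
}
```

   Using `%f' with a `double *' argument is a common mistake; the
behavior is undefined because the types do not match.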

   For all the above number parsing formats there is an additional
optional flag `''.  When this flag is given the `scanf' function
expects the number represented in the input string to be formatted
according to the grouping rules of the currently selected locale (*note
General Numeric::).

   If the `"C"' or `"POSIX"' locale is selected there is no difference.
But for a locale which specifies values for the appropriate fields, the
input must be grouped accordingly.  Otherwise the longest prefix with a
correct form is processed.


File: libc.info,  Node: String Input Conversions,  Next: Dynamic String Input,  Prev: Numeric Input Conversions,  Up: Formatted Input

String Input Conversions
------------------------

This section describes the `scanf' input conversions for reading string
and character values: `%s', `%S', `%[', `%c', and `%C'.

   You have two options for how to receive the input from these
conversions:

   * Provide a buffer to store it in.  This is the default.  You should
     provide an argument of type `char *' or `wchar_t *' (the latter if
     the `l' modifier is present).

     *Warning:* To make a robust program, you must make sure that the
     input (plus its terminating null) cannot possibly exceed the size
     of the buffer you provide.  In general, the only way to do this is
     to specify a maximum field width one less than the buffer size.
     *If you provide the buffer, always specify a maximum field width
     to prevent overflow.*

   * Ask `scanf' to allocate a big enough buffer, by specifying the `a'
     flag character.  This is a GNU extension.  You should provide an
     argument of type `char **' for the buffer address to be stored in.
     *Note Dynamic String Input::.
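
   The first option, with the maximum field width the warning asks for,
can be sketched like this.  `WORD_SIZE' and `read_word' are
hypothetical names used only for this illustration.

```c
#include <stdio.h>

#define WORD_SIZE 16

/* Read one whitespace-delimited word from TEXT into WORD, which must
   have room for WORD_SIZE chars.  Returns 1 on success, 0 otherwise.  */
int
read_word (const char *text, char word[WORD_SIZE])
{
  /* The width 15 is WORD_SIZE - 1, leaving room for the terminating
     null.  It has to be kept in step with WORD_SIZE by hand, because
     the width is part of the template string.  */
  return sscanf (text, "%15s", word) == 1;
}
```

   A word longer than 15 characters is cut off at the width; the
remaining characters stay in the input for the next conversion.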

   The `%c' conversion is the simplest: it matches a fixed number of
characters, always.  The maximum field width says how many characters to
read; if you don't specify the maximum, the default is 1.  This
conversion doesn't append a null character to the end of the text it
reads.  It also does not skip over initial whitespace characters.  It
reads precisely the next N characters, and fails if it cannot get that
many.  Since there is always a maximum field width with `%c' (whether
specified, or 1 by default), you can always prevent overflow by making
the buffer long enough.

   If the format is `%lc' or `%C' the function stores wide characters
which are converted using the conversion determined at the time the
stream was opened from the external byte stream.  The number of bytes
read from the medium is limited by `MB_CUR_MAX * N' but at most N wide
characters are stored in the output string.

   The `%s' conversion matches a string of non-whitespace characters.
It skips and discards initial whitespace, but stops when it encounters
more whitespace after having read something.  It stores a null character
at the end of the text that it reads.

   For example, reading the input:

      hello, world

with the conversion `%10c' produces `" hello, wo"', but reading the
same input with the conversion `%10s' produces `"hello,"'.
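
   The comparison above can be reproduced with the following sketch.
The function names are hypothetical, and `sscanf' stands in for
reading from a stream.

```c
#include <stdio.h>

/* Read exactly ten characters from TEXT with %10c.  %c stores no
   terminating null, so the buffer is terminated by hand.  */
int
read_ten_chars (const char *text, char out[11])
{
  if (sscanf (text, "%10c", out) != 1)
    return 0;
  out[10] = '\0';
  return 1;
}

/* Read one word of at most ten characters from TEXT with %10s,
   which skips initial whitespace and appends the null itself.  */
int
read_ten_string (const char *text, char out[11])
{
  return sscanf (text, "%10s", out) == 1;
}
```

   Both buffers hold eleven `char' objects: ten for the text and one
for the null character.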

   *Warning:* If you do not specify a field width for `%s', then the
number of characters read is limited only by where the next whitespace
character appears.  This almost certainly means that invalid input can
make your program crash--which is a bug.

   The `%ls' and `%S' formats are handled just like `%s' except that the
external byte sequence is converted using the conversion associated
with the stream to wide characters with their own encoding.  A maximum
field width specified with the format does not directly determine how
many bytes are read from the stream since it measures wide characters.
But an upper limit can be computed by multiplying the value of the
width by `MB_CUR_MAX'.

   To read in characters that belong to an arbitrary set of your choice,
use the `%[' conversion.  You specify the set between the `[' character
and a following `]' character, using the same syntax used in regular
expressions.  As special cases:

   * A literal `]' character can be specified as the first character of
     the set.

   * An embedded `-' character (that is, one that is not the first or
     last character of the set) is used to specify a range of
     characters.

   * If a caret character `^' immediately follows the initial `[', then
     the set of allowed input characters is everything _except_ the
     characters listed.

   The `%[' conversion does not skip over initial whitespace characters.

   Here are some examples of `%[' conversions and what they mean:

`%25[1234567890]'
     Matches a string of up to 25 digits.

`%25[][]'
     Matches a string of up to 25 square brackets.

`%25[^ \f\n\r\t\v]'
     Matches a string up to 25 characters long that doesn't contain any
     of the standard whitespace characters.  This is slightly different
     from `%s', because if the input begins with a whitespace character,
     `%[' reports a matching failure while `%s' simply discards the
     initial whitespace.

`%25[a-z]'
     Matches up to 25 lowercase characters.
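
   As a rough illustration of the whitespace behavior described above,
the following sketch applies `%25[a-z]' to a string with `sscanf'.  The
helper names are hypothetical.  Writing a space before the conversion
in the template is one way to get the whitespace skipping that `%['
does not perform by itself.

```c
#include <stdio.h>

/* Store a run of lowercase letters from INPUT in WORD, which must
   hold at least 26 bytes.  Returns the value of sscanf: 1 on a
   match, 0 on a matching failure.  */
int
scan_lowercase (const char *input, char *word)
{
  word[0] = '\0';
  return sscanf (input, "%25[a-z]", word);
}

/* Same, but the leading space in the template discards any initial
   whitespace before the `%[' conversion is attempted.  */
int
scan_lowercase_skip_space (const char *input, char *word)
{
  word[0] = '\0';
  return sscanf (input, " %25[a-z]", word);
}
```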

   As with `%c' and `%s', the `%[' format is also modified to produce
wide characters if the `l' modifier is present.  Everything said about
`%ls' above also applies to `%l['.

   One more reminder: the `%s' and `%[' conversions are *dangerous* if
you don't specify a maximum width or use the `a' flag, because input
too long would overflow whatever buffer you have provided for it.  No
matter how long your buffer is, a user could supply input that is
longer.  A well-written program reports invalid input with a
comprehensible error message, not with a crash.


File: libc.info,  Node: Dynamic String Input,  Next: Other Input Conversions,  Prev: String Input Conversions,  Up: Formatted Input

Dynamically Allocating String Conversions
-----------------------------------------

A GNU extension to formatted input lets you safely read a string with no
maximum size.  Using this feature, you don't supply a buffer; instead,
`scanf' allocates a buffer big enough to hold the data and gives you
its address.  To use this feature, write `a' as a flag character, as in
`%as' or `%a[0-9a-z]'.

   The pointer argument you supply for where to store the input should
have type `char **'.  The `scanf' function allocates a buffer and
stores its address in the word that the argument points to.  You should
free the buffer with `free' when you no longer need it.

   Here is an example of using the `a' flag with the `%[...]'
conversion specification to read a "variable assignment" of the form
`VARIABLE = VALUE'.

     {
       char *variable, *value;
     
       if (2 > scanf ("%a[a-zA-Z0-9] = %a[^\n]\n",
                      &variable, &value))
         {
           invalid_input_error ();
           return 0;
         }
     
       ...
     }
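
   Newer versions of the GNU C library, following POSIX.1-2008, also
accept `m' as an allocation modifier with the same meaning; it does not
clash with the ISO C99 `%a' floating-point conversion.  The sketch
below assumes such a library is in use and parses an assignment from a
string rather than from `stdin'.  The function name `parse_assignment'
is invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse "VARIABLE = VALUE" from LINE.  On success return 0 and store
   two malloc'd strings, which the caller must free.  On failure
   return -1 and store null pointers.  Assumes the `m' modifier is
   supported by the C library.  */
int
parse_assignment (const char *line, char **variable, char **value)
{
  int n;

  *variable = NULL;
  *value = NULL;
  n = sscanf (line, "%m[a-zA-Z0-9] = %m[^\n]", variable, value);
  if (n == 2)
    return 0;

  /* If only the first conversion succeeded, its buffer is ours.  */
  if (n == 1)
    free (*variable);
  *variable = NULL;
  *value = NULL;
  return -1;
}
```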


File: libc.info,  Node: Other Input Conversions,  Next: Formatted Input Functions,  Prev: Dynamic String Input,  Up: Formatted Input

Other Input Conversions
-----------------------

This section describes the miscellaneous input conversions.

   The `%p' conversion is used to read a pointer value.  It recognizes
the same syntax used by the `%p' output conversion for `printf' (*note
Other Output Conversions::); that is, a hexadecimal number just as the
`%x' conversion accepts.  The corresponding argument should be of type
`void **'; that is, the address of a place to store a pointer.

   The resulting pointer value is not guaranteed to be valid if it was
not originally written during the same program execution that reads it
in.

   The `%n' conversion produces the number of characters read so far by
this call.  The corresponding argument should be of type `int *'.  This
conversion works in the same way as the `%n' conversion for `printf';
see *Note Other Output Conversions::, for an example.

   The `%n' conversion is the only mechanism for determining the
success of literal matches or conversions with suppressed assignments.
If the `%n' follows the locus of a matching failure, then no value is
stored for it since `scanf' returns before processing the `%n'.  If you
store `-1' in that argument slot before calling `scanf', the presence
of `-1' after `scanf' indicates an error occurred before the `%n' was
reached.
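
   The `-1' technique can be sketched as follows.  The return value of
`sscanf' alone cannot tell whether the trailing `;' was matched, since
no assignment follows it; the `%n' result can.  The function name and
the input syntax are made up for this illustration.

```c
#include <stdio.h>

/* Return nonzero if INPUT has the form "value: NUMBER ;" and store
   NUMBER in *OUT.  END starts as -1 and is only overwritten if the
   scan gets past the literal `;' and reaches `%n'.  */
int
parse_value (const char *input, int *out)
{
  int end = -1;

  if (sscanf (input, "value: %d ;%n", out, &end) != 1)
    return 0;
  return end != -1;
}
```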

   Finally, the `%%' conversion matches a literal `%' character in the
input stream, without using an argument.  This conversion does not
permit any flags, field width, or type modifier to be specified.


File: libc.info,  Node: Formatted Input Functions,  Next: Variable Arguments Input,  Prev: Other Input Conversions,  Up: Formatted Input

Formatted Input Functions
-------------------------

Here are the descriptions of the functions for performing formatted
input.  Prototypes for these functions are in the header file `stdio.h'.

 - Function: int scanf (const char *TEMPLATE, ...)
     The `scanf' function reads formatted input from the stream `stdin'
     under the control of the template string TEMPLATE.  The optional
     arguments are pointers to the places which receive the resulting
     values.

     The return value is normally the number of successful assignments.
     If an end-of-file condition is detected before any matches are
     performed, including matches against whitespace and literal
     characters in the template, then `EOF' is returned.

 - Function: int wscanf (const wchar_t *TEMPLATE, ...)
     The `wscanf' function reads formatted input from the stream
     `stdin' under the control of the template string TEMPLATE.  The
     optional arguments are pointers to the places which receive the
     resulting values.

     The return value is normally the number of successful assignments.
     If an end-of-file condition is detected before any matches are
     performed, including matches against whitespace and literal
     characters in the template, then `EOF' is returned.

 - Function: int fscanf (FILE *STREAM, const char *TEMPLATE, ...)
     This function is just like `scanf', except that the input is read
     from the stream STREAM instead of `stdin'.

 - Function: int fwscanf (FILE *STREAM, const wchar_t *TEMPLATE, ...)
     This function is just like `wscanf', except that the input is read
     from the stream STREAM instead of `stdin'.

 - Function: int sscanf (const char *S, const char *TEMPLATE, ...)
     This is like `scanf', except that the characters are taken from the
     null-terminated string S instead of from a stream.  Reaching the
     end of the string is treated as an end-of-file condition.

     The behavior of this function is undefined if copying takes place
     between objects that overlap--for example, if S is also given as
     an argument to receive a string read under control of the `%s',
     `%S', or `%[' conversion.

 - Function: int swscanf (const wchar_t *WS, const wchar_t *TEMPLATE,
          ...)
     This is like `wscanf', except that the characters are taken from
     the null-terminated string WS instead of from a stream.  Reaching
     the end of the string is treated as an end-of-file condition.

     The behavior of this function is undefined if copying takes place
     between objects that overlap--for example, if WS is also given as
     an argument to receive a string read under control of the `%s',
     `%S', or `%[' conversion.
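
   The return value conventions above might be used as in the following
sketch, which distinguishes empty input (`EOF') from input that only
partly matches.  The enumeration and the function `parse_point' are
hypothetical names chosen for the example.

```c
#include <stdio.h>

enum parse_status { PARSE_OK, PARSE_EMPTY, PARSE_BAD };

/* Read two integers from the string S.  */
enum parse_status
parse_point (const char *s, int *x, int *y)
{
  int n = sscanf (s, "%d %d", x, y);

  if (n == EOF)
    return PARSE_EMPTY;   /* End of string before any match.  */
  if (n < 2)
    return PARSE_BAD;     /* Fewer than two assignments.  */
  return PARSE_OK;
}
```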


File: libc.info,  Node: Variable Arguments Input,  Prev: Formatted Input Functions,  Up: Formatted Input

Variable Arguments Input Functions
----------------------------------

The functions `vscanf' and friends are provided so that you can define
your own variadic `scanf'-like functions that make use of the same
internals as the built-in formatted input functions.  These functions
are analogous to the `vprintf' series of output functions.  *Note
Variable Arguments Output::, for important information on how to use
them.

   *Portability Note:* The functions listed in this section were
introduced in ISO C99 and were previously available as GNU extensions.

 - Function: int vscanf (const char *TEMPLATE, va_list AP)
     This function is similar to `scanf', but instead of taking a
     variable number of arguments directly, it takes an argument list
     pointer AP of type `va_list' (*note Variadic Functions::).

 - Function: int vwscanf (const wchar_t *TEMPLATE, va_list AP)
     This function is similar to `wscanf', but instead of taking a
     variable number of arguments directly, it takes an argument list
     pointer AP of type `va_list' (*note Variadic Functions::).

 - Function: int vfscanf (FILE *STREAM, const char *TEMPLATE, va_list
          AP)
     This is the equivalent of `fscanf' with the variable argument list
     specified directly as for `vscanf'.

 - Function: int vfwscanf (FILE *STREAM, const wchar_t *TEMPLATE,
          va_list AP)
     This is the equivalent of `fwscanf' with the variable argument list
     specified directly as for `vwscanf'.

 - Function: int vsscanf (const char *S, const char *TEMPLATE, va_list
          AP)
     This is the equivalent of `sscanf' with the variable argument list
     specified directly as for `vscanf'.

 - Function: int vswscanf (const wchar_t *S, const wchar_t *TEMPLATE,
          va_list AP)
     This is the equivalent of `swscanf' with the variable argument list
     specified directly as for `vwscanf'.

   In GNU C, there is a special construct you can use to let the
compiler know that a function uses a `scanf'-style format string.  Then
it can check the number and types of arguments in each call to the
function, and warn you when they do not match the format string.  For
details, *Note Declaring Attributes of Functions: (gcc.info)Function
Attributes.
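
   As an illustration, here is a small variadic wrapper built on
`vsscanf'.  The wrapper name `scan_fields' is hypothetical.  The
`format' attribute is guarded so that the declaration still compiles
with compilers that do not understand GNU C attributes.

```c
#include <stdarg.h>
#include <stdio.h>

/* Declare the wrapper as `scanf'-like: the template is argument 2
   and the arguments to check start at argument 3.  */
int scan_fields (const char *s, const char *template, ...)
#ifdef __GNUC__
  __attribute__ ((format (scanf, 2, 3)))
#endif
  ;

/* Scan the string S under control of TEMPLATE, passing the variable
   arguments on to `vsscanf'.  */
int
scan_fields (const char *s, const char *template, ...)
{
  va_list ap;
  int n;

  va_start (ap, template);
  n = vsscanf (s, template, ap);
  va_end (ap);
  return n;
}
```

   A call such as `scan_fields ("7 9", "%d %d", &a, &b)' then behaves
like the corresponding `sscanf' call.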


File: libc.info,  Node: EOF and Errors,  Next: Error Recovery,  Prev: Formatted Input,  Up: I/O on Streams

End-Of-File and Errors
======================

Many of the functions described in this chapter return the value of the
macro `EOF' to indicate unsuccessful completion of the operation.
Since `EOF' is used to report both end of file and random errors, it's
often better to use the `feof' function to check explicitly for end of
file and `ferror' to check for errors.  These functions check
indicators that are part of the internal state of the stream object,
indicators set if the appropriate condition was detected by a previous
I/O operation on that stream.

 - Macro: int EOF
     This macro is an integer value that is returned by a number of
     narrow stream functions to indicate an end-of-file condition, or
     some other error situation.  With the GNU library, `EOF' is `-1'.
     In other libraries, its value may be some other negative number.

     This symbol is declared in `stdio.h'.

 - Macro: int WEOF
     This macro is an integer value that is returned by a number of wide
     stream functions to indicate an end-of-file condition, or some
     other error situation.  With the GNU library, `WEOF' is `-1'.  In
     other libraries, its value may be some other negative number.

     This symbol is declared in `wchar.h'.

 - Function: int feof (FILE *STREAM)
     The `feof' function returns nonzero if and only if the end-of-file
     indicator for the stream STREAM is set.

     This symbol is declared in `stdio.h'.

 - Function: int feof_unlocked (FILE *STREAM)
     The `feof_unlocked' function is equivalent to the `feof' function
     except that it does not implicitly lock the stream.

     This function is a GNU extension.

     This symbol is declared in `stdio.h'.

 - Function: int ferror (FILE *STREAM)
     The `ferror' function returns nonzero if and only if the error
     indicator for the stream STREAM is set, indicating that an error
     has occurred on a previous operation on the stream.

     This symbol is declared in `stdio.h'.

 - Function: int ferror_unlocked (FILE *STREAM)
     The `ferror_unlocked' function is equivalent to the `ferror'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

     This symbol is declared in `stdio.h'.

   In addition to setting the error indicator associated with the
stream, the functions that operate on streams also set `errno' in the
same way as the corresponding low-level functions that operate on file
descriptors.  For example, all of the functions that perform output to a
stream--such as `fputc', `printf', and `fflush'--are implemented in
terms of `write', and all of the `errno' error conditions defined for
`write' are meaningful for these functions.  For more information about
the descriptor-level I/O functions, see *Note Low-Level I/O::.
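
   A reading loop might separate the two meanings of `EOF' as in this
sketch.  The function `count_chars' is an invented example, not a
library function.

```c
#include <stdio.h>

/* Count the characters remaining on STREAM.  Returns the count when
   the end of the file is reached, or -1 if the loop stopped because
   of a read error.  */
long
count_chars (FILE *stream)
{
  long count = 0;

  while (getc (stream) != EOF)
    count++;

  /* getc returned EOF: find out which condition caused it.  */
  if (ferror (stream))
    return -1;
  return count;
}
```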


File: libc.info,  Node: Error Recovery,  Next: Binary Streams,  Prev: EOF and Errors,  Up: I/O on Streams

Recovering from errors
======================

You may explicitly clear the error and EOF flags with the `clearerr'
function.

 - Function: void clearerr (FILE *STREAM)
     This function clears the end-of-file and error indicators for the
     stream STREAM.

     The file positioning functions (*note File Positioning::) also
     clear the end-of-file indicator for the stream.

 - Function: void clearerr_unlocked (FILE *STREAM)
     The `clearerr_unlocked' function is equivalent to the `clearerr'
     function except that it does not implicitly lock the stream.

     This function is a GNU extension.

   Note that it is _not_ correct to just clear the error flag and retry
a failed stream operation.  After a failed write, any number of
characters since the last buffer flush may have been committed to the
file, while some buffered data may have been discarded.  Merely retrying
can thus cause lost or repeated data.

   A failed read may leave the file pointer in an inappropriate
position for a second try.  In both cases, you should seek to a known
position before retrying.

   Most errors that can happen are not recoverable -- a second try will
always fail again in the same way.  So usually it is best to give up and
report the error to the user, rather than install complicated recovery
logic.

   One important exception is `EINTR' (*note Interrupted Primitives::).
Many stream I/O implementations will treat it as an ordinary error,
which can be quite inconvenient.  You can avoid this hassle by
installing all signal handlers with the `SA_RESTART' flag.

   For similar reasons, setting nonblocking I/O on a stream's file
descriptor is not usually advisable.
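
   The advice to seek to a known position before retrying a read could
look like the following sketch.  It assumes a stream that supports file
positioning; `read_record' is a hypothetical helper.

```c
#include <stdio.h>

/* Try to read exactly SIZE bytes from STREAM into BUF.
   Returns 0 on success.  On a short read, clears the stream's
   indicators, seeks back to where the attempt started, and returns
   1 so that the caller may decide whether to retry.  Returns -1 if
   the position cannot be determined or restored.  */
int
read_record (FILE *stream, void *buf, size_t size)
{
  long start = ftell (stream);

  if (start == -1)
    return -1;
  if (fread (buf, 1, size, stream) == size)
    return 0;

  clearerr (stream);
  if (fseek (stream, start, SEEK_SET) != 0)
    return -1;
  return 1;
}
```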


File: libc.info,  Node: Binary Streams,  Next: File Positioning,  Prev: Error Recovery,  Up: I/O on Streams

Text and Binary Streams
=======================

The GNU system and other POSIX-compatible operating systems organize all
files as uniform sequences of characters.  However, some other systems
make a distinction between files containing text and files containing
binary data, and the input and output facilities of ISO C provide for
this distinction.  This section tells you how to write programs portable
to such systems.

   When you open a stream, you can specify either a "text stream" or a
"binary stream".  You indicate that you want a binary stream by
specifying the `b' modifier in the OPENTYPE argument to `fopen'; see
*Note Opening Streams::.  Without this option, `fopen' opens the file
as a text stream.

   Text and binary streams differ in several ways:

   * The data read from a text stream is divided into "lines" which are
     terminated by newline (`'\n'') characters, while a binary stream is
     simply a long series of characters.  A text stream might on some
     systems fail to handle lines more than 254 characters long
     (including the terminating newline character).

   * On some systems, text files can contain only printing characters,
     horizontal tab characters, and newlines, and so text streams may
     not support other characters.  However, binary streams can handle
     any character value.

   * Space characters that are written immediately preceding a newline
     character in a text stream may disappear when the file is read in
     again.

   * More generally, there need not be a one-to-one mapping between
     characters that are read from or written to a text stream, and the
     characters in the actual file.

   Since a binary stream is always more capable and more predictable
than a text stream, you might wonder what purpose text streams serve.
Why not simply always use binary streams?  The answer is that on these
operating systems, text and binary streams use different file formats,
and the only way to read or write "an ordinary file of text" that can
work with other text-oriented programs is through a text stream.

   In the GNU library, and on all POSIX systems, there is no difference
between text streams and binary streams.  When you open a stream, you
get the same kind of stream regardless of whether you ask for binary.
This stream can handle any file content, and has none of the
restrictions that text streams sometimes have.
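
   A program meant to be portable to systems that do make the
distinction can simply request binary streams for arbitrary data.  The
sketch below does so with the `b' modifier; the helper names are
invented for this example.

```c
#include <stdio.h>

/* Write SIZE bytes from DATA to the file PATH using a binary
   stream.  Returns 0 on success, -1 on failure.  */
int
save_bytes (const char *path, const unsigned char *data, size_t size)
{
  FILE *f = fopen (path, "wb");
  size_t written;

  if (f == NULL)
    return -1;
  written = fwrite (data, 1, size, f);
  if (fclose (f) != 0 || written != size)
    return -1;
  return 0;
}

/* Read up to SIZE bytes from the file PATH using a binary stream.
   Returns the number of bytes read, or -1 if the file cannot be
   opened.  */
long
load_bytes (const char *path, unsigned char *buf, size_t size)
{
  FILE *f = fopen (path, "rb");
  size_t got;

  if (f == NULL)
    return -1;
  got = fread (buf, 1, size, f);
  fclose (f);
  return (long) got;
}
```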


File: libc.info,  Node: File Positioning,  Next: Portable Positioning,  Prev: Binary Streams,  Up: I/O on Streams

File Positioning
================

The "file position" of a stream describes where in the file the stream
is currently reading or writing.  I/O on the stream advances the file
position through the file.  In the GNU system, the file position is
represented as an integer, which counts the number of bytes from the
beginning of the file.  *Note File Position::.

   During I/O to an ordinary disk file, you can change the file position
whenever you wish, so as to read or write any portion of the file.  Some
other kinds of files may also permit this.  Files which support changing
the file position are sometimes referred to as "random-access" files.

   You can use the functions in this section to examine or modify the
file position indicator associated with a stream.  The symbols listed
below are declared in the header file `stdio.h'.

 - Function: long int ftell (FILE *STREAM)
     This function returns the current file position of the stream
     STREAM.

     This function can fail if the stream doesn't support file
     positioning, or if the file position can't be represented in a
     `long int', and possibly for other reasons as well.  If a failure
     occurs, a value of `-1' is returned.

 - Function: off_t ftello (FILE *STREAM)
     The `ftello' function is similar to `ftell', except that it
     returns a value of type `off_t'.  Systems which support this type
     use it to describe all file positions, whereas `ftell' is
     specified to return a `long int'.  The two types are not
     necessarily the same size.  Therefore, using `ftell' can lead to
     problems if the implementation is written on top of a POSIX
     compliant low-level I/O implementation, and using `ftello' is
     preferable whenever it is available.

     If this function fails it returns `(off_t) -1'.  This can happen
     due to missing support for file positioning or internal errors.
     Otherwise the return value is the current file position.

     The function is an extension defined in the Single Unix
     Specification version 2.

     When the sources are compiled with `_FILE_OFFSET_BITS == 64' on a
     32 bit system this function is in fact `ftello64'.  I.e., the LFS
     interface transparently replaces the old interface.

 - Function: off64_t ftello64 (FILE *STREAM)
     This function is similar to `ftello' with the only difference that
     the return value is of type `off64_t'.  This also requires that the
     stream STREAM was opened using either `fopen64', `freopen64', or
     `tmpfile64' since otherwise the underlying file operations to
     position the file pointer beyond the 2^31 bytes limit might fail.

     If the sources are compiled with `_FILE_OFFSET_BITS == 64' on a 32
     bit machine this function is available under the name `ftello'
     and so transparently replaces the old interface.

 - Function: int fseek (FILE *STREAM, long int OFFSET, int WHENCE)
     The `fseek' function is used to change the file position of the
     stream STREAM.  The value of WHENCE must be one of the constants
     `SEEK_SET', `SEEK_CUR', or `SEEK_END', to indicate whether the
     OFFSET is relative to the beginning of the file, the current file
     position, or the end of the file, respectively.

     This function returns a value of zero if the operation was
     successful, and a nonzero value to indicate failure.  A successful
     call also clears the end-of-file indicator of STREAM and discards
     any characters that were "pushed back" by the use of `ungetc'.

     `fseek' either flushes any buffered output before setting the file
     position or else remembers it so it will be written later in its
     proper place in the file.

 - Function: int fseeko (FILE *STREAM, off_t OFFSET, int WHENCE)
     This function is similar to `fseek' but it corrects a problem with
     `fseek' in a system with POSIX types.  Using a value of type `long
     int' for the offset is not compatible with POSIX.  `fseeko' uses
     the correct type `off_t' for the OFFSET parameter.

     For this reason it is a good idea to prefer `fseeko' whenever it is
     available since its functionality is (if different at all) closer
     to the underlying definition.

     The functionality and return value are the same as for `fseek'.

     The function is an extension defined in the Single Unix
     Specification version 2.

     When the sources are compiled with `_FILE_OFFSET_BITS == 64' on a
     32 bit system this function is in fact `fseeko64'.  I.e., the LFS
     interface transparently replaces the old interface.

 - Function: int fseeko64 (FILE *STREAM, off64_t OFFSET, int WHENCE)
     This function is similar to `fseeko' with the only difference that
     the OFFSET parameter is of type `off64_t'.  This also requires
     that the stream STREAM was opened using either `fopen64',
     `freopen64', or `tmpfile64' since otherwise the underlying file
     operations to position the file pointer beyond the 2^31 bytes
     limit might fail.

     If the sources are compiled with `_FILE_OFFSET_BITS == 64' on a 32
     bit machine this function is available under the name `fseeko'
     and so transparently replaces the old interface.

   *Portability Note:* In non-POSIX systems, `ftell', `ftello', `fseek'
and `fseeko' might work reliably only on binary streams.  *Note Binary
Streams::.

   The following symbolic constants are defined for use as the WHENCE
argument to `fseek'.  They are also used with the `lseek' function
(*note I/O Primitives::) and to specify offsets for file locks (*note
Control Operations::).

 - Macro: int SEEK_SET
     This is an integer constant which, when used as the WHENCE
     argument to the `fseek' or `fseeko' function, specifies that the
     offset provided is relative to the beginning of the file.

 - Macro: int SEEK_CUR
     This is an integer constant which, when used as the WHENCE
     argument to the `fseek' or `fseeko' function, specifies that the
     offset provided is relative to the current file position.

 - Macro: int SEEK_END
     This is an integer constant which, when used as the WHENCE
     argument to the `fseek' or `fseeko' function, specifies that the
     offset provided is relative to the end of the file.

 - Function: void rewind (FILE *STREAM)
     The `rewind' function positions the stream STREAM at the beginning
     of the file.  It is equivalent to calling `fseek' or `fseeko' on
     the STREAM with an OFFSET argument of `0L' and a WHENCE argument
     of `SEEK_SET', except that the return value is discarded and the
     error indicator for the stream is reset.

   These three aliases for the `SEEK_...' constants exist for the sake
of compatibility with older BSD systems.  They are defined in two
different header files: `fcntl.h' and `sys/file.h'.

`L_SET'
     An alias for `SEEK_SET'.

`L_INCR'
     An alias for `SEEK_CUR'.

`L_XTND'
     An alias for `SEEK_END'.
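
   As an example of combining these functions, the sketch below finds
the size of the file behind a stream and then restores the original
position.  It uses `ftell' and `fseek' for brevity; where they are
available, `ftello' and `fseeko' follow the same pattern with `off_t'
and are the better choice for large files.  The function name
`stream_size' is hypothetical, and the approach assumes a system such
as GNU where the file position is a byte count.

```c
#include <stdio.h>

/* Return the size in bytes of the file open on STREAM, leaving the
   file position where it was.  Returns -1 on failure.  */
long
stream_size (FILE *stream)
{
  long saved = ftell (stream);
  long size;

  if (saved == -1)
    return -1;
  if (fseek (stream, 0L, SEEK_END) != 0)
    return -1;
  size = ftell (stream);
  if (fseek (stream, saved, SEEK_SET) != 0)
    return -1;
  return size;
}
```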


File: libc.info,  Node: Portable Positioning,  Next: Stream Buffering,  Prev: File Positioning,  Up: I/O on Streams

Portable File-Position Functions
================================

On the GNU system, the file position is truly a character count.  You
can specify any character count value as an argument to `fseek' or
`fseeko' and get reliable results for any random access file.  However,
some ISO C systems do not represent file positions in this way.

   On some systems where text streams truly differ from binary streams,
it is impossible to represent the file position of a text stream as a
count of characters from the beginning of the file.  For example, the
file position on some systems must encode both a record offset within
the file, and a character offset within the record.

   As a consequence, if you want your programs to be portable to these
systems, you must observe certain rules:

   * The value returned from `ftell' on a text stream has no predictable
     relationship to the number of characters you have read so far.
     The only thing you can rely on is that you can use it subsequently
     as the OFFSET argument to `fseek' or `fseeko' to move back to the
     same file position.

   * In a call to `fseek' or `fseeko' on a text stream, either the
     OFFSET must be zero, or WHENCE must be `SEEK_SET' and the
     OFFSET must be the result of an earlier call to `ftell' on the
     same stream.

   * The value of the file position indicator of a text stream is
     undefined while there are characters that have been pushed back
     with `ungetc' that haven't been read or discarded.  *Note
     Unreading::.

   But even if you observe these rules, you may still have trouble for
long files, because `ftell' and `fseek' use a `long int' value to
represent the file position.  This type may not have room to encode all
the file positions in a large file.  Using the `ftello' and `fseeko'
functions might help here since the `off_t' type is expected to be able
to hold all file position values but this still does not help to handle
additional information which must be associated with a file position.

   So if you do want to support systems with peculiar encodings for the
file positions, it is better to use the functions `fgetpos' and
`fsetpos' instead.  These functions represent the file position using
the data type `fpos_t', whose internal representation varies from
system to system.

   These symbols are declared in the header file `stdio.h'.

 - Data Type: fpos_t
     This is the type of an object that can encode information about the
     file position of a stream, for use by the functions `fgetpos' and
     `fsetpos'.

     In the GNU system, `fpos_t' is an opaque data structure that
     contains internal data to represent file offset and conversion
     state information.  In other systems, it might have a different
     internal representation.

     When compiling with `_FILE_OFFSET_BITS == 64' on a 32 bit machine
     this type is in fact equivalent to `fpos64_t' since the LFS
     interface transparently replaces the old interface.

 - Data Type: fpos64_t
     This is the type of an object that can encode information about the
     file position of a stream, for use by the functions `fgetpos64' and
     `fsetpos64'.

     In the GNU system, `fpos64_t' is an opaque data structure that
     contains internal data to represent file offset and conversion
     state information.  In other systems, it might have a different
     internal representation.

 - Function: int fgetpos (FILE *STREAM, fpos_t *POSITION)
     This function stores the value of the file position indicator for
     the stream STREAM in the `fpos_t' object pointed to by POSITION.
     If successful, `fgetpos' returns zero; otherwise it returns a
     nonzero value and stores an implementation-defined positive value
     in `errno'.

     When the sources are compiled with `_FILE_OFFSET_BITS == 64' on a
     32 bit system the function is in fact `fgetpos64'.  I.e., the LFS
     interface transparently replaces the old interface.

 - Function: int fgetpos64 (FILE *STREAM, fpos64_t *POSITION)
     This function is similar to `fgetpos' but the file position is
     returned in a variable of type `fpos64_t' to which POSITION points.

     If the sources are compiled with `_FILE_OFFSET_BITS == 64' on a 32
     bit machine this function is available under the name `fgetpos'
     and so transparently replaces the old interface.

 - Function: int fsetpos (FILE *STREAM, const fpos_t *POSITION)
     This function sets the file position indicator for the stream
     STREAM to the position POSITION, which must have been set by a
     previous call to `fgetpos' on the same stream.  If successful,
     `fsetpos' clears the end-of-file indicator on the stream, discards
     any characters that were "pushed back" by the use of `ungetc', and
     returns a value of zero.  Otherwise, `fsetpos' returns a nonzero
     value and stores an implementation-defined positive value in
     `errno'.

     When the sources are compiled with `_FILE_OFFSET_BITS == 64' on a
     32 bit system the function is in fact `fsetpos64'.  I.e., the LFS
     interface transparently replaces the old interface.

 - Function: int fsetpos64 (FILE *STREAM, const fpos64_t *POSITION)
     This function is similar to `fsetpos' but the file position used
     for positioning is provided in a variable of type `fpos64_t' to
     which POSITION points.

     If the sources are compiled with `_FILE_OFFSET_BITS == 64' on a 32
     bit machine this function is available under the name `fsetpos'
     and so transparently replaces the old interface.
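
   A typical use of `fgetpos' and `fsetpos' is to look ahead in a
stream and then return to the saved position, as in this sketch.  The
function `peek_line' is an invented example.

```c
#include <stdio.h>

/* Copy the next line of STREAM into BUF (SIZE bytes, at least 1)
   without changing the file position.  BUF is set to the empty
   string if nothing could be read.  Returns 0 on success and -1 if
   the position could not be saved or restored.  */
int
peek_line (FILE *stream, char *buf, int size)
{
  fpos_t position;

  if (fgetpos (stream, &position) != 0)
    return -1;
  if (fgets (buf, size, stream) == NULL)
    buf[0] = '\0';
  /* fsetpos also clears the end-of-file indicator.  */
  if (fsetpos (stream, &position) != 0)
    return -1;
  return 0;
}
```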


File: libc.info,  Node: Stream Buffering,  Next: Other Kinds of Streams,  Prev: Portable Positioning,  Up: I/O on Streams

Stream Buffering
================

Characters that are written to a stream are normally accumulated and
transmitted asynchronously to the file in a block, instead of appearing
as soon as they are output by the application program.  Similarly,
streams often retrieve input from the host environment in blocks rather
than on a character-by-character basis.  This is called "buffering".

   If you are writing programs that do interactive input and output
using streams, you need to understand how buffering works when you
design the user interface to your program.  Otherwise, you might find
that output (such as progress or prompt messages) doesn't appear when
you intended it to, or displays some other unexpected behavior.

   This section deals only with controlling when characters are
transmitted between the stream and the file or device, and _not_ with
how things like echoing, flow control, and the like are handled on
specific classes of devices.  For information on common control
operations on terminal devices, see *Note Low-Level Terminal
Interface::.

   You can bypass the stream buffering facilities altogether by using
the low-level input and output functions that operate on file
descriptors instead.  *Note Low-Level I/O::.

* Menu:

* Buffering Concepts::          Terminology is defined here.
* Flushing Buffers::            How to ensure that output buffers are flushed.
* Controlling Buffering::       How to specify what kind of buffering to use.


File: libc.info,  Node: Buffering Concepts,  Next: Flushing Buffers,  Up: Stream Buffering

Buffering Concepts
------------------

There are three different kinds of buffering strategies:

   * Characters written to or read from an "unbuffered" stream are
     transmitted individually to or from the file as soon as possible.

   * Characters written to a "line buffered" stream are transmitted to
     the file in blocks when a newline character is encountered.

   * Characters written to or read from a "fully buffered" stream are
     transmitted to or from the file in blocks of arbitrary size.

   Newly opened streams are normally fully buffered, with one
exception: a stream connected to an interactive device such as a
terminal is initially line buffered.  *Note Controlling Buffering::,
for information on how to select a different kind of buffering.
Usually the automatic selection gives you the most convenient kind of
buffering for the file or device you open.

   The use of line buffering for interactive devices implies that output
messages ending in a newline will appear immediately--which is usually
what you want.  Output that doesn't end in a newline might or might not
show up immediately, so if you want it to appear immediately, you
should flush buffered output explicitly with `fflush', as described in
*Note Flushing Buffers::.
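
   As a minimal sketch of this advice, the function below writes a
prompt that has no trailing newline and then flushes it.  The name
`show_prompt' is made up for this illustration and is not part of the
library.

```c
#include <assert.h>
#include <stdio.h>

/* Write PROMPT, which has no trailing newline, to OUT and flush it so
   that it is visible before the program waits for input.  Returns 0 on
   success, or EOF if writing or flushing failed.  `show_prompt' is a
   hypothetical helper, not a library function.  */
int
show_prompt (FILE *out, const char *prompt)
{
  assert (out != NULL && prompt != NULL);
  if (fputs (prompt, out) == EOF)
    return EOF;
  return fflush (out);
}
```

   A caller would typically use `show_prompt (stdout, "Name: ")'
immediately before reading the reply.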


File: libc.info,  Node: Flushing Buffers,  Next: Controlling Buffering,  Prev: Buffering Concepts,  Up: Stream Buffering

Flushing Buffers
----------------

"Flushing" output on a buffered stream means transmitting all
accumulated characters to the file.  There are many circumstances when
buffered output on a stream is flushed automatically:

   * When you try to do output and the output buffer is full.

   * When the stream is closed.  *Note Closing Streams::.

   * When the program terminates by calling `exit'.  *Note Normal
     Termination::.

   * When a newline is written, if the stream is line buffered.

   * Whenever an input operation on _any_ stream actually reads data
     from its file.

   If you want to flush the buffered output at another time, call
`fflush', which is declared in the header file `stdio.h'.

 - Function: int fflush (FILE *STREAM)
     This function causes any buffered output on STREAM to be delivered
     to the file.  If STREAM is a null pointer, then `fflush' causes
     buffered output on _all_ open output streams to be flushed.

     This function returns `EOF' if a write error occurs, or zero
     otherwise.
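
   The following sketch shows one way to act on the return value of
`fflush'.  The helper names `write_and_flush' and `flush_everything'
are invented for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Write LINE to STREAM and deliver it to the file at once.  Returns 0
   on success and -1 if either the write or the flush failed.
   (Hypothetical helper for illustration.)  */
int
write_and_flush (FILE *stream, const char *line)
{
  assert (stream != NULL && line != NULL);
  if (fputs (line, stream) == EOF)
    return -1;
  if (fflush (stream) == EOF)
    return -1;
  return 0;
}

/* Flush all open output streams by passing a null pointer.
   Returns 0 on success and -1 if a write error occurred.  */
int
flush_everything (void)
{
  return fflush (NULL) == EOF ? -1 : 0;
}
```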

 - Function: int fflush_unlocked (FILE *STREAM)
     The `fflush_unlocked' function is equivalent to the `fflush'
     function except that it does not implicitly lock the stream.
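
   Because `fflush_unlocked' takes no lock itself, a caller would
normally hold the stream lock around it.  The sketch below assumes the
POSIX functions `flockfile', `funlockfile' and `putc_unlocked' are
available; `write_items_locked' is a hypothetical name.

```c
#define _GNU_SOURCE 1   /* Assumed necessary to declare fflush_unlocked.  */
#include <assert.h>
#include <stdio.h>

/* Write the COUNT strings in ITEMS to STREAM and flush them while
   holding the stream lock once for the whole operation.  Returns 0 on
   success or EOF on a write error.  (Hypothetical helper.)  */
int
write_items_locked (FILE *stream, const char *const *items, size_t count)
{
  int status = 0;
  size_t i;

  assert (stream != NULL && items != NULL);
  flockfile (stream);
  for (i = 0; i < count; i++)
    {
      const char *p;
      for (p = items[i]; *p != '\0'; p++)
        if (putc_unlocked (*p, stream) == EOF)
          status = EOF;
    }
  if (fflush_unlocked (stream) == EOF)
    status = EOF;
  funlockfile (stream);
  return status;
}
```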

   Calling `fflush' with a null pointer flushes every open output
stream.  While this is useful in some situations, it often does more
than necessary.  A typical case is a program that is about to read
terminal input and wants to be sure that all output is visible on the
terminal first; for that, only the line buffered streams have to be
flushed.  Solaris introduced a function especially for this.  It was
always available in the GNU C library in some form but never officially
exported.

 - Function: void _flushlbf (void)
     The `_flushlbf' function flushes all line buffered streams
     currently opened.

     This function is declared in the `stdio_ext.h' header.
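
   A possible use, sketched under the assumption that the program
reads replies with `fgets', is to call `_flushlbf' right before the
read.  The name `read_reply' is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdio_ext.h>

/* Flush all line buffered streams, then read one line from IN into
   REPLY, which has room for SIZE characters.  Returns REPLY, or a null
   pointer on end of file or error, exactly as fgets does.
   (Hypothetical helper.)  */
char *
read_reply (FILE *in, char *reply, int size)
{
  assert (in != NULL && reply != NULL && size > 0);
  _flushlbf ();
  return fgets (reply, size, in);
}
```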

   *Compatibility Note:* Some brain-damaged operating systems have been
known to be so thoroughly fixated on line-oriented input and output
that flushing a line buffered stream causes a newline to be written!
Fortunately, this "feature" seems to be becoming less common.  You do
not need to worry about this in the GNU system.

   In some situations it might be useful not to flush the output pending
for a stream but instead simply to discard it, for example when
transmission is costly and the output is no longer needed.  In this
situation a non-standard function introduced in Solaris and available in
the GNU C library can be used.

 - Function: void __fpurge (FILE *STREAM)
     The `__fpurge' function causes the buffer of the stream STREAM to
     be emptied.  If the stream is currently in read mode all input in
     the buffer is lost.  If the stream is in output mode the buffered
     output is not written to the device (or whatever other underlying
     storage) and the buffer is cleared.

     This function is declared in `stdio_ext.h'.
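
   The sketch below discards pending output and reports how much was
dropped.  It relies on `__fpending', another `stdio_ext.h' extension;
the name `discard_pending_output' is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdio_ext.h>

/* Throw away the output still waiting in STREAM's buffer and return
   the amount that was dropped, as reported by __fpending.  STREAM is
   assumed to be in output mode.  (Hypothetical helper.)  */
size_t
discard_pending_output (FILE *stream)
{
  size_t dropped;

  assert (stream != NULL);
  dropped = __fpending (stream);
  __fpurge (stream);
  return dropped;
}
```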


File: libc.info,  Node: Controlling Buffering,  Prev: Flushing Buffers,  Up: Stream Buffering

Controlling Which Kind of Buffering
-----------------------------------

After opening a stream (but before any other operations have been
performed on it), you can explicitly specify what kind of buffering you
want it to have using the `setvbuf' function.

   The facilities listed in this section are declared in the header
file `stdio.h'.

 - Function: int setvbuf (FILE *STREAM, char *BUF, int MODE, size_t
          SIZE)
     This function is used to specify that the stream STREAM should
     have the buffering mode MODE, which can be either `_IOFBF' (for
     full buffering), `_IOLBF' (for line buffering), or `_IONBF' (for
     unbuffered input/output).

     If you specify a null pointer as the BUF argument, then `setvbuf'
     allocates a buffer itself using `malloc'.  This buffer will be
     freed when you close the stream.

     Otherwise, BUF should be a character array that can hold at least
     SIZE characters.  You should not free the space for this array as
     long as the stream remains open and this array remains its buffer.
     You should usually either allocate it statically, or `malloc'
     (*note Unconstrained Allocation::) the buffer.  Using an automatic
     array is not a good idea unless you close the file before exiting
     the block that declares the array.

     While the array remains a stream buffer, the stream I/O functions
     will use the buffer for their internal purposes.  You shouldn't
     try to access the values in the array directly while the stream is
     using it for buffering.

     The `setvbuf' function returns zero on success, or a nonzero value
     if the value of MODE is not valid or if the request could not be
     honored.
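
   The two cases described above, a null BUF and a caller-supplied
array, might be used as in the following sketch.  The names
`open_line_buffered', `use_static_buffer' and `log_buffer' are
assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Open PATH for writing and make the new stream line buffered.  The
   call to setvbuf comes before any other operation on the stream, and
   the null BUF argument leaves the allocation to the library.  Returns
   the stream, or a null pointer on failure.  (Hypothetical helper.)  */
FILE *
open_line_buffered (const char *path)
{
  FILE *stream;

  assert (path != NULL);
  stream = fopen (path, "w");
  if (stream == NULL)
    return NULL;
  if (setvbuf (stream, NULL, _IOLBF, BUFSIZ) != 0)
    {
      fclose (stream);
      return NULL;
    }
  return stream;
}

/* A statically allocated buffer outlives every block, so it avoids the
   problem with automatic arrays.  Only one stream may use it at a
   time.  */
static char log_buffer[BUFSIZ];

/* Make STREAM fully buffered using log_buffer.  Returns zero on
   success, as setvbuf does.  (Hypothetical helper.)  */
int
use_static_buffer (FILE *stream)
{
  assert (stream != NULL);
  return setvbuf (stream, log_buffer, _IOFBF, sizeof log_buffer);
}
```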

 - Macro: int _IOFBF
     The value of this macro is an integer constant expression that can
     be used as the MODE argument to the `setvbuf' function to specify
     that the stream should be fully buffered.

 - Macro: int _IOLBF
     The value of this macro is an integer constant expression that can
     be used as the MODE argument to the `setvbuf' function to specify
     that the stream should be line buffered.

 - Macro: int _IONBF
     The value of this macro is an integer constant expression that can
     be used as the MODE argument to the `setvbuf' function to specify
     that the stream should be unbuffered.

 - Macro: int BUFSIZ
     The value of this macro is an integer constant expression that is
     good to use for the SIZE argument to `setvbuf'.  This value is
     guaranteed to be at least `256'.

     The value of `BUFSIZ' is chosen on each system so as to make stream
     I/O efficient.  So it is a good idea to use `BUFSIZ' as the size
     for the buffer when you call `setvbuf'.

     Actually, you can get an even better value to use for the buffer
     size by means of the `fstat' system call: it is found in the
     `st_blksize' field of the file attributes.  *Note Attribute
     Meanings::.

     Sometimes people also use `BUFSIZ' as the allocation size of
     buffers used for related purposes, such as strings used to receive
     a line of input with `fgets' (*note Character Input::).  There is
     no particular reason to use `BUFSIZ' for this instead of any other
     integer, except that it might lead to doing I/O in chunks of an
     efficient size.
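
   The `st_blksize' suggestion above could be followed as in this
sketch, which falls back to `BUFSIZ' when the attribute is not
available.  It assumes the POSIX function `fileno' to obtain the file
descriptor; `preferred_buffer_size' is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L   /* Assumed necessary to declare fileno.  */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Return a buffer size suited to the file underlying STREAM: the
   st_blksize attribute if fstat succeeds and reports a positive
   value, and BUFSIZ otherwise.  (Hypothetical helper.)  */
size_t
preferred_buffer_size (FILE *stream)
{
  struct stat st;

  assert (stream != NULL);
  if (fstat (fileno (stream), &st) == 0 && st.st_blksize > 0)
    return (size_t) st.st_blksize;
  return BUFSIZ;
}
```

   The result can be passed as the SIZE argument of `setvbuf', provided
that this happens before any other operation on the stream.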

 - Function: void setbuf (FILE *STREAM, char *BUF)
     If BUF is a null pointer, the effect of this function is
     equivalent to calling `setvbuf' with a MODE argument of `_IONBF'.
     Otherwise, it is equivalent to calling `setvbuf' with BUF, and a
     MODE of `_IOFBF' and a SIZE argument of `BUFSIZ'.

     The `setbuf' function is provided for compatibility with old code;
     use `setvbuf' in all new programs.

 - Function: void setbuffer (FILE *STREAM, char *BUF, size_t SIZE)
     If BUF is a null pointer, this function makes STREAM unbuffered.
     Otherwise, it makes STREAM fully buffered using BUF as the buffer.
     The SIZE argument specifies the length of BUF.

     This function is provided for compatibility with old BSD code.  Use
     `setvbuf' instead.

 - Function: void setlinebuf (FILE *STREAM)
     This function makes STREAM be line buffered, and allocates the
     buffer for you.

     This function is provided for compatibility with old BSD code.  Use
     `setvbuf' instead.
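
   When porting old BSD code, the two calls above can be expressed with
`setvbuf' roughly as follows.  This is a sketch based on the
descriptions above; the wrapper names are invented, and unlike the BSD
functions they return the status reported by `setvbuf'.

```c
#include <assert.h>
#include <stdio.h>

/* Counterpart of setbuffer (STREAM, BUF, SIZE): unbuffered if BUF is a
   null pointer, fully buffered in BUF otherwise.  (Hypothetical
   wrapper.)  */
int
modern_setbuffer (FILE *stream, char *buf, size_t size)
{
  assert (stream != NULL);
  if (buf == NULL)
    return setvbuf (stream, NULL, _IONBF, 0);
  return setvbuf (stream, buf, _IOFBF, size);
}

/* Counterpart of setlinebuf (STREAM): line buffered, with the buffer
   allocated by the library.  (Hypothetical wrapper.)  */
int
modern_setlinebuf (FILE *stream)
{
  assert (stream != NULL);
  return setvbuf (stream, NULL, _IOLBF, BUFSIZ);
}
```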

   It is possible to query whether a given stream is line buffered or
not using a non-standard function introduced in Solaris and available
in the GNU C library.

 - Function: int __flbf (FILE *STREAM)
     The `__flbf' function will return a nonzero value in case the
     stream STREAM is line buffered.  Otherwise the return value is
     zero.

     This function is declared in the `stdio_ext.h' header.
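
   A small sketch of such a query follows; `buffering_name' is a
hypothetical helper that turns the result into a description.

```c
#include <assert.h>
#include <stdio.h>
#include <stdio_ext.h>

/* Describe whether STREAM is line buffered.  __flbf cannot tell fully
   buffered and unbuffered streams apart, so only two answers are
   possible.  (Hypothetical helper.)  */
const char *
buffering_name (FILE *stream)
{
  assert (stream != NULL);
  return __flbf (stream) != 0 ? "line buffered" : "not line buffered";
}
```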

   Two more extensions allow you to determine the size of the buffer
and how much of it is used.  These functions were also introduced in
Solaris.

 - Function: size_t __fbufsize (FILE *STREAM)
     The `__fbufsize' function returns the size of the buffer in the
     stream STREAM.  This value can be used to optimize the use of the
     stream.

     This function is declared in the `stdio_ext.h' header.

 - Function: size_t __fpending (FILE *STREAM)
     The `__fpending' function returns the number of bytes currently in
     the output buffer.  For wide-oriented streams the measuring unit is
     wide characters.  This function should not be used on streams that
     are in read mode or were opened read-only.

     This function is declared in the `stdio_ext.h' header.
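
   Combining the two functions gives the room left in an output buffer,
as in this sketch.  It assumes a byte-oriented stream in output mode,
so that both values are counted in bytes; `buffer_space_left' is a
made-up name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdio_ext.h>

/* Return how many more bytes fit into STREAM's output buffer before it
   is full.  STREAM is assumed to be byte-oriented and in output mode.
   (Hypothetical helper.)  */
size_t
buffer_space_left (FILE *stream)
{
  size_t size, used;

  assert (stream != NULL);
  size = __fbufsize (stream);
  used = __fpending (stream);
  return used < size ? size - used : 0;
}
```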


File: libc.info,  Node: Other Kinds of Streams,  Next: Formatted Messages,  Prev: Stream Buffering,  Up: I/O on Streams

Other Kinds of Streams
======================

The GNU library provides ways for you to define additional kinds of
streams that do not necessarily correspond to an open file.

   One such type of stream takes input from or writes output to a
string.  These kinds of streams are used internally to implement the
`sprintf' and `sscanf' functions.  You can also create such a stream
explicitly, using the functions described in *Note String Streams::.

   More generally, you can define streams that do input/output to
arbitrary objects using functions supplied by your program.  This
protocol is discussed in *Note Custom Streams::.

   *Portability Note:* The facilities described in this section are
specific to GNU.  Other systems or C implementations might or might not
provide equivalent functionality.

* Menu:

* String Streams::              Streams that get data from or put data in
                                 a string or memory buffer.
* Obstack Streams::             Streams that store data in an obstack.
* Custom Streams::              Defining your own streams with an arbitrary
                                 input data source and/or output data sink.

