%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%									%
%	Copyright (C) 1992, 1993 Michael K. Johnson,			%
%	johnsonm@sunsite.unc.edu					%
%									%
%	This file is freely copyable, but you must preserve this	%
%	copyright notice on all copies, it must only be distributed	%
%	as part of the Linux Kernel Hackers' Guide, and its use is	%
%	is subject to the conditions expressed in the copyright for	%
%	the whole guide, in the file prelim/copyright.tex		%
%									%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\section{Device Driver Basics}\label{dev-drv-bas}

We will assume that you have decided not to write a user-space
device driver, and would rather implement your device in the kernel.  You
will probably be writing two files, a {\tt .c} file and a {\tt
.h} file, and possibly modifying other files as well, as will be
described below.  We will refer to your files as {\tt foo.c} and
{\tt foo.h}, and your driver will be the {\tt foo} driver.

{\bf [Should I include at the beginning of this chapter an example of
chargen and charsink?  Many writers do, but I don't know that it is
the best way.  I'd like people's opinions on this.]}

\subsection{Namespace}

One of the first things you will need to do, before writing any code,
is to name your device.  This name should be a short (probably two or
three character) string.  For instance, the parallel device is the
``{\tt lp}'' device, the floppies are the ``{\tt fd}'' devices, and
SCSI disks are the ``{\tt sd}'' devices.  As you write your driver,
you will give your functions names prefixed with your chosen string to
avoid any namespace confusion.  We will call your prefix {\tt foo,}
and give your functions names like {\tt foo\_read(), foo\_write(),}
etc.

\subsection{Allocating memory}

Memory allocation in the kernel is a little different from memory
allocation in normal user-level programs.  Instead of having a {\tt
malloc()} capable of delivering almost unlimited amounts of memory,
there is a {\tt kmalloc()} function that is a bit different:
\begin{itemize}
\item The largest size that can be allocated is 1 page, 4096 bytes,
and all memory is provided in pieces whose size is a power of 2.
You can request any odd size, but memory will not be used any more
efficiently if you request a 31-byte piece than it will if you request
a 32-byte piece.
\item {\tt kmalloc()} takes a second argument, the priority.  This is
passed on to the {\tt get\_free\_page()} function, where it
is used to determine when to return.  The usual priority is {\tt
GFP\_KERNEL}.  If {\tt kmalloc()} may be called from within an
interrupt, use {\tt GFP\_ATOMIC} instead, because with {\tt
GFP\_KERNEL}, {\tt kmalloc()} may sleep, which cannot be done during
an interrupt.  Be truly prepared for a {\tt GFP\_ATOMIC} allocation to
fail (i.e.\ don't panic).  The other option is {\tt
GFP\_BUFFER}, which is used only when the kernel is allocating buffer
space, and never in device drivers.
\end{itemize}
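
The rounding rule above can be sketched in ordinary user-space C.  This
is only an illustration: {\tt foo\_bucket\_size()} is a hypothetical
helper, not a kernel function, and it assumes nothing beyond the
power-of-two rule and the one-page limit just described.
\begin{screen}\begin{verbatim}
#include <stddef.h>

#define FOO_PAGE_SIZE 4096   /* one page, the largest piece available */

/*
 * Hypothetical helper: return the size of the power-of-two piece that
 * would satisfy a request of `size' bytes, or 0 if the request is
 * larger than one page.
 */
size_t foo_bucket_size(size_t size)
{
    size_t bucket = 1;

    if (size > FOO_PAGE_SIZE)
        return 0;
    while (bucket < size)
        bucket <<= 1;
    return bucket;
}
\end{verbatim}\end{screen}
Under these assumptions, a 31-byte request and a 32-byte request both
occupy a 32-byte piece, while a 33-byte request occupies 64 bytes.
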
To free memory allocated with {\tt kmalloc()}, use one of two
functions: {\tt kfree()} or {\tt kfree\_s()}.  These differ from {\tt free()}
in a few ways as well:
\begin{itemize}
\item {\tt kfree()} is a macro which calls {\tt kfree\_s()} and acts
like the standard {\tt free()} outside the kernel.
\item If you know what size object you are freeing, you can speed
things up by calling {\tt kfree\_s()} directly.  It takes two
arguments:  the first is the pointer that you are freeing, as in the
single argument to {\tt kfree()}, and the second is the size of the
object being freed.
\end{itemize}
See section~\ref{sec-dev-funcs} for more information on {\tt
kmalloc()}, {\tt kfree()}, and other useful functions.

The other way to acquire memory is to allocate it at initialization time.
Your initialization function, {\tt foo\_init()}, takes one argument,
a pointer to the current end of memory.  It can take as much memory as
it wants to, save a pointer or pointers to that memory, and return a
pointer to the new end of memory.  The advantage of this over
statically allocating large buffers ({\tt char bar[20000]}) is that if
the foo driver detects that the foo device is not attached to the
computer, the memory is not wasted.  The {\tt init()} function is
discussed in Section~\ref{sec-vfs}.
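
As a rough sketch of this idea, the following user-space C shows an
initialization function that claims memory only when the device is
found.  The names {\tt foo\_probe()}, {\tt foo\_present}, and {\tt
FOO\_BUF\_SIZE} are made up for the illustration, and the end of memory
is modelled as a {\tt char~*}; check the type your kernel version
actually uses before copying anything from it.
\begin{screen}\begin{verbatim}
#include <stddef.h>

#define FOO_BUF_SIZE 20000

static char *foo_buf;        /* NULL until foo_init() claims memory */
static int foo_present;      /* stand-in for real hardware detection */

/* Hypothetical probe; a real driver would talk to the hardware here. */
static int foo_probe(void)
{
    return foo_present;
}

/*
 * Takes the current end of used memory and returns the new end.
 * Memory is claimed only if the device was detected.
 */
char *foo_init(char *mem_start)
{
    if (!foo_probe())
        return mem_start;            /* no device: take nothing */

    foo_buf = mem_start;             /* remember where our buffer lives */
    return mem_start + FOO_BUF_SIZE; /* memory before this point is ours */
}
\end{verbatim}\end{screen}
If the probe fails, the function returns its argument unchanged, so the
20000 bytes remain available to the rest of the kernel.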

\subsection{Character vs.\ block devices}

There are two main types of devices under all \unix\ systems,
character and block devices.  Character devices are those for which no
buffering is performed, and block devices are those which are accessed
through a cache.  Block devices must be random access, but character
devices are not required to be, though some are.  Filesystems can only
be mounted if they are on block devices.

Character devices are read from and written to with two functions: {\tt
foo\_read()} and {\tt foo\_write()}.  The {\tt read()} and {\tt write()}
calls do not return until the operation is complete.  By contrast,
block devices do not even implement the {\tt read()} and {\tt write()}
functions, and instead have a function which has historically been
called the ``strategy routine.''  Reads and writes are done through
the buffer cache mechanism by the generic functions {\tt bread(),}
{\tt breada(),} and {\tt bwrite()}.  These functions go through the
buffer cache, and so may or may not actually call the strategy
routine, depending on whether or not the block requested is in the
buffer cache (for reads) or on whether or not the buffer cache is full
(for writes).  A request may be asynchronous: {\tt breada()} can
request the strategy routine to schedule reads that have not been
asked for, and to do it asynchronously, in the background, in the hopes
that they will be needed later.  A more complete explanation of the
buffer cache is presented below in Section~\ref{buffer-cache}.
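
To make the ``may or may not call the strategy routine'' point
concrete, here is a toy model of the read path in user-space C.  It is
not the kernel's buffer cache: {\tt toy\_bread()}, {\tt
toy\_strategy()}, and the tiny direct-mapped cache are all inventions
for this sketch, and the ``device'' simply fills a block with a letter.
\begin{screen}\begin{verbatim}
#include <string.h>

#define TOY_BLOCK_SIZE 16
#define TOY_NBUF       4

struct toy_buffer {
    int          valid;           /* does this slot hold a block? */
    unsigned int block;           /* which block it holds */
    char         data[TOY_BLOCK_SIZE];
};

static struct toy_buffer toy_cache[TOY_NBUF];
static int toy_strategy_calls;    /* how often the "device" was touched */

/* Stand-in for a strategy routine: fill a buffer from the device. */
static void toy_strategy(struct toy_buffer *bh)
{
    toy_strategy_calls++;
    memset(bh->data, 'A' + (int)(bh->block % 26), TOY_BLOCK_SIZE);
}

/* Stand-in for bread(): return the block, reading it only on a miss. */
struct toy_buffer *toy_bread(unsigned int block)
{
    struct toy_buffer *bh = &toy_cache[block % TOY_NBUF];

    if (bh->valid && bh->block == block)
        return bh;                /* hit: strategy routine not called */

    bh->block = block;
    toy_strategy(bh);             /* miss: go to the device */
    bh->valid = 1;
    return bh;
}
\end{verbatim}\end{screen}
Reading the same block twice in a row reaches the toy strategy routine
only once; the second read is satisfied from the cache.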

The sources for character devices are kept in \dots/kernel/chr\_drv/,
and the sources for block devices are kept in \dots/kernel/blk\_drv/.
They have similar interfaces, and are very much alike, except for
reading and writing.  Because of the difference in reading and
writing, initialization is different, as block devices have to
register a strategy routine, which is registered in a different way
than the {\tt foo\_read()} and {\tt foo\_write()} routines of a
character device driver.  Specifics are dealt with in
Section~\ref{character-initialization} and
Section~\ref{block-initialization}.

\subsection{Interrupts vs.\ Polling}\label{int-poll-basic}

Hardware is slow.  That is, in the time it takes to get information
from your average device, the CPU could be off doing something far
more useful than waiting for a busy but slow device.  So to keep from
having to {\bf busy-wait} all the time, {\bf interrupts} are provided
which can interrupt whatever is happening so that the operating system
can do some task and return to what it was doing without losing
information.  In an ideal world, all devices would probably work by
using interrupts.  However, on a PC or clone, there are only a few
interrupts available for use by your peripherals, so some drivers have
to poll the hardware: ask the hardware if it is ready to transfer
data yet.  This unfortunately wastes time, but it sometimes needs to
be done.

Also, some hardware (like memory-mapped displays) is as fast as the
rest of the machine, so an interrupt-driven driver would be rather
silly, even if interrupts were provided.

In \linux, many of the drivers are interrupt-driven, but some are not,
and at least one can be either, and can be switched back and forth at
runtime.  For instance, the {\tt lp} device (the parallel port driver)
normally polls the printer to see if the printer is ready to accept
output, and if the printer stays in a not ready phase for too long,
the driver will sleep for a while, and try again later.  This improves
system performance.  However, if you have a parallel card that
supplies an interrupt, the driver will utilize that, which will
usually make performance even better.
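
The poll-then-give-up pattern described for the {\tt lp} driver might
be sketched like this.  This is not the {\tt lp} source: the limit {\tt
FOO\_MAX\_POLLS} and the fake device are assumptions made so that the
example can run in user space.
\begin{screen}\begin{verbatim}
#define FOO_MAX_POLLS 100     /* hypothetical limit on consecutive polls */

static int fake_busy_polls;   /* fake device: not ready for this many polls */

/* Fake status check standing in for reading a hardware status register. */
static int fake_device_ready(void)
{
    if (fake_busy_polls > 0) {
        fake_busy_polls--;
        return 0;
    }
    return 1;
}

/*
 * Poll until the device is ready or the limit is reached.  Returns 1
 * if the device is ready, 0 if the caller should sleep and retry later.
 */
int foo_poll_ready(void)
{
    int i;

    for (i = 0; i < FOO_MAX_POLLS; i++)
        if (fake_device_ready())
            return 1;
    return 0;
}
\end{verbatim}\end{screen}
A driver built this way would sleep for a while whenever {\tt
foo\_poll\_ready()} returns 0, rather than spinning on the device
indefinitely.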

There are some important programming differences between
interrupt-driven drivers and polling drivers.  To understand this
difference, you have to understand a little bit of how system calls
work under \unix.  The kernel is not a separate task under \unix.
Rather, it is as if each process has a copy of the kernel.  When a
process executes a system call, it does not transfer control to
another process, but rather, the process changes execution modes, and
is said to be ``in kernel mode.''  In this mode, it executes kernel
code which is trusted to be safe.

In kernel mode, the process can still access the user-space memory
that it was previously executing in, which is done through a set of
macros: {\tt get\_fs\_*()} and {\tt memcpy\_fromfs()} read user-space
memory, and {\tt put\_fs\_*()} and {\tt memcpy\_tofs()} write to
user-space memory.  Because the process is still running, but in a
different mode, there is no question of where in memory to put the
data, or where to get it from.  However, when an interrupt occurs, any
process might currently be running, so these macros cannot be used~---
if they are, they will either write over random memory space of the running
process or cause the kernel to panic.

Instead, when scheduling the interrupt, a driver must also provide
temporary space in which to put the information, and then sleep.  When
the interrupt-driven part of the driver has filled up that temporary
space, it wakes up the process, which copies the information from that
temporary space into the process' user space and returns.  In a block
device driver, this temporary space is automatically provided by the
buffer cache mechanism, but in a character device driver, the driver
is responsible for allocating it itself.
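
For a character device, the temporary space is often a small queue.
The following user-space sketch shows one possible shape for it; the
names and the queue size are hypothetical.  In a real driver the
process side would sleep when the queue is empty and would copy to user
space with the macros described above, and access to the shared indices
would have to be protected against the interrupt handler.
\begin{screen}\begin{verbatim}
#include <stddef.h>

#define FOO_QUEUE_SIZE 64

static char   foo_queue[FOO_QUEUE_SIZE];
static size_t foo_head;   /* next slot the interrupt side writes */
static size_t foo_tail;   /* next slot the process side reads */

/* Interrupt side: store one byte; refuse it if the queue is full. */
int foo_queue_put(char c)
{
    size_t next = (foo_head + 1) % FOO_QUEUE_SIZE;

    if (next == foo_tail)
        return 0;                 /* full */
    foo_queue[foo_head] = c;
    foo_head = next;
    return 1;
}

/* Process side: copy up to `count' queued bytes into buf. */
size_t foo_queue_get(char *buf, size_t count)
{
    size_t n = 0;

    while (n < count && foo_tail != foo_head) {
        buf[n++] = foo_queue[foo_tail];
        foo_tail = (foo_tail + 1) % FOO_QUEUE_SIZE;
    }
    return n;
}
\end{verbatim}\end{screen}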

\subsection{The sleep-wakeup mechanism}

{\bf [Begin by giving a general description of how sleeping is used
and what it does.  This should mention things like all processes
sleeping on an event are woken at once, and then they contend for the
event again, etc\dots]}

Perhaps the best way to try to understand the \linux\ sleep-wakeup
mechanism is to read the source for the {\tt \_\_sleep\_on()}
function, used to implement both the {\tt sleep\_on()} and {\tt
interruptible\_sleep\_on()} calls.
\begin{screen}\begin{verbatim}
static inline void __sleep_on(struct wait_queue **p, int state)
{
    unsigned long flags;
    struct wait_queue wait = { current, NULL };

    if (!p)
        return;
    if (current == task[0])
        panic("task[0] trying to sleep");
    current->state = state;
    add_wait_queue(p, &wait);
    save_flags(flags);
    sti();
    schedule();
    remove_wait_queue(p, &wait);
    restore_flags(flags);
}
\end{verbatim}\end{screen}

A {\tt wait\_queue} is a circular list of pointers to task structures,
defined in {\tt <linux/wait.h>} to be
\begin{screen}\begin{verbatim}
struct wait_queue {
    struct task_struct * task;
    struct wait_queue * next;
};
\end{verbatim}\end{screen}
{\tt state} is either {\tt TASK\_INTERRUPTIBLE} or {\tt
TASK\_UNINTERRUPTIBLE}, depending on whether or not the sleep should be
interruptible by such things as signals.  In general, the sleep
should be interruptible if the device is a slow one, that is, one which
can block indefinitely, such as a terminal, a network device, or a
pseudodevice.

{\tt add\_wait\_queue()} turns off interrupts, if they were enabled,
and adds the new {\tt struct wait\_queue} declared at the beginning of
the function to the list {\tt p}.  It then recovers the original
interrupt state (enabled or disabled), and returns.
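
The list manipulation itself is simple.  The following user-space
sketch shows one way to insert into and remove from a circular list of
this shape; it is not the kernel source, it uses stand-in types ({\tt
toy\_task}, {\tt toy\_wait\_queue}), and it leaves out the interrupt
disabling that the real functions perform.
\begin{screen}\begin{verbatim}
#include <stddef.h>

struct toy_task { int pid; };

struct toy_wait_queue {
    struct toy_task       *task;
    struct toy_wait_queue *next;
};

/* Insert `wait' into the circular list whose entry point is *p. */
void toy_add_wait_queue(struct toy_wait_queue **p,
                        struct toy_wait_queue *wait)
{
    if (*p == NULL) {
        wait->next = wait;        /* a one-element circle */
        *p = wait;
    } else {
        wait->next = (*p)->next;  /* splice in after the head */
        (*p)->next = wait;
    }
}

/* Unlink `wait' from the circular list; *p becomes NULL when empty. */
void toy_remove_wait_queue(struct toy_wait_queue **p,
                           struct toy_wait_queue *wait)
{
    struct toy_wait_queue *prev = wait;

    while (prev->next != wait)    /* walk the circle to the predecessor */
        prev = prev->next;

    if (prev == wait) {
        *p = NULL;                /* it was the only entry */
    } else {
        prev->next = wait->next;
        if (*p == wait)
            *p = wait->next;
    }
    wait->next = NULL;
}

/* Count the entries currently on the queue. */
int toy_wait_queue_len(struct toy_wait_queue *head)
{
    struct toy_wait_queue *q = head;
    int n = 0;

    if (head == NULL)
        return 0;
    do {
        n++;
        q = q->next;
    } while (q != head);
    return n;
}
\end{verbatim}\end{screen}
Because each sleeper supplies its own list entry, as {\tt
\_\_sleep\_on()} does with its local {\tt wait} variable, no memory has
to be allocated to put a process on a queue.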

{\tt save\_flags()} is a macro which saves the processor flags in its
argument.  This is done to preserve the previous state of the
interrupt enable flag.  This way, the {\tt restore\_flags()} later can
restore the interrupt state, whether it was enabled or disabled.  {\tt
sti()} then allows interrupts to occur, and {\tt schedule()} finds a
new process to run, and switches to it.  Schedule will not choose this
process to run again until the state is changed to {\tt TASK\_RUNNING}
by {\tt wake\_up()} called on the same wait queue, {\tt p}, or
conceivably by something else.

The process then removes itself from the {\tt wait\_queue}, restores
the original interrupt condition with {\tt restore\_flags()}, and
returns.

Whenever contention for a resource might occur, there needs to be a
pointer to a {\tt wait\_queue} associated with that resource.  Then,
whenever contention does occur, each process that finds itself locked
out of access to the resource sleeps on that resource's {\tt
wait\_queue}.  When any process is finished using a resource for which
there is a {\tt wait\_queue}, it should wake up any processes that
might be sleeping on that {\tt wait\_queue}, probably by calling {\tt
wake\_up()}, or possibly {\tt wake\_up\_interruptible()}.

If you don't understand why a process might want to sleep, or want
more details on when and how to structure this sleeping, I urge you to
buy one of the operating systems textbooks listed in
Appendix~\ref{bibliography} and look up {\bf mutual exclusion} and {\bf
deadlock.}

{\bf [This is a cop-out.  I should take the time to explain and give
examples, but I am not trying to write an OS text, and I
want to keep this under 1000 pages\dots]}

\subsubsection{More advanced sleeping}

If the {\tt sleep\_on()}/{\tt wake\_up()} mechanism in \linux\ does
not satisfy your device driver needs, you can code your own versions
of {\tt sleep\_on()} and {\tt wake\_up()} that fit your needs.  For an
example of this, look at the serial device driver
(\dots/kernel/chr\_drv/serial.c) in function {\tt
block\_til\_ready()}, where quite a bit has to be done between the
{\tt add\_wait\_queue()} and the {\tt schedule()}.

\subsection{The VFS}\label{sec-vfs}

The Virtual Filesystem Switch, or {\bf VFS}, is the mechanism which allows
\linux\ to mount many different filesystems at the same time.  In the
first versions of \linux, all filesystem access went straight into
routines which understood the {\tt minix} filesystem.  To make it
possible for other filesystems to be written, filesystem calls had to
pass through a layer of indirection which would switch the call to the
routine for the correct filesystem.  This was done by some generic
code which can handle generic cases and a structure of pointers to
functions which handle specific cases.  One structure is of interest
to the device driver writer; the {\tt file\_operations} structure.

From /usr/include/linux/fs.h:
\begin{screen}\begin{verbatim}
struct file_operations {
    int  (*lseek)   (struct inode *, struct file *, off_t, int);
    int  (*read)    (struct inode *, struct file *, char *, int);
    int  (*write)   (struct inode *, struct file *, char *, int);
    int  (*readdir) (struct inode *, struct file *, struct dirent *,
                     int count);
    int  (*select)  (struct inode *, struct file *, int,
                     select_table *);
    int  (*ioctl)   (struct inode *, struct file *, unsigned int,
                     unsigned int);
    int  (*mmap)    (struct inode *, struct file *, unsigned long,
                     size_t, int, unsigned long);
    int  (*open)    (struct inode *, struct file *);
    void (*release) (struct inode *, struct file *);
};
\end{verbatim}\end{screen}

Essentially, this structure constitutes a partial list of the
functions that you may have to write to create your driver.

This section details the actions and requirements of the functions in
the {\tt file\_operations} structure.  It documents all the arguments
that these functions take.  {\bf [It should also detail all the
defaults, and cover more carefully the possible return values.]}

\subsubsection{{\bf The {\tt lseek()} function}}
This function is called when the system call {\tt lseek()} is called
on the device special file representing your device.  An understanding of
what the system call {\tt lseek()} does should be sufficient to
explain this function, which moves to the desired offset.  It takes
these four arguments:
\begin{dispitems}
\item [{\tt struct inode * inode}]
Pointer to the inode structure for this device.
\item [{\tt struct file * file}]
Pointer to the file structure for this device.
\item [{\tt off\_t offset}]
Offset {\bf from origin} to move to.
\item [{\tt int origin}]
0 = take the offset from absolute offset 0 (the beginning).\\
1 = take the offset from the current position.\\
2 = take the offset from the end.
\end{dispitems}
{\tt lseek()} returns {\tt -errno} on error, or the absolute
position ($\ge 0$) after the lseek.

If there is no {\tt lseek()}, the kernel will take the default action,
which is to modify the {\tt file->f\_pos} element.  For an {\tt origin}
of 2, the default action is to return {\tt -EINVAL} if {\tt
file->f\_inode} is NULL, otherwise it sets {\tt file->f\_pos} to {\tt
file->f\_inode->i\_size} $+$ {\tt offset}.  Because of this, if {\tt
lseek()} should return an error for your device, you must write an
{\tt lseek()} function which returns that error.
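
The default behaviour described above can be modelled in user space as
follows.  The structures are cut-down stand-ins for {\tt struct file}
and {\tt struct inode}, and the rejection of unknown origins and of
negative positions is an assumption of this sketch rather than
something stated above.
\begin{screen}\begin{verbatim}
#include <errno.h>
#include <stddef.h>

struct toy_inode { long i_size; };
struct toy_file  { long f_pos; struct toy_inode *f_inode; };

long toy_default_lseek(struct toy_file *file, long offset, int origin)
{
    long pos;

    switch (origin) {
    case 0:                       /* from the beginning */
        pos = offset;
        break;
    case 1:                       /* from the current position */
        pos = file->f_pos + offset;
        break;
    case 2:                       /* from the end */
        if (file->f_inode == NULL)
            return -EINVAL;
        pos = file->f_inode->i_size + offset;
        break;
    default:
        return -EINVAL;           /* assumption: unknown origin rejected */
    }
    if (pos < 0)
        return -EINVAL;           /* assumption: negative result rejected */
    file->f_pos = pos;
    return pos;
}
\end{verbatim}\end{screen}
Nothing in this default knows whether seeking makes sense for your
hardware, which is why a device that cannot seek needs its own {\tt
lseek()}.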

\subsubsection{{\bf The {\tt read()} and {\tt write()} functions}}
The read and write functions read and write a character string to the
device.  If there is no {\tt read()} or {\tt write()} function in the
{\tt file\_operations} structure registered with the kernel, and the
device is a character device, {\tt read()} or {\tt write()} system
calls, respectively, will return {\tt -EINVAL}.  If the device is a
block device, these functions should not be implemented, as the VFS
will route requests through the buffer cache, which will call your
strategy routine.  See Section~\ref{block-buffer-cache} for details on
how the buffer cache does this.  The {\tt read} and {\tt write}
functions take these arguments:
\begin{dispitems}
\item [{\tt struct inode * inode}]
This is a pointer to the inode of the device special file which was
accessed.  From this, you can do several things, based on the {\tt
struct inode} declaration about 100 lines into
/usr/include/linux/fs.h.  For instance, you can find the minor number
of the file by this construction: {\tt unsigned int minor =
MINOR(inode->i\_rdev);} The definition of the {\tt MINOR} macro is in
{\tt <linux/fs.h>}, as are many other useful definitions.  Read fs.h
and a few device drivers for more details, and see
section~\ref{sec-dev-funcs} for a short description.  {\tt inode->i\_mode}
can be used to find the mode of the file, and there are macros
available for this, as well.
\item [{\tt struct file * file}]
Pointer to file structure for this device.
\item [{\tt char * buf}]
This is a buffer of characters to read or write.  It is located in
{\em user-space\/} memory, and therefore must be accessed using the
{\tt get\_fs*(), put\_fs*(),} and {\tt memcpy*fs()} macros detailed in
section~\ref{sec-dev-funcs}.  User-space memory is inaccessible during
an interrupt, so if your driver is interrupt driven, you will have to
copy the contents of your buffer into a queue.
\item [{\tt int count}]
This is a count of characters in {\tt buf} to
be read or written.  It is the size of {\tt buf}, and is how you know
that you have reached the end of {\tt buf}, as {\tt buf} is not
guaranteed to be null-terminated.
\end{dispitems}

\subsubsection{{\bf The {\tt readdir()} function}}
This function is an artifact of {\tt file\_operations} being used
for implementing filesystems as well as device drivers.  Do not
implement it.  The kernel will return {\tt -ENOTDIR} if the system
call {\tt readdir()} is called on your device special file.

\subsubsection{{\bf The {\tt select()} function}}
The {\tt select()} function is generally most useful with character
devices.  It is usually used to multiplex reads without polling ---
the application calls the {\tt select()} system call, giving it a list
of file descriptors to watch, and the kernel reports back to the
program on which file descriptor has woken it up.  It is also used as
a timer.  However, the {\tt select()} function in your device driver
is not directly called by the system call {\tt select()}, and so the
{\tt file\_operations} {\tt select()} only needs to do a few things.
Its arguments are:
\begin{dispitems}
\item [{\tt struct inode * inode}]
Pointer to the inode structure for this device.
\item [{\tt struct file * file}]
Pointer to the file structure for this device.
\item [{\tt int sel\_type}] 
The select type to perform:\\
\begin{tabular}{|r|l|}\hline
 {\tt SEL\_IN} & read\\\hline
{\tt SEL\_OUT} & write\\\hline
 {\tt SEL\_EX} & exception\\\hline
\end{tabular}
\item [{\tt select\_table * wait}]
If {\tt wait} is not NULL and there is no error condition caused by
the select, {\tt select()} should put the process to sleep, and
arrange to be woken up when the device becomes ready, usually through
an interrupt.  If {\tt wait} is NULL, then the driver should quickly
see if the device is ready, and return even if it is not.  The {\tt
select\_wait()} function does this already.
\end{dispitems}

If the calling program wants to wait until one of the devices upon
which it is selecting becomes available for the operation it is
interested in, the process will have to be put to sleep until one of
those operations becomes available.  This does {\bf not} require use
of a {\tt sleep\_on*()} function, however.  Instead the {\tt
select\_wait()} function is used.  (See section~\ref{sec-dev-funcs}
for the definition of the {\tt select\_wait()} function.)  The sleep
that {\tt select()} causes is very similar to that of {\tt
interruptible\_sleep\_on()}, and, in fact, {\tt
wake\_up\_interruptible()} is used to wake the process.

Note that {\tt select\_wait()} does not itself put the process to
sleep: it returns, and then your {\tt select()} function returns.  The
process isn't put to sleep until the system call {\tt sys\_select()}
uses the information given to it by the {\tt select\_wait()} function.
In short, {\tt select\_wait()} adds the process to the wait queue, and
{\tt sys\_select()} puts the process to sleep.

The first argument to {\tt select\_wait()} is the same {\tt
wait\_queue} that should be used for a {\tt sleep\_on()}, and the
second is the {\tt select\_table} that was passed to your {\tt
select()} function.

After having explained all this in excruciating detail, here are two
rules to follow:
\begin{enumerate}
\item Call {\tt select\_wait()} if the device is not ready, and return 0.
\item Return 1 if the device is ready.
\end{enumerate}
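
The two rules might look like this in code.  This is a user-space
sketch with stand-in types, not a working driver: {\tt
toy\_select\_wait()} only records that it was called, the {\tt SEL\_*}
values are placeholders for the kernel's own definitions, and the
device state is a plain variable.
\begin{screen}\begin{verbatim}
#include <stddef.h>

/* Placeholder values; a driver uses the kernel's own definitions. */
#define SEL_IN  1
#define SEL_OUT 2
#define SEL_EX  4

struct toy_inode;                      /* opaque stand-ins */
struct toy_file;
struct toy_wait_queue;
typedef struct { int entries; } toy_select_table;

static struct toy_wait_queue *foo_wait_queue;
static int foo_bytes_ready;            /* pretend device state */

/* Stand-in for select_wait(): note the request, never sleep. */
static void toy_select_wait(struct toy_wait_queue **q,
                            toy_select_table *wait)
{
    (void) q;
    if (wait != NULL)
        wait->entries++;
}

int foo_select(struct toy_inode *inode, struct toy_file *file,
               int sel_type, toy_select_table *wait)
{
    (void) inode;
    (void) file;

    if (sel_type == SEL_IN) {
        if (foo_bytes_ready > 0)
            return 1;                  /* rule 2: device is ready */
        toy_select_wait(&foo_wait_queue, wait);
        return 0;                      /* rule 1: not ready */
    }
    return 0;   /* this sketch reports other select types as not ready */
}
\end{verbatim}\end{screen}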

If you provide a {\tt select()} function, do not provide timeouts by
setting {\tt current->timeout}, as the select mechanism uses {\tt
current->timeout}, and the two methods cannot co-exist, as there is
only one {\tt timeout} for each process.  Instead, consider using a
timer to provide timeouts.  Just be sure that you need to use a timer
before you call {\tt add\_timer()}, as the whole system currently has
only 64 timers, and the kernel will panic if it tries to install more
than 64 timers concurrently.  {\bf [I believe that this has changed
recently.  Check this out.]}

\subsubsection{{\bf The {\tt ioctl()} function}}
The {\tt ioctl()} function processes ioctl calls.  The structure of
your {\tt ioctl()} function will be: first error checking, then one
giant (possibly nested) switch statement to handle all possible
ioctls.  The ioctl number is passed as {\tt cmd}, and the argument to
the ioctl is passed as {\tt arg}.  It is good to have an understanding
of how {\tt ioctls} ought to work before making them up.  If you are
not sure about your ioctls, do not feel ashamed to ask someone
knowledgeable about it, for a few reasons: you may not even need an
ioctl for your purpose, and if you do need an ioctl, there may be a
better way to do it than what you have thought of.  Since ioctls are
the least regular part of the device interface, it takes perhaps the
most work to get this part right.  Take the time and energy you need
to get it right.
\begin{dispitems}
\item [{\tt struct inode * inode}]
Pointer to the inode structure for this device.
\item [{\tt struct file * file}]
Pointer to the file structure for this device.
\item [{\tt unsigned int cmd}] This is the ioctl command.  It is
generally used as the switch variable for a case statement.
\item [{\tt unsigned int arg}] This is the argument to the command.
This is user defined.   Since this is the same size as a {\tt
(void~*)}, this can be used as a pointer to user space, accessed
through the fs register as usual.
\end{dispitems}
\begin{dispitems}
\item [{\bf Returns:}] {\tt -errno} on error\\
Every other return is user-defined.
\end{dispitems}

If the {\tt ioctl()} slot in the {\tt file\_operations} structure is
not filled in, the VFS will return {\tt -EINVAL}.  However, in
all cases, if {\tt cmd} is one of {\tt FIOCLEX}, {\tt FIONCLEX}, {\tt
FIONBIO}, or {\tt FIOASYNC}, default processing will be done:
\begin{dispitems}
\item [{\tt FIOCLEX}] 0x5451\\
Sets the close-on-exec bit.
\item [{\tt FIONCLEX}] 0x5450\\
Clears the close-on-exec bit.
\item [{\tt FIONBIO}] 0x5421\\
If {\tt arg} is non-zero, set {\tt O\_NONBLOCK},
otherwise clear {\tt O\_NONBLOCK}.
\item [{\tt FIOASYNC}] 0x5452\\
If {\tt arg} is non-zero, set {\tt O\_SYNC},
otherwise clear {\tt O\_SYNC}.  {\tt O\_SYNC} is not yet implemented,
but it is documented here and parsed in the kernel for completeness.
\end{dispitems}
Note that you have to avoid these four numbers when creating your own
ioctls: if one of yours conflicts, the VFS ioctl code will interpret it
as one of these four and act accordingly, causing a bug that is very
hard to track down.
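
A minimal sketch of the ``error checking, then a switch'' structure
follows.  The command numbers {\tt FOO\_GET\_SPEED} and {\tt
FOO\_SET\_SPEED} and the {\tt foo\_speed} setting are invented for the
example, the {\tt inode} and {\tt file} arguments are omitted to keep it
short, and {\tt arg} is treated as a plain value rather than as a
pointer into user space.
\begin{screen}\begin{verbatim}
#include <errno.h>

/* Hypothetical commands, chosen to avoid 0x5421 and 0x5450 to 0x5452. */
#define FOO_GET_SPEED 0x0601
#define FOO_SET_SPEED 0x0602

static unsigned int foo_speed = 9600;   /* hypothetical device setting */

int foo_ioctl(unsigned int cmd, unsigned int arg)
{
    switch (cmd) {
    case FOO_GET_SPEED:
        return (int) foo_speed;
    case FOO_SET_SPEED:
        if (arg == 0)
            return -EINVAL;       /* reject a nonsensical argument */
        foo_speed = arg;
        return 0;
    default:
        return -EINVAL;           /* unknown command */
    }
}
\end{verbatim}\end{screen}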

\subsubsection{{\bf The {\tt mmap()} function}}
\begin{dispitems}
\item [{\tt struct inode * inode}] Pointer to inode structure for device.
\item [{\tt struct file * file}] Pointer to file structure for device.
\item [{\tt unsigned long addr}] Beginning of address in main memory
to {\tt mmap()} into.
\item [{\tt size\_t len}] Length of memory to {\tt mmap()}.
\item [{\tt int prot}] One of:\\
\begin{tabular}{|r|l|}\hline
 {\tt PROT\_READ} & region can be read.\\\hline
{\tt PROT\_WRITE} & region can be written.\\\hline
 {\tt PROT\_EXEC} & region can be executed.\\\hline
 {\tt PROT\_NONE} & region cannot be accessed.\\\hline
\end{tabular}\\
\item [{\tt unsigned long off}] Offset in the file to {\tt mmap()} from.
This address in the file will be mapped to address {\tt addr}.

{\bf [Here, give a pointer to the documentation for the new vmm
(Virtual Memory Management) interface, and show how the functions can
be used by a device {\tt mmap()} function.  Krishna should have the
documentation for the vmm interface in the memory management section.]}
\end{dispitems}


\subsubsection{{\bf The {\tt open()} and {\tt release()} functions}}
\begin{dispitems}
\item [{\tt struct inode * inode}] Pointer to inode structure for device.
\item [{\tt struct file * file}] Pointer to file structure for device.
\end{dispitems}
{\tt open()} is called when a device special file is opened.  It is the
policy mechanism responsible for ensuring consistency.  If only one
process is allowed to open the device at once, {\tt open()} should lock
the device, using whatever locking mechanism is appropriate, usually
setting a bit in some state variable to mark it as busy.  If a process
already is using the device (if the busy bit is already set) then {\tt
open()} should return {\tt -EBUSY}.  If more than one process may open
the device, this function is responsible for setting up any necessary
queues that would not be set up in {\tt write()}.  If no such device
exists, {\tt open()} should return {\tt -ENODEV} to indicate this.
Return 0 on success.

{\tt release()} is called only when the process closes its last open
file descriptor on the file.  If devices have been marked as busy,
{\tt release()} should unset the busy bits if appropriate.  If you
need to clean up {\tt kmalloc()}'ed queues or reset devices to preserve
their sanity, this is the place to do it.  If no {\tt release()}
function is defined, none is called.
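
For a device that only one process may open at a time, the busy-bit
scheme can be sketched as below.  The {\tt inode} and {\tt file}
arguments are left out, and {\tt foo\_present} is a hypothetical flag
standing in for whatever your {\tt init()} function learned about the
hardware.
\begin{screen}\begin{verbatim}
#include <errno.h>

static int foo_present = 1;   /* stand-in for hardware detection */
static int foo_busy;          /* the "busy bit" */

int foo_open(void)
{
    if (!foo_present)
        return -ENODEV;       /* no such device */
    if (foo_busy)
        return -EBUSY;        /* someone else has it open */
    foo_busy = 1;
    return 0;
}

void foo_release(void)
{
    foo_busy = 0;             /* let the next open() succeed */
}
\end{verbatim}\end{screen}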

\subsubsection{{\bf The {\tt init()} function}}
This function is not actually included in the {\tt file\_operations}
structure, but you are required to implement it, because it is this
function that registers the {\tt file\_operations} structure with the
VFS in the first place~--- without this function, the VFS could not
route any requests to the driver.  This function is called when the
kernel first boots and is configuring itself.  {\tt init()} is passed
a variable holding the address of the current end of used memory.  The
init function then detects all devices, allocates any memory it will
want based on how many devices exist (this is often used to hold such
things as queues, for interrupt driven devices), and then, saving the
addresses it needs, it returns the new end of memory.  You will have
to call your {\tt init()} function from the correct place: for a
character device, this is {\tt chr\_dev\_init()} in
\dots/kernel/chr\_drv/mem.c.  In general, you will only pass the {\tt
memory\_start} variable to your {\tt init()} function.

While the {\tt init()} function runs, it registers your driver by
calling the proper registration function.  For character devices, this
is {\tt register\_chrdev()}.\footnote{See section~\ref{sec-dev-funcs}
for more information on the registration functions.} {\tt
register\_chrdev()} takes three arguments: the major device number (an
int), the ``name'' of the device (a string), and the address of the
{\tt{\em device}\_fops} {\tt file\_operations} structure.

When this is done, and a character or block special file is accessed,
the VFS filesystem switch automagically routes the call, whatever it
is, to the proper function, if a function exists.  If the function
does not exist, the VFS routines take some default action.
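
The routing can be pictured with a toy model in user-space C.
Everything here is a stand-in: {\tt toy\_register\_chrdev()} mirrors
only the three arguments described above, the operations structure has
just two slots, and the error values returned by the registration
function are assumptions of the sketch.  The {\tt -EINVAL} default for
a missing {\tt read()} is the one described earlier for character
devices.
\begin{screen}\begin{verbatim}
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_CHRDEV 32

struct toy_file_operations {
    int (*read)(char *buf, int count);
    int (*write)(char *buf, int count);
};

static struct toy_file_operations *toy_chrdevs[TOY_MAX_CHRDEV];

/* Stand-in for register_chrdev(): file the table under its major. */
int toy_register_chrdev(unsigned int major, const char *name,
                        struct toy_file_operations *fops)
{
    (void) name;
    if (major >= TOY_MAX_CHRDEV)
        return -EINVAL;
    if (toy_chrdevs[major] != NULL)
        return -EBUSY;
    toy_chrdevs[major] = fops;
    return 0;
}

/* Stand-in for the VFS: route a read, or take the default action. */
int toy_vfs_read(unsigned int major, char *buf, int count)
{
    if (major >= TOY_MAX_CHRDEV || toy_chrdevs[major] == NULL)
        return -ENODEV;
    if (toy_chrdevs[major]->read == NULL)
        return -EINVAL;           /* default for a missing read() */
    return toy_chrdevs[major]->read(buf, count);
}

/* A hypothetical driver that fills the buffer and has no write(). */
static int foo_read(char *buf, int count)
{
    int i;

    for (i = 0; i < count; i++)
        buf[i] = 'x';
    return count;
}

static struct toy_file_operations foo_fops = { foo_read, NULL };
\end{verbatim}\end{screen}
In this model, a {\tt foo\_init()} would call {\tt
toy\_register\_chrdev()} once with {\tt \&foo\_fops}, after which reads
on that major number reach {\tt foo\_read()}.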

The {\tt init()} function usually displays some information about the
driver, and usually reports all hardware found.  All reporting is done
via the {\tt printk()} function.