Received: from PACIFIC-CARRIER-ANNEX.MIT.EDU by po7.MIT.EDU (5.61/4.7) id AA06837; Fri, 12 Jan 96 21:15:12 EST
Received: from SENATOR-BEDFELLOW.MIT.EDU by MIT.EDU with SMTP
	id AA00721; Fri, 12 Jan 96 21:15:09 EST
Received: (from root@localhost) by senator-bedfellow.MIT.EDU (8.6.12/2.3JIK) id VAA21487 for Linux-Development-System-Dist@senator-bedfellow.mit.edu; Fri, 12 Jan 1996 21:13:44 -0500
Message-Id: <199601130213.VAA21487@senator-bedfellow.MIT.EDU>
From: Digestifier <Linux-Development-System-Request@senator-bedfellow.MIT.EDU>
To: Linux-Development-System@senator-bedfellow.MIT.EDU
Reply-To: Linux-Development-System@senator-bedfellow.MIT.EDU
Date:     Fri, 12 Jan 96 21:13:41 EST
Subject:  Linux-Development-System Digest #226

Linux-Development-System Digest #226, Volume #2  Fri, 12 Jan 96 21:13:41 EST

Contents:
  Re: [Q] R/W for HPFS-Filesystem? (David A Willmore)
  Re: linux kernel projects (Markus Kuhn)

----------------------------------------------------------------------------

From: willmore@whelk.cig.mot.com (David A Willmore)
Subject: Re: [Q] R/W for HPFS-Filesystem?
Date: 13 Jan 96 00:15:25 GMT

bobh@wasatch.com (Bob Hauck) writes:

>Bad form to make comments on your .sig, but I should point out that
>there is a book called "Unix for Dummies" and one called "The
>Internet for Dummies".  Scary, eh?

Isn't the latter a bit redundant? ;)

Cheers,
David

Disclaimer:  I only speak for myself.

------------------------------

From: mskuhn@unrza3.dialin.rrze.uni-erlangen.de (Markus Kuhn)
Subject: Re: linux kernel projects
Date: 12 Jan 1996 13:43:50 +0100
Reply-To: mskuhn@cip.informatik.uni-erlangen.de

brian@cs.ucr.edu (Brian Harvey) writes:

>       I'm the TA for the undergraduate course in Operating Systems here
>at U.C. Riverside. Our design project this quarter (10 weeks) involves
>modifying the linux kernel. Basically, we have an "army" of eager students
>that need ideas for projects involving the kernel (additions,modifications,
>improvements,fixing bugs, etc.).

>I (as well as the students) would greatly appreciate any project ideas that
>you may have. Thanks!


I think I have a few very nice ideas:

You could work on some of the new real-time features that the
POSIX.1b standard requires. There are a number of projects of
manageable size, each of which involves hacking on various parts of
the kernel.

Get a copy of Bill Gallmeister's book (see below) and of IEEE Std
1003.1b-1993 and then examine what the Linux kernel still lacks for
full POSIX.1b conformance.

Some project ideas are:

  a) POSIX.1b IPC (shared memory, message passing, semaphores)
  b) POSIX.1b queued priority real-time signals
  c) POSIX.1b timers (requires b) )
  d) POSIX.1b asynchronous I/O

I hope you find some of these challenging and interesting. These
are all projects where your students will learn a lot about kernel
hacking and operating system design. Some of the projects (especially
semaphores, message passing, and async I/O) involve not only the
kernel but also libc, and with a good design, most of the
functionality can even be kept in libc.

In addition, you could let your students analyze the current
deficiencies of Linux for real-time applications. Plug a board with a
hardware timer into the PC and measure the worst-case latency with
which a user process can react to an external event such as an
interrupt. Try to analyze where these latencies come from and how the
kernel could be modified to avoid them without degrading general
scheduling performance.

Another highly fascinating project could be an analysis of how the
Linux timers could be made more precise, so that algorithms can be
started with microsecond precision.

Below, I'll include an introductory posting about Linux and POSIX.1b.

Markus




===========================================================================

[This is another update of the text about implementing POSIX real-time
features in the Linux kernel which I posted here a few weeks ago. Some
important new real-time features (mlock, scheduler) have recently been
added to Linux. Markus]


A Vision for Linux 1.4 -- POSIX.1b Compatibility
================================================

Markus Kuhn -- 1996-01-07


Today, the Linux kernel and libc are largely compatible with the
POSIX.1 and POSIX.2 standards, which specify system calls, library
functions and shell commands for UNIX-style operating systems.
However, the POSIX.1 system calls and library functions define only
the minimum core functionality required by anything that looks like
UNIX. Many slightly more advanced functions like mmap(), fsync(),
timers, modifiable scheduling algorithms, IPC, etc., which are
essential for many real-world applications, were not standardized by
POSIX.1 in 1990.

The new POSIX.1b standard (now officially called IEEE Std
1003.1b-1993, ISBN 1-55937-375-X; during its development it was
called POSIX.4) corrects this, and I believe POSIX.1b contains a large
number of useful ideas for the further development of Linux.

In the very short introduction below, I hope to raise your interest in
POSIX.1b and in real-time problems in general. Happy reading!

The new POSIX extensions focus on the requirements of real-time
applications and of applications that have to perform high-performance
I/O. Many applications like interactive video games, high-performance
database servers, multimedia players and control software for all
kinds of hardware require more deterministic scheduling, paging,
signaling, timing and inter-process communication mechanisms than
those available on traditional UNIX systems like BSD4.3. The
functionality of systems like BSD4.3 has been optimized with
multi-user mainframe time-sharing scenarios in mind, while operating
systems for personal computers should also support real-time
applications. On a personal computer, it is often acceptable and
desirable that, e.g., interactive games or CPU- and memory-intensive
multimedia applications are exempted from the normal paging and
scheduling strategies, which try to be as fair as possible to all
users of a large mainframe.

The lack of real-time capability in Linux 1.2 has so far been the
main reason why a number of interesting applications that run fine
under MS-DOS still cannot be implemented as user processes under
Linux. Examples include highly reliable audio recording/replay tools,
control software for astronomical CCD cameras, real-time signal
processing algorithms, serial port smartcard emulators, etc. With the
recent addition of POSIX.1b memory locking and static priority
scheduling functions to Linux 1.3, this is now starting to change.


In addition to POSIX.1-1990, POSIX.1b-1993 defines the following new
concepts and functions:


Improved Signals
================

POSIX.1b adds a new class of signals with the following new
features:

  - There are now many more user-defined signals, not only SIGUSR1
    and SIGUSR2.

  - The additional POSIX.1b signals can carry a small amount of data
    (a pointer or an integer value), which can be used to tell the
    signal handler why the signal was raised.

  - The new signals are queued, which means that if several signals of
    the same type arrive before the signal handler is called, all of
    them will be delivered.

  - POSIX.1b signals have a well-defined delivery order, i.e. you can
    now work with signal priorities.

  - The new function sigwaitinfo() allows a process to wait for
    signals and to resume program execution quickly after a signal
    has arrived, without the overhead of calling a signal handler
    first.

New functions for signals are:

  sigwaitinfo(), sigtimedwait(), sigqueue().

Implementation status: not yet implemented.
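
The following minimal sketch shows queued, data-carrying signals as
current Linux and glibc provide them (the status line above refers to
Linux 1.3, where they were still missing). The helper
queue_and_collect() is a hypothetical name, not part of any API. The
signal is blocked first so that each instance stays queued, and
sigwaitinfo() then fetches the instances one by one, each with its
integer payload, without any signal handler being called.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper: queue vals[0..n-1] to this process as separate
   instances of SIGRTMIN, each carrying one integer, then collect them
   synchronously with sigwaitinfo().  Returns the number of values
   received (stored in out[]), or -1 if the signal could not be blocked. */
int queue_and_collect(const int *vals, int n, int *out)
{
    sigset_t set, old;
    int sig = SIGRTMIN;          /* first real-time signal number */
    int sent = 0, got = 0;

    sigemptyset(&set);
    sigaddset(&set, sig);
    /* Block the signal so that instances stay queued instead of being
       delivered asynchronously (the default action would kill us). */
    if (sigprocmask(SIG_BLOCK, &set, &old) == -1)
        return -1;

    for (; sent < n; sent++) {
        union sigval sv;
        sv.sival_int = vals[sent];
        if (sigqueue(getpid(), sig, sv) == -1)
            break;               /* e.g. EAGAIN: queue limit reached */
    }

    /* Nothing is merged: every queued instance is fetched separately,
       and instances of the same real-time signal come in sending order. */
    while (got < sent) {
        siginfo_t info;
        if (sigwaitinfo(&set, &info) == -1) {
            if (errno == EINTR)
                continue;
            break;
        }
        out[got++] = info.si_value.sival_int;
    }

    sigprocmask(SIG_SETMASK, &old, NULL);
    return got;
}
```

Called with the values 10, 20 and 30, it should return 3 with all three
values in out[], in that order; with SIGUSR1 instead of a real-time
signal, the pending instances would be merged into one.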


Inter Process Communication (IPC) and memory mapped files
=========================================================

POSIX.1b now defines shared memory, message queues and semaphores.
Their functionality and design are similar to, or better than, the
System V IPC mechanisms that Linux already has. The major extensions
are:

  - Strings (like filename paths) instead of integers are now used to
    identify IPC resources. This makes IPC resource collisions much
    easier to avoid than with SysV IPC. The POSIX IPC name space
    should probably be made visible as a /proc/ipc subdirectory, so
    that the usual tools like ls and rm can be used to locate and
    remove stale persistent IPC resources.

  - Semaphores come in two flavors: kernel-based semaphores (as in
    System V, which require a system call for each P/V operation) and
    now also user-memory-based semaphores. Kernel-based semaphores are
    sometimes necessary for security reasons, but they are a real
    pain if you want to build, e.g., a high-performance database.
    Suppose there are 20 server processes operating on a single B-tree
    in a memory-mapped database file. Inserting a node into a large
    B-tree while blocking concurrent accesses by the other 19
    processes as little as possible can require around 100 semaphore
    operations, i.e. currently 100 kernel calls :-(. With POSIX.1b's
    user-memory-based semaphores, you put all your semaphores in a
    piece of shared memory and the library accesses them with highly
    efficient test-and-set machine code. System calls are then only
    necessary in the rare case of a blocking P operation. A
    high-performance database programmer's dream, and easy to
    implement!

  - In POSIX.1b, both memory mapped files and shared memory are done
    with the mmap() system call.

The new functions for IPC are:

  mmap(), munmap(), shm_open(), shm_unlink(), ftruncate(),
  sem_init(), sem_destroy(), sem_open(), sem_close(), sem_unlink(),
  sem_wait(), sem_trywait(), sem_post(), sem_getvalue(), mq_open(),
  mq_close(), mq_unlink(), mq_send(), mq_receive(), mq_notify(),
  mq_setattr(), mq_getattr(), mprotect().

Implementation status: POSIX IPC is not yet implemented (although
some of the mechanisms are already available in the existing SysV IPC
code). A subset of the mmap() functionality has been available in
Linux for a long time, and Linus has recently completed mmap() support
in Linux 1.3. Eric Dumas <dumas@freenix.fr> has done some work on
POSIX IPC, but no patches are available yet.
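
The user-memory semaphore idea can be sketched with C11 atomics, using
compare-and-swap where the text mentions test-and-set. This is an
illustration only, not the POSIX sem_*() interface: the user_sem type
and its functions are hypothetical, and the slow path (sleeping in the
kernel when the count is zero, and waking sleepers on a V operation)
is deliberately omitted. For use between processes, the structure
would have to be placed in shared memory, e.g. a region obtained with
shm_open() and mmap().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-memory semaphore: just a counter.  To be shared by
   several processes it must live in shared memory, and atomic_int must
   be lock-free on the platform (ATOMIC_INT_LOCK_FREE == 2). */
typedef struct {
    atomic_int count;
} user_sem;

void user_sem_init(user_sem *s, int value)
{
    atomic_init(&s->count, value);
}

/* P operation, fast path only: atomically decrement the count if it is
   positive, without any system call.  Returns false when the count is
   zero, i.e. the rare case in which a complete implementation would
   block in the kernel. */
bool user_sem_trywait(user_sem *s)
{
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* On failure, c is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
            return true;
    }
    return false;
}

/* V operation: atomically increment the count.  A complete version
   would also wake a blocked waiter here, again only if one exists. */
void user_sem_post(user_sem *s)
{
    atomic_fetch_add(&s->count, 1);
}

int user_sem_getvalue(user_sem *s)
{
    return atomic_load(&s->count);
}
```

The point of the design is that the common, uncontended case touches
only shared memory; only user_sem_trywait() returning false would lead
to a system call in a complete implementation.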


Memory locking
==============

Four new functions, mlock(), munlock(), mlockall() and munlockall(),
allow paging to be disabled either for specified memory regions
(mlock()) or for all pages (code, stack, data, shared memory, mapped
files, shared libraries) to which a process has access (mlockall()).
This guarantees that, e.g., small time-critical daemons stay in
memory, which helps to guarantee the response time of these
processes. Under Linux, this (like most other real-time related
features) should of course only be allowed for root processes, in
order to prevent normal users from abusing it on large time-sharing
systems.

Another application is cryptographic security software. Using
mlock(), such programs can ensure that an unencrypted secret key or a
password temporarily stored in a small user-space array never comes
into contact with the swap device, where, under rare circumstances,
someone might find the secret bytes even many months later. For these
applications, it would be desirable if Linux allowed even non-root
processes a small number of mlock()ed pages (e.g., up to four locked
pages per non-root process should be OK).

Implementation status: Linus has now added full POSIX.1b memory
locking support to the Linux alpha-test kernel 1.3.43, so you no
longer need to apply the POSIX.4_locking patch from Ralf Haller
<hal@iitb.fhg.de>.
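
A minimal sketch of the key-protection use case follows. The function
use_secret_locked() and its toy checksum are hypothetical stand-ins
for real cryptographic code; in a real program the secret would be
read directly into the locked page, since the caller's copy in this
example is not protected. Current Linux kernels let unprivileged
processes lock memory up to their RLIMIT_MEMLOCK limit, much as
proposed above, so mlock() can still fail when that limit is
exhausted, and the sketch then gives up instead of proceeding
unprotected.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite memory through a volatile pointer so that the compiler
   cannot drop the wipe as a dead store. */
static void wipe(void *p, size_t n)
{
    volatile unsigned char *v = p;
    while (n--)
        *v++ = 0;
}

/* Hypothetical helper: copy a secret into one mlock()ed page, compute a
   toy checksum over it (a stand-in for real cryptographic work), then
   wipe and unlock the page.  Returns 0 on success, or -1 if the page
   could not be allocated or locked (e.g. RLIMIT_MEMLOCK exceeded). */
int use_secret_locked(const char *secret, unsigned *checksum)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = strlen(secret);
    unsigned sum = 0;
    unsigned char *buf;
    void *mem;
    size_t i;

    if (page <= 0 || len > (size_t)page)
        return -1;
    /* Page-aligned allocation, so exactly one page has to be locked. */
    if (posix_memalign(&mem, (size_t)page, (size_t)page) != 0)
        return -1;
    buf = mem;

    /* Lock before the secret is copied in, so it never reaches swap. */
    if (mlock(buf, (size_t)page) == -1) {
        free(buf);
        return -1;
    }
    memcpy(buf, secret, len);

    for (i = 0; i < len; i++)
        sum += buf[i];
    *checksum = sum;

    wipe(buf, (size_t)page);     /* erase before the page is released */
    munlock(buf, (size_t)page);
    free(buf);
    return 0;
}
```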



Synchronous I/O
===============

Databases, e-mail systems, etc. need to be sure that a written piece
of data has actually reached the hard disk, because their transaction
protocols require that a power failure after the write command cannot
harm the data. POSIX.1b defines the fsync() and O_SYNC mechanisms,
which Linux 1.2 already has.

In addition, there is a very useful new function, fdatasync(), which
requires the data blocks to be flushed to disk but does NOT require
the inode with the latest access/modification time to be flushed each
time as well. With fdatasync(), the inode only has to be written if
the file length has changed. In database applications with mostly
constant file sizes, where you sometimes need an fsync() after every
few written blocks but do not care whether the access times in the
inodes on disk are always 100% up to date, fdatasync() could easily
double (!) the performance of your system.

There is also an msync() function for flushing a range of pages from
memory mapped files to the disk.

Implementation status: fsync(), fdatasync(), msync() and O_SYNC are
already available, although fdatasync() in Linux 1.3.55 is currently
only an alias for fsync(). O_DSYNC has not yet been implemented.
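
The database case above might be sketched as follows, assuming
fixed-size records in a file whose length is set once when it is
created. The helper names are hypothetical. Because in-place updates
never change the file length, fdatasync() may skip writing the inode
merely to update its times; msync() with MS_SYNC does the
corresponding job for a memory-mapped file.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) a database file and fix
   its length, so that later in-place updates never change the size. */
int open_fixed_size(const char *path, off_t size)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, size) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Overwrite len bytes at offset off and return only once the data has
   reached the disk.  The file length is unchanged, so fdatasync() need
   not write the inode just to update its times. */
int commit_record(int fd, const void *rec, size_t len, off_t off)
{
    const char *p = rec;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return fdatasync(fd);
}

/* Same for a file mapped with mmap(MAP_SHARED): update the mapping in
   place, then flush only the touched range.  msync() needs a
   page-aligned start address, so the offset is rounded down. */
int commit_mapped(char *map, size_t map_len, size_t off,
                  const void *rec, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t start;

    if (page <= 0 || off > map_len || len > map_len - off)
        return -1;
    start = off - off % (size_t)page;
    memcpy(map + off, rec, len);
    return msync(map + start, off + len - start, MS_SYNC);
}
```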


Timers
======

  - Instead of the old BSD-style gettimeofday()/settimeofday() calls,
    POSIX.1b defines clock_gettime(), clock_settime() and
    clock_getres(). They offer nanosecond resolution instead of the
    microseconds of the old BSD calls (at least on Pentiums, it is not
    difficult to implement a timer with a resolution much better than
    a microsecond). In addition, you can now query the actual
    resolution of the timer with clock_getres() (this might, e.g., be
    higher on a Pentium than on an i386 if the Pentium's clock count
    registers are used).

  - A new function, nanosleep(), allows sleeping for less than a
    second (the old sleep() had only one-second resolution). In
    addition, nanosleep() does not interfere with SIGALRM, and in case
    of EINTR it returns the time left, so you can easily continue in a
    while loop.

    In order to implement this correctly with really high resolution
    (i.e. better than 10 ms), the 100 Hz timer interrupt handling in
    sched.c would have to check each time whether a nanosleep() is due
    to wake up during the next time slice and, if so, reprogram the
    interrupt timer to fire at precisely that time. If done well, this
    could be implemented without any performance loss on systems where
    no nanosleep() is currently pending, and it would bring Linux
    (together with the POSIX.1b scheduler extensions below) a long way
    towards real-time capability.

  - POSIX.1b also defines interval timers (itimers), but unlike the
    existing BSD itimers, they let a process use several timers (at
    least 32 per process), again with a theoretical resolution of up
    to one nanosecond. The old itimer functions can still easily be
    implemented in libc for compatibility, using the new POSIX-style
    timer system calls.

Implementation status: not yet implemented, although much of the
functionality is already available in the form of the BSD timers.
Queued signals have to be implemented first.
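
The EINTR loop mentioned for nanosleep() takes only a few lines. The
sketch below, whose helper names are hypothetical, also reads a clock
with clock_gettime() and queries its resolution with clock_getres().
It uses CLOCK_MONOTONIC, a later POSIX addition that is better suited
to measuring intervals than CLOCK_REALTIME, the only clock POSIX.1b
requires.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for ns nanoseconds.  If a signal handler
   interrupts the sleep, nanosleep() fails with EINTR and stores the
   time still left in rem, so the loop simply sleeps again for that. */
int sleep_ns(long long ns)
{
    struct timespec req, rem;

    if (ns < 0)
        return -1;
    req.tv_sec = (time_t)(ns / 1000000000LL);
    req.tv_nsec = (long)(ns % 1000000000LL);
    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }
    return 0;
}

/* Hypothetical helper: current CLOCK_MONOTONIC time in nanoseconds. */
long long now_ns(void)
{
    struct timespec t;

    if (clock_gettime(CLOCK_MONOTONIC, &t) == -1)
        return -1;
    return (long long)t.tv_sec * 1000000000LL + t.tv_nsec;
}

/* Hypothetical helper: resolution of clock clk in nanoseconds, as
   reported by clock_getres(), or -1 on error. */
long long clock_res_ns(clockid_t clk)
{
    struct timespec r;

    if (clock_getres(clk, &r) == -1)
        return -1;
    return (long long)r.tv_sec * 1000000000LL + r.tv_nsec;
}
```

For example, sleep_ns(700000) would request the 0.7 ms delay of the
pay-TV example below; whether the process actually wakes up that
promptly depends on the timer and scheduling issues discussed here.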


Scheduling
==========

Linux 1.2 has so far been heavily optimized as a time-sharing system,
where several people run application programs like editors,
compilers, debuggers, X window servers, networking daemons, etc. and
do word processing, software development, etc.

However, there are a lot of applications for which Linux is currently
unusable and for which even die-hard Linux enthusiasts have to keep a
stand-alone DOS version on their disk. For >90% of these
applications, the major problem is that Linux cannot guarantee the
response time of an application. Software for controlling, e.g., an
EPROM programmer, a robot arm or an astronomical CCD camera currently
cannot be realized under Linux unless a dedicated real-time controller
is present in the controlled device. A lot of commercially available
hardware has been designed with the real-time capability of DOS in
mind and has no microcontroller of its own for time-critical actions,
so this is a real-world problem.

A real-world example: I myself spent a long, frustrating time trying
to implement an interface to a pay-TV decoder for Linux (which
emulates a chip card and allows you to watch pay-TV for free :-). In
this application, you have to wait for an incoming byte on the serial
port, then wait for around 0.7 to 2 ms (never shorter, never longer,
otherwise the TV decoder times out and stops!) before returning an
answer byte. It is virtually impossible to implement a user process
for this task under Linux 1.2, while it is trivial to do this under
DOS. I am looking forward to the day when Linux provides enough
real-time support for this application that I can finally remove
MS-DOS from my hard disk.

For these and similar real-time applications, POSIX.1b specifies three
different schedulers, each with static priorities:

  SCHED_FIFO     A preemptive, priority-based scheduler. Each process
                 managed under this scheduling policy keeps the CPU
                 as long as (a) it does not block and (b) no interrupt
                 makes a higher-priority process runnable. There is a
                 FIFO queue for each priority level, and every process
                 that becomes runnable again is inserted into the
                 queue behind all other processes. This is the most
                 popular scheduler in typical real-time operating
                 systems. The function sched_yield() allows a process
                 to go to the end of its FIFO queue without blocking.

  SCHED_RR       A preemptive, priority-based round-robin scheduling
                 strategy with time quanta. It is very similar to
                 SCHED_FIFO, but each process has a time quantum, and
                 the process is preempted and inserted at the end of
                 the FIFO queue for its priority level if it runs
                 longer than the time quantum while other processes of
                 the same priority level are waiting in the queue.
                 As in SCHED_FIFO, lower-priority processes never get
                 the CPU as long as a higher-priority process is
                 ready to run, and a higher-priority process that
                 becomes ready gets the CPU immediately.

  SCHED_OTHER    This is any implementation-defined scheduler; under
                 Linux it would obviously be the current time-sharing
                 scheduler with "nice" values, etc. For simplicity, I
                 suggest that under Linux 1.4, all SCHED_OTHER
                 processes should have the lowest static priority
                 level and that all SCHED_RR or SCHED_FIFO processes
                 can only have higher priorities. Within this common
                 lowest SCHED_OTHER priority level, the classic Linux
                 scheduling algorithm would compute the Linux
                 scheduler priority that decides which process gets
                 the CPU next, depending on nice levels, how long the
                 process has already had the CPU, etc., as it already
                 does now.

For security reasons, only root processes should be allowed under
Linux to get a static priority higher than that of SCHED_OTHER,
because if these real-time scheduling mechanisms are abused, the whole
system can be blocked.

If you are developing a real-time application, it is a very good idea
to keep a shell with a higher SCHED_FIFO priority open somewhere, in
order to be able to kill the application under test in case something
goes wrong. Be aware that if you use X11, not only the shell but also
the X server, the window manager and the xterm will require a higher
SCHED_FIFO or SCHED_RR priority in order to be able to stop processes
that are blocking the rest of the system. Therefore, testing real-time
software is usually better done on the console.

With this POSIX.1b functionality, it would be possible to run
real-time software under Linux by giving it root permissions and
assigning it the SCHED_FIFO strategy and a higher static priority than
all other classic SCHED_OTHER Linux processes. In addition, this
real-time application would lock its pages into memory with
mlockall() in order to avoid being swapped out. This would guarantee
that the real-time application can react as quickly as possible to any
interrupt and that its response time is not influenced by the
complicated normal Linux time-sharing priority mechanisms or by
paging. The only piece then still missing for a full real-time OS like
QNX or LynxOS would be a preemptable kernel (BTW: does Windows NT have
a preemptable kernel?). However, this is a much more complicated task
(as the kernel would no longer be a monitor), and I have some doubts
whether this last step can be implemented without a noticeable
performance loss.

The new functions are:

  sched_setparam(), sched_getparam(), sched_setscheduler(),
  sched_getscheduler(), sched_yield(), sched_get_priority_max(),
  sched_get_priority_min(), sched_rr_get_interval().

Implementation status: The sched_*() system calls have been available
since Linux 1.3.55. Although they do not yet guarantee the precise
queueing behavior required by POSIX.1b and some more testing is
necessary, you can already use them to get exclusive use of the
processor. Some earlier work on this was done by David F. Carlson
<carlson@dot4.com> in his POSIX.4_scheduler patch against Linux 1.2
(available on sunsite).
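
The recipe above (a SCHED_FIFO or SCHED_RR policy at a high static
priority, plus mlockall()) might be sketched like this. Both function
names are hypothetical. The priority is chosen one level below the
maximum, so that an emergency shell can still be run above the
application as recommended earlier. Without sufficient privileges,
mlockall() or sched_setscheduler() fails (typically with ENOMEM or
EPERM), and the sketch simply reports -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <sys/mman.h>

/* Hypothetical helper: pick a static priority one level below the
   maximum for policy, leaving the top level for an emergency shell. */
int pick_rt_priority(int policy)
{
    int max = sched_get_priority_max(policy);
    int min = sched_get_priority_min(policy);

    if (max == -1 || min == -1)
        return -1;
    return max > min ? max - 1 : max;
}

/* Hypothetical helper: lock all current and future pages into memory,
   then switch the calling process to policy (SCHED_FIFO or SCHED_RR).
   Returns the priority used, or -1 with errno set, e.g. EPERM when the
   caller lacks the privilege to use real-time scheduling. */
int enter_realtime(int policy)
{
    struct sched_param sp = {0};
    int prio = pick_rt_priority(policy);
    int saved;

    if (prio == -1)
        return -1;
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return -1;
    sp.sched_priority = prio;
    if (sched_setscheduler(0, policy, &sp) == -1) {
        saved = errno;
        munlockall();            /* undo the locking on failure */
        errno = saved;
        return -1;
    }
    return prio;
}
```

A real-time program would call enter_realtime(SCHED_FIFO) once at
startup and refuse to run if it returns -1.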


Asynchronous I/O (aio)
======================

POSIX.1b defines a number of functions that allow a process to send a
long list of read/write requests at various seek positions in various
files to the kernel with one single lio_listio() system call. While
the process continues to execute the next instructions, the kernel
asynchronously reads or writes the requested pages and sends signals
when the task has been completed (if this is desired).

This is very useful, e.g., for a database that knows it will require
many different blocks scattered across a file. It simply passes a list
of the blocks to the kernel, and the kernel can optimize the disk head
movement before sending the requests to the device. In addition, this
minimizes the number of system calls and allows the database to do
something else in the meantime (e.g., waiting for the client process
to send an abort instruction, in which case the database server can
cancel the async I/O requests with aio_cancel()).

Another important application of aio is multimedia systems like MPEG
players or sound file players and recorders. These programs want to
preload the next few seconds of the data stream from the hard disk
into locked memory, but also want to continue, e.g., showing the video
on the screen at the same time without any interruptions caused by
synchronous I/O.

POSIX.1b also defines priorities for asynchronous I/O, i.e. there is a
way to tell the kernel that the read request for the MPEG player is
more important than the read request of gcc. On a future real-time
Linux, you don't want to see any image distortions while watching MPEG
video and compiling a kernel at the same time if you gave the MPEG
player a higher static priority.

New functions in this area are:

  aio_read(), aio_write(), lio_listio(), aio_suspend(), aio_cancel(),
  aio_error(), aio_return(), aio_fsync().

Implementation status: Not yet implemented. The aio functions are
probably best implemented in libc using kernel threads and the normal
synchronous I/O system calls. There has recently been some progress on
implementing kernel threads in Linux 1.3 using the clone() system
call. Adding priority I/O to Linux might be a more complicated job,
because many device drivers would have to be extended by priority wait
queues.
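
As a minimal sketch of the database case, the function below (a
hypothetical helper) submits two reads at different file offsets with
a single lio_listio() call and then waits for both with aio_suspend().
On current glibc systems these calls are implemented in the C library
with helper threads on top of ordinary synchronous I/O, roughly as
suggested above; glibc versions before 2.34 need linking with -lrt.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read len bytes at off_a into a and len bytes at
   off_b into b, issuing both requests with one lio_listio() call.
   Returns 0 if both reads returned len bytes, -1 otherwise. */
int read_two_blocks(const char *path, char *a, off_t off_a,
                    char *b, off_t off_b, size_t len)
{
    struct aiocb cb[2];
    struct aiocb *list[2] = { &cb[0], &cb[1] };
    int fd, i, result = 0;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    memset(cb, 0, sizeof cb);
    cb[0].aio_fildes = fd;
    cb[0].aio_buf = a;
    cb[0].aio_nbytes = len;
    cb[0].aio_offset = off_a;
    cb[0].aio_lio_opcode = LIO_READ;
    cb[1] = cb[0];               /* same file and size, other target */
    cb[1].aio_buf = b;
    cb[1].aio_offset = off_b;

    /* LIO_NOWAIT returns at once; other work could be done here.  EIO
       means some request failed, which aio_return() reports below. */
    if (lio_listio(LIO_NOWAIT, list, 2, NULL) == -1 && errno != EIO) {
        close(fd);
        return -1;
    }

    for (i = 0; i < 2; i++) {
        const struct aiocb *one[1] = { &cb[i] };
        while (aio_error(&cb[i]) == EINPROGRESS)
            aio_suspend(one, 1, NULL);   /* sleep until it completes */
        if (aio_return(&cb[i]) != (ssize_t)len)
            result = -1;
    }
    close(fd);
    return result;
}
```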


Implemented options
===================

As POSIX.1b conformance does not require the implementation of all
these functions, macros are specified in <unistd.h> that indicate to
application software which POSIX.1b functionality is available on the
system. This way, portable software can be written that uses real-time
features only when they are available.

Under the latest Linux kernel and libc development versions, the
following POSIX.1b macros have been defined and indicate implemented
functions:

     _POSIX_FSYNC
     _POSIX_MAPPED_FILES
     _POSIX_MEMLOCK
     _POSIX_MEMLOCK_RANGE
     _POSIX_MEMORY_PROTECTION
     _POSIX_PRIORITY_SCHEDULING

The POSIX.1b options indicated by the following macros have not yet
been implemented under Linux:

     _POSIX_ASYNCHRONOUS_IO
     _POSIX_MESSAGE_PASSING
     _POSIX_PRIORITIZED_IO
     _POSIX_REALTIME_SIGNALS
     _POSIX_SEMAPHORES
     _POSIX_SHARED_MEMORY_OBJECTS
     _POSIX_SYNCHRONIZED_IO
     _POSIX_TIMERS
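
Portable code can test these macros at compile time. Later revisions
of POSIX gave them numeric values with a refined meaning: -1 means the
option is not supported, 0 means support has to be checked at run
time with sysconf(), and a positive value means the option is
supported. The sketch below assumes that convention; the helper names
are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Interpret an option macro value (or -1 when the macro is undefined):
   -1 = unsupported, 0 = ask sysconf() at run time, >0 = supported.
   Assumes the macros expand to numbers, as in POSIX.1-2001 and later. */
static int option_supported(long macro_value, int sc_name)
{
    if (macro_value > 0)
        return 1;
    if (macro_value == 0)
        return sysconf(sc_name) > 0;
    return 0;
}

#ifdef _POSIX_MEMLOCK
#define MEMLOCK_OPTION _POSIX_MEMLOCK
#else
#define MEMLOCK_OPTION (-1)
#endif

#ifdef _POSIX_TIMERS
#define TIMERS_OPTION _POSIX_TIMERS
#else
#define TIMERS_OPTION (-1)
#endif

/* Hypothetical helpers: 1 if mlockall() etc. / POSIX timers are usable. */
int have_memlock(void)
{
    return option_supported(MEMLOCK_OPTION, _SC_MEMLOCK);
}

int have_timers(void)
{
    return option_supported(TIMERS_OPTION, _SC_TIMERS);
}
```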


For those of you who have become interested in POSIX.1b, there exists
a good book:

  Bill O. Gallmeister, POSIX.4 -- Programming for the Real World,
  O'Reilly & Associates, 1995, ISBN 1-56592-074-0.

This book is not only a good introduction to POSIX.1b (which was
originally called POSIX.4), it is also an easily readable guide to
the world of real-time operating systems for developers who have so
far been very UNIX- and time-sharing-oriented.

You can order the POSIX.1b standard (officially called IEEE Std
1003.1b-1993; this book also includes the full text of POSIX.1), as
well as the other POSIX standards, directly from IEEE:

  phone:  +1 908 981 1393 (TZ: eastern standard time)
           1 800 678 4333 (from US+Canada only)
  fax:    +1 908 981 9667
  e-mail: stds.info@ieee.org

Information about POSIX and other IEEE standards is also available on
<http://stdsbbs.ieee.org:70/0/pub/ieeestds.htm>, but unfortunately
the full standard documents are available only as books or on CD-ROM,
not on the Internet.

Here is a brief list of some of the POSIX standards:

  POSIX.1          Basic OS interface (C language)
  POSIX.1a         Misc. extensions (symlinks, etc.)
  POSIX.1b         Real-time and I/O extensions (was: POSIX.4)
  POSIX.1c         Threads (was: POSIX.4a)
  POSIX.1d         More real-time extensions (was: POSIX.4b)
  POSIX.1e         Security extensions, ACLs (was: POSIX.6)
  POSIX.1f         Transparent network file access (was: POSIX.8)
  POSIX.1g         Protocol independent communication, sockets (was: POSIX.12)
  POSIX.2          Shell and common utility programs (date, ln, ...)
  POSIX.3          Test methods
  POSIX.5          Ada binding to POSIX.1
  POSIX.7          System administration
  POSIX.9          FORTRAN-77 binding to POSIX.1
  POSIX.15         Supercomputing extensions (checkpoint/recovery, etc.)

and a few others which are still in early draft stage. If you want to
follow progress on POSIX standardization, you should follow the
announcements in the moderated USENET group comp.std.unix.

This text just summarizes POSIX.1b and related work on Linux. Many
people interested in POSIX.1b support also seem to be interested in
POSIX.1c support (threads). Some information about POSIX.1c support is
available at <http://www.mit.edu:8001/people/proven/pthreads.html>.

Markus

-- 
Markus Kuhn, Computer Science student -- University of Erlangen,
Internet Mail: <mskuhn@cip.informatik.uni-erlangen.de> - Germany
WWW Home: <http://wwwcip.informatik.uni-erlangen.de/user/mskuhn>

------------------------------


** FOR YOUR REFERENCE **

The service address, to which questions about the list itself and requests
to be added to or deleted from it should be directed, is:

    Internet: Linux-Development-System-Request@NEWS-DIGESTS.MIT.EDU

You can send mail to the entire list (and comp.os.linux.development.system) via:

    Internet: Linux-Development-System@NEWS-DIGESTS.MIT.EDU

Linux may be obtained via one of these FTP sites:
    nic.funet.fi				pub/OS/Linux
    tsx-11.mit.edu				pub/linux
    sunsite.unc.edu				pub/Linux

End of Linux-Development-System Digest
******************************
