<!-- $Id: report-OOM.xml 249 2016-01-04 19:34:44Z kib $ -->

<project cat='kern'>
  <title>Out of memory handler rewrite</title>

  <contact>
    <person>
      <name>
	<given>Konstantin</given>
	<common>Belousov</common>
      </name>
      <email>kib@FreeBSD.org</email>
    </person>
  </contact>

  <body>
    <p>Out of Memory (OOM) code is intended to handle the situation
    where the system needs free memory to make progress, while no
    memory can be reused.  Most often, the situation is that to free
    memory, the system needs more free memory.  Consider a case where
    the system needs to page out dirty pages, but must first allocate
    structures to track the writes.  OOM 'solves' the problem by
    killing some selection of user processes.  In other words, it
    trades a system deadlock for a partial loss of user data.  The
    assumption is that it is better to kill a process and preserve the
    data in other processes than to lose everything.</p>

    <p>Free memory in the FreeBSD Virtual Memory (VM) system comes
    from two sources.  One is the voluntary reclamation of pages used
    by a process, for example by unmapping private anonymous regions,
    or by the last unlink of an otherwise unreferenced file with cached
    pages.  The other source is the pagedaemon, which forcefully frees
    pages that carry data, after the data has been moved to some
    other storage, such as swap or file blocks.  OOM is triggered when
    the pagedaemon cannot free enough memory to satisfy the
    requests.</p>

    <p>The old criterion for triggering OOM action was low free swap
    space combined with a low count of free pages (the latter is
    expressed precisely with the paging target constants, but this is
    not relevant to the discussion).  That test is mostly incorrect.
    For example, a low free page state might be caused by a greedy
    consumer allocating all pages freed by the page daemon in the
    current pass, but this does not preclude the page daemon from
    producing more pages.  Also, since page-outs are asynchronous, the
    previous page daemon pass might not immediately produce free
    pages, but they would appear a short time later.</p>

    <p>More seriously, low swap space does not necessarily indicate
    that we are in trouble: many pages may not require swap
    allocations to be freed, e.g., clean pages or pages backed by
    files.  This flaw is serious, since swap-less systems were
    treated as having full swap.</p>

    <p>Instead of trying to deduce the deadlock from looking at the
    current VM state, the new OOM handler tracks the history of page
    daemon passes.  Only if several consecutive passes fail to meet
    the paging target is an OOM kill considered necessary.  The count
    of consecutive failed passes was selected empirically, by testing
    on small (32M) and large (512G) machines.  Auto-tuning of the counter
    is possible, but requires some more architectural changes to the
    I/O subsystem.</p>
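
    <p>The pass-history idea can be illustrated with a minimal
    user-space sketch.  It is not the kernel code: the structure
    oom_tracker, the function oom_tracker_record() and the threshold
    value used with them are hypothetical names and numbers chosen
    for illustration, and resetting the counter after a trigger is an
    assumption of the sketch.</p>

    <pre><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical tracker for the history of page daemon passes.
 * The threshold is a placeholder; the report only says the real
 * value was selected empirically.
 */
struct oom_tracker {
	int failed_passes;	/* consecutive passes that missed the target */
	int threshold;		/* failures in a row before OOM is considered */
};

/*
 * Record the outcome of one pass.  Returns true when enough
 * consecutive passes have failed that an OOM kill should be
 * considered.  Any successful pass resets the history.
 */
bool
oom_tracker_record(struct oom_tracker *t, bool target_met)
{
	assert(t != NULL && t->threshold > 0);

	if (target_met) {
		t->failed_passes = 0;
		return (false);
	}
	t->failed_passes++;
	if (t->failed_passes < t->threshold)
		return (false);

	/* Assumption of this sketch: start counting afresh after a trigger. */
	t->failed_passes = 0;
	return (true);
}
```
]]></pre>

    <p>In this model a single successful pass between failures is
    enough to postpone the kill, which is the property that
    distinguishes the history-based test from a snapshot of the
    current free page count.</p>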

    <p>Another issue was identified with the algorithm which selects a
    victim process for OOM kill.  It compared the counts of page
    table entries (PTEs) installed into the machine paging
    structures.  For various reasons, machine-dependent VM code
    (pmap) may remove the PTE for a memory-resident page.  Under some
    circumstances, related to other measures to prevent low memory
    deadlock, very large processes which consume all system memory
    could have few or no PTEs.  The old OOM selector then ignored the
    process which caused the deadlock and killed unrelated
    processes.</p>

    <p>A new function, vm_pageout_oom_pagecount(), was written which
    applies a reasonable heuristic to estimate the number of pages
    which would be freed by killing the given process.  This
    avoids selecting small, unrelated processes for OOM kill.</p>
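
    <p>The difference between the two selection metrics can be shown
    with a simplified user-space model.  The report does not spell
    out the heuristic, so everything in the sketch is an assumption
    made for illustration: the structures region and proc_model, the
    functions metric_pte_count(), metric_freed_estimate() and
    pick_victim(), and the rule that pages of shared regions are not
    counted.</p>

    <pre><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one mapped region of a process. */
struct region {
	size_t resident_pages;	/* pages of the backing object in memory */
	size_t installed_ptes;	/* PTEs pmap currently has for the region */
	bool shared;		/* assumption: also used by other processes */
};

/* Hypothetical process model: just a list of mapped regions. */
struct proc_model {
	const struct region *regions;
	size_t nregions;
};

typedef size_t (*oom_metric_t)(const struct proc_model *);

/* Old-style metric: count only the PTEs that happen to be installed. */
size_t
metric_pte_count(const struct proc_model *p)
{
	size_t i, total = 0;

	for (i = 0; i < p->nregions; i++)
		total += p->regions[i].installed_ptes;
	return (total);
}

/*
 * Estimate-style metric: count resident pages of regions that would
 * go away with the process, whether or not PTEs exist for them.
 */
size_t
metric_freed_estimate(const struct proc_model *p)
{
	size_t i, total = 0;

	for (i = 0; i < p->nregions; i++) {
		if (p->regions[i].shared)
			continue;	/* killing p would not free these */
		total += p->regions[i].resident_pages;
	}
	return (total);
}

/* Return the index of the process with the largest metric value. */
size_t
pick_victim(const struct proc_model *procs, size_t nprocs,
    oom_metric_t metric)
{
	size_t i, best = 0;

	assert(procs != NULL && nprocs > 0);
	for (i = 1; i < nprocs; i++) {
		if (metric(&procs[i]) > metric(&procs[best]))
			best = i;
	}
	return (best);
}
```
]]></pre>

    <p>Given one large process whose PTEs have all been removed and
    one small process with a few PTEs installed, the PTE metric in
    this model selects the small process, while the estimate selects
    the large one.</p>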

    <p>The rewrite was committed to HEAD in r290917 and r290920.</p>
  </body>

  <sponsor>The FreeBSD Foundation</sponsor>

</project>
