===================
= MemTest-86 v1.4 =
===================

Introduction
============
Memtest86 is a thorough, stand-alone memory test for 386, 486 and 586
systems. Memtest86 uses a "moving inversions" algorithm that has proven
effective at finding memory errors. The BIOS-based memory test is only a
quick check and will often miss many of the failures that Memtest86
detects.

Enhancements in v1.4
====================
1) Changes to the memory sizing code to avoid problems with some
   motherboards where memtest would find more memory than actually exists.
2) Added support for a console serial port (thanks to Doug Sisk).
3) Online commands are now available for configuring Memtest86 on the fly
   (see Online Commands).

Enhancements in v1.3
====================
1) Scrolling of memory errors is now provided. Previously, only one screen
   of error information was displayed.
2) Memtest86 can now be booted from any disk via lilo.
3) Testing of up to 4gb of memory has been fixed and is now enabled by
   default. This capability was clearly broken in v1.2a. It should work
   correctly now but has not been fully tested (4gb PCs are a bit rare).
4) The maximum memory size supported by the motherboard is now calculated
   correctly. In previous versions there were cases where not all of memory
   would be tested and the reported maximum memory size was incorrect.
5) For some types of failures the good and bad values were reported to be
   the same, with an Xor value of 0. This has been fixed by retaining the
   data read from memory instead of re-reading the bad data in the error
   reporting routine.
6) APM (advanced power management) is now disabled by Memtest86. This keeps
   the screen from blanking while the test is running.
7) Problems with enabling and disabling cache on some motherboards have
   been corrected.

Installation
============
Memtest86 is a stand-alone program and can be loaded from either a disk
partition or a floppy disk.

To build Memtest86:
  1) Edit the Makefile and adjust options as needed.
  2) Type "make".
This creates a file named "image", which is a bootable image. This image
file may be copied to a floppy disk, or lilo may be used to boot it from a
hard disk partition.

To create a Memtest86 boot disk:
  1) Insert a blank, write-enabled floppy disk.
  2) As root, type "make install".

To boot from a disk partition via lilo:
  1) Copy the image file to a permanent location (e.g. /memtest86).
  2) Add an entry to the lilo config file (usually /etc/lilo.conf) to boot
     memtest86. Only the image and label fields need to be specified. The
     following is a sample lilo entry for booting memtest86:

         image = /memtest86
         label = memt

  3) As root, type "lilo".
At the lilo prompt, enter memt to boot memtest86.

If you encounter build problems, a pre-built binary image has been included
(image.bin). To create a boot disk with this image:
  1) Insert a blank, write-enabled floppy disk.
  2) Type "make install-bin".

Online Commands
===============
A help bar is displayed at the bottom of the screen listing the available
online commands. The commands are:

  Command  Description
  -------  --------------------------------------------------------------
  ESC      Exits the test and does a warm restart via the BIOS.

  c        Cache configuration options:
             1) Toggle    Alternates cache on and off for each pass
                          (default)
             2) On        Cache always on
             3) Off       Cache always off

  r        Refresh configuration options:
             1) Toggle    Alternates between normal and extended refresh
                          every two passes (default)
             2) Normal    Use normal (15ms) refresh
             3) Extended  Use extended (150ms) refresh
             4) Xlong     Use extra long (500ms) refresh
                          Note: the Xlong refresh is far outside of the
                          normal operating mode, and errors do not
                          necessarily mean that memory is not working
                          correctly.
             5) Nochange  Inhibits changing of the memory refresh rate.
                          Selecting this option before the second pass
                          causes the refresh rate set by the BIOS to be
                          used for the duration of the test.

  SP       Set scroll lock (stops scrolling of error messages)
  CR       Clear scroll lock (enables scrolling of error messages)

Problems
========
Memtest86 has not been designed for or tested with parity memory or error
correcting (ECC) memory enabled. With memory parity enabled the test should
execute without problems, but will most likely die with an unexpected
exception when an error is detected. With ECC memory the test will not be
able to detect single-bit errors (the hardware corrects them), but it
should otherwise execute correctly. Support for parity and ECC is planned
for a future release.

A number of compatibility problems have been reported. Most of these
problems have been identified and corrected, but it is likely that some
incompatibilities remain. Please report problems.

It has been reported that on some motherboards Memtest86 fails when shadow
memory is enabled. The cause of this failure is not yet understood. If you
encounter large numbers of errors on an otherwise working system, try
disabling shadow memory.

Incompatibilities have been encountered with various versions of the
loader. Building Memtest86 has been tested with the Slackware 3.1 and 3.2
releases.

Functional Description
======================
Bootstrap and setup code (robbed shamelessly from the Linux kernel) is used
to load Memtest86, set up the memory management registers and do
miscellaneous setup.

When the load and setup are complete the memory map is as follows:

    0x0000 |-----------------------------------------------|
           |                  Stack (4k)                   |
    0x1000 |-----------------------------------------------|
           |       Memtest-text (9k)   Origin 0x1000       |
    0x3400 |-----------------------------------------------|
           |       Memtest-data (3k)   Origin 0x3400       |
    0x4000 |-----------------------------------------------|
           |       Memtest-text (9k)   Origin 0x104000     |
    0x6400 |-----------------------------------------------|
           |       Memtest-data (3k)   Origin 0x106400     |
    0x7000 |-----------------------------------------------|
           |             Common variables (1k)             |
    0x7400 |-----------------------------------------------|

Relocation of the test is accomplished by using two copies of the test code
that have been built to execute at different addresses (different
origins). When the test is started, the code with an origin of 0x1000 is
executed. At the end of the testing phase the code from 0x1000 to 0x7400 is
copied 0x100000 bytes (1 MB) higher, to 0x101000-0x107400, which places the
second copy of the test at its origin of 0x104000. The stack is set to
0x104000 and then we jump to address 0x104000 (the code with an origin of
0x104000). While the code is relocated, only the first 640k of memory is
tested. When this test is complete, the code is moved back to 0x1000, the
stack is set back to 0x1000 and then we jump to 0x1000 (the code with an
origin of 0x1000).

When Memtest86 is loaded into memory it first scans memory to find all
segments of available read/write memory (DRAM). DRAM is identified by
reading a location and then writing its complement. If at least one bit in
each byte changes, we assume that it is DRAM. To save time, this check is
only done every 1024 bytes. All memory from 0xa0000 to 0xf0000 is skipped.
Each segment of memory is displayed on the left side of the screen. All
segments of memory that are found will be tested regardless of size. The
memory scan is limited to the maximum memory size supported by the
motherboard.

After the memory segments have been identified the actual testing begins.
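
The DRAM probe described above can be sketched in C as follows. Real
probing needs direct access to physical addresses, so this sketch runs
against a hypothetical model of a memory location (struct cell, cell_read,
cell_write); those names, and every_byte_changed, looks_like_dram and
count_probes, are illustrative and not taken from the Memtest86 sources.
Restoring the original value after the probe, and treating the skipped
area as the half-open range [0xa0000, 0xf0000), are assumptions of the
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Probe spacing and skipped area from the text. Treating the area as the
 * half-open range [0xa0000, 0xf0000) is an assumption of this sketch. */
#define PROBE_STEP 1024u
#define HOLE_START 0xa0000u
#define HOLE_END   0xf0000u

/* Hypothetical model of one memory location: writes are silently dropped
 * when the location is not writable, as with ROM or an empty address. */
struct cell {
    uint32_t value;
    int writable;
};

static uint32_t cell_read(const struct cell *c) { return c->value; }

static void cell_write(struct cell *c, uint32_t v)
{
    if (c->writable)
        c->value = v;
}

/* Returns 1 if each of the four bytes differs in at least one bit. */
static int every_byte_changed(uint32_t before, uint32_t after)
{
    uint32_t diff = before ^ after;
    for (int i = 0; i < 4; i++) {
        if (((diff >> (8 * i)) & 0xffu) == 0)
            return 0;
    }
    return 1;
}

/* Read the location, write its complement, read it back and compare.
 * Restoring the original value afterwards is an assumption here. */
static int looks_like_dram(struct cell *c)
{
    assert(c != NULL);
    uint32_t orig = cell_read(c);
    cell_write(c, ~orig);
    uint32_t after = cell_read(c);
    cell_write(c, orig);
    return every_byte_changed(orig, after);
}

/* Number of probe addresses below `limit`, stepping by PROBE_STEP and
 * skipping the hole. 64-bit arithmetic lets `limit` reach 4gb. */
static uint64_t count_probes(uint64_t limit)
{
    uint64_t n = 0;
    for (uint64_t a = 0; a < limit; a += PROBE_STEP) {
        if (a >= HOLE_START && a < HOLE_END)
            continue;
        n++;
    }
    return n;
}
```

With these assumptions, a writable cell passes the probe and keeps its
original value, a read-only cell fails it, and scanning the first 1 MB
examines 704 locations (1024 probes minus the 320 that fall in the
skipped area).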

Since the memory chips currently used in PCs are either one or four bits
wide (i.e. 1x1meg, 4x1meg, 1x4meg or 4x4meg), four-bit-wide test patterns,
replicated to fill a 32-bit word, are used. If memory chips wider than four
bits become available in PCs, this pattern should be adjusted.

A moving inversions algorithm is then used to test all of memory. Each pass
proceeds sequentially through each memory segment; this is required to
preserve the integrity of the moving inversions algorithm.

A provision to disable the L1 and L2 caches was added in release 1.1. With
this enhancement, cache is on for even-numbered passes and off for
odd-numbered passes. The test algorithms used by Memtest86 test memory
fairly effectively with cache on. However, in some instances testing with
cache off may be more effective. In addition, the test algorithms will not
work correctly on systems with write-back cache when cache is enabled. The
downside is that execution time is much, much longer when cache is off.
With version 1.4, online commands are available for cache configuration.

Memtest86 can test memory using longer refresh rates. This makes it
possible to detect marginal errors that would otherwise go undetected at
the normal refresh rate. Three refresh rates are available: the normal rate
of 15ms, an extended rate of 150ms and an extra long rate of 500ms.
Normally the test alternates between normal and extended refresh every two
passes. This allows for testing with cache both on and off before switching
refresh rates. An online command is available for controlling the refresh
rate, including selection of the extra long rate.

WARNING: the extra long refresh rate is much longer than normal, and errors
reported at this refresh rate do not necessarily indicate faulty memory.
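
To make the pattern handling and the moving inversions idea concrete, here
is a minimal C sketch of one textbook moving-inversions sweep over an
ordinary array. It is not the Memtest86 implementation: it ignores memory
segments, cache and refresh settings, and all names (replicate_nibble,
moving_inversions, read_word, report, struct fault_model) are hypothetical.
The fault model is an assumption used only to show a failure being caught:
one word reads back with a chosen bit stuck at 0. Each word is read once
and the retained value is what gets reported, along with the Good/Bad/Xor
values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Replicate a 4-bit pattern into all eight nibbles: 0x8 -> 0x88888888. */
static uint32_t replicate_nibble(uint32_t nibble)
{
    assert(nibble <= 0xfu);
    return nibble * 0x11111111u;
}

/* Hypothetical fault model for demonstration: the word at fault_index
 * always reads back with the bits in stuck0_mask forced to 0. */
struct fault_model {
    size_t fault_index;
    uint32_t stuck0_mask;
};

static uint32_t read_word(const uint32_t *mem, size_t i,
                          const struct fault_model *f)
{
    uint32_t v = mem[i];
    if (f != NULL && i == f->fault_index)
        v &= ~f->stuck0_mask;
    return v;
}

/* Print a mismatch in Addrs/Good/Bad/Xor form; Addrs is a byte offset. */
static void report(size_t i, uint32_t good, uint32_t bad)
{
    printf("Addrs: %zx  Good: %08lx  Bad: %08lx  Xor: %08lx\n",
           i * sizeof(uint32_t), (unsigned long)good, (unsigned long)bad,
           (unsigned long)(good ^ bad));
}

/* One moving-inversions sweep with pattern p over n words:
 *   1. fill every word with p;
 *   2. ascending: check for p, then write ~p;
 *   3. descending: check for ~p, then write p.
 * The value read is kept and reported rather than re-read.
 * Returns the number of mismatches found. */
static unsigned moving_inversions(uint32_t *mem, size_t n, uint32_t p,
                                  const struct fault_model *f)
{
    unsigned errors = 0;

    for (size_t i = 0; i < n; i++)
        mem[i] = p;

    for (size_t i = 0; i < n; i++) {
        uint32_t got = read_word(mem, i, f);
        if (got != p) {
            report(i, p, got);
            errors++;
        }
        mem[i] = ~p;
    }

    for (size_t i = n; i-- > 0;) {
        uint32_t got = read_word(mem, i, f);
        if (got != (uint32_t)~p) {
            report(i, ~p, got);
            errors++;
        }
        mem[i] = p;
    }
    return errors;
}
```

Running all ten nibble patterns listed under Display Description (8, 7, 4,
B, 2, D, 1, E, 0, F) against a single stuck-at-0 bit reports exactly one
mismatch per pattern: for every pattern, either the pattern or its
complement has that bit set, so one of the two directions catches it.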

Display Description
===================
The following is a description of each field displayed by Memtest86:

  Testing:  A list of all DRAM segments that have been found and will be
            tested.
  Max_Mem:  Maximum memory size supported by the motherboard.
  Pattern:  The current 32-bit data pattern used for testing.
  Refresh:  The current refresh rate ("Default" indicates that the refresh
            rate has not been altered).
  Pass:     Pass count. (A complete checkout with and without cache, at
            both normal and extended refresh, requires four passes.)
  Errors:   Total errors.
  Cache:    Cache status for both the L1 and L2 cache.

The "windmill" turns 9 times per pattern. There are 10 patterns per pass:
8, 7, 4, B, 2, D, 1, E, 0, F, each replicated to fill a 32-bit word.

The following information is displayed when a memory error is detected. An
error message is only displayed for errors with a different address or
failing bit pattern. All displayed values are hexadecimal.

  Addrs:  Failing memory address
  Good:   Current data pattern (the expected value)
  Bad:    Failing data pattern (the value actually read)
  Xor:    Exclusive or of the good and bad data (this shows the position
          of the failing bit(s))
  Count:  Number of consecutive errors with the same address and failing
          bits

Troubleshooting Memory Errors
=============================
Once a memory error has been detected, determining the failing SIMM is not
a clear-cut procedure. With the large number of motherboard vendors and
possible combinations of SIMM slots, it would be difficult if not
impossible to assemble complete information about how a particular error
maps to a failing SIMM. However, there are steps that may be taken to
determine the failing SIMM. Here are three techniques that you may wish to
use:

1) Removing SIMMs
   This is the simplest method for isolating a failing SIMM, but it may
   only be employed when one or more SIMMs can be removed from the system.
   By selectively removing SIMMs and then running the test you will be
   able to find the bad SIMM.
   Be sure to note exactly which SIMMs are in the system when the test
   passes and when the test fails.

2) Rotating SIMMs
   When none of the SIMMs can be removed, you may wish to rotate SIMMs to
   find the failing one. This technique can only be used if there are 4 or
   more SIMMs in the system. Change the location of two SIMMs at a time.
   For example, put the SIMM from slot 1 into slot 2 and the SIMM from
   slot 2 into slot 1. Run the test; if either the failing bit or the
   failing address changes, then you know that the failing SIMM is one of
   the two just moved. By trying several combinations of SIMM moves you
   should be able to determine which SIMM is failing.

3) Replacing SIMMs
   If you are unable to use either of the previous techniques, you are
   left with selective replacement of SIMMs to find the failure.

Acknowledgments
===============
The initial versions of the source files bootsect.S, setup.S, head.S and
build.c are from the Linux 1.2.1 kernel and have been heavily modified.

Doug Sisk provided code to support a console connected via a serial port.

--
Chris Brady
E-mail cbrady@cray.com