SED - A Non-interactive Text Editor

Lee E. McMahon
Bell Laboratories
Murray Hill, New Jersey 07974

ABSTRACT

Sed is a non-interactive context editor that runs on the UNIX* operating system. Sed is designed to be especially useful in three cases:

     1) To edit files too large for comfortable interactive editing;
     2) To edit any size file when the sequence of editing commands is too complicated to be comfortably typed in interactive mode;
     3) To perform multiple `global' editing functions efficiently in one pass through the input.

This memorandum constitutes a manual for users of sed.

* UNIX is a Trademark of Bell Laboratories.

Introduction

Sed is a non-interactive context editor designed to be especially useful in three cases:

     1) To edit files too large for comfortable interactive editing;
     2) To edit any size file when the sequence of editing commands is too complicated to be comfortably typed in interactive mode;
     3) To perform multiple `global' editing functions efficiently in one pass through the input.

Since only a few lines of the input reside in core at one time, and no temporary files are used, the effective size of file that can be edited is limited only by the requirement that the input and output fit simultaneously into available secondary storage.

Complicated editing scripts can be created separately and given to sed as a command file. For complex edits, this saves considerable typing, and its attendant errors. Sed running from a command file is much more efficient than any interactive editor known to the author, even if that editor can be driven by a pre-written script.

The principal losses of function compared to an interactive editor are the lack of relative addressing (because of the line-at-a-time operation) and the lack of immediate verification that a command has done what was intended.

Sed is a lineal descendant of the UNIX editor, ed.
Because of the differences between interactive and non-interactive operation, considerable changes have been made between ed and sed; even confirmed users of ed will frequently be surprised (and probably chagrined) if they rashly use sed without reading Sections 2 and 3 of this document. The most striking family resemblance between the two editors is in the class of patterns (`regular expressions') they recognize; the code for matching patterns is copied almost verbatim from the code for ed, and the description of regular expressions in Section 2 is copied almost verbatim from the UNIX Programmer's Manual[1]. (Both code and description were written by Dennis M. Ritchie.)

1. Overall Operation

Sed by default copies the standard input to the standard output, perhaps performing one or more editing commands on each line before writing it to the output. This behavior may be modified by flags on the command line; see Section 1.1 below.

The general format of an editing command is:

     [address1,address2][function][arguments]

One or both addresses may be omitted; the format of addresses is given in Section 2. Any number of blanks or tabs may separate the addresses from the function. The function must be present; the available commands are discussed in Section 3. The arguments may be required or optional, according to which function is given; again, they are discussed in Section 3 under each individual function.

Tab characters and spaces at the beginning of lines are ignored.

1.1. Command-line Flags

Three flags are recognized on the command line:

     -n: tells sed not to copy all lines, but only those specified by p functions or p flags after s functions (see Section 3.3);
     -e: tells sed to take the next argument as an editing command;
     -f: tells sed to take the next argument as a file name; the file should contain editing commands, one to a line.

1.2. Order of Application of Editing Commands

Before any editing is done (in fact, before any input file is even opened), all the editing commands are compiled into a form which will be moderately efficient during the execution phase (when the commands are actually applied to lines of the input file). The commands are compiled in the order in which they are encountered; this is generally the order in which they will be attempted at execution time. The commands are applied one at a time; the input to each command is the output of all preceding commands.

The default linear order of application of editing commands can be changed by the flow-of-control commands, t and b (see Section 3). Even when the order of application is changed by these commands, it is still true that the input line to any command is the output of any previously applied command.

1.3. Pattern-space

The range of pattern matches is called the pattern space. Ordinarily, the pattern space is one line of the input text, but more than one line can be read into the pattern space by using the N command (Section 3.4).

1.4. Examples

Examples are scattered throughout the text. Except where otherwise noted, the examples all assume the following input text:

     In Xanadu did Kubla Khan
     A stately pleasure dome decree:
     Where Alph, the sacred river, ran
     Through caverns measureless to man
     Down to a sunless sea.

(In no case is the output of the sed commands to be considered an improvement on Coleridge.)

Example: The command

     2q

will quit after copying the first two lines of the input. The output will be:

     In Xanadu did Kubla Khan
     A stately pleasure dome decree:

2. ADDRESSES: Selecting lines for editing

Lines in the input file(s) to which editing commands are to be applied can be selected by addresses. Addresses may be either line numbers or context addresses.
The application of a group of commands can be controlled by one address (or address-pair) by grouping the commands with curly braces (`{ }') (Sec. 3.6).

2.1. Line-number Addresses

A line number is a decimal integer. As each line is read from the input, a line-number counter is incremented; a line-number address matches (selects) the input line which causes the internal counter to equal the address line-number. The counter runs cumulatively through multiple input files; it is not reset when a new input file is opened.

As a special case, the character $ matches the last line of the last input file.

2.2. Context Addresses

A context address is a pattern (`regular expression') enclosed in slashes (`/'). The regular expressions recognized by sed are constructed as follows:

     1) An ordinary character (not one of those discussed below) is a regular expression, and matches that character.
     2) A circumflex `^' at the beginning of a regular expression matches the null character at the beginning of a line.
     3) A dollar-sign `$' at the end of a regular expression matches the null character at the end of a line.
     4) The characters `\n' match an imbedded newline character, but not the newline at the end of the pattern space.
     5) A period `.' matches any character except the terminal newline of the pattern space.
     6) A regular expression followed by an asterisk `*' matches any number (including 0) of adjacent occurrences of the regular expression it follows.
     7) A string of characters in square brackets `[ ]' matches any character in the string, and no others. If, however, the first character of the string is circumflex `^', the regular expression matches any character except the characters in the string and the terminal newline of the pattern space.
     8) A concatenation of regular expressions is a regular expression which matches the concatenation of strings matched by the components of the regular expression.
     9) A regular expression between the sequences `\(' and `\)' is identical in effect to the unadorned regular expression, but has side-effects which are described under the s command below and specification 10) immediately below.
     10) The expression `\d' means the same string of characters matched by an expression enclosed in `\(' and `\)' earlier in the same pattern. Here d is a single digit; the string specified is that beginning with the dth occurrence of `\(' counting from the left. For example, the expression `^\(.*\)\1' matches a line beginning with two repeated occurrences of the same string.
     11) The null regular expression standing alone (e.g., `//') is equivalent to the last regular expression compiled.

To use one of the special characters (^ $ . * [ ] \ /) as a literal (to match an occurrence of itself in the input), precede the special character by a backslash `\'.

For a context address to `match' the input requires that the whole pattern within the address match some portion of the pattern space.

2.3. Number of Addresses

The commands in the next section can have 0, 1, or 2 addresses. Under each command the maximum number of allowed addresses is given. For a command to have more addresses than the maximum allowed is considered an error.

If a command has no addresses, it is applied to every line in the input.

If a command has one address, it is applied to all lines which match that address.

If a command has two addresses, it is applied to the first line which matches the first address, and to all subsequent lines until (and including) the first subsequent line which matches the second address. Then an attempt is made on subsequent lines to again match the first address, and the process is repeated.

Two addresses are separated by a comma.
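The address rules above can be tried from a shell. The following is a minimal sketch, assuming a POSIX shell and a sed that follows this memorandum; the variable names and the seven-line `start ... end` input are illustrative and do not come from the original text.

```shell
# Sample text from Section 1.4.
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# One line-number address: select line 3 only.
by_number=$(printf '%s\n' "$poem" | sed -n '3p')

# The special address $ selects the last line of the last input file.
last_line=$(printf '%s\n' "$poem" | sed -n '$p')

# Two addresses: from the first line matching /stately/ through the
# next line matching /caverns/, inclusive (lines 2 to 4 here).
by_range=$(printf '%s\n' "$poem" | sed -n '/stately/,/caverns/p')

# After a range closes, the first address is tried again, so this
# hypothetical input yields two ranges; the "skip" line is not selected.
repeated=$(printf 'start\nx\nend\nskip\nstart\ny\nend\n' | sed -n '/start/,/end/p')

printf '%s\n' "$by_number" "$last_line" "$by_range" "$repeated"
```

The -n flag and the p function are used only to make the selected lines visible; any other function could follow the same addresses.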
Examples:

     /an/           matches lines 1, 3, 4 in our sample text
     /an.*an/       matches line 1
     /^an/          matches no lines
     /./            matches all lines
     /\./           matches line 5
     /r*an/         matches lines 1, 3, 4 (number = zero!)
     /\(an\).*\1/   matches line 1

3. FUNCTIONS

All functions are named by a single character. In the following summary, the maximum number of allowable addresses is given enclosed in parentheses, then the single character function name, possible arguments enclosed in angles (< >), an expanded English translation of the single-character name, and finally a description of what each function does. The angles around the arguments are not part of the argument, and should not be typed in actual editing commands.

3.1. Whole-line Oriented Functions

(2)d -- delete lines
     The d function deletes from the file (does not write to the output) all those lines matched by its address(es). It also has the side effect that no further commands are attempted on the corpse of a deleted line; as soon as the d function is executed, a new line is read from the input, and the list of editing commands is restarted from the beginning on the new line.

(2)n -- next line
     The n function reads the next line from the input, replacing the current line. The current line is written to the output if it should be. The list of editing commands is continued following the n command.

(1)a\
<text> -- append lines
     The a function causes the argument <text> to be written to the output after the line matched by its address. The a command is inherently multi-line; a must appear at the end of a line, and <text> may contain any number of lines. To preserve the one-command-to-a-line fiction, the interior newlines must be hidden by a backslash character (`\') immediately preceding the newline. The <text> argument is terminated by the first unhidden newline (the first one not immediately preceded by backslash).
     Once an a function is successfully executed, <text> will be written to the output regardless of what later commands do to the line which triggered it. The triggering line may be deleted entirely; <text> will still be written to the output. The <text> is not scanned for address matches, and no editing commands are attempted on it. It does not cause any change in the line-number counter.

(1)i\
<text> -- insert lines
     The i function behaves identically to the a function, except that <text> is written to the output before the matched line. All other comments about the a function apply to the i function as well.

(2)c\
<text> -- change lines
     The c function deletes the lines selected by its address(es), and replaces them with the lines in <text>. Like a and i, c must be followed by a newline hidden by a backslash; and interior newlines in <text> must be hidden by backslashes.

     The c command may have two addresses, and therefore select a range of lines. If it does, all the lines in the range are deleted, but only one copy of <text> is written to the output, not one copy per line deleted. As with a and i, <text> is not scanned for address matches, and no editing commands are attempted on it. It does not change the line-number counter.

     After a line has been deleted by a c function, no further commands are attempted on the corpse.

     If text is appended after a line by a or r functions, and the line is subsequently changed, the text inserted by the c function will be placed before the text of the a or r functions. (The r function is described in Section 3.3.)

Note: Within the text put in the output by these functions, leading blanks and tabs will disappear, as always in sed commands. To get leading blanks and tabs into the output, precede the first desired blank or tab by a backslash; the backslash will not appear in the output.
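Two of the rules above, that c over a range writes a single copy of its text and that appended text survives deletion of the triggering line, can be checked with a short sketch. It assumes a POSIX shell and a sed that accepts the backslash-newline form described here; the variable names and the replacement text are illustrative only.

```shell
# Sample text from Section 1.4.
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# c with two addresses: lines 2 through 4 are deleted and ONE copy of
# the text is written in their place.
changed=$(printf '%s\n' "$poem" | sed '2,4c\
[three lines omitted]')

# a followed by d on the same line: line 1 is deleted, but the text
# queued by a is still written to the output.
survives=$(printf '%s\n' "$poem" | sed '1a\
XXXX
1d')

printf '%s\n' "$changed" "$survives"
```

The replacement text deliberately starts in the first column, since the handling of leading blanks described in the Note may differ between implementations.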
Example: The list of editing commands:

     n
     a\
     XXXX
     d

applied to our standard input, produces:

     In Xanadu did Kubla Khan
     XXXX
     Where Alph, the sacred river, ran
     XXXX
     Down to a sunless sea.

In this particular case, the same effect would be produced by either of the two following command lists:

     n               n
     i\              c\
     XXXX            XXXX
     d

3.2. Substitute Function

One very important function changes parts of lines selected by a context search within the line.

(2)s<pattern><replacement><flags> -- substitute
     The s function replaces part of a line (selected by <pattern>) with <replacement>. It can best be read:

          Substitute for <pattern>, <replacement>

     The <pattern> argument contains a pattern, exactly like the patterns in addresses (see 2.2 above). The only difference between <pattern> and a context address is that the context address must be delimited by slash (`/') characters; <pattern> may be delimited by any character other than space or newline.

     By default, only the first string matched by <pattern> is replaced, but see the g flag below.

     The <replacement> argument begins immediately after the second delimiting character of <pattern>, and must be followed immediately by another instance of the delimiting character. (Thus there are exactly three instances of the delimiting character.)

     The <replacement> is not a pattern, and the characters which are special in patterns do not have special meaning in <replacement>. Instead, other characters are special:

          &    is replaced by the string matched by <pattern>

          \d   (where d is a single digit) is replaced by the dth substring matched by parts of <pattern> enclosed in `\(' and `\)'. If nested substrings occur in <pattern>, the dth is determined by counting opening delimiters (`\(').

     As in patterns, special characters may be made literal by preceding them with backslash (`\').

     The <flags> argument may contain the following flags:

          g -- substitute <replacement> for all (non-overlapping) instances of <pattern> in the line. After a successful substitution, the scan for the next instance of <pattern> begins just after the end of the inserted characters; characters put into the line from <replacement> are not rescanned.
          p -- print the line if a successful replacement was done. The p flag causes the line to be written to the output if and only if a substitution was actually made by the s function. Notice that if several s functions, each followed by a p flag, successfully substitute in the same input line, multiple copies of the line will be written to the output: one for each successful substitution.

          w <filename> -- write the line to a file if a successful replacement was done. The w flag causes lines which are actually substituted by the s function to be written to a file named by <filename>. If <filename> exists before sed is run, it is overwritten; if not, it is created. A single space must separate w and <filename>. The possibilities of multiple, somewhat different copies of one input line being written are the same as for p. A maximum of 10 different file names may be mentioned after w flags and w functions (see below), combined.

Examples: The following command, applied to our standard input,

     s/to/by/w changes

produces, on the standard output:

     In Xanadu did Kubla Khan
     A stately pleasure dome decree:
     Where Alph, the sacred river, ran
     Through caverns measureless by man
     Down by a sunless sea.

and, on the file `changes':

     Through caverns measureless by man
     Down by a sunless sea.

If the nocopy option is in effect, the command:

     s/[.,;?:]/*P&*/gp

produces:

     A stately pleasure dome decree*P:*
     Where Alph*P,* the sacred river*P,* ran
     Down to a sunless sea*P.*

Finally, to illustrate the effect of the g flag, the command:

     /X/s/an/AN/p

produces (assuming nocopy mode):

     In XANadu did Kubla Khan

and the command:

     /X/s/an/AN/gp

produces:

     In XANadu did Kubla KhAN

3.3. Input-output Functions

(2)p -- print
     The print function writes the addressed lines to the standard output file. They are written at the time the p function is encountered, regardless of what succeeding editing commands may do to the lines.
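The special replacement characters and the timing of the p function can be illustrated together. This is a sketch only, assuming a POSIX shell and sed run with the -n (nocopy) flag; the variable names and the replacement strings are illustrative and not part of the memorandum.

```shell
# Sample text from Section 1.4.
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# & stands for the whole matched string; g repeats along the line.
bracketed=$(printf '%s\n' "$poem" | sed -n 's/K[a-z]*/[&]/gp')

# \1 and \2 stand for the first and second \( \) substrings.
swapped=$(printf '%s\n' "$poem" | sed -n 's/\(Kubla\) \(Khan\)/\2 \1/p')

# A delimiter other than slash: a comma, appearing exactly three times.
comma=$(printf '%s\n' "$poem" | sed -n 's,sunless,moonlit,p')

# The p function writes line 1 as it stands when p is reached; the
# later s with a p flag then writes the edited copy as well.
twice=$(printf '%s\n' "$poem" | sed -n -e '1p' -e '1s/Kubla/Kublai/p')

printf '%s\n' "$bracketed" "$swapped" "$comma" "$twice"
```

Because -n suppresses the default copy, every line in the output comes from an explicit p function or p flag.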
(2)w <filename> -- write on <filename>
     The write function writes the addressed lines to the file named by <filename>. If the file previously existed, it is overwritten; if not, it is created. The lines are written exactly as they exist when the write function is encountered for each line, regardless of what subsequent editing commands may do to them.

     Exactly one space must separate the w and <filename>.

     A maximum of ten different files may be mentioned in write functions and w flags after s functions, combined.

(1)r <filename> -- read the contents of a file
     The read function reads the contents of <filename>, and appends them after the line matched by the address. The file is read and appended regardless of what subsequent editing commands do to the line which matched its address. If r and a functions are executed on the same line, the text from the a functions and the r functions is written to the output in the order that the functions are executed.

     Exactly one space must separate the r and <filename>. If a file mentioned by an r function cannot be opened, it is considered a null file, not an error, and no diagnostic is given.

NOTE: Since there is a limit to the number of files that can be opened simultaneously, care should be taken that no more than ten files be mentioned in w functions or flags; that number is reduced by one if any r functions are present. (Only one read file is open at one time.)

Examples: Assume that the file `note1' has the following contents:

     Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.

Then the following command:

     /Kubla/r note1

produces:

     In Xanadu did Kubla Khan
     Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.
     A stately pleasure dome decree:
     Where Alph, the sacred river, ran
     Through caverns measureless to man
     Down to a sunless sea.

3.4. Multiple Input-line Functions

Three functions, all spelled with capital letters, deal specially with pattern spaces containing imbedded newlines; they are intended principally to provide pattern matches across lines in the input.

(2)N -- Next line
     The next input line is appended to the current line in the pattern space; the two input lines are separated by an imbedded newline. Pattern matches may extend across the imbedded newline(s).

(2)D -- Delete first part of the pattern space
     Delete up to and including the first newline character in the current pattern space. If the pattern space becomes empty (the only newline was the terminal newline), read another line from the input. In any case, begin the list of editing commands again from its beginning.

(2)P -- Print first part of the pattern space
     Print up to and including the first newline in the pattern space.

The P and D functions are equivalent to their lower-case counterparts if there are no imbedded newlines in the pattern space.

3.5. Hold and Get Functions

Five functions save and retrieve part of the input for possible later use.

(2)h -- hold pattern space
     The h function copies the contents of the pattern space into a hold area (destroying the previous contents of the hold area).

(2)H -- Hold pattern space
     The H function appends the contents of the pattern space to the contents of the hold area; the former and new contents are separated by a newline.

(2)g -- get contents of hold area
     The g function copies the contents of the hold area into the pattern space (destroying the previous contents of the pattern space).

(2)G -- Get contents of hold area
     The G function appends the contents of the hold area to the contents of the pattern space; the former and new contents are separated by a newline.

(2)x -- exchange
     The exchange command interchanges the contents of the pattern space and the hold area.
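As a further illustration of the hold area, the functions above can gather a whole file and emit it as one line. The following is a minimal sketch, assuming a POSIX shell and an input of at least two lines; the variable name `joined` and the choice of a space as separator are illustrative.

```shell
# Sample text from Section 1.4.
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# 1h     start the hold area with line 1
# 2,$H   append every later line to the hold area, newline-separated
# $x     on the last line, bring the accumulated text into the pattern space
# $s...  turn the imbedded newlines into spaces and print the result
joined=$(printf '%s\n' "$poem" |
  sed -n -e '1h' -e '2,$H' -e '$x' -e '$s/\n/ /gp')

printf '%s\n' "$joined"
```

Only the last cycle produces output, since -n is in effect and the only p flag is attached to a command addressed by $.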
Example: The commands

     1h
     1s/ did.*//
     1x
     G
     s/\n/ :/

applied to our standard example, produce:

     In Xanadu did Kubla Khan :In Xanadu
     A stately pleasure dome decree: :In Xanadu
     Where Alph, the sacred river, ran :In Xanadu
     Through caverns measureless to man :In Xanadu
     Down to a sunless sea. :In Xanadu

3.6. Flow-of-Control Functions

These functions do no editing on the input lines, but control the application of functions to the lines selected by the address part.

(2)! -- Don't
     The Don't command causes the next command (written on the same line) to be applied to all and only those input lines not selected by the address part.

(2){ -- Grouping
     The grouping command `{' causes the next set of commands to be applied (or not applied) as a block to the input lines selected by the addresses of the grouping command. The first of the commands under control of the grouping may appear on the same line as the `{' or on the next line.

     The group of commands is terminated by a matching `}' standing on a line by itself.

     Groups can be nested.

(0):