August 9, 1987 Advanced Editing on UNIX USD:13-1 Advanced Editing on UNIX Brian W. Kernighan (Updated for 4.3BSD by Mark Seiden) _A_B_S_T_R_A_C_T This paper is meant to help secretaries, typ- ists and programmers to make effective use of the UNIX|- facilities for preparing and editing text. It provides explanations and examples of o+ special characters, line addressing and glo- bal commands in the editor _e_d; o+ commands for ``cut and paste'' operations on files and parts of files, including the _m_v, _c_p, _c_a_t and _r_m commands, and the _r, _w, _m and _t commands of the editor; o+ editing scripts and editor-based programs like _g_r_e_p and _s_e_d. Although the treatment is aimed at non- programmers, new users with any background should find helpful hints on how to get their jobs done more easily. _1. _I_N_T_R_O_D_U_C_T_I_O_N Although UNIX provides remarkably effective tools for text editing, that by itself is no guarantee that everyone will automatically make the most effective use of them. In particular, people who are not computer specialists _ typ- _________________________ |- UNIX is a trademark of Bell Laboratories. August 9, 1987 USD:13-2 Advanced Editing on UNIX ists, secretaries, casual users _ often use the system less effectively than they might. (There is a good argument that new users would better use their time learning a display editor, like _v_i, or perhaps a version of _e_m_a_c_s, like _j_o_v_e, rather than an editor as ignorant of display terminals as _e_d.) This document is intended as a sequel to _A _T_u_t_o_r_i_a_l _I_n_t_r_o_d_u_c_t_i_o_n _t_o _t_h_e _U_N_I_X _T_e_x_t _E_d_i_t_o_r [1], providing explana- tions and examples of how to edit using _e_d with less effort. (You should also be familiar with the material in _U_N_I_X _F_o_r _B_e_g_i_n_n_e_r_s [2].) Further information on all commands dis- cussed here can be found in section 1 of the _T_h_e _U_N_I_X _U_s_e_r'_s _M_a_n_u_a_l [3]. Examples are based on observations of users and the difficulties they encounter. Topics covered include special characters in searches and substitute commands, line addressing, the global commands, and line moving and copy- ing. There are also brief discussions of effective use of related tools, like those for file manipulation, and those based on _e_d, like _g_r_e_p and _s_e_d. A word of caution. There is only one way to learn to use something, and that is to _u_s_e it. Reading a description is no substitute for trying something. A paper like this one should give you ideas about what to try, but until you actually try something, you will not learn it. August 9, 1987 Advanced Editing on UNIX USD:13-3 _2. _S_P_E_C_I_A_L _C_H_A_R_A_C_T_E_R_S The editor _e_d is the primary interface to the system for many people, so it is worthwhile to know how to get the most out of _e_d for the least effort. The next few sections will discuss shortcuts and labor-saving devices. Not all of these will be instantly useful to any one person, of course, but a few will be, and the others should give you ideas to store away for future use. And as always, until you try these things, they will remain theoretical knowledge, not something you have confi- dence in. _T_h_e _L_i_s_t _c_o_m_m_a_n_d `_l' _e_d provides two commands for printing the contents of the lines you're editing. Most people are familiar with _p, in combinations like 1,$p to print all the lines you're editing, or s/abc/def/p to change `abc' to `def' on the current line. Less familiar is the _l_i_s_t command _l (the letter `_l'), which gives slightly more information than _p. In particular, _l makes visible characters that are normally invisible, such as tabs and backspaces. If you list a line that contains some of these, _l will print each tab as -> and each backspace as -<.|- This makes it much easier to correct the sort of typing mistake that inserts extra spaces _________________________ |- These composite characters are created by overstrik- ing a minus and a > or <, so they only appear as < or > on display terminals. August 9, 1987 USD:13-4 Advanced Editing on UNIX adjacent to tabs, or inserts a backspace followed by a space. The _l command also `folds' long lines for printing _ any line that exceeds 72 characters is printed on multiple lines; each printed line except the last is terminated by a backslash , so you can tell it was folded. This is useful for printing long lines on short terminals. Occasionally the _l command will print in a line a string of numbers preceded by a backslash, such as 07 or 16. These combinations are used to make visible characters that normally don't print, like form feed or vertical tab or bell. Each such combination is a single character. When you see such characters, be wary _ they may have surprising meanings when printed on some terminals. Often their pres- ence means that your finger slipped while you were typing; you almost never want them. _T_h_e _S_u_b_s_t_i_t_u_t_e _C_o_m_m_a_n_d `_s' Most of the next few sections will be taken up with a discussion of the substitute command _s. Since this is the command for changing the contents of individual lines, it probably has the most complexity of any _e_d command, and the most potential for effective use. As the simplest place to begin, recall the meaning of a trailing _g after a substitute command. With s/this/that/ and s/this/that/g the first one replaces the _f_i_r_s_t `this' on August 9, 1987 Advanced Editing on UNIX USD:13-5 the line with `that'. If there is more than one `this' on the line, the second form with the trailing _g changes _a_l_l of them. Either form of the _s command can be followed by _p or _l to `print' or `list' (as described in the previous section) the contents of the line: s/this/that/p s/this/that/l s/this/that/gp s/this/that/gl are all legal, and mean slightly different things. Make sure you know what the differences are. Of course, any _s command can be preceded by one or two `line numbers' to specify that the substitution is to take place on a group of lines. Thus 1,$s/mispell/misspell/ changes the _f_i_r_s_t occurrence of `mispell' to `misspell' on every line of the file. But 1,$s/mispell/misspell/g changes _e_v_e_r_y occurrence in every line (and this is more likely to be what you wanted in this particular case). You should also notice that if you add a _p or _l to the end of any of these substitute commands, only the last line that got changed will be printed, not all the lines. We will talk later about how to print all the lines that were modified. _T_h_e _U_n_d_o _C_o_m_m_a_n_d `_u' Occasionally you will make a substitution in a line, only to realize too late that it was a ghastly mistake. The `undo' command _u lets you `undo' the last substitution: the August 9, 1987 USD:13-6 Advanced Editing on UNIX last line that was substituted can be restored to its previ- ous state by typing the command u _T_h_e _M_e_t_a_c_h_a_r_a_c_t_e_r `' As you have undoubtedly noticed when you use _e_d, cer- tain characters have unexpected meanings when they occur in the left side of a substitute command, or in a search for a particular line. In the next several sections, we will talk about these special characters, which are often called `metacharacters'. The first one is the period `'. On the left side of a substitute command, or in a search with `/.../', `' stands for _a_n_y single character. Thus the search /xy/ finds any line where `x' and `y' occur separated by a single charac- ter, as in x+y x-y xy xy and so on. (We will use to stand for a space whenever we need to make it visible.) Since `' matches a single character, that gives you a way to deal with funny characters printed by _l. Suppose you have a line that, when printed with the _l command, appears as .... th07is .... and you want to get rid of the 07 (which represents the bell character, by the way). The most obvious solution is to try s/07// but this will fail. (Try it.) The brute force solution, which most people would now take, is to re-type the entire line. This is guaranteed, and is actually quite a reasonable tactic if August 9, 1987 Advanced Editing on UNIX USD:13-7 the line in question isn't too big, but for a very long line, re-typing is a bore. This is where the metacharacter `' comes in handy. Since `07' really represents a single character, if we say s/this/this/ the job is done. The `' matches the mysterious character between the `h' and the `i', _w_h_a_t_e_v_e_r _i_t _i_s. Bear in mind that since `' matches any single charac- ter, the command s//,/ converts the first character on a line into a `,', which very often is not what you intended. As is true of many characters in _e_d, the `' has several meanings, depending on its context. This line shows all three: s/// The first `' is a line number, the number of the line we are editing, which is called `line dot'. (We will discuss line dot more in Section 3.) The second `' is a metacharacter that matches any single character on that line. The third `' is the only one that really is an honest literal period. On the _r_i_g_h_t side of a substitution, `' is not special. If you apply this command to the line Now is the time the result will be ow is the time which is probably not what you intended. _T_h_e _B_a_c_k_s_l_a_s_h `' Since a period means `any character', the question naturally arises of what to do when you really want a period. For example, how do you convert the line Now is the time into Now is the time? The backslash `' does the job. August 9, 1987 USD:13-8 Advanced Editing on UNIX A backslash turns off any special meaning that the next character might have; in particular, `' converts the `' from a `match anything' into a period, so you can use it to replace the period in Now is the time like this: s//?/ The pair of characters `' is considered by _e_d to be a single real period. The backslash can also be used when searching for lines that contain a special character. Suppose you are looking for a line that contains PP The search /PP/ isn't adequate, for it will find a line like THE APPLICATION OF ... because the `' matches the letter `A'. But if you say /PP/ you will find only lines that contain `PP'. The backslash can also be used to turn off special meanings for characters other than `'. For example, con- sider finding a line that contains a backslash. The search // won't work, because the `' isn't a literal `', but instead means that the second `/' no longer delimits the search. But by preceding a backslash with another one, you can search for a literal backslash. Thus // does work. Similarly, you can search for a forward slash `/' with /// The backslash turns off the meaning of the immediately fol- lowing `/' so that it doesn't terminate the /.../ construc- tion prematurely. As an exercise, before reading further, find two sub- stitute commands each of which will convert the line xy into the line xy August 9, 1987 Advanced Editing on UNIX USD:13-9 Here are several solutions; verify that each works as advertised. s/// s/x/x/ s/y/y/ A couple of miscellaneous notes about backslashes and special characters. First, you can use any character to delimit the pieces of an _s command: there is nothing sacred about slashes. (But you must use slashes for context searching.) For instance, in a line that contains a lot of slashes already, like //exec //sys.fort.go // etc... you could use a colon as the delimiter _ to delete all the slashes, type s:/::g Second, if # and @ are your character erase and line kill characters, you have to type # and @; this is true whether you're talking to _e_d or any other program. When you are adding text with _a or _i or _c, backslash is not special, and you should only put in one backslash for each one you really want. _T_h_e _D_o_l_l_a_r _S_i_g_n `$' The next metacharacter, the `$', stands for `the end of the line'. As its most obvious use, suppose you have the line Now is the and you wish to add the word `time' to the end. Use the $ like this: s/$/time/ to get Now is the time Notice that a space is needed before `time' in the substi- tute command, or you will get Now is thetime As another example, replace the second comma in the August 9, 1987 USD:13-10 Advanced Editing on UNIX following line with a period without altering the first: Now is the time, for all good men, The command needed is s/,$// The $ sign here provides context to make specific which comma we mean. Without it, of course, the _s command would operate on the first comma to produce Now is the time for all good men, As another example, to convert Now is the time into Now is the time? as we did earlier, we can use s/$/?/ Like `', the `$' has multiple meanings depending on context. In the line $s/$/$/ the first `$' refers to the last line of the file, the second refers to the end of that line, and the third is a literal dollar sign, to be added to that line. _T_h_e _C_i_r_c_u_m_f_l_e_x `^' The circumflex (or hat or caret) `^' stands for the beginning of the line. For example, suppose you are looking for a line that begins with `the'. If you simply say /the/ you will in all likelihood find several lines that contain `the' in the middle before arriving at the one you want. But with /^the/ you narrow the context, and thus arrive at the desired one more easily. The other use of `^' is of course to enable you to insert something at the beginning of a line: s/^// places a space at the beginning of the current line. August 9, 1987 Advanced Editing on UNIX USD:13-11 Metacharacters can be combined. To search for a line that contains _o_n_l_y the characters PP you can use the command /^PP$/ _T_h_e _S_t_a_r `*' Suppose you have a line that looks like this: _t_e_x_t x y _t_e_x_t where _t_e_x_t stands for lots of text, and there are some indeterminate number of spaces between the _x and the _y. Suppose the job is to replace all the spaces between _x and _y by a single space. The line is too long to retype, and there are too many spaces to count. What now? This is where the metacharacter `*' comes in handy. A character followed by a star stands for as many consecutive occurrences of that character as possible. To refer to all the spaces at once, say s/x*y/xy/ The construction `*' means `as many spaces as possible'. Thus `x*y' means `an x, as many spaces as possible, then a y'. The star can be used with any character, not just space. If the original example was instead _t_e_x_t x--------y _t_e_x_t then all `-' signs can be replaced by a single space with the command s/x-*y/xy/ Finally, suppose that the line was _t_e_x_t xy _t_e_x_t Can you see what trap lies in wait for the unwary? If you blindly type s/x*y/xy/ what will happen? The answer, natur- ally, is that it depends. If there are no other x's or y's on the line, then everything works, but it's blind luck, not August 9, 1987 USD:13-12 Advanced Editing on UNIX good management. Remember that `' matches _a_n_y single char- acter? Then `*' matches as many single characters as possi- ble, and unless you're careful, it can eat up a lot more of the line than you expected. If the line was, for example, like this: _t_e_x_t x _t_e_x_t xy _t_e_x_t y _t_e_x_t then saying s/x*y/xy/ will take everything from the _f_i_r_s_t `x' to the _l_a_s_t `y', which, in this example, is undoubtedly more than you wanted. The solution, of course, is to turn off the special meaning of `' with `': s/x*y/xy/ Now everything works, for `*' means `as many _p_e_r_i_o_d_s as possible'. There are times when the pattern `*' is exactly what you want. For example, to change Now is the time for all good men .... into Now is the time use `*' to eat up every- thing after the `for': s/for*// There are a couple of additional pitfalls associated with `*' that you should be aware of. Most notable is the fact that `as many as possible' means _z_e_r_o or more. The fact that zero is a legitimate possibility is sometimes rather surprising. For example, if our line contained _t_e_x_t xy _t_e_x_t x y _t_e_x_t and we said s/x*y/xy/ the _f_i_r_s_t `xy' matches this pattern, for it consists of an `x', zero spaces, and a `y'. The result is that the substitute acts on the first `xy', and does not touch the later one that actually contains some intervening spaces. August 9, 1987 Advanced Editing on UNIX USD:13-13 The way around this, if it matters, is to specify a pattern like /x*y/ which says `an x, a space, then as many more spaces as possible, then a y', in other words, one or more spaces. The other startling behavior of `*' is again related to the fact that zero is a legitimate number of occurrences of something followed by a star. The command s/x*/y/g when applied to the line abcdef produces yaybycydyeyfy which is almost certainly not what was intended. The reason for this behavior is that zero is a legal number of matches, and there are no x's at the beginning of the line (so that gets converted into a `y'), nor between the `a' and the `b' (so that gets converted into a `y'), nor ... and so on. Make sure you really want zero matches; if not, in this case write s/xx*/y/g `xx*' is one or more x's. _T_h_e _B_r_a_c_k_e_t_s `[ ]' Suppose that you want to delete any numbers that appear at the beginning of all lines of a file. You might first think of trying a series of commands like 1,$s/^1*// 1,$s/^2*// 1,$s/^3*// and so on, but this is clearly going to take forever if the numbers are at all long. Unless you want to repeat the commands over and over until finally all numbers are gone, you must get all the digits on one pass. This is the purpose of the brackets [ and ]. The construction [0123456789] matches any single digit August 9, 1987 USD:13-14 Advanced Editing on UNIX _ the whole thing is called a `character class'. With a character class, the job is easy. The pattern `[0123456789]*' matches zero or more digits (an entire number), so 1,$s/^[0123456789]*// deletes all digits from the beginning of all lines. Any characters can appear within a character class, and just to confuse the issue there are essentially no special characters inside the brackets; even the backslash doesn't have a special meaning. To search for special characters, for example, you can say /[$^[]/ Within [...], the `[' is not special. To get a `]' into a character class, make it the first character. It's a nuisance to have to spell out the digits, so you can abbreviate them as [0-9]; similarly, [a-z] stands for the lower case letters, and [A-Z] for upper case. As a final frill on character classes, you can specify a class that means `none of the following characters'. This is done by beginning the class with a `^': [^0-9] stands for `any character _e_x_c_e_p_t a digit'. Thus you might find the first line that doesn't begin with a tab or space by a search like /^[^(space)(tab)]/ Within a character class, the circumflex has a special meaning only if it occurs at the beginning. Just to con- vince yourself, verify that /^[^^]/ finds a line that doesn't begin with a circumflex. August 9, 1987 Advanced Editing on UNIX USD:13-15 _T_h_e _A_m_p_e_r_s_a_n_d `&' The ampersand `&' is used primarily to save typing. Suppose you have the line Now is the time and you want to make it Now is the best time Of course you can always say s/the/the best/ but it seems silly to have to repeat the `the'. The `&' is used to eliminate the repetition. On the _r_i_g_h_t side of a substitute, the ampersand means `whatever was just matched', so you can say s/the/& best/ and the `&' will stand for `the'. Of course this isn't much of a saving if the thing matched is just `the', but if it is something truly long or awful, or if it is something like `.*' which matches a lot of text, you can save some tedious typing. There is also much less chance of making a typing error in the replacement text. For example, to parenthesize a line, regardless of its length, s/*/(&)/ The ampersand can occur more than once on the right side: s/the/& best and & worst/ makes Now is the best and the worst time and s/*/&? &!!/ converts the original line into Now is the time? Now is the time!! To get a literal ampersand, naturally the backslash is used to turn off the special meaning: s/ampersand/&/ con- verts the word into the symbol. Notice that `&' is not spe- cial on the left side of a substitute, only on the _r_i_g_h_t side. August 9, 1987 USD:13-16 Advanced Editing on UNIX _S_u_b_s_t_i_t_u_t_i_n_g _N_e_w_l_i_n_e_s _e_d provides a facility for splitting a single line into two or more shorter lines by `substituting in a newline'. As the simplest example, suppose a line has gotten unmanage- ably long because of editing (or merely because it was unwisely typed). If it looks like _t_e_x_t xy _t_e_x_t you can break it between the `x' and the `y' like this: s/xy/x y/ This is actually a single command, although it is typed on two lines. Bearing in mind that `' turns off special mean- ings, it seems relatively intuitive that a `' at the end of a line would make the newline there no longer special. You can in fact make a single line into several lines with this same mechanism. As a large example, consider underlining the word `very' in a long line by splitting `very' onto a separate line, and preceding it by the _r_o_f_f or _n_r_o_f_f formatting command `.ul'. _t_e_x_t a very big _t_e_x_t The command s/very/ .ul very / converts the line into four shorter lines, preceding the word `very' by the line `.ul', and eliminating the spaces around the `very', all at the same time. When a newline is substituted in, dot is left pointing at the last line created. _J_o_i_n_i_n_g _L_i_n_e_s Lines may also be joined together, but this is done August 9, 1987 Advanced Editing on UNIX USD:13-17 with the _j command instead of _s. Given the lines Now is the time and supposing that dot is set to the first of them, then the command j joins them together. No blanks are added, which is why we carefully showed a blank at the beginning of the second line. All by itself, a _j command joins line dot to line dot+1, but any contiguous set of lines can be joined. Just specify the starting and ending line numbers. For example, 1,$jp joins all the lines into one big one and prints it. (More on line numbers in Section 3.) _R_e_a_r_r_a_n_g_i_n_g _a _L_i_n_e _w_i_t_h ( ... ) (This section should be skipped on first reading.) Recall that `&' is a shorthand that stands for whatever was matched by the left side of an _s command. In much the same way you can capture separate pieces of what was matched; the only difference is that you have to specify on the left side just what pieces you're interested in. Suppose, for instance, that you have a file of lines that consist of names in the form Smith, A. B. Jones, C. and so on, and you want the initials to precede the name, as in A. B. Smith C. Jones It is possible to do this with a series of editing commands, but it is tedious and error- prone. (It is instructive to figure out how it is done, though.) The alternative is to `tag' the pieces of the pattern August 9, 1987 USD:13-18 Advanced Editing on UNIX (in this case, the last name, and the initials), and then rearrange the pieces. On the left side of a substitution, if part of the pattern is enclosed between ( and ), whatever matched that part is remembered, and available for use on the right side. On the right side, the symbol `1' refers to whatever matched the first (...) pair, `2' to the second (...), and so on. The command 1,$s/^([^,]*),*(*)/21/ although hard to read, does the job. The first (...) matches the last name, which is any string up to the comma; this is referred to on the right side with `1'. The second (...) is whatever fol- lows the comma and any spaces, and is referred to as `2'. Of course, with any editing sequence this complicated, it's foolhardy to simply run it and hope. The global com- mands _g and _v discussed in section 4 provide a way for you to print exactly those lines which were affected by the sub- stitute command, and thus verify that it did what you wanted in all cases. _3. _L_I_N_E _A_D_D_R_E_S_S_I_N_G _I_N _T_H_E _E_D_I_T_O_R The next general area we will discuss is that of line addressing in _e_d, that is, how you specify what lines are to be affected by editing commands. We have already used con- structions like 1,$s/x/y/ to specify a change on all lines. And most users are long since familiar with using a single newline (or return) to print the next line, and with /thing/ August 9, 1987 Advanced Editing on UNIX USD:13-19 to find a line that contains `thing'. Less familiar, surprisingly enough, is the use of ?thing? to scan _b_a_c_k_- _w_a_r_d_s for the previous occurrence of `thing'. This is espe- cially handy when you realize that the thing you want to operate on is back up the page from where you are currently editing. The slash and question mark are the only characters you can use to delimit a context search, though you can use essentially any character in a substitute command. _A_d_d_r_e_s_s _A_r_i_t_h_m_e_t_i_c The next step is to combine the line numbers like `', `$', `/.../' and `?...?' with `+' and `-'. Thus $-1 is a command to print the next to last line of the current file (that is, one line before line `$'). For example, to recall how far you got in a previous editing session, $-5,$p prints the last six lines. (Be sure you understand why it's six, not five.) If there aren't six, of course, you'll get an error message. As another example, -3,+3p prints from three lines before where you are now (at line dot) to three lines after, thus giving you a bit of context. By the way, the `+' can be omitted: -3,3p is absolutely identical in meaning. Another area in which you can save typing effort in specifying lines is to use `-' and `+' as line numbers by themselves. - by itself is a command to move back up one August 9, 1987 USD:13-20 Advanced Editing on UNIX line in the file. In fact, you can string several minus signs together to move back up that many lines: --- moves up three lines, as does `-3'. Thus -3,+3p is also identical to the examples above. Since `-' is shorter than `-1', constructions like -,s/bad/good/ are useful. This changes `bad' to `good' on the previous line and on the current line. `+' and `-' can be used in combination with searches using `/.../' and `?...?', and with `$'. The search /thing/-- finds the line containing `thing', and positions you two lines before it. _R_e_p_e_a_t_e_d _S_e_a_r_c_h_e_s Suppose you ask for the search /horrible thing/ and when the line is printed you discover that it isn't the hor- rible thing that you wanted, so it is necessary to repeat the search again. You don't have to re-type the search, for the construction // is a shorthand for `the previous thing that was searched for', whatever it was. This can be repeated as many times as necessary. You can also go back- wards: ?? searches for the same thing, but in the reverse direction. Not only can you repeat the search, but you can use `//' as the left side of a substitute command, to mean `the most recent pattern'. /horrible thing/ .... _e_d _p_r_i_n_t_s _l_i_n_e _w_i_t_h `_h_o_r_r_i_b_l_e _t_h_i_n_g' ... s//good/p To August 9, 1987 Advanced Editing on UNIX USD:13-21 go backwards and change a line, say ??s//good/ Of course, you can still use the `&' on the right hand side of a sub- stitute to stand for whatever got matched: //s//&&/p finds the next occurrence of whatever you searched for last, replaces it by two copies of itself, then prints the line just to verify that it worked. _D_e_f_a_u_l_t _L_i_n_e _N_u_m_b_e_r_s _a_n_d _t_h_e _V_a_l_u_e _o_f _D_o_t One of the most effective ways to speed up your editing is always to know what lines will be affected by a command if you don't specify the lines it is to act on, and on what line you will be positioned (i.e., the value of dot) when a command finishes. If you can edit without specifying unnecessary line numbers, you can save a lot of typing. As the most obvious example, if you issue a search com- mand like /thing/ you are left pointing at the next line that contains `thing'. Then no address is required with commands like _s to make a substitution on that line, or _p to print it, or _l to list it, or _d to delete it, or _a to append text after it, or _c to change it, or _i to insert text before it. What happens if there was no `thing'? Then you are left right where you were _ dot is unchanged. This is also true if you were sitting on the only `thing' when you issued the command. The same rules hold for searches that use `?...?'; the only difference is the direction in which you August 9, 1987 USD:13-22 Advanced Editing on UNIX search. The delete command _d leaves dot pointing at the line that followed the last deleted line. When line `$' gets deleted, however, dot points at the _n_e_w line `$'. The line-changing commands _a, _c and _i by default all affect the current line _ if you give no line number with them, _a appends text after the current line, _c changes the current line, and _i inserts text before the current line. _a, _c, and _i behave identically in one respect _ when you stop appending, changing or inserting, dot points at the last line entered. This is exactly what you want for typing and editing on the fly. For example, you can say a ... text ... ... botch ... (minor error) s/botch/correct/(fix botched line) a ... more text ... without specifying any line number for the substitute command or for the second append command. Or you can say a ... text ... ... horrible botch ...(major error) c (replace entire line) ... fixed up line ... You should experiment to determine what happens if you add _n_o lines with _a, _c or _i. The _r command will read a file into the text being August 9, 1987 Advanced Editing on UNIX USD:13-23 edited, either at the end if you give no address, or after the specified line if you do. In either case, dot points at the last line read in. Remember that you can even say _0_r to read a file in at the beginning of the text. (You can also say _0_a or _1_i to start adding text at the beginning.) The _w command writes out the entire file. If you pre- cede the command by one line number, that line is written, while if you precede it by two line numbers, that range of lines is written. The _w command does _n_o_t change dot: the current line remains the same, regardless of what lines are written. This is true even if you say something like /^AB/,/^AE/w abstract which involves a context search. Since the _w command is so easy to use, you should save what you are editing regularly as you go along just in case the system crashes, or in case you do something foolish, like clobbering what you're editing. The least intuitive behavior, in a sense, is that of the _s command. The rule is simple _ you are left sitting on the last line that got changed. If there were no changes, then dot is unchanged. To illustrate, suppose that there are three lines in the buffer, and you are sitting on the middle one: x1 x2 x3 Then the command -,+s/x/y/p prints the third line, which is the last one changed. But if the three lines had been x1 y2 y3 and the same command had been issued while dot pointed at August 9, 1987 USD:13-24 Advanced Editing on UNIX the second line, then the result would be to change and print only the first line, and that is where dot would be set. _S_e_m_i_c_o_l_o_n `;' Searches with `/.../' and `?...?' start at the current line and move forward or backward respectively until they either find the pattern or get back to the current line. Sometimes this is not what is wanted. Suppose, for example, that the buffer contains lines like this: ab bc Starting at line 1, one would expect that the command /a/,/b/p prints all the lines from the `ab' to the `bc' inclusive. Actually this is not what happens. _B_o_t_h searches (for `a' and for `b') start from the same point, and thus they both find the line that contains `ab'. The result is to print a single line. Worse, if there had been a line with a `b' in it before the `ab' line, then the print command would be in error, since the second line number would be less than the first, and it is illegal to try to print lines in reverse order. This is because the comma separator for line numbers August 9, 1987 Advanced Editing on UNIX USD:13-25 doesn't set dot as each address is processed; each search starts from the same place. In _e_d, the semicolon `;' can be used just like comma, with the single difference that use of a semicolon forces dot to be set at that point as the line numbers are being evaluated. In effect, the semicolon `moves' dot. Thus in our example above, the command /a/;/b/p prints the range of lines from `ab' to `bc', because after the `a' is found, dot is set to that line, and then `b' is searched for, starting beyond that line. This property is most often useful in a very simple situation. Suppose you want to find the _s_e_c_o_n_d occurrence of `thing'. You could say /thing/ // but this prints the first occurrence as well as the second, and is a nuisance when you know very well that it is only the second one you're interested in. The solution is to say /thing/;// This says to find the first occurrence of `thing', set dot to that line, then find the second and print only that. Closely related is searching for the second previous occurrence of something, as in ?something?;?? Printing the third or fourth or ... in either direction is left as an exercise. Finally, bear in mind that if you want to find the first occurrence of something in a file, starting at an arbitrary place within the file, it is not sufficient to say 1;/thing/ because this fails if `thing' occurs on line 1. But it is possible to say 0;/thing/ (one of the few places August 9, 1987 USD:13-26 Advanced Editing on UNIX where 0 is a legal line number), for this starts the search at line 1. _I_n_t_e_r_r_u_p_t_i_n_g _t_h_e _E_d_i_t_o_r As a final note on what dot gets set to, you should be aware that if you hit the interrupt or delete or rubout or break key while _e_d is doing a command, things are put back together again and your state is restored as much as possi- ble to what it was before the command began. Naturally, some changes are irrevocable _ if you are reading or writing a file or making substitutions or deleting lines, these will be stopped in some clean but unpredictable state in the mid- dle (which is why it is not usually wise to stop them). Dot may or may not be changed. Printing is more clear cut. Dot is not changed until the printing is done. Thus if you print until you see an interesting line, then hit delete, you are _n_o_t sitting on that line or even near it. Dot is left where it was when the _p command was started. _4. _G_L_O_B_A_L _C_O_M_M_A_N_D_S The global commands _g and _v are used to perform one or more editing commands on all lines that either contain _g) or don't contain _v) a specified pattern. As the simplest example, the command g/UNIX/p prints all lines that contain the word `UNIX'. The pattern that August 9, 1987 Advanced Editing on UNIX USD:13-27 goes between the slashes can be anything that could be used in a line search or in a substitute command; exactly the same rules and limitations apply. As another example, then, g/^/p prints all the format- ting commands in a file (lines that begin with `'). The _v command is identical to _g, except that it operates on those line that do _n_o_t contain an occurrence of the pattern. (Don't look too hard for mnemonic significance to the letter `v'.) So v/^/p prints all the lines that don't begin with `' _ the actual text lines. The command that follows _g or _v can be anything: g/^/d deletes all lines that begin with `', and g/^$/d deletes all empty lines. Probably the most useful command that can follow a glo- bal is the substitute command, for this can be used to make a change and print each affected line for verification. For example, we could change the word `Unix' to `UNIX' every- where, and verify that it really worked, with g/Unix/s//UNIX/gp Notice that we used `//' in the substitute command to mean `the previous pattern', in this case, `Unix'. The _p command is done on every line that matches the pattern, not just those on which a substitution took place. The global command operates by making two passes over the file. On the first pass, all lines that match the August 9, 1987 USD:13-28 Advanced Editing on UNIX pattern are marked. On the second pass, each marked line in turn is examined, dot is set to that line, and the command executed. This means that it is possible for the command that follows a _g or _v to use addresses, set dot, and so on, quite freely. g/^PP/+ prints the line that follows each `.PP' command (the signal for a new paragraph in some for- matting packages). Remember that `+' means `one line past dot'. And g/topic/?^SH?1 searches for each line that con- tains `topic', scans backwards until it finds a line that begins `.SH' (a section heading) and prints the line that follows that, thus showing the section headings under which `topic' is mentioned. Finally, g/^EQ/+,/^EN/-p prints all the lines that lie between lines beginning with `.EQ' and `.EN' formatting commands. The _g and _v commands can also be preceded by line numbers, in which case the lines searched are only those in the range specified. _M_u_l_t_i-_l_i_n_e _G_l_o_b_a_l _C_o_m_m_a_n_d_s It is possible to do more than one command under the control of a global command, although the syntax for expressing the operation is not especially natural or pleasant. As an example, suppose the task is to change `x' to `y' and `a' to `b' on all lines that contain `thing'. Then g/thing/s/x/y/ s/a/b/ is sufficient. The `' signals the _g command that the set of commands continues on the next line; it terminates on the first line that does not end with August 9, 1987 Advanced Editing on UNIX USD:13-29 `'. (As a minor blemish, you can't use a substitute command to insert a newline within a _g command.) You should watch out for this problem: the command g/x/s//y/ s/a/b/ does _n_o_t work as you expect. The remem- bered pattern is the last pattern that was actually exe- cuted, so sometimes it will be `x' (as expected), and some- times it will be `a' (not expected). You must spell it out, like this: g/x/s/x/y/ s/a/b/ It is also possible to execute _a, _c and _i commands under a global command; as with other multi-line construc- tions, all that is needed is to add a `' at the end of each line except the last. Thus to add a `.nf' and `.sp' command before each `.EQ' line, type g/^EQ/i nf sp There is no need for a final line containing a `' to terminate the _i command, unless there are further commands being done under the glo- bal. On the other hand, it does no harm to put it in either. _5. _C_U_T _A_N_D _P_A_S_T_E _W_I_T_H _U_N_I_X _C_O_M_M_A_N_D_S One editing area in which non-programmers seem not very confident is in what might be called `cut and paste' opera- tions _ changing the name of a file, making a copy of a file somewhere else, moving a few lines from one place to another in a file, inserting one file in the middle of another, splitting a file into pieces, and splicing two or more files together. August 9, 1987 USD:13-30 Advanced Editing on UNIX Yet most of these operations are actually quite easy, if you keep your wits about you and go cautiously. The next several sections talk about cut and paste. We will begin with the UNIX commands for moving entire files around, then discuss _e_d commands for operating on pieces of files. _C_h_a_n_g_i_n_g _t_h_e _N_a_m_e _o_f _a _F_i_l_e You have a file named `memo' and you want it to be called `paper' instead. How is it done? The UNIX program that renames files is called _m_v (for `move'); it `moves' the file from one name to another, like this: mv memo paper That's all there is to it: _m_v from the old name to the new name. mv oldname newname Warning: if there is already a file around with the new name, its present contents will be silently clobbered by the informa- tion from the other file. The one exception is that you can't move a file to itself _ mv x x is illegal. _M_a_k_i_n_g _a _C_o_p_y _o_f _a _F_i_l_e Sometimes what you want is a copy of a file _ an entirely fresh version. This might be because you want to work on a file, and yet save a copy in case something gets fouled up, or just because you're paranoid. In any case, the way to do it is with the _c_p command. _c_p stands for `copy'; the system is big on short command names, which are appreciated by heavy users, but sometimes a August 9, 1987 Advanced Editing on UNIX USD:13-31 strain for novices.) Suppose you have a file called `good' and you want to save a copy before you make some dramatic editing changes. Choose a name _ `savegood' might be acceptable _ then type cp good savegood This copies `good' onto `savegood', and you now have two identical copies of the file `good'. (If `savegood' previously contained some- thing, it gets overwritten.) Now if you decide at some time that you want to get back to the original state of `good', you can say mv savegood good (if you're not interested in `savegood' any more), or cp savegood good if you still want to retain a safe copy. In summary, _m_v just renames a file; _c_p makes a dupli- cate copy. Both of them clobber the `target' file if it already exists, so you had better be sure that's what you want to do _b_e_f_o_r_e you do it. _R_e_m_o_v_i_n_g _a _F_i_l_e If you decide you are really done with a file forever, you can remove it with the _r_m command: rm savegood throws away (irrevocably) the file called `savegood'. _P_u_t_t_i_n_g _T_w_o _o_r _M_o_r_e _F_i_l_e_s _T_o_g_e_t_h_e_r The next step is the familiar one of collecting two or more files into one big one. This will be needed, for exam- ple, when the author of a paper decides that several sec- August 9, 1987 USD:13-32 Advanced Editing on UNIX tions need to be combined into one. There are several ways to do it, of which the cleanest, once you get used to it, is a program called _c_a_t. (Not _a_l_l programs have two-letter names.) _c_a_t is short for `concatenate', which is exactly what we want to do. Suppose the job is to combine the files `file1' and `file2' into a single file called `bigfile'. If you say cat file the contents of `file' will get printed on your termi- nal. If you say cat file1 file2 the contents of `file1' and then the contents of `file2' will _b_o_t_h be printed on your terminal, in that order. So _c_a_t combines the files, all right, but it's not much help to print them on the ter- minal _ we want them in `bigfile'. Fortunately, there is a way. You can tell the system that instead of printing on your terminal, you want the same information put in a file. The way to do it is to add to the command line the character > and the name of the file where you want the output to go. Then you can say cat file1 file2 >bigfile and the job is done. (As with _c_p and _m_v, you're putting something into `bigfile', and anything that was already there is destroyed.) This ability to `capture' the output of a program is one of the most useful aspects of the system. Fortunately it's not limited to the _c_a_t program _ you can use it with _a_n_y program that prints on your terminal. We'll see some more uses for it in a moment. August 9, 1987 Advanced Editing on UNIX USD:13-33 Naturally, you can combine several files, not just two: cat file1 file2 file3 ... >bigfile collects a whole bunch. Question: is there any difference between cp good savegood and cat good >savegood Answer: for most purposes, no. You might reasonably ask why there are two programs in that case, since _c_a_t is obviously all you need. The answer is that _c_p can do some other things as well, which you can investigate for yourself by reading the manual. For now we'll stick to simple usages. _A_d_d_i_n_g _S_o_m_e_t_h_i_n_g _t_o _t_h_e _E_n_d _o_f _a _F_i_l_e Sometimes you want to add one file to the end of another. We have enough building blocks now that you can do it; in fact before reading further it would be valuable if you figured out how. To be specific, how would you use _c_p, _m_v and/or _c_a_t to add the file `good1' to the end of the file `good'? You could try cat good good1 >temp mv temp good which is probably most direct. You should also understand why cat good good1 >good doesn't work. (Don't practice with a good `good'!) The easy way is to use a variant of >, called >>. In fact, >> is identical to > except that instead of clobbering the old file, it simply tacks stuff on at the end. Thus you could say cat good1 >>good and `good1' is added to the end August 9, 1987 USD:13-34 Advanced Editing on UNIX of `good'. (And if `good' didn't exist, this makes a copy of `good1' called `good'.) _6. _C_U_T _A_N_D _P_A_S_T_E _W_I_T_H _T_H_E _E_D_I_T_O_R Now we move on to manipulating pieces of files _ indi- vidual lines or groups of lines. This is another area where new users seem unsure of themselves. _F_i_l_e_n_a_m_e_s The first step is to ensure that you know the _e_d com- mands for reading and writing files. Of course you can't go very far without knowing _r and _w. Equally useful, but less well known, is the `edit' command _e. Within _e_d, the command e newfile says `I want to edit a new file called _n_e_w_f_i_l_e, without leaving the editor.' The _e command discards whatever you're currently working on and starts over on _n_e_w_f_i_l_e. It's exactly the same as if you had quit with the _q command, then re-entered _e_d with a new file name, except that if you have a pattern remembered, then a command like // will still work. If you enter _e_d with the command ed file _e_d remembers the name of the file, and any subsequent _e, _r or _w commands that don't contain a filename will refer to this remembered file. Thus ed file1 ... (editing) ... w (writes back in file1) e file2(edit new file, without leaving editor) ... (editing on file2) ... w (writes back on file2) August 9, 1987 Advanced Editing on UNIX USD:13-35 (and so on) does a series of edits on various files without ever leaving _e_d and without typing the name of any file more than once. (As an aside, if you examine the sequence of commands here, you can see why many UNIX systems use _e as a synonym for _e_d.) You can find out the remembered file name at any time with the _f command; just type _f without a file name. You can also change the name of the remembered file name with _f; a useful sequence is ed precious f junk ... (editing) ... which gets a copy of a precious file, then uses _f to guarantee that a careless _w command won't clobber the original. _I_n_s_e_r_t_i_n_g _O_n_e _F_i_l_e _i_n_t_o _A_n_o_t_h_e_r Suppose you have a file called `memo', and you want the file called `table' to be inserted just after the reference to Table 1. That is, in `memo' somewhere is a line that says Table 1 shows that ... and the data contained in `table' has to go there, probably so it will be formatted properly by _n_r_o_f_f or _t_r_o_f_f. Now what? This one is easy. Edit `memo', find `Table 1', and add the file `table' right there: ed memo /Table 1/ _T_a_b_l_e _1 _s_h_o_w_s _t_h_a_t ... [_r_e_s_p_o_n_s_e _f_r_o_m _e_d] r table The critical line August 9, 1987 USD:13-36 Advanced Editing on UNIX is the last one. As we said earlier, the _r command reads a file; here you asked for it to be read in right after line dot. An _r command without any address adds lines at the end, so it is the same as $_r. _W_r_i_t_i_n_g _o_u_t _P_a_r_t _o_f _a _F_i_l_e The other side of the coin is writing out part of the document you're editing. For example, maybe you want to copy out into a separate file that table from the previous example, so it can be formatted and tested separately. Sup- pose that in the file being edited we have TS ...[lots of stuff] TE which is the way a table is set up for the _t_b_l program. To isolate the table in a separate file called `table', first find the start of the table (the `.TS' line), then write out the interesting part: /^TS/ _T_S [_e_d _p_r_i_n_t_s _t_h_e _l_i_n_e _i_t _f_o_u_n_d] ,/^TE/w table and the job is done. If you are confident, you can do it all at once with /^TS/;/^TE/w table and now you have two copies, one in the file you're still editing, one in the file `table' you've just written. The point is that the _w command can write out a group of lines, instead of the whole file. In fact, you can write out a single line if you like; just give one line number instead of two. For example, if you have just typed a hor- ribly complicated line and you know that it (or something like it) is going to be needed later, then save it _ don't re-type it. In the editor, say a ...lots of stuff... August 9, 1987 Advanced Editing on UNIX USD:13-37 ...horrible line... w temp a more stuff r temp a more stuff This last example is worth studying, to be sure you appreciate what's going on. _M_o_v_i_n_g _L_i_n_e_s _A_r_o_u_n_d Suppose you want to move a paragraph from its present position in a paper to the end. How would you do it? As a concrete example, suppose each paragraph in the paper begins with the formatting command `.PP'. Think about it and write down the details before reading on. The brute force way (not necessarily bad) is to write the paragraph onto a temporary file, delete it from its current position, then read in the temporary file at the end. Assuming that you are sitting on the `.PP' command that begins the paragraph, this is the sequence of commands: ,/^PP/-w temp ,//-d $r temp That is, from where you are now (`') until one line before the next `PP' (`/^PP/-') write onto `temp'. Then delete the same lines. Finally, read `temp' at the end. As we said, that's the brute force way. The easier way (often) is to use the _m_o_v_e command _m that _e_d provides _ it lets you do the whole set of operations at one crack, without any temporary file. The _m command is like many other _e_d commands in that it takes up to two line numbers in front that tell what lines are to be affected. It is also _f_o_l_l_o_w_e_d by a line number August 9, 1987 USD:13-38 Advanced Editing on UNIX that tells where the lines are to go. Thus line1, line2 m line3 says to move all the lines between `line1' and `line2' after `line3'. Naturally, any of `line1' etc., can be pat- terns between slashes, $ signs, or other ways to specify lines. Suppose again that you're sitting at the first line of the paragraph. Then you can say ,/^PP/-m$ That's all. As another example of a frequent operation, you can reverse the order of two adjacent lines by moving the first one to after the second. Suppose that you are positioned at the first. Then m+ does it. It says to move line dot to after one line after line dot. If you are positioned on the second line, m-- does the interchange. As you can see, the _m command is more succinct and direct than writing, deleting and re-reading. When is brute force better anyway? This is a matter of personal taste _ do what you have most confidence in. The main difficulty with the _m command is that if you use patterns to specify both the lines you are moving and the target, you have to take care that you specify them properly, or you may well not move the lines you thought you did. The result of a botched _m command can be a ghastly mess. Doing the job a step at a time makes it easier for you to verify at each step that you accomplished what you wanted to. It's also a good idea to issue a _w command before doing anything compli- cated; then if you goof, it's easy to back up to where you August 9, 1987 Advanced Editing on UNIX USD:13-39 were. _M_a_r_k_s _e_d provides a facility for marking a line with a par- ticular name so you can later reference it by name regard- less of its actual line number. This can be handy for mov- ing lines, and for keeping track of them even after they've been moved. The _m_a_r_k command is _k; the command kx marks the current line with the name `x'. If a line number precedes the _k, that line is marked. (The mark name must be a single lower case letter.) Now you can refer to the marked line with the address 'x Marks are most useful for moving things around. Find the first line of the block to be moved, and mark it with '_a. Then find the last line and mark it with '_b. Now posi- tion yourself at the place where the stuff is to go and say 'a,'bm Bear in mind that only one line can have a particular mark name associated with it at any given time. _C_o_p_y_i_n_g _L_i_n_e_s We mentioned earlier the idea of saving a line that was hard to type or used often, so as to cut down on typing time. Of course this could be more than one line; then the saving is presumably even greater. _e_d provides another command, called _t (for `transfer') August 9, 1987 USD:13-40 Advanced Editing on UNIX for making a copy of a group of one or more lines at any point. This is often easier than writing and reading. The _t command is identical to the _m command, except that instead of moving lines it simply duplicates them at the place you named. Thus 1,$t$ duplicates the entire con- tents that you are editing. A more common use for _t is for creating a series of lines that differ only slightly. For example, you can say a .......... x ......... (long line) t (make a copy) s/x/y/ (change it a bit) t (make third copy) s/y/z/ (change it a bit) and so on. _T_h_e _T_e_m_p_o_r_a_r_y _E_s_c_a_p_e `!' Sometimes it is convenient to be able to temporarily escape from the editor to do some other UNIX command, perhaps one of the file copy or move commands discussed in section 5, without leaving the editor. The `escape' command ! provides a way to do this. If you say !any UNIX command your current editing state is suspended, and the UNIX command you asked for is exe- cuted. When the command finishes, _e_d will signal you by printing another !; at that point you can resume editing. You can really do _a_n_y UNIX command, including another _e_d. (This is quite common, in fact.) In this case, you can even do another !. August 9, 1987 Advanced Editing on UNIX USD:13-41 On Berkeley UNIX systems, there is an additional (and preferable) mechanism called _j_o_b _c_o_n_t_r_o_l which lets you suspend your edit session (or, for that matter, any pro- gram), return to the shell from which you invoked that pro- gram, and issue any commands, then resume the program from the point where it was stopped. See _A_n _I_n_t_r_o_d_u_c_t_i_o_n _t_o _t_h_e _C _S_h_e_l_l for more details. _7. _S_U_P_P_O_R_T_I_N_G _T_O_O_L_S There are several tools and techniques that go along with the editor, all of which are relatively easy once you know how _e_d works, because they are all based on the editor. In this section we will give some fairly cursory examples of these tools, more to indicate their existence than to pro- vide a complete tutorial. More information on each can be found in [3]. _G_r_e_p Sometimes you want to find all occurrences of some word or pattern in a set of files, to edit them or perhaps just to verify their presence or absence. It may be possible to edit each file separately and look for the pattern of interest, but if there are many files this can get very tedious, and if the files are really big, it may be impossi- ble because of limits in _e_d. The program _g_r_e_p was invented to get around these limi- tations. The search patterns that we have described in the August 9, 1987 USD:13-42 Advanced Editing on UNIX paper are often called `regular expressions', and `grep' stands for g/re/p That describes exactly what _g_r_e_p does _ it prints every line in a set of files that contains a particu- lar pattern. Thus grep 'thing' file1 file2 file3 ... finds `thing' wherever it occurs in any of the files `file1', `file2', etc. _g_r_e_p also indicates the file in which the line was found, so you can later edit it if you like. The pattern represented by `thing' can be any pattern you can use in the editor, since _g_r_e_p and _e_d use exactly the same mechanism for pattern searching. It is wisest always to enclose the pattern in the single quotes '...' if it con- tains any non-alphabetic characters, since many such charac- ters also mean something special to the UNIX command inter- preter (the `shell'). If you don't quote them, the command interpreter will try to interpret them before _g_r_e_p gets a chance. There is also a way to find lines that _d_o_n'_t contain a pattern: grep -v 'thing' file1 file2 ... finds all lines that don't contains `thing'. The -_v must occur in the position shown. Given _g_r_e_p and _g_r_e_p -_v, it is possible to do things like selecting all lines that contain some combi- nation of patterns. For example, to get all lines that con- tain `x' but not `y': grep x file... | grep -v y (The notation | is a `pipe', which causes the output of the first command to be used as input to the second command; see [2].) August 9, 1987 Advanced Editing on UNIX USD:13-43 _E_d_i_t_i_n_g _S_c_r_i_p_t_s If a fairly complicated set of editing operations is to be done on a whole set of files, the easiest thing to do is to make up a `script', i.e., a file that contains the opera- tions you want to perform, then apply this script to each file in turn. For example, suppose you want to change every `Unix' to `UNIX' and every `Gcos' to `GCOS' in a large number of files. Then put into the file `script' the lines g/Unix/s//UNIX/g g/Gcos/s//GCOS/g w q Now you can say ed file1