May 23, 2021 UNIX Getting started
1. Regular expressions and sed
A regular expression is a string that can be used to describe several sequences of characters. Regular expressions are used by several UNIX commands, including ed, sed, awk, grep, and vi.
This tutorial teaches you how to use regular expressions along with sed.
Here, sed stands for stream editor, a stream-oriented editor that was created specifically for executing scripts. All the input you feed into it passes through to STDOUT, and the input file is not changed.
Before we get started, let's make sure you have a local copy of the /etc/passwd text file.
sed can be invoked by sending data to it through a pipe, as follows:
$ cat /etc/passwd | sed
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...
-n, --quiet, --silent
suppress automatic printing of pattern space
-e script, --expression=script
...............................
The cat command dumps the contents of /etc/passwd to sed through the pipe, into sed's pattern space.
The pattern space is the internal work buffer that sed uses to do its work.
Below is the general syntax of sed
/pattern/action
Here, pattern is a regular expression, and action is one of the commands given in the following table. If pattern is omitted, action is performed on every line.
The slash characters (/) that surround the pattern are required because they are used as delimiters.
Action | Description |
p | Prints the line |
d | Deletes the line |
s/pattern1/pattern2/ | Substitutes the first occurrence of pattern1 with pattern2 |
Invoke sed again, but this time use its delete command, denoted by the letter d, to delete every line:
$ cat /etc/passwd | sed 'd'
$
Instead of invoking sed by sending a file to it through a pipe, you can instruct sed to read the data from a file, as shown below.
The following command does exactly the same thing as the previous one, but without the cat command:
$ sed -e 'd' /etc/passwd
$
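The point that sed edits its output stream and leaves the input file untouched can be checked with a small sketch. It assumes a hypothetical scratch file created with mktemp in place of /etc/passwd, and the variable names are illustrative only.

```shell
# Hypothetical scratch file standing in for /etc/passwd.
sample=$(mktemp)
printf 'root:x:0:0\ndaemon:x:1:1\n' > "$sample"

# Both forms delete every line from the output stream, so nothing is printed.
piped=$(cat "$sample" | sed 'd')
direct=$(sed -e 'd' "$sample")

# The file on disk still holds both lines: sed only changed what it printed.
lines_left=$(wc -l < "$sample" | tr -d ' ')
echo "piped=[$piped] direct=[$direct] lines_left=$lines_left"
rm -f "$sample"
```

Both invocations print nothing, while the line count of the file stays at 2.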
sed also understands something called addresses. An address is either a particular location in a file or a range to which a particular editing command applies. When sed encounters no addresses, it performs its operations on every line in the file.
The following command adds a basic address to the sed command you use:
$ cat /etc/passwd | sed '1d' |more
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$
Note that the number 1 is added before the delete command. This tells sed to execute the edit command on the first line of the file.
In this example, sed deletes the first line of /etc/passwd and prints the rest of the file.
To remove more than one line from the file, you can specify an address range as follows:
$ cat /etc/passwd | sed '1,5d' |more
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$
The above command is applied to lines 1 through 5, so these five lines are deleted.
Try the following address range:
Range | Description |
'4,10d' | Deletes lines 4 through 10 |
'10,4d' | Deletes only line 10, because sed does not work in the reverse direction |
'4,+5d' | Matches line 4, deletes it and the next five lines, then stops deleting and prints the rest (GNU extension) |
'2,5!d' | Deletes everything except lines 2 through 5 |
'1~3d' | Deletes the first line, skips the next two lines, then deletes the fourth line; sed continues this pattern until the end of the file (GNU extension) |
'2~2d' | Deletes the second line, skips the next line, deletes the following line, and repeats until the end of the file (GNU extension) |
'4,10p' | Prints lines 4 through 10 |
'4,d' | Generates a syntax error |
',10d' | Also generates a syntax error |
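A few of these ranges can be tried on ten numbered lines, so that each line's content is its own line number. This is only a sketch: the numbered input and the variable names are assumptions made for illustration, and the last command relies on the first~step address, which is a GNU sed extension.

```shell
# Ten numbered lines stand in for a real file.
numbers() { printf '%s\n' 1 2 3 4 5 6 7 8 9 10; }

# paste joins sed's output lines with spaces so each result fits on one line.
del_range=$(numbers | sed '4,10d' | paste -s -d ' ' -)
keep_range=$(numbers | sed '2,5!d' | paste -s -d ' ' -)
backwards=$(numbers | sed '10,4d' | paste -s -d ' ' -)

# GNU extension: start at line 1 and act on every 3rd line after it.
every_third=$(numbers | sed '1~3d' | paste -s -d ' ' -)

printf '%s\n' "$del_range" "$keep_range" "$backwards" "$every_third"
```

The '10,4d' case leaves lines 1 through 9, which shows that a reversed range matches only its first address.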
Note: When using the p action, you should use the -n option to avoid printing lines twice.
Check the difference between the following two commands:
$ cat /etc/passwd | sed -n '1,3p'
The same command without -n is as follows:
$ cat /etc/passwd | sed '1,3p'
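The difference is easiest to see by counting output lines. The sketch below assumes a four-line sample input in place of /etc/passwd; the names are illustrative.

```shell
# Four sample lines stand in for /etc/passwd.
sample() { printf '%s\n' one two three four; }

# With -n, only the lines selected by p are printed.
with_n=$(sample | sed -n '1,3p' | wc -l | tr -d ' ')

# Without -n, sed also auto-prints every line, so lines 1 to 3 appear twice.
without_n=$(sample | sed '1,3p' | wc -l | tr -d ' ')

echo "with -n: $with_n lines, without -n: $without_n lines"
```

With -n the command prints 3 lines; without it, the output grows to 7 lines (three duplicated lines plus the untouched fourth).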
The substitution command, denoted by s, replaces any string you specify with any other string you specify.
To substitute one string with another, sed needs to know where the first string ends and where the replacement string begins. Traditionally, the two strings are separated by a forward slash (/).
The following command replaces the first occurrence of the string root on each line with the string amrood.
$ cat /etc/passwd | sed 's/root/amrood/'
amrood:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
..........................
It is important to note that sed substitutes only the first occurrence on each line. If the string root occurs more than once on a line, only the first match is replaced.
To make sed perform a global substitution, add the letter g to the end of the command, as follows:
$ cat /etc/passwd | sed 's/root/amrood/g'
amrood:x:0:0:amrood user:/amrood:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
...........................
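The two behaviours can be compared side by side on a single line. This is a minimal sketch that assumes the sample root entry used throughout this tutorial; the variable names are illustrative.

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/root/amrood/')

# With g, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/root/amrood/g')

printf '%s\n%s\n' "$first_only" "$all_matches"
```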
In addition to the g flag, there are a number of other useful flags, and you can specify more than one at a time.
Flag | Description |
g | Replaces all matches, not just the first match |
NUMBER | Replaces only the NUMBERth match |
p | If a substitution was made, prints the pattern space |
w FILENAME | If a substitution was made, writes the result to FILENAME |
I or i | Matches in a case-insensitive manner |
M or m | In addition to the normal behavior of the special regular expression characters ^ and $, this flag causes ^ to match the empty string after a newline and $ to match the empty string before a newline |
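Some of these flags can be exercised on a two-line sample. The sketch below makes several assumptions: the sample entries, the mktemp scratch file used with w, and the variable names are all illustrative, and the I flag is a GNU sed extension that other sed implementations may not accept.

```shell
sample() {
  printf '%s\n' 'root:x:0:0:root user:/root:/bin/sh' \
                'daemon:x:1:1:daemon:/usr/sbin:/bin/sh'
}

# NUMBER: replace only the 2nd match on each line (line 1 shown here).
second_only=$(sample | sed 's/root/amrood/2' | head -n 1)

# p with -n: print only the lines on which a substitution happened.
changed=$(sample | sed -n 's/root/amrood/p')

# w FILENAME: write changed lines to a file (hypothetical scratch file).
log=$(mktemp)
sample | sed -n "s/sbin/bin/w $log"
logged=$(cat "$log")
rm -f "$log"

# I (GNU extension): match regardless of case.
any_case=$(printf 'ROOT login\n' | sed 's/root/amrood/I')

printf '%s\n' "$second_only" "$changed" "$logged" "$any_case"
```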
You may find yourself having to substitute a string that contains the forward slash character. In this case, you can specify a different delimiter by providing the designated character after the s.
$ cat /etc/passwd | sed 's:/root:/amrood:g'
root:x:0:0:root user:/amrood:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
In the example above, : is used as the delimiter instead of /, because we are searching for /root rather than the simple string root.
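A short sketch shows that an alternative delimiter and escaped slashes give the same result. It assumes the sample root entry from this tutorial; the | delimiter in the last command is just another arbitrary choice.

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Colon as delimiter: the slash in /root needs no escaping.
with_colon=$(printf '%s\n' "$line" | sed 's:/root:/amrood:g')

# Same substitution with the default delimiter and escaped slashes.
with_escape=$(printf '%s\n' "$line" | sed 's/\/root/\/amrood/g')

# Any other character can serve as the delimiter, for example |.
new_shell=$(printf '%s\n' "$line" | sed 's|/bin/sh|/bin/bash|')

printf '%s\n' "$with_colon" "$with_escape" "$new_shell"
```

Only the text /root is replaced; the bare root strings that are not preceded by a slash are left alone.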
Use an empty replacement string to delete the string root from the /etc/passwd output entirely.
$ cat /etc/passwd | sed 's/root//g'
:x:0:0: user:/:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
If you want to replace the string sh with the string quiet only on line 10, you can specify it as follows:
$ cat /etc/passwd | sed '10s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/quiet
Similarly, to perform a substitution over an address range, you can do the following:
$ cat /etc/passwd | sed '1,5s/sh/quiet/g'
root:x:0:0:root user:/root:/bin/quiet
daemon:x:1:1:daemon:/usr/sbin:/bin/quiet
bin:x:2:2:bin:/bin:/bin/quiet
sys:x:3:3:sys:/dev:/bin/quiet
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
As you can see from the output, the string sh in the first five lines has changed to quiet, but the sh in the other lines has not changed at all.
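The same idea can be reproduced on a three-line input. This is a sketch with made-up entries; the last command additionally assumes that a pattern address can be used in front of s, which works the same way as a line number.

```shell
shells() { printf '%s\n' 'a:/bin/sh' 'b:/bin/sh' 'c:/bin/sh'; }

# Substitute on line 2 only.
line_two=$(shells | sed '2s/sh/quiet/' | paste -s -d ' ' -)

# Substitute on lines 1 through 2.
first_two=$(shells | sed '1,2s/sh/quiet/' | paste -s -d ' ' -)

# Substitute only on lines that match a pattern.
by_pattern=$(shells | sed '/^c:/s/sh/quiet/' | paste -s -d ' ' -)

printf '%s\n' "$line_two" "$first_two" "$by_pattern"
```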
Use the p command together with the -n option to print all matching lines, as follows:
$ cat testing | sed -n '/root/p'
root:x:0:0:root user:/root:/bin/sh
[root@ip-72-167-112-17 amrood]# vi testing
root:x:0:0:root user:/root:/bin/sh
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
When matching patterns, you can use regular expressions, which provide more flexibility.
The following example matches every line that starts with daemon and deletes it:
$ cat testing | sed '/^daemon/d'
root:x:0:0:root user:/root:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
The following example deletes all lines ending with sh:
$ cat testing | sed '/sh$/d'
sync:x:4:65534:sync:/bin:/bin/sync
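Both anchors can be tried on a three-line sample. The sketch assumes three entries copied from the testing file shown above; the variable names are illustrative.

```shell
sample() {
  printf '%s\n' 'root:x:0:0:root user:/root:/bin/sh' \
                'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
                'sync:x:4:65534:sync:/bin:/bin/sync'
}

# ^ anchors the match to the start of the line: count what survives the delete.
without_daemon=$(sample | sed '/^daemon/d' | wc -l | tr -d ' ')

# $ anchors the match to the end of the line: only the sync entry survives.
not_sh=$(sample | sed '/sh$/d')

# The inverse selection: print only the lines that end in sh.
only_sh=$(sample | sed -n '/sh$/p' | wc -l | tr -d ' ')

echo "$without_daemon lines kept; survivor: $not_sh; $only_sh lines end in sh"
```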
The following table lists special characters that are very useful in regular expressions.
Character | Description |
^ | Matches the beginning of a line |
$ | Matches the end of a line |
. | Matches any single character |
* | Matches zero or more occurrences of the preceding character |
[chars] | Matches any one of the characters given in chars, where chars is a sequence of characters. You can use the - character to indicate a range of characters. |
Let's look at a few more expressions that show how metacharacters are used. For example, consider the following patterns:
Expression | Description |
/a.c/ | Matches lines that contain strings such as a+c, a-c, abc, match, and a3c |
/a*c/ | Matches the same strings, along with strings such as ace, yacc, and arctic |
/[tT]he/ | Matches the strings The and the |
/^$/ | Matches blank lines |
/^.*$/ | Matches an entire line, whatever it is |
/ */ | Matches zero or more spaces, so it matches every line; use two spaces before the * to require at least one space |
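These patterns can be checked against a short word list. The list below is an assumption made for illustration (it includes one empty line), and paste is used only to join the matching lines for display.

```shell
# Eight sample lines, one of them empty.
words() { printf '%s\n' abc a3c ac yacc '' The the dog; }

# . needs exactly one character between a and c ("acc" inside yacc qualifies).
dot=$(words | sed -n '/a.c/p' | paste -s -d ' ' -)

# a* allows zero or more a's before c, so any line containing c matches.
star=$(words | sed -n '/a*c/p' | paste -s -d ' ' -)

# A bracket expression matches one character from the set.
the=$(words | sed -n '/[tT]he/p' | paste -s -d ' ' -)

# ^$ matches only the empty line.
blank=$(words | sed -n '/^$/p' | wc -l | tr -d ' ')

# A space followed by * matches zero spaces too, so every line is selected.
spaces=$(words | sed -n '/ */p' | wc -l | tr -d ' ')

printf '%s\n' "$dot" "$star" "$the" "$blank" "$spaces"
```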
The following table shows some frequently used sets of characters:
Set | Description |
[a-z] | Matches a single lowercase letter |
[A-Z] | Matches a single uppercase letter |
[a-zA-Z] | Matches a single letter |
[0-9] | Matches a single digit |
[a-zA-Z0-9] | Matches a single letter or digit |
Some special keywords are commonly available in regular expressions, especially in GNU utilities that use regexps. They are very useful in sed regular expressions because they simplify expressions and improve readability.
For example, the characters a through z and A through Z make up one such class, which is written with the keyword [[:alpha:]].
Using the alphabet character class keyword, the following command prints only those lines in /etc/syslog.conf that start with a letter of the alphabet:
$ cat /etc/syslog.conf | sed -n '/^[[:alpha:]]/p'
authpriv.* /var/log/secure
mail.* -/var/log/maillog
cron.* /var/log/cron
uucp,news.crit /var/log/spooler
local7.* /var/log/boot.log
The following table is a complete list of available character class keywords for GNU sed.
Character class | Description |
[[:alnum:]] | Alphanumeric characters (a-z, A-Z, 0-9) |
[[:alpha:]] | Alphabetic characters (a-z, A-Z) |
[[:blank:]] | Blank characters (spaces or tabs) |
[[:cntrl:]] | Control characters |
[[:digit:]] | Digits (0-9) |
[[:graph:]] | Any visible characters (excludes whitespace) |
[[:lower:]] | Lowercase letters (a-z) |
[[:print:]] | Printable characters (non-control characters) |
[[:punct:]] | Punctuation characters |
[[:space:]] | Whitespace |
[[:upper:]] | Uppercase letters (A-Z) |
[[:xdigit:]] | Hex digits (0-9, a-f, A-F) |
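A few of these classes can be tried by selecting lines according to their first character. The four sample lines below are assumptions chosen so that each class picks out exactly one of them.

```shell
# Each sample line starts with a different kind of character.
sample() {
  printf '%s\n' 'auth.* /var/log/secure' '# comment' '  indented' '42 answer'
}

starts_alpha=$(sample | sed -n '/^[[:alpha:]]/p')
starts_digit=$(sample | sed -n '/^[[:digit:]]/p')
starts_punct=$(sample | sed -n '/^[[:punct:]]/p')
starts_space=$(sample | sed -n '/^[[:space:]]/p')

printf '[%s]\n' "$starts_alpha" "$starts_digit" "$starts_punct" "$starts_space"
```

Note the doubled brackets: the outer pair is the bracket expression and the inner [:name:] is the class keyword inside it.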
The sed metacharacter & represents the contents of the pattern that was matched. For example, suppose you have a file called phone.txt that contains phone numbers, as follows:
5555551212
5555551213
5555551214
6665551215
6665551216
7775551217
You want to surround the area code (the first three digits) with parentheses to make the numbers easier to read. To do this, you can use the ampersand replacement character, as follows:
$ sed -e 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/g' phone.txt
(555)5551212
(555)5551213
(555)5551214
(666)5551215
(666)5551216
(777)5551217
Here, the pattern matches the first 3 digits, and & then replaces them with the same 3 digits surrounded by parentheses.
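The behaviour of & can be reproduced without a phone.txt file. The sketch below feeds three of the sample numbers through a pipe; the second command is an extra illustration, not taken from the tutorial, showing that & stands for whatever the pattern matched.

```shell
phones() { printf '%s\n' 5555551212 6665551215 7775551217; }

# & in the replacement stands for the text the pattern matched (the 3 digits).
bracketed=$(phones | sed 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/' | paste -s -d ' ' -)

# The pattern .* matches the whole line, so & wraps the entire line here.
wrapped=$(printf 'sed\n' | sed 's/.*/<&>/')

printf '%s\n' "$bracketed" "$wrapped"
```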
You can use multiple sed commands in a single sed invocation, as follows:
$ sed -e 'command1' -e 'command2' ... -e 'commandN' files
Here, command1 through commandN are sed commands of the type discussed earlier. These commands are applied to each line of the files given in the file list.
With the same mechanism, we can write the phone number above in the following way:
$ sed -e 's/^[[:digit:]]\{3\}/(&)/g' \
 -e 's/)[[:digit:]]\{3\}/&-/g' phone.txt
(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217
Note: In the example above, instead of repeating the character class keyword [[:digit:]] three times, we used \{3\}, which means that the preceding regular expression must match three times.
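The two-step rewrite can be checked on a single number read from a pipe. This is a sketch: the second form, which joins the commands with a semicolon inside one script, is an assumption about an equivalent spelling and is not part of the tutorial.

```shell
number='5555551212'

# Two -e scripts run in order on each line: wrap the area code, then add a dash.
with_e=$(printf '%s\n' "$number" |
  sed -e 's/^[[:digit:]]\{3\}/(&)/' -e 's/)[[:digit:]]\{3\}/&-/')

# The same two commands written as one script, separated by a semicolon.
with_semicolon=$(printf '%s\n' "$number" |
  sed 's/^[[:digit:]]\{3\}/(&)/; s/)[[:digit:]]\{3\}/&-/')

printf '%s\n%s\n' "$with_e" "$with_semicolon"
```

The second command depends on the first: it looks for the closing parenthesis that the first command inserted.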
The & metacharacter is useful, but an even more useful feature is the ability to define specific regions in a regular expression so that you can refer back to those parts in the replacement string.
To use back references, you first define a region and then refer back to it. To define a region, you insert backslashed parentheses, \( and \), around each region of interest.
The first region you surround is then referenced by \1, the second region by \2, and so on.
Suppose phone.txt contains the following text:
(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217
Now try the following command:
$ cat phone.txt | sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/'
Area code: (555) Second: 555- Third: 1212
Area code: (555) Second: 555- Third: 1213
Area code: (555) Second: 555- Third: 1214
Area code: (666) Second: 555- Third: 1215
Area code: (666) Second: 555- Third: 1216
Area code: (777) Second: 555- Third: 1217
Note: In the example above, each regular expression inside the parentheses is referenced by \1, \2, and so on.
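Back references can be tried on a single line without phone.txt. The first command below repeats the tutorial's expression on one sample number; the second is an extra, hypothetical example showing that regions can be reordered in the replacement.

```shell
# Three regions: up to ")", up to "-", and the rest of the line.
labelled=$(printf '(555)555-1212\n' |
  sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/')

# Regions may be used in any order, for example to swap two words.
swapped=$(printf 'Doe John\n' | sed 's/\([^ ]*\) \([^ ]*\)/\2 \1/')

printf '%s\n%s\n' "$labelled" "$swapped"
```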