
UNIX regular expressions and sed


May 23, 2021 UNIX Getting started



Regular expressions and sed

A regular expression is a string that describes a set of character sequences. Many UNIX commands use regular expressions, including ed, sed, awk, grep, and vi.

This tutorial will teach you how to use regular expressions with sed.

Here, sed stands for stream editor: a stream-oriented editor that was created specifically for executing scripts. All the input you feed it passes through to STDOUT, and the input file itself is not changed.

Invoking sed

Before we get started, let's make sure you have a local copy of the /etc/passwd text file.

sed can be invoked by sending data to it through a pipe, as follows:

$ cat /etc/passwd | sed
Usage: sed [OPTION]... {script-only-if-no-other-script} [input-file]...

  -n, --quiet, --silent
                 suppress automatic printing of pattern space
  -e script, --expression=script
...............................

The cat command sends the contents of /etc/passwd through the pipe to sed, which reads it into its pattern space. The pattern space is the internal working buffer that sed uses to do its work.

General syntax of sed

The general syntax of a sed command is:

    /pattern/action

Here, pattern is a regular expression, and action is one of the commands given in the table below. If pattern is omitted, action is performed on every line.

The slash characters (/) that surround the pattern are required because they are used as delimiters.

Command               Description
p                     Prints the line
d                     Deletes the line
s/pattern1/pattern2/  Substitutes the first occurrence of pattern1 with pattern2

Deleting all lines with sed

Invoke sed again, but this time tell it to use its delete command, denoted by the single letter d, so that every line is deleted:

    $ cat /etc/passwd | sed 'd'
    $

Instead of invoking sed by sending a file to it through a pipe, you can instruct sed to read the data from a file, as shown below.

The following command does exactly the same thing as the previous one, but without the cat command:

    $ sed -e 'd' /etc/passwd
    $
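
To check both claims for yourself (that the pipe form and the file form behave the same, and that the input file is left alone), a small sketch along these lines may help. It assumes a POSIX shell with mktemp available, and it uses a made-up two-line stand-in for /etc/passwd rather than the real file:

```shell
# Stand-in for /etc/passwd (made-up sample data) so the real file is never touched.
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root user:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' > "$sample"

before=$(cat "$sample")

# The same script fed two ways: through a pipe, and with a file argument.
piped=$(cat "$sample" | sed 'd')
direct=$(sed -e 'd' "$sample")

after=$(cat "$sample")

echo "piped:  [$piped]"     # d deletes every line, so nothing is printed
echo "direct: [$direct]"
if [ "$before" = "$after" ]; then echo "input file is unchanged"; fi

rm -f "$sample"
```

Both captured outputs are empty, while the sample file still holds its two lines after sed has run.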

sed addresses

sed also understands something called addresses. An address is either a particular location in the file or a range to which a particular editing command should be applied. When sed encounters no address, it performs its operations on every line in the file.

The following command adds a basic address to the sed command you have been using:

$ cat /etc/passwd | sed '1d' |more
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$

Note that the number 1 is added before the delete command. This tells sed to apply the editing command to the first line of the file. In this example, sed deletes the first line of /etc/passwd and prints the rest of the file.

sed address ranges

If you want to remove more than one line from the file, you can specify an address range as follows:

$ cat /etc/passwd | sed '1,5d' |more
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
$

The command above is applied to lines 1 through 5, so those five lines are deleted.

Try the following address ranges:

Range     Description
'4,10d'   Deletes lines 4 through 10.
'10,4d'   Deletes only line 10, because sed does not work in the reverse direction.
'4,+5d'   Matches line 4, deletes that line and the next five lines, then stops deleting and prints the rest.
'2,5!d'   Deletes everything except lines 2 through 5.
'1~3d'    Deletes the first line, skips the next two lines, and then deletes the fourth line. sed continues this pattern until the end of the file.
'2~2d'    Deletes the second line, skips the next line, deletes the line after that, and repeats until the end of the file.
'4,10p'   Prints lines 4 through 10.
'4,d'     Generates a syntax error.
',10d'    Also generates a syntax error.
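
The ranges in the table are easier to follow on numbered input. The sketch below runs them against ten lines that each contain only their own line number. The helper names numbers and row are made up for this illustration. The N,+M and FIRST~STEP forms are treated here as GNU sed extensions, so the sketch only runs them when sed answers to --version:

```shell
# Ten numbered lines make it easy to see which ones survive.
numbers() { printf '%s\n' 1 2 3 4 5 6 7 8 9 10; }
# Join the surviving lines onto one row for compact display.
row() { paste -sd' ' -; }

forward=$(numbers | sed '4,10d' | row)    # lines 4 through 10 removed
backward=$(numbers | sed '10,4d' | row)   # only line 10 removed
negated=$(numbers | sed '2,5!d' | row)    # everything except 2 through 5 removed

printf '%s -> %s\n' '4,10d' "$forward"
printf '%s -> %s\n' '10,4d' "$backward"
printf '%s -> %s\n' '2,5!d' "$negated"

# Assumption: only GNU-style sed understands N,+M and FIRST~STEP addresses.
if sed --version >/dev/null 2>&1; then
  plus=$(numbers | sed '4,+5d' | row)     # line 4 and the next five lines removed
  step3=$(numbers | sed '1~3d' | row)     # lines 1, 4, 7 and 10 removed
  step2=$(numbers | sed '2~2d' | row)     # lines 2, 4, 6, 8 and 10 removed
  printf '%s -> %s\n' '4,+5d' "$plus"
  printf '%s -> %s\n' '1~3d' "$step3"
  printf '%s -> %s\n' '2~2d' "$step2"
fi
```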

Note: When you use the p action, you should also use the -n option to avoid printing lines twice. Compare the following two commands:

    $ cat /etc/passwd | sed -n '1,3p'

The same command without -n looks like this:

    $ cat /etc/passwd | sed '1,3p'
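
The difference is easiest to see on a tiny input. This sketch uses four made-up lines instead of /etc/passwd; the helper name input is only for illustration:

```shell
# Four made-up input lines.
input() { printf '%s\n' one two three four; }

# With -n, sed prints only what the p command asks for.
with_n=$(input | sed -n '1,3p')

# Without -n, sed also prints every line automatically,
# so lines 1 through 3 appear twice and line 4 appears once.
without_n=$(input | sed '1,3p')

echo "--- with -n ---"
echo "$with_n"
echo "--- without -n ---"
echo "$without_n"
```

The first run prints three lines, and the second prints seven.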

The substitution command

The substitution command, denoted by s, replaces any string that you specify with any other string that you specify.

To substitute one string for another, sed needs to know where the first string ends and where the replacement string begins. By convention, the two strings are delimited by forward slash (/) characters.

The following command replaces the first occurrence of the string root on a line with the string amrood.

    $ cat /etc/passwd | sed 's/root/amrood/'
    amrood:x:0:0:root user:/root:/bin/sh
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh
    ..........................

It is important to note that sed substitutes only the first occurrence on each line. If the string root appears more than once on a line, only the first match is replaced.

To make sed perform a global substitution, add the letter g to the end of the command, as follows:

    $ cat /etc/passwd | sed 's/root/amrood/g'
    amrood:x:0:0:amrood user:/amrood:/bin/sh
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh
    bin:x:2:2:bin:/bin:/bin/sh
    sys:x:3:3:sys:/dev:/bin/sh
    ...........................
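
To compare the two forms side by side without touching /etc/passwd, you can run them on a single line. The line below is simply the root entry from the sample output above:

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Without g: only the first "root" on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/root/amrood/')

# With g: every "root" on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/root/amrood/g')

echo "$first_only"
echo "$every"
```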

Substitution flags

In addition to the g flag, there are a number of other useful flags, and you can specify more than one at a time.

Flag         Description
g            Replaces all matches, not just the first one.
NUMBER       Replaces only the NUMBERth match.
p            If a substitution was made, prints the pattern space.
w FILENAME   If a substitution was made, writes the result to FILENAME.
I or i       Matches in a case-insensitive manner.
M or m       In addition to the normal behavior of the special regular expression characters ^ and $, this flag causes ^ to match the empty string after a newline and $ to match the empty string before a newline.
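
Here is a small sketch of the NUMBER, p, and w flags in action. It assumes mktemp is available and that the temporary path contains no spaces (the w flag takes the rest of the command as its filename). The sample lines are made up, and the I and M flags are left out because they may not be available in every sed:

```shell
sample=$(mktemp)
changed=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root user:/root:/bin/sh' \
  'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' > "$sample"

# NUMBER: replace only the second "root" on each line.
second=$(sed 's/root/amrood/2' "$sample")

# g and p together, with -n: print only the lines where a substitution happened.
only_changed=$(sed -n 's/root/amrood/gp' "$sample")

# w FILENAME: write each changed line to a file; -n keeps the terminal quiet.
sed -n "s/root/amrood/w $changed" "$sample"
written=$(cat "$changed")

echo "$second"
echo "$only_changed"
echo "$written"

rm -f "$sample" "$changed"
```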

Using an alternative string separator

You may find yourself having to make a substitution on a string that contains the forward slash character. In this case, you can specify a different separator by providing the character you want right after the s.

    $ cat /etc/passwd | sed 's:/root:/amrood:g'
    root:x:0:0:root user:/amrood:/bin/sh
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh

In the example above, : is used as the delimiter instead of /, because the string being searched for is /root rather than the simple string root. Only the occurrence of root that follows a slash is replaced.
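
A quick way to convince yourself that the alternative separator changes nothing but readability is to compare it with the escaped-slash form of the same substitution. The input line is the sample root entry used throughout this tutorial:

```shell
line='root:x:0:0:root user:/root:/bin/sh'

# Colon as the separator: the slashes in the pattern need no escaping.
with_colon=$(printf '%s\n' "$line" | sed 's:/root:/amrood:g')

# Slash as the separator: every literal slash must be escaped with a backslash.
with_escapes=$(printf '%s\n' "$line" | sed 's/\/root/\/amrood/g')

echo "$with_colon"
echo "$with_escapes"
```

Both commands produce the same line, in which only the /root home directory has changed.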

Replacing with an empty string

Use an empty replacement string to delete the string root from the /etc/passwd output entirely:

    $ cat /etc/passwd | sed 's/root//g'
    :x:0:0: user:/:/bin/sh
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh

Address substitution

If you want to replace the string sh with the string quiet only on line 10, you can specify it as follows:

    $ cat /etc/passwd | sed '10s/sh/quiet/g'
    root:x:0:0:root user:/root:/bin/sh
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh
    bin:x:2:2:bin:/bin:/bin/sh
    sys:x:3:3:sys:/dev:/bin/sh
    sync:x:4:65534:sync:/bin:/bin/sync
    games:x:5:60:games:/usr/games:/bin/sh
    man:x:6:12:man:/var/cache/man:/bin/sh
    mail:x:8:8:mail:/var/mail:/bin/sh
    news:x:9:9:news:/var/spool/news:/bin/sh
    backup:x:34:34:backup:/var/backups:/bin/quiet

Similarly, to make a substitution over an address range, you can do the following:

    $ cat /etc/passwd | sed '1,5s/sh/quiet/g'
    root:x:0:0:root user:/root:/bin/quiet
    daemon:x:1:1:daemon:/usr/sbin:/bin/quiet
    bin:x:2:2:bin:/bin:/bin/quiet
    sys:x:3:3:sys:/dev:/bin/quiet
    sync:x:4:65534:sync:/bin:/bin/sync
    games:x:5:60:games:/usr/games:/bin/sh
    man:x:6:12:man:/var/cache/man:/bin/sh
    mail:x:8:8:mail:/var/mail:/bin/sh
    news:x:9:9:news:/var/spool/news:/bin/sh
    backup:x:34:34:backup:/var/backups:/bin/sh

As you can see from the output, the string sh in the first five lines has changed to quiet, but the sh in the other lines has not changed at all.
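
The same idea can be tried on a shorter input. This sketch uses three made-up passwd-style lines (the helper name entries is hypothetical) and applies the substitution first to a single line and then to a range:

```shell
# Three made-up passwd-style lines, all ending in /bin/sh.
entries() {
  printf '%s\n' \
    'root:x:0:0:root user:/root:/bin/sh' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
    'bin:x:2:2:bin:/bin:/bin/sh'
}

# Single address: substitute on line 2 only.
one_line=$(entries | sed '2s/sh/quiet/g')

# Address range: substitute on lines 1 through 2, leaving line 3 alone.
two_lines=$(entries | sed '1,2s/sh/quiet/g')

echo "$one_line"
echo "---"
echo "$two_lines"
```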

The matching command

You can use the p command together with the -n option to print all the matching lines, as follows:

    $ cat testing | sed -n '/root/p'
    root:x:0:0:root user:/root:/bin/sh

Here, testing is a file with the following contents:

    root:x:0:0:root user:/root:/bin/sh
    daemon:x:1:1:daemon:/usr/sbin:/bin/sh
    bin:x:2:2:bin:/bin:/bin/sh
    sys:x:3:3:sys:/dev:/bin/sh
    sync:x:4:65534:sync:/bin:/bin/sync
    games:x:5:60:games:/usr/games:/bin/sh
    man:x:6:12:man:/var/cache/man:/bin/sh
    mail:x:8:8:mail:/var/mail:/bin/sh
    news:x:9:9:news:/var/spool/news:/bin/sh
    backup:x:34:34:backup:/var/backups:/bin/sh
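
Printing only the matching lines with -n and p is essentially what grep does. The sketch below compares the two on three made-up passwd-style lines; the helper name entries is hypothetical:

```shell
entries() {
  printf '%s\n' \
    'root:x:0:0:root user:/root:/bin/sh' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
    'sync:x:4:65534:sync:/bin:/bin/sync'
}

# sed: suppress automatic printing, then print only lines matching /root/.
from_sed=$(entries | sed -n '/root/p')

# grep: print lines matching the same basic regular expression.
from_grep=$(entries | grep 'root')

echo "$from_sed"
if [ "$from_sed" = "$from_grep" ]; then echo "sed and grep agree"; fi
```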

Using regular expressions

When matching patterns, you can use regular expressions, which provide more flexibility.

The following example matches every line that starts with daemon and deletes it:

    $ cat testing | sed '/^daemon/d'
    root:x:0:0:root user:/root:/bin/sh
    bin:x:2:2:bin:/bin:/bin/sh
    sys:x:3:3:sys:/dev:/bin/sh
    sync:x:4:65534:sync:/bin:/bin/sync
    games:x:5:60:games:/usr/games:/bin/sh
    man:x:6:12:man:/var/cache/man:/bin/sh
    mail:x:8:8:mail:/var/mail:/bin/sh
    news:x:9:9:news:/var/spool/news:/bin/sh
    backup:x:34:34:backup:/var/backups:/bin/sh

The following example deletes every line that ends in sh:

    $ cat testing | sed '/sh$/d'
    sync:x:4:65534:sync:/bin:/bin/sync
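
To see what the ^ and $ anchors contribute, it helps to include a line where the word appears somewhere other than the start. In this sketch the helper entry is invented purely for that purpose, and only the user names of the surviving lines are shown:

```shell
entries() {
  printf '%s\n' \
    'root:x:0:0:root user:/root:/bin/sh' \
    'daemon:x:1:1:daemon:/usr/sbin:/bin/sh' \
    'helper:x:50:50:daemon helper:/tmp:/bin/sh' \
    'sync:x:4:65534:sync:/bin:/bin/sync'
}
# Show only the first field (the user name) of each surviving line, on one row.
names() { cut -d: -f1 | paste -sd' ' -; }

anchored=$(entries | sed '/^daemon/d' | names)   # only lines starting with daemon go
unanchored=$(entries | sed '/daemon/d' | names)  # any line containing daemon goes
ends_in_sh=$(entries | sed '/sh$/d' | names)     # lines ending in sh go

echo "/^daemon/d keeps: $anchored"
echo "/daemon/d  keeps: $unanchored"
echo "/sh\$/d     keeps: $ends_in_sh"
```

The anchored pattern keeps the helper line, because daemon appears there only in the middle of the line.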

The following table lists the special characters that are most useful in regular expressions.

Character   Description
^           Matches the beginning of a line
$           Matches the end of a line
.           Matches any single character
*           Matches zero or more occurrences of the preceding character
[chars]     Matches any one of the characters given in chars, where chars is a sequence of characters. You can use the - character to indicate a range of characters.

Matching characters

Here are a few more expressions that show how metacharacters are used. For example, consider the following patterns:

Expression   Description
/a.c/        Matches lines that contain strings such as a+c, a-c, abc, match, and a3c
/a*c/        Matches the same strings, along with strings such as ace, yacc, and arctic
/[tT]he/     Matches the strings The and the
/^$/         Matches blank lines
/^.*$/       Matches an entire line, whatever it is
/ */         Matches zero or more spaces (put two spaces before the * to require at least one)
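
You can try these expressions with sed -n and the p command, which prints only the lines that match. The word list below is made up for this illustration, as are the helper names words and list:

```shell
# Made-up test lines, including one empty line.
words() { printf '%s\n' 'a-c' 'abc' 'a3c' 'ac' 'ace' 'The end' 'the end' '' 'dog'; }
# Join matching lines with commas for compact display.
list() { paste -sd, -; }

dot=$(words | sed -n '/a.c/p' | list)          # a, any one character, then c
star=$(words | sed -n '/a*c/p' | list)         # zero or more a, then c
the=$(words | sed -n '/[tT]he/p' | list)       # The or the
blank=$(words | sed -n '/^$/p' | wc -l | tr -d ' ')     # number of empty lines
whole=$(words | sed -n '/^.*$/p' | wc -l | tr -d ' ')   # number of lines matched at all

echo "/a.c/    -> $dot"
echo "/a*c/    -> $star"
echo "/[tT]he/ -> $the"
echo "/^\$/     -> $blank line(s)"
echo "/^.*\$/   -> $whole line(s)"
```

Note that /a*c/ matches any line containing a c, because zero occurrences of a are allowed before it.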

The following table shows some frequently used sets of characters:

Set           Description
[a-z]         Matches a single lowercase letter
[A-Z]         Matches a single uppercase letter
[a-zA-Z]      Matches a single letter
[0-9]         Matches a single digit
[a-zA-Z0-9]   Matches a single letter or digit

Character class keywords

Some special keywords are commonly available to regexps, especially in GNU utilities that employ regexps. They are very useful in sed regular expressions because they simplify expressions and enhance readability.

For example, the characters a through z and the characters A through Z constitute one such class of characters, which has the keyword [[:alpha:]].

Using the alphabet character class keyword, this command prints only those lines in /etc/syslog.conf that start with a letter of the alphabet:

    $ cat /etc/syslog.conf | sed -n '/^[[:alpha:]]/p'
    authpriv.* /var/log/secure
    mail.* -/var/log/maillog
    cron.* /var/log/cron
    uucp,news.crit /var/log/spooler
    local7.*   /var/log/boot.log

The following table is a complete list of available character class keywords for GNU sed.

Character class   Description
[[:alnum:]]       Alphanumeric characters (a-z, A-Z, 0-9)
[[:alpha:]]       Alphabetic characters (a-z, A-Z)
[[:blank:]]       Blank characters (spaces or tabs)
[[:cntrl:]]       Control characters
[[:digit:]]       Digits (0-9)
[[:graph:]]       Any visible characters (excludes whitespace)
[[:lower:]]       Lowercase letters (a-z)
[[:print:]]       Printable characters (non-control characters)
[[:punct:]]       Punctuation characters
[[:space:]]       Whitespace
[[:upper:]]       Uppercase letters (A-Z)
[[:xdigit:]]      Hexadecimal digits (0-9, a-f, A-F)
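
As a short sketch of how these keywords read in practice, the following uses a made-up configuration fragment in place of /etc/syslog.conf (the helper name config is hypothetical) and shows one class used for matching and another used in a substitution:

```shell
# Made-up stand-in for a syslog-style configuration file.
config() {
  printf '%s\n' \
    '# log settings' \
    'mail.* -/var/log/maillog' \
    '' \
    '7days.* /var/log/week' \
    'cron.* /var/log/cron'
}

# Keep only lines whose first character is a letter:
# the comment, the empty line, and the line starting with a digit are dropped.
starts_with_letter=$(config | sed -n '/^[[:alpha:]]/p')

# Replace every digit with a # character.
masked=$(printf '%s\n' 'uid=1000 gid=100' | sed 's/[[:digit:]]/#/g')

echo "$starts_with_letter"
echo "$masked"
```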

The ampersand reference

The sed metacharacter & represents the contents of the pattern that was matched. For example, suppose you have a file called phone.txt full of phone numbers, such as the following:

    5555551212
    5555551213
    5555551214
    6665551215
    6665551216
    7775551217

You want to surround the area code (the first three digits) with parentheses to make the numbers easier to read. To do this, you can use the ampersand replacement character, as follows:

    $ sed -e 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/g' phone.txt

    (555)5551212
    (555)5551213
    (555)5551214
    (666)5551215
    (666)5551216
    (777)5551217

Here, the pattern part matches the first three digits, and the & in the replacement part stands for those three digits, which are then wrapped in parentheses.
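
If you do not have a phone.txt at hand, the same substitution can be tried on numbers supplied inline. The helper name phones is made up for this sketch:

```shell
# Two of the sample phone numbers, supplied inline instead of from phone.txt.
phones() { printf '%s\n' 5555551212 6665551215; }

# & in the replacement stands for whatever the pattern matched (the first three digits).
bracketed=$(phones | sed 's/^[[:digit:]][[:digit:]][[:digit:]]/(&)/')

echo "$bracketed"
```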

Using multiple sed commands

You can use multiple sed commands in a single sed invocation, as follows:

    $ sed -e 'command1' -e 'command2' ... -e 'commandN' files

Here, command1 through commandN are sed commands of the type discussed previously. These commands are applied to each line in the list of files given by files.

Using the same mechanism, we can rewrite the phone number example above as follows:

    $ sed -e 's/^[[:digit:]]\{3\}/(&)/g' \
          -e 's/)[[:digit:]]\{3\}/&-/g' phone.txt
    (555)555-1212
    (555)555-1213
    (555)555-1214
    (666)555-1215
    (666)555-1216
    (777)555-1217

Note: In the example above, instead of repeating the character class keyword [[:digit:]] three times, we followed it with \{3\}, which means that the preceding regular expression is matched three times.
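
Because the commands given with -e are applied to each line in the order they are written, swapping them can change the result. The sketch below illustrates this with inline phone numbers; the helper name phones is made up:

```shell
phones() { printf '%s\n' 5555551212 6665551215; }

# First add the parentheses, then insert a dash after ")" plus three digits.
in_order=$(phones | sed -e 's/^[[:digit:]]\{3\}/(&)/' \
                        -e 's/)[[:digit:]]\{3\}/&-/')

# Reversed: the dash command runs before any ")" exists, so it matches nothing.
reversed=$(phones | sed -e 's/)[[:digit:]]\{3\}/&-/' \
                        -e 's/^[[:digit:]]\{3\}/(&)/')

echo "$in_order"
echo "---"
echo "$reversed"
```

In the reversed run only the parentheses are added, because the second pattern depends on the closing parenthesis that the first command inserts.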

Back references

The ampersand metacharacter is useful, but even more useful is the ability to define specific regions in a regular expression so that you can refer back to them in the replacement string.

To use back references, you first define a region and then refer back to it. To define a region, you place backslashed parentheses, \( and \), around each region of interest. The first region that you surround with them is then referenced by \1, the second region by \2, and so on.

Suppose phone.txt contains the following text:

    (555)555-1212
    (555)555-1213
    (555)555-1214
    (666)555-1215
    (666)555-1216
    (777)555-1217

Now try the following command:

    $ cat phone.txt | sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/'
    Area code: (555) Second: 555- Third: 1212
    Area code: (555) Second: 555- Third: 1213
    Area code: (555) Second: 555- Third: 1214
    Area code: (666) Second: 555- Third: 1215
    Area code: (666) Second: 555- Third: 1216
    Area code: (777) Second: 555- Third: 1217

Note: In the example above, each regular expression inside the parentheses is referenced back by \1, \2, and so on.
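
Back references can also be used to rearrange or reformat the captured regions. The sketch below repeats the labelled output and then adds a second variant, which is an illustration rather than part of the original example: it captures only the digits and joins them with dots. The helper name phones is made up:

```shell
phones() { printf '%s\n' '(555)555-1212' '(777)555-1217'; }

# Three regions: up to ")", up to "-", and the rest of the line.
labelled=$(phones | sed 's/\(.*)\)\(.*-\)\(.*$\)/Area code: \1 Second: \2 Third: \3/')

# Tighter regions that capture only the digits, leaving the punctuation outside.
dotted=$(phones | sed 's/(\([[:digit:]]\{3\}\))\([[:digit:]]\{3\}\)-\([[:digit:]]\{4\}\)/\1.\2.\3/')

echo "$labelled"
echo "$dotted"
```

In a basic regular expression, a plain ( or ) is an ordinary character, which is why the literal parentheses of the area code need no escaping while the region markers do.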