# Shell string operations

May 23, 2021 Shell: Programming by Example

## Objective

After a busy week, the weekend has finally come and I have time to sit down and write something.

Numeric and Boolean operations were covered earlier; this time it is the turn of string operations. First, let's sort out two questions:

• What is a string?
• What do you do with strings?

Here's how the Online Xinhua Dictionary explains it:

String: short for "character string". A finite sequence of characters. Logically, it is a linear list whose data elements are characters, and it can be stored in a computer using different storage structures. Operations that can be performed on strings include taking substrings, inserting characters, deleting characters, replacing characters, and so on.

And what is a character?

Character: a symbol used in designing and running computer programs, including letters, digits, spaces, prompt symbols, and various special characters.

Seen this way, the numbers in the numeric operations and the true/false values in the Boolean operations discussed earlier are also represented by characters, just special ones, and operations on them are merely special cases of character operations. Here we study operations on ordinary characters. This is very important, because most of our everyday work consists of processing characters. These operations revolve around the two definitions above and include:

• Determining the type of a character or string: a digit, a letter, some other specific character, a printable character, or a non-printable character (such as a control character).

• Finding the number of characters in a string and how the string is structured for storage, for example as an array.

• General string operations: taking substrings, inserting characters, deleting characters, replacing characters, comparing strings, and so on.

• Some more complex and interesting string operations, with examples at the end.

## Properties of strings

### Character types

A character can be a digit, a letter, a space, or some other special character, and a string can be made of one or more kinds of them. Combined according to certain rules, they form strings with a specific meaning, such as e-mail addresses and URLs.

#### Example: Numbers or combinations of numbers

```
$ i=5;j=9423483247234;
$ echo $i | grep -q "^[0-9]$"
$ echo $?
0
$ echo $j | grep -q "^[0-9]\+$"
$ echo $?
0
```

#### Example: Character combinations (lowercase letters, capital letters, combinations of both)

```
$ c="A"; d="fwefewjuew"; e="fewfEFWefwefe"
$ echo $c | grep -q "^[A-Z]$"
$ echo $d | grep -q "^[a-z]\+$"
$ echo $e | grep -q "^[a-zA-Z]\+$"
```

#### Example: A combination of letters and numbers

```
$ ic="432fwfwefeFWEwefwef"
$ echo $ic | grep -q "^[0-9a-zA-Z]\+$"
```

#### Example: Spaces or tabs, etc

```
$ echo " " | grep " "
$ echo -e "\t" | grep "[[:space:]]"   # [[:space:]] matches both spaces and tabs
$ echo -e " \t" | grep "[[:space:]]"
$ echo -e "\t" | grep $'\t'           # the pattern must be a literal Tab, not the characters \t; $'\t' is Bash's way to write one
```

``````\$ echo "[email protected]" | grep "[0-9a-zA-Z\.]*@[0-9a-zA-Z\.]"
[email protected]``````

```
$ echo "http://news.lzu.edu.cn/article.jsp?newsid=10135" | grep "^http://[0-9a-zA-Z\./=?]\+$"
http://news.lzu.edu.cn/article.jsp?newsid=10135
```

Description:

• `/dev/null` and `/dev/zero` are interesting: like black holes, anything written to them disappears. Redirecting a command's output to `/dev/null` discards it, much as `grep -q` does.
• `[[:space:]]` is a `grep` character class that matches spaces and tabs; see `man grep` for the other classes.
• The examples above use `grep` for pattern matching; `sed` and `awk` can do pattern matching too. Regular expressions are covered in more detail later.
• To test whether a string is empty, use the `-z` option of the `test` command; see `man test`.
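The same type checks can also be done inside Bash without starting a `grep` process, using the `[[ string =~ regex ]]` test. A minimal sketch, assuming Bash 3.2 or later; the helper names `is_integer`, `is_alpha`, `is_alnum` and `is_empty` are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: each returns status 0 when the whole argument
# consists of the given character class, 1 otherwise (also for "").

is_integer() {
    local re='^[0-9]+$'          # one or more digits, nothing else
    [[ $1 =~ $re ]]
}

is_alpha() {
    local re='^[a-zA-Z]+$'       # letters only, either case
    [[ $1 =~ $re ]]
}

is_alnum() {
    local re='^[0-9a-zA-Z]+$'    # letters and digits
    [[ $1 =~ $re ]]
}

is_empty() {
    [ -z "$1" ]                  # the same check as test -z
}

# Usage: the exit status drives && and ||
is_integer 9423483247234 && echo "integer"
is_alpha fewfEFWefwefe && echo "letters only"
is_alnum 432fwfwefeFWEwefwef && echo "letters and digits"
is_empty "" && echo "empty"
```

Storing the regex in a variable avoids quoting problems: a quoted right-hand side of `=~` would be matched as a literal string.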

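#### Example: Determine whether a character is printable

```
$ echo "\t\n" | grep "[[:print:]]"      # without -e, \t and \n are ordinary printable characters
\t\n
$ echo $?
0
$ echo -e "\t\n" | grep "[[:print:]]"   # with -e they become a Tab and a newline, which are not printable
$ echo $?
1
```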

### The length of the string

Besides the types of the characters it is made of, what other property does a string have? The number of characters it contains, i.e. its length.

Below we compute the length of a string, that is, the total number of characters, and then briefly show several ways to count specific characters in it.

#### Example: Calculate the length of a string

That is, count all the characters. There are many ways to do it; pick whichever suits you best:

```
$ var="get the length of me"
$ echo ${var}     # same as $var here
get the length of me
$ echo ${#var}
20
$ expr length "$var"
20
$ echo $var | awk '{printf("%d\n", length($0));}'
20
$ echo -n $var | wc -c
20
```

#### Example: Count the occurrences of given characters

```
$ echo $var | tr -cd g | wc -c
2
$ echo -n $var | sed -e 's/[^g]//g' | wc -c
2
$ echo -n $var | sed -e 's/[^gt]//g' | wc -c
5
```
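Counting can also stay inside Bash: delete every character that is not in the set with `${var//pattern/}`, then take the length of what is left. A minimal sketch; `count_chars` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many characters of string $1 belong
# to the character set $2 (written as in a bracket expression, e.g. gt).
count_chars() {
    local str=$1 set=$2
    # ${str//[!set]/} deletes every character NOT in the set,
    # so only the characters we want to count remain.
    local kept=${str//[!$set]/}
    echo "${#kept}"
}

var="get the length of me"
count_chars "$var" g     # prints 2
count_chars "$var" gt    # prints 5
```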

#### Example: Count the number of words

```
$ echo $var | wc -w
5
$ echo "$var" | tr " " "\n" | grep get | uniq -c
      1 get
$ echo "$var" | tr " " "\n" | grep get | wc -l
1
```

Description:

The `${}` operator is a real powerhouse in Bash and can handle a great many tasks. For details, see the question on how `$(( ))`, `$( )` and `${}` differ in "Shell 13 Questions" by 网中人 (netman).

## The display of the string

Next, let's see how to control the way characters are displayed in the terminal.

### Example: Control the position, color, background, and so on of the display of characters on the screen

```
$ echo -e "\033[31;40m"                   # set the foreground color to red (31) and the background to black (40)
$ echo -e '\033[11;29H Hello, World!'     # print "Hello, World!" starting at row 11, column 29
```

### Example: The current system time is dynamically displayed somewhere on the screen

```
$ while :; do echo -e "\033[11;29H "$(date "+%Y-%m-%d %H:%M:%S"); sleep 1; done   # redraw once a second; Ctrl-C stops it
```

### Example: Filter out some control strings

Filtering out control characters with the `col` command is useful when processing the output of terminal-recording commands such as `script` and `screen`.

```
$ screen -L
$ cat /bin/cat
$ exit
$ cat screenlog.0 | col -b   # once the control characters are filtered out, a readable session log remains
```

## String storage

To us, a string is just a sequence of characters, but to make it easier to work with we often give it some structure. Here we do not care how a string is actually stored in memory, only about its logical structure. For example, the string `"get the length of me"` can be viewed in several ways:

• By the position of each character in the string

This lets us find a substring by specifying a position, which in C is usually done with pointers. In shell programming, tools such as `expr` and `awk` offer a similar way to query substrings; both also support pattern matching (`match`) and searching (`index`). This is described in more detail under string operations below.

• By separator characters that divide the string into parts

The most common separators are newlines, spaces, and tabs. Splitting by newline gives us line numbers, which feels ordinary because editors handle line breaks implicitly (on UNIX the line separator is `\n`; other systems differ, e.g. Windows uses `\r\n`). Spaces or tabs are also commonly used to separate the fields of a database-like table, which is just as familiar.

That is why there are so many excellent line-oriented tools, such as `grep`, `awk`, and `sed`. `cut` and `awk` in particular are very good at in-line processing, i.e. working within a single line (a string that no longer contains line separators).

• As an array of the parts produced by splitting, which is easier to work with

Separators are still used, but to make the parts of a split string easier to work with, we abstract them into a data structure such as an array, so that a given part can be retrieved by its subscript. Both `bash` and `awk` provide such a structure; their use is briefly described here.

### Example: Split a string into an array of strings

• Bash arrays are indexed by number, starting from 0 as in C

```
$ var="get the length of me"
$ var_arr=($var)    # store the words of var in the array var_arr; by default whitespace separates them
$ echo ${var_arr[0]} ${var_arr[1]} ${var_arr[2]} ${var_arr[3]} ${var_arr[4]}
get the length of me
$ echo ${var_arr[@]}    # the whole array; * can be used instead of @ (likewise below)
get the length of me
$ echo ${#var_arr[@]}   # as with string length, the # operator gives the number of elements
5
```

You can also assign a value directly to an array element:

```
$ var_arr[5]="new_element"
$ echo ${#var_arr[@]}
6
$ echo ${var_arr[5]}
new_element
```
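Note that `var_arr=($var)` relies on unquoted word splitting, so a word such as `*` would also undergo filename expansion. A safer sketch, assuming Bash, splits with `read -r -a` and a per-command `IFS`, and walks the indices with `${!var_arr[@]}`:

```bash
#!/usr/bin/env bash
var="get the length of me"

# Split on spaces into an array; read does no filename expansion,
# and IFS is changed only for this one command.
IFS=' ' read -r -a var_arr <<< "$var"

echo "${#var_arr[@]}"              # number of elements: 5

# ${!var_arr[@]} expands to the indices 0 1 2 3 4
for i in "${!var_arr[@]}"; do
    printf '%d:%s\n' "$i" "${var_arr[$i]}"
done

# The same technique with another separator
line="root:x:0:0"
IFS=':' read -r -a fields <<< "$line"
echo "${fields[0]}"                # root
```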

Bash also has an array-like construct, the `for i in` loop, which makes it easy to get at the parts of a string:

```
$ for i in $var; do echo -n $i"_"; done
get_the_length_of_me_
```
• `awk` arrays: note how they compare with Bash arrays

`split` splits the line on spaces, stores the parts in `var_arr`, and returns the number of elements. Note that the first element is numbered 1, not 0:

```
$ echo $var | awk '{printf("%d %s\n", split($0, var_arr, " "), var_arr[1]);}'
5 get
```

In fact, the same can be done with `awk`'s built-in line processing: `awk` splits each line into the fields `$1`, `$2`, `$3`, ..., and `$0` is the entire line.

`NF` is the number of fields in the line, similar to the array length above, so the fields can be accessed in a subscript-like way:

```
$ echo $var | awk '{printf("%d | %s %s %s %s %s | %s\n", NF, $1, $2, $3, $4, $5, $0);}'
5 | get the length of me | get the length of me
```

That is all for `awk` arrays themselves. Now look at `awk`'s `for ... in` loop; note that, unlike Bash's `for`, the loop variable `i` is not the element itself but its subscript:

```
$ echo $var | awk '{split($0, var_arr, " "); for(i in var_arr) printf("%s ",var_arr[i]);}'
of me get the length
```

Here the subscripts were visited in the order 4, 5, 1, 2, 3. As the output shows, `for ... in` does not traverse the elements in subscript order (awk does not define the traversal order), but that is fine when you simply want to visit every element.

`awk` arrays are even more powerful: their subscripts need not be numbers but can also be strings, which turns them into associative arrays. This is very convenient in some situations. For example, to replace the system-call names in one file with their addresses, looked up in the address map in another file:

```
$ cat symbol
sys_exit
sys_close
$ ls /boot/System.map*
$ # each System.map line has the form: address type symbol
$ awk '{if(FILENAME ~ "System.map") map[$3]=$1; else {printf("%s\n", map[$1])}}' \
/boot/System.map-2.6.20-16-generic symbol
c0129a80
c0177310
c0175d80
```

In addition, `awk` can delete an array element with the `delete` statement, and when a task calls for it, don't forget that `awk` also supports (emulated) two-dimensional arrays.
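Since version 4.0, Bash has associative arrays too (`declare -A`). A minimal sketch of the same lookup idea; the symbol table below is made-up sample data in the `address type name` layout of `System.map`, not real kernel addresses:

```bash
#!/usr/bin/env bash
# Associative arrays need Bash 4.0 or later.
declare -A map

# Made-up sample data in System.map layout: address type name
while read -r addr kind name; do
    map[$name]=$addr              # key: symbol name, value: address
done <<'EOF'
c0000001 T sys_exit
c0000002 T sys_close
EOF

# Look up symbols by name, as the awk script does with map[$1]
for sym in sys_exit sys_close; do
    echo "${map[$sym]}"
done

unset 'map[sys_close]'            # Bash counterpart of awk's delete
echo "${#map[@]}"                 # 1 element left
```

Quoting `'map[sys_close]'` in `unset` keeps the shell from treating the brackets as a glob pattern.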

## General string operations

String operations include taking substrings, querying substrings, inserting substrings, deleting substrings, replacing substrings, comparing strings, sorting, conversion between number bases, encoding conversion, and so on.

### Taking substrings

The main ways to take a substring are:

• By position
• By character (pattern) matching

#### Example: Taking a substring by position

That is, specify where to start and how many characters to take:

```
$ var="get the length of me"
$ echo ${var:0:3}
get
$ echo ${var:(-2)}   # and counting from the other end?
me

$ echo `expr substr "$var" 5 3`   # quote $var, otherwise expr misparses it because of the spaces
the

$ echo $var | awk '{printf("%s\n", substr($0, 9, 6))}'
length
```

`awk` splits `$var` on spaces into the fields `$1`, `$2`, `$3`, `$4`, `$5`:

```
$ echo $var | awk '{printf("%s\n", $1);}'
get
$ echo $var | awk '{printf("%s\n", $5);}'
me
```

The small `cut` utility is used much like `awk`: `-d` specifies the separator, as `-F` does in `awk`, and `-f` selects a field, like `$N` in `awk`:

```
$ echo $var | cut -d" " -f 5
me
```

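#### Example: Taking a substring by character matching

Using Bash's built-in parameter expansion:

```
$ echo ${var%% *}   # delete the longest suffix matching " *", i.e. from the first space to the end
get
$ echo ${var% *}    # delete the shortest suffix matching " *", i.e. from the last space to the end
get the length of
$ echo ${var##* }   # delete the longest prefix matching "* ", i.e. up to and including the last space
me
$ echo ${var#* }    # delete the shortest prefix matching "* ", i.e. up to and including the first space
the length of me
```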

Remove the "space + letters" combinations with `sed`:

```
$ echo $var | sed 's/ [a-z]*//g'
get
$ echo $var | sed 's/[a-z]* //g'
me
```

`sed` can print (`p`) by address (line number); just turn the spaces into newlines with `tr` first:

```
$ echo $var | tr " " "\n" | sed -n 1p
get
$ echo $var | tr " " "\n" | sed -n 5p
me
```

`tr` can also be used to take substrings, in a way similar to `#` and `%` of `${}`: it "strips off" the unwanted characters and keeps the rest:

```
$ echo $var | tr -d " "
getthelengthofme
$ echo $var | tr -cd "[a-z]"   # delete everything except lowercase letters, so the spaces are gone; note how -c and -d combine
getthelengthofme
```

Description:

• `#` and `%` delete from different ends: `#` removes a matching prefix (from the left) and `%` removes a matching suffix (from the right). Doubled (`##`, `%%`) they remove the longest match; single (`#`, `%`) the shortest. (A handy memory aid from the keyboard: `#`, `$`, `%` are three adjacent keys from left to right; `#` is to the left of `$`, so it cuts on the left, and `%` is to the right, so it cuts on the right.)
• In `tr`, `-c` means complement (invert the character set) and `-d` means delete, so `tr -cd "[a-z]"` deletes everything except lowercase letters.
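To compare the four operators side by side, here is a small sketch applying them to a file name (the name is the one from the URL example later in this section):

```bash
#!/usr/bin/env bash
file="scim-1.4.7.tar.gz"

echo "${file#*.}"    # shortest prefix matching "*." removed: 4.7.tar.gz
echo "${file##*.}"   # longest prefix matching "*." removed:  gz
echo "${file%.*}"    # shortest suffix matching ".*" removed: scim-1.4.7.tar
echo "${file%%.*}"   # longest suffix matching ".*" removed:  scim-1
```

None of the four yields the double extension `tar.gz` here, because the version number also contains dots; a pattern anchored on something more specific is needed for that.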

For extracting parts of a string, commands such as `head` and `tail` can also do interesting things: they take a given number of lines or bytes from the beginning or the end of a string. For example:

```
$ echo "abcdefghijk" | head -c 4
abcd
$ echo -n "abcdefghijk" | tail -c 4
hijk
```

### Querying substrings

A substring query can:

• Return the substring (or the line) that matches a pattern
• Return the position of the substring in the target string

Preparation: before trying the examples below, create a file `test.txt` that contains the text "consists of" on some of its lines.

#### Example: Finding the position of a substring in the target string

`expr index` returns the position of the first occurrence of any of the given characters; it searches for single characters, not for a whole substring:

```
$ var="get the length of me"
$ expr index "$var" t
3
```

In `awk`, `index` finds a literal string and `match` matches a regular expression; both return a 1-based position:

```
$ echo $var | awk '{printf("%d\n", match($0,"the"));}'
5
```
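Bash alone can also report where a substring starts: `${str%%"$sub"*}` keeps only the text before the first occurrence, and its length plus one is the position. A minimal sketch; `str_index` is a hypothetical helper that, like `awk`'s `index`, prints 0 when there is no match:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the 1-based position of the first
# occurrence of substring $2 in string $1, or 0 if it does not occur.
str_index() {
    local str=$1 sub=$2
    # Remove the longest suffix that starts with $sub; what is left is
    # the text before the first match. Quoting "$sub" makes it literal.
    local before=${str%%"$sub"*}
    if [ "$before" = "$str" ]; then
        echo 0                       # nothing was removed: no match
    else
        echo $(( ${#before} + 1 ))
    fi
}

var="get the length of me"
str_index "$var" the      # prints 5
str_index "$var" xyz      # prints 0
```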

#### Example: Finding the lines that contain a substring

`awk` and `sed` can both do this, but `grep` is best at it:

```
$ grep "consists of" test.txt   # print the lines of the file that contain "consists of"
$ grep "consists[[:space:]]of" -n -H test.txt   # print the file name, line number, and content of each matching line
$ grep "consists[[:space:]]of" -n -o test.txt   # print only the line number and the matched substring itself

$ awk '/consists of/{ printf("%s:%d:%s\n",FILENAME, FNR, $0)}' test.txt   # see? the same as grep -n -H
$ sed -n -e '/consists of/=;/consists of/p' test.txt   # sed can print line numbers too
```

Description:

• `awk`, `grep`, and `sed` all work by pattern matching, but each has its own strengths; they will be used and compared further in later sections.
• Here we treat the contents of a file as one large string. A later section is devoted to file operations, where string operations on file contents are analyzed in more depth.

### Replacing substrings

Substring replacement means replacing a given substring with another string, and it covers inserting and deleting substrings as well. For example, to insert a string before a substring, replace the substring with "new string + substring"; to delete a substring, replace it with the empty string. Some tools also provide dedicated ways to insert and delete substrings, so those are introduced separately. Note too that replacing a substring generally means first finding it (a substring query) and then replacing it; many tools are designed around exactly this.

#### Example: Replace spaces in variable var with underscores

Remember the `${}` operator? See netman's (网中人) tutorial mentioned earlier.

```
$ var="get the length of me"
$ echo ${var/ /_}    # replace the first space with an underscore
get_the length of me
$ echo ${var// /_}   # replace every space with an underscore
get_the_length_of_me
```

With `awk`: its single replacement function `sub` and global replacement function `gsub` correspond to `/` and `//`:

```
$ echo $var | awk '{sub(" ", "_", $0); printf("%s\n", $0);}'
get_the length of me
$ echo $var | awk '{gsub(" ", "_", $0); printf("%s\n", $0);}'
get_the_length_of_me
```

With `sed`: substitution is what `sed` does best:

```
$ echo $var | sed -e 's/ /_/'    # s = substitute
get_the length of me
$ echo $var | sed -e 's/ /_/g'   # see? two short commands replace the first match or every match (g = global)
get_the_length_of_me
```

Remember the `tr` command? It can replace single characters:

```
$ echo $var | tr " " "_"
get_the_length_of_me
$ echo $var | tr '[a-z]' '[A-Z]'   # this one is fun: convert all lowercase letters to uppercase
GET THE LENGTH OF ME
```
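Bash 4.0 and later can also change case directly through parameter expansion, without `tr`. A minimal sketch:

```bash
#!/usr/bin/env bash
# Case conversion by parameter expansion (Bash 4.0 or later).
var="get the length of me"

echo "${var^^}"      # every letter to upper case: GET THE LENGTH OF ME
echo "${var^}"       # first character only:       Get the length of me
upper=${var^^}
echo "${upper,,}"    # every letter to lower case: get the length of me
```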

Description: `sed` also has an interesting tagging feature (back-references), discussed below.

An interesting kind of string substitution is reversing the order of all the lines in a file, which the `tac` command does. In a sense, sorting is also a kind of string replacement.

### Inserting substrings

A substring is inserted at a given location, which may be the position of another substring or an offset from the beginning of the string. The exercises above show that the two cases are essentially the same.

Recipe for inserting a substring: replace "old substring" with "old substring + new substring" or with "new substring + old substring".

#### Example: Insert an underscore before or after a space in the var string

With `${}`:

```
$ var="get the length of me"
$ echo ${var/ /_ }    # insert a string before the given string
get_ the length of me
$ echo ${var// /_ }
get_ the_ length_ of_ me
$ echo ${var/ / _}    # insert a string after the given string
get _the length of me
$ echo ${var// / _}
get _the _length _of _me
```

No need to demonstrate the other tools again. Let's just look at how `sed` inserts characters, because its tagging feature is interesting: text matched between `\(` and `\)` is tagged, and the tags are referred to in order as `\1`, `\2`, ...:

```
$ echo $var | sed -e 's/\( \)/_\1/'
get_ the length of me
$ echo $var | sed -e 's/\( \)/_\1/g'
get_ the_ length_ of_ me
$ echo $var | sed -e 's/\( \)/\1_/'
get _the length of me
$ echo $var | sed -e 's/\( \)/\1_/g'
get _the _length _of _me
```

To check that the tags really are numbered `\1`, `\2`, ... in order, swap `\1` and `\2` in the replacement: "get" and "the" trade places, and so do "length" and "of":

```
$ echo $var | sed -e 's/\([a-z]*\) \([a-z]*\) /\2 \1 /g'
the get of length me
```
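Bash offers something similar to `sed`'s tags: after a successful `[[ string =~ regex ]]`, the parenthesized groups are stored in the `BASH_REMATCH` array. A minimal sketch that swaps the first two words only (unlike `sed`'s `g` flag, it does not repeat along the line):

```bash
#!/usr/bin/env bash
var="get the length of me"

# BASH_REMATCH[0] is the whole match; [1] and [2] are the groups.
re='^([a-z]+) ([a-z]+)'
if [[ $var =~ $re ]]; then
    echo "${BASH_REMATCH[2]} ${BASH_REMATCH[1]}"   # the get
fi
```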

`sed` also has dedicated insertion commands: `a` and `i` insert the given text after and before the matching line, respectively:

```
$ echo $var | sed '/get/a test'
get the length of me
test
$ echo $var | sed '/get/i test'
test
get the length of me
```

### Deleting substrings

Deleting a substring should be simple: isn't replacing a substring with "nothing" the same as deleting it? Let's briefly review replacement.

#### Example: Remove all spaces from the var string.

An aside: after a replacement like this, who can still make out the words? Yet Chinese is written without spaces between words, and we read it just fine. Think about it: you are already a language genius. English is not so terrible; you have the talent to learn it, as long as you put your mind to it.

Using `${}` again:

```
$ echo ${var// /}
getthelengthofme
```

Using `awk`:

```
$ echo $var | awk '{gsub(" ","",$0); printf("%s\n", $0);}'
getthelengthofme
```

Then `sed`:

```
$ echo $var | sed 's/ //g'
getthelengthofme
```

And `tr` can remove the spaces too:

```
$ echo $var | tr -d " "
getthelengthofme
```

What if you want to delete everything after the first space? Remember the `#` and `%` operators of `${}`? If not, go back to the beginning of this section and review. (In fact, deleting substrings and taking substrings are just two complementary operations: removing the substrings you do not want leaves the ones you do. It is a "binary" world, which is very interesting.)
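A common variant is deleting only leading and trailing whitespace. A minimal sketch built from `#` and `%` (the helper name `trim` is hypothetical):

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove leading and trailing whitespace from $1
# using only parameter expansion.
trim() {
    local s=$1
    # ${s%%[![:space:]]*} is just the leading whitespace; strip that prefix.
    s=${s#"${s%%[![:space:]]*}"}
    # ${s##*[![:space:]]} is just the trailing whitespace; strip that suffix.
    s=${s%"${s##*[![:space:]]}"}
    printf '%s\n' "$s"
}

trim "   get the length of me  "    # prints "get the length of me"
```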

### Comparing strings

This one is simple: remember the `test` command (`man test`)? It can check whether two strings are equal. Also, is there a relation between "are two strings equal?" and "does one string match another?" Yes: if two strings match each other exactly, they are equal. So the matching methods used above work here too.
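A short sketch of the usual comparisons, assuming Bash for the `[[ ]]` forms (`[ ]`, i.e. `test`, is the portable form for equality):

```bash
#!/usr/bin/env bash
a="get the length of me"
b="get the length of me"

# Exact equality and inequality; quote both sides inside [ ]
if [ "$a" = "$b" ]; then echo "equal"; fi
if [ "$a" != "something else" ]; then echo "different"; fi

# Inside [[ ]], an unquoted right-hand side of == is a glob pattern
if [[ $a == *length* ]]; then echo "contains 'length'"; fi

# Lexicographic order in the current locale ([[ ]] only)
if [[ apple < banana ]]; then echo "apple sorts first"; fi
```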

### Sorting

I almost forgot this important topic. Sorting is used all the time, most commonly in alphabetical or numeric order, ascending or descending. The `sort` command does this; like other line-processing commands such as `cut` and `awk`, it works line by line and lets you specify the field separator and the columns to sort on.

```
$ var="get the length of me"
$ echo $var | tr ' ' '\n' | sort      # ascending
get
length
me
of
the
$ echo $var | tr ' ' '\n' | sort -r   # descending
the
of
me
length
get
$ cat > data.txt   # type the lines below, then press Ctrl-D
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
41 45 44 44 26 44 42 20 20 38 37 25 45 45 45
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
44 20 30 39 35 38 38 28 25 30 36 20 24 32 33
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
41 33 51 39 20 20 44 37 38 39 42 40 37 50 50
46 47 48 49 50 51 52 53 54 55 56
42 43 41 42 45 42 19 39 75 17 17
$ cat data.txt | sort -k 2 -n
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
44 20 30 39 35 38 38 28 25 30 36 20 24 32 33
31 32 33 34 35 36 37 38 39 40 41 42 43 44 45
41 33 51 39 20 20 44 37 38 39 42 40 37 50 50
42 43 41 42 45 42 19 39 75 17 17
41 45 44 44 26 44 42 20 20 38 37 25 45 45 45
46 47 48 49 50 51 52 53 54 55 56
```
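For field-based data, `-t` sets the separator and `-k` the key. A minimal sketch on made-up `/etc/passwd`-style records, sorting by the third field numerically; without `n`, `12` would sort before `2` because the comparison would be textual:

```bash
#!/usr/bin/env bash
# -t ':'  use ':' as the field separator
# -k3,3n  the key starts and ends at field 3 and is compared as a number
# The records are made-up sample data.
sorted=$(sort -t ':' -k3,3n <<'EOF'
daemon:x:2:2
root:x:0:0
games:x:12:100
bin:x:1:1
EOF
)
echo "$sorted"
```

Writing `-k3,3` rather than `-k3` limits the key to field 3; `-k3` would run from field 3 to the end of the line.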

### Conversion

When letters and digits are used to represent numbers, converting between number bases becomes an issue. The `bc` command was introduced in the section on numeric computation; let's review it here:

```
$ echo "ibase=10;obase=16;10" | bc
A
```

Description: `ibase` sets the input base and `obase` the output base, so you can convert between bases however you like. Set `obase` before `ibase`: once `ibase` has changed, every later number, including the value assigned to `obase`, is read in the new base.
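Bash can also do some of these conversions itself: `printf` writes a decimal number in hexadecimal or octal, and `$(( base#digits ))` reads a number written in any base from 2 to 64. A minimal sketch (`printf` has no binary output format, so `bc` with `obase=2` remains the simple route for that):

```bash
#!/usr/bin/env bash
printf '%X\n' 10       # decimal 10 in hex:        A
printf '%o\n' 8        # decimal 8 in octal:       10
echo $(( 16#ff ))      # hex ff in decimal:        255
echo $(( 2#1010 ))     # binary 1010 in decimal:   10
```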

### Encoding conversion

What is character encoding? Rather than define it, let me ask: have you ever seen web pages full of garbled characters? That is mostly because the encoding the browser uses to display the page differs from the encoding the page actually uses. Character encoding usually means converting a sequence of printable characters into a binary representation, and character decoding does the reverse; if the two processes do not match, you get so-called garbled text.

How do you fix garbled text? By converting the encoding. On Linux, the `iconv` tool does the job. This often comes up when moving files between operating systems or between editors: under Windows, Chinese text is often encoded in `gb2312`, while under Linux it is mostly `utf8`.

```
$ nihao_utf8=$(echo "你好")
$ nihao_gb2312=$(echo $nihao_utf8 | iconv -f utf8 -t gb2312)
```

In fact, when programming with Bash, you spend most of your time working with strings, so it's important to master this section.

### Regular expressions

URLs (Uniform Resource Locators) are part of our daily life; we can hardly do without them. We do many things with them, including checking whether a URL is valid, extracting its parts (protocol, server address, port, path, etc.), and processing each part further.

Let's deal with this URL address in detail: ftp://anonymous:[email protected]/software/scim-1.4.7.tar.gz

``\$ url="ftp://anonymous:[email protected]/software/scim-1.4.7.tar.gz"``

Match the URL against a pattern to check that it is valid:

```
$ echo $url | grep "ftp://[a-z]*:[a-z]*@[a-z\./-]*"
```

Extract the protocol (server type):

```
$ echo ${url%%:*}
ftp
$ echo $url | cut -d":" -f 1
ftp
```

Extract the domain name:

```
$ tmp=${url##*@} ; echo ${tmp%%/*}
mirror.lzu.edu.cn
```

Extract the path:

```
$ tmp=${url##*@} ; echo ${tmp%/*}
mirror.lzu.edu.cn/software
```

Extract the file name:

```
$ basename $url
scim-1.4.7.tar.gz
$ echo ${url##*/}
scim-1.4.7.tar.gz
```

Extract the file type (extension):

```
$ echo $url | sed -e 's/.*[0-9].\(.*\)/\1/g'
tar.gz
```
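The steps above extract each part with a separate command. A minimal sketch that splits a URL in one pass with a regular expression and `BASH_REMATCH`; `parse_url`, the global variable names it sets, and the example URL are hypothetical, and the pattern is deliberately simplified (a port stays attached to the host, and queries are not separated):

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a URL of the form
#   scheme://[user[:password]@]host[/path]
# and store the parts in the globals scheme, user, password, host, path.
# Returns 1 if the URL does not match the pattern.
parse_url() {
    local re='^([a-z]+)://(([^:@/]+)(:([^@/]*))?@)?([^/@]+)(/.*)?$'
    [[ $1 =~ $re ]] || return 1
    scheme=${BASH_REMATCH[1]}
    user=${BASH_REMATCH[3]}        # empty when there is no user part
    password=${BASH_REMATCH[5]}
    host=${BASH_REMATCH[6]}
    path=${BASH_REMATCH[7]}        # empty when there is no path
}

# Hypothetical example URL
if parse_url "ftp://user:secret@example.com/pub/pkg-1.0.tar.gz"; then
    echo "scheme=$scheme user=$user host=$host"
    echo "path=$path file=${path##*/}"
fi
```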

#### Example: Matching a specific range of lines in a file

First, prepare a test file named `README`:

```
Chapter 7 -- Exercises

7.1 please execute the program: mainwithoutreturn, and print the return value
of it with the command "echo $?", and then compare the return of the printf
function, they are the same.

7.2 it will depend on the exection mode, interactive or redirection to a file,
if interactive, the "output" action will accur after the \n char with the line
buffer mode, else, it will be really "printed" after all of the strings have
been stayed in the buffer.

7.3 there is no another effective method in most OS. because argc and argv are
not global variables like environ.
```

Then start experimenting.

Print a given range of lines, lines 7 through 9, which is just where the answer to question 7.2 is:

```
$ sed -n 7,9p README
7.2 it will depend on the exection mode, interactive or redirection to a file,
if interactive, the "output" action will accur after the \n char with the line
buffer mode, else, it will be really "printed" after all of the strings have
```

Because the file has such a distinctive format, there is a simpler way: print from the line matching `7.2` up to the next empty line:

```
$ awk '/7.2/,/^$/ {printf("%s\n", $0);}' README
7.2 it will depend on the exection mode, interactive or redirection to a file,
if interactive, the "output" action will accur after the \n char with the line
buffer mode, else, it will be really "printed" after all of the strings have
been stayed in the buffer.
```

With this knowledge, tasks such as renaming files (for example, converting the encoding of their names) or downloading all the `pdf` documents linked from a web page become easy. Try them yourself as exercises.

### Processing formatted text

Much of the time you work with "formatted" text: text with fixed columns such as `/etc/passwd`, tree-shaped text such as the output of the `tree` command, and of course other text with a specific structure.

For processing tree-structured text, see another post I wrote earlier: "Source Analysis: Static Analysis of C Program Function Call Graphs".

In fact, as long as you grasp the characteristics of the structure and keep the specific application in mind, such text is not hard to process.

Let's take `/etc/passwd` as a concrete example. For a description of this file, see `man 5 passwd`. Below we do some useful work on this file and related files.

#### Example: Selecting given columns

Select the user name and group ID columns of `/etc/passwd`:

```
$ cat /etc/passwd | cut -d":" -f1,4
```

Select the group name and group ID columns of `/etc/group`:

```
$ cat /etc/group | cut -d":" -f1,3
```

#### Example: Joining files

What if I want to find out which group each user belongs to?

```
$ join -o 1.1,2.1 -t":" -1 4 -2 3 /etc/passwd /etc/group
root:root
bin:bin
daemon:daemon
lp:lp
pop:pop
nobody:nogroup
falcon:users
```

Description: the `join` command joins two files, somewhat like joining two tables in a database. `-t` specifies the field separator; `-1 4 -2 3` joins on column 4 of the first file and column 3 of the second, i.e. the group ID; `-o 1.1,2.1` outputs column 1 of the first file and column 1 of the second. That looks like what we want, but unfortunately the result is incomplete, because `join` expects both inputs to be sorted on the join fields. Sort them first and you will find:

```
$ cat /etc/passwd | sort -t":" -n -k 4 > /tmp/passwd
$ cat /etc/group | sort -t":" -n -k 3 > /tmp/group
$ join -o 1.1,2.1 -t":" -1 4 -2 3 /tmp/passwd /tmp/group
halt:root
operator:root
root:root
shutdown:root
sync:root
bin:bin
daemon:daemon
lp:lp
pop:pop
nobody:nogroup
falcon:users
games:users
```

This time the result is complete. So whenever you use `join`, sort both inputs on the join fields first. Strictly speaking, `join` compares the join fields as text, so a plain lexical sort such as `sort -t":" -k 4,4` matches its expectations more closely than `sort -n`; GNU `join` can warn when its input is not sorted. File joins are discussed further later on.
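Sorting can be avoided altogether by loading the group table into a Bash associative array first and then looking up each user's group ID. A minimal sketch, assuming Bash 4.0 or later; `user_groups` is a hypothetical helper that reads `/etc/passwd` and `/etc/group` by default:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print user:group for every line of a passwd file,
# looking up the primary group ID in a group file. Needs Bash 4.0+.
user_groups() {
    local passwd=${1:-/etc/passwd} group=${2:-/etc/group}
    local -A gid_name             # group ID -> group name
    local name pass uid gid rest

    # group file fields: name:password:GID:members
    while IFS=: read -r name pass gid rest; do
        [ -n "$gid" ] || continue # skip blank or malformed lines
        gid_name[$gid]=$name
    done < "$group"

    # passwd file fields: name:password:UID:GID:...
    while IFS=: read -r name pass uid gid rest; do
        [ -n "$gid" ] || continue
        printf '%s:%s\n' "$name" "${gid_name[$gid]:-?}"
    done < "$passwd"
}
```

For example, `user_groups | head -n 5` prints the first five user:group pairs. Unlike `join`, this also keeps users whose group ID has no entry in the group file, marking them with `?`.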

The examples above process given columns of formatted lines: selecting (like SQL's `select`), joining (like SQL's `join`), and sorting (like SQL's `order by`), all by splitting lines on a specified separator. There are many ways to select columns besides `cut` and `awk`; even `read`, with the separator given in `IFS`, can do it, for example:

```
$ while IFS=":" read -r C1 C2 C3 C4; do echo $C1 $C3; done < /etc/group   # IFS is set only for read, not for the whole shell
```

Once you are familiar with these techniques, your work becomes very flexible and interesting.

Here is a simple exercise: how do you convert the user-name and user-ID columns into rows? That is, convert the following data:

```
$ cat /etc/passwd | cut -d":" -f1,3 --output-delimiter=" "
root 0
bin 1
daemon 2
```

Convert to:

```
$ cat a
root    bin     daemon
0       1       2
```

And how would you convert it back? Hint: `tr`, `paste`, `split`, and similar commands can help.

Reference method:

• Forward conversion: extract the user-name column into a file `user` and the user-ID column into a file `id` (for example with `cut`), then join the two files with the `paste -s` command.
• Reverse conversion: split the result of the forward conversion into two one-line files with `split -l 1`, use `tr` to replace the separator `\t` with `\n` in each of them, and finally join the two files with the `paste` command.
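For comparison, a Bash-only sketch of both directions that avoids temporary files; `to_rows` and `to_columns` are hypothetical helper names, and the input is assumed to be one `name id` pair per line:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "name id" lines and print two
# tab-separated rows, all names first, then all ids.
to_rows() {
    local names=() ids=() name id
    while read -r name id; do
        names+=("$name")
        ids+=("$id")
    done
    local IFS=$'\t'               # "${arr[*]}" joins with the first IFS character
    printf '%s\n' "${names[*]}" "${ids[*]}"
}

# Hypothetical helper: the reverse, two tab-separated rows back to "name id" lines.
to_columns() {
    local -a names ids
    local i
    IFS=$'\t' read -r -a names    # first row
    IFS=$'\t' read -r -a ids      # second row
    for i in "${!names[@]}"; do
        printf '%s %s\n' "${names[$i]}" "${ids[$i]}"
    done
}

printf 'root 0\nbin 1\ndaemon 2\n' | to_rows
printf 'root 0\nbin 1\ndaemon 2\n' | to_rows | to_columns
```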

## Postscript

• This section was supposed to be finished last week, but I was too busy, so I wrote a "first draft" and added the concrete examples once I had time. The examples in this section are probably the most interesting ones, and all of them are worth studying carefully.
• When I finished the part above it was past 1 a.m. I checked for typos and grammar, then added a section on the storage structure of strings; now it is almost half past 2. Good night, friends.
• On the 26th, I added the two sections on conversion and encoding conversion, plus an example on URLs.