4.8. Text processing


4.8.1. iconv - Convert encoding of given files from one encoding to another
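
As a minimal sketch of plain iconv usage (the sample bytes and the helper name latin1_to_utf8 are illustrative assumptions, not part of the original text), the following converts ISO-8859-1 input to UTF-8:

```shell
# Hypothetical helper: convert ISO-8859-1 (Latin-1) text on stdin to UTF-8.
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8
}

# \351 is the Latin-1 byte for "e with acute accent";
# iconv rewrites it as the two-byte UTF-8 sequence \303\251.
converted=$(printf 'caf\351\n' | latin1_to_utf8)
printf '%s\n' "$converted"
```

For files, the same call takes a file name: iconv -f ISO-8859-1 -t UTF-8 input.txt > output.txt.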

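4.8.1.1. cconv - An iconv-based Simplified/Traditional Chinese conversion tool

cconv is built on top of iconv. It converts directly between UTF-8 encoded Simplified and Traditional Chinese, and adds phrase-level conversion on top of character-level conversion.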

sudo apt-get install cconv

To convert Simplified Chinese to Traditional Chinese with cconv:

cconv -f UTF8-CN -t UTF8-HK zh-cn.txt -o zh-hk.txt

4.8.1.2. uconv - convert data from one encoding to another

Installation

sudo apt-get install libicu-dev

Example

$ uconv -f cp1252 -t UTF-8 -o file_in_utf8.txt file_in_cp1252_encoding.txt

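4.8.2. expr - string processing

A brief introduction to the string-processing command expr:
Name: expr
Purpose: evaluate an expression.
Syntax: expr Expression
Examples:
Example 1: string length
shell>> expr length "this is a test content"
22
Example 2: remainder
shell>> expr 20 % 9
2
Example 3: extract a substring (start position 3, length 5; positions are 1-based)
shell>> expr substr "this is a test content" 3 5
is is
Example 4: position of the first occurrence of a character
shell>> expr index "testforthegame" s
3
Example 5: treat a token as a literal string (the quote keyword belongs to older GNU expr; current GNU coreutils appears to use expr + TOKEN for this)
shell>> expr quote thisisatestformela
thisisatestformela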
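
The length, substr and index keywords are GNU extensions rather than POSIX expr features. As a hedged sketch (the variable names are illustrative assumptions), the first three examples above can also be written with POSIX shell features and cut:

```shell
s="this is a test content"

len=${#s}                                # string length, like: expr length "$s"
rem=$((20 % 9))                          # remainder, like: expr 20 % 9
sub=$(printf '%s\n' "$s" | cut -c3-7)    # 5 characters from position 3, like: expr substr "$s" 3 5

printf '%s\n' "$len" "$rem" "$sub"
```

These forms avoid starting an expr process for simple arithmetic and length checks.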

4.8.3. cat - concatenate files and print on the standard output

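-b	Number non-blank output lines only.
-e	Display a $ character at the end of each line.
-n	Number all output lines, starting from 1.
-q	Operate quietly, suppressing error messages (not a GNU cat option).
-r	Replace runs of blank lines with a single blank line, "squeezing" them (not a GNU cat option; GNU cat uses -s).
-t	Display tab characters as ^I.
-u	Do not buffer output.
-v	Display non-printing control characters visibly.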

4.8.3.1. -s, --squeeze-blank suppress repeated empty output lines

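-S squeezes multiple blank lines into a single blank line (same as -r). GNU cat spells this option -s, which is what the example below uses.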

			
$ cat >> /tmp/test <<EOF
Line1

Line2


Line3




Line4


Line5

EOF

$ cat -s /tmp/test
Line1

Line2

Line3

Line4

Line5

4.8.3.2. -v, --show-nonprinting use ^ and M- notation, except for LFD and TAB

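Displays control characters in visible form (TAB and line feed are left as they are; use -T to show tabs). The example below checks which kind of line ending a file uses: a trailing ^M means the lines end in CR LF.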

			
[neo@netkiller ~]# cat -v file.txt
GRANT USAGE ON *.* TO 'esauser'@'localhost' IDENTIFIED BY xxxxxxx; ^M
^M
file^M
2059^M
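
Once cat -v has revealed ^M characters, they can be removed with tr. This is a minimal sketch; the helper name strip_cr and the sample input are assumptions made for illustration:

```shell
# Hypothetical helper: delete every carriage return (shown as ^M by cat -v) from stdin.
strip_cr() {
    tr -d '\r'
}

cleaned=$(printf 'file\r\n2059\r\n' | strip_cr)
printf '%s\n' "$cleaned"
```

For a real file this would be used as strip_cr < file.txt > file.unix.txt.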

4.8.3.3. Using cat with pipes

			
[log@logging tmp]$ cat <<EOF | grep 'm'
İsmail
Ahmet
Ali
Elif
Mehmet
EOF
İsmail
Ahmet
Mehmet

Multiple pipes

			
cat <<EOF | grep 'm' | tee matched_names.txt
İsmail
Ahmet
Ali
Elif
Mehmet
EOF

4.8.4. nl - number lines of files

$ nl /etc/issue
     1  CentOS release 5.4 (Final)
     2  Kernel \r on an \m

4.8.5. tr - translate or delete characters

		
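[:alnum:] : all letters and digits
[:alpha:] : all letters
[:blank:] : all horizontal whitespace
[:cntrl:] : all control characters
[:digit:] : all digits
[:graph:] : all printable characters, not including space
[:lower:] : all lowercase letters
[:print:] : all printable characters, including space
[:punct:] : all punctuation characters
[:space:] : all horizontal and vertical whitespace
[:upper:] : all uppercase letters
[:xdigit:] : all hexadecimal digits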

4.8.5.1. Replacing characters

Replace ":" with "\n"

			
$ cat /etc/passwd |tr ":" "\n"

			
[root@gitlab ~]# echo "/opt/netkiller.cn/www.netkiller.cn" | tr -- '/.' ':-'
:opt:netkiller-cn:www-netkiller-cn
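
A common use of the same one-to-one replacement is splitting a colon-separated list into lines. The sketch below uses a made-up path_list value rather than the real $PATH so that the result is predictable:

```shell
path_list="/usr/local/bin:/usr/bin:/bin"

# Each ":" becomes a newline, giving one directory per line.
entries=$(printf '%s\n' "$path_list" | tr ':' '\n')
count=$(printf '%s\n' "$entries" | wc -l | tr -d ' ')

printf '%s\n' "$entries"
printf 'entries: %s\n' "$count"
```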

4.8.5.2. Converting letter case

Use tr '[:lower:]' '[:upper:]' to replace lowercase letters with uppercase letters

			
[root@localhost ~]# echo "Helloworld" | tr '[:lower:]' '[:upper:]'
HELLOWORLD

Converting a whole block of text

						
[root@localhost ~]# cat /etc/redhat-release 
CentOS Linux release 7.5.1804 (Core) 

[root@localhost ~]# cat /etc/redhat-release | tr '[:lower:]' '[:upper:]'
CENTOS LINUX RELEASE 7.5.1804 (CORE)

			
[root@localhost ~]# echo "Netkiller" |  tr 'a-z' 'A-Z'
NETKILLER

			
neo@MacBook-Pro-M2 ~> uuidgen
71386AEE-C468-44E1-A0A3-FB4EBB4600AA

neo@MacBook-Pro-M2 ~> uuidgen | tr '[:upper:]' '[:lower:]'
3d807c48-ef5f-4297-869f-120cb713f752

4.8.5.3. [CHAR*] and [CHAR*REPEAT]

[root@localhost ~]# echo "1234567890" | tr '1-5' '[A*]'
AAAAA67890

[root@localhost ~]# echo "1234567890" | tr '1-9' '[A*5]BCDE'
AAAAABCDE0
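
The [CHAR*] form pads the second set with CHAR until it is as long as the first set, which makes it handy for masking. A small sketch with an invented sample string:

```shell
# Every digit in SET1 maps to "#", because [#*] repeats "#" to the length of SET1.
masked=$(printf '%s\n' "card 1234-5678" | tr '0-9' '[#*]')
printf '%s\n' "$masked"
```

Characters that are not in the first set, such as the hyphen and the letters, pass through unchanged.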

4.8.5.4. -s, --squeeze-repeats replace each input sequence of a repeated character that is listed in SET1 with a single occurrence of that character

Squeeze runs of a repeated character into a single occurrence

			
[root@localhost ~]# echo "My      nickname  is      netkiller." | tr -s ' '
My nickname is netkiller.	

[root@localhost ~]# echo "aaaabbbbccccdddd." | tr -s 'a'
abbbbccccdddd.

[root@localhost ~]# echo "aaaabbbbccccdddd." | tr -s 'a-z'
abcd.

4.8.5.5. -d, --delete delete characters in SET1, do not translate

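Delete characters

[root@localhost ~]# echo "My nickname is netkiller" | tr -d ' '
Mynicknameisnetkiller

[root@localhost ~]# md5sum /etc/issue | tr -d '0-9'
ffedfcfbdcaebdec  /etc/issue

Delete control characters (note that [:cntrl:] includes newline and tab, so the output becomes one long line)

[root@netkiller ~]# cat file | tr -d '[:cntrl:]'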

4.8.6. cut - remove sections from each line of files

Field (column) operations

$ last | grep  'neo' | cut -d ' ' -f1

$ cat /etc/passwd | cut -d ':' -f1
root
daemon
bin
sys
sync
games
man
lp
mail
news
uucp
proxy

$ cat /etc/passwd | cut -d ':' -f1,3,4

# cat /etc/passwd | cut -d ':' -f1,6
root:/root
bin:/bin
daemon:/sbin
adm:/var/adm
lp:/var/spool/lpd
sync:/sbin
shutdown:/sbin
halt:/sbin
mail:/var/spool/mail
uucp:/var/spool/uucp
operator:/root
games:/usr/games
gopher:/var/gopher
ftp:/var/ftp
nobody:/
vcsa:/dev
saslauth:/var/empty/saslauth
postfix:/var/spool/postfix
sshd:/var/empty/sshd
rpc:/var/cache/rpcbind
rpcuser:/var/lib/nfs
nfsnobody:/var/lib/nfs
ntp:/etc/ntp
nagios:/var/log/nagios

Character operations (by position within each line)

$ cat /etc/passwd | cut -c 1-4
root
daem
bin:
sys:
sync
game
man:

$ echo "No such file or directory"| cut -c4-7
such

$ echo "No such file or directory"| cut -c -8
No such

$ echo "No such file or directory"| cut -c-8
No such
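
One detail worth knowing is that cut always prints the selected fields in their original order, whatever order is given to -f. The sketch below (sample record and variable names are assumptions) contrasts this with awk, which can reorder fields:

```shell
record="root:x:0:0:root:/root:/bin/bash"

# cut ignores the order in -f7,1 and prints field 1 before field 7.
by_cut=$(printf '%s\n' "$record" | cut -d ':' -f7,1)

# awk prints the fields in the order they are named.
by_awk=$(printf '%s\n' "$record" | awk -F ':' -v OFS=':' '{print $7, $1}')

printf '%s\n' "$by_cut" "$by_awk"
```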

4.8.7. printf - format and print data

printf "%d\n" 1234

$ printf "\033[1;33m TEST COLOR \n\033[m"
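
Beyond plain %d, printf accepts width and alignment flags. A minimal sketch with invented values, building one fixed-width row:

```shell
# %-10s : left-align the string in 10 columns
# %5d   : right-align the integer in 5 columns
# %x    : print the integer in lowercase hexadecimal
row=$(printf '%-10s|%5d|%x' "name" 42 255)
printf '%s\n' "$row"
```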

4.8.8. recode - converts files between various character sets and surfaces

Following will convert text files between DOS, Mac, and Unix line ending styles:

		
$ recode /cl../cr <dos.txt >mac.txt
$ recode /cr.. <mac.txt >unix.txt
$ recode ../cl <unix.txt >dos.txt

4.8.9. /dev/urandom - random strings

		
[neo@test .deploy]$ echo `< /dev/urandom tr -dc 'A-Za-z0-9' | head -c 8`
GidAuuNN
[neo@test .deploy]$ echo `< /dev/urandom tr -dc 'A-Za-z0-9' | head -c 8`
UyGaWSKr

I often use random strings like these as initial passwords.

		
[neo@test .deploy]$ echo `< /dev/urandom tr -dc '[:alnum:]' | head -c 8`
xig8Meym
[neo@test .deploy]$ echo `< /dev/urandom tr -dc '[:alnum:]' | head -c 8`
23Ac1vZg
[neo@test .deploy]$ echo `< /dev/urandom tr -dc '[:digit:]' | head -c 8`
73652314
[neo@test .deploy]$ echo `< /dev/urandom tr -dc '[:graph:]' | head -c 8`
GO_o>OnJ
[neo@test .deploy]$ echo `< /dev/urandom tr -dc '[:graph:]' | head -c 10`
iGy0FS/aO5
[neo@test .deploy]$ echo `< /dev/urandom tr -dc '[:graph:]' | head -c 50`
;`E^{5(T4v~5$YovW.?%_?9la<`+qPcRh@7mD\!Whx;MJZVQ\K
[neo@test .deploy]$ echo `< /dev/urandom tr -dc '[:print:]' | head -c 50`
fy$[#:'(')jt'gp1/g-)d~p]8 :r9i;MO2d!8M<?Qs3t:QgK$O
[neo@test .deploy]$ echo `< /dev/urandom tr -dc '[:graph:]' | head -c 50`
6SivJ5y$/FTi8mf}rrqE&s0"WkA}r;uK-=MT!Wp0UlL_lF0|bL

Batch generation

for i in {1..10}
do
echo `< /dev/urandom tr -dc 'A-Za-z0-9' | head -c 8`
done
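
The one-liners above can be wrapped in a small function. This is a sketch under stated assumptions: the name random_password and its length argument are hypothetical, and LC_ALL=C is added so that tr treats the random input as plain bytes on systems where the locale would otherwise matter:

```shell
# Hypothetical helper: print a random alphanumeric string.
# $1 = desired length (defaults to 8 when omitted).
random_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-8}"
}

pw=$(random_password 12)
printf '%s\n' "$pw"
```

Calling random_password without an argument gives the 8-character form used in the examples above.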

		
# cat /dev/urandom | tr -cd '[:alnum:]' | fold -w30 | head -n 20
AVqROzjF6ZATJGv2J6PzDHp3jLpKV4
ONt68UFNDwgXpSnLBV7oRDX3VLRYsX
EZTWCGvZc3mIEeuw9sxMtV8ZkzVRJv
BhUiv0a7utsjZFLYpKGZrY5aDXcZL4
5YfUl2hmDT1O9X61DRYg4wSp4lXoXX
ykyPJxH47PzxnNGlujIUF98ZtB01H0
QyP53mksQN8bCNNo1fSD3RtqhhEGfa
u2RkT1M9GUQF4a6O18tG5WD97OOXze
Whm5X7398Q8L9BONN8k2oLy9CL37JO
TmGQz7WB6WnkjhyB4wrBHBJ3HMIRyf
hww43yvddUDYUnbNOKjhv3sLhCA4YD
uY6zQtBC6miwLUl3jkCVVA0Xu8ASgj
jv58qu46VW7LvRIq4txNE8bG9NBlZl
pzaMkydAiCHCF5H2oQVqMn4DTTYgNL
yoN2A9LyrCwLfjP1ad9HMAwxExJL5i
J27iy2L90m9dpcPLJ8tl46GGb9xqmQ
6YwFCvuPHyyEwnctUTpqLFcvUafVZ2
Nuq9XgIgRQGynjlVqGLMOpO0MkGpsn
tChkRG7eoRuKVXgW7ccTGx45E54K3Y
qPv48XqdGlOrdULCOGZ45kwJ1v5kVX

4.8.10. col - filter reverse line feeds from input

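Remove ^M (carriage return) characters. Note that col -b also replaces runs of spaces with tabs where possible unless -x is given.

$ cat oldfile | col -b > newfile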

4.8.11. apg - generates several random passwords

sudo apt-get install apg

$ apg

Please enter some random data (only first 16 are significant)
(eg. your old password):>
imlogNukcel5 (im-log-Nuk-cel-FIVE)
Drocdaf1 (Droc-daf-ONE)
fagJook0 (fag-Jook-ZERO)
heabugJer4 (heab-ug-Jer-FOUR)
5OsEsudy (FIVE-Os-Es-ud-y)
IrjOgneagOc9 (Irj-Og-neag-Oc-NINE)


$ apg -M SNCL -m 16
WoidWemFut6dryn,
byRowpEus-Flutt0
|QuogCagFaycsic0
ojHoadCyct4Freg_
Vir9blir`orhohoo
bapOip?Ibreawov2

4.8.12. head/tail

head -c 17 | tail -c 1
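
The pipeline above picks out a single byte: head -c 17 keeps the first 17 bytes and tail -c 1 keeps the last of those. A small sketch with an assumed sample string:

```shell
alphabet="abcdefghijklmnopqrstuvwxyz"

# Keep bytes 1..17, then keep only the final byte of that prefix: the 17th byte.
nth=$(printf '%s' "$alphabet" | head -c 17 | tail -c 1)
printf '%s\n' "$nth"
```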

4.8.12.1. Colored output

printf '\033[0;31m'; tail /etc/passwd; printf '\033[0m'

			
tail -f example.log | sed \
-e "s/FATAL/"$'\e[31m'"&"$'\e[m'"/" \
-e "s/ERROR/"$'\e[31m'"&"$'\e[m'"/" \
-e "s/WARNING/"$'\e[33m'"&"$'\e[m'"/" \
-e "s/INFO/"$'\e[32m'"&"$'\e[m'"/" \
-e "s/DEBUG/"$'\e[34m'"&"$'\e[m'"/"

4.8.12.2. Skip the first n lines and print the rest

First, look at the contents of the source file

			
[root@netkiller ~]# head -n 5 /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

Now skip the first line and show everything after it

			
[root@netkiller ~]# tail -n +2 /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/sbin/nologin
systemd-coredump:x:999:996:systemd Core Dumper:/:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
tss:x:59:59:Account used for TPM access:/dev/null:/sbin/nologin
polkitd:x:998:995:User for polkitd:/:/sbin/nologin
sssd:x:997:994:User for sssd:/:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/usr/share/empty.sshd:/sbin/nologin
systemd-oom:x:992:992:systemd Userspace OOM Killer:/:/usr/sbin/nologin
mysql:x:27:27:MySQL Server:/var/lib/mysql:/sbin/nologin
chrony:x:991:991::/var/lib/chrony:/sbin/nologin
docker:x:990:990:Container Administrator:/home/docker:/bin/bash

4.8.12.3. Trim n lines from the end

			
[root@netkiller ~]# nmap -F 121.196.46.109
Starting Nmap 7.91 ( https://nmap.org ) at 2022-08-01 14:52 CST
Nmap scan report for 121.196.46.109
Host is up (0.016s latency).
Not shown: 97 filtered ports
PORT     STATE  SERVICE
113/tcp  closed ident
2000/tcp open   cisco-sccp
5060/tcp open   sip

Nmap done: 1 IP address (1 host up) scanned in 4.38 seconds
[root@netkiller ~]# nmap -F 121.196.46.109 | tail -n +5
PORT     STATE  SERVICE
113/tcp  closed ident
2000/tcp open   cisco-sccp
5060/tcp open   sip

Nmap done: 1 IP address (1 host up) scanned in 1.83 seconds
[root@netkiller ~]# nmap -F 121.196.46.109 | tail -n +5 | head -n -1
PORT     STATE  SERVICE
113/tcp  closed ident
2000/tcp open   cisco-sccp
5060/tcp open   sip
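
The same head/tail combination can be tried on predictable input. The sketch below assumes GNU head, since the negative count in head -n -2 is a GNU extension, and uses seq only to generate sample lines:

```shell
# tail -n +3 starts output at line 3; head -n -2 drops the last two lines.
middle=$(seq 1 10 | tail -n +3 | head -n -2)
printf '%s\n' "$middle"
```

Of the ten input lines, only lines 3 through 8 remain.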

4.8.13. Reversing strings or file contents

rev - reverse lines of a file or files

Reverse a string

# echo hello | rev
olleh

# echo "hello world" | rev
dlrow olleh

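Reverse the contents of a file (each line is reversed character by character; the order of the lines is unchanged)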

# rev /etc/passwd
hsab/nib/:toor/:toor:0:0:x:toor
nigolon/nibs/:nib/:nib:1:1:x:nib
nigolon/nibs/:nibs/:nomead:2:2:x:nomead
nigolon/nibs/:mda/rav/:mda:4:3:x:mda
nigolon/nibs/:dpl/loops/rav/:pl:7:4:x:pl
cnys/nib/:nibs/:cnys:0:5:x:cnys
nwodtuhs/nibs/:nibs/:nwodtuhs:0:6:x:nwodtuhs
tlah/nibs/:nibs/:tlah:0:7:x:tlah
nigolon/nibs/:liam/loops/rav/:liam:21:8:x:liam
nigolon/nibs/:pcuu/loops/rav/:pcuu:41:01:x:pcuu
nigolon/nibs/:toor/:rotarepo:0:11:x:rotarepo
nigolon/nibs/:semag/rsu/:semag:001:21:x:semag
nigolon/nibs/:rehpog/rav/:rehpog:03:31:x:rehpog
nigolon/nibs/:ptf/rav/:resU PTF:05:41:x:ptf
nigolon/nibs/:/:ydoboN:99:99:x:ydobon
nigolon/nibs/:ved/:renwo yromem elosnoc lautriv:96:96:x:ascv
nigolon/nibs/:ptn/cte/::83:83:x:ptn
nigolon/nibs/:htualsas/ytpme/rav/:"resu dhtualsaS":67:994:x:htualsas
nigolon/nibs/:xiftsop/loops/rav/::98:98:x:xiftsop
nigolon/nibs/:dhss/ytpme/rav/:HSS detarapes-egelivirP:47:47:x:dhss
hsab/nib/:lqsym/bil/rav/:revres LQSyM:994:894:x:lqsym
hsab/nib/:www/:noitacilppA beW:08:08:x:www
nigolon/nibs/:xnign/ehcac/rav/:resu xnign:894:794:x:xnign

4.8.14. Handling tabs and spaces

4.8.14.1. expand - convert tabs to spaces

Convert TAB characters to spaces

root@netkiller /var/log % yum --showduplicates list httpd | expand
Repository epel is listed more than once in the configuration
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Available Packages
httpd.x86_64                    2.4.6-67.el7.centos                      os     
httpd.x86_64                    2.4.6-67.el7.centos.2                    updates

4.8.14.2. unexpand - convert spaces to tabs

Convert spaces to TAB characters

root@netkiller /var/log % cat /etc/fstab | unexpand -t 16
/dev/vda1	     /	          ext3	     noatime,acl,user_xattr 1 1
proc	     /proc	          proc	     defaults	           0 0
sysfs	     /sys	          sysfs	     noauto	           0 0
debugfs	     /sys/kernel/debug    debugfs    noauto	           0 0
devpts	     /dev/pts	          devpts     mode=0620,gid=5       0 0

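With -t 16, tab stops are placed every 16 columns, and a run of blanks that reaches a tab stop is replaced by one TAB (in GNU unexpand, -t also enables -a, so blanks anywhere in the line are converted, not just leading ones).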
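
A minimal sketch of both directions on invented input, which makes the tab-stop behavior easier to see than the package and fstab listings:

```shell
# expand: the tab after "a" advances to the next tab stop at column 4,
# so it becomes three spaces.
expanded=$(printf 'a\tb\n' | expand -t 4)

# unexpand: by default only leading blanks are converted,
# and eight leading spaces reach the default tab stop at column 8.
compressed=$(printf '        x\n' | unexpand)

printf '%s\n' "$expanded"
printf '%s\n' "$compressed" | cat -A
```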

4.8.15. grep, egrep, fgrep, rgrep - print lines matching a pattern

	
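The Linux grep command (the name comes from "global regular expression print") searches files for strings or regular expressions.
grep looks for lines that match the given pattern and, by default, prints every matching line. If no file name is given, or if the file name is -, grep reads from standard input.

Syntax: grep [options] pattern [files]
or
grep [-abcEFGhHilLnqrsvVwxy][-A<lines>][-B<lines>][-C<lines>][-d<action>][-e<pattern>][-f<pattern file>][--help][pattern][file or directory...]
pattern - the string or regular expression to search for.
files - the files to search; several may be given. If files is omitted, grep reads from standard input.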

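Common options:
-i: ignore case when matching.
-v: invert the match; print only the lines that do not match.
-n: show the line number of each matching line.
-r: search the files in subdirectories recursively.
-l: print only the names of files that match.
-c: print only the count of matching lines.
More options:
-a or --text : process binary files as if they were text.
-A<lines> or --after-context=<lines> : print each matching line plus the given number of lines after it.
-b or --byte-offset : print the byte offset of each output line in front of the line.
-B<lines> or --before-context=<lines> : print each matching line plus the given number of lines before it.
-c or --count : count the matching lines.
-C<lines> or --context=<lines> or -<lines> : print each matching line plus the given number of lines before and after it.
-d <action> or --directories=<action> : how to handle a directory given as input: read, skip or recurse.
-e<pattern> or --regexp=<pattern> : use the given string as the search pattern.
-E or --extended-regexp : interpret the pattern as an extended regular expression.
-f<file> or --file=<file> : read the patterns from the given file, one pattern per line.
-F or --fixed-strings : interpret the pattern as a list of fixed strings.
-G or --basic-regexp : interpret the pattern as a basic regular expression (the default).
-h or --no-filename : do not prefix output lines with the file name.
-H or --with-filename : prefix each output line with the file name.
-i or --ignore-case : ignore case distinctions.
-l or --files-with-matches : list the names of files whose contents match the pattern.
-L or --files-without-match : list the names of files whose contents do not match the pattern.
-n or --line-number : prefix each output line with its line number.
-o or --only-matching : print only the part of a line that matches PATTERN.
-q or --quiet or --silent : print nothing; only the exit status is set.
-r or --recursive : same effect as "-d recurse".
-s or --no-messages : suppress error messages.
-v or --invert-match : print all lines that do not contain a match.
-V or --version : print version information.
-w or --word-regexp : match only whole words.
-x or --line-regexp : match only whole lines.
-y : same effect as "-i".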

4.8.15.1. Removing blank lines

$ cat file | grep '.'
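
grep '.' keeps any line with at least one character, so lines that hold only spaces or tabs survive. A hedged variant that also drops whitespace-only lines, shown on invented input:

```shell
# -v inverts the match: lines consisting solely of whitespace (or nothing) are removed.
nonblank=$(printf 'one\n\n   \ntwo\n' | grep -v '^[[:space:]]*$')
printf '%s\n' "$nonblank"
```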

4.8.15.2. -v, --invert-match

grep -v "grep"

[root@development ~]# ps ax | grep httpd
 6284 ?        Ss     0:10 /usr/local/httpd-2.2.14/bin/httpd -k start
 8372 ?        S      0:00 perl ./wrapper.pl -chdir -name httpd -class com.caucho.server.resin.Resin restart
19136 ?        S      0:00 /usr/local/httpd-2.2.14/bin/httpd -k start
19749 pts/1    R+     0:00 grep httpd
31530 ?        Sl     0:57 /usr/local/httpd-2.2.14/bin/httpd -k start
31560 ?        Sl     1:12 /usr/local/httpd-2.2.14/bin/httpd -k start
31623 ?        Sl     1:06 /usr/local/httpd-2.2.14/bin/httpd -k start
[root@development ~]# ps ax | grep httpd | grep -v grep
 6284 ?        Ss     0:10 /usr/local/httpd-2.2.14/bin/httpd -k start
 8372 ?        S      0:00 perl ./wrapper.pl -chdir -name httpd -class com.caucho.server.resin.Resin restart
19136 ?        S      0:00 /usr/local/httpd-2.2.14/bin/httpd -k start
31530 ?        Sl     0:57 /usr/local/httpd-2.2.14/bin/httpd -k start
31560 ?        Sl     1:12 /usr/local/httpd-2.2.14/bin/httpd -k start
31623 ?        Sl     1:06 /usr/local/httpd-2.2.14/bin/httpd -k start
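
An alternative to the extra grep -v grep is to wrap one character of the pattern in brackets, as in ps ax | grep '[h]ttpd'. The grep process then shows up in ps as "grep [h]ttpd", which the pattern httpd does not match. The sketch below simulates this on two invented ps-style lines so the effect can be checked without a running httpd:

```shell
sample='19749 grep [h]ttpd
6284 /usr/local/httpd-2.2.14/bin/httpd -k start'

# [h]ttpd is a regex for the text "httpd"; the literal text "[h]ttpd" does not contain it.
matched=$(printf '%s\n' "$sample" | grep '[h]ttpd')
printf '%s\n' "$matched"
```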

4.8.15.3. Output control

Show line numbers

			
[root@localhost ~]# grep -n 'ftp' /etc/passwd
12:ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin

-o, --only-matching show only the part of a line matching PATTERN

			
$ curl -s http://www.example.com | egrep -o '<a href="(.*)">.*</a>' | sed -e 's/.*href="\([^"]*\)".*/\1/'

$ mysqlshow | egrep -o "|\w(.*)\w|"
Databases
information_schema
test

			
$ cat file.html | grep -o \
    -P '\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))'

$ cat file.html | grep -o -E 'href="([^"#]+)"'

$ cat sss.html | grep -o -E 'thunder://([^<]+)'

			
neo@MacBook-Pro ~/project % cat WikiTest.java  | grep '@Api'
    @Api(method = GET, uri = "/project/:projectName/wikis/page")
    @Api(method = POST, uri = "/project/:projectName/wiki")
    @Api(method = POST, uri = "/project/:projectName/wiki")
    @Api(method = POST, uri = "/project/:projectName/wiki")

neo@MacBook-Pro ~/project % cat WikiTest.java  | egrep -o 'method\s=\s.+,\suri\s=\s.+"'
method = GET, uri = "/project/:projectName/wikis/page"
method = POST, uri = "/project/:projectName/wiki"
method = POST, uri = "/project/:projectName/wiki"
method = POST, uri = "/project/:projectName/wiki"

IP addresses

# grep rhost /var/log/secure | grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b"

UUID

				
neo@MacBook-Pro ~ % curl -s -X POST --user 'api:secret' -d 'grant_type=password&username=netkiller@msn.com&password=123456' http://localhost:8080/oauth/token | grep -o -E '"access_token":"([0-9a-f-]+)"'
"access_token":"863ef5df-6448-40a6-8809-f6f4b680689b"

Splitting a string into one character per line

				
$ grep -o . <<< "Helloworld"
H
e
l
l
o
w
o
r
l
d

Recursive operations

Recursive search

$ sudo grep -r 'neo' /etc/*

Recursive replace

for file in $(grep -rl '8800.org' * | grep -v '\.svn'); do
    echo "item: $file"
    [ -f "$file" ] && sed -e 's/8800\.org/sf.net/g' -e 's/netkiller/neo/g' "$file" > "$file.bak" && cp "$file.bak" "$file"
done
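
A loop over $(grep -rl ...) splits file names on whitespace. As a sketch under stated assumptions (GNU grep -Z, GNU xargs -0 -r and GNU sed -i; the temporary directory and file names are invented for the demonstration), the same recursive replace can be done with NUL-separated names:

```shell
workdir=$(mktemp -d)
printf 'visit 8800.org\n' > "$workdir/a.txt"
printf 'nothing here\n'   > "$workdir/b.txt"

# -Z and -0 pass NUL-terminated file names, so spaces in paths are safe;
# -r keeps xargs from running sed when grep found no files.
grep -rlZ '8800\.org' "$workdir" | xargs -0 -r sed -i 's/8800\.org/sf.net/g'

replaced=$(cat "$workdir/a.txt")
untouched=$(cat "$workdir/b.txt")
rm -rf "$workdir"

printf '%s\n' "$replaced" "$untouched"
```

Because sed -i edits in place, no .bak copy step is needed; sed -i.bak would keep a backup if one is wanted.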

-c, --count print only a count of matching lines per FILE

$ cat /etc/resolv.conf
nameserver localhost
nameserver 208.67.222.222
nameserver 208.67.220.220
nameserver 202.96.128.166
nameserver 202.96.134.133
$ grep -c nameserver /etc/resolv.conf
5

# grep -c GET /www/logs/access.log
188460

# grep -c POST /www/logs/access.log
421

binary file matches

			
log@logging ~/netkiller> grep '1052302282228360003' spring.2023-02-28.log
grep: spring.2023-02-28.log: binary file matches

Although this is a text file, it contains some binary content, which makes grep treat it as a binary file.

Solution: -a, --text equivalent to --binary-files=text

		
log@logging ~/netkiller> grep -a '1052302282228360003' spring.2023-02-28.log

4.8.15.4. Context control

-A, --after-context=NUM print NUM lines of trailing context

Print the matching line and the N lines after it

# grep -A1 game /etc/passwd
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin

# grep -A2 game /etc/passwd
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin

-B, --before-context=NUM print NUM lines of leading context

Print the matching line and the N lines before it

# grep -B1 game /etc/passwd
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin

# grep -B2 game /etc/passwd
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin

-C, --context=NUM print NUM lines of output context

-NUM same as --context=NUM

neo@neo-OptiPlex-380:~$ grep -C 1 new /etc/passwd
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh

neo@neo-OptiPlex-380:~$ grep -C 5 new /etc/passwd
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh

# grep -3 game /etc/passwd
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin

--color

# grep --color root  /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

The --color option can be enabled through aliases

alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'

Add them to .bashrc and they take effect automatically every time the user logs in

			
# enable color support of ls and also add handy aliases
if [ -x /usr/bin/dircolors ]; then
    test -r ~/.dircolors && eval "$(dircolors -b ~/.dircolors)" || eval "$(dircolors -b)"
    alias ls='ls --color=auto'
    #alias dir='dir --color=auto'
    #alias vdir='vdir --color=auto'

    alias grep='grep --color=auto'
    alias fgrep='fgrep --color=auto'
    alias egrep='egrep --color=auto'
fi

4.8.15.5. Regexp selection and interpretation

Lines beginning with n

$ grep '^n' /etc/passwd
news:x:9:9:news:/var/spool/news:/bin/sh
nobody:x:65534:65534:nobody:/nonexistent:/bin/sh
neo:x:1000:1000:neo chan,,,:/home/neo:/bin/bash
nagios:x:116:127::/var/run/nagios2:/bin/false

Lines ending with bash

$ grep 'bash$' /etc/passwd
root:x:0:0:root:/root:/bin/bash
neo:x:1000:1000:neo chan,,,:/home/neo:/bin/bash
postgres:x:114:124:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
cvsroot:x:1001:1001:cvsroot,,,,:/home/cvsroot:/bin/bash
svnroot:x:1002:1002:subversion,,,,:/home/svnroot:/bin/bash

Lines containing root

$ grep '.*root' /etc/passwd
root:x:0:0:root:/root:/bin/bash
cvsroot:x:1001:1001:cvsroot,,,,:/home/cvsroot:/bin/bash
svnroot:x:1002:1002:subversion,,,,:/home/svnroot:/bin/bash

.*

			
$ curl -s http://www.example.com | egrep -o '<a href=(.*)>.*</a>'

2010:(13|14|15|16)

Match any one of a group of alternatives

egrep "2010:(13|14|15|16)"  access.2010-11-18.log > apache.log

ps ax |grep -E "mysqld|httpd|resin"

			
neo@MacBook-Pro-Neo ~> cat /etc/passwd | egrep -e "root|daemon"
root:*:0:0:System Administrator:/var/root:/bin/sh
daemon:*:1:1:System Services:/var/root:/usr/bin/false
_cvmsroot:*:212:212:CVMS Root:/var/empty:/usr/bin/false

[] and {}

Source file

# cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Sat Sep 10 00:25:46 2011
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=091f295e-ea6d-4f57-9314-e2333f7ebff7 /                       ext4    defaults        1 1
UUID=b3661a0b-2c50-4e18-8030-be2d043cbfc4 /www                    ext4    defaults        1 2
UUID=4d3468de-a2ac-451c-b693-3bdca8832096 swap                    swap    defaults        0 0
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0

Match lines that contain four consecutive uppercase letters.

# grep '[A-Z]\{4\}' /etc/fstab
UUID=091f295e-ea6d-4f57-9314-e2333f7ebff7 /                       ext4    defaults        1 1
UUID=b3661a0b-2c50-4e18-8030-be2d043cbfc4 /www                    ext4    defaults        1 2
UUID=4d3468de-a2ac-451c-b693-3bdca8832096 swap                    swap    defaults        0 0
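
With -E (extended regular expressions) the braces need no backslashes. A small sketch on two invented fstab-style lines, counting how many contain four consecutive uppercase letters:

```shell
# -c counts matching lines; only the first sample line contains "UUID".
line_count=$(printf '%s\n' 'UUID=091f295e / ext4 defaults 1 1' 'tmpfs /dev/shm tmpfs defaults 0 0' \
    | grep -cE '[A-Z]{4}')
printf '%s\n' "$line_count"
```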

-P, --perl-regexp Perl regular expressions

Interpret PATTERN as a Perl regular expression. This is highly experimental and grep -P may warn of unimplemented features.

Get the IP address of a network interface

			
[root@netkiller ~]# ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.104  netmask 255.255.255.0  broadcast 192.168.1.255
        ether 00:16:3e:14:2f:9e  txqueuelen 1000  (Ethernet)
        RX packets 3192683236  bytes 793770390138 (739.2 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3115395437  bytes 2842927192254 (2.5 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@netkiller ~]# ip -4 addr show "eth0" | grep -oP '(?<=inet\s)\d+(\.\d+){3}'
192.168.1.104
[root@netkiller ~]#

Extract the order number from orderId=1052304281534390016

root@logging ~# echo "<a href='https://api.netkiller.cn/neo-service/orderComputerRetry?orderId=1052304281534390016&orderStatus=1'>重试</a>" | grep -oP '(?<=orderId=)\d+'
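
Where grep -P is unavailable, a similar extraction can be sketched with sed and a capture group. The link text below is shortened and the variable names are assumptions for illustration:

```shell
link="<a href='https://api.netkiller.cn/neo-service/orderComputerRetry?orderId=1052304281534390016&orderStatus=1'>retry</a>"

# -n plus the p flag prints only lines where the substitution succeeded;
# \1 is the run of digits captured right after "orderId=".
order_id=$(printf '%s\n' "$link" | sed -n 's/.*orderId=\([0-9][0-9]*\).*/\1/p')
printf '%s\n' "$order_id"
```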

			
[neo@netkiller nginx]$ grep -Po '\w+\.js' www.netkiller.cn.access.log
index.js
min.js
min.js
mCustomScrollbar.js
min.js
ajax_gd.js
ajax.js
validation.js
AC_RunActiveContent.js
WdatePicker.js
cookie.js
msg_modal.js
all.js
common.js
commonjs.js
swfobject.js
dateutil.js
form.js
live800.js
lang.js
cycle2.js
min.js
carousel.js
tabify.js
image.js
min.js
ctrl.js
packed.js
min.js
common.js

4.8.15.6. fgrep

Handling ^M

fgrep -rl `echo -ne '\r'` .

find . -type f -exec grep $'\r' {} +

4.8.15.7. egrep

egrep = grep -E. With egrep, grouping parentheses do not need to be escaped, for example:

# grep '\(oo\).*\1' /etc/passwd
root:x:0:0:root:/root:/bin/bash

# grep -E '(oo).*\1' /etc/passwd
root:x:0:0:root:/root:/bin/bash

# egrep '(oo).*\1' /etc/passwd
root:x:0:0:root:/root:/bin/bash

$ snmpwalk -v2c -c public 172.16.1.254 | egrep -i 'if(in|out)'

for pid in $(ps -axf |grep 'php-cgi' | egrep "0:00.(6|7|8|9)" |awk '{print $1}'); do kill -9 $pid; done

for pid in $(ps -axf |grep 'php-cgi' | egrep "0:(0|1|2|3|4|5)0.(6|7|8|9)" |awk '{print $1}'); do kill -9 $pid; done

Matching multiple conditions

			
[root@localhost src]# egrep "^r|^d" /etc/group
root:x:0:
daemon:x:2:
disk:x:6:
dialout:x:18:
dbus:x:81:
render:x:998:
docker:x:991:www,gitlab-runner

Requirement: given the log below, extract the order number 1052304281528490008 from orderId=1052304281528490008

			
[2023-04-28 15:28:54] [netkiller-5f6fb96b97-mh4rm] [ERROR] [ConsumeMessageThread_3] cn.netkiller.service.factory.OrderComputeFactory.execute(OrderComputeFactory.java:86) - <a href='https://api.netkiller.cn/netkiller/orderComputerRetry?orderId=1052304281528380003&orderStatus=1'>重试</a>,订单计算执行失败，OrderComputeDto{orderId=1052304281528380003, orderStatus=1, orderType=10, platformId=103, platformName='全球购骑士卡', productType=79, stationId=9482},异常信息：192.168.11.107:8080 failed to respond executing POST http://test-service/test-service/order/addOrderPayInfo
[2023-04-28 15:29:01] [netkiller-5f6fb96b97-mh4rm] [ERROR] [ConsumeMessageThread_19] cn.netkiller.service.factory.OrderComputeFactory.execute(OrderComputeFactory.java:86) - <a href='https://api.netkiller.cn/netkiller/orderComputerRetry?orderId=1052304281528410008&orderStatus=1'>重试</a>,订单计算执行失败，OrderComputeDto{orderId=1052304281528410008, orderStatus=1, orderType=10, platformId=103, platformName='全球购骑士卡', productType=79, stationId=39094},异常信息：192.168.9.78:8080 failed to respond executing POST http://test-service/test-service/station/settleInfo/getSettleInfoByStationId?stationId=39094
[2023-04-28 15:29:00] [netkiller-5f6fb96b97-6qgqj] [ERROR] [ConsumeMessageThread_1] cn.netkiller.service.factory.OrderComputeFactory.execute(OrderComputeFactory.java:86) - <a href='https://api.netkiller.cn/netkiller/orderComputerRetry?orderId=1052304281528430003&orderStatus=1'>重试</a>,订单计算执行失败，OrderComputeDto{orderId=1052304281528430003, orderStatus=1, orderType=10, platformId=1587, platformName='平安好车主', productType=820, stationId=17749},异常信息：Connect to 192.168.9.78:8080 [/192.168.9.78] failed: Connection refused (Connection refused) executing POST http://test-service/test-service/order/addOrderStationSettleInfo
[2023-04-28 15:29:13] [netkiller-5f6fb96b97-6qgqj] [ERROR] [ConsumeMessageThread_3] cn.netkiller.service.factory.OrderComputeFactory.execute(OrderComputeFactory.java:86) - <a href='https://api.netkiller.cn/netkiller/orderComputerRetry?orderId=1052304281528490008&orderStatus=1'>重试</a>,订单计算执行失败，OrderComputeDto{orderId=1052304281528490008, orderStatus=1, orderType=10, platformId=1293, platformName='高德地图-api', productType=674, stationId=39697},异常信息：192.168.10.10:8080 failed to respond executing POST http://test-service/test-service/order/addOrderBaseInfo
[2023-04-28 15:32:14] [netkiller-5f6fb96b97-mh4rm] [ERROR] [ConsumeMessageThread_14] cn.netkiller.service.factory.OrderComputeFactory.execute(OrderComputeFactory.java:86) - <a href='https://api.netkiller.cn/netkiller/orderComputerRetry?orderId=1052304281532030009&orderStatus=1'>重试</a>,订单计算执行失败，OrderComputeDto{orderId=1052304281532030009, orderStatus=1, orderType=10, platformId=531, platformName='货拉拉API', productType=293, stationId=39496},异常信息：192.168.10.12:8080 failed to respond executing POST http://test-service/test-service/order/addOrderStationSettleInfo
[2023-04-28 15:34:53] [netkiller-5f6fb96b97-mh4rm] [ERROR] [ConsumeMessageThread_18] cn.netkiller.service.factory.OrderComputeFactory.execute(OrderComputeFactory.java:86) - <a href='https://api.netkiller.cn/netkiller/orderComputerRetry?orderId=1052304281534390016&orderStatus=1'>重试</a>,订单计算执行失败，OrderComputeDto{orderId=1052304281534390016, orderStatus=1, orderType=10, platformId=1587, platformName='平安好车主', productType=820, stationId=37711},异常信息：192.168.14.155:8080 failed to respond executing POST http://test-service/test-service/order/addOrderStationSettleInfo

Step 1: make a first pass that pulls out the data we need

			
root@logging ~# cat prod/netkiller/04/failed.log | egrep -o 'orderId=(.*)&'
orderId=1052304281517470004&
orderId=1052304281517280005&
orderId=1052304281517060003&
orderId=1052304281517370019&
orderId=1052304281517140014&
orderId=1052304281517250005&
orderId=1052304281517140006&

Step 2: strip the unwanted text and keep only the order number

			
root@logging ~# cat prod/netkiller/04/failed.log | egrep -o 'orderId=(.*)&' | sed -e 's/orderId=//' -e 's/&//' 
1052304281517470004
1052304281517280005
1052304281517060003
1052304281517370019
1052304281517140014
1052304281517250005
1052304281517140006
1052304281517190008
1052304281517440008

Step 3: sort the data

			
root@logging ~# cat prod/netkiller/04/failed.log | egrep -o 'orderId=(.*)&' | sed -e 's/orderId=//' -e 's/&//' | sort 
1052304281439440007
1052304281516190004
1052304281517060003
1052304281517070003
1052304281517110004
1052304281517140006
1052304281517140014
1052304281517160006
1052304281517160013

Step 4: remove duplicates

			
root@logging ~# cat prod/netkiller/04/failed.log | egrep -o 'orderId=(.*)&' | sed -e 's/orderId=//' -e 's/&//' | sort | uniq
1052304281439440007
1052304281516190004
1052304281517060003
1052304281517070003
1052304281517110004
1052304281517140006
1052304281517140014

Step 5: check how many duplicates were removed

			
root@logging ~# cat prod/netkiller/04/failed.log | egrep -o 'orderId=(.*)&' | sed -e 's/orderId=//' -e 's/&//' | sort | wc -l
208
root@logging ~# cat prod/netkiller/04/failed.log | egrep -o 'orderId=(.*)&' | sed -e 's/orderId=//' -e 's/&//' | sort | uniq | wc -l
205
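
The five steps can be condensed into one pipeline. This is a sketch, not the original procedure: it matches orderId= followed by digits (so it also picks up the copy inside OrderComputeDto{...}, which sort -u then collapses), and the three log lines are shortened, invented stand-ins for the real file:

```shell
log='... orderComputerRetry?orderId=1052304281528380003&orderStatus=1 ... OrderComputeDto{orderId=1052304281528380003, orderStatus=1}
... orderComputerRetry?orderId=1052304281528410008&orderStatus=1 ... OrderComputeDto{orderId=1052304281528410008, orderStatus=1}
... orderComputerRetry?orderId=1052304281528380003&orderStatus=1 ... OrderComputeDto{orderId=1052304281528380003, orderStatus=1}'

# grep -o prints each match on its own line, cut keeps the part after "=",
# and sort -u sorts and removes duplicates in one step.
unique_ids=$(printf '%s\n' "$log" | grep -oE 'orderId=[0-9]+' | cut -d '=' -f 2 | sort -u)
printf '%s\n' "$unique_ids"
```

Matching only digits also avoids the greedy (.*) in the original pattern, which would overshoot on a line containing more than one "&".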


4.8.16. sort - sort lines of text files

$ du -s * | sort -k1,1rn

$ rpm -q -a --qf '%10{SIZE}\t%{NAME}\n' | sort -k1,1n
$ dpkg-query -W -f='${Installed-Size;10}\t${Package}\n' | sort -k1,1n

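4.8.16.1. Sorting by column

With sort -k, use -k1,1 to sort on the first column only; -k1 sorts on a key that runs from the first column to the end of the line.

# sort -t ':' -k 1 /etc/passwd

sort -n -t ' ' -k 2 file.txt

Sorting on multiple columns

$ sort -n -t ' ' -k 2 -k 3 file.txt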

4.8.16.2. -s, --stable stabilize sort by disabling last-resort comparison

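For example, to sort on two columns, first by the second column and then by the first, you can chain two sorts as follows; the stable sort -s is what keeps the first ordering intact among equal keys.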

 sort -k1,1 | sort -s -k2,2
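
The same ordering can usually be obtained in a single pass by listing both keys. A minimal sketch on invented three-line input, with LC_ALL=C added so the comparison does not depend on the locale:

```shell
# Primary key: column 2 (numeric). Ties are broken by column 1.
sorted=$(printf '%s\n' 'b 1' 'a 2' 'a 1' | LC_ALL=C sort -k2,2n -k1,1)
printf '%s\n' "$sorted"
```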

4.8.17. uniq

history | cut -c 8- |sort -r | uniq -u

# netstat -ant|fgrep ":"|cut -b 77-90|sort |uniq -c
      1 CLOSE_WAIT
      1 CLOSING
     88 ESTABLISHED
      7 FIN_WAIT1
      7 FIN_WAIT2
      3 LAST_ACK
      4 LISTEN
      1 SYN_RECV
      1 SYN_SENT
    177 TIME_WAIT
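
uniq only collapses adjacent duplicates, which is why the input is sorted first. A common follow-up is to rank the counts; the sketch below uses invented connection states instead of live netstat output:

```shell
# sort groups equal lines, uniq -c counts each group,
# sort -rn puts the largest count first, head keeps the top entry.
top=$(printf '%s\n' ESTABLISHED TIME_WAIT ESTABLISHED LISTEN ESTABLISHED TIME_WAIT \
    | sort | uniq -c | sort -rn | head -n 1 | awk '{print $2 " " $1}')
printf '%s\n' "$top"
```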

4.8.18. awk

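Built-in variables

ARGC               number of command-line arguments
ARGV               array of command-line arguments
ENVIRON            associative array of environment variables
FILENAME           name of the file awk is currently reading
FNR                record number within the current file
FS                 input field separator; equivalent to the -F command-line option
NF                 number of fields in the current record
NR                 number of records read so far
OFS                output field separator
ORS                output record separator
RS                 input record separator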

4.8.18.1. Processing columns

# cat /etc/fstab | awk '{print $1}'
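
By default awk splits fields on runs of whitespace, so several columns can be selected at once. A small sketch on an invented fstab-style line:

```shell
# $2 is the mount point and $3 the file system type; the comma inserts OFS (a space).
fields=$(printf '%s\n' 'UUID=091f295e   /      ext4    defaults  1 1' | awk '{print $2, $3}')
printf '%s\n' "$fields"
```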

4.8.18.2. printf

		
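%d signed decimal integer
%u unsigned decimal integer
%f floating-point number
%s string
%c single character
%p pointer value (C printf; not supported by awk's printf)
%e floating-point number in exponent notation
%x, %X unsigned hexadecimal integer
%o unsigned octal integer
%g automatically chooses the more suitable notation
\n newline
\f form feed
\r carriage return
\t tab
\xhh an ASCII character given in hexadecimal, where hh is one or two hex digits

Notes:
(1). A number placed between "%" and the conversion letter gives the minimum field width.
For example: %3d prints an integer at least 3 characters wide, right-aligned if it is shorter.
%9.2f prints a floating-point number 9 characters wide: 2 decimal places, 6 integer digits and 1 character for the decimal point, right-aligned if it is shorter.
%8s prints a string at least 8 characters wide, right-aligned if it is shorter.
If a string or an integer is longer than the stated width, it is printed at its actual length. For floating-point numbers, an integer part wider than the stated width is printed in full, and a fractional part longer than the stated precision is rounded to that precision.
To pad the output with leading zeros, put a 0 in front of the width.
For example: %04d pads a value shorter than 4 digits with leading zeros to a total width of 4.
For strings, the number before the decimal point is the minimum width and the number after it is the maximum number of characters printed.
For example: %6.9s prints a string at least 6 and at most 9 characters long; anything after the 9th character is cut off.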

echo 1.7 > 2
awk '{printf ("%d\n",$1)}' 2
1
awk '{printf ("%f\n",$1)}' 2
1.700000
awk '{printf ("%3.1f\n",$1)}' 2
1.7
awk '{printf ("%4.1f\n",$1)}' 2
 1.7
awk '{printf ("%e\n",$1)}' 2
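
The width, zero-padding and precision rules described above can be seen side by side in one call. A minimal sketch with invented values; the brackets are there only to make the padding visible:

```shell
# %5d    : minimum width 5, right-aligned
# %-5s   : minimum width 5, left-aligned
# %05.1f : width 5, one decimal place, padded with leading zeros
# %.3s   : at most 3 characters of the string
formatted=$(awk 'BEGIN { printf "[%5d][%-5s][%05.1f][%.3s]\n", 42, "ab", 3.14159, "abcdef" }')
printf '%s\n' "$formatted"
```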

Use print to assemble rm commands: find files and delete them

#!/bin/sh
LOCATE=/home/samba
find $LOCATE -name '*.eml'>log
find $LOCATE -name '*.nws'>>log
gawk '{print "rm -rf "$1}' log > rmfile
chmod 755 rmfile
./rmfile

4.8.18.3. Pattern (matching)

Print the lines that contain (or do not contain) particular characters (sed can do this too):
:~$ awk '/[a-c]/ { print }' file.txt
daemon x 1 1 daemon /usr/sbin /bin/sh
bin x 2 2 bin /bin /bin/sh
sys x 3 3 sys /dev /bin/sh
sync x 4 65534 sync /bin /bin/sync
games x 5 60 games /usr/games /bin/sh
man x 6 12 man /var/cache/man /bin/sh
lp x 7 7 lp /var/spool/lpd /bin/sh
mail x 8 8 mail /var/mail /bin/sh
news x 9 9 news /var/spool/news /bin/sh
uucp x 10 10 uucp /var/spool/uucp /bin/sh
proxy x 13 13 proxy /bin /bin/sh
www-data x 33 33 www-data /var/www /bin/sh
backup x 34 34 backup /var/backups /bin/sh
list x 38 38 Mailing List Manager /var/list /bin/sh
irc x 39 39 ircd /var/run/ircd /bin/sh
gnats x 41 41 Gnats Bug-Reporting System (admin) /var/lib/gnats /bin/sh
nobody x 65534 65534 nobody /nonexistent /bin/sh
libuuid x 100 101  /var/lib/libuuid /bin/sh
syslog x 101 103  /home/syslog /bin/false
sshd x 102 65534  /var/run/sshd /usr/sbin/nologin
landscape x 103 108  /var/lib/landscape /bin/false
mysql x 104 112 MySQL Server,,, /var/lib/mysql /bin/false
ntpd x 105 114  /var/run/openntpd /bin/false
postfix x 106 115  /var/spool/postfix /bin/false
nagios x 107 117  /var/lib/nagios /bin/false
chun x 1003 1003 Li Fu Chun,,, /home/chun
munin x 108 118  /var/lib/munin /bin/false


$ awk '!/[a-c]/ { print }' file.txt
root x 0 0 root /root
neo x 1000 1000 neo,,, /home/neo


Use a comparison to print a specific column:
neo@monitor:~$ sed -e 's/:/ /g' /etc/passwd | awk '$1 == "neo" { print $1 }'
neo

Match, or exclude, lines whose first field contains the given characters:
$ awk '$1 ~ /[a-d]/ { print }' file.txt
$ awk '$1 !~ /[a-d]/ { print }' file.txt

Pattern, Pattern

# awk '/www/,/Web/ {print}' /etc/passwd
www:x:80:80:Web User:/www:/bin/bash

# awk '/www/,/[Ww]eb/ {print}' /etc/passwd
www:x:80:80:Web User:/www:/bin/bash

cat /var/log/rinetd.log | awk -F' ' '$7 ~ /0/ {print $1"\t"$2"\t"$7"\t"$8"\t"$9}'

# cat /var/log/rinetd.log | awk -F' ' '$7 ~ /(210|209|210)/ {print $1"\t"$2"\t"$7"\t"$8"\t"$9}'
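A range pattern such as /www/,/Web/ selects every line from the first match of the left pattern through the next match of the right pattern. In the /etc/passwd examples both patterns match the same line, so only that line is printed. The sketch below uses hypothetical input where the range spans several lines, and also shows a field match with ~ and !~.

```shell
log='alpha
www start
beta
Web end
gamma'

# Range pattern: from the first line matching /www/ through the next
# line matching /Web/, inclusive.
range=$(printf '%s\n' "$log" | awk '/www/,/Web/')

# Field match: print column 1 when column 2 matches, and the inverse with !~.
hit=$(printf '%s\n' "$log" | awk '$2 ~ /^(start|end)$/ { print $1 }')
miss=$(printf '%s\n' "$log" | awk '$2 !~ /^(start|end)$/ { print $1 }')

echo "$range"
```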

4.8.18.4. Built-in Variables (NR/NF)

For example, after awk reads the first input line
"aaa bbb ccc ddd", the program sees:
$0 is "aaa bbb ccc ddd"
$1 is "aaa"
$2 is "bbb"
$3 is "ccc"
$4 is "ddd"
NF is 4 (so $NF is "ddd")
NR is 1
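The difference between a variable and the field it indexes is easy to check. In this sketch NF and NR are plain numbers, while $NF and $NR are the fields at those positions.

```shell
# NF = field count, $NF = last field; NR = record number, $NR = field number NR.
out=$(echo "aaa bbb ccc ddd" | awk '{ print NF, $NF, NR, $NR }')
echo "$out"   # 4 ddd 1 aaa
```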

NR

NR==n selects line number n

# awk -F':' 'NR==1 {print $(1)}' /etc/passwd
root

# awk -F':' 'NR==2 {print $(1)}' /etc/passwd
bin

Select lines 1, 3 and 4

# awk 'NR==1; NR==3; NR==4 {print $1}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin

awk ... '{if(NR==1){...}else{exit}}'

$ awk -F' ' '{if(NR==1) print $1}' /etc/issue
Ubuntu
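In a program like 'NR==1; NR==3; NR==4 {print $1}' only the last pattern carries the action; the first two fall back to the default action of printing the whole line. A sketch on hypothetical passwd-style data that applies one action to all three lines, plus an early exit once the wanted line has been handled:

```shell
passwd_sample='root:x:0:0
bin:x:1:1
daemon:x:2:2
adm:x:3:4
lp:x:4:7'

# One pattern joined with ||, so the same action runs for lines 1, 3 and 4.
names=$(printf '%s\n' "$passwd_sample" | awk -F: 'NR==1 || NR==3 || NR==4 { print $1 }')

# exit stops reading as soon as line 1 has been printed.
first=$(printf '%s\n' "$passwd_sample" | awk -F: 'NR==1 { print $1; exit }')

echo "$names"
```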

NF

# echo "aaa bbb ccc ddd" | awk  '{print $(NR)}'
aaa
# echo "aaa bbb ccc ddd" | awk  '{print $(NR+1)}'
bbb
# echo "aaa bbb ccc ddd" | awk  '{print $(NR+2)}'
ccc
# echo "aaa bbb ccc ddd" | awk  '{print $(NF)}'
ddd
# echo "aaa bbb ccc ddd" | awk  '{print $(NF-1)}'
ccc
# echo "aaa bbb ccc ddd" | awk  '{print $(NF-2)}'
bbb

uptime | awk '{print $(NF-2)}'

[root@netkiller ~]# netstat -na |awk '/^tcp/ {print NF}' | head -n 1
6


[root@netkiller ~]# netstat -ant |awk '/^tcp/ {print $NF}' | tail -n 5
TIME_WAIT
CLOSE_WAIT
CLOSE_WAIT
LISTEN
LISTEN

[root@netkiller ~]# netstat -ant |awk '/^tcp/ {print $(NF-5)}' | tail -n 5
tcp
tcp
tcp
tcp6
tcp6

Exercises

Count TCP states with the ss command

[root@netkiller ~]# ss -ant | awk '{++S[$1]} END {for(a in S) print a, S[a]}'
LISTEN 13
CLOSE-WAIT 42
ESTAB 95
State 1
FIN-WAIT-2 20
LAST-ACK 44
SYN-SENT 10
TIME-WAIT 403

[root@netkiller ~]# ss -ant | awk 'BEGIN {stats["CLOSE-WAIT"]=0;stats["ESTAB"]=0;stats["FIN-WAIT-1"]=0;stats["FIN-WAIT-2"]=0;stats["LAST-ACK"]=0;stats["SYN-RECV"]=0;stats["SYN-SENT"]=0;stats["TIME-WAIT"]=0} {++stats[$1]} END {for(a in stats) print a, stats[a]}'
LISTEN 6
SYN-RECV 0
ESTAB 4
CLOSE-WAIT 0
State 1
FIN-WAIT-1 0
LAST-ACK 0
FIN-WAIT-2 0
TIME-WAIT 3
SYN-SENT 0

TCP/IP Status

netstat -ant | awk '/^tcp/ {++state[$NF]} END {for(key in state) print key,"\t",state[key]}'

TIME_WAIT 88
CLOSE_WAIT 6
FIN_WAIT1 9
FIN_WAIT2 9
ESTABLISHED 303
SYN_RECV 126
LAST_ACK 5

ss  | awk '$1 !~ /State/ {++state[$1]} END {for(key in state) print key,"\t",state[key]}'
LAST-ACK 	 1
ESTAB 	 5
FIN-WAIT-2 	 1
CLOSE-WAIT 	 13

Count users by login shell

# cat /etc/passwd | awk -F':' '{++shell[$NF]} END {for(key in shell) print key,"\t",shell[key]}'
/sbin/shutdown 	 1
/bin/sh 	 1
/bin/bash 	 3
/sbin/nologin 	 20
/sbin/halt 	 1
/bin/sync 	 1

Count POST and GET requests in access.log

# cat /www/logs/access.log | egrep -o 'GET|POST' | awk '{++method[$NF]} END {for(num in method) print num, method[num]}'
POST 422
GET 188571

# cat /www/logs/access.log | egrep -o 'GET|POST' | awk '{++method[$1]} END {for(num in method) print num, method[num]}'
POST 422
GET 188573
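All of the counting exercises above use the same idiom: an associative array keyed by the value of interest, incremented per record and dumped in the END block. A minimal sketch on hypothetical request lines; the order of "for (k in n)" is unspecified, so the result is piped through sort.

```shell
requests='GET /index.html
POST /login
GET /about
GET /index.html'

counts=$(printf '%s\n' "$requests" |
  awk '{ ++n[$1] }                          # count each distinct first field
       END { for (k in n) print k, n[k] }' |
  sort)                                     # for-in order is unspecified
echo "$counts"
```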

4.8.18.5. Built-in Functions

length

			
# awk -F: 'length($1)<4 {print NR , $1}' /etc/passwd
2 bin
4 adm
5 lp
14 ftp
20 ntp
22 rpc
25 www

toupper() converts to upper case

			
[root@localhost ~]# awk '{print toupper($1)}' /etc/passwd
ROOT:X:0:0:ROOT:/ROOT:/BIN/BASH
BIN:X:1:1:BIN:/BIN:/SBIN/NOLOGIN
DAEMON:X:2:2:DAEMON:/SBIN:/SBIN/NOLOGIN
ADM:X:3:4:ADM:/VAR/ADM:/SBIN/NOLOGIN
LP:X:4:7:LP:/VAR/SPOOL/LPD:/SBIN/NOLOGIN
SYNC:X:5:0:SYNC:/SBIN:/BIN/SYNC
SHUTDOWN:X:6:0:SHUTDOWN:/SBIN:/SBIN/SHUTDOWN
HALT:X:7:0:HALT:/SBIN:/SBIN/HALT
MAIL:X:8:12:MAIL:/VAR/SPOOL/MAIL:/SBIN/NOLOGIN
OPERATOR:X:11:0:OPERATOR:/ROOT:/SBIN/NOLOGIN
GAMES:X:12:100:GAMES:/USR/GAMES:/SBIN/NOLOGIN
FTP:X:14:50:FTP
NOBODY:X:99:99:NOBODY:/:/SBIN/NOLOGIN
SYSTEMD-NETWORK:X:192:192:SYSTEMD
DBUS:X:81:81:SYSTEM
POLKITD:X:999:997:USER
POSTFIX:X:89:89::/VAR/SPOOL/POSTFIX:/SBIN/NOLOGIN
CHRONY:X:998:996::/VAR/LIB/CHRONY:/SBIN/NOLOGIN
SSHD:X:74:74:PRIVILEGE-SEPARATED
NTP:X:38:38::/ETC/NTP:/SBIN/NOLOGIN
DHCPD:X:177:177:DHCP
WWW:X:80:80:WEB
NGINX:X:997:995:NGINX
MYSQL:X:27:27:MYSQL
REDIS:X:1000:1000::/VAR/LIB/REDIS:/BIN/FALSE
ETHEREUM:X:1001:1001::/HOME/ETHEREUM:/BIN/BASH
MONGOD:X:996:991:MONGOD:/VAR/LIB/MONGO:/BIN/FALSE

tolower() converts to lower case

			
[root@localhost ~]# awk -F '\n' '{print tolower($1)}' /etc/redhat-release 
centos linux release 7.5.1804 (core)
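The three string functions shown so far can be combined in one call. A small sketch on an arbitrary sample string:

```shell
# length of the whole record, first field upper-cased, second field lower-cased
out=$(echo "Hello World" | awk '{ print length($0), toupper($1), tolower($2) }')
echo "$out"   # 11 HELLO world
```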

rand() generates random numbers

			
neo@MacBook-Pro ~ % awk 'BEGIN{print rand()*1000000}' 
840188			
neo@MacBook-Pro ~ % awk 'BEGIN{srand(); print rand()}'
0.0334342
neo@MacBook-Pro ~ % awk 'BEGIN{srand(); print rand()*1000000}'
759412
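rand() returns a value in the range 0 to 1 (1 excluded). In most awk implementations it starts from the same seed on every run unless srand() is called, and srand() without an argument seeds from the current time. The sketch below uses a fixed, arbitrary seed (42) to show that a given seed reproduces the same value, and scales the result to a die roll between 1 and 6.

```shell
# Same seed, same "random" value; int(rand() * 6) + 1 maps it to 1..6.
a=$(awk 'BEGIN { srand(42); print int(rand() * 6) + 1 }')
b=$(awk 'BEGIN { srand(42); print int(rand() * 6) + 1 }')
echo "$a $b"
```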

4.8.18.6. Filtering duplicate lines

grep 'Baiduspider' access.2011-02-22.log | awk '{print $1}' | awk '! a[$0]++'

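awk '! a[$0]++' 1.txt >2.txt
Removes records that are duplicated in full (every column identical)

awk '! a[$1]++' 1.txt >2.txt
Removes records whose first column has been seen before

awk '! a[$1,$2]++' 1.txt >2.txt
Removes records whose first and second columns have both been seen before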
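The idiom works because a[key]++ returns the old count, which is 0 the first time a key appears, so the negated pattern is true only for the first occurrence. A sketch on hypothetical two-column data (the array is named seen here purely for readability):

```shell
data='a 1
b 2
a 1
a 3'

whole=$(printf '%s\n' "$data" | awk '!seen[$0]++')   # key is the whole line
col1=$(printf '%s\n' "$data" | awk '!seen[$1]++')    # key is column 1 only

echo "$whole"
echo "$col1"
```

Unlike sort | uniq, this keeps the original order of the first occurrences.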

4.8.18.7. Array demonstration

		
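# Note: the END loop tests i < NR, so the last record is not printed; use i <= NR to list every record.
[root@localhost ~]# awk -F ':' 'BEGIN {count=1;} {name[count] = $1;count++;}; END{for (i = 1; i < NR; i++) print i, name[i]}' /etc/passwd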
1 root
2 bin
3 daemon
4 adm
5 lp
6 sync
7 shutdown
8 halt
9 mail
10 operator
11 games
12 ftp
13 nobody
14 systemd-network
15 dbus
16 polkitd
17 postfix
18 chrony
19 sshd
20 ntp
21 dhcpd
22 www
23 nginx
24 mysql
25 redis
26 ethereum
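A compact variant of the same idea, sketched on hypothetical input: NR already counts records, so it can serve as the array index, and the loop runs up to and including NR so that the last record is printed too.

```shell
out=$(printf 'root:x:0\nbin:x:1\ndaemon:x:2\n' |
  awk -F: '{ name[NR] = $1 }                                   # index by record number
           END { for (i = 1; i <= NR; i++) print i, name[i] }')
echo "$out"
```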

4.8.19. sed

http://sed.sourceforge.net/

4.8.19.1. Find and replace

		
sed -n 's/root/admin/p' /etc/passwd
sed -n 's/root/admin/2p' /etc/passwd                         # replace the 2nd occurrence of root on each line
sed -n 's/root/admin/gp' /etc/passwd
sed -n '1,10 s/root/admin/gp' /etc/passwd
sed -n 's/root/AAA&BBB/2p' /etc/passwd                       # replace root with AAArootBBB; & stands for the matched text
sed -ne 's/root/AAA&BBB/' -ne 's/bash/AAA&BBB/p' /etc/passwd # -e chains several commands; substitutes both root and bash
sed -n 's/root/AAA&BBB/;s/bash/AAA&BBB/p' /etc/passwd        # same as the previous command
sed -nr 's/(root)(.*)(bash)/\3\2\1/p' /etc/passwd            # swap root and bash using capture groups; without -r: sed -n 's/\(root\)\(.*\)\(bash\)/\3\2\1/p' /etc/passwd
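To see what the numeric flag, & and the capture groups do, here is a sketch on a single hypothetical passwd line. It uses -E, which GNU sed accepts as a synonym for -r and which BSD sed also understands.

```shell
line='root:x:0:0:root:/root:/bin/bash'

# Flag 2: only the second "root" on the line is replaced; & is the matched text.
second=$(echo "$line" | sed 's/root/AAA&BBB/2')

# Three groups, emitted in reverse order; .* is greedy, so group 1 is the
# first "root" and group 3 is the last "bash".
swapped=$(echo "$line" | sed -E 's/(root)(.*)(bash)/\3\2\1/')

echo "$second"
echo "$swapped"
```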

		
ls -1 *.html| awk '{printf "sed \047s/ADDRESS/address/g\047 %s >%s.sed;mv %s.sed %s\n", $1, $1, $1, $1;}'|bash

for f in `ls -1 *.html`; do [ -f $f ] && sed 's/<\/BODY>/<script src="http:\/\/www.google-analytics.com\/urchin.js" type="text\/javascript"><\/script>\n<script type="text\/javascript">\n_uacct = "UA-2033740-1";\nurchinTracker();\n<\/script>\n<\/BODY>/g' $f >$f.sed;mv $f.sed $f ; done;

		
my=/root/dir
str="/root/dir/file1 /root/dir/file2 /root/dir/file3 /root/dir/file/file1"
echo $str | sed "s:$my::g"

Regular expressions

sed 's/[[:space:]]//g' filename          # delete all whitespace

Extract bbb from aaa="bbb"

			

$ echo "aaa=\"bbb\"" | sed 's/.*=\"\(.*\)\"/\1/g'
$ curl -s http://www.example.com | egrep -o '<a href="(.*)">.*</a>' | sed -e 's/.*href="\([^"]*\)".*/\1/'
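Both commands rely on the same trick: match the whole line, capture the wanted part in \( \), and replace the line with \1. A sketch with inline sample strings (hypothetical, so no network access is needed):

```shell
# Capture what sits between the quotes after "=".
value=$(echo 'aaa="bbb"' | sed 's/.*="\(.*\)"/\1/')

# [^"]* stops at the first closing quote, so only the URL is captured.
href=$(echo '<a href="http://www.example.com/">Example</a>' |
  sed -e 's/.*href="\([^"]*\)".*/\1/')

echo "$value"
echo "$href"
```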

MAC address conversion

echo 192.168.2.1-a1f4.40c1.5756 | sed -r 's|(.*-)(..)(..).(..)(..).(..)(..)|\1\2:\3:\4:\5:\6:\7|g'

Extract bbb from "aaa": "bbb"

Sample data

			
[root@localhost ~]# curl -s https://registry.hub.docker.com/v1/repositories/centos/tags | jq
[
  {
    "layer": "",
    "name": "latest"
  },
  {
    "layer": "",
    "name": "5"
  },
  {
    "layer": "",
    "name": "5.11"
  },
  {
    "layer": "",
    "name": "6"
  },
  {
    "layer": "",
    "name": "6.10"
  },
  {
    "layer": "",
    "name": "6.6"
  },
  {
    "layer": "",
    "name": "6.7"
  },
  {
    "layer": "",
    "name": "6.8"
  },
  {
    "layer": "",
    "name": "6.9"
  },
  {
    "layer": "",
    "name": "7"
  },
  {
    "layer": "",
    "name": "7.0.1406"
  },
  {
    "layer": "",
    "name": "7.1.1503"
  },
  {
    "layer": "",
    "name": "7.2.1511"
  },
  {
    "layer": "",
    "name": "7.3.1611"
  },
  {
    "layer": "",
    "name": "7.4.1708"
  },
  {
    "layer": "",
    "name": "7.5.1804"
  },
  {
    "layer": "",
    "name": "7.6.1810"
  },
  {
    "layer": "",
    "name": "7.7.1908"
  },
  {
    "layer": "",
    "name": "7.8.2003"
  },
  {
    "layer": "",
    "name": "7.9.2009"
  },
  {
    "layer": "",
    "name": "8"
  },
  {
    "layer": "",
    "name": "8.1.1911"
  },
  {
    "layer": "",
    "name": "8.2.2004"
  },
  {
    "layer": "",
    "name": "8.3.2011"
  },
  {
    "layer": "",
    "name": "8.4.2105"
  },
  {
    "layer": "",
    "name": "centos5"
  },
  {
    "layer": "",
    "name": "centos5.11"
  },
  {
    "layer": "",
    "name": "centos6"
  },
  {
    "layer": "",
    "name": "centos6.10"
  },
  {
    "layer": "",
    "name": "centos6.6"
  },
  {
    "layer": "",
    "name": "centos6.7"
  },
  {
    "layer": "",
    "name": "centos6.8"
  },
  {
    "layer": "",
    "name": "centos6.9"
  },
  {
    "layer": "",
    "name": "centos7"
  },
  {
    "layer": "",
    "name": "centos7.0.1406"
  },
  {
    "layer": "",
    "name": "centos7.1.1503"
  },
  {
    "layer": "",
    "name": "centos7.2.1511"
  },
  {
    "layer": "",
    "name": "centos7.3.1611"
  },
  {
    "layer": "",
    "name": "centos7.4.1708"
  },
  {
    "layer": "",
    "name": "centos7.5.1804"
  },
  {
    "layer": "",
    "name": "centos7.6.1810"
  },
  {
    "layer": "",
    "name": "centos7.7.1908"
  },
  {
    "layer": "",
    "name": "centos7.8.2003"
  },
  {
    "layer": "",
    "name": "centos7.9.2009"
  },
  {
    "layer": "",
    "name": "centos8"
  },
  {
    "layer": "",
    "name": "centos8.1.1911"
  },
  {
    "layer": "",
    "name": "centos8.2.2004"
  },
  {
    "layer": "",
    "name": "centos8.3.2011"
  },
  {
    "layer": "",
    "name": "centos8.4.2105"
  }
]

Extraction method

			
[root@localhost ~]# curl -s https://registry.hub.docker.com/v1/repositories/centos/tags | sed 's/}/}\n/g' | sed -e 's/.*"name": "\([^#]*\)".*/\1/'
latest
5
5.11
6
6.10
6.6
6.7
6.8
6.9
7
7.0.1406
7.1.1503
7.2.1511
7.3.1611
7.4.1708
7.5.1804
7.6.1810
7.7.1908
7.8.2003
7.9.2009
8
8.1.1911
8.2.2004
8.3.2011
8.4.2105
centos5
centos5.11
centos6
centos6.10
centos6.6
centos6.7
centos6.8
centos6.9
centos7
centos7.0.1406
centos7.1.1503
centos7.2.1511
centos7.3.1611
centos7.4.1708
centos7.5.1804
centos7.6.1810
centos7.7.1908
centos7.8.2003
centos7.9.2009
centos8
centos8.1.1911
centos8.2.2004
centos8.3.2011
centos8.4.2105
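An alternative sketch that avoids splitting on "}" first: grep -o prints every "name": "..." pair on its own line and sed strips the key and quotes. The JSON string below is a shortened, hypothetical sample in the same shape as the data above. Where jq is installed, jq -r '.[].name' does the same job with a real JSON parser.

```shell
json='[{"layer": "", "name": "latest"}, {"layer": "", "name": "5"}, {"layer": "", "name": "5.11"}]'

names=$(printf '%s\n' "$json" |
  grep -o '"name": "[^"]*"' |       # one "name": "value" pair per line
  sed 's/.*: "\(.*\)"/\1/')         # keep only the value

echo "$names"
```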

Capitalize the first letter of each word

			
$ cat /etc/passwd | cut -d: -f1 | sed 's/\b[a-z]/\U&/g'
Root
Daemon
Bin
Sys
Sync
Games
Man
Lp
Mail
News
Uucp
Proxy
Www-Data
Backup
List
Irc
Gnats
Nobody
Libuuid
Syslog
Messagebus
Whoopsie
Landscape
Sshd
Neo
Ntop
Redis
Postgres
Colord
Mysql
Zookeeper
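\b and \U are GNU sed extensions. A portable sketch with awk, on hypothetical names, upper-cases only the first character of each line, so www-data becomes Www-data rather than Www-Data:

```shell
names=$(printf 'root\nwww-data\n' |
  awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }')  # first char + rest
echo "$names"
```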

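4.8.19.2. insert: inserting text

The i command inserts a line before the current line. In the second example the inserted text starts with two spaces, written as "\ \ ".

Insert an admin line before every line containing root

sed '/root/i admin' /etc/passwd

Insert text at line 33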

sed -i "33 i \ \ authorization: enabled" /etc/mongod.conf

4.8.19.3. Appending text

Append an admin line after every line containing root

sed '/root/a admin' /etc/passwd

4.8.19.4. Changing text

Replace every line containing root with admin

sed '/root/c admin' /etc/passwd

4.8.19.5. Deleting text

Delete the lines that contain root

sed '/root/d' /etc/passwd
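The four line-level commands side by side, as a sketch on a two-line hypothetical input. It assumes GNU sed, which accepts the one-line "i text", "a text" and "c text" forms used in this section; POSIX sed expects a backslash and a newline after the command letter.

```shell
input='root
bin'

inserted=$(printf '%s\n' "$input" | sed '/root/i admin')   # new line before the match
appended=$(printf '%s\n' "$input" | sed '/root/a admin')   # new line after the match
changed=$(printf '%s\n' "$input" | sed '/root/c admin')    # matched line replaced
deleted=$(printf '%s\n' "$input" | sed '/root/d')          # matched line removed

echo "$inserted"
```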

delete

Delete empty lines

sed '/^$/d' filename
sed '/./!d' filename

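4.8.19.6. Line operations

The p command prints the entire contents of the pattern space.

Addressing lines:

sed -n '12,~3p' pass # from line 12 to the next line whose number is a multiple of 3
sed -n '12,+4p' pass # line 12 and the 4 lines that follow it (lines 12-16)
sed -n '12~3p' pass # every 3rd line starting at line 12 (12, 15, 18, 21...)
sed -n '10,$p' pass   # from line 10 to the end
sed -n '4!p' pass   # every line except line 4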
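Most of these address forms can be tried on generated input. The sketch below feeds numbered lines from seq, so each printed value is its own line number; the addr,+N and first~step forms are GNU sed extensions, and paste only joins the results onto one line.

```shell
plus=$(seq 1 20 | sed -n '12,+4p' | paste -sd ' ' -)   # line 12 plus the next 4
step=$(seq 1 20 | sed -n '12~3p' | paste -sd ' ' -)    # every 3rd line from 12
rest=$(seq 1 12 | sed -n '10,$p' | paste -sd ' ' -)    # line 10 to the end
skip=$(seq 1 5 | sed -n '4!p' | paste -sd ' ' -)       # everything except line 4

echo "$plus"
echo "$step"
```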

Print lines 3 to 6

$ sed -n '3,6p' /etc/passwd
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/usr/sbin/nologin

Print from line 35 to the end of the file

$ sed -n '35,$p' /etc/passwd
sshd:x:116:65534::/var/run/sshd:/usr/sbin/nologin
mysql:x:117:126:MySQL Server,,,:/nonexistent:/bin/false
uuidd:x:100:101::/run/uuidd:/bin/false
libvirt-qemu:x:118:128:Libvirt Qemu,,,:/var/lib/libvirt:/bin/false
libvirt-dnsmasq:x:119:129:Libvirt Dnsmasq,,,:/var/lib/libvirt/dnsmasq:/bin/false
redis:x:120:130::/var/lib/redis:/bin/false

4.8.19.7. Editing files

-i[SUFFIX], --in-place[=SUFFIX]
                 edit files in place (makes backup if extension supplied)

The following example replaces the string java with php in t.php

		
$ cat t.php
<?java

$ sed -i 's/java/php/g' t.php

$ cat t.php
<?php

		
find -name "*.php" -exec sed -i '/<?.*eval(gzinflate(base64.*?>/ d' '{}' \; -print

Restrict the substitution to a given line number

		
sed -i "7,7 s/#server.host: \"localhost\"/server.host: \"0.0.0.0\"/" /etc/kibana/kibana.yml
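As the option summary says, giving -i a suffix keeps a backup of the original file. A sketch in a throwaway directory (the paths are hypothetical) that repeats the t.php edit with a .bak backup:

```shell
workdir=$(mktemp -d)
printf '<?java\n' > "$workdir/t.php"

# Edit in place; the untouched original is saved as t.php.bak.
sed -i.bak 's/java/php/g' "$workdir/t.php"

edited=$(cat "$workdir/t.php")
backup=$(cat "$workdir/t.php.bak")
echo "$edited / $backup"
rm -rf "$workdir"
```

Writing the suffix directly after -i, with no space, is the form that both GNU and BSD sed accept.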

4.8.19.8. Regular expressions

Regex address: '/regex/'

		
sed -n '/root/p' /etc/passwd
sed -n '/^root/p' /etc/passwd
sed -n '/bash$/p' /etc/passwd
sed -n '/ro.t/p' /etc/passwd
sed -n '/ro*/p' /etc/passwd
sed -n '/[ABC]/p' /etc/passwd
sed -n '/[A-Z]/p' /etc/passwd
sed -n '/[^ABC]/p' /etc/passwd
sed -n '/^[^ABC]/p' /etc/passwd
sed -n '/\<root/p' /etc/passwd
sed -n '/root\>/p' /etc/passwd

Extended regular expressions:

		
		
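sed -n '/root\|yerik/p' /etc/passwd #in basic regular expressions the alternation operator must be escaped as \|
sed -nr '/root|yerik/p' /etc/passwd #-r enables extended regular expressions
sed -nr '/ro(ot|ye)rik/p' /etc/passwd #matches the words rootrik and royerik
sed -nr '/ro?t/p' /etc/passwd   #? matches the preceding character 0 or 1 times
sed -nr '/ro+t/p' /etc/passwd   #+ matches the preceding character 1 or more times
sed -nr '/ro{2}t/p' /etc/passwd   #matches the preceding character exactly 2 times
sed -nr '/ro{2,}t/p' /etc/passwd   #matches the preceding character 2 or more times
sed -nr '/ro{2,4}t/p' /etc/passwd #matches the preceding character 2 to 4 times
sed -nr '/(root)*/p' /etc/passwd   #matches the preceding group 0 or more times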
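The quantifiers are easier to compare on a tiny word list than on /etc/passwd. A sketch with four hypothetical words, using -E (equivalent to -r in GNU sed):

```shell
words='rt
rot
root
rooot'

optional=$(printf '%s\n' "$words" | sed -nE '/ro?t/p' | paste -sd ' ' -)       # zero or one "o"
two_plus=$(printf '%s\n' "$words" | sed -nE '/ro{2,}t/p' | paste -sd ' ' -)    # two or more "o"
bounded=$(printf '%s\n' "$words" | sed -nE '/^ro{1,2}t$/p' | paste -sd ' ' -)  # one or two "o"

echo "$optional | $two_plus | $bounded"
```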

4.8.19.9. Pipelines

		
cat <<! | sed '/aaa=\(bbb\|ccc\|ddd\)/!s/\(aaa=\).*/\1xxx/'
> aaa=bbb
> aaa=ccc
> aaa=ddd
> aaa=[something else]
!
aaa=bbb
aaa=ccc
aaa=ddd
aaa=xxx

4.8.19.10. Case conversion

		
[root@localhost ~]# echo "netkiller" | sed 's/[a-z]/\u&/g'
NETKILLER

[root@localhost ~]# echo "NETKILLER" | sed 's/[A-Z]/\l&/g'
netkiller
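\u and \l change only the next character, which is why the examples above match one letter at a time with the g flag. GNU sed also provides \U and \L, which convert everything that follows. The sketch below assumes GNU sed for the escapes and adds tr as a portable alternative.

```shell
upper=$(echo "netkiller" | sed 's/.*/\U&/')     # whole match to upper case
first=$(echo "netkiller" | sed 's/^./\u&/')     # first character only
lower=$(echo "NETKILLER" | sed 's/.*/\L&/')     # whole match to lower case
portable=$(echo "netkiller" | tr '[:lower:]' '[:upper:]')

echo "$upper $first $lower $portable"
```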

4.8.19.11. perl

sed -i -e 's/aaa/bbb/g' *
perl -p -i -e 's/aaa/bbb/g' *

4.8.19.12. Examples

HTML to text

			
# Remove HTML Tags from a File in Linux
sed 's/<[^>]*>//g ; /^$/d' htmlpage.html
			
# Convert HTML to Text in Linux			
sed 's/<[^>]*>//g ; /^$/d' htmlpage.html > output.txt
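The same command applied to an inline, hypothetical HTML fragment shows the effect: every tag is removed, and lines left empty are dropped. Tags that span more than one line are not handled by this approach.

```shell
html='<html><body>
<p>Hello <b>world</b></p>
<br>
</body></html>'

text=$(printf '%s\n' "$html" | sed 's/<[^>]*>//g ; /^$/d')
echo "$text"   # Hello world
```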
