This is my attempt to get familiar with the various small yet powerful text processing tools in Linux. I am also noting down several one-liners that I found elsewhere.
This is NOT written as a tutorial. For details on usage and options, check the man pages.
tr
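- translate or delete characters
Usage: tr [OPTIONS] SET1 [SET2]
Options include -d (delete), -s (squeeze), -c (complement) etc.
tr supports character classes like [:alnum:], [:alpha:], [:blank:], [:lower:], [:upper:] etc., as well as ranges like a-z and backslash escapes like \n, for SET. It works on single characters and does not interpret a SET as a regular expression.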
[tony@localhost tmp]$ tr g e
grgat
great
#pressing CTRL+D quits
[tony@localhost tmp]$ echo tgst | tr g e
test
#adding quotes around SETs is recommended
[tony@localhost tmp]$ echo taxt | tr 'xa' 'se'
test
#Note the translation, x to s and a to e
[tony@localhost tmp]$ echo abCDef | tr '[:lower:]' '[:upper:]'
ABCDEF
[tony@localhost tmp]$ tr A-Z a-z
abCDe
abcde
[tony@localhost tmp]$ echo test | tr -d 'e'
tst
[tony@localhost tmp]$ echo abc44jk | tr -d 0-9
abcjk
[tony@localhost tmp]$ echo 12d34sS5c | tr -d a-z
1234S5
#uppercase S is outside a-z, so it stays
[tony@localhost tmp]$ echo abCDef | tr -d '[:upper:]'
abef
[tony@localhost tmp]$ tr -cd '[:alnum:]'     #strip all symbols (spaces and newlines too)
[tony@localhost tmp]$ echo ttest | tr -s 't'
test
[tony@localhost tmp]$ echo tteesttt | tr -s 'et'
test
[tony@localhost ~]$ tr -s '\n'     #strip empty lines.
[tony@localhost tmp]$ echo test | tr -c 'e' 'g'
geggg
#the trailing newline from echo is replaced too, hence the fifth character
[tony@localhost tmp]$ tr -cs 'a-zA-Z0-9_' '\n' < source.c
#extract keywords, identifiers and comments from C source. Of course, not perfect!
[tony@localhost tmp]$ tr '[:lower:]' '[:upper:]' <small.txt >big.txt
[tony@localhost tmp]$ tr -d '\r' <windows.txt >linux.txt
[tony@localhost tmp]$ tr '\r' '\n' <mac.txt >linux.txt
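The position-by-position mapping between SET1 and SET2 (x to s, a to e in the example above) is also what makes the well-known ROT13 one-liner work. A minimal sketch; the function name rot13 and the sample text are only illustrative:

```shell
# ROT13: every letter maps to the one 13 places further along, wrapping around.
# SET1 'A-Za-z' and SET2 'N-ZA-Mn-za-m' are matched position by position.
rot13() {
    tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

encoded=$(echo 'Hello, tr' | rot13)
echo "$encoded"

# applying the same mapping a second time gives back the original text
decoded=$(echo "$encoded" | rot13)
echo "$decoded"
```

Characters that are not in SET1 (the comma and the space here) pass through unchanged.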
--------------------
nl
- number lines
[tony@localhost tmp]$ cat > list.txt
aaa
bbb

ccc
[tony@localhost tmp]$ nl list.txt
     1  aaa
     2  bbb

     3  ccc
[tony@localhost tmp]$ nl -ba list.txt
     1  aaa     # -b is numbering style
     2  bbb
     3
     4  ccc
[tony@localhost tmp]$ nl -i10 -nrz -s:: -v10 -w4 list.txt
0010::aaa     #-i increment, -s separator, -v start no, -w width, -n is format
0020::bbb

0030::ccc
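The difference between the default numbering style and -ba only shows up when the input has empty lines. A small sketch using throwaway input (the sample function and its contents are made up for illustration); -w1 and -s: are used only to keep the output compact:

```shell
# throwaway sample: three text lines with one empty line before the last
sample() { printf 'aaa\nbbb\n\nccc\n'; }

# default style numbers only non-empty lines
default_last=$(sample | nl -w1 -s: | tail -n 1)

# -ba numbers every line, including the empty one
all_last=$(sample | nl -ba -w1 -s: | tail -n 1)

echo "$default_last"   # ccc counted among non-empty lines only
echo "$all_last"       # ccc counted among all lines
```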
----------------------
wc
- print line, word, and byte counts
[tony@localhost tmp]$ wc list.txt list2.txt
 3  3 12 list.txt     # the counts are lines, words, bytes
 4  3 13 list2.txt
 7  6 25 total
[tony@localhost tmp]$ wc -m list.txt
12 list.txt     #character count
[tony@localhost tmp]$ wc -L sample.txt
82 sample.txt     #longest line length
[tony@localhost tmp]$ ls -1 | wc -l
13     #no of files in current directory
[tony@localhost tmp]$ ps -e | wc -l
178     #no of processes running (the count includes the ps header line)
[tony@localhost tmp]$ cat /etc/group | wc -l
72
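One thing worth keeping in mind with wc -l is that it counts newline characters, so a last line without a terminating newline is not counted. A quick sketch with made-up input; the $(( )) wrapper is only there to strip any padding some wc implementations add:

```shell
# two lines, both terminated by a newline
with_newline=$(( $(printf 'aaa\nbbb\n' | wc -l) ))

# same text, but the last line has no terminating newline
without_newline=$(( $(printf 'aaa\nbbb' | wc -l) ))

echo "ends with newline: $with_newline"
echo "no final newline:  $without_newline"
```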
----------------------
cat
- concatenate or write files, text or binary
[tony@localhost tmp]$ cat > list.txt
aaa
bbb
ccc
#press CTRL+D to stop inputting and save file
[tony@localhost tmp]$ cat list.txt
aaa
bbb
ccc
[tony@localhost tmp]$ cat f1 f2
aaa
bbb
[tony@localhost tmp]$ cat f1 f2 > f3
[tony@localhost tmp]$ cat f3
aaa
bbb
[tony@localhost tmp]$ cat >> f3
ddd
#append input to file
[tony@localhost tmp]$ cat f1 - f2 > f3
this is what i entered in the middle
[tony@localhost tmp]$ cat f3
aaa
this is what i entered in the middle
bbb
[tony@localhost tmp]$ cat video.001 video.002 video.003 > vid.avi     # joining split binary files
----------------------------
tac
- concatenate or print in reverse (last line first)
[tony@localhost tmp]$ tac > f1
1
2
#press CTRL+D to stop
[tony@localhost tmp]$ tac f1
1
2
[tony@localhost tmp]$ tac vid.avi > vid2.avi ; mv vid2.avi vid.avi     #it won't play
[tony@localhost tmp]$ tac vid.avi > vid2.avi ; mv vid2.avi vid.avi     # back to playable.
#This is like a crude obfuscation for binary files, not real encryption.
#The round trip is byte-exact only if the file ends with a newline byte.
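Running tac twice restores the input only when the last line ends with a newline, because tac treats the newline as belonging to the end of each record. A small sketch with made-up two-line input that shows both cases:

```shell
# input that ends with a newline survives tac twice
ok=$(printf 'a\nb\n' | tac | tac)

# without a final newline the last record has no separator,
# so tac prints it glued to the record that comes before it
glued=$(printf 'a\nb' | tac)

printf '%s\n' "$ok"
printf '%s\n' "$glued"
```

This is why reversing an arbitrary binary file twice is not guaranteed to give back identical bytes.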
----------------------------
rev
- reverse the characters of each line of a file or files
[tony@localhost tmp]$ echo this is to be reversed | rev
desrever eb ot si siht
[tony@localhost ~]$ rev .bash_profile
eliforp_hsab. #
snoitcnuf dna sesaila eht teG #
neht ;] crhsab./~ f- [ fi
crhsab./~ .
if
smargorp putrats dna tnemnorivne cificeps resU #
nib/EMOH$:HTAP$=HTAP
HTAP tropxe
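Since rev works on the characters within a line, it can be used for a quick palindrome check. A minimal sketch; is_palindrome is a hypothetical helper name, and the comparison is case-sensitive:

```shell
# hypothetical helper: succeeds when the argument reads the same backwards
is_palindrome() {
    [ "$1" = "$(printf '%s\n' "$1" | rev)" ]
}

is_palindrome level && echo "level: yes"
is_palindrome linux || echo "linux: no"
```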
----------------------------
cut
- remove sections from each line of files
cut can return sections based on the number of bytes (-b), characters (-c), or fields (-f) when fields are separated by a delimiter (-d). The default delimiter is tab.
A range must be provided in each case, which consists of one of N, N-M, N- (N to last) or -M (first to M). Several ranges can be combined with commas.
[tony@localhost tmp]# cut -c 2-4
abcdef     #press CTRL+D to stop inputting
bcd
[tony@localhost tmp]# cut -c 3
abcdef
c
[tony@localhost tmp]$ cut -c 2,4,7
alongtext
lne
[tony@localhost tmp]# cut -c -2
abcdef
ab
[tony@localhost tmp]# cut -c 2-
abcdef
bcdef
[tony@localhost tmp]$ cut -c 1,6-9,16-
alongtextwithnospaces
atextspaces
[tony@localhost tmp]# cut -f 2- -d ':'
23:34:45:56     # -d specifies delimiter
34:45:56
[tony@localhost tmp]$ cut -f 2
er rt fg wd ji
er rt fg wd ji
#cut didn't find the delimiter (default is tab)
#so it returns the whole line
[tony@localhost tmp]$ cut -f 2 -s
er rt fg wd ji
#cut won't print anything, as the -s flag is used to
#prevent printing when the delimiter is not found.
[tony@localhost tmp]$ cut -d: -f1 /etc/passwd >users.txt
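cut can only count fields from the start of a line, so a common trick for getting the last field is to combine it with rev. A sketch under that assumption; last_field is a hypothetical helper name and the path is just sample input:

```shell
# reverse the line, take field 1 (originally the last field),
# then reverse that field back into reading order
last_field() {
    rev | cut -d "$1" -f 1 | rev
}

name=$(echo '/usr/local/bin/tool' | last_field /)
echo "$name"

last=$(echo '23:34:45:56' | last_field :)
echo "$last"
```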
-----------------------------
Continued on Linux text processing tools - Part 2