Linux text processing tools – Part 1

This is my attempt to get familiar with the various small yet powerful text processing tools in Linux. I am also noting down several one-liners that I found elsewhere.

This is NOT written as a tutorial. For details on usage and options, check the man pages.

tr – translate or delete characters

Usage: tr [OPTIONS] SET1 [SET2]

Options include -d (delete), -s (squeeze), -c (complement) etc.

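tr supports character classes such as [:alnum:], [:alpha:], [:blank:], [:lower:] and [:upper:], ranges such as a-z, and backslash escapes such as \n and \r in a SET. It does not understand regular expressions; each SET is simply a list of characters.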

[tony@localhost tmp]$ tr g e
grgat
ereat   #every g becomes e, including the first one; pressing CTRL+D quits
[tony@localhost tmp]$ echo tgst | tr g e
test   #quoting the SETs is recommended
[tony@localhost tmp]$ echo taxt | tr 'xa' 'se'
test   #Note the translation, x to s and a to e
[tony@localhost tmp]$ echo abCDef | tr '[:lower:]' '[:upper:]'
ABCDEF
[tony@localhost tmp]$ tr A-Z a-z
abCDe
abcde
[tony@localhost tmp]$ echo test | tr -d 'e'
tst
[tony@localhost tmp]$ echo abc44jk | tr -d  0-9
abcjk
[tony@localhost tmp]$ echo 12d34sS5c | tr -d  a-z
1234S5   #uppercase S is not in a-z, so it stays
[tony@localhost tmp]$ echo abCDef | tr -d '[:upper:]'
abef
[tony@localhost tmp]$ tr -cd '[:alnum:]'   #delete everything except letters and digits (spaces and newlines go too)
[tony@localhost tmp]$ echo ttest | tr -s 't' 
test
[tony@localhost tmp]$ echo tteesttt | tr -s 'et' 
test
[tony@localhost ~]$ tr -s '\n' #squeeze runs of newlines into one, which strips empty lines
[tony@localhost tmp]$ echo test | tr -c 'e' 'g'
geggg   #the trailing newline is replaced too, so no line break follows
[tony@localhost tmp]$ tr -cs 'a-zA-Z0-9_' '\n' < source.c
   #extract keywords, identifiers and words from comments in C source, one per line. Of course, not perfect!
[tony@localhost tmp]$ tr '[:lower:]' '[:upper:]' <small.txt  >big.txt
[tony@localhost tmp]$ tr -d '\r' <windows.txt  >linux.txt
[tony@localhost tmp]$ tr '\r' '\n' <mac.txt  >linux.txt
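
Since SET1 and SET2 are matched position by position, tr can also do simple substitution ciphers. Below is a minimal sketch of the classic ROT13 one-liner; the function name rot13 is my own choice for the example, not something tr provides.

```shell
# rot13 is a hypothetical helper name used only for this example.
# Each letter in SET1 maps to the letter 13 places later in SET2,
# wrapping around, so applying it twice gives back the original text.
rot13() {
    tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

encoded=$(printf '%s\n' 'Hello, World!' | rot13)
decoded=$(printf '%s\n' "$encoded" | rot13)

printf '%s\n' "$encoded"   # Uryyb, Jbeyq!
printf '%s\n' "$decoded"   # Hello, World!
```

Characters outside the sets (the comma, space and exclamation mark here) pass through unchanged.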

 

--------------------

nl - number lines

[tony@localhost tmp]$ cat > list.txt
aaa
bbb

ccc
[tony@localhost tmp]$ nl list.txt 
     1	aaa
     2	bbb

     3	ccc
[tony@localhost tmp]$ nl -ba list.txt 
     1	aaa # -ba numbers all lines, including empty ones (-b sets the numbering style)
     2	bbb
     3	
     4	ccc
[tony@localhost tmp]$ nl -i10 -nrz -s:: -v10 -w4 list.txt 
0010::aaa  #-i increment, -s separator, -v starting number, -w width, -n number format (rz: right justified, zero padded)
0020::bbb
      
0030::ccc
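
The same options work when nl reads from a pipe, which is handy for quick numbered lists. A small sketch, assuming the same four-line input as list.txt above (fed with printf so it does not depend on any file); the variable name numbered is just for illustration.

```shell
# Number every line, blank ones included (-ba), with a 2-character
# wide number (-w2) followed by ". " as the separator (-s).
numbered=$(printf 'aaa\nbbb\n\nccc\n' | nl -ba -w2 -s'. ')

printf '%s\n' "$numbered"
#  1. aaa
#  2. bbb
#  3. 
#  4. ccc
```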

 

----------------------

wc - print line, word, and byte counts

[tony@localhost tmp]$ wc list.txt list2.txt 
 3  3 12 list.txt   # the counts are lines, words, bytes
 4  3 13 list2.txt
 7  6 25 total
[tony@localhost tmp]$ wc -m list.txt
12 list.txt   #character count
[tony@localhost tmp]$ wc -L sample.txt
82 sample.txt   #longest line length
[tony@localhost tmp]$ ls -1 | wc -l
13   #number of entries in the current directory (hidden files are not counted)
[tony@localhost tmp]$ ps -e | wc -l
178   #number of running processes, plus one for the header line
[tony@localhost tmp]$ cat /etc/group | wc -l
72
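
When wc reads from standard input it prints only the numbers, without a file name, which makes the result easy to store in a variable. A minimal sketch using the same four-line sample as before; the variable names are mine, and the tr -d ' ' is there only because some wc implementations pad the number with spaces.

```shell
# Count lines, words and bytes of a small sample read from a pipe.
lines=$(printf 'aaa\nbbb\n\nccc\n' | wc -l | tr -d ' ')
words=$(printf 'aaa\nbbb\n\nccc\n' | wc -w | tr -d ' ')
bytes=$(printf 'aaa\nbbb\n\nccc\n' | wc -c | tr -d ' ')

echo "$lines lines, $words words, $bytes bytes"   # 4 lines, 3 words, 13 bytes
```

The blank line counts as a line (it ends with a newline) but contributes no word.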

 

----------------------

cat - concatenate or write files, text or binary

[tony@localhost tmp]$ cat > list.txt
aaa

bbb
ccc   #press CTRL+D to stop inputting and save file
[tony@localhost tmp]$ cat list.txt 
aaa

bbb
ccc
[tony@localhost tmp]$ cat f1 f2
aaa
bbb
[tony@localhost tmp]$ cat f1 f2 > f3
[tony@localhost tmp]$ cat f3
aaa
bbb
[tony@localhost tmp]$ cat >> f3
ddd   #append input to file
[tony@localhost tmp]$ cat f1 - f2 > f3 
this is what i entered in the middle
[tony@localhost tmp]$ cat f3
aaa
this is what i entered in the middle
bbb
[tony@localhost tmp]$ cat video.001 video.002 video.003 > vid.avi
  # joining split binary files 
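
The "-" argument also works non-interactively: it stands for standard input, so text from a pipe can be spliced between two files. A small sketch that builds its own f1 and f2 in a scratch directory (workdir and joined are names I made up for the example).

```shell
workdir=$(mktemp -d)              # scratch directory, removed at the end
printf 'aaa\n' > "$workdir/f1"
printf 'bbb\n' > "$workdir/f2"

# "-" is replaced by whatever arrives on standard input (here: the pipe)
printf 'middle\n' | cat "$workdir/f1" - "$workdir/f2" > "$workdir/f3"

joined=$(cat "$workdir/f3")
printf '%s\n' "$joined"
# aaa
# middle
# bbb

rm -rf "$workdir"
```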

 

----------------------------

tac - concatenate or print in reverse (last line first)

[tony@localhost tmp]$ tac > f1
1
2   #press CTRL+D to stop; f1 now holds the lines reversed (2, then 1)
[tony@localhost tmp]$ tac f1
1
2
[tony@localhost tmp]$ tac vid.avi > vid2.avi ; mv vid2.avi vid.avi #it won't play
[tony@localhost tmp]$ tac vid.avi > vid2.avi ; mv vid2.avi vid.avi #back to playable
  #This works as a crude obfuscation for binary files, not real encryption.
  #The round trip is only guaranteed to be exact if the file ends with a newline byte.
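
Running tac twice gives back the original when the data ends with a newline. A minimal sketch with three short lines; the variable names are only for illustration. As far as I can tell, GNU tac glues a final line that has no trailing newline onto its neighbour, so I would not rely on this round trip for arbitrary binary data.

```shell
original=$(printf '1\n2\n3\n')
reversed=$(printf '1\n2\n3\n' | tac)         # last line first
restored=$(printf '1\n2\n3\n' | tac | tac)   # reversed twice

printf '%s\n' "$reversed"
# 3
# 2
# 1

# Succeeds only when the double reversal matches the input
[ "$restored" = "$original" ] && echo "round trip ok"
```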

 

----------------------------

rev - reverse the characters of each line of a file or files

[tony@localhost tmp]$ echo this is to be reversed | rev
desrever eb ot si siht
[tony@localhost ~]$ rev .bash_profile 
eliforp_hsab. #

snoitcnuf dna sesaila eht teG #
neht ;] crhsab./~ f- [ fi
crhsab./~ .	
if

smargorp putrats dna tnemnorivne cificeps resU #

nib/EMOH$:HTAP$=HTAP

HTAP tropxe
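
Because rev works on the characters within a line, one small use is a palindrome check. A sketch, with is_palindrome being a helper name I made up for this example.

```shell
# is_palindrome is a hypothetical helper: it succeeds when the word
# reads the same after rev has reversed its characters.
is_palindrome() {
    [ "$(printf '%s\n' "$1" | rev)" = "$1" ]
}

if is_palindrome level; then result_level=yes; else result_level=no; fi
if is_palindrome linux; then result_linux=yes; else result_linux=no; fi

echo "level: $result_level, linux: $result_linux"   # level: yes, linux: no
```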

 

----------------------------

cut - remove sections from each line of files

cut can return sections selected by byte (-b), character (-c), or field (-f), where fields are separated by a delimiter (-d). The default delimiter is tab.

A list must be provided in each case, made up of one or more of N, N-M, N- (N to last) or -M (first to M), separated by commas.

[tony@localhost tmp]# cut -c 2-4 
abcdef #press CTRL+D to stop inputting
bcd
[tony@localhost tmp]# cut -c 3
abcdef
c
[tony@localhost tmp]$ cut -c 2,4,7
alongtext 
lne
[tony@localhost tmp]# cut -c -2
abcdef
ab
[tony@localhost tmp]# cut -c 2-
abcdef
bcdef
[tony@localhost tmp]$ cut -c 1,6-9,16-
alongtextwithnospaces
atextspaces
[tony@localhost tmp]# cut -f 2- -d ':'
23:34:45:56 # -d specifies delimiter
34:45:56
[tony@localhost tmp]$ cut -f 2
er rt fg wd ji      
er rt fg wd ji   #cut didn't find the delimiter (default is tab)
   #so it returns the whole line
[tony@localhost tmp]$ cut -f 2 -s
er rt fg wd ji    #cut prints nothing here, because -s
   #suppresses lines that contain no delimiter.
[tony@localhost tmp]$ cut -d: -f1 /etc/passwd >users.txt
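
cut counts fields from the left only, so there is no range for "the last field". A common workaround is to reverse the line, cut the first field, and reverse it back. A minimal sketch with a made-up dotted string; the variable names are just for the example.

```shell
path='usr.local.share.doc'

# -f-2 selects the first two fields of a "." separated line
first_two=$(printf '%s\n' "$path" | cut -d. -f-2)

# rev | cut -f1 | rev picks the last field, however many there are
last=$(printf '%s\n' "$path" | rev | cut -d. -f1 | rev)

echo "$first_two"   # usr.local
echo "$last"        # doc
```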

 

-----------------------------

Continued on Linux text processing tools - Part 2
