Linux text processing tools – Part 2

This is my attempt to get familiar with the various text processing tools in Linux. I am also noting down several one-liners that I found elsewhere.
It's a continuation of my earlier post, Linux text processing tools – Part 1.

This is NOT written as a tutorial. For details on usage and options check man pages.

sort – sort lines of text files

Sorting is done based on one or more sort keys extracted from each line of input.
If no sort keys are specified, the entire line is taken as key.

Some of the options for sort are:
-f (ignore case), -b (ignore leading blanks), -n (sort numerically), -r (reverse), -u (unique), -tx (where x is the field delimiter), -k (specify sort keys), -o (output file)

[tony@localhost tmp]$ sort List.txt 
[tony@localhost tmp]$ sort -n age.txt
[tony@localhost tmp]$ sort -bn
  14   #press CTRL+D to stop inputting
[tony@localhost tmp]$ sort -c list.txt
sort: list.txt:2: disorder: aaa   #check whether sorted
[tony@localhost tmp]$ sort -m list.txt list2.txt 
   #merge sorted files, each file must be sorted previously.

[tony@localhost tmp]$ sort list.txt -o list.txt     #sorts and saves in the same file
[tony@localhost tmp]$ sort -R list.txt -o list.txt    #shuffle a list of lines (identical lines stay grouped together; see shuf below)

Sort keys (or fields) are specified using the -k option with the syntax
-k m[,n] (start at field m and end at field n inclusive, or at the end of the line if n is omitted).
By default, fields are separated by blanks. A different separator can be specified using -tx.

Note that several of the ordering options (such as b, n and r) can also be appended to a key, where they apply to that key only.

[tony@localhost tmp]$ sort -k 3b   
   #sort using key from 3rd field to end of line. #Ignore any blanks preceding 3rd field.
[tony@localhost tmp]$ sort -k2n,2   
   #this will sort numerically based on the 2nd field.

[tony@localhost tmp]$ sort -t : -k 2,2n -k 5.3,5.4 
   #Sort numerically on the second field and then sort alphabetically on the third 
   #and fourth characters of field five to break tie. Use `:' as the field delimiter.

[tony@localhost tmp]$ sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4 IP.txt
   #sort a list of IPv4 addresses.
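
To see the key syntax in action, here is a small sketch on made-up data. The records (name:age:city) and the function names are hypothetical and only there for illustration.

```shell
# Hypothetical sample records in the form name:age:city
people() {
  printf '%s\n' 'tom:19:Pune' 'john:20:Delhi' 'anu:19:Agra'
}

# Sort numerically on field 2, then alphabetically on field 1 to break ties.
sort_people() {
  people | LC_ALL=C sort -t : -k 2,2n -k 1,1
}

# Sort dotted IPv4 addresses octet by octet (same command as above, fed from printf).
sort_ips() {
  printf '%s\n' '10.0.0.12' '10.0.0.2' '9.255.1.1' |
    sort -n -t . -k 1,1 -k 2,2 -k 3,3 -k 4,4
}

sort_people
sort_ips
```

Without the per-octet keys, a plain sort would place 10.0.0.12 before 10.0.0.2, which is why each octet gets its own numeric key.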

Note: if you sometimes see a strange sorting order (check the example below), this is usually because the Linux locale is set to a non-POSIX locale. You can prefix sort (or any command) with ‘LC_ALL=C’ to get the POSIX order.

[tony@localhost tmp]$ sort list
1 1
2 2
[tony@localhost tmp]$ LC_ALL=C sort list
1 1
2 2
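
A quick way to see the POSIX order is to sort a few mixed-case letters. This is only a sketch; the helper name posix_sort is made up. In the C locale, lines are compared byte by byte, so uppercase ASCII letters come before lowercase ones.

```shell
# Sort stdin in the C/POSIX locale (plain byte order).
posix_sort() {
  LC_ALL=C sort
}

# Uppercase letters have smaller byte values than lowercase ones,
# so they are listed first.
printf '%s\n' b A a B | posix_sort
```

What a plain sort prints for the same input depends on the locale in use, which is exactly the effect described above.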



shuf – shuffle lines

[tony@localhost tmp]$ sort LIST | shuf
333   #shuf shuffles our sorted LIST
[tony@localhost tmp]$ shuf -i 1-4
2   #generates a shuffled list of numbers
[tony@localhost tmp]$ shuf -e clubs hearts diamonds spades
[tony@localhost tmp]$ shuf LIST -o LIST   #in place save



uniq – Uniquify files

uniq discards adjacent duplicate lines. Non-adjacent duplicate lines are not discarded. To discard those as well, sort the file first, or use sort -u.

[tony@localhost tmp]$ cat list
[tony@localhost tmp]$ uniq list
[tony@localhost tmp]$ uniq -c list
      1 aaa   #prefixes repetition count
      2 bbb
      1 ccc
      1 bbb
[tony@localhost tmp]$ uniq -d list
bbb   #only repeated lines
[tony@localhost tmp]$ uniq -i list2
ccc   #ignores case when comparing

Fields or characters can be ignored before comparison, by using the -fn and -sn options.

[tony@localhost tmp]$ cat > list2
aa bb
ad bb
ad cc
ad dd
[tony@localhost tmp]$ uniq -f 1 list2
aa bb   #we skipped first field from comparison check
ad cc
ad dd
[tony@localhost tmp]$ uniq -s 4 list2
aa bb   #we skipped the first 4 characters
ad cc
ad dd
[tony@localhost tmp]$ uniq -s 2 -w 1 list2
aa bb   #-w option specifies number of characters to compare after
   # any characters and fields have been skipped

[tony@localhost tmp]$ sort file | uniq -c | sort -n
#displays the unique lines along with the number of times they occur
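
The same pipeline can be turned into a small "most frequent lines" helper. This is a sketch only; top_lines is a made-up name, and the awk step is just there to strip the padding that uniq -c puts before the count.

```shell
# Print the n most frequent lines of stdin as "count line" (default n = 3).
top_lines() {
  n=${1:-3}
  LC_ALL=C sort | uniq -c | LC_ALL=C sort -k 1,1nr -k 2 | head -n "$n" |
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print c, $0 }'
}

printf '%s\n' bbb aaa bbb ccc bbb aaa | top_lines 2
```

The first sort is what makes uniq see all duplicates as adjacent; the second sort orders the result by count, highest first.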



comm – Compare two sorted files line by line

comm outputs three columns: the first two contain lines unique to the first and second file, respectively. The last column contains lines common to both.
The columns are separated by tabs. Both files must be sorted beforehand.

Options -1, -2 and -3 can be specified to remove the corresponding column from the output.

[tony@localhost tmp]$ cat > list
bbb   # press CTRL+D to stop inputting
[tony@localhost tmp]$ cat > list2
[tony@localhost tmp]$ comm list list2
		bbb   #bbb is common to both
	bbb           #this bbb is only in list2
[tony@localhost tmp]$ comm -32 list list2
aaa   #only the lines unique in first file.
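
Combining the column options gives simple set operations on sorted files. The sketch below uses throwaway files in a temporary directory; the file contents and function names are made up for illustration.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
printf '%s\n' aaa bbb ccc > "$dir/list"
printf '%s\n' bbb ccc ddd > "$dir/list2"

only_first()  { LC_ALL=C comm -23 "$1" "$2"; }   # lines unique to file 1
only_second() { LC_ALL=C comm -13 "$1" "$2"; }   # lines unique to file 2
in_both()     { LC_ALL=C comm -12 "$1" "$2"; }   # lines common to both

only_first  "$dir/list" "$dir/list2"
only_second "$dir/list" "$dir/list2"
in_both     "$dir/list" "$dir/list2"
```

LC_ALL=C is used so that comm expects the same order that the sample files are written in.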



head – Output the first part (10 lines) of files

head supports the options -nK and -cK, where K is the number of lines or bytes to be printed.

[tony@localhost ~]$ head -n5 .bash_history #prints 1st 5 lines
su -
/etc/init.d/NetworkManager restart
/etc/init.d/NetworkManager status
tail -f /var/log/messages 
[tony@localhost ~]$ head -c10 .bash_history   #1st 10 bytes
su -
[tony@localhost ~]$ head -n-5 .bash_history #prints all but last 5 lines



tail – Output the last part (10 lines) of files

tail supports -nK and -cK as in the case of head.
tail -f keeps the file open and prints data as it is appended to the end of the file (when polling, it checks once every second by default).

[root@localhost ~]# tail -f /var/log/messages
   # is used to scan system messages especially when isolating a problem
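
head and tail are often chained to pull a range of lines out of the middle of a file. A minimal sketch, with a made-up helper name; it relies on tail -n +K, which prints from line K onwards.

```shell
# Print lines first..last (inclusive) of a file, or of stdin if no file is given.
line_range() {
  first=$1
  last=$2
  shift 2
  head -n "$last" "$@" | tail -n "+$first"
}

seq 1 10 | line_range 4 6
```

head cuts everything after the last wanted line, and tail then drops everything before the first wanted one.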


split – Split a file into pieces

Usage: split [OPTION] [INPUT [PREFIX]]

split is generally used to split a file based on lines (-l LINES) or bytes (-b SIZE).
By default, files are split every 1000 lines. The output files are named by appending aa, ab, ac, etc. to PREFIX (if omitted, x is used).

[tony@localhost tmp]$ cat > file
kkk   # Press CTRL+D to stop inputting
[tony@localhost tmp]$ split -l2 file   #split into files with 2 lines
[tony@localhost tmp]$ head xa?   # viewing each of them
==> xaa <==

==> xab <==

==> xac <==

[tony@localhost tmp]$ split -b100KB thriller.ogg part 
   # splitting music file by size 100KB
[tony@localhost tmp]$ ls -l parta?
-rw-rw-r--. 1 tony tony 100000 Oct 12 14:47 partaa
-rw-rw-r--. 1 tony tony 100000 Oct 12 14:47 partab
-rw-rw-r--. 1 tony tony   9305 Oct 12 14:47 partac
[tony@localhost tmp]$ cat parta? > thriller2.ogg
   # Joining them back
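
When joining pieces back, it is worth checking that the result is identical to the original. The sketch below does this with cmp on a throwaway file; the random test file and its size are made up for illustration.

```shell
dir=$(mktemp -d)

# Hypothetical stand-in for a binary file: 2500 random bytes.
head -c 2500 /dev/urandom > "$dir/orig.bin"

# Split into 1000-byte pieces named partaa, partab, partac.
split -b 1000 "$dir/orig.bin" "$dir/part"

# The shell expands part?? in sorted order, so the pieces are joined in sequence.
cat "$dir"/part?? > "$dir/joined.bin"

# cmp is silent and returns 0 when both files are byte-for-byte identical.
cmp "$dir/orig.bin" "$dir/joined.bin" && echo "identical"
```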



paste – merge lines of files

paste joins files horizontally: each output line consists of the sequentially corresponding lines of each file specified, separated by tabs.

[tony@localhost tmp]$ cat > Name
tom   # Press CTRL+D to stop inputting
[tony@localhost tmp]$ cat > Age
[tony@localhost tmp]$ paste Name Age
john	20
tom	19

The -s option pastes the lines of one file at a time rather than one line from each file.

[tony@localhost tmp]$ paste -s Name Age
john	tom
20	19

A delimiter list can be specified using -d; the delimiters in the list are used in turn.

[tony@localhost tmp]$ paste -d ':-' Name Age Name
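
To show how the delimiter list is applied, here is a small sketch with the same Name and Age contents as above, created in a temporary directory. The second command is an extra illustration of -s with a single delimiter.

```shell
dir=$(mktemp -d)
printf '%s\n' john tom > "$dir/Name"
printf '%s\n' 20 19    > "$dir/Age"

# ':' goes between columns 1 and 2, '-' between columns 2 and 3.
paste -d ':-' "$dir/Name" "$dir/Age" "$dir/Name"

# With -s, all lines of the input are joined into one line using '+'.
seq 1 5 | paste -s -d + -
```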



join – join lines of two files on a common field

Usage: join [OPTION] FILE1 FILE2

join outputs a line for each pair of input lines with identical join fields. Each output line consists of the join field, the remaining fields from FILE1, then the remaining fields from FILE2.
The default join field is the first, delimited by whitespace; leading blanks on the line are ignored.
Both files have to be sorted on the join field.

[tony@localhost tmp]$ cat > j1
a ac
b bc
c cd   # Press CTRL+D to stop inputting
[tony@localhost tmp]$ cat > j2
a acc
b bcc
c cdd
[tony@localhost tmp]$ join j1 j2
a ac acc
b bc bcc
c cd cdd
[tony@localhost tmp]$ cat >> j2
c dcccc
[tony@localhost tmp]$ join j1 j2
a ac acc
b bc bcc
c cd cdd
c cd dcccc

join supports options like -1 N (join field in the first file), -2 N (join field in the second file), -i (ignore case), -t x (field separator for both input and output) and -v N (print the unpairable lines of file N, where N is 1 or 2, instead of the joined lines).

[tony@localhost tmp]$ join -1 2 -2 1 -t '-' file1 file2
   # this joins on the 2nd field of the first file and the 1st field of the second file, and uses - as the field separator for input and output
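
Since file1 and file2 are not shown, here is a sketch of the same command on made-up data. The file contents are hypothetical; note that file1 is sorted on its 2nd field and file2 on its 1st, as join requires.

```shell
dir=$(mktemp -d)
printf '%s\n' '1-apple' '2-banana' '3-cherry'              > "$dir/file1"
printf '%s\n' 'apple-red' 'banana-yellow' 'durian-green'   > "$dir/file2"

# Join field 2 of file1 with field 1 of file2, using '-' as the separator.
# Each output line is: join field, rest of file1 line, rest of file2 line.
join_fruits() {
  LC_ALL=C join -1 2 -2 1 -t '-' "$dir/file1" "$dir/file2"
}

join_fruits
```

cherry and durian have no partner in the other file, so they do not appear in the output.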



Linux text processing tools – Part 1

This is my attempt to get familiar with the various small yet powerful text processing tools in Linux. I am also noting down several one-liners that I found elsewhere.

This is NOT written as a tutorial. For details on usage and options check man pages.

tr – translate or delete characters

Usage: tr [OPTIONS] SET1 [SET2]

Options include -d (delete), -s (squeeze), -c (complement) etc.

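tr supports character classes such as [:alnum:], [:alpha:], [:blank:], [:lower:] and [:upper:], as well as ranges (a-z) and backslash escapes (\n, \r) in a SET. Note that these are character sets, not regular expressions.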

[tony@localhost tmp]$ tr g e
great   #pressing CTRL+D quits
[tony@localhost tmp]$ echo tgst | tr g e
test   #quoting the SETs is recommended
[tony@localhost tmp]$ echo taxt | tr 'xa' 'se'
test   #Note the translation, x to s and a to e
[tony@localhost tmp]$ echo abCDef | tr '[:lower:]' '[:upper:]'
[tony@localhost tmp]$ tr A-Z a-z
[tony@localhost tmp]$ echo test | tr -d 'e'
[tony@localhost tmp]$ echo abc44jk | tr -d  0-9
[tony@localhost tmp]$ echo 12d34sS5c | tr -d  a-z
[tony@localhost tmp]$ echo abCDef | tr -d '[:upper:]'
[tony@localhost tmp]$ tr -cd '[:alnum:]'   #strip all symbols
[tony@localhost tmp]$ echo ttest | tr -s 't' 
[tony@localhost tmp]$ echo tteesttt | tr -s 'et' 
[tony@localhost ~]$ tr -s '\n' #strip empty lines. 
[tony@localhost tmp]$ echo test | tr -c 'e' 'g'
[tony@localhost tmp]$ tr -cs 'a-zA-Z0-9_' '\n'< source.c
   #extract keywords, identifiers and comments from C source. Of course, not perfect!
[tony@localhost tmp]$ tr '[:lower:]' '[:upper:]' <small.txt  >big.txt
[tony@localhost tmp]$ tr -d '\r' <windows.txt  >linux.txt
[tony@localhost tmp]$ tr '\r' '\n' <mac.txt  >linux.txt
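
Several of these tr idioms combine nicely with sort and uniq. As a sketch (word_freq is a made-up name), the pipeline below counts how often each word occurs in its input.

```shell
# Count word frequencies in stdin, most frequent first.
word_freq() {
  tr -cs '[:alpha:]' '\n' |        # turn every run of non-letters into one newline
    tr '[:upper:]' '[:lower:]' |   # fold to lower case
    LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn
}

echo 'The cat and the hat. The end!' | word_freq | head -n 1
```

The -c option complements the set, so everything that is not a letter is translated, and -s squeezes the resulting runs of newlines into one.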



nl - number lines

[tony@localhost tmp]$ cat > list.txt

[tony@localhost tmp]$ nl list.txt 
     1	aaa
     2	bbb

     3	ccc
[tony@localhost tmp]$ nl -ba list.txt 
     1	aaa # -b is numbering style
     2	bbb
     4	ccc
[tony@localhost tmp]$ nl -i10 -nrz -s:: -v10 -w4 list.txt 
0010::aaa  #-i increment, -s separator, -v start no, -w width, -n is Format



wc - print line, word, and byte counts

[tony@localhost tmp]$ wc list.txt list2.txt 
 3  3 12 list.txt   # the counts are line, word, bytes
 4  3 13 list2.txt
 7  6 25 total
[tony@localhost tmp]$ wc -m list.txt
12 list.txt   #character count
[tony@localhost tmp]$ wc -L sample.txt
82 sample.txt   #longest line length
[tony@localhost tmp]$ ls -1 | wc -l
13   #no of files in current directory
[tony@localhost tmp]$ ps -e | wc -l
178   #no of processes running
[tony@localhost tmp]$ cat /etc/group | wc -l



cat - concatenate or write files, text or binary

[tony@localhost tmp]$ cat > list.txt

ccc   #press CTRL+D to stop inputting and save file
[tony@localhost tmp]$ cat list.txt 

[tony@localhost tmp]$ cat f1 f2
[tony@localhost tmp]$ cat f1 f2 > f3
[tony@localhost tmp]$ cat f3
[tony@localhost tmp]$ cat >> f3
ddd   #append input to file
[tony@localhost tmp]$ cat f1 - f2 > f3 
this is what i entered in the middle
[tony@localhost tmp]$ cat f3
this is what i entered in the middle
[tony@localhost tmp]$ cat video.001 video.002 video.003 > vid.avi
  # joining split binary files 



tac - concatenate or print in reverse (last line first)

[tony@localhost tmp]$ tac > f1
2   #press CTRL+D to stop
[tony@localhost tmp]$ tac f1
[tony@localhost tmp]$ tac vid.avi > vid2.avi ; mv vid2.avi vid.avi #it won't play
[tony@localhost tmp]$ tac vid.avi > vid2.avi ; mv vid2.avi vid.avi # back to playable. 
  #This works like a crude obfuscation for binary files (it is not real encryption).



rev — reverse lines of a file or files

[tony@localhost tmp]$ echo this is to be reversed | rev
desrever eb ot si siht
[tony@localhost ~]$ rev .bash_profile 
eliforp_hsab. #

snoitcnuf dna sesaila eht teG #
neht ;] crhsab./~ f- [ fi
crhsab./~ .	

smargorp putrats dna tnemnorivne cificeps resU #


HTAP tropxe



cut - remove sections from each line of files

cut can return sections based on the number of bytes (-b), characters (-c), or fields (-f), where fields are separated by a delimiter (-d). The default delimiter is tab.

A range must be provided in each case, consisting of one of N, N-M, N- (N to last) or -M (first to M).

[tony@localhost tmp]# cut -c 2-4 
abcdef #press CTRL+D to stop inputting
[tony@localhost tmp]# cut -c 3
[tony@localhost tmp]$ cut -c 2,4,7
[tony@localhost tmp]# cut -c -2
[tony@localhost tmp]# cut -c 2-
[tony@localhost tmp]$ cut -c 1,6-9,16-
[tony@localhost tmp]# cut -f 2- -d ':'
23:34:45:56 # -d specifies delimiter
[tony@localhost tmp]$ cut -f 2
er rt fg wd ji      
er rt fg wd ji   #cut didn't find the delimiter (default is tab) 
   #so it returns the whole line
[tony@localhost tmp]$ cut -f 2 -s
er rt fg wd ji    #cut won't print anything, as the -s flag is used to
   # suppress lines that do not contain the delimiter.
[tony@localhost tmp]$ cut -d: -f1 /etc/passwd >users.txt
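
Field lists work the same way as character lists. The sketch below runs cut on two made-up passwd-style records instead of the real /etc/passwd, and also shows the effect of -s on a line without the delimiter.

```shell
# Hypothetical passwd-style records.
passwd_sample() {
  printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
                'tony:x:1000:1000:Tony:/home/tony:/bin/bash'
}

# Fields 1 and 7: login name and shell.
passwd_sample | cut -d : -f 1,7

# No tab in the input, so with -s nothing is printed.
echo 'er rt fg wd ji' | cut -f 2 -s
```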



Continued on Linux text processing tools - Part 2

Hybrid graphics in Sony VAIO and Fedora 15

This is a report on how I solved a radeon driver related booting problem and automatically switched OFF the Radeon card of my Sony VAIO C series laptop in Fedora 15. This laptop has an AMD Radeon and Intel combination.
The steps may be somewhat similar for other laptops with ATI/Intel combinations in Fedora.

NOTE: This method disables the AMD Radeon card upon booting. The aim is to increase battery life (by about 1.5 hrs in mine) and reduce the heat. I don't use any graphics-intensive applications in Linux, so the Radeon is better switched OFF. If you want 3D acceleration or are looking for a method for hot switching of cards, you are out of luck here.

Fedora 15 comes with an open source radeon driver (not the catalyst driver) and includes vgaswitcheroo, a mechanism to switch cards ON/OFF and to change the active card across a logout/login.

Upon booting, if the kernel detects hybrid graphics, a directory /sys/kernel/debug/vgaswitcheroo will be created with a file named switch.

The following commands are available,
cat switch – Displays the status of the cards.

[root@localhost ~]# cd /sys/kernel/debug/vgaswitcheroo/
[root@localhost vgaswitcheroo]# cat switch
1:DIS: :Off:0000:01:00.0

IGD – integrated, i.e. Intel; DIS – discrete, i.e. Radeon; Pwr – power ON; Off – power OFF; the + sign indicates which card is active.

echo OFF > switch – Turns OFF the inactive card.
echo ON > switch – Turns ON the card that is OFF.
echo DIS[IGD] > switch – Makes the discrete card [or the IGD] active. You need to log out and log in for this to take effect.
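
The colon-separated status lines are easy to summarise with awk. This is only a sketch: gpu_status is a made-up helper, and the IGD line in the sample file is an assumed example of the format, not output captured from my system.

```shell
# Summarise vgaswitcheroo status lines as "CARD active|inactive POWER".
# The path argument is optional; the default is the debugfs file described above.
gpu_status() {
  awk -F: '{ print $2, ($3 == "+" ? "active" : "inactive"), $4 }' \
    "${1:-/sys/kernel/debug/vgaswitcheroo/switch}"
}

# Demo on a sample file. The IGD line is an assumption about the format.
sample=$(mktemp)
printf '%s\n' '0:IGD:+:Pwr:0000:00:02.0' '1:DIS: :Off:0000:01:00.0' > "$sample"
gpu_status "$sample"
```

On a real system the function would be run as root without an argument, since the debugfs file is normally readable by root only.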

GNOME 3 doesn't load on my system with the discrete card, so switching to discrete is pretty useless at the moment.

Note that vgaswitcheroo is gone if you have installed the proprietary ‘catalyst’ drivers either using RPMFusion or using the binaries provided at the AMD site.

Booting issue
I often had problems while booting, with the system hanging at “dracut: starting plymouth daemon”.

I guess this was due to the radeon driver. Not sure though.

The trick was to ‘blacklist’ the radeon driver, load it once the system has finished booting, and then switch OFF the Radeon card. It worked for me.

You have to do this step only if you have problems during booting caused by the radeon driver. Otherwise, skip ahead to the step that switches OFF the Radeon automatically after booting.

Add ‘blacklist radeon’ to /etc/modprobe.d/blacklist.conf. This prevents the radeon driver from loading during boot.

[root@localhost ~]# echo blacklist radeon >> /etc/modprobe.d/blacklist.conf

Back up and then rebuild the initramfs.

[root@localhost ~]# mv /boot/initramfs-$(uname -r).img{,.bak}
[root@localhost ~]# dracut -f /boot/initramfs-$(uname -r).img $(uname -r)

The dracut command may take a while to finish, and may report some warnings.

If you want, you can reboot now and see whether your boot issue has been resolved.

To load the radeon driver automatically once the system has booted, add the modprobe command to rc.local.

[root@localhost ~]# echo modprobe radeon >> /etc/rc.local

Note that the Radeon is ON after booting whether the driver is loaded or not, and the radeon driver must be loaded to switch OFF the card.


Add the script to switch OFF the Radeon automatically after booting.

[root@localhost ~]# echo "echo OFF > /sys/kernel/debug/vgaswitcheroo/switch" >> /etc/rc.local

Suspend/Hibernate Issue
After a suspend or hibernate, the Radeon card gets switched ON. However, cat /sys/kernel/debug/vgaswitcheroo/switch wrongly shows the Discrete card as OFF. This can be confirmed by checking the remaining battery time or system temperature.

Now to switch the card OFF again, do

[root@localhost ~]# echo ON > /sys/kernel/debug/vgaswitcheroo/switch
[root@localhost ~]# echo OFF > /sys/kernel/debug/vgaswitcheroo/switch

== To Do: automate this ==
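
As a first step towards automating this, the two echo commands can be wrapped in a function. This is a sketch under assumptions: radeon_off is a made-up name, and how it gets called on resume (for example from a suspend/resume hook) depends on the system and is left open here.

```shell
# Toggle the card ON and then OFF again, as done by hand above.
# The path argument is optional and mainly useful for trying this out safely.
radeon_off() {
  switch_file=${1:-/sys/kernel/debug/vgaswitcheroo/switch}
  if [ ! -w "$switch_file" ]; then
    echo "cannot write to $switch_file (not root, or no hybrid graphics?)" >&2
    return 1
  fi
  echo ON  > "$switch_file"
  echo OFF > "$switch_file"
}
```

Run as root, radeon_off without arguments writes to the real vgaswitcheroo file; with a scratch file as argument it can be tested without touching the hardware.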

To confirm whether the Radeon is switched OFF
One method is to check the remaining battery time before and after switching OFF. There should be a difference of 1-2 hrs.

Another method I use is to check the system temperature by installing lm_sensors.

[root@localhost ~]# yum install lm_sensors
[root@localhost ~]# sensors
Adapter: Virtual device
temp1:        +42.0°C  (crit = +96.0°C)
temp2:        +42.0°C  (crit = +96.0°C)

Adapter: PCI adapter
temp1:       -128.0°C

Here, the negative value (-128.0°C) indicates that the card is OFF.
A system temperature above 50°C at idle also means the Radeon is ON in my system.

Thanks to,

doskey @

Tags: vgaswitcheroo howto, switchable graphics in fedora 15, turn off AMD/ATI, Hybrid graphics in fedora, switch off dedicated card in fedora, Sony VAIO VPCCB15FG

Accessing the Linux Virtual Console from X

I wanted to copy some text from my virtual console (Ctrl + Alt + F[1-6]).
Each virtual console has a corresponding /dev/vcs[1-6] file.

So by doing a cat /dev/vcs2 you get the screen dump of virtual console 2.
Note that the output does not contain newline characters, so it is not properly formatted.

The /dev/ttyX files, in turn, can be used to output text to the corresponding virtual console.

[root@localhost ~]# echo "what is the use of this?" > /dev/tty3

This will print ‘what is the use of this?’ to the virtual console 3
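
Since the screen dump has no newlines, it can be made readable by breaking it at the console width. A minimal sketch, assuming an 80-column console (the real width may differ); vc_dump is a made-up helper name.

```shell
# Break a screen dump into rows of fixed width and trim trailing spaces.
# $1: dump file (default /dev/vcs1), $2: console width in columns (default 80, an assumption).
vc_dump() {
  fold -w "${2:-80}" "${1:-/dev/vcs1}" | sed 's/ *$//'
}

# Demo on a scratch file with a pretend 4-column screen.
tmp=$(mktemp)
printf 'ab  cdefgh  ' > "$tmp"
vc_dump "$tmp" 4
```

On a real system this would be run as root, for example vc_dump /dev/vcs2, because the vcs devices are not readable by ordinary users by default.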

Shrinking and growing a logical volume in Fedora 15

During my Fedora 15 installation, I made two logical volumes: HomLV (50GB) for /home and RootLV (5GB) for /. This was a mistake, and very soon / was full.

[root@localhost ~]# df -h /
Filesystem            Size  Used Avail Use% Mounted on
4.8G  4.5G  4.9G  94% /

So I now wanted to shrink /home and extend /.

To reduce a logical volume, we first need to shrink the filesystem on it using resize2fs or fsadm, and then use lvresize (or lvreduce) to shrink the logical volume.

[root@localhost ~]# resize2fs /dev/mapper/FedVG-HomLV 45G
resize2fs 1.41.14 (22-Dec-2010)
Filesystem at /dev/mapper/FedVG-HomLV is mounted on /home; on-line resizing required
resize2fs: On-line shrinking not supported

Online shrinking is not supported, so the only option is to unmount the filesystem, i.e. /home. For that I had to log out of GNOME and work in a virtual console (Ctrl+Alt+F[1-6]).

After a bit more reading of the lvresize man page, I found that with the -r switch we can resize the underlying filesystem together with the logical volume.

[root@localhost ~]# umount /dev/mapper/FedVG-HomLV
[root@localhost ~]# lvresize -r -L -5G /dev/mapper/FedVG-HomLV
fsck from util-linux2.19.1
dev/mapper/FedVG-HomLV: 900/2949120 files (1.2% non-contiguous), 250200/11796480 blocks

resize2fs 1.41.14 (22-Dec-2010)
Resizing the filesystem on /dev/mapper/FedVG-HomLV to 10977280 (4k) blocks.
The filesystem on /dev/mapper/FedVG-HomLV is now 10977280 blocks long.

Reducing logical volume HomLV to 41.88 GiB
Logical volume HomLV successfully resized

The -L switch with +nG or -nG directs lvresize to grow or shrink the volume by n GiB.

[root@localhost ~]# mount /home
[root@localhost ~]# df -h /home
Filesystem            Size  Used Avail Use% Mounted on
42G  253M   39G   1% /home


The next step is to extend the RootLV logical volume and then grow the root filesystem.

[root@localhost ~]# lvextend -l +100%FREE /dev/mapper/FedVG-RootLV
Extending logical volume RootLV to 10.00 GiB
Logical volume RootLV successfully resized

The +100%FREE argument directs lvextend to use all the available free extents (free space) in the volume group. Note that the -r switch to resize the underlying filesystem is available here too, though I didn't use it.

Grow the filesystem using resize2fs. By default, most filesystem resizing tools grow the filesystem to the size of the underlying logical volume, so you do not need to worry about specifying the same size for each of the two commands.

[root@localhost ~]# resize2fs /dev/mapper/FedVG-RootLV
resize2fs 1.41.14 (22-Dec-2010)
Filesystem at /dev/mapper/FedVG-RootLV is mounted on /; on-line resizing required
old desc_blocks = 1, new_desc_blocks = 1
Performing an on-line resize of /dev/mapper/FedVG-RootLV to 2621440 (4k) blocks.
The filesystem on /dev/mapper/FedVG-RootLV is now 2621440 blocks long.

Online growing of the filesystem is supported, although online shrinking is not, as we saw earlier.
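
The block counts printed by resize2fs can be cross-checked against the sizes reported by lvresize and lvextend. A small sketch of the arithmetic, assuming the 4k block size shown in the output above; blocks_to_mib is a made-up helper.

```shell
# Convert a count of 4 KiB filesystem blocks to MiB.
blocks_to_mib() {
  echo $(( $1 * 4 / 1024 ))
}

blocks_to_mib 10977280   # HomLV after shrinking
blocks_to_mib 2621440    # RootLV after growing
```

The first value is 42880 MiB, which is 41.875 GiB and matches the 41.88 GiB reported for HomLV; the second is 10240 MiB, i.e. the 10.00 GiB reported for RootLV.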

[root@localhost ~]# df -h /
Filesystem            Size  Used Avail Use% Mounted on
9.9G  4.5G  4.9G  48% /