How to perform faster data compression with pbzip2

Introduction

What if you could compress data four times faster while keeping the same compression ratio you normally get? The pbzip2 command line utility can easily accomplish this, as it lets you select the number of CPUs and the amount of RAM to be used during the compression process.

Regular tar and bzip2 compression

We all know the regular command for tar and bzip2 directory compression. The command below archives and compresses our sandbox directory FOOBAR. We also prefix it with time to measure exactly how long it takes to produce the compressed file FOOBAR1.tar.bz2 from the 242MB FOOBAR directory:

# time tar cjf FOOBAR1.tar.bz2 FOOBAR/

real    0m20.030s
user    0m19.828s
sys     0m0.304s

From the above time output we can see that it took about 20 seconds to create the following compressed file:

# ls -lh FOOBAR1.tar.bz2 
-rw-r--r-- 1 root root 54M Mar 10 20:25 FOOBAR1.tar.bz2

Faster compression with pbzip2

By default, pbzip2 uses all available CPUs and 100MB of RAM to perform compression. The following Linux command compresses the directory using pbzip2. Once again we use time to measure the execution time:

# time tar -c FOOBAR | pbzip2 -c > FOOBAR2.tar.bz2

real    0m4.777s
user    0m35.588s
sys     0m1.060s

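The elapsed (real) time drops to under 5 seconds, while the user time rises because it adds up the CPU time spent on every core. Alternatively, the command below will yield the same result: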

# time tar cf FOOBAR3.tar.bz2 --use-compress-program=pbzip2 FOOBAR

real    0m4.764s
user    0m35.508s
sys     0m1.136s
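
The timings above can be turned into a speedup figure with a little arithmetic. The sketch below is only an illustration: speedup is a hypothetical helper name, not part of pbzip2, and the numbers passed to it are the real and user times reported above.

```shell
#!/bin/sh
# speedup: hypothetical helper, divides a baseline time by a new time
# (both given in seconds) and prints the result with two decimals.
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# Elapsed (real) time: tar cjf took 20.030s, tar | pbzip2 took 4.777s
speedup 20.030 4.777

# CPU (user) time: pbzip2 used 35.588s in total, bzip2 used 19.828s
speedup 35.588 19.828
```

The first figure is how much sooner the archive is ready, the second is how much more CPU time was spent in total to get there. On this sandbox the two commands print 4.19 and 1.79.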

Reserve Resources

As already mentioned, pbzip2 lets the user select the number of CPUs and the amount of RAM to be dedicated to the compression. The example below uses only a single CPU to perform the requested compression:

# time tar -c FOOBAR | pbzip2 -c -p1 > FOOBAR4.tar.bz2

real    0m20.348s
user    0m19.972s
sys     0m0.648s

To set the amount of RAM pbzip2 may use, pass the -m switch followed by the size in MB. By default pbzip2 uses 100MB. The example below performs compression using 1 CPU and 10MB of RAM:

# time tar -c FOOBAR | pbzip2 -c -p1 -m10 > FOOBAR5.tar.bz2

real    0m20.362s
user    0m19.932s
sys     0m0.704s
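
On a shared machine it can be handy to derive the -p value from the number of CPUs instead of hard-coding it. The following is a minimal sketch under a few assumptions: cores_to_use is a hypothetical helper, "half of the online CPUs" is an arbitrary policy chosen for illustration, and FOOBAR-half.tar.bz2 is a made-up output name.

```shell
#!/bin/sh
# cores_to_use: hypothetical helper; prints half of the online CPUs,
# but never less than 1.
cores_to_use() {
    total=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    half=$((total / 2))
    if [ "$half" -lt 1 ]; then
        half=1
    fi
    echo "$half"
}

# Run the compression only if pbzip2 and the sandbox directory exist.
if command -v pbzip2 >/dev/null 2>&1 && [ -d FOOBAR ]; then
    # -p sets the number of CPUs, -m the memory limit in MB
    tar -c FOOBAR | pbzip2 -c -p"$(cores_to_use)" -m100 > FOOBAR-half.tar.bz2
fi
```

This leaves the remaining CPUs free for other work, at the cost of a longer compression time than the default of using all of them.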

Compression Level

As is usually the case with compression utilities, pbzip2 also allows the compression level to be set. The level ranges from 1 to 9; the default is 9, which also gives the best compression ratio. To change the compression level to, for example, 1, use -1:

# time tar -c FOOBAR | pbzip2 -c -1 > FOOBAR6.tar.bz2

real    0m3.786s
user    0m28.612s
sys     0m0.364s

Using the above example you will end up with a faster execution time but a larger file:

# ls -lh *.bz2
-rw-r--r-- 1 root root 54M Mar 10 20:02 FOOBAR1.tar.bz2
-rw-r--r-- 1 root root 54M Mar 10 20:41 FOOBAR2.tar.bz2
-rw-r--r-- 1 root root 54M Mar 10 20:43 FOOBAR3.tar.bz2
-rw-r--r-- 1 root root 54M Mar 10 20:48 FOOBAR4.tar.bz2
-rw-r--r-- 1 root root 54M Mar 10 20:54 FOOBAR5.tar.bz2
-rw-r--r-- 1 root root 67M Mar 10 21:00 FOOBAR6.tar.bz2
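
To put the listing above in perspective, the compressed sizes can be expressed as a percentage of the original 242MB directory. This is a rough sketch only: ratio is a hypothetical helper name, and the inputs are the rounded sizes printed by ls -lh, so the results are approximate.

```shell
#!/bin/sh
# ratio: hypothetical helper; prints the compressed size as a
# percentage of the original size, with one decimal.
ratio() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.1f%%\n", 100 * c / o }'
}

ratio 54 242   # default level 9: 54M out of 242MB
ratio 67 242   # level 1:         67M out of 242MB
```

With these numbers the default level shrinks the data to about 22.3% of its original size, while level 1 leaves it at about 27.7%.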

Decompression

Decompression with pbzip2 does not produce a significant time saving, if any, in comparison with bzip2. The following Linux commands can be used to decompress bzip2-compressed data using the pbzip2 utility:

# tar xf FOOBAR1.tar.bz2 --use-compress-program=pbzip2
OR
# pbzip2 -dc FOOBAR1.tar.bz2 | tar x
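
Since pbzip2 and bzip2 read the same format, a script can fall back to plain bzip2 on machines where pbzip2 is not installed. The sketch below assumes a hypothetical helper named pick_bzip2_tool and a hypothetical scratch directory named restore; neither comes from pbzip2 itself.

```shell
#!/bin/sh
# pick_bzip2_tool: hypothetical helper; prefers pbzip2 and falls back
# to plain bzip2 when pbzip2 is not found in PATH.
pick_bzip2_tool() {
    if command -v pbzip2 >/dev/null 2>&1; then
        echo pbzip2
    else
        echo bzip2
    fi
}

archive=FOOBAR1.tar.bz2   # archive created earlier in this article
target=restore            # hypothetical scratch directory

if [ -f "$archive" ]; then
    mkdir -p "$target"
    # -d decompresses, -c writes to stdout; tar reads the stream from
    # stdin (f -) and unpacks it into $target (-C)
    "$(pick_bzip2_tool)" -dc "$archive" | tar xf - -C "$target"
fi
```

Extracting into a separate directory keeps the restored copy from overwriting the original FOOBAR directory.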

