Thursday 17 March 2022

How to unpack bzip2 files faster using a parallel approach?

Several tools claim to decompress bzip2 files in parallel:

  • pbzip2
  • lbzip2

Let's compare pbzip2 performance with the reference single-threaded bzip2:

$ time bzip2 -d /tmp/rib.bz2  --stdout > /dev/null

real 0m52.188s
user 0m52.019s
sys 0m0.160s
$ time pbzip2 -d /tmp/rib.bz2  --stdout > /dev/null

real 0m49.380s
user 0m49.473s
sys 0m0.241s
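
There is no meaningful speed improvement (49.4 seconds vs 52.2 seconds), which means that pbzip2 cannot decompress a standard bzip2-compressed file in parallel. The pbzip2 documentation says as much: the speedup on decompression applies only to files that were compressed by pbzip2 itself.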

lbzip2, however, can do it, and it offers a great performance improvement:
$ time bzip2 -d /tmp/rib.bz2  --stdout > /dev/null

real 0m52.790s
user 0m52.549s
sys 0m0.224s
$ time lbzip2 -d /tmp/rib.bz2   --stdout > /dev/null

real 0m8.604s
user 1m8.099s
sys 0m0.420s

That is about 9 seconds vs 53 seconds: roughly a 6x improvement on an 8-CPU server.
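
The speedup can be checked with a little arithmetic on the timings above. This is only a sketch: the variable names are made up for illustration, and the values are the real and user times from the bzip2 and lbzip2 runs, converted to seconds. Dividing user time by real time gives a rough estimate of how many cores lbzip2 kept busy on average.

```shell
# Timings copied from the bzip2 / lbzip2 runs above, in seconds.
bzip2_real=52.790
lbzip2_real=8.604
lbzip2_user=68.099   # 1m8.099s

# Wall-clock speedup of lbzip2 over bzip2.
speedup=$(awk -v a="$bzip2_real" -v b="$lbzip2_real" 'BEGIN { printf "%.1f", a / b }')

# CPU time divided by wall-clock time: average number of busy cores.
busy_cores=$(awk -v u="$lbzip2_user" -v r="$lbzip2_real" 'BEGIN { printf "%.1f", u / r }')

echo "speedup: ${speedup}x"
echo "average busy cores: ${busy_cores}"
```

With these numbers the script prints a speedup of 6.1x and about 7.9 busy cores, so lbzip2 used nearly all 8 CPUs. The user time also grew from about 52 to 68 seconds, so the parallel run spends more total CPU time than bzip2 in exchange for the shorter wall-clock time.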

Conclusion: use lbzip2 for parallel decompression.
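
In scripts that must run on machines where lbzip2 may not be installed, one possible approach is to pick it when available and fall back to plain bzip2 otherwise. The helper name pick_bunzip below is hypothetical, and the sketch assumes both tools accept the same -d and --stdout options used in the benchmark above.

```shell
# Hypothetical helper: prefer lbzip2, fall back to single-threaded bzip2.
pick_bunzip() {
    if command -v lbzip2 >/dev/null 2>&1; then
        echo lbzip2
    else
        echo bzip2
    fi
}

decompressor=$(pick_bunzip)
echo "using: $decompressor"

# Example call, using the same file as in the benchmark above:
# "$decompressor" -d --stdout /tmp/rib.bz2 > /dev/null
```

The fallback keeps the script working everywhere; it just runs at single-threaded speed when lbzip2 is missing.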

