Parallel composition of ZIP Archives

Overview

The Aspose.ZIP API lets you compose ZIP archives. Because the entries of such an archive can be compressed independently, archive creation can be parallelized to some degree.

ZIP multithreaded: explanation

Use ParallelOptions to indicate that the archive should be prepared using several CPU cores.

The ParallelCompressInMemory property selects the multitasking strategy. There are three options:

ParallelCompressionMode.Never: compression of all entries is sequential. Only one CPU core works on compression and flushes compressed data as it comes.
ParallelCompressionMode.Always: forces compression of entries in different threads regardless of entry size, available memory, and other factors. Each CPU core compresses one entry at a time, keeping its compressed data in RAM. Once an entry is compressed, it is flushed to the result stream. If RAM is limited and the total size of some N entries (where N is the number of CPU cores) is huge, all RAM available to the CLR may be exhausted and an OutOfMemoryException thrown.
ParallelCompressionMode.Auto: the intelligent mode. It estimates the number of CPU cores, the sizes of entries, and the available memory, then chooses whether to compress entries in parallel or sequentially. In this mode some smaller entries are compressed in parallel while others are compressed sequentially. LZMA entries are not compressed in parallel because of their high memory consumption. This option is generally safe: Aspose.ZIP is conservative in its estimates and falls back to sequential compression.

ParallelOptions has one more property for this mode, AvailableMemorySize; it has no effect in any other mode. Roughly speaking, it is the upper limit, in megabytes, of memory allocated while compressing entries on all CPU cores. Aspose.ZIP uses that number to estimate the largest entry size that is safe to compress in parallel; entries above the threshold are compressed sequentially. AvailableMemorySize is a double-edged sword: set too high with huge entries, it can lead to RAM exhaustion, intense swapping, and possibly an out-of-memory exception. Set too low, most entries are compressed sequentially with little speed-up. Experienced users can tune it with this trade-off in mind.
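
As an illustration of the Auto mode with a memory limit, here is a minimal sketch. The file names and the value 4000 are placeholders chosen for this example, not recommendations, and the assumption that AvailableMemorySize accepts a whole number of megabytes follows from the description above. ParallelOptions is written with its full namespace to avoid a clash with System.Threading.Tasks.ParallelOptions in projects that import that namespace.

```csharp
using System.IO;
using Aspose.Zip;
using Aspose.Zip.Saving;

class AutoModeExample
{
    static void Main()
    {
        // Placeholder file names; replace with your own input and output paths.
        using (FileStream zipFile = File.Open("archive.zip", FileMode.Create))
        using (FileStream source = File.OpenRead("data1.bin"))
        using (Archive archive = new Archive())
        {
            archive.CreateEntry("first.bin", source);

            archive.Save(zipFile, new ArchiveSaveOptions
            {
                ParallelOptions = new Aspose.Zip.Saving.ParallelOptions
                {
                    // Let the library decide per entry between parallel and sequential compression.
                    ParallelCompressInMemory = ParallelCompressionMode.Auto,
                    // Assumed budget in megabytes; only meaningful in Auto mode.
                    AvailableMemorySize = 4000
                }
            });
        }
    }
}
```

Lowering AvailableMemorySize pushes more entries onto the sequential path, while raising it allows larger entries to be compressed in parallel at the cost of higher peak memory use.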

We encourage you to try the different parallel compression modes on your typical data to determine which settings work best in your case.
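
One possible way to run such a comparison is sketched below: it saves the same set of entries once per mode and prints the elapsed time. The source file names, the temporary output location, and the class name are hypothetical; only the Aspose.ZIP calls shown elsewhere on this page are used.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using Aspose.Zip;
using Aspose.Zip.Saving;

class ParallelModeBenchmark
{
    static void Main()
    {
        // Hypothetical sample files; substitute data typical of your workload.
        string[] sources = { "data1.bin", "data2.bin", "data3.bin" };
        ParallelCompressionMode[] modes =
        {
            ParallelCompressionMode.Never,
            ParallelCompressionMode.Auto,
            ParallelCompressionMode.Always
        };

        foreach (ParallelCompressionMode mode in modes)
        {
            string output = Path.Combine(Path.GetTempPath(), "benchmark-" + mode + ".zip");
            List<FileStream> inputs = new List<FileStream>();
            Stopwatch timer = Stopwatch.StartNew();
            try
            {
                using (FileStream zipFile = File.Open(output, FileMode.Create))
                using (Archive archive = new Archive())
                {
                    foreach (string source in sources)
                    {
                        FileStream input = File.OpenRead(source);
                        inputs.Add(input);
                        archive.CreateEntry(Path.GetFileName(source), input);
                    }

                    archive.Save(zipFile, new ArchiveSaveOptions
                    {
                        ParallelOptions = new Aspose.Zip.Saving.ParallelOptions
                        {
                            ParallelCompressInMemory = mode
                        }
                    });
                }
            }
            finally
            {
                // Release the source files once the archive has been written.
                foreach (FileStream input in inputs)
                {
                    input.Dispose();
                }
            }

            timer.Stop();
            Console.WriteLine("{0}: {1} ms", mode, timer.ElapsedMilliseconds);
        }
    }
}
```

Timings from a single run are noisy, and the first pass may be slowed by cold disk caches, so it is worth repeating the loop a few times and watching peak memory use alongside the elapsed time.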

How to Create ZIP Archive with Parallel Compression in C# Sample

Steps: Create ZIP Archive with Parallel Compression in C#

Open a file stream (FileStream) in FileMode.Create to create a new ZIP file (archive.zip).
Initialize a new Archive object for managing ZIP entries.
Use the CreateEntry method to add multiple entries, such as “first.bin” and “last.bin”, using File.OpenRead to read from the source files (data1.bin and dataN.bin).
Set up ArchiveSaveOptions with ParallelOptions, where ParallelCompressInMemory is set to ParallelCompressionMode.Always, enabling parallel compression for faster archiving.
Save the archive with the specified options using the Save method.

    using System.IO;
    using Aspose.Zip;
    using Aspose.Zip.Saving;

    // Create the output file that will receive the archive.
    using (FileStream zipFile = File.Open("archive.zip", FileMode.Create))
    {
        using (Archive archive = new Archive())
        {
            // Add one entry per source file.
            archive.CreateEntry("first.bin", File.OpenRead("data1.bin"));
            ...
            archive.CreateEntry("last.bin", File.OpenRead("dataN.bin"));
            // Compress entries on several threads, keeping compressed data in memory.
            archive.Save(zipFile, new ArchiveSaveOptions()
            {
                ParallelOptions = new ParallelOptions()
                { ParallelCompressInMemory = ParallelCompressionMode.Always }
            });
        }
    }

Compress multithreaded Parallel LZMA2 in 7z Archives