Parallel composition of ZIP Archives
Overview
The Aspose.ZIP API lets you compose ZIP archives. Because the entries of such an archive can be compressed independently, archive creation can be parallelized to some degree.
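The independence of entries can be illustrated with the Java standard library alone. The following is a minimal sketch, not Aspose.ZIP's implementation: it assumes java.util.zip.Deflater and a fixed thread pool, and the names IndependentEntriesSketch, deflate, inflate and compressAll are hypothetical. Each entry is compressed in memory on its own worker thread, and the results are collected in the original order, the order in which they would be written to the output stream.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration; not part of the Aspose.ZIP API.
final class IndependentEntriesSketch {

    // Compresses one entry entirely in memory; no state is shared between entries.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inverse of deflate, used only to check the round trip.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input: stop instead of spinning
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Submits every entry to a thread pool, then collects the results in the original order.
    static List<byte[]> compressAll(List<byte[]> entries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (byte[] entry : entries) {
                pending.add(pool.submit(() -> deflate(entry)));
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> future : pending) {
                result.add(future.get());
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Note that every compressed entry stays in RAM until it is collected, which is the memory cost that parallel compression trades for speed.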
ZIP multithreaded: explanation
Use ParallelOptions to indicate that the archive should be prepared using several CPU cores. The ParallelCompressInMemory setting (getParallelCompressInMemory/setParallelCompressInMemory) selects the multitasking strategy.
There are three options:
ParallelCompressionMode.Never: all entries are compressed sequentially. Only one CPU core works on compression and flushes compressed data as it is produced.
ParallelCompressionMode.Always: forces compression of entries in different threads regardless of entry size, available memory, and other factors. Each CPU core compresses one entry at a time, keeping its compressed data in RAM. Once an entry is compressed, it is flushed to the result stream. If RAM is scarce and the total size of some N entries (where N is the number of CPU cores) is huge, the memory available to the JVM may be exhausted and an out-of-memory error occurs.
ParallelCompressionMode.Auto: the intelligent mode. It estimates the number of CPU cores, the entry sizes, and the available memory, and chooses whether to compress entries in parallel or sequentially. In this mode some smaller entries are compressed in parallel while others are compressed sequentially. It is generally safe to go with this option: Aspose.ZIP is conservative in its estimates and falls back to sequential compression.
ParallelOptions has one more property for the Auto mode: AvailableMemorySize (getAvailableMemorySize/setAvailableMemorySize). It has no effect in any other mode. Roughly speaking, it is the upper limit, in megabytes, of the memory allocated while compressing entries on all CPU cores. Aspose.ZIP uses that number to estimate the largest entry size that is safe to compress in parallel; entries above the threshold are compressed sequentially. AvailableMemorySize is a double-edged sword: set too high with huge entries, it can cause RAM exhaustion, intense swapping, and possibly an out-of-memory error; set too low, most entries will be compressed sequentially without much speed-up. Experienced users can tune it with this trade-off in mind.
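The effect of AvailableMemorySize in Auto mode can be pictured with a small model. Aspose.ZIP's actual heuristic is not described here, so the sketch below is purely hypothetical: it assumes the memory budget is split evenly across cores and that an entry qualifies for parallel compression when its size does not exceed that share. The names AutoModePlanSketch, thresholdBytes and plan are illustrative and not part of the Aspose.ZIP API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an Auto-style decision; not Aspose.ZIP's real algorithm.
final class AutoModePlanSketch {

    // Assumed rule: split the budget evenly so that `cores` entries compressed at once fit in it.
    static long thresholdBytes(long availableMemoryMb, int cores) {
        return availableMemoryMb * 1024L * 1024L / Math.max(1, cores);
    }

    // Returns, per entry, true for "compress in parallel" and false for "compress sequentially".
    static List<Boolean> plan(long[] entrySizesBytes, long availableMemoryMb, int cores) {
        long threshold = thresholdBytes(availableMemoryMb, cores);
        List<Boolean> parallel = new ArrayList<>();
        for (long size : entrySizesBytes) {
            // A single core gains nothing from parallelism; oversized entries go sequential.
            parallel.add(cores > 1 && size <= threshold);
        }
        return parallel;
    }
}
```

Under this model, a budget of 400 MB on 4 cores gives a per-entry threshold of 104857600 bytes (100 MiB), while a budget of 4 MB on the same machine pushes every entry larger than 1 MiB to sequential compression, which mirrors the trade-off described above.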
We encourage you to try the different parallel compression modes on your typical data to determine which settings work best in your case.
Sample
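import com.aspose.zip.Archive;
import com.aspose.zip.ArchiveSaveOptions;
import com.aspose.zip.ParallelCompressionMode;
import com.aspose.zip.ParallelOptions;
import java.io.FileOutputStream;
import java.io.IOException;

try (FileOutputStream zipFile = new FileOutputStream("archive.zip")) {
    try (Archive archive = new Archive()) {
        // Register each source file under its entry name.
        archive.createEntry("first.bin", "data1.bin");
        // ...
        archive.createEntry("last.bin", "dataN.bin");
        // Always: compress entries in separate threads regardless of their size.
        ParallelOptions parallelOptions = new ParallelOptions();
        parallelOptions.setParallelCompressInMemory(ParallelCompressionMode.Always);
        ArchiveSaveOptions options = new ArchiveSaveOptions();
        options.setParallelOptions(parallelOptions);
        archive.save(zipFile, options);
    }
} catch (IOException ex) {
    System.out.println(ex);
}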