Convert HTML from ZIP archive to JPG – C# example
In this article, we create a custom message handler for a specific task: converting HTML from a ZIP archive to JPG.
There are several reasons why you might need to convert HTML from a ZIP archive to JPG. For example, to convert an HTML (XHTML) document with linked resources to JPG, you can pack the document and all of its resources into a single ZIP archive and convert that archive. Aspose.HTML for .NET lets you create custom message handlers for working with ZIP archives.
Create a Custom Message Handler
Aspose.HTML for .NET lets you create custom message handlers. Let’s design a custom handler for working with ZIP archives. Take the following steps:
Import the Aspose.Html.Net namespace. It contains the classes and interfaces that handle network processing.
Define your own class derived from the MessageHandler class, which is the base type for message handlers.
using Aspose.Html.Net;
...

class ZipArchiveMessageHandler : MessageHandler
{
}
Add a constructor that stores the path to the archive and adds a ProtocolMessageFilter to the handler's Filters collection.
Override the Invoke() method of the MessageHandler class to implement the custom message handler behaviour.
// The original snippet omits its using directives; the namespaces below are
// assumed from Aspose.HTML for .NET and Aspose.ZIP for .NET (the Archive class)
using System;
using System.IO;
using System.Linq;
using System.Net;
using Aspose.Html;
using Aspose.Html.Net;
using Aspose.Html.Net.MessageFilters;
using Aspose.Zip;

// This message handler serves resources requested over the "zip" protocol from a ZIP archive
class ZipArchiveMessageHandler : MessageHandler, IDisposable
{
    private string filePath;
    private Archive archive;

    // Initialize an instance of the ZipArchiveMessageHandler class
    public ZipArchiveMessageHandler(string path)
    {
        this.filePath = path;
        // Handle only requests that use the "zip" protocol
        Filters.Add(new ProtocolMessageFilter("zip"));
    }

    // Override the Invoke() method
    public override void Invoke(INetworkOperationContext context)
    {
        // Look up the requested resource in the archive (leading '/' removed from the path)
        var buff = GetFile(context.Request.RequestUri.Pathname.TrimStart('/'));
        if (buff != null)
        {
            // The resource is found in the archive, so return it as a Response
            context.Response = new ResponseMessage(HttpStatusCode.OK)
            {
                Content = new ByteArrayContent(buff)
            };
            context.Response.Headers.ContentType.MediaType = MimeType.FromFileExtension(context.Request.RequestUri.Pathname);
        }
        else
        {
            // The resource is not in the archive
            context.Response = new ResponseMessage(HttpStatusCode.NotFound);
        }

        // Call the next message handler
        Next(context);
    }

    // Return the content of the archive entry as a byte array, or null if there is no such entry
    byte[] GetFile(string path)
    {
        path = path.Replace(@"\", @"/");
        var result = GetArchive().Entries.FirstOrDefault(x => path == x.Name);
        if (result != null)
        {
            using (var fs = result.Open())
            using (MemoryStream ms = new MemoryStream())
            {
                fs.CopyTo(ms);
                return ms.ToArray();
            }
        }
        return null;
    }

    // Open the archive lazily, on the first request
    Archive GetArchive()
    {
        return archive ??= new Archive(filePath);
    }

    public void Dispose()
    {
        archive?.Dispose();
    }
}
Let’s take a closer look at the code snippet:
The custom ZipArchiveMessageHandler class inherits from the base MessageHandler class. It has two fields: the archive and the path to the archive as a string. Implementing IDisposable provides a mechanism for the deterministic release of the resources that the archive holds.
Message handlers support filtering. In this case, a protocol (scheme) filter is added, so this message handler only processes requests that use the "zip" protocol. Filtering messages by resource protocol is implemented by the ProtocolMessageFilter class. The ProtocolMessageFilter() constructor initializes a new instance of the ProtocolMessageFilter class and takes the "zip" protocol as a parameter.
The Invoke() method implements the message handler behaviour. It is called for each handler in the pipeline and takes a context as a parameter. Invoke() calls GetFile() to do the actual work and then passes control to the next handler by calling Next(context), following the chain of responsibility pattern.
The GetFile() method searches the ZIP archive for the resource named in the Request and returns its data as a byte array. Invoke() uses that result to form the Response.
The context provides contextual information for network services: the request is passed in through it, and the result of the operation is returned through it. In Aspose.HTML, the context is represented by the INetworkOperationContext interface, which has two properties: Request and Response.
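To try the lookup logic of GetFile() in isolation, here is a minimal sketch that uses the built-in System.IO.Compression.ZipArchive class as a stand-in for the Archive class from the example. The ZipLookupSketch class and its ReadEntry and BuildSampleZip helpers are hypothetical names introduced for illustration only. The sketch compares against ZipArchiveEntry.FullName because, in System.IO.Compression, Name holds only the file name without its folder.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class ZipLookupSketch
{
    // Hypothetical helper mirroring the lookup in the handler:
    // returns the entry's bytes, or null if the archive has no such entry.
    public static byte[] ReadEntry(ZipArchive archive, string path)
    {
        // Normalize separators and drop the leading slash left over from the URI path
        path = path.Replace('\\', '/').TrimStart('/');

        var entry = archive.Entries.FirstOrDefault(e => e.FullName == path);
        if (entry == null)
        {
            return null;
        }

        using var source = entry.Open();
        using var buffer = new MemoryStream();
        source.CopyTo(buffer);
        return buffer.ToArray();
    }

    // Builds a tiny in-memory archive so the sketch runs without any files on disk
    public static byte[] BuildSampleZip()
    {
        using var ms = new MemoryStream();
        using (var zip = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
        {
            var entry = zip.CreateEntry("test.html");
            using var writer = new StreamWriter(entry.Open());
            writer.Write("<p>Hello</p>");
        }
        return ms.ToArray();
    }

    public static void Main()
    {
        using var archive = new ZipArchive(new MemoryStream(BuildSampleZip()), ZipArchiveMode.Read);

        byte[] found = ReadEntry(archive, "/test.html");
        byte[] missing = ReadEntry(archive, "/missing.css");

        Console.WriteLine(found != null);   // the entry exists in the sample archive
        Console.WriteLine(missing == null); // no such entry, so the lookup yields null
    }
}
```

A null result from ReadEntry corresponds to the branch in which the handler above answers with HttpStatusCode.NotFound.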
Add ZipArchiveMessageHandler to the Pipeline
The key idea behind message handlers is chaining them together, so the next step is to add ZipArchiveMessageHandler to the pipeline. The Configuration() constructor creates an instance of the Configuration class. Once the configuration is created, the GetService<INetworkService>() and MessageHandlers.Add() methods are called. The Add() method takes the zip object as a parameter and appends ZipArchiveMessageHandler to the end of the message handlers’ collection.
// The original snippet omits its using directives; these namespaces are assumed
// from Aspose.HTML for .NET
using System.IO;
using Aspose.Html;
using Aspose.Html.Net;
using Aspose.Html.Rendering.Image;

// Add this line before you try to use the 'IBM437' encoding
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);

// Prepare path to a source zip file
// (DataDir and OutputDir are assumed to be folder paths defined elsewhere)
string documentPath = Path.Combine(DataDir, "test.zip");

// Prepare path for converted file saving
string savePath = Path.Combine(OutputDir, "zip-to-jpg.jpg");

// Create an instance of ZipArchiveMessageHandler
using var zip = new ZipArchiveMessageHandler(documentPath);

// Create an instance of the Configuration class
using var configuration = new Configuration();

// Add ZipArchiveMessageHandler to the chain of existing message handlers
configuration
    .GetService<INetworkService>()
    .MessageHandlers.Add(zip);

// Initialize an HTML document with specified configuration
using var document = new HTMLDocument("zip:///test.html", configuration);

// Create an instance of Rendering Options
var options = new ImageRenderingOptions()
{
    Format = ImageFormat.Jpeg
};

// Create an instance of Image Device
using var device = new ImageDevice(options, savePath);

// Render ZIP to JPG
document.RenderTo(device);
In the example, the ZIP archive (test.zip) contains the HTML document (test.html), and all related resources are referenced by paths relative to that document.
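Note: The HTMLDocument(address, configuration) constructor takes an absolute address ("zip:///test.html") that points to the HTML document inside the ZIP archive, while the path to the archive itself is passed to ZipArchiveMessageHandler. All related resources have relative paths in the HTML document and in the example’s code.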
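To experiment with the conversion, you need an archive laid out as described above. The following is a minimal sketch that builds one with the built-in System.IO.Compression classes. The SampleArchiveSketch class, its helpers, the css/style.css entry, and the use of the temporary folder are all assumptions made for illustration; only the names test.zip and test.html come from the example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

static class SampleArchiveSketch
{
    // Writes a small archive whose HTML document refers to its stylesheet by a relative path
    public static void BuildArchive(string zipPath)
    {
        if (File.Exists(zipPath))
        {
            File.Delete(zipPath);
        }

        using var zip = ZipFile.Open(zipPath, ZipArchiveMode.Create);
        AddText(zip, "test.html",
            "<html><head><link rel=\"stylesheet\" href=\"css/style.css\"></head>" +
            "<body><h1>Hello from ZIP</h1></body></html>");
        AddText(zip, "css/style.css", "h1 { color: teal; }");
    }

    // Adds a single text entry to the archive
    static void AddText(ZipArchive zip, string entryName, string text)
    {
        using var writer = new StreamWriter(zip.CreateEntry(entryName).Open());
        writer.Write(text);
    }

    // Returns the full names of all entries, for a quick check of the layout
    public static string[] ListEntries(string zipPath)
    {
        using var zip = ZipFile.OpenRead(zipPath);
        return zip.Entries.Select(e => e.FullName).ToArray();
    }

    public static void Main()
    {
        string zipPath = Path.Combine(Path.GetTempPath(), "test.zip");
        BuildArchive(zipPath);
        Console.WriteLine(string.Join(", ", ListEntries(zipPath)));
    }
}
```

Because the stylesheet is linked as css/style.css, a request for it resolves against the document address and reaches the handler with a path that matches the entry name inside the archive.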
For more information on how to convert HTML to JPG using the RenderTo(device) method, please read the Fine-Tuning Converters article.
You can download the complete C# examples and data files from GitHub.