Retrieve images from *.docx Word document


This article shows how to build an ASP.NET web application that extracts all the images from a Microsoft Word document in docx format.

Prepare ASP.NET Web Application

Using Visual Studio, create a simple ASP.NET Core Web Application with Razor Pages. Using the NuGet Package Manager, reference two packages in your project: Aspose.Zip for decompression and Aspose.Imaging for image verification.
Find the Index.cshtml page in Solution Explorer. Add a form to that page with the enctype="multipart/form-data" attribute on the <form> tag; it is needed to transfer the Word document to the web server. Then add an input field of type file for the uploaded docx file. Here is the full HTML markup for the form:

<form method="post" enctype="multipart/form-data">
    <span>Microsoft *.docx document: </span><input type="file" name="uploadedFile" required="required" accept=".docx" />
    <br />
    <input type="submit" value="Upload" />
</form>

The accept attribute is added for user convenience: it limits the file picker to .docx files.

Docx Structure

A docx document is itself a zip archive. Any embedded images are located in the 'word/media' folder after extraction. The user selects a docx file and submits the form. On the server side, we need to write an appropriate OnPost method in the Index.cshtml.cs source. Within this method, we open the zip archive by passing the uploaded stream to the Archive constructor. Here is a draft of the method:

public void OnPost(IFormFile uploadedFile)
{
    // A docx file is a zip archive, so it can be opened directly from the upload stream.
    using (Archive archive = new Archive(uploadedFile.OpenReadStream()))
    {
        // Embedded images are stored under the word/media folder.
        foreach (var entry in archive.Entries.Where(e => e.Name.StartsWith(@"word/media", StringComparison.InvariantCultureIgnoreCase)))
        {
            ...
        }
    }
}

Page Model and Image Rendering

After extraction, we need to verify that each extracted entry is actually an image. For this purpose we can use the Image.CanLoad method. If it confirms that the entry is a valid image, we store its bytes in the page model so that the picture can be rendered. Add the property public List<byte[]> ImageBytes { get; private set; } to IndexModel. We fill this list with the extracted image bytes. To show them on the web page, we use a data URI with the image bytes converted to a base64 string. Here is the rendering Razor code for Index.cshtml:

@if (Model.ImageBytes != null && Model.ImageBytes.Count > 0) {
    <h4>Images within document:</h4>
    foreach (byte[] image in Model.ImageBytes) {
        <img src="data:image;base64,@Convert.ToBase64String(image)"/>
    }
}
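
The verification step described in this section can be isolated in a small helper. The sketch below is one possible way to do it, assuming the Aspose.Zip ArchiveEntry.Extract(Stream) method and the Aspose.Imaging Image.CanLoad(Stream) method mentioned above. The class and method names (EntryImageReader, ReadImageOrNull) are hypothetical and are not part of either library.

```csharp
using System.IO;
using Aspose.Zip;

public static class EntryImageReader
{
    // Hypothetical helper: returns the entry's bytes when Aspose.Imaging
    // reports that it can load them as an image; otherwise returns null.
    public static byte[] ReadImageOrNull(ArchiveEntry entry)
    {
        using (MemoryStream extracted = new MemoryStream())
        {
            // Decompress the entry into memory.
            entry.Extract(extracted);

            // Rewind so that the format check starts at the first byte.
            extracted.Seek(0, SeekOrigin.Begin);

            return Aspose.Imaging.Image.CanLoad(extracted)
                ? extracted.ToArray()
                : null;
        }
    }
}
```

A caller would add every non-null result to the list that backs ImageBytes, so entries under word/media that are not loadable images are skipped.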

Finishing response

Now let us put it all together. Each matching entry of the archive is decompressed into memory, and its bytes are kept if they can be loaded as an image.
We do not validate the uploaded file in this sample. In real-world applications, you should verify the uploaded archive and its content.
Below is the final OnPost method.
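
A minimal sketch of such a method, assembled from the steps described above, might look as follows. It assumes the default Razor Pages IndexModel class, the ImageBytes property introduced earlier, and the Aspose.Zip and Aspose.Imaging calls named in the text (Archive, ArchiveEntry.Extract, Image.CanLoad). The local variable names are illustrative, and the project namespace declaration is omitted for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Aspose.Zip;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc.RazorPages;

public class IndexModel : PageModel
{
    // Bytes of every verified image; Index.cshtml renders them as data URIs.
    public List<byte[]> ImageBytes { get; private set; }

    public void OnPost(IFormFile uploadedFile)
    {
        List<byte[]> images = new List<byte[]>();

        // A docx file is a zip archive, so open it straight from the upload stream.
        using (Archive archive = new Archive(uploadedFile.OpenReadStream()))
        {
            foreach (ArchiveEntry entry in archive.Entries.Where(e =>
                e.Name.StartsWith(@"word/media", StringComparison.InvariantCultureIgnoreCase)))
            {
                using (MemoryStream extracted = new MemoryStream())
                {
                    // Decompress the entry into memory and rewind before inspecting it.
                    entry.Extract(extracted);
                    extracted.Seek(0, SeekOrigin.Begin);

                    // Keep only entries that Aspose.Imaging can load as images.
                    if (Aspose.Imaging.Image.CanLoad(extracted))
                    {
                        images.Add(extracted.ToArray());
                    }
                }
            }
        }

        // Expose the collected images to the Razor page.
        ImageBytes = images;
    }
}
```

This sketch holds every image in memory for the duration of the request, which is reasonable for a sample but may need limits on upload size in a production application.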
