Retrieve images from .docx Word document
Scenario
This article shows how to build an ASP.NET web application that extracts all the images from a Microsoft Word document in docx format.
Prepare ASP.NET Web Application
In Visual Studio, create a simple ASP.NET Core Web Application with Razor Pages.
Using NuGet Package Manager, reference two packages in your project:
Aspose.Zip for decompression and
Aspose.Imaging for image verification.
Find the Index.cshtml page in Solution Explorer. Add a form to that page with the enctype="multipart/form-data" attribute on the <form> tag; this attribute is needed to transfer the Word document to the web server. Then add an input field of type file for the uploaded docx file.
Here is the full HTML markup for the form:
<form method="post" enctype="multipart/form-data">
    <span>Microsoft *.docx document: </span>
    <input type="file" name="uploadedFile" required="required" accept=".docx" />
    <br />
    <input type="submit" value="Upload" />
</form>
The accept attribute has been added for user convenience: it limits the file picker to .docx files.
Docx Structure
A docx document is itself a zip archive. If it has any embedded images, they are located in the 'word/media' folder after extraction. So, the user provides the docx file and submits the form. On the server side, we need to add an appropriate OnPost method to the Index.cshtml.cs source. Within this method we open the zip archive using the Archive constructor that accepts a stream. Here is a draft of the method:
public void OnPost(IFormFile uploadedFile)
{
    // Open the uploaded docx as a zip archive.
    using (Archive archive = new Archive(uploadedFile.OpenReadStream()))
    {
        // Visit only the entries stored under word/media.
        foreach (var entry in archive.Entries.Where(e => e.Name.StartsWith(@"word/media", StringComparison.InvariantCultureIgnoreCase)))
        {
            ...
        }
    }
}
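To check the archive layout described in this section without any third-party package, the following sketch lists the entries stored under word/media. It deliberately uses the .NET built-in System.IO.Compression.ZipArchive class instead of Aspose.Zip, and the DocxInspector class and ListMediaEntries method are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxInspector
{
    // Returns the full names of all archive entries stored under word/media/.
    public static List<string> ListMediaEntries(Stream docxStream)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var zip = new ZipArchive(docxStream, ZipArchiveMode.Read, leaveOpen: true))
        {
            return zip.Entries
                .Where(e => e.FullName.StartsWith("word/media/", StringComparison.OrdinalIgnoreCase))
                .Select(e => e.FullName)
                .ToList();
        }
    }
}
```

Calling DocxInspector.ListMediaEntries(File.OpenRead("sample.docx")) on a document with pictures should return names of the form word/media/image1.png; the exact file names depend on the document.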
Page Model and Image Rendering
After extraction we need to verify that the extracted entry is actually an image. For this purpose we can use the Image.CanLoad method. If it confirms that the entry is a valid image, we store its bytes in the page model so the picture can be rendered. Add the property public List<byte[]> ImageBytes { get; private set; } to IndexModel.
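For orientation, the page model with that property might look like the sketch below. It assumes the default Razor Pages template, where IndexModel derives from PageModel; the namespace, logger, and constructor that the template normally generates are left out, and the OnPost body is only a placeholder.

```csharp
using System.Collections.Generic;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc.RazorPages;

public class IndexModel : PageModel
{
    // Raw bytes of every image found in the uploaded document.
    // Stays null until a document has been posted.
    public List<byte[]> ImageBytes { get; private set; }

    public void OnPost(IFormFile uploadedFile)
    {
        // Placeholder: the extraction logic goes here.
        ImageBytes = new List<byte[]>();
    }
}
```

The private setter keeps the list writable only from inside the page model, while the Razor page can still read it through Model.ImageBytes.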
We fill this list with the extracted image bytes. To show them on the web page, we use a data URI, converting the image bytes to a base64 string.
Here is the rendering Razor code in Index.cshtml:
@{
    if (Model.ImageBytes != null && Model.ImageBytes.Count > 0) {
        <h4>Images within document:</h4>
        foreach (byte[] image in Model.ImageBytes) {
            <img src="data:image;base64,@Convert.ToBase64String(image)"/>
        }
    }
}
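The markup above uses the generic media type image and leaves it to the browser to work out the actual format. If you prefer to state the format explicitly, a helper along the following lines could build the data URI. This is only a sketch: the DataUri class and its methods are hypothetical names, and the signature check covers just a few common formats.

```csharp
using System;

public static class DataUri
{
    // Guesses a MIME type from the leading signature bytes.
    // Only PNG, JPEG, GIF and BMP are recognized in this sketch.
    public static string GuessMimeType(byte[] bytes)
    {
        if (StartsWith(bytes, 0x89, 0x50, 0x4E, 0x47)) return "image/png";
        if (StartsWith(bytes, 0xFF, 0xD8, 0xFF)) return "image/jpeg";
        if (StartsWith(bytes, 0x47, 0x49, 0x46, 0x38)) return "image/gif";
        if (StartsWith(bytes, 0x42, 0x4D)) return "image/bmp";
        return "application/octet-stream";
    }

    // Builds a data URI of the form data:<mime>;base64,<payload>.
    public static string Build(byte[] bytes)
    {
        return "data:" + GuessMimeType(bytes) + ";base64," + Convert.ToBase64String(bytes);
    }

    // True when data begins with the given byte sequence.
    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data == null || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

In the Razor page, the image tag could then be written as <img src="@DataUri.Build(image)" />.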
Finishing response
Now let's put it all together. Each archive entry under word/media is decompressed into memory, and its bytes are kept if they can be loaded as an image.
This sample does not validate the user's file. In real-world applications, you should verify the uploaded archive and its content.
Below is the final OnPost method.
public void OnPost(IFormFile uploadedFile)
{
    ImageBytes = new List<byte[]>();

    // Open the uploaded docx as a zip archive.
    using (Archive archive = new Archive(uploadedFile.OpenReadStream()))
    {
        // Embedded images are stored under word/media.
        foreach (var entry in archive.Entries.Where(e => e.Name.StartsWith(@"word/media", StringComparison.InvariantCultureIgnoreCase)))
        {
            using (MemoryStream extracted = new MemoryStream())
            {
                // Decompress the entry into memory and rewind before checking it.
                entry.Open().CopyTo(extracted);
                extracted.Seek(0, SeekOrigin.Begin);

                // Keep only the entries that Aspose.Imaging can load as images.
                if (Aspose.Imaging.Image.CanLoad(extracted))
                    ImageBytes.Add(extracted.ToArray());
            }
        }
    }
}
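As noted above, the sample skips validation of the upload. A minimal sketch of some basic checks is shown below, assuming you want to reject files with the wrong extension, an unreasonable size, or content that does not start with the zip local file header signature. The UploadChecks class, the LooksLikeDocx method, and the 20 MB limit are hypothetical choices for illustration, and passing these checks does not prove the file is a well-formed Word document.

```csharp
using System;
using System.IO;

public static class UploadChecks
{
    // Hypothetical size limit, chosen only for illustration.
    public const long MaxDocxBytes = 20L * 1024 * 1024;

    public static bool LooksLikeDocx(string fileName, long length, Stream content)
    {
        if (content == null || string.IsNullOrEmpty(fileName)) return false;
        if (!fileName.EndsWith(".docx", StringComparison.OrdinalIgnoreCase)) return false;
        if (length <= 0 || length > MaxDocxBytes) return false;

        // Remember where we started so the stream can be handed on unchanged.
        long start = content.CanSeek ? content.Position : -1;

        // Read the first four bytes; a single Read call may return fewer.
        byte[] header = new byte[4];
        int read = 0;
        while (read < header.Length)
        {
            int n = content.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (start >= 0) content.Position = start;

        // Zip local file header signature: 'P' 'K' 0x03 0x04.
        return read == 4
            && header[0] == 0x50 && header[1] == 0x4B
            && header[2] == 0x03 && header[3] == 0x04;
    }
}
```

Inside OnPost, such a check could be called with uploadedFile.FileName, uploadedFile.Length, and the stream returned by uploadedFile.OpenReadStream() before that same stream is passed to the Archive constructor. The position is only restored for seekable streams.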