Extract SVG From Website in Python – Aspose.HTML
SVG is a vector format designed for the Web, frequently used within HTML documents. Its primary advantage is its ability to scale to any size without losing quality, making it ideal for maintaining visual clarity across various display sizes. Beyond its scalability, SVG offers additional benefits such as programmability, a small file size, advanced styling options, and interactivity, all of which enhance web pages’ visual appeal and functionality. For designers and developers, extracting SVG images from a website can be challenging, especially when standard methods like right-clicking to save or open the image are ineffective.
With Aspose.HTML for Python via .NET, you can extract SVG images from a website programmatically. The library handles both inline and external SVGs, which streamlines locating and saving these images and offers a more efficient alternative to manual extraction.
Extract SVG from Website – Inline SVG
Inline SVG images are <svg> elements whose content describes the image. Unlike external SVG images, which are linked via URLs, inline SVGs are embedded directly within the HTML code of a webpage. These embedded SVGs are not stored as separate files and thus require special handling to be accessed and saved.
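To make the distinction concrete, the sketch below uses only Python's standard html.parser module (not Aspose.HTML) on a made-up HTML fragment. The SAMPLE_HTML string and the SvgKindCounter class are illustrative assumptions and are not part of the library.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: one inline SVG, one external SVG, one non-SVG image.
SAMPLE_HTML = """
<p>Inline: <svg width="20" height="20"><circle cx="10" cy="10" r="8"/></svg></p>
<p>External: <img src="images/logo.svg" alt="logo"></p>
<p>Not SVG: <img src="images/photo.png" alt="photo"></p>
"""


class SvgKindCounter(HTMLParser):
    """Counts inline <svg> elements and <img> elements that point at .svg files."""

    def __init__(self):
        super().__init__()
        self.inline = 0
        self.external = 0

    def handle_starttag(self, tag, attrs):
        if tag == "svg":
            # The image is described by markup embedded in the page itself
            self.inline += 1
        elif tag == "img":
            # The image lives in a separate file referenced by the src attribute
            src = dict(attrs).get("src") or ""
            if src.lower().endswith(".svg"):
                self.external += 1


counter = SvgKindCounter()
counter.feed(SAMPLE_HTML)
print(counter.inline, counter.external)
```

The inline image can only be saved by serializing its markup, while the external one has to be downloaded from its URL, which is why the two cases are handled separately.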
To save inline SVG images from a website, we need to find all <svg> elements in the HTML document and extract their content using the outer_html property. Here is a step-by-step guide to downloading inline SVG images from a website:
- Use the HTMLDocument(Url) constructor to create an instance of the HTMLDocument class, passing it the URL of the website from which you want to extract inline SVG images.
- Use the get_elements_by_tag_name("svg") method to collect all <svg> elements. This method returns a list of the <svg> elements embedded within the HTML.
- Iterate over the list of <svg> elements and save each SVG to the local file system.
- Use the outer_html property to get the complete HTML representation of each <svg> element. Each <svg> is saved with a unique filename to avoid overwriting.
The following Python code demonstrates a straightforward approach to extracting inline SVG images from a webpage using the Aspose.HTML Python library:
import os
from aspose.html import *

# Prepare the output directory
output_dir = "output/svg/"
os.makedirs(output_dir, exist_ok=True)

# Open a document you want to extract inline SVG images from
with HTMLDocument("https://docs.aspose.com/svg/net/drawing-basics/svg-shapes/") as document:
    # Collect all inline SVG images
    images = document.get_elements_by_tag_name("svg")

    for i, image in enumerate(images):
        # Save every SVG image to the local file system under a unique, index-based name
        with open(os.path.join(output_dir, f"{i}.svg"), 'w', encoding='utf-8') as file:
            file.write(image.outer_html)
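One caveat when saving inline SVG markup as a standalone file: inline <svg> elements in HTML often omit the xmlns declaration, because the HTML parser assigns the SVG namespace automatically. A standalone .svg file generally needs xmlns="http://www.w3.org/2000/svg" on its root element to render in browsers and image viewers. Depending on how the markup is serialized, the saved file may therefore lack it. The helper below is a minimal sketch in plain Python. The name ensure_svg_namespace is hypothetical, and the sketch assumes the markup starts with the <svg tag and that no attribute value on that tag contains a ">" character.

```python
import re

SVG_NS = "http://www.w3.org/2000/svg"


def ensure_svg_namespace(markup: str) -> str:
    """Return markup with an xmlns attribute on the root <svg> tag if it is missing."""
    # Match the opening <svg ...> tag and capture its attribute text
    match = re.match(r"\s*<svg\b([^>]*)>", markup, flags=re.IGNORECASE)
    if match is None:
        return markup  # not an <svg> root; leave untouched
    # A plain xmlns= attribute counts; prefixed ones such as xmlns:xlink= do not
    if re.search(r"(?:^|\s)xmlns\s*=", match.group(1)):
        return markup  # default namespace already declared
    insert_at = match.start(1)  # position right after "<svg"
    return markup[:insert_at] + f' xmlns="{SVG_NS}"' + markup[insert_at:]
```

In the loop above, the helper could be applied just before writing, for example file.write(ensure_svg_namespace(image.outer_html)).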
Note: Some SVG files may be protected by copyright, so check the terms of use before extracting and using them. For example, reusing a company logo or other extracted SVG files in your own design projects may infringe the owner’s rights. When in doubt, ask the website owners for permission before you use their files.
Extract SVG from Website – External SVG
An external SVG is an SVG file stored outside the HTML document and loaded into it using, for example, an <img> tag. Separating SVG files from HTML makes it possible to reuse the same SVG image in multiple places without duplicating the code, making web pages more efficient and easier to maintain.
External SVG images are represented by <img> elements. Because <img> can also refer to other image types, the results must be filtered to keep only SVGs. The following Python code automates extracting external SVG images from a web page using the Aspose.HTML Python library. It begins by creating an output directory to store the SVG files and then loads an HTML document from a specified URL. It collects all <img> elements from the document, retrieves their src attributes, and keeps only the URLs that end with ".svg". The code converts these relative URLs into absolute URLs using the document’s base URI. For each absolute SVG URL, it sends a network request to retrieve the SVG image. If the request is successful, it saves the SVG file to the local file system under the file name derived from the URL’s pathname.
import os
from aspose.html import *
from aspose.html.net import *

# Define the output directory
output_dir = "output/svg/"
os.makedirs(output_dir, exist_ok=True)

# Open the document you want to extract external SVGs from
document = HTMLDocument("https://products.aspose.com/html/net/")

# Collect all image elements
images = document.get_elements_by_tag_name("img")

# Create a distinct collection of relative image URLs
urls = set(element.get_attribute("src") for element in images)

# Keep only SVG images (the "url and" guard skips <img> elements without a src)
svg_urls = [url for url in urls if url and url.endswith(".svg")]

# Create absolute SVG image URLs
abs_urls = [Url(url, document.base_uri) for url in svg_urls]

for url in abs_urls:
    # Create a request message
    request = RequestMessage(url)

    # Extract SVG
    response = document.context.network.send(request)

    # Check whether the response is successful
    if response.is_success:
        # Save SVG image to the local file system
        file_path = os.path.join(output_dir, os.path.basename(url.pathname))
        with open(file_path, 'wb') as file:
            file.write(response.content.read_as_byte_array())
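A plain endswith(".svg") check on the src value misses URLs that carry a query string or fragment (for example logo.svg?v=2) or an upper-case extension. The following is a sketch of a more tolerant filter that uses only the standard urllib.parse module. The functions is_svg_url and resolve_svg_urls and the sample URLs are hypothetical and not part of Aspose.HTML.

```python
from urllib.parse import urljoin, urlsplit


def is_svg_url(url):
    """True if the path part of the URL ends with .svg, ignoring query and fragment."""
    if not url:
        return False  # covers None and empty src values
    return urlsplit(url).path.lower().endswith(".svg")


def resolve_svg_urls(srcs, base_url):
    """Return sorted, de-duplicated absolute URLs for the SVG entries in srcs."""
    return sorted({urljoin(base_url, src) for src in srcs if is_svg_url(src)})


# Hypothetical src values as they might be collected from <img> elements
srcs = ["/img/logo.svg?v=2", "icons/Arrow.SVG", "photo.png", None, "/img/logo.svg?v=2"]
print(resolve_svg_urls(srcs, "https://example.com/html/net/"))
```

The same path-based test could replace the endswith check in the list comprehension above, while the Aspose.HTML Url class continues to handle the conversion to absolute URLs.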
Conclusion
The Aspose.HTML for Python via .NET library offers robust capabilities for programmatically extracting SVG images from websites, encompassing both inline and external SVGs. To extract inline SVG images, use the HTMLDocument(Url) constructor to load the document, apply the get_elements_by_tag_name("svg") method to gather all SVG elements, and then use the outer_html property to save each SVG image locally. For external SVGs, follow a similar approach by collecting <img> elements, filtering for SVG files, constructing absolute URLs, and saving the images.
These Python examples demonstrate how to automate extracting SVG images from web pages. This is useful for archiving or analyzing web content and is beneficial for designers and developers seeking to pull SVGs from sites without delving into the source code.
Download the Aspose.HTML for Python via .NET library to manipulate your HTML documents quickly and easily. The library can create, modify, convert, and render HTML documents and extract data from them without the need for external software. It supports popular file formats such as EPUB, MHTML, XML, SVG, and Markdown and can render to PDF, DOCX, XPS, and image file formats.