Edit HTML Document – Edit HTML Files in C#

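As we mentioned in the Create HTML Document article, HTMLDocument and the DOM as a whole are implemented according to the WHATWG DOM standard. Anyone with basic knowledge of HTML and JavaScript can therefore use Aspose.HTML for .NET with ease.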

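DOM Namespace

The DOM tree is an in-memory representation of a document. The DOM is an API for accessing and manipulating document content. An HTML document consists of a tree that contains several kinds of nodes, and its root is a Document. The DOM namespace is represented by the following fundamental data types: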

| Data type | Description |
| --- | --- |
| Document | The Document class represents the entire HTML, XML or SVG document. Conceptually, it is the root of the document tree and provides the primary access to the document's data. |
| EventTarget | The EventTarget class is implemented by all nodes in an implementation that supports the DOM Event Model. An EventTarget object represents a target to which an event can be dispatched when something has occurred. |
| Node | The Node class is the primary data type for the entire Document Object Model. It represents a single node in the document tree. |
| Element | The Element type is based on Node and represents a base class for HTML, XML or SVG DOM elements. |
| Attr | The Attr class represents an attribute in an Element object. Typically, the allowable values for the attribute are defined in a schema associated with the document. |

Below is a brief list of useful API methods provided by the core data types:

| Method | Description |
| --- | --- |
| Document.GetElementById(elementId) | Returns the first element whose ID is elementId, or null if there is no such element. |
| Document.GetElementsByTagName(tagname) | Returns the list of elements with the given tag name. |
| Document.CreateElement(localname) | Creates an element of the specified type, or an HTMLUnknownElement if localname isn't recognized. |
| Node.AppendChild(node) | Adds a node to the end of the list of children of a specified parent node. |
| Element.SetAttribute(name, value) | Sets the value of an attribute on the specified element. |
| Element.GetAttribute(name) | Returns the value of a specified attribute on the element. |
| Element.InnerHTML | This property gets or sets the fragment of markup contained within the element. |

For a complete list of the interfaces and methods in the DOM namespace, see the API Reference.
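As a rough illustration of the query methods listed above, the following sketch reads a few values from a small document. The markup, the id value intro and the class name note are invented for this example; the two-argument HTMLDocument constructor is the same one used in the CSS examples of this article.

```csharp
using System;
using Aspose.Html;
using Aspose.Html.Dom;

// Sample markup invented for this sketch
string markup = "<p id=\"intro\" class=\"note\">Hello</p><p>World</p>";

using var document = new HTMLDocument(markup, ".");

// Look up a single element by its id
Element intro = document.GetElementById("intro");

// Read an attribute value and the markup inside the element
string className = intro.GetAttribute("class");
string innerHtml = intro.InnerHTML;

// Count all <p> elements in the document
int paragraphCount = document.GetElementsByTagName("p").Length;

// GetElementById returns null when no element has the given id
Element missing = document.GetElementById("no-such-id");

Console.WriteLine($"{className}, {innerHtml}, {paragraphCount}, {missing == null}");
```

Checking the result of GetElementById for null before using it is a sensible habit whenever the id comes from user input or from a document you do not control.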

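Edit HTML

Our library lets you edit HTML in several ways. You can modify a document by inserting new nodes, or by removing or editing the content of existing nodes. To create a new node, call one of the following methods: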

| Method | Description |
| --- | --- |
| Document.CreateCDATASection(data) | Creates a CDATASection node whose value is the specified string. |
| Document.CreateComment(data) | Creates a Comment node given the specified string. |
| Document.CreateDocumentFragment() | Creates an empty DocumentFragment object. |
| Document.CreateElement(localname) | Creates an element of the type specified. |
| Document.CreateEntityReference(name) | Creates an EntityReference object. |
| Document.CreateProcessingInstruction(target, data) | Creates a ProcessingInstruction with the specified name and data. |
| Document.CreateTextNode(data) | Creates a Text node given the specified string. |
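The sketch below shows one way these factory methods might be combined: several nodes are assembled in a DocumentFragment and then attached to the body in a single step. The text values are made up, and the sketch assumes the library follows the standard DOM rule that appending a fragment moves its children into the target node.

```csharp
using System;
using Aspose.Html;
using Aspose.Html.Dom;

using var document = new HTMLDocument();

// Build a small subtree inside a fragment before touching the document body
var fragment = document.CreateDocumentFragment();

Element heading = document.CreateElement("h1");
heading.AppendChild(document.CreateTextNode("Report"));
fragment.AppendChild(heading);

// A comment node is created the same way and can sit next to elements
fragment.AppendChild(document.CreateComment("generated content"));

Element paragraph = document.CreateElement("p");
paragraph.AppendChild(document.CreateTextNode("Body text"));
fragment.AppendChild(paragraph);

// Assumption: as in the DOM standard, this moves the fragment's children into <body>
document.Body.AppendChild(fragment);

int headingCount = document.GetElementsByTagName("h1").Length;
int paragraphCount = document.GetElementsByTagName("p").Length;

Console.WriteLine(document.Body.InnerHTML);
```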

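Once you have created new nodes, the DOM offers several methods for placing them in the document tree. The following list describes the most common methods for inserting and removing nodes: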

| Method | Description |
| --- | --- |
| Node.InsertBefore(node, child) | Inserts the node before the reference child node. |
| Node.AppendChild(node) | Adds the node to the list of children of the current node. |
| Node.RemoveChild(child) | Removes the child node from the list of children. |
| Element.Remove() | Removes this instance from the HTML DOM tree. |
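To make the difference between these methods concrete, here is a minimal sketch that inserts one element and removes two others. The markup and the id values first and second are invented for the example, and FirstChild is assumed to behave like the standard DOM property of the same name.

```csharp
using System;
using Aspose.Html;
using Aspose.Html.Dom;

// Sample markup invented for this sketch
string markup = "<p id=\"first\">first</p><p id=\"second\">second</p>";

using var document = new HTMLDocument(markup, ".");
HTMLElement body = document.Body;

// Insert a new heading in front of the current first child of <body>
Element heading = document.CreateElement("h1");
heading.TextContent = "Title";
body.InsertBefore(heading, body.FirstChild);

// Remove a node through its parent
Element second = document.GetElementById("second");
body.RemoveChild(second);

// Remove an element directly, without referring to its parent
Element first = document.GetElementById("first");
first.Remove();

int remainingParagraphs = document.GetElementsByTagName("p").Length;
int headings = document.GetElementsByTagName("h1").Length;

Console.WriteLine(body.InnerHTML);
```

RemoveChild is called on the parent and needs a reference to the child, whereas Remove is called on the element itself, which is usually shorter when you already hold the element.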

You can download the complete examples and data files from GitHub.

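Edit a Document Tree

An HTML document consists of a tree of elements. In the source, each element is denoted by a start tag, such as <body>, and an end tag, such as </body>. Elements can have attributes, which control how the elements work. The Aspose.HTML for .NET API supports a set of HTML elements defined in the HTML standard, along with the rules for how elements can be nested.

Consider the simple steps for creating and editing HTML. The document will contain a text paragraph with an id attribute: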

// Edit HTML document using DOM Tree
using System.IO;
using Aspose.Html;
using Aspose.Html.Dom;

// OutputDir is assumed to be a string that holds the path of the output folder

// Create an instance of an HTML document
using (HTMLDocument document = new HTMLDocument())
{
    HTMLElement body = document.Body;

    // Create a paragraph element
    HTMLParagraphElement p = (HTMLParagraphElement)document.CreateElement("p");

    // Set the id attribute
    p.SetAttribute("id", "my-paragraph");

    // Create a text node
    Text text = document.CreateTextNode("my first paragraph");

    // Attach the text to the paragraph
    p.AppendChild(text);

    // Attach the paragraph to the document body
    body.AppendChild(p);

    // Save the HTML document to a file
    document.Save(Path.Combine(OutputDir, "edit-document-tree.html"));
}

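Let's look at creating a more complex HTML document. Every HTML document is represented as a tree of nodes, and some nodes in the tree can have children. The following code snippet shows how to edit an HTML document using the DOM tree and the functionality described above: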

// Create and add new HTML elements using C#
using System.IO;
using System.Linq;
using Aspose.Html;
using Aspose.Html.Dom;
using Aspose.Html.Rendering.Pdf;

// Create an instance of an HTML document
using (HTMLDocument document = new HTMLDocument())
{
    // Create a <style> element that colors all elements with the class name 'gr' green
    Element style = document.CreateElement("style");
    style.TextContent = ".gr { color: green }";

    // Find the document <head> element and append the <style> element to it
    Element head = document.GetElementsByTagName("head").First();
    head.AppendChild(style);

    // Create a paragraph element with the class name 'gr'
    HTMLParagraphElement p = (HTMLParagraphElement)document.CreateElement("p");
    p.ClassName = "gr";

    // Create a text node
    Text text = document.CreateTextNode("Hello World!!");

    // Append the text node to the paragraph
    p.AppendChild(text);

    // Append the paragraph to the document <body> element
    document.Body.AppendChild(p);

    // Save the HTML document to a file
    document.Save(Path.Combine(OutputDir, "using-dom.html"));

    // Create an instance of the PDF output device and render the document into this device
    using (PdfDevice device = new PdfDevice(Path.Combine(OutputDir, "using-dom.pdf")))
    {
        // Render HTML to PDF
        document.RenderTo(device);
    }
}

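Using InnerHTML & OuterHTML Properties

DOM objects give you a powerful tool for manipulating an HTML document. Sometimes, however, it is more convenient to work with a plain System.String. The following code snippet shows how to use the InnerHTML and OuterHTML properties to edit HTML.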

// Edit HTML body content and get modified document as string
using System;
using Aspose.Html;

// Create an instance of an HTML document
using (HTMLDocument document = new HTMLDocument())
{
    // Write the content of the HTML document into the console output
    Console.WriteLine(document.DocumentElement.OuterHTML); // output: <html><head></head><body></body></html>

    // Set the content of the body element
    document.Body.InnerHTML = "<p>HTML is the standard markup language for Web pages.</p>";

    // Store the document markup in a string for viewing
    string html = document.DocumentElement.OuterHTML;

    // Write the content of the HTML document into the console output
    Console.WriteLine(html); // output: <html><head></head><body><p>HTML is the standard markup language for Web pages.</p></body></html>
}
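The two properties differ only in whether the element's own tags are included. The following sketch, which uses made-up markup and a made-up id value box, reads both properties from the same element so the difference can be compared directly.

```csharp
using System;
using Aspose.Html;
using Aspose.Html.Dom;

// Sample markup invented for this sketch
using var document = new HTMLDocument("<div id=\"box\"><span>text</span></div>", ".");

Element box = document.GetElementById("box");

// InnerHTML holds only the markup of the element's children
string inner = box.InnerHTML;

// OuterHTML additionally includes the element's own start and end tags
string outer = box.OuterHTML;

Console.WriteLine(inner);
Console.WriteLine(outer);
```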

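Edit CSS

Cascading Style Sheets (CSS) is a style sheet language used to describe how web pages look in a browser. CSS can be added to HTML documents in an inline, internal, or external way. You can therefore give a single HTML element a unique style with inline CSS, or share formatting across multiple web pages by placing the relevant CSS in a separate .css file. Aspose.HTML not only supports CSS out of the box but also provides tools for managing document styles on the fly before the HTML document is converted to other formats, as shown below.

Inline CSS

CSS written in the style attribute inside an HTML tag is called inline CSS. Inline CSS lets you apply an individual style to one HTML element at a time. You set CSS on an HTML element by using the style attribute and defining the CSS properties within it. The following code snippet shows how to specify CSS style properties for an HTML <p> element.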

// How to set inline CSS styles in an HTML element using C#
using System.IO;
using System.Linq;
using Aspose.Html;
using Aspose.Html.Rendering.Pdf;

// Create an instance of an HTML document with specified content
string content = "<p>InlineCSS </p>";
using (HTMLDocument document = new HTMLDocument(content, "."))
{
    // Find the paragraph element to set a style
    HTMLElement paragraph = (HTMLElement)document.GetElementsByTagName("p").First();

    // Set the style attribute
    paragraph.SetAttribute("style", "font-size:250%; font-family:verdana; color:#cd66aa");

    // Save the HTML document to a file
    document.Save(Path.Combine(OutputDir, "edit-inline-css.html"));

    // Create an instance of PDF output device and render the document into this device
    using (PdfDevice device = new PdfDevice(Path.Combine(OutputDir, "edit-inline-css.pdf")))
    {
        // Render HTML to PDF
        document.RenderTo(device);
    }
}

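In this example, the color, font-size and font-family properties apply to the <p> element. A fragment of the rendered PDF page looks as follows:

[Image: Text "Inline CSS"]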

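Internal CSS

The internal CSS option is popular for applying properties to a single page: all the styles are enclosed in a <style> element inside the <head> of the HTML document.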

// Edit HTML with internal CSS using C#
using System.IO;
using System.Linq;
using Aspose.Html;
using Aspose.Html.Dom;
using Aspose.Html.Rendering.Pdf;

// Create an instance of an HTML document with specified content
string content = "<div><p>Internal CSS</p><p>An internal CSS is used to define a style for a single HTML page</p></div>";
using (HTMLDocument document = new HTMLDocument(content, "."))
{
    // Create a <style> element that defines the 'frame1' and 'frame2' classes
    Element style = document.CreateElement("style");
    style.TextContent = ".frame1 { margin-top:50px; margin-left:50px; padding:20px; width:360px; height:90px; background-color:#a52a2a; font-family:verdana; color:#FFF5EE;} \r\n" +
                        ".frame2 { margin-top:-90px; margin-left:160px; text-align:center; padding:20px; width:360px; height:100px; background-color:#ADD8E6;}";

    // Find the document <head> element and append the <style> element to it
    Element head = document.GetElementsByTagName("head").First();
    head.AppendChild(style);

    // Find the first paragraph element and assign it the 'frame1' class
    HTMLElement paragraph = (HTMLElement)document.GetElementsByTagName("p").First();
    paragraph.ClassName = "frame1";

    // Find the last paragraph element and assign it the 'frame2' class
    HTMLElement lastParagraph = (HTMLElement)document.GetElementsByTagName("p").Last();
    lastParagraph.ClassName = "frame2";

    // Set the font size and text alignment of the first paragraph
    paragraph.Style.FontSize = "250%";
    paragraph.Style.TextAlign = "center";

    // Set the color, font size and font family of the last paragraph
    lastParagraph.Style.Color = "#434343";
    lastParagraph.Style.FontSize = "150%";
    lastParagraph.Style.FontFamily = "verdana";

    // Save the HTML document to a file
    document.Save(Path.Combine(OutputDir, "edit-internal-css.html"));

    // Create the instance of the PDF output device and render the document into this device
    using (PdfDevice device = new PdfDevice(Path.Combine(OutputDir, "edit-internal-css.pdf")))
    {
        // Render HTML to PDF
        document.RenderTo(device);
    }
}

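In this example, we use internal CSS and also declare additional style properties for individual elements through the Style property of the HTMLElement class. The figure below shows a fragment of the rendered "edit-internal-css.pdf" file:

[Image: Text "Internal CSS"]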

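External CSS

An external style sheet can be written in any text editor and saved with the .css extension. It is a standalone CSS file that is linked from the web page. The advantage of external CSS is that it is created once and its rules can be applied to multiple web pages.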
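Besides writing the link tag directly into the markup, the link could also be built through the DOM. The sketch below is one possible way to do that using only the element and attribute methods described earlier; the file name note.css and the class name note are made up for the example, and the style sheet is written to the working directory so that the relative href can be resolved against the "." base URI.

```csharp
using System;
using System.IO;
using Aspose.Html;
using Aspose.Html.Dom;

// Write a tiny style sheet next to the program; "note.css" is a made-up name
File.WriteAllText("note.css", ".note { color: green; }");

using var document = new HTMLDocument("<p class=\"note\">External CSS</p>", ".");

// Build <link rel="stylesheet" href="note.css" type="text/css">
Element link = document.CreateElement("link");
link.SetAttribute("rel", "stylesheet");
link.SetAttribute("href", "note.css");
link.SetAttribute("type", "text/css");

// Attach the link to the document <head>
Element head = document.GetElementsByTagName("head")[0];
head.AppendChild(link);

int linkCount = document.GetElementsByTagName("link").Length;
string href = link.GetAttribute("href");

Console.WriteLine(head.InnerHTML);
```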

Example 1

Let's look at an example of an external CSS implementation that uses a link to the URL of a CSS file:

// How to use an external CSS file in HTML using Aspose.HTML for .NET
using System.IO;
using Aspose.Html;
using Aspose.Html.Rendering.Pdf;

// Create an instance of HTML document with specified content
string htmlContent = "<link rel=\"stylesheet\" href=\"https://docs.aspose.com/html/net/edit-html-document/external.css\" type=\"text/css\" />\r\n" +
                     "<div class=\"rect1\" ></div>\r\n" +
                     "<div class=\"rect2\" ></div>\r\n" +
                     "<div class=\"frame\">\r\n" +
                     "<p style=\"font-size:2.5em; color:#ae4566;\"> External CSS </p>\r\n" +
                     "<p class=\"rect3\"> An external CSS can be created once and applied to multiple web pages</p></div>\r\n";

using (HTMLDocument document = new HTMLDocument(htmlContent, "."))
{
    // Save the HTML document to a file
    document.Save(Path.Combine(OutputDir, "external-css.html"));

    // Create the instance of the PDF output device and render the document into this device
    using (PdfDevice device = new PdfDevice(Path.Combine(OutputDir, "external-css.pdf")))
    {
        // Render HTML to PDF
        document.RenderTo(device);
    }
}

The result of applying the external CSS looks as follows:

[Image: Text "External CSS"]

Example 2

You can write the content of a CSS file to a string and save it to a separate linked file, as the following example shows:

// Edit HTML with external CSS using C#
using System.IO;
using Aspose.Html;

// Prepare content of a CSS file
string styleContent = ".flower1 { width:120px; height:40px; border-radius:20px; background:#4387be; margin-top:50px; } \r\n" +
                      ".flower2 { margin-left:0px; margin-top:-40px; background:#4387be; border-radius:20px; width:120px; height:40px; transform:rotate(60deg); } \r\n" +
                      ".flower3 { transform:rotate(-60deg); margin-left:0px; margin-top:-40px; width:120px; height:40px; border-radius:20px; background:#4387be; }\r\n" +
                      ".frame { margin-top:-50px; margin-left:310px; width:160px; height:50px; font-size:2em; font-family:Verdana; color:grey; }\r\n";

// Prepare a linked CSS file
File.WriteAllText("flower.css", styleContent);

// Create an instance of an HTML document with specified content
string htmlContent = "<link rel=\"stylesheet\" href=\"flower.css\" type=\"text/css\" /> \r\n" +
                     "<div style=\"margin-top: 80px; margin-left:250px; transform: scale(1.3);\" >\r\n" +
                     "<div class=\"flower1\" ></div>\r\n" +
                     "<div class=\"flower2\" ></div>\r\n" +
                     "<div class=\"flower3\" ></div></div>\r\n" +
                     "<div style = \"margin-top: -90px; margin-left:120px; transform:scale(1);\" >\r\n" +
                     "<div class=\"flower1\" style=\"background: #93cdea;\"></div>\r\n" +
                     "<div class=\"flower2\" style=\"background: #93cdea;\"></div>\r\n" +
                     "<div class=\"flower3\" style=\"background: #93cdea;\"></div></div>\r\n" +
                     "<div style =\"margin-top: -90px; margin-left:-80px; transform: scale(0.7);\" >\r\n" +
                     "<div class=\"flower1\" style=\"background: #d5effc;\"></div>\r\n" +
                     "<div class=\"flower2\" style=\"background: #d5effc;\"></div>\r\n" +
                     "<div class=\"flower3\" style=\"background: #d5effc;\"></div></div>\r\n" +
                     "<p class=\"frame\">External</p>\r\n" +
                     "<p class=\"frame\" style=\"letter-spacing:10px; font-size:2.5em \">  CSS </p>\r\n";

using (HTMLDocument document = new HTMLDocument(htmlContent, "."))
{
    // Save the HTML document to a file
    document.Save(Path.Combine(OutputDir, "edit-external-css.html"));
}

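This example shows how to create CSS graphics from scratch. The figure shows how the "edit-external-css.html" file is displayed:

[Image: Text "External CSS"]

As mentioned above, the most common application of CSS is styling web pages written in HTML and other markup languages. Beyond web design, however, you can also use CSS to create attractive graphics, like the one shown above. The key idea of CSS drawing is the creative use of border radius, rotation, and positioning to build curves and shapes.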
