By Rob Sheldon, 2014/03/26 (首次發表於: 2012/09/20)sql
本文屬於進階系列:XML進階數據庫
自2003年以來,XML一直是SQL標準的一部分,許多動態管理視圖也返回XML數據,因此,對於每個數據庫管理員而言,一樣是相當重要的。如今,這個行業更多地用於由文檔標記定義的數據,數據庫開發人員和數據庫管理員比以往任什麼時候候都更瞭解XML技術和使用它的方式。 在這一系列文章中,羅伯特·謝爾頓(Robert Sheldon)將盡力把複雜的東西簡單化。 注意:這一系列的文章已經可做爲電子書獲取。api
可擴展標記語言(XML)首先在SQL Server的SQL Server 2000發行版中提供支持。然而,與XML相關的功能僅限於專一於映射關係數據和XML數據的數據管理功能。例如,SQL Server 2000中添加了FOR XML子句,能夠將關係查詢結果做爲XML返回。app
可是到添加了XML數據類型的SQL Server 2005發佈時,支持XML就變得有意義起來。 XML數據類型容許將XML文檔自己存儲在使用該類型配置的列和變量中。數據類型還支持一組可用於檢索和修改XML文檔中特定組件的方法。less
爲了充分利用SQL Server支持的與XML相關的功能,你可能會發現對XML自己有一個基本的瞭解是頗有用的。爲此,XML進階系列的第一部分將介紹XML是什麼,並對構成XML文檔的各類組件進行介紹。ide
和超文本標記語言(HTML)類似,XML也是一種標記語言,它使用標籤來描述和這些標籤相關聯的數據的性質。使XML可擴展的是它的自描述性質,也就是建立特定於XML文檔中包含的數據值的標籤。在HTML中,這些標籤是預約義的。 (經過XML組件,XML的可擴展性將變得更加清晰。)sqlserver
儘管XML具備可擴展性,但它仍然是一種標準化語言,必須符合由萬維網聯盟(W3C)定義的特定格式規則集。因爲這種標準化與用於顯示數據的HTML不一樣,這種語言被普遍地用於傳輸和存儲數據。post
XML能夠輕鬆地在異構系統之間共享數據,而無論硬件,操做系統或應用程序類型如何,XML的廣泛應用意味着數據能夠更少的進行人爲干預處理。同時能夠控制數據的描述、排列和顯示方式。flex
構成XML文檔的主要組件和支配這些組件使用的規則一般很是簡單,但爲了使XML文檔可以被SQL Server XML解析器正確處理,你必須嚴格遵照一些規則。ui
XML文檔中主要包含兩種類型的信息:要存儲的數據和描述數據的標籤。 標籤由一組尖括號(<>)組成,它們包含描述與標籤相關聯的數據的描述性詞或複合詞(無空格)。 正是因爲這些標籤的自描述性質,XML一般被認爲是元語言。
每一個離散的存儲數據片斷都被封裝在開始標籤和結束標籤中,以下例所示:
<Person>John Doe</Person>
在這種狀況下,開始標籤是<Person>
,結束標籤是</ Person>
。 請注意,正斜槓位於結束標記中的標記描述以前。 正斜槓必須在全部結束標籤以前,但標籤的語言必須與開頭標籤相同,上面的示例是Person。 我能夠選擇一個除Person以外的名稱,包括與人無關的名稱,可是一個好的作法是始終提供最能描述打開和關閉標籤中包含的數據的標籤名稱。 在這種狀況下,標籤描述了一我的的名字John Doe,所以標籤名稱爲<Person>
。
一樣的,標籤和封閉的數據表明一個單一的元素。 可是,元素並不老是包含數據。 一個空的元素能夠以兩種方式之一呈現。 第一個是經過指定打開和關閉標籤,但不包括數據,以下例所示:
<Person></Person>
表示一個空白元素的另外一種方法是隻使用一組括號,但仍包括正斜槓:
<Person />
再者,只有當元素不包含值時,才能使用這個方法。 之後你在XML進階系列中可能看到模式須要一個沒有值的元素。 在這種狀況下,可使用縮寫格式來表示元素的兩個標籤。
不管元素是否包含值,只要使用兩個標籤,開始和關閉標籤必須徹底匹配,精確到大小寫(除告終束標籤中的正斜槓)。 例如,如下元素在SQL Server XML解析器中生成錯誤,由於兩個標籤不能匹配:
<person> John Doe </ Person>
開始標籤中的描述性文字都是小寫的; 然而,結束標籤中的描述性詞應以大寫字母開頭。 開始和結束標籤必須匹配,以被認爲是適當的或者格式良好的XML。
可是你能夠將元素嵌入到彼此中。 在如下示例中,我將<Person>
元素的兩個實例嵌入到<People>
元素中:
<People> <Person>John Doe</Person> <Person>Jane Doe</Person> </People>
請注意,每一個<Person>
元素自己都是完整的。 它包括打開和關閉標籤及其附帶的數據。 嵌入其餘元素中的元素稱爲子元素( child),或在某些狀況下稱爲subelements。 外部元素(在這種狀況下爲<People>
)是父元素。 XML文檔最高級別的父元素被認爲是根元素。 全部XML文檔必須有且只有一個根元素。 所以,上面示例中的<People>
元素是兩個<Person>
元素的父元素,它是XML文檔的根元素。
SQL Server還容許將XML片斷存儲在XML列或變量中。 片斷是沒有根元素的XML代碼塊,例如如下示例中顯示的兩個元素:
<Person>John Doe</Person> <Place>Seattle, WA</Place>
元素必須仍然是格式良好的XML,也就是具備包含數據的匹配標籤,但它們不必定是XML文檔。 像之後你將在XML進階系列中看到的那樣,你能夠指定只容許在XML列或變量中使用XML文檔,但如今只要知道SQL Server能夠區分和存儲XML文檔和片斷便可。
在其餘元素中嵌入元素時,必須確保子元素在完成父元素以前已經完成。 例如,在如下示例中,我在<Person>
元素以前結束了<People>
元素,這會致使SQL Server XML解析器生成錯誤:
<People><Person>John Doe</People></Person>
你必須確保不管多少級別包含嵌入元素,你的子元素都是完整的。 在下面的示例中,<FirstName>
和<LastName>
元素嵌入在每一個<Person>
元素中,<Person>
元素嵌入在<People>
元素中:
<People> <Person> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
在這種狀況下,<Person>
元素充當子元素和父元素。 可是請注意,每一個嵌入的元素,不管級別如何,必須落在父元素的開始和結束標記以內。 例如,<FirstName>
和<LastName>
元素的第一個實例徹底落在<Person>
元素的第一個實例中,而且<Person>
元素的兩個實例徹底落在<People>
元素中,即 文檔的根元素。
元素也能夠具備與它們相關聯的屬性。 屬性是能夠賦值的屬性。 該屬性被定義爲元素的開始標籤的一部分。 在如下示例中,我已經將id屬性添加到<Person>
元素的每一個實例中:
<People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
如示例所示,屬性由屬性名稱(在這種狀況下爲id),後跟等號,屬性值由雙引號括起來。所以,<Person>
元素的第一個實例的id屬性的值爲1234,而<Person>
元素的第二個實例的id屬性的值爲5678。
許多XML文檔中包含的另外一個組件是聲明,它至少指定了文檔符合的XML標準的版本。到目前爲止,只有兩個版本:1.0和1.1。若是使用XML 1.0,聲明是沒必要要的;可是XML 1.1須要聲明。所以,你應該知道如何在XML文檔中包含聲明。
若是你包含聲明,則必須將其放在文檔的開頭,用<?
打開標籤,並使用?>
關閉標籤結束。另外,你必須包含xml關鍵字(小寫)和版本屬性(也是小寫)。一般包括的另外一個屬性(儘管是可選的)是編碼,它指定用於XML文檔的字符編碼。在下面的示例中,我包括一個聲明,它指定版本1.0和UTF-8的編碼,這意味着數據被存儲爲8位Unicode字符序列:
<?xml version="1.0" encoding="UTF-8"?> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
您還能夠向XML文檔添加註釋。 要作到這一點,只需在<! -
標籤以前註釋,並將其結束於 - >
標籤,就像我在下面的例子中所作的那樣:
<?xml version="1.0" encoding="UTF-8"?> <!-- A list of current clients --> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
正如你所看到的那樣,我已經添加了評論當前客戶端的列表,附加在註釋標籤中。 SQL Server XML解析器將忽略標籤內的任何內容,所以你不只可使用註釋功能提供有關XML文檔及其數據的信息,還能夠掛起保留不想做爲文檔進行處理的XML代碼部分。
另外一個使用XML時要考慮的是,當元素值出現時,某些字符將沒法解析。 例如,您不能在元素的值中包含&符(&),就像我在如下示例中的<FavoriteBook>
子元素中所作的那樣:
<?xml version="1.0" encoding="UTF-8"?> <!-- A list of current clients --> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> <FavoriteBook>Crime & Punishment</FavoriteBook> </Person> </People>
若是我嘗試將此XML文檔分配給XML列或變量,則<FavoriteBook>
子元素將致使解析器生成錯誤,由於「Crime & Punishment」值包含&符號。 你必須使用實體引用替換此類型的字符,該實體引用會告知解析器保留原始打算的字符。 實體引用以&符號開頭,以分號結尾,其間包含表明原始值的多字符代碼。 對於&符號,實體參考應該是&amp;,我將在下面的例子中使用:
<?xml version="1.0" encoding="UTF-8"?> <!-- A list of current clients --> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> <FavoriteBook>Crime & Punishment</FavoriteBook> </Person> </People>
請注意,我已經用了&amp;實體參考。如今XML解析器處理<FavoriteBook>
元素將沒有任何問題。可是請注意,&符號不是惟一會產生錯誤的字符。 XML標準標識了應該用實體引用替換的五個字符,就像我在上面的例子中所作的那樣:
該示例提出的另外一個問題是子元素不必定要從一個父實例到另外一個父實例相同。你能夠看到,<Person>
元素的第一個實例只包含<FirstName>
和<LastName>
子元素,但<Person>
元素的第二個實例包含<FirstName>
和<LastName>
子元素以及<FavoriteBook>
元素。只要你的子元素造成良好,就能夠包括描述和定義數據所需的任何元素。
在這個級別,咱們已經看到組成一個XML文檔須要的主要組件。 元素做爲全部XML文檔的基本構建塊,每一個元素由開始標籤和結束標籤劃定,數據值自己被包含在這兩個標籤之間。 元素能夠彼此嵌入,可是一個元素 - 根 - 必須做爲XML文檔中全部其餘元素的父元素。 元素還能夠包括被定義爲元素的開始標籤的一部分的屬性。
咱們已經儘量方便地瞭解瞭如何組織一個XML文檔,但本課程的目的並非培養如何建立這些類型的文檔,而是提供XML的介紹以便你們能更有效地在SQL Server中使用XML工做 。 在下一節,咱們將介紹如何在SQL Server中實現XML數據類型,以及如何將其分配給列和變量以便存儲XML文檔和XML片斷。
原文
By Rob Sheldon, 2014/03/26 (first published: 2012/09/20)
This article is part of the Stairway Series: Stairway to >XML
XML has been part of the SQL Standard since 2003, and it is also essential for any DBA because so many of the dynamic management views return XML data. Now that the industry is more used to data defined by document markup, it is becoming more important than ever for Database Developers and DBAs to understand the technology and to know where it makes sense to use XML. In this series of articles, Robert Sheldon flexes his talent to make the complicated seem simple. Note: This series of articles is now available as an eBook.
Support for the eXtensible Markup Language (XML) was first introduced in SQL Server with the release of SQL Server 2000. However, XML-related features were limited to data management capabilities that focused on mapping relational and XML data. For example, SQL Server 2000 added the FOR XML clause, which lets you return relational query results as XML.
However, it wasn’t until the release of SQL Server 2005—when the XML data type was added—that support for XML got interesting. The XML data type lets you natively store XML documents in columns and variables configured with that type. The data type also supports a set of methods you can use to retrieve and modify specific components within the XML document.
To take full advantage of the XML-related features supported in SQL Server, you might find it useful to have a fundamental understanding of XML itself. To that end, this first Level of the Stairway to XML series explains what XML is and describes the various components that make up an XML document.
Similar to the HyperText Markup Language (HTML), XML is a markup language that uses tags to delineate and describe the nature of the data associated with those tags. What makes XML extensible is its self-describing nature, that is, you create tags that are specific to the data values contained in the XML document. In HTML, those tags are pre-defined. (XML’s extensible nature will become clearer as we work through the XML components.)
Despite its extensibility, XML is still a standardized language that must conform to a specific set of formatting rules, as defined by the World Wide Web Consortium (W3C). Because of this standardization, the language has been widely adopted in order to transport and store data, unlike HTML, which is used to display data. XML makes it possible to easily share data among heterogeneous systems, regardless of hardware, operating system, or application type, and XML’s universal adoption means that data can be processed with little human intervention. At the same time, you can control how the data is described, while also controlling how the data is ordered and displayed.
The primary components that make up an XML document—and the rules that govern the use of >those components—are generally very straightforward, but you must adhere strictly to these rules in order for an XML document to be properly processed by the SQL Server XML parser.
There are primarily two types of information included in an XML document: the data to be stored and the tags that describe the data. A tag is made up of a set of angle brackets (< >) that enclose a descriptive word or compound word (no spaces) that describes the data associated with the tag. It’s because of the self-describing nature of these tags that XML is often considered a meta-language.
Each discrete piece of stored data is enclosed in an opening tag and a closing tag, as shown in the following example:
<Person>John Doe</Person>
In this case, the opening tag is <Person>
, and the closing tag is </Person>
. Notice that a forward slash precedes the tag description in the end tag. A forward slash must precede all end tags, but the language of the tag must be the same as the opening tag, which in the example above is Person
. I could have chosen a name other than Person
, including a name that has nothing to do with people, but a good practice is to always provide tag names that best describe the data enclosed in the opening and closing tags. In this case, the tags are describing the name of a person, John Doe
, thus the tag name <Person>
.
Together, the tags and enclosed data represent a single element. However, an element does not always have to contain data. An empty element can be rendered in one of two ways. The first is by specifying the opening and closing tags, but including no data, as I do in the following example:
<Person></Person>
Another way to represent an empty element is use only one set of brackets, but still include the forward slash:
<Person />
Again, this method can be used only when an element contains no value. As you’ll see later in the Stairway to XML series, a schema might require an element for which there is no value. In that case, you can use the shortened format to represent the both tags of the element.
Whether or not an element contains a value, whenever both tags are used, the opening and closing tags must match exactly, down to the capitalization (except for the forward slash in the closing tag). For instance, the following element generates an error in the SQL Server XML parser because the case is different between the two tags:
<person>John Doe</Person>
The descriptive word in the opening tag is all lowercase; however, the descriptive word in the closing tag begins with a capital letter. The opening and closing tags must match to be considered proper, or well formed, XML.
You can, however, embed elements within each other. In the following example, I embed two instances of the <Person>
element within the <People>
element:
<People> <Person>John Doe</Person> <Person>Jane Doe</Person> </People>
Notice that each <Person>
element is complete in itself. It includes the opening and closing tags and the data they enclose. Elements embedded in other elements are referred to as child elements or, in some cases, subelements. The outer element, in this case, <People>
, is the parent element. The parent element at the highest level of an XML document is considered the root element. All XML documents must have one, and only one, root element. So the <People>
element in the example above is the parent element to the two <Person>
elements, and it is the root element for the XML document.
SQL Server also permits you to store XML fragments in an XML
column or variable. A fragment is a chunk of XML code without a root element, such as the two elements shown in the following example:
<Person>John Doe</Person> <Place>Seattle, WA</Place>
The elements must still be well formed XML, that is, have matching tags that enclose the data, but they don’t have to be an XML document. As you’ll see later in the Stairway to XML series, you can specify that only XML documents be permitted in an XML
column or variable, but for now just know that SQL Server distinguishes between XML documents and fragments and can store both.
When you embed elements within other elements, you must ensure that the child elements are complete before you end the parent element. For instance, in the following example, I end the <People>
element before the <Person>
element, which causes the SQL Server XML parser to generate an error:
<People><Person>John Doe</People></Person>
You must ensure that your child elements are complete no matter how many levels contain embedded elements. In this following example, the <FirstName>
and <LastName>
elements are embedded in each <Person>
element, and the <Person>
elements are embedded in the <People>
element:
<People> <Person> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
In this case, the <Person>
elements act as both child and parent elements. Notice, however, that each embedded element, regardless of the level, falls completely within the opening and closing tags of the parent element. For example, the first instances of the <FirstName>
and <LastName>
elements fall completely within the first instance of the <Person>
element, and the two instances of the <Person>
elements fall completely within the <People>
element, which is the document’s root element.
Elements can also have attributes associated with them. An attribute is a property to which you can assign a value. The attribute is defined as part of the element’s opening tag. In the following example, I’ve added the id
attribute to each instance of the <Person>
element:
<People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
As the example demonstrates, an attribute consists of the attribute name (in this case, id
), followed by an equal sign and the attribute value, enclosed in double quotes. So the id
attribute for the first instance of the <Person>
element has a value of 1234
, and the id
attribute for the second instance of the <Person>
element has a value of 5678
.
Another component contained in many XML documents is the declaration, which at a minimum specifies the version of the XML standard that the document conforms to. To date, there are only two versions: 1.0 and 1.1. If using XML 1.0, the declaration is not necessary; however, XML 1.1 requires one. For that reason, you should be aware of how to include a declaration in your XML document.
If you include a declaration, you must place it at the beginning of the document, start the declaration with the <?
opening tag, and end it with the ?>
closing tag. In addition, you must include the xml
keyword (lowercase) and the version attribute (also lowercase). Another attribute commonly included, although optional, is encoding, which specifies the character encoding used for the XML document. In the following example, I include a declaration that specifies version
1.0 and an encoding
of UTF-8, which means the data is stored as a sequence of 8-bit Unicode characters:
<?xml version="1.0" encoding="UTF-8"?> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
You can also add comments to your XML documents. To do so, simply precede the comment with the <!
-- tag and end it with the -->
tag, as I’ve done in the following example:
<?xml version="1.0" encoding="UTF-8"?> <!-- A list of current clients --> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> </Person> </People>
As you can see, I’ve added the comment A list of current clients
, which I’ve enclosed in the comment tags. The SQL Server XML parser will ignore anything within the tags, so you can use the commenting feature not only to provide information about the XML document and its data, but also to preserve parts of the XML code that you want to hang on to but you don’t want to have processed as part of the document.
Another consideration when working with XML is that certain characters cannot be parsed when they appear in element values. For example, you cannot include an ampersand (&) in an element’s value, as I’ve done in the <FavoriteBook>
child element in the following example:
<?xml version="1.0" encoding="UTF-8"?> <!-- A list of current clients --> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> <FavoriteBook>Crime & Punishment</FavoriteBook> </Person> </People>
If I try to assign this XML
document to an XML column or variable, the <FavoriteBook>
child element will cause the parser to generate an error because the value Crime & Punishment
includes the ampersand. You must replace this type of character with an entity reference that tells the parser to preserve the character as it is originally intended. An entity reference begins with an ampersand and ends with a semi-colon and in between includes a multi-character code that represents the original value. For an ampersand, the entity reference should be &
;, which I use in the following example:
<?xml version="1.0" encoding="UTF-8"?> <!-- A list of current clients --> <People> <Person id="1234"> <FirstName>John</FirstName> <LastName>Doe</LastName> </Person> <Person id="5678"> <FirstName>Jane</FirstName> <LastName>Doe</LastName> <FavoriteBook>Crime & Punishment</FavoriteBook> </Person> </People>
Notice that I’ve replaced the ampersand with the &
; entity reference. Now the XML parser will handle the <FavoriteBook>
element with no problem. But note that the ampersand is not the only character that will generate an error. The XML standard identifies five characters that should be replaced with entity references, as I’ve done in the above example:
Another issue that the example raises is the fact that the child elements do not have to be the same from one parent instance to the next. As you can see, the first instance of the <Person>
element contains only the <FirstName>
and <LastName>
child elements, but the second instance of the <Person>
element contains the <FirstName>
and <LastName>
child elements, as well as the <FavoriteBook>
element. As long as your child elements are well formed, you can include whatever elements necessary to delineate and define your data.
In this Level, we’ve looked at the primary components that make up an XML document. Elements serve as the basic building blocks for all XML documents, with each element being delineated by an opening tag and a closing tag and the data value itself being enclosed between those two tags. Elements can be embedded within each other, but one element—the root—must act as the parent to all other elements in an XML document. An element can also include attributes, which are defined as part of an element’s opening tag.
As handy as it might be to know how to put together an XML document, the purpose of this Level has not been to train you in how to create these types of documents, but rather to provide an introduction to XML so you can more effectively work with XML in SQL Server. In the next Level, we’ll look at how the XML data type is implemented in SQL Server and how it can be assigned to columns and variables in order to store both XML documents and XML fragments.