The Concepts of URI, URL, and URN in Java

Date: 2019-11-20

URI

URI = Uniform Resource Identifier

There are two types of URIs: URLs and URNs.

See RFC 1630: Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the WWW.

URL

URL = Uniform Resource Locator

See RFC 1738: Uniform Resource Locators (URL)

URN

URN = Uniform Resource Name.

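URI, URL, and URN are related to one another. URI sits at the top level, and URL and URN sit beneath it: both are subcategories of URI.

URI stands for Uniform Resource Identifier: a string that identifies a resource in a standardized way. Such a string typically begins with a scheme, and its syntax is as follows: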

[scheme:] scheme-specific-part

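An absolute URI begins with a scheme and a colon. The colon separates the scheme from the scheme-specific-part, and the syntax of the scheme-specific-part is determined by the scheme. For example, in http://www.cnn.com, http is the scheme and //www.cnn.com is the scheme-specific-part.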
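The split described above can be checked with java.net.URI. The following is a minimal sketch; the class name SchemeSplitSketch and the helper split are made up for illustration, and only getScheme() and getSchemeSpecificPart() come from the JDK.

```java
import java.net.URI;

class SchemeSplitSketch {
    // Returns "scheme | scheme-specific-part" for the given URI string.
    // (Hypothetical helper, not part of the original article.)
    static String split(String text) {
        URI uri = URI.create(text);
        return uri.getScheme() + " | " + uri.getSchemeSpecificPart();
    }

    public static void main(String[] args) {
        System.out.println(split("http://www.cnn.com"));       // http | //www.cnn.com
        System.out.println(split("mailto:jeff@javajeff.com")); // mailto | jeff@javajeff.com
    }
}
```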

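A URI is either absolute or relative. An absolute URI begins with a scheme followed by a colon. The http://www.cnn.com mentioned above is one example; others include mailto:jeff@javajeff.com, news:comp.lang.java.help, and xyz://whatever. You can think of an absolute URI as referring to a resource in a way that does not depend on the context in which it appears. In file-system terms, an absolute URI is like a file path that starts at the root directory. A relative URI does not begin with a scheme; one example is articles/articles.html. You can think of a relative URI as referring to a resource in a way that depends on the context in which the identifier appears. In file-system terms, a relative URI is like a file path that starts at the current directory.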
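As a quick illustration of the absolute/relative distinction, the sketch below asks java.net.URI about the examples from this paragraph. The class and helper names are hypothetical; isAbsolute() is the JDK method and returns true exactly when the URI has a scheme.

```java
import java.net.URI;

class AbsoluteCheckSketch {
    // True when the text parses as a URI that has a scheme.
    static boolean isAbsolute(String text) {
        return URI.create(text).isAbsolute();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.cnn.com",
            "news:comp.lang.java.help",
            "xyz://whatever",
            "articles/articles.html"
        };
        for (String s : samples) {
            System.out.println(s + " -> absolute: " + isAbsolute(s));
        }
    }
}
```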

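URIs can be further classified as opaque or hierarchical. An opaque URI is an absolute URI whose scheme-specific-part does not begin with '/'. Examples are news:comp.lang.java and the earlier mailto:jeff@javajeff.com. An opaque URI is not parsed any further, and the validity of its scheme-specific-part does not need to be checked. By contrast, a hierarchical URI is either an absolute URI whose scheme-specific-part begins with '/' or a relative URI. The scheme-specific-part of a hierarchical URI is broken down into several components and must conform to the following syntax: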

[//authority] [path] [?query] [#fragment]

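The optional authority identifies the naming authority for the URI's namespace. If present, it begins with '//'. It can be server-based or registry-based. A registry-based authority has a scheme-specific syntax (not discussed here because it is rarely used), while the server-based syntax is as follows: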

[userinfo@] host [:port]

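A server-based authority starts with optional user information (for example, a user name) followed by an @ sign, then the host name, and then an optional colon and port number. For example, jeff@x.com:90 is a server-based authority in which jeff is the user information, x.com is the host, and 90 is the port.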
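The server-based authority syntax can be explored with a small sketch. It assumes the authority is written after '//' with no scheme, which java.net.URI accepts as a relative, hierarchical URI; the class and helper names are invented for this example.

```java
import java.net.URI;

class AuthoritySketch {
    // Parses "//authority" and reports the server-based parts.
    // An undefined user-info or host is reported as null, an undefined port as -1.
    static String parts(String authority) {
        URI uri = URI.create("//" + authority);
        return "userInfo=" + uri.getUserInfo()
            + ", host=" + uri.getHost()
            + ", port=" + uri.getPort();
    }

    public static void main(String[] args) {
        System.out.println(parts("jeff@x.com:90")); // userInfo=jeff, host=x.com, port=90
        System.out.println(parts("x.com"));         // userInfo=null, host=x.com, port=-1
    }
}
```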

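The optional path defines the location of the resource relative to the authority (if one is given) or to the scheme (if there is no authority). A path can be divided into a series of path segments, each separated from the others by '/'. If the path begins with '/', it is considered absolute; otherwise it is considered relative. For example, /a/b/c consists of the three path segments a, b, and c, and the path is absolute because the first segment (a) is preceded by '/'.

The optional query defines query information to be passed to the resource. The resource uses that information to obtain or generate other data to return to the caller. For example, in http://www.somesite.net/a?x=y, x=y is the query; in this query x is the name of some entity and y is that entity's value.

The last part is the fragment. When a URI is used for some kind of retrieval operation, the software that performs the operation afterwards uses the fragment to focus on the part of the resource it is interested in.

Consider an example: ftp://george@x.com:90/public/notes?text=shakespeare#hamlet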

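The URI above identifies ftp as the scheme, george@x.com:90 as the server-based authority (where george is the user information, x.com is the host, and 90 is the port), /public/notes as the path, text=shakespeare as the query, and hamlet as the fragment. In essence, a user named george wants to retrieve the hamlet information from the shakespeare text via the path /public/notes on port 90 of the server x.com.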
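The same breakdown can be reproduced in code. This is a sketch only: UriComponentsSketch and describe are hypothetical names, while the getters and isOpaque() are the standard java.net.URI methods. The mailto example is included to contrast an opaque URI, for which the hierarchical components are undefined.

```java
import java.net.URI;

class UriComponentsSketch {
    // Lists the components java.net.URI extracts from the given string.
    static String describe(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme()
            + ", userInfo=" + uri.getUserInfo()
            + ", host=" + uri.getHost()
            + ", port=" + uri.getPort()
            + ", path=" + uri.getPath()
            + ", query=" + uri.getQuery()
            + ", fragment=" + uri.getFragment()
            + ", opaque=" + uri.isOpaque();
    }

    public static void main(String[] args) {
        // Hierarchical: every component is filled in.
        System.out.println(describe("ftp://george@x.com:90/public/notes?text=shakespeare#hamlet"));
        // Opaque: only the scheme is defined among the fields printed here.
        System.out.println(describe("mailto:jeff@javajeff.com"));
    }
}
```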

URI normalization

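Normalization can be understood in terms of directories. Suppose directory x lies directly under the root, x has subdirectories a and b, b contains the file memo.txt, and a is the current directory. To display the contents of memo.txt you might type type \x\.\b\memo.txt. You might also type type \x\a\..\b\memo.txt, in which case a and .. are unnecessary. Neither form is the simplest. If you type \x\b\memo.txt, however, you have specified the simplest path from the root to memo.txt. That simplest path, \x\b\memo.txt, is the normalized path.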
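The directory analogy carries over directly to URI paths. The sketch below rewrites the two redundant paths with forward slashes (an assumption made here because a backslash is not a legal URI character) and normalizes them with URI.normalize(); the class and helper names are hypothetical.

```java
import java.net.URI;

class NormalizeSketch {
    // Removes "." segments and resolves "segment/.." pairs in the path.
    static String normalize(String text) {
        return URI.create(text).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("/x/./b/memo.txt"));    // /x/b/memo.txt
        System.out.println(normalize("/x/a/../b/memo.txt")); // /x/b/memo.txt
    }
}
```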

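Resources are usually accessed through a base URI plus a relative URI. The base URI is an absolute URI, and the relative URI identifies a resource relative to the base URI. The two therefore have to be combined through a process called resolution; conversely, a relative URI can be extracted from the combined URI, which is called relativization.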

Suppose x://a/ is the base URI and b/c is the relative URI. Resolving the relative URI against the base yields x://a/b/c. Relativizing x://a/b/c against x://a/ yields b/c.
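The x://a/ example can be run as written. A minimal sketch, with invented class and helper names, using URI.resolve(String) and URI.relativize(URI):

```java
import java.net.URI;

class ResolveSketch {
    // Combines a base URI with a relative reference.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    // Expresses target relative to base, the inverse of resolve.
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("x://a/", "b/c"));           // x://a/b/c
        System.out.println(relativize("x://a/", "x://a/b/c"));  // b/c
    }
}
```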

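A URI cannot read or write a resource; that is the job of a Uniform Resource Locator (URL). A URL is a URI whose scheme is a known network protocol, and it ties the URI to a protocol handler (a mechanism for reading from and writing to the resource).

A URI generally does not provide a persistent name for a resource. That is the job of a Uniform Resource Name (URN). A URN is also a URI, but one that is globally unique and persistent, even when the resource no longer exists or is no longer in use.

Using URI

The Java API provides the URI class (in the java.net package) so that URIs can be used in code. URI's constructors create URI objects and parse the URI string to extract its components. URI's methods let you: 1) determine whether a URI is absolute or relative; 2) determine whether a URI is opaque or hierarchical; 3) compare two URI objects; 4) normalize a URI; 5) resolve a relative URI against a base URI; 6) relativize a URI against a base URI; 7) convert a URI object to a URL object.

URI has several constructors; the simplest is URI(String uri). This constructor breaks the String argument into components and stores them in the new URI object. If the string violates the syntax rules of RFC 2396, a java.net.URISyntaxException is thrown.

The following code creates a URI object with URI(String uri):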

URI uri = new URI ("http://www.cnn.com");

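If you know the URI is valid and will not produce a URISyntaxException, you can use the static create(String uri) method. This method parses uri and creates a URI object if no syntax rule is violated; otherwise it catches the internal URISyntaxException, wraps it in an IllegalArgumentException, and throws that.

The following snippet demonstrates create(String uri):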

URI uri = URI.create ("http://www.cnn.com");
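To see the wrapping behaviour of create(String uri), the sketch below feeds it one valid and one invalid string. The class name, the helper tryCreate, and the sample string "a b" (invalid because of the space) are assumptions made for this example.

```java
import java.net.URI;

class CreateSketch {
    // Reports either the parsed URI or the type of the wrapped cause.
    static String tryCreate(String text) {
        try {
            return "ok: " + URI.create(text);
        } catch (IllegalArgumentException e) {
            // create() wraps the checked URISyntaxException as the cause.
            return "rejected, cause: " + e.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate("http://www.cnn.com")); // ok: http://www.cnn.com
        System.out.println(tryCreate("a b"));                // rejected, cause: URISyntaxException
    }
}
```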

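The URI constructors and create(String uri) try to parse the user information, host, and port parts of the URI's authority. This succeeds for a well-formed server-based authority and fails for a malformed one. If you want to confirm that a URI's authority is server-based and can be parsed into user information, host, and port, call URI's parseServerAuthority() method. If parsing succeeds, the method returns a URI object whose authority has been parsed into user information, host, and port; otherwise it throws a URISyntaxException.

The following snippet demonstrates parseServerAuthority():

// What happens when the parseServerAuthority() call below executes?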
URI uri = new URI ("//foo:bar").parseServerAuthority();
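One way to answer the question is to run it. In the sketch below (class and helper names are hypothetical) the constructor accepts //foo:bar, because foo:bar is still a legal registry-based authority, but parseServerAuthority() throws a URISyntaxException since bar is not a port number. The string //foo:80 is an added sample for contrast.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ServerAuthoritySketch {
    // Classifies the authority of a URI string.
    static String check(String text) {
        URI uri;
        try {
            uri = new URI(text);
        } catch (URISyntaxException e) {
            return "invalid URI";
        }
        if (uri.getAuthority() == null) {
            return "no authority";
        }
        try {
            URI parsed = uri.parseServerAuthority();
            return "server-based, host=" + parsed.getHost();
        } catch (URISyntaxException e) {
            // The authority could not be parsed as [userinfo@]host[:port].
            return "registry-based authority: " + uri.getAuthority();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("//foo:bar")); // registry-based authority: foo:bar
        System.out.println(check("//foo:80"));  // server-based, host=foo
    }
}
```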

Once you have a URI object, you can extract information by calling getAuthority(), getFragment(), getHost(), getPath(), getPort(), getQuery(), getScheme(), getSchemeSpecificPart(), and getUserInfo(), and classify the URI with isAbsolute() and isOpaque().

Program 1: URIDemo1.java

import java.net.*;

public class URIDemo1 {
public static void main (String [] args) throws Exception {
    if (args.length != 1) {
      System.err.println ("usage: java URIDemo1 uri");
      return;
    }
    URI uri = new URI (args [0]);

    System.out.println ("Authority = " +uri.getAuthority ());
    System.out.println ("Fragment = " +uri.getFragment ());
    System.out.println ("Host = " +uri.getHost ());
    System.out.println ("Path = " +uri.getPath ());
    System.out.println ("Port = " +uri.getPort ());
    System.out.println ("Query = " +uri.getQuery ());
    System.out.println ("Scheme = " +uri.getScheme ());
    System.out.println ("Scheme-specific part = " + uri.getSchemeSpecificPart ());
    System.out.println ("User Info = " +uri.getUserInfo ());
    System.out.println ("URI is absolute: " +uri.isAbsolute ());
    System.out.println ("URI is opaque: " +uri.isOpaque ());
}
}

Run java URIDemo1 with the URI on the first line below as its argument; the remaining lines are the output:

query://jeff@books.com:9000/public/manuals/appliances?stove#ge
Authority = jeff@books.com:9000
Fragment = ge
Host = books.com
Path = /public/manuals/appliances
Port = 9000
Query = stove
Scheme = query
Scheme-specific part = //jeff@books.com:9000/public/manuals/appliances?stove
User Info = jeff
URI is absolute: true
URI is opaque: false

The URI class supports the basic operations of normalization, resolution, and relativization. The following example demonstrates the normalize() method.

Program 2: URIDemo2.java

import java.net.*;

class URIDemo2 {
public static void main (String [] args) throws Exception {
    if (args.length != 1) {
      System.err.println ("usage: java URIDemo2 uri");
      return;
    }
    URI uri = new URI (args [0]);
    System.out.println ("Normalized URI = " + uri.normalize());
}
}

Entering java URIDemo2 x/y/../z/./q on the command line produces the following output:
Normalized URI = x/z/q

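The output shows that y, .. and . have disappeared.

URI supports resolution and relativization through the resolve(String uri), resolve(URI uri), and relativize(URI uri) methods. If the given string violates the RFC 2396 syntax rules, resolve(String uri) throws an IllegalArgumentException indirectly, through its internal call to create(String uri). The following code demonstrates resolve(URI uri) and relativize(URI uri).

Program 3: URIDemo3.java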

import java.net.*;

class URIDemo3 {
public static void main (String [] args) throws Exception {
    if (args.length != 2) {
      System.err.println ("usage: " + "java URIDemo3 uriBase uriRelative");
      return;
    }

    URI uriBase = new URI (args [0]);
    System.out.println ("Base URI = " +uriBase);

    URI uriRelative = new URI (args [1]);
    System.out.println ("Relative URI = " +uriRelative);

    URI uriResolved = uriBase.resolve (uriRelative);
    System.out.println ("Resolved URI = " +uriResolved);

    URI uriRelativized = uriBase.relativize (uriResolved);
    System.out.println ("Relativized URI = " +uriRelativized);
}
}

After compiling URIDemo3, entering java URIDemo3 http://www.somedomain.com/ x/../y on the command line produces the following output:

Base URI = http://www.somedomain.com/
Relative URI = x/../y
Resolved URI = http://www.somedomain.com/y
Relativized URI = y

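Using URL

Java provides the URL class; each URL object encapsulates a resource identifier and a protocol handler. One way to obtain a URL object is to call URI's toURL() method; you can also create one directly by calling a URL constructor.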
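The toURL() route can be sketched as follows. The class and helper names are hypothetical. The sketch checks isAbsolute() first because toURL() rejects a relative URI with an IllegalArgumentException, and it treats a MalformedURLException as "no protocol handler", which is what happens for a scheme such as xyz that the JDK does not know.

```java
import java.net.MalformedURLException;
import java.net.URI;

class ToUrlSketch {
    // Converts a URI string to a URL string, or explains why it cannot.
    static String convert(String text) {
        URI uri = URI.create(text);
        if (!uri.isAbsolute()) {
            return "not absolute";
        }
        try {
            return uri.toURL().toString();
        } catch (MalformedURLException e) {
            return "no protocol handler for " + uri.getScheme();
        }
    }

    public static void main(String[] args) {
        System.out.println(convert("http://www.informit.com")); // http://www.informit.com
        System.out.println(convert("articles/articles.html"));  // not absolute
        System.out.println(convert("xyz://whatever"));          // no protocol handler for xyz
    }
}
```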

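The URL class has several constructors. The simplest is URL(String url), which takes a single String argument. If the string does not specify a protocol, or the protocol is unknown, this constructor (like the others) throws a java.net.MalformedURLException.
The following snippet uses URL(String url) to create a URL object that encapsulates a simple URL and the http protocol handler.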

URL url = new URL ("http://www.informit.com");

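Once you have a URL object, you can extract its components with getAuthority(), getDefaultPort(), getFile(), getHost(), getPath(), getPort(), getProtocol(), getQuery(), getRef(), and getUserInfo(). getPort() returns -1 if the URL does not specify a port, whereas getDefaultPort() returns the default port of the URL's protocol. getFile() returns the path combined with the query. getProtocol() returns the protocol used to reach the resource (for example http, mailto, or ftp). getRef() returns the URL's fragment. Finally, getUserInfo() returns the user-information part of the authority. You can also call openStream() to obtain a java.io.InputStream and use it to read the resource byte by byte.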
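The difference between getPort() and getDefaultPort(), and between getFile() and getPath(), is easiest to see on a URL with a query and a fragment. The sketch below needs no network access. The class name, the helper, and the path /a/b?x=y#top are made up for illustration, and the URL is obtained through URI.toURL() rather than the constructor.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UrlComponentsSketch {
    // Summarizes selected components of a URL without connecting to it.
    static String describe(String text) {
        URL url;
        try {
            url = URI.create(text).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
        return "protocol=" + url.getProtocol()
            + ", port=" + url.getPort()
            + ", defaultPort=" + url.getDefaultPort()
            + ", file=" + url.getFile()
            + ", path=" + url.getPath()
            + ", query=" + url.getQuery()
            + ", ref=" + url.getRef();
    }

    public static void main(String[] args) {
        System.out.println(describe("http://www.informit.com/a/b?x=y#top"));
    }
}
```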

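URLDemo1 is shown below. The program creates a URL object, calls various URL methods to retrieve information about the URL, and then calls openStream() to open a connection to the resource and read and print its bytes.

Program 4: URLDemo1.java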

import java.io.*;
import java.net.*;

class URLDemo1 {
public static void main (String [] args) throws IOException {
    if (args.length != 1) {
    System.err.println ("usage: java URLDemo1 url");
    return;
    }

    URL url = new URL (args [0]);

    System.out.println ("Authority = "+ url.getAuthority ());
    System.out.println ("Default port = " +url.getDefaultPort ());
    System.out.println ("File = " +url.getFile ());
    System.out.println ("Host = " +url.getHost ());
    System.out.println ("Path = " +url.getPath ());
    System.out.println ("Port = " +url.getPort ());
    System.out.println ("Protocol = " +url.getProtocol ());
    System.out.println ("Query = " +url.getQuery ());
    System.out.println ("Ref = " +url.getRef ());
    System.out.println ("User Info = " +url.getUserInfo ());

    System.out.print ('\n');

    InputStream is = url.openStream ();

    int ch;
    while ((ch = is.read ()) != -1) {
      System.out.print ((char) ch);
    }
    is.close ();
}
}

Entering java URLDemo1 http://www.javajeff.com/articles/articles.html on the command line produces the following output:

Authority = www.javajeff.com
Default port = 80
File = /articles/articles.html
Host = www.javajeff.com
Path = /articles/articles.html
Port = -1
Protocol = http
Query = null
Ref = null
User Info = null

…

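openStream() returns an InputStream, which means you read the resource data byte by byte. That is appropriate when you do not know what kind of data you are about to read. If you know in advance that the data is text and that each line ends with a newline (\n), you can read it line by line instead.

The following snippet wraps the InputStream in an InputStreamReader to convert 8-bit bytes into 16-bit characters, and then wraps the result in a BufferedReader so that readLine() can be called.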

InputStream is = url.openStream ();
BufferedReader br = new BufferedReader (new InputStreamReader (is));
String line;
while ((line = br.readLine ()) != null) {
    System.out.println (line);
}

is.close ();
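A variation on the snippet above, offered as a sketch rather than a replacement: it uses try-with-resources so the reader is closed even if reading fails, and it names the charset explicitly (UTF-8 is an assumption; the real charset depends on the resource). To stay runnable without network access, the demo reads a temporary file through a file: URL. All class and method names here are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadLinesSketch {
    // Reads every line of the resource behind the URL as UTF-8 text.
    static List<String> readLines(URL url) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes two lines to a temporary file and reads them back via a file: URL.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("readlines", ".txt");
            try {
                Files.write(tmp, Arrays.asList("first line", "second line"),
                        StandardCharsets.UTF_8);
                return readLines(tmp.toUri().toURL());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```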

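Sometimes reading the data byte by byte is inconvenient. For example, if the resource is a JPEG file, it is better to obtain an image producer and register a consumer of the image data with it. In that case, use the getContent() method.

getContent() returns a reference to some object; after casting it to the appropriate type you can call that object's methods to get at the data more conveniently. Before casting, it is best to check the object's type with instanceof to avoid a ClassCastException.

For a JPEG resource, getContent() returns an object that implements the java.awt.image.ImageProducer interface. The following code shows how to use getContent().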

URL url = new URL (args [0]);
Object o = url.getContent ();
if (o instanceof ImageProducer) {
    ImageProducer ip = (ImageProducer) o;
    // ...
}

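If you look at the source code of getContent(), you will find openConnection().getContent(). URL's openConnection() method returns a java.net.URLConnection object. URLConnection's methods expose details about the resource and the connection, which lets you write code that accesses the resource.

The URLDemo2 program below demonstrates openConnection() and how to call URLConnection's methods.

Program 5: URLDemo2.java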

import java.io.*;
import java.net.*;
import java.util.*;

class URLDemo2 {
public static void main (String [] args) throws IOException {
    if (args.length != 1) {
      System.err.println ("usage: java URLDemo2 url");
      return;
    }

    URL url = new URL (args [0]);

    // Returns a new protocol-specific object that represents a connection to the resource
    URLConnection uc = url.openConnection ();

    // Open the connection
    uc.connect ();

    // Print the header fields
    Map m = uc.getHeaderFields ();
    Iterator i = m.entrySet ().iterator ();
    while (i.hasNext ()) {
      System.out.println (i.next ());
    }
    // Check whether the connection allows input and output operations
    System.out.println ("Input allowed = " +uc.getDoInput ());
    System.out.println ("Output allowed = " +uc.getDoOutput ());
}
}

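URLConnection's getHeaderFields() method returns a java.util.Map containing the header names and their values. Headers are text-based name/value pairs that identify the type of the resource data, its length, and so on.

After compiling URLDemo2, entering java URLDemo2 http://www.javajeff.com on the command line produces the following output: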

Date=[Sun, 17 Feb 2002 17:49:32 GMT]
Connection=[Keep-Alive]
Content-Type=[text/html; charset=iso-8859-1]
Accept-Ranges=[bytes]
Content-Length=[7214]
null=[HTTP/1.1 200 OK]
ETag=["4470e-1c2e-3bf29d5a"]
Keep-Alive=[timeout=15, max=100]
Server=[Apache/1.3.19 (Unix) Debian/GNU]
Last-Modified=[Wed, 14 Nov 2001 16:35:38 GMT]
Input allowed = true
Output allowed = false

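Look closely at the output above and you will see an entry called Content-Type. Content-Type identifies the type of the resource data as text/html. The text part is called the type and the html part is called the subtype. If the content were plain text, the value of Content-Type might be text/plain. text/html indicates that the content is text in HTML format.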
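Splitting a Content-Type value into its type and subtype takes only a few string operations. The sketch below is a simplified illustration and not a full MIME parser: it ignores parameter quoting and other edge cases, and the class and helper names are invented.

```java
class ContentTypeSketch {
    // Returns {type, subtype}; parameters such as "; charset=..." are dropped.
    // The subtype is "" when the value contains no '/'.
    static String[] typeAndSubtype(String contentType) {
        String mediaType = contentType.split(";", 2)[0].trim();
        String[] parts = mediaType.split("/", 2);
        String subtype = parts.length > 1 ? parts[1] : "";
        return new String[] { parts[0], subtype };
    }

    public static void main(String[] args) {
        String[] parts = typeAndSubtype("text/html; charset=iso-8859-1");
        System.out.println("type=" + parts[0] + ", subtype=" + parts[1]); // type=text, subtype=html
    }
}
```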

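Content-Type is part of Multipurpose Internet Mail Extensions (MIME). MIME is an extension of the traditional 7-bit ASCII standard for transmitting messages. By introducing a variety of headers, MIME allows video, sound, images, and text in different character sets to be combined with 7-bit ASCII. When you use the URLConnection class you will come across getContentType() and getContentLength(). The values these methods return are those of Content-Type and Content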