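This article continues with the garbled-character (mojibake) problem from the previous chapter. The previous article only said that setting Tomcat's URIEncoding fixes garbled characters; this one explains what happens behind that setting. One note first: reading alone is not enough, so run the experiments yourself.
My Tomcat version is 7.0.55, and the Spring version used throughout this series is 4.0.5.
This article examines Tomcat's two connector settings, URIEncoding and useBodyEncodingForURI, from the perspective of the Tomcat source code.
A single request commonly involves two encodings, as the following form shows:
HTML code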
- <!DOCTYPE html>
- <html>
- <head>
- <meta charset="utf-8" />
- <title></title>
- </head>
- <body>
- <form action="http://127.0.0.1:8080/string?name=中國" method="post">
- <input type="text" name="user" value="張三"/>
- <input type="submit" value="提交"/>
- </form>
- </body>
- </html>
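First, the conclusions:
The request above contains Chinese text in two places: in the query string (?name=中國) and in the request body (user=張三). Tomcat 7 decodes these two places with separate encodings. URIEncoding applies to the query-string parameters, whereas a filter's request.setCharacterEncoding("UTF-8") call or the charset in the request's Content-Type header applies to the request body. Do not confuse the two.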
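To make the effect of a mismatched encoding concrete, here is a minimal sketch using only the JDK (no Tomcat involved). It percent-encodes the query value as a UTF-8 browser would, then decodes the same bytes once as UTF-8 and once as ISO-8859-1. The class and method names are made up for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class QueryDecodingDemo {

    /** Percent-encodes text the way a browser sending UTF-8 would. */
    static String encodeAsUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Decodes a percent-encoded value using the named charset. */
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String original = "\u4E2D\u570B"; // the value of ?name= in the form above
        String onTheWire = encodeAsUtf8(original);
        System.out.println(onTheWire);
        // Same bytes, two charsets: only the matching one restores the text.
        System.out.println(decode(onTheWire, "UTF-8").equals(original));
        System.out.println(decode(onTheWire, "ISO-8859-1").equals(original));
    }
}
```

The six bytes on the wire are identical in both cases; only the charset chosen by the receiver differs, and that choice is what URIEncoding controls for the query string.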
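useBodyEncodingForURI=true means that the query-string parameters are decoded with the request body's encoding. With useBodyEncodingForURI=true, if the body is decoded as UTF-8, the query string is decoded as UTF-8 as well. Both attributes are set on the Connector element in Tomcat's conf/server.xml, as follows:
XML code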
- <Service name="Catalina">
-
- <!--The connectors can use a shared executor, you can define one or more named thread pools-->
- <!--
- <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
- maxThreads="150" minSpareThreads="4"/>
- -->
-
-
- <!-- A "Connector" represents an endpoint by which requests are received
- and responses are returned. Documentation at :
- Java HTTP Connector: /docs/config/http.html (blocking & non-blocking)
- Java AJP Connector: /docs/config/ajp.html
- APR (HTTP/AJP) Connector: /docs/apr.html
- Define a non-SSL HTTP/1.1 Connector on port 8080
- -->
- <Connector port="8080" protocol="HTTP/1.1"
- connectionTimeout="20000"
- redirectPort="8443" useBodyEncodingForURI='true' URIEncoding='UTF-8' />
- <!-- A "Connector" using the shared thread pool-->
This snippet only shows where the two attributes go. It does not mean that both have to be set together.
Next, the role of CharacterEncodingFilter.
To use it, add the following to web.xml:
XML code
- <filter>
- <filter-name>encoding</filter-name>
- <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
- <init-param>
- <param-name>encoding</param-name>
- <param-value>UTF-8</param-value>
- </init-param>
- <init-param>
- <param-name>forceEncoding</param-name>
- <param-value>true</param-value>
- </init-param>
- </filter>
-
- <filter-mapping>
- <filter-name>encoding</filter-name>
- <url-pattern>/*</url-pattern>
- </filter-mapping>
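What it does: when forceEncoding is false (the default) and the request has no Content-Type header, or the header carries no charset, the filter sets the request body encoding to the filter's encoding value.
When forceEncoding is true, the filter sets both the request body encoding and the response encoding to its encoding value, regardless of what the request declares.
The source of CharacterEncodingFilter is as follows:
Java code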
- public class CharacterEncodingFilter extends OncePerRequestFilter {
-
- private String encoding;
-
- private boolean forceEncoding = false;
-
-
- /**
- * Set the encoding to use for requests. This encoding will be passed into a
- * {@link javax.servlet.http.HttpServletRequest#setCharacterEncoding} call.
- * <p>Whether this encoding will override existing request encodings
- * (and whether it will be applied as default response encoding as well)
- * depends on the {@link #setForceEncoding "forceEncoding"} flag.
- */
- public void setEncoding(String encoding) {
- this.encoding = encoding;
- }
-
- /**
- * Set whether the configured {@link #setEncoding encoding} of this filter
- * is supposed to override existing request and response encodings.
- * <p>Default is "false", i.e. do not modify the encoding if
- * {@link javax.servlet.http.HttpServletRequest#getCharacterEncoding()}
- * returns a non-null value. Switch this to "true" to enforce the specified
- * encoding in any case, applying it as default response encoding as well.
- * <p>Note that the response encoding will only be set on Servlet 2.4+
- * containers, since Servlet 2.3 did not provide a facility for setting
- * a default response encoding.
- */
- public void setForceEncoding(boolean forceEncoding) {
- this.forceEncoding = forceEncoding;
- }
-
-
- @Override
- protected void doFilterInternal(
- HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
- throws ServletException, IOException {
-
- if (this.encoding != null && (this.forceEncoding || request.getCharacterEncoding() == null)) {
- request.setCharacterEncoding(this.encoding);
- if (this.forceEncoding) {
- response.setCharacterEncoding(this.encoding);
- }
- }
- filterChain.doFilter(request, response);
- }
-
- }
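The filter has two properties, encoding and forceEncoding, both of which can be set in web.xml.
doFilterInternal runs for every request. It first calls request.getCharacterEncoding(), which essentially reads the charset from the Content-Type request header. If there is none, it calls request.setCharacterEncoding(this.encoding), which makes the filter's encoding the request body encoding. Remember that this encoding applies only to the request body, not to the query-string parameters (unless useBodyEncodingForURI=true). When forceEncoding=true, the filter's encoding is always set on both the request and the response, whether or not the Content-Type header carries a charset; on the request side it still affects only the body.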
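The branch in doFilterInternal can be restated as two small pure functions, which makes it easy to check each combination of inputs. This is only a sketch of the decision logic shown in the source above; the class and method names are invented for illustration and are not part of Spring.

```java
class FilterEncodingDecision {

    /**
     * Encoding the request body ends up with after the filter runs.
     * declared is what request.getCharacterEncoding() returned (may be null).
     */
    static String requestEncoding(String declared, String filterEncoding, boolean force) {
        if (filterEncoding != null && (force || declared == null)) {
            return filterEncoding;
        }
        return declared;
    }

    /** Whether the filter also calls response.setCharacterEncoding(...). */
    static boolean setsResponseEncoding(String filterEncoding, boolean force) {
        return filterEncoding != null && force;
    }

    public static void main(String[] args) {
        // No charset declared: the filter fills it in even without forceEncoding.
        System.out.println(requestEncoding(null, "UTF-8", false));
        // Charset declared and forceEncoding=false: the declared value is kept.
        System.out.println(requestEncoding("GBK", "UTF-8", false));
        // forceEncoding=true: the filter's value always wins.
        System.out.println(requestEncoding("GBK", "UTF-8", true));
    }
}
```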
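That covers the conclusions; now for the source code. Skip this part if you like, since it does not affect usage. Read on if you want to understand how it works.
First, three terms:
org.apache.coyote.Request: the lowest-level request object, which holds the various parameter data. Call it coyoteRequest.
org.apache.catalina.connector.Request: implements HttpServletRequest; call it request. It holds a coyoteRequest and a connector. As shown later, the connector is what passes the configured encoding along.
org.apache.catalina.connector.RequestFacade: also implements HttpServletRequest, but is merely a wrapper (facade); call it requestFacade. Its constructor is:
Java code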
- /**
- * Construct a wrapper for the specified request.
- *
- * @param request The request to be wrapped
- */
- public RequestFacade(Request request) {
-
- this.request = request;
-
- }
The constructor receives an org.apache.catalina.connector.Request. All of requestFacade's work is done by that inner org.apache.catalina.connector.Request, which in turn relies on the lowest-level org.apache.coyote.Request it holds. Through org.apache.catalina.connector.Request we can configure how org.apache.coyote.Request works, for example which encoding it uses to decode data.
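The three-layer delegation can be pictured with a toy model. The classes below are hypothetical stand-ins, not the Tomcat classes; they only show how a setCharacterEncoding call on the facade ends up as a field on the lowest-level object.

```java
class DelegationSketch {

    /** Stand-in for org.apache.coyote.Request: holds the actual value. */
    static class CoyoteRequestModel {
        String charEncoding;
    }

    /** Stand-in for org.apache.catalina.connector.Request. */
    static class RequestModel {
        final CoyoteRequestModel coyoteRequest = new CoyoteRequestModel();

        void setCharacterEncoding(String enc) {
            coyoteRequest.charEncoding = enc;
        }

        String getCharacterEncoding() {
            return coyoteRequest.charEncoding;
        }
    }

    /** Stand-in for RequestFacade: forwards every call to the wrapped request. */
    static class RequestFacadeModel {
        private final RequestModel request;

        RequestFacadeModel(RequestModel request) {
            this.request = request;
        }

        void setCharacterEncoding(String enc) {
            request.setCharacterEncoding(enc);
        }

        String getCharacterEncoding() {
            return request.getCharacterEncoding();
        }
    }

    public static void main(String[] args) {
        RequestModel request = new RequestModel();
        RequestFacadeModel facade = new RequestFacadeModel(request);
        facade.setCharacterEncoding("UTF-8");
        // The value set through the facade is stored on the innermost object.
        System.out.println(request.coyoteRequest.charEncoding);
    }
}
```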
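org.apache.coyote.Request has these fields:
String charEncoding: the encoding for the request body (passed to the encoding field of Parameters the first time parameters are parsed)
Parameters: the class that processes and stores both the query-string parameters and the request-body parameters
(1) String encoding: the encoding for the request body
(2) String queryStringEncoding: the encoding for the query-string parameters
(3) Map<String,ArrayList<String>> paramHashValues: stores the parsed parameters
The two encodings in Parameters are the ones that matter most, because they are used directly to decode the data. The encodings held by other objects mostly just get passed along until they end up in these two fields.
Take the following filter as an example:
Java code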
- public class MyCharacterEncodingFilter extends CharacterEncodingFilter{
-
- @Override
- protected void doFilterInternal(HttpServletRequest request,
- HttpServletResponse response, FilterChain filterChain)
- throws ServletException, IOException {
- request.setCharacterEncoding("UTF-8");
- String name=request.getParameter("user");
- System.out.println(name);
- request.setCharacterEncoding("UTF-8");
- String name1=request.getParameter("user");
- System.out.println(name1);
- super.doFilterInternal(request, response, filterChain);
- }
- }
The HttpServletRequest passed to the filter is actually an org.apache.catalina.connector.RequestFacade. Let's follow how a parameter is obtained:
requestFacade.getParameter("user") is delegated to the corresponding method of org.apache.catalina.connector.Request, shown here:
Java code
- public String getParameter(String name) {
-
- if (!parametersParsed) {
- parseParameters();
- }
-
- return coyoteRequest.getParameters().getParameter(name);
-
- }
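parametersParsed is a field of org.apache.catalina.connector.Request that records whether the parameters have already been parsed. If they have, they are not parsed again and values are read straight from coyoteRequest's Parameters. So setting an encoding after parsing has no effect, because the result of the first parse is returned as-is. Parsing happens only the first time a parameter is requested.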
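The parse-once behaviour is easy to reproduce outside Tomcat. The following is a simplified model, assuming a single parameter value held as raw bytes; only the parametersParsed flag and the charEncoding field mirror names from the Tomcat source, everything else is invented for the sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ParseOnceDemo {

    private final byte[] rawValue;     // undecoded bytes of one parameter value
    private String charEncoding;       // plays the role of coyoteRequest.charEncoding
    private boolean parametersParsed;  // same idea as the flag in Request
    private String parsedValue;

    ParseOnceDemo(byte[] rawValue) {
        this.rawValue = rawValue;
    }

    void setCharacterEncoding(String enc) {
        this.charEncoding = enc;
    }

    String getParameter() {
        if (!parametersParsed) {
            // Decoding happens exactly once, with whatever encoding is set now.
            parametersParsed = true;
            String enc = (charEncoding != null) ? charEncoding : "ISO-8859-1";
            parsedValue = new String(rawValue, Charset.forName(enc));
        }
        return parsedValue;
    }

    public static void main(String[] args) {
        String original = "\u5F35\u4E09"; // the value of "user" in the form above
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        ParseOnceDemo early = new ParseOnceDemo(raw);
        early.setCharacterEncoding("UTF-8");   // set before the first read
        System.out.println(early.getParameter().equals(original));

        ParseOnceDemo late = new ParseOnceDemo(raw);
        late.getParameter();                   // first read decodes as ISO-8859-1
        late.setCharacterEncoding("UTF-8");    // too late, the result is cached
        System.out.println(late.getParameter().equals(original));
    }
}
```

This is why an encoding filter has to run before anything else in the chain calls getParameter.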
Now look at parseParameters(), which does the parsing:
Java code
- /**
- * Parse request parameters.
- */
- protected void parseParameters() {
-
- // Once parsing starts, mark the state as parsed
- parametersParsed = true;
-
- Parameters parameters = coyoteRequest.getParameters();
- boolean success = false;
- try {
- // Set this every time in case limit has been changed via JMX
- parameters.setLimit(getConnector().getMaxParameterCount());
-
- // getCharacterEncoding() may have been overridden to search for
- // hidden form field containing request encoding
- // Key point 1
- String enc = getCharacterEncoding();
- // Key point 2
- boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
- if (enc != null) {
- parameters.setEncoding(enc);
- if (useBodyEncodingForURI) {
- parameters.setQueryStringEncoding(enc);
- }
- } else {
- parameters.setEncoding
- (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
- if (useBodyEncodingForURI) {
- parameters.setQueryStringEncoding
- (org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING);
- }
- }
- // Key point 3
- parameters.handleQueryParameters();
-
- if (usingInputStream || usingReader) {
- success = true;
- return;
- }
-
- if( !getConnector().isParseBodyMethod(getMethod()) ) {
- success = true;
- return;
- }
-
- String contentType = getContentType();
- if (contentType == null) {
- contentType = "";
- }
- int semicolon = contentType.indexOf(';');
- if (semicolon >= 0) {
- contentType = contentType.substring(0, semicolon).trim();
- } else {
- contentType = contentType.trim();
- }
-
- if ("multipart/form-data".equals(contentType)) {
- parseParts();
- success = true;
- return;
- }
-
- if (!("application/x-www-form-urlencoded".equals(contentType))) {
- success = true;
- return;
- }
-
- int len = getContentLength();
-
- if (len > 0) {
- int maxPostSize = connector.getMaxPostSize();
- if ((maxPostSize > 0) && (len > maxPostSize)) {
- if (context.getLogger().isDebugEnabled()) {
- context.getLogger().debug(
- sm.getString("coyoteRequest.postTooLarge"));
- }
- checkSwallowInput();
- return;
- }
- byte[] formData = null;
- if (len < CACHED_POST_LEN) {
- if (postData == null) {
- postData = new byte[CACHED_POST_LEN];
- }
- formData = postData;
- } else {
- formData = new byte[len];
- }
- try {
- if (readPostBody(formData, len) != len) {
- return;
- }
- } catch (IOException e) {
- // Client disconnect
- if (context.getLogger().isDebugEnabled()) {
- context.getLogger().debug(
- sm.getString("coyoteRequest.parseParameters"), e);
- }
- return;
- }
- // Key point 4
- parameters.processParameters(formData, 0, len);
- } else if ("chunked".equalsIgnoreCase(
- coyoteRequest.getHeader("transfer-encoding"))) {
- byte[] formData = null;
- try {
- formData = readChunkedPostBody();
- } catch (IOException e) {
- // Client disconnect or chunkedPostTooLarge error
- if (context.getLogger().isDebugEnabled()) {
- context.getLogger().debug(
- sm.getString("coyoteRequest.parseParameters"), e);
- }
- return;
- }
- if (formData != null) {
- parameters.processParameters(formData, 0, formData.length);
- }
- }
- success = true;
- } finally {
- if (!success) {
- parameters.setParseFailed(true);
- }
- }
-
- }
There are four key points to focus on in the code above.
Key point 1: getCharacterEncoding() obtains the charset from the Content-Type header through the underlying coyoteRequest, as follows:
Java code
- /**
- * Return the character encoding for this Request.
- */
- @Override
- public String getCharacterEncoding() {
- return coyoteRequest.getCharacterEncoding();
- }
The corresponding method in org.apache.coyote.Request:
Java code
- public String getCharacterEncoding() {
-
- if (charEncoding != null)
- return charEncoding;
-
- charEncoding = ContentType.getCharsetFromContentType(getContentType());
- return charEncoding;
-
- }
If the header has no charset, it returns null.
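As an illustration of what "getting the charset from Content-Type" involves, here is a simplified sketch written for this article. It is not Tomcat's ContentType.getCharsetFromContentType implementation, and the class and method names are hypothetical; it only handles the common `type/subtype; charset=value` shape.

```java
import java.util.Locale;

class ContentTypeCharset {

    private static final String KEY = "charset=";

    /** Returns the charset parameter of a Content-Type value, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        int start = contentType.toLowerCase(Locale.ROOT).indexOf(KEY);
        if (start < 0) {
            return null;
        }
        String value = contentType.substring(start + KEY.length());
        int end = value.indexOf(';');          // cut off any following parameter
        if (end >= 0) {
            value = value.substring(0, end);
        }
        value = value.trim();
        if (value.length() > 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1); // strip surrounding quotes
        }
        return value.trim();
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8"));
        System.out.println(charsetOf("application/x-www-form-urlencoded"));
    }
}
```

A plain HTML form post normally sends the second kind of header, without a charset, which is why enc in parseParameters() is so often null.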
Key point 2:
boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI(); This is the value of the useBodyEncodingForURI attribute we configured in Tomcat's server.xml.
The constant org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING is "ISO-8859-1".
When enc from key point 1 is null, the encoding field of the underlying coyoteRequest's Parameters object is set to "ISO-8859-1", so the request body is decoded as ISO-8859-1. When useBodyEncodingForURI=true, the query-string encoding and the body encoding are set to the same value, here "ISO-8859-1". When useBodyEncodingForURI=false, queryStringEncoding is left unchanged. It is null by default, and if it is still null at parse time the default "ISO-8859-1" is used. However, queryStringEncoding can be given a value through the connector held by org.apache.catalina.connector.Request, as described next.
When you add URIEncoding="UTF-8" to Tomcat's server.xml, that value is assigned to queryStringEncoding.
Set the value in Tomcat's server.xml:
XML code
- <Service name="Catalina">
-
- <!--The connectors can use a shared executor, you can define one or more named thread pools-->
- <!--
- <Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
- maxThreads="150" minSpareThreads="4"/>
- -->
-
-
- <!-- A "Connector" represents an endpoint by which requests are received
- and responses are returned. Documentation at :
- Java HTTP Connector: /docs/config/http.html (blocking & non-blocking)
- Java AJP Connector: /docs/config/ajp.html
- APR (HTTP/AJP) Connector: /docs/apr.html
- Define a non-SSL HTTP/1.1 Connector on port 8080
- -->
- <Connector port="8080" protocol="HTTP/1.1"
- connectionTimeout="20000"
- redirectPort="8443" URIEncoding='UTF-8'/>
The connector assigns this value to queryStringEncoding as follows (the snippet is from org.apache.catalina.connector.CoyoteAdapter):
Java code
- public void log(org.apache.coyote.Request req,
- org.apache.coyote.Response res, long time) {
-
- Request request = (Request) req.getNote(ADAPTER_NOTES);
- Response response = (Response) res.getNote(ADAPTER_NOTES);
-
- if (request == null) {
- // Create objects
- request = connector.createRequest();
- request.setCoyoteRequest(req);
- response = connector.createResponse();
- response.setCoyoteResponse(res);
-
- // Link objects
- request.setResponse(response);
- response.setRequest(request);
-
- // Set as notes
- req.setNote(ADAPTER_NOTES, request);
- res.setNote(ADAPTER_NOTES, response);
-
- // Set query string encoding
- // This is the key statement
- req.getParameters().setQueryStringEncoding
- (connector.getURIEncoding());
- }
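connector.getURIEncoding() is the URIEncoding value we configured. The statement req.getParameters().setQueryStringEncoding(connector.getURIEncoding()); puts the URIEncoding value from Tomcat's server.xml into the all-important queryStringEncoding field of Parameters.
When enc from key point 1 is not null, the body encoding of Parameters (encoding) is set to enc.
At this point both encoding and queryStringEncoding of Parameters have their values, and the byte arrays are decoded with the corresponding encoding.
Key points 3 and 4: with the encodings in place, the query-string parameters and the request body are parsed, respectively.
To summarize what each setting does:
request.setCharacterEncoding(encoding): the request exposed to us is a requestFacade. The call goes to request, then to coyoteRequest, and ends up in coyoteRequest's charEncoding. So charEncoding has two possible sources: the charset in Content-Type, or a call to request.setCharacterEncoding(encoding). Call this method before parameters are parsed for the first time; afterwards it has no effect.
URIEncoding: sets queryStringEncoding of Parameters directly, that is, the encoding for the query-string parameters.
useBodyEncodingForURI: sets queryStringEncoding to the value of encoding, that is, the query string is decoded with the request body's encoding.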
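The summary can be condensed into a small model of which encoding ends up being used for each part of the request. This sketch restates the Tomcat 7 behaviour walked through in this article; the class and method names are hypothetical and nothing here calls Tomcat.

```java
class EncodingResolution {

    /** Same value as org.apache.coyote.Constants.DEFAULT_CHARACTER_ENCODING. */
    static final String DEFAULT = "ISO-8859-1";

    /**
     * Encoding used for the request body. charEncoding comes from the
     * Content-Type charset or from request.setCharacterEncoding(...); may be null.
     */
    static String bodyEncoding(String charEncoding) {
        return charEncoding != null ? charEncoding : DEFAULT;
    }

    /**
     * Encoding used for the query string. uriEncoding is the connector's
     * URIEncoding attribute; null means the attribute was not set.
     */
    static String queryEncoding(String uriEncoding, boolean useBodyEncodingForURI,
                                String charEncoding) {
        if (useBodyEncodingForURI) {
            // The query string follows the body, including the ISO-8859-1 fallback.
            return bodyEncoding(charEncoding);
        }
        return uriEncoding != null ? uriEncoding : DEFAULT;
    }

    public static void main(String[] args) {
        // Nothing configured: both parts fall back to ISO-8859-1.
        System.out.println(queryEncoding(null, false, null));
        // URIEncoding="UTF-8" only: the query string is UTF-8, the body is not.
        System.out.println(queryEncoding("UTF-8", false, null) + " / " + bodyEncoding(null));
        // useBodyEncodingForURI="true" plus a UTF-8 filter: both are UTF-8.
        System.out.println(queryEncoding(null, true, "UTF-8") + " / " + bodyEncoding("UTF-8"));
    }
}
```

Note that in this model, as in parseParameters(), useBodyEncodingForURI takes precedence over URIEncoding when both are set.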