Misusing StreamWriter with UTF-8 encoding causes a BOM (Byte Order Mark) problem that produces garbled characters (repost)

Question:


 

I was using HttpWebRequest to call a REST API in ASP.NET Core MVC.
Here is my HttpWebRequest client code:

HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://localhost:55161/Home/Testing");

string data;
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
using (StreamReader reader = new StreamReader(resp.GetResponseStream(), System.Text.Encoding.UTF8))
{
    data = reader.ReadToEnd();
}

If I use a StreamWriter to write a message to Response.Body in an ASP.NET Core controller, everything is fine:

using (var streamWriter = new StreamWriter(Response.Body, System.Text.Encoding.UTF8))
{
    streamWriter.Write("hello");
    streamWriter.Flush();
}

But if I call Response.Body.Write inside a StreamWriter block to write the same message, there will be a weird 65279 character at the end of the string "hello" when I get it from my client code:

using (var streamWriter = new StreamWriter(Response.Body, System.Text.Encoding.UTF8))
{
    byte[] data = System.Text.Encoding.UTF8.GetBytes("hello");
    Response.Body.Write(data, 0, data.Length);
}

I want to know whether this is a bug, or what mechanism causes this problem.
I didn't use UseBrowserLink in Startup, and my ASP.NET Core version is 2.1.

 

 

Answer:


"there will be a weird 65279 character at the end of the string "hello" when I get it from my client code"

You mean something like this?

helloï»¿

This is expected based on the code you provided. Why are you wrapping the stream in a StreamWriter, then writing to the stream directly?

The StreamWriter has a buffer that it flushes to the output when you close it. This causes a lot of problems if you have also been writing to the stream directly. Specifically, what's happening here is this:

  1. You wrap the Stream in the StreamWriter
  2. You write hello directly to the stream
  3. The StreamWriter is closed, so it flushes its (empty) buffer.
  4. Since you are using the Encoding.UTF8 encoding, the StreamWriter writes a UTF-8 Byte Order Mark (the byte sequence 0xEF 0xBB 0xBF, which appears as ï»¿ unless it's at the very beginning of the stream) to the stream. Since you've already written hello, the mark lands after your hello, causing the rendering glitch above.
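The four steps above can be reproduced without a web server. The following is a minimal sketch, not the original poster's code: it assumes a MemoryStream as a stand-in for Response.Body, and the class name BomOrderDemo is hypothetical. It dumps the raw bytes so the position of the mark is visible.

```csharp
using System;
using System.IO;
using System.Text;

class BomOrderDemo
{
    static void Main()
    {
        // Assumption: a MemoryStream stands in for Response.Body.
        var body = new MemoryStream();

        // Step 1: wrap the stream in a StreamWriter that uses Encoding.UTF8.
        using (var writer = new StreamWriter(body, Encoding.UTF8))
        {
            // Step 2: bypass the writer and write straight to the stream.
            byte[] data = Encoding.UTF8.GetBytes("hello");
            body.Write(data, 0, data.Length);
        } // Steps 3 and 4: disposing flushes the writer, which emits its preamble now.

        // MemoryStream.ToArray is still usable after the stream has been closed.
        Console.WriteLine(BitConverter.ToString(body.ToArray()));
    }
}
```

If the explanation above holds, the dump should show the bytes of hello (68-65-6C-6C-6F) followed by EF-BB-BF rather than preceded by it.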

 

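So we can see that when you use a StreamWriter, you must never also write data directly to the StreamWriter's underlying Stream object (Response.Body in this example). Doing so is very likely to make the StreamWriter append the UTF-8 BOM (Byte Order Mark) after the data you wrote. A UTF-8 BOM is recognized correctly only at the very beginning of a Stream; anywhere else it is read as garbled characters, as happened after hello in this example.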
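One way to follow this advice is to send every write through the StreamWriter. If no BOM is wanted at all, the writer can also be given an encoding whose preamble is empty. The sketch below is an illustration under assumptions, not code from the original answer: MemoryStream again stands in for Response.Body, and the class name NoBomDemo is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

class NoBomDemo
{
    static void Main()
    {
        // Assumption: a MemoryStream stands in for Response.Body.
        var body = new MemoryStream();

        // UTF8Encoding(false) returns an empty preamble, so the writer emits no BOM.
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

        using (var writer = new StreamWriter(body, utf8NoBom))
        {
            // All output goes through the writer; the stream is never touched directly.
            writer.Write("hello");
        }

        // Prints 68-65-6C-6C-6F: only the five bytes of "hello".
        Console.WriteLine(BitConverter.ToString(body.ToArray()));
    }
}
```

Encoding.UTF8 is still fine when a leading BOM is acceptable, as long as the writer is the only thing writing to the stream. A StreamReader constructed with Encoding.UTF8, like the one in the client code, skips a BOM at the start of the stream.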

 

 
