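When loading XML files larger than 100 MB (probably not that common), XmlDocument's load-everything-into-memory model becomes unfriendly: it is slow and uses a lot of memory.
Switching to XmlReader at that point feels like trading a bicycle for a supercar, but I hit a few pitfalls along the way, so I am recording them here.
First the source code, covering reading and writing in both DOM and SAX modes.
DOM mode: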
    /// <summary>
    /// Creates an XML file in DOM mode.
    /// </summary>
    /// <param name="path">Path of the file to write.</param>
    public void CreateXml_Dom(string path)
    {
        XmlDocument xmlDocw = new XmlDocument();
        // XML declaration
        var xmldecl = xmlDocw.CreateXmlDeclaration("1.0", "utf-8", null);
        var root = xmlDocw.CreateElement("root");
        root.SetAttribute("Name", "李四");
        var test = xmlDocw.CreateElement("test");
        root.AppendChild(test);

        xmlDocw.AppendChild(xmldecl);
        xmlDocw.AppendChild(root);
        xmlDocw.Save(path);

        // A node can also be built from data read through an XmlReader:
        //var node = xmlDocw.ReadNode(rdr);
        //root.AppendChild(node);
        // Or read the OuterXml and write it as the InnerXml:
        //string str = rdr.ReadOuterXml();
        //root.InnerXml = str;
    }

    /// <summary>
    /// Reads an XML file in DOM mode.
    /// </summary>
    /// <param name="path">Path of the file to read.</param>
    public void ReadXml_Dom(string path)
    {
        XmlDocument xmlDocr = new XmlDocument();
        xmlDocr.Load(path);
        var root = xmlDocr.DocumentElement;
        string str = root.GetAttribute("Name");
        Console.WriteLine(str);
    }
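SAX (Simple API for XML) mode: the mistakes I ran into are marked in the comments. (Strictly speaking, XmlReader and XmlWriter are forward-only streaming APIs in the pull style rather than a callback-based SAX parser, but the streaming idea is the same.)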
    /// <summary>
    /// Creates an XML file with XmlWriter.
    /// </summary>
    /// <param name="path">Path of the file to write.</param>
    public void CreateXml_Sax(string path)
    {
        // A FileStream works fine:
        //FileStream stream = new FileStream(path, FileMode.Create);
        // With a StringBuilder the declared encoding is always utf-16:
        //StringBuilder stream = new StringBuilder();
        MemoryStream stream = new MemoryStream();
        XmlWriterSettings settings = new XmlWriterSettings();
        // Encoding.UTF8 causes an error later on because it emits a byte order mark
        settings.Encoding = new UTF8Encoding(false);
        XmlWriter xw = XmlWriter.Create(stream, settings);
        //XmlTextWriter xw = new XmlTextWriter(stream, new UTF8Encoding(false));

        // Write the declaration
        xw.WriteStartDocument();

        xw.WriteStartElement("root");
        xw.WriteAttributeString("Name", "張三");
        // Data read through an XmlReader can be written out directly:
        //xw.WriteNode(rdr);
        xw.WriteStartElement("test");
        xw.WriteEndElement();

        xw.WriteEndElement();

        xw.WriteEndDocument();
        xw.Close();

        string xmlstr = Encoding.UTF8.GetString(stream.ToArray());
        stream.Close();
        XmlDocument xmlDocw = new XmlDocument();
        xmlDocw.LoadXml(xmlstr);
        xmlDocw.Save(path);
    }

    /// <summary>
    /// Reads an XML file with XmlReader.
    /// </summary>
    /// <param name="path">Path of the file to read.</param>
    public void ReadXml_Sax(string path)
    {
        XmlDocument xmlDocw = new XmlDocument();
        // These settings only apply to XmlReader.Create; they are kept here
        // to show what was tried before switching to XmlTextReader.
        XmlReaderSettings rsettings = new XmlReaderSettings();
        rsettings.IgnoreComments = true;
        rsettings.IgnoreWhitespace = false;
        rsettings.CheckCharacters = false;
        // The default XmlReader does not preserve \r\n line breaks in content:
        //(XmlReader rdr = XmlReader.Create(path, rsettings))
        using (XmlTextReader rdr = new XmlTextReader(path))
        {
            rdr.WhitespaceHandling = WhitespaceHandling.Significant;
            string eleName = "";
            while (rdr.Read())
            {
                if (rdr.NodeType == XmlNodeType.Element)
                {
                    // Element name
                    eleName = rdr.Name;
                    // Element depth
                    int dp = rdr.Depth;
                    // Whether the element is empty, i.e. <element/> rather than <element></element>
                    bool needend = rdr.IsEmptyElement;
                    for (int i = 0; i < rdr.AttributeCount; i++)
                    {
                        rdr.MoveToAttribute(i);
                        Console.WriteLine(rdr.Name + ":" + rdr.Value);
                    }
                    // Return from the attributes to the owning element
                    rdr.MoveToElement();
                    // The whole element can be read in one go, with ReadOuterXml or ReadNode.
                    // Both leave the reader on the following node, so loop on rdr.EOF
                    // instead of calling Read() again; otherwise a node is skipped.
                    //rdr.ReadOuterXml();
                }
                else if (rdr.NodeType == XmlNodeType.EndElement)
                {
                    eleName = rdr.Name;
                }
            }
        }
    }
Using XmlReader together with XmlDocument (or XmlWriter) to read a large XML file piece by piece is very effective.
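As a rough sketch of that combination: stream through the file with XmlReader and turn only one record at a time into a small DOM node with XmlDocument.ReadNode. The class name ChunkedXmlReader, the method ReadItemIds, and the "id" attribute are made up for this illustration and are not from the code above. The loop tests rdr.EOF because ReadNode already leaves the reader on the node after the subtree; calling Read() again at that point would skip a sibling.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class ChunkedXmlReader
{
    // Streams through the input and materializes one matching element at a
    // time as a DOM node. Returns the "id" attribute of each matching element.
    public static List<string> ReadItemIds(TextReader input, string elementName)
    {
        var ids = new List<string>();
        var doc = new XmlDocument(); // owner document for the temporary nodes

        using (XmlReader rdr = XmlReader.Create(input))
        {
            while (!rdr.EOF)
            {
                if (rdr.NodeType == XmlNodeType.Element && rdr.Name == elementName)
                {
                    // ReadNode consumes the whole subtree and positions the
                    // reader on the next node, so Read() is not called here.
                    var element = (XmlElement)doc.ReadNode(rdr);
                    ids.Add(element.GetAttribute("id"));
                }
                else
                {
                    rdr.Read();
                }
            }
        }
        return ids;
    }
}
```

For a file on disk, something like ChunkedXmlReader.ReadItemIds(new StreamReader(path), "item") would do; only one record's subtree is held as a DOM node at any time.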
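Here are the problems I ran into:
1. After writing with XmlWriter, the XML declaration always says utf-16
This only happens when the output is a StringBuilder: .NET strings are UTF-16, so a writer created over a StringBuilder ignores settings.Encoding. Switching to a FileStream, MemoryStream, etc. fixes it.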
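A small sketch to reproduce the difference, with the same settings used for both outputs. The class and method names are made up for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class, for illustration only.
public static class EncodingDeclarationDemo
{
    private static XmlWriterSettings Utf8Settings()
    {
        return new XmlWriterSettings { Encoding = new UTF8Encoding(false) };
    }

    private static void WriteRoot(XmlWriter xw)
    {
        xw.WriteStartDocument();
        xw.WriteStartElement("root");
        xw.WriteEndElement();
        xw.WriteEndDocument();
    }

    // Output goes to a StringBuilder: the declaration reports utf-16
    // even though the settings ask for UTF-8.
    public static string WriteToStringBuilder()
    {
        var sb = new StringBuilder();
        using (XmlWriter xw = XmlWriter.Create(sb, Utf8Settings()))
        {
            WriteRoot(xw);
        }
        return sb.ToString();
    }

    // Output goes to a MemoryStream: settings.Encoding is honored.
    public static string WriteToMemoryStream()
    {
        using (var stream = new MemoryStream())
        {
            using (XmlWriter xw = XmlWriter.Create(stream, Utf8Settings()))
            {
                WriteRoot(xw);
            }
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

Printing both results side by side shows encoding="utf-16" in the first declaration and encoding="utf-8" in the second.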
2. (UTF-8) After switching to MemoryStream, passing the resulting XML string to XmlDocument.LoadXml throws an error
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.UTF8;
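It turned out that the default Encoding.UTF8 emits a byte order mark (BOM); use new UTF8Encoding(false) instead.
In the debugger's watch window, xmlstr[0] is 65279 (U+FEFF, the BOM); after the change it is 60, i.e. '<'.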
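The same check can be done in code rather than in the watch window. A minimal sketch (BomDemo and WriteWith are names invented here) that writes a tiny document with a given encoding and returns the decoded string, so the first character can be inspected:

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class, for illustration only.
public static class BomDemo
{
    // Writes <root/> to a MemoryStream with the given encoding and decodes
    // the raw bytes back into a string, the same way the code above does.
    public static string WriteWith(Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter xw = XmlWriter.Create(stream, settings))
            {
                xw.WriteStartDocument();
                xw.WriteStartElement("root");
                xw.WriteEndElement();
                xw.WriteEndDocument();
            }
            // GetString does not strip a BOM, so it survives as U+FEFF.
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

(int)BomDemo.WriteWith(Encoding.UTF8)[0] gives 65279, while BomDemo.WriteWith(new UTF8Encoding(false))[0] gives '<'. An alternative that avoids the string round trip altogether is to rewind the MemoryStream and call XmlDocument.Load(stream).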
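3. By default XmlReader does not preserve carriage return/line feed pairs in content; a line break can come in as just a space.
In my test the second one, a literal line break, simply was not read: XmlDocument returned both line breaks, XmlReader did not.
I kept looking for a setting, such as IgnoreWhitespace, but none of them helped; the line break still was not read.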
XmlReaderSettings rsettings = new XmlReaderSettings();
rsettings.IgnoreWhitespace = false;
I finally found the answer on Stack Overflow (note 1): do not use XmlReader rdr = XmlReader.Create(path); use XmlTextReader instead.
Note 1: the line-break problem, https://stackoverflow.com/questions/1793908/xmlreader-newline-n-instead-of-r-n
This is because the XmlTextReader has a normalization setting defaulted to false unlike XmlReader.Create which always normalizes newlines no matter what.
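To see the behavior described in that answer, here is a small sketch that reads the same text through both readers. NewlineDemo and its method names are made up for illustration, and the sample reads from a string rather than a file path.

```csharp
using System.IO;
using System.Xml;

// Hypothetical demo class, for illustration only.
public static class NewlineDemo
{
    // XmlReader.Create always normalizes line breaks in the content.
    public static string ReadWithCreate(string xml)
    {
        using (XmlReader rdr = XmlReader.Create(new StringReader(xml)))
        {
            rdr.MoveToContent();
            return rdr.ReadElementContentAsString();
        }
    }

    // XmlTextReader leaves Normalization at its default of false,
    // so the original line breaks are kept.
    public static string ReadWithTextReader(string xml)
    {
        using (var rdr = new XmlTextReader(new StringReader(xml)))
        {
            rdr.MoveToContent();
            return rdr.ReadElementContentAsString();
        }
    }
}
```

Feeding both methods "<root>line1\r\nline2</root>" should show the \r dropped by the first reader and kept by the second.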
This article is original; when reposting, please credit: https://www.cnblogs.com/zhanglb163/