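Whatever language you use to simulate a login or fetch pages, the approach comes down to the same steps:
First, start a web session or instantiate a web request class, such as HttpWebRequest in .NET;
Second, build the data to submit via POST or GET;
Third, set the request headers;
Fourth, send the request, get the response, and process it as needed.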
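
The first three steps can be sketched roughly as follows. This is a minimal sketch, not the article's own code: the class name `LoginRequestSketch` and both method names are hypothetical, and nothing is sent over the network. The fourth step would write the body to the request stream and call `GetResponse()`.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class LoginRequestSketch
{
    // Step 2: URL-encode each field and join the pairs as name=value&name=value.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts);
    }

    // Steps 1 and 3: create the request object, then set the method and headers.
    // Nothing is sent here; step 4 would write the body and call GetResponse().
    public static HttpWebRequest PrepareLoginRequest(string url, string body, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = Encoding.UTF8.GetByteCount(body);
        request.CookieContainer = cookies;   // collects cookies set by the server
        request.AllowAutoRedirect = false;   // follow each hop by hand
        return request;
    }
}
```

Passing the pair `origURL` / `http://www.renren.com/home` to `BuildFormBody` yields `origURL=http%3A%2F%2Fwww.renren.com%2Fhome`, the same encoded form a capture tool shows. Note that `Uri.EscapeDataString` encodes a space as `%20` rather than `+`.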
Here we use the Renren login as the example, which involves both POST and GET requests.
Any capture tool will do (IE developer tools or httpwatch); I use httpwatch here. Logging in to Renren (www.renren.com) produces one POST request followed by two GET requests.
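After the POST, the first request that returns status 200 is usually the home page shown after login. Some sites go through more redirects, but the method is the same.
Looking at the details of these three requests, it is easy to see that they run in sequence: the URL of the first GET comes from the response to the POST, and the URL of the second GET comes from the response to the first GET.
What links each request to the next is the cookie data returned by the server: the cookies returned by one response must be sent along with the next request. This is the key to simulating a website login in C#.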
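
To illustrate how cookies travel from one request to the next, here is a small offline sketch using `CookieContainer`. The class name, the cookie name `t`, and its value are placeholders rather than Renren's real cookies, and the cookie is added by hand to stand in for what a `Set-Cookie` response header would store.

```csharp
using System;
using System.Net;

// Hypothetical demo class, for illustration only.
public static class CookieCarryDemo
{
    public static string CarryCookies()
    {
        var jar = new CookieContainer();

        // Stand-in for a cookie the server would set in its login response.
        // The explicit "/" path makes the cookie valid for the whole site.
        var loginUri = new Uri("http://www.renren.com/PLogin.do");
        jar.Add(loginUri, new Cookie("t", "placeholder-token", "/"));

        // For the next request, the container yields the Cookie header value
        // that applies to that URL.
        var nextUri = new Uri("http://www.renren.com/home");
        return jar.GetCookieHeader(nextUri);
    }
}
```

Assigning the same `CookieContainer` instance to every `HttpWebRequest` has the same effect automatically: cookies received by one response are sent with the next request to a matching URL.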
A few points to note:
1. Pick the URL to POST to. You can find it with the capture tool or by viewing the page source.
2. The Content view shows the returned body, which may contain the link for the next hop. The last response in the chain is always the HTML of the home page.
First, simulate the initial POST request:

    // Requires: using System; using System.IO; using System.Net; using System.Text;
    HttpWebRequest request = null;
    HttpWebResponse response = null;
    string gethost = string.Empty;                 // URL of the next hop
    CookieContainer cc = new CookieContainer();    // shared by all three requests
    string Cookiesstr = string.Empty;
    try
    {
        // Form fields captured from the login request (already URL-encoded)
        string postdata = "email=adm13956587&password=786954887&icode=&origURL=http%3A%2F%2Fwww.renren.com%2Fhome&domain=renren.com&key_id=1&captcha_type=web_login";
        string LoginUrl = "http://www.renren.com/PLogin.do";
        request = (HttpWebRequest)WebRequest.Create(LoginUrl);
        request.Method = "POST";

        request.ContentType = "application/x-www-form-urlencoded";
        byte[] postdatabytes = Encoding.UTF8.GetBytes(postdata);
        request.ContentLength = postdatabytes.Length;
        request.AllowAutoRedirect = false;   // follow each hop by hand
        request.CookieContainer = cc;
        request.KeepAlive = true;

        // Write the form body to the request stream
        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(postdatabytes, 0, postdatabytes.Length);
        }

        response = (HttpWebResponse)request.GetResponse();

        // Cookies set by the server are now in cc; keep them as a header string too
        response.Cookies = request.CookieContainer.GetCookies(request.RequestUri);
        CookieCollection cook = response.Cookies;
        string strcrook = request.CookieContainer.GetCookieHeader(request.RequestUri);
        Cookiesstr = strcrook;

        // The body looks like: The URL has moved <a href="http://www.renren.com/home">here</a>
        using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            string content = sr.ReadToEnd();
            string[] substr = content.Split(new char[] { '"' });
            gethost = substr[1]; // http://www.renren.com/home
        }
        response.Close();
    }
    catch (Exception ex)
    {
        // Report the failure instead of silently swallowing it
        Console.WriteLine(ex.Message);
    }

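The code above is largely self-explanatory, so I won't analyze it further. If request = (HttpWebRequest)WebRequest.Create(LoginUrl) looks puzzling, look up the difference between HttpWebRequest and WebRequest. In short, WebRequest is an abstract class that cannot be instantiated directly and must be subclassed, and HttpWebRequest derives from WebRequest. For an http URL, WebRequest.Create returns an HttpWebRequest instance, which is why the cast works.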
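
Splitting the body on `"` and taking the second piece only works while the response has exactly the shape shown. A slightly more defensive way to pull out the next-hop URL is sketched below. The helper `RedirectLink.ExtractFirstHref` is a hypothetical name, and the regular expression assumes the link is written with double quotes, as in the sample body.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class RedirectLink
{
    // Returns the first href="..." value in the body, or null if there is none.
    public static string ExtractFirstHref(string html)
    {
        Match match = Regex.Match(html, "href\\s*=\\s*\"([^\"]+)\"", RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Called with the sample body `The URL has moved <a href="http://www.renren.com/home">here</a>`, it returns `http://www.renren.com/home`. When the server answers with a 3xx status, the `Location` response header usually carries the same URL, so checking `response.Headers["Location"]` first may avoid parsing the body at all.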
Next, simulate the first and second GET requests:

    try
    {
        // First GET: the URL comes from the POST response
        request = (HttpWebRequest)WebRequest.Create(gethost);
        request.Method = "GET";
        request.KeepAlive = true;
        request.Headers.Add("Cookie:" + Cookiesstr);   // cookies from the previous response
        request.CookieContainer = cc;
        request.AllowAutoRedirect = false;
        response = (HttpWebResponse)request.GetResponse();

        // Refresh the cookie string for the next request
        Cookiesstr = request.CookieContainer.GetCookieHeader(request.RequestUri);

        using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            string ss = sr.ReadToEnd();
            string[] substr = ss.Split(new char[] { '"' });
            gethost = substr[1]; // http://www.renren.com/1915651750
        }
        response.Close();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
    try
    {
        // Second GET: the URL comes from the first GET's response
        request = (HttpWebRequest)WebRequest.Create(gethost);
        request.Method = "GET";
        request.KeepAlive = true;
        request.Headers.Add("Cookie:" + Cookiesstr);
        request.CookieContainer = cc;
        request.AllowAutoRedirect = false;
        response = (HttpWebResponse)request.GetResponse();

        Cookiesstr = request.CookieContainer.GetCookieHeader(request.RequestUri);

        using (StreamReader sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            // This is the logged-in home page; show it in the form's WebBrowser control
            string ss = sr.ReadToEnd();
            webBrowser1.Navigate("about:blank");
            webBrowser1.Document.OpenNew(true);
            webBrowser1.Document.Write(ss);
        }
        response.Close();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }

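GET and POST requests are much alike, so I won't go over them again. Once the three requests have finished, keep the cookie string and attach it to the headers of every later request, and you stay logged in.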
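
Reusing the saved cookie string on a later request could look roughly like this. It is a sketch under assumptions: the class name `LoggedInRequests` and the method `Create` are hypothetical, and the request is only built, not sent.

```csharp
using System.Net;

// Hypothetical helper, for illustration only.
public static class LoggedInRequests
{
    // Builds a GET request that carries the saved cookie string in its Cookie header.
    public static HttpWebRequest Create(string url, string cookieHeader)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.Headers[HttpRequestHeader.Cookie] = cookieHeader;
        return request;
    }
}
```

For example, `LoggedInRequests.Create("http://www.renren.com/home", Cookiesstr)` prepares a request for a page that requires login. If the original `CookieContainer` is still available, assigning it to `request.CookieContainer` is an alternative that also picks up any cookies the server sets later.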