Simulating a router restart request in Java (a common web-crawler trick), and how to download images in Java

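When we run a web crawler at the office or at home to collect the data we need, the target site often has a defence mechanism that answers our HTTP requests with a 500 error. As long as the requests come from the same IP address, no data comes back. At that point the only option is to restart the router so that the IP address changes, after which the crawler works normally again.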
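Before restarting anything, the crawler has to decide that it is actually blocked. A minimal sketch of that decision, assuming a run of consecutive HTTP 500 responses is a good enough signal; the class name `BlockDetector` and the threshold value are hypothetical and not part of the original crawler:

```java
/** Counts consecutive HTTP 500 responses (hypothetical helper, not from the original crawler). */
final class BlockDetector {
    private final int threshold; // assumed tuning value: how many 500s in a row count as "blocked"
    private int consecutive500;

    BlockDetector(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    /** Records one response status and returns true once the IP looks blocked. */
    boolean record(int statusCode) {
        if (statusCode == 500) {
            consecutive500++;
        } else {
            consecutive500 = 0; // any other status means requests still get through
        }
        return consecutive500 >= threshold;
    }

    /** Clears the counter, for example after the IP address has changed. */
    void reset() {
        consecutive500 = 0;
    }

    public static void main(String[] args) {
        BlockDetector detector = new BlockDetector(3);
        int[] statuses = {200, 500, 500, 500};
        for (int status : statuses) {
            System.out.println(status + " -> blocked? " + detector.record(status));
        }
    }
}
```

A single 500 can be an ordinary server fault, which is why the sketch waits for several in a row before reporting a block.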

The code below simulates the router's restart command by sending an HTTP request over a raw Socket:

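// Needs java.io.BufferedOutputStream, java.net.InetAddress, java.net.Socket,
// java.net.URL and java.nio.charset.StandardCharsets.
// baseURL and luyou are fields of the enclosing class.
protected void rebotadsl() {
	try {
		URL target = new URL(baseURL);
		InetAddress address = InetAddress.getByName(target.getHost());
		// try-with-resources closes the stream and the socket on every path
		try (Socket client = new Socket(address, 8080);
				BufferedOutputStream sender = new BufferedOutputStream(client.getOutputStream())) {
			// luyou holds the Base64 encoding of the router's "username:password",
			// e.g. YWRtaW46d2FuZzIwMDU=
			// The Disconnect value is the GBK URL-encoding of the button label "断 线" (disconnect).
			String cmd = "GET "
					+ "/userRpm/StatusRpm.htm?Disconnect=%B6%CF%20%CF%DF&wan=1"
					+ " HTTP/1.0\r\n" + "User-Agent: myselfHttp/1.0\r\n"
					+ "Accept: www/source; text/html; image/gif; */*\r\n"
					+ "Authorization: Basic " + luyou + "\r\n"
					+ "\r\n";
			// Use the byte count, not the char count, as the length to write
			byte[] request = cmd.getBytes(StandardCharsets.US_ASCII);
			sender.write(request, 0, request.length);
			sender.flush();
			System.out.println("Router line dropped because of a redirect");
		}
	} catch (Exception ex) {
		ex.printStackTrace();
	}
}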
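The `luyou` value in the `Authorization` header is not the bare password: HTTP Basic authentication expects the Base64 encoding of `username:password`. The sample value `YWRtaW46d2FuZzIwMDU=` from the comment decodes to `admin:wang2005`. A small sketch of how such a value could be produced; the class name `BasicAuthValue` is hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Builds the credential part of an HTTP Basic Authorization header (hypothetical helper). */
final class BasicAuthValue {

    /** Returns Base64("user:password"), the form expected after "Authorization: Basic ". */
    static String encode(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.ISO_8859_1);
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // Sample credentials recovered by decoding the value quoted in the article
        System.out.println(encode("admin", "wang2005")); // prints YWRtaW46d2FuZzIwMDU=
    }
}
```

Base64 is an encoding, not encryption, so the value should be treated as carefully as the plain password.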

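Of course, we still need some logic around rebotadsl() to decide when to call it; for example, two router restarts must not happen too close together.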
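One way to enforce that spacing is a small guard that remembers when the last restart happened. This is only a sketch under assumptions: the class name `RestartThrottle` and the five-minute interval in the demo are hypothetical, and the original post does not say which rule the author used.

```java
/** Allows a restart only if enough time has passed since the previous one (hypothetical helper). */
final class RestartThrottle {
    private final long minIntervalMillis; // assumed minimum gap between two restarts
    private long lastRestartMillis;
    private boolean restartedBefore;

    RestartThrottle(long minIntervalMillis) {
        if (minIntervalMillis < 0) {
            throw new IllegalArgumentException("interval must not be negative");
        }
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Returns true and records nowMillis if a restart is allowed at that time. */
    synchronized boolean tryAcquire(long nowMillis) {
        if (restartedBefore && nowMillis - lastRestartMillis < minIntervalMillis) {
            return false; // the previous restart was too recent
        }
        restartedBefore = true;
        lastRestartMillis = nowMillis;
        return true;
    }

    public static void main(String[] args) {
        RestartThrottle throttle = new RestartThrottle(5 * 60 * 1000L);
        long now = System.currentTimeMillis();
        System.out.println(throttle.tryAcquire(now));                  // first restart is allowed
        System.out.println(throttle.tryAcquire(now + 1000L));          // one second later: refused
        System.out.println(throttle.tryAcquire(now + 5 * 60 * 1000L)); // five minutes later: allowed
    }
}
```

In the crawler this would wrap the call, for example `if (throttle.tryAcquire(System.currentTimeMillis())) rebotadsl();`. Passing the time in as a parameter keeps the guard easy to test without sleeping.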


Downloading images in Java:

/**
 * Requests the image at url and saves it to disk as story/name.
 * Uses Apache Commons HttpClient 3.x (HttpClient, GetMethod, HttpMethodParams).
 * The body passes through _body as an ISO8859-1 string; that charset maps
 * every byte to exactly one char, so the binary data survives the round trip.
 */
public void sendPic(String url, String story, String name) {
	setURL(url);
	HttpClient http = new HttpClient();
	http.getHttpConnectionManager().getParams().setConnectionTimeout(100000);
	GetMethod get = null;
	try {
		get = new GetMethod(url);
	} catch (IllegalArgumentException ex) {
		Log.logException("URL contains illegal characters", ex);
		setStatus(baseURL, ERROR);
		_body.setLength(0);
		return;
	}
	get.getParams().setParameter(HttpMethodParams.SO_TIMEOUT, 100000);
	get.setFollowRedirects(false);
	int er = 0;
	try {
		get.addRequestHeader("user-agent", useragent);
		er = http.executeMethod(get);
		System.out.println("server return code: " + er);
	} catch (Exception ex) {
		System.out.println("Failed to send the image URL to the server");
		try {
			Thread.sleep(120000); // wait two minutes, then retry once
		} catch (InterruptedException e) {
			Thread.currentThread().interrupt(); // keep the interrupt visible to callers
		}
		try {
			er = http.executeMethod(get);
		} catch (Exception e) {
			System.out.println("Cannot connect to the server; the system will exit");
			System.exit(0);
		}
	}

	try {
		if (er != 200) {
			return; // no image to save; avoids writing a stale _body to disk
		}
		// Read the response body sent back by the server
		InputStream is = get.getResponseBodyAsStream();
		if (is == null) {
			System.out.println("The server response has no body");
			return;
		}
		byte[] buffer = new byte[20480];
		byte[] tbuf = new byte[204800]; // cap: reading stops once 200 KB would be exceeded
		int tl = 0;
		while (true) {
			int l = is.read(buffer);
			if (l < 0 || l + tl > tbuf.length)
				break;
			System.arraycopy(buffer, 0, tbuf, tl, l);
			tl += l;
		}
		_body.setLength(0);
		_body.append(new String(tbuf, 0, tl, "ISO8859-1"));
	} catch (IOException ex) {
		System.out.println("Error while reading the server response");
		return;
	} finally {
		get.releaseConnection(); // hand the connection back on every path
	}

	// Save the image to disk
	File outputfile = new File(story, name);
	try (FileOutputStream fos = new FileOutputStream(outputfile)) {
		fos.write(_body.toString().getBytes("ISO8859-1"));
	} catch (IOException ex) {
		System.out.println("I/O error while saving the file locally");
	}
}
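The reason sendPic insists on ISO8859-1 is that this charset maps each of the 256 byte values to exactly one character, so image bytes survive the trip through a String. The sketch below checks that claim and contrasts it with UTF-8. It also shows, as an alternative that is not the author's method, copying a stream straight to a file with `Files.copy`, which avoids both the String detour and the fixed 200 KB buffer. The class name `BinaryBody` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

/** Demonstrates byte-safe handling of binary response bodies (hypothetical helper). */
final class BinaryBody {

    /** Decodes bytes to a String with the given charset and encodes them back. */
    static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    /** Copies the whole stream to target and returns the number of bytes written. */
    static long saveStream(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i; // every possible byte value once
        }
        // ISO-8859-1 keeps every byte; UTF-8 replaces the invalid sequences
        System.out.println(Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1))); // true
        System.out.println(Arrays.equals(all, roundTrip(all, StandardCharsets.UTF_8)));      // false

        Path tmp = Files.createTempFile("pic", ".bin");
        try {
            saveStream(new ByteArrayInputStream(all), tmp);
            System.out.println(Files.size(tmp)); // 256
        } finally {
            Files.delete(tmp);
        }
    }
}
```

With a real response, the stream returned by `get.getResponseBodyAsStream()` could be passed to `saveStream` in place of the `ByteArrayInputStream` used here.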
	

// Returns true if the url already exists in the database.
// _prepGetCount is a PreparedStatement field whose query returns a "qty" column.
public boolean URLisExist(String url) {
	ResultSet rs = null;
	int count = 0;
	try {
		_prepGetCount.setString(1, url);
		rs = _prepGetCount.executeQuery();
		if (rs.next()) {
			count = rs.getInt("qty");
		}
	} catch (Exception ex) {
		System.out.println("Error in URLisExist");
	} finally {
		// Close the ResultSet on every path, not only after a failure
		try {
			if (rs != null) {
				rs.close();
			}
		} catch (Exception e1) {
			System.out.println("Error while closing rs");
		}
	}
	return count >= 1;
}