Summer boredom project #1: a Django-based grade query system for the Yangtze University Academic Affairs Office

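Topics covered in this post: Python web crawling, MySQL databases, HTML/CSS/JS basics, Selenium and PhantomJS basics, the MVC design pattern, ORM (object-relational mapping) frameworks, the Django framework (a Python web development framework), the Apache server, and basic Linux operations (using CentOS 7). It suits readers who already have some grounding in these.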

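Disclaimer: this post is purely for technical exchange, and sensitive information has been filtered out; please bear with me. (I am not responsible for any problems the Yangtze University Academic Affairs website may have, whatever the cause.)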

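Approach: the Academic Affairs Office offers no data API (to protect student information), so the only option is to write a crawler that simulates logging into the Academic Affairs site and scrapes the data. To keep the crawler from failing when the Academic Affairs site goes down, the data can be cached, so later requests are served straight from our own database; what we then need to do is update the data on a schedule to stay in sync with the Academic Affairs site.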
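A minimal sketch of the caching idea, assuming the standard-library sqlite3 module as a stand-in for MariaDB so it runs anywhere. The table grade_cache, the function get_grades, the fetch_remote callable (standing in for the crawler) and the one-day MAX_AGE are all hypothetical. Fresh cached rows are served directly. Stale or missing rows trigger a crawl. If the crawl fails, the stale rows are returned instead.

```python
import sqlite3
import time

MAX_AGE = 24 * 3600  # hypothetical freshness window: one day


def init_cache(conn):
    # One row per (student, course); fetched_at records when it was crawled
    conn.execute(
        "CREATE TABLE IF NOT EXISTS grade_cache ("
        "sno TEXT, cname TEXT, grade INTEGER, fetched_at REAL, "
        "PRIMARY KEY (sno, cname))"
    )


def get_grades(conn, sno, fetch_remote, now=None):
    """Return a sorted list of (cname, grade) pairs for student sno."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT cname, grade, fetched_at FROM grade_cache "
        "WHERE sno = ? ORDER BY cname",
        (sno,),
    ).fetchall()
    if rows and now - min(r[2] for r in rows) < MAX_AGE:
        return [(c, g) for c, g, _ in rows]  # cache is fresh: skip the crawler
    try:
        fresh = fetch_remote(sno)  # the crawler; may raise if the site is down
    except Exception:
        return [(c, g) for c, g, _ in rows]  # site down: serve the stale cache
    with conn:  # replace this student's rows in one transaction
        conn.execute("DELETE FROM grade_cache WHERE sno = ?", (sno,))
        conn.executemany(
            "INSERT INTO grade_cache VALUES (?, ?, ?, ?)",
            [(sno, c, g, now) for c, g in fresh],
        )
    return sorted(fresh)
```

A scheduled job (cron, for example) that calls the crawler would keep the cache warm, so most page views would never reach the Academic Affairs site.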

Tech stack: CentOS 7 + Apache 2.4 + MariaDB 5.5 + Python 2.7.5 + mod_wsgi 3.4 + Django 1.11

------------------------------------------------------------------------

I. Python crawler:

1. First, let's look at the login page:

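We use Firefox to capture and analyze the traffic. The login form is submitted with a POST carrying 7 parameters, and there is a captcha. There are two ways to deal with it. One is the currently hot approach of recognizing the image with deep learning. The other is to download the captcha and let the user type it in. The first is fairly costly... I'll try it when I'm less busy. Pillow (the maintained fork of PIL) can load and preprocess the captcha image, but the recognition itself needs OCR or a trained model; I plan to try TensorFlow for that over the summer. The second is too low-tech to be worth describing.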

2. There's also a fancier approach that avoids dealing with the captcha at all, but I won't go into it here. Let's simulate the login:

#coding:utf8
from bs4 import BeautifulSoup
import urllib
import urllib2
import requests
import sys

reload(sys)
sys.setdefaultencoding('gbk')

loginURL = "<Academic Affairs login URL>"
cjcxURL = "http://jwc2.yangtzeu.edu.cn:8080/cjcx.aspx"
html = urllib2.urlopen(loginURL)
soup = BeautifulSoup(html,"lxml")
__VIEWSTATE = soup.find(id="__VIEWSTATE")["value"]
__EVENTVALIDATION = soup.find(id="__EVENTVALIDATION")["value"]

data = {
        "__VIEWSTATE":__VIEWSTATE,
        "__EVENTVALIDATION":__EVENTVALIDATION,
        "txtUid":"<student ID>",
        "btLogin":"%B5%C7%C2%BC",
        "txtPwd":"<password>",
        "selKind":"1"
        }
header = {
#        "Host":"jwc2.yangtzeu.edu.cn:8080",
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0;… Gecko/20100101 Firefox/54.0",
        "Accept":"text/html,application/xhtml+x…lication/xml;q=0.9,*/*;q=0.8",
        "Accept-Language":"zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
        "Accept-Encoding":"gzip, deflate",
        "Content-Type":"application/x-www-form-urlencoded",
#        "Content-Length":"644",
        "Referer":"http://jwc2.yangtzeu.edu.cn:8080/login.aspx",
#        "Cookie":"ASP.NET_SessionId=3zjuqi0cnk5514l241csejgx",
#        "Connection":"keep-alive",
#        "Upgrade-Insecure-Requests":"1",
        }

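UserSession = requests.session()
# Pass headers by keyword: the signature is Session.post(url, data=None, json=None, **kwargs),
# so a third positional argument would be taken as json, not as headers
Request = UserSession.post(loginURL, data=data, headers=header)
# The session already stores the login cookies, so they need not be passed again
Response = UserSession.get(cjcxURL, headers=header)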
soup = BeautifulSoup(Response.content,"lxml")
print soup
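The login code above hard-codes two hidden fields, __VIEWSTATE and __EVENTVALIDATION. The sketch below instead collects every hidden `<input>` on a page into a dict, so a POST payload can start from whatever hidden state the page actually carries. It is a minimal Python 3 sketch that uses the standard-library html.parser in place of BeautifulSoup. The class HiddenFieldParser, the function hidden_fields and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also arrive here via handle_startendtag
        if tag != "input":
            return
        attrs = dict(attrs)
        if (attrs.get("type") or "").lower() == "hidden" and attrs.get("name"):
            # A missing value (e.g. an empty __EVENTTARGET) becomes ""
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

A login payload would then be `hidden_fields(login_page_html)` updated with txtUid, txtPwd and the other visible fields.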

Next, we can see:

Now POST again (this code continues from the above):

__VIEWSTATE2 = soup.find(id="__VIEWSTATE")["value"]
__EVENTVALIDATION2 = soup.find(id="__EVENTVALIDATION")["value"]

AllcjData = {
            "__EVENTTARGET":"btAllcj",
            "__EVENTARGUMENT":"",
            "__VIEWSTATE":__VIEWSTATE2,
            "__EVENTVALIDATION":__EVENTVALIDATION2,
            "selYear":"2017",
            "selTerm":"1",
#            "Button2":"%B1%D8%D0%DE%BF%CE%B3%C9%BC%A8"
        }
AllcjHeader = {
#       "Host":"jwc2.yangtzeu.edu.cn:8080",
        "User-Agent":"Mozilla/5.0 (Windows NT 10.0;… Gecko/20100101 Firefox/54.0",
        "Accept":"text/html,application/xhtml+x…lication/xml;q=0.9,*/*;q=0.8",
        "Accept-Language":"zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
        "Accept-Encoding":"gzip, deflate",
        "Content-Type":"application/x-www-form-urlencoded",
#        "Content-Length":"644",
        "Referer":"http://jwc2.yangtzeu.edu.cn:8080/cjcx.aspx",
#        "Cookie":,
        "Connection":"keep-alive",
        "Upgrade-Insecure-Requests":"1",
        }
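# headers must be passed by keyword; the third positional argument of post() is json
Request1 = UserSession.post(cjcxURL, data=AllcjData, headers=AllcjHeader)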
Response1 = UserSession.get(cjcxURL,cookies = Request.cookies,headers=AllcjHeader)
soup = BeautifulSoup(Response1.content,"lxml")
print soup

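It didn't work... the page returned by this GET is still the original one. I think two things could make this POST fail. First, ASP.NET's __VIEWSTATE and __EVENTVALIDATION variables. Second, the form has several buttons and uses JS to decide which one was pressed, which trips up the crawler. For dynamically driven pages like this, a plain crawler just doesn't cut it.... A third possibility lies in the code itself: an ASP.NET postback normally returns its result in the POST response (Request1 here), while a separate GET afterwards gets the page in its initial state.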
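ASP.NET's generated `__doPostBack(eventTarget, eventArgument)` fills the hidden __EVENTTARGET and __EVENTARGUMENT fields and then submits the form. A crawler can therefore build the same request body by hand. The sketch below assumes the page's hidden fields have already been collected into a dict. The name build_postback and the sample values are hypothetical. Submit-button names such as Button2 are deliberately left out: when a button's name appears in the body, ASP.NET may raise that button's click instead of the __EVENTTARGET event.

```python
def build_postback(hidden, event_target, event_argument="", extra=None):
    """Return the form body that __doPostBack(event_target, event_argument) would submit.

    hidden: dict of the page's hidden fields (__VIEWSTATE, __EVENTVALIDATION, ...)
    extra:  other visible form fields to send along (e.g. selYear, selTerm)
    """
    data = dict(hidden)  # copy so the caller's dict is left untouched
    data.update(extra or {})
    # These two fields are exactly what __doPostBack sets before submitting
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return data
```

With the session from earlier, the body would be sent as `UserSession.post(cjcxURL, data=body, headers=AllcjHeader)`, and the grade table parsed from that POST response's content.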

3. Now for something fancier: Selenium (a web automation testing tool that can simulate mouse clicks) + PhantomJS (a headless browser, faster than Chrome and Firefox)

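Install Selenium: pip install selenium

Install PhantomJS:

(1) Download: http://phantomjs.org/download.html (I downloaded the Linux 64-bit version)

(2) Extract it into /usr/share/: tar -jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/share/

(3) Install the dependencies: yum install fontconfig freetype libfreetype.so.6 libfontconfig.so.1

(4) Add it to PATH: export PATH=$PATH:/usr/share/phantomjs-2.1.1-linux-x86_64/bin (this lasts only for the current shell; put the line in ~/.bashrc to make it permanent)

(5) Type phantomjs in the shell; if it drops you into its interactive prompt, the installation succeeded.

Please ignore my commented-out attempts: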

#coding:utf8
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import urllib
import urllib2
import sys 


reload(sys)
sys.setdefaultencoding('utf8')

driver = webdriver.PhantomJS()
driver.get("<Academic Affairs login URL>")
driver.find_element_by_name('txtUid').send_keys('<student ID>')
driver.find_element_by_name('txtPwd').send_keys('<password>')
driver.find_element_by_id('btLogin').click()
cookie=driver.get_cookies()
driver.get("http://jwc2.yangtzeu.edu.cn:8080/cjcx.aspx")
#print driver.page_source
#driver.find_element_by_xpath("//input[@name='btAllcj'][@type='button']")
#js = "document.getElementById('btAllcj').onclick=function(){__doPostBack('btAllcj','')}"
#js = "var ob; ob=document.getElementById('btAllcj');ob.focus();ob.click();)"
#driver.execute_script("document.getElementById('btAllcj').click();")
#time.sleep(2)                            # pause briefly
#driver.find_element_by_link_text("所有成績").click() # find the "All grades" (所有成績) link and click it
#time.sleep(2)
#js1 = "document.Form1.__EVENTTARGET.value='btAllcj';"
#js2 = "document.Form1.__EVENTARGUMENT.value='';"
#driver.execute_script(js1)
#driver.execute_script(js2)
#driver.find_element_by_name('__EVENTTARGET').send_keys('btAllcj')
#driver.find_element_by_name('__EVENTARGUMENT').send_keys('')
#js = "var input = document.createElement('input');input.setAttribute('type', 'hidden');input.setAttribute('name', '__EVENTTARGET');input.setAttribute('value', '');document.getElementById('Form1').appendChild(input);var input = document.createElement('input');input.setAttribute('type', 'hidden');input.setAttribute('name', '__EVENTARGUMENT');input.setAttribute('value', '');document.getElementById('Form1').appendChild(input);var theForm = document.forms['Form1'];if (!theForm) {    theForm = document.Form1;}function __doPostBack(eventTarget, eventArgument) {    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {        theForm.__EVENTTARGET.value = eventTarget;        theForm.__EVENTARGUMENT.value = eventArgument;        theForm.submit();    }   }__doPostBack('btAllcj', '')"
#js = "var script = document.createElement('script');script.type = 'text/javascript';script.text='if (!theForm) {    theForm = document.Form1;}function __doPostBack(eventTarget, eventArgument) {    if     (!theForm.onsubmit || (theForm.onsubmit() != false)) {        theForm.__EVENTTARGET.value = eventTarget;        theForm.__EVENTARGUMENT.value = eventArgument;        theForm.submit();  }}';document.body.appendChild(script);"
#driver.execute_script(js)
driver.find_element_by_name("Button2").click()
html=driver.page_source
soup = BeautifulSoup(html,"lxml")
print soup
tables = soup.findAll("table")
for tab in tables:
  for tr in tab.findAll("tr"):
    print "--------------------"
    for td in tr.findAll("td")[0:3]:
      print td.getText()

 

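So far I can only get the required-course grades..... because "All grades" is triggered by ASP.NET-generated JS rather than a direct form submit... I'm still looking for a solution. Next, let's design our database...

II. MariaDB student database design. This borrows from our SQL Server database-principles lab class...

 

My table creation statements: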

create database jwc character set utf8;

use jwc;

create table Student(
    Sno char(9) primary key,
    Sname varchar(20) unique,
    Sdept char(20),
    Spwd char(20)
);
create table Course(
    Cno   char(2) primary key,
    Cname varchar(30) unique,
    Credit  numeric(2,1)
);
create table SC( 
    Sno char(9) not null,
    Cno char(2) not null,
    Grade int check(Grade>=0 and Grade<=100),
    primary key(Sno,Cno),
    foreign key(Sno) references Student(Sno),
    foreign key(Cno) references Course(Cno)
);
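To try the schema without a MariaDB server, the sketch below creates the same three tables in the standard-library sqlite3 module and runs the join a grade lookup needs. The sample rows and the function grades_for are made up for illustration. One difference matters here. SQLite enforces the CHECK on Grade. MariaDB before 10.2 (and MySQL before 8.0.16) parses CHECK constraints but silently ignores them, so on MariaDB 5.5 the 0 to 100 range would have to be validated in application code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Student(
    Sno CHAR(9) PRIMARY KEY,
    Sname VARCHAR(20) UNIQUE,
    Sdept CHAR(20),
    Spwd CHAR(20)
);
CREATE TABLE Course(
    Cno CHAR(2) PRIMARY KEY,
    Cname VARCHAR(30) UNIQUE,
    Credit NUMERIC(2,1)
);
CREATE TABLE SC(
    Sno CHAR(9) NOT NULL,
    Cno CHAR(2) NOT NULL,
    Grade INT CHECK (Grade >= 0 AND Grade <= 100),
    PRIMARY KEY (Sno, Cno),
    FOREIGN KEY (Sno) REFERENCES Student(Sno),
    FOREIGN KEY (Cno) REFERENCES Course(Cno)
);
"""


def grades_for(conn, sno):
    # Join SC with Course to get (course name, grade, credit) for one student
    return conn.execute(
        "SELECT Course.Cname, SC.Grade, Course.Credit "
        "FROM SC JOIN Course ON SC.Cno = Course.Cno "
        "WHERE SC.Sno = ? ORDER BY Course.Cno",
        (sno,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Student VALUES ('201500001', 'Zhang San', 'CS1501', 'secret')")
conn.executemany("INSERT INTO Course VALUES (?, ?, ?)",
                 [("01", "Calculus", 5.0), ("02", "Physics", 3.5)])
conn.executemany("INSERT INTO SC VALUES (?, ?, ?)",
                 [("201500001", "01", 92), ("201500001", "02", 78)])
print(grades_for(conn, "201500001"))
```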

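III. Setting up the Python web environment (LAMP):

1. Since I chose Apache as the HTTP server this time, we need mod_wsgi, an Apache module implementing WSGI (the Python Web Server Gateway Interface), so that Apache can talk to the Python program. With nginx you would install and configure uWSGI instead. It plays a role similar to Java servlets or PHP's php-fpm.

Install: yum install mod_wsgi

Configure: vim /etc/httpd/conf/httpd.conf

This configuration took me a lot of time and effort... many of the configurations online are wrong. Here is a working Apache + mod_wsgi configuration for a Django project; help yourself.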

#config python web
LoadModule wsgi_module modules/mod_wsgi.so  
Listen 8080
<VirtualHost *:8080>
    ServerAdmin root@Vito-Yan
    ServerName www.yuol.online
    ServerAlias yuol.online

    Alias /media/ /var/www/html/jwc/media/
    Alias /static/ /var/www/html/jwc/static/
    <Directory /var/www/html/jwc/static/>    
        Require all granted
    </Directory>
    
    WSGIScriptAlias / /var/www/html/jwc/jwc/wsgi.py 
#    DocumentRoot "/var/www/html/jwc/jwc"
    ErrorLog "logs/www.yuol.online-error_log"
    CustomLog "logs/www.yuol.online-access_log" common
    
    # Grant access to wsgi.py only; AllowOverride is not allowed inside <Files>,
    # and WSGIScriptAlias does not need Options ExecCGI
    <Directory "/var/www/html/jwc/jwc">
        <Files wsgi.py>
            Require all granted
        </Files>
    </Directory>
</VirtualHost>

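2. Next, install Django... pip install django.... done. (On Python 2.7, pin it with pip install "django<2.0", since Django 2.0 dropped Python 2 support; this post uses 1.11.)

Check the Django version: python -m django --version

Official site: https://www.djangoproject.com

Create a project: django-admin.py startproject jwc (I created it under /var/www/html, Apache's document root)

3. Apache configuration: I won't paste it again. Take the configuration above, change jwc to jwc2, change the port to 9000, and add Listen 9000. (Why 9000? The first project, jwc, already uses 8080. Django's built-in server, started with python manage.py runserver, defaults to 8000, so I skipped 8000 to avoid conflicts. The Tomcat server for my JSP project uses 9090, so that was out too. 9000 is a common choice, and I didn't want to grab other ports at random.)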

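4. settings.py configuration:

DEBUG = True  (turns on debug mode)

ALLOWED_HOSTS = ['192.168.47.128']  (add the host or IP you browse to; Django answers requests for any other Host with 400 Bad Request)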
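The function below is a simplified, standalone sketch of how Django matches the request's Host header against ALLOWED_HOSTS. host_allowed is a hypothetical name, not Django's API, and the real check also validates the host's format. The port is stripped. '*' matches anything. A pattern starting with '.' matches that domain and its subdomains. Any other pattern must match exactly, case-insensitively. With DEBUG = True and an empty list, Django 1.11 accepts only localhost, 127.0.0.1 and [::1]. That is why the VM's IP has to be added here.

```python
def host_allowed(host_header, allowed_hosts):
    """Simplified version of Django's ALLOWED_HOSTS matching (illustration only)."""
    host = host_header.lower()
    if host.startswith("["):
        # IPv6 literal such as [::1]:8000: keep everything up to the closing bracket
        host = host.partition("]")[0] + "]"
    else:
        # Strip an optional :port suffix
        host = host.rsplit(":", 1)[0]
    for pattern in allowed_hosts:
        pattern = pattern.lower()
        if pattern == "*":
            return True
        if pattern.startswith(".") and (host.endswith(pattern) or host == pattern[1:]):
            return True  # ".example.com" matches example.com and its subdomains
        if host == pattern:
            return True
    return False
```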

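5. wsgi.py configuration. When Apache runs the project through mod_wsgi, wsgi.py is loaded by Apache rather than started from the project directory. As a result, that directory is not on sys.path, and jwc2.settings cannot be imported. The modified version below adds the project directory to sys.path itself. With Django's built-in server, no change is needed...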

"""
WSGI config for jwc2 project.

It exposes the WSGI callable as a module-level variable named ``application``.

For more information on this file, see
https://docs.djangoproject.com/en/1.11/howto/deployment/wsgi/
"""

#import os

#from django.core.wsgi import get_wsgi_application

#os.environ.setdefault("DJANGO_SETTINGS_MODULE", "jwc2.settings")

#application = get_wsgi_application()

import os    
from os.path import join,dirname,abspath    
PROJECT_DIR = dirname(dirname(abspath(__file__)))    

import sys    
sys.path.insert(0,PROJECT_DIR)
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "jwc2.settings")    

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()
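What mod_wsgi actually needs from wsgi.py is the module-level callable named application; get_wsgi_application() returns Django's implementation of it. The hand-written WSGI app below is purely illustrative and is not Django. It shows the contract: the server calls `application(environ, start_response)` once per request, and the function returns an iterable of byte strings. The helper call_app is hypothetical. It uses the standard-library wsgiref module to exercise the app without Apache.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # environ: CGI-style request dict built by the server (mod_wsgi, wsgiref, ...)
    path = environ.get("PATH_INFO", "/")
    body = ("Hello,YUOL! You asked for %s" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]  # an iterable of byte strings


def call_app(app, path):
    """Invoke a WSGI app directly and return (status, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fill in the other required CGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        return lambda data: None  # WSGI requires a write() callable; unused here

    body = b"".join(app(environ, start_response))
    return captured["status"], body
```

To try it in a browser, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` serves the same callable.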

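And that's it.... the Python web environment is set up...

IV. Starting our first Django project app...

1. Create the grade-query app: python manage.py startapp cjcx

2. Register the app in settings.py by adding 'cjcx' to INSTALLED_APPS

3. Write our first lines of code in views.py....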

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.http import HttpResponse
from django.shortcuts import render

# Create your views here.

def index(request):
    return HttpResponse("Hello,YUOL!")

4. Add a URL in urls.py

from django.conf.urls import url 
from django.contrib import admin
import cjcx.views as cj

urlpatterns = [ 
    url(r'^admin/', admin.site.urls),
    url(r'^cjcx/',cj.index),
]

5. Visit /cjcx/ and you get: Hello,YUOL!

6. Step 4 above can also be done another way....

Create a urls.py under the cjcx app:

from django.conf.urls import url 
from . import views

urlpatterns = [ 
    url(r'^$', views.index),
]

Modify urls.py under jwc2 (the project root):

from django.conf.urls import url, include
from django.contrib import admin

urlpatterns = [
    url(r'^admin/', admin.site.urls),
    url(r'^cjcx/', include('cjcx.urls')),
]
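The sketch below is a simplified model of how include() splits the work. It is not Django's actual resolver, and resolve, CJCX_PATTERNS, ROOT_PATTERNS and the string view names are hypothetical. The project-level pattern ^cjcx/ consumes its part of the path. The remainder is then matched against the app's own patterns, which is why the app's index pattern is ^$ rather than ^cjcx/$. As in Django, patterns are matched against the path without its leading slash.

```python
import re

# App-level patterns (like cjcx/urls.py): (regex, view name)
CJCX_PATTERNS = [
    (r"^$", "views.index"),
    (r"^search/$", "views.search_action"),
]

# Project-level patterns (like jwc2/urls.py): a nested list stands for include()
ROOT_PATTERNS = [
    (r"^admin/", "admin.site.urls"),
    (r"^cjcx/", CJCX_PATTERNS),
]


def resolve(path, patterns=ROOT_PATTERNS):
    """Return the view name for path, or None if nothing matches."""
    for regex, target in patterns:
        match = re.search(regex, path)
        if not match:
            continue
        if isinstance(target, list):
            # include(): hand the unmatched remainder to the nested patterns
            found = resolve(path[match.end():], target)
            if found is not None:
                return found
        else:
            return target
    return None
```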

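7. Write the front-end page....

Create a templates folder under the cjcx app for our HTML files and save this one as jwcjcx.html (ignore the dynamic-rendering code for now; I was too lazy to delete it):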

<html>

<head>
    <title>YUOL Grade Query System</title>
    <style type="text/css">
        #border {
            margin: 0 auto;
            width: 500px;
            min-height: 500px;
            background-color: #FFFFFF;
            border: 1px solid #000000;
        }   

        #button {}
    </style>
</head>

<body style="text-align:center">
    <div id="border">
        <h1>YUOL Grade Query System</h1><br/>
        <form action="" method="post"> Student ID:
            <input type="text" id="xuehao" name="Sno" /><br/> Password:
            <input type="password" id="pwd" name="Spwd" /><br/><br/>
            <input type="submit" value="Query" id="submit" /><br/>
            <div style="text-align:left;padding-left:50px;">
                -----------------------------------------------------------<br/> 
                Name: {{ student.Sname }}<br/>
                Student No.: {{ student.Sno }}<br/>
                Class: {{ student.Sdept }}<br/>
            </div>
            -----------------------------------------------------------<br/>
            <div>
               &nbsp;&nbsp; &nbsp;&nbsp;
                <br>
                <div style="display:inline-block;width:150px;">
                     Course:<br>
                    {{ course.Cname }}
                </div>
                <div style="display:inline-block;width:150px;">
                    Grade:<br>
                    {{ sc.Grade }}
                </div>
                <div style="display:inline-block;width:150px;">
                     Credits:<br>
                    {{ course.Credit }}
                </div>

            </div>
        </form>
    </div>
</body>

</html>

Modify views.py:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.http import HttpResponse
from django.shortcuts import render

# Create your views here.

def index(request):
    return render(request, 'jwcjcx.html')

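And then it looks like this.....

8. Design the Models based on the jwc database....

Django uses SQLite by default. To switch to MariaDB, modify settings.py (the mysql backend needs a MySQLdb-compatible driver such as mysqlclient, installed with pip install mysqlclient):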

DATABASES = { 
    'default': {
#        'ENGINE': 'django.db.backends.sqlite3',
#        'NAME': os.path.join(BASE_DIR, 'db.sqlite3'),
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'jwc2',
        'USER':'root',
        'PASSWORD':'<your password>',
        'HOST':'localhost',
        'PORT':'3306',
     }   
}

9. Now create the tables in models.py....

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import models

# Create your models here.

class Student(models.Model):
    Sno=models.CharField(max_length=9,primary_key=True)
    Sname=models.CharField(max_length=20,unique=True)
    Sdept=models.CharField(max_length=20)
    Spwd=models.CharField(max_length=20)

class Course(models.Model):
    Cno=models.CharField(max_length=2,primary_key=True)
    Cname=models.CharField(max_length=30,unique=True)
    Credit=models.DecimalField(max_digits=2, decimal_places=1)

class SC(models.Model):
    Sno=models.CharField(max_length=9)
    Cno=models.CharField(max_length=2)
    Grade=models.IntegerField()

    def __unicode__(self):
        return self.Sno

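This kind of ORM spares you from writing SQL: each table becomes a class that inherits from models.Model, and fields are accessed with plain dot notation... very convenient.

Then generate the migration files: python manage.py makemigrations

Then apply them to the MariaDB database: python manage.py migrate

This creates three tables prefixed with cjcx_ (the others are Django's built-in tables and can be ignored). Note that you have to create the database yourself first: create database jwc2 charset=utf8;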

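10. Use Django admin to manage the data.... The admin is genuinely handy, and it's one of Django's most distinctive advantages...

Create a superuser: python manage.py createsuperuser

Then append /admin to the host address to log in... but the admin's CSS and images are missing, because under Apache, Django does not serve static files itself.

Solution:

Create a static folder under jwc2: static

Edit settings.py: add STATIC_ROOT = "/var/www/html/jwc2/static/" on the last line, and set LANGUAGE_CODE = 'zh-Hans' (for a Chinese admin).

Run: python manage.py collectstatic

Then make sure the static-file lines in the Apache config above (Alias /static/ and its <Directory> block) are uncommented...

After logging in you still can't see the data tables; you need to register the models in admin.py: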

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.contrib import admin
from . import models
# Register your models here.
admin.site.register(models.Student)
admin.site.register(models.Course)
admin.site.register(models.SC)

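Now you can work with the database directly... that's the power of Django.

11. Now for the most important part: the business logic..

Importing the data (the M in MVC, models): here I removed Cno from the Course table (so Cname becomes its primary key) and replaced Cno with Cname in the SC table. This differs from the design above; just drop the database and regenerate the tables.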

#encoding=utf-8
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import MySQLdb
import time
import urllib
import urllib2
import sys 


reload(sys)
sys.setdefaultencoding('utf8')

conn= MySQLdb.connect(
        host='localhost',
        port = 3306,
        user='root',
        passwd='<password>',
        db ='jwc2',
        charset='utf8'
        )   
cur = conn.cursor()

driver = webdriver.PhantomJS()
driver.get("<Academic Affairs login URL>")
driver.find_element_by_name('txtUid').send_keys('<student ID>')
driver.find_element_by_name('txtPwd').send_keys('<password>')
driver.find_element_by_id('btLogin').click()
cookie=driver.get_cookies()
driver.get("http://jwc2.yangtzeu.edu.cn:8080/cjcx.aspx")
driver.find_element_by_name("Button2").click()
html=driver.page_source
#html = open("btAllcj.html","r")
soup = BeautifulSoup(html,"lxml")
Sno = str(soup.find(id="lbXH").getText())
Sname = str(soup.find(id="lbXm").getText())
Sdept =  str(soup.find(id="lbBj").getText())
Student = (Sno,Sname,Sdept,'12345678')
sql = "insert into cjcx_student values(%s,%s,%s,%s)" 
cur.execute(sql,Student)
id = 0
tables = soup.findAll("table")
for tab in tables[1:2]:
    for tr in tab.findAll("tr")[1:]:
        count = 0
        for td in tr.findAll("td"):
            count += 1
            if count==1:
                Cname = td.getText()
            if count==2:
                Grade = td.getText()
                id += 1
                sql = "insert into cjcx_sc values(%s,%s,%s,%s)"
                SC = (id,Sno,Cname,Grade)
                cur.execute(sql,SC)
            if count==3:
                Credit = td.getText()
                sql = "insert into cjcx_course values(%s,%s)"
                Course = (Cname,Credit)
                cur.execute(sql,Course)
conn.commit()
cur.close()
conn.close()
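The count-based loop above can also be written by unpacking each row's first three cells directly. The sketch below assumes it is given the HTML of the grade table and turns it into (course name, grade, credit) tuples ready for cur.executemany(). It is a minimal Python 3 sketch that uses the standard-library html.parser instead of BeautifulSoup. The class TableRows, the function grade_rows and the sample table are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def grade_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    # Skip the header row and any row with fewer than three cells
    return [tuple(r[:3]) for r in parser.rows[1:] if len(r) >= 3]
```

The course inserts could then be sent in one call, e.g. `cur.executemany("insert into cjcx_course values(%s,%s)", [(c, credit) for c, _, credit in rows])`.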

The business logic, views.py (the V in MVC, views)

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.http import HttpResponse
from django.shortcuts import render
from . import models

# Create your views here.

def index(request):
    return render(request, 'jwcjcx.html')

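def search_action(request):
    Sno = request.POST.get('Sno', '')
    Spwd = request.POST.get('Spwd', '')
#The crawler and data-import code goes here.....
    try:
        student = models.Student.objects.get(Sno=Sno)
    except models.Student.DoesNotExist:
        # Unknown student ID: show the empty form again
        return render(request, 'jwcjcx.html')
    if Spwd == student.Spwd:
        sc = models.SC.objects.filter(Sno=Sno)
#       course = models.Course.objects.filter(Cname=sc.Cname)
        return render(request, 'jwcjcx.html', {'student': student, 'sc': sc})
    # Wrong password: a view must always return a response, never None
    return render(request, 'jwcjcx.html')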

Modify urls.py (the C in MVC, Controller)

jwc2 project urls:

from django.conf.urls import url,include
from django.contrib import admin

urlpatterns = [ 
    url(r'^admin/', admin.site.urls),
    url(r'^cjcx/',include('cjcx.urls', namespace='cjcx')),
]

cjcx app urls:

from django.conf.urls import url 
from . import views

urlpatterns = [ 
    url(r'^$', views.index),
    url(r'^search/$',views.search_action,name='search_action'),
]

12. Rendering the data on the front end....

<html>

<head>
    <title>YUOL Grade Query System</title>
    <style type="text/css">
        #border {
            margin: 0 auto;
            width: 500px;
            min-height: 500px;
            background-color: #FFFFFF;
            border: 1px solid #000000;
        }   

        #button {}
    </style>
</head>

<body style="text-align:center">
    <div id="border">
        <h1>YUOL Grade Query System</h1><br/>
        <form action="{% url 'cjcx:search_action' %}" method="post">{% csrf_token %} Student ID:
            <input type="text" id="xuehao" name="Sno" /><br/> Password:
            <input type="password" id="pwd" name="Spwd" /><br/><br/>
            <input type="submit" value="Query" id="submit" /><br/>
            <div style="text-align:left;padding-left:50px;">
                -----------------------------------------------------------<br/> 
                Name: {{ student.Sname }}<br/>
                Student No.: {{ student.Sno }}<br/>
                Class: {{ student.Sdept }}<br/>
            </div>
            -----------------------------------------------------------<br/>
            <div>
               &nbsp;&nbsp; &nbsp;&nbsp;
                <br/>
                <div style="display:inline-block;width:200px;">
                     Course:<br/>
                    {% for sc in sc %}
                    {{ sc.Cname }}<br/>
                    -------------------<br/>
                    {% endfor %}
                </div>
                <div style="display:inline-block;width:100px;">
                    Grade:<br/>
                    {% for sc in sc %}
                    {{ sc.Grade }}<br/>
                    ------<br/>
                    {% endfor %}
                </div>
                <div style="display:inline-block;width:150px;">
                     Credits:<br/>
                    {% for course in course %}
                    {{ course.Credit }}
                    {% endfor %}
                </div>

            </div>
        </form>
    </div>
</body>

</html>
                                                                

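All done......

I wrote this over two days and two nights and really couldn't keep going, so I never got to the credits part......... The crawler is still unstable and there's almost no validation logic... it only implements the basic functionality.

Finally, here's a picture: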
