Problems encountered during my internship

Date: 2019-11-08


●How do I send an HTTP RESTful GET request and receive JSON data?

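import requests

# Pass the query string through `params` so the Chinese city name is URL-encoded.
response = requests.get('https://www.sojson.com/open/api/weather/json.shtml',
                        params={'city': '北京'}, timeout=10)
print(response)

# Parse the response body as JSON.
data = response.json()
print(data)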

Using h2o:

https://blog.csdn.net/gpwner/article/details/74058850

●How do I run it?

This code needs JDK 8; it does not work with JDK 9.

# coding: utf-8

import h2o
h2o.init(port=2341)  # the port is an integer

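Then open http://localhost:2341/ in a browser and you are in.

●After that, follow the blog post above and it runs. How do you add a command? Click outline on the right side of the browser page, click assist, then click importFiles in the panel that appears on the left. Alternatively, click the last icon on the toolbar, the circled question mark.

●h2o is operated almost entirely with the mouse.

Decision-tree models:

https://blog.csdn.net/google19890102/article/details/51746402/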

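●A Bernoulli distribution does not have to be a 0-1 distribution; it can also be an a-b distribution. A random variable that takes only two values is usually called a Bernoulli random variable. The binomial distribution describes n independent repetitions of a Bernoulli trial.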
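The relationship above can be checked numerically. The sketch below is only an illustration: the function names `two_point_mean_var` and `binomial_pmf` are made up for this note, and the binomial part assumes the usual 0-1 coding of each trial.

```python
from math import comb

def two_point_mean_var(a, b, p):
    """Mean and variance of a variable that equals b with probability p and a otherwise."""
    mean = a * (1 - p) + b * p
    var = (b - a) ** 2 * p * (1 - p)
    return mean, var

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent 0-1 Bernoulli trials)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# The 0-1 case and an a-b case (a=2, b=5) differ only by shift and scale.
print(two_point_mean_var(0, 1, 0.3))
print(two_point_mean_var(2, 5, 0.5))
# The binomial probabilities over k = 0..n sum to 1.
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```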

●Random forest did not work well and was worse than a deep network; in the end GBM did best, reaching 12 percent.

Learn PyTorch code from GitHub.

clamp (truncation):

>>> a = torch.randn(4)
>>> a
 1.3869
 0.3912
-0.8634
-0.5468
[torch.FloatTensor of size 4]
>>> torch.clamp(a, min=-0.5, max=0.5)
 0.5000
 0.3912
-0.5000
-0.5000
[torch.FloatTensor of size 4]
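For readers without PyTorch at hand, the same element-wise rule can be written in plain Python. This is a sketch of the semantics only, not of how torch implements it; the helper name `clamp` is chosen here for illustration.

```python
def clamp(values, lo, hi):
    """Limit every element of `values` to the closed interval [lo, hi]."""
    return [max(lo, min(hi, v)) for v in values]

a = [1.3869, 0.3912, -0.8634, -0.5468]
print(clamp(a, -0.5, 0.5))
```

Values already inside the interval pass through unchanged; values outside are replaced by the nearer bound.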

●A classmate who really knows his stuff said to learn from other people's code on GitHub. Time to start.
●https://github.com/jcjohnson has very good tutorials.
●Project 1: a prediction can output a distribution rather than a single number.
●Differences between PyTorch and TensorFlow:
One aspect where static and dynamic graphs differ is control flow. For some models we may wish to perform different computation for each data point; for example a recurrent network might be unrolled for different numbers of time steps for each data point; this unrolling can be implemented as a loop. With a static graph the loop construct needs to be a part of the graph; for this reason TensorFlow provides operators such as tf.scan for embedding loops into the graph. With dynamic graphs the situation is simpler: since we build graphs on-the-fly for each example, we can use normal imperative flow control to perform computation that differs for each input.
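The "normal imperative flow control" mentioned above can be pictured with a toy recurrence in plain Python. This is a scalar sketch with made-up weights (`w_in`, `w_rec`), not PyTorch code; the point is only that the loop length follows each input.

```python
import math

def run_rnn(inputs, w_in=0.5, w_rec=0.9):
    """Apply h = tanh(w_in * x + w_rec * h) once per input element."""
    h = 0.0
    for x in inputs:  # the number of steps depends on this particular example
        h = math.tanh(w_in * x + w_rec * h)
    return h

# Two examples of different length run through the same code path.
print(run_rnn([1.0, 0.5]))
print(run_rnn([1.0, 0.5, -0.2, 0.3, 0.8]))
```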

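●How other people approach forecasting problems:

1. Split time into ordinary and special periods, for example Double 11, 618, and public holidays.

Collect extra information beyond the existing sales data (weather, location, holiday information, and so on).

https://blog.csdn.net/qq_19600291/article/details/74217896 Very inspiring.

http://www.cnblogs.com/maybe2030/p/4585705.html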
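The ordinary/special split described above could be turned into a feature roughly as follows. The dates in `SPECIAL_DAYS` are a hypothetical hand-written list for one year; a real project would load them from a holiday calendar.

```python
from datetime import date

# Hypothetical special days; replace with a real calendar.
SPECIAL_DAYS = {
    date(2018, 6, 18): "618",
    date(2018, 11, 11): "double11",
    date(2018, 10, 1): "national_day",
}

def special_day_feature(d):
    """Return (is_special, label) for a date; ordinary days get (0, 'normal')."""
    label = SPECIAL_DAYS.get(d)
    return (1, label) if label else (0, "normal")

print(special_day_feature(date(2018, 11, 11)))
print(special_day_feature(date(2018, 7, 9)))
```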

Methods under consideration: 1. random forest 2. LSTM

●https://blog.csdn.net/aliceyangxi1987/article/details/73420583 This one is excellent.

●Keras needs very little code and is nicer to use. http://keras-cn.readthedocs.io/en/latest/

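Installing Keras with CUDA support:

conda install pip

pip install tensorflow-gpu

Continuing with the Linux subsystem that ships with Windows.

0. After installation, enter a user name and password. This creates an ordinary user, for example zhangbo.

1. On first use, set the root password:

sudo passwd

After that, su root switches to the root account. As root you no longer need sudo, which is convenient but risky, so use it only when viewing and editing configuration files.

2. For everyday work use the ordinary account: su zhangbo.

3. To change the cmd font, right-click the title bar and choose Properties.

# Installing the CPU Keras packages: https://blog.csdn.net/yangqiang200608/article/details/78719568?locationNum=9&fps=1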

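Anaconda

Creating an environment with Anaconda:

// Create an environment with python=3.6, named py36

conda create -n py36 python=3.6

Remove an environment (do not delete carelessly)

conda remove -n py36 --all

Activate an environment

// py36 below is the environment name

source activate py36

Leave the environment

source deactivate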

●2018-07-09 How to wrap code as a Linux service, using the Ubuntu 16 subsystem on Windows.

1. In /etc/init.d, run vim myservice and enter:

#!/bin/bash
#description: a demo
#chkconfig:2345 88 77
# Created on Mon Jul  9 19:37:31 2018
# author: 張博
lockfile=/var/lock/subsys/myservice
# Note: this touch runs on every invocation, so start() always takes the "already running" branch
touch $lockfile
# start
start(){
        if [ -e $lockfile ] ;then
                sh  ~/tmp.sh 
                echo "Service is already running....."
                return 5
        else
                touch $lockfile
                echo "Service start ..."
                return 0
        fi
}
#stop
stop(){
        if [ -e $lockfile ] ; then
                rm -f $lockfile
                echo "Service is stoped "
                return 0
        else
                echo "Service is not run "
                return 5
        fi
 
}
#restart
restart(){
        stop
        start
}
usage(){
        echo "Usage:{start|stop|restart|status}"
}
status(){
        if [ -e $lockfile ];then
                echo "Service is running .."
                return 0
        else
                echo "Service is stop "
                return 0
        fi
}
case $1 in
start)
        start
        ;;
stop)
        stop
        ;;
restart)
        restart
        ;;
status)
        status
        ;;
*)
        usage
        exit 7
        ;;
esac


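2. Run vim ~/tmp.sh and enter:

echo 'dsfasdjlfs'

3. chmod +x /etc/init.d/myservice

4. service myservice start

5. Screen output: the service ran the tmp.sh script. I put tmp.sh under ~ because, when I tested placing it in /etc/init.d/, I got an error saying tmp could not be opened.

dsfff
Service is already running.....

This wraps the code in a service, so running the command in step 4 is enough to run tmp.sh.

PS: the Windows subsystem shares its file system with Windows. The C: drive is /mnt/c, and each operating system can read and modify the other's files.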
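The C: to /mnt/c mapping can be applied mechanically. The helper below is a small sketch using only `pathlib`; the name `windows_to_wsl` is invented for this note, and it assumes an absolute path with a drive letter.

```python
from pathlib import PureWindowsPath, PurePosixPath

def windows_to_wsl(path):
    """Map a drive-letter Windows path to its /mnt/<drive>/... form in the subsystem."""
    p = PureWindowsPath(path)
    drive = p.drive.rstrip(":").lower()
    if not drive:
        raise ValueError("expected an absolute path with a drive letter")
    # parts[0] is the drive root (e.g. 'C:\\'); the rest are the path components.
    return str(PurePosixPath("/mnt", drive, *p.parts[1:]))

print(windows_to_wsl(r"C:\Users\zhangbo\tmp.py"))
print(windows_to_wsl("c:/nonghang/output998.csv"))
```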

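What the forecasting work still needs:

1. Install the required packages (Keras and so on) on the server.

2. Wrap the .py file as a service.

●Notes on setting up a Keras environment on Linux.

# 1. Update the system packages
sudo apt-get update
sudo apt-get upgrade
# 2. Install pip
sudo apt-get install python-pip
# 3. Check that pip installed correctly
pip -V

pip install tensorflow

pip install keras

Change the service above so that it runs a Python program.

Change step 1 to: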

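#!/bin/bash
#description: a demo
#chkconfig:2345 88 77
lockfile=/var/myservice.back
# Note: this touch runs on every invocation, so start() always takes the "already running" branch
touch $lockfile
# start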
start(){
        if [ -e $lockfile ] ;then
                python3  ~/tmp.py
                echo "Service is already running....."
                return 5
        else
                touch $lockfile
                echo "Service start ..."
                return 0
        fi
}
#stop
stop(){
        if [ -e $lockfile ] ; then
                rm -f $lockfile
                echo "Service is stoped "
                return 0
        else
                echo "Service is not run "
                return 5
        fi
 
}
#restart
restart(){
        stop
        start
}
usage(){
        echo "Usage:{start|stop|restart|status}"
}
status(){
        if [ -e $lockfile ];then
                echo "Service is running .."
                return 0
        else
                echo "Service is stop "
                return 0
        fi
}
case $1 in
start)
        start
        ;;
stop)
        stop
        ;;
restart)
        restart
        ;;
status)
        status
        ;;
*)
        usage
        exit 7
        ;;
esac


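Change step 2 to:

Run vim ~/tmp.py and enter:

print('3232423')

If you see the message that dpkg was interrupted and you must manually run sudo dpkg --configure -a, see:

https://blog.csdn.net/ycl295644/article/details/44536297

Python path issues:

This style can be used on both Windows and Linux:

dataframe = read_csv(r'c:/nonghang/output998.csv',usecols=['Sum']) Writing the path with forward slashes like this does not raise an error on either.

Time series notes:

By type of variable, time series are divided into absolute-value, relative-value, and average-value time series.

Tip: replace the data with its differences to remove the trend, which interferes with prediction.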

# -*- coding: utf-8 -*-
"""
Created on Wed Jul 11 15:54:00 2018

@author: 張博
"""

# Test differencing: it does remove the trend, which makes time series analysis more accurate.
import matplotlib.pyplot as plt

a=[4,5,6,8,9,10,15,20]
b=[a[i+1]-a[i] for i in range(len(a)-1)]
print(b)
x=range(len(a))
plt.plot(x,a,c='red')   # original series
x=range(len(b))
plt.plot(x,b)           # differenced series
plt.show()

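A model trained on differenced data predicts differences, so the forecast has to be converted back to the original scale. A minimal sketch of that round trip, with helper names chosen here for illustration (requires Python 3.8+ for `initial=`):

```python
from itertools import accumulate

def difference(series):
    """First-order differences: series[i+1] - series[i]."""
    return [b - a for a, b in zip(series, series[1:])]

def undo_difference(first_value, diffs):
    """Rebuild the original series from its first value and its differences."""
    return list(accumulate(diffs, initial=first_value))

a = [4, 5, 6, 8, 9, 10, 15, 20]
d = difference(a)
print(d)
print(undo_difference(a[0], d))
```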

Monotonic queue

http://www.cnblogs.com/ECJTUACM-873284962/p/7301757.html#_labelTop

0-1 knapsack variant:

# LeetCode 416 (Partition Equal Subset Sum)
class Solution:
    def canPartition(self, nums):
        """
        :type nums: List[int]
        :rtype: bool
        """
        if sum(nums)%2==1:
            return False
        tmp=sum(nums)//2
        memo={}
        def main(obj,list1):
              if (obj,tuple(list1)) in memo:
                    return memo[(obj,tuple(list1))]
              if obj!=0 and list1==[]:
                    return False
              if obj==0 :
                      return True
              if list1[0]>obj:
                return False
              for i in range(len(list1)):
                    if main(obj-list1[i],list1[:i]+list1[i+1:])==True:
                        memo[(obj,tuple(list1))]=True
                        return True
              memo[(obj,tuple(list1))]=False
              return False
        return main(tmp,nums)

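The memoized search above keys on the whole remaining list, which can grow exponentially. For comparison, here is a sketch of the usual subset-sum formulation of the same problem, which tracks only the set of reachable sums. It is an alternative technique, not the code used above, and it assumes non-negative integers.

```python
def can_partition(nums):
    """Return True if nums can be split into two subsets with equal sums."""
    total = sum(nums)
    if total % 2:
        return False
    target = total // 2
    reachable = {0}  # sums that some subset of the numbers seen so far can produce
    for n in nums:
        reachable |= {s + n for s in reachable if s + n <= target}
        if target in reachable:
            return True
    return target in reachable

print(can_partition([1, 5, 11, 5]))
print(can_partition([1, 2, 3, 5]))
```

The set never holds more than target + 1 values, so the work is bounded by len(nums) * target.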

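A very good Linux course:

https://www.shiyanlou.com/courses/944

TCP/IP protocols

https://www.shiyanlou.com/courses/98/labs/448/document

Setting a proxy on Ubuntu:

vi /etc/apt/apt.conf

Enter:

Acquire::http::Proxy "http://user:password@proxy-host:port";

Acquire::http2::Proxy "http://user:password@proxy-host:port";

Acquire::ftp::Proxy "http://user:password@proxy-host:port";

After that, apt-get downloads packages through the proxy. Pinging a website still fails, because with this method only apt has network access.

Fortunately, the Keras environment on Linux can be installed entirely with apt.

This method works in the Linux subsystem. You may need to restart the subsystem for it to take effect.

sudo apt-get update (this step is required)

It also worked in an Ubuntu 14 virtual machine.

Changing the package source

https://blog.csdn.net/zgljl2012/article/details/79065174/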

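Installing TensorFlow for the GPU.

https://blog.csdn.net/weixin_39290638/article/details/80045236

Versions I used: cuda_9.0.176_win10 and cudnn-9.1-windows10-x64-v7.1 (cuDNN 7.1)

After installation an error appears saying xxx.dll cannot be found.

In that case open cmd with administrator rights, activate the tensorflow-gpu conda environment mentioned above, and test again; then it works.

Running a session program in cmd shows whether the GPU is being used.

Keras needs no extra configuration; it automatically uses the GPU build of TensorFlow.

●How to use this environment from an editor: open Anaconda Navigator,

open cmd from the environment, then type spyder. ------------------------2018-07-13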

Appending to a file in Python: use a+

with open('C:/Users/張博/Desktop/all.txt','a+') as f:
    f.write('77777777777777777777777777777777777777')
    f.write('\n')

● https://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/

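Learning how to handle time series:

1. Make the data stationary, using differencing.

2. Multivariate time series analysis. The best-performing model: https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

It is much slower than the single-variable case, but works well: the first run on the Agricultural Bank data got an error below 9 percent with no tuning.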

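●Sublime multi-line editing:

1. Press ctrl+f, type the text to find, and click Find.

2. Press alt+F3 to select every match.

3. Right-click inside the text area.

4. Left-click the title bar; multiple cursors appear and you can type at all of them.

●Reviewing machine learning algorithms: https://machinelearningmastery.com/start-here/#process

Trying a cloud GPU host: floydhub

Could not connect: the network is not good enough, and the configuration it offers is weak.

Fixing the error could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED when running a CNN:

Downloading cudnn-9.0-windows10-x64-v7.1 fixed it. Unpack it into the CUDA 9.0 folder, overwriting the existing files.

In the package name, the leading 9.0 is the CUDA version and the trailing 7.1 is the cuDNN version. It must match the installed CUDA 9.0.

Summary: tensorflow-gpu runs with CUDA 9.0 and the cuDNN above. Setting up the environment is tedious.

Resolving the conflict between Proxifier and Spyder:

After installing Proxifier, Python stopped working and Spyder would not start. Uninstalling Proxifier fixed everything. Surprising that a proxy tool can break Python.

I now know that Proxifier's proxy rules solve this: set some programs, such as Spyder, to bypass the proxy, and both can be used together.

●Writing bash scripts:

On my own Ubuntu 16 machine,

a shell script failed with let: not found.

Fix: run the script with bash 1.sh. On Ubuntu, sh is dash, which has no let builtin.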

Continuing the image recognition work:

1. Rename files in nested folders so that the names are unique

from os import *
import os
#a='C:/Users/張博/Desktop/文件/圖片項目'
#
#
#tmp=os.listdir(a)
#io=walk(tmp)
#print(io)
#dir_all=[]
#for i in tmp:
#    if path.isdir(a+'/'+i):
#        now_dir=a+'/'+i
#        dir_all.append(now_dir)
#print(dir_all) #獲得第一層全部目錄
#for i in range(1,11):
#  i=str(i)
#  b=a+'/after_process/'+i
#  if os.path.exists(b)==False:
#   os.makedirs(a+'/after_process/'+i)#遞歸的建立目錄
import shutil

# Rename every image under E:/pic2/<package>/<class>/ so that file names are unique.
root = 'E:/pic2'
packages = [os.path.join(root, name) for name in os.listdir(root)]
packages = [p for p in packages if os.path.isdir(p)]
print(packages)
print(len(packages))

counter = 1  # running number appended to each file name to make it unique
for package in packages:
    class_dirs = [os.path.join(package, name) for name in os.listdir(package)]
    class_dirs = [d for d in class_dirs if os.path.isdir(d)]
    print(class_dirs)
    for class_dir in class_dirs:
        class_dir = os.path.abspath(class_dir)
        for filename in os.listdir(class_dir):
            src = os.path.join(class_dir, filename)
            # Drop the 4-character extension ('.png'), append the counter, restore the extension.
            os.rename(src, src[:-4] + str(counter) + '.png')
            counter += 1

# Finally merged; this processed more than 7,000 images.

#for i in dir_all:
#    for j in os.listdir(i):
#        tmp=i+'/'+j
#        if path.isdir(tmp):
#            #i是P1,j是G1   i是包的標記,j是分類
#            k=os.listdir(tmp)
##           k= ['R10_l.png', 'R10_r.png', 'R11_l.png', 'R11_r.png', 'R12_l.png', 'R12_r.png', 'R13_l.png', 'R13_r.png', 'R14_l.png', 'R14_r.png', 'R15_l.png', 'R15_r.png', 'R16_l.png', 'R16_r.png', 'R17_l.png', 'R17_r.png', 'R18_l.png', 'R18_r.png', 'R19_l.png', 'R19_r.png', 'R1_l.png', 'R1_r.png', 'R20_l.png', 'R20_r.png', 'R2_l.png', 'R2_r.png', 'R3_l.png', 'R3_r.png', 'R4_l.png', 'R4_r.png', 'R5_l.png', 'R5_r.png', 'R6_l.png', 'R6_r.png', 'R7_l.png', 'R7_r.png', 'R8_l.png', 'R8_r.png', 'R9_l.png', 'R9_r.png']
#            print(k)
##            for kk in k:
##                
##                os.system("ren kk kk[:-4]+i[-2:]+'.png'")
##                print(kk)
##                fffffffffffffffffff
##            os.system()
#


2. Split the data into test and valid

# -*- coding: utf-8 -*-
"""
Created on Sun Jul 15 17:48:36 2018

@author: 張博
"""

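'''
Right-click > Properties shows 760 images in each image package, which is enough. If it were not,
the Keras image generator could randomly shift, rotate, and zoom the images to augment them.
Augmentation means generating many similar images from one image with these three transforms
and adding them to the dataset, which makes training work better
and more robustly.
'''
from os import *
import shutil
a='C:/Users/張博/Desktop/圖片總結/all_pic'
aa=listdir(a)
print(a)
a=[a+'/'+i for i in aa]
print(a)
for i in a:
    # i is the current class folder
    print(i)
    tmp=listdir(i)
    num=(760*2//3)      # the first two thirds go to 'test', the rest to 'valid'
    test=tmp[:num]
    valid=tmp[num:]
    mkdir(i+'/'+'test')
    mkdir(i+'/'+'valid')
    for ii in test:
        shutil.move(i+'/'+ii,i+'/'+'test')          # move the file
    for ii in valid:
        shutil.move(i+'/'+ii,i+'/'+'valid')          # move the file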

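The split above takes files in directory-listing order. A variant that shuffles first is sketched below; this is an assumption about what might be preferable, not what the script above does. The function name `split_class_dir` and the folder name `train` are hypothetical (the notes use `test` for the training part).

```python
import os
import random
import shutil

def split_class_dir(class_dir, seed=0):
    """Shuffle the files of one class folder and move 2/3 to 'train', the rest to 'valid'."""
    files = sorted(
        name for name in os.listdir(class_dir)
        if os.path.isfile(os.path.join(class_dir, name))
    )
    random.Random(seed).shuffle(files)  # fixed seed keeps the split reproducible
    cut = len(files) * 2 // 3
    parts = {"train": files[:cut], "valid": files[cut:]}
    for subdir, names in parts.items():
        target = os.path.join(class_dir, subdir)
        os.makedirs(target, exist_ok=True)
        for name in names:
            shutil.move(os.path.join(class_dir, name), os.path.join(target, name))
    return {subdir: len(names) for subdir, names in parts.items()}
```

Counting the files instead of hard-coding 760 also keeps the ratio right when a folder holds a different number of images.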

3. Reached 93% accuracy; regularization and BN layers have not been added yet.

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img








##圖像預處理:
#datagen = ImageDataGenerator(
#        rotation_range=10,
#        width_shift_range=1,
#        height_shift_range=1,
#        shear_range=0,
#        zoom_range=0.1,
#        horizontal_flip=True,
#        fill_mode='nearest')
# 
#img = load_img(r'C:/Users/張博/Desktop/cat.png')  # this is a PIL image
#x = img_to_array(img)  # this is a Numpy array with shape (3, 150, 150)
#
#print(x.shape)
##下面的方法把3維圖片變4維圖片
#x = x.reshape((1,) + x.shape)  # this is a Numpy array with shape (1, 3, 150, 150)#這個方法直接加一維
#
#print(x.shape)
## the .flow() command below generates batches of randomly transformed images
## and saves the results to the `preview/` directory
#i = 0
#for batch in datagen.flow(x, batch_size=1,
#                          save_to_dir='C:/Users/張博/Desktop/cat', save_prefix='cat'#默認的生成文件名前綴
#                          , save_format='png'):
#    i += 1
#    if i > 2:
#        break  # otherwise the generator would loop indefinitely
        
#img=load_img(r'C:/Users/張博/Desktop/cat/cat777777_0_1154.png')
#x = x.reshape((1,) + x.shape)
#print(x.shape)



from keras import backend as K
K.set_image_dim_ordering('th')
'''
if "image_dim_ordering": is "th" and "backend": "theano", your input_shape must be (channels, height, width)
if "image_dim_ordering": is "tf" and "backend": "tensorflow", your input_shape must be (height, width, channels)
So the mode has to be switched to 'th' above. Keep this in mind.
'''











      
# Build the network:

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
 
model = Sequential()
model.add(Convolution2D(32, 3, 3, input_shape=(3, 150, 150)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
 
model.add(Convolution2D(32, 3, 3))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
 
model.add(Convolution2D(64, 3, 3))
model.add(Activation('relu'))

model.add(MaxPooling2D(pool_size=(2, 2)))
 
# the model so far outputs 3D feature maps (height, width, features)





model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(10)) # 10 classes, so the last layer has 10 units

model.add(Activation("softmax")) # softmax turns the 10 outputs into class probabilities
 



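'''
I know of roughly two remedies. The first is to add dropout layers. I will skip how dropout works
and focus on usage: dropout can follow many kinds of layers to curb overfitting, most commonly
right after a Dense layer. As for whether dropout belongs between Convolutional and Maxpooling
or after Maxpooling, my advice is to try both. I have seen both placements and cannot say which
is better, though most examples put it between Convolutional and Maxpooling. The dropout rate
also has to be found by trial. One thing I noticed: with dropout above 0.5, validation accuracy
tends to exceed training accuracy, without hurting validation accuracy much; the results are
actually good. My explanation is that dropout acts like an ensemble, and a large rate combines
many models, some of them poor, which pulls training accuracy down. This is only my guess, so
feel free to comment with a better explanation.
The second remedy is parameter regularization, that is, adding L1 or L2 coefficients to some layer declarations,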




keras.layers.normalization.BatchNormalization(
epsilon=1e-06, mode=0, axis=-1, momentum=0.9, weights=None, beta_init='zero',
 gamma_init='one')

'''






# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)
 
# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)
 
# this is a generator that will read pictures found in
# subfolers of 'data/train', and indefinitely generate
# batches of augmented image data



train_generator = train_datagen.flow_from_directory(
        'C:/Users/張博/Desktop/圖片總結/all_pic/test',  # this is the target directory
        target_size=(150, 150),  # all images will be resized to 150x150
        batch_size=20,          # reportedly rarely above 128; 16 or 32 is about right
        class_mode='sparse')  # integer labels, matching the sparse_categorical_crossentropy loss




# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
        'C:/Users/張博/Desktop/圖片總結/all_pic/valid',
        target_size=(150, 150),
        batch_size=20,
        class_mode='sparse')


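# The learning-rate schedule matters a lot: late in training, progress often stalls because the step is too large.
import keras  # needed for keras.callbacks and keras.optimizers below

def schedule(epoch):
    rate=0.7
    if epoch<3:
        return 0.002  # learn faster at the start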
    if epoch<10:
        return 0.001
    if epoch<20:
        return 0.001*rate
    if epoch<30:
        return 0.001*rate**2
    if epoch<100:
       return 0.001*rate**3
    else:
        return 0.001*rate**4
    
learning_rate=keras.callbacks.LearningRateScheduler(schedule)
learning_rate2=keras.callbacks.ReduceLROnPlateau(factor=0.7)


adam=keras.optimizers.Adam( beta_1=0.9, beta_2=0.999, epsilon=1e-08,clipvalue=0.5)#lr=0.001
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=adam,
              metrics=['accuracy'])



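#https://www.cnblogs.com/bamtercelboo/p/7469005.html       a very nice summary of hyperparameter tuning


H=model.fit_generator(
        train_generator,
        steps_per_epoch=2000,     # number of generator batches per epoch; the exact value is not critical,
                                  # it only sets how long an epoch takes, so lower it to see results sooner.
        nb_epoch=500,         # total number of epochs; more is better
        validation_data=validation_generator,callbacks=[learning_rate,
         learning_rate2]
        )
 
# For this 10-class problem the accuracy is all you need to look at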

print(H.history["loss"])
print(H.history["val_loss"])
print(H.history["acc"])


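'''
Understanding batches:
    Many people have different needs. Computing the loss over everyone at once would surely raise everyone's satisfaction,
    but with too many people the gradient of a huge matrix is too slow, so a small group (a batch) is trained on and satisfied instead.
    Because the data is shuffled, each group roughly represents everyone's needs, so a larger batch reflects the whole better,
    but each step may be slower. A batch size between 16 and 100 is the usual choice.
'''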


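Once training is done, write a tuning summary that pins down rough ranges for the parameters. A dynamic learning rate is very important.

Installing MySQL the easy way:

https://blog.csdn.net/NepalTrip/article/details/79492058

The version I chose:

Installation: keep clicking Next. To run it, you must go into the bin directory of the MySQL installation each time and enter mysql -u root -p. This logged in successfully.

If you cannot get in, the MySQL service is not running; start it.

If that fails, try launching C:\Program Files\MySQL\MySQL Server 5.7\bin\mysql.exe directly.

While reviewing, I did not fully understand the log loss of logistic regression, so I went to the books for the derivation.

I believe the derivation in Zhou Zhihua's book is wrong: the maximum likelihood is not written correctly. The derivation is just a matter of substituting into the maximum likelihood formula; it is simply the likelihood function of a binomial distribution.

likelihood = p1^(y1) * (1-p1)^(y0)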
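For reference, the standard route from the Bernoulli likelihood to the log loss can be written out as follows. The notation is an assumption made here, not taken from the book: labels $y_i \in \{0,1\}$ and $p_i = P(y_i = 1 \mid x_i)$.

```latex
\begin{align}
L &= \prod_{i=1}^{n} p_i^{\,y_i}\,(1-p_i)^{\,1-y_i} \\
\log L &= \sum_{i=1}^{n} \bigl[\, y_i \log p_i + (1-y_i)\log(1-p_i) \,\bigr] \\
\text{log loss} &= -\frac{1}{n}\,\log L
\end{align}
```

If every sample shared one probability $p_1$, the product would collapse to $p_1^{n_1}(1-p_1)^{n_0}$, with $n_1$ and $n_0$ the counts of ones and zeros, which appears to be the form noted above.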

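Notes on using Solr:

1. Open http://localhost:8080/solr/index.html#/ to reach the Solr interface.

2. Next, create a solrCore.

In solrHome, create an empty folder named core1.

solrHome already contains someone else's account; copy its conf folder into core1.

In the Solr admin console: Core Admin > Add Core

3. The CSV files need to go onto the Solr server. Since Solr only stores .json, I used the pysolr package.

The script below uploads a CSV file to Solr. PS: this package can also be used to maintain Solr.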

'''
Docs: https://pypi.org/project/pysolr/3.3.0/
There is no .com after localhost
'''
from __future__ import print_function
import csv
import csv,json
b=csv.reader(open(r'C:\Users\張博\Desktop\solr_test.csv'))

b=list(b)
c=b[0]
keys=c

out=[]
for i in range(1,len(b)):
    tmp=b[i]
    out.append(dict(zip(keys,tmp)))

##a=json.dumps( out ,indent=4)
##print(a)
##
##










import pysolr
# This URL matters; it must point at the core (here zhangbo), not at the admin UI page
##solr = pysolr.Solr('http://localhost:8080/solr/jcg/', timeout=10)
solr = pysolr.Solr('http://localhost:8080/solr/zhangbo', timeout=10)


'''
Insert:
'''

# A correctly formatted document; fields may be omitted
##solr.add([
##    {
##        "id": "doc_1",
##        "name": "A test document",
##        "cat": "book",
##        "price": "7.99",
##        "inStock": "T",
##        "author": "George R.R. Martin",
##        "series_t": "A Song of Ice and Fire",
##        "sequence_i": "1",
##        "genre_s": "fantasy",
##    }
##])




'''
Fetch data
'''


# Setup a basic Solr instance. The timeout is optional.
solr = pysolr.Solr('http://localhost:8080/solr/zhangbo', timeout=10)

solr.delete(q='*:*')
results = solr.search('*:*')
print(list(results.__dict__))
print('done printing')
# How you would index data.
solr.add(out)
print('added')


# Search all data in the core
results = solr.search('*:*')

### Search for the document whose id is doc_1
##doc1 = solr.search('id:doc_1')


print(results.__dict__)


'''
# Delete the document whose id is doc_1
solr.delete(id='doc_1') 

# Delete all data
solr.delete(q='*:*')
'''


The script below controls Solr with pysolr, including advanced search.

# Read JSON data from Solr



import pysolr
# This URL matters; it must point at the core (here zhangbo), not at the admin UI page
##solr = pysolr.Solr('http://localhost:8080/solr/jcg/', timeout=10)
solr = pysolr.Solr('http://localhost:8080/solr/zhangbo', timeout=10)
print(423)

'''
Insert:
'''

# A correctly formatted document; fields may be omitted
##solr.add([
##    {
##        "id": "doc_1",
##        "name": "A test document",
##        "cat": "book",
##        "price": "7.99",
##        "inStock": "T",
##        "author": "George R.R. Martin",
##        "series_t": "A Song of Ice and Fire",
##        "sequence_i": "1",
##        "genre_s": "fantasy",
##    }
##])





'''
Fetch data
'''
solr = pysolr.Solr('http://localhost:8080/solr/zhangbo', timeout=10)

# All operations must go to the Solr at this address.
solr.delete(q='*:*')
print(324234)
solr.add([
    {
        "idd": "doc_1",
        "title": "A",
        
    },
    {
        "idd": "doc_2",
        "title": "B",
        
    },
    {
        "idd": "doc_1",
        "title": "C",
        
    },
])


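# Print everything
print(list(solr.search('*:*')))

'''

# After a day of testing I finally worked out advanced search. When stuck, read the source of this Python package.

# search() takes **kwargs, i.e. a dict. Each key is a query parameter,

# and each value is what you would type into that parameter's input box. The idea also came from CSDN:

https://blog.csdn.net/sinat_33455447/article/details/63341339

That page puts the page number after start, so by analogy fq is followed by a filter condition.

'''     

doc1 = solr.search("idd:doc_1" , **{"fq":'title:A'})



# Wrap the result in list() after reading; this drops the extra metadata
doc1=list(doc1)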

print(doc1)




# Setup a basic Solr instance. The timeout is optional.



print('打印完了')


4. The script finally used for batch upload:

'''
Docs: https://pypi.org/project/pysolr/3.3.0/
There is no .com after localhost
'''
from __future__ import print_function
import csv
import csv,json
b=csv.reader(open(r'C:\Users\張博\Desktop\201707\20170701.csv'))

b=list(b)
c=b[0]
keys=c
import pysolr
out=[]
print('kaishi' )
solr = pysolr.Solr('http://localhost:8080/solr/zhangbo', timeout=10)

# Delete everything first
solr.delete(q='*:*')

# Index the rows in batches of 1000 (row 0 is the header)
j=1
while j <len(b) :
    print('Indexing rows %s to %s' % (j, min(j+1000, len(b))-1))
    tmp2=b[j:j+1000]
    for i in tmp2:
        out.append(dict(zip(keys,i)))
    # How you would index data.
    solr.add(out)
    j+=1000
    out=[]

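The batching idea in the script above can also be written without loading the whole CSV into memory. The sketch below uses only the standard library; `csv_batches` is a name invented here, and the final `solr.add(docs)` call is left as a comment because it needs a running Solr core.

```python
import csv
import io

def csv_batches(lines, batch_size=1000):
    """Yield lists of row dicts, at most batch_size rows per list."""
    reader = csv.DictReader(lines)  # the first row supplies the field names
    batch = []
    for row in reader:
        batch.append(dict(row))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

sample = io.StringIO("id,title\n1,A\n2,B\n3,C\n")
for docs in csv_batches(sample, batch_size=2):
    print(docs)  # each list could be passed to solr.add(docs)
```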

2018-07-17, 10:52. Continuing the LSTM work

1. Process the data and add features

# -*- coding: utf-8 -*-
"""
Created on Tue Jul 17 10:48:30 2018

@author: 張博
"""




import pandas as pd

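# Concatenate all the data from day 1 to day 30; weekends are not excluded this time
print(3423)
list_all=[]
for i in range(1,31):
     index_now='%02d'%i
     path='C:/Users/張博/Desktop/201707/201707'+index_now+'.csv'
     # A path containing Chinese characters raised an error in read_csv, so open the file first
     with open(path) as f:
         tmp=pd.read_csv(f)
     list_all.append(tmp)
all_data=pd.concat(list_all)


print(34234)

# Select part of the data; note that the values in 'ret_code' contain trailing spaces

# Split by province and forecast each province separately. To forecast a province, feed its data to that province's model.
data=all_data # data now holds all 30 days. Tests showed that splitting on both conditions works better than on Province alone.
a=data['Province']==99 
b=data['ret_code']=='N         '

c=a&b
tmp=data[c]
# tmp is the data for the condition Province 99, ret_code N

# Next, merge the rows by hour:
# enumerate every possible combination, then query each one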

print('over')

print(set(tmp['YY']))
frames=[]   # renamed from `list` to avoid shadowing the builtin
for yy in set(tmp['YY']):
    for mm in set(tmp['MM']):
        for dd in set(tmp['DD']):
            for hh in set(tmp['HH']):
                now=tmp.query('YY==@yy and MM==@mm and DD==@dd and HH==@hh')
                a=now['sum(BPC_BOEING_201707_MIN.cnt)'].sum()
                # Build a one-row table for this hour
                df = pd.DataFrame([{'YY':yy, 'MM':mm,'DD':dd,'HH':hh,'Province':99,'ret_code':'N', 'Sum':a}]
                                  ,columns =['YY','MM','DD','HH','Province','ret_code','Sum'])
                
                frames.append(df)
a=pd.concat(frames)
#a.to_csv(r'e:\output_nonghang\output998.csv')
# This gives 700-odd rows, used below to forecast the 24 hourly 99_N transaction volumes on the 31st


'''
Oddly, this only works after saving and reloading. A likely reason: concat keeps each one-row
frame's index (all 0), while read_csv builds a fresh 0..n-1 index that a.loc[i, ...] below relies on.
'''
print(type(a))                
a.to_csv(r'e:\output_nonghang\output998.csv')
print(type(a))
a=pd.read_csv(r'e:\output_nonghang\output998.csv')




import pandas as pd




a['holiday']=0
#a=a.loc(a['DD'].isin([1,2,8,9,15,16,22,23,29,30]))
#help(a)
#print(a)
# Iterate over the rows


for i in range(len(a)):
    if a.loc[i,'DD'] in [1,2,8,9,15,16,22,23,29,30]:
        a.loc[i,'holiday']=1

a.to_csv(r'e:\output_nonghang\out.csv')
a=a[['DD','HH','holiday','Sum']]
print(len(a))
print(len(a.loc[0]))
print(a)
a.to_csv(r'e:\output_nonghang\out.csv',index=None)

# Next, use this data (4 columns in total) to forecast the last column, Sum.

'''
Running this program produces out.csv.
It holds the 4-column data for the 99, N category.
'''

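The hard-coded day list in the script above matches the Saturdays and Sundays of July 2017, assuming the data covers that month as the file names suggest. A sketch that derives the list from the calendar instead (the helper name `weekend_days` is made up here):

```python
from datetime import date

def weekend_days(year, month, n_days):
    """Days of the month (1..n_days) that fall on Saturday or Sunday."""
    return [d for d in range(1, n_days + 1)
            if date(year, month, d).weekday() >= 5]  # Monday=0 ... Sunday=6

print(weekend_days(2017, 7, 30))
```

Public holidays that fall on weekdays are not covered by this; they would still need a separate list.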

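Sublime editing tip:

Hold shift and drag with the right mouse button to draw a box. This gives multi-line editing over exactly the region you select.

2018-07-17, 20:10. Summary of the LSTM forecasting project:

Through this work I picked up the basic approach to handling large data sets.

1. Plot the raw data and analyze the overall trend and the noisy points.

The plot shows that the data on the 23rd is very low. Checking the data, the value at 3 a.m. that day is 3,000, about 80 times smaller than at 3 a.m. the day before. Following the usual time series practice, the earliest part of the data is dropped (it has no history) and the last third of the rest is used for prediction, which happens to include this point. That is why the final forecast looked poor.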
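A drop like the one described above can be flagged automatically by comparing each hour with the same hour one day earlier. This is a rough sketch: the function name, the `factor` threshold, and the sample numbers are all illustrative, apart from the 3,000 value and the 80x ratio quoted above.

```python
def hourly_drops(series, period=24, factor=10.0):
    """Indices whose value is at least `factor` times smaller than one period earlier."""
    flagged = []
    for i in range(period, len(series)):
        prev, cur = series[i - period], series[i]
        if prev > 0 and cur * factor <= prev:
            flagged.append(i)
    return flagged

# Two days of hourly data; hour 3 of the second day collapses to 3000.
day = [240000] * 24
series = day + [240000] * 3 + [3000] + [240000] * 20
print(hourly_drops(series))
```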

2. Time series forecasting can use a multivariate LSTM.

Reference: https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
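Before an LSTM can be trained, the series has to be framed as supervised samples: the previous look_back rows as input and the next target value as output. The sketch below shows that framing with plain lists; it is a simplified stand-in for the pandas-based approach in the referenced tutorial, and `make_windows` is a hypothetical helper name.

```python
def make_windows(rows, look_back, target_col=0):
    """rows: list of equal-length feature lists ordered by time.

    Returns X with shape [samples][look_back][features] and y with one target per sample.
    """
    X, y = [], []
    for t in range(look_back, len(rows)):
        X.append(rows[t - look_back:t])   # the look_back rows before time t
        y.append(rows[t][target_col])     # the target value at time t
    return X, y

rows = [[i, i * 10] for i in range(5)]    # toy data: 5 time steps, 2 features
X, y = make_windows(rows, look_back=2)
print(X[0], y[0])
```

With LOOK_BACK = 24 and 4 features, each sample would hold 24 x 4 values, which is the 3D input shape [samples, timesteps, features] that Keras recurrent layers expect.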

# -*- coding: utf-8 -*-
"""
Created on Tue Jul 17 10:54:38 2018

@author: 張博
"""












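# A very detailed tutorial. The final multivariate, multi-step model is probably the best one for real use, and is what this .py file implements.
#https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/

'''
SUPER_PARAMETER: hyperparameters are usually placed at the top of the code so they are easy to find and change
'''
EPOCH=1
LOOK_BACK=24
n_features = 4         # number of columns in out.csv (Sum, DD, HH, holiday); no need to change it for this problem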

n_hours = LOOK_BACK


import pandas as pd

from pandas import read_csv
from datetime import datetime
# load data
def parse(x):
    return datetime.strptime(x, '%Y %m %d %H')
data = read_csv(r'E:\output_nonghang\out.csv')


# Move the target column (Sum) to the front with slicing and concat,
# because the template below expects the target in the first column.
tmp1=data.iloc[:,:3]
tmp2=data.iloc[:,3]

print(tmp1)
print(tmp2)
print(type(tmp1))
data=pd.concat([tmp2,tmp1],axis=1)
print(data)













#data.to_csv('pollution.csv')






from pandas import read_csv
from matplotlib import pyplot
# load dataset
dataset = data
values = dataset.values



# specify columns to plot
groups = [0, 1, 2, 3, 5, 6, 7]
i = 1


from pandas import read_csv
from matplotlib import pyplot
# load dataset
#dataset = read_csv('pollution.csv', header=0, index_col=0)
##print(dataset.head())
#values = dataset.values
# specify columns to plot
#groups = [0, 1, 2, 3, 5, 6, 7]
#i = 1
# plot each column
#pyplot.figure()
# Each row of the figure shows one column, so there are 7 subplots for the 7 column indicators.
#for group in groups:
#    pyplot.subplot(len(groups), 1, i)
#    pyplot.plot(values[:, group])
#    pyplot.title(dataset.columns[group], y=0.5, loc='right')
#    i += 1
##pyplot.show()



from math import sqrt
from numpy import concatenate
from matplotlib import pyplot
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
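# load dataset
# Note: this re-reads 'pollution.csv' and overrides the `dataset` built from out.csv above;
# it only matches if data.to_csv('pollution.csv') (commented out above) was run beforehand.
dataset = read_csv('pollution.csv', header=0, index_col=0)
values = dataset.values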
# integer encode direction
# LabelEncoder just maps labels to integers, e.g. 1,23,5,7,7 becomes 0,3,1,2,2
#print('values')
#print(values[:5])
#encoder = LabelEncoder()
#values[:,4] = encoder.fit_transform(values[:,4])
## ensure all data is float
#values = values.astype('float32')
#print('values_after_endoding')
# numpy to pandas
import pandas as pd
#pd.DataFrame(values).to_csv('values_after_endoding.csv')
# The result shows that the encoder turns categorical data into numeric values,
# which is convenient for regression.
#print(values[:5])
# normalize features first
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(values)
print('data after normalization')

pd.DataFrame(scaled).to_csv('values_after_normalization.csv')

# frame as supervised learning




# convert series to supervised learning
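#n_in: how many earlier time steps to read; n_out: how many later time steps to read.
# For multiple variables the same number is read for each; for simplicity use the largest.
#print('test shift')
#
#df = DataFrame(scaled)
#print(df)      # The test shows that shift moves all the data down or up together.
#print(df.shift(2))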
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols, names = [],[]
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
    # put it all together
    agg = concat(cols, axis=1)
    agg.columns = names
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg




















# series_to_supervised arranges the columns of the multivariate time series.

reframed = series_to_supervised(scaled, LOOK_BACK, 1)
print(reframed.shape)
# drop columns we don't want to predict
# Only var1(t) is predicted, so the other columns at time t are discarded.








# split into train and test sets
values = reframed.values
n_train_hours = int(len(scaled)*0.67)
train = values[:n_train_hours, :]
test = values[n_train_hours:, :]
# split into input and outputs
n_obs = n_hours * n_features
train_X, train_y = train[:, :n_obs], train[:, -n_features]
test_X, test_y = test[:, :n_obs], test[:, -n_features]
#print(train_X.shape, len(train_X), train_y.shape)
#print(test_X.shape, len(test_X), test_y.shape)
#print(train_X)
#print(9999999999999999)
#print(test_X)












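# Here timesteps = n_hours (LOOK_BACK).
# The reshape shows the one difference from the single-variable case:
#   single-variable feature length = look_back
#   multivariate feature length    = look_back*len(variables)
# reshape input to be 3D [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_hours, n_features))
test_X = test_X.reshape((test_X.shape[0], n_hours, n_features))
#print(train_X.shape, train_y.shape, test_X.shape, test_y.shape)

'''
With a small network the bottleneck is data transfer between CPU and GPU, so CPU-only can be faster.
With a large network the GPU speed-up becomes noticeable.


Note: GPU memory, like main memory, is volatile and is cleared on shutdown.
'''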



print('start training')

# design network
import keras
from keras import regularizers
from keras import optimizers

model = Sequential()

model.add(keras.layers.recurrent.GRU(77, input_shape=(train_X.shape[1], train_X.shape[2]),activation='tanh', 
                                     recurrent_activation='hard_sigmoid',
                                     kernel_regularizer=regularizers.l2(0.01),
                                     recurrent_regularizer=regularizers.l2(0.01)
                                     , bias_regularizer=regularizers.l2(0.01), 
                                     dropout=0.2, recurrent_dropout=0.2))

#model.add(Dense(60, activation='tanh',kernel_regularizer=regularizers.l2(0.01),
#                bias_regularizer=regularizers.l2(0.01)))



def schedule(epoch):
    rate=0.3
    if epoch<3:
        return 0.002  # learn faster at the start
    if epoch<10:
        return 0.001
    if epoch<20:
        return 0.001*rate
    if epoch<30:
        return 0.001*rate**2
    if epoch<70:
       return 0.001*rate**3
    else:
        return 0.001*rate**4
import keras
learning_rate=keras.callbacks.LearningRateScheduler(schedule)
learning_rate2=keras.callbacks.ReduceLROnPlateau(factor=0.5)
#input_dim：輸入維度，當使用該層爲模型首層時，應指定該值（或等價的指定input_shape)

model.add(Dense(1, activation='tanh'))

#loss:mse,mae,mape,msle
adam = optimizers.Adam(lr=0.001, clipnorm=1.)
model.compile(loss='mape', optimizer=adam)
# fit network
# Passing validation_data to fit() removes the need to call predict by hand; the history curves can be plotted directly
history = model.fit(train_X, train_y, epochs=EPOCH, batch_size=1,
                    validation_data=(test_X, test_y),
                    verbose=2, shuffle=False,callbacks=[learning_rate,
         learning_rate2])
# plot history
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
pyplot.show()



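# Once training is done, predict directly.
# make a prediction
yhat = model.predict(test_X)         # yhat stands for the math symbol y with a hat on top,
                             # used in statistics for the predictions an algorithm gives on the test set
test_X = test_X.reshape((test_X.shape[0], n_hours*n_features))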
# invert scaling for forecast







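# The scaler was fitted on the original full matrix, so to invert it the matrix has to be reassembled with the same shape.
inv_yhat = concatenate((yhat, test_X[:, -(n_features-1):]), axis=1)
inv_yhat = scaler.inverse_transform(inv_yhat)
inv_yhat = inv_yhat[:,0]  # after inverting, pull the target column back out; the multivariate case needs these extra steps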
# invert scaling for actual



print(test_y)
print(99999999999)
test_y = test_y.reshape((len(test_y), 1))
inv_y = concatenate((test_y, test_X[:, -(n_features-1):]), axis=1)
inv_y = scaler.inverse_transform(inv_y)
inv_y = inv_y[:,0]














# calculate RMSE
rmse = sqrt(mean_squared_error(inv_y, inv_yhat))

print('Mean absolute percentage error:')
# Some values of this pollution index are 0, which distorts the percentage error badly
#print(inv_y.shape)
#print(inv_yhat.shape)
wucha=abs(inv_y-inv_yhat)/(inv_y+0.1)
#print(wucha)


#with open(r'c:/234/wucha.txt','w') as f:
#      print(type(wucha))
#      wucha2=list(wucha)
#      wucha2=str(wucha2)
#      f.write(wucha2)




wucha=wucha.mean()
print(wucha)



inv_y=inv_y
inv_yhat=inv_yhat

#print('Test RMSE: %.3f' % rmse)
import numpy as np


pyplot.plot(inv_y,color='black')
pyplot.plot(inv_yhat     ,color='red')

pyplot.show()




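'''
Template for writing a file:
with open(r'c:/234/wucha.txt','w') as f:
      wucha=str(wucha)
      f.write(wucha)
'''



'''
Testing what value the lookback parameter should take.
Zeros in the data distort the absolute percentage error, so RMSE is used as the metric instead:
lookback=1:    rmse:26.591
lookback=30      Test RMSE: 26.124
'''


'''
Manual breakpoint: raise NameError  # a reliable way to stop execution at a chosen line
'''

'''
Plotting template:
from matplotlib import pyplot
data=[]
pyplot.plot(inv_y,color='black')


pyplot.show()

'''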


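3. Final error: 8.4% MAPE

3. Tuning and modeling:

Normalize the input data first; use a dynamic learning rate; pay attention to parameter initialization and regularization; the history object shows whether the model is overfitting. To raise GPU utilization, run several Python programs at once from different IDEs.

Running them all from cmd also works: cmd can run several Python processes at the same time without interference, which raises GPU utilization a lot and speeds up tuning.

https://machinelearningmastery.com/cnn-long-short-term-memory-networks/

Worth trying.

Installing/uninstalling the Ubuntu subsystem on Windows: if something breaks, uninstall and reinstall.

https://blog.csdn.net/shixuehancheng/article/details/52173267

https://www.cnblogs.com/shellway/p/3699982.html Java example

Java notes:

Syntax error on token "Invalid Character", delete this token / Syntax error on tokens, delete these tokens: this error means spaces and tabs have been mixed.

Planning to install Hadoop:

http://archive.apache.org/dist/hadoop/core/

Did not succeed; too many pitfalls.

Shiyanlou is better: it works out of the box. Its clipboard feature is very useful: code from Windows can be pasted straight into the Shiyanlou lab environment, and copying between code inside Shiyanlou introduces no garbage characters.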

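How to generate multiple hash functions

Here is a quick way to generate many hash functions.

Suppose you urgently need 1000 hash functions, all mutually independent with no correlation between them. The wrong approach is to go looking for 1000 hash functions online. Instead, 1000 independent hash functions can be generated from a single one.

Suppose you have a hash function f whose output range is 2^64, i.e. a 16-character string in which every position is a hexadecimal digit 0-9, a-f.

Split the 16 characters of output into two halves. The high eight and the low eight characters are independent of each other (all 16 positions are mutually independent). Take the high eight as the output of a new hash function f1 and the low eight as the output of a new hash function f2; this gives two new hash functions that are independent of each other.

The 1000 hash functions can then be obtained from:

f1+2*f2=f3 f1+3*f2=f4 f1+4*f2=f5 ……

It can be proved mathematically that f3, f4 and the later hash functions are uncorrelated. Readers with a good math background can look up the material and try the proof; it is not given here.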
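The construction above can be sketched in Python. This is only an illustration under assumptions that are not in the original notes: SHA-256 from `hashlib` stands in for the base function f, the first 16 bytes of its digest are split into two 64-bit halves (rather than two 8-character hex strings), and the names `base_halves`, `make_hash` and `hashes` are made up for the example.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep every result inside a 2^64 output range

def base_halves(key):
    """Return (f1, f2): the high and low 64 bits of a 128-bit digest prefix."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()[:16]
    f1 = int.from_bytes(digest[:8], "big")
    f2 = int.from_bytes(digest[8:], "big")
    return f1, f2

def make_hash(i):
    """Build the i-th derived hash function: key -> (f1 + i * f2) mod 2^64."""
    def h(key):
        f1, f2 = base_halves(key)
        return (f1 + i * f2) & MASK64
    return h

# 1000 derived hash functions obtained from a single base function
hashes = [make_hash(i) for i in range(1, 1001)]
```

Each call such as `hashes[0]("shiyanlou")` is deterministic, and different indices give different values for the same key as long as f2 is non-zero.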

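The classic structure of a hash table

In data structures, a hash table is first described as an array of pointers, where each element of the array points to the head of a linked list.

A hash table stores (key, value) records. It supports put(key, value), and likewise get(key) and remove(key). To put (insert) a record, take the key and compute its hashcode with the hash function. If the space reserved in advance has size 16, take the hashcode modulo 16 to get an integer between 0 and 15, then hang the record (key and value) under the corresponding slot.

Note: the slot a record goes to depends only on the key, not on the value.

For example, suppose the following record is inserted into the hash table:

"shiyanlou", 666 # key is shiyanlou, value is 666

First compute the hashcode of shiyanlou with the hash function and take it modulo 16. Say the result is 6. The hash table first checks whether slot 6 already holds data. If it does, it checks whether a node's key equals shiyanlou: if so, that node's value is replaced by 666; if not, a new node holding the record is appended to the end of the linked list.

By the properties of the hash function, hashcodes are spread uniformly over the output range, so after taking them modulo 16 each of the values 0-15 occurs about equally often. This means the linked lists under the slots have similar lengths.

Among the common data structures, arrays are easy to index but hard to insert into and delete from, while linked lists are hard to index but easy to insert into and delete from. A hash table is easy to index and also easy to insert into and delete from, which is evident from its structure.

In practice, lookup in a hash table is close to O(1): computing the hashcode from the key takes constant time, and so does indexing the array. The linked list under each slot does not grow long, because once it reaches a certain length the hash table is expanded, so traversing the list also takes constant time.

So inserting, deleting, updating or looking up a record in a hash table can be treated as O(1).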
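A minimal sketch of the structure just described, under a few assumptions that are not in the notes: Python lists stand in for the linked lists, the built-in `hash` is the hash function, and the table doubles once the average chain length exceeds 2 (the notes only say expansion happens "at a certain length"). The class name `ChainedHashTable` is made up for the example.

```python
class ChainedHashTable:
    def __init__(self, capacity=16):
        self.buckets = [[] for _ in range(capacity)]  # one chain per slot
        self.size = 0

    def _index(self, key):
        # the slot depends only on the key, never on the value
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: replace the value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # otherwise append to the end of the chain
        self.size += 1
        if self.size > 2 * len(self.buckets):
            self._resize()

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def remove(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self.size -= 1
                return True
        return False

    def _resize(self):
        # double the number of slots and re-hang every record
        old_items = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for key, value in old_items:
            self.buckets[self._index(key)].append((key, value))
```

For example, `t = ChainedHashTable(); t.put("shiyanlou", 666); t.get("shiyanlou")` returns the stored value, and a second `put` with the same key overwrites it instead of adding a node.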

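Ubuntu download:

http://ubuntu.cn99.com/ubuntu-releases/14.04/

With the proxy switched on, VMware stopped working again.

The image used: ubuntu-14.04.5-desktop-amd64.iso

With this one the network works.

Updating VMware Tools makes it possible to set the screen resolution inside Ubuntu.

Using Ubuntu: press the Win key and type t; Terminal shows up, and it can be dragged onto the launcher on the left. Once it is open, the Edit menu at the top left of the screen has the setting for the font. Linux with a graphical desktop is much simpler.

Installing more software:

java:https://www.linuxidc.com/Linux/2015-01/112030.htm

How to log in to the graphical desktop as root, so that files can be copied and pasted anywhere.

I tried n methods found online, but my Ubuntu has no System > Preferences > ... menu at all.

Finally got it working.



The username and password entered when installing Ubuntu belong to an ordinary user, not root; that user cannot even create a folder.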



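First enable root

$ sudo passwd root

Enter the root password you want

Then

$ su

# vim /etc/lightdm/lightdm.conf

# o // opens a new line and enters insert mode

On the last line enter greeter-show-manual-login=true

Press Esc

Type :wq // save and quit


After the change the file reads:

[SeatDefaults]

greeter-session=unity-greeter

user-session=ubuntu

greeter-show-manual-login=true

Reboot the machine

The login screen now lets you choose a user; log in as root.




That is: choose Restart from the shutdown menu. After the reboot a "Login" entry appears; type root there,
then the password, and you are in a graphical root session.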


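Multi-user caveat:

Always switch to the right user before setting up an environment; environments are not shared between users!

Setting the default file template in Spyder:

Tools > Preferences > Editor > Advanced settings

https://www.cnblogs.com/wubdut/ Brother Bin's blog (Wu Bin)

django

1. pip install django

2. Open VS2017

3. New Project > Python > Web Project > Django

https://www.cnblogs.com/feixuelove1009/p/5823135.html (note one difference: in VS2017, views lives inside the app folder)

4. When the code is written, open cmd in the project folder and run python manage.py runserver 127.0.0.1:8000

5. In the browser open http://127.0.0.1:8000/index/

Note: add a proxy rule in Proxifier with target 127.0.0.1:65535 and action Direct. Proxifier then no longer reroutes this address, and the page stays reachable.

6. Installing the required Django version: https://www.cnblogs.com/ld1226/p/6637998.html

Tutorial: https://docs.djangoproject.com/en/2.0/intro/tutorial01/

Reference: https://www.cnblogs.com/feixuelove1009/p/5823135.html

Windows 10 activation

http://www.xitongzhijia.net/soft/79575.html Turn off the firewall first, then activation succeeds.

Resolving the conflict between VS2017 and Proxifier.

Add python.exe and pythonw.exe from the Anaconda Python installation to Proxifier's rule list and set them to Direct.

How to set up Python projects such as Scrapy without PyCharm (PyCharm needs too powerful a machine)

I still use VS2017. It is much more powerful than Spyder, and its debugging is more detailed.

How to create a project in VS2017 when VS2017 has no template for it:

1. Create a directory on drive D named 爬蟲練習 (crawler practice)

2. Open PowerShell or cmd in that directory and run scrapy startproject ArticleSpider

3. Start VS2017 and choose File > New > Project > Python > From Existing Python Code, select the 爬蟲練習 folder above, then OK > Next > Finish. The folder is turned into a project automatically.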

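NetEase Cloud Classroom:

Advanced Linux system administration: LVM (logical volumes)

Software testing: V-model, W-model, regression testing, smoke testing, alpha, beta, Pareto principle (the 80/20 rule)

Black-box: equivalence partitioning, boundary value analysis, cause-effect graphs, orthogonal arrays, scenario method

What is an orthogonal array: (1) in every column, each value appears the same number of times. (2) in any two columns, every combination of values appears, and equally often. In other words, only the full combinations of any 2 factors are covered, not the full combinations of all n factors.

So an experiment with 3 factors of 2 levels each needs only 4 rows. How is that worked out?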
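One way to see where the 4 rows come from: any two columns must show all 2 × 2 = 4 level combinations, so at least 4 rows are needed, and 4 turn out to be enough. The sketch below builds the standard L4(2^3) array by taking the third factor as the XOR of the first two, and checks the two properties listed above. The helper name `is_orthogonal` is made up for the example.

```python
from collections import Counter
from itertools import combinations, product

# L4(2^3): columns 1 and 2 run through all 4 combinations, column 3 is their XOR
rows = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

def is_orthogonal(rows, levels=2):
    """True if every pair of columns contains each level pair equally often."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        counts = Counter((r[i], r[j]) for r in rows)
        all_pairs_present = len(counts) == levels ** 2
        equally_often = len(set(counts.values())) == 1
        if not (all_pairs_present and equally_often):
            return False
    return True

print(rows)                 # the 4 test cases
print(is_orthogonal(rows))  # pairwise balance holds
```

The full factorial design would need 2^3 = 8 rows; the orthogonal array covers every pair of factors with half of them.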

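White-box: logic coverage, decision coverage, statement coverage

Cyclomatic complexity: edges - nodes + 2 = the number of regions the control-flow graph divides the plane into

Linux commands:

umask sets the default permissions of newly created files (the name comes from "user's mask": the default permission works like a mask)

chattr / lsattr set and view file attributes　　setuid: chmod u+s xxx # sets the setuid bit

ln -s /tmp/yum.log /root/111/yum.log creates a symbolic link. (Note: if the link was created with a relative path, moving the link can break it because the source file is no longer found.)

Note: with ln -s the source file path comes first, the link path second.

Hard links: cannot be made for directories. They are based on the inode; deleting one name does not affect the other. Hard links also cannot cross partitions.

ls -i shows a file's inode, which identifies where the file is actually stored.

A very practical use of symbolic links: relocating files to gain disk space.

Take /boot/aming.log: /boot is already full and /root still has plenty of room, but writes must keep going to the /boot path. A symbolic link does this:

cp /boot/aming.log /root/aming.log

rm /boot/aming.log

ln -s /root/aming.log /boot/aming.log

These 3 commands are all it takes. The LVM technique above can of course also be used to grow a logical volume.

https://ke.qq.com/webcourse/index.html#cid=200566&term_id=100237673&taid=1277907389583222&vid=c1412o4kuqj

An excellent operating systems course.

Anything that involves I/O involves the blocked state; anything that does not involve I/O does not.

Incrementing or decrementing a variable goes through a register, so it translates to 3 machine instructions (load, modify, store).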

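Using record semaphores:

Record semaphores are very useful. Example:

One printer, two jobs a and b.

The semaphore is a description of the printer resource.

Call it typer. It has a field value=1 (initialise it to 2 for 2 printers), a field L=[] (the wait queue), and two methods, wait and signal.

a arrives first. A process that wants to print calls wait, so value becomes 0. Since 0 is not less than 0, a runs straight away without waiting.

b arrives now and also calls wait. value becomes -1, which triggers block, and b enters L.

a finishes printing and calls signal on the way out. value becomes 0, which triggers wakeup, so b is popped from the queue.

b can now go in without calling wait again; it runs directly.

Now a wants to print again and calls wait on typer. value becomes -1 and a waits in the queue. The whole process guarantees that a critical resource is accessed by only one process at a time.

(Summary: a process arriving from outside calls wait; a process woken by wakeup runs directly; a process that has finished calls signal.)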
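The walkthrough above can be replayed with a small single-threaded simulation. It is a sketch only: there are no real processes or blocking, `running` is an extra list added to make the state visible, and the class name `RecordSemaphore` is made up for the example (the fields `value` and `L` and the methods `wait` and `signal` follow the notes).

```python
from collections import deque

class RecordSemaphore:
    def __init__(self, value=1):
        self.value = value      # number of free printers; negative = number waiting
        self.L = deque()        # queue of blocked processes
        self.running = []       # processes currently using the resource

    def wait(self, proc):
        """Called by a process arriving from outside. Returns True if it may run now."""
        self.value -= 1
        if self.value < 0:
            self.L.append(proc)           # block: the process joins the queue
            return False
        self.running.append(proc)
        return True

    def signal(self, proc):
        """Called by a process that has finished. Returns the woken process, if any."""
        self.running.remove(proc)
        self.value += 1
        if self.value <= 0:
            woken = self.L.popleft()      # wakeup: runs directly, no second wait
            self.running.append(woken)
            return woken
        return None

typer = RecordSemaphore(1)
typer.wait("a")      # value 0, a prints
typer.wait("b")      # value -1, b is blocked
typer.signal("a")    # value 0, b is woken and prints
typer.wait("a")      # value -1, a is blocked behind b
```

At every step `running` holds at most one process, which is the mutual exclusion the notes describe.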

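When the semaphore above is applied to several shared resources, deadlock can occur. The refinement that deals with this is the AND semaphore below.

1. Mutual exclusion

2. Precedence:

3. Cooperation, simultaneity

Cloud computing course:

https://ke.qq.com/webcourse/index.html#course_id=269104&term_id=100317753&taid=1945925128035120&vid=y1423459vg1

Magedu Linux: bash scripting

http://study.163.com/course/courseLearn.htm?courseId=712012#/learn/video?lessonId=876097&courseId=712012

Very detailed.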

#!/bin/bash
groupadd -g 8008 newgroup
useradd -u 3006 -G 8008 mage   # -g and -u take numeric IDs
# positional parameters can be passed in
lines=$(wc -l "$1" | cut -d ' ' -f1)
echo "$1 has $lines lines."
#useradd $1
echo "$1" | passwd --stdin "$1" &>/dev/null  # --stdin feeds echo's output to passwd; &> discards all output, including errors

for userno in `seq 301 310`
do
    useradd user${userno}
done

dest=/tmp/dir-$(date +%Y%m%d-%H%M%S)
mkdir $dest
for i in {1..10}
do
   touch $dest/file$i
done

# arithmetic on variables with let
num1=9
num2=9
let num1+=9  # much more convenient
# arithmetic without assignment
echo $[$num1+$num2]    # the first $ starts the arithmetic expansion; the second and third read the variables
# expr
echo "the sum is `expr $num1 + $num2`"    # use backticks, and keep spaces on both sides of +

# list every user ID:
for i in `cut -d: -f3 /etc/passwd`
do
   echo $i
done

# loop over a list of paths passed as arguments; $* expands to all arguments as one list
for file in $*
do
   echo $file
done


# add a user:
if [ $# -lt 1 ]; then
   exit 2
fi
username=$1

if ! id $username &>/dev/null ; then
    useradd $username
fi
# return the maximum of any number of arguments
max=0
for i in $*
do
   if [ $max -lt $i ]; then
          max=$i
   fi
done
# check whether a file contains blank lines:
if grep -q "^[[:space:]]*$" "$1"; then
   echo "$1 has $(grep -c "^[[:space:]]*$" "$1") blank lines."
fi
# change the hostname:
localhost=$(hostname)
if [ -z "$localhost" -o "$localhost" == "localhost" ]; then
    hostname magedu.cn
fi

# create a file containing the numbers 1 to 10
for i in `seq 1 10`; do    echo   $i>>1.txt; done

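Advanced AWK:

awk 'NR==1' 1.txt returns the first line

awk 'END{print $0}' 1.txt prints the last line

Connecting to MySQL from Python with pymysql, basic commands: typing 127.0.0.1:3306 into a browser does nothing for MySQL; the connection has to be made from a Python script.

pymysql basics: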


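CCNA networking course:

http://study.163.com/course/courseLearn.htm?courseId=1003605098#/learn/video?lessonId=1004115689&courseId=1003605098

LANs are connected with switches

WANs are connected with routers

Start by pinging 127.0.0.1 to check that the local network stack works. That is why the 127 block is reserved for the operating system.

IP address, subnet mask, network bits, host bits

Usable subnet masks go up to /30

VLSM: use subnet masks to subdivide a network into further, smaller subnets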

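Huffman coding:

Given the string aabbccde, Huffman-encode it

a: frequency 2

b: frequency 2

c: frequency 2

d: frequency 1

e: frequency 1

So 5 leaf nodes are needed. Conclusion: every such code can be drawn as a binary tree in which each non-leaf node has exactly 2 children, with 1 written on every left edge and 0 on every right edge.

The code of a leaf is then the sequence of edge labels on the path from the root to that leaf.

1. Draw the tree 2. Assign the codes 3. Give the shorter codes to the more frequent symbols. Building the tree takes some care if the final code is to be as short as possible.

What is the encoded length of the string "aaaabbcd" under Huffman coding?

With a balanced tree it takes 16 bits; with an unbalanced tree it takes 14. Which shape wins depends on the input.

For abcd, for instance, the balanced tree is the shorter one. It has to be tried out.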
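The lengths quoted above can be checked with a short Python sketch. It relies on a standard fact about Huffman trees: the total encoded length equals the sum of the weights of all merged (internal) nodes, so the tree itself never has to be built. The function name `huffman_encoded_length` is made up for the example.

```python
import heapq
from collections import Counter

def huffman_encoded_length(text):
    """Total number of bits needed to Huffman-encode `text`."""
    heap = list(Counter(text).values())
    if len(heap) == 1:
        return len(text)          # a single distinct symbol still needs 1 bit each
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)   # always merge the two lightest subtrees
        b = heapq.heappop(heap)
        total += a + b            # every symbol below the merge gets one bit longer
        heapq.heappush(heap, a + b)
    return total

print(huffman_encoded_length("aaaabbcd"))  # unbalanced tree: a=1 bit, b=2, c=d=3
print(huffman_encoded_length("abcd"))      # equal frequencies: the balanced tree is optimal
```

For "aaaabbcd" the merges are 1+1, 2+2 and 4+4, which add up to the 14 bits mentioned above; a fixed 2-bit code would need 16.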

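Computer science quiz questions:

If a recursive function eventually terminates, then the function must?

have a branch that does not call itself
When quicksorting a sequential list recursively, which statement about the number of recursive calls is correct? ( )

The number of recursive calls is independent of the order in which the partitions are processed
The usual way to optimize a recursive program is ( )   A great example follows: with tail recursion the stack stays O(1), provided the compiler eliminates tail calls

Take the Fibonacci sequence as an example
Plain recursive version
int fab(int n){
    if(n<3)
        return 1;
    else
        return fab(n-1)+fab(n-2);
}

Recursion with a "linear iterative process": tail recursion
// b1, b2 are the two most recent Fibonacci numbers, c is the index about to be computed
int fab(int n,int b1=1,int b2=1,int c=3){
    if(n<3)
        return 1;
    else {
        if(n==c)
             return b1+b2;
        else
             return fab(n,b2,b1+b2,c+1);
    }
}
Taking fab(4) as an example
Plain recursion: fab(4)=fab(3)+fab(2)=fab(2)+fab(1)+fab(2)=3  5 calls
Tail recursion: fab(4,1,1,3)=fab(4,1,2,4)=1+2=3                         2 calls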


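Which of the following methods can ____not____ be used for program tuning?

Using multithreading to improve the efficiency of I/O-intensive operations
I/O-intensive means the I/O is busy most of the time. Multithreading suits cases where the CPU waits on long I/O operations, such as reading and writing the data stream of a network connection.
In I/O-intensive workloads the I/O operations are slow, so dedicated threads wait for the I/O responses without holding up the non-I/O tasks.
The formal parameters of a recursive function are ( )

automatic variables

In the indirect addressing cycle, ______.

the operations differ between instructions that use memory-indirect addressing and those that use register-indirect addressing

Which of the following is a characteristic of destructors? ( )

A class can define only one destructor
Standard ASCII is a ( )-bit encoding.

7

Bitwise operators: the operands are processed bit by bit in binary. There are six: AND (&), OR (|), NOT (~), XOR (^), left shift (<<), right shift (>>).
Floating-point numbers can take part in logical operations but not in bitwise operations.

&&: logical AND, true when the conditions on both sides hold
||: logical OR, true when at least one of the two conditions holds
&: bitwise AND
|: bitwise OR
&& and || are logical operations; & and | are bitwise operations

Which of the following statements about overfitting and underfitting is correct

Overfitting can be alleviated by reducing the number of variables

Hosts A and B have established a TCP connection. A always sends segments of MSS = 1 KB and always has data to send; B acknowledges every segment it receives with an advertised receive window of 9 KB.

If A's congestion window is 8 KB when a timeout occurs at time T, then, with no further timeouts from T onward, what is A's send window after 10 RTTs? ( )

9KB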
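The answer can be traced with a few lines of Python. This is a sketch of the textbook model only (slow start doubles the congestion window each RTT up to the threshold, congestion avoidance then adds one MSS per RTT, and the send window is the smaller of the congestion window and the receive window); the function name and parameter names are made up for the example.

```python
def send_window_after(rtts, cwnd_at_timeout_kb=8, rwnd_kb=9, mss_kb=1):
    """Send window (KB) `rtts` round trips after a timeout, with no further loss."""
    ssthresh = cwnd_at_timeout_kb // 2   # a timeout halves the threshold: 4 KB
    cwnd = mss_kb                        # and resets the congestion window to 1 MSS
    for _ in range(rtts):
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: 1 -> 2 -> 4
        else:
            cwnd += mss_kb                   # congestion avoidance: 5, 6, 7, ...
    return min(cwnd, rwnd_kb)                # the receiver's window caps the sender

print(send_window_after(10))
```

After 10 RTTs the congestion window has grown to 12 KB, so the 9 KB receive window is the limiting factor.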
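In a Linux system, some processes become orphans for one reason or another. Which system process adopts these orphan processes?

init
In software development the classic model is the waterfall model. Which of the following statements about the waterfall model is correct? ( )

The waterfall model uses structured analysis and design methods and separates the logical implementation from the physical implementation

Deep learning:
What the curse of dimensionality is: with very many features the dimension grows and the number of samples becomes relatively insufficient. The classified region takes up a shrinking share of the whole space, and generalization gets worse.
https://ke.qq.com/webcourse/index.html#course_id=240557&term_id=100283770&taid=1552484648856493&vid=i1421vqqhew
Recommender systems:
The videos did not teach me much; the fundamentals still have to come from books: probability and statistics, machine learning, deep learning.
https://ke.qq.com/webcourse/index.html#course_id=277276&term_id=100328034&taid=1988093116955420&vid=v1424iibikr
Deep learning optimization:
1. Loss functions and surrogate loss functions, e.g. using cross-entropy in place of accuracy for classification.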

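Reviewing probability, to be able to analyse data properly.
●Bayes

P(Bi|A) = P(A|Bi)P(Bi) / Σj P(A|Bj)P(Bj)

Reading the left side: it is the probability in a classification problem. We already know that event A has happened, i.e. we know that the object has the features A. What is the probability that it belongs to class Bi?

Reading the right side: the numerator is just P(A ∩ Bi). The denominator is the law of total probability, i.e. P(A). Dividing one by the other gives the probability of Bi given that A has already happened, which is the conditional probability. QED.

　　Application: naive Bayes is exactly the setting above: the features of A are known, and we want the probability that A belongs to class Bi. "Naive" means the features are assumed not to influence each other, i.e. they are independent given the class.

If the features of A are a1,...,an then P(A|Bi) =P(a1|Bi)*...*P(an|Bi). This can also be seen as maximum likelihood; the idea is the same. It feels as if the core idea of statistics is maximum likelihood: much of what is said simplifies to it.

　　The probabilities P(aj|Bi) come from experience or from the training set, simply by counting (the number of items in class Bi that have attribute aj, divided by the number of items in Bi).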
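The counting procedure can be sketched as follows. The toy weather-style samples and the names `train` and `posterior` are invented for the illustration, and no smoothing is applied, so a feature value never seen in a class gives that class probability 0.

```python
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (features_tuple, label). Returns the two count tables."""
    class_counts = Counter(label for _, label in samples)
    feature_counts = defaultdict(Counter)   # label -> Counter of (position, value)
    for features, label in samples:
        for j, value in enumerate(features):
            feature_counts[label][(j, value)] += 1
    return class_counts, feature_counts

def posterior(features, class_counts, feature_counts):
    """P(Bi | a1..an) for every class Bi, using P(Bi) * prod_j P(aj | Bi)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                                    # prior P(Bi)
        for j, value in enumerate(features):
            score *= feature_counts[label][(j, value)] / n   # P(aj | Bi) by counting
        scores[label] = score
    z = sum(scores.values())                                 # total probability P(A)
    return {label: s / z for label, s in scores.items()} if z else scores

samples = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("sunny", "mild"), "yes"),
]
class_counts, feature_counts = train(samples)
print(posterior(("sunny", "mild"), class_counts, feature_counts))
```

For the query ("sunny", "mild") the unnormalised scores are 2/5 · 2/2 · 1/2 for "no" and 3/5 · 1/3 · 2/3 for "yes"; dividing by their sum plays the role of the denominator in Bayes' formula.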

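●Distributions: 0-1 distribution; Bernoulli trials (binomial distribution, i.e. repeated 0-1 trials)

Continuous distributions: uniform, exponential (memoryless), normal (3-sigma rule: 1σ: 68%, 2σ: 95.4%, 3σ: 99.7%). Standard normal: Φ(-x)=1-Φ(x)

The marginal distributions of a bivariate normal distribution do not depend on the parameter rho. This shows that the marginal distributions alone do not determine the joint distribution.

Independence means the joint distribution equals the product of the marginal distributions (immediate from the definition).

Key theorem: how is it proved?

To show: F(h,g)=F(h,.)*F(.,g)

Left side = sum over all values of x and y such that h=h1 and g=g1

Right side = (sum over all values of x such that h=h1) (sum over the values of y such that g=g1). Apply the hypothesis. QED.

●Expectation: let a discrete random variable have the distribution P{X=xk}=pk. If the series ∑xkpk converges absolutely, its sum is defined as the mathematical expectation of X.

Why require absolute convergence rather than plain convergence?

1/2 - 1/3 + 1/4 - 1/5 + ... is conditionally convergent. Its sum:
let X=1/2 - 1/3 + 1/4 - 1/5 + ... From the Taylor series of ln(1+x) at x = 1 this equals 1-ln(2)≈0.307. Suppose we wanted to call this an expectation, i.e. choose a distribution in which each term xkpk of ∑xkpk equals 1/n with alternating sign.
A conditionally convergent series can be rearranged to converge to any value, but an expectation must not depend on the order in which the values of X are listed, so 0.307 would not be a well-defined answer.
Absolute convergence removes this dependence on the order, which is why it is part of the definition.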

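●Important problem types:

●Sample quartiles, box plot, modified box plot. The median is used to describe the central tendency of the data because it is less affected by outliers than the mean.

●Frequently used formulas:

Standard normal density: f(x) = exp(-x^2/2) / sqrt(2π)

D(X)=E(X^2)-E(X)^2 (variance = expectation of the square minus square of the expectation)

E(X^4)=3 (for X standard normal)

Proof:

http://blog.sina.com.cn/s/blog_4cb6ee6c0102xh17.html

So the chi-square distribution with n degrees of freedom has mean n and variance 2n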
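The step from E(X^4)=3 to the chi-square moments can be written out as follows. This is a short derivation sketch; it assumes X_1, ..., X_n are independent standard normal variables, which is the usual definition of the chi-square distribution with n degrees of freedom.

```latex
\begin{aligned}
X \sim N(0,1):\quad & E(X^2) = D(X) + \bigl(E(X)\bigr)^2 = 1, \qquad E(X^4) = 3 \\
& D(X^2) = E(X^4) - \bigl(E(X^2)\bigr)^2 = 3 - 1 = 2 \\
\chi^2_n = \sum_{i=1}^{n} X_i^2:\quad & E(\chi^2_n) = n\,E(X^2) = n \\
& D(\chi^2_n) = n\,D(X^2) = 2n \quad \text{(by independence of the } X_i\text{)}
\end{aligned}
```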

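●Point estimation:

Sample mean = (x1+...+xn)/n. Sample variance: sum((xi - x̄)^2)/(n-1)

●Maximum likelihood estimation:

Let x1,...,xn be the sample values and f(x,θ) the probability density function.

The estimate is the value of θ that maximises Πf(xi,θ). In practice one usually maximises the logarithm of that product instead (because the raw product overflows or underflows numerically).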
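The reason for taking logarithms shows up quickly in code. The sketch below is an illustration under invented assumptions: a normal density with known sigma = 1, a deterministic made-up sample of 2000 points, and a simple grid search over candidate values of μ instead of solving the likelihood equation.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# deterministic sample: 4.00, 4.01, ..., 4.99 repeated 20 times (mean 4.495)
data = [4.0 + 0.01 * (i % 100) for i in range(2000)]

# The raw likelihood is a product of 2000 numbers below 0.4 and underflows to 0.0
likelihood = math.prod(normal_pdf(x, 4.5) for x in data)

def log_likelihood(mu):
    # the sum of logs stays in a comfortable floating-point range
    return sum(math.log(normal_pdf(x, mu)) for x in data)

candidates = [4.0 + 0.005 * k for k in range(201)]   # grid from 4.0 to 5.0
mu_hat = max(candidates, key=log_likelihood)

print(likelihood)   # underflow: no information left to maximise
print(mu_hat)       # the grid point closest to the sample mean
```

For the normal mean the maximiser of the log-likelihood is the sample mean, so the grid search should land on the grid point nearest to `sum(data) / len(data)`.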

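●Evaluating estimators:

Unbiasedness, efficiency, consistency.

●Distribution of the mean and variance of a sample from a normal distribution:

This gives a very important formula, the most important one: (x̄-μ)/(σ/sqrt(n)) follows the standard normal distribution.

That is the core formula. Interpretation: the Xi are the observed outcomes of independent repetitions of the same event, and the formula ties them to μ and σ.

Central limit theorem: repeat an identical experiment a very large number of times; the mean of the observations is then approximately normally distributed.

Statistics has 2 methods: 1. estimation 2. hypothesis testing

Hypothesis testing example:

Solution: first apply the most important formula: (x̄-μ0)/(σ/sqrt(n)) is standard normal. If the hypothesis holds, the absolute value of this statistic should not exceed 1.96; if it does, then

the condition μ = 0.5 has produced a small-probability event, so the hypothesis is rejected (small-probability events are treated as not happening). The rejection region is the mean being too large or too small, so it is two-sided, i.e. 0.025 is used on each side.

The problems are all similar. Apply the same recipe to another one:

Step 1: the key theorem again: (x̄-μ0)/(σ/sqrt(n)) = standard normal

Assume no water has been added; then the statistic above should be below z(0.05). The rejection region corresponds to watering down: water can only be added, which can only raise the freezing point, never lower it. So the test

only asks whether the statistic exceeds the upper 0.05 quantile: compare the number above with z(0.05). Exceeding it means the freezing point has risen, i.e. water was added.

In practice the t-test is used

Explanation: replacing sigma by S gives the t-test. In that example the rejection region is the part lifetime being too small, so it is one-sided, and t(0.05)(n-1) is used directly.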
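The z-test recipe can be sketched with the standard library alone. The nine sample values, μ0 = 0.5 and σ = 0.015 are illustrative numbers chosen for this sketch, not data recovered from the notes, and the function names are made up.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_statistic(sample, mu0, sigma):
    """(x̄ - μ0) / (σ / sqrt(n)), standard normal when the hypothesis μ = μ0 holds."""
    return (mean(sample) - mu0) / (sigma / sqrt(len(sample)))

def two_sided_reject(z, alpha=0.05):
    # reject when the mean is too large or too small: alpha/2 in each tail
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def upper_tail_reject(z, alpha=0.05):
    # one-sided version: reject only when the statistic is too large
    return z > NormalDist().inv_cdf(1 - alpha)

sample = [0.497, 0.506, 0.518, 0.524, 0.498, 0.511, 0.520, 0.515, 0.512]
z = z_statistic(sample, mu0=0.5, sigma=0.015)
print(z, two_sided_reject(z))
```

`NormalDist().inv_cdf(0.975)` is the 1.96 used in the notes; for the t-test the same comparison is made against a quantile of the t distribution with n-1 degrees of freedom, which is not in the standard library.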

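●Testing a variance: use the chi-square test.

●Goodness-of-fit test: used when the distribution is unknown

Stochastic processes: a distribution with a time variable added is a stochastic process.

Characterised by: mean function, correlation function, covariance function.

Definition of a Markov process:

Definition of an independent-increment process: the increments over non-overlapping time intervals are independent

Applied Time Series Analysis (Wang Yan):

1. Descriptive analysis: spot the patterns by plotting (a required step in any time series analysis)

2. Statistical methods: 1. Spectral analysis: approximate an arbitrary function by combinations of sin and cos series.

　　　　　　2. Time-domain analysis: the value at a time point is a function of the values over a preceding period.

3. Classify the series by hypothesis tests.

　　1. Stationary series: a series with a trend or periodicity is not stationary. A stationary series has an autocorrelation coefficient that decays quickly and no periodicity.

　　2. Pure randomness test: the Q statistic. A purely random series is necessarily stationary, so usually there is no need to test; a look at item 1 above is enough.

4. Tools:

　　1. p-th order differencing and k-step differencing. (I wonder why ratios are not used instead, i.e. the value at time t divided by the value at time t-1.)

5. Analysis of stationary series: ARMA

　　Put bluntly, it is an affine function.

　　1. Determining whether the model is stationary: inspecting the plot, the characteristic root method, the stationarity region (which is also computed from the characteristic roots)

　　2. Stationarity and invertibility conditions.

　　3. Fit the model and tune the parameters.

6. Non-stationary series:

This looks just like Lagrange's theorem: a polynomial approximates any function, plus an error term e. The core is computing that polynomial.

Data preprocessing methods:

1. Trend analysis: linear fitting, curve fitting. (Outdated methods: functions this simple can hardly fit real data, so they are of little use.)

2. Smoothing: moving average, n-period moving average, exponential smoothing, Holt smoothing (also of little use)

3. Seasonal analysis

4. The X-11 method

Stochastic analysis methods:

1. Linear trend: use first-order differencing

2. Curved trend: use second-order differencing

3. Series with a fixed period: difference with a step equal to the period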
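The three rules can be checked on toy series with a few lines of Python. The function `difference` and the sample series are made up for the illustration; `order` is the number of times the differencing is repeated and `step` is the lag k.

```python
def difference(series, order=1, step=1):
    """Apply k-step differencing (x_t - x_{t-step}) `order` times."""
    out = list(series)
    for _ in range(order):
        out = [out[i] - out[i - step] for i in range(step, len(out))]
    return out

linear = [3 + 2 * t for t in range(8)]        # linear trend
quadratic = [t * t for t in range(8)]         # curved (quadratic) trend
periodic = [1, 5, 2] * 4                      # fixed period of 3

print(difference(linear))                     # first-order: constant
print(difference(quadratic, order=2))         # second-order: constant
print(difference(periodic, step=3))           # period-step: all zeros
```

In each case the differenced series no longer shows the trend or the period, which is the point of differencing before fitting a stationary model.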

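Removing the security settings and encryption from a PDF:

http://www.liangchan.net/soft/softdown.asp?softid=8065

Download it and run it.

A good article on deep networks:

https://blog.csdn.net/u014696921/article/details/52768311　

Using sqlite3:
●Comes with Python; typing sqlite3 in cmd opens it



Found that cupping works well against acne and dampness, but the cups must not stay on too long or blisters form.

2018-07-31, 15:08: learning xgboost
●Review of decision trees:
Splitting criterion: information gain ratio (in essence, each split makes the classes purer)
●Regression trees: compute the variance; the smaller the better. The result is a piecewise constant function.
 Example: fit the 3 points (1,2)  (3,4) (5,10). There are 2 candidate split points: the first is (1+3)/2 = 2 and the second is (3+5)/2 = 4.
For the first split point the variance is 0+(4-7)^2/2+(10-7)^2/2=9. For the second it is 1, so the second split point, 4, is chosen: predict 3 below 4 and 10 above 4.
The plot is a step function with these two levels.

The approximation is decent; it can do a little better than linear regression.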
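The split search in this example can be reproduced in a few lines. The sketch mirrors the note's scoring (variance of the left part plus variance of the right part, each divided by its own count); CART implementations normally compare the summed squared error instead, which picks the same split here. The function names are made up, and distinct x values are assumed.

```python
def pvariance(values):
    """Population variance: mean squared distance from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def best_split(points):
    """Try the midpoint between each pair of neighbouring x values.

    Returns (threshold, score, left_prediction, right_prediction).
    """
    points = sorted(points)
    best = None
    for (x0, _), (x1, _) in zip(points, points[1:]):
        threshold = (x0 + x1) / 2
        left = [y for x, y in points if x < threshold]
        right = [y for x, y in points if x >= threshold]
        score = pvariance(left) + pvariance(right)
        if best is None or score < best[1]:
            # each side predicts the mean of its own y values
            best = (threshold, score, sum(left) / len(left), sum(right) / len(right))
    return best

print(best_split([(1, 2), (3, 4), (5, 10)]))
```

The split at 2 scores 0 + 9 and the split at 4 scores 1 + 0, so the tree predicts 3 to the left of 4 and 10 to the right, as in the worked example.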

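●xgboost changes the loss function into three parts: MSE + a penalty on the number of leaf nodes + a penalty on the leaf values. (MSE is the variance used for the regression tree above, because the predicted value is the mean.)

Using xgboost in practice:

https://blog.csdn.net/flydreamforever/article/details/70767818 Installed successfully by following the steps.

Example:

Classification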

from sklearn.datasets import load_iris
import xgboost as xgb
from xgboost import plot_importance
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split

# read in the iris data
iris = load_iris()
print(iris)

X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# train the model
'''
Learning to use xgboost.
The meaning of each function can be looked up here:
http://xgboost.apachecn.org/cn/latest/python/python_api.html?highlight=n_estimators
Classifier parameters:
max_depth: the depth of each tree
objective: an important function that can also be user-defined. Here it is the multiclass softmax
'''
model = xgb.XGBClassifier(max_depth=50, learning_rate=0.01, n_estimators=16000, 
                          silent=True, objective='multi:softmax')
model.fit(X_train, y_train)

# predict on the test set
ans = model.predict(X_test)

# compute the accuracy
cnt1 = 0
cnt2 = 0
for i in range(len(y_test)):
    if ans[i] == y_test[i]:
        cnt1 += 1
    else:
        cnt2 += 1

print("Accuracy: %.2f %% " % (100 * cnt1 / (cnt1 + cnt2)))

# show feature importance
plot_importance(model)
plt.show()


Regression:

# -*- coding: utf-8 -*-
"""
Created on Fri Jul 20 10:58:02 2018


@author: 張博
"""

# most reliable way to read a csv:
#f = open(r'C:\Users\張博\Desktop\展現\old.csv')
#data = read_csv(f,header=None)





'''
Plotting template:
from matplotlib import pyplot
data=[]
pyplot.plot(data,color='black')
pyplot.show()

'''



'''
Getting the current time:
import datetime
nowTime=datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')  # now
nowTime=((nowTime)[:-3])
print(nowTime)
'''


'''
Template for writing a file:
with open(r'c:/234/wucha.txt','w') as f:
      wucha=str(wucha)
      f.write(wucha)
'''



'''
Manual breakpoint: raise 
'''








# 2018-07-23 22:54: loop over the learning-rate parameter RATE to find out which value works best
for i in range((1)):
    
    import os
    os.environ['CUDA_VISIBLE_DEVICES'] = '0' # use GPU 0
    
    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session
    
    config = tf.ConfigProto()
    config.gpu_options.allocator_type = 'BFC' #A "Best-fit with coalescing" algorithm, simplified from a version of dlmalloc.
    config.gpu_options.per_process_gpu_memory_fraction = 1.
    config.gpu_options.allow_growth = True
    set_session(tf.Session(config=config))
    
    
    
    
    
    
    
    
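    # A very detailed tutorial in English. Its final multivariate, multi-step model is probably the best one for real use, and it is what this .py file implements.
    #https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/
    
    '''
    SUPER_PARAMETER: hyperparameters are conventionally written at the very top of the code so they are easy to find and change
    '''
    EPOCH=100
    LOOK_BACK=1
    n_features = 3         # no need to change this for this problem: the data has 3 columns (the target plus 2 other variables)
    RATE=0.55
    shenjing=62            # number of neurons
    n_hours = LOOK_BACK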
    
    
    
    
    
    import pandas as pd
    
    from pandas import read_csv
    from datetime import datetime
    # load data
    def parse(x):
        return datetime.strptime(x, '%Y %m %d %H')
    data = read_csv(r'E:\output_nonghang\out2new.csv')
    
    # The DD column should be dropped; the day of the month is not useful
    # Slicing plus concat is enough
    
    
    tmp1=data.iloc[:,2:3]
    tmp2=data.iloc[:,3]
    tmp3=data.iloc[:,1]

    data.to_csv('c:/234/out00000.csv')
    
#    for i in range(len(tmp3)):
#        if tmp3[i] in range(12,13):
#            tmp3[i]=1
#        if tmp3[i] in range(13,14):
#            tmp3[i]=2
#        else:
#            tmp3[i]=0


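    # Preprocessing step: detect anomalous points.
    # Method: scan the whole dataset. If a value is far below the mean of the same hour on the
    # same kind of day (weekday or weekend), the point is treated as wrong and replaced by that mean.
    # (This note says below 0.2 of the mean; the code below uses 0.4.)
    # 2018-07-25 21:52: reached a 5.8% error rate, which shows how much this correction step matters.
    # Without it the error hovers around 8%.
    '''
    Probably a better way to repair bad points:
    Say the value for 3:00 on 23 July is wrong. Train on the data from 1 July up to 2:00 on 23 July and predict the value for 3:00 on 23 July,
    then use that prediction as the true value for 3:00 on 23 July. Later bad points are handled the same way.
    
    If both 3:00 and 4:00 on 23 July are bad (i.e. clearly far from the real data; my criterion is below 0.4 times the value for comparable periods),
    first predict the 3:00 value and treat it as the true value, then keep going for 4:00 using the predicted 3:00 value
    to predict the 4:00 value. This repairs both 3:00 and 4:00. It is of course slow: 2 more deep-learning runs than the mean-replacement method used below.
    '''

    for i in range(len(data)):
        hour=data.iloc[i]['HH']
        week=data.iloc[i]['week']
        tmp56=data.query('HH == '+str(hour) +' and '+ 'week=='+str(week)+' and '+'index!='+str(i))
        tmp_sum=tmp56['Sum'].mean()
        
        if data.iloc[i]['Sum']< tmp_sum *0.4:
            # Write through .loc: data.iloc[i]['Sum'] = ... may assign to a temporary copy and leave data unchanged
            data.loc[data.index[i], 'Sum']=tmp_sum 
            print('Modified the following row because it is an outlier')
            print(i)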
            
            
    
    #修改完畢


    tmp1=data.iloc[:,2:3]
    tmp2=data.iloc[:,3]
    tmp3=data.iloc[:,1]















    









    
    
    data=pd.concat([tmp2,tmp3,tmp1],axis=1)
#    print(data)
    
    
    
    
    
    data.to_csv('c:/234/out00000.csv')
    
    
    # The template below expects the value to predict in the first column, so data is rearranged first.
    
    
    
    
    
    
    
    
    
    
    
    
    
    #data.to_csv('pollution.csv')
    
    
    
    
    
    
    from pandas import read_csv
    from matplotlib import pyplot
    # load dataset
    dataset = data
    values = dataset.values
    
    
    
    ## specify columns to plot
    #groups = [0, 1, 2, 3, 5, 6, 7]
    #i = 1
    
    
    from pandas import read_csv
    from matplotlib import pyplot
    # load dataset
    #dataset = read_csv('pollution.csv', header=0, index_col=0)
    ##print(dataset.head())
    #values = dataset.values
    # specify columns to plot
    #groups = [0, 1, 2, 3, 5, 6, 7]
    #i = 1
    # plot each column
    #pyplot.figure()
    # Each row of the figure shows one column of the data, so there are 7 subplots for the 7 columns.
    #for group in groups:
    #    pyplot.subplot(len(groups), 1, i)
    #    pyplot.plot(values[:, group])
    #    pyplot.title(dataset.columns[group], y=0.5, loc='right')
    #    i += 1
    ##pyplot.show()
    
    
    
    from math import sqrt
    from numpy import concatenate
    from matplotlib import pyplot
    from pandas import read_csv
    from pandas import DataFrame
    from pandas import concat
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.preprocessing import LabelEncoder
    from sklearn.metrics import mean_squared_error
    from keras.models import Sequential
    from keras.layers import Dense
    from keras.layers import LSTM
    # load dataset
    
    
    # integer encode direction
    # This just encodes the labels as integers. For example 1,23,5,7,7 becomes 0,3,1,2,2 (classes are numbered in sorted order)
    #print('values')
    #print(values[:5])
    #encoder = LabelEncoder()
    #values[:,4] = encoder.fit_transform(values[:,4])
    ## ensure all data is float
    #values = values.astype('float32')
    #print('values_after_endoding')
    # numpy to pandas
    import pandas as pd
    #pd.DataFrame(values).to_csv('values_after_endoding.csv')
    # The output shows that the encoder turns categorical data into numeric values,
    # which makes regression possible.
    #print(values[:5])
    # normalize features first
    
    
    
    
    # Try several ranges here, e.g. (0,1) or (-1,1), or another normalization method.
    scaler = MinMaxScaler(feature_range=(-1, 1))
    scaled = scaler.fit_transform(values)
    print('Data after normalization')
    
    pd.DataFrame(scaled).to_csv('values_after_normalization.csv')
    
    # frame as supervised learning
    
    
    
    
    # convert series to supervised learning
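    # n_in: how many earlier time steps to read in; n_out: how many later time steps to read in.
    # With several variables, all of them read in the same number of steps. For convenience the largest value is used for all.
    #print('testing the shift function')
    #
    #df = DataFrame(scaled)
    #print(df)      # the test shows that shift moves all the data down, or up, together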
    #print(df.shift(2))
    def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
        n_vars = 1 if type(data) is list else data.shape[1]
        df = DataFrame(data)
        cols, names = [],[]
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
            names += [('var%d(time:t-%s)' % (j+1, i)) for j in range(n_vars)]
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
            if i == 0:
                names += [('var%d(time:t)' % (j+1)) for j in range(n_vars)]
            else:
                names += [('var%d(time:t+%d)' % (j+1, i)) for j in range(n_vars)]
        # put it all together
        agg = concat(cols, axis=1)
        agg.columns = names
        # drop rows with NaN values
        if dropnan:
            agg.dropna(inplace=True)
        return agg
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    # series_to_supervised arranges the columns of the multivariate time series.
    
    reframed = series_to_supervised(scaled, LOOK_BACK, 1)
    
    # drop columns we don't want to predict
    # Only var1(t) has to be predicted, so the columns after it are dropped.
    
    help111=series_to_supervised(values, LOOK_BACK, 1)
    
    
    
    print('Dataset after processing')
    print(help111)
    
    # split into train and test sets
    values = reframed.values
    n_train_hours = int(len(scaled)*0.75)
    train = values[:n_train_hours, :]
    test = values[n_train_hours:, :]
    # split into input and outputs
    n_obs = n_hours * n_features
    train_X, train_y = train[:, :n_obs], train[:, -n_features]
    test_X, test_y = test[:, :n_obs], test[:, -n_features]
    #print(train_X.shape, len(train_X), train_y.shape)
    #print(test_X.shape, len(test_X), test_y.shape)
    #print(train_X)
    #print(9999999999999999)
    #print(test_X)
    
    
    
    
    
    '''
    So in the end there are 4 arrays:
    train_X
    train_y
    test_X
    test_y
    '''
    
    
    
    
    
    # From here on the model is switched to xgboost
#    print(train_X.shape)
#    print(train_y.shape)
#    print(test_X.shape)
#    print(test_y.shape)
    
    
    
    '''
    Learning Task Parameters

Specify the learning task and the corresponding learning objective. The objective options are below:

objective [default=reg:linear]
reg:linear: linear regression
reg:logistic: logistic regression
binary:logistic: logistic regression for binary classification, output probability
binary:logitraw: logistic regression for binary classification, output score before logistic transformation
gpu:reg:linear, gpu:reg:logistic, gpu:binary:logistic, gpu:binary:logitraw: versions of the corresponding objective functions evaluated on the GPU; note that like the GPU histogram algorithm, they can only be used when the entire training session uses the same dataset
count:poisson –poisson regression for count data, output mean of poisson distribution
max_delta_step is set to 0.7 by default in poisson regression (used to safeguard optimization)
survival:cox: Cox regression for right censored survival time data (negative values are considered right censored). Note that predictions are returned on the hazard ratio scale (i.e., as HR = exp(marginal_prediction) in the proportional hazard function h(t) = h0(t) * HR).
multi:softmax: set XGBoost to do multiclass classification using the softmax objective, you also need to set num_class(number of classes)
multi:softprob: same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata * nclass matrix. The result contains predicted probability of each data point belonging to each class.
rank:pairwise: set XGBoost to do ranking task by minimizing the pairwise loss
reg:gamma: gamma regression with log-link. Output is a mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.
reg:tweedie: Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be Tweedie-distributed.
base_score [default=0.5]
The initial prediction score of all instances, global bias
For sufficient number of iterations, changing this value will not have too much effect.
eval_metric [default according to objective]
Evaluation metrics for validation data, a default metric will be assigned according to objective (rmse for regression, and error for classification, mean average precision for ranking)
User can add multiple evaluation metrics. Python users: remember to pass the metrics in as list of parameters pairs instead of map, so that latter eval_metric won’t override previous one
The choices are listed below:
rmse: root mean square error
mae: mean absolute error
logloss: negative log-likelihood
error: Binary classification error rate. It is calculated as #(wrong cases)/#(all cases). For the predictions, the evaluation will regard the instances with prediction value larger than 0.5 as positive instances, and the others as negative instances.
error@t: a different than 0.5 binary classification threshold value could be specified by providing a numerical value through ‘t’.
merror: Multiclass classification error rate. It is calculated as #(wrong cases)/#(all cases).
mlogloss: Multiclass logloss.
auc: Area under the curve
ndcg: Normalized Discounted Cumulative Gain
map: Mean average precision
ndcg@n, map@n: ‘n’ can be assigned as an integer to cut off the top positions in the lists for evaluation.
ndcg-, map-, ndcg@n-, map@n-: In XGBoost, NDCG and MAP will evaluate the score of a list without any positive samples as 1. By adding "-" in the evaluation metric XGBoost will evaluate these score as 0 to be consistent under some conditions.
poisson-nloglik: negative log-likelihood for Poisson regression
gamma-nloglik: negative log-likelihood for gamma regression
cox-nloglik: negative partial log-likelihood for Cox proportional hazards regression
gamma-deviance: residual deviance for gamma regression
tweedie-nloglik: negative log-likelihood for Tweedie regression (at a specified value of the tweedie_variance_power parameter)
seed [default=0]
Random number seed.
    '''
    
    # regression
    import xgboost as xgb
    from matplotlib import pyplot as plt

    model = xgb.XGBRegressor(max_depth=10, learning_rate=0.1,
                             n_estimators=1600, silent=True, objective='reg:linear')
    model.fit(train_X, train_y)
    
    # predict on the test set
    ans = model.predict(test_X)
    yhat=ans
    # show feature importance
    plt.rcParams['figure.figsize'] = (20, 3)
     
    xgb.plot_importance(model)
    
    plt.show() 
#    import graphviz
#    xgboost.plot_tree(model)
#    
#    plt.show()   
    
    
    
    
    
    
    
    
    
    
    test_X = test_X.reshape((test_X.shape[0], n_hours*n_features))
    # invert scaling for forecast
    
    
    
    import numpy as np
    yhat=yhat.reshape(len(yhat),1)
    print(yhat.shape)
    print(test_X[:, -(n_features-1):].shape)
    
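    # The scaler was fitted on the original full matrix, so to invert it the matrix has to be reassembled with the same shape.
    inv_yhat = np.concatenate((yhat, test_X[:, -(n_features-1):]), axis=1)
    inv_yhat = scaler.inverse_transform(inv_yhat)
    inv_yhat = inv_yhat[:,0]  # after inverting, pull the target column back out; the multivariate case needs these extra steps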
    # invert scaling for actual
    
    
    
    test_y = test_y.reshape((len(test_y), 1))
    inv_y = concatenate((test_y, test_X[:, -(n_features-1):]), axis=1)
    inv_y = scaler.inverse_transform(inv_y)
    inv_y = inv_y[:,0]
    
    
    
    
    
    
    
    
    
    with open(r'c:/234/inv_y.txt','w') as f:
          inv_y1=str(inv_y)
          f.write(inv_y1)
    with open(r'c:/234/inv_yhat.txt','w') as f:
          inv_yhat1=str(inv_yhat)
          f.write(inv_yhat1)
    
    
    
    # calculate RMSE
    rmse = sqrt(mean_squared_error(inv_y, inv_yhat))
#    print('RATE:')
#    print(RATE)
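    print('Mean absolute percentage error:')
    # Some values of this pollution index are 0, which distorts the percentage error badly
    #print(inv_y.shape)
    #print(inv_yhat.shape)
    wucha=abs(inv_y-inv_yhat)/(inv_y)
    #print(wucha)
    '''
    Next, write the absolute percentage error to a file
    '''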

    #with open(r'c:/234/wucha.txt','w') as f:
    #      print(type(wucha))
    #      wucha2=list(wucha)
    #      wucha2=str(wucha2)
    #      f.write(wucha2)
    
    with open(r'c:/234/sumary.txt','a') as f:
          rate=str(RATE)
          f.write(rate+'，')
          shenjing=str(shenjing)
          f.write(shenjing)
          f.write(',')
          wucha2=wucha.mean()
          wucha2=str(wucha2)
          f.write(wucha2)
          f.write('.')
          f.write('\n')
    
    
    wucha=wucha.mean()
    print(wucha)
    
    
    
    inv_y=inv_y
    inv_yhat=inv_yhat
    
    #print('Test RMSE: %.3f' % rmse)
    import numpy as np
    
    from matplotlib import pyplot
    pyplot.rcParams['figure.figsize'] = (20, 3) # set the figure size
    
    pyplot.rcParams['image.cmap'] = 'gray' # default colormap for images
    pyplot.plot(inv_y,color='black',linewidth = 0.7)
    pyplot.plot(inv_yhat     ,color='red',linewidth = 0.7)
    
    pyplot.show()
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    '''
    Get the current time:
    import datetime
    nowTime=datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')  # now
    nowTime=((nowTime)[:-3])  # drop the seconds
    print(nowTime)
    '''
    
    
    '''
    Template for writing a file:
    with open(r'c:/234/wucha.txt','w') as f:
          wucha=str(wucha)
          f.write(wucha)
    '''
    
    
    
    
    
    
    '''
    Manual breakpoint: raise NameError  # a crude but reliable way to stop execution at this point
    '''
    
    '''
    Plotting template:
    import numpy as np
    
    from matplotlib import pyplot
    pyplot.rcParams['figure.figsize'] = (20, 3) # set the figure size
    
    pyplot.rcParams['image.cmap'] = 'gray' # 
    pyplot.plot(inv_y,color='black',linewidth = 0.7)
    
    
    pyplot.show()
    
    
    '''
    
    # The most robust way to read a CSV: open the file first, then hand the file object to read_csv
    #f = open(r'C:\Users\張博\Desktop\展現\old.csv')
    #data = read_csv(f,header=None)
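
The zeros mentioned in the comments above make the percentage error blow up, because the actual value is the divisor. A minimal sketch of one workaround, assuming it is acceptable to skip rows whose actual value is 0. The function name `mape_ignoring_zeros` is made up for illustration and is not part of the script above.

```python
def mape_ignoring_zeros(actual, predicted):
    """Mean absolute percentage error over the rows whose actual value is non-zero."""
    # Keep only the pairs that will not divide by zero.
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        return float('nan')  # nothing usable to average
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# The middle row has actual == 0 and is skipped.
print(mape_ignoring_zeros([10.0, 0.0, 20.0], [12.0, 1.0, 15.0]))
```

Skipping rows changes what the metric measures, so the number of skipped rows is worth reporting alongside it.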



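●Model evaluation
Formula: if a1 and a2 are independent and identically distributed with variance σ^2, then D(a1-a2)=2σ^2.
Derivation: E(a1-a2)=0 because the two share the same mean, so D(a1-a2)=E((a1-a2)^2)=E(a1^2)-2E(a1a2)+E(a2^2). By independence E(a1a2)=E(a1)E(a2)=E(a1)^2, and E(a2^2)=E(a1^2), hence D(a1-a2)=2(E(a1^2)-E(a1)^2)=2σ^2. QED.
Note that a2 cannot simply be replaced by a1: they have the same distribution, but writing a1 in both places would make the two terms perfectly correlated, whereas they are in fact independent.

    If a1 and a2 are independent, then E(a1a2)=E(a1)E(a2). How should this be understood? The two variables are unrelated, so a1a2 ...
Performance metrics:
    Binary classification:
        precision, recall, and the PR curve
    Handling cross-validation:


Procedure: hypothesize that the error rate is at most ε0; if the computed statistic is below the critical value t(k-1) at the 0.05 level, the hypothesis is accepted (it cannot be rejected).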
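
Precision and recall, listed under the performance metrics above, can be computed by hand from the confusion counts. A small sketch, assuming labels are encoded as 1 (positive) and 0 (negative); the function name and the sample labels are invented for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of the predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of the real positives, how many were found
    return precision, recall


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
print(precision_recall(y_true, y_pred))
```

Sweeping the decision threshold of a scoring model and plotting the resulting (recall, precision) pairs gives the PR curve.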

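Relearning Linux:
http://study.163.com/course/courseLearn.htm?courseId=956005&from=study#/learn/video?lessonId=1004214244&courseId=956005
ls -alh  shows file sizes in human-readable units
ls -d   lists the directory itself rather than its contents
--help  prints a command's usage.
Copy and paste: middle mouse button
ll  = ls -l
date -s  sets the time
date  shows the time
Disabling the lock screen:  https://blog.csdn.net/super828/article/details/79342862
How scheduled power-on works: the BIOS starts the machine
Memory information: cat /proc/meminfo
Following a log live:
tail -f /var/log/messages  new lines appear as soon as the file changes.
cp  /root/*.txt  /opt  copies every .txt file in /root to /opt.
!$  expands to the last argument of the previous command
vim: / searches forward;  [number]gg  jumps to that line
Account information: /etc/passwd   passwords are in /etc/shadow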

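Group account information: /etc/group   group passwords are in /etc/gshadow
Editing /etc/passwd directly and setting a user's uid to 0 gives that user root privileges!
userdel -r [user]  removes the account together with its home directory; without -r only the account entries are removed.
Setting a user's password non-interactively: echo 123456|passwd --stdin [user]
Hashing: echo 123456|sha1sum
Even when two users have the same password, their entries in /etc/shadow still differ, because each hash is computed with its own random salt.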
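
Why identical passwords end up with different stored hashes can be sketched with a per-account random salt. This is only an illustration using PBKDF2 from Python's standard library; it is not the crypt scheme that /etc/shadow actually uses, and the function name `hash_password` and the iteration count are assumptions.

```python
import hashlib
import os


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every account
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, 100000)
    return salt, digest


salt_a, hash_a = hash_password('123456')
salt_b, hash_b = hash_password('123456')
print(hash_a != hash_b)                               # same password, different stored hashes
print(hash_password('123456', salt_a)[1] == hash_a)   # verification reuses the stored salt
```

The salt is stored next to the hash, so checking a password just repeats the computation with that salt.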

chmod changes permissions:  chmod u-w a.txt   chmod g+x a.txt   chmod o-r a.txt   chmod a=r a.txt (a = everyone)
ll -d test  shows the permissions of the directory itself
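
To connect the rwx letters shown by ll with numeric modes such as 755, Python's standard `stat.filemode` renders a mode the same way ls does. A short sketch; the two modes are arbitrary examples.

```python
import stat

# 0o755: rwx for the owner, r-x for the group and for others (regular file)
print(stat.filemode(stat.S_IFREG | 0o755))

# 0o644 on a directory: rw- for the owner, r-- for the group and for others
print(stat.filemode(stat.S_IFDIR | 0o644))
```

Each octal digit is the sum of r=4, w=2 and x=1 for the owner, the group and others, in that order.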

chown san:bin  a.txt   changes the owner (san) and the group (bin)

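When the owner has no w permission the file can still be saved from vim: append ! (for example :w!).
mount /dev/sr0 /mnt   mounts the optical disc
umount /mnt    unmounts it
rpm -ivh [package]
When the dependencies are complicated, use yum install [package] instead
Compression:
Text compresses well; images and videos tend to grow, because they are already compressed and compressing them again only adds overhead
file  shows a file's type
du [directory]  shows the size of a directory.
ps -aux
cat /proc/[file]  reads kernel and process parameters (for example /proc/cpuinfo)
kill -9 [PID]
killall
pstree
top -p [PID]   (-p selects by process id)
Finding the PID:   ps -aux |grep [process name]
nice -n 5 vi a.txt   (starts the process with a given priority)
renice -n 5 -p [PID]   (changes the priority of a running process)
grep -v 4 a.txt   -v inverts the match
Filtering out empty lines: grep -v '^$' a.txt
find /etc/ -name '*.txt'  finds files whose names match *.txt

find /etc/ -perm 755  finds by permission

find /etc/ -user [user]  finds by owner

find /etc/ -group [group]  finds by group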
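
For comparison, a rough Python counterpart of find with a name pattern, built on the standard `pathlib`. It is only a sketch: the helper name `find_by_name` is made up, and the demo runs in a temporary directory rather than /etc.

```python
import tempfile
from pathlib import Path


def find_by_name(root, pattern):
    """Recursively list the files under root whose names match a glob pattern."""
    return sorted(p for p in Path(root).rglob(pattern) if p.is_file())


# Build a tiny directory tree to search.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'sub').mkdir()
    (Path(tmp) / 'a.txt').write_text('x')
    (Path(tmp) / 'sub' / 'b.txt').write_text('y')
    (Path(tmp) / 'sub' / 'c.log').write_text('z')
    print([p.name for p in find_by_name(tmp, '*.txt')])
```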

C language: http://study.163.com/course/courseLearn.htm?courseId=1004489035#/learn/video?lessonId=1048926708&courseId=1004489035

IDE: Dev-C++ is the tool used here. First, how to set up debugging: http://tieba.baidu.com/p/3976904106
It turned out too awkward to use, so switch to VC6.0 instead
Bouncing ball:

//http://study.163.com/course/courseLearn.htm?courseId=1004489035&from=study#/learn/video?lessonId=1049009037&courseId=1004489035
// Bouncing ball 
#include<math.h>
#include<stdlib.h>
#include<windows.h>
#include<stdio.h>
int main(void)
{
   int i ,j ;
   int x=10;
   int y=20;
   int velocity=1;
   int velocity2=1;
   while (1){ 
   if (x>=10 ||x<=0)
     velocity*=-1; // reverse direction at the top or bottom boundary 
   if (y>=20 ||y<=0)
     velocity2*=-1; // reverse direction at the left or right boundary 
   x=x+velocity;
   y=y+velocity2;
   for (i=0;i<x;i++)
      printf("\n");
   for (j=0;j<y;j++)
      printf(" ");
   printf("o\n");
   Sleep(50);
   
   system("cls");// clears the screen and moves the cursor back to position 0,0 

   
   }
 } 
 
 
 
 
 
 
 
 
 
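 /*
Press F11 to compile and run
 
Default template to start from 
#include<math.h>
#include<stdio.h>
int main(void)
{
     int x=0;
     printf("%d",x);
     return 0;
 } 

Casts:
x=(float)x3; correct
x=float(x3); correct only when compiled as C++ (functional cast); plain C rejects it 
x5=(float)x5; compiles, but the declared type of x5 does not change, so the cast has no lasting effect 
Reading characters:
scanf("%c %d",&b,&a); // type for example m 10213 
scanf("%c,%d",&b,&a); // type for example m,321 
scanf("%c%d",&b,&a); // type for example m 1023, or m, Enter, 123 


for statement:
for (i=1;i<=n;i++)
  {s=s+i; 
  }

while statement:
while(i<=100)
{sum=sum+i;
 i++;
} 

if statement: 

if (x<0) y=-1;
else if (x==0) y=0;
else y=1;

switch statement: 
switch (month)
{
case 1: printf("January");break ;
case 2: printf("February");break ;
default:printf("Unknown month");break ;
}
*/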


Airplane game:

//http://study.163.com/course/courseLearn.htm?courseId=1004489035&from=study#/learn/video?lessonId=1049009037&courseId=1004489035
// Airplane game 
#include<math.h>
#include<stdlib.h>
#include<windows.h>
#include<conio.h>   // provides getch()
 
#include<stdio.h>
int main(void)
{
   int i ,j ;
   int x=5,y=10; // plane position; starting row 5 is an arbitrary choice 
   char input;
   char fired=0;
   int ny=15;// column of the target 
   int isKilled=0 ;
   while (1){ 
   system("cls");// clears the screen and moves the cursor back to position 0,0 
   
   
   
   // target
   if (isKilled==0) {
   for (i=0;i<ny;i++)
      printf(" ");
   printf("+\n");}
   
   
   if (fired==0){
   for (i=0;i<x;i++)
      printf("\n");
   for (j=0;j<y;j++)
      printf(" ");
    printf("\n");}
   
   else
       {
           
   
   
   for (i=0;i<x+1;i++) // when firing, draw one extra row 
      {
          for (j=0;j<y;j++)
          {printf(" ");}
      
        printf("|\n");
      }
      
           if(y==ny){isKilled=1;} 
 
       fired=0;}
       
       
       
       
   // the next few lines draw the plane 
   for (j=0;j<y;j++)
          printf(" ");
   printf("o\n");// \n moves the cursor back to the left edge 
   for (j=0;j<y-2;j++)
       printf(" ");
   printf("*****\n");
   for (j=0;j<y-2;j++)
       printf(" ");
   printf(" * * \n");
   // plane drawn 
   
   
   input=getch();// blocks like scanf, but the key is read without pressing Enter 
   if (input=='s')
      x++;
   if (input=='w')
      x--;
   if (input=='a')
      y--;
   if (input=='d')
      y++;
   if (input==' ')
      fired=1;
   }
 } 
 
 
 
 
 
 
 
 
 


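C basics:

int a; means auto int a; an automatic variable. If it is never assigned, its value is indeterminate (a garbage value), and its lifetime is limited to the enclosing braces.

static int a; declares a static variable. Without an initializer it is automatically set to 0.

Header files: create a file named t2.h, then write #include "t2.h" in the cpp file to pull it in.

Using macros: #define Pi 3.14159  
Function arguments are passed by value, so the callee cannot modify the caller's variables. An array argument is different: what gets passed is the address of its first element, so the callee can modify the caller's array.
Always initialize a pointer, for example int *p=NULL; otherwise it points to an arbitrary location, which is unsafe.
Two pointers can be subtracted to get the number of elements between them. Two pointers cannot be added, but an int can be added to a pointer.
An array name is an address constant: ++ is not allowed but +1 is. For example, with int a[10]; a cannot appear on the left of an assignment. a++ is really a+=1, so it is invalid, while a+1 assigns nothing and is fine.
With a pointer p=a, p is a variable and can appear on the left of an assignment, so both p++ and p+1 are valid.
malloc allocates memory dynamically and returns the address of the first byte
●An array whose elements are all pointers is called an array of pointers.
●Pointer to pointer: int **p;  p points to a pointer, which in turn points to an int. Use *p to get the inner pointer and **p to get the int.
In short, the symbols are easy to mix up and take practice to read fluently:
A * in a declaration defines a pointer:  int a=3;int*p=&a;  here the * says that p holds an address, so the statement stores the address of a in p
int *p;  p=&a ;   the second statement is not a declaration, so no * is written; the address is assigned to p directly.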

●Dynamically allocating a 2D array

#include<stdio.h>
#include<stdlib.h>
int main()
{
    int high,width,i,j;
    scanf("%d%d",&high,&width);  // read the height and width from the user

    // allocate the dynamic 2D array: one pointer per row, then each row
    int **canvas=(int**)malloc(high*sizeof(int*)); 
    for(i=0;i<high;i++)
        canvas[i]=(int*)malloc(width*sizeof(int));
    
    // canvas can now be used like an ordinary 2D array
    for (i=0;i<high;i++)
        for (j=0;j<width;j++)
            canvas[i][j] = i+j;
    for (i=0;i<high;i++)
    {
        for (j=0;j<width;j++)
            printf("%d ",canvas[i][j]);
        printf("\n");
    }
        
    // free the dynamic array when done: each row first, then the array of row pointers
    for(i=0; i<high; i++)
        free(canvas[i]);
    free(canvas);
    
    return 0;
}


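Assigning strings:

The following is wrong, because the array name str is a constant and cannot appear on the left of an assignment (use strcpy, or initialize at the declaration).

char str[20];
str="i love china";

The following is correct, because p is a variable and can appear on the left of an assignment.

char *p;
p="i love china";

char *p;

scanf("%s",p); is wrong, because p was never given any storage to point to.

Converting a digit to a character:

3+'0' yields the character '3' (in effect the ASCII code of '0' plus 3)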
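
The same digit-to-character arithmetic can be checked in Python, where `ord` and `chr` make the character codes explicit. A tiny sketch; the helper name `digit_to_char` is invented for illustration.

```python
def digit_to_char(d):
    """Convert a single digit 0..9 to its character by offsetting from the code of '0'."""
    if not 0 <= d <= 9:
        raise ValueError("d must be a single decimal digit")
    return chr(ord('0') + d)


print(digit_to_char(3))
print(ord('0'), ord('3'))  # the two codes differ by exactly 3
```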

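Structs: group data of different types together.

A struct can be assigned directly, and so can the elements of a struct array. For example (compiled as C++, where the struct name alone is a type):

#include<stdio.h>
struct s1{
char name[20];
char addr[40];
int id;
};
int main(void)
{

s1 first={"zhangsan","changzhou",3};

s1 student[30];
for (int i=0;i<30;i++) {
student[i]=first;}
printf("%s",student[10].name);
return 0;
}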

Linked list:

#include<stdlib.h>
#include<stdio.h>

struct node {
int val;
node * next;

};
int main(void)
{

node *p1, *p2;
p1 = (node *)malloc(sizeof(node));
p2 = (node *)malloc(sizeof(node));
(*p1).val = 1; // the parentheses are needed because . binds tighter than *
(*p2).val = 2;
(*p1).next = p2;
p2->next = NULL;

printf("%d", p1->next->val); // prints 2, the value stored in the second node
free(p1);
free(p2);
return 0;
}

http://study.163.com/course/courseLearn.htm?courseId=1005353018#/learn/video?lessonId=1052520444&courseId=1005353018

Windows tip: the Everything tool searches for files very quickly

Forgotten password:

Boot from a Dabaicai USB boot disk

AngularJS tutorial:

http://study.163.com/course/courseLearn.htm?courseId=1003290024#/learn/video?lessonId=1003746479&courseId=1003290024

<div> marks a block region; attributes set on it apply to everything inside. The name is short for division.

ng-app="" marks this block as managed by AngularJS.

ng-model="str" binds the data

ng-bind="str" marks where the bound value is displayed; the more advanced form is the template syntax {{ }}

The two most important points:

1. Angular expressions and plain JS do not share variables

2. During development you only need to keep your eye on the data.

ng-init="a=0;b=0"

Template for ng-repeat

<!DOCTYPE html>
<html ng-app="">
<head>
<meta charset="utf-8">
<title>ng-repeat demo</title>
<script src="https://cdn.bootcss.com/angular.js/1.4.6/angular.min.js"></script> 
</head>

<body>

<ul ng-init="users=[{name:'blue',age:18},{name:'張三',age:24}]">
    <li ng-repeat="user in users"> Name: {{user.name}} Age: {{user.age}}</li>
</ul>

</body>

</html>

http://study.163.com/course/courseLearn.htm?courseId=1003590022#/learn/video?lessonId=1004094558&courseId=1003590022

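Learning by playing: network protocols

ARP poisoning experiment:

1. In cmd, arp -a shows the locally cached IP-to-MAC address mappings

2. Poison with arpspoof so that both sides' MAC entries point at the attacker, and enable forwarding on the machine in the middle.

3. The man in the middle starts capturing packets: accounts and passwords are all captured.

4. One-way and two-way binding of IP and MAC addresses defend against the poisoning attack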

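Learning DNS:

DNS servers translate domain names into IP addresses. It is a typical client/server (CS) architecture.

Flow: in a DNS server's records, type A means an IPv4 address. A lookup starts at the root, the root refers you to the com servers, and those in turn refer you to the baidu.com server.

ipconfig /flushdns clears the Windows DNS cache.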
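
From a program, the lookup described above is a single call to the system resolver. A minimal sketch with Python's standard `socket` module; the helper name `resolve_ipv4` is an assumption, and the commented-out baidu.com call needs network access, so its result is not shown.

```python
import socket


def resolve_ipv4(hostname):
    """Return the IPv4 addresses (A records) the system resolver reports for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


print(resolve_ipv4('127.0.0.1'))    # a literal address needs no DNS query
# print(resolve_ipv4('baidu.com'))  # needs network access; the answer depends on your resolver
```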

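Defending against DNS attacks:

1. Free Wi-Fi: the hotspot assigns the DNS server itself and may tamper with DNS answers, so it is not safe. An attacker can also give a hotspot the same name as a free Wi-Fi network so that phones connect to it automatically (setting a password on it makes no difference).

So never use free Wi-Fi for Alipay or WeChat; use your own 4G data instead.

2. Configure the DNS server address by hand, so that others cannot tamper with the DNS service.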

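Weak passwords:

1. Keyboard patterns, your own name or birthday, ID number, student number.

2. A randomly generated password list can be used.

Cracking:

1. Check which ports are open

2. Try passwords ordered from most to least likely

3. Brute force

Tool: hydra, run here on Linux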

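Installing from source: the wget command downloads into whichever directory it is run from

Going with CentOS

http://mirrors.aliyun.com/centos/7/isos/x86_64/

Download

CentOS-7-x86_64-DVD-1804.iso  
During installation choose GNOME and tick all the add-on software on the right, to avoid installing it by hand later.

Checking the wireless network card:

linux : ifconfig or iwconfig (the latter gives more detail)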

http://study.163.com/course/courseLearn.htm?courseId=1004492024#/learn/video?lessonId=1048929449&courseId=1004492024

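cmd commands:

Tab completes; pressing Tab repeatedly cycles through the candidates

/? shows help for a command

net user lists users

net user dwl 2321 /add adds user dwl with password 2321

In cmd, %systemdrive% expands to the system drive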

Embedded Linux development

http://study.163.com/course/courseLearn.htm?courseId=1002965014#/learn/video?lessonId=1003417109&courseId=1002965014

ARM chips

Setting environment variables: edit the bash startup files, /etc/bashrc and /etc/profile

Once configured, the command works from any directory.

du -h /etc shows the sizes of the directory's contents

cat -n 2015.log shows line numbers.

Show several files together: cat -n 2015.log 2016.log

Write them out together: cat -n 2015.log 2016.log>log produces a merged file with line numbers.

ps -ef |grep sshd looks up a process

Viewing routes:

route -n

Deleting and adding routes

route del default gw [IP address]

route add default gw [IP address]

route add -net 192.168.0.0 netmask 255.255.255.0 gw 192.168.0.1 dev eth0

route del -net 192.168.0.0 netmask 255.255.255.0 gw 192.168.0.1 dev eth0

Services used in embedded work:

nfs: 1. dpkg -l|grep -i nfs  2. apt-get install nfs-kernel-server  3. start: service nfs-kernel-server restart

List running services: ps -ef

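2018-08-04, 10:03: working on AIOps (intelligent operations).
https://github.com/linjinjin123/awesome-AIOps#white-paper
There is nothing ready-made, so I can only find problems online and work through them myself

Never hate your enemy, it affects your judgment. (The Godfather Part III)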

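Assembly:

http://study.163.com/course/courseLearn.htm?courseId=1640004#/learn/video?lessonId=1962114&courseId=1640004

Windows 10 has no debug command; the workaround is here:

https://blog.csdn.net/lcr_happy/article/details/52491107

Instructions sit in memory as binary; we read them in hexadecimal. For example FFH means 255, and the trailing H marks the number as hexadecimal.

Data is the same: it also lives in memory.

The smallest memory unit is the byte = 2 hexadecimal digits, that is, 8 bits

The number of address lines determines how many addresses the CPU can reach: with N lines it reaches 2^N addresses

If a CPU can address 8 KB, how wide is its address bus? 2^n=8*1024=2^13, so n=13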
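
The address-width arithmetic above can be double-checked with integer bit lengths. A small sketch; the helper name `address_lines_needed` is made up, and it assumes one address per byte.

```python
def address_lines_needed(num_bytes):
    """Smallest n such that 2**n addresses cover num_bytes byte-sized cells."""
    return (num_bytes - 1).bit_length()


print(address_lines_needed(8 * 1024))  # the 8 KB case from the notes
print(address_lines_needed(2 ** 20))   # a 1 MB address space
```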

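1 KB holds 1024 Bytes and 1 Byte holds 8 bits. Watch for the letter e: Byte (with e) is the larger unit, bit (without e) the smaller.

A 1 KB memory has 1024 storage cells.

ROM: read-only; contents survive power loss

RAM: writable; contents are lost when power is lost

How registers hold numbers:

AX=AH+AL

BX=BH+BL

CX=CH+CL

DX=DH+DL

16 bits = high 8 bits + low 8 bits, so you can mov a full 16-bit value or just the high or low 8 bits.

Example in debug: -r, then -a and Enter, type mov ax,4E20 and Enter, Enter again, then -t, then -r: the value of ax has changed (all the Enter presses are tedious).

It is a bit messy: the instructions typed after -a are all recorded, however many there are, and each -t then executes the next one in order.

Finally, mov ah,al also works; registers of matching width can be moved to each other freely.

Example:

After the 5 instructions above run in order, the results are the 5 corresponding lines below. H marks a hexadecimal number; if it gets tedious, use a calculator in hexadecimal mode.

Just add the 2 hexadecimal numbers 7820.

mov ax,8226H

mov bx,ax

add bx,ax

The result is 044C: the carry out of the top bit exceeds the 16-bit range and is discarded. (When actually typing into debug, leave out the H.)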
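
The wrap-around in that last addition can be reproduced by masking to 16 bits. A sketch that imitates the three instructions in Python; the helper name `add16` is invented for illustration.

```python
def add16(a, b):
    """Add two values the way a 16-bit register does: the carry out of bit 15 is dropped."""
    return (a + b) & 0xFFFF


ax = 0x8226          # mov ax,8226H
bx = ax              # mov bx,ax
bx = add16(bx, ax)   # add bx,ax
print(format(bx, '04X'))
```

The full sum is 1044CH; masking with FFFFH keeps only the low four hexadecimal digits.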

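Physical address = segment address*16 (that is, times 10H) + offset address

For example, segment 1230H and offset C8H give 12300H+C8H=123C8H, a 5-digit hexadecimal number. Each hexadecimal digit is 4 bits, so 20 bits in total can be represented.

The 8086 CPU has a 20-bit address bus and forms addresses this way.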
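
The segment-and-offset formula above is easy to check numerically. A minimal sketch; the function name `physical_address` is an assumption, and the mask that wraps results to 20 bits is included to match the 20-bit address width described in the notes.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit physical address."""
    # Shifting left by 4 bits is the same as multiplying by 16 (10H).
    return ((segment << 4) + offset) & 0xFFFFF


print(format(physical_address(0x1230, 0x00C8), 'X'))
```

Different segment:offset pairs can name the same physical address, for example 1230:00C8 and 123C:0008.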

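How does the CPU tell instructions from data?

The CPU treats whatever is stored at the address formed by the two registers CS:IP as an instruction

Execution process: very important!

A byte is 2 hexadecimal digits; in octal it takes up to 3 digits, since each octal digit is 3 bits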

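From the book C Primer Plus:

1. A word is the natural unit of storage a computer was designed around. On an 8-bit machine a word is 8 bits; on today's 64-bit machines a word is 64 bits. A byte, by contrast, is 8 bits on practically every computer.

On a 32-bit machine an integer is stored in 32 bits, so a signed integer ranges from -2^31 to 2^31-1.

A leading 0 marks an octal number, for example 020 (decimal 16)

A leading 0x marks a hexadecimal number.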
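
The values behind these prefixes can be confirmed in Python by parsing the same digit strings with an explicit base. Note that Python itself spells octal literals 0o20 rather than 020; the example values are arbitrary.

```python
print(int('020', 8))     # C's 020, read as octal
print(int('0x20', 16))   # C's 0x20, read as hexadecimal
print(0o20 == 16, 0x20 == 32)
```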

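int cost=12.99 leaves cost=12 (the fractional part is discarded)

float pi=3.1415926 stores a float, which is accurate only to about the first 6 or 7 significant digits.

So when an initializer does not match the declared type, it is converted to that type automatically, but precision or data can be lost.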
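
Both conversions can be imitated in Python: `int()` truncates toward zero like the C conversion, and packing through the standard `struct` module rounds a value to single precision. A sketch only; the helper name `to_float32` is made up.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack('f', struct.pack('f', x))[0]


print(int(12.99))             # the fractional part is dropped, as with int cost=12.99
print(to_float32(3.1415926))  # close to the original, but not every digit survives
```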

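In general, with the %s conversion specifier scanf() reads only one word of the input, not a whole sentence.

The string constant "x" differs from the character constant 'x'. First, 'x' has a basic type (char) while "x" has a derived type (array of char). Second, "x" actually consists of two characters: 'x' and the null character \0 (see Figure 4.3 in the book).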

Installing Redis on Windows 10:

https://blog.csdn.net/thousa_ho/article/details/71279852

Following that post gets you in.

Run redis-cli.exe

Set a key: set myKey abc

Read it back: get myKey

mset a 30 b 20 c 10

mget a b c

rpush mylist A

rpush mylist B

lpush mylist first

lrange mylist 0 -1 reads the whole list

rpush mylist 1 2 3 4 5 "foo bar"

lrange mylist 0 -1

rpush mylist 1 2 3 4 5 "foo bar"

rpop mylist

hmset user:1000 username antirez birthyear 1977 verified 1 creates the hash user:1000

hget user:1000 username reads a field from the hash

hincrby user:1000 birthyear 10

sadd myset 1 2 3

smembers myset

AutoHotkey code for rapid clicking while the left mouse button is held: 2018-08-11, 13:16

$LButton::
Loop
{
  GetKeyState,State,LButton,P
  If (State="U")
  {
    Break
  }
  Else
  {
   
    Send {LButton}
    Sleep 50
  }
}
Return

Song: It's My Life
