Chapter 4 exercises; answers are not given for some problems.
1.
This one is fairly simple; anyone comfortable with high-school algebra should manage the derivation.
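For reference, the derivation being asked for (showing that the logistic function (4.2) and the odds form (4.3) are equivalent):

```latex
p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
\quad\Longrightarrow\quad
1 - p(X) = \frac{1}{1 + e^{\beta_0 + \beta_1 X}}
\quad\Longrightarrow\quad
\frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}
```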
2–3 are proof problems; omitted.
4.
(a)
This one puzzled me a little, since the official answer is simply written out. Isn't it just 10%?
(b)
The answer is (0.1 * 0.1) / (1 * 1), i.e. 1%.
(c)
It is really just the fraction of the space covered, so the answer is (0.1^100) * 100% = 0.1^98 %.
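A quick numeric check of (a)–(c), assuming the features are uniformly distributed on [0, 1] and edge effects are ignored, as in the problem:

```r
# Fraction of observations falling inside a hypercube of side 0.1
# in p dimensions, for X ~ Uniform[0,1]^p.
frac_used <- function(p) 0.1^p

frac_used(1)    # (a) 0.1, i.e. 10% of observations
frac_used(2)    # (b) 0.01, i.e. 1%
frac_used(100)  # (c) 10^-100, i.e. 10^-98 percent
```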
(d)
The answer here is obvious: the usable fraction shrinks exponentially as the dimension grows.
(e)
The side lengths are 0.1^(1/1), 0.1^(1/2), 0.1^(1/3), ..., 0.1^(1/100).
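Computing a few of these shows how quickly the required side length approaches 1:

```r
# Side length of a p-dimensional hypercube containing 10% of uniformly
# distributed observations: l^p = 0.1  =>  l = 0.1^(1/p)
side <- function(p) 0.1^(1 / p)
round(side(c(1, 2, 3, 100)), 3)  # 0.100 0.316 0.464 0.977
```

In 100 dimensions the cube must span almost the entire range of every feature just to capture 10% of the data.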
5.
This is explained quite clearly in the bias-variance trade-off discussion on page 104 of the Chinese edition.
(a)
When the Bayes decision boundary is linear, QDA naturally does better on the training set, since it fits the data more flexibly; on the test set LDA does better, since it is closer to the true boundary.
(b)
When the Bayes decision boundary is non-linear, QDA beats LDA on both the training set and the test set.
(c)
Relative to LDA, QDA's test prediction accuracy improves. As the sample size n grows, a model with more degrees of freedom performs better, because its extra variance is offset by the larger sample.
(d)
False. When the sample is small, QDA tends to overfit: its higher variance dominates.
6.
(a)
(b)
Again plug into the formula above and solve backwards for X1, which gives 50 hours.
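Part (a) is left unanswered above. Assuming the fitted coefficients given in the textbook exercise (β0 = −6, β1 = 0.05 per hour studied, β2 = 1 per GPA point), both parts can be checked numerically:

```r
b0 <- -6; b1 <- 0.05; b2 <- 1

# (a) probability of an A after 40 hours of study with a 3.5 GPA
logit <- b0 + b1 * 40 + b2 * 3.5          # = -0.5
p <- exp(logit) / (1 + exp(logit))
round(p, 4)                               # 0.3775

# (b) hours needed for a 50% chance with GPA 3.5: set the logit to 0
hours <- (0 - b0 - b2 * 3.5) / b1
hours                                     # 50
```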
7.
This is just Bayes' theorem plus formula 4-12 on page 97 of the Chinese edition. Somewhat tedious, but the final answer is 75.2%.
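The tedious arithmetic can be delegated to R, using the quantities given in the problem (prior P(dividend) = 0.8, class means 10 and 0, shared variance 36, observed X = 4):

```r
# Bayes' theorem with one-dimensional normal densities (formula 4-12)
prior_yes <- 0.8; prior_no <- 0.2
f_yes <- dnorm(4, mean = 10, sd = 6)   # density given a dividend
f_no  <- dnorm(4, mean = 0,  sd = 6)   # density given no dividend
post  <- prior_yes * f_yes / (prior_yes * f_yes + prior_no * f_no)
round(post, 3)  # 0.752
```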
8.
A written question. With K = 1, KNN's training error is 0%, so the stated 18% error averaged over the training and test sets implies a test error of 36%, which is worse than logistic regression's 30% test error. So of course we choose logistic regression.
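The arithmetic, spelled out (the 18% average and 30% logistic test error are the figures stated in the exercise):

```r
# K = 1 nearest neighbour classifies each training point by itself,
# so its training error is exactly 0%.
train_err <- 0
avg_err   <- 0.18                 # average of training and test error
test_err  <- 2 * avg_err - train_err
test_err  # 0.36, worse than logistic regression's 0.30 test error
```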
9.
See formula 4-3 on page 92; it is just plugging in. (a) Odds of 0.37 give a probability of 0.37/1.37, about 27%; (b) a probability of 0.16 gives odds of 0.16/0.84, about 0.19.
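Both conversions in one place:

```r
# (a) odds = 0.37: fraction defaulting is p = odds / (1 + odds)
p <- 0.37 / 1.37
round(p, 2)       # 0.27

# (b) p = 0.16: odds = p / (1 - p)
odds <- 0.16 / 0.84
round(odds, 2)    # 0.19
```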
10.
(a)
For the numerical and graphical summaries the problem asks for, three commands basically suffice: summary(), pairs(), and cor(). Note that pairs() runs quite slowly when there are many features, and the qualitative variables must be dropped before calling cor().
library(ISLR)
summary(Weekly)
pairs(Weekly)
cor(Weekly[, -9])
(b)
attach(Weekly)
glm.fit = glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
    data = Weekly, family = binomial)
summary(glm.fit)
(c)
glm.probs = predict(glm.fit, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
table(glm.pred, Direction)
(d)
train = (Year < 2009)
Weekly.0910 = Weekly[!train, ]
glm.fit = glm(Direction ~ Lag2, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.0910, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
Direction.0910 = Direction[!train]
table(glm.pred, Direction.0910)
mean(glm.pred == Direction.0910)
(e)
library(MASS)
lda.fit = lda(Direction ~ Lag2, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.0910)
table(lda.pred$class, Direction.0910)
mean(lda.pred$class == Direction.0910)
(f)
qda.fit = qda(Direction ~ Lag2, data = Weekly, subset = train)
qda.class = predict(qda.fit, Weekly.0910)$class
table(qda.class, Direction.0910)
mean(qda.class == Direction.0910)
(g)
library(class)
train.X = as.matrix(Lag2[train])
test.X = as.matrix(Lag2[!train])
train.Direction = Direction[train]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Direction, k = 1)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)
(h)
Logistic regression and LDA give identical accuracy, the best among the four methods.
(i)
# Logistic regression with Lag2:Lag1
glm.fit = glm(Direction ~ Lag2:Lag1, data = Weekly, family = binomial, subset = train)
glm.probs = predict(glm.fit, Weekly.0910, type = "response")
glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
Direction.0910 = Direction[!train]
table(glm.pred, Direction.0910)
mean(glm.pred == Direction.0910)
## [1] 0.5865

# LDA with Lag2 interaction with Lag1
lda.fit = lda(Direction ~ Lag2:Lag1, data = Weekly, subset = train)
lda.pred = predict(lda.fit, Weekly.0910)
mean(lda.pred$class == Direction.0910)
## [1] 0.5769

# QDA with sqrt(abs(Lag2))
qda.fit = qda(Direction ~ Lag2 + sqrt(abs(Lag2)), data = Weekly, subset = train)
qda.class = predict(qda.fit, Weekly.0910)$class
table(qda.class, Direction.0910)
mean(qda.class == Direction.0910)
## [1] 0.5769

# KNN k = 10
knn.pred = knn(train.X, test.X, train.Direction, k = 10)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)
## [1] 0.5769

# KNN k = 100
knn.pred = knn(train.X, test.X, train.Direction, k = 100)
table(knn.pred, Direction.0910)
mean(knn.pred == Direction.0910)
## [1] 0.5577
The results are in the code comments; logistic regression performs best.
11.
(a)
library(ISLR)
summary(Auto)
attach(Auto)
mpg01 = rep(0, length(mpg))
mpg01[mpg > median(mpg)] = 1
Auto = data.frame(Auto, mpg01)
(b)
cor(Auto[, -9])
pairs(Auto)
(c)
train = (year %% 2 == 0)  # even years form the training set
test = !train
Auto.train = Auto[train, ]
Auto.test = Auto[test, ]
mpg01.test = mpg01[test]
(d)
library(MASS)
lda.fit = lda(mpg01 ~ cylinders + weight + displacement + horsepower,
    data = Auto, subset = train)
lda.pred = predict(lda.fit, Auto.test)
mean(lda.pred$class != mpg01.test)
(e)
qda.fit = qda(mpg01 ~ cylinders + weight + displacement + horsepower,
    data = Auto, subset = train)
qda.pred = predict(qda.fit, Auto.test)
mean(qda.pred$class != mpg01.test)
(f)
glm.fit = glm(mpg01 ~ cylinders + weight + displacement + horsepower,
    data = Auto, family = binomial, subset = train)
glm.probs = predict(glm.fit, Auto.test, type = "response")
glm.pred = rep(0, length(glm.probs))
glm.pred[glm.probs > 0.5] = 1
mean(glm.pred != mpg01.test)
(g)
library(class)
train.X = cbind(cylinders, weight, displacement, horsepower)[train, ]
test.X = cbind(cylinders, weight, displacement, horsepower)[test, ]
train.mpg01 = mpg01[train]
set.seed(1)
# KNN (k = 1)
knn.pred = knn(train.X, test.X, train.mpg01, k = 1)
mean(knn.pred != mpg01.test)
# KNN (k = 10)
knn.pred = knn(train.X, test.X, train.mpg01, k = 10)
mean(knn.pred != mpg01.test)
# KNN (k = 100)
knn.pred = knn(train.X, test.X, train.mpg01, k = 100)
mean(knn.pred != mpg01.test)
Problem 13 is similar to problem 11 and uses the same functions, so it is omitted here.
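For the record, a minimal sketch of problem 13 following the same pattern (predict whether crim is above its median in the Boston data; the train/test split and predictor choices here are illustrative, not the only reasonable ones):

```r
library(MASS)   # Boston data

# Binary response: 1 if crime rate is above its median
crim01 <- as.numeric(Boston$crim > median(Boston$crim))
Boston2 <- data.frame(Boston, crim01)

# An arbitrary 50/50 train/test split
train <- 1:(nrow(Boston2) / 2)

glm.fit <- glm(crim01 ~ nox + rad + dis + age, data = Boston2,
               family = binomial, subset = train)
glm.probs <- predict(glm.fit, Boston2[-train, ], type = "response")
glm.pred <- as.numeric(glm.probs > 0.5)
mean(glm.pred != crim01[-train])  # test error rate
```

LDA, QDA, and KNN fits then follow exactly as in problem 11.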
12.
(a)~(b)
Power = function() {
    2^3
}
print(Power())
Power2 = function(x, a) {
    x^a
}
Power2(3, 8)
(c)
Power2(10, 3)
Power2(8, 17)
Power2(131, 3)
(d)~(f)
Power3 = function(x, a) {
    result = x^a
    return(result)
}
x = 1:10
plot(x, Power3(x, 2), log = "xy", ylab = "Log of y = x^2",
    xlab = "Log of x", main = "Log of x^2 versus Log of x")
PlotPower = function(x, a) {
    plot(x, Power3(x, a))
}
PlotPower(1:10, 3)