1. Question 1

You are building a 3-class object classification and localization algorithm. The classes are: pedestrian (c=1), car (c=2), motorcycle (c=3). What would be the label for the following image? Recall $y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$.

2. Question 2

Continuing from the previous problem, what should y be for the image below? Remember that "?" means "don't care", which means that the neural network loss function won't care what the neural network gives for that component of the output. As before, $y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3]$.

3. Question 3

You are working on a factory automation task. Your system will see a soft-drink can coming down a conveyor belt, and you want it to take a picture and decide whether (i) there is a soft-drink can in the image, and if so (ii) its bounding box. Since the soft-drink can is round, the bounding box is always square, and the soft-drink can always appears at the same size in the image. There is at most one soft-drink can in each image. Here are some typical images in your training set:

What is the most appropriate set of output units for your neural network?

Logistic unit (for classifying if there is a soft-drink can in the image)

Logistic unit, $b_x$ and $b_y$ √

Logistic unit, $b_x$, $b_y$, $b_h$ (since $b_w = b_h$)

Logistic unit, $b_x$, $b_y$, $b_h$, $b_w$

4. Question 4

If you build a neural network that inputs a picture of a person’s face and outputs N landmarks on the face (assume the input image always contains exactly one face), how many output units will the network have?

N

2N √

3N

$N^2$

5. Question 5

When training one of the object detection systems described in lecture, you need a training set that contains many pictures of the object(s) you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself.

True

False √

6. Question 6

Suppose you are applying a sliding windows classifier (non-convolutional implementation). Increasing the stride would tend to increase accuracy, but decrease computational cost.

True

False √

7. Question 7

In the YOLO algorithm, at training time, only one cell (the one containing the center/midpoint of an object) is responsible for detecting this object.

True √

False

8. Question 8

What is the IoU between these two boxes? The upper-left box is 2x2, and the lower-right box is 2x3. The overlapping region is 1x1.

1/6

1/9 √

1/10

None of the above

9. Question 9

Suppose you run non-max suppression on the predicted boxes above. The parameters you use for non-max suppression are that boxes with probability $\leq 0.4$ are discarded, and the IoU threshold for deciding if two boxes overlap is 0.5. How many boxes will remain after non-max suppression?

3

4

5 √

6

7

10. Question 10

Suppose you are using YOLO on a 19x19 grid, on a detection problem with 20 classes, and with 5 anchor boxes. During training, for each image you will need to construct an output volume $y$. What is the dimension of this output volume?

19x19x(5x25) √

19x19x(25x20)

19x19x(5x20)

19x19x(20x25)

-----------------------------------------------------------Chinese version-------------------------------------------------------------------------

Chinese version excerpted from: https://blog.csdn.net/u013733326/article/details/80306093

Detection algorithms

You are building an algorithm that recognizes and localizes three classes of objects: pedestrian (c=1), car (c=2), and motorcycle (c=3). Which of the following labels is correct for the image below? Note: $y = [p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c_{1}, c_{2}, c_{3}]$
- 【★】 y=[1, 0.3, 0.7, 0.3, 0.3, 0, 1, 0]
- 【】 y=[1, 0.7, 0.5, 0.3, 0.3, 0, 1, 0]
- 【】 y=[1, 0.3, 0.7, 0.5, 0.5, 0, 1, 0]
- 【】 y=[1, 0.3, 0.7, 0.5, 0.5, 1, 0, 0]
- 【】 y=[0, 0.2, 0.4, 0.5, 0.5, 0, 1, 0]
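Continuing from the previous question, what is the value of y for the image below? Note: "?" means "don't care", meaning the neural network's loss function ignores whatever the network outputs for that component. As above, $y = [p_{c}, b_{x}, b_{y}, b_{h}, b_{w}, c_{1}, c_{2}, c_{3}]$
- 【】 y=[1, ?, ?, ?, ?, 0, 0, 0]
- 【★】 y=[0, ?, ?, ?, ?, ?, ?, ?]
- 【】 y=[?, ?, ?, ?, ?, ?, ?, ?]
- 【】 y=[0, ?, ?, ?, ?, 0, 0, 0]
- 【】 y=[1, ?, ?, ?, ?, ?, ?, ?]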
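The two label questions above can be made concrete with a small sketch. It assumes a hypothetical helper `make_label` that represents each "don't care" entry as `None`, plus a hypothetical `loss_mask` marking which components a loss would look at. Neither name comes from the course; they are only for illustration.

```python
from typing import List, Optional, Tuple

CLASSES = ["pedestrian", "car", "motorcycle"]  # c=1, c=2, c=3


def make_label(box: Optional[Tuple[float, float, float, float]] = None,
               class_name: Optional[str] = None) -> List[Optional[float]]:
    """Build y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3].

    box is (b_x, b_y, b_h, b_w). If box is None there is no object,
    so p_c = 0 and the other seven entries are "don't care" (None here).
    """
    if box is None:
        return [0] + [None] * 7
    one_hot = [1 if name == class_name else 0 for name in CLASSES]
    return [1, *box, *one_hot]


def loss_mask(y: List[Optional[float]]) -> List[int]:
    """1 where the loss should use the component, 0 where it is ignored."""
    return [0 if value is None else 1 for value in y]


car = make_label(box=(0.3, 0.7, 0.3, 0.3), class_name="car")
empty = make_label()
print(car)
print(empty)
print(loss_mask(empty))
```

When no object is present, only `p_c` contributes to the loss, which is why the remaining seven entries can hold anything.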
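You are working on a factory automation task. Your system will see a soft-drink can coming down a conveyor belt, and you want it to take a picture and decide whether there is a soft-drink can in the image and, if so, its bounding box. Because the can is round, the bounding box is always square, and every can appears at the same size in the image. There is at most one can in each image. Which of the following sets of output units is most appropriate? Here are some typical training-set images:
- 【】 Logistic unit (for classifying whether there is a can in the image)
- 【★】 Logistic unit, $b_{x}$, $b_{y}$
- 【】 Logistic unit, $b_{x}$, $b_{y}$, $b_{h}$ (since $b_{w} = b_{h}$)
- 【】 Logistic unit, $b_{x}$, $b_{y}$, $b_{h}$, $b_{w}$
Blogger's note: since every can has a fixed size, we only need to know its center position.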
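If you build a neural network that takes a picture of a face as input and outputs N landmarks (assume the image contains exactly one face), how many output units will the network have?
- 【】 N
- 【★】 2N
- 【】 3N
- 【】 $N^{2}$
Blogger's note: the image is two-dimensional, so a position is specified as (x, y); each landmark therefore needs two output units.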
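A short sketch of why the count is 2N: the target vector is the N landmark positions laid out one coordinate after another. The landmark coordinates are made-up values, and `flatten_landmarks` is a hypothetical helper name.

```python
def flatten_landmarks(landmarks):
    """Turn [(x1, y1), ..., (xN, yN)] into [x1, y1, ..., xN, yN]."""
    flat = []
    for x, y in landmarks:
        flat.extend([x, y])
    return flat


# Hypothetical normalized positions: left eye, right eye, nose tip (N = 3).
points = [(0.31, 0.40), (0.69, 0.41), (0.50, 0.62)]
target = flatten_landmarks(points)
print(len(points), len(target))
```

With N = 3 landmarks the target has 6 entries, one output unit per coordinate.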
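When training one of the object detection systems described in the videos, you need a training set containing many pictures of the object(s) you wish to detect. However, bounding boxes do not need to be provided in the training set, since the algorithm can learn to detect the objects by itself. Is this statement correct?
- 【】 True
- 【★】 False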
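Suppose you are applying a sliding windows classifier (non-convolutional implementation). Increasing the stride would both increase accuracy and decrease computational cost.
- 【】 True
- 【★】 False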
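To see the cost side of this trade-off, the sketch below counts how many windows a non-convolutional sliding-window classifier must evaluate, each one a separate forward pass. The image size, window size, and the helper name `count_windows` are assumptions chosen for illustration.

```python
def count_windows(image_size, window_size, stride):
    """Number of window positions for a square image and a square window."""
    per_axis = (image_size - window_size) // stride + 1
    return per_axis * per_axis


# Hypothetical 64x64 image scanned with a 16x16 window.
for stride in (1, 2, 4, 8):
    print(stride, count_windows(image_size=64, window_size=16, stride=stride))
```

A larger stride means far fewer windows, so the cost drops. The windows are also placed more coarsely, which tends to hurt localization accuracy rather than improve it.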
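In the YOLO algorithm, at training time, only one cell, the one containing the center/midpoint of an object, is responsible for detecting that object.
- 【★】 True
- 【】 False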
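A minimal sketch of how the responsible cell could be found, assuming the box midpoint is given in image coordinates normalized to the range 0 to 1. The function name `responsible_cell` is hypothetical.

```python
def responsible_cell(b_x, b_y, grid_size):
    """Return (row, col) of the grid cell that contains the box midpoint.

    b_x and b_y are the midpoint in normalized image coordinates.
    A value of exactly 1.0 is clamped into the last cell.
    """
    col = min(int(b_x * grid_size), grid_size - 1)
    row = min(int(b_y * grid_size), grid_size - 1)
    return row, col


print(responsible_cell(0.3, 0.7, grid_size=19))
```

Only that one cell gets $p_c = 1$ for the object in the training target, even if the box extends over several neighbouring cells.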
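What is the IoU between these two boxes? The upper-left box is 2x2, the lower-right box is 2x3, and the overlapping region is 1x1.
- 【】 1/6
- 【★】 1/9
- 【】 1/10
- 【】 None of the above
Blogger's note: $\frac{1 \times 1}{2 \times 2 + 2 \times 3 - 1 \times 1} = \frac{1}{9}$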
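The same calculation can be checked with a short IoU sketch. The figure is not available here, so the corner coordinates are an assumption, chosen to give a 2x2 box and a box 2 wide by 3 tall that overlap in a 1x1 region.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped at 0 so disjoint boxes give no overlap.
    intersection = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union


upper_left = (0, 0, 2, 2)   # 2x2 box (assumed position)
lower_right = (1, 1, 3, 4)  # 2 wide, 3 tall (assumed position)
print(iou(upper_left, lower_right))
```

The union counts the overlap only once, which is why 1 is subtracted from 4 + 6.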
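Suppose you run non-max suppression on the predicted boxes in the image below, discarding boxes with probability ≤ 0.4 and using an IoU threshold of 0.5 to decide whether two boxes overlap. How many boxes will remain after non-max suppression?
- 【】 3
- 【】 4
- 【★】 5
- 【】 6
- 【】 7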
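The procedure in this question can be sketched as a greedy non-max suppression for a single class. The quiz image is not available here, so the detections are made-up values and do not reproduce the quiz answer. With several classes, the same routine would be run once per class.

```python
def iou(a, b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def non_max_suppression(detections, score_threshold=0.4, iou_threshold=0.5):
    """detections: list of (score, box) for ONE class; returns the kept ones."""
    # Step 1: discard boxes whose probability is <= the score threshold.
    candidates = [d for d in detections if d[0] > score_threshold]
    # Step 2: visit boxes from most to least confident; keep a box only if
    # it does not overlap (IoU >= threshold) any box that was already kept.
    candidates.sort(key=lambda d: d[0], reverse=True)
    kept = []
    for score, box in candidates:
        if all(iou(box, kept_box) < iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


# Hypothetical detections, not the boxes from the quiz image.
detections = [
    (0.9, (0, 0, 10, 10)),
    (0.6, (1, 1, 11, 11)),    # heavy overlap with the 0.9 box
    (0.3, (50, 50, 60, 60)),  # below the probability threshold
    (0.7, (20, 20, 30, 30)),  # separate object
]
kept = non_max_suppression(detections)
print([score for score, _ in kept])
```

In this made-up example two boxes survive: the 0.6 box is suppressed by the 0.9 box, and the 0.3 box is dropped in the first step.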
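Suppose you are using YOLO on a 19x19 grid, on a detection problem with 20 classes and 5 anchor boxes. During training, for each image you need to construct the output volume $y$ (the result after the convolutional layers). What is its shape?
- 【】 19x19x(25x20)
- 【】 19x19x(20x25)
- 【★】 19x19x(5x25)
- 【】 19x19x(5x20)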
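The depth of the volume follows from counting the entries per anchor box: one $p_c$, four box coordinates, and one score per class. A small sketch, with `yolo_output_shape` as a hypothetical helper name:

```python
def yolo_output_shape(grid_size, num_anchors, num_classes):
    """Shape of the YOLO target volume for one image."""
    per_anchor = 1 + 4 + num_classes  # p_c, (b_x, b_y, b_h, b_w), class scores
    return (grid_size, grid_size, num_anchors * per_anchor)


print(yolo_output_shape(grid_size=19, num_anchors=5, num_classes=20))
```

With 20 classes each anchor needs 25 numbers, so 5 anchors give a depth of 5x25 = 125.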

Andrew Ng Deep Learning notes: course4 week3 quiz
