[Optimization] Greedy method

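The greedy method can be seen as a prerequisite for dynamic programming.

Optimization problems on graphs also make heavy use of the greedy method.

It is also often listed among the "five common algorithm design methods":

1. Divide and conquer

2. Dynamic programming

3. Greedy method

4. Backtracking

5. Branch and bound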

 

 

Greedily watching the boundary

Given two sequences of letters A and B, find if B is a subsequence of A in the
sense that one can delete some letters from A and obtain the sequence B.

The idea is that greedy always stays ahead.

Ref: https://www.geeksforgeeks.org/given-two-strings-find-first-string-subsequence-second/

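Move a pointer along A looking for the first character of B; once it matches, the two pointers advance together.

On a mismatch, only the pointer on A moves; everything already matched stays valid for what follows.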
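
The two-pointer scan described above can be sketched as follows. This is a minimal illustration; the function name `is_subsequence` is a name chosen here, not something from the referenced article.

```python
def is_subsequence(a: str, b: str) -> bool:
    """Return True if b can be obtained from a by deleting letters."""
    j = 0  # index of the next letter of b that still has to be matched
    for ch in a:
        # Greedy: match the next letter of b at its earliest occurrence in a.
        if j < len(b) and ch == b[j]:
            j += 1
    return j == len(b)
```

Matching each letter of B as early as possible never hurts, because it leaves the longest possible remainder of A for the rest of B.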

 

 

Covering a "range"

Avoid the largest "useless gaps"

There is a line of 111 stalls, some of which need to be covered with boards.
You can use up to 11 boards, each of which may cover any number of
consecutive stalls.

Cover all the necessary stalls, while covering as few total stalls as possible

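Start with one big board covering everything, then keep cutting out the largest gap until the board has been split into the allowed 11 pieces.

In essence: sort the gaps and eliminate the largest ones first (this is where the greed lies).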

Ref: https://projectalgorithm.wordpress.com/2011/04/25/greedier-than-you/
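
A short sketch of the gap-removal idea, assuming the stalls that need covering are given as a list of integer positions. The function name and parameters are illustrative only.

```python
def min_covered_stalls(stalls, max_boards):
    """Fewest stalls covered when all given stalls are covered with at most max_boards boards."""
    positions = sorted(set(stalls))
    if not positions:
        return 0
    # One board over everything covers the whole span.
    span = positions[-1] - positions[0] + 1
    # Gaps of empty stalls between consecutive stalls that need a board.
    gaps = sorted((b - a - 1 for a, b in zip(positions, positions[1:])), reverse=True)
    # Each extra board lets us cut out one gap; cut the largest ones first.
    return span - sum(gaps[:max_boards - 1])
```

With `max_boards` boards at most `max_boards - 1` gaps can be removed, so removing the largest ones is optimal.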

  

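The "complete interval cover" problem

Given an interval of length m and n segments with their start and end points (closed intervals here),

what is the minimum number of segments needed to cover the whole interval?

Example: interval length 8, candidate segments [2,6], [1,4], [3,6], [3,7], [6,8], [2,4], [3,5].

Ref: 區間覆蓋問題, section "complete interval cover"

First sort the segments by starting point, which gives [1,4], [2,4], [2,6], [3,5], [3,6], [3,7], [6,8] (ties in any order).

Prove:

To use as few segments as possible, each chosen segment should reach as far as possible; whatever lies before the region already covered no longer matters.

So the greedy strategy is: among the segments whose head is at or before the previous tail, pick the one that extends furthest to the right.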
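
The strategy can be sketched as below. The target interval is passed explicitly as `[lo, hi]` and coverage is treated as continuous; both are assumptions made for this sketch (for the example above, the target is taken to be [1, 8]).

```python
def min_segments_to_cover(segments, lo, hi):
    """Fewest segments whose union covers [lo, hi], or None if impossible."""
    ordered = sorted(segments)  # by starting point
    count, covered, i = 0, lo, 0
    while covered < hi:
        best = covered
        # Among segments starting inside the covered part, keep the furthest reach.
        while i < len(ordered) and ordered[i][0] <= covered:
            best = max(best, ordered[i][1])
            i += 1
        if best == covered:
            return None  # nothing extends the cover: there is a hole
        covered = best
        count += 1
    return count
```

On the example this picks [1,4], then [3,7], then [6,8].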

 

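Maximum "disjoint cover"

Given an interval of length m and n segments with their start and end points (open and closed intervals are handled differently; open intervals are used here),

select as many segments as possible so that every selected segment is independent, i.e. it does not intersect any other selected segment.

For example:

interval length 8, candidate segments [2,6], [1,4], [3,6], [3,7], [6,8], [2,4], [3,5]; the selected ones must not intersect.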

Ref: https://blog.csdn.net/chenguolinblog/article/details/7882316 

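Sort the segments by endpoint in ascending order. Among segments that share the same right endpoint (there may be just one), consider the one with the largest left endpoint; add it if it has no part in common with the segments already chosen, otherwise move on to the following segments.

  • Sort:

Sorting the segments by right endpoint in increasing order gives [1,4], [2,4], [3,5], [2,6], [3,6], [3,7], [6,8].

  • Run:

The first pick is [2,4]; after that only [6,8] can be added, so the number of segments is 2.

  • Why greedy works:

We want as many independent segments as possible, so each chosen segment should be as small as possible.

For a fixed right endpoint, the larger the left endpoint, the shorter the segment.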
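
A minimal sketch of this selection, treating the segments as open intervals so that segments which only touch at an endpoint count as disjoint. The function name is illustrative.

```python
def max_disjoint_segments(segments):
    """Greedy selection of pairwise disjoint open intervals, by right endpoint."""
    chosen = []
    last_end = float("-inf")
    # Sort by right endpoint; for equal right endpoints prefer the larger left endpoint.
    for left, right in sorted(segments, key=lambda s: (s[1], -s[0])):
        if left >= last_end:  # no overlap with the last chosen segment
            chosen.append((left, right))
            last_end = right
    return chosen
```

On the example it returns [2,4] and [6,8], matching the run above.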

 

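Interval point selection

Use the fewest nails to pierce all the wooden strips.

Which quantity should the greedy choice look at?

    • The position piercing the most strips first (greedy fails)
    • The strip that ends earliest first (greedy works) ----> the strip that ends earliest has the least chance of being pierced by a later nail, while strips that end later can safely wait. This is where the greed lies.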
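
A sketch of the "earliest end first" rule, assuming each strip is a closed interval `(left, right)` and a nail is a single coordinate. The names are illustrative.

```python
def min_nails(strips):
    """Positions of a smallest set of nails such that every closed interval contains one."""
    nails = []
    for left, right in sorted(strips, key=lambda s: s[1]):
        # If the most recent nail does not pierce this strip, no nail does:
        # earlier nails lie even further to the left.
        if not nails or nails[-1] < left:
            nails.append(right)  # drive the nail as far right as this strip allows
    return nails
```

Placing the nail at the right end of the strip that ends earliest pierces that strip and as many later strips as possible.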

 

 

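Covering a "count" rather than a "range"

Leave more room for whatever comes later

Find a maximum size subset of compatible activities.

Find the maximum number of activities that fit in a given period of time (the "maximum disjoint cover" problem).

Pick the activity that finishes earliest first: this keeps the most free time, which is what makes a larger number of activities possible. This is where the greed lies.

Transform any optimal solution into the greedy solution without changing the number of
activities;

this proves that the greedy solution is also optimal.

Proof

Greedy exchange, i.e. show that the result obtained by greedy is never worse.

Extended

If the goal changes from "the maximum number of activities" to "the longest total activity time", and the activities have different durations, then greedy fails and dynamic programming is needed.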
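
For the extended variant, a dynamic programming sketch in the style of weighted interval scheduling is shown below, with each activity weighted by its own duration. Activities are assumed to be half-open `(start, end)` pairs; the function name is illustrative.

```python
from bisect import bisect_right

def max_total_duration(activities):
    """Largest total duration of pairwise compatible activities (start, end)."""
    acts = sorted(activities, key=lambda a: a[1])  # by finish time
    ends = [end for _, end in acts]
    best = [0] * (len(acts) + 1)  # best[i]: optimum using only the first i activities
    for i, (start, end) in enumerate(acts):
        # p = how many of the first i activities finish no later than this one starts.
        p = bisect_right(ends, start, 0, i)
        best[i + 1] = max(best[i], best[p] + (end - start))
    return best[-1]
```

For the activities (0,3), (2,10), (9,12), earliest-finish greedy would take (0,3) and (9,12) for a total of 6, while the dynamic program finds 8 by taking (2,10) alone.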

 

 

There are N robbers who have stolen N items. You would like to distribute the items
amongst the robbers (one item per robber). You know the precise value of each item.
Each robber has a particular range of values they want their item to be worth

(too cheap and they will not have made money, too expensive and they will draw a lot of attention).

Devise an algorithm that can distribute the items so each robber is happy or determines that there is no such distribution.

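From: 從零開始學貪心算法

A variation on the "maximum number of activities" problem.

If we always pick the activity that starts earliest, we do not get an optimal solution: for example, with activities [0,10), [1,2), [3,4), the earliest start [0,10) blocks the other two.

If we always pick the activity with the shortest duration, we do not get an optimal solution either: for example, with activities [0,5), [4,6), [5,10), the shortest one [4,6) blocks the other two.

(Where the greed lies)

It can be shown by induction that the greedy strategy should be to pick the activity that finishes earliest every time.

This is also intuitive: choosing compatible activities this way leaves as much time as possible for the activities not yet scheduled. It is also the reason for sorting the activities by increasing finish time.

Sol:

Sort the item values v in ascending order and go through the values one by one:

The robbers j whose lower bound is at most v (to the left of the current v) and whose upper bound is at least v (taken for granted here) form the candidate set.

Give the item to the candidate whose upper bound is the smallest.

For example, when assigning v3 with candidates j3 and j4: j4 has the larger tail, so v3 goes to j3.

The underlying reasoning:

    • Only the tail matters, not the head: however early the head is, it seems to make no difference to later assignments.
    • Question: can j1 and j2 be swapped?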
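
The assignment rule can be sketched with a heap keyed on the upper bound. It is assumed here that there are as many items as robbers and that each robber's range is a closed pair `(low, high)`; the function name and the returned mapping are choices made for this sketch.

```python
import heapq

def distribute(values, ranges):
    """Map robber index -> item index so every item value lies in its robber's range, or None."""
    robbers_by_low = sorted(range(len(ranges)), key=lambda j: ranges[j][0])
    items_by_value = sorted(range(len(values)), key=lambda i: values[i])
    candidates = []  # heap of (high, robber) for robbers whose low bound is already reached
    k = 0
    assignment = {}
    for i in items_by_value:
        v = values[i]
        while k < len(robbers_by_low) and ranges[robbers_by_low[k]][0] <= v:
            j = robbers_by_low[k]
            heapq.heappush(candidates, (ranges[j][1], j))
            k += 1
        if not candidates:
            return None  # nobody can take this item
        high, j = heapq.heappop(candidates)  # candidate with the smallest upper bound
        if high < v:
            return None  # this robber can no longer be satisfied by any remaining item
        assignment[j] = i
    return assignment
```

Only the upper bound decides who is served first, which mirrors the remark above that the tail matters and the head does not.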

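Prove

"Cut-and-paste" arguments.

Removing one inversion never makes things worse: once the two are put back in order, the one handled later is the one with the later deadline, so the arrangement is all the more likely to remain valid.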

 

 

Schedule all the jobs so that the lateness of the job with the largest lateness is minimised.

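Minimizing the maximum lateness

Looking only at the deadlines is where the greed lies.

Proof (exchange argument):

The key step is to show that removing one inversion from a schedule does not make the maximum lateness any worse.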
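
A sketch of the earliest-deadline-first schedule, assuming each job is a `(duration, deadline)` pair, all jobs are available at time 0, and lateness is counted as 0 when a job finishes on time.

```python
def min_max_lateness(jobs):
    """Maximum lateness when jobs (duration, deadline) run in order of deadline."""
    finish = 0
    worst = 0
    for duration, deadline in sorted(jobs, key=lambda job: job[1]):
        finish += duration  # jobs run back to back, starting at time 0
        worst = max(worst, finish - deadline)
    return worst
```

The order ignores the durations entirely, which is the point of the note above: only the deadlines matter.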

 

 

Along the long, straight road from Loololong to Goolagong houses are
scattered quite sparsely, sometimes with long gaps between two
consecutive houses. Telstra must provide mobile phone service to people
who live alongside the road, and the range of Telstra's cell base station is
5km.

Design an algorithm for placing the minimal number of base stations alongside the road, that is sufficient to cover all houses.

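A thought: sweeping left to right or right to left are both greedy and give the same minimum number, but the positions of the stations differ.

This resembles the earlier exercise of covering stalls with boards, and also the "nails" problem. Going left to right, the focus is on covering the leftmost uncovered house, which is where the greed lies.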
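
A left-to-right sketch, assuming house positions are given in km along the road and a station at position s covers every house within `reach` km of s.

```python
def place_stations(houses, reach=5):
    """Station positions covering all houses, scanning left to right."""
    stations = []
    covered_to = None  # rightmost position covered so far
    for house in sorted(houses):
        if covered_to is None or house > covered_to:
            # Leftmost uncovered house: push the station as far right as possible.
            stations.append(house + reach)
            covered_to = house + 2 * reach
    return stations
```

Scanning right to left instead would give the same number of stations but generally different positions, as noted above.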

 

 

 

The hidden burden of whatever is processed first

Assume you are given n sorted arrays of different sizes. You are allowed
to merge any two arrays into a single new sorted array and proceed in
this manner until only one array is left.

Design an algorithm that achieves this task and uses minimal total number of moves of elements of
the arrays. Give an informal justification why your algorithm is optimal.

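This is like the merge step of merge sort, and it follows the same principle as Huffman coding.

Merge the smallest blocks first (each merge is an ordered merge).

A block produced by a merge takes part in later merges, so the larger the blocks merged early on, the more of a burden they become in the later moves.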
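
A sketch of the Huffman-style order, under the assumption that merging two arrays of sizes a and b moves a + b elements. Only the sizes are tracked here, not the array contents.

```python
import heapq

def min_merge_moves(sizes):
    """Total element moves when the two smallest arrays are always merged first."""
    heap = list(sizes)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b  # every element of both arrays is moved once
        heapq.heappush(heap, a + b)
    return total
```

For sizes 10, 20, 30 this merges 10 with 20 first (30 moves) and then the result with 30 (60 moves), for 90 in total; merging 20 with 30 first would cost 110.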

 

 

Greedy by "importance density"

A list of n files fi of lengths li which have to be stored on a tape.
Each file is equally likely to be needed. To retrieve a file, one must start from
the beginning of the tape and scan it until the file is found and read.

Order the files on the tape so that the average (expected) retrieval time
is minimised.

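Putting the shortest files first minimizes the average retrieval time; this is the greedy choice.

If the access probabilities p are no longer uniform, compare the ratios p/l instead: suppose two adjacent files are swapped and compare the expected retrieval time E before and after the swap.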
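
The swap argument can be written out as follows. The notation is assumed for this sketch: p_i is the probability that file i is requested, l_i its length, and L the total length of the files stored before the adjacent pair i, j.

```latex
% Files i and j are adjacent; only their two terms of E change when they are swapped.
\begin{align*}
E_{i,j} &= \dots + p_i (L + l_i) + p_j (L + l_i + l_j) + \dots \\
E_{j,i} &= \dots + p_j (L + l_j) + p_i (L + l_j + l_i) + \dots \\
E_{i,j} - E_{j,i} &= p_j l_i - p_i l_j
\end{align*}
% Keeping i before j is at least as good exactly when
% p_j l_i \le p_i l_j, that is, when p_i / l_i \ge p_j / l_j.
```

With uniform probabilities the condition reduces to l_i ≤ l_j, i.e. shortest file first.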

 

 

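A single machine executes the assigned jobs one at a time (single-threaded), and the jobs differ in importance.

Minimize the total of "importance × completion time".

Greedy by "importance density": schedule jobs in decreasing order of the ratio ri = wi / ti (importance / job duration).

Prove:

Assume there is a better schedule, then remove its inversions one by one and see how the total changes.

Similar to the tape storage problem.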
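
A sketch of the density rule, assuming each job is a `(weight, duration)` pair with positive duration and that the cost of a schedule is the sum of weight × completion time.

```python
def schedule_by_density(jobs):
    """Order jobs (weight, duration) by decreasing weight/duration; return (order, cost)."""
    order = sorted(jobs, key=lambda job: job[0] / job[1], reverse=True)
    finish = 0
    cost = 0
    for weight, duration in order:
        finish += duration
        cost += weight * finish  # weighted completion time of this job
    return order, cost
```

For the jobs (1, 3) and (10, 2) the density order runs (10, 2) first at cost 10·2 + 1·5 = 25; the opposite order costs 1·3 + 10·5 = 53.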

 

 

Minimum spanning tree

You are given a connected graph with weighted edges. Find a spanning tree

such that the largest weight of all of its edges is as small as possible.

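Compute a minimum spanning tree of the weighted connected graph; a minimum spanning tree also minimizes the largest edge weight.

Why is it optimal?

    • Kruskal's algorithm: sort the edges by cost. Edges play the leading role, so it suits sparse graphs.
    • Prim's algorithm: start from any vertex and add the lightest edge one at a time. Vertices play the leading role, so it suits dense graphs.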
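
A heap-based sketch of Prim's algorithm that also reports the heaviest tree edge. It assumes vertices numbered 0..n-1, edges given as `(u, v, weight)` triples, non-negative weights and a connected graph; these conventions are choices made for this sketch.

```python
import heapq

def prim_mst(n, edges):
    """Return (total weight, heaviest edge weight) of a minimum spanning tree."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    visited = [False] * n
    heap = [(0, 0)]  # (weight of the connecting edge, vertex); start at vertex 0
    total = heaviest = reached = 0
    while heap:
        w, v = heapq.heappop(heap)
        if visited[v]:
            continue  # stale entry: v was already reached by a lighter edge
        visited[v] = True
        reached += 1
        total += w
        heaviest = max(heaviest, w)
        for w2, u in adj[v]:
            if not visited[u]:
                heapq.heappush(heap, (w2, u))
    if reached != n:
        raise ValueError("graph is not connected")
    return total, heaviest
```

The second value is the answer to the bottleneck question posed above, since the heaviest edge of a minimum spanning tree is as small as possible.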

Ref: 最小生成樹-Prim算法和Kruskal算法 (minimum spanning tree: Prim's and Kruskal's algorithms)

Ref: http://www.javashuo.com/article/p-pdhqlavg-mt.html

Goto: [Algorithm] Graph

 

 

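Design an algorithm which produces a minimum spanning tree T′ for the

new graph containing the additional vertex v_{n+1} and which runs in time O(n log n).

Sort the edges between the new vertex and the other n vertices and choose greedily from the lightest. Adding only the single lightest new edge is not enough in general: running Kruskal's algorithm on the n−1 edges of the old tree together with the at most n new edges gives the new tree in O(n log n).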

 

 

There are n radio towers for broadcasting tsunami warnings. You are given the coordinates of each tower and its radius of range.

When a tower is activated, all towers within the radius of range of the tower will also activate, and those can cause other towers to activate and so on.

You need to equip some of these towers with seismic sensors so that when these sensors activate the towers where these sensors are located all

towers will eventually get activated and send a tsunami warning.

The goal is to design an algorithm which finds the fewest number of towers you must equip with seismic sensors.

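In a chain reaction there is always a tower that activates the most other towers; equip it with a sensor. In general more than one sensor may be needed: in the directed graph with an edge from tower a to tower b whenever b lies within range of a, one sensor is needed for each strongly connected component that has no incoming edge from another component.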

 

 

Partition the vertices of G into k disjoint subsets so that the minimal distance between two points belonging to different sets of the partition is as large as possible.

Thus, we want a partition into k disjoint sets which are as far apart as possible.

Sort the edges in increasing order and start performing the usual
Kruskal’s algorithm for building a minimal spanning tree,

but stop when you obtain k trees, rather than a single spanning tree.

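During the construction of the minimum spanning tree, the red edge (between two clusters, not added) is certainly heavier than any blue edge (already added inside a cluster), which is where the greed lies.

Time complexity:

With n points there are N = O(n^2) edges, so sorting them costs O(N log N) = O(n^2 log n).

With a union-find (disjoint set) data structure,

we make at most 2n^2 calls of the Find operation and at most n calls of the Union operation.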
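
A sketch of the truncated Kruskal run with a small union-find. It returns the spacing, i.e. the smallest distance between two different clusters. Edges are assumed to be `(weight, u, v)` triples over vertices 0..n-1 with 1 < k ≤ n; the names are illustrative.

```python
def max_spacing(n, edges, k):
    """Spacing of the k-clustering produced by stopping Kruskal at k components."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # both ends already in the same cluster
        if components == k:
            return weight  # lightest edge joining two different clusters
        parent[root_u] = root_v  # union
        components -= 1
    return None  # fewer than two clusters remain, so there is no spacing
```

For points 0, 1, 2, 10, 11 on a line and k = 2, the clusters are {0, 1, 2} and {10, 11}, with spacing 8.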

 

 

Assume that a weighted (undirected) graph G = (V, E) has all weights of edges
distinct and that its set of vertices V has been partitioned into two disjoint subsets,
X and V \X and assume that an edge e = (u, v) is the smallest weight edge whose
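one end belongs to X and the other end to V\X (Y). Prove that every minimum spanning tree
must contain edge e.

Proof (by contradiction):

If the tree uses e, then e is what connects its X part to its Y part.

Suppose instead that some minimum spanning tree does not use e, so it crosses from X to Y through some other edge. Add e to that tree without deleting anything yet: this creates a cycle through e.

The cycle crosses between X and Y at least twice, so it contains another crossing edge e′; since e is the lightest crossing edge, e is lighter than e′.

Replacing e′ by e gives a lighter spanning tree, so the tree without e was not minimum. This contradiction proves the claim.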

 

Extended:

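Let G = (V, E) be a weighted (undirected) graph with all weights
of edges distinct, let C be a cycle in G, and let e be the highest weight edge in C.

Prove that e cannot belong to the minimum spanning tree.

That is, prove that the heaviest edge of a cycle cannot belong to the MST (minimum spanning tree).

Proof by contradiction:

Suppose e is in the MST. Removing e splits the tree into two parts, and the rest of the cycle C must contain some other edge e′ that joins these two parts.

Since e is the heaviest edge of C, e′ is lighter than e, and putting e′ in place of e gives a spanning tree again.

So the tree is better off without the heaviest edge e, a contradiction.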

 

Extended:

Assume that you are given a weighted (undirected) graph G = (V, E) with all weights of edges distinct and its minimum spanning tree T.

Assume now that you add a new edge e to G. Design a linear time algorithm which produces the minimum spanning tree for the new graph with the additional edge.

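A new edge e is added to a graph whose MST is already known; how should the MST (minimum spanning tree) be updated?

Sol:

(1) Adding the new edge to the tree creates exactly one cycle.

(2) Then delete the heaviest edge on that cycle.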
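
The two steps can be sketched in linear time as below. The tree is assumed to be given as `(u, v, weight)` triples over vertices 0..n-1; the function name is illustrative.

```python
def add_edge_to_mst(n, tree_edges, new_edge):
    """Return the MST edges after adding new_edge = (a, b, weight) to the graph."""
    adj = [[] for _ in range(n)]
    for u, v, w in tree_edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    a, b, new_weight = new_edge
    # Search the tree from a, remembering how each vertex was reached.
    reached_from = {a: None}
    stack = [a]
    while stack:
        x = stack.pop()
        for y, w in adj[x]:
            if y not in reached_from:
                reached_from[y] = (x, w)
                stack.append(y)
    # Walk back from b to a: this tree path plus the new edge is the cycle.
    heaviest = None
    x = b
    while reached_from[x] is not None:
        prev, w = reached_from[x]
        if heaviest is None or w > heaviest[2]:
            heaviest = (prev, x, w)
        x = prev
    if heaviest is None or heaviest[2] <= new_weight:
        return list(tree_edges)  # the new edge is the heaviest on the cycle
    kept = [e for e in tree_edges if {e[0], e[1]} != {heaviest[0], heaviest[1]}]
    return kept + [new_edge]
```

Both the search and the walk back visit each tree edge a constant number of times, so the update is O(n).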

 

 

Matroid theory

Scheduling unit jobs with penalties and deadlines.

The problem of scheduling unit-time tasks with deadlines and penalties for a single processor has the following inputs:

  • a set S = {1, 2, . . . , n} of n unit-time tasks;

  • a set of n integer deadlines d1, d2, . . . , dn, such that each di satisfies 1 ≤ di ≤ n and task i is supposed to finish by time di; and

  • a set of n nonnegative weights or penalties w1, w2, . . . , wn, such that a penalty wi is incurred if task i is not finished by time di and no penalty is incurred if a task finishes by its deadline.

We are asked to find a schedule for S that minimizes the total penalty incurred for missed deadlines.

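Ref: 一個任務調度問題 (a task scheduling problem), from Introduction to Algorithms

Theory:

The optimal schedule is obtained mainly through the matroid idea behind greedy algorithms.

If S is a set of unit-time tasks with deadlines and I is the collection of all independent sets of tasks, then the system M = (S, I) is a matroid, i.e. it satisfies the following conditions:

    1. S is a nonempty finite set.
    2. I ⊆ 2^S and ∅ ∈ I (I is a nonempty family of subsets of S).
    3. I satisfies the exchange property (augmentation): if A ∈ I, B ∈ I and |A| < |B|, then there is some x ∈ B \ A such that A ∪ {x} ∈ I. (This property gives a way to extend A using a known set B.)
    4. I satisfies the hereditary property (downward closure): if B ∈ I and A ⊆ B, then A ∈ I. In other words, if B is an independent subset of S, then every subset of B is also an independent subset of S. (This tells us about the subsets of a known set B.)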

 

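The idea behind solving the task scheduling problem with a matroid:

minimizing the total penalty of the late tasks ----> is turned into ----> maximizing the total penalty of the early tasks,

which means that when scheduling, the task with the largest penalty among the remaining tasks is considered first (this is where the greed lies).

Here, let A be a set of tasks. If there is a schedule for the tasks in A in which no task is late, the set A is called independent.

Prove:

(1) First prove that the system is a matroid.

(2) Then the greedy algorithm that maximizes the total penalty of the early tasks can be used.

Extended:

Each of the O(n) independence checks takes O(n) time. How can this be improved?

With a union-find (disjoint set) structure.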

 

Worked example

Take n = 7, with deadlines 4, 2, 4, 3, 1, 4, 6 and corresponding penalties 70, 60, 50, 40, 30, 20, 10.

Tasks a5 and a6 are given up (they end up late).
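
The worked example can be reproduced with the sketch below. It replaces the repeated independence check by a union-find over time slots, where each accepted task takes the latest free slot at or before its deadline; this slot formulation is one common way to implement the greedy and is an assumption of this sketch. Task indices are 0-based, so a5 and a6 appear as 4 and 5.

```python
def schedule_unit_tasks(deadlines, penalties):
    """Return (early tasks, late tasks, total penalty) for unit-time tasks."""
    n = len(deadlines)
    # free[t] leads to the latest free slot <= t; slot 0 means "no slot left".
    free = list(range(n + 1))

    def find(t):
        while free[t] != t:
            free[t] = free[free[t]]  # path halving
            t = free[t]
        return t

    early, late = [], []
    # Consider tasks from the largest penalty down.
    for task in sorted(range(n), key=lambda i: -penalties[i]):
        slot = find(min(deadlines[task], n))
        if slot == 0:
            late.append(task)  # no free slot before the deadline
        else:
            free[slot] = slot - 1  # occupy the slot
            early.append(task)
    return early, late, sum(penalties[i] for i in late)
```

On the data above the late tasks are a5 and a6, for a total penalty of 30 + 20 = 50.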

 

 

Characteristics of problems where the "greedy method" applies

Assume you have $2, $1, 50c, 20c, 10c and 5c coins to pay for your lunch.

Design an algorithm that, given the amount that is a multiple of 5c, pays it with a minimal number of coins.

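Clearly, start from the largest denomination. As with Chinese yuan (RMB) denominations, making change this way fits the greedy method.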
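
A minimal sketch of the largest-first rule for this coin system, with all amounts expressed in cents. The function name and the default tuple of denominations are written out here for illustration.

```python
def greedy_change(amount_cents, coins=(200, 100, 50, 20, 10, 5)):
    """Pay amount_cents with coins, always taking the largest coin that fits."""
    paid = []
    for coin in sorted(coins, reverse=True):
        count, amount_cents = divmod(amount_cents, coin)
        paid.extend([coin] * count)
    if amount_cents != 0:
        raise ValueError("amount cannot be paid exactly with these coins")
    return paid
```

For 95c this gives 50c + 20c + 20c + 5c, the decomposition used in the proof below.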

Prove:

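The problem has both the "optimal substructure" property and the "greedy choice" property.

Optimal substructure

  An optimal solution to the problem contains optimal solutions to its subproblems.

  • State a fact:

For example, 95c = 50c + 20c + 20c + 5c: this greedy result is an optimal solution, and it satisfies optimal substructure.

  • Draw a conclusion:

Removing one 20c coin should leave an optimal solution for 75c.

  • A skeptic objects:

Maybe 75c has a better solution? "Two coins are enough."

  • Conflict with the original fact:

Then, on top of that assumption, adding one 20c coin would give a solution for 95c that needs only 3 coins, which contradicts the original fact.

Hence the result of the greedy algorithm satisfies the "optimal substructure" property.

 

Greedy choice property

A globally optimal solution can be reached through a series of locally optimal choices. (Each coin taken is already the best choice; it cannot be any larger.)

  • State a fact:

The greedy result guarantees that each coin taken is as large as it can be, because that is what greedy does.

  • A skeptic objects:

Suppose that, for the same total, fewer coins were possible: that would mean dropping some of the relatively small coins (which requires larger denominations to fill the difference).

  • Conflict with the original fact:

In the greedy solution every coin already has the largest possible denomination (the essence of greedy), so this is a contradiction.

Hence the original problem has the "greedy choice" property.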

 

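When greedy fails

For the greedy algorithm that sorts by value per unit weight (including why it cannot solve the problem), goto: 0-1揹包問題、貪心算法、動態規劃 (0-1 knapsack, greedy, dynamic programming)

    • The coin change problem versus the 0-1 knapsack problem [an analysis of when greedy applies and when dynamic programming applies]
    • In some cases the greedy algorithm fails. For example, with coin values 1, 10, 25 and 30 cents to pay, the optimal solution is 3 × 10, while greedy produces 25 + 5 × 1.
    • Greedy does work for exponentially growing denominations, where the coin values are powers of c, that is c^0, c^1, ..., c^k for integers c > 1 and k >= 1.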
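
The failure case can be checked by comparing greedy against a small dynamic program. Both functions are illustrative sketches and return the number of coins used, or None when the amount cannot be paid.

```python
def greedy_coin_count(amount, coins):
    """Number of coins used when always taking the largest coin that fits."""
    count = 0
    for coin in sorted(coins, reverse=True):
        count += amount // coin
        amount %= coin
    return count if amount == 0 else None

def dp_coin_count(amount, coins):
    """Minimum number of coins, by dynamic programming over all smaller amounts."""
    inf = float("inf")
    best = [0] + [inf] * amount  # best[a]: fewest coins that pay exactly a
    for a in range(1, amount + 1):
        for coin in coins:
            if coin <= a and best[a - coin] + 1 < best[a]:
                best[a] = best[a - coin] + 1
    return best[amount] if best[amount] != inf else None
```

With coins 1, 10, 25 and amount 30, greedy uses 6 coins while the dynamic program finds 3; with powers of a fixed c the two agree.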

Prove:

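Fact one:

In an optimal solution, a coin of value c^i (for i < k) is used at most c-1 times. If c of them were used, their total c * c^i = c^(i+1) could be paid with a single coin of the next larger denomination instead.

By fact one,

if a non-greedy solution is optimal, it uses fewer than c coins of each value c^i below the largest; otherwise it is not optimal. The coins of value below c^i then add up to at most (c-1)(c^0 + ... + c^(i-1)) = c^i - 1, so they can never stand in for a coin of value c^i, and the optimal solution must coincide with the greedy one.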

 

 

Greedy under a "dynamic constraint"

Suppose you have n video streams that need to be sent, one after another, over a communication link.

Stream i consists of a total of bi bits that need to be sent, at a constant rate, over a period of ti seconds.

You cannot send two streams at the same time, so you need to determine a schedule for the streams:

an order in which to send them. Whichever order you choose,

there cannot be any delays between the end of one stream and the start of the next.

Suppose your schedule starts at time 0. We assume that all the
values bi and ti are positive integers. Now, because you're just one user, the link does
not want you taking up too much bandwidth, so it imposes the following constraint,
using a fixed parameter r:

For each natural number t > 0, the total number of bits you send over the time interval from 0 to t cannot exceed rt.

Note that this constraint is only imposed for time intervals that start at 0, not for
time intervals that start at any other value. We say that a schedule is valid if it
satisfies the constraint.

Example.

Suppose we have n = 3 streams, with (b1, t1) = (2000, 1), (b2, t2) = (6000, 2), (b3, t3) = (2000, 1), and suppose the link’s parameter is r = 5000.

Then the schedule that runs the streams in the order 1, 2, 3, is valid, since the constraint (*) is satisfied:

  • t = 1: the whole first stream has been sent, and 2000 < 5000 · 1
  • t = 2 : half the second stream has also been sent, and 2000 + 3000 < 5000 · 2.

Similar calculations hold for t = 3 and t = 4.

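Different streams send a different number of bits per unit of time (think of it as different compression rates).

The rt on the right-hand side is a limit that grows with time; think of it as a bandwidth cap.

The task here is only to decide whether a schedule satisfying this condition exists.

The greedy approach is naturally

to respect this limit as well as possible for as long as possible.

The most promising way is to stay as far below the limit as possible.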

 

Design:

(a) In O(n log n), order the streams in increasing order of bi/ti (the bit rate), and check if this schedule has the desired property.

(b) To get an ordering in O(n) time define si = r*ti − bi and schedule streams so that you
start with all streams for which si is non-negative, in any order, followed by those for which si is negative, also in any order.

(It is enough to put the streams with bi at most r*ti first and the others after them; the sorting step is indeed unnecessary.)
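
Design (b) can be sketched as follows, with streams given as `(bits, seconds)` pairs. Because each stream is sent at a constant rate, the bits sent so far form a piecewise linear function of time, so it is enough to check the constraint at the moments where a stream ends; the function name is illustrative.

```python
def find_valid_schedule(streams, r):
    """Return a valid order of the (bits, seconds) streams, or None if none exists."""
    # Streams at or below the allowed rate go first, the others after them.
    under = [s for s in streams if r * s[1] - s[0] >= 0]
    over = [s for s in streams if r * s[1] - s[0] < 0]
    order = under + over
    sent = elapsed = 0
    for bits, seconds in order:
        sent += bits
        elapsed += seconds
        if sent > r * elapsed:  # constraint violated at the end of this stream
            return None
    return order
```

On the example with r = 5000 every stream has si ≥ 0, so any order is valid.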

Prove:

Exchange argument.

 

End.
