原文:Solving the Traveling Salesman Problem with Postgres Recursive CTEshtml
Many SQL implementations don't have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs!git
許多 SQL 實現沒有將循環包括在內,使進行一些分析工做變得很困難。Postgres、SQL Server 以及一些其它的 SQL 實現則提供了下面將要提到的優良特性:遞歸公共表表達式(recursive CTE)。算法
We'll use them to solve the Traveling Salesman Problem and find the shortest round-trip route through several US cities.sql
咱們將使用它們來解決旅行銷售員問題(Traveling Salesman Problem )而且找出通過若干美國城市的最短巡迴路徑。數據庫
Normal CTEs are great at helping to organize large queries. They are a simple way to make temporary tables you can access later in your query.編程
通常的公共表表達式(CTE)主要用於幫助組織大型查詢語句。你能夠簡便地建立臨時表,並在稍後的查詢語句中訪問它們。編程語言
Recursive CTEs are more powerful - they reference themselves and allow you to explore hierarchical data. While that may sound complicated, the underlying concept is very similar to a for
loop in other programming languages.函數
遞歸公共表表達式(Recursive CTEs)的能力更強——它們經過自引用,使你可以探索層次化的數據。雖然這聽起來很複雜,但其背後的概念和其它編程語言中的 for
循環十分類似。 oop
These CTEs have two parts — an anchor
member and a recursive
member. The anchor
member selects the starting rows for the recursive steps.post
該公共表表達式(CTE)包含兩個部分——一個 anchor 部分和一個 recursive 部分。 anchor 部分提供了遞歸步驟的起始數據。
The recursive
member generates more rows for the CTE by first joining against the anchor
rows, and then joining against rows created in previous recursions. The recursive
member comes after a union all
in the CTE definition.
recursive 部分爲此公共表表達式(CTE)生成更多的數據,方法是先鏈接(join) anchor 數據,而後鏈接(join)上次遞歸產生的數據。在公共表表達式(CTE)的定義語句中,recursive 部分接在 union all
關鍵字後面。
Here's a simple recursive CTE that generates the numbers 1 to 10. The anchor
member selects the value 1, and the recursive
member adds to it up to the number 10:
這是一個能產生數字1到10的簡單的遞歸公共表表達式(CTE)。其中 anchor 部分提供了數據值1,而後 recursive 部分將其逐步累加至10。
with recursive incrementer(prev_val) as ( select 1 -- anchor member union all select -- recursive member incrementer.prev_val + 1 from incrementer where prev_val < 10 -- termination condition ) select * from incrementer
The first time the recursive CTE runs it generates a single row 1
using the anchor
member. In the second execution, the recursive
member joins against the 1
and outputs a second row, 2
. In the third execution the recursive
step joins against both rows 1
and 2
and adds the rows 2
(a duplicate) and 3
.
該遞歸公共表表達式(recursive CTE)在第一次執行的時候根據 anchor 部分生成了一行數據「1」。在第二次執行中, recursive 部分鏈接到數據「1」並輸出了第二行「2」。在第三次執行中,recursive 部分同時鏈接到了數據「1」和「2」而且添加了新數據「2」(重複)和「3」。
Recursive CTEs also only return distinct rows. Even though our CTE above creates many rows with the same value, only a distinct set of rows will be returned.
同時遞歸公共表表達式(recursive CTE)只返回互不相同的數據。雖然咱們的公共表表達式(CTE)建立了許多相同的數據行,但返回的數據集只包含互不相同的數據。
Notice how the CTE specifies its output as the named value prev_val
. This lets us refer to the output of the previous recursive step.
注意該公共表表達式(CTE)是如何將其輸出命名爲 prev_val
的。這使得咱們能夠引用上一次遞歸的輸出結果。
And at the very end there is a termination condition to halt the recursion once the sum gets to 10. Without this condition, the CTE would enter an infinite loop!
而且在最後的最後有一個終止條件:一旦 sum 到達10就中止遞歸。若是沒有這個條件,該公共表表達式(CTE)將會進入一個無限循環!
Under the hood, the database is building up a table named after this recursive CTE using unions:
這樣,數據庫以該遞歸公共表表達式(recursive CTE)爲名字,基於並集創建了一個數據表:
Recursive CTEs can also have many parameters. Here's one that takes the sum, double, and square of starting values of 1, 2 and 3:
遞歸公共表表達式(recursive CTE)還能夠包含多個參數。下面的例子以一、2和3爲初始值,分別計算了依次加1的和、倍增值和依次平方的值。
with recursive cruncher(inc, double, square) as ( select 1, 2.0, 3.0 -- anchor member union all select -- recursive member cruncher.inc + 1, cruncher.double * 2, cruncher.square ^ 2 from cruncher where inc < 10 ) select * from cruncher
With recursive CTEs we can solve the Traveling Salesman Problem.
經過使用遞歸公共表表達式(recursive CTE),咱們可以解決旅行銷售員問題。
There are many algorithms for finding the shortest round-trip path through several cities. We'll use the simplest: brute force. Our recursive CTE will enumerate all possible routes and their total distances. We'll then sort to find the shortest.
有許多算法能夠用於找出通過若干城市的最短巡迴路徑。咱們將使用最簡單的一種:暴力搜索。咱們的遞歸公共表表達式(recursive CTE)將枚舉全部可能的路徑和它們的總距離,而後排序以找出最短的一條。
First, a list of cities with Periscope customers, along with their latitudes and longitudes:
首先是一個顧客所在城市的列表,包含它們的緯度和經度。
create table places as ( select 'Seattle' as name, 47.6097 as lat, 122.3331 as lon union all select 'San Francisco', 37.7833, 122.4167 union all select 'Austin', 30.2500, 97.7500 union all select 'New York', 40.7127, 74.0059 union all select 'Boston', 42.3601, 71.0589 union all select 'Chicago', 41.8369, 87.6847 union all select 'Los Angeles', 34.0500, 118.2500 union all select 'Denver', 39.7392, 104.9903 )
And we'll need a distance function to compute how far two lat/lons are from each other (thanks to strkol on stackoverflow.com):
而後咱們須要一個距離函數來計算兩個經緯度之間的距離(感謝 strkol 在 stackoverflow.com 上的回答):
create or replace function lat_lon_distance( lat1 float, lon1 float, lat2 float, lon2 float ) returns float as $$ declare x float = 69.1 * (lat2 - lat1); y float = 69.1 * (lon2 - lon1) * cos(lat1 / 57.3); begin return sqrt(x * x + y * y); end $$ language plpgsql
Our CTE will use San Francisco as its anchor city, and then recurse from there to every other city:
咱們的公共表表達式(CTE)將使用 San Francisco 做爲出發城市,而後從那開始遞歸抵達其它城市。
with recursive travel(places_chain, last_lat, last_lon, total_distance, num_places) as ( select -- anchor member name, lat, lon, 0::float, 1 from places where name = 'San Francisco' union all select -- recursive member -- add to the current places_chain travel.places_chain || ' -> ' || places.name, places.lat, places.lon, -- add to the current total_distance travel.total_distance + lat_lon_distance(last_lat, last_lon, places.lat, places.lon), travel.num_places + 1 from places, travel where position(places.name in travel.places_chain) = 0 )
The parameters in the CTE are:
places_chain
: The list of places visited so far, which will be different for each instance of the recursion
last_lat and last_lon
: The latitude and longitude of the last place in the places_chain
total_distance
: The distance traveled going from one place to the next in the places_chain
num_places
: The number of places in places_chain
— we'll use this to tell which routes are complete because they visited all cities
該公共表表達式(CTE)中的參數有:
places_chain
:截至目前訪問過的位置的列表,在每條遞歸路徑中都是不一樣的
last_lat and last_lon
:places_chain
中最後一個位置的緯度和經度。
total_distance
:places_chain
中相鄰位置的距離的總和
num_places
:places_chain
中位置的數目——咱們使用該參數來分辨哪條路徑已經完成,由其訪問過了全部城市
In the recursive
member, the where
clause ensures that we never repeat a place. If we've already visited Denver, position(...)
will return a number greater than 0, invalidating this instance of the recursion.
在 recursive
部分中,where
子句確保了咱們不會重複訪問一個地方。(好比說)若是咱們已經訪問過 Denver,position(...)
將返回一個大於0的數字,使得該遞歸路徑無效化。
We can see all possible routes by selecting all 8-city chains:
經過列出全部包含了8個城市的城市鏈,咱們能夠看到全部可能的路徑:
select * from travel where num_places = 8
We need to add in the distance from the last city back to San Francisco to complete the round-trip. We could hard code San Francisco's lat/lon, but a join is more elegant. Once that's done we sort by distance and show the smallest:
咱們須要加上從最後一個城市回到 San Francisco 的距離以完成迴路。咱們能夠在代碼中顯式寫入 San Francisco 的經緯度,但使用鏈接操做看起來更加優雅。一完成這個咱們就能夠根據距離進行排序並輸出最短路徑:
select travel.places_chain || ' -> ' || places.name, total_distance + lat_lon_distance( travel.last_lat, travel.last_lon, places.lat, places.lon) as final_dist from travel, places where travel.num_places = 8 and places.name = 'San Francisco' order by 2 -- ascending! limit 1
Even though this query is significantly more complicated than the incrementer
query earlier, the database is doing the same things behind the scenes. The top branch is the creating the CTE's rows, the bottom branch is the final join and sort:
雖然該查詢語句明顯複雜於以前的 incrementer
查詢,數據庫在幕後作的事情依然同樣。最頂上的分支是建立該公共表表達式(CTE)的數據,最底部的分支是最終的鏈接和排序。
Run this query and you'll see the shortest route takes 6671 miles and visits the cities in this order:
執行該查詢語句,你會看到最短路徑須要 6671 英里而且按順序通過了下列城市:
San Francisco -> Seattle -> Denver ->
Chicago -> Boston -> New York -> Austin ->
Los Angeles -> San Francisco
Thanks to recursive CTEs, we can solve the Traveling Salesman Problem in SQL!
得益於遞歸公共表表達式(recursive CTE),咱們成功地用 SQL 解決了旅行銷售員問題!