阿里天池大賽:最後一千米急速配送

前言

最近公司組織了一場大咖秀,有位講師建議咱們沒事多參加阿里的天池大賽,說是對提升本身頗有幫助。因而想起本身幾天前看到的FinanceR專欄的天池最後一千米,便緊隨偶像步伐,註冊並下載了一份數據,湊個熱鬧。詳情請點擊賽題介紹html

此文爲截圖版,須要js交互版的原文請點擊這裏node

簡單分析

數據有三種類型的節點。第一類是Site,電商訂單發貨節點。第二類是Shop,O2O訂單發貨節點。第三類是Spot,消費者收穫節點。電商訂單的要求比較鬆,只需在當天晚上8點前配送完畢便可。O2O訂單比較着急,必須在指定的時刻前去Shop取貨,並在指定的時刻去Spot送貨。git

首先,咱們將電商訂單的狀況打到地圖上看一下。github

library(readr)
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(plotly)
library(lubridate)
library(leaflet)
library(sp)
library(RColorBrewer)
library(jsonlite)
library(splitstackshape)
library(stringr)
library(rlist)

# 輔助函數
points2spline <- function(df, x_field, y_field, id_field){
  data <- as.matrix(df[,c(x_field, y_field)])
  id = df[1, id_field]
  Lines(list(Line(data)), ID=id)
}  

# 探索Site與Spot的空間關係
df.site <- read_csv("1.csv")
df.spot <-  read_csv("2.csv")
df.e.order <- read_csv("4.csv")

df.site.spot <- df.e.order %>% 
  inner_join(df.site, by=c("Site_id")) %>% 
  inner_join(df.spot, by=c("Spot_id")) %>%
  unite(Point_x, ends_with("x")) %>%
  unite(Point_y, ends_with("y")) %>%
  gather(point, location, Point_x, Point_y) %>%
  separate(location, c("Lng", "Lat"), sep="_", convert=TRUE) %>%
  unite(Line_id, Site_id, Spot_id, remove=FALSE)

df.site <- df.site %>%
  inner_join(df.site.spot %>% 
               group_by(Site_id) %>% 
               dplyr::summarise(order_cnt=sum(Num)),
             by=c("Site_id"))
  
ls.site.spot <- split(df.site.spot, df.site.spot[, c("Line_id")])
names(ls.site.spot) <- NULL
sl.site.spot <- SpatialLines(llply(ls.site.spot, points2spline, "Lng", "Lat", "Line_id"))
m <- leaflet() %>%
  addTiles(
    'http://webrd02.is.autonavi.com/appmaptile?lang=zh_cn&size=1&scale=1&style=8&x={x}&y={y}&z={z}',
    tileOptions(tileSize=256, minZoom=9, maxZoom=17)
  ) %>%
  addPolylines(data=sl.site.spot, weight=2, color="#377EB8") %>%
  addCircleMarkers(lng=~Lng, lat=~Lat, radius=~order_cnt/1000, data=df.site, stroke=FALSE, fill=TRUE, fillColor="#E41A1C", fillOpacity=0.5, popup=~paste0("Order Num: ", order_cnt)) %>%
  fitBounds(sl.site.spot@bbox["x", "min"],sl.site.spot@bbox["y", "min"], sl.site.spot@bbox["x", "max"], sl.site.spot@bbox["y", "max"])
m

clipboard.png

圖中紅色的圓圈就是每一個Site,半徑越長代表出貨量越大。藍色的線表示這個Site與它負責的電商訂單的Spot的連線。能夠看出Site和Spot之間是一一對應的關係,不存在交叉,因此若是隻考慮電商訂單,這就是一個比較簡單的VRP問題,能夠分而治之,每一個Site單獨規劃。web

可是呢,咱們還有一堆O2O訂單要一塊兒配送,這就讓問題的複雜度驟然提高了難度。咱們先來看一下O2O訂單的空間分佈。算法

df.shop <- read_csv("3.csv")
df.o2o.order <- read_csv("5.csv")

df.shop.spot <- df.o2o.order %>% 
  inner_join(df.shop, by=c("Shop_id")) %>% 
  inner_join(df.spot, by=c("Spot_id")) %>%
  unite(Point_x, ends_with("x")) %>%
  unite(Point_y, ends_with("y")) %>%
  gather(point, location, Point_x, Point_y) %>%
  separate(location, c("Lng", "Lat"), sep="_", convert=TRUE) %>%
  unite(Line_id, Shop_id, Spot_id, remove=FALSE)

ls.shop.spot <- split(df.shop.spot, df.shop.spot[, c("Line_id")])
names(ls.shop.spot) <- NULL
sl.shop.spot <- SpatialLines(llply(ls.shop.spot, points2spline, "Lng", "Lat", "Line_id"))

df.shop <- df.shop %>%
  inner_join(df.shop.spot %>% 
               group_by(Shop_id) %>% 
               dplyr::summarise(order_cnt=sum(Num)),
             by=c("Shop_id"))

m <- leaflet() %>%
  addTiles(
    'http://webrd02.is.autonavi.com/appmaptile?lang=zh_cn&size=1&scale=1&style=8&x={x}&y={y}&z={z}',
    tileOptions(tileSize=256, minZoom=9, maxZoom=17)
  ) %>%
  addPolylines(data=sl.shop.spot, weight=2, color="#4DAF4A") %>%
  addCircleMarkers(lng=~Lng, lat=~Lat, radius=~5, data=df.shop, stroke=FALSE, fill=TRUE, fillColor="#984EA3", fillOpacity=0.5, popup=~paste0("Order Num: ", order_cnt)) %>%
  fitBounds(sl.shop.spot@bbox["x", "min"],sl.shop.spot@bbox["y", "min"], sl.shop.spot@bbox["x", "max"], sl.shop.spot@bbox["y", "max"])
m

clipboard.png

途中紫色的點爲每一個shop,綠線爲O2O訂單的spot與shop的連線。從空間上看,O2O的訂單分佈就比較散亂了。咱們將O2O訂單和電商訂單的分佈疊到一塊兒看一下效果:json

clipboard.png

疊在一塊兒後,咱們可以很明顯地看到O2O訂單範圍比電商範圍小,集中在市區。segmentfault

下面,咱們模仿下FinanceR的思路,看一下O2O提單與配送時間的分佈。app

fake_dt <- "20160806"
o2o.hour <- df.o2o.order %>% 
  mutate(pickup_hour=round_date(ymd_hm(paste0(fake_dt, Pickup_time)), "hour"), 
         delivery_hour=round_date(ymd_hm(paste0(fake_dt, Delivery_time)), "hour")) %>%
  gather(time_type, tm, pickup_hour, delivery_hour) %>%
  group_by(time_type, tm) %>%
  summarise(order_cnt=n())
ggplot(o2o.hour) + geom_point(aes(x=tm, y=order_cnt, colour=time_type)) + geom_line(aes(x=tm, y=order_cnt, colour=time_type)) + theme_bw(base_size=16)

clipboard.png

O2O訂單有一個特色,就是比較碎。從數據中咱們能夠看到存在同一個spot在不一樣時刻向同一個shop下的訂單。若是把這些碎訂單拼湊在一塊兒統一配送的話,可以節約很大成本。那麼這種拼單思路的操做空間有多大呢?less

df.o2o.order.batch <- df.o2o.order %>%
  mutate(batch_pickup_time = round_minute(ymd_hm(paste0(fake_dt, Pickup_time)), 30),
         batch_delivery_time = round_minute(ymd_hm(paste0(fake_dt, Delivery_time)), 30)) %>%
  group_by(Spot_id, Shop_id, batch_pickup_time, batch_delivery_time) %>%
  summarise(total_order_size=sum(Num), total_order_num=n())

table(df.o2o.order.batch$total_order_num)

在拼單的時候,考慮到用戶體驗,將pickup和delivery時刻取整爲最近的30分鐘時刻,再分組統計並單數量。結論是2975單沒法拼單,143個訂單能2單合併,4個訂單能3單合併。因此,彷彿拼單預處理的優化空間不是很大,關鍵仍是在這個配送問題自己。

VRP入門

最後一千米急速配送這個比賽,是一道十分困難的VRP問題。學術上,這種問題稱爲VRPPDPTW,添加的後綴PDP表示Pickup Delivery Problem,即容許沿途取貨送貨;TW表示Time Window,即配送存在時間窗口約束。這麼複雜的問題,我這種菜鳥確定是搞不定的。

因此本文暫且把O2O訂單拋開,來解一解單純的電商訂單問題。前文已經提到,電商的最後一千米配送網規劃得比較整齊,一個site對一個spot,order之間沒有交集,所以能夠逐個site求解。本文采用的方法是Saving Method。

# 輔助函數
## 賽題規定的兩點距離計算公式
p2pdist <- function(lng1, lat1, lng2, lat2){
  diff_lat = (lat1 - lat2)/2
  diff_lng = (lng1 - lng2)/2
  coors_sum = (sin((pi * diff_lat )/180))^2 + cos((pi * lat1)/180) * cos((pi * lat2)/180)*(sin((pi * diff_lng )/180))^2
  result = 2 * 6378137 * asin (sqrt(coors_sum))
  return(result)
}

## 賽題規定的配送員默認速度是250m/min
distance_time_cost <- function(distance, speed=250) {
  return(round(distance/speed))
}

## 賽題規定的訂單處理時間
processing_time_cost <- function(package_num) {
  return(round(3*sqrt(package_num) + 5))
}

# Saving Method
get_sub_data <- function(target.site, df.site, df.spot, df.e.order) {
  target.site.geo <- df.site %>%
    inner_join(df.e.order %>% 
                 group_by(Site_id) %>% 
                 dplyr::summarise(Num=sum(Num)),
               by=c("Site_id")) %>%
    filter(Site_id==target.site)
  target.spot.geo <- df.e.order %>% 
    filter(Site_id==target.site) %>%
    inner_join(df.spot, by=c("Spot_id")) %>%
    arrange(Spot_id)
  
  target.site.geo$Index <- 0
  target.spot.geo$Index <- 1:nrow(target.spot.geo)
  points <- rbind(target.site.geo %>% 
                    select(Index, Site_id, Lng, Lat, Num) %>% 
                    dplyr::rename(ID=Site_id), 
                  target.spot.geo %>%
                    select(Index, Spot_id, Lng, Lat, Num) %>%
                    dplyr::rename(ID=Spot_id)
  )
  return(points)
}

get_cost_matrix <- function(points.matrix) {
  cost.matrix <- matrix(0, nrow(points), nrow(points))
  for(i in 1:nrow(cost.matrix)) {
    for(j in i:nrow(cost.matrix)) {
      cost.matrix[i, j] = p2pdist(points.matrix[i, 1], 
                                        points.matrix[i, 2],
                                        points.matrix[j, 1],
                                        points.matrix[j, 2])
    }
  }
  return(cost.matrix)
}

get_saving_matrix <- function(cost.matrix) {
  saving.matrix <- matrix(0, nrow(points)-1, nrow(points)-1)
  for(i in 2:(nrow(cost.matrix)-1)) {
    for(j in (i+1):nrow(cost.matrix)) {
      saving.matrix[i-1, j-1] = cost.matrix[1, i] + cost.matrix[1, j] - cost.matrix[i, j]
    }
  }
  return(saving.matrix)
}

get_vrp_init_solution <- function(demand.vec, cost.matrix) {
  route <- list()
  for(i in 1:nrow(demand.vec)) {
    load = as.integer(demand.vec[i, "Num"])
    processing_time = processing_time_cost(load)
    distance = 2*cost.matrix[1, i+1]
    distance_time = distance_time_cost(distance)
    route <- list.append(route, 
        list(route_node=c(i), 
             load=load,
             processing_time=processing_time,
             distance=distance,
             ddistance_time=distance_time
             )
        )
  }
  return(route)
}

get_vrp_saving_solution <- function(route, saving.matrix, capacity=140) {
  saving.idx <- order(saving.matrix, decreasing = T)
  for(k in 1:length(saving.idx)) {
    cur_saving <- saving.matrix[saving.idx[k]]
    if(cur_saving <= 0) {
      break
    }
    i <- saving.idx[k] %% nrow(saving.matrix)
    j <- saving.idx[k] %/% nrow(saving.matrix) + 1
    p1.idx <- list.which(route, i %in% route_node)[1]
    p2.idx <- list.which(route, j %in% route_node)[1]
    p1 <- route[[p1.idx]]
    p2 <- route[[p2.idx]]
    # Condition 1: i and j not in the same route
    if(p1.idx != p2.idx) {
      total_load <- p1$load + p2$load
      total_distance <- p1$distance + p2$distance - cur_saving
      total_processing_time <- p1$processing_time + p2$processing_time
      
      # Condition 2: combine load still less than capacity
      if(total_load < capacity) {
        idx1 <- which(p1$route_node == i)
        idx2 <- which(p2$route_node == j)
        # Condition 3: both i or j at the head or end of the route
        if(idx1 == 1 & idx2 == 1) {
          new_route_node <- c(rev(p1$route_node), p2$route_node)
        } 
        else if(idx1 == length(p1$route_node) & idx2 == 1) {
          new_route_node <- c(p1$route_node, p2$route_node)
        }
        else if(idx1 == 1 & idx2 == length(p2$route_node)) {
          new_route_node <- c(p2$route_node, p1$route_node)
        }
        else if(idx1 == length(p1$route_node) & idx2 == length(p2$route_node)) {
          new_route_node <- c(p2$route_node, rev(p1$route_node))
        }
        else {
          next
        }
        route <- route %>%
          list.remove(c(p1.idx, p2.idx)) %>%
          list.append(list(route_node=new_route_node, load=total_load, processing_time=total_processing_time,
                           distance=total_distance, distance_time=distance_time_cost(total_distance)))
      }
    }
  }
  return(route)
}


## 繪製結果
plot_vrp_route <- function(route, points) {
  df.route <- route %>%
    list.map(list(spot_idx=str_c(route_node, collapse=","), load=load)) %>%
    list.stack() %>%
    mutate(id=1:length(route), Index=paste0("0,", spot_idx, ",0")) %>%
    cSplit(c("Index"), direction="long") %>%
    inner_join(points,
               by="Index") 
  target.site.geo <- points %>% filter(Index == 0)
  ls.route <- split(df.route, df.route$id)
  names(ls.route) <- NULL
  sl.route <- SpatialLines(llply(ls.route, points2spline, "Lng", "Lat", "id"))
  ids <- data.frame(names(sl.route))
  colnames(ids) <- "id"
  sldf.route <- SpatialLinesDataFrame(sl.route, ids, match.ID = "id")
  
  factpal <- colorFactor(brewer.pal(8, "Dark2"), domain=df.route$id)
  
  m <- leaflet() %>%
    addTiles(
      'http://webrd02.is.autonavi.com/appmaptile?lang=zh_cn&size=1&scale=1&style=8&x={x}&y={y}&z={z}',
      tileOptions(tileSize=256, minZoom=9, maxZoom=17),
      group="高德地圖"
    ) %>%
    addCircleMarkers(data=target.site.geo, radius=15, stroke=FALSE, 
                     fill=TRUE, fillColor="#E41A1C", fillOpacity=0.8,
                     group="Site") %>%
    addCircleMarkers(lng=~Lng, lat=~Lat, radius=3, data=df.route, color=~factpal(id), group="Spot") %>%
    addPolylines(data=sldf.route, weight=3, color=~factpal(id), group="配送路線") %>%
    fitBounds(sldf.route@bbox["x", "min"], sldf.route@bbox["y", "min"], sldf.route@bbox["x", "max"], sldf.route@bbox["y", "max"]) %>%
    addLayersControl(
      baseGroups = c("配送路線"),
      overlayGroups = c("Site", "Spot", "高德地圖"),
      options = layersControlOptions(collapsed = FALSE)
    )
  
  return(m)
}

## 普通VRP主函數
vrp_saving_method <- function(points) {
  points.matrix <- as.matrix(points %>% select(Lng, Lat))
  cost.matrix <- get_cost_matrix(points.matrix)
  demand.vec <- points %>% 
    filter(Index != 0) %>%
    select(ID, Num)
  init_route <- get_vrp_init_solution(demand.vec, cost.matrix)
  saving.matrix <- get_saving_matrix(cost.matrix)
  
  saving_route <- get_vrp_saving_solution(init_route, saving.matrix)
  return(saving_route)
}


## 取出一個例子
target.site <- "A083"
points <- get_sub_data(target.site, df.site, df.spot, df.e.order)
saving_route <- vrp_saving_method(points)
plot_vrp_route(saving_route, points)

clipboard.png

Saving Method規劃的結果整體來看質量仍是很不錯的。簡單說一下實現Saving Method的思路:構造cost矩陣和demand向量;構造初始解,即從site派專車送貨到spot而後返回site;算saving matrix;將saving從大到小排序,逐個取出,判斷這個saving對應的兩個路線合併後車輛Capacity或時間窗等約束是否被知足,若知足則合併路徑。其餘常見的構造性的啓發式算法還有Insert和Sweep;而後有一些聽過大名的模擬退火、禁忌搜索等算法,不甚瞭解。因爲學藝不精,時間有限,此題只能作到這裏打住了。

相關文章
相關標籤/搜索