A Special Data Structure: The Chairman Tree (Persistent Segment Tree)

Date: 2020-12-08

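I recently learned the chairman tree and found it simpler than I expected, while the explanations online seemed rather complicated, so I wrote this simple guide myself. Harder problems will be added over time.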

What is a chairman tree?

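Let's look at a classic problem:

Given an array over the interval [1, n] and m operations of the following kinds:

1 L R: query the sum over [L, R]

2 L R X: add X to every value in [L, R]

Anyone who has learned segment trees can solve this classic problem easily. But what if we add one more operation:

3 K: roll back to the state right after the K-th modification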

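So once a problem asks us to return to a historical version, an ordinary segment tree no longer works: every update overwrites what the tree stores, and without special bookkeeping that change is permanent. For problems like this we can use the data structure discussed today, the chairman tree.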

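In essence, a chairman tree takes the initial segment tree as a template and stores many segment trees at once by reusing nodes. More precisely, for the problem below it is \(n\) complete weight segment trees (segment trees indexed by value), but these \(n\) trees share most of their nodes, so the total memory is only \(O(n \log n)\). Because weight segment trees can be subtracted from one another, we can obtain the weight segment tree of any subarray of the sequence.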

How is a chairman tree implemented?

A chairman tree is a persistent segment tree, so at heart it is a segment tree (obviously). Let's start by picturing a cute, ordinary segment tree.

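Now we modify one innocent leaf. The naive way to keep the unmodified historical version is to build a complete new segment tree containing the change, which leaves us with two full segment trees.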

But a quick look shows that only one chain actually changed: the path from the modified leaf up to the root. So can the two trees share the nodes that did not change?

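Of course they can. We create only one new chain in the new tree, and every unchanged child simply points into the previous tree. This stores the new version at the cost of a single extra chain, i.e. O(log n) extra time and space per modification.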
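To make path copying concrete, here is a minimal C++ sketch under stated assumptions: a persistent array over positions [1, n] with point assignment and range sum. The names PNode, PersistentSum, set, update, sum and nodeCount are hypothetical, chosen for illustration; this is not the article's code. Each update copies only the root-to-leaf path and reuses every other child of the previous version.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of path copying: a persistent segment tree over
// positions [1, n] that stores point values and answers range sums.
struct PNode {
    int l = 0, r = 0;   // child indices; 0 is the shared all-zero "null" node
    long long sum = 0;
};

struct PersistentSum {
    int n;
    std::vector<PNode> t{PNode{}};  // t[0]: null node
    std::vector<int> roots{0};      // roots[v]: root of version v; version 0 is all zeros

    explicit PersistentSum(int n_) : n(n_) {}

    // Builds the node for range [l, r] of a new version: identical to `old`
    // except a[pos] = val. Only nodes on the root-to-pos path are created.
    int set(int old, int l, int r, int pos, long long val) {
        PNode copy = t[old];  // children still point into the old version
        int cur = (int)t.size();
        t.push_back(copy);
        if (l == r) {
            t[cur].sum = val;
            return cur;
        }
        int mid = (l + r) / 2;
        if (pos <= mid) {
            int child = set(copy.l, l, mid, pos, val);  // store index, safe across reallocation
            t[cur].l = child;
        } else {
            int child = set(copy.r, mid + 1, r, pos, val);
            t[cur].r = child;
        }
        t[cur].sum = t[t[cur].l].sum + t[t[cur].r].sum;
        return cur;
    }

    // New version = version `ver` with a[pos] = val; returns the new version number.
    int update(int ver, int pos, long long val) {
        int r = set(roots[ver], 1, n, pos, val);
        roots.push_back(r);
        return (int)roots.size() - 1;
    }

    long long query(int node, int l, int r, int ql, int qr) const {
        if (node == 0 || qr < l || r < ql) return 0;
        if (ql <= l && r <= qr) return t[node].sum;
        int mid = (l + r) / 2;
        return query(t[node].l, l, mid, ql, qr) + query(t[node].r, mid + 1, r, ql, qr);
    }

    // Sum of a[ql..qr] as it was in version `ver`.
    long long sum(int ver, int ql, int qr) const { return query(roots[ver], 1, n, ql, qr); }
    int nodeCount() const { return (int)t.size(); }
};
```

Because no existing node is ever modified, every older version stays queryable. With n = 8 each root-to-leaf path has 4 nodes, so three updates add exactly 12 nodes to the single null node.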

That may be a lot to take in at once, so let's pair it with a classic problem: 【POJ 2104 K-th Number】 (static range k-th smallest).

【Description】

You are working for Macrohard company in data structures department. After failing your previous task about key insertion you were asked to write a new data structure that would be able to return quickly k-th order statistics in the array segment.
That is, given an array a[1...n] of different integer numbers, your program must answer a series of questions Q(i, j, k) in the form: "What would be the k-th number in a[i...j] segment, if this segment was sorted?"
For example, consider the array a = (1, 5, 2, 6, 3, 7, 4). Let the question be Q(2, 5, 3). The segment a[2...5] is (5, 2, 6, 3). If we sort this segment, we get (2, 3, 5, 6), the third number is 5, and therefore the answer to the question is 5.

【Input】

The first line of the input file contains n --- the size of the array, and m --- the number of questions to answer (1 <= n <= 100 000, 1 <= m <= 5 000).
The second line contains n different integer numbers not exceeding 10^9 by their absolute values --- the array for which the answers should be given.
The following m lines contain question descriptions, each description consists of three numbers: i, j, and k (1 <= i <= j <= n, 1 <= k <= j - i + 1) and represents the question Q(i, j, k).

【Output】

For each question output the answer to it --- the k-th number in sorted a[i...j] segment.

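Problem summary: we are given a sequence a of length n, followed by m queries. Each query gives three numbers x, y, k, and we must report the k-th smallest number in the subarray [x, y] of a.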

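When we see "k-th number", a humble beginner like me first thinks of a weight segment tree, i.e. a segment tree indexed by value whose nodes count how many elements fall in each value range (experts with fancier algorithms, please go easy on me). If the question asked for the k-th smallest number of the whole array, a plain weight segment tree would indeed suffice: the answer is the smallest value x such that at least k elements are ≤ x, and we can find it by walking down from the root.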
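A minimal C++ sketch of the whole-array case, assuming the values are already integers in [1, m] (for example after compression). The WeightTree struct and its methods are hypothetical; kth() walks down from the root, going left when the left half holds at least k values and otherwise subtracting the left count and going right.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of a plain (non-persistent) weight segment tree over
// values [1, m]: cnt[node] = how many inserted numbers lie in that node's range.
struct WeightTree {
    int m;
    std::vector<int> cnt;  // heap layout: children of node i are 2i and 2i + 1
    explicit WeightTree(int m_) : m(m_), cnt(4 * m_, 0) {}

    void insert(int val) { insert(1, 1, m, val); }

    // k-th smallest inserted value (1-based), assuming 1 <= k <= number inserted.
    int kth(int k) const {
        int node = 1, l = 1, r = m;
        while (l < r) {
            int mid = (l + r) / 2;
            int left = cnt[2 * node];  // how many values lie in [l, mid]
            if (k <= left) {           // answer is in the left half
                node = 2 * node;
                r = mid;
            } else {                   // skip the left half's values
                k -= left;
                node = 2 * node + 1;
                l = mid + 1;
            }
        }
        return l;
    }

private:
    void insert(int node, int l, int r, int val) {
        cnt[node]++;
        if (l == r) return;
        int mid = (l + r) / 2;
        if (val <= mid) insert(2 * node, l, mid, val);
        else insert(2 * node + 1, mid + 1, r, val);
    }
};
```

For instance, after inserting 2, 2, 5 into WeightTree(5), kth(1) and kth(2) return 2 and kth(3) returns 5. This only answers whole-array queries, which is exactly the limitation the next paragraph addresses.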

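But this problem asks for the k-th smallest number within a subarray [x, y]. So we wonder: could we keep a weight segment tree for every prefix [1, i]? Exactly, and that is the intended solution, the chairman tree. For each position i we create only one new chain of nodes (an incomplete tree); every other node simply points to the corresponding node of tree i-1.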

The idea is now clear, so let's implement the algorithm step by step.

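First, the values in a can be as large as 10^9 in absolute value, so a weight segment tree indexed directly by value would use far too much memory. We therefore compress the coordinates (discretize) first. Here is a vector-based template: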

// Coordinate compression
int getid(int x) {  // map an original value to its compressed id (1-based rank in v)
    return (lower_bound(v.begin(), v.end(), x) - v.begin() + 1);
}
// in main():
for (int i = 1; i <= n; i++)
    scanf("%d", &a[i]), v.push_back(a[i]);  // read a[i] and collect it
sort(v.begin(), v.end()), v.erase(unique(v.begin(), v.end()),
                                  v.end());  // sort, then drop duplicates
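The same compression, wrapped as a self-contained C++ sketch so it can be tried in isolation. The Compressor struct and its rank()/value() methods are hypothetical names; rank() uses the same lower_bound formula as getid above, and value() is its inverse.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical wrapper around the compression template: v holds the sorted
// distinct values, rank() maps a value to its 1-based position in v.
struct Compressor {
    std::vector<int> v;
    explicit Compressor(const std::vector<int>& a) : v(a) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
    }
    int rank(int x) const {  // x must be one of the original values
        return int(std::lower_bound(v.begin(), v.end(), x) - v.begin()) + 1;
    }
    int value(int id) const { return v[id - 1]; }  // inverse of rank
};
```

For the input {1000000000, -7, 42, -7, 0}, v becomes {-7, 0, 42, 1000000000}, so rank(42) is 3 and the largest id is only 4, regardless of how large the original values are.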

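Next we build one tree per position. Weight segment trees can be added and subtracted node by node, so to query [x, y] we subtract tree x-1 (covering [1, x-1]) from tree y (covering [1, y]); the result is the weight segment tree of [x, y]. In practice we never build this difference tree: we only subtract the counts of the nodes we visit.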
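The subtraction can be written out as a short derivation. The notation \(\mathrm{cnt}_i(S)\) is introduced here for illustration: it is the count stored in tree i at the node for value range S, i.e. the number of indices \(j \le i\) whose compressed value \(a_j\) lies in S. Then for every node

\[
\mathrm{cnt}_{[x,y]}(S) = \mathrm{cnt}_{y}(S) - \mathrm{cnt}_{x-1}(S).
\]

On the sample query Q(2, 5, 3) with a = (1, 5, 2, 6, 3, 7, 4) (the values are distinct and already 1..7, so compression leaves them unchanged), tree 5 holds {1, 5, 2, 6, 3} and tree 1 holds {1}. Starting from the root [1, 7] with k = 3, the walk goes:

\[
\begin{aligned}
\text{node }[1,7]&:\ \mathrm{cnt}_5([1,4]) - \mathrm{cnt}_1([1,4]) = 3 - 1 = 2 < 3 \;\Rightarrow\; \text{right},\ k = 3 - 2 = 1\\
\text{node }[5,7]&:\ \mathrm{cnt}_5([5,6]) - \mathrm{cnt}_1([5,6]) = 2 - 0 = 2 \ge 1 \;\Rightarrow\; \text{left}\\
\text{node }[5,6]&:\ \mathrm{cnt}_5([5,5]) - \mathrm{cnt}_1([5,5]) = 1 - 0 = 1 \ge 1 \;\Rightarrow\; \text{left, leaf } 5
\end{aligned}
\]

so the answer is 5, matching the problem statement.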

So the core code practically writes itself:

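// version i = version i-1 plus a[i], so root[i] describes the prefix [1, i]
for (int i = 1; i <= n; i++) update(1, n, root[i], root[i - 1], getid(a[i]));
for (int i = 1; i <= m; i++) {
    scanf("%d%d%d", &x, &y, &k);
    // tree y minus tree x-1 gives [x, y]; query returns a compressed id
    printf("%d\n", v[query(1, n, root[x - 1], root[y], k) - 1]);
}
return 0;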

What remains is to implement update, which builds each version, and query, which answers a question.

First, update:

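// Create node x of the new version from node y of the old one, adding value pos.
void update(int l, int r, int &x, int y, int pos) {
    T[++cnt] = T[y], T[cnt].sum++, x = cnt;  // copy the old node, bump its count
    int mid = (l + r) / 2;
    if (l == r) return;  // reached the leaf for pos
    if (pos <= mid)      // only the child containing pos gets a new node
        update(l, mid, T[x].l, T[y].l, pos);
    else
        update(mid + 1, r, T[x].r, T[y].r, pos);
}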

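Here l and r are the value range covered by the current node, x and y are node positions in the array T (x is the node being created in the new version, passed by reference so the parent's child pointer gets filled in, and y is the matching node in the previous version), and pos is the compressed value to insert. The rest is straightforward: we allocate a new node that starts as a copy of the old one, so both children still point into the previous tree, and increment its count. Then we check whether the value lies in the left or right half and recurse into that half only, creating one new node per level.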
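To see the "one new node per level" claim concretely, here is a minimal standalone sketch: essentially the article's update(), copied with a small fixed node pool. The sizes N_MAX and POOL and the helper countLE are hypothetical additions for this illustration; countLE counts the inserted values ≤ v in one version.

```cpp
#include <cassert>

// Essentially the article's update() with a small fixed pool. N_MAX, POOL
// and countLE are hypothetical names added for this illustration.
const int N_MAX = 16;
const int POOL = N_MAX * 8;
struct Node {
    int l, r, sum;
} T[POOL];                // T[0] stays all zero: the shared empty tree
int cnt = 0, root[N_MAX + 1];

void update(int l, int r, int &x, int y, int pos) {
    T[++cnt] = T[y], T[cnt].sum++, x = cnt;  // new node = old node + 1
    if (l == r) return;
    int mid = (l + r) / 2;
    if (pos <= mid)
        update(l, mid, T[x].l, T[y].l, pos);
    else
        update(mid + 1, r, T[x].r, T[y].r, pos);
}

// Number of inserted values <= v in the version rooted at `node`
// (visits O(log n) nodes, like a prefix-sum query).
int countLE(int node, int l, int r, int v) {
    if (node == 0 || l > v) return 0;
    if (r <= v) return T[node].sum;
    int mid = (l + r) / 2;
    return countLE(T[node].l, l, mid, v) + countLE(T[node].r, mid + 1, r, v);
}
```

With values in [1, 8], every root-to-leaf path has 4 nodes, so inserting 8 values creates exactly 32 nodes, and each earlier root[i] still describes its own prefix. Subtracting countLE of two versions already gives a range count, which is the idea query builds on.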

Next, query:

int query(int l, int r, int x, int y, int k) {
    if (l == r) return (l);  // leaf: l is the compressed answer
    // number of elements of the queried subarray whose value lies in [l, mid]
    int sum = T[T[y].l].sum - T[T[x].l].sum;
    int mid = (l + r) / 2;
    if (k <= sum) return (query(l, mid, T[x].l, T[y].l, k));
    else return (query(mid + 1, r, T[x].r, T[y].r, k - sum));
}

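Similarly, we walk down level by level just as in update. At each node, sum is the number of elements of the subarray whose value lies in the left half [l, mid]. If k ≤ sum, the k-th smallest is in the left half; otherwise it is in the right half, where we look for the (k - sum)-th smallest instead. The leaf we reach is the compressed value of the answer, and v[l - 1] maps it back to the original number.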

So the complete program is quite short:

#include <bits/stdc++.h>
using namespace std;
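const int Maxx = 1e5 + 6;
int n, m, cnt, a[Maxx], root[Maxx], x, y, k;  // root[i]: tree for the prefix [1, i]
struct node {
    int l, r, sum;  // child indices in T, and how many values fall in this range
} T[Maxx * 40];     // each insert adds at most 18 nodes for n = 10^5, so 40 is ample
vector<int> v;      // sorted distinct values, for coordinate compression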
int getid(int x) {
    return (lower_bound(v.begin(), v.end(), x) - v.begin() + 1);
}
void update(int l, int r, int &x, int y, int pos) {
    T[++cnt] = T[y], T[cnt].sum++, x = cnt;
    int mid = (l + r) / 2;
    if (l == r) return;
    if (pos <= mid)
        update(l, mid, T[x].l, T[y].l, pos);
    else
        update(mid + 1, r, T[x].r, T[y].r, pos);
}
int query(int l, int r, int x, int y, int k) {
    if (l == r) return (l);
    int sum = T[T[y].l].sum - T[T[x].l].sum;
    int mid = (l + r) / 2;
    if (k <= sum)
        return (query(l, mid, T[x].l, T[y].l, k));
    else
        return (query(mid + 1, r, T[x].r, T[y].r, k - sum));
}
int main() {
    scanf("%d%d", &n, &m);
    for (int i = 1; i <= n; i++) scanf("%d", &a[i]), v.push_back(a[i]);
    sort(v.begin(), v.end()), v.erase(unique(v.begin(), v.end()), v.end());
    for (int i = 1; i <= n; i++)
        update(1, n, root[i], root[i - 1], getid(a[i]));
    for (int i = 1; i <= m; i++) {
        scanf("%d%d%d", &x, &y, &k);
        printf("%d\n", v[query(1, n, root[x - 1], root[y], k) - 1]);
    }
    return 0;
}
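For testing against brute force, the same algorithm can be packaged without global arrays. A minimal sketch, assuming a hypothetical KthSolver class (not the article's code). It differs from the program above in three small ways: nodes live in a std::vector, the value range is [1, m] with m the number of distinct values instead of [1, n], and the query walk is a loop rather than recursion.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical global-free packaging of the same chairman-tree algorithm.
class KthSolver {
    struct Node {
        int l = 0, r = 0, sum = 0;
    };
    std::vector<Node> t{Node{}};  // t[0]: empty tree shared by version 0
    std::vector<int> root, v;     // root[i]: tree for prefix a[1..i]; v: sorted distinct values
    int m = 0;                    // number of distinct values

    // Returns a new node equal to node y plus one occurrence of value pos.
    int insert(int y, int l, int r, int pos) {
        Node copy = t[y];
        copy.sum++;
        int x = (int)t.size();
        t.push_back(copy);
        if (l == r) return x;
        int mid = (l + r) / 2;
        if (pos <= mid) {
            int c = insert(copy.l, l, mid, pos);  // only this child is rebuilt
            t[x].l = c;
        } else {
            int c = insert(copy.r, mid + 1, r, pos);
            t[x].r = c;
        }
        return x;
    }

public:
    explicit KthSolver(const std::vector<int>& a) : v(a) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
        m = (int)v.size();
        root.assign(a.size() + 1, 0);
        for (size_t i = 1; i <= a.size(); i++) {
            int id = int(std::lower_bound(v.begin(), v.end(), a[i - 1]) - v.begin()) + 1;
            root[i] = insert(root[i - 1], 1, m, id);
        }
    }

    // k-th smallest of a[x..y] (1-based), assuming 1 <= x <= y <= n, 1 <= k <= y - x + 1.
    int kth(int x, int y, int k) const {
        int u = root[x - 1], w = root[y], l = 1, r = m;
        while (l < r) {
            int mid = (l + r) / 2;
            int left = t[t[w].l].sum - t[t[u].l].sum;  // elements of a[x..y] in [l, mid]
            if (k <= left) {
                u = t[u].l; w = t[w].l; r = mid;
            } else {
                k -= left;
                u = t[u].r; w = t[w].r; l = mid + 1;
            }
        }
        return v[l - 1];
    }
};
```

For instance, KthSolver({1, 5, 2, 6, 3, 7, 4}).kth(2, 5, 3) returns 5, the sample answer. Because the counts are per value, the same code also handles duplicates and negative numbers, which makes it easy to cross-check every (x, y, k) against sorting the subarray.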