A High-Performance Solution for Deduplicating JavaScript Arrays

Date: 2019-11-16

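Most people consider array deduplication a simple topic, and many have mastered several ways of doing it. Yet we often overlook how much time deduplication costs. When we optimize front-end performance, for example, how many of us think about the runtime performance of the JavaScript itself? In this article I use a set of test data to show why high-performance array deduplication matters. Admittedly, all of this is aimed at perfectionists like me 😄.

The conclusion comes first, so readers who do not care about the process can take it and use it directly. You can also use my high-performance JS toolkit: npm i efficient-js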

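// Fastest deduplication method measured: 3 ms for 100,000 items, 6 ms for 1,000,000 items, 36 ms for 10,000,000 items
// Caveats: the loop stops at the first null/undefined element, and object keys are
// strings, so 1 and "1" are treated as the same value.
Array.prototype.distinct = function () {
  var hash = [];
  var obj = {};
  for (var i = 0; this[i] != null; i++) {
    if (!obj[this[i]]) {
      hash.push(this[i]);
      obj[this[i]] = true; // store true so that falsy values such as 0 are still recorded
    }
  }
  return hash;
}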
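For readers who prefer not to extend Array.prototype, the same idea can be sketched as a standalone function. This is only an illustration: the name distinctByKey is hypothetical and is not part of efficient-js. It also makes two assumptions that differ from the method above: it loops by length, and it uses a prototype-less object so that keys such as "toString" are not mistaken for values already seen. It is intended for primitive values such as numbers or strings.

```javascript
// Hypothetical helper (not from the article or from efficient-js):
// deduplicates primitive values by using each value as an object key.
function distinctByKey(list) {
  const result = [];
  const seen = Object.create(null); // no inherited keys such as "toString"
  for (let i = 0; i < list.length; i++) {
    const value = list[i];
    if (seen[value] !== true) {
      seen[value] = true;
      result.push(value);
    }
  }
  return result;
}

const sample = [3, 0, 3, 1, 0, 2, 1];
const deduped = distinctByKey(sample);
console.log(deduped); // [3, 0, 1, 2]
```

The first occurrence of each value is kept, so the result preserves the original order.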

1. Collecting array deduplication methods

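(1) Traversal with a result array: create a new array and iterate over the array to be deduplicated; when a value is not yet in the new array (indexOf returns -1), append it.

(2) Index check: if the first occurrence of the i-th item in the array is not at index i, the item is a duplicate and is skipped; otherwise it is stored in the result array.

(3) Sort, then drop adjacent duplicates: sort the input array so that equal values become adjacent, then iterate over the sorted array and add to the new array only the values that differ from the previous one.

(4) Optimized traversal (recommended): a double loop in which the outer loop runs from 0 to arr.length and the inner loop runs from i+1 to arr.length; a value that has no duplicate to its right is added to the new array. (When a duplicate is found, the inner loop is terminated and the outer loop moves on to its next iteration.)

(5) ES6 approaches:

a. ES6 provides a new data structure, Set. It is similar to an array, but its members are all unique. The Set constructor accepts an array (or any other iterable) as an argument for initialization.

b. Array.filter() + indexOf.

(6) Double for loop (the easiest to understand): the outer loop iterates over the elements and the inner loop checks for duplicates; the result can be built either by push()-ing unique values into a new array or by removing duplicates in place with splice().

(7) for...of + includes() (an upgrade of the double for loop): replace the outer for loop with a for...of statement and the inner loop with includes(). Create an empty array first, and push an element into it whenever includes() returns false. indexOf() can be used instead of includes() in the same way.

(8) Array.sort(): sort the array with sort() first, then compare adjacent elements to exclude duplicates.

(9) for...of + Object: use the uniqueness of Object keys to deduplicate.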
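Method (4) is the only one in the list that is not shown in code later in the article. A minimal sketch of one way to read its description follows; the function name distinctKeepLast is hypothetical, and the labeled continue is just one way of "moving on to the next round of the outer loop".

```javascript
// Hypothetical sketch of method (4): keep a value only if no equal value
// appears to its right, so the last occurrence of each value survives.
function distinctKeepLast(arr) {
  const result = [];
  outer: for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[i] === arr[j]) {
        continue outer; // duplicate found to the right: skip arr[i]
      }
    }
    result.push(arr[i]);
  }
  return result;
}

const lastOccurrences = distinctKeepLast([1, 2, 1, 3, 2]);
console.log(lastOccurrences); // [1, 3, 2]
```

Note that the output order follows the last occurrence of each value, unlike the indexOf-based methods, which keep the first occurrence.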

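All of the methods above work and achieve deduplication, but their performance differs greatly. With fewer than 10,000 items the difference may not be noticeable, but every program is made up of a great many instructions; if you ignore the optimization of individual instructions and assume you can simply optimize the program as a whole, you are bound to fail. "毋以善小而不爲" (do not neglect a good deed because it is small): the saying is not a perfect fit here, but the idea is the same, and everything starts with the details. There are many methods above, so let us sort them out. Although their ideas look different, they fall into two categories: traversing the array, and using built-in methods directly.

There are many ways to traverse an array, however, even more than those listed above. We first compare the efficiency of the traversal methods, and then compare the others.

2. Building a multi-dimensional test template and verifying it

The environment used for the tests is shown below.

First, we list all the ways of traversing an array.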

function getRandomIntInclusive(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min; //The maximum is inclusive and the minimum is inclusive 
}
var orgArray = Array.from(new Array(100000), ()=>{
    return getRandomIntInclusive(1, 1000);
})

// Plain for loop
Array.prototype.distinct1 = function () {
  var hash=[];
  for (var i = 0; i < this.length; i++) {
     if(hash.indexOf(this[i])==-1){
      hash.push(this[i]);
     }
  }
  return hash;
}
// Optimized for loop (array length cached)
Array.prototype.distinct2 = function () {
  var hash=[];
  for (var i = 0, len = this.length; i < len; i++) {
     if(hash.indexOf(this[i])==-1){
      hash.push(this[i]);
     }
  }
  return hash;
}
// Weakened for loop (stops at the first null/undefined element instead of checking the length)
Array.prototype.distinct3 = function () {
  var hash=[];
  for (var i = 0; this[i] != null; i++) {
     if(hash.indexOf(this[i])==-1){
      hash.push(this[i]);
     }
  }
  return hash;
}
// foreach
Array.prototype.distinct4 = function () {
  var hash=[];
  this.forEach(item => {
    if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  })
  return hash;
}
// forEach variant (called through Array.prototype.forEach.call)
Array.prototype.distinct5 = function () {
  var hash=[];
  Array.prototype.forEach.call(this, item => {
    if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  })
  return hash;
}
// for...in
Array.prototype.distinct6 = function () {
  var hash=[];
  for (var key in this) {
     var item = this[key]
     if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  }
  return hash;
}
// map
Array.prototype.distinct7 = function () {
  var hash=[];
  this.map(item => {
     if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  });
  return hash;
}
// forof
Array.prototype.distinct8 = function () {
  var hash=[];
  for (let item of this) {
     if(hash.indexOf(item)==-1){
      hash.push(item);
     }
  }
  return hash;
}
var startTime,endTime, rtn;
function test(types) {
  types.forEach(type => {
    startTime = new Date();
    rtn = orgArray[type]();
    endTime = new Date();
    console.log(`Array size [${orgArray.length}], length after deduplication: ${rtn.length}, ${type} took ${endTime - startTime} ms`)
    console.log('----------------------------------------------------------------------');
  })
}
var testArray = [];
for (var i = 1; i <= 8; i++) {
  testArray.push('distinct' + i);
}
test(testArray)
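The harness above times each method once with Date, which only resolves whole milliseconds, and a single run can be skewed by JIT warm-up or garbage collection. As a hedged sketch of a finer-grained alternative, the helper below (timeBest is a hypothetical name) repeats a function several times and keeps the fastest run. It assumes a global performance object, which browsers and recent Node.js versions provide, and falls back to Date.now() otherwise.

```javascript
// Hypothetical timing helper: runs fn several times and reports the fastest run.
// Assumes a global `performance` object (browsers and recent Node.js versions);
// falls back to Date.now() when it is missing.
const now = (typeof performance !== 'undefined' && typeof performance.now === 'function')
  ? () => performance.now()
  : () => Date.now();

function timeBest(fn, runs) {
  let best = Infinity;
  let result;
  for (let r = 0; r < runs; r++) {
    const start = now();
    result = fn();
    const elapsed = now() - start;
    if (elapsed < best) best = elapsed;
  }
  return { best, result };
}

// Example: 100,000 items drawn from 1,000 distinct values.
const data = Array.from({ length: 100000 }, (_, i) => i % 1000);
const measured = timeBest(() => Array.from(new Set(data)), 5);
console.log(`unique values: ${measured.result.length}, best of 5 runs: ${measured.best.toFixed(2)} ms`);
```

Reporting the minimum of several runs is one common convention; the numbers it prints are not comparable with the single-run figures quoted in this article.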

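Output:

Raising the data size to 1,000,000 and 10,000,000 gives the following results:

Based on these results we can rule out for...in, but the other traversal methods differ very little. We pick the best performer so far, the weakened for loop (which, in our test environment, is actually the strengthened version, haha).

(Surprisingly, distinct6 returns 1008 elements after deduplication. This is caused by how for...in iterates: it also visits enumerable properties inherited through the prototype chain, which here are the eight distinctN functions added to Array.prototype, giving 1000 values plus 8 functions. This is why for...in is not suitable for iterating over arrays. For details, see the difference between for...in and for...of.)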
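The for...in behaviour can be reproduced in isolation. The sketch below adds a throwaway method to Array.prototype (demoDistinct is a hypothetical name used only here), shows that for...in picks it up, and then shows that defining it as non-enumerable with Object.defineProperty hides it from for...in.

```javascript
// Hypothetical method name used only for this demonstration.
Array.prototype.demoDistinct = function () { return this; };

const keysWithEnumerable = [];
for (const key in [10, 20]) {
  keysWithEnumerable.push(key);
}
console.log(keysWithEnumerable); // contains "0", "1" and the inherited "demoDistinct"

// Redefine the method as non-enumerable so that for...in no longer visits it.
delete Array.prototype.demoDistinct;
Object.defineProperty(Array.prototype, 'demoDistinct', {
  value: function () { return this; },
  enumerable: false,
  configurable: true,
  writable: true
});

const keysWithoutEnumerable = [];
for (const key in [10, 20]) {
  keysWithoutEnumerable.push(key);
}
console.log(keysWithoutEnumerable); // no "demoDistinct" entry any more

delete Array.prototype.demoDistinct; // clean up the demonstration method
```

Methods assigned with a plain `Array.prototype.name = ...` are enumerable by default, which is why they show up in the for...in results of the tests.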

Here is the code:

function getRandomIntInclusive(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min; //The maximum is inclusive and the minimum is inclusive 
}
var orgArray = Array.from(new Array(100000), ()=>{
    return getRandomIntInclusive(1, 1000);
})


// indexOf
Array.prototype.distinct1 = function () {
  var hash=[];
  for (var i = 0; this[i] != null; i++) {
     if(hash.indexOf(this[i])==-1){
      hash.push(this[i]);
     }
  }
  return hash;
}

// Index check
Array.prototype.distinct2 = function () {
  var hash=[];
  for (var i = 0; this[i] != null; i++) {
     if(this.indexOf(this[i])==i){
      hash.push(this[i]);
     }
  }
  return hash;
}

// includes
Array.prototype.distinct3 = function () {
  var hash=[];
  for (var i = 0; this[i] != null; i++) {
     if(!hash.includes(this[i])){
      hash.push(this[i]);
     }
  }
  return hash;
}

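// Object (each value is used as a key; keying by the index i would never detect a duplicate)
Array.prototype.distinct4 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; this[i] != null; i++) {
     if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
     }
  }
  return hash;
}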

var startTime,endTime, rtn;
function test(types) {
  types.forEach(type => {
    startTime = new Date();
    rtn = orgArray[type]();
    endTime = new Date();
    console.log(`Array size [${orgArray.length}], length after deduplication: ${rtn.length}, ${type} took ${endTime - startTime} ms`)
    console.log('----------------------------------------------------------------------');
  })
}
var testArray = [];
for (var i = 1; i <= 4; i++) {
  testArray.push('distinct' + i);
}
test(testArray)

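The test results are as follows:

These results show a clear gap. Now look at the results for 1,000,000 and 10,000,000:

The conclusion is clear: as the data size increases, the Object-based method is far more efficient than the other methods used with ordinary array traversal. Let us revisit the idea behind the Object method:

It uses the uniqueness of Object keys to deduplicate: each value is used as a property key, so checking whether a value has already been seen is a single property lookup rather than a scan of the result array.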
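One consequence of using values as Object keys is that keys are converted to strings, so the number 1 and the string "1" collide. The sketch below demonstrates the collision and shows a Map-based variant as one possible alternative for mixed-type data. The name distinctWithMap is hypothetical, and the variant is not part of the article's measurements, so nothing is claimed here about its speed relative to the Object method.

```javascript
// Object keys are strings, so the number 1 and the string "1" share one key.
const seenKeys = {};
seenKeys[1] = true;
console.log(seenKeys['1']); // true

// Hypothetical alternative: a Map compares keys by value and type
// (SameValueZero), so 1 and "1" stay distinct and 0 or NaN are handled.
function distinctWithMap(list) {
  const seen = new Map();
  const result = [];
  for (const item of list) {
    if (!seen.has(item)) {
      seen.set(item, true);
      result.push(item);
    }
  }
  return result;
}

const mixed = distinctWithMap([1, '1', 0, 0, NaN, NaN, 'a', 'a']);
console.log(mixed); // [1, "1", 0, NaN, "a"]
```

For arrays that contain only positive integers, as in the tests here, the two approaches produce the same result.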

To make sure nothing is missed, we run the Object-based deduplication with every traversal method:

function getRandomIntInclusive(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min; //The maximum is inclusive and the minimum is inclusive 
}
var orgArray = Array.from(new Array(100000), ()=>{
    return getRandomIntInclusive(1, 1000);
})

// Plain for loop
Array.prototype.distinct1 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; i < this.length; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// Optimized for loop (array length cached)
Array.prototype.distinct2 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0, len = this.length; i < len; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// Weakened for loop (stops at the first null/undefined element instead of checking the length)
Array.prototype.distinct3 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; this[i] != null; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// foreach
Array.prototype.distinct4 = function () {
  var hash=[];
  var obj = {};
  this.forEach(item => {
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  })
  return hash;
}
// forEach variant (called through Array.prototype.forEach.call)
Array.prototype.distinct5 = function () {
  var hash=[];
  var obj = {};
  Array.prototype.forEach.call(this, item => {
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  })
  return hash;
}
// for...in
Array.prototype.distinct6 = function () {
  var hash=[];
  var obj = {};
  for (var key in this) {
    var item = this[key];
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  }
  return hash;
}
// map
Array.prototype.distinct7 = function () {
  var hash=[];
  var obj = {};
  this.map(item => {
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  });
  return hash;
}
// forof
Array.prototype.distinct8 = function () {
  var hash=[];
  var obj = {};
  for (let item of this) {
    if(!obj[item]){
      hash.push(item);
      obj[item] = item;
    }
  }
  return hash;
}
var startTime,endTime, rtn;
function test(types) {
  types.forEach(type => {
    startTime = new Date();
    rtn = orgArray[type]();
    endTime = new Date();
    console.log(`Array size [${orgArray.length}], length after deduplication: ${rtn.length}, ${type} took ${endTime - startTime} ms`)
    console.log('----------------------------------------------------------------------');
  })
}
var testArray = [];
for (var i = 1; i <= 8; i++) {
  testArray.push('distinct' + i);
}
test(testArray)

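The results are as follows:

Testing again with 1,000,000 and 10,000,000 items gives the following results:

We can conclude that, among the traversal methods, the plain for loop (including its optimized and weakened variants) combined with Object leads the others by a wide margin. We can now compare the Object approach with the remaining methods.

(As before, distinct6 returns 1008 elements after deduplication, because for...in also visits the enumerable methods inherited from Array.prototype. for...in is therefore not suitable for iterating over arrays; see the difference between for...in and for...of.)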

function getRandomIntInclusive(min, max) {
  min = Math.ceil(min);
  max = Math.floor(max);
  return Math.floor(Math.random() * (max - min + 1)) + min; //The maximum is inclusive and the minimum is inclusive 
}
var orgArray = Array.from(new Array(100000), ()=>{
    return getRandomIntInclusive(1, 1000);
})

// Plain for loop
Array.prototype.distinct1 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; i < this.length; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// Optimized for loop (array length cached)
Array.prototype.distinct2 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0, len = this.length; i < len; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// Weakened for loop (stops at the first null/undefined element instead of checking the length)
Array.prototype.distinct3 = function () {
  var hash=[];
  var obj = {};
  for (var i = 0; this[i] != null; i++) {
    if(!obj[this[i]]){
      hash.push(this[i]);
      obj[this[i]] = this[i];
    }
  }
  return hash;
}
// new Set() + [...]
Array.prototype.distinct4 = function () {
  return [...new Set(this)];
}
// new Set() + Array.from
Array.prototype.distinct5 = function () {
  return Array.from(new Set(this));
}
// Array.filter() + Object
Array.prototype.distinct6 = function () {
  var obj = {};
  return this.filter(item => {
    if (!obj[item]) {
      obj[item] = item;
      return true;
    }
  })
}
// Double for loop
Array.prototype.distinct7 = function () {
  let arr = [...this];
  for (let i=0, len=arr.length; i<len; i++) {
    for (let j=i+1; j<len; j++) {
      if (arr[i] == arr[j]) {
        arr.splice(j, 1);
        // splice changes the array length, so decrement both len and the index j
        len--;
        j--;
      }
    }
  }
  return arr
}
// Array.sort()
Array.prototype.distinct8 = function () {
  // Sort a copy so that the original array is not reordered
  var arr = this.slice().sort()
  let result = [arr[0]]

  for (let i=1, len=arr.length; i<len; i++) {
    arr[i] !== arr[i-1] && result.push(arr[i])
  }
  return result
}
var startTime,endTime, rtn;
function test(types) {
  types.forEach(type => {
    startTime = new Date();
    rtn = orgArray[type]();
    endTime = new Date();
    console.log(`Array size [${orgArray.length}], length after deduplication: ${rtn.length}, ${type} took ${endTime - startTime} ms`)
    console.log('----------------------------------------------------------------------');
  })
}
var testArray = [];
for (var i = 1; i <= 8; i++) {
  testArray.push('distinct' + i);
}
test(testArray)
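The Array.sort() variant above relies on the default sort, which compares elements as strings. Equal values still end up next to each other, so deduplication works, but the output is not in numeric order. As a hedged sketch, the helper below (distinctSortedNumbers is a hypothetical name, and it assumes an array of numbers) sorts a copy with a numeric comparator before dropping adjacent repeats.

```javascript
// The default sort compares elements as strings.
console.log([10, 9, 1].sort()); // [1, 10, 9]

// Hypothetical helper: sort a copy numerically, then skip adjacent repeats.
function distinctSortedNumbers(numbers) {
  const sorted = numbers.slice().sort((a, b) => a - b);
  const result = [];
  for (let i = 0; i < sorted.length; i++) {
    if (i === 0 || sorted[i] !== sorted[i - 1]) {
      result.push(sorted[i]);
    }
  }
  return result;
}

const input = [10, 9, 1, 9, 10, 2];
const sortedUnique = distinctSortedNumbers(input);
console.log(sortedUnique); // [1, 2, 9, 10]
console.log(input);        // [10, 9, 1, 9, 10, 2] (unchanged)
```

Starting from an empty result also avoids returning [undefined] for an empty input array.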

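The results are as follows:

We can essentially eliminate the double for loop and Array.sort(). Now we test again at the 1,000,000 level:

The double for loop hung... Let us look at 300,000 items instead:

Dropping the double for loop, we run 1,000,000 and 10,000,000:

Clearly, the plain for loop (including the optimized and weakened variants) + Object is far ahead.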

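3. Test result report

After running the plain, optimized, and weakened for loops many times, the final conclusion on efficiency is: weakened > plain > optimized.

The weakened version might as well be renamed the special version, haha.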

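// Fastest deduplication method measured: 3 ms for 100,000 items, 6 ms for 1,000,000 items, 36 ms for 10,000,000 items
// Caveats: the loop stops at the first null/undefined element, and object keys are
// strings, so 1 and "1" are treated as the same value.
Array.prototype.distinct = function () {
  var hash = [];
  var obj = {};
  for (var i = 0; this[i] != null; i++) {
    if (!obj[this[i]]) {
      hash.push(this[i]);
      obj[this[i]] = true; // store true so that falsy values such as 0 are still recorded
    }
  }
  return hash;
}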