Source Code Analysis of the Open-Source Web Analytics System Piwik: Request Parameters (Part 1)

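  Piwik, now renamed Matomo, is a well-known open-source web analytics system, comparable to Baidu Tongji and Google Analytics. The biggest difference is that its source code is available to read, which is exactly what I wanted. I have long been curious about how analytics systems work internally, so when I came across this one I tried it right away. Material on it in Chinese is scarce: most of what exists covers installation, and I could not find any article analyzing the source code. What follows is a first pass that will become more detailed in later posts. Since I currently work as a front-end developer, I will analyze the tracking script first and the back-end code afterwards.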

1. Overview

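  Piwik's official site is matomo.org. It is written in PHP, and since I used to be a PHP engineer, reading the code poses no obstacle. The latest version at the time of writing is 3.6, and the GitHub repository is matomo-org/matomo; opening it shows the content in the figure below (only the key part is captured).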

  Open the js folder: the piwik.js file inside (boxed in red in the figure below) is the script analyzed in this post. It is fairly large, at 7838 lines of code.

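  First download the whole codebase, configure a virtual directory locally, and then start the installation. A language can be chosen during installation, and Simplified Chinese is supported (note the part boxed in red in the figure below). The installer performs a series of steps (see the left side of the figure below), including checking whether the current environment meets the requirements and creating the database. Following the prompts step by step is all it takes.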

 

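  After installation the system redirects to the admin dashboard (shown in the figure below), which has charts and reports much like other common analytics systems. I have not looked at the features in detail yet, only formed a first impression, but the interface is quite user-friendly.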

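  The JavaScript embedded in the page, shown below, is also similar to that of other analytics systems and is loaded asynchronously. The only difference is that the request URL is not disguised as an image URL (note the highlighted line of code).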

<script type="text/javascript">
      var _paq = _paq || [];
      /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
      _paq.push(['trackPageView']);
      _paq.push(['enableLinkTracking']);
      (function() {
        var u="//loc.piwik.cn/";    // user-defined tracker host
        _paq.push(['setTrackerUrl', u+'piwik.php']);
        _paq.push(['setSiteId', '1']);
        var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
        g.type='text/javascript'; g.async=true; g.defer=true; g.src='piwik.js'; s.parentNode.insertBefore(g,s);
      })();
</script>

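  Once this script is embedded, refreshing the page triggers the request shown in the figure below. The request carries a long list of parameters, each of which is explained later in this post.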
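  The _paq array in the snippet acts as a command queue: commands pushed before piwik.js has loaded are replayed once the tracker is ready. The sketch below illustrates that idea only. createCommandQueue and toyTracker are hypothetical names invented for this example and are not taken from the Piwik source.

```javascript
// Hypothetical sketch of an asynchronous command queue; not Piwik's real code.
function createCommandQueue(pending, tracker) {
  // Run one command of the form ['methodName', arg1, arg2, ...].
  function apply(command) {
    var method = command[0];
    var args = command.slice(1);
    if (typeof tracker[method] === "function") {
      tracker[method].apply(tracker, args);
    }
  }

  // Replay everything that was pushed before the tracker existed.
  for (var i = 0; i < pending.length; i++) {
    apply(pending[i]);
  }

  // From now on, push() executes commands immediately.
  return {
    push: function () {
      for (var j = 0; j < arguments.length; j++) {
        apply(arguments[j]);
      }
    }
  };
}

// Toy tracker that only records what it was asked to do.
var calls = [];
var toyTracker = {
  setSiteId: function (id) { calls.push("site:" + id); },
  trackPageView: function () { calls.push("pageview"); }
};

// Commands queued before the "script" loaded...
var _paq = [["setSiteId", "1"], ["trackPageView"]];
// ...are replayed when the queue is replaced by the live object.
_paq = createCommandQueue(_paq, toyTracker);
_paq.push(["trackPageView"]);
```

  Because the page only ever calls push(), the same embed code works whether it runs before or after the tracking script has finished loading.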

2. Splitting the Script

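  A script of more than 7000 lines cannot sensibly be read line by line, so it has to be split into modules and analyzed one at a time. The script is this large because much of its code exists to support a wide range of browser versions, including antiques such as IE4, Firefox 1.0, and Netscape. I split the source into six parts: json, private, query, content-overlay, tracker, and piwik, as boxed in red in the figure below. piwik-all contains the complete code for comparison. The code has been uploaded to GitHub.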

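  json.js is the open-source library JSON3, designed to support browsers that lack a native JSON object; its code can be studied on its own. private.js contains private variables and functions used throughout the script, such as aliases for host objects and type checks. query.js contains many methods for working with HTML elements, such as setting attributes and finding elements by CSS class; it resembles a miniature jQuery, though with a number of features of its own. content-overlay.js consists of two parts: one covers content tracking and URL building, and the other handles nested pages; I have not read it closely. tracker.js contains a single Tracker() function, but it is the largest part at more than 4700 lines, and the main tracking logic lives there. piwik.js is short and contains initialization code and plugin hooks; I have not yet looked into how the hooks work.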

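  Even split into six parts, each part is still sizable and the parts depend on one another, so it is hard to understand everything in a short time. I picked the pieces that I personally consider most important to analyze first.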

1) Three ways to send data

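  I previously knew of only two ways to send data. One is Ajax. The other is to create an Image object and set its src attribute, passing the data to the server as URL parameters; this approach is widely supported and also sidesteps cross-origin restrictions. primus.js, a plugin I wrote earlier for collecting performance metrics, sends its data this way too. While reading the source I found a third way: the sendBeacon() method of the Navigator object.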
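  As an illustration of the second approach, here is a minimal sketch of sending data through an Image object. buildBeaconUrl and sendViaImage are hypothetical helper names made up for this example; the endpoint is the local test host used elsewhere in this post.

```javascript
// Hypothetical helper: serialize a params object into a tracking URL.
function buildBeaconUrl(endpoint, params) {
  var pairs = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      pairs.push(
        encodeURIComponent(key) + "=" + encodeURIComponent(params[key])
      );
    }
  }
  var separator = endpoint.indexOf("?") === -1 ? "?" : "&";
  return endpoint + separator + pairs.join("&");
}

// Send the data by requesting the URL as an image (browser only).
function sendViaImage(endpoint, params) {
  if (typeof Image === "undefined") {
    return false; // not running in a browser
  }
  var img = new Image(1, 1);
  // Assigning src triggers a GET request; the response body is irrelevant.
  img.src = buildBeaconUrl(endpoint, params);
  return true;
}

var exampleUrl = buildBeaconUrl("//loc.piwik.cn/piwik.php", {
  idsite: 1,
  url: "http://a.b/?x=1"
});
```

  Image requests are not subject to the same-origin policy, which is why this technique works across domains without any extra server configuration.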

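  MDN describes sendBeacon() as a method for asynchronously transferring a small amount of data over HTTP to a web server. Although the method has compatibility issues, I was still impressed, because it suits analytics very well. As MDN explains, analytics code tries to send data to the web server before the page is closed (window.onunload), yet sending it any earlier may miss the chance to collect data. Guaranteeing delivery while the page is closing has always been difficult, because browsers usually ignore asynchronous requests made in an unload handler. With sendBeacon(), the browser sends the data asynchronously when it has the opportunity, without delaying the unload or affecting the loading of the next page. This addresses the problems of submitting analytics data: delivery is reliable and asynchronous, it does not affect the next page load, and the code is simpler. The code fragment below (note the highlighted line) is found in tracker.js.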

function sendPostRequestViaSendBeacon(request) {
  var supportsSendBeacon =
    "object" === typeof navigatorAlias &&
    "function" === typeof navigatorAlias.sendBeacon &&
    "function" === typeof Blob;
  if (!supportsSendBeacon) {
    return false;
  }
  var headers = {
    type: "application/x-www-form-urlencoded; charset=UTF-8"
  };
  var success = false;
  try {
    var blob = new Blob([request], headers);
    success = navigatorAlias.sendBeacon(configTrackerUrl, blob);
    // returns true if the user agent is able to successfully queue the data for transfer,
    // Otherwise it returns false and we need to try the regular way
  } catch (e) {
    return false;
  }
  return success;
}
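  Because sendBeacon() can be missing or can refuse to queue the data, a caller needs a fallback. The sketch below shows one way to chain transports. sendWithFallback, makeBeaconTransport, and the "fallback" transport are hypothetical names for illustration; this is not how tracker.js is actually structured.

```javascript
// Hypothetical sketch: try transports in order until one accepts the payload.
function sendWithFallback(payload, transports) {
  for (var i = 0; i < transports.length; i++) {
    try {
      if (transports[i].send(payload)) {
        return transports[i].name; // report which transport was used
      }
    } catch (e) {
      // A throwing transport counts as a failure; try the next one.
    }
  }
  return null; // nothing could send the payload
}

// Transport based on navigator.sendBeacon(), guarded by feature detection.
function makeBeaconTransport(endpoint) {
  return {
    name: "beacon",
    send: function (payload) {
      if (
        typeof navigator === "undefined" ||
        !navigator ||
        typeof navigator.sendBeacon !== "function"
      ) {
        return false;
      }
      return navigator.sendBeacon(endpoint, payload);
    }
  };
}

// Stand-in for a regular request (for example an XMLHttpRequest POST).
var fallbackTransport = {
  name: "fallback",
  send: function (payload) {
    return payload.length > 0;
  }
};

var usedTransport = sendWithFallback("idsite=1&rec=1", [
  makeBeaconTransport("//loc.piwik.cn/piwik.php"),
  fallbackTransport
]);
```

  Keeping each transport behind the same small interface makes the order of preference explicit and lets the chain be tested with stub transports.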

2) Parameter reference

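  The function below (found in tracker.js) collects the tracking data for a page and concatenates it into the query parameters of a URL, which are ultimately sent to the server.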

/**
 * Returns the URL to call piwik.php,
 * with the standard parameters (plugins, resolution, url, referrer, etc.).
 * Sends the pageview and browser settings with every request in case of race conditions.
 */
function getRequest(request, customData, pluginMethod, currentEcommerceOrderTs) {
  var i,
    now = new Date(),
    nowTs = Math.round(now.getTime() / 1000),
    referralTs,
    referralUrl,
    referralUrlMaxLength = 1024,
    currentReferrerHostName,
    originalReferrerHostName,
    customVariablesCopy = customVariables,
    cookieSessionName = getCookieName("ses"),
    cookieReferrerName = getCookieName("ref"),
    cookieCustomVariablesName = getCookieName("cvar"),
    cookieSessionValue = getCookie(cookieSessionName),
    attributionCookie = loadReferrerAttributionCookie(),
    currentUrl = configCustomUrl || locationHrefAlias,
    campaignNameDetected,
    campaignKeywordDetected;

  if (configCookiesDisabled) {
    deleteCookies();
  }

  if (configDoNotTrack) {
    return "";
  }

  var cookieVisitorIdValues = getValuesFromVisitorIdCookie();
  if (!isDefined(currentEcommerceOrderTs)) {
    currentEcommerceOrderTs = "";
  }

  // send charset if document charset is not utf-8. sometimes encoding
  // of urls will be the same as this and not utf-8, which will cause problems
  // do not send charset if it is utf8 since it's assumed by default in Piwik
  var charSet = documentAlias.characterSet || documentAlias.charset;

  if (!charSet || charSet.toLowerCase() === "utf-8") {
    charSet = null;
  }

  campaignNameDetected = attributionCookie[0];
  campaignKeywordDetected = attributionCookie[1];
  referralTs = attributionCookie[2];
  referralUrl = attributionCookie[3];

  if (!cookieSessionValue) {
    // cookie 'ses' was not found: we consider this the start of a 'session'

    // here we make sure that if 'ses' cookie is deleted few times within the visit
    // and so this code path is triggered many times for one visit,
    // we only increase visitCount once per Visit window (default 30min)
    var visitDuration = configSessionCookieTimeout / 1000;
    if (
      !cookieVisitorIdValues.lastVisitTs ||
      nowTs - cookieVisitorIdValues.lastVisitTs > visitDuration
    ) {
      cookieVisitorIdValues.visitCount++;
      cookieVisitorIdValues.lastVisitTs = cookieVisitorIdValues.currentVisitTs;
    }

    // Detect the campaign information from the current URL
    // Only if campaign wasn't previously set
    // Or if it was set but we must attribute to the most recent one
    // Note: we are working on the currentUrl before purify() since we can parse the campaign parameters in the hash tag
    if (
      !configConversionAttributionFirstReferrer ||
      !campaignNameDetected.length
    ) {
      for (i in configCampaignNameParameters) {
        if (
          Object.prototype.hasOwnProperty.call(configCampaignNameParameters, i)
        ) {
          campaignNameDetected = getUrlParameter(
            currentUrl,
            configCampaignNameParameters[i]
          );

          if (campaignNameDetected.length) {
            break;
          }
        }
      }

      for (i in configCampaignKeywordParameters) {
        if (
          Object.prototype.hasOwnProperty.call(
            configCampaignKeywordParameters,
            i
          )
        ) {
          campaignKeywordDetected = getUrlParameter(
            currentUrl,
            configCampaignKeywordParameters[i]
          );

          if (campaignKeywordDetected.length) {
            break;
          }
        }
      }
    }

    // Store the referrer URL and time in the cookie;
    // referral URL depends on the first or last referrer attribution
    currentReferrerHostName = getHostName(configReferrerUrl);
    originalReferrerHostName = referralUrl.length
      ? getHostName(referralUrl)
      : "";

    if (
      currentReferrerHostName.length && // there is a referrer
      !isSiteHostName(currentReferrerHostName) && // domain is not the current domain
      (!configConversionAttributionFirstReferrer || // attribute to last known referrer
      !originalReferrerHostName.length || // previously empty
        isSiteHostName(originalReferrerHostName))
    ) {
      // previously set but in current domain
      referralUrl = configReferrerUrl;
    }

    // Set the referral cookie if we have either a Referrer URL, or detected a Campaign (or both)
    if (referralUrl.length || campaignNameDetected.length) {
      referralTs = nowTs;
      attributionCookie = [
        campaignNameDetected,
        campaignKeywordDetected,
        referralTs,
        purify(referralUrl.slice(0, referralUrlMaxLength))
      ];

      setCookie(
        cookieReferrerName,
        JSON_PIWIK.stringify(attributionCookie),
        configReferralCookieTimeout,
        configCookiePath,
        configCookieDomain
      );
    }
  }

  // build out the rest of the request
  request +=
    "&idsite=" +
    configTrackerSiteId +
    "&rec=1" +
    "&r=" +
    String(Math.random()).slice(2, 8) + // keep the string to a minimum
    "&h=" +
    now.getHours() +
    "&m=" +
    now.getMinutes() +
    "&s=" +
    now.getSeconds() +
    "&url=" +
    encodeWrapper(purify(currentUrl)) +
    (configReferrerUrl.length
      ? "&urlref=" + encodeWrapper(purify(configReferrerUrl))
      : "") +
    (configUserId && configUserId.length
      ? "&uid=" + encodeWrapper(configUserId)
      : "") +
    "&_id=" +
    cookieVisitorIdValues.uuid +
    "&_idts=" +
    cookieVisitorIdValues.createTs +
    "&_idvc=" +
    cookieVisitorIdValues.visitCount +
    "&_idn=" +
    cookieVisitorIdValues.newVisitor + // currently unused
    (campaignNameDetected.length
      ? "&_rcn=" + encodeWrapper(campaignNameDetected)
      : "") +
    (campaignKeywordDetected.length
      ? "&_rck=" + encodeWrapper(campaignKeywordDetected)
      : "") +
    "&_refts=" +
    referralTs +
    "&_viewts=" +
    cookieVisitorIdValues.lastVisitTs +
    (String(cookieVisitorIdValues.lastEcommerceOrderTs).length
      ? "&_ects=" + cookieVisitorIdValues.lastEcommerceOrderTs
      : "") +
    (String(referralUrl).length
      ? "&_ref=" +
        encodeWrapper(purify(referralUrl.slice(0, referralUrlMaxLength)))
      : "") +
    (charSet ? "&cs=" + encodeWrapper(charSet) : "") +
    "&send_image=0";

  // browser features
  for (i in browserFeatures) {
    if (Object.prototype.hasOwnProperty.call(browserFeatures, i)) {
      request += "&" + i + "=" + browserFeatures[i];
    }
  }

  var customDimensionIdsAlreadyHandled = [];
  if (customData) {
    for (i in customData) {
      if (
        Object.prototype.hasOwnProperty.call(customData, i) &&
        /^dimension\d+$/.test(i)
      ) {
        var index = i.replace("dimension", "");
        customDimensionIdsAlreadyHandled.push(parseInt(index, 10));
        customDimensionIdsAlreadyHandled.push(String(index));
        request += "&" + i + "=" + customData[i];
        delete customData[i];
      }
    }
  }

  if (customData && isObjectEmpty(customData)) {
    customData = null;
    // we deleted all keys from custom data
  }

  // custom dimensions
  for (i in customDimensions) {
    if (Object.prototype.hasOwnProperty.call(customDimensions, i)) {
      var isNotSetYet =
        -1 === indexOfArray(customDimensionIdsAlreadyHandled, i);
      if (isNotSetYet) {
        request += "&dimension" + i + "=" + customDimensions[i];
      }
    }
  }

  // custom data
  if (customData) {
    request += "&data=" + encodeWrapper(JSON_PIWIK.stringify(customData));
  } else if (configCustomData) {
    request += "&data=" + encodeWrapper(JSON_PIWIK.stringify(configCustomData));
  }

  // Custom Variables, scope "page"
  function appendCustomVariablesToRequest(customVariables, parameterName) {
    var customVariablesStringified = JSON_PIWIK.stringify(customVariables);
    if (customVariablesStringified.length > 2) {
      return (
        "&" + parameterName + "=" + encodeWrapper(customVariablesStringified)
      );
    }
    return "";
  }

  var sortedCustomVarPage = sortObjectByKeys(customVariablesPage);
  var sortedCustomVarEvent = sortObjectByKeys(customVariablesEvent);

  request += appendCustomVariablesToRequest(sortedCustomVarPage, "cvar");
  request += appendCustomVariablesToRequest(sortedCustomVarEvent, "e_cvar");

  // Custom Variables, scope "visit"
  if (customVariables) {
    request += appendCustomVariablesToRequest(customVariables, "_cvar");

    // Don't save deleted custom variables in the cookie
    for (i in customVariablesCopy) {
      if (Object.prototype.hasOwnProperty.call(customVariablesCopy, i)) {
        if (customVariables[i][0] === "" || customVariables[i][1] === "") {
          delete customVariables[i];
        }
      }
    }

    if (configStoreCustomVariablesInCookie) {
      setCookie(
        cookieCustomVariablesName,
        JSON_PIWIK.stringify(customVariables),
        configSessionCookieTimeout,
        configCookiePath,
        configCookieDomain
      );
    }
  }

  // performance tracking
  if (configPerformanceTrackingEnabled) {
    if (configPerformanceGenerationTime) {
      request += "&gt_ms=" + configPerformanceGenerationTime;
    } else if (
      performanceAlias &&
      performanceAlias.timing &&
      performanceAlias.timing.requestStart &&
      performanceAlias.timing.responseEnd
    ) {
      request +=
        "&gt_ms=" +
        (performanceAlias.timing.responseEnd -
          performanceAlias.timing.requestStart);
    }
  }

  if (configIdPageView) {
    request += "&pv_id=" + configIdPageView;
  }

  // update cookies
  cookieVisitorIdValues.lastEcommerceOrderTs =
    isDefined(currentEcommerceOrderTs) && String(currentEcommerceOrderTs).length
      ? currentEcommerceOrderTs
      : cookieVisitorIdValues.lastEcommerceOrderTs;
  setVisitorIdCookie(cookieVisitorIdValues);
  setSessionCookie();

  // tracker plugin hook
  request += executePluginMethod(pluginMethod, {
    tracker: trackerInstance,
    request: request
  });

  if (configAppendToTrackingUrl.length) {
    request += "&" + configAppendToTrackingUrl;
  }

  if (isFunction(configCustomRequestContentProcessing)) {
    request = configCustomRequestContentProcessing(request);
  }

  return request;
}

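  The tracking code sends data on every call, and each request carries a long string of parameters, all abbreviated. Brief explanations follow (corrections are welcome). Some parameters are not yet fully explained, for example how the UUID is generated. The parameters fall into two groups. The first group is: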

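1. idsite: the site ID

2. rec: always 1 (hard-coded)

3. r: a random string of up to six digits

4. h: the current hour

5. m: the current minute

6. s: the current second

7. url: the URL of the current page (configCustomUrl or location.href) after purify() and URL encoding; as far as I can tell, purify() strips the hash fragment when that option is enabled

8. _id: the visitor's UUID

9. _idts: the timestamp at which the visitor ID was created (createTs)

10. _idvc: the number of visits

11. _idn: new-visitor flag (currently unused)

12. _refts: the timestamp of the referral

13. _viewts: the timestamp of the previous visit

14. cs: the character encoding of the current page (only sent when it is not utf-8)

15. send_image: whether an image is involved in the exchange; it is fixed to 0 here, which according to the Tracking HTTP API asks the server to reply without a GIF image

16. gt_ms: the time spent loading the content, in milliseconds (responseEnd minus requestStart)

17. pv_id: a unique identifier for the page view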
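  To make the first few parameters concrete, the sketch below builds them the same way getRequest() does and then parses the result back into an object for inspection. buildBasicParams and parseParams are hypothetical helpers written for this example, and encodeURIComponent stands in for the script's encodeWrapper as an assumption.

```javascript
// Hypothetical sketch of the first few parameters; not the full getRequest().
function buildBasicParams(siteId, now, pageUrl) {
  return (
    "idsite=" + siteId +
    "&rec=1" +
    "&r=" + String(Math.random()).slice(2, 8) + // at most six characters
    "&h=" + now.getHours() +
    "&m=" + now.getMinutes() +
    "&s=" + now.getSeconds() +
    "&url=" + encodeURIComponent(pageUrl)
  );
}

// Parse a query string back into an object so it can be inspected.
function parseParams(query) {
  var result = {};
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var idx = pairs[i].indexOf("=");
    var name = pairs[i].slice(0, idx);
    result[name] = decodeURIComponent(pairs[i].slice(idx + 1));
  }
  return result;
}

// Local time: 1 September 2018, 14:05:09 (months are zero-based).
var sampleDate = new Date(2018, 8, 1, 14, 5, 9);
var parsed = parseParams(
  buildBasicParams(1, sampleDate, "http://loc.piwik.cn/index.html")
);
```

  Note that h, m, and s come from the visitor's local clock, so the server receives the time as the browser sees it.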

  The second group reports browser capabilities, obtained from properties of the Navigator object (mimeTypes, javaEnabled, and so on) and of the Screen object (width and height).

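1. pdf: whether the PDF file type is supported

2. qt: whether the QuickTime player is supported

3. realp: whether RealPlayer is supported

4. wma: whether the Windows Media Player plugin is supported (MIME type application/x-mplayer2)

5. dir: whether Macromedia Director is supported

6. fla: whether Adobe Flash Player is supported

7. java: whether Java is enabled

8. gears: whether Google Gears is installed

9. ag: whether Microsoft Silverlight is installed

10. cookie: whether cookies are enabled

11. res: the screen width and height (not adjusted for high-DPI displays)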

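  The code that collects these 11 parameters is in the function below (also found in tracker.js). Note the pluginMap variable (highlighted in red): it maps each parameter to a MIME type, which is used to detect whether the corresponding plugin or feature is installed or enabled.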

/*
* Browser features (plugins, resolution, cookies)
*/
function detectBrowserFeatures() {
  var i,
    mimeType,
    pluginMap = {
      // document types
      pdf: "application/pdf",

      // media players
      qt: "video/quicktime",
      realp: "audio/x-pn-realaudio-plugin",
      wma: "application/x-mplayer2",

      // interactive multimedia
      dir: "application/x-director",
      fla: "application/x-shockwave-flash",

      // RIA
      java: "application/x-java-vm",
      gears: "application/x-googlegears",
      ag: "application/x-silverlight"
    };

  // detect browser features except IE < 11 (IE 11 user agent is no longer MSIE)
  if (!new RegExp("MSIE").test(navigatorAlias.userAgent)) {
    // general plugin detection
    if (navigatorAlias.mimeTypes && navigatorAlias.mimeTypes.length) {
      for (i in pluginMap) {
        if (Object.prototype.hasOwnProperty.call(pluginMap, i)) {
          mimeType = navigatorAlias.mimeTypes[pluginMap[i]];
          browserFeatures[i] = mimeType && mimeType.enabledPlugin ? "1" : "0";
        }
      }
    }

    // Safari and Opera
    // IE6/IE7 navigator.javaEnabled can't be aliased, so test directly
    // on Edge navigator.javaEnabled() always returns `true`, so ignore it
    if (
      !new RegExp("Edge[ /](\\d+[\\.\\d]+)").test(navigatorAlias.userAgent) &&
      typeof navigator.javaEnabled !== "unknown" &&
      isDefined(navigatorAlias.javaEnabled) &&
      navigatorAlias.javaEnabled()
    ) {
      browserFeatures.java = "1";
    }

    // Firefox
    if (isFunction(windowAlias.GearsFactory)) {
      browserFeatures.gears = "1";
    }

    // other browser features
    browserFeatures.cookie = hasCookies();
  }

  var width = parseInt(screenAlias.width, 10);
  var height = parseInt(screenAlias.height, 10);
  browserFeatures.res = parseInt(width, 10) + "x" + parseInt(height, 10);
}
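  The MIME-type lookup at the heart of this function can be tried outside a browser by passing in stand-in objects. The sketch below is a simplified, hypothetical variant: detectFeatures, fakeNavigator, and fakeScreen are names invented for this example, and the user-agent checks, Java, Gears, and cookie detection are deliberately left out.

```javascript
// Hypothetical, simplified version of the detection logic with injected objects.
function detectFeatures(nav, scr, pluginMap) {
  var features = {};
  var mimeTypes = nav.mimeTypes || {};

  // A plugin counts as available when its MIME type has an enabled plugin.
  for (var key in pluginMap) {
    if (Object.prototype.hasOwnProperty.call(pluginMap, key)) {
      var mimeType = mimeTypes[pluginMap[key]];
      features[key] = mimeType && mimeType.enabledPlugin ? "1" : "0";
    }
  }

  // Screen resolution as "<width>x<height>".
  features.res = parseInt(scr.width, 10) + "x" + parseInt(scr.height, 10);
  return features;
}

// Stand-ins for window.navigator and window.screen.
var fakeNavigator = {
  mimeTypes: { "application/pdf": { enabledPlugin: {} } }
};
var fakeScreen = { width: 1920, height: 1080 };

var detected = detectFeatures(fakeNavigator, fakeScreen, {
  pdf: "application/pdf",
  fla: "application/x-shockwave-flash"
});
```

  Each key of the returned object then becomes one query parameter, in the same way the browserFeatures loop in getRequest() appends "&" + name + "=" + value.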

Besides the 20-plus parameters above, the full list is available under "Tracking HTTP API" on the project's official site, though only in English.

 

The code used above has been uploaded to https://github.com/pwstrick/mypiwik and can be downloaded if needed.
