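Detecting Malicious Encrypted TLS Traffic with Machine Learning: An Industry Survey (open datasets exist for both malicious certificates and malicious TLS pcap captures)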

Date: 2019-11-15



A 2018 article, Using deep neural networks to hunt malicious TLS certificates (https://techxplore.com/news/2018-10-deep-neural-networks-malicious-tls.html), describes using an LSTM to classify malicious certificates with 94% accuracy. Excerpts follow.

Moreover, encryption can give online users a false sense of security, as many web browsers display a green lock symbol when the connection to a website is encrypted, even when these websites are actually executing phishing attacks. To address these challenges, researchers are exploring new ways of detecting and responding to malicious online traffic.

"We are seeing an increase in the sophistication of phishing attacks over the last 12 months," Alejandro Correa Bahnsen, one of the researchers who carried out the study, told TechXplore. "In particular, attackers started using web certificates to make end users believe that they are entering a secure website."

As there is currently no way to detect TLS certificates in the wild, the researchers developed a new method to identify the malicious use of web certificates, using deep neural networks. Essentially, their system uses the content of TLS certificates to successfully identify legitimate certificates and malicious ones.

[Figure: Neural network architecture to classify malicious certificates. Credit: Torroledo, Camacho & Bahnsen]

"The use of web certificates by attackers is increasing the efficiency of their attacks, but at the same time, it leaves more traces of their actions," Bahnsen said. "With these additional data points, we created a deep neural network to find hidden malicious patterns in web certificates and use them to predict the legitimacy of a web site."

Bahnsen and his colleagues evaluated their new method and compared it to an existing model, namely Splunk's support vector machines (SVM) algorithm. Their deep neural network used the text information contained in the certificate more effectively than SVM, identifying malware certificates with an accuracy of 94.87 percent (7 percent more than SVM) and phishing certificates with an accuracy of 88.64 percent (5 percent more than SVM).

Paper: http://delivery.acm.org/10.1145/3280000/3270105/p64-torroledo.pdf?ip=103.218.216.118&id=3270105&acc=ACTIVE%20SERVICE&key=5A3314F2D74B117C%2E5A3314F2D74B117C%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1545643147_bc7498b8c52551013f7810ed29a5d731

So Splunk has also used an SVM to detect malicious encrypted traffic?

Related work cited in the paper:

Traditional malware detection has been done either by manual methods or by analyzing the traffic payload using expert rules [30]. Unfortunately, those traditional methods cannot work with encrypted content. Recent work has focused on detecting malicious encrypted traffic by analyzing network connections in real time. This is done by investigating the encrypted malware communication with C2 servers, identifying the destination of such communication and then a DNS sinkhole is created to redirect the malware communication away from the C2 servers [27]. This represents a reactive approach because it must allow the malware to infect, propagate and execute its harmful action before it can be stopped. Furthermore, this approach needs to decrypt the communication in order to perform analysis of the malware’s content [34]. Another approach is based on certificate and IP address pivoting to keep track of threat actor infrastructure. Classification strategy for this approach is done by the use of internet scanning and blacklisting of IP addresses and certificates, so when a new connection is coming from any blacklisted IP or uses a known malicious certificate, the connection is classified as malicious [3, 29]. As machine learning starts to become a more popular technique for encrypted traffic analysis, other work has shifted focus to connection metadata analysis. These approaches can predict when a connection is potentially harmful and keep track of threat actor infrastructure [3, 4]. Most recent work avoids the pivoting and starts with a focus only on certificates by looking at digital certificates data. For example, researchers from the security company Splunk were able to achieve a 91% accuracy by classifying certificates used in malware activities by using a support vector machines (SVM) algorithm [32].

[3] Blake Anderson and David McGrew. 2016. Identifying Encrypted Malware Traffic with Contextual Flow Data. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security (AISec ’16). ACM, New York, NY, USA, 35–46. https://doi.org/10.1145/2996758.2996768

[4] Blake Anderson and David McGrew. 2017. Machine Learning for Encrypted Malware Traffic Classification: Accounting for Noisy Labels and Non-Stationarity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’17). ACM, New York, NY, USA, 1723–1732. https://doi.org/10.1145/3097983.3098163

[32] Dave Herrald and Ryan Kovar. 2018. The "Hidden Empires" of Malware. Retrieved June 18, 2018 from https://www.sans.org/summit-archives/file/summit-archive-1517253771.pdf

The paper also covers phishing-site detection, which is worth revisiting if we take that on later.

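The paper also notes that attackers commonly use self-signed certificates, or certificates generated for free, because they are quick and cheap to produce. By using such certificates, however, attackers expose their intent, which makes them easier to detect, track, and blacklist.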

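The paper describes the telltale traits of malware certificates. Keep in mind that certificates carrying less information are more suspicious: attackers do not spend time or money buying and validating certificates, because doing so would cut into their revenue and expose their intent. The authors observe that malware and phishing certificates are almost always missing several information fields. When a certificate is self-signed or obtained for free, check its validity period. They also observe that some information fields repeat across malware certificates. One table in the paper shows that the common name (CN) alone carries a lot of interesting information: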

Table 4: Most common CN found in certificates.

Malware domain name | % | Phishing domain name | %
No CN | 30.8% | incapsula.com | 1.8%
example.com | 8.5% | localhost | 1.4%
localhost | 6.0% | No CN | 1.1%
domain.com | 4.6% | Parallels Panel | 0.7%
www.example.com | 1.1% | localhost.localdomain | 0.5%

Certificate validation status:

Validation certificates by certificate category.

Category | DV | OV | EV | No Validation
Legitimate | 32.6% | 4.0% | 7.7% | 55.7%
Phishing | 9.0% | 0.6% | 0.01% | 90.0%
Malware | 9.7% | 0.0% | 0.0% | 91.0%

Supplementary notes:

Mainstream SSL certificates today fall into three types: DV SSL, OV SSL, and EV SSL.

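DV SSL
A DV SSL certificate is a basic (Class 1) SSL certificate that validates only ownership of the site's domain name. It can be issued in about 10 minutes and provides encrypted transport, but it cannot prove the site's real identity to users.

The free certificates currently on the market are all of this type. They only encrypt the data and do not verify the identity of the individual or organization presenting the certificate.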

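OV SSL
OV SSL provides encryption, applies strict identity vetting to the applicant, and provides trusted proof of identity.

The difference from DV SSL is that OV SSL vets the individual or organization, so the other party's identity can be confirmed and security is higher.

That is why these certificates cost money.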

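EV SSL
EV is the most secure and most strictly validated type. EV SSL certificates follow a globally uniform, strict identity validation standard and are the highest-security (Class 4) SSL certificates in the industry today.

They are used by financial and securities firms, banks, third-party payment providers, online stores, and other sites that emphasize site security and a trustworthy corporate image, and that transmit payment transactions, customer private information, and account passwords.

This type has the strictest validation requirements and the highest application fees.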

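Common certificate authorities
Symantec is a leading provider of SSL/TLS certificates
China Financial Certification Authority (CFCA) issues globally trusted SSL certificates
GeoTrust is the world's second-largest digital certificate authority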

Features extracted by the paper's model:

Feature Name | Description | Category
SubjectCommonNameIp | Indicates if CN is an IP address instead of domain | Boolean
Is_extended_validated | Indicates if certificate is extended validated | Boolean
Is_organization_validated | Indicates if certificate is organization validated | Boolean
Is_domian_validated | Indicates certificate is domain validated | Boolean
SubjectHasOrganization | Indicates if subject principal has O field | Boolean
IssuerHasOrganization | Indicates if issuer principal has O field | Boolean
SubjectHasCompany | Indicates if subject principal has CO field | Boolean
IssuerHasCompany | Indicates if issuer principal has CO field | Boolean
SubjectHasState | Indicates if subject principal has ST field | Boolean
IssuerHasState | Indicates if issuer principal has ST field | Boolean
SubjectHasLocation | Indicates if subject principal has L field | Boolean
IssuerHasLocation | Indicates if issuer principal has L field | Boolean
Subject_onlyCN | Indicates if subject principal has only CN field | Boolean
Subject_is_com | Indicates if subject CN is a ".com" domain | Boolean
Issuer_is_com | Indicates if issuer CN is a ".com" domain | Boolean
HasSubjectCommonName | Indicates if CN is present in subject principal | Boolean
HasIssuerCommonName | Indicates if CN is present in issuer principal | Boolean
Subject_eq_Issuer | Boolean indicating if Subject Principal = Issuer Principal | Boolean
SubjectElements | Number of details present in subject principal | Splunk
IssuerElements | Number of details present in issuer principal | Splunk
SubjectLength | Number of characters of whole subject principal string | Splunk
IssuerLength | Number of characters of whole issuer principal string | Splunk
ExtensionNumber | Number of extensions contained in the certificate | Splunk
Selfsigned | Indicates if certificate is self signed | SOC
Is_free | Indicates if the certificate is free generated | SOC
DaysValidity | Calculated days between not before and not after days | SOC
Ranking_C | Calculated ranking of domain based on domain ranking | SOC
SubjectCommonName | Calculated character entropy in the subject CN | Text
Euclidian_Subject_Subjects | Calculated euclidean distance of subject among all subjects | Text
Euclidian_Subject_English | Calculated euclidean distance of subject characters among English characters | Text
Euclidian_Issuer_Issuers | Calculated euclidean distance of issuer among all issuers | Text
Euclidian_Issuer_English | Calculated euclidean distance of issuer characters among English characters | Text
Ks_stats_Subject_Subjects | Kolmogorov-Smirnov statistics for subject in subjects | Text
Ks_stats_Subject_English | Kolmogorov-Smirnov statistic for subject in English characters | Text
Ks_stats_Issuer_Issuers | Kolmogorov-Smirnov statistics for issuers in issuers | Text
Ks_stats_Issuer_English | Kolmogorov-Smirnov statistic for issuer in English characters | Text
Kl_dist_Subject_Subjects | Kullback-Leibler Divergence for subject in subjects | Text
Kl_dist_Subject_English | Kullback-Leibler Divergence for subject in English characters | Text
Kl_dist_Issuer_Issuers | Kullback-Leibler Divergence for issuer in Issuers | Text
Kl_dist_Issuer_English | Kullback-Leibler Divergence for issuer in English characters | Text
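To make the Boolean and count features above concrete, here is a minimal sketch of how a few of them could be derived from subject and issuer strings. It is not the paper's code: the function names, the "KEY=value, KEY=value" input format, and the naive comma split (which breaks on values that themselves contain commas) are all assumptions for illustration.

```python
import ipaddress


def parse_principal(principal: str) -> dict:
    """Parse 'CN=example.com, O=Example Inc, C=US' into a dict.

    Assumption: fields are comma-separated KEY=value pairs with no
    commas inside values. Real certificates need a proper X.509 parser.
    """
    fields = {}
    for part in principal.split(","):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip().upper()] = value.strip()
    return fields


def is_ip(text: str) -> bool:
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def certificate_features(subject: str, issuer: str) -> dict:
    """Compute a subset of the features listed in the table above."""
    subj = parse_principal(subject)
    iss = parse_principal(issuer)
    cn = subj.get("CN", "")
    return {
        "SubjectCommonNameIp": is_ip(cn),
        "HasSubjectCommonName": "CN" in subj,
        "HasIssuerCommonName": "CN" in iss,
        "SubjectHasOrganization": "O" in subj,
        "IssuerHasOrganization": "O" in iss,
        "Subject_onlyCN": set(subj) == {"CN"},
        "Subject_is_com": cn.lower().endswith(".com"),
        "Subject_eq_Issuer": subj == iss,
        "SubjectElements": len(subj),
        "IssuerElements": len(iss),
        "SubjectLength": len(subject),
        "IssuerLength": len(issuer),
    }
```

Calling certificate_features("CN=192.0.2.10", "CN=192.0.2.10") flags a certificate whose CN is an IP address, whose subject has only a CN, and whose subject equals its issuer, which are the sparse-information patterns the paper associates with malware.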

That is quite a lot of features.
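The "Text" features compare the character distribution of a subject or issuer string against a reference distribution (all subjects, all issuers, or English text). The notes do not say how the paper computes them, so the following is only a sketch under stated assumptions: a fixed alphabet, additive smoothing so the KL divergence stays finite, and a Kolmogorov-Smirnov statistic taken over the alphabet's ordering. The names ALPHABET and char_distribution are hypothetical.

```python
import math
from collections import Counter

# Assumed alphabet for hostnames and certificate names (illustrative only).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789.-*"


def char_distribution(text: str, smoothing: float = 1e-6) -> list:
    """Smoothed relative frequency of each ALPHABET character in text."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return [(counts[ch] + smoothing) / total for ch in ALPHABET]


def euclidean_distance(p: list, q: list) -> float:
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kl_divergence(p: list, q: list) -> float:
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


def ks_statistic(p: list, q: list) -> float:
    """Largest gap between the two cumulative distributions."""
    cum_p = cum_q = gap = 0.0
    for a, b in zip(p, q):
        cum_p += a
        cum_q += b
        gap = max(gap, abs(cum_p - cum_q))
    return gap
```

In practice the reference distribution would be built from a large corpus (for example, every subject string in the training set) rather than from a single string.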

Samples and experimental data:

To train our classification models, a dataset of legitimate, phishing and malware certificates is created. The phishing certificates come from Vaderetro an internal feed that gave us confirmed phishing certificates. We also extracted malware certificates from abuse.ch project and censys.io , they gave us blacklisted certificates and pem files. Finally, legitimate certificates came from Alexa top one million rank who provided us with those website certificates. Our dataset has a total of 5,000 phishing certificates, 3,000 malware certificates and 1,000,000 legitimate certificates.

This feels more complete than the Splunk SVM work described below, and it uses more samples.

https://www.sans.org/cyber-security-summit/archives/file/summit-archive-1517253771.pdf is the Splunk talk on detecting malicious SSL. In essence it still detects based on SSL certificates, as its feature list makes clear.

A site where you can check the security of an SSL certificate:

https://censys.io/certificates?q=ee5efc7223434aee0547df8914873463038cb93d

SSL dataset: https://opendata.rapid7.com/sonar.ssl/ (Internet-wide SSL data)

October 30, 2013 – Present
▶ Raw size
  • Entire data set: 315 GB compressed (as of 02JAN2017)
  • Weekly: ~1.5 - 2.0 GB compressed
▶ Entire data set indexed in Splunk: ~1.2TB
▶ Scan the entire Internet (TCP/443 only)
▶ Comprised of:
  • Observed certificates *
  • Observed IP address / certificate *
  • Names (FQDNS)
  • Endpoints

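https://sslbl.abuse.ch/blacklist/sslblacklist.csv lists the SHA-1 fingerprints of the malicious SSL certificates detected so far, which provides the labels needed for classification. The Splunk talk extracts the following features: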

Features

Number of certificate extensions

Number of Issuer elements

Number of Subject elements

Length of Extensions

Length of Issuer

Length of Subject

Shannon Entropy of Subject Common Name
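The last feature in the list, Shannon entropy of the subject common name, can be sketched in a few lines. This assumes base-2 entropy over characters and returns 0 for a missing CN; whether that matches Splunk's own implementation exactly is an assumption.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy of text, in bits per character."""
    if not text:
        return 0.0  # no CN present: treat as zero entropy
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Random-looking names such as those produced by a domain generation algorithm tend to score higher than ordinary hostnames, which is why this value is useful as a feature.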

The Splunk query used:

index=*blcertdetails
| spath
| eval sha1=coalesce(sha1, hash)
| lookup sslblacklist.csv sha1
| eval blacklist=case(isnull(reason), "False", true(), "True")
| spath input=_raw output=extlist path="extensions"
| eval extlist=replace(extlist,"[\{\}]", "")
| eval extlen=len(extlist)
| makemv delim="\", \"" extlist
| eval extcount=mvcount(extlist)
| spath input=_raw output=isslist path="issuer"
| eval isslist=replace(isslist,"[\{\}]", "")
| eval isslen=len(isslist)
| makemv delim="\", \"" isslist
| eval isscount=mvcount(isslist)
| spath input=_raw output=sublist path="subject"
| eval sublist=replace(sublist,"[\{\}]", "")
| eval sublen=len(sublist)
| makemv delim="\", \"" sublist
| eval subcount=mvcount(sublist)
| `ut_shannon(subject.CN)`
| fillnull value=0 ut_shannon
| eval subcnshannon=ut_shannon
| table sha1 blacklist reason extcount extlen isscount isslen subcount sublen subcnshannon
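For readers without Splunk, the labelling and counting steps of the query can be approximated in Python. This is a rough sketch, not a port: it assumes the blacklist CSV has "#" comment lines and the column order date, SHA-1, reason, and that each certificate is a dict with optional sha1 or hash, subject, issuer, and extensions keys. The length features are approximated as the JSON text with braces stripped.

```python
import csv
import io
import json


def load_blacklist(csv_text: str) -> dict:
    """Map lowercase SHA-1 -> listing reason, skipping '#' comment lines.

    Assumption: columns are (listing date, SHA-1, listing reason).
    """
    reasons = {}
    data_lines = (ln for ln in io.StringIO(csv_text) if not ln.startswith("#"))
    for row in csv.reader(data_lines):
        if len(row) >= 3:
            reasons[row[1].strip().lower()] = row[2].strip()
    return reasons


def _flat_len(obj: dict) -> int:
    """Length of the JSON text with outer braces removed."""
    return len(json.dumps(obj).strip("{}"))


def label_and_measure(cert: dict, blacklist: dict) -> dict:
    """Label one certificate record and compute count/length features."""
    sha1 = (cert.get("sha1") or cert.get("hash") or "").lower()
    reason = blacklist.get(sha1)
    extensions = cert.get("extensions", {})
    issuer = cert.get("issuer", {})
    subject = cert.get("subject", {})
    return {
        "sha1": sha1,
        "blacklist": reason is not None,
        "reason": reason,
        "extcount": len(extensions),
        "extlen": _flat_len(extensions),
        "isscount": len(issuer),
        "isslen": _flat_len(issuer),
        "subcount": len(subject),
        "sublen": _flat_len(subject),
    }
```

The resulting rows can be fed to any classifier, with the blacklist column serving as the label.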

Models:

Categorical Prediction Algorithm | Accuracy | FP Rate
Logistic Regression | 0.75 | 24.90%
Support Vector Machine (SVM) | 0.91 | 4.90%
Random Forest Classifier | 0.91 | 8.10%
Gaussian Naive Bayes (GaussianNB) | 0.71 | 18.40%
Decision Tree Classifier | 0.91 | 9.80%

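SVM looks like the best choice here: it ties for the highest accuracy (0.91) and has the lowest false-positive rate (4.90%).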

Now for Cisco's paper:

Machine Learning for Encrypted Malware Traffic Classification: Accounting for Noisy Labels and Non-Stationarity. Link: http://delivery.acm.org/10.1145/3100000/3098163/p1723-anderson.pdf?ip=103.218.216.118&id=3098163&acc=ACTIVE%20SERVICE&key=5A3314F2D74B117C%2E5A3314F2D74B117C%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1545706495_c9e1206db619ebde472f6a6238fd7c22

machine learning on the encrypted network session’s metadata is a natural solution. While not applied directly to detecting threats in encrypted traffic, this basic formula of machine learning and network metadata has been well-researched [6, 24, 31]. Unfortunately, these solutions have been slow to materialize as viable methods for real-world threat detection, and some critics have rightfully called into question the applicability of machine learning for this problem domain [25, 37].

In other words, machine-learning-based detection of encrypted traffic has been hard to deploy in industry. The paper gives the reasons:

Suitable false positive rates, while still maintaining high true positive rates on novel threats, has been difficult to achieve. In this paper, we highlight two primary reasons why this is the case: inaccurate ground truth and non-stationarity in network data. The most straightforward method to acquire labeled data for training is to use a sandbox environment to run malware and collect the sample’s associated packet capture files for positively-labeled, malicious data, and to monitor a network and collect all connections for negatively-labeled, benign data. For the benign case, even after filtering the dataset using an IP blacklist [13], there will typically be a non-negligible percentage of network traffic that would be considered suspicious. For the malicious case, malware samples often perform connectivity checks, or other inherently benign activities. It is nearly impossible to identify all of these cases, and this must be taken into account when using supervised learning.

The root cause is an asymmetry: you can never know every attack scenario. Even if you run malicious samples in a sandbox, collect benign samples as well, and then train a model, a large amount of normal traffic will still be classified as malicious.

Malware typically performs connectivity checks by visiting a standard website, e.g., https://www.google.com.

So the paper notes that malware may visit well-known websites.

Features: But, if additional features about the connection are included, such as the TLS handshake metadata, it becomes possible to distinguish these two cases because the TLS features provide information about the originating client.

TLS handshake information turns out to be quite useful.

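Data collection: samples were gathered over one year. Malicious TLS flows came from a sandbox, and the enterprise data came from networks in different geographic locations. Our analysis is based on millions of TLS encrypted sessions collected over 12 months from a commercial malware sandbox and two geographically distinct, large enterprise networks.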

Detecting malware even when it is encrypted. Download: https://2018.bsidesbud.com/wp-content/uploads/2018/03/seba_garcia_frantisek_strasak.pdf The authors use open datasets to identify malicious SSL traffic, with XGBoost, Random Forest, SVM, and MLP models.

Malware pcap datasets:

Dataset
● CTU-13 dataset - public
  ○ Malware and Normal captures
  ○ 13 Scenarios. 600GB pcap
  ○ https://www.stratosphereips.org/datasets-ctu13/
● MCFP dataset - public
  ○ Malware Capture Facility Project. (Maria Jose Erquiaga)
  ○ 340 malware pcap captures
  ○ https://stratosphereips.org/category/dataset.html
● Own normal dataset - public
  ○ 3 days of accessing to secure sites (Alexa 1000)
  ○ Google, Facebook, Twitter accounts
  ○ https://stratosphereips.org/category/dataset.html
● Normal CTU dataset - almost public
  ○ Normal captures
  ○ 22 known and trusted people from department of FEE CTU

https://www.stratosphereips.org/datasets-malware/ It turns out the captures really can be downloaded.

Features:

Top 7 most discriminant features
1. Certificate length of validity
2. Inbound and outbound packets
3. Validity of certificate during the capture
4. Duration
5. Number of domains in certificate (SAN DNS)
6. SSL/TLS version
7. Periodicity
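Three of these features (1, 3, and 5) depend only on the certificate and the capture time, so they are easy to sketch. The function name and argument layout below are assumptions; the slides do not show how the authors computed them.

```python
from datetime import datetime, timezone


def certificate_capture_features(
    not_before: datetime,
    not_after: datetime,
    capture_time: datetime,
    san_dns_names: list,
) -> dict:
    """Features 1, 3 and 5 from the list above for a single certificate."""
    return {
        # 1. Certificate length of validity, in whole days
        "validity_days": (not_after - not_before).days,
        # 3. Was the certificate inside its validity window at capture time?
        "valid_during_capture": not_before <= capture_time <= not_after,
        # 5. Number of domains in the certificate (SAN DNS entries)
        "san_dns_count": len(san_dns_names),
    }


# Hypothetical example values, not taken from any dataset.
example = certificate_capture_features(
    not_before=datetime(2018, 1, 1, tzinfo=timezone.utc),
    not_after=datetime(2018, 4, 1, tzinfo=timezone.utc),
    capture_time=datetime(2018, 2, 1, tzinfo=timezone.utc),
    san_dns_names=["example.com", "www.example.com"],
)
```

All three datetimes should share the same timezone awareness; mixing naive and aware values raises a TypeError on comparison.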

Results:

● XGBoost
  ○ Cross validation accuracy: 92.45%
  ○ Testing accuracy: 94.33%
  ○ False Positive Rate: 5.54%
  ○ False negative rate: 10.11%
  ○ Sensitivity: 89.89%
  ○ F1 Score: 46.96%

● Random Forest
  ○ Cross validation accuracy: 91.21%
  ○ Testing accuracy: 95.65%
  ○ False Positive Rate: 4.05%
  ○ False negative rate: 14.82%
  ○ Sensitivity: 85.18%
  ○ F1 Score: 52.24%
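The F1 scores are far lower than the accuracies, which is what you would expect when malicious flows are a small minority of the test set. The sketch below computes the same metrics from a confusion matrix. The counts in the usage note are hypothetical and chosen only to show the effect; they are not the authors' data.

```python
def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, FPR, FNR, sensitivity, precision and F1 from raw counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }
```

With hypothetical counts of tp=90, fp=500, tn=9500, fn=10 (100 malicious flows against 10,000 benign ones), accuracy is about 0.95 and sensitivity is 0.90, yet F1 is only about 0.26, because a 5% false-positive rate on the large benign class swamps the true positives.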

Key findings:

Malware and Certificates
● Certificates used by Malware in Alexa 1000 ~ 50%
● Certificates used by Normal in Alexa 1000 ~ 30%

The certificates used by Malware are mostly from normal sites! Surprisingly, many of the certificates seen in malware traffic belong to sites in the Alexa ranking.

Detecting Malignant TLS Servers Using Machine Learning Techniques https://arxiv.org/ftp/arxiv/papers/1705/1705.09044.pdf

Abstract (translated):

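TLS uses X.509 certificates for server authentication. An X.509 certificate is a complex document, and various innocent errors can occur when creating or using it. In addition, many certificates belong to malicious websites; clients should reject them and should not visit those web servers. Usually, when a client finds a certificate suspicious using traditional tests, it asks for human intervention. But most people cannot tell a malicious website from a non-malicious one by looking at a certificate. Therefore, once traditional certificate validation fails, we use machine learning techniques to let the web browser decide whether the server the certificate belongs to is malignant, i.e., whether the website should be visited or not. Even when a certificate is accepted in that stage, we find that the website may still be malicious. So, in a second stage, we download part of the website in a sandbox without decrypting it and observe the TLS-encrypted traffic (encrypted malicious data captured in the sandbox does not harm the system). Because the traffic is encrypted after the handshake completes, traditional pattern-matching techniques cannot be applied. We therefore use flow features of the traffic along with the features used in the first stage. We combine these features with the unencrypted TLS header information obtained during the TLS handshake and use them in a machine learning classifier to identify whether the traffic is malicious.

In short: first use a decision tree to decide whether the certificate is suspicious, then use a Bayesian network to decide whether the TLS traffic is malicious.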
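The two-phase flow described in the abstract can be sketched as a small piece of control logic. The classifiers are passed in as plain callables; the paper uses a decision tree and a Bayesian network, but the stand-in models and feature names here are hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable, Mapping

Features = Mapping[str, float]
Model = Callable[[Features], bool]  # returns True when judged malicious


def two_phase_verdict(
    validation_ok: bool,
    cert_features: Features,
    flow_features: Features,
    cert_model: Model,
    flow_model: Model,
) -> str:
    """Phase 1 judges the certificate, phase 2 judges the encrypted flow."""
    # Phase 1 only runs once traditional certificate validation has failed.
    if not validation_ok and cert_model(cert_features):
        return "reject: certificate classified as malicious"
    # Phase 2 reuses the phase 1 features alongside flow and handshake data.
    combined = {**cert_features, **flow_features}
    if flow_model(combined):
        return "reject: traffic classified as malicious"
    return "accept"


# Hypothetical stand-in models for illustration only.
def toy_cert_model(f: Features) -> bool:
    return f.get("self_signed", 0) == 1


def toy_flow_model(f: Features) -> bool:
    return f.get("weak_ciphersuite", 0) == 1
```

Keeping the models as injected callables makes it straightforward to swap in trained classifiers later without touching the decision flow.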

The paper mentions:

According to [9], a subordinate CA of ANSSI issued an intermediate certificate that they installed on a network monitoring device, which enabled the device to act as a MITM of domains or websites that the certificate holder did not own or control. In early 2011, a hacker hacked the DigiNotar CA and issued certificates for *.google.com, *.skype.com and *.*.com, as well as few intermediate CA certificates carrying the names of well-known roots. The *.google.com certificate was used to launch a MITM attack against Gmail users in Iran. That is, the attackers were able to create both CA and leaf certificates through an existing CA. [16] describes this attack.

So a hacker compromised the DigiNotar CA and used the fraudulently issued certificates to mount attacks. Is this the same kind of incident as the one described just before it?

Features the paper extracts for malicious-traffic identification (using a Bayesian network):

1. Features of Classifier of Phase 1: The above features used in Phase 1 are also used in Phase 2. They are the reasons for the certificate failing the traditional certificate validation and whether the server certificate is self-signed.

2. Flow Metadata: Traditional flow data are the first set of additional features for the classifier. They are the number of inbound bytes, outbound bytes, inbound packets, outbound packets; the source and destination ports; and the total duration of the flow in seconds.

3. Packet Lengths and Packet Inter-Arrival Times: Minimum, Maximum, Mean and Standard Deviation of Packet Lengths and Minimum, Maximum, Mean and Standard Deviation of Packet Inter-Arrival Times during the duration of flow are taken as the second set of additional features for the classifier.

4. Unencrypted TLS Header Information exchanged during TLS Handshake:

4a. Critical extensions: Malicious servers rarely select TLS extensions. Legitimate servers select different TLS extensions. 0xff01 (renegotiation_info) and 0x000b (ec_point_formats) are most common. Usually, 21 unique extensions are observed, most of them in legitimate traffic. A binary vector of length 21 was created with a true (1) if extension is present and a false (0) if it is absent.

4b. Weak ciphersuite: Approx 90% of the malicious servers use one of the following ciphersuite: 0x000a (TLS_RSA_WITH_3DES_EDE_CBC_SHA), 0x0004 (TLS_RSA_WITH_RC4_128_MD5), 0x006b (TLS_DHE_RSA_WITH_AES_256_CBC_SHA256) and 0x0005 (TLS_RSA_WITH_RC4_128_SHA). TLS_RSA_WITH_RC4_128_MD5 and TLS_RSA_WITH_RC4_128_SHA are considered weak. A numeric value is assigned to ciphersuite to identify which ciphersuite server will be used. It helps identify malicious traffic.
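Feature sets 3 and 4 above translate directly into code. The sketch below computes the packet-length and inter-arrival statistics, the binary extension vector, and a weak-ciphersuite flag. The paper's full list of 21 extensions is not reproduced in these notes, so EXTENSION_IDS is an assumed four-entry subset for illustration; the ciphersuite codes are the ones quoted in 4b.

```python
import statistics

# Assumed subset of extension IDs; the paper uses 21 that are not listed here.
EXTENSION_IDS = [0x0000, 0x000B, 0x0023, 0xFF01]

# Ciphersuites the paper reports for roughly 90% of malicious servers.
COMMON_MALICIOUS_CIPHERSUITES = {
    0x000A: "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
    0x0004: "TLS_RSA_WITH_RC4_128_MD5",
    0x006B: "TLS_DHE_RSA_WITH_AES_256_CBC_SHA256",
    0x0005: "TLS_RSA_WITH_RC4_128_SHA",
}
WEAK_CIPHERSUITES = {0x0004, 0x0005}  # the two RC4 suites called weak in 4b


def summary_stats(values: list) -> dict:
    """Minimum, maximum, mean and (population) standard deviation."""
    if not values:
        return {"min": 0.0, "max": 0.0, "mean": 0.0, "std": 0.0}
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
    }


def inter_arrival_times(timestamps: list) -> list:
    """Gaps between consecutive packet timestamps, in seconds."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def extension_vector(selected: set) -> list:
    """1 if the server selected the extension, else 0, per EXTENSION_IDS."""
    return [1 if ext in selected else 0 for ext in EXTENSION_IDS]


def is_weak_ciphersuite(code: int) -> bool:
    """True for the ciphersuites the paper labels as weak."""
    return code in WEAK_CIPHERSUITES
```

Whether the paper uses the population or the sample standard deviation is not stated; pstdev is used here so that a single-packet flow yields 0 instead of raising an error.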

Potentially useful references:

[3] Sheffer, Y., Holz, R., Saint-Andre, P.: Summarizing Known Attacks on Transport Layer Security (TLS) and Datagram TLS (DTLS) (2015), RFC 7457

[10] Anderson, B., Paul, S., McGrew, D.: Deciphering Malware's use of TLS (without Decryption). In: arXiv:1607.01639v1 (2016)

Deciphering Malware's use of TLS (without Decryption) is also from Cisco. It says malware typically uses older and weaker cryptographic algorithms:

There is an FAQ section in the opensourced Zeus/Zbot malware [3] where the following question and answer occur (content left as is):

Question: Why traffic is encrypted with symmetric encryption method (RC4), but not asymmetric (RSA)?

Answer: Because, in the use of sophisticated algorithms it makes no sense, encryption only needs to hide traffic.

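In the current privacy climate, this attitude most certainly does not hold for enterprise network traffic [4], [26].

The paper outlines, from both the TLS client and the TLS server perspective, how malware's use of TLS differs from that of enterprise networks.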

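When restricted to a single encrypted flow, the authors achieve 90.3% accuracy on the family attribution problem; when they use all encrypted flows within a 5-minute window, accuracy rises to 93.2%. They used a commercial sandbox environment to collect the first five minutes of each malware sample's network activity, gathering tens of thousands of unique malware samples and hundreds of thousands of malicious encrypted flows. Table I of the paper lists the five TLS ports most used by malware samples collected between August 2015 and May 2016. To determine whether a flow is TLS, they used deep packet inspection and a custom signature based on the TLS version and the message types of the clientHello and serverHello messages. In total they found 229,364 TLS flows across 203 unique ports, with port 443 by far the most common port for malicious TLS. Although malware uses a wide variety of ports, these other ports are relatively uncommon. Because the non-malware data was collected on enterprise networks, the classification results in the paper apply best to enterprise environments; the authors do not claim that the results hold for general classes of networks such as service-provider data. The enterprise network data was first filtered with a well-known IP blacklist [10], which removed ~0.05% of the initial flows.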
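A signature of the kind described, based on the TLS version and the hello message type, can be sketched by inspecting the first bytes of a TCP payload. This is an illustration and not the authors' signature: it assumes the payload begins at a TLS record boundary and relies only on the standard record layout (1-byte content type, 2-byte version, 2-byte length, then the 1-byte handshake message type).

```python
from typing import Optional

HANDSHAKE_RECORD = 0x16  # TLS record content type for handshake messages
HELLO_TYPES = {0x01: "clientHello", 0x02: "serverHello"}


def tls_hello_type(payload: bytes) -> Optional[str]:
    """Return 'clientHello' or 'serverHello' if payload looks like one."""
    if len(payload) < 6:
        return None
    content_type, major, minor = payload[0], payload[1], payload[2]
    # Record versions 0x0300 (SSL 3.0) through 0x0304 are accepted.
    if content_type != HANDSHAKE_RECORD or major != 0x03 or minor > 0x04:
        return None
    # Byte 5 is the handshake message type that follows the record header.
    return HELLO_TYPES.get(payload[5])
```

Matching on content rather than on port 443 is what allows TLS to be found on the other 202 ports the paper reports.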

As the following summary shows, only a little over 20,000 malicious flows were actually collected successfully.

Summary of the malicious families used in our analysis. We collected 18 malicious families, 5,623 malicious samples, and 25,793 encrypted flows that successfully negotiated the TLS handshake and sent application data.

We also analyzed information from the servers’ certificates. As anticipated, we found that enterprise endpoints most frequently connected to servers with the following certificate subjects:
• *.google.com
• api.twitter.com
• *.icloud.com
• *.g.doubleclick.net
• *.facebook.com

This distribution of certificate subjects was very long tailed. The certificate subjects of servers that the malware samples communicated with also had a long tail. These certificates were mostly composed of subjects that had characteristics of a domain generation algorithm (DGA) [6], e.g., www.33mhwt2j.net. Although malware mostly communicated with servers that had suspicious certificate subjects, it is also clear that malware communicates with many inherently benign servers, e.g., google.com for connectivity checks or twitter.com for command and control. The following certificate subjects were the most frequent for TLS flows initiated by malware:
• block.io
• *.wpengine.com
• *.criteo.com

The subject names seen in malware TLS flows often belong to, or imitate, well-known sites:

Malware Family | Number of Flows | Unique Server IPs | Number of SS Certs | Selected Ciphersuite | Certificate Subject
Bergat | 332 | 12 | 0 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | www.dropbox.com
Deshacop | 129 | 38 | 0 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | *.onion.to
Dridex | 103 | 10 | 89 | TLS_RSA_WITH_AES_128_CBC_SHA | amthonoup.cy
Dynamer | 372 | 155 | 3 | TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 | www.dropbox.com
Kazy | 1152 | 225 | 52 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | *.onestore.ms
Parite | 275 | 128 | 0 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | *.google.com (my note: posing as Google)
Razy | 564 | 118 | 16 | TLS_RSA_WITH_RC4_128_SHA | baidu.com
Sality | 1,200 | 323 | 4 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | vastusdomains.com
Skeeyah | 218 | 90 | 0 | TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 | www.dropbox.com
Symmi | 2,618 | 700 | 22 | TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA | *.criteo.com
Tescrypt | 205 | 26 | 0 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | *.onion.to
Toga | 404 | 138 | 8 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | www.dropbox.com
Upatre | 891 | 37 | 155 | TLS_RSA_WITH_RC4_128_MD5 | *.b7websites.net
Virlock | 12,847 | 1 | 0 | TLS_DHE_RSA_WITH_AES_256_CBC_SHA256 | block.io
Virtob | 511 | 120 | 0 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | *.g.doubleclick.net
Yakes | 337 | 51 | 0 | TLS_RSA_WITH_RC4_128_SHA | baidu.com
Zbot | 2,902 | 269 | 507 | TLS_RSA_WITH_RC4_128_MD5 | tridayacipta.com
Zusy | 733 | 145 | 14 | TLS_RSA_WITH_3DES_EDE_CBC_SHA | *.criteo.com

TABLE IV: TLS server configurations for the servers most visited by the 18 malicious families. The certificate subject typically has a long tail, but only the most frequent is reported. The reported number of self-signed certificates is not necessarily related to the most popular certificate subject.
