The Latest Machine Learning Projects of 2019

Date: 2019-11-07


Translator: 瘋狂的技術宅
Original: https://www.edureka.co/blog/m...

For more articles, follow the WeChat official account: 硬核智能

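Machine learning is clearly a field that has advanced at a remarkable pace in recent years. This trend has created many jobs in the industry. Demand for machine learning engineers is high, a surge driven by evolving technology and the enormous volumes of data being generated, also known as big data. In this article I will discuss machine learning projects you should definitely know and work with, in the following order:

What is machine learning?
Steps in machine learning
Types of machine learning
Industry use cases
Open source machine learning projects for 2019

What is machine learning?

Machine learning is a concept that allows machines to learn from examples and experience without being explicitly programmed. Instead of writing the code yourself, you feed data to a generic algorithm, and the algorithm or machine builds the logic based on the data it is given.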

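Steps in machine learning

Any machine learning algorithm follows a common pattern or set of steps:

Collect data: This stage involves gathering all relevant data from various sources.

Process data: The "raw data" is cleaned and converted into a format that is convenient to work with.

Analyse data: The data is selected and filtered through analysis to prepare what the model needs.

Train the algorithm: The algorithm is trained on the training data set, through which it learns the patterns and rules in the data.

Test the model: The test data set is used to measure the accuracy of the resulting model.

Deploy: If the model's speed and accuracy are acceptable, it should be deployed in the real system. After deployment, if its performance degrades, the model is retrained, then updated and improved.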
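The steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the data is synthetic (generated with scikit-learn's make_classification), logistic regression stands in for "the algorithm", and ACCEPTABLE_ACCURACY is a hypothetical deployment threshold that does not come from the original article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1. Collect data: synthetic samples stand in for a real source (assumption).
X, y = make_classification(n_samples=500, n_features=8, n_informative=4,
                           random_state=0)

# 2-3. Process and analyse: hold out a test set, then scale the features
#      using statistics computed from the training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# 4. Train the algorithm on the training set.
model = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)

# 5. Test the model on data it has not seen.
accuracy = accuracy_score(y_test, model.predict(X_test_s))
print(f"Test accuracy: {accuracy:.3f}")

# 6. Deploy only if the result is acceptable (hypothetical threshold).
ACCEPTABLE_ACCURACY = 0.8
ready_to_deploy = accuracy >= ACCEPTABLE_ACCURACY
print("Ready to deploy:", ready_to_deploy)
```

In a real project the last step would be followed by monitoring, with the same script rerun on fresh data whenever performance drops.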

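Types of machine learning

Machine learning falls into three categories:

Supervised learning: An algorithm learns the mapping function from the input variable (x) to the output variable (Y).

Unsupervised learning: Sometimes the given data is unstructured and unlabelled, which makes it hard to sort into distinct categories. Unsupervised learning helps here by clustering the input data based on its statistical properties.

Reinforcement learning: Taking suitable actions to maximise the reward in a particular situation.

In reinforcement learning there is no expected output. The reinforcement agent decides which action to take to perform the given task. In the absence of a training data set, it learns from its own experience.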
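The difference between the first two categories can be seen in a small sketch. The data here is synthetic and the model choices (LinearRegression for the supervised case, KMeans for the unsupervised case) are assumptions made for illustration, not something the original article prescribes. Reinforcement learning is illustrated by the BluEx use case later in the article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Supervised: every input x comes with a known output Y, and the model
# learns the mapping. The true relation here is Y = 3x + 2 plus noise.
x = rng.uniform(0, 10, size=(200, 1))
Y = 3.0 * x[:, 0] + 2.0 + rng.normal(0, 0.1, size=200)
reg = LinearRegression().fit(x, Y)
slope, intercept = reg.coef_[0], reg.intercept_
print(f"Learned mapping: Y = {slope:.2f} * x + {intercept:.2f}")

# Unsupervised: the points carry no labels; the algorithm groups them
# purely from their statistical structure (two well-separated blobs).
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(100, 2))
blob_b = rng.normal(loc=[10, 10], scale=0.5, size=(100, 2))
points = np.vstack([blob_a, blob_b])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = km.labels_
print("Cluster sizes:", np.bincount(labels))
```

Note that KMeans is told only how many groups to look for, never which point belongs where, while the regression sees the correct answer for every training sample.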

Next, let's look at some real machine learning projects that can help companies generate profit.

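Industry use cases

1. MOTION STUDIO

Domain: Media

Focus: Optimising the selection process

Business challenge: Motion Studio is the largest radio production house in Europe. With revenue of more than a billion US dollars a year, the company decided to launch a new reality show: RJ Star. The audience response was unprecedented, and the company was flooded with voice clips. As an ML expert, you have to classify each voice as male or female so that the first round of selection can move faster.

Key issue: The pitch of the voice samples.

Business benefit: Since RJ Star is a reality show, the time available to select candidates is very short. The success and profitability of the whole show depend on quick and smooth execution.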

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter magic; remove this line when running as a plain Python script
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

# Load the voice data set and preview the first rows
df = pd.read_csv('voice-classification.csv')
df.head()

# Check the number of records, summary statistics and missing values
df.info()
df.describe()
df.isnull().sum()

# Class balance
print("Shape of data:", df.shape)
print("Total number of labels: {}".format(df.shape[0]))
print("Number of male: {}".format(df[df.label == 'male'].shape[0]))
print("Number of female: {}".format(df[df.label == 'female'].shape[0]))

# Features are all columns except the last; the last column is the label
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
print(df.shape)
print(X.shape)

# Encode the string labels ('female' / 'male') as integers
from sklearn.preprocessing import LabelEncoder
gender_encoder = LabelEncoder()
y = gender_encoder.fit_transform(y)

# Standardise the features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

# Hold out 30% of the samples for testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=100)

# Train a support vector classifier with default settings and evaluate it
from sklearn.svm import SVC
from sklearn import metrics
from sklearn.metrics import classification_report, confusion_matrix

svc_model = SVC()
svc_model.fit(X_train, y_train)
y_pred = svc_model.predict(X_test)

print('Accuracy Score:')
print(metrics.accuracy_score(y_test, y_pred))

print(confusion_matrix(y_test, y_pred))
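One detail worth noting in the walkthrough above: the scaler is fitted on the full data set before the train/test split, so statistics from the test rows influence training. A minimal sketch of one way to avoid this, assuming scikit-learn's make_pipeline and cross_val_score, and using synthetic data from make_classification in place of voice-classification.csv (the sample and feature counts are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-class data standing in for the voice features (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           random_state=100)

# Split first, so the test rows stay unseen by every preprocessing step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=100, stratify=y)

# The pipeline refits the scaler on whatever data it is trained on,
# including each training fold inside cross-validation.
clf = make_pipeline(StandardScaler(), SVC())

cv_scores = cross_val_score(clf, X_train, y_train, cv=5)
print("Cross-validation accuracy per fold:", cv_scores)

clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print("Held-out test accuracy:", test_accuracy)
```

Bundling the scaler and the classifier into one estimator also means a single object can be saved and reused at prediction time.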

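2. LITHIONPOWER

Domain: Automotive

Focus: Incentivising drivers

Business challenge: Lithionpower is the largest provider of electric vehicle (e-vehicle) batteries. Drivers typically rent a battery for a day's travel, swapping their used battery for a fully charged one from the company. Lithionpower uses a variable pricing model based on each driver's driving history. Because battery life depends on factors such as speeding and the distance driven per day, you as an ML expert have to build a clustering model that groups drivers according to their driving data.

Key issue: Drivers will be incentivised based on their cluster, so the grouping has to be accurate.

Business benefit: Profits increase by up to 15-20%, because drivers with a poor history are charged more.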

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # for plot styling
# Jupyter magic; remove this line when running as a plain Python script
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
plt.rcParams['figure.figsize'] = (12, 6)

# Load the driver data and inspect it
df = pd.read_csv('driver-data.csv')
df.head()

df.info()

df.describe()

from sklearn.cluster import KMeans

# Start with 2 clusters; the 'id' column is an identifier, not a feature
kmeans = KMeans(n_clusters=2)
df_analyze = df.drop('id', axis=1)

kmeans.fit(df_analyze)

kmeans.cluster_centers_

print(kmeans.labels_)
print(len(kmeans.labels_))

# Count how many drivers fall into each cluster
print(type(kmeans.labels_))
unique, counts = np.unique(kmeans.labels_, return_counts=True)
print(dict(zip(unique, counts)))

# Plot the clusters. Recent seaborn versions require x and y as keywords
# and use `height` in place of the old `size` argument.
df_analyze['cluster'] = kmeans.labels_
sns.set_style('whitegrid')
sns.lmplot(x='mean_dist_day', y='mean_over_speed_perc', data=df_analyze, hue='cluster',
           palette='coolwarm', height=6, aspect=1, fit_reg=False)

# Now let's check the clusters when n=4
kmeans_4 = KMeans(n_clusters=4)
kmeans_4.fit(df.drop('id', axis=1))
print(kmeans_4.cluster_centers_)
unique, counts = np.unique(kmeans_4.labels_, return_counts=True)
print(dict(zip(unique, counts)))

df_analyze['cluster'] = kmeans_4.labels_
sns.set_style('whitegrid')
sns.lmplot(x='mean_dist_day', y='mean_over_speed_perc', data=df_analyze, hue='cluster',
           palette='coolwarm', height=6, aspect=1, fit_reg=False)
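The code above tries 2 and then 4 clusters by hand. One common way to compare candidate values of k is the silhouette score. The sketch below is an assumption-laden illustration rather than part of the original analysis: the data is synthetic (four made-up driver groups with two columns loosely mirroring mean_dist_day and mean_over_speed_perc), and the features are standardised first so that distance and percentage contribute on a similar scale.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical stand-in for driver-data.csv: each row is
# (mean distance per day, percentage of time over the speed limit).
centers = np.array([[50, 5], [50, 40], [180, 10], [180, 60]])
data = np.vstack([rng.normal(c, [5, 2], size=(150, 2)) for c in centers])

# Put both features on a comparable scale before clustering.
X = StandardScaler().fit_transform(data)

# Fit KMeans for several values of k and record the silhouette score
# (closer to 1 means tighter, better separated clusters).
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print("Best k by silhouette score:", best_k)
```

On real driver data the groups are unlikely to be this cleanly separated, so the score is best treated as one input alongside the plots and business judgement.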

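3. BluEx

Domain: Logistics

Focus: Optimal path

Business challenge: BluEx is a leading logistics company in India. The challenge it faces, however, is that its van drivers take suboptimal delivery routes, which causes delays and higher fuel costs. As an ML expert, you have to build an ML model using reinforcement learning so that the program finds the optimal path.

Key issue: The data has many attributes, and classification can be tricky.

Business benefit: Taking the optimal path can save up to 15% of fuel costs.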

import numpy as np
import matplotlib.pyplot as plt
import networkx as nx

# Edges of the route graph: each tuple links two locations
points_list = [(0,1), (1,5), (5,6), (5,4), (1,2), (2,3), (2,7)]

goal = 7
mapping = {0:'Start', 1:'1', 2:'2', 3:'3', 4:'4', 5:'5', 6:'6', 7:'7-Destination'}
G = nx.Graph()
G.add_edges_from(points_list)
pos = nx.spring_layout(G, k=.5, center=points_list[2])
nx.draw_networkx_nodes(G, pos, node_color='g')
nx.draw_networkx_edges(G, pos, edge_color='b')
nx.draw_networkx_labels(G, pos)
plt.show()

NO_OF_POINTS = 8

# Initialise the reward matrix R: -1 marks pairs of locations with no direct link
R = np.matrix(np.ones(shape=(NO_OF_POINTS, NO_OF_POINTS)))
R *= -1

# Existing links get reward 0; links that lead into the goal get reward 150
for point in points_list:
    print(point)
    if point[1] == goal:
        R[point] = 150
    else:
        R[point] = 0

    if point[0] == goal:
        R[point[::-1]] = 150
    else:
        # reverse of point
        R[point[::-1]] = 0

# Staying at the goal is rewarded as well
R[goal, goal] = 150
R

Q = np.matrix(np.zeros([NO_OF_POINTS, NO_OF_POINTS]))

# The discount factor
gamma = 0.8

initial_state = 1

def available_actions(state):
    # Valid actions are the columns of R with a non-negative reward in this row
    current_state_row = R[state,]
    av_act = np.where(current_state_row >= 0)[1]
    return av_act

available_act = available_actions(initial_state)

def sample_next_action(available_actions_range):
    # Pick one of the available actions uniformly at random
    next_action = int(np.random.choice(available_actions_range))
    return next_action

action = sample_next_action(available_act)

def update(current_state, action, gamma):
    # Best known value reachable from the next state; ties are broken at random
    max_index = np.where(Q[action,] == np.max(Q[action,]))[1]

    if max_index.shape[0] > 1:
        max_index = int(np.random.choice(max_index))
    else:
        max_index = int(max_index[0])
    max_value = Q[action, max_index]

    Q[current_state, action] = R[current_state, action] + gamma * max_value
    print('max_value', R[current_state, action] + gamma * max_value)

    # Return a normalised score that is used to monitor training progress
    if np.max(Q) > 0:
        return np.sum(Q / np.max(Q) * 100)
    else:
        return 0

update(initial_state, action, gamma)

# Training: update Q from 700 randomly chosen states
scores = []
for i in range(700):
    current_state = np.random.randint(0, int(Q.shape[0]))
    available_act = available_actions(current_state)
    action = sample_next_action(available_act)
    score = update(current_state, action, gamma)
    scores.append(score)
    print('Score:', str(score))

print("Trained Q matrix:")
print(Q / np.max(Q) * 100)

# Testing: follow the highest Q-value from the start until the goal is reached
current_state = 0
steps = [current_state]

while current_state != goal:
    next_step_index = np.where(Q[current_state,] == np.max(Q[current_state,]))[1]
    if next_step_index.shape[0] > 1:
        next_step_index = int(np.random.choice(next_step_index))
    else:
        next_step_index = int(next_step_index[0])
    steps.append(next_step_index)
    current_state = next_step_index

print("Most efficient path:")
print(steps)

plt.plot(scores)
plt.show()
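For reference, the assignment inside update() can be written as a formula. Here s is current_state, a is action (which in this graph is also the location the van moves to), R is the reward matrix and gamma is 0.8. The first line is what the code computes; the second line is the general textbook Q-learning update, which reduces to the first when the learning rate alpha is 1 and the next state s' equals a. The symbol alpha does not appear in the code and is introduced here only for comparison.

```latex
Q(s, a) \leftarrow R(s, a) + \gamma \max_{a'} Q(a, a')

Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```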

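Open source machine learning projects for 2019

Detectron: Detectron is Facebook AI Research's software system that implements state-of-the-art object detection algorithms. It is written in Python and powered by the Caffe2 deep learning framework.

The goal of Detectron is to provide a high-quality, high-performance codebase for object detection research. It is designed to be flexible in order to support rapid implementation and evaluation of novel research. It includes more than 50 pre-trained models.

Project link: https://github.com/facebookre...

DensePose: It maps all the people in an RGB image onto the surface of a 3D human body model. DensePose-RCNN is implemented in the Detectron framework.

Project link: https://github.com/facebookre...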

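TensorFlow.js: A library for developing and training ML models and deploying them in the browser. Since its release it has become very popular. With it you can:

Do machine learning in the browser: Use the flexible and intuitive APIs to build models from scratch with the low-level JavaScript linear algebra library or the high-level API.
Run existing models: Use the TensorFlow.js model converter to run existing TensorFlow models directly in the browser.
Retrain existing models: Retrain existing ML models using sensor data connected to the browser or other client-side data.

Project link: https://github.com/tensorflow...

WaveGlow:
Machine learning has also made major advances in audio processing, beyond generating music or performing classification. WaveGlow is NVIDIA's flow-based generative network for speech synthesis. The researchers have also listed the steps you can follow if you want to train your own model from scratch.

Project link: https://github.com/NVIDIA/wav...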

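Image Outpainting: If you only have half of a picture of a scene but want the full view, Image Outpainting can do that for you. The project is a Keras implementation of Stanford's Image Outpainting paper.

This is an example every machine learning enthusiast should try, and it is explained in detail step by step. Personally, this is my favourite machine learning project.

Project link: https://github.com/bendangnuk...

Deep Painterly Harmonization: When it comes to images, this one is a masterpiece. The algorithm takes an image as input; if you then add an external element to the image, it blends that element into its surroundings as if it had always been part of the picture.

Can you tell the difference? Hard, isn't it? This shows how far machine learning has come.

Project link: https://github.com/luanfujun/...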

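DeepMimic: Take a close look at the images here and you will see a stick figure doing spin kicks, backflips and cartwheels. That, my friend, is reinforcement learning in action. DeepMimic is an example-guided deep reinforcement learning approach to physics-based character skills.

Project link: https://github.com/xbpeng/Dee...

Magenta: Magenta is a research project exploring the role of machine learning in the process of creating art and music. This primarily involves developing new deep learning and reinforcement learning algorithms for generating songs, images, drawings and more.

It is also an exploration in building smart tools and interfaces that allow artists and musicians to extend (not replace!) their creative process using these models. Spread your wings, create your own unique content for Instagram or Soundcloud, and become an influencer.

Project link: https://github.com/tensorflow...

That brings us to the end of this article on machine learning projects. Try running these examples and let us know in the comments section below. I hope you now have a sense of how machine learning is applied in practice across different industries.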
