ML.NET v0.5 release notes

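In this 0.5 release, we are adding TensorFlow model scoring as a transform in ML.NET. This makes it possible to use an existing TensorFlow model within an ML.NET experiment. The issues and feedback raised by the community can be found here.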

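As part of the road ahead for ML.NET, we are working on a new ML.NET API that improves flexibility and ease of use. When the new API is ready and good enough, we plan to deprecate the current LearningPipeline API. Because this will be a significant change, we share our proposals for the API options, along with comparisons, at the end of this post.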

This blog post provides details about the following topics in ML.NET:

The TensorFlow model scoring transform (TensorFlowTransform) added in ML.NET v0.5
The proposal for the new ML.NET API

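TensorFlow model scoring transform (TensorFlowTransform)

TensorFlow is a popular deep learning and machine learning toolkit that can train deep neural networks (and perform general numeric computation).

Deep learning is a subset of AI and machine learning that teaches programs to do what comes naturally to humans: learn by example.
Its main difference from traditional machine learning is that a deep learning model can learn to perform object detection and classification tasks directly from images, sound, or text, and can even handle tasks such as speech recognition and language translation, whereas traditional ML approaches rely heavily on feature engineering and data processing.
Deep learning models need to be trained with large sets of labeled data and neural networks that contain multiple layers. Their current popularity has several causes. First, deep learning simply performs better on some tasks, such as computer vision. Second, it can take advantage of the huge amounts of data that are now becoming available (and it needs that volume in order to perform well).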

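With ML.NET 0.5, we are starting to add support for deep learning in ML.NET. Today we are introducing the first level of integration with TensorFlow through the new TensorFlowTransform, which lets you take an existing TensorFlow model, either trained by you or downloaded from somewhere else, and get the scores from that TensorFlow model in ML.NET.

This new TensorFlow scoring capability does not require a working knowledge of TensorFlow internals. Longer term, we will work on making the deep learning experience with ML.NET even easier.

The implementation of this transform is based on code from TensorFlowSharp.

As shown in the following diagram, you simply add a reference to the ML.NET NuGet packages in your .NET Core or .NET Framework application. Under the covers, ML.NET includes and references the native TensorFlow library, which lets you write code that loads an existing trained TensorFlow model file for scoring.

The following code snippet shows how to use the TensorFlow transform in an ML.NET pipeline: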

// ... Additional transformations in the pipeline code

pipeline.Add(new TensorFlowScorer()
{
    ModelFile = "model/tensorflow_inception_graph.pb",   // Example using the Inception v3 TensorFlow model
    InputColumns = new[] { "input" },                    // Name of input in the TensorFlow model
    OutputColumn = "softmax2_pre_activation"             // Name of output in the TensorFlow model
});

// ... Additional code specifying a learner and training process for the ML.NET model

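You can find the complete code example related to the snippet above here; it uses TensorFlowTransform with the TensorFlow Inception v3 model and the existing LearningPipeline API.

The code example above uses a pre-trained TensorFlow model named Inception v3, which you can download from here. Inception v3 is a very popular image recognition model trained on the ImageNet dataset, where the TensorFlow model tries to classify entire images into a thousand classes, such as "Umbrella", "Jersey", and "Dishwasher".

The Inception v3 model can be classified as a deep convolutional neural network. It achieves reasonable performance on hard visual recognition tasks, matching or exceeding human performance in some domains. The model/algorithm was developed by multiple researchers and is based on the original paper "Rethinking the Inception Architecture for Computer Vision" by Szegedy et al.

In upcoming ML.NET releases, we will add functionality for identifying the expected inputs and outputs of TensorFlow models. For now, use the TensorFlow APIs or a tool such as Netron to explore a TensorFlow model.

If you open the TensorFlow model file from the previous example (tensorflow_inception_graph.pb) with Netron and explore the model's graph, you can see how InputColumns corresponds to the input node at the beginning of the graph:

You can also see how OutputColumn corresponds to the output of the softmax2_pre_activation node, almost at the end of the graph.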

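Limitations: We are currently updating the ML.NET APIs for improved flexibility, because using TensorFlow in ML.NET today has a few limitations. For now (when using the LearningPipeline API), the scores can only be used within a LearningPipeline as inputs (numeric vectors) to a learner, such as a classifier. With the upcoming new ML.NET API, the TensorFlow model scores will be directly accessible, so you will be able to score with a TensorFlow model without adding an extra learner and its training process, as this sample currently has to do. The sample creates a multi-class classification ML.NET model based on StochasticDualCoordinateAscentClassifier, using a label (object name) paired with the numeric feature vector that the TensorFlow model generates when it scores each image file.

Keep in mind that the TensorFlow code examples mentioned here use the current LearningPipeline API available in v0.5. Going forward, the ML.NET API for using TensorFlow will be slightly different and will not be based on the "pipeline". This relates to the next section of this post, which focuses on the upcoming new ML.NET API.

Finally, we also want to highlight that the ML.NET framework currently surfaces TensorFlow, but in the future we might look into integrating other deep learning libraries, such as Torch and CNTK.

You can find additional code examples/tests that use TensorFlowTransform with the existing LearningPipeline API here.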

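Explore the upcoming new ML.NET API (after 0.5) and provide feedback

As mentioned at the beginning of this post, we are really looking forward to your feedback as we create the new ML.NET API. This evolution of ML.NET offers more flexible capabilities than the current LearningPipeline API provides. The LearningPipeline API will be deprecated when the new API is ready and good enough.

The following links point to some of the feedback we received, in the form of GitHub issues, about limitations of the LearningPipeline API:

Based on that feedback, a few weeks ago we decided to switch to a new ML.NET API that addresses most of the limitations the LearningPipeline API currently has.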

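Design principles for the new ML.NET API

We are designing the new API based on the following principles:

Use terminology parallel to other well-known frameworks such as Scikit-Learn, TensorFlow, and Spark. We will try to stay consistent in naming and concepts so that developers find ML.NET easier to understand and learn.
Keep simple ML scenarios, such as basic training and prediction, simple and concise.
Allow advanced ML scenarios (not possible with the current LearningPipeline API, as explained in the next section).

We have also explored API approaches such as fluent, declarative, and imperative APIs.
For a deeper discussion of the principles and required scenarios, see this issue on GitHub.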

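Why is ML.NET switching from the `LearningPipeline` API to a new API?

As part of crafting the preview versions (remember that ML.NET is still in early preview), we have been collecting feedback on the LearningPipeline API and have discovered several limitations that we need to address by creating a more flexible API.

Specifically, the new ML.NET API offers attractive features that are not possible with the current LearningPipeline API:

Strongly typed API: The new strongly typed API takes advantage of C# capabilities, so errors can be discovered at compile time, and IntelliSense in editors improves as well.
Better flexibility: The API provides a decomposable train and predict process, eliminating rigid, linear pipeline execution. With the new API, you can execute a certain code path and then fork the execution so that multiple paths reuse the initial common execution. For example, you can share a given transform's execution and its transformed data with multiple learners and trainers, or decompose pipelines and add multiple learners.

The new API is based on concepts such as Estimators, Transforms, and DataView, as shown in the code later in this post.

Improved usability: You call the APIs directly from your code. There is no more scaffolding or isolation layer creating an obscure separation between what the user/developer writes and the internal APIs. Entry points are no longer mandatory.
Simple scoring with TensorFlow models: Thanks to the flexibility mentioned above, you can also simply load a TensorFlow model and score with it, without adding any extra learner or training process, as explained in the "Limitations" topic of the TensorFlow section.
Better visibility of the transformed data: You get a better view of the data as transformers are applied.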
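To make the Estimator/Transform split concrete, here is a minimal, self-contained sketch of the general pattern: an estimator learns something from data in `Fit` and returns a transformer, which can then be applied to any data set. The interfaces and classes below (`ISketchEstimator`, `ISketchTransformer`, `MeanCenterEstimator`, `MeanCenterTransformer`) are hypothetical names invented for illustration and are not the ML.NET types; plain `double[]` arrays stand in for a DataView.

```csharp
using System;
using System.Linq;

// Hypothetical interfaces that mirror a Fit/Transform split.
// They are illustrative only and are NOT the ML.NET definitions.
public interface ISketchTransformer
{
    double[] Transform(double[] data);
}

public interface ISketchEstimator
{
    ISketchTransformer Fit(double[] data);
}

// Estimator: learns the mean of the training data.
public sealed class MeanCenterEstimator : ISketchEstimator
{
    public ISketchTransformer Fit(double[] data) =>
        new MeanCenterTransformer(data.Average());
}

// Transformer: the fitted result; subtracts the learned mean from any data set.
public sealed class MeanCenterTransformer : ISketchTransformer
{
    private readonly double _mean;

    public MeanCenterTransformer(double mean) => _mean = mean;

    public double[] Transform(double[] data) =>
        data.Select(x => x - _mean).ToArray();
}
```

Because the fitted transformer is an ordinary object, it can be kept, inspected, and applied to training and test data separately, which is the kind of decomposition a single linear pipeline does not allow.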

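Comparison of the strongly typed API vs. the `LearningPipeline` API

Another important comparison concerns the strongly typed feature of the new API.
As an example of the problems you can run into without a strongly typed API, the LearningPipeline API (as shown in the following code) provides access to data columns by specifying the column names as strings. If you make a typo (for example, writing "Descrption" without the 'i' instead of "Description", as in the sample code), you get a run-time exception: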

pipeline.Add(new TextFeaturizer("Description", "Descrption"));

The new ML.NET API, however, is strongly typed, so a typo is caught at compile time, and you can also take advantage of IntelliSense in the editor.

var estimator = reader.MakeEstimator()
                .Append(row => (                    
                    description: row.description.FeaturizeText()))
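The difference between the two styles can be reproduced without ML.NET at all. The following sketch is an assumption-laden illustration: `ColumnAccessDemo`, `ReadByName`, and `ReadTyped` are hypothetical helpers, a dictionary stands in for string-named columns, and a named tuple stands in for the typed row.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for illustration only; these are not ML.NET types.
public static class ColumnAccessDemo
{
    // String-keyed access: a misspelled column name compiles fine
    // and only fails when the code actually runs.
    public static string ReadByName(IReadOnlyDictionary<string, string> row, string column)
    {
        if (!row.TryGetValue(column, out var value))
        {
            throw new KeyNotFoundException($"Column '{column}' not found.");
        }
        return value;
    }

    // Typed access: the column is a named tuple member, so a misspelling
    // such as row.Descrption would be rejected by the compiler.
    public static string ReadTyped((string Title, string Description) row) =>
        row.Description;
}
```

Calling `ColumnAccessDemo.ReadByName(row, "Descrption")` on a row that only contains "Description" throws at run time, whereas the equivalent typo in `ReadTyped` would never compile.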

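Details on the decomposable train and predict API

The following code snippet shows how the transforms and training process of the "GitHub issue labeler" sample application can be implemented with the new API in ML.NET.

This is our current proposal, and the API will likely evolve based on your feedback.

New ML.NET API code example: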

public static async Task BuildAndTrainModelToClassifyGithubIssues()
{
    var env = new MLEnvironment();

    string trainDataPath = @"Data\issues_train.tsv";

    // Create reader
    var reader = TextLoader.CreateReader(env, ctx =>
                                    (area: ctx.LoadText(1),
                                    title: ctx.LoadText(2),
                                    description: ctx.LoadText(3)),
                                    new MultiFileSource(trainDataPath), 
                                    hasHeader : true);

    var loss = new HingeLoss(new HingeLoss.Arguments() { Margin = 1 });

    var estimator = reader.MakeNewEstimator()
        .Append(row => (
            // Convert string label to key. 
            label: row.area.ToKey(),
            // Featurize 'description'
            description: row.description.FeaturizeText(),
            // Featurize 'title'
            title: row.title.FeaturizeText()))
        .Append(row => (
            // Concatenate the two features into a vector and normalize.
            features: row.description.ConcatWith(row.title).Normalize(),
            // Preserve the label - otherwise it will be dropped
            label: row.label))
        .Append(row => (
            // Preserve the label (for evaluation)
            row.label,
            // Train the linear predictor (SDCA)
            score: row.label.PredictSdcaClassification(row.features, loss: loss)))
        .Append(row => (
            // Want the prediction, as well as label and score which are needed for evaluation
            predictedLabel: row.score.predictedLabel.ToValue(),
            row.label,
            row.score));

    // Read the data
    var data = reader.Read(new MultiFileSource(trainDataPath));

    // Fit the data to get a model
    var model = estimator.Fit(data);

    // Use the model to get predictions on the test dataset and evaluate the accuracy of the model
    var scores = model.Transform(reader.Read(new MultiFileSource(@"Data\issues_test.tsv")));
    var metrics = MultiClassClassifierEvaluator.Evaluate(scores, r => r.label, r => r.score);

    Console.WriteLine("Micro-accuracy is: " + metrics.AccuracyMicro);

    // Save the ML.NET model into a .ZIP file
    await model.WriteAsync("github-Model.zip");
}

public static async Task PredictLabelForGithubIssueAsync()
{
    // Read model from an ML.NET .ZIP model file
    var model = await PredictionModel.ReadAsync("github-Model.zip");

    // Create a prediction function that can be used to score incoming issues
    var predictor = model.AsDynamic.MakePredictionFunction<GitHubIssue, IssuePrediction>(env);

    // This prediction will classify this particular issue in a type such as "EF and Database access"
    var prediction = predictor.Predict(new GitHubIssue
    {
        title = "Sample issue related to Entity Framework",
        description = @"When using Entity Framework Core I'm experiencing database connection failures when running queries or transactions. Looks like it could be related to transient faults in network communication against the Azure SQL Database."
    });

    Console.WriteLine("Predicted label is: " + prediction.predictedLabel);
}

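Compare this with the following snippet using the old LearningPipeline API, which lacks flexibility because pipeline execution is linear rather than decomposable:

Old LearningPipeline API code example: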

public static async Task BuildAndTrainModelToClassifyGithubIssuesAsync()
{
    // Create the pipeline
    var pipeline = new LearningPipeline();

    // Read the data
    pipeline.Add(new TextLoader(DataPath).CreateFrom<GitHubIssue>(useHeader: true));

    // Dictionarize the "Area" column
    pipeline.Add(new Dictionarizer(("Area", "Label")));

    // Featurize the "Title" column
    pipeline.Add(new TextFeaturizer("Title", "Title"));

    // Featurize the "Description" column
    pipeline.Add(new TextFeaturizer("Description", "Description"));
    
    // Concatenate the provided columns
    pipeline.Add(new ColumnConcatenator("Features", "Title", "Description"));

    // Set the algorithm/learner to use when training
    pipeline.Add(new StochasticDualCoordinateAscentClassifier());

    // Specify the column to predict when scoring
    pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

    Console.WriteLine("=============== Training model ===============");

    // Train the model
    var model = pipeline.Train<GitHubIssue, GitHubIssuePrediction>();

    // Save the model to a .zip file
    await model.WriteAsync(ModelPath);

    Console.WriteLine("=============== End training ===============");
    Console.WriteLine("The model is saved to {0}", ModelPath);
}

public static async Task<string> PredictLabelForGitHubIssueAsync()
{
    // Read model from an ML.NET .ZIP model file
    _model = await PredictionModel.ReadAsync<GitHubIssue, GitHubIssuePrediction>(ModelPath);
    
    // This prediction will classify this particular issue in a type such as "EF and Database access"
    var prediction = _model.Predict(new GitHubIssue
        {
            Title = "Sample issue related to Entity Framework", 
            Description = "When using Entity Framework Core I'm experiencing database connection failures when running queries or transactions. Looks like it could be related to transient faults in network communication against the Azure SQL Database..."
        });

    return prediction.Area;
}

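The old LearningPipeline API is a fully linear code path, so you cannot decompose it into multiple pieces.
For instance, the BikeSharing ML.NET sample (available in the machine learning samples GitHub repo) uses the current LearningPipeline API.

This sample uses the evaluators API to compare the accuracy of regression learners by:

Performing several data transforms on the original dataset
Training and creating seven different ML.NET models based on seven different regression trainers/algorithms (such as FastTreeRegressor, FastTreeTweedieRegressor, StochasticDualCoordinateAscentRegressor, and so on)

The intent is to help you compare regression learners for a given problem.

Because the data transforms are the same for all of these models, you would want to reuse the code execution related to the transforms. However, because the LearningPipeline API only provides a single linear execution, you need to run the same data transformation steps for every model you create/train, as shown in the following excerpt from the BikeSharing ML.NET sample.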

var fastTreeModel = new ModelBuilder(trainingDataLocation, new FastTreeRegressor()).BuildAndTrain();
var fastTreeMetrics = modelEvaluator.Evaluate(fastTreeModel, testDataLocation);
PrintMetrics("Fast Tree", fastTreeMetrics);

var fastForestModel = new ModelBuilder(trainingDataLocation, new FastForestRegressor()).BuildAndTrain();
var fastForestMetrics = modelEvaluator.Evaluate(fastForestModel, testDataLocation);
PrintMetrics("Fast Forest", fastForestMetrics);

var poissonModel = new ModelBuilder(trainingDataLocation, new PoissonRegressor()).BuildAndTrain();
var poissonMetrics = modelEvaluator.Evaluate(poissonModel, testDataLocation);
PrintMetrics("Poisson", poissonMetrics);

//Other learners/algorithms
//...

The BuildAndTrain() method has to contain both the data transforms and the algorithm that differs per case, as shown in the following code:

public PredictionModel<BikeSharingDemandSample, BikeSharingDemandPrediction> BuildAndTrain()
{
    var pipeline = new LearningPipeline();
    pipeline.Add(new TextLoader(_trainingDataLocation).CreateFrom<BikeSharingDemandSample>(useHeader: true, separator: ','));
    pipeline.Add(new ColumnCopier(("Count", "Label")));
    pipeline.Add(new ColumnConcatenator("Features", 
                                        "Season", 
                                        "Year", 
                                        "Month", 
                                        "Hour", 
                                        "Weekday", 
                                        "Weather", 
                                        "Temperature", 
                                        "NormalizedTemperature",
                                        "Humidity",
                                        "Windspeed"));
    pipeline.Add(_algorythm);

    return pipeline.Train<BikeSharingDemandSample, BikeSharingDemandPrediction>();
}

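With the old LearningPipeline API, every training run that uses a different algorithm has to run the same process again, repeating the following steps each time:

Load the dataset from file
Perform the column transforms (concatenation, copy, or additional featurizers or dictionarizers, if needed)

With the new ML.NET API based on Estimators and DataView, however, you will be able to reuse parts of the execution. In this case, that means reusing the data transform execution as the basis for multiple models that use different algorithms.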
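The cost difference between the two approaches can be sketched in plain C#. Everything here is hypothetical and independent of ML.NET: `SharedTransformDemo` is an invented class, `LoadAndTransform` stands in for "load the dataset and apply the column transforms", and each "learner" is just a function over the transformed data. A counter records how often the transform step runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch; none of these members belong to ML.NET.
public static class SharedTransformDemo
{
    // Counts how many times the (stand-in) transform step has executed.
    public static int TransformRuns { get; private set; }

    // Stand-in for the expensive "load + column transforms" step.
    public static double[] LoadAndTransform(IEnumerable<double> raw)
    {
        TransformRuns++;
        return raw.Select(x => x * 2.0).ToArray();
    }

    // Linear style: each learner triggers its own transform run,
    // like building one full pipeline per model.
    public static Dictionary<string, double> TrainLinear(
        IReadOnlyList<double> raw,
        IReadOnlyDictionary<string, Func<double[], double>> learners)
    {
        return learners.ToDictionary(kv => kv.Key, kv => kv.Value(LoadAndTransform(raw)));
    }

    // Decomposed style: transform once, then fork to every learner.
    public static Dictionary<string, double> TrainShared(
        IReadOnlyList<double> raw,
        IReadOnlyDictionary<string, Func<double[], double>> learners)
    {
        var transformed = LoadAndTransform(raw);
        return learners.ToDictionary(kv => kv.Key, kv => kv.Value(transformed));
    }
}
```

With seven learners, `TrainLinear` runs the transform step seven times while `TrainShared` runs it once; both return the same per-learner results.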

You can also explore other code examples that use the new API here.