HDInsight - 1: Introduction

Date: 2019-11-06

Tags: hdinsight, introduction

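For work I recently need to look into HDInsight, so I am taking notes here. The official documentation is naturally the most authoritative source, so everything below is taken from it: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-introduction/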

Hadoop on HDInsight

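Anyone working with big data knows Hadoop, so how does HDInsight relate to it? HDInsight is Microsoft's software platform built on Azure, used mainly for big data analysis and management, and it uses the HDP (Hortonworks Data Platform) distribution of Hadoop. One thing to note: when we say Hadoop, we usually mean the Hadoop ecosystem, including Storm, HBase, and so on, not just the little elephant itself.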

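HDInsight can be understood as an implementation of Apache Hadoop on Microsoft Azure. It includes the corresponding Storm, HBase, Pig, Hive, Sqoop, Oozie, Ambari, and so on, and of course it also ties in Microsoft's own Excel, SSAS, and SSRS.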

HDInsight supports two operating systems, Linux and Microsoft's own Windows. The main differences are:

CATEGORY	HADOOP ON LINUX	HADOOP ON WINDOWS
Cluster OS	Ubuntu 12.04 Long Term Support (LTS)	Windows Server 2012 R2
Cluster Type	Hadoop	Hadoop, HBase, Storm
Deployment	Azure Management Portal, Azure CLI, Azure PowerShell	Azure Management Portal, Azure CLI, Azure PowerShell, HDInsight .NET SDK
Cluster UI	Ambari	Cluster Dashboard
Remote Access	Secure Shell (SSH)	Remote Desktop Protocol (RDP)

Some basic concepts and definitions

Hadoop (the "Query" workload): Provides reliable data storage with HDFS, and a simple MapReduce programming model to process and analyze data in parallel.
HBase (the "NoSQL" workload): A NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semi-structured data - potentially billions of rows times millions of columns. See Overview of HBase on HDInsight.
Apache Storm (the "Stream" workload): A distributed, real-time computation system for processing large streams of data fast. Storm is offered as a managed cluster in HDInsight. See Analyze real-time sensor data using Storm and Hadoop.
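
To make the "simple MapReduce programming model" from the Hadoop entry a bit more concrete, here is a minimal word-count sketch in plain Python. It is not Hadoop or HDInsight code: it runs in a single process and only imitates the map, shuffle, and reduce steps. The function names (map_phase, shuffle, reduce_phase, word_count) are my own, made up for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would do
    between the map and reduce steps (done in memory here)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop on HDInsight", "Storm and HBase on HDInsight"]
    print(word_count(sample))
```

On a real cluster the mappers and reducers run in parallel on different nodes and the framework handles the shuffle. The sketch only shows the shape of the programming model.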

Ambari: Cluster provisioning, management, and monitoring.
Avro (Microsoft .NET Library for Avro): Data serialization for the Microsoft .NET environment.
Hive & HCatalog: Structured Query Language (SQL)-like querying, and a table and storage management layer.
Mahout: Machine learning.
MapReduce and YARN: Distributed processing and resource management.
Oozie: Workflow management.
Phoenix: Relational database layer over HBase.
Pig: Simpler scripting for MapReduce transformations.
Sqoop: Data import and export.
Tez: Allows data-intensive processes to run efficiently at scale.
ZooKeeper: Coordination of processes in distributed systems.