Nifi Expression Language and Custom Processors (on HDP)
Short Description:
Getting started with Nifi expression language and custom Nifi processors on HDP sandbox
Article
This tutorial is part of a webinar for partners on Hortonworks DataFlow. The recording will be made available at
Background
- For a primer on HDF, you can refer to the materials here to get a basic background
- A basic tutorial on using Nifi on HDP sandbox is also available here
Goals
- Build a Nifi flow to analyze Nifi's network traffic using tcpdump. Use Expression Language to extract the source/target IPs/ports
- Build and use a custom tcpdump processor to filter Nifi's source/target IPs/ports on HDP sandbox
- Note that:
- Nifi can be installed independent of HDP
- The custom processor can also be built on any machine where Java and Eclipse are installed
- Sandbox is being used for demo purposes, to have everything in one place
Pre-Requisites: Install Nifi on sandbox
- The lab is designed for the HDP Sandbox. Download the HDP Sandbox here, import it into VMware Fusion, and start the VM
- After it boots up, find the IP address of the VM and add an entry to your machine's hosts file, e.g.
- 192.168.191.241 sandbox.hortonworks.com sandbox
- Connect to the VM via SSH (root/hadoop) and correct the /etc/hosts entry
- ssh root@sandbox.hortonworks.com
- Deploy the Nifi Ambari service on sandbox by running the commands below
- VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
- sudo git clone https://github.com/abajwa-hw/ambari-nifi-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/NIFI
- #sandbox
- service ambari restart
- #non sandbox
- service ambari-server restart
- To install Nifi, start the 'Install Wizard': Open Ambari (http://sandbox.hortonworks.com:8080) then:
- On bottom left -> Actions -> Add service -> check NiFi server -> Next -> Next -> Change any config you like (e.g. install dir, port, setup_prebuilt or values in nifi.properties) -> Next -> Deploy. This kicks off the install, which runs for 5-10 minutes.
- Once installed, launch Nifi by opening http://sandbox.hortonworks.com:9090/nifi
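The `VERSION=` line in the deploy step above uses sed to reduce the `hdp-select` output to a major.minor stack version, which then becomes part of the Ambari stack path. The sketch below wraps the same sed expression in a function so it can be tried without a sandbox. The function name `extract_hdp_version` and the sample input string are assumptions for illustration, and the real `hdp-select` output on your VM may differ.

```shell
# Hypothetical helper: applies the same sed expression used in the deploy step.
extract_hdp_version() {
  printf '%s\n' "$1" | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'
}

# Assumed sample of what `hdp-select status hadoop-client` prints.
extract_hdp_version "hadoop-client - 2.3.2.0-2950"
# prints: 2.3
```

If the printed value is not a plain major.minor number, the git clone target directory in the deploy step will not match an existing stack directory, so it is worth checking `echo $VERSION` before cloning.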
Steps
Explore tcpdump
- Tcpdump is a common packet analyzer that runs under the command line. It allows the user to display TCP/IP and other packets being transmitted or received over a network to which the computer is attached. Full details can be found here
- To install and run tcpdump on sandbox:
- yum install -y tcpdump
- tcpdump -n -nn
- On sandbox, this will output something like below for each network connection being made, showing:
- which socket (i.e. IP/port) was the source (to the left of >) and
- which was the target (to the right of >)
- 08:16:15.878652 IP 192.168.191.1.49270 > 192.168.191.144.9090: Flags [.], ack 2255, win 8174, options [nop,nop,TS val 1176961367 ecr 32747195], length 0
- In the example above:
- the source machine was 192.168.191.1 (port 49270) and
- the target machine was 192.168.191.144 (port 9090)
- Note that since Nifi is running on port 9090, by monitoring traffic to port 9090, we will be able to capture connections made by Nifi
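To make the source/target split concrete, here is a minimal sketch that pulls the four values out of a tcpdump line on the command line. It is not the Nifi flow or the custom processor itself, only an awk illustration of the same parsing idea. The function name `parse_tcpdump_line` is hypothetical, and the sketch assumes the numeric output format shown above (IP and port joined by a dot, target followed by a colon).

```shell
# Hypothetical helper: print "src_ip src_port dst_ip dst_port" for one tcpdump line.
# Assumes numeric output (tcpdump -n -nn), where the port is the part after the last dot.
parse_tcpdump_line() {
  printf '%s\n' "$1" | awk '$2 == "IP" && $4 == ">" {
    src = $3
    dst = $5
    sub(/:$/, "", dst)                            # drop the trailing colon on the target
    src_ip = src;   sub(/\.[0-9]+$/, "", src_ip)  # strip ".port" to leave the IP
    src_port = src; sub(/^.*\./, "", src_port)    # keep only what follows the last dot
    dst_ip = dst;   sub(/\.[0-9]+$/, "", dst_ip)
    dst_port = dst; sub(/^.*\./, "", dst_port)
    print src_ip, src_port, dst_ip, dst_port
  }'
}

# The sample line from the tcpdump output above.
parse_tcpdump_line '08:16:15.878652 IP 192.168.191.1.49270 > 192.168.191.144.9090: Flags [.], ack 2255, win 8174, options [nop,nop,TS val 1176961367 ecr 32747195], length 0'
# prints: 192.168.191.1 49270 192.168.191.144 9090
```

Lines that are not IP packets (for example ARP entries) fail the `$2 == "IP"` check and produce no output.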
Build tcpdump flow using ExecuteProcess and EL
Build custom processor for tcpdump
- Set up your sandbox for development by using the VNC Ambari service to install VNC/Eclipse/Maven
- Download Ambari service for VNC (details below)
- VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
- sudo git clone https://github.com/hortonworks-gallery/ambari-vnc-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/VNCSERVER
- service ambari restart
- Once the status of HDFS/YARN has changed from a yellow question mark to a green check mark...
- Set up Eclipse on the sandbox VM and remote desktop into it using an Ambari service for VNC
- In Ambari, open the Admin > Stacks and Services tab. You can access this via http://sandbox.hortonworks.com:8080/#/main/admin/stack/services
- Deploy the service by selecting:
- VNC Server -> Add service -> Next -> Next -> Enter password (e.g. hadoop) -> Next -> Proceed Anyway -> Deploy
- Make sure the password is at least 6 characters, or the install will fail
- Connect to VNC from your local laptop using VNC viewer software (e.g. TightVNC viewer, Chicken of the VNC, or just your browser). Detailed steps here
- (Optional) To install Maven manually instead:
- curl -o /etc/yum.repos.d/epel-apache-maven.repo https://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo
- yum -y install apache-maven-3.2*
- In general, when starting a new project you would use the mvn archetype to create a custom processor. Details here: https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
- Command to run the wizard:
- cd /tmp
- mvn archetype:generate -DarchetypeGroupId=org.apache.nifi -DarchetypeArtifactId=nifi-processor-bundle-archetype -DarchetypeVersion=0.2.1 -DnifiVersion=0.2.1
- Sample inputs to generate a Maven project archetype skeleton:
- Define value for property 'groupId': : com.hortonworks
- Define value for property 'artifactId': : nifi-network-processors
- Define value for property 'version': 1.0-SNAPSHOT: :
- Define value for property 'artifactBaseName': : network
- Define value for property 'package': com.hortonworks.processors.network: :
- This will create an archetype Maven project for a custom processor with the package name, artifactId, etc. specified above.
- In this case we will download a previously built sample and walk through the changes you would need to make to the archetype to create a basic custom processor
- cd
- sudo git clone https://github.com/abajwa-hw/nifi-network-processor.git
- ls -la ~/nifi-network-processor/nifi-network-nar/target/nifi-network-nar-1.0-SNAPSHOT.nar
- Deploy the nar into Nifi: copy the compiled nar file into the Nifi lib dir and correct its permissions
- cp ~/nifi-network-processor/nifi-network-nar/target/nifi-network-nar-1.0-SNAPSHOT.nar /opt/nifi-1.0.0.0-7/lib/
- chown nifi:hadoop /opt/nifi-1.0.0.0-7/lib/nifi-network-nar-1.0-SNAPSHOT.nar
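The copy step above fails quietly if the nar was never built or the Nifi install directory differs from `/opt/nifi-1.0.0.0-7`. As a hedged sketch, the hypothetical helper `deploy_nar` below checks both paths before copying and lists the result. It does not replace the `chown` step, which still has to be run as shown above.

```shell
# Hypothetical helper: copy a nar into a Nifi lib directory after checking both paths.
deploy_nar() {
  nar="$1"
  lib_dir="$2"
  [ -f "$nar" ] || { echo "nar not found: $nar" >&2; return 1; }
  [ -d "$lib_dir" ] || { echo "lib dir not found: $lib_dir" >&2; return 1; }
  # Copy, then list the deployed file so the result is visible.
  cp "$nar" "$lib_dir/" && ls -l "$lib_dir/$(basename "$nar")"
}

# Example call using the paths from the steps above (left commented out):
# deploy_nar ~/nifi-network-processor/nifi-network-nar/target/nifi-network-nar-1.0-SNAPSHOT.nar /opt/nifi-1.0.0.0-7/lib
```

A non-zero return with a "not found" message points at the path to fix before retrying.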
Further reading