Working with Apache NiFi

What is NiFi?

Apache NiFi is software used to automate the flow of data between systems. It is a Java program that runs inside a Java virtual machine on a server. But what does “between systems” actually mean?

The concept is simple. Suppose an application produces some data, for example customer orders. That application was built only to place an order: it passes the data to a queue (such as Kafka) and immediately shows a confirmation message to the customer.

Assume the order is in JSON format and holds some basic information such as product info, customer info and delivery options.

At this point, we can use NiFi to create an automated flow that consumes the data from that queue and sends one copy of the JSON to an external API (perhaps an inventory system), stores another copy in a database (for an order management system), and passes a third copy to another NiFi flow that is responsible for the courier service handling transit.

So NiFi is always listening to the queue. The moment data arrives, the flow starts and processes the data according to the design you created.
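
To make the fan-out idea concrete, here is a minimal Python sketch of the pattern, not of NiFi itself. It assumes an in-memory queue in place of Kafka, and the three destination lists (inventory_api, order_db, courier_flow) are hypothetical stand-ins for the inventory API, the order database and the courier flow.

```python
import copy
import json
import queue


def fan_out(order_queue, destinations):
    """Drain the queue and hand every destination its own copy of each order."""
    delivered = 0
    while True:
        try:
            raw = order_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        order = json.loads(raw)
        for deliver in destinations:
            # Each destination gets an independent copy, as in the text above.
            deliver(copy.deepcopy(order))
        delivered += 1
    return delivered


# Hypothetical destinations: plain lists stand in for the real systems.
inventory_api, order_db, courier_flow = [], [], []

orders = queue.Queue()
orders.put(json.dumps({"product": "Keyboard", "customer": "Arjun", "delivery": "standard"}))

count = fan_out(orders, [inventory_api.append, order_db.append, courier_flow.append])
print(count, len(inventory_api), len(order_db), len(courier_flow))
```

In NiFi the same effect comes from connecting one processor to several downstream processors, with no code to write.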

Features of NiFi

The example above is just a simple use case that shows what Apache NiFi can do. Let’s look at some of its other features:

  1. Highly Configurable
    Apache NiFi is highly configurable. This helps users achieve guaranteed delivery, high throughput and low latency, and it supports dynamic prioritization, back pressure and modifying flows at runtime.
  2. Web-Based User Interface
    Apache NiFi provides an easy-to-use web-based user interface. Design, control, feedback and monitoring all happen within the web UI, with no need for other tools, which gives a seamless experience across these tasks.
  3. Built-in Monitoring
    Apache NiFi provides a data provenance module to track and monitor data from the beginning to the end of the flow. Developers can also create their own custom processors and reporting tasks according to their needs.
  4. Support for Secure Protocols
    Apache NiFi supports secure protocols such as SSL, HTTPS and SSH, along with a variety of other encryption options. This makes it a secure framework for a variety of complex enterprise environments.
  5. Good User & Role Management
    Apache NiFi supports user and role management and can be configured with LDAP for authorization. Administrators can set policies that allow users to view and modify flows, access the controller or retrieve site-to-site details, or that restrict users from accessing any functions at all.

How to install NiFi?

Before you install NiFi, make sure Java is installed on your system (https://www.java.com/download/ie_manual.jsp) and that the JAVA_HOME system environment variable is set.

I am on Windows; you can download NiFi from this URL: https://nifi.apache.org/download/

The download is a zip file. Uncompress it and place it in a directory of your choice. I have kept it at c:\Nifi

Go inside the bin folder and run the run-nifi.bat file.

Keep the batch file running, wait a few seconds to a minute or so, and then open http://localhost:8080/nifi/ in a browser. On recent NiFi releases (1.14 and later), the default is a secured instance at https://localhost:8443/nifi instead, with generated login credentials written to the application log.

Note: If you face any issues while installing or running NiFi, please look for a solution on the web, as the scope of this blog is restricted to creating a basic NiFi flow.

Basic Terminologies

  • FlowFile: A FlowFile is the unit of ‘data’ in NiFi. It is the actual data that moves through the flow and on which actions are performed, for example any JSON or XML data. Each FlowFile carries its content plus a set of key/value attributes.
  • FlowFile Processor: Processors actually perform the work. A processor consumes a FlowFile and performs an operation on it. NiFi has many processors, each of which performs a specific task. For example, to fetch data from a Kafka queue there is the ConsumeKafka processor, and to run a SQL query against a database there is the ExecuteSQL processor (PutSQL handles INSERT and UPDATE statements). So first understand your requirement, then choose the processors accordingly. You can browse all the processors in the NiFi documentation at https://nifi.apache.org/documentation/v2/
  • Connection: Connections provide the actual linkage between processors. These act as queues and allow various processes to interact at differing rates. These queues can be prioritized dynamically and can have upper bounds on load, which enable back pressure.
  • Process Group: A Process Group is a specific set of processes and their connections, which can receive data via input ports and send data out via output ports. In this manner, process groups allow creation of entirely new components simply by composition of other components.
  • Flow: A Flow is the combination of Processors, Controller Services, Funnels, Ports, etc. that are connected through their relationships to move and/or process some data. A Flow is built and viewed through the NiFi Web UI on the grid.
  • Controller Service: A Controller Service is an encapsulation of functionality that is consumed or used by a Flow or Processor and does not operate independently. The same Controller Service can be shared by multiple Processors/Flows. An example is the DBCPConnectionPool, which encapsulates the work of connecting to a SQL database but does not do anything on its own; it must be consumed by a processor that supports it.
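
As a rough mental model of the FlowFile and Connection terms above, the following Python sketch treats a FlowFile as content plus attributes and a Connection as a bounded queue. The class names and the back_pressure_threshold parameter are illustrative only. It also simplifies back pressure: NiFi stops scheduling the upstream processor when a queue is full, whereas this sketch just refuses the new item.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Content plus key/value attributes (a simplified model)."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """A bounded FIFO queue sitting between two processors."""

    def __init__(self, back_pressure_threshold):
        self.back_pressure_threshold = back_pressure_threshold
        self._queue = deque()

    def is_full(self):
        return len(self._queue) >= self.back_pressure_threshold

    def offer(self, flowfile):
        """Enqueue a FlowFile unless the threshold has been reached."""
        if self.is_full():
            return False  # the upstream side has to wait
        self._queue.append(flowfile)
        return True

    def poll(self):
        """Hand the oldest FlowFile to the downstream side, if any."""
        return self._queue.popleft() if self._queue else None


conn = Connection(back_pressure_threshold=2)
accepted = [conn.offer(FlowFile(str(i).encode())) for i in range(3)]
print(accepted)  # [True, True, False]
```

Once the downstream side polls an item, the queue has room again and the upstream side can continue.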

Creating Simple Flow

So let’s switch to creating a simple flow. In this example I don’t have an external system pushing data to NiFi, so we will generate our own sample data in JSON format and build a flow around it.

In a real scenario, some application would push data to a queue such as Kafka. In one of our real projects, a Sitecore application publishes Contact and Interaction data (JSON) to a Kafka topic; NiFi consumes the data from that topic and saves it into SQL Server to generate reports.

For this example, let’s use sample JSON data like the one below:

{
  "firstName": "Arjun",
  "lastName": "Arora",
  "shoppingList": [
    {
      "itemName": "Keyboard"
    },
    {
      "itemName": "Mouse"
    }
  ]
}

The JSON above will act as a single FlowFile, which says that a user named Arjun has bought a keyboard and a mouse. If you had to do this in C#, you would deserialize the JSON into a data model and use that to save into a database. Here, a NiFi flow will serve the same purpose.
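
For comparison, the hand-coded approach mentioned above might look roughly like this. The sketch uses Python rather than C#, and the Order class and parse_order function are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Order:
    """A small data model for the sample JSON."""
    first_name: str
    last_name: str
    item_names: List[str]


def parse_order(raw: str) -> Order:
    """Deserialize the sample JSON document into an Order."""
    data = json.loads(raw)
    return Order(
        first_name=data["firstName"],
        last_name=data["lastName"],
        item_names=[item["itemName"] for item in data["shoppingList"]],
    )


SAMPLE = """
{
  "firstName": "Arjun",
  "lastName": "Arora",
  "shoppingList": [
    {"itemName": "Keyboard"},
    {"itemName": "Mouse"}
  ]
}
"""

order = parse_order(SAMPLE)
print(order)
```

The NiFi flow built in the following steps does the same extraction through processor configuration instead of code.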

So let’s start by creating a process group.

Step 1: Create Process Group

I created a process group named Workspace, then a new process group inside it named Public Demos, and inside that another one named Nifi-Demo-1.0. So the final namespace is Workspace > Public Demos > Nifi-Demo-1.0.

You can choose whatever names you want; the point is that process groups help you manage flows under a namespace.

Step 2: Create a JSON FlowFile using the GenerateFlowFile processor

Drag the Processor icon onto the grid.

Select GenerateFlowFile from the list. As a side note, this window lists all the processors available in NiFi. Every processor has its own functionality and is used for a specific purpose.

We chose GenerateFlowFile to create a sample FlowFile around which we will build the rest of the flow. Double-click the processor, or right-click it and select Configure, to open its properties, and use the settings below:

Change the value of Run Schedule from 0 sec to 10 sec, which means the processor will produce a FlowFile containing our custom JSON every 10 seconds. This gives me time to start and stop the processor so that a single FlowFile is generated for testing.

Put the JSON in the value of the Custom Text property.

Step 3: Drag a new EvaluateJsonPath processor onto the grid

Before going to the next step, connect the GenerateFlowFile processor to the EvaluateJsonPath processor.

This sets the relationship between the two processors: when a FlowFile is generated successfully, it is passed to the next processor.

Step 4: Set properties in the EvaluateJsonPath processor and connect it to the previous processor with the success relationship

Double-click the processor and set its properties as follows.

To add a property (or, I would say, a variable), click the + icon, enter the property name, and then enter the JSON path of the key in the JSON we provided in GenerateFlowFile:

firstName => $.firstName

By doing this, the firstName variable will contain the value of the firstName key, i.e. Arjun. Step 5 also uses lastName, so it needs a similar property (lastName => $.lastName).

We also auto-terminate the other relationships we are not using; otherwise the processor will show an error.
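
What this step does can be sketched in Python as below. This is a simplified stand-in, not NiFi code: it only handles plain dotted paths such as $.firstName, and it assumes the processor is set up to write results to FlowFile attributes and that a lastName property is defined next to firstName, since the later steps use both.

```python
import json


def evaluate_json_path(content: str, properties: dict) -> dict:
    """Resolve each simple '$.a.b' path against the JSON content.

    Returns a dict of attribute name -> extracted value (as a string).
    """
    document = json.loads(content)
    attributes = {}
    for name, path in properties.items():
        if not path.startswith("$."):
            raise ValueError(f"unsupported path: {path}")
        value = document
        for key in path[2:].split("."):
            value = value[key]  # a KeyError here means the path did not match
        attributes[name] = str(value)
    return attributes


content = '{"firstName": "Arjun", "lastName": "Arora", "shoppingList": []}'
attributes = evaluate_json_path(
    content,
    {"firstName": "$.firstName", "lastName": "$.lastName"},
)
print(attributes)
```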

Step 5: Add a new UpdateAttribute processor and connect it to the previous processor with the matched relationship

This processor is used to update the values of the attributes we created in EvaluateJsonPath.

Set the configuration like below:

This actually means:

firstName = firstName + " " + lastName

After this step, based on the JSON we provided, it will contain:

firstName = Arjun Arora
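
In NiFi such a value is normally written in Expression Language, presumably something like ${firstName} ${lastName} here, although the exact setting used is not shown above. The Python sketch below imitates only that one feature (replacing ${name} with the attribute value); the update_attribute function is a hypothetical helper, not a NiFi API.

```python
import re


def update_attribute(attributes: dict, name: str, expression: str) -> dict:
    """Return a copy of the attributes with `name` set to the expanded expression.

    Only plain ${attributeName} references are supported in this sketch;
    unknown attributes expand to an empty string.
    """
    def lookup(match):
        return attributes.get(match.group(1), "")

    updated = dict(attributes)
    updated[name] = re.sub(r"\$\{(\w+)\}", lookup, expression)
    return updated


attrs = {"firstName": "Arjun", "lastName": "Arora"}
attrs = update_attribute(attrs, "firstName", "${firstName} ${lastName}")
print(attrs["firstName"])  # Arjun Arora
```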

Step 6: Add a new SplitJson processor and connect it to the previous processor with the success relationship

Add the processor and configure it like below:

At this point we are entering a loop: note that shoppingList is an array, and we need to traverse it to get each itemName.
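
The effect of the split can be pictured with the short Python sketch below. It assumes the processor is pointed at the shoppingList array and that every split keeps the attributes of the parent FlowFile; FlowFiles are modelled as plain dictionaries for brevity.

```python
import json


def split_json(flowfile: dict, array_key: str) -> list:
    """Turn one FlowFile holding an array into one FlowFile per element."""
    document = json.loads(flowfile["content"])
    return [
        {
            # Each split keeps a copy of the parent's attributes.
            "attributes": dict(flowfile["attributes"]),
            "content": json.dumps(element),
        }
        for element in document[array_key]
    ]


parent = {
    "attributes": {"firstName": "Arjun Arora"},
    "content": '{"shoppingList": [{"itemName": "Keyboard"}, {"itemName": "Mouse"}]}',
}
splits = split_json(parent, "shoppingList")
for split in splits:
    print(split["content"])
```

Each resulting FlowFile then travels through the rest of the flow on its own, which is what makes the following steps behave like the body of a loop.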

Step 7: Add a new EvaluateJsonPath processor and connect it to the previous processor with the split relationship

Configure this processor like below:

We retrieve itemName from the JSON key and store it in a new variable. Note that we are inside the loop, so the JSON of each FlowFile has an itemName key.
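
Continuing the same dictionary-based model, the sketch below shows what each split FlowFile might carry after this step: the firstName attribute from earlier plus a new itemName attribute. The add_item_name helper and the sample splits are illustrative assumptions.

```python
import json


def add_item_name(flowfile: dict) -> dict:
    """Copy the FlowFile and add an itemName attribute read from its content."""
    element = json.loads(flowfile["content"])
    attributes = dict(flowfile["attributes"])
    attributes["itemName"] = element["itemName"]
    return {"attributes": attributes, "content": flowfile["content"]}


# Assumed input: the two FlowFiles produced by the split in the previous step.
splits = [
    {"attributes": {"firstName": "Arjun Arora"}, "content": '{"itemName": "Keyboard"}'},
    {"attributes": {"firstName": "Arjun Arora"}, "content": '{"itemName": "Mouse"}'},
]

results = [add_item_name(split) for split in splits]
for result in results:
    print(result["attributes"])
```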

Step 8: Add the last processor, ReplaceText, and connect it to the previous processor with the matched relationship

Configure the processor like below:

Step 9: Run the final flow, one processor at a time or all at once

Run the flow. You can also run the processors one by one to follow the progress. You can view the FlowFile at every point from the relationship box between processors: right-click the relationship and click List Queue.

Then click the View button.

Conclusion:

NiFi could be the best solution when you need something that runs in parallel with your applications and provides a lightweight alternative for solving complex problems in an easy, configurable way.

References:

https://nifi.apache.org/docs/nifi-docs/html/user-guide.html

 
