Some First Steps for a New NiFi Cluster

After installing Apache NiFi, there are a few steps you might want to take before making your cluster available for prime time. None of these steps is required, so make sure each one is appropriate for your use case before implementing it.

Lowering NiFi’s Log File Retention Properties

By default, Apache NiFi’s nifi-app.log files are capped at 100 MB per log file and NiFi retains 30 log files. If that maximum is reached, it comes out to 3 GB of disk space for nifi-app.log files. That’s not a whole lot, but in some cases you may need the extra disk space. Or, if an external service is already managing your NiFi log files, you don’t need them hanging around any longer than necessary. To lower the thresholds, open NiFi’s conf/logback.xml file. Under the appender configuration for nifi-app.log you will see maxFileSize and maxHistory values. Lower these values, save the file, and restart NiFi to save disk space on log files. Conversely, if you want to keep more log files, just increase those limits.
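For reference, here is roughly what the relevant portion of conf/logback.xml looks like. This is an abridged, illustrative excerpt; the exact appender layout varies between NiFi versions, so check your own file:

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <!-- Lower these to reclaim disk space, or raise them to keep more history -->
        <maxFileSize>100MB</maxFileSize>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
</appender>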

You can make changes to the handling of the nifi-user.log and nifi-bootstrap.log files here, too. But those files typically don’t grow as fast as nifi-app.log, so they can often be left as-is. Note that in a cluster you will need to make these changes on each node.

Install NiFi as a Service

Running NiFi as a service allows it to start automatically when the system boots and makes starting and stopping NiFi easier. Note that in a cluster you will need to make this change on each node. To install NiFi as a system service, go to NiFi’s bin/ directory and run the following commands (on Ubuntu):

sudo ./nifi.sh install
sudo update-rc.d nifi defaults

You can now control the NiFi service with the commands:

sudo systemctl status nifi
sudo systemctl start nifi
sudo systemctl stop nifi
sudo systemctl restart nifi

If you are running NiFi in a container, add the install commands to your Dockerfile.

Install (and use!) the NiFi Registry

The NiFi Registry provides the ability to put your flows under source control, and it has quickly become an invaluable tool for NiFi. The NiFi Registry should be installed outside of your cluster but remain accessible to it. The NiFi Registry Documentation contains instructions on how to install it, create buckets, and connect it to your NiFi cluster.
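As a rough sketch, installing and starting the Registry on its own host looks something like the commands below. The version in the file names is a placeholder; substitute the release you downloaded from the Apache NiFi downloads page:

tar -xzf nifi-registry-<version>-bin.tar.gz
cd nifi-registry-<version>
./bin/nifi-registry.sh start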

By default the NiFi Registry listens on port 18080, so be sure your firewall rules allow that communication (see the example below). Remember, you only need a single installation of the NiFi Registry per NiFi cluster. If you are using infrastructure-as-code to deploy your NiFi cluster, make sure the scripts that deploy the NiFi Registry are outside the cluster scripts. You don’t want the NiFi Registry’s lifecycle to be tied to the NiFi cluster’s lifecycle. Keeping them separate allows you to create and tear down NiFi clusters without affecting your NiFi Registry. It also allows you to share your NiFi Registry between multiple clusters if you need to.
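For example, on an Ubuntu host using ufw, opening the default Registry port to your NiFi nodes might look like the following. The source range here is a placeholder; substitute your own cluster’s subnet:

sudo ufw allow from 10.0.0.0/16 to any port 18080 proto tcp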

Although using the NiFi Registry is not required to build a data flow in NiFi, your life will be much, much easier if you do use it, especially in an environment where multiple users will be manipulating the data flow.

Monitoring Apache NiFi with Datadog

One of the most common requirements when using Apache NiFi is a means to adequately monitor the NiFi cluster. Insights into a NiFi cluster’s use of memory, disk space, CPU, and NiFi-level metrics are crucial to operating and optimizing data flows. NiFi’s Reporting Tasks provide the capability to publish metrics to external services.

Datadog is a hosted service for collecting, visualizing, and alerting on metrics. With Apache NiFi’s built-in DataDogReportingTask, we can leverage Datadog to monitor our NiFi instances. In this blog post we are running a three-node NiFi cluster in Amazon EC2, with each node on its own EC2 instance.

Note that the intention of this blog post is not to promote Datadog but instead to demonstrate one potential platform for monitoring Apache NiFi.

If you don’t already have a Datadog account, the first step is to create one. Once that’s done, install the Datadog Agent on your NiFi hosts. The command will look similar to the following, except it will contain your own API key. In the command below we are installing the Datadog Agent on an Ubuntu instance. If you only want to monitor NiFi-level metrics you can skip this step; however, we find the host-level metrics to be valuable as well.

DD_API_KEY=xxxxxxxxxxxxxxx bash -c "$(curl -L https://raw.githubusercontent.com/DataDog/datadog-agent/master/cmd/agent/install_script.sh)"

This command downloads and installs the Datadog Agent on the system. The agent’s service is started automatically and will run on system startup.
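If you want to verify the agent before moving on, a quick check on a systemd-based Ubuntu host looks like this:

sudo systemctl status datadog-agent
sudo datadog-agent status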

Creating a Reporting Task

The next step is to create a DataDogReportingTask in NiFi. In NiFi’s Controller Settings, under Reporting Tasks, click to add a new Reporting Task and select DataDogReportingTask. In the Reporting Task’s settings, enter your Datadog API key and change the other values as desired.

By default, NiFi Reporting Tasks run every 5 minutes. You can change this period in the “Run Schedule” field on the Settings tab if needed. Click Apply to save the Reporting Task.

The Reporting Task will now be listed. Click the Play icon to start it. Apache NiFi will now send metrics to Datadog every 5 minutes (unless you changed the Run Schedule to a different interval).

Exploring NiFi Metrics in Datadog

We can now go to Datadog and explore the metrics from NiFi. Open the Metrics Explorer and enter “nifi” (without the quotes); the available NiFi metrics will be displayed. These metrics can be included in graphs and other visuals in Datadog dashboards. (If you’re interested, the names of the metrics originate in MetricNames.java.)

Creating a Datadog Dashboard for Apache NiFi

These metrics can be added to Datadog dashboards. For example, in the dashboard shown below we created line graphs showing the CPU usage and JVM heap usage for each of the NiFi nodes.

The DataDogReportingTask provides a convenient yet powerful way to publish Apache NiFi metrics. Datadog dashboards can then be configured to provide a comprehensive look into the performance of your Apache NiFi cluster.

What we have shown here is really just the tip of the iceberg for building a comprehensive monitoring dashboard. With NiFi’s metrics and Datadog’s flexibility, how you build your dashboards is entirely up to you and your needs.

DataWorks Summit 2019

Recently (in May 2019) I had the honor of attending and speaking at the DataWorks Summit in Washington, D.C. The conference had many interesting talks and keynote speakers focused on big-data technologies and business applications. I also always enjoy exploring downtown Washington, D.C. Whether it’s the “hike” across the National Mall taking in the sights or visiting all of the nearby shops, there’s always something new to see.

One thing that caught my attention early on was the number of talks that either focused largely on Apache NiFi or at least mentioned it as a component of a larger data ingest platform. Apache NiFi has firmly cemented itself as a core piece of data flow orchestration.

My talk was one of those. It described a process that ingests natural language text, processes it, and persists extracted entities to a database, with Apache NiFi as the workhorse driving the whole thing. Without NiFi, I would have had to write a lot more code and probably would have ended up with a much less elegant and performant solution.

In conclusion, if you have not yet looked at Apache NiFi for your data ingest and transformation (think ETL) pipeline needs, do yourself a favor and spend a few minutes with NiFi. I think you will like what you find. And if you need help along the way, drop me a note. Just say, “Hey Jeff, I need a pipeline to do X. Show me how it can be done with NiFi.” I’ll be glad to help.

Need more help?

We provide consulting services around AWS and big-data tools like Apache NiFi. Get in touch by sending us a message. We look forward to hearing from you!