On May 3, 2023, Cloudera kicked off a contest called “Best in Flow” for NiFi developers to compete to build the best data pipelines. This blog congratulates our winner and reviews the top submissions.
On the verge of the release of NiFi 2.0, Cloudera VP of Engineering and NiFi founder Joe Witt, joined by principal committers Mark Payne and Matt Gillman, addressed the global community via a virtual event dubbed “Meet the Committers.” The team discussed NiFi’s origins and the journey to NiFi 2.0 as well as significant features in the upcoming release, and surveyed the community about the dev/ops challenges of managing their own nodes.
As part of the event, Cloudera kicked off the “Best in Flow” contest. The contest challenged developers to build data pipelines that represent their business use cases using Cloudera DataFlow. DataFlow is a cloud-native data service powered by Apache NiFi, with a streamlined user experience for development and deployment that enables true universal data distribution. For the contest, Cloudera made a sandbox environment available for developers to use DataFlow Public Cloud. We had more than 40 developers active in the environment and many high-quality contest submissions. But in the end there could only be one winner.
Best in Flow champion
So without any further ado, our winner and the new Best in Flow Champion is:
Vince Lombardo! Vince is a Senior Infrastructure Engineer at Wells Fargo, and he developed a cybersecurity pipeline to efficiently collect, process, and make data from an asset polling tool available for database ingestion. Cybersecurity is a common domain for DataFlow deployments because of the need for timely access to data across systems, tools, and protocols. What’s interesting about Vince’s application is that it cleverly uses “pagination” functionality to consistently distribute up-to-the-minute results from a tool that doesn’t always return a full set of results right away. For more detail on the winning flow, see Vince’s GitHub page.
Vince’s winning flow
Vince began by funneling data from six API endpoints of an asset polling tool, containing cybersecurity and tech ops data, into two discrete data topics. The flow he built distinguishes between a test and a true API call before initiating a secure login. The cool part comes next. Because the polling tool can take time to return queries, Vince added a processor that loops, returning the query status on each pass, until the query completes. Completeness is estimated by comparing a test result with the “estimated total.” When a near match is detected, the data pull is triggered and then checked again for completeness before being transformed into rows and columns and merged into a batch for database ingestion.

Figure 1: The part of the flow that loops until the Tanium query has completed
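To make the looping idea concrete, here is a minimal Python sketch of the same pattern: poll a status source until the reported count nearly matches the estimated total, then stop. This is not Vince’s flow, which is built from NiFi processors. The field names `tested` and `estimated_total`, the 1% tolerance, and the helper names are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict


def is_near_match(tested: int, estimated_total: int, tolerance: float = 0.01) -> bool:
    """Return True when `tested` is within `tolerance` (a fraction) of the estimate."""
    if estimated_total <= 0:
        # No usable estimate yet, so the query cannot be judged complete.
        return False
    return abs(estimated_total - tested) / estimated_total <= tolerance


def wait_for_query(
    get_status: Callable[[], Dict[str, int]],
    max_polls: int = 60,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Poll `get_status` until the counts nearly match; give up after `max_polls`."""
    for attempt in range(max_polls):
        # Hypothetical status shape: {"tested": int, "estimated_total": int}
        status = get_status()
        if is_near_match(status["tested"], status["estimated_total"]):
            return status
        if attempt < max_polls - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"query not complete after {max_polls} polls")


if __name__ == "__main__":
    # Simulated status responses standing in for a real polling API.
    responses = iter([
        {"tested": 10, "estimated_total": 100},
        {"tested": 60, "estimated_total": 100},
        {"tested": 100, "estimated_total": 100},
    ])
    final = wait_for_query(lambda: next(responses), sleep=lambda _: None)
    print(final)
```

Bounding the loop with `max_polls` keeps a stalled query from blocking the pipeline indefinitely. In a real flow, the data pull would follow the successful status check, with a second completeness check on the returned rows.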
Vince’s flow met all of our criteria and was the clear contest winner. The flow is complete and adheres to NiFi best practices, being both efficient and highly secure. By using pagination, this dataflow ensures a complete result set is available from a data source with highly variable query execution times. It’s deployable, has clear business value, and serves as a great example of universal data distribution in action. Congratulations, Vince!
Runner-up
Ramakrishna Sanikommu was our runner-up, and he described his submission in a post of his own. RK built some simple flows to pull streaming data into Google Cloud Storage and Snowflake. Many developers use DataFlow to filter and enrich streams and ingest them into cloud data lakes and warehouses, where the ability to process and route anywhere makes DataFlow very effective. RK built several flows quickly, first pulling multiple data sources from a Google Pub/Sub topic and merging them into a file for ingestion into GCS. He then built a second flow to execute a Python script and load the data into Snowflake. His flows adhered to best practices and demonstrated some light transformations. RK also wisely used the DataViewer to view the contents of a queue.

Figure 2: Ramakrishna’s first flow consuming data from Google PubSub and ingesting it into Google Cloud Storage

Figure 3: Ramakrishna’s second flow reading data from Google Cloud Storage and ingesting it into Snowflake
Summary and looking ahead
In less than 10 years since its inception, NiFi has achieved massive scale both in terms of popularity and the size of deployments. NiFi’s origins, however, were quite simple: for any two systems to work together, there are quite a few things that have to agree. They must not only speak some common data language but also account for myriad things like relevance, security, priority, authorization, and so on. NiFi was built as a kind of Swiss Army Knife to quickly connect different systems and coordinate dataflows from one to another using an intuitive no-code development canvas.
Since acquiring the company primarily responsible for maintaining the NiFi code base in 2015, Cloudera has continued to pour resources into the Open Source project, which now boasts more than 500 contributors across the globe and thousands of active community members in Slack. NiFi has evolved considerably, staying ahead of security vulnerabilities and adding connectors with releases every quarter. The “Best in Flow” contest was a lot of fun, and demonstrated the appetite for community around Apache NiFi. Here at Cloudera we’re excited to host future events for NiFi developers, so stay tuned to find out what’s next. To test drive Cloudera DataFlow yourself, request a trial of Cloudera Data Platform in the Public Cloud: https://www.cloudera.com/campaign/try-cdp-public-cloud.html