Using the Apache Solr REST API in CDP Public Cloud


Summary

The Apache Solr cluster is available in CDP Public Cloud, using the “Data exploration and analytics” data hub template. In this article we investigate how to connect to the Solr REST API running in the Public Cloud, and highlight the performance impact of session cookie configurations when Apache Knox Gateway is used to proxy the traffic to Solr servers. The information in this blog post can be useful for engineers developing Apache Solr client applications.

The Apache Solr servers in the Cloudera Data Platform (CDP) expose a REST API, protected by Kerberos authentication. Normally, all of the Solr server instances can handle traffic when the Solr cluster is running in distributed mode. The Solr server that receives the request from the client forwards the query to all the servers handling shards for the collection and combines the results before sending the response back to the client. For scalability, it is best to distribute the queries among the Solr servers in a round-robin fashion.
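As an illustration of the round-robin idea, the following minimal Python sketch cycles through a list of Solr servers so that each new query goes to the next one. The host names, the collection name and the port 8985 are assumptions made for the example and are not taken from a real cluster.

```python
import itertools


def round_robin_urls(hosts, collection, port=8985):
    """Return an endless iterator over the select URLs of the given Solr hosts.

    The port default and the URL layout are assumptions for this sketch;
    check the actual values for your own cluster.
    """
    urls = [f"https://{host}:{port}/solr/{collection}/select" for host in hosts]
    return itertools.cycle(urls)


# Hypothetical worker hosts and collection name.
targets = round_robin_urls(
    ["worker0.example.com", "worker1.example.com", "worker2.example.com"],
    "demo_collection",
)
first_four = [next(targets) for _ in range(4)]
```

After three queries the iterator wraps around, so the fourth query goes to the first server again.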

When Solr is deployed in the public cloud using the “Data exploration and analytics” data hub template, there are two ways to reach the Solr cluster from a separate client host. The first, easier way is to reach Solr using Knox Gateway as a proxy. The Apache Knox Gateway is a system that provides a single point of authentication and access for Apache Hadoop services in a cluster. In the CDP Data Hub cluster Knox accepts HTTP basic authentication, so CDP users can use their workload or machine user credentials for authentication. Based on these credentials Knox forwards the requests to the Solr servers in round-robin, using Kerberos and the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO) on behalf of the authenticated end user. (See Figure 1)

Figure 1. Sending Solr queries to the Solr cluster through Knox Gateway

When we connect to Solr through Knox, the Knox Gateway sets the KNOXSESSIONID cookie in the HTTPS response. This cookie can be reused and set in each subsequent request, which can drastically improve the performance of handling Solr requests.
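A minimal Python sketch of this pattern, using only the standard library, could look like the following. The opener keeps a cookie jar, so a cookie such as KNOXSESSIONID received in the first response is sent back automatically with later requests. The Knox base URL, the collection name and the credentials are hypothetical placeholders; the real Knox endpoint for Solr should be taken from your own data hub cluster.

```python
import base64
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener(cookie_jar):
    """Return an opener that stores response cookies and resends them."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )


def build_select_request(base_url, collection, query, user, password):
    """Build a Solr select request carrying an HTTP basic auth header."""
    params = urllib.parse.urlencode({"q": query, "wt": "json"})
    url = f"{base_url.rstrip('/')}/{collection}/select?{params}"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_queries(opener, request, count):
    """Send the same request several times through one opener.

    Because the opener shares one cookie jar, cookies set by the first
    response are reused by the following requests.
    """
    bodies = []
    for _ in range(count):
        with opener.open(request) as response:
            bodies.append(response.read())
    return bodies


# Hypothetical endpoint and workload credentials, for illustration only.
jar = http.cookiejar.CookieJar()
opener = build_cookie_opener(jar)
request = build_select_request(
    "https://knox.example.com/my-cluster/cdp-proxy-api/solr",
    "demo_collection", "*:*", "workload_user", "workload_password",
)
# run_queries(opener, request, 10)  # needs a reachable Knox Gateway
```

After a successful call, iterating over `jar` shows which cookies the gateway has set, which is a quick way to confirm that the session cookie is actually being reused.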

Another way is to connect to any Solr server instance directly, using HTTPS with SPNEGO authentication. In this case the Knox Gateway is not used. Setting up this connection can be more challenging, because basic authentication is not possible and Kerberos credentials are required. Also, if the Solr client host is outside of the CDP environment, then all Solr server ports on the worker hosts need to be exposed. (See Figure 2)
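As a rough sketch of the direct approach, the Python snippet below wraps the “curl” command with SPNEGO enabled. It assumes that a Kerberos ticket has already been obtained (for example with kinit), that curl was built with GSS-API support, and that the host name, port and collection in the URL are placeholders. The optional cookie file lets curl store and resend any cookies returned by the server.

```python
import subprocess


def build_curl_command(url, cookie_file=None):
    """Build a curl command line that authenticates with SPNEGO.

    "--negotiate" with an empty "--user :" makes curl use the Kerberos
    ticket from the current credential cache.
    """
    command = ["curl", "--silent", "--negotiate", "--user", ":"]
    if cookie_file is not None:
        # Read cookies from the file and write new ones back to it.
        command += ["--cookie", cookie_file, "--cookie-jar", cookie_file]
    command.append(url)
    return command


def query_solr(url, cookie_file=None):
    """Run curl and return the response body as text."""
    result = subprocess.run(
        build_curl_command(url, cookie_file),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical Solr worker host, port and collection.
example = build_curl_command(
    "https://worker0.example.com:8985/solr/demo_collection/select?q=*:*",
    cookie_file="/tmp/solr_cookies.txt",
)
```

Depending on the cluster's TLS setup, an extra option such as `--cacert` may be needed so that curl trusts the server certificate.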

Figure 2. Sending Solr queries directly to a Solr Server instance

Benchmarking

To measure the performance of the Solr API, we developed a small performance benchmark script and executed it from a gateway node of the data hub cluster. The benchmark script is available under the Apache 2.0 license in this repository.

The following table and graph present our benchmark results. We executed short Solr queries on a very small Solr collection. We varied the number of parallel threads (1..10) and on each thread we executed 100 Solr REST calls using the “curl” command. We tested the Solr API both directly (connecting to a single given Solr server without load balancing) and using Knox (connecting to Solr through a Knox Gateway instance). We repeated the tests both with and without reusing the cookies sent back in the HTTPS responses. In all cases, the benchmark script was running on the gateway host of the Solr data hub cluster.
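The measurement loop described above can be approximated with a short Python sketch. This is not the benchmark script from the repository, only an illustration of the same idea: a configurable number of threads each perform a fixed number of calls, and the average response time and the overall throughput are computed at the end. The `call` argument is a hypothetical placeholder for whatever function sends a single Solr request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(call, threads, calls_per_thread=100):
    """Run `call` from several threads and summarize the timings."""

    def worker(thread_index):
        # Each thread issues its calls one after the other.
        latencies = []
        for _ in range(calls_per_thread):
            start = time.perf_counter()
            call()
            latencies.append(time.perf_counter() - start)
        return latencies

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_thread = list(pool.map(worker, range(threads)))
    wall_time = time.perf_counter() - wall_start

    latencies = [value for chunk in per_thread for value in chunk]
    return {
        "calls": len(latencies),
        "avg_response_s": sum(latencies) / len(latencies),
        "throughput_per_s": len(latencies) / wall_time,
    }


# Stand-in for a real Solr request, so the sketch runs without a cluster.
def fake_call():
    time.sleep(0.001)


summary = run_benchmark(fake_call, threads=3, calls_per_thread=5)
```

Running the same function with and without a shared cookie store, for thread counts from 1 to 10, would give numbers comparable in shape to the ones discussed in this section.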

Table 1: Performance benchmark results (average response time and throughput) showing the effect of cookie reuse between subsequent Solr API calls. Colors of the cells correspond to lines visualized in Figure 3.

 

Figure 3: Performance benchmark results (average response time) showing the effect of cookie reuse between subsequent Solr API calls. Colors of the lines correspond to colors used in Table 1.

Our results clearly show how important it is to use the KNOXSESSIONID cookie when connecting to Solr through the Knox Gateway. When the cookie is set, the performance is basically the same as with a direct connection, suggesting that the Knox Gateway is not the bottleneck for this particular benchmark. However, without setting KNOXSESSIONID we get a very significant performance degradation. This is caused by the fact that the Knox Gateway needs to authenticate each HTTPS request one by one, whereas when the cookie is set Knox can rely on the earlier authentication.

Conclusion

We described two ways to connect to the Solr REST API in the CDP Public Cloud; hopefully the information in this blog post will help you choose the best one for your project. Connecting through Knox is preferable because the Knox Gateway provides load balancing and also eases authentication by eliminating the need for client-side Kerberos configuration. A direct connection to the Solr server instances is also possible and might be a good approach if the Knox Gateway becomes a bottleneck or if the extra routing step made by Knox proves to add too much latency to the traffic. However, for most cases we suggest starting the project by using Knox Gateway to reach Solr, mainly because setting up a secure connection and load balancing for direct Solr access can be more challenging. Using the KNOXSESSIONID cookie can help to reach performance similar to the direct setup.
