How to Access Cloudera Director to Spin Up a Cloudera Cluster in AWS, Azure or Google Cloud

I have installed Cloudera Director on an EC2 instance in AWS so that I can use it to spin up a cluster in AWS, Azure or Google Cloud. To access Cloudera Director, follow the steps below:

  1. Set up a SOCKS proxy server with SSH
  2. Configure your browser to use SOCKS proxy

Set up a SOCKS proxy server with SSH

ssh -i "pem_key" -CND 8157 ec2-user@ip-address-of-ec2-instance
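
To confirm the tunnel is up before touching the browser, a small check like the following can help. It is only a sketch: the helper names build_tunnel_command and is_port_open are made up for illustration, and the key path and host are placeholders. Only the ssh flags and port 8157 come from the command above.

```python
import socket


def build_tunnel_command(key_path, host, port=8157, user="ec2-user"):
    """Return the ssh argument list for the SOCKS tunnel shown above.

    -C compresses traffic, -N runs no remote command, and -D opens a
    dynamic (SOCKS) forward on the given local port.
    """
    return ["ssh", "-i", key_path, "-C", "-N", "-D", str(port),
            "{}@{}".format(user, host)]


def is_port_open(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except (OSError, socket.error):
        return False
    sock.close()
    return True


if __name__ == "__main__":
    # Placeholders: substitute your own key file and instance address.
    print(" ".join(build_tunnel_command("pem_key", "ip-address-of-ec2-instance")))
    print("SOCKS proxy listening: {}".format(is_port_open("localhost", 8157)))
```

If the second line reports False while the ssh command is running, the browser proxy will not work either, so it is worth checking the tunnel first.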

Configure your browser to use SOCKS proxy
For the Chrome browser:
    1. Install the "Proxy Helper" extension
    2. In the Proxy Helper settings, set the SOCKS proxy to localhost, port 8157 (the port given to -D in the ssh command above)
    3. Start the browser proxy

3. In the browser, go to private-ip-of-ec2:7189
This should bring up the Cloudera Director login page. If you are logging in for the first time, the username and password are both admin.
4. Follow the wizard to spin up a cluster, as described in this manual - https://www.cloudera.com/documentation/director/latest/PDF/cloudera-director.pdf
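
Step 3 can also be checked without a browser. The sketch below builds the Director URL and a proxy mapping for the tunnel on port 8157, then fetches the page through it. It assumes the third-party requests library with SOCKS support (pip install "requests[socks]"), which is not part of the setup described here. The names director_url, socks_proxies and fetch_login_status are illustrative, and the IP address is a placeholder.

```python
def director_url(private_ip, port=7189):
    """Build the Cloudera Director URL from the instance's private IP."""
    return "http://{}:{}".format(private_ip, port)


def socks_proxies(local_port=8157, host="localhost"):
    """Proxy mapping that sends HTTP(S) traffic through the SSH SOCKS tunnel.

    The socks5h scheme asks the proxy side to resolve hostnames.
    """
    proxy = "socks5h://{}:{}".format(host, local_port)
    return {"http": proxy, "https": proxy}


def fetch_login_status(private_ip):
    """Return the HTTP status code of the Director page via the tunnel."""
    import requests  # assumption: installed with the [socks] extra

    response = requests.get(director_url(private_ip),
                            proxies=socks_proxies(), timeout=10)
    return response.status_code
```

With the tunnel running, calling fetch_login_status("10.0.0.5") (a placeholder address) and getting a status code back, rather than a connection error, suggests that both the tunnel and Director are reachable.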

HDFS with Python

Today I was researching the available APIs for using HDFS with Python. Two seem to be popular: hdfs and snakebite. I pip installed hdfs but later found out that it doesn't support High Availability (HA). Then I started reading the documentation for snakebite and found that it supports both High Availability and Kerberos. I pip installed snakebite[kerberos], as this is the variant of snakebite that supports Kerberos. Note that you also need to pip install python-krbV before installing snakebite with Kerberos support. Here is the step-by-step process:

<pre>
<code>
pip install python-krbV
pip install "snakebite[kerberos]"
</code>
</pre>

I used the API's AutoConfigClient class to test it. Here is the code:

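<pre>
<code>
from snakebite.client import AutoConfigClient

# use_sasl=True enables Kerberos (SASL) authentication
client = AutoConfigClient(use_sasl=True)
print(type(client))   # confirm the client class
print(dir(client))    # list the available methods
print(client.df())    # file system usage summary
</code>
</pre>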
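
Once the client connects, listing a directory is a natural next test. The sketch below wraps the client's ls() call, which takes a list of paths and yields one dict per entry. The helper names list_directory and main are illustrative, and the code assumes the entry dicts carry the path, file_type and length keys.

```python
def list_directory(client, path="/"):
    """Return sorted (path, file_type, length) tuples for entries under path.

    client is expected to behave like a snakebite client, whose ls()
    takes a list of paths and yields one dict per entry.
    """
    entries = client.ls([path])
    return sorted((e["path"], e["file_type"], e["length"]) for e in entries)


def main():
    # Needs snakebite installed and a reachable, Kerberized cluster.
    from snakebite.client import AutoConfigClient

    client = AutoConfigClient(use_sasl=True)
    for path, file_type, length in list_directory(client, "/"):
        print("{} {} {}".format(file_type, length, path))
```

Calling main() on a cluster node, with a valid Kerberos ticket in place, should print one line per entry in the HDFS root.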

AutoConfigClient looks for the conf folder under the directory given by the HADOOP_HOME environment variable. I set HADOOP_HOME to "/etc/hadoop" on my Cloudera cluster, so the configuration was read from /etc/hadoop/conf, and it worked with no issues.
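
If the client fails to connect, it can help to check what is actually under HADOOP_HOME before creating it. This is a minimal sketch, assuming the files of interest are core-site.xml and hdfs-site.xml inside the conf folder. The helper name hadoop_conf_files is made up for illustration.

```python
import os


def hadoop_conf_files(hadoop_home):
    """Return the Hadoop config files found under <hadoop_home>/conf."""
    conf_dir = os.path.join(hadoop_home, "conf")
    names = ("core-site.xml", "hdfs-site.xml")  # assumed file names
    return [os.path.join(conf_dir, n) for n in names
            if os.path.isfile(os.path.join(conf_dir, n))]


if __name__ == "__main__":
    # Set the variable for this process before creating AutoConfigClient.
    os.environ.setdefault("HADOOP_HOME", "/etc/hadoop")
    found = hadoop_conf_files(os.environ["HADOOP_HOME"])
    print("Config files found: {}".format(found or "none"))
```

Setting the variable through os.environ only affects the current Python process, which is enough for a quick test. For permanent use, export it in the shell profile instead.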

snakebite also comes with a command line interface (CLI). Just run the command "snakebite" with no arguments and it will list all the commands available in the CLI.