Anaconda Virtual Environment in Python

You can create a Python virtual environment for your Anaconda distribution of Python using the "conda" tool that ships with it. Running "conda -h" on the command line lists the commonly used conda commands. "conda -V" prints the installed version of conda, which you can update with "conda update conda".

To create an anaconda virtual environment:

conda create -n yourenvname python=x.x anaconda

To activate the virtual environment (conda 4.4 and later also provide "conda activate yourenvname"):

source activate yourenvname

The above command prepends your Python virtual environment to your path. To verify, run "which python" and you will see that it points to the python in your virtual environment.
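
The effect of prepending a directory to the path can be seen without conda at all. The following is a minimal sketch, assuming a POSIX shell; the directory fake_env and its stub python are hypothetical stand-ins for a real environment's bin directory, not something conda creates.

```shell
# Build a throwaway directory that plays the role of an environment's bin/.
fake_env=$(mktemp -d)
mkdir -p "$fake_env/bin"
printf '#!/bin/sh\necho "python from fake_env"\n' > "$fake_env/bin/python"
chmod +x "$fake_env/bin/python"

# Prepending puts this directory ahead of every other entry,
# so the shell finds its python first.
old_path=$PATH
PATH="$fake_env/bin:$PATH"
resolved=$(command -v python)
echo "$resolved"

# Restoring the old value is the analogue of deactivating.
PATH=$old_path
```

The path printed is the stub inside fake_env, which is the same check that "which python" performs against a real environment.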

To de-activate the virtual environment:

source deactivate

Get a list of all anaconda virtual environments:

conda env list
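
When scripting around environments, it can help to test whether a name is present in that list. The sketch below assumes the usual layout of the "conda env list" output, where comment lines start with "#" and the first column of every other line is the environment name; env_in_list is a hypothetical helper, not a conda command.

```shell
# Read `conda env list` output on stdin and succeed only if the
# first column of some non-comment line equals the given name.
env_in_list() {
    awk -v name="$1" '
        /^#/ { next }                 # skip header comments
        $1 == name { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Typical use (requires conda on the PATH):
#   conda env list | env_in_list yourenvname && echo "exists"
```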

Delete a virtual environment:

conda remove -n yourenvname --all

Conda's official documentation:
https://conda.io/docs/index.html

Working With Kerberos Enabled Hadoop Cluster

Setting the Hadoop UserGroupInformation in Java Programs


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Tell the Hadoop client to use Kerberos, then log in from the keytab.
Configuration conf = new Configuration();
conf.set("hadoop.security.authentication", "Kerberos");
UserGroupInformation.setConfiguration(conf);
// loginUserFromKeytab throws IOException if the login fails.
UserGroupInformation.loginUserFromKeytab("example_user@IBM.COM", "/path/to/example_user.keytab");

Accessing HCatalog with MapReduce

I was running a MapReduce job in which one of the mappers used HCatalog to access a Hive metastore, and I was getting the following error:


17/02/07 14:04:25 WARN hive.metastore: set_ugi() not successful, Likely cause: new client talking to old server. Continuing without it.
org.apache.thrift.transport.TTransportException
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:380)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:230)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:77)
...

I created a new directory in my home directory, added the conf files core-site.xml, hdfs-site.xml, mapred-site.xml, and hive-site.xml to it, and then added the directory to my classpath:


CLASSPATH=/home/user_name/hadoop_conf:$CLASSPATH

This resolved the problem. I was running my MapReduce job with the following command:


hadoop jar jar-file -conf /home/user_name/hadoop_conf/hive-site.xml \
    -conf /home/user_name/hadoop_conf/mapred-site.xml
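
Because a missing file in the conf directory tends to surface only later as an obscure runtime error, a small pre-flight check can be worthwhile. This is a sketch under the assumption that the four files named above are the ones the job needs; check_conf_dir is a hypothetical helper.

```shell
# Succeed only if every expected Hadoop/Hive conf file is present in the directory.
check_conf_dir() {
    dir=$1
    status=0
    for f in core-site.xml hdfs-site.xml mapred-site.xml hive-site.xml; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Typical use before launching the job:
#   check_conf_dir /home/user_name/hadoop_conf || exit 1
#   export CLASSPATH=/home/user_name/hadoop_conf:$CLASSPATH
```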

Running a shell script as a certain user

To run a shell script as a particular service user, add the following line at the beginning of the script. The kinit command obtains a Kerberos ticket for that user from the keytab, so the rest of the script runs with that user's credentials. This is especially helpful when the script runs as a cron job.


kinit -k -t path/to/keytab user_name
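
For a cron job it is usually worth failing early when the keytab is unreadable or the login does not succeed. The following is a sketch, not a definitive pattern: login_from_keytab and the KINIT override variable are hypothetical names introduced here, and only the "kinit -k -t" invocation comes from the line above.

```shell
# Obtain a ticket from a keytab, returning non-zero if anything goes wrong.
# KINIT can be overridden (for example in tests); it defaults to the real kinit.
login_from_keytab() {
    keytab=$1
    principal=$2
    if [ ! -r "$keytab" ]; then
        echo "cannot read keytab: $keytab" >&2
        return 1
    fi
    "${KINIT:-kinit}" -k -t "$keytab" "$principal"
}

# Typical use at the top of a script:
#   login_from_keytab path/to/keytab user_name || exit 1
```

Stopping the script when the login fails keeps the later Hadoop commands from running with a stale or missing ticket.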

However, to switch users from the command line, you have to destroy the existing ticket first:


kdestroy
kinit -k -t path/to/keytab user_name   # for a keytab user
kinit user_name@REALM                  # for a non-keytab user
kinit                                  # for a non-keytab user; should default to your user_name and REALM

Running a MapReduce Job that uses HFileOutputFormat2

HFileOutputFormat2 is used in MapReduce jobs to generate HFiles, the storage file format that backs HBase. The HBase-specific configuration files that this type of job needs are hbase-site.xml and hbase-default.xml. hbase-site.xml comes with the HBase installation, while hbase-default.xml is bundled inside one of the HBase jar files. There are two ways to override a configuration property: (1) programmatically and (2) by manually editing the file.

To change a property from hbase-site.xml programmatically, you can do something like the following in your MapReduce code:


Configuration hbase_conf = HBaseConfiguration.create();
hbase_conf.set("hbase.zookeeper.quorum", "localhost");

Alternatively, you can copy the XML files to a local directory, edit them, and point your MapReduce job at the edited versions. For example, the default location of the HBase staging directory is defined by the property "hbase.fs.tmp.dir" in hbase-default.xml and points to "/user/${user.name}/hbase-staging" in HDFS. To change that location, copy the original hbase-site.xml to your own location and edit it to add the following property:


<property>
  <name>hbase.fs.tmp.dir</name>
  <value>/home/tmp_dir</value>
</property>

Then point your job to the edited conf file.
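
The copy-and-edit step can also be scripted. The sketch below assumes the file follows the usual Hadoop layout with a single closing </configuration> tag on its own line; add_property is a hypothetical helper, and the values passed to it should not contain characters that need XML escaping.

```shell
# Print a copy of a Hadoop-style XML config with one extra <property>
# element inserted just before the closing </configuration> tag.
add_property() {
    file=$1
    name=$2
    value=$3
    awk -v n="$name" -v v="$value" '
        /<\/configuration>/ {
            print "  <property>"
            print "    <name>" n "</name>"
            print "    <value>" v "</value>"
            print "  </property>"
        }
        { print }
    ' "$file"
}

# Typical use (the source path is a placeholder):
#   add_property /path/to/original/hbase-site.xml hbase.fs.tmp.dir /home/tmp_dir > ./hbase-site.xml
```

The helper writes to standard output rather than editing in place, so the original hbase-site.xml that ships with the installation stays untouched.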