Running Tensorflow in a Docker Container on Ubuntu

I've found that the easiest and most hassle-free way of running the GPU version of Tensorflow is with Docker, because you don't have to worry about compatibility between CUDA versions and Tensorflow versions. This is a tutorial for using Tensorflow in a Docker container on Ubuntu. If you want to use Tensorflow in a Docker container on Mac or Windows, the process is even simpler than this tutorial: you just install the Docker Desktop application instead of Docker Engine.

Install Docker CE on Ubuntu

Uninstall old versions

sudo apt-get remove docker docker-engine docker.io containerd runc

Set up the repository

sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

Add Docker’s official GPG key

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Verify the fingerprint

sudo apt-key fingerprint 0EBFCD88

Add the repository

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Verify the Docker CE installation

sudo docker run hello-world

Install nvidia-docker2 for GPU support (only necessary if you plan to run Tensorflow on a GPU)

If you have nvidia-docker 1.0 installed, you need to remove it, along with all existing GPU containers, first:

docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo apt-get purge -y nvidia-docker

Add the package repositories

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) 

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

sudo apt-get update 

Install nvidia-docker2 and reload the Docker daemon configuration

sudo apt-get install -y nvidia-docker2 

sudo pkill -SIGHUP dockerd 

Test nvidia-smi with the latest official CUDA image

docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

Get the Tensorflow docker image from Docker Hub

# run one of the following commands based on your need.
# Tag variants are -gpu, -py3 and -jupyter

sudo docker pull tensorflow/tensorflow:latest-gpu-py3 #pulls the latest tensorflow docker image with GPU support and python3

sudo docker pull tensorflow/tensorflow:latest #pulls the latest tensorflow docker image with CPU only support and python2

Instantiate your container from your tensorflow docker image

# for tensorflow CPU image with python 2
sudo docker run -it --name=mytfcontainer tensorflow/tensorflow:latest bash

# for tensorflow GPU image with python 3
sudo docker run -it --runtime=nvidia --name=diwanshu1 tensorflow/tensorflow:latest-gpu-py3 bash
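
If you want a quick sanity check that Tensorflow can actually see the GPU from inside the container, here is a minimal sketch. It assumes the TensorFlow 1.x API that the latest tags shipped at the time of writing; run it in the container's Python shell.

# verify GPU visibility from inside the container (TensorFlow 1.x API assumed)
import tensorflow as tf
print(tf.__version__)
print(tf.test.is_gpu_available())  # prints True if the nvidia runtime is wired up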

There you have it. Enjoy using Tensorflow in Docker.

SQL, Pandas and Spark

Most of us are familiar with writing database queries in SQL. But there are other ways to query your data, whether it sits in a database or in a file: one is the Python package Pandas, another is Apache Spark. Both are very popular in the Data Science field these days. If your data fits in memory on a single computer, I'd suggest using Pandas. If the data is big and you need to process it in memory on a distributed system, Apache Spark is the technology to use. People who are familiar with Hadoop but not with Spark may be more inclined to use traditional MapReduce to process big data, and that is fine, but Spark comes with built-in packages that let you process your data in a SQL-like manner, which ends up saving a lot of development time. Today I'm going to compare SQL queries with their Pandas and Spark equivalents, so if you end up using these technologies, hopefully this will make it slightly easier to get your head around them. Note that I'll be showing Spark examples with the Python API; the equivalents are available in Spark's Java and Scala APIs as well.

Employee Table/Dataframe

Id | Employee_Name  | Social_Security_Number | Department_Id | Salary
1  | Roger Martin   | 546-98-1987            | 2             | 65000
2  | Robert Waters  | 437-781-4563           | 1             | 70000
3  | Michael Peters | 908-809-0897           | 1             | 75000

Organization Table/Dataframe

Id | Department_Name
1  | Data Science
2  | Finance
3  | Human Resources
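
For reference, here is a minimal sketch of how you could construct these dataframes yourself in Pandas and Spark. The variable names and the local Spark session are assumptions for illustration; the examples below use the name Employee for both the Pandas and the Spark dataframe, so the Spark versions are suffixed with _spark here only to keep the sketch runnable as one script.

import pandas as pd
from pyspark.sql import SparkSession

columns = ['Id', 'Employee_Name', 'Social_Security_Number', 'Department_Id', 'Salary']
rows = [(1, 'Roger Martin', '546-98-1987', 2, 65000),
        (2, 'Robert Waters', '437-781-4563', 1, 70000),
        (3, 'Michael Peters', '908-809-0897', 1, 75000)]
departments = [(1, 'Data Science'), (2, 'Finance'), (3, 'Human Resources')]

# Pandas dataframes
Employee = pd.DataFrame(rows, columns=columns)
Organization = pd.DataFrame(departments, columns=['Id', 'Department_Name'])

# Spark dataframes holding the same data (a local session is assumed)
spark = SparkSession.builder.master('local[*]').getOrCreate()
Employee_spark = spark.createDataFrame(rows, columns)
Organization_spark = spark.createDataFrame(departments, ['Id', 'Department_Name'])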

Column Selection

SQL:
select Employee_Name, Department_Id
from Employee

Pandas:
Employee[['Employee_Name', 'Department_Id']]

Spark:
Employee.select('Employee_Name', 'Department_Id')
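
One difference worth keeping in mind: the Pandas expression returns the new dataframe immediately, while Spark transformations are lazy and only run when you call an action such as show(). For example:

# Spark's select() is lazy; show() is the action that triggers execution
Employee.select('Employee_Name', 'Department_Id').show()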

Row Selection

SQL:
select * from Employee
where Department_Id = 1

Pandas:
mask = Employee['Department_Id'] == 1
Employee[mask]

Spark:
Employee.where(col('Department_Id') == 1)
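
The Spark snippets here and in the next sections assume the column helpers have been imported from pyspark.sql.functions:

# imports assumed by the Spark examples
from pyspark.sql.functions import col, mean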

Group by

SQL:
select Department_Id, avg(Salary)
from Employee
group by Department_Id

Pandas:
Employee[['Department_Id', 'Salary']].groupby(['Department_Id']).mean()

Spark:
Employee.groupBy('Department_Id').agg(mean('Salary'))

Joins

SQL:
select t1.Employee_Name, t2.Department_Name
from Employee t1, Organization t2
where t1.Department_Id = t2.Id

Pandas:
import pandas as pd
pd.merge(Employee, Organization, how="inner", left_on='Department_Id', right_on='Id')[['Employee_Name', 'Department_Name']]

Spark:
joinexpr = Employee['Department_Id'] == Organization['Id']
Employee.join(Organization, joinexpr, "inner").select('Employee_Name', 'Department_Name')
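
And as mentioned at the start, Spark also lets you skip the dataframe API entirely and run the SQL query itself. A minimal sketch, assuming the Spark dataframes and session from above: register the dataframes as temporary views and pass the same SQL string to spark.sql().

# register the Spark dataframes as temp views so they can be queried with plain SQL
Employee.createOrReplaceTempView('Employee')
Organization.createOrReplaceTempView('Organization')

spark.sql("""
    select t1.Employee_Name, t2.Department_Name
    from Employee t1, Organization t2
    where t1.Department_Id = t2.Id
""").show()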