
Containerized Jupyter Lab Authentication for Emacs-IPython-Notebook

In my earlier post, I described how to run Apache Spark in a container with a Jupyter notebook. While I like using Jupyter notebooks to document my work and share it with colleagues, I still prefer to write code in Emacs rather than a web browser. The answer is EIN, the Emacs IPython Notebook. Installing EIN is straightforward; the tricky part is making it work with the containerized Jupyter.

Once you enter EIN, you will need the command “ein:login” to connect to Jupyter. Internally EIN talks to the server over a websocket, and if you log in with just your password, as you would from a browser, EIN throws a websocket-expired error and you will not be able to execute any code or create a new notebook. One fix for this problem is to use the Jupyter token as the password. The token can be found with the command below:

host$ docker exec spark-jupyter jupyter server list
Currently running servers:
http://1e2480237424:8888/?token=4899803e93c22739a8de56fa4deed22aef6568f93025c901 :: /home/jovyan
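
With the token in hand, the login flow inside Emacs looks roughly like this (a sketch; the exact prompts depend on your EIN version, and the URL assumes the 8888 port mapping used below):

M-x ein:login RET
URL or port: http://127.0.0.1:8888 RET
Password: 4899803e93c22739a8de56fa4deed22aef6568f93025c901 RET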

Jupyter can use either a token or a password for user authentication, depending on how it is configured. By default, the token changes on every start, which makes it more secure but also harder to use. In a secured environment, we can fix the token by defining the JUPYTER_TOKEN environment variable (first line below) and passing it to the Docker container when starting Jupyter (second line):

host$ export JUPYTER_TOKEN="4899803e93c22739a8de56fa4deed22aef6568f93025c901"

host$ docker run -d --name spark-jupyter -p 8888:8888 -p 4040:4040 -p 4041:4041 -e JUPYTER_TOKEN=$JUPYTER_TOKEN -v $PWD:/home/jovyan jupyter/all-spark-notebook
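
You can verify that the fixed token took effect by listing the running servers again:

host$ docker exec spark-jupyter jupyter server list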

A better configuration is to run the “jupyter server password” command in the container (the password subcommand was introduced in Jupyter Notebook 5.0), which saves a hashed password in /home/jovyan/.jupyter/jupyter_server_config.json. Then you only need to enter that same password when logging in. Though you can also disable both the password and the token, I would recommend against it.
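
For example, assuming the container is named spark-jupyter as above, you can set the password from the host without opening a shell inside the container:

host$ docker exec -it spark-jupyter jupyter server password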

To enable inline image display in EIN/Emacs, just put this line in your .emacs file:

(setq ein:output-area-inlined-images t)

How to Run Apache Spark from a Container

Apache Spark is a large-scale data analytics engine that can utilize distributed computing resources. It supports common data science languages such as Python and R; its Python support is provided through the PySpark package. Some advantages of using PySpark over a plain Python stack (NumPy and pandas) are:

  • Speed. Spark distributes work across multiple machines, so you don’t have to write your own parallel computing code; speedups of up to 100x have been reported.
  • Scale. You can develop code on a laptop and deploy it on a cluster to process data at scale.
  • Robustness. It won’t crash if some nodes are taken offline during execution.

Other features, such as Spark SQL, Spark ML, and support for streaming data sources, bring additional advantages.
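
To give a flavor of the API, here is a minimal PySpark sketch you could run inside the Jupyter container described below (the CSV path and column names are hypothetical, for illustration only):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Start (or reuse) a Spark session; in the jupyter/all-spark-notebook
# container this works out of the box.
spark = SparkSession.builder.appName("demo").getOrCreate()

# Hypothetical input file and columns.
df = spark.read.csv("/home/jovyan/sales.csv", header=True, inferSchema=True)

# The same DataFrame code runs unchanged on a laptop or on a cluster.
df.groupBy("region").agg(F.sum("amount").alias("total")).show()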

After a quick tryout of the Spark container image from Bitnami, I moved on to another image released by the Jupyter Docker Stacks project, which has good documentation. To run the container, expose the Jupyter notebook, and share the current host directory with the container, use this command:

docker run -d --name spark -p 80:8888 -p 4040:4040 -p 4041:4041 -v ${PWD}:/home/jovyan jupyter/all-spark-notebook
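
Once the container is up, Jupyter prints its access URL and token to the container logs, which you can read with (assuming the container is named spark, as above):

docker logs spark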

If you need to install additional packages on top of the provided container image, you can either go inside the container (“docker exec -it spark /bin/bash”) and install them there, or modify the original docker-compose.yml file.
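
For a quick, non-interactive install (the package name here is just an example, and such changes do not survive re-creating the container), you can also run pip through docker exec:

docker exec spark pip install plotly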