Once you have followed all these installation instructions, you will be able to run Jupyter notebooks by simply typing the following command in a shell (or DOS prompt):
jupyter notebook
The following webpage lists several Jupyter tricks (in particular, it illustrates many IPython magic commands) that should improve your efficiency. Note that this blog post is about two years old, so some of the tricks may by now be part of Jupyter's default behavior.
The Jupyter environment we deployed for this MOOC lets you easily access any file from your default GitLab project. There are situations, however, where you may want to play with other notebooks.
To create a notebook in a directory of your choice, simply follow these steps:
From the menu: File -> Open. You're now in the Jupyter file manager.
Navigate to the directory where you want your notebook to be created.
Then from the top right button: New -> Notebook: Python 3.
Give your notebook a name from the menu: File -> Rename.
N.B.: If you create a file by doing File -> New Notebook -> Python 3, the new notebook will be created in the current directory. Moving it afterward is possible but a bit cumbersome (you'll have to go through the Jupyter file manager by following the menu File -> Open, then select it, Shut it down, and Move and/or Rename).
If your notebook is already in your GitLab project, then simply synchronize by using the Git pull button and use the File -> Open menu. Otherwise, imagine you want to import the following notebook from someone else's repository to re-execute it.
Open raw (a small </> within a document icon) and save (Ctrl-S on most browsers) the content (a long JSON text file).
Go to File -> Open and navigate to the directory where you want to upload your notebook.
Upload the previously downloaded notebook and confirm the upload.
You will find here a list of Jupyter notebooks that illustrate how different languages (Python, R, SAS) can be used in Jupyter.
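Because a raw notebook is just a JSON text file, a quick check before uploading can catch the common mistake of saving the surrounding web page instead of the raw content. The following is a minimal sketch using only the Python standard library. The function name `looks_like_notebook` and the file name in the usage comment are hypothetical, and the check only looks for the top-level `cells` and `nbformat` keys of the notebook format.

```python
import json
from pathlib import Path


def looks_like_notebook(path):
    """Return True if the file at `path` parses as JSON and has the
    top-level keys that a Jupyter notebook carries."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Unreadable file, or not JSON at all (e.g. an HTML page was saved).
        return False
    return isinstance(data, dict) and "cells" in data and "nbformat" in data


# Usage (hypothetical file name):
# looks_like_notebook("downloaded_notebook.ipynb")
```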
Using both R and Python in the same notebook used to be impossible with earlier versions of Jupyter, but it is now very easy thanks to the rpy2 package (see the details of the installation procedure in the corresponding section below). Simply open a new Python notebook and follow these instructions:
Loading rpy2:
Using the %R IPython magic:
Python objects can then even be passed to R as follows (assuming df is a pandas dataframe):
Note that this %%R notation indicates that R should be used for the whole cell, but another possibility is to use %R to have a single line of R within a Python cell.
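As a sketch, these steps could look as follows in successive notebook cells, assuming rpy2 is installed. The `cars` dataset ships with R, and the contents of `df` are made up for illustration. The `# Cell N` lines are only labels: each `%%R` line must be the first line of its own cell.

```python
# Cell 1 (Python): load the rpy2 extension, which provides %R and %%R
%load_ext rpy2.ipython

# Cell 2 (R): the whole cell is interpreted as R code
%%R
summary(cars)

# Cell 3 (Python): build a small pandas dataframe (made-up data)
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [2.0, 4.1, 5.9]})

# Cell 4 (R): -i df passes the Python object df to R
%%R -i df
summary(df)

# Cell 5 (Python): a single line of R inside a Python cell
%R print(nrow(df))
```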
Here is a notebook example using both R and Python.
Jupyter is not limited to Python and R. Many other languages are available: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels, including non-free languages like SAS, Mathematica, Matlab… Note that the maturity of these kernels differs widely.
None of these other languages have been deployed in the context of our MOOC but you may want to read the next sections to learn how to set up your own Jupyter on your computer and benefit from these extensions.
SAS is a proprietary statistical software which is very commonly used in health research. Since the question was asked several times, if you really need to stay with SAS, you should know that SAS can be used within Jupyter using either the Python SASKernel (similar to the IRKernel) or the Python SASPy package (similar to the rpy2 package).
Since proprietary software such as SAS cannot easily be inspected, we discourage its use, as it inherently hinders reproducibility. But perfection does not exist anyway, and using Jupyter's literate programming approach together with systematic version control and environment control will certainly help.
Install saspy with the pip command. E.g.,
python -m pip install saspy
On Windows, you will have to modify the file C:\Program Files\Python\Python37\Lib\site-packages\saspy\sascfg_sav.py and adapt it to your own system. In the following two screenshots, the left window corresponds to the initial file and the right window corresponds to the modified one:
Here is an example of a Python/SAS notebook.
NB: Exporting from HTML (and therefore SAS) to PDF format via LaTeX does not work (notebookhtmlexportpdfvialatex.pdf). Until now, the workaround was to go through Pandoc: export the notebook as HTML (or Markdown) in Jupyter and then run the following command:
pandoc --variable=geometry:a4paper --variable=geometry:margin=1in notebook_sas.html -o notebook_sas.pdf
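If you have several exported notebooks to convert, the same Pandoc invocation can be scripted. The sketch below only wraps the command shown above with Python's `subprocess` module. The function names are hypothetical, and it assumes that `pandoc` and a PDF engine it can use are installed and on your PATH.

```python
import subprocess


def pandoc_pdf_command(html_path, pdf_path):
    """Build the argument list for the Pandoc command shown above."""
    return [
        "pandoc",
        "--variable=geometry:a4paper",
        "--variable=geometry:margin=1in",
        html_path,
        "-o",
        pdf_path,
    ]


def export_pdf(html_path, pdf_path):
    """Run Pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_pdf_command(html_path, pdf_path), check=True)


# Usage: export_pdf("notebook_sas.html", "notebook_sas.pdf")
```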
The notebook-as-pdf extension now allows direct export from HTML format to PDF format (notebookhtmlexportpdfviahtml.pdf) and therefore the export of SAS notebooks in PDF format (notebookJupyterSAS.pdf). (Choose download PDF via HTML.)
Useful link: https://sassoftware.github.io/saspy/
The sas_kernel is based on saspy, so first install saspy by following the previous instructions.
Install the sas_kernel package through pip. E.g.,
python -m pip install sas_kernel
You will then be able to create SAS notebooks.
Please note the top right SAS icon.
Here is an example of a SAS notebook.
Useful link: https://sassoftware.github.io/sas_kernel/install.html
Many plugins can make your life easier when using Jupyter. The official ones are gathered here.
Here are a few that we find particularly useful:
Code folding to improve readability when browsing the notebook.
pip3 install jupyter_contrib_nbextensions
# jupyter contrib nbextension install --user # not done yet
Hiding code to improve readability when exporting.
sudo pip3 install hide_code
sudo jupyter-nbextension install --py hide_code
jupyter-nbextension enable --py hide_code
jupyter-serverextension enable --py hide_code
Then in Jupyter, choose Hide_code in the menu.
You should then obtain this:
You should then use the icons to export rather than going through the menu:
NB: In the first edition of the MOOC some people had issues making it work under Windows.
Table of Contents: The toc(2) extension also greatly improves the navigation and folding capabilities of Jupyter.
In this section, we explain how to set up a Jupyter environment on your own computer similar to the one deployed for this MOOC.
First, download the most recent version of Miniconda. Miniconda is a lightweight edition of Anaconda, a software distribution that includes Python, R, Jupyter, and many popular libraries for scientific computing and data science.
On our server, we use version 4.5.4 of Miniconda and version 3.6 of Python. In theory, you could download the environment file mooc_rr and reproduce an identical environment on your own computer. Unfortunately, our server was set up in 2018, and conda has changed quite a bit since then. Reconstructing this environment is therefore no longer possible. Below, we show how to get an equivalent environment using more recent versions of all the software.
Install Miniconda following the supplied instructions. If the installer asks you the following question (it does not always do so):
Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]
answer yes. You will then see the advice
=> For changes to take effect, close and re-open your current shell. <=
which you must respect to make sure that the following steps work correctly.
Important: You should then run all the following commands through the conda shell. As explained in the Anaconda documentation, to open the Anaconda prompt:
The first command to run next is
conda update -n base -c defaults conda
which updates conda itself, along with its dependencies, in the base environment.
We can now create a conda environment for the Jupyter path of our MOOC:
conda create -n mooc-rr-jupyter
and activate it:
conda activate mooc-rr-jupyter
It is not strictly necessary to activate an environment in order to use it, but doing so makes the use of the environment easier and less error prone. You have to perform this activation step every time you open a new terminal, before you can work with the environment.
The next step is to install all the software packages we need that are available in the Miniconda distribution:
conda install jupyter python numpy matplotlib pandas r r-irkernel rpy2 tzlocal simplegeneric
We also need two packages that are not in Miniconda. We request the first one from the independent package source conda-forge:
conda install -c conda-forge r-parsedate
and the second one from the main Python code repository, PyPI:
pip install isoweek
You can now start Jupyter:
jupyter notebook
and work with our examples and exercises.
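If a notebook later fails on an import, a quick way to see what is missing from the environment is to ask Python which modules it can find. This is only a sketch: the function name is hypothetical, and it assumes that each package installed above exposes a top-level module of the same name.

```python
import importlib.util

# Top-level module names assumed for the Python packages installed above.
EXPECTED_MODULES = [
    "numpy", "matplotlib", "pandas", "rpy2",
    "tzlocal", "simplegeneric", "isoweek",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Usage, inside the activated environment:
# print(missing_modules(EXPECTED_MODULES))  # an empty list means all were found
```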
For exporting your notebooks as PDF files, you must also install LaTeX on your system. We describe this process in a separate resource.
To ease your experience, we added pull/push buttons that allow you to commit and sync with GitLab. This development was specific to the MOOC but inspired by a previous proof of concept. We have recently discovered that someone else developed, at about the same time, a rather generic version of this Jupyter plugin. Otherwise, remember that it is very easy to insert a shell cell in Jupyter in which you can issue git commands. This is how we work most of the time. If you choose this solution, you will have to configure Git on your computer. To do this, you can follow the video Configure git for Gitlab and read the document Git and GitLab.
This being said, you may have noticed that Jupyter keeps a perfect track of the sequence in which cells have been run by updating the "output index". This is a very good property from the reproducibility point of view but, depending on your usage, you may find it a bit painful when committing. Some people have thus developed specific git hooks to ignore these numbers when committing Jupyter notebooks. There are long and interesting discussions about various options on StackOverflow, the Jupyter Forum, and NextJournal.
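To make concrete what such hooks do, here is a minimal sketch (not one of the tools from those discussions) that blanks the execution counters of a notebook using only the standard library. The function names are hypothetical. It assumes the nbformat 4 layout, where each code cell has an `execution_count` field and outputs of type `execute_result` carry one as well.

```python
import json


def clear_execution_counts(notebook):
    """Set every execution counter in a notebook dict to None, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            # Only outputs of type "execute_result" carry a counter.
            if output.get("output_type") == "execute_result":
                output["execution_count"] = None
    return notebook


def clean_notebook_text(text):
    """Take notebook JSON as text and return it with counters cleared."""
    return json.dumps(clear_execution_counts(json.loads(text)), indent=1) + "\n"
```

A git filter or pre-commit hook could pass each notebook through `clean_notebook_text` before it is committed. How to wire that up depends on which of the options discussed above you choose.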
For those who use JupyterLab rather than the plain Jupyter, a specific JupyterLab git plugin has been developed to offer a nice version control experience.