Table of Contents
- Introduction
- Features (aka “What’s that powerful about it”)
  - Embed code blocks in any language
  - Results can be exported to many formats, like latex (demo) or this post.
  - Programmable documents (aka “Literate programming”)
  - Built in excel alternative
  - Pass data between languages
  - Outline view is powerful for organizing your work
  - Navigate to code and between org files with ctags.
  - Many more
- Installation
- Troubleshooting
- My workflow
- Examples
  - Org table to pandas and plotting
  - Org table -> Pandas -> Org table
  - Share code between code blocks
  - TODO Connect to existing ipython kernel
  - TODO Use global constant
  - TODO Data frame sharing with org tables
  - TODO Pass data directly between languages
  - TODO Different language kernels
  - Examples from other blog posts
- Additional configuration I plan to do
- Further reading
Introduction
Emacs org-mode with ob-ipython is the most powerful data analysis environment I have ever used. I find it much more powerful than the other tools I have used, including the Jupyter and Beaker web notebooks, or just writing Python in PyCharm.
Emacs org mode with ob-ipython is like a Jupyter or Beaker notebook, but in Emacs instead of a browser, and with many more features.
The word “Emacs” may be scary. There are pre-packaged and pre-configured Emacs distributions with a much smaller learning curve, my favorite being Spacemacs (I am in the process of rebasing my config on it). You can use just 1% of Emacs’s capabilities (probably the majority of Emacs users do not get near 10% of them) and still benefit from it.
If you are going to bring up the common quote “Emacs is a fine operating system, but it lacks a decent text editor”: Emacs now has a decent text editor, thanks to the vim emulation evil-mode. It’s the best vim emulation in existence, and many vim packages have even been ported to it. Spacemacs is a nice Emacs distribution that bundles evil-mode.
I will try to introduce and describe org mode with ob-ipython for users who have never used Emacs before.
Since this blog post was written in org mode, reading it linearly in the exported format is a less optimal experience than reading the org file directly in org mode.
Features (aka “What’s that powerful about it”)
Embed code blocks in any language
You can embed source code and evaluate it with C-c C-c. The results of evaluating your source code are appended after the source code block. A result can be text (including an org table) or an image (charts).
What’s more, you can have an org file and an ipython console open side by side. In ipython, reading Python docstrings and code completion work well. See my screenshot in the My workflow section.
Since ob-ipython uses Jupyter, you can get the same environment for anything that has a Jupyter kernel, including MATLAB, Scala, Spark, R and many more.
Results can be exported to many formats, like latex (demo) or this post.
This blog post is just an export of an org mode file via org2blog. All code examples were written in org mode using the workflow described in this post.
Exporting works to formats like HTML, LaTeX (native and Beamer), Markdown, JIRA, ODT (which can be imported into Google Docs and Word), wiki formats and many more.
Syntax highlighting can be preserved for some exports, like HTML or LaTeX.
You can learn just one way to edit documents and presentations that can be exported to the majority of formats on earth.
Programmable documents (aka “Literate programming”)
Emacs org mode with org babel is a full-fledged literate programming environment. Some people have published whole books or research papers as a large executable document in org. There is even a research paper about it (see Further reading).
The book Python Computations in Science and Engineering can be read in org mode, and it’s a far better book reading experience than anything I have had before. I can tweak and re-run code examples, link to it from my other notes, tag or bookmark interesting sections, jump between sections and much more.
When writing LaTeX in college, I recall being halfway through a document, coming up with the idea of some parameter tweak, and suddenly having to re-generate all the charts.
With org mode, the document is generated programmatically. Not only can you easily re-generate it, but readers of your document can tweak parameters or supply their own data set and re-generate the whole document.
Another example is training a machine learning model. You can define your model parameters as org constants, tweak a parameter, and have separate org mode headings for things like “performance statistics”, “top misclassified cross-validation samples”, etc. An added benefit is that you can commit all of this to git.
As soon as you learn org mode, all of it is easy and seamless.
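To make the “tweak a parameter, re-run, everything updates” idea concrete, here is a toy sketch in Python. Everything in it is hypothetical: the THRESHOLD parameter, the made-up samples and the helper names. In an org document the parameter would sit in its own block near the top (or arrive through a :var header argument), each function would live under its own heading, and re-running all blocks (for example with org-babel-execute-buffer) would refresh every result after a change.

```python
# Hypothetical model parameter. In an org document it could live in its own
# src block near the top; the helpers below read it at call time.
THRESHOLD = 0.5

# Toy (sample id, model score, true label) triples, made up for illustration.
SAMPLES = [
    ("a", 0.91, 1),
    ("b", 0.40, 1),
    ("c", 0.65, 0),
    ("d", 0.10, 0),
    ("e", 0.55, 1),
]


def predict(score, threshold):
    """Classify a score as 1 if it reaches the threshold, else 0."""
    return 1 if score >= threshold else 0


def performance_statistics(samples, threshold=None):
    """Body of a "performance statistics" heading: accuracy on the samples."""
    threshold = THRESHOLD if threshold is None else threshold
    correct = sum(predict(score, threshold) == label
                  for _, score, label in samples)
    return correct / len(samples)


def top_misclassified(samples, threshold=None, n=3):
    """Body of a "top misclassified samples" heading.

    Returns ids of wrongly classified samples, most confidently wrong first
    (largest distance between score and threshold).
    """
    threshold = THRESHOLD if threshold is None else threshold
    wrong = [(abs(score - threshold), sample_id)
             for sample_id, score, label in samples
             if predict(score, threshold) != label]
    return [sample_id for _, sample_id in sorted(wrong, reverse=True)[:n]]


print(performance_statistics(SAMPLES))
print(top_misclassified(SAMPLES))
```

With THRESHOLD = 0.5 this prints 0.6 and ['c', 'b']; changing the parameter and re-running updates both “headings” at once.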
Built in excel alternative
Sometimes just “manually” editing the data is the most productive thing to do. You can do that with org mode’s spreadsheet capabilities on org tables.
The added benefit is that formulas can be written in Emacs Lisp (as an alternative to the default Calc syntax), which is a cooler and more powerful language than Visual Basic. Tables can also be converted to other formats with translator functions: http://orgmode.org/manual/Translator-functions.html
Integration with pandas
My current Table->Pandas->Table workflow works. It is somewhat clunky, but it can be improved. See examples section.
Integration with other formats
You can export org tables to many formats by loading them into pandas and then using a pandas exporter. That said, org itself supports SQL, CSV, LaTeX and HTML exporters.
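If CSV is all you need, Python’s standard csv module is a lighter alternative to going through pandas. A minimal sketch, assuming the table arrives in Python as a list of rows with the header row first (the shape the examples section indexes with table[0] and table[1:]); table_to_csv is a hypothetical helper name:

```python
import csv
import io


def table_to_csv(table):
    """Serialize a list of rows (header row first) to CSV text."""
    buf = io.StringIO()
    # lineterminator="\n" keeps Unix line endings; csv defaults to "\r\n".
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(table)
    return buf.getvalue()


# Rows shaped like the date/x/y/z table in the examples section.
table = [["date", "x", "y", "z"], [1, 1, 1, ""], [2, 2, 2, ""]]
print(table_to_csv(table), end="")
```

To write a file instead, pass a file opened with open("table.csv", "w", newline="") to csv.writer in place of the StringIO buffer.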
Pass data between languages
Similar functionality is offered by the Beaker notebook.
I have found that using org mode as an intermediate format for data sometimes works better for me.
Since the intermediate format for a data frame is the org table, I can import a data frame into org, edit it as a spreadsheet and export it back. See Pass data directly between languages in the examples section.
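Inside a src block, babel’s :var argument normally hands the table to Python for you. If you want to read an org table from plain text outside of babel (say, from an .org file in a standalone script), a small parser is enough. A minimal sketch under that assumption; org_to_rows is a hypothetical name. It keeps only lines starting with |, drops horizontal rules, and converts numeric-looking cells:

```python
def _convert(cell):
    """Turn a cell string into an int or float when it looks like one."""
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            pass
    return cell


def org_to_rows(text):
    """Parse org table text into a list of rows, skipping |---+---| rules."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|") or line.startswith("|-"):
            continue  # not a table row, or a horizontal rule
        # Drop the outer pipes; keep empty cells such as "|   |".
        inner = line[1:-1] if line.endswith("|") else line[1:]
        rows.append([_convert(cell.strip()) for cell in inner.split("|")])
    return rows


text = """
| date | x | y   |
|------+---+-----|
|    1 | 1 | 1   |
|    2 | 2 | 2.5 |
|    4 | 3 |     |
"""
print(org_to_rows(text))
```

The result has the same list-of-rows shape as the table the examples pass into pandas, so it can go straight into pd.DataFrame(rows[1:], columns=rows[0]).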
Outline view is powerful for organizing your work
Org mode’s outline view is very handy for organizing your work. When working on a larger problem, I only focus on a small subset of it at a time. Org mode lets me expand just the sections that are currently relevant.
I also find embedding TODO items in the tree quite handy. When I encounter a problem, I mark the subtree as TODO, and later I can show just the headlines with TODO items (a sparse tree, C-c / t).
Many more
You don’t have to use all features offered by org mode.
Embed latex formulas
Also works in html export with mathjax.
Fast integration with source control
I like to keep my notes in source control. To avoid the overhead of separate commits, I use magit-mode. Out of the box you can commit directly from Emacs with 6 keystrokes. With a few lines of elisp you can auto-generate commit messages or commit automatically based on some condition (e.g. on save, when a file is closed, or from focus-out-hook).
Everything in org is plain text, including the results of evaluating code blocks, so it works well with source control.
Run a webserver that lets people do basic editing of your org files in the browser
Spaced repetition framework (remember all those pesky maths formulas)
If you are like me, you have forgotten a lot of maths formulas since college. Spaced repetition is a learning methodology that helps you avoid forgetting important facts like maths formulas. I recommend this very good general post about spaced repetition by gwern.
People primarily use spaced repetition for learning words in new languages, but I use it for maths formulas or technical facts.
There are spaced repetition tools like Anki or SuperMemo, but as soon as you want advanced features like LaTeX support, they support them very badly (IMO) or not at all.
org-drill is a spaced repetition framework in org mode that allows you to use all of the org features for creating flash cards.
Calendar
Managing papers citations
Tagging
Links
Agenda views
Even more
I only mentioned some of the features I use or plan to use soon. There are many more. Some URLs to look at:
(browse-url-emacs
"http://kitchingroup.cheme.cmu.edu/org/2014/08/08/What-we-are-using-org-mode-for.org")
Installation
Install Emacs (with vim emulation)
Although I don’t use it yet, I recommend Spacemacs, a pre-configured Emacs distribution, the “Ubuntu” of Emacs.
Install python packages
If you don’t run these, you may run into trouble.
pip install --upgrade pip
pip install --upgrade ipython
pip install --upgrade pyzmq
pip install --upgrade jupyter
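Before moving on, a quick way to see which of these packages Python can actually import. A minimal sketch; note that import names differ from pip names (pyzmq is imported as zmq), and the REQUIRED list is my assumption of what is worth checking, including jupyter_client and ipykernel, which the jupyter meta-package installs:

```python
import importlib.util

# Import names to check (assumed list; pyzmq installs the "zmq" module).
REQUIRED = ["IPython", "zmq", "jupyter_client", "ipykernel"]


def missing_modules(names=REQUIRED):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same python that ob-ipython will use; an empty “missing” list means the pip step worked for that interpreter.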
Install ob-ipython
Org mode should be bundled with your Emacs installation. ob-ipython itself is on MELPA; if you are new to Emacs, you can install packages using M-x package-install.
Elisp configuration
Add to your Emacs config:
(require 'org)
(require 'ob-ipython)
;; don't prompt me to confirm every time I want to evaluate a block
(setq org-confirm-babel-evaluate nil)
;;; display/update images in the buffer after I evaluate
(add-hook 'org-babel-after-execute-hook 'org-display-inline-images 'append)
Troubleshooting
Check whether restarting the ipython kernel helps:
(ob-ipython-kill-kernel)
Open the “Python” buffer to see Python errors.
Toggle elisp debug on error:
(toggle-debug-on-error)
My workflow
I settled on a workflow of having two buffers open side by side: the org file on one side and the ipython console on the other.
I experiment with commands in the ipython console, and I copy the permanent results I want to remember or share with people back into the org src block.
Both windows reuse the same ipython kernel, so they share variables. You may have multiple kernels running. I have code completion and Python docstrings in the ipython buffer.
Screenshot
Default ipython configuration
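If you want to run some code in every ipython kernel you start, you can add it to ~/.ipython/profile_default/startup. For example, to avoid adding %matplotlib inline to each source code block:
echo "%matplotlib inline" >> ~/.ipython/profile_default/startup/66-matplot.ipy
Use the .ipy extension: startup files ending in .py are run as plain Python, where %matplotlib is a syntax error, while .ipy files may contain IPython magics.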
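If you prefer to keep the startup file as plain Python (.py), magic syntax is not allowed there, but the same magic can be invoked through IPython’s API. A sketch, assuming a hypothetical file name like 66-matplot.py in the same startup directory:

```python
# Hypothetical ~/.ipython/profile_default/startup/66-matplot.py.
# A .py startup file must be plain Python, so the magic is invoked through
# the IPython API instead of the %matplotlib syntax.
try:
    from IPython import get_ipython
except ImportError:  # IPython not installed: nothing to configure
    def get_ipython():
        return None

ip = get_ipython()  # None when not running inside an IPython shell
if ip is not None:
    ip.run_line_magic("matplotlib", "inline")
```

The None check keeps the file harmless if it is ever run by a plain Python interpreter.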
TODO Configure yasnippet
The ob-ipython docs suggest yasnippet for inserting source blocks. So far I have been using custom elisp code, but a few things could be nicer with yasnippet.
# -*- mode: snippet -*-
# name: ipython block
# key: py
# --
#+BEGIN_SRC ipython :session ${1::file ${2:$$(let ((temporary-file-directory "./")) (make-temp-file "py" nil ".png"))} }:exports ${3:both}
$0
#+END_SRC
Examples
Org table to pandas and plotting
date | x | y | z |
---|---|---|---|
1 | 1 | 1 | |
2 | 2 | 2 | |
4 | 3 | 3 | |
8 | 4 | 4 | |
16 | 5 | 30 | |
32 | 6 | 40 | |
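import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
# `table` is the org table above (header row first), passed in
# through the src block's :var header argument.
df = pd.DataFrame(table[1:], columns=table[0])
df.plot()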
Org table -> Pandas -> Org table
You have to write a small reusable snippet to print pandas data frames in org format. You can add it to your IPython startup code. You also need to tell the src block to insert the printed output directly as org markup with :results output raw drawer; :noweb yes lets the block include the helper code through a noweb reference.
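def arr_to_org(arr):
    """Format one row as an org table line, e.g. |a|b|c|."""
    line = "|".join(str(item) for item in arr)
    return "|{}|".format(line)

def df_to_org(df):
    """Format a DataFrame as org table text: header line, then one line per row."""
    return "\n".join([arr_to_org(df.columns)] +
                     [arr_to_org(row) for row in df.values])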
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
df = pd.DataFrame(table[1:], columns=table[0])
df["x"] = df["x"] * 2  # double every value in column x
print(df_to_org(df))
date | x | y | z |
1 | 2 | 1 | |
2 | 4 | 2 | |
4 | 6 | 3 | |
8 | 8 | 4 | |
16 | 10 | 30 | |
32 | 12 | 40 | |
Afterwards, you may assign the result table to a variable, edit it with org spreadsheet capabilities and use it in another Python script.
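df_to_org prints no horizontal rule under the header row. A hypothetical variant, rows_to_org, works on plain lists (the same list-of-rows shape babel passes in) and adds a |---+---| line under the header, so org displays the first row as column names. Org realigns the columns when you press C-c C-c inside the table. A sketch, not part of ob-ipython:

```python
def rows_to_org(rows):
    """Format a header row plus data rows as org table text with a rule line.

    rows: list of lists, header row first.
    """
    header, *body = rows
    lines = ["| " + " | ".join(str(cell) for cell in header) + " |",
             # Horizontal rule separating the header from the data rows.
             "|" + "+".join("---" for _ in header) + "|"]
    lines += ["| " + " | ".join(str(cell) for cell in row) + " |"
              for row in body]
    return "\n".join(lines)


print(rows_to_org([["date", "x"], [1, 2], [4, 6]]))
```

For a DataFrame, the same helper can be fed [list(df.columns)] + df.values.tolist().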
TODO Connect to existing ipython kernel
I added support for connecting to an existing ipython kernel in https://github.com/gregsexton/ob-ipython/pull/71/files.
You can start an ipython kernel on a server with lots of RAM and CPU and connect to it from a lightweight local machine running Emacs.
Create the kernel with this script (run it outside of org mode, as it blocks):
#!/usr/bin/env python
from ipykernel.kernelapp import IPKernelApp

app = IPKernelApp.instance()
app.initialize([])
kernel = app.kernel
# Define a variable in the kernel so we can check we connected to this one.
kernel.shell.push({'print_me': 'Running in previously started kernel.'})
app.start()  # blocks while the kernel runs
It prints the name of a connection JSON file. Pass it as the session name:
#+BEGIN_SRC ipython :session kernel-8520.json
print(print_me)
#+END_SRC
Running in previously started kernel.
TODO Use global constant
TODO Data frame sharing with org tables
TODO Pass data directly between languages
Create my example based on http://minimallysufficient.github.io/2015/10/24/org-mode-as-an-alternative-to-knitr.html
TODO Different language kernels
This should work, provided a Jupyter kernel named clojure is installed:
#+BEGIN_SRC ipython :session :kernel clojure
(+ 1 2)
#+END_SRC
#+RESULTS:
: 3
Examples from other blog posts
Press C-c C-c on a block to open the org file directly in Emacs:
(browse-url-emacs
"https://raw.githubusercontent.com/dfeich/org-babel-examples/master/python/pythonbabel.org")
(browse-url-emacs
"https://raw.githubusercontent.com/dfeich/org-babel-examples/master/python/ipython-babel.org")
Additional configuration I plan to do
Problems I did not resolve yet:
TODO ob-ipython-inspect in popup
Currently it opens a separate buffer. I would prefer a popup.
TODO Configure org-edit-src-code to use ipython completion.
Currently, code completion only works for me in the ipython buffer. It seems doable to configure it in the source edit buffer as well.
TODO Capture results from ipython to src block.
To avoid manual copying between the ipython buffer and the source code block, I could implement an ob-ipython-capture function that would add the last command executed in the ipython console to the src block. Keyboard macros can work across buffers, so this could be a simple keyboard macro, but I haven’t tried it out yet.
TODO Figure out why SVG doesn’t work
To produce an SVG graphic rather than a PNG, you can set the output format globally in IPython:
%config InlineBackend.figure_format = 'svg'
Further reading
Official documentation of ob-ipython
Open org directly in Emacs:
(browse-url-emacs
"https://raw.githubusercontent.com/gregsexton/ob-ipython/master/README.org")
Research paper: An Effective Git And Org-Mode Based Workflow For Reproducible Research
Search by DOI 10.1145/2723872.2723881 on sci hub.
A whole research group at CMU runs on org mode
An interesting case is the research group of John R. Kitchin in the Chemical Engineering department at CMU, run mostly using org mode, with papers, assignments and books written in org mode.
Org mode for managing your server configuration
“Literate devops”