Overview
What is this?
In short, setting up Python (and other things) for scientific computing and research can be far more complicated than necessary. With that said, this aims to be a short how-to guide pointing to some resources that can make life much easier. This post is geared towards Political Scientists coming from either 1) using R as a programming language or 2) having no programming and minimal computing experience. I have tried most of the things listed here myself, but I make no guarantees that anything will work properly or won’t mess something up when you try it. As with anything, proceed with caution and at your own risk.
This how-to is mainly geared towards OS X, but many of the suggestions should also work on Linux (and are probably easier). I don’t have any experience setting up Windows and would probably suggest looking into dual booting Linux (see here for more). Downloading Ubuntu to a CD and setting up a dual boot is extremely easy.
I’ll be adding to this as I have time and think of different things that have helped me. I know this post is long, but there is a large amount of information to share, and I think it is easier to get a lot of it in one place, rather than spread out.
Basics
I’m going to start from the assumption that if you’re reading this you’ve never dealt with a command-line interface, or that if you have, you’ve only been briefly exposed (maybe through things like R or Stata). So, before working with these sorts of things it’s helpful to get acquainted with the Terminal (Command Prompt in Windows). On a Mac you should be able to find it in the Utilities folder inside your Applications folder. Go ahead and drag the app to your dock. Having it there will make life easier.
If you open it up you will see something like
John-B-MacBook-Pro:~ john$
The word (john here) right before the $ is your username. There are some basic commands for working with the Terminal:
- ls shows you all of the files and folders in your current working directory
- ls -a shows all files and folders including those that are hidden
- cd allows you to move from one location to another
- mv lets you move files
- cp is copying
and many, many more. A basic workflow is as below:
$ ls
Applications Desktop Documents
$ cd Documents
$ ls -a
test.txt .hidden.txt
$ mv test.txt /Users/john/Desktop
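Since the rest of this guide is about Python, here is a rough sketch of the same workflow done from Python instead of the Terminal. It assumes a modern Python 3 interpreter (newer than the 2.7 discussed later), and the folder and file names are made up to mirror the session above. Everything happens inside a temporary directory, so nothing on your real disk is touched.

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway "home" folder that mimics the session above.
# All of these names are invented for illustration.
home = Path(tempfile.mkdtemp())
documents = home / "Documents"
desktop = home / "Desktop"
documents.mkdir()
desktop.mkdir()
(documents / "test.txt").write_text("hello\n")
(documents / ".hidden.txt").write_text("secret\n")

# Like `ls`: names in the folder, skipping hidden files (names starting with a dot).
visible = sorted(p.name for p in documents.iterdir() if not p.name.startswith("."))

# Like `ls -a`: every file name, hidden ones included.
everything = sorted(p.name for p in documents.iterdir())

# Like `mv test.txt <Desktop>`: move the file into another folder.
shutil.move(str(documents / "test.txt"), str(desktop / "test.txt"))
moved = (desktop / "test.txt").exists() and not (documents / "test.txt").exists()

print(visible)
print(everything)
print(moved)

# Clean up the throwaway folder.
shutil.rmtree(home)
```

Instead of using cd to change location, the sketch passes full paths around, which tends to be less error-prone in scripts.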
When in doubt, Google what you want to do followed by “Terminal.” So, something like “remove file terminal” or “copy file terminal.” A word of warning: you’re “closer to the metal” when using a command-line interface. This means that you have a lot of power and flexibility when working with things, but it also means you have the ability to completely wipe your hard drive if you type the wrong command. Be careful.
You should probably go ahead and install Xcode. This is included on the install CD that came with your Mac, or it can be downloaded from the Mac App Store. Make sure that you’re using the correct version for your version of OS X. This will install some tools that will be necessary later.
Linux has an awesome concept called package managers, which allow you to easily install different applications and, well, packages by typing something like “sudo apt-get install package” into the Terminal, after which things are automagically downloaded and installed. Macs lack this functionality out of the box. But! Some enterprising individuals have come up with a way to help. By heading to http://mxcl.github.com/homebrew/ you can download a package manager, Homebrew, that makes life much easier. Feel free to look around for utilities that can be installed using homebrew that might be of use to you. As a word of advice, if you are on the verge of installing something, first check whether it is available through homebrew, since brew keeps things nice and organized in your /usr/local folder instead of spread all over your computer.
Python
OS X comes with Python preloaded, and the operating system requires it for many functions. This is good and bad. Good since you can type
$ python
and get up and running in an interactive session. It’s also bad because the structure of the Python installation on OS X can create some difficulties with certain libraries. This leaves two options: First, you can go with the default Python implementation. This will necessitate (sort of) the use of the Scipy Superpack http://fonnesbeck.github.com/ScipySuperpack/. The Superpack installs nearly every awesome Python library that your scientific researcher heart could desire. As a brief rundown of what each package does:
Numpy and Scipy
The heart of numerical computing in Python. These two libraries give array and matrix functions along with many other cool things. Numpy is short for Numeric Python and Scipy is short for Scientific Python. Many other libraries in Python are dependent on these. Much (digital) ink has been spilled on using these two so feel free to search around for more on how an array is different than a matrix in Numpy (hint: You should probably use an array).
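As a small illustration of the array hint, the sketch below (assuming Numpy is installed and imported under the usual np alias) shows that * on arrays multiplies entry by entry, while the dot method does true matrix multiplication. Numpy's separate matrix class instead overloads * to mean matrix multiplication, which is a common source of confusion.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

# On arrays, * multiplies matching entries.
elementwise = a * a

# Matrix multiplication on arrays is spelled out explicitly with dot.
matrix_product = a.dot(a)

print(elementwise)
print(matrix_product)
```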
Matplotlib
Plotting functions in Python. Allows you to make pretty graphs.
IPython
Stands for Interactive Python. When running python from the Terminal you should type ipython instead of python. IPython gives many different magic functions and has all kinds of neat goodies in it that generally make life easier.
Pandas
The best thing since sliced bread and pockets on jeans. Brings R-style dataframe functionality to Python. Supports complex indexing for panel data, computes various statistics such as moving averages, and includes functions for reading and writing data in various formats. It has some awesome documentation so go check it out.
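To give a flavor of the panel indexing and moving averages mentioned above, here is a minimal sketch, assuming Pandas is installed. The countries, years, and GDP numbers are made up for illustration.

```python
import pandas as pd

# Made-up panel data: two countries observed over three years.
df = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2002, 2000, 2001, 2002],
    "gdp": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
})

# Index by country and year, a common shape for panel data.
panel = df.set_index(["country", "year"])

# Select a single observation by its (country, year) pair.
gdp_a_2001 = panel.loc[("A", 2001), "gdp"]

# Two-year moving average, computed separately within each country.
# The first year of each country has no earlier value, so it comes out as NaN.
panel["gdp_ma2"] = panel.groupby(level="country")["gdp"].transform(
    lambda s: s.rolling(window=2).mean()
)

print(gdp_a_2001)
print(panel)
```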
Statsmodels
Statistical models in Python. This one is pretty self-explanatory but is tremendously useful and is more intuitive than R in many ways.
Scikit-learn
Machine learning in Python. Has some of the most comprehensive documentation around, including a series of tutorials on how to get started with machine learning in general.
PyMC
Bayesian inference in Python. MCMC and more.
Other Utilities
nose, readline and DateUtils. Things that are useful for other packages. You can read up on these more if you would like. Of importance, however, is nose. Nose is a testing suite for Python that allows you to see if anything is wonky in your installation. You can (maybe) get away with skipping these, but it never hurts. Look up the different tests for each of the utilities if you want to run them.
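Before running any test suites, a quick way to see which of these libraries your interpreter can actually find is something like the sketch below. It assumes Python 3.4 or newer (not the 2.7 discussed in this post), and the list holds import names, which sometimes differ from project names: scikit-learn is imported as sklearn and DateUtils as dateutil, and the name pymc is an assumption that depends on which PyMC release you have.

```python
import importlib.util
import sys

# Import names for the libraries described above (see the caveats in the text).
PACKAGES = ["numpy", "scipy", "matplotlib", "IPython", "pandas",
            "statsmodels", "sklearn", "pymc", "nose", "dateutil"]


def is_installed(module_name):
    """Return True if this interpreter can find a module with that name."""
    return importlib.util.find_spec(module_name) is not None


def report(packages=PACKAGES):
    """Map each import name to True or False depending on whether it is found."""
    return {name: is_installed(name) for name in packages}


# Show which Python is running, then which libraries it can see.
print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])
for name, found in report().items():
    print(name, "found" if found else "MISSING")
```

This only checks that a library can be located, not that it works correctly, so it complements the test suites rather than replacing them.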
In all honesty it is probably easiest to use the Superpack. The other option is to download another Python distribution that has all of these things included plus a “vanilla” (non-Apple) build of Python. Some examples of these are:
- Enthought Python https://enthought.com/products/edudownload.php
- Python(x,y) https://code.google.com/p/pythonxy/
But really, just use the Superpack.
Other things
There are some other things that are useful (read: necessary) to use Python in any meaningful way. One of the most important is setuptools.
To install setuptools:
1) Go to http://pypi.python.org/pypi/setuptools. Download the .egg file located towards the bottom of the page. Assuming your Python version is 2.7, you would download (as of 08/29/2012) setuptools-0.6c11-py2.7.egg.
2) Place it on your desktop. Do NOT change the name.
3) Cd to your desktop in the Terminal and run the .egg file with sh:
$ cd Desktop
$ sh setuptools-0.6c11-py2.7.egg
4) That should be it.
What setuptools allows you to do is type easy_install package and it will install that package for your use in Python. Some people suggest that a program called pip is better because it has additional features such as the ability to easily uninstall packages. To install pip you just type:
$ easy_install pip
That’s right. Pip is installed using easy_install.
Let’s try it out for a library called Scrapy. Scrapy is described as:
“Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.”
Sounds pretty cool. All you do is open up Terminal and type:
$ easy_install Scrapy
Alternatively if you’re using pip:
$ pip install Scrapy
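Once something is installed, you can check from inside Python whether the install took and which version you got. The sketch below assumes Python 3.8 or newer, where importlib.metadata is part of the standard library. The helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(project_name):
    """Return the installed version string of a project, or None if it is absent."""
    try:
        return metadata.version(project_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string for each project that is installed, None otherwise.
for name in ["pip", "setuptools", "Scrapy"]:
    print(name, installed_version(name))
```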
Miscellany
There are some other things that are useful to have when doing this kind of programming/coding/scripting type work that aren’t related directly to working with Python.
Version Control
First is git and github. Git is what’s called a “version control system.” Have you ever been working on a paper and saved your work only to realize that you wrote over some changes that you didn’t mean to? Me too. Git keeps a detailed list of the versions of a file, including any changes or additions made to a specific version, and allows you to roll back to a previous version. So, if you wrote over a file and want to revert back you just have to find the version you want.
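To make the roll-back idea concrete, here is a rough sketch that drives a few real git commands from Python, assuming Python 3.7 or newer and that git is installed. It works in a temporary folder, and the file name, commit messages, and the placeholder name and email are all made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command inside `cwd` and return its output as text."""
    # Placeholder identity (and signing turned off) so the commits below
    # succeed even where git has not been configured yet.
    cmd = ["git",
           "-c", "user.name=Example",
           "-c", "user.email=example@example.com",
           "-c", "commit.gpgsign=false",
           *args]
    done = subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)
    return done.stdout


def demo():
    """Commit two versions of a file, then roll the file back to the first."""
    repo = Path(tempfile.mkdtemp())
    try:
        paper = repo / "paper.txt"
        git("init", cwd=repo)

        paper.write_text("First draft.\n")
        git("add", "paper.txt", cwd=repo)
        git("commit", "-m", "First draft", cwd=repo)

        paper.write_text("Oops, I wrote over my draft.\n")
        git("add", "paper.txt", cwd=repo)
        git("commit", "-m", "Overwrite the draft", cwd=repo)

        # One line per saved version, newest first.
        history = git("log", "--oneline", cwd=repo).splitlines()

        # Restore paper.txt as it was one commit before the latest.
        git("checkout", "HEAD~1", "--", "paper.txt", cwd=repo)
        return history, paper.read_text()
    finally:
        shutil.rmtree(repo, ignore_errors=True)


if shutil.which("git"):
    history, restored = demo()
    print(len(history), "versions saved; restored text:", restored.strip())
else:
    print("git is not installed")
```

The same commands (init, add, commit, log, checkout) can be typed directly into the Terminal; wrapping them in Python here is just a way to keep the example self-contained.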
Github is a “social coding platform.” It’s basically git on the internet. You’re reading this on github right now. Normally git and github involve some (semi) complicated Terminal commands. Github has, however, provided a nice, functional program to use git and github. The Mac version is here http://mac.github.com/.
Text Editors
Text editors can cause a surprising amount of consternation on the internet. A decent editor can make your life easier with features such as syntax highlighting, autoindentation, tab completion, and other features. I won’t say which editor to use, but will give you a list of some of the big ones. (This was written using vim, MacVim to be specific).
Gedit: Standard on Linux distros. Simple. Includes some syntax highlighting.
Emacs: One of the big two editors. Built on LISP. Can basically be whatever you want it to be. Uses extensive, and sometimes complicated, key combinations to get things done.
Vim: The second of the big two. Has different modes such as insert and normal. Takes some getting used to. Has a ton of different add-ons.
Sublime Text 2: Probably more straightforward than Emacs or Vim, but more powerful than gedit. More modern than either Emacs or Vim. Free to try for a bit.
Other
I’ll add a shameless plug here for some code that I wrote, py_apsrtable. This is designed to provide easy functions to take output from Python statistical packages and turn it into pretty LaTeX tables. To install:
$ pip install py_apsrtable
Documentation is on github.
I will also add as a final point that it is probably nice to take a look at “The Zen of Python” by opening up a Python shell (ipython, for example) and typing:
import this
Next, take a look at the Python style guide contained in PEP8 (Python Enhancement Proposal 8). Following these guidelines will allow your code to be consistent with the prevailing style for Python code.
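As a taste of what the style guide asks for, here is a tiny made-up module laid out along PEP8 lines, assuming Python 3.4 or newer for the statistics module. The function, class, and data are invented for illustration. It only touches a few of the conventions: naming, four-space indentation, docstrings, and blank lines between definitions.

```python
"""Tiny example module laid out the way PEP8 suggests."""

import statistics

# Module-level constants are written in UPPER_CASE.
DEFAULT_DIGITS = 2


def mean_vote_share(shares, digits=DEFAULT_DIGITS):
    """Return the mean of `shares`, rounded to `digits` decimal places."""
    # Functions and variables use lowercase_with_underscores,
    # and blocks are indented with four spaces.
    return round(statistics.mean(shares), digits)


class SurveyWave:
    """Class names use CapWords."""

    def __init__(self, year, shares):
        self.year = year
        self.shares = shares

    def summary(self):
        """Return a (year, mean share) tuple for this wave."""
        return self.year, mean_vote_share(self.shares)


print(SurveyWave(2012, [0.25, 0.75]).summary())
```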
I know this was slightly rambling, and there are numerous points that I have missed, but I hope this provides some information that will be useful to those trying to get set up with Python for research. If you have any questions or suggestions, please feel free to contact me.