Getting Started
Install and set up software
To work through the techniques in this book, you will need to download and install some freely available software. As much as possible, we've tried to make everything compatible with Linux, Mac and Windows PCs. We assume that most of our readers will be using Windows, so we've taken the approach of getting a Windows XP version working first, then a Mac version and finally a Linux version. We'd be happy to include instructions for other specific platforms, especially if you want to send them to us. We've also included peer feedback and commentary on the discussion page. If you run into trouble with our instructions or find something that doesn't work on your platform, please let us know. Since this is very much a work-in-progress, we will occasionally make comments and indicate things that are provisional in purple.
Linux instructions
- Thanks to Karin Dalziel! For more info, read the latest version of her notes.
- These instructions are for Ubuntu 7.10 "Gutsy Gibbon". When these instructions were written, Zotero was not yet compatible with Firefox 3. Since it now is, you can probably work with a later version of Ubuntu. We welcome feedback on this.
- Back up your computer
- Install the following Firefox extensions:
- Web developer toolbar
- Extension Developer's Extension
- If you are not already using it, install Zotero.
- To install Python:
- Click on "System" (upper left of the toolbar) -> Administration -> Synaptic Package Manager
- Go to "Settings" -> "Repositories" and make sure all the boxes are checked under the "Ubuntu software" tab
- Enter your password
- Search for "Python" or "Python2.5" (searching just for "Python" helps find the most recent packages, and you can see other useful Python related packages).
- Check the packages "python" and "python2.5" (or whatever the latest number is). You might want to add "python2.5-doc" and "python2.5-examples" too.
- Note: Python is already installed on some (all?) Ubuntu installations.
- Create a directory where you will keep your Python programs. One option is to name it "src" and put it in your home folder (/home/username/src/)
- Again through Synaptic, install the package "python-beautifulsoup"
- As with the Mac and PC versions, you can install the program Komodo Edit. Just go to the website, download the Linux version, double click the file to decompress it, and then read the installation instructions for Linux.
- Start Komodo Edit. If you don't see the Toolbox pane on the right hand side, choose View->Tabs->Toolbox. It doesn't matter if the Project pane is open or not. Take some time to familiarize yourself with the layout of the Komodo editor. The Help file is quite good
- Now you need to set up the editor so that you can run Python programs
- Choose Toolbox->Add->New Command. This will open a new dialog window. Rename your command to "Run Python". Under "Command," use the pulldown menu to select
%(python) %f
- and under "Start in," enter
%D
- Click OK. Your new Run Python command should appear in the Toolbox pane
- Alternatively, you can use Geany, an integrated development environment available through the Synaptic Package Manager. The instructions throughout the tutorials will be slightly different if you do this.
- If you use Geany, there is no "Run Python" button. Instead, save your file as "filename.py" and then click the "execute" button at the top.
- When you run a program it will look like this:
Mac instructions
- Back up your computer
- If you are not already using it, install the Firefox web browser
- Install the following Firefox extensions:
- Web developer toolbar
- Extension Developer's Extension
- If you are not already using it, install Zotero
- Go to the Python website, download the latest stable release of the Python programming language (Version 2.5.2 as of Mar 2008) and install it
- The OS X installation makes use of a .DMG (Disk Image) file. When this file has finished downloading to your machine, you can double click it to open a folder that contains a ReadMe.txt file and a MacPython installer
- Double click the MacPython.mpkg file to start the universal installer
- Create a directory where you will keep your Python programs (e.g., programming-historian)
- Download the latest version of Beautiful Soup and copy it to the directory where you are going to put your own programs
- Although MacPython includes an integrated development environment, we will be using a free and open source editor called Komodo Edit. Install it from the .DMG file
- Start Komodo. It should look something like this
- If you don't see the Toolbox pane on the right hand side, choose View->Tabs->Toolbox. It doesn't matter if the Project pane is open or not. Take some time to familiarize yourself with the layout of the Komodo editor. The Help file is quite good
- Now you need to set up the editor so that you can run Python programs
- Choose Toolbox->Add->New Command. This will open a new dialog window. Rename your command to "Run Python". Under "Command," use the pulldown menu to select
%(python) %f
- and under "Start in," enter
%D
- Click OK. Your new Run Python command should appear in the Toolbox pane
Windows instructions
- Back up your computer
- If you are not already using it, install the Firefox web browser
- Install the following Firefox extensions:
- Web developer toolbar
- Extension Developer's Extension
- If you are not already using it, install Zotero
- Go to the Python website, download the latest stable release of the Python programming language (Version 2.5.2 as of April 2008) and install it
- Download the latest version of Beautiful Soup and copy it to the Python library directory (usually C:\Python25\Lib)
- Install Komodo Edit
- Start Komodo. It should look something like this
- If you don't see the Toolbox pane on the right hand side, choose View->Tabs->Toolbox. It doesn't matter if the Project pane is open or not. Take some time to familiarize yourself with the layout of the Komodo editor. The Help file is quite good
- Now you need to set up the editor so that you can run Python programs
- Choose Edit->Preferences. This will open a new dialog window.
- Select the Python category and set the "Default Python Interpreter" (it should be C:\Python25\Python.exe)
- If it looks like this, click OK:
- Next choose Toolbox->Add->New Command. This will open a new dialog window. Rename your command to "Run Python". Under "Command," use the pulldown menu to select
%(python) %f
- and under "Start in," enter
%D
- N.B. If you forget the %f in the first command, Python will hang mysteriously because it isn't receiving a program as input
- If it looks like this, click OK:
- Your new command should appear in the Toolbox pane
- N.B. Some people have reported that you have to restart your machine before Python will work with Komodo Edit
"Hello world" in Python
It is traditional to begin programming in a new environment by trying to create a program that says "hello world" and terminates. In keeping with our polyglot approach, we will do this in a number of different ways using a few different programming languages.
The languages that we will be using are all interpreted. This means that there is a special computer program (known as an interpreter) that knows how to follow instructions written in the language. One way to use the interpreter is to store all of your instructions in a file, and then run the interpreter on the file. A file that contains programming language instructions is known as a program. The interpreter will execute each of the instructions that you gave it in your program and then stop. Let's try this.
In Komodo, create a new file, enter the following two-line program and save it as hello-world.py
# hello-world.py
print 'hello world'
You should then be able to double-click the "Run Python" button that you created in the previous step to execute your program. If all went well, it should look something like this:
Notice that the output of your program was printed to the "Command Output" pane.
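Once hello-world.py runs, a slightly longer program can confirm that the pieces installed earlier are in place. The sketch below is our own addition rather than part of the installation steps, and the file name check-setup.py and the function name describe_setup are hypothetical. It reports which version of Python your "Run Python" command is using and whether Beautiful Soup can be imported. We assume that Beautiful Soup 3 installs a module called BeautifulSoup and that later releases install one called bs4, so the program tries both names. It is written so that it should run under both Python 2 and Python 3.

```python
# check-setup.py (hypothetical file name)
import sys


def describe_setup():
    """Return a list of lines describing the Python installation."""
    lines = ['Python version: %d.%d.%d' % tuple(sys.version_info[:3])]
    # Beautiful Soup 3 installs a module called BeautifulSoup;
    # later releases install one called bs4. Try both names.
    found = None
    for module_name in ('BeautifulSoup', 'bs4'):
        try:
            __import__(module_name)
            found = module_name
            break
        except ImportError:
            pass
    if found:
        lines.append('Beautiful Soup found (module %s)' % found)
    else:
        lines.append('Beautiful Soup not found')
    return lines


# The interpreter executes these statements from top to bottom, then stops.
for line in describe_setup():
    print(line)
```

If the second line of output says that Beautiful Soup was not found, check that you copied or installed it as described in the instructions for your platform.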
Interacting with a Python shell
Another way to interact with an interpreter is to use what is known as a shell. You can type in a statement and press the Enter key, and the interpreter will respond to your command. Using a shell is a great way to test statements to make sure that they do what you think they should.
Linux instructions
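Linux instructions are pretty much the same as Mac. Just go to Applications (again, upper left of toolbar) -> Accessories -> Terminal, type "python" into the window that opens, and then follow the Mac instructions from that point on.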
Mac instructions
You can run a Python interpreter by going to the Finder and double-clicking on Applications->Utilities->Terminal then typing "python" into the window that opens on your screen. At the Python interpreter prompt, type
print 'hello world'
and press Enter. The computer will respond with
hello world
When we want to represent an interaction with the shell, we will use -> to indicate the shell's response to your command, as shown below:
print 'hello world'
-> hello world
On your screen, it will look more like this:
Windows instructions
You can get access to a Python shell by double-clicking on C:\Python25\python.exe. A new window will open on your screen. In the shell window, type
print 'hello world'
and press Enter. The computer will respond with
hello world
When we want to represent an interaction with the shell, we will use -> to indicate the shell's response to your command, as shown below:
print 'hello world'
-> hello world
On your screen, it will look like this:
The reason that we will be using Python for many of our programming tasks is that it is very high-level. It is possible, in other words, to write short programs that accomplish a lot. The shorter the program, the more likely it is for the whole thing to fit on one screen, and the easier it is to keep track of all of it in your mind.
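To give a sense of what "high-level" means in practice, here is a minimal sketch of a program that counts how often each word appears in a sentence and reports the most frequent one. The sample sentence and the variable names are our own inventions for illustration. The whole program fits comfortably on one screen.

```python
# Count how often each word appears in a short text.
text = 'the quick brown fox jumps over the lazy dog the end'

counts = {}
for word in text.split():
    # get() returns 0 the first time a word is seen
    counts[word] = counts.get(word, 0) + 1

# Find the word with the highest count.
most_common = max(counts, key=counts.get)
print('%s appears %d times' % (most_common, counts[most_common]))
```

Running this prints "the appears 3 times". You can type the same statements one at a time into the shell to see what each of them does.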
"Hello world" in JavaScript
A second programming language that we will be using is JavaScript. Like Python, JavaScript is an interpreted language. One of the things that makes JavaScript special is that the browser is a JavaScript interpreter. So it is possible to write programs that control the behavior of your browser. In fact, that is what Zotero is, a program written (mostly) in JavaScript that adds some powerful functionality to Firefox.
Being able to program the browser makes it possible to do many interesting things, but it also introduces some important limitations. Imagine what would happen if someone else could use JavaScript to program your browser so that it erased all of the files on your hard drive. Not good. For this reason, JavaScript running in a web page has no mechanisms for creating, opening, or deleting files on your computer. The browser also prevents information from being exchanged outside of well-defined and fairly limited boundaries.
Hence our polyglot approach. For some tasks, we will want to use Python, for others, JavaScript. Sometimes we will mix code from both languages to get the best results. Most of the work that we do at the beginning will be in Python, however.
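By contrast, a Python program running on your own machine can work with files directly, which is one reason we reach for it first. The following is a minimal sketch, not something you need to run yet. It creates a file, reads it back, and deletes it again. The file name note.txt is hypothetical, and the program works inside a temporary directory so that nothing of yours is touched. The files are closed explicitly so that the code should run under both Python 2 and Python 3.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing important is touched.
work_dir = tempfile.mkdtemp()
path = os.path.join(work_dir, 'note.txt')  # hypothetical file name

# Create the file and write one line to it.
f = open(path, 'w')
f.write('hello world\n')
f.close()

# Open the file again and read the contents back.
f = open(path)
contents = f.read()
f.close()
print(contents.strip())

# Delete the file, then the now-empty directory.
os.remove(path)
os.rmdir(work_dir)
print(os.path.exists(path))
```

The program prints "hello world" followed by "False", the second line confirming that the file no longer exists.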
In Firefox, choose Tools->Extension Developer->Javascript Shell. A window should open on your screen. In that window type the following statement and press Enter.
print("hello world");
If all went well, it should look something like this:
Viewing HTML files
When you are working with online sources, much of the time you will be using files that have been marked up with HTML (HyperText Markup Language). Your browser already knows how to interpret HTML, which is handy for human readers. Most browsers also let you see the HTML source for any page that you visit. The two images below show a typical web page (the History News Network) and the HTML source used to generate that page, which you can see with the View->Page Source command in Firefox.
When you're working in the browser, you typically don't want or need to see the source for a web page. If you are writing a page of your own, however, it can be very useful to see how other people accomplished a particular effect. You will also study HTML source as you write programs to manipulate web pages or automatically extract information from them.
(To learn more about HTML, you may find it useful at this point to work through the W3 Schools HTML tutorial. Detailed knowledge of HTML isn't necessary to continue reading, but any time that you spend learning HTML will be amply rewarded in your work as a digital historian.)
"Hello World" in HTML
HTML consists of text and tags which typically indicate the beginning and ending of particular elements. Suppose you are formatting a bibliographic entry and you want to indicate the title of a work by italicizing it. In HTML you use em tags ("em" stands for emphasis). So part of your HTML file might look like this
... in Cohen and Rosenzweig's <em>Digital History</em>, for example ...
The simplest HTML file consists of tags which indicate the beginning and end of the whole document, and tags which identify a head and a body within that document. Information about the file usually goes into the head, whereas information that will be displayed on the screen usually goes into the body.
<html>
<head></head>
<body>Hello World!</body>
</html>
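Because the tags come in matching pairs, a program can walk through a page and report its structure. The sketch below uses the HTML parser from Python's standard library, rather than Beautiful Soup, to list each opening tag, closing tag and piece of text in the simple file above. The class name OutlineParser and the variable SIMPLE_PAGE are hypothetical names chosen for this illustration. The module is called HTMLParser in Python 2 and html.parser in Python 3, so the program tries both.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

SIMPLE_PAGE = """<html>
<head></head>
<body>Hello World!</body>
</html>"""


class OutlineParser(HTMLParser):
    """Record each opening tag, closing tag and run of text, in order."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append('open ' + tag)

    def handle_endtag(self, tag):
        self.events.append('close ' + tag)

    def handle_data(self, data):
        # Ignore the line breaks between tags.
        if data.strip():
            self.events.append('text ' + data.strip())


parser = OutlineParser()
parser.feed(SIMPLE_PAGE)
parser.close()
for event in parser.events:
    print(event)
```

The output shows the head opening and closing inside the html element, followed by the body with its text: open html, open head, close head, open body, text Hello World!, close body, close html.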
You can try creating some HTML code. Go to Komodo, and choose File->New. Copy the code below into the editor. The first line tells the browser what kind of file it is. The html tag has the lang attribute (for language) set to en (for English). The title tag in the head of the HTML document contains material that is usually displayed in the top bar of a window when the page is being viewed, and in Firefox tabs.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
<title><!-- Insert your title here --></title>
</head>
<body>
<!-- Insert your content here -->
</body>
</html>
Change both
<!-- Insert your title here -->
and
<!-- Insert your content here -->
to
Hello World!
Save the file as hello-world.html. Now go to Firefox and choose File->New Tab and then File->Open File. Choose hello-world.html. Your message should appear in the browser.
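The same substitution can be made by a program, which hints at how we will generate web pages later on. This is only a sketch under our own assumptions: the function name fill_template and the variable TEMPLATE are hypothetical, and the program prints the finished page instead of saving it to a file.

```python
TEMPLATE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
<title><!-- Insert your title here --></title>
</head>
<body>
<!-- Insert your content here -->
</body>
</html>
"""


def fill_template(template, title, content):
    """Replace the two placeholder comments with real text."""
    page = template.replace('<!-- Insert your title here -->', title)
    page = page.replace('<!-- Insert your content here -->', content)
    return page


page = fill_template(TEMPLATE, 'Hello World!', 'Hello World!')
print(page)
```

The printed page should match what you typed by hand in Komodo.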
"Hello World" in embedded JavaScript
Remember that we said that your browser already knows how to interpret both HTML and JavaScript. In fact, it also understands when you mix the two, as long as you tell it what you are doing. We are going to make extensive use of this capability later on, so let's see how it works.
If you want to include JavaScript within HTML, you use the script tag to tell the browser that you are doing so. You can then embed the script right in the body of your HTML file like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
<title>Hello World! Script</title>
</head>
<body>
<script type="text/javascript">
document.write("Hello World!");
</script>
</body>
</html>
Create a new empty HTML file in Komodo and modify the title and body to match the example above. Save it as hello-world-js.html. When you open it with Firefox, your message should appear as before.
We've now gotten the same result using HTML in two very different ways, so we should be clear about the difference. In the first case we created a very basic static web page using pure HTML. The body of the page says "Hello World!" and nothing else. In the second case, we created a blank HTML page and then ran a short JavaScript program to print "Hello World!" onto that blank page. To the person reading them, the two pages look the same, and it may not matter to that reader how each page was created. From our perspective, however, the difference is crucial, because the second method allows us to embed our JavaScript programs in HTML files which can be viewed in the browser. Anything that can be viewed in the browser can be indexed and annotated with Zotero. This means that you can keep track of the programs that you write and their output using the same system that you use to keep track of the rest of your research.
Back up your work
Once you begin to program, it is crucial that you make backups of your work regularly. Each day before you do any programming, make sure to back up your Zotero database. At the end of a day's work, make another backup of the Zotero database and of any programs that you've written that day. You should back up your whole computer at least weekly, and preferably more frequently.
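Backing up your programs is itself a task that a short program can handle. The sketch below is one possible approach, not a replacement for a full backup of your computer or of your Zotero database. It copies every .py file from one directory into a folder named for today's date. The function name back_up_programs is hypothetical, and the demonstration uses temporary directories, so you would substitute the directory where you keep your programs and the location of your backup drive.

```python
import os
import shutil
import tempfile
import time


def back_up_programs(source_dir, backup_root):
    """Copy every .py file in source_dir into a dated folder under backup_root.

    Returns the path of the folder that received the copies.
    """
    stamp = time.strftime('%Y-%m-%d')
    target_dir = os.path.join(backup_root, 'backup-' + stamp)
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    for name in sorted(os.listdir(source_dir)):
        if name.endswith('.py'):
            # copy2 keeps the modification time as well as the contents
            shutil.copy2(os.path.join(source_dir, name), target_dir)
    return target_dir


# Demonstration using throwaway directories; substitute your own paths.
source = tempfile.mkdtemp()
backups = tempfile.mkdtemp()
f = open(os.path.join(source, 'hello-world.py'), 'w')
f.write("print('hello world')\n")
f.close()

copied_to = back_up_programs(source, backups)
print(os.listdir(copied_to))
```

Running the function more than once on the same day reuses the same dated folder and overwrites the earlier copies.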
Keep in touch with us
As you work through the examples in this book you will, no doubt, want to apply similar techniques to your own sources. If you come up with a variation or generalization, e-mail us to let us know about it. Likewise, if you run into trouble or can't figure out how to modify one of our programs so it applies to your situation, we'd like to hear from you. We can try to help you get something running, or try to add some new material to The Programming Historian to cover situations like yours.
Other resources
As you're working through the tutorials here, you will want to have a few key resources open in your browser. Until you become familiar with the programming languages that we're using, it is nice to have a few different introductory treatments to look at, and there are many good online resources of this kind.
As you proceed (or if you already have some programming experience) you'll probably prefer more general references like:
- Python for Programmers
- Python documentation page
- Python tutorial
- Python library reference
- Pilgrim, Dive into Python
We also like to have a few printed books ready-to-hand, especially
- Lutz, Learning Python
- Lutz, Programming Python
- Martelli, Ravenscroft and Ascher, Python Cookbook
Other references will be cited as we make use of them.
Suggested readings
Some of our readers have expressed an interest in using The Programming Historian for formal or informal coursework. To get a solid foundation in Python programming, it is probably best to pair these exercises with some additional readings. We like Mark Lutz's Learning Python, 3rd ed. (Sebastopol, CA: O'Reilly, 2008).
- Lutz, Learning Python
- (optional) Ch. 1: A Python Q&A Session
- Ch. 2: How Python Runs Programs
- Ch. 3: How You Run Programs












