How to Write a Zotero Translator:
A Practical Beginner's Guide for Humanists

By: Adam Crymble

 

Chapter 1: Introduction



Zotero

The citation management program Zotero is a wonderful tool for researchers everywhere. Citations from the web may be "grabbed" simply by clicking on an icon (a book, newspaper, document, or folder icon) in your web browser address bar. The citation information displayed on the screen is then saved to your Zotero collection with little or no additional effort. However, for this to work, each and every website must either follow standardized metadata guidelines, or must have its own personal "translator" that tells Zotero which words on the screen correspond with which bibliographic fields. Computers are stupid; translators make them smart.

Most users who know about the citation capture feature are enthralled by it and want more. The Zotero forums receive multiple requests daily from users hoping their favourite site will be given this capability. Unfortunately, there just aren't enough Zotero programmers around to keep up with the demand for translators, and more intensive coding-projects take priority.

Luckily, Zotero translators are fairly easy to create (as far as computer programming goes). This guide seeks to help take some of that load away from the Zotero staff by teaching the community of Zotero users how to create their own translators and to share them with others.

Who is this guide for?

Anyone! No previous experience required!

In fact, the guide will assume that you have no programming experience whatsoever. You just need to have spent some time using Zotero and grabbing citations. (Check out the demo video if you are not familiar with this feature.)

This guide uses plain language throughout and is written for people who are not programmers. You just need to be comfortable with computers, able to think logically, and not afraid to do some Google searching when you find a word or come across an error message you don't recognize.

Everything will be explained with a new user in mind. Therefore, you might find that you do not need to read all sections. If you are unsure whether or not you should read a section or skip it, it is probably a good idea to jump ahead to the end of the chapter and read the "What you should understand before moving on" section. If in doubt, it is probably best to read the chapter and refresh your knowledge. Skipping too much background will just leave you frustrated when you start coding.

If you fit into one of the following categories, chances are you are a great candidate for writing a translator:

  • Website administrator of a searchable database
  • Librarian or archivist
  • Researcher or journalist
  • Graduate student
  • Someone who wants to learn JavaScript

What you will learn

When you are finished with this guide, you should not only know enough to create your own working Zotero translator, but you should understand the following concepts and computer languages:

  • Basic HTML
  • the Document Object Model (DOM)
  • XPaths
  • JavaScript Regular Expressions (RegExp)
  • Basic JavaScript

Translators are written in a computer language called "JavaScript" so your work will involve learning to do some basic JavaScript programming.

You will also learn how to use a number of programs (all free and reputable). Among these are Firefox, Zotero, Scaffold, Komodo Edit and Solvent.

You will not learn how to embed JavaScript into HTML documents or to do DOM scripting; however, after learning how to write a Zotero translator you will be well on your way to understanding these concepts.

Benefits of Writing a Translator

If you are a web administrator or work for a company that maintains a searchable database, having a Zotero translator will increase your site's usability. Zotero has over a million users, many of whom judge a website's usability in part by whether or not they can automatically download citations. Adding this capability sends a message to your users that you believe their experience while using your site is important.

Writing your own translator is a much more proactive way to get your site included in Zotero than submitting a request on the Zotero forums.

If you are an end user rather than an administrator, writing your own translator allows you to customize it to your exact needs. If you only want Zotero to save the title and a copy of a pdf from a website, you can set your translator to do this. If you want all possible information from a site, you can arrange this as well.

If the website in question is a rather obscure database that may be password protected, you will have to submit your own translator. This is because, even if a Zotero programmer has time to work on your request, without access to the database he or she is powerless to write the translator.

Finally, Zotero is an open source, freeware project, and adding to the software is in the spirit of its creation. Once you've finished a new translator you can submit it to the Zotero team so that everyone can benefit from the fruits of your labour.

The Three Major Types of Translators

  • Scrapers
  • Metadata Converters
  • Exporters

Scrapers

This guide will teach you how to create a "Scraper."

The advantage of a scraper is that it is the only kind of translator that can be used on any website. A scraper takes (scrapes) words off the webpage and tells Zotero which words correspond to which part of the citation. It's sort of like cutting and pasting in a text document, but by using code rather than keystrokes.
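
To make the idea concrete, here is a minimal sketch of what "scraping" means, written in plain JavaScript so that it can run outside Zotero. Everything in it is invented for illustration: the sample page, the class names (title, author, date), and the function names textOfClass and scrapeCitation. A real translator works on a live web page and hands its results to Zotero rather than returning a plain object, so treat this as a picture of the concept, not as translator code.

```javascript
// A made-up web page. A real site will use its own tags and class names.
var samplePage =
  '<html><body>' +
  '<h1 class="title">A History of Maps</h1>' +
  '<span class="author">Jane Doe</span>' +
  '<span class="date">1998</span>' +
  '</body></html>';

// Find the first tag with the given class and return the words inside it.
// Returns null when the page has no such tag.
function textOfClass(html, className) {
  var pattern = new RegExp('class="' + className + '"[^>]*>([^<]*)<');
  var match = html.match(pattern);
  if (match === null) {
    return null;
  }
  return match[1].trim();
}

// "Scrape" the page: decide which words belong to which citation field.
// The field names here are illustrative stand-ins for bibliographic fields.
function scrapeCitation(html) {
  var citation = {};
  citation.title = textOfClass(html, "title");
  citation.creator = textOfClass(html, "author");
  citation.date = textOfClass(html, "date");
  return citation;
}
```

Calling scrapeCitation(samplePage) gives back an object whose title is "A History of Maps". Notice that the code only works because it knows the page's class names in advance, which is exactly why a scraper depends so heavily on the structure of the page.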

Another advantage of a scraper is that you can tailor it to your exact needs by choosing to gather all, some, or very little of the information available. Scrapers are easy to learn and to make.

The disadvantage of a scraper is that it relies heavily on the format of the webpage and on the consistency of its creator. If a webmaster decides to change the structure of a web page even slightly, you will have to alter your scraper to reflect this. However, these changes happen infrequently, and once your translator is released to all Zotero users, you may find that others take an interest in it and help to keep it up to date when needed.

The other two types of translators are powerful and accurate, but are only possible to use under certain conditions that are almost always out of your control unless you are the website's administrator. We will look at how these work, but our focus will be on scrapers.

Metadata Converters

These translators take information that a webmaster has voluntarily embedded in a webpage, known as metadata, and organize it into the correct Zotero fields. You can think of metadata as invisible ink that only appears if you know how to find it. Obviously the catch here is that the webmaster must have included this information in the first place.
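
As a rough illustration of the "invisible ink" idea, the sketch below reads metadata out of a made-up page and sorts it into citation fields. It assumes the page uses Dublin Core style meta tags (names such as DC.title), which is one real convention among several and not necessarily one this guide has in mind. The sample page, the fieldMap table, and the function names readMetaTags and convertMetadata are hypothetical, and the code is plain JavaScript rather than actual Zotero translator code.

```javascript
// A made-up page header. Visitors never see these tags on the screen.
var samplePage =
  '<head>' +
  '<meta name="DC.title" content="A History of Maps">' +
  '<meta name="DC.creator" content="Jane Doe">' +
  '<meta name="DC.date" content="1998">' +
  '</head>';

// Collect every meta tag on the page as name/content pairs.
function readMetaTags(html) {
  var pattern = /<meta\s+name="([^"]+)"\s+content="([^"]*)"\s*\/?>/g;
  var found = {};
  var match = pattern.exec(html);
  while (match !== null) {
    found[match[1]] = match[2];
    match = pattern.exec(html);
  }
  return found;
}

// Which metadata name goes into which citation field (illustrative only).
var fieldMap = {
  "DC.title": "title",
  "DC.creator": "creator",
  "DC.date": "date"
};

// Organize the embedded metadata into citation fields.
function convertMetadata(html) {
  var tags = readMetaTags(html);
  var citation = {};
  for (var name in fieldMap) {
    if (tags.hasOwnProperty(name)) {
      citation[fieldMap[name]] = tags[name];
    }
  }
  return citation;
}
```

Unlike a scraper, this code does not care where the title appears on the screen. It relies only on the names in the meta tags, so if the webmaster never added them, convertMetadata returns an empty object.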

The practice of including metadata is becoming more common, especially in databases. In the past couple of years, how people display metadata has become more standardized. Because of this standardization, Zotero already supports most sites that have it. Some of the most commonly used metadata conventions are:

If you are a website administrator and want your site to automatically be Zotero compliant, it is best to use one of these systems rather than writing a translator; they are standardized and reliable. Let me repeat that. If you are an administrator and can include standardized metadata, stop reading this guide and add the data! If you are not a website administrator, you might make this recommendation to the site in question.

Exporters

Exporters also rely on a website providing certain information. In this case, we need a link that allows us to download a citation. For the most part, there are very few export formats. You may have come across them before, labeled as "MARC display" buttons, or a "RefWorks" button. These options are most common in library catalogues and on academic journal databases. This type of translator actually uses two translators, one embedded in the other. This can get quite complicated so we will not cover it in detail in this guide. However, if you do need to write one of these and need a few hints to get started, here are a few (rather technical) pointers. If you do not need to write an exporter please feel free to ignore this section.

  1. Use an XPath to grab the link URL of the citation download.
  2. Use an HTTP GET request to download the page found at that URL.
  3. Call the translator for that type of citation (e.g., MARC, BibTeX, etc.) to interpret the citation.
  4. Save results into Zotero.
  5. If you are lost, check one of the many Library Catalogue translators by launching Scaffold and loading the translator code.
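
The first four steps above can be sketched roughly as follows. This is plain JavaScript meant to run outside Zotero, and every name in it is a stand-in invented for illustration: fakeWeb pretends to be the website, httpGet pretends to download a page, readBibtexFields is a toy substitute for calling a real BibTeX translator, and savedItems stands in for your Zotero collection. The example.org addresses and the "export" class name are also made up. A real exporter would use Zotero's own tools for each of these jobs.

```javascript
// Pretend website: each URL maps to the text a real server would send back.
var fakeWeb = {
  "http://example.org/record/42":
    '<a class="export" href="http://example.org/record/42.bib">Export</a>',
  "http://example.org/record/42.bib":
    '@book{doe1998, title = {A History of Maps}, ' +
    'author = {Jane Doe}, year = {1998}}'
};

// Step 2 stand-in: "download" whatever is found at the URL.
function httpGet(url) {
  return fakeWeb[url];
}

// Step 1: find the link to the citation download on the page.
function findExportLink(html) {
  var match = html.match(/<a class="export" href="([^"]+)"/);
  if (match === null) {
    return null;
  }
  return match[1];
}

// Step 3 stand-in: a toy reader that pulls name = {value} pairs out of
// BibTeX text. It is far simpler than a real BibTeX translator.
function readBibtexFields(text) {
  var pattern = /(\w+)\s*=\s*\{([^}]*)\}/g;
  var fields = {};
  var match = pattern.exec(text);
  while (match !== null) {
    fields[match[1]] = match[2];
    match = pattern.exec(text);
  }
  return fields;
}

// Step 4 stand-in: a list that plays the role of the Zotero collection.
var savedItems = [];

// Put the four steps together for one page.
function exportCitation(pageUrl) {
  var page = httpGet(pageUrl);
  var link = findExportLink(page);
  if (link === null) {
    return null;
  }
  var citation = readBibtexFields(httpGet(link));
  savedItems.push(citation);
  return citation;
}
```

The point of the sketch is the division of labour: the exporter itself only finds and fetches the download, while the work of interpreting the citation is handed to a second piece of code. That is the sense in which one translator is embedded in another.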

You do not need to understand the latter two types of translators or the exporter steps above to be successful at writing scrapers.

Note: from this point forward, we will be using the word "Translator" exclusively when referring to "scrapers." Most of the terminology used to explain how to write a scraper is the same as would be used to explain how to write any other translator. The other types of translators will be referenced explicitly if used.

A few notes before we begin:

As with anything new and computer-related, it is important that you back up your entire computer before you start. Coding is generally a safe practice, but you never know when something could go wrong and your hard work gets wiped out! Better to back up now than be sorry later.

There are many shortcuts available when writing JavaScript and experienced programmers may tell you to use these. You are free to learn the shortcuts if you wish; however, this guide will not use them, for simplicity's sake.

This is NOT a guide detailing how to use Zotero. It is a guide detailing how to write code to extend the usefulness of Zotero.