For whatever reason, these two fields don't follow the same pattern as other fields. This was likely an oversight made when Zotero was first coded. Use the following pattern to save to these fields:
newItem.abstractNote = "content here";
newItem.archiveLocation = "content here";
You can create a RegExp to single out anything imaginable in Strings. However, some are more common than others. Here are a few that you might find particularly useful.
(/^\s+/g, ''); //finds all white space at the start of a string.
(/\s*$/g, ''); //finds all white space at the end of a string.
(/^\s*|\s*$/g, ''); //finds all white space at the beginning or end of a string.
(/\s+/g, ''); //finds any white space (spaces, tabs and new lines).
(/\s\s+/g, ''); //finds any run of two or more white space characters side by side.
(/\d+/g, ''); //finds any digits.
(/\d\d\d\d+/g, ''); //finds any run of four or more digits in a row. Useful for dates.
(/\W+/g, ''); //finds any character that is not a letter, digit or underscore.
(/\n+/g, ''); //finds any new lines.
(/\[|\]+/g, ''); //finds any square bracket character.
(/;+/g, ''); //finds any semicolons.
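Each pattern above is written as the pair of arguments you would pass to a String's replace() method. The sketch below chains three of them to tidy a scraped title and uses match() to pull out years. The function name "tidyText" and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: tidy a scraped string with three of the patterns above.
function tidyText(raw) {
    return raw
        .replace(/^\s*|\s*$/g, '')  // strip white space at both ends
        .replace(/\s\s+/g, ' ')     // collapse each run of white space to one space
        .replace(/\[|\]+/g, '');    // drop square brackets
}

var tidied = tidyText("  [The   Rise of\n\n the Novel]  ");
// tidied is now "The Rise of the Novel"

// match() with the g flag returns every run of four digits as an Array.
var years = "Published 1998, reprinted 2004".match(/\d\d\d\d/g);
// years is now ["1998", "2004"]
```

Note that the second replace() swaps in a single space rather than an empty string, so that neighbouring words do not run together.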
You can use these as building blocks to create RegExps of your own that meet your site's specific needs. RegExps can be tricky. If you are having a lot of trouble, try posting a question on a message board. Sometimes you will have better results by using a String rather than a RegExp. If you decide to do this, make sure what you come up with will work across all pages, not just the first one you work on.
x.replace(": ", '');
rather than
x.replace(/\:\s/,'');
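One difference is worth keeping in mind when choosing between the two: a String pattern only replaces the first match it finds, while a RegExp with the "g" flag replaces every match. A short illustration with a made-up sample string:

```javascript
var x = "Title: Subtitle: Part Two";

// String pattern: only the first ": " is removed.
var firstOnly = x.replace(": ", '');
// firstOnly is "TitleSubtitle: Part Two"

// RegExp with the g flag: every colon followed by white space is removed.
var everyOne = x.replace(/:\s/g, '');
// everyOne is "TitleSubtitlePart Two"
```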
Websites seem to have a million different ways to write the names of their authors. This often requires reformatting on your part. Here are a few common problems and their solutions.
If the website posts the name of the author ALL IN CAPS, you need to convert it to a more suitable format before saving it into Zotero. Otherwise your users will have problems with their saved citations.
This requires cleaning using a couple of Loops and some String Methods.
var authorName = "ADAM CRYMBLE";
var words = authorName.split(/\s+/);
var authorFixed = '';
for (var i = 0; i < words.length; i++) {
//capitalize the first letter of the word and put the rest in lower case
words[i] = words[i].charAt(0).toUpperCase() + words[i].substring(1).toLowerCase();
Zotero.debug(words[i]);
authorFixed = authorFixed + words[i] + ' ';
Zotero.debug('authorFixed = ' + authorFixed);
}
newItem.creators.push(Zotero.Utilities.cleanAuthor(authorFixed, 'author'));
This code splits the author's name into an Array containing one item for each word in the name. A For Loop then takes each word individually, capitalizes the first letter and prints the rest in lower case. This reformatted word is then saved to the end of "authorFixed." Once all the words have been reformatted the author is then saved into Zotero.
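If several parts of your translator need this clean-up, the same steps can be wrapped in a function. This is a minimal sketch; "fixCapsName" is a hypothetical name and not a Zotero utility. It also skips the empty words produced when a name has stray white space at either end.

```javascript
// Hypothetical helper: convert an ALL CAPS name to "Proper Format".
function fixCapsName(authorName) {
    var words = authorName.split(/\s+/);
    var fixed = [];
    for (var i = 0; i < words.length; i++) {
        if (words[i].length === 0) {
            continue; // blank entry caused by leading or trailing white space
        }
        fixed.push(words[i].charAt(0).toUpperCase() + words[i].slice(1).toLowerCase());
    }
    return fixed.join(' ');
}

var cleanedName = fixCapsName("  ADAM   CRYMBLE ");
// cleanedName is "Adam Crymble"
```

The result can then be passed to Zotero.Utilities.cleanAuthor() exactly as "authorFixed" is in the code above.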
What do you do if your author's name is "Crymble, Adam"? Reorder the words before saving. In this case, split the name at the comma, then rebuild it in a Simple Variable by working backwards through the Array, so that the first name comes first.
var authorName = "Crymble, Adam";
var words = authorName.split(", ");
var authorFixed = '';
for (var i = words.length-1; i > -1; i--) {
authorFixed = authorFixed + words[i] + ' ';
}
newItem.creators.push(Zotero.Utilities.cleanAuthor(authorFixed, 'author'));
Use Zotero.debug() at various points in this block of code to see what is contained in each Variable at a given moment. This will help you understand exactly what is happening at each stage so you can tailor the code to your specific needs.
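The reordering can also be written as a small function that you can try outside of Zotero. This is only a sketch; "reorderName" is a hypothetical name. Like the loop above, it simply reverses whatever sits on either side of each comma, so a name such as "Crymble, Adam, Jr." would need extra handling.

```javascript
// Hypothetical helper: turn "Last, First" into "First Last".
function reorderName(authorName) {
    var parts = authorName.split(/\s*,\s*/); // split at commas, ignoring nearby white space
    parts.reverse();                         // last item in the Array now comes first
    return parts.join(' ').replace(/^\s+|\s+$/g, '');
}

var reordered = reorderName("Crymble, Adam");
// reordered is "Adam Crymble"

var unchanged = reorderName("Adam Crymble");
// no comma, so unchanged is still "Adam Crymble"
```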
There is an alternative to the standard way of entering an author's name into Zotero:
newItem.creators.push(Zotero.Utilities.cleanAuthor(author, "author"));
If you need to be able to enter the first or last name separately, you can do so like this:
newItem.creators.push({lastName: x, firstName: y , creatorType: "author"});
The variables "x" and "y" would hold the values you wanted put into the lastName and firstName fields respectively. This can be particularly helpful for authors with more than one last name. When Zotero comes across a name with more than three words, it automatically assumes only the last word is the surname. In cases like "Peter Van der Meer" it does this:
'creators' ...
'0' ...
'firstName' => "Peter Van der"
'lastName' => "Meer"
'creatorType' => "author"
Zotero has assumed Peter has a middle name: "Van der". When the citation is created, Peter will be going by "Meer, Peter Van der". If this isn't what you want, you can use an If Statement to check for cases like Peter and then save them using the manual method.
var authorName = "Peter Van der Meer";
if (authorName.match("Van ")) {
var w = authorName.indexOf("Van ");
var x = authorName.substr(w); //last name: "Van der Meer"
var y = authorName.substr(0, w-1); //first name: "Peter"
newItem.creators.push({lastName: x, firstName: y , creatorType: "author"});
}
}
You could use a similar technique for other situations like this.
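One way to generalize the check is to keep a list of surname prefixes and split the name at the first one found. The sketch below is an assumption about how you might do this, not a Zotero feature: "particles" and "splitAuthor" are hypothetical names, and the list of prefixes is only a starting point that you would extend for your own site.

```javascript
// Hypothetical list of surname prefixes, compared in lower case.
var particles = ["van", "von", "de", "du", "della"];

// Hypothetical helper: build a creator Object with the surname starting
// at the first prefix found, or at the last word if there is none.
function splitAuthor(authorName) {
    var words = authorName.replace(/^\s+|\s+$/g, '').split(/\s+/);
    var surnameStart = words.length - 1; // default: the last word is the surname
    for (var i = 1; i < words.length - 1; i++) {
        if (particles.indexOf(words[i].toLowerCase()) !== -1) {
            surnameStart = i;
            break;
        }
    }
    return {
        firstName: words.slice(0, surnameStart).join(' '),
        lastName: words.slice(surnameStart).join(' '),
        creatorType: "author"
    };
}

var peter = splitAuthor("Peter Van der Meer");
// peter.firstName is "Peter" and peter.lastName is "Van der Meer"
```

The returned Object has the same shape as the one used in the manual method, so it can be passed straight to newItem.creators.push().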
Multilingual sites are common among repositories created in Canada or Europe, where there is more than one official language. Often you will be able to toggle between English and foreign language versions of the site while looking at the single entry page. This might involve clicking a flag or a "version francaise" link. Making your translator bilingual or multilingual is easier than you might think. As long as only the language changes between the versions of the site and the format stays the same, you can use If Statements to convert the headers to English, then translate the site as normal. The easiest place to do this is in the While Loop where you populate the "items" Object. (This explanation builds upon the template used in the Sample Page tutorial. If you are having difficulty following along, check out Chapter 16 to see how the template was created.)
while (headers = myXPathObject.iterateNext()) {
headersTemp = headers.textContent.replace(/\s+/g, '');
contents = myXPathObject2.iterateNext().textContent.replace(/^\s*|\s*$/g, '');
if (headersTemp == "titre:") {
headersTemp = "title:";
} else if (headersTemp == "AuteurPrincipal:") {
headersTemp = "PrincipalAuthor:";
}
items[headersTemp]=contents;
}
Continue to add Else If to the If Statement for each case where the spelling differs between English and the foreign language version of the site. Assuming your translator worked in English, it should now work for the foreign language version as well. If you do not see the proper Icon in the address bar when viewing the foreign language version of the page, your "target" on the "Meta Data" tab of Scaffold and possibly your "detectWeb" Function will have to be adjusted so that the foreign versions are picked up by Zotero.
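If the list of headers grows long, a lookup Object can stand in for the chain of Else If statements. This is a sketch of that alternative, not part of the Sample Page template; "headerTranslations" and "toEnglishHeader" are hypothetical names, and the two entries are the ones used in the example above.

```javascript
// Map each foreign language header (white space removed) to its English equivalent.
var headerTranslations = {
    "titre:": "title:",
    "AuteurPrincipal:": "PrincipalAuthor:"
};

// Hypothetical helper: return the English header, or the header itself
// if no translation is listed.
function toEnglishHeader(header) {
    var key = header.replace(/\s+/g, '');
    if (Object.prototype.hasOwnProperty.call(headerTranslations, key)) {
        return headerTranslations[key];
    }
    return key;
}

var translated = toEnglishHeader("Auteur Principal:");
// translated is "PrincipalAuthor:"
```

Inside the While Loop, the If Statement could then be reduced to a single call such as headersTemp = toEnglishHeader(headers.textContent).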
If your repository has more than one content type that the "detectWeb" Function can distinguish between, you must make a few quick changes to your "scrape" Function. In the Sample Page tutorial, you created a "book" item type:
var newItem = new Zotero.Item('book');
If your page has multiple content types, you must use an If / Else If statement to create the proper Zotero.Item for the current page.
if (detectWeb(doc, url) == "book") {
var newItem = new Zotero.Item("book");
} else if (detectWeb(doc, url) == "audioRecording") {
var newItem = new Zotero.Item("audioRecording");
} else if (detectWeb(doc, url) == "videoRecording") {
var newItem = new Zotero.Item("videoRecording");
} else if (detectWeb(doc, url) == "newspaperArticle") {
var newItem = new Zotero.Item("newspaperArticle");
}
This will open up the proper fields for that entry type. You may have noticed that a "book" entry in Zotero does not have a "Publication" field, but a "journalArticle" does. Having the correct entry type also ensures users get the proper citation style. Note that if the different entry types have different formats for types of data, you will have to use If Statements in the "scrape" function that performs the correct actions depending on what entry type is being saved.
For example, if the "book" pages display the title all in CAPITAL LETTERS and the "audioRecording" pages display the title all in lower case letters, use an If Statement to check which entry type you are currently saving, then add the appropriate code for each.
if (detectWeb(doc, url) == "book"){
//insert code to reformat title from "CAPS" to "Proper Format"
} else if (detectWeb(doc, url) == "audioRecording") {
//insert code to reformat title from "lower case" to "Proper Format"
}
This will have to be done for all cases where format changes across entry type.
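To keep these checks manageable, you could call "detectWeb" once near the top of "scrape", store the result in a Variable, and hand that Variable to a function that does the reformatting. The sketch below assumes the two cases described above; "capitalizeWords" and "formatTitle" are hypothetical names and the titles are sample data.

```javascript
// Hypothetical helper: capitalize the first letter of every word.
function capitalizeWords(text) {
    return text.replace(/(^|\s)(\S)/g, function (match, space, letter) {
        return space + letter.toUpperCase();
    });
}

// Hypothetical helper: pick the reformatting that suits the entry type.
// In "scrape" you would call it with: var itemType = detectWeb(doc, url);
function formatTitle(title, itemType) {
    if (itemType == "book") {
        // "CAPS" to "Proper Format"
        return capitalizeWords(title.toLowerCase());
    } else if (itemType == "audioRecording") {
        // "lower case" to "Proper Format"
        return capitalizeWords(title);
    }
    return title; // other entry types are left alone
}

var bookTitle = formatTitle("THE RISE OF THE NOVEL", "book");
// bookTitle is "The Rise Of The Novel"

var audioTitle = formatTitle("kind of blue", "audioRecording");
// audioTitle is "Kind Of Blue"
```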
Because a Zotero "scraper" relies heavily on a website having a consistent format, choosing the site you want to translate carefully will save you a lot of stress. A site with several content types or formats can still be translated, but it may require a lot more code. And just because a site looks new does not mean it will be easy to translate. For your sanity's sake, your first translator should probably adhere to most, if not all, of the following criteria. You can start to translate more difficult sites once you've got the hang of things.
If you do not have a choice about which webpage you need to translate, it may still be possible. One solution is to find another website with the same problem that already has a translator, and take a look at how that translator handles it. You can do this by opening Scaffold (under Tools in Firefox) and clicking on the "Load from Database" button in the top left corner.
If you are the administrator of the site in question and your site fails to meet more than one of these criteria, you might consider making some changes to your site or adding a metadata system, discussed in Chapter 1.
Newspapers generally make excellent first attempts. Consider writing a translator for your local paper before moving on to journal repositories.
Good Luck!