Introduction to Mashups and APIs

Event Details

Date: October 16 – 17, 2009
Categories: event

Raymond Yee led an introductory group session on mashups and APIs, drawing examples from Google Maps and Flickr. We also had a round of lightning presentations.

Raymond Yee

How do we allow ordinary users to make mashups?

Commercial APIs are not purely open. Often they encourage a read-only mode rather than a read-write mode. They constrain opportunities for use, selectively closing things down.

What you deliver in terms of data really matters, because it constrains what people can do when they share or reuse it, e.g., whether records include a date stamp.

Why isn’t it as fun to play with humanities resources as it is to play with Flickr? We are behind the times in terms of trying to brand data and close access, and this is holding us back. What need is there for resource ownership? Is it a need for authority? Identity? The question goes to the heart of the funding system and the ways that academics earn a living from decade to decade. How do we give things away and still benefit? Still maintain a brand or an identity? Researchers create data and then sit on it because they are in competition with other researchers. Also issues of licensing, copyright, orphan works: for repositories that contain material that they do not own, access is one thing, but sharing is another.

Data may not be well documented. Often we don’t know the assumptions under which it was created; we don’t have data dictionaries. e.g., WordStar format, 80 columns, ad hoc compression schemes.

Reliability, continuity, robustness. Google can do it, but we can’t. Project management and sustainability are key issues.

If we give away data we can monitor how it is used. When other people hack it, we can’t monitor that as easily. Sometimes you give away something that no one wants.

We have the opportunity to bring in philanthropists to teach us how to give things away (and maybe give us some money in the process).

Working with colleagues on Recovery.gov (US Government transparency re: Recovery Act). Web services, syndication feeds: how to help Americans understand what is happening to their money.

Discussion

Cohen: a successful API enables uses of content or services that the producer of that content or service hasn’t anticipated. Ideally, it also enables a wider range of developers to produce those new applications.
Krishnan: the best APIs are two-way; Google Maps, for example, doesn’t allow reverse data flow to its service.
Ramsay: most humanities content providers are overly protective of their content.
Hitchcock: agrees that there’s an economic/political issue here before you get to the tech.
Chudnov: Library of Congress is in the business of giving everything away.
Rockwell: worried about the robustness of academic APIs. Can we depend on them in the long run?

Walter Lewis

Our Ontario (inherited from Alouette Canada). Plug-in widgets, e.g., take lat/lon from records and send KML to Google Earth. Canadian index, Ontario view. Change URL to change the view: HTML, RSS, XML Dublin Core, RDF, JSON, MODS. Solr, unAPI, Apache Lucene. Solr is a layer on top of Lucene that allows efficient faceting.
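
The lat/lon-to-KML widget idea can be sketched in a few lines of Python. This is only an illustration, not Our Ontario's actual code: the record fields (`title`, `lat`, `lon`) and the function name `records_to_kml` are assumptions. One detail worth knowing is that KML writes coordinates as longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def records_to_kml(records):
    """Build a KML document with one Placemark per record.

    Each record is a dict with hypothetical keys: title, lat, lon.
    """
    # Serialize KML elements in the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for rec in records:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = rec["title"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        coords = f"{rec['lon']},{rec['lat']}"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


# Made-up sample record, for illustration only.
sample = [{"title": "Toronto harbour", "lat": 43.64, "lon": -79.38}]
print(records_to_kml(sample))
```

Saving the returned string to a `.kml` file should be enough for Google Earth to open it.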

Dan Cohen

Zotero and Omeka. Plug-in ready from the start. Client Firefox extension, SQLite, data model. Can build a Firefox extension to their Zotero extension (extensions can communicate with one another). Notifiers are sent out to all of the Firefox environment; short JavaScript can act when it hears notification. Sample hello world plug-in, in spirit of unAPI, can interact with Zotero through the export function. The whole website of Zotero server is built off of the API. Public API documentation.

Omeka is ‘WordPress for digital collections’. Plug-ins. PHP. Ask Jeremy Boggs for help with Omeka plug-ins. Data model, Dublin Core, modeled around items. Output JSON, XML, RSS2. Import CSV.

http://dev.omeka.org/apiworkshop
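
To make the "import CSV, model around items, output JSON" idea concrete, here is a minimal sketch in Python. It does not use Omeka's own code or schema: the column names, the `COLUMN_TO_DC` mapping, and the function name `csv_to_items` are assumptions chosen for illustration. Only the Dublin Core element names (`title`, `creator`, `date`) are standard.

```python
import csv
import io
import json

# Hypothetical mapping from CSV column headers to Dublin Core element names.
COLUMN_TO_DC = {"Title": "title", "Author": "creator", "Year": "date"}


def csv_to_items(csv_text):
    """Turn each CSV row into an item described with Dublin Core elements."""
    items = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip empty cells so items only carry the elements they actually have.
        item = {dc: row[col] for col, dc in COLUMN_TO_DC.items() if row.get(col)}
        items.append(item)
    return items


# Made-up sample data, for illustration only.
SAMPLE_CSV = (
    "Title,Author,Year\n"
    "Letter to a friend,A. Writer,1885\n"
    "Untitled sketch,,1902\n"
)

print(json.dumps(csv_to_items(SAMPLE_CSV), indent=2))
```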

Josh Greenberg and Shekhar Krishnan

1) Digital Gallery. Undocumented JSON, Atom API. 750,000 digitized images with robust metadata.

http://digitalgallery.nypl.org

2) Geodata. NYPL map digitizer / rectifier / warper. Easy to pull data out of the system in KML etc. GeoRSS feed. Third phase gazetteer, consume GeoRSS from Flickr, Zotero. Voicethread.com for K12 education allows students to make annotated presentations.

http://maps.nypl.org
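
Consuming a GeoRSS feed, as the planned gazetteer would, might look roughly like the following Python sketch. It assumes an Atom feed using GeoRSS-Simple `georss:point` elements (latitude, then longitude, separated by whitespace). The sample feed and the function name `parse_georss_points` are invented for illustration and do not reflect the actual NYPL, Flickr, or Zotero feeds.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "georss": "http://www.georss.org/georss",
}


def parse_georss_points(feed_xml):
    """Return (title, lat, lon) for every Atom entry carrying a georss:point."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", NS):
        point = entry.find("georss:point", NS)
        if point is None or not point.text:
            continue  # this entry has no location
        # GeoRSS-Simple points are "lat lon", whitespace separated.
        lat, lon = (float(v) for v in point.text.split())
        title = entry.findtext("atom:title", default="", namespaces=NS)
        results.append((title, lat, lon))
    return results


# Made-up sample feed, for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>Sample map feed</title>
  <entry>
    <title>Rectified map of lower Manhattan</title>
    <georss:point>40.7128 -74.0060</georss:point>
  </entry>
  <entry>
    <title>Entry without a location</title>
  </entry>
</feed>"""

for title, lat, lon in parse_georss_points(SAMPLE_FEED):
    print(title, lat, lon)
```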

3) Relation. Yaddo exhibit never really went anywhere. FOAF; relationship and social network views.

Doug Reside

1) O3D. API for 3D within the browser. Needs a pretty good graphics card. Build something like Wikipedia for 3D models of historic theatres (cf. SketchUp).

http://o3d.googlecode.com

2) AXE. AJAX XML Encoder. Adding it into Zotero. Link one URI to another, annotate audio etc. Deep tagging of multimedia. Uses QuickTime API (via JavaScript), Flash video API. Four layers: a) content, b) transcription / encoding, c) cloud of widgets (API level), d) user-generated data. Each layer is agnostic about the layers above it.

Stéfan Sinclair and Geoffrey Rockwell

VOYEUR. Content-oriented (e.g., Flickr, Twitter) vs. tool-oriented (TAPoR) APIs. Large scale. First, create or customize tool. Export as HTML code that can be embedded in an iframe.

http://voyeurtools.org

Dan Chudnov

“Web as API”. The web makes a really good API all by itself. If you treat it that way you find yourself drawn to linked data. Build apps that scale in weird ways. e.g., World Digital Library (wdl.org) 9000 requests per second, 1.5 Gigabits throughput. e.g., Chronicling America makes newspaper data more generally available; 140,000 title records, 1.5 million pages, both OCR and images. On Chronicling America try searching for content, finding thumbnails and viewing source. Clean URIs spell out what they do and are guaranteed to be stable. Clean URIs make the site amenable to caching and are friendly to bots. Interesting links in the header when you view source; rel links, HTML standards. Subscribe to the batch feed: it allows text mining and crawling, and shows a completely raw view. You can use wget -m to slurp the raw data. Data views, alternate views; all pages that make an issue; all issues that make a batch, etc.
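
The "rel links in the header" point can be illustrated with a short Python sketch that lists the `<link>` elements in a page's source, much as a bot would to discover alternate data views. The sample page and its paths are invented, and the names `LinkRelCollector` and `find_rel_links` are assumptions for illustration; nothing here is taken from Chronicling America's actual markup.

```python
from html.parser import HTMLParser


class LinkRelCollector(HTMLParser):
    """Collect <link> elements that carry both rel and href attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        if "rel" in attr and "href" in attr:
            # Store (rel, type, href); type is None when absent.
            self.links.append((attr["rel"], attr.get("type"), attr["href"]))


def find_rel_links(html_text):
    """Return a list of (rel, type, href) tuples found in the HTML text."""
    collector = LinkRelCollector()
    collector.feed(html_text)
    return collector.links


# Made-up page source, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Hypothetical newspaper page</title>
<link rel="alternate" type="application/rdf+xml" href="/example/page.rdf">
<link rel="alternate" type="text/plain" href="/example/ocr.txt" />
</head><body></body></html>"""

for rel, media_type, href in find_rel_links(SAMPLE_PAGE):
    print(rel, media_type, href)
```

Because the alternate views are advertised in the page itself, a client needs no separate API documentation to find them, which is the sense in which the web acts as the API.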

Google for Tim Berners-Lee’s “Linked Data” design issues document. Important to use URIs as names of things, use HTTP URIs, give people useful info when they visit, give them links to related / interesting things.