API Workshop Combinatorial Exercises


Event Date: Oct 16 2009 – Oct 17 2009
City: Toronto, Ontario
Country: Canada
Primary Contact Name: William J. Turkel
Contact Email: william.j.turkel@gmail.com

Our task: combine the public, researchers, librarians, content providers, and content in new ways. Our approach: leverage network effects, diversity of participants, parallel search algorithms, and combinatorics.

Five to seven people sit around a table. Each gets to play one card, subject to the following constraints:

  • Everyone plays one and only one card
  • No two roles or projects on the table can match
  • At least two of the cards have to represent projects
  • Only one super power / secret identity card can be played in a given round (these are optional)
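
The constraints above can be checked mechanically. The following is a minimal sketch, not part of the workshop materials: the `Card` type, the `kind` labels, and the `check_round` function are hypothetical names introduced here for illustration. The sample table is the first round recorded later in these notes.

```python
from collections import Counter
from typing import NamedTuple


class Card(NamedTuple):
    player: str
    kind: str  # assumed labels: "role", "project", or "superpower"
    name: str


def check_round(cards):
    """Return a list of rule violations; an empty list means the table is valid."""
    problems = []
    if not 5 <= len(cards) <= 7:
        problems.append("table needs five to seven people")
    # Only detects players with more than one card; absent players are not visible here.
    if any(n != 1 for n in Counter(c.player for c in cards).values()):
        problems.append("everyone plays one and only one card")
    played = Counter((c.kind, c.name) for c in cards if c.kind in ("role", "project"))
    if any(n > 1 for n in played.values()):
        problems.append("no two roles or projects can match")
    if sum(c.kind == "project" for c in cards) < 2:
        problems.append("at least two cards must be projects")
    if sum(c.kind == "superpower" for c in cards) > 1:
        problems.append("only one super power card per round")
    return problems


table = [
    Card("Tim Hitchcock", "role", "historian"),
    Card("Marcel Fortin", "role", "GIS specialist"),
    Card("David Bamman", "role", "textual scholar"),
    Card("Bill Denton", "role", "librarian"),
    Card("Dan Cohen", "project", "Center for History and New Media"),
    Card("Susan Brown", "project", "Canadian Writing Research Collaborative/Orlando"),
    Card("William Wueppelmann", "role", "programmer"),
]
print(check_round(table))
```

Returning every violation at once, rather than stopping at the first, suits the quick negotiation the rules call for: the table can see everything that needs to change in one pass.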

Finding the right balance of cards will require some quick negotiation. The group shouldn’t spend more than a minute or two on this.

One person is designated as the notetaker. He or she will first record each of the participants in the round, and the card that each has played. The notetaker will then do their best to record the ensuing discussion. (Please print! We have to transcribe and edit these notes to put them online.)

The group has exactly twenty minutes to create a plausible scenario about the cards that have been played. The scenario must incorporate both projects, one or more APIs, and as many roles as possible. What you are doing here is describing one possible future project in the digital humanities. Try to include answers to at least some of the following questions:

  • What does the system look like to an ordinary user? To a mashup writer? To an individual researcher? To a research community? To an individual student? To a high school or undergraduate class? To a government worker?
  • What kinds of data or information are provided by online repositories? In what form?
  • What kind of search or discovery features are supported by the API(s)?
  • What kinds of tools are available to programmers who want to use the system?
  • What kinds of tools or other research support are provided to users?
  • Is there any provision for sharing or online collaboration? Any “Web 2.0” features?
  • Is the system open to being mashed-up for non-research or non-approved purposes?
  • Is it possible to incorporate external APIs (maps, timelines, gazetteers, etc.) to provide more functionality? How does this work?
  • Who pays for the system to be built or maintained?
  • What parts of the system can be shared as open source, open content or open access?

When twenty minutes has elapsed, the notetaker brings the notes up to the front of the room, and each of the participants joins a different table for the next round. N.B. There are almost 600,000 different ways to choose six people from a set of thirty.
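
The "almost 600,000" figure is the binomial coefficient for choosing six from thirty, and is easy to confirm. A minimal check, assuming Python 3.8 or later for `math.comb`:

```python
import math

# Number of distinct six-person tables that can be drawn from thirty participants.
tables = math.comb(30, 6)
print(tables)  # 593775
```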

Canadian Writing Research Collaborative x Zotero
Tim Hitchcock (role: historian), Marcel Fortin (role: GIS specialist), David Bamman (role: textual scholar), Bill Denton (role: librarian), Dan Cohen (project: Center for History and New Media), Susan Brown (project: Canadian Writing Research Collaborative/Orlando), William Wueppelmann (role: programmer)

How might CWRC use Zotero to create a collaborative database of citations? Dan: you could easily use the Zotero server’s group functionality and then export that to the CWRC site. Jeremy Boggs is creating a WordPress plugin to show a Zotero collection on WP-based sites. You could also create a plugin to the Zotero client to cc citations to the CWRC server. So, there are a variety of ways to do this. Susan: also wants to be able to geolocate items, to map texts over time and where they get published.

How do we add geographical data to Zotero? Is it stored on other services? Do you have to modify the Zotero data model so that you don’t just stick geographical info in the “extra” field of Dublin Core? Marcel: you would link the Zotero item to info on a dedicated GIS server. We don’t want to burden the average scholar with in-your-face GIS info.

Importance of linked data: good URIs that are exposed and predictable and can be linked to other things. So, in this case, the Zotero URI for the object is linked to the GIS URI on Marcel’s server.

Also, importance of FRBRizing all of this.

Maintenance: each service is responsible for its own persistence. Weak linkages between the services, just based on unique IDs.
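
The weak-linkage idea from this round can be sketched as a link record that holds nothing but two identifiers, with each service looked up independently. This is an illustrative sketch only: the `Link` class, the `resolve` function, the `example.org` URIs, and the sample records are all hypothetical, and real Zotero or GIS services are replaced here by plain dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A weak link: just the two unique identifiers, no copied data."""
    citation_uri: str
    place_uri: str


def resolve(link, citations, places):
    """Look up both ends of the link.

    Either result may be None, since each service is responsible
    for its own persistence and may have dropped the record.
    """
    return citations.get(link.citation_uri), places.get(link.place_uri)


# Hypothetical stand-ins for the citation service and the GIS server.
citations = {"https://citations.example.org/items/42": {"title": "Example citation"}}
places = {"https://gis.example.org/places/7": {"lat": 43.65, "lon": -79.38}}

link = Link("https://citations.example.org/items/42", "https://gis.example.org/places/7")
citation, place = resolve(link, citations, places)
print(citation, place)
```

Because the link stores only identifiers, the citation record never needs a geographic field, which matches the wish not to burden the average scholar with GIS detail.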

Discussed the Open Annotation model (annotation, target, location, etc).

Great Unsolved Mysteries in Canadian History x Online MSc

Walter Lewis (note-taker), John Lutz (Mysteries), Jan Oosthoek (Online MSc) and others.

Mysteries is student oriented (Grade 5 to 2nd year University), evidence-based problem solving exercise. Outputs are largely wrapped in HTML and embed maps, scanned and transcribed documents, images, and 3d models specific to the problem sets.

The data challenge in the Online MSc is to pull in structured data, some of it abstract and index data, to support graduate education. Inputs are largely XML files, or files transformed to XML. The major input is the National Library of Scotland.

Discussion centred around the desire of the Mysteries project to link to additional data points (external census data, maps, biographies, etc.) and to the collections of institutions like the National Library of Scotland with large collections of digital maps (the scope of the discussion didn’t confine itself to the single data source).

Other user perspectives entering the discussion included game development and dramatic adaptation, both of which were naturally drawn to the Mysteries packages. In particular, discussion focused on the fact that gaming is not unlike a series of dramatic scenarios based on different possibilities supported by the mystery evidence. Could the content of the Unsolved Mysteries be exposed in a standard and reliable way, so that a gaming engine could draw on it dynamically rather than having to internalize the material?

Our Ontario x TAPoR
Brian Bell (role: librarian), James Chartrand (project: TAPoR), Dan Cohen (role: historian), Stéfan Sinclair (role: HPC advocate), Alan MacEachern (role: general public), Walter Lewis (Our Ontario)

Discussed the use of SOLR and how many projects have used it recently. But Stéfan notes that Lucene (underneath SOLR) was not made for text analysis; it was made for search. Also discussed: normalization of data.

Might need differently optimized data for different roles: the general public might be very happy with the search potential of a SOLR-enabled collection; a historian wishing to do text analysis might need another version of the db.

One use case for our combination of collections and roles: TAPoR is used to create visualizations of Our Ontario (OO) that some segment of the audience doing a search on OO might consider worth pursuing further. In this scenario, the advanced research (text mining) is presented in widget/sidebar form within the general access/search interface, as opposed to a specialized “researcher interface” (which might not be used much).

TAPoR is fast enough to handle real-time text analysis of material from OO.
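
To make the sidebar idea concrete, here is a minimal sketch of the kind of lightweight analysis such a widget might show: the most frequent terms across the texts returned by a search. This is not TAPoR's API or method; the `top_terms` function, the tiny stopword list, and the sample strings are assumptions made up for illustration.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "is", "on", "for"})


def top_terms(texts, n=5):
    """Count non-stopword terms across a batch of texts and return the n most common."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Invented sample strings standing in for full text returned by a search.
results = ["The mill on the river", "River mill fire"]
print(top_terms(results, n=2))
```

A count like this is cheap enough to compute per query, which is the property the round was relying on when it imagined analysis running alongside ordinary search.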

What types of data can OO provide: metadata in a variety of formats (Dublin Core, etc.), full text as a package.

We focused on the web-based experience, so it’s a little unclear how other programmers would add on, but OO could hand off a slice of their db.

Web 2.0 features might include the ability to tweet/blog what you’ve discovered using the system.

All of this is open source/open content but OO is an aggregate of data from other sources so there’s some trickiness about full sharing w/o checking with root sources.

Importance of building rights (Creative Commons) into these collections from the start.
