General mailing list for discussions and development of PeerLibrary and related software.



Re: [PeerLibrary dev] Annotations and Highlights


  • From: Mitar < >
  • To:
  • Subject: Re: [PeerLibrary dev] Annotations and Highlights
  • Date: Tue, 16 Sep 2014 01:27:43 -0700

Hi!

> My name is Larry, and I'm kind of a newcomer to this project. I was just
> wondering, how are the PDF highlights and annotations stored? Are they
> embedded in the PDFs themselves, or are they stored separately in MongoDB
> and then fetched and rendered client side?

They are stored in the database. You can see their schema here:

https://github.com/peerlibrary/peerlibrary/blob/development/lib/documents/highlight.coffee
https://github.com/peerlibrary/peerlibrary/blob/development/lib/documents/annotation.coffee

Highlights are stored following the Open Annotation standard
(http://www.openannotation.org/), and we use the Annotator codebase for
that (http://annotateit.org/) with some additions from Hypothes.is
(http://hypothes.is/).
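To make "stored following Open Annotation" a bit more concrete: an Open Annotation target describes the highlighted span with selectors, and Annotator/Hypothes.is commonly pair a TextQuoteSelector (the exact text plus some prefix/suffix context) with a TextPositionSelector (character offsets). The Python sketch below is only an illustration under assumptions, not our actual code (the real schema is in highlight.coffee): the helpers `describe` and `anchor` are hypothetical, and where Hypothes.is uses fuzzy matching, this sketch uses plain exact search.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TextQuoteSelector:
    # The highlighted text plus a little context on either side.
    exact: str
    prefix: str = ""
    suffix: str = ""


@dataclass
class TextPositionSelector:
    # Character offsets into the document's extracted text.
    start: int
    end: int


def describe(text: str, start: int, end: int, context: int = 8):
    """Build both selectors for the span text[start:end]."""
    quote = TextQuoteSelector(
        exact=text[start:end],
        prefix=text[max(0, start - context):start],
        suffix=text[end:end + context],
    )
    return quote, TextPositionSelector(start, end)


def anchor(text: str, quote: TextQuoteSelector,
           position: Optional[TextPositionSelector] = None) -> Optional[Tuple[int, int]]:
    """Re-locate a highlight in (possibly re-extracted) text, or return None."""
    # Fast path: the stored offsets still point at the same quote.
    if position and text[position.start:position.end] == quote.exact:
        return position.start, position.end
    # Fallback 1: search for the quote together with its context.
    i = text.find(quote.prefix + quote.exact + quote.suffix)
    if i != -1:
        start = i + len(quote.prefix)
        return start, start + len(quote.exact)
    # Fallback 2: search for the bare quote (exact match only, no fuzziness).
    i = text.find(quote.exact)
    if i != -1:
        return i, i + len(quote.exact)
    return None
```

Storing both selectors means a highlight can still be found if the extracted text shifts a little, for example after re-processing a PDF.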

An annotation can then reference a highlight by linking to it in its
body. The body is HTML, so if it contains an HTML link to a highlight,
the annotation references that highlight.
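A minimal sketch of how the highlights referenced by an annotation could be recovered from its HTML body, using Python's standard `html.parser`. The URL pattern `/h/<id>` and the function name `referenced_highlights` are hypothetical; PeerLibrary's real highlight URLs may look different.

```python
import re
from html.parser import HTMLParser
from typing import List

# Hypothetical URL shape for a highlight; the real routes may differ.
HIGHLIGHT_HREF = re.compile(r"/h/([A-Za-z0-9]+)/?$")


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in the fed HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def referenced_highlights(body_html: str) -> List[str]:
    """Return IDs of highlights linked from an annotation body, in order, deduplicated."""
    parser = _LinkCollector()
    parser.feed(body_html)
    parser.close()
    ids: List[str] = []
    for href in parser.hrefs:
        match = HIGHLIGHT_HREF.search(href)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids
```

Links that do not match the highlight pattern (ordinary external URLs) are simply ignored.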

You can find our integration of highlighting/Annotator here:

https://github.com/peerlibrary/peerlibrary/tree/development/client/lib/annotator

One tricky thing is that we use the Annotator codebase only to create
highlights, so what the Annotator codebase calls an "annotation" is in
fact our highlight. This is a bit confusing, but the comments should
make it clear.

We don't yet parse annotations and highlights from a PDF, nor store them
into one. This could be part of a normalization workflow, but it
currently seems a bit complicated. We do not have any logic to modify
PDFs, so exporting a PDF together with its annotations is not possible
yet (it would be great if we had that!), and reading them from PDFs is
not done at all. See:

https://github.com/peerlibrary/peerlibrary/issues/198
https://github.com/mozilla/pdf.js/issues/5283
https://github.com/mozilla/pdf.js/issues/5252
http://lists.w3.org/Archives/Public/public-openannotation/2014Sep/0000.html

So, as part of PDF normalization, we should remove all annotations and
highlights embedded in the PDF (currently embedded highlights are still
displayed, as is the coloring of PDF links) and replace them with our
own. For example, if a PDF had a highlight, it should become our
highlight (which we can then share with others thanks to the Open
Annotation standard).
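As a sketch of that conversion step, assume an embedded annotation has already been extracted into a dict keyed like the raw PDF annotation dictionary (/Subtype, /QuadPoints, /C, /Contents, as defined in the PDF specification). Text-markup annotations carry /QuadPoints, 8 numbers per quadrilateral in PDF user space (origin at the bottom-left), which can be turned into per-line boxes for a highlight. Extraction itself is out of scope here, and the output field names (`page`, `rects`, `color`, `note`) are hypothetical, not our highlight schema.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # left, top, width, height (top-left origin)

# PDF text-markup annotation subtypes that map naturally onto highlights.
MARKUP_SUBTYPES = {"Highlight", "Underline", "StrikeOut", "Squiggly"}


def quad_points_to_rects(quad_points: List[float], page_height: float) -> List[Rect]:
    """Turn /QuadPoints (8 numbers per quadrilateral, bottom-left origin)
    into axis-aligned boxes with a top-left origin, one per quadrilateral."""
    if len(quad_points) % 8:
        raise ValueError("QuadPoints length must be a multiple of 8")
    rects: List[Rect] = []
    for i in range(0, len(quad_points), 8):
        xs = quad_points[i:i + 8:2]
        ys = quad_points[i + 1:i + 8:2]
        left, right = min(xs), max(xs)
        bottom, top = min(ys), max(ys)
        # Flip the y axis: PDF user space grows upward, screen space downward.
        rects.append((left, page_height - top, right - left, top - bottom))
    return rects


def convert_markup(annot: Dict, page_number: int, page_height: float) -> Optional[Dict]:
    """Convert one embedded markup annotation into a highlight-like dict,
    or return None if it is not a text-markup annotation."""
    if annot.get("Subtype") not in MARKUP_SUBTYPES:
        return None
    return {
        "page": page_number,
        "rects": quad_points_to_rects(annot["QuadPoints"], page_height),
        "color": annot.get("C"),       # color components as stored in /C, if any
        "note": annot.get("Contents"),  # text stored in /Contents, if any
    }
```

A non-empty `note` could in turn become one of our annotations linking to the new highlight.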

For links we still have to decide what to do. One option is to remove
any link embedded in the PDF and convert it to our highlight +
annotation. That way all links to and from the PDF would go through
annotations. But once we have HTML-based publications, not just PDFs,
it might seem a bit strange if clicking a link in the HTML opened an
annotation on the side instead of opening the link directly. Or maybe
that is even good, because annotations on the side will eventually
display a preview.
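If we went with the first option, a link annotation could be split into a highlight covering its /Rect plus an annotation whose HTML body links to both that highlight and the original target, following the convention that an annotation references a highlight by linking to it. The sketch below handles only URI actions (/A with /S /URI, per the PDF specification) and returns None for internal destinations; `split_link`, the `/h/<id>` route, and all output field names are hypothetical.

```python
import html
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # left, top, width, height (top-left origin)


def rect_to_box(rect: List[float], page_height: float) -> Box:
    """Convert a PDF /Rect [x1 y1 x2 y2] (bottom-left origin) to a top-left box."""
    x1, y1, x2, y2 = rect
    left, right = min(x1, x2), max(x1, x2)
    bottom, top = min(y1, y2), max(y1, y2)
    return (left, page_height - top, right - left, top - bottom)


def split_link(annot: Dict, highlight_id: str, page_number: int,
               page_height: float) -> Optional[Tuple[Dict, Dict]]:
    """Turn an embedded URI link into a (highlight, annotation) pair."""
    action = annot.get("A") or {}
    if annot.get("Subtype") != "Link" or action.get("S") != "URI":
        return None  # internal destinations (/Dest, GoTo) are not handled here
    uri = action["URI"]
    highlight = {
        "_id": highlight_id,
        "page": page_number,
        "rects": [rect_to_box(annot["Rect"], page_height)],
    }
    # Hypothetical route; the annotation references the highlight by linking to it.
    highlight_href = f"/h/{highlight_id}"
    body = (
        f'<p><a href="{html.escape(highlight_href)}">This passage</a> links to '
        f'<a href="{html.escape(uri)}">{html.escape(uri)}</a>.</p>'
    )
    return highlight, {"body": body}
```

Escaping the URI keeps characters such as `&` from breaking the stored HTML body.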


Mitar

--
PeerLibrary, facilitating the global conversation on academic literature
https://peerlibrary.org/
http://blog.peerlibrary.org/
https://twitter.com/PeerLibrary


