Many trials articles record their patient enrolment and outcomes as diagrams (flow charts), often as specified by the CONSORT design (http://www.consort-statement.org/consort-statement/flow-diagram). As our example we'll take DOI: 10.1056/NEJMoa1403291 (NEJM vol. 371 no. 13), Bel et al.
The paper is openly available, so we have the right to read it. Under the UK's 2014 Hargreaves legislation we also have the right to mine it ("data analytics").
We plan to extract all the text, possibly normalize it, and build an XML representation. We're not sure whether CONSORT has an XML schema.
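As a rough illustration of the "normalize and build XML" step, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the element names `flowDiagram` and `box`, the `id` values, and the sample strings are invented, not taken from the paper or from any CONSORT schema. Normalization here just means collapsing whitespace.

```python
import re
import xml.etree.ElementTree as ET


def normalize(text):
    """Collapse runs of whitespace (including newlines) and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_flow_xml(boxes):
    """Build an XML tree from (box_id, raw_text) pairs.

    The <flowDiagram>/<box> vocabulary is hypothetical, not a CONSORT schema.
    """
    root = ET.Element("flowDiagram")
    for box_id, raw_text in boxes:
        box = ET.SubElement(root, "box", id=box_id)
        box.text = normalize(raw_text)
    return root


# Invented sample input, standing in for text pulled out of a flow chart.
boxes = [
    ("b1", "  Assessed for\n eligibility "),
    ("b2", "Underwent   randomization"),
]
xml_string = ET.tostring(build_flow_xml(boxes), encoding="unicode")
print(xml_string)
```

If a real schema turns up, only `build_flow_xml` would need to change; the normalization is independent of the output vocabulary.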
If we magnify the PDF we see the lines are clean (no jaggies or blur) so this is a vector diagram.
Isn't that a beautiful arrow? It's infinitely scalable.
So we are going to turn the paper into SVG and extract the diagram.
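Once a page is in SVG, pulling out the raw ingredients of the diagram is mostly a matter of walking the XML. The sketch below uses only the Python standard library and assumes the converter emits plain `<text>` and `<path>` elements in the SVG namespace; the PDF-to-SVG conversion itself is not shown, and the sample SVG is invented for illustration. The helper names `first_number` and `extract_primitives` are my own.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def first_number(value):
    """Return the first number in an SVG coordinate attribute, or 0.0.

    Some converters write one x value per glyph, e.g. x="5 11 17".
    """
    parts = (value or "").replace(",", " ").split()
    return float(parts[0]) if parts else 0.0


def extract_primitives(svg_source):
    """Return (texts, paths) where texts are (x, y, string) tuples
    and paths are the raw 'd' attribute strings."""
    root = ET.fromstring(svg_source)
    texts = []
    for el in root.iter(SVG_NS + "text"):
        content = "".join(el.itertext()).strip()
        if content:
            texts.append((first_number(el.get("x")), first_number(el.get("y")), content))
    paths = [el.get("d", "") for el in root.iter(SVG_NS + "path")]
    return texts, paths


# Invented sample: one vertical line and one label.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <path d="M 10 10 L 10 50"/>
  <text x="5" y="8">Randomized</text>
</svg>"""

texts, paths = extract_primitives(SAMPLE_SVG)
print(texts)
print(paths)
```

The text positions and path data are what a later step would need to work out which labels sit in which boxes and which arrows connect them.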