Corpus for mining.
The results of an OATD search are not easy for machines to follow automatically. As on so many sites, they are arranged as a series of single references. Here are the first five.
NOTE: There is no indication of licence or permissions on this page. I downloaded this by hand (like all the others). As I said before, I have no complaints about the authors.
Author: Anastasia Gousseva
Title: _Investigating the Expansion of Angiosperms during the Late Cretaceous using a Modeling Approach_
Copyright: © Copyright by Anastasia Gousseva 2010
Author: CORINNE ALEXANDRA FAY
Title: _MID-CRETACEOUS pCO2, CARBON-CYCLING AND THE RISE OF THE FLOWERING PLANTS_
Author: Karina I. Neimanis
Title: AN INVESTIGATION OF ALTERNATIVE OXIDASE PRESENCE, EXPRESSION, AND REGULATION IN THE MOSS PHYSCOMITRELLA PATENS
Copyright: None explicit
Includes: Follow this and additional works at: http://scholars.wlu.ca/etd
Part of the Bioinformatics Commons, Biology Commons, and the Molecular Biology Commons
Unfortunately these "Commons" now probably belong to #elsevier
Wilfrid Laurier University
Author: Charles Stuart Piper Foster
Title: Using Phylogenomic Data to Untangle the Patterns and Timescale of Flowering Plant Evolution
Copyright: None explicit
Author: Lisa Maria Ebner
Title: Untersuchung an Angiospermen- und Gymnospermenpollenkörnern aus der Potomac-Formation (Unterkreide) in den USA
Copyright: All rights reserved (site)
NOTE: the text is in German but AMI can still index much of it. We can use Wikipedia later to index the German words.
Files were manually copied to a directory:
(The filenames were edited to remove non-printing and URL-escaped characters).
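The filename clean-up could also be scripted. What follows is only a minimal sketch of one way to do it, not the procedure actually used here (the edits above were made by hand). The function name `clean_filename` and the choice to replace spaces and path separators with underscores are assumptions made for illustration.

```python
import re
import unicodedata
from urllib.parse import unquote


def clean_filename(name: str) -> str:
    """Return a filename with URL escapes decoded and non-printing characters removed."""
    # Decode URL escapes such as %20 (space) or %2F (slash).
    decoded = unquote(name)
    # Drop control and format characters (Unicode categories starting with "C"),
    # which covers tabs, newlines and zero-width characters.
    printable = "".join(
        ch for ch in decoded if not unicodedata.category(ch).startswith("C")
    )
    # Assumption: runs of spaces and path separators become a single underscore.
    return re.sub(r"[\s/\\]+", "_", printable.strip())


if __name__ == "__main__":
    print(clean_filename("Gousseva%20thesis\u200b.pdf"))
```

Decoding before filtering matters: an escaped control character such as `%0A` only becomes visible to the filter once it has been unquoted.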
oatp is called a CProject. The PDFs will all become CTrees after this.
I selected 30 records from OATD. Of these:
- 2 sites failed to respond
- 1 link was broken
- 1 was embargoed until 2021
- 1 required me to sign in even though it was CC BY-NC
- 1 was "not available"
That leaves 24 that I'll work with.
So 20% of the links in OATD (6 of 30) won't give theses.
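The tally above can be checked with a few lines of Python. This is just a sketch of the arithmetic; the dictionary keys are my own shorthand labels for the failure categories listed above.

```python
# Records selected from OATD and the reasons some could not be downloaded.
total_records = 30
failures = {
    "site failed to respond": 2,
    "broken link": 1,
    "embargoed": 1,
    "sign-in required": 1,
    "not available": 1,
}

failed = sum(failures.values())       # records that yielded no thesis
usable = total_records - failed       # records left to work with
failure_rate = failed / total_records

print(f"{usable} usable, {failure_rate:.0%} failed")  # prints "24 usable, 20% failed"
```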
It took me about 30 minutes to download 30 theses. There were at least 15 different repository "styles"; most were clunky and give the clear impression that libraries regard theses with a C20th mind, not a C21st one. I agree that theses are part of the educational and assessment process, but many are funded by research funders and industry, and IMO the theses themselves are excellently produced. It's a shame that they are not better deployed on the sites.
From now on almost everything is automatic ...