So I have a rough idea for a resource that ContentMine might be able to build, which could help with my own research and be useful to people in similar fields.
In ecology we often use databases like GBIF for species occurrence data, where researchers can upload geographical data on where species have been encountered. It works pretty well, but inevitably the majority of researchers don't upload their data to it, so for individual species the information in the database can be pretty patchy compared to what's actually in the published literature. Manually trawling through the literature is obviously a slow and error-prone way of determining the range of a species, so mining could be a really handy way to gain new insights into species distributions.
It could be quite interesting to create a tool that uses regular expressions to scrape geographical information for specific species from papers. My guess is that the user would need to be able to specify that they want to download the XML for any paper that contains both:
- any given value from a series of species names
- any geographical coordinates
They may then have to go through the papers manually and check whether the species of interest actually occurred at that site, or was merely mentioned in the body of the paper as part of a discussion of the previous literature.
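To make the filter a bit more concrete, here is a rough Python sketch of the "species name AND coordinates" check described above. Everything in it is an assumption for illustration: the species list, the function names, and especially the coordinate pattern, which only handles decimal degrees (e.g. "51.5074° N, 0.1278° W" or "-33.8688, 151.2093") and not degrees/minutes/seconds. It also assumes the paper text has already been pulled out of the XML as a plain string.

```python
import re

# Hypothetical species list; in practice the user would supply this.
SPECIES = ["Vulpes vulpes", "Meles meles"]

# Decimal-degree latitude/longitude pairs, with optional degree signs
# and optional N/S and E/W hemisphere letters. This is a deliberately
# narrow pattern and will miss other coordinate formats.
DECIMAL_COORD = re.compile(
    r"(?<![\d.])(-?\d{1,2}\.\d+)\s*°?(?:\s*([NS])\b)?"
    r"\s*[,;]\s*"
    r"(-?\d{1,3}\.\d+)\s*°?(?:\s*([EW])\b)?"
)


def species_pattern(names):
    """Build one regex matching any binomial in `names`,
    including the abbreviated-genus form (e.g. "V. vulpes")."""
    alternatives = []
    for name in names:
        genus, epithet = name.split()
        alternatives.append(
            rf"(?:{re.escape(genus)}|{re.escape(genus[0])}\.)\s+{re.escape(epithet)}"
        )
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")


def find_coordinates(text):
    """Return a list of (lat, lon) tuples found in `text`."""
    coords = []
    for m in DECIMAL_COORD.finditer(text):
        lat, lon = float(m.group(1)), float(m.group(3))
        if m.group(2) == "S":
            lat = -abs(lat)
        if m.group(4) == "W":
            lon = -abs(lon)
        # Discard number pairs that cannot be valid coordinates.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            coords.append((lat, lon))
    return coords


def paper_matches(text, names):
    """True if the text mentions a listed species AND contains coordinates."""
    has_species = species_pattern(names).search(text) is not None
    return has_species and bool(find_coordinates(text))
```

A paper passing `paper_matches` only tells you that a name and some coordinates appear in the same document, not that the species was recorded at those coordinates, so the manual check is still needed.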
The user could then potentially map known occurrences of their species from the extracted coordinates using GIS (Geographic Information System) software.
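As a sketch of what the output for GIS might look like, the snippet below turns a list of confirmed occurrence records into GeoJSON, a format that GIS packages such as QGIS can generally open. The record fields (`species`, `lat`, `lon`, `source`) and the function name are my own assumptions, not an existing ContentMine format. Note that GeoJSON stores positions as longitude first, then latitude.

```python
import json


def occurrences_to_geojson(records):
    """Convert occurrence records to a GeoJSON FeatureCollection.

    Each record is assumed to be a dict with hypothetical keys
    'species', 'lat', 'lon' and 'source' (e.g. a paper identifier).
    """
    features = []
    for rec in records:
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [rec["lon"], rec["lat"]],
            },
            "properties": {
                "species": rec["species"],
                "source": rec["source"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Example with a made-up record, printed as GeoJSON text.
example = [
    {"species": "Vulpes vulpes", "lat": 51.5074, "lon": -0.1278,
     "source": "example-paper"},
]
print(json.dumps(occurrences_to_geojson(example), indent=2))
```

Keeping the source paper in the properties means each mapped point can be traced back to the publication it came from.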
Does this sound possible?