Library partially supporting analysis of bitmapped images, mainly for scientific/technical diagrams, maps, charts, tables, etc.
imageanalysis transforms raw bitmaps (bmp) into semantic pixel maps, either monochrome ("binarized") or "posterized" with a small number of colours. The process is very messy and heuristic. We have used it successfully for phylogenetic trees and for chemical structure diagrams. Currently (2017-02) we are working on X-Y plots.
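As a rough illustration of what binarization and posterization mean here, the sketch below uses only the JDK's `java.awt.image.BufferedImage`. It is not the imageanalysis API: the class `BinarizeSketch`, its method names, the fixed grey threshold, and the per-channel quantisation are all assumptions made for this example, and the library's own heuristics may differ considerably.

```java
import java.awt.image.BufferedImage;

// Hypothetical illustration only; not part of imageanalysis.
public class BinarizeSketch {

    /** Returns a monochrome copy: pixels darker than the threshold become black, the rest white. */
    public static BufferedImage binarize(BufferedImage source, int threshold) {
        int width = source.getWidth();
        int height = source.getHeight();
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = source.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                int grey = (r + g + b) / 3; // simple unweighted grey value
                result.setRGB(x, y, grey < threshold ? 0xFF000000 : 0xFFFFFFFF);
            }
        }
        return result;
    }

    /** Returns a copy in which each colour channel is reduced to `levels` evenly spaced values (levels >= 2). */
    public static BufferedImage posterize(BufferedImage source, int levels) {
        if (levels < 2) {
            throw new IllegalArgumentException("levels must be at least 2");
        }
        int width = source.getWidth();
        int height = source.getHeight();
        BufferedImage result = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int rgb = source.getRGB(x, y);
                int r = quantize((rgb >> 16) & 0xFF, levels);
                int g = quantize((rgb >> 8) & 0xFF, levels);
                int b = quantize(rgb & 0xFF, levels);
                result.setRGB(x, y, (r << 16) | (g << 8) | b);
            }
        }
        return result;
    }

    // Maps a 0..255 channel value onto one of `levels` values spread over 0..255.
    private static int quantize(int channel, int levels) {
        int bucket = channel * levels / 256;
        return bucket * 255 / (levels - 1);
    }
}
```

With `levels = 2` the posterized image has at most eight colours. A fixed threshold is the simplest possible choice; fuzzy or antialiased input usually needs something more adaptive.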
After binarization (or posterization) the diagrams consist of pixelIslands, which can be analyzed heuristically. Methods include:
- OCR for characters (mainly through Tesseract)
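To make the idea of a pixel island concrete, here is a minimal sketch that groups the "on" pixels of a binarized image into connected sets using a breadth-first flood fill. The class `PixelIslandSketch`, the `boolean[][]` input, and the choice of 8-connectivity (diagonal neighbours count as touching) are assumptions for illustration and do not reflect the library's own classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only; not part of imageanalysis.
public class PixelIslandSketch {

    /**
     * Finds 8-connected groups of "on" pixels.
     * The grid is indexed as on[y][x]; each island is a list of {x, y} pairs.
     */
    public static List<List<int[]>> findIslands(boolean[][] on) {
        int height = on.length;
        int width = height == 0 ? 0 : on[0].length;
        boolean[][] visited = new boolean[height][width];
        List<List<int[]>> islands = new ArrayList<>();
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (!on[y][x] || visited[y][x]) {
                    continue;
                }
                // Start a new island and grow it breadth-first.
                List<int[]> island = new ArrayList<>();
                Deque<int[]> queue = new ArrayDeque<>();
                queue.add(new int[] {x, y});
                visited[y][x] = true;
                while (!queue.isEmpty()) {
                    int[] p = queue.remove();
                    island.add(p);
                    for (int dy = -1; dy <= 1; dy++) {
                        for (int dx = -1; dx <= 1; dx++) {
                            int nx = p[0] + dx;
                            int ny = p[1] + dy;
                            if (nx < 0 || ny < 0 || nx >= width || ny >= height) {
                                continue;
                            }
                            if (on[ny][nx] && !visited[ny][nx]) {
                                visited[ny][nx] = true;
                                queue.add(new int[] {nx, ny});
                            }
                        }
                    }
                }
                islands.add(island);
            }
        }
        return islands;
    }
}
```

Each island can then be examined on its own, for example by its bounding box or pixel count, before deciding whether it looks like a character, a line, or a symbol.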
The results are held in an SVG DOM based on svg. This means that, in principle, the results can be transformed into higher-level objects such as tables, chemistry, flowCharts, etc. The main problems are:
- fuzzy diagrams (JPG, antialiasing)
- unclear semantics ("l" vs "1", etc.)
However, with diagrams of simple to medium complexity it is often possible to extract "almost all" information.
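The idea of holding results in an SVG DOM, mentioned above, can be sketched with the JDK's built-in `org.w3c.dom` classes. This is a stand-in chosen so the example runs without extra dependencies; the project uses its own svg library, whose API is not shown here. The class `SvgResultSketch` and its methods are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical illustration only; uses the JDK DOM, not the project's svg library.
public class SvgResultSketch {

    static final String SVG_NS = "http://www.w3.org/2000/svg";

    /** Creates an empty SVG document with the given pixel dimensions. */
    public static Document newSvg(int width, int height) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element svg = doc.createElementNS(SVG_NS, "svg");
            svg.setAttribute("width", String.valueOf(width));
            svg.setAttribute("height", String.valueOf(height));
            doc.appendChild(svg);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("could not create a DOM builder", e);
        }
    }

    /** Records the bounding box of one island as an outlined rectangle. */
    public static Element addBox(Document doc, int x, int y, int w, int h) {
        Element rect = doc.createElementNS(SVG_NS, "rect");
        rect.setAttribute("x", String.valueOf(x));
        rect.setAttribute("y", String.valueOf(y));
        rect.setAttribute("width", String.valueOf(w));
        rect.setAttribute("height", String.valueOf(h));
        rect.setAttribute("fill", "none");
        rect.setAttribute("stroke", "red");
        doc.getDocumentElement().appendChild(rect);
        return rect;
    }

    /** Records a recognized character (for example from OCR) at a position. */
    public static Element addText(Document doc, int x, int y, String value) {
        Element text = doc.createElementNS(SVG_NS, "text");
        text.setAttribute("x", String.valueOf(x));
        text.setAttribute("y", String.valueOf(y));
        text.setTextContent(value);
        doc.getDocumentElement().appendChild(text);
        return text;
    }
}
```

Because the output is ordinary SVG, it can be viewed in a browser next to the source image to check the extraction, and later stages can query it for groups of boxes or text that form a table or chart.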
2017-02 ported to GitHub
2017-02 builds under Travis