EDAM (Ecostation Diversity Access Monitor) is an ongoing research project at the Berkeley Institute for Data Science. I did research there when the project began, working on an inference model to predict the biodiversity completeness of a given data repository.
Motivation
As large biodiversity collections and environmental data become accessible online, global research communities have unprecedented access to datasets. However, these datasets are often incomplete or inaccessible, which makes them practically impossible to use in aggregate.
As a result, biodiversity research suffers. It is almost impossible to answer questions about endangered species counts or conservation efforts across ecosystems when every region has its own dataset and every dataset is incompatible with the others.
The EDAM project aims to build a platform that summarizes these data to enable a holistic, side-by-side comparison of ecosystems. To do this, it automates data processing to compile species lists and associated food webs for participating ecostations, estimates the completeness of those lists and webs, calculates their similarity, and provides a web-accessible visualization tool for comparing them.
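The similarity step is not spelled out here. As a minimal sketch, assuming species lists are plain sets of names and food webs are sets of (predator, prey) pairs, the Jaccard index (items shared by both lists divided by all distinct items) is one common way to score how alike two ecostations are. The function name and the example species and edges are hypothetical, not taken from EDAM.

```python
from collections.abc import Hashable, Set


def jaccard_similarity(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard index: size of the intersection divided by size of the union.

    Hypothetical helper for illustration; works for species lists (sets of
    names) and food webs (sets of (predator, prey) edges) alike.
    """
    union = a | b
    if not union:
        return 1.0  # two empty lists are treated as identical
    return len(a & b) / len(union)


# Hypothetical species lists for two ecostations.
island_a = {"Anolis carolinensis", "Rattus rattus", "Pandion haliaetus"}
island_b = {"Rattus rattus", "Pandion haliaetus", "Fregata magnificens"}
print(jaccard_similarity(island_a, island_b))  # 2 shared / 4 distinct -> 0.5

# Hypothetical food-web edges, stored as (predator, prey) pairs.
web_a = {("Pandion haliaetus", "Mugil cephalus")}
web_b = {("Pandion haliaetus", "Mugil cephalus"), ("Rattus rattus", "Anolis carolinensis")}
print(jaccard_similarity(web_a, web_b))  # 1 shared / 2 distinct -> 0.5
```

Because the index only depends on set membership, the same function can compare species lists and food webs; a weighted measure would be needed if abundances mattered.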
We hope scientists will be able to use this platform to answer key conservation questions, and more accurately model links between climate, biodiversity, human activities and other changes over time.
We chose islands as our initial test models because they are tractable, geographically well defined, and self-contained, so their biodiversity profiles change relatively little over time. We collected (incomplete) data from four openly available online biodiversity data repositories: GBIF, iDigBio, iNaturalist, and EOL. Using a multimodal Bayesian inference model, we showed that biodiversity data completeness is limited by distance to researchers, locally available research funding, and the ability and willingness to participate in data-sharing networks. With this model, we could predict the percentage completeness of any given island dataset and estimate the expected number of species in a region, along with its variance across datasets. We built a Django web app to display these results. An interactive demo is available at https://github.com/BIDS-collaborative/EDAM/
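The Bayesian model itself is not described in enough detail to reproduce here. To illustrate what "percentage completeness" means, the sketch below uses a different, much simpler technique: the classical Chao1 richness estimator, which is non-Bayesian and is not the model we used. Chao1 estimates the expected number of species from how many species were recorded exactly once (singletons, f1) or exactly twice (doubletons, f2); completeness is then observed species divided by estimated species. It assumes each repository's occurrence records have already been reduced to one species name per record; the function names, repository keys, and records are hypothetical.

```python
from collections import Counter
from itertools import chain


def chao1(occurrences: list[str]) -> float:
    """Classical Chao1 estimate of total species richness from occurrence records."""
    counts = Counter(occurrences)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in counts.values() if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    # Bias-corrected form used when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


def completeness(occurrences: list[str]) -> float:
    """Fraction of the estimated species pool that has actually been recorded."""
    s_obs = len(set(occurrences))
    return s_obs / chao1(occurrences) if s_obs else 0.0


# Hypothetical records for one island, pooled across two repositories.
records = {
    "GBIF": ["a", "a", "b"],
    "iNaturalist": ["c", "c", "d"],
}
pooled = list(chain.from_iterable(records.values()))
print(chao1(pooled))         # 4 observed + 2**2 / (2 * 2) -> 5.0
print(completeness(pooled))  # 4 / 5 -> 0.8
```

Pooling records this naively can count the same observation twice (GBIF, for example, also indexes iNaturalist research-grade observations), so in practice duplicates would need to be removed before estimating.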