charlesr wrote: I thought the whole idea of Recorder and the Gateway was that data could flow between the two without the worry of duplication? I thought the Gateway was a cut-down version of the data model used in Recorder (i.e., the NBN Data Model) and was therefore (theoretically) able to use the same mechanisms Recorder uses to prevent duplication and unwanted editing of data. Updating data would be handled in the same way we update data in Recorder: new data are added, changed data are updated, and already-existing data (duplicates) are discarded. Is this not technically possible using Gateway-held data? The custodian field is only really needed if you need to edit the data; otherwise the data are uneditable by default (in Recorder, at least).
While it's true the NBN Web Services are useful in allowing NBN-held data to be integrated into a local dataset, sometimes actually having the data stored locally is the better option, particularly where performance/bandwidth is an issue or extra metadata needs to be added to individual records.
I agree that this issue is another great example of the need for a way of segregating particular sets of data.
Cheers,
Charles
Hi Charles,
Just wanted to pick up on this a bit.
The Gateway database is a collation of datasets from many sources, including recording packages, spreadsheets and custom databases. We assign our own unique keys to all taxon occurrence records within the Gateway database, but we also store the original provider key. Provider keys must be unique within a dataset so that they can serve as identifiers. However, there is no checking between datasets - they're treated completely independently within the database.
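To make the keying scheme concrete, here is a minimal sketch of the idea in Python. The class and method names are hypothetical (this is not the actual Gateway code): a Gateway-wide key is assigned to every record, the original provider key is preserved, and provider-key uniqueness is only enforced within a single dataset.

```python
import itertools

class GatewayStore:
    """Toy sketch (hypothetical names) of the keying scheme described above:
    the Gateway assigns its own unique key to every occurrence record, keeps
    the original provider key, and enforces provider-key uniqueness only
    WITHIN a dataset - datasets are independent of one another."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._records = {}   # gateway_key -> record
        self._seen = set()   # (dataset_id, provider_key) pairs already loaded

    def load_record(self, dataset_id, provider_key, record):
        # Provider keys must be unique within a dataset...
        if (dataset_id, provider_key) in self._seen:
            raise ValueError(
                f"duplicate provider key {provider_key!r} in dataset {dataset_id!r}")
        self._seen.add((dataset_id, provider_key))
        # ...but the same provider key may legitimately appear in another
        # dataset, so a Gateway-wide unique key is assigned as well.
        gateway_key = next(self._counter)
        self._records[gateway_key] = {
            "dataset": dataset_id,
            "provider_key": provider_key,
            **record,
        }
        return gateway_key
```

For example, provider key "1" can be loaded into two different datasets without conflict, but loading it twice into the same dataset is rejected - which is exactly why provider keys alone cannot be used to detect duplicates between datasets.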
The format of provider keys varies widely, from running integers to the 16-character fixed-width format of Recorder. Only a very small proportion of the data on the Gateway comes from Recorder (most comes from schemes and societies via BRC), which means that only that proportion will have guaranteed globally unique keys. So we would have no chance at all of detecting duplicates between datasets based on keys.
You could say that the NBN Gateway database model is a cut-down version of the Recorder model, but this would be a bit misleading. Yes, we have fewer tables (what database doesn't!), but the real difference is that the Gateway database is optimised for spatial queries, mapping and reporting to the website. There is no facility to edit individual occurrence records on the Gateway. Data loading is on a per-dataset basis. When we update a dataset, the old version is completely deleted from the database before the new version is appended. Our database is as close to the NBN data model as, say, Biobase. It captures the usual concepts (site, species, record, people), shares a few table names for historical reasons and implements the species dictionary, but that's about it.
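The per-dataset load cycle described above (delete the old version, then append the new one, with no per-record editing) could be sketched as follows. The schema and function names are hypothetical, chosen only to illustrate the pattern; SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical minimal schema illustrating the per-dataset load cycle:
# updating a dataset deletes the old version entirely before the new
# version is appended - there is no facility to edit individual records.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE occurrence (
        gateway_key  INTEGER PRIMARY KEY,
        dataset_id   TEXT NOT NULL,
        provider_key TEXT NOT NULL,
        taxon        TEXT,
        UNIQUE (dataset_id, provider_key)
    )""")

def reload_dataset(conn, dataset_id, records):
    """Replace a whole dataset: delete the old copy, append the new one."""
    with conn:  # single transaction: the swap is all-or-nothing
        conn.execute("DELETE FROM occurrence WHERE dataset_id = ?",
                     (dataset_id,))
        conn.executemany(
            "INSERT INTO occurrence (dataset_id, provider_key, taxon) "
            "VALUES (?, ?, ?)",
            [(dataset_id, pk, taxon) for pk, taxon in records])
```

Reloading a dataset with fewer records than before leaves no orphans behind, because the old version is removed wholesale rather than reconciled record by record.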
I appreciate it is easier to work with data locally in many circumstances. However, the risk is that these data will be downloaded, edited and then find their way back into 'the system', exacerbating the national problem of duplicates. I'm not a Recorder expert, but it appears to me that a combination of a separate survey and the custodian field is an ideal mechanism for creating read-only copies of Gateway data in Recorder. How this will be implemented hasn't been completely thought out and we would certainly appreciate any ideas you (all) may have.
The appearance of NBN web services adds an interesting twist. I guess the web service/download comparison is analogous to the difference between downloading MP3s and listening to streamed media. I hope that as client tools for the web services develop, it will become transparent to the user whether the data they are analysing are sitting on their own hard disk or an external NBN node. There's lots to do before that, and a bit of a cultural shift is required before this goal is realised, though.
Andy