1

Topic: Data upload guidance - Darwin Core

I have just seen the most recent NBN Network News arrive in my inbox, which includes an article and links to the "Darwin Core" data upload templates.  I have had a look through and my first impression is that it is very complex.  Saying that "We would prefer datasets to be sent as Darwin Core Archives, and the metadata as an EML document, however, we appreciate that not all of our data partners have the technical resource to be able to create these documents" is probably a significant understatement.

I have a couple of queries / requests regarding the data required in the dataset.

I notice there is only one date field for the record, we no longer have the "start date - end date - date type" fields available.  Is the Atlas limited to only one date per record now?  What happens to records with a range of dates?

With the NBN Exchange Format files we had the option to run the dataset through the NBN Exchange Format Validator, which was very useful for a final check of the datafile and helpful in catching the odds and ends which would prevent the file from a complete upload.  Is there a similar checking system available for the Darwin Core format files?

Looking at the headers, I notice that one of the columns is headed "identificationVerificationStatus".  Looking at the list of example terms for this field I see the options are: "Correct, Considered correct, Not accepted, Unable to verify, Incorrect, Unconfirmed, Plausible, Not reviewed".  Again, given the recent series of discussions about unverified data being displayed on the Atlas, surely the Atlas should by default not map or display any records where the status is not one of "Correct" or "Considered correct"?
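As an illustration of the default suggested above, here is a minimal Python sketch that keeps only rows whose identificationVerificationStatus is "Correct" or "Considered correct". It is not how the Atlas itself works; the sample rows and the function name `verified_only` are hypothetical.

```python
import csv
import io

# The two statuses named in the post as acceptable for display by default.
ACCEPTED = {"correct", "considered correct"}


def verified_only(rows):
    """Keep rows whose identificationVerificationStatus is an accepted value."""
    return [
        row
        for row in rows
        if (row.get("identificationVerificationStatus") or "").strip().lower()
        in ACCEPTED
    ]


# Hypothetical sample data, for illustration only.
SAMPLE = """scientificName,identificationVerificationStatus
Bombus terrestris,Correct
Bombus hortorum,Unconfirmed
Apis mellifera,Considered correct
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept = verified_only(rows)
print([row["scientificName"] for row in kept])
```

The comparison ignores case and surrounding spaces, so "Considered Correct" and "considered correct" are treated the same.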

The spreadsheet guides to the template and metadata are not the easiest things to look through to see what is what.  With the Gateway we had the "Guide to the NBN Exchange Format" provided as a Word document with the term "required" against certain column headers, so we could see at a glance what was needed as a minimum when creating a datafile.  It would be very useful and probably more user friendly if this document could be updated in light of the changes seen with Darwin Core.  If the old NBN document could be edited to highlight the minimum required Darwin Core header equivalents for each required field, it would make the new format, and the transition to it, more understandable - eg "[Required] NBN Format 'TaxonVersionKey' = Darwin Core 'taxonID'".

2

Re: Data upload guidance - Darwin Core

Hi Matt,

The section of Network News that you have quoted about Darwin Core Archives and EML documents is separate from the csv templates that are provided. An explanation of what a Darwin Core Archive is appears in the article: "a Darwin Core Archive is a single dataset (zip) that contains all of the information for species occurrence records. It is a collection of CSV/text files (that adhere to Darwin Core Terms), with a metadata XML to describe the files in the Archive." The templates simply adhere to Darwin Core headings and are not the 'archive', which is what the data partners with more technical resource will be able to create. The templates provided shouldn't be too different from completing an NBN Data Exchange Format template, once you've got used to the new wording.

With regards to your specific questions:

1) date field: In the guidance document, it gives an explanation for how to give a range of dates (see cell D16: "2007-11-13/15" is the interval between 13 Nov 2007 and 15 Nov 2007)

2) The NBN Record Cleaner is still in existence, and will allow csv and excel spreadsheets to be loaded. So, in theory the new Darwin Core terms should be able to be mapped into the record cleaner, and then the spreadsheet loaded to allow these checks to be carried out. I admittedly haven't trialled this, so I would be interested to hear if anyone has had any success with this. We are discussing implementing an internal checking system directly into the atlas, called the 'sandbox'. This is functional on the Australian atlas, so we hope to be able to transfer this functionality across at some point.

3) I think the discussion around verified and unverified data was about allowing all data to be displayed but ensuring that its verification status is correctly marked. This field allows data partners to mark records at different levels of verification.

4) In the Darwin Core guidance spreadsheet, row 1 is merged to mark the fields that are 'required', 'required but with an option' (such as choosing between grid refs and lat/long), and 'suggested', with a separate tab that lists the other available options for Darwin Core that people may wish to use.

Row 7 in this document also gives the NBN Data Exchange Format equivalent term, so that users can see how their data will map across.
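For anyone preparing files by script, the date-range notation in point 1 can be read back as a start date and an end date. The following is a minimal Python sketch, not an official NBN tool: the function name `parse_event_date` is hypothetical, and it assumes the start is always a full YYYY-MM-DD date, with the end either full or shortened as in the "2007-11-13/15" example.

```python
from datetime import date


def parse_event_date(value):
    """Return (start, end) dates for a single day or a start/end interval."""
    start_text, _, end_text = value.strip().partition("/")
    start = date.fromisoformat(start_text)
    if not end_text:
        # A single date: start and end are the same day.
        return start, start
    # A shortened end inherits the leading parts it omits from the start,
    # so "15" after "2007-11-13" becomes "2007-11-15".
    start_parts = start_text.split("-")
    end_parts = end_text.split("-")
    full_end = start_parts[: 3 - len(end_parts)] + end_parts
    end = date.fromisoformat("-".join(full_end))
    if end < start:
        raise ValueError(f"end date is before start date in {value!r}")
    return start, end


start, end = parse_event_date("2007-11-13/15")
print(start, end)
```

A range whose end falls before its start raises an error, which is a simple check to run before sending a file.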

This spreadsheet does look busy, but that is because we have tried to give lots of examples to help users work out how to use this new format. I think that it would be hard to get all of this info into an easy to use word document. However, if there is general support for turning this spreadsheet into a word document, I can do this.

I hope this helps to explain the new documentation a little.
Ella

3

Re: Data upload guidance - Darwin Core

Hi Ella,
This is a bit of a shock to the system, especially as so many of us use Recorder and the NBN Exchange Format files to submit our data, and have tried and tested systems for preparing and checking our data.

I think we understand the need for the adoption of international standards, especially as some of us asked for NBN data to be included in GBIF. However, for those of us who do not have the technical resources to produce a Darwin Core Archive, some of us will struggle to change to the new system either through a lack of resources or technical expertise. Please remember that many data managers are volunteers and do not have the level of technical knowledge that is sometimes assumed by software developers.

Therefore, please can you manage this change slowly and with support that is simple and easy to follow. I appreciate that all the information is there in the spreadsheets, but it is not easy to follow. Spreadsheets were not intended to be used for text, so please can you convert the information into a text document with tables and use the spreadsheets to provide examples.

We appreciate that your workload and deadlines are very demanding, but this is important. If you get it wrong and the system is too complex you may find that some groups will not submit their records. You have an army of very experienced recorders and data managers, so why not use their expertise and ask some of them to comment on the draft guidance, it might help to produce a system which suits everyone's needs.

Christine

4

Re: Data upload guidance - Darwin Core

Hi Christine,

Thanks for your comments. I will put some time aside to transfer the guidance into a Word document, and I hope to make this available within the next few weeks.

I would like to reiterate, that a Darwin Core *Archive* is different from a spreadsheet that uses Darwin Core standard terms. It is the Archive that is a bit more technical, and would require some expertise to create, whereas the Darwin Core spreadsheet template should be a little more straightforward. I appreciate that there are not currently systems within our Network that will export directly into this format as has been done with the NBN Data Exchange Format in the past, so completing these spreadsheets will involve a little more work for some, but once it has been done a few times, I'm confident that it will become easier.

I have not gone into any detail on the help pages or in the Network News article about how to create the DwC *Archive*, as those with the technical know-how should be able to manage this, however, I am happy for questions to be sent through if anyone is struggling to create the Archive file. The detail that is outlined in the documentation is for the simplified Darwin Core CSV templates and metadata.

It did take me a little while to get my head around Darwin Core too, so I understand certain levels of confusion around this new process! I will definitely try to make the guidance documents clearer.

And, yes, as Christine has suggested, if there are any technical experts in the Network who would like to be involved in helping to re-create the guidance documents, please get in touch (e.vogel@nbn.org.uk)!

Ella

5 (edited by Matt Smith 15-06-2017 13:30:32)

Re: Data upload guidance - Darwin Core

I'm glad it's not just me that is a bit overwhelmed by the spreadsheets.  If we could have a Word document in a similar layout to the old Gateway documentation that clearly indicates what the absolute minimum requirements are for the new upload format, that would be great.

6

Re: Data upload guidance - Darwin Core

Hi all,

I have started preparing a word guidance document to explain the Darwin Core CSV template. I thought I'd share it on here for some initial feedback before I spend more time on the document.

Things to bear in mind:
1) This is the guidance document to accompany the spreadsheet template. Users will still have to complete the spreadsheet, but this guidance word document should be more user-friendly than the spreadsheet guidance.

2) This is not complete. I will be adding a second table with some extra suggested fields. I have just trialled this here with the required fields.

3) The introductory wording can be (and probably will be) changed a little when it's all finalised.

4) The Darwin Core term names are all hyperlinked to the descriptive Darwin Core page for more info.

I would welcome some initial feedback on this layout, please.

Thanks all,
Ella

Post's attachments

Guide to NBN Atlas Darwin Core template_word.docx 185.86 kb, 10 downloads since 2017-06-19 


7

Re: Data upload guidance - Darwin Core

That looks a good start, though given the complexities of data protection legislation and the "Open Access" nature of data on the Atlas, I think I will be using a generic fill for the Recorder name for everything uploaded to the Atlas.

8

Re: Data upload guidance - Darwin Core

For Darwin Core archive creation I strongly recommend the GBIF Archive Assistant tools here:
http://tools.gbif.org/dwca-assistant/

Even if you don't actually use the tool to create an archive, the page still gives a useful impression of what exactly the Core is about. Note that it's the "occurrences" option (tick box at the top) which we need, rather than the "taxon" one, which is more about defining and describing taxa.

I found it interesting as an intellectual exercise and a chance to think about what exactly occurrences and taxonomy really are; however, it remains pretty complicated compared with the basic spreadsheet outputs that I'm used to.

Linda

9

Re: Data upload guidance - Darwin Core

Thank you Ella, this is much easier to follow and a good start in explaining which fields are required and the format.

I understand Matt's reservations with respect to the use of recorder names. In our experience most recorders like to have their name associated with a record and we could probably just amend our terms and conditions so that individual recorders could specify whether they wish their name to be published.

Christine

10

Re: Data upload guidance - Darwin Core

chrisjohnson wrote:

In our experience most recorders like to have their name associated with a record and we could probably just amend our terms and conditions so that individual recorders could specify whether they wish their name to be published.

That's fine for new records, but getting the OK for historic data or older records may be tricky. You can't just change Ts&Cs without their approval.

11 (edited by Darwyn Sumner 22-06-2017 08:22:25)

Re: Data upload guidance - Darwin Core

If I was trying to persuade a small recording scheme to submit records to the NBN Atlas (and I am in that position) the guidance notes would be a lot simpler and contain less jargon.
In essence what the NBN Atlas needs is:
1. Occurrence dataset (conforming to the familiar 4Ws)
2. A description of that dataset (metadata)
3. If the small recording scheme hasn't submitted records before then NBN needs a few extra details:

    Organisation name
    Name and email address of a contact person for the organisation
    Organisation logo
    A representative photo for the organisation
    Organisation address
    A link to organisation's website
    A short description

Occurrence datasets can be in either NBN Data Exchange Format (as output from Recorder 6) or an updated format (see Ella's document)
Metadata comprises 14 pieces of information about the dataset which appear on the Atlas. Users may require to store more than this as a record of their work (date, version etc.)

Items 1. and 2. should be bundled together into a .zip file
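Bundling the two files can be done with any zip tool. As one option, here is a minimal Python sketch using the standard `zipfile` module. The file names and the function name `bundle_submission` are hypothetical, and the result is a plain zip of the two files; it does not produce the descriptor XML that a full Darwin Core Archive carries.

```python
import tempfile
import zipfile
from pathlib import Path


def bundle_submission(occurrence_file, metadata_file, zip_path):
    """Write the occurrence and metadata files into a single .zip file."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in (occurrence_file, metadata_file):
            # Store each file under its bare name, without the folder path.
            zf.write(path, arcname=Path(path).name)
    return zip_path


# Demonstration with throwaway files; the names and contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    occurrences = tmp / "occurrences.csv"
    metadata = tmp / "metadata.csv"
    occurrences.write_text("scientificName,eventDate\n", encoding="utf-8")
    metadata.write_text("datasetName\n", encoding="utf-8")
    archive = bundle_submission(occurrences, metadata, tmp / "submission.zip")
    with zipfile.ZipFile(archive) as zf:
        bundled_names = sorted(zf.namelist())

print(bundled_names)
```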

12

Re: Data upload guidance - Darwin Core

Hi all,

I have updated the data upload templates and guidance documents to make them more user-friendly. They are available to download from this page https://nbnatlas.org/help/share-data-nbn-atlas/ . Anyone who is submitting datasets needs to simply complete the two templates, with help from the guidance documents, and email them through to the NBN Secretariat.

I hope that you agree that they are more straightforward than the original documents. As always, feedback is welcomed.

Ella

13

Re: Data upload guidance - Darwin Core

Thanks for the updated template document, it is much more understandable.  I have a couple of questions.

1) coordinateUncertaintyInMeters.  The definition says "Definition: The horizontal distance (in meters) from the given decimalLatitude and decimalLongitude, or Grid Reference, describing the smallest circle containing the whole of the Location" and the Example says ""30" (reasonable lower limit of a GPS reading under good conditions if the actual precision was not recorded at the time)"

What do you suggest we do with those records where we have no information about this value or what to enter?  The great majority of records I receive, and certainly all of the "historical" records I deal with, have no indication of this value, as these were generated by reference to an OS map, not a GPS system.

How does this map over to the old "Precision" system used by the old Gateway?  As part of the process of preparing a dataset I had to generate a "Precision" value for all of the records.  How would you map over something with a Gateway Precision value of 1000 onto this new field?  Some exemplar guidance is needed here I think, i.e. Gateway Precision 1000 = Atlas Precision Value xx, Gateway Precision 100 = Atlas Precision yy.

2) identificationVerificationStatus.  Definition: An indicator of the extent to which the taxonomic identification has been verified to be correct.
Examples: 'Verified', 'Unverified'.  Why is the Atlas encouraging the submission of unverified records?  Surely any record submitted to the Atlas should be verified and the type of verification described in the dataset metadata?

14

Re: Data upload guidance - Darwin Core

Hi Matt,

As you have suggested, the term 'CoordinateUncertaintyInMeters' is equivalent to the old Gateway 'Precision' term (this is marked on the guidance sheet, column 2), so you can use it in the same way as you would have used 'precision' in the past.
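For those converting existing exports by script, renaming the column headers might look like the minimal Python sketch below. It covers only the two term pairs mentioned in this thread ('TaxonVersionKey' to 'taxonID' from the first post, and 'Precision' to 'coordinateUncertaintyInMeters' from this reply); the guidance sheet remains the reference for the rest. The function name `rename_headers` and the sample row are hypothetical.

```python
import csv
import io

# Only the two pairs mentioned in this thread; extend from the guidance sheet.
HEADER_MAP = {
    "TaxonVersionKey": "taxonID",
    "Precision": "coordinateUncertaintyInMeters",
}


def rename_headers(csv_text, header_map=HEADER_MAP):
    """Return the CSV text with known NBN headers renamed to Darwin Core terms."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Headers that are not in the map are left exactly as they were.
    rows[0] = [header_map.get(header, header) for header in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical one-row export; the key and site name are placeholders.
SAMPLE = "TaxonVersionKey,Precision,Site\nEXAMPLEKEY0001,1000,Hypothetical Wood\n"

converted = rename_headers(SAMPLE)
print(converted)
```

The values are copied across unchanged, so a Gateway Precision of 1000 stays 1000 under the new header, in line with using the field the same way as before.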

Yes, the NBN Atlas can hold unverified datasets, and all of those that are unverified are clearly marked on the dataset page. The vast majority of discussions we have had with stakeholders, steering & working groups have strongly supported the inclusion of unverified records. This is seen as a good way of decreasing the "time to market" of data, as some records can currently wait a very long time for verification.  It will be up to the data user to determine whether or not they want to use unverified data.

Ella