1

Re: Data flow, attribution and the Gateway

In our mission to gradually share more of our data online via the Gateway we have an issue that has us scratching our heads: when we provide data directly, we get attributed in the "Datasets available" box at the bottom of the Gateway pages. This is good.

However, when data are submitted to a national scheme and provided on the Gateway via that route, we get no attribution (other than, perhaps, in the metadata), even though we may have provided a substantial chunk of data. We have provided data to several schemes and yet a link to us will not appear at the bottom of pages where that data appears on the site. This is not so good for a number of reasons:

1) It encourages uploading of duplicate datasets by LRCs
2) It's a disincentive for co-operation with national schemes
3) It doesn't alert users to the presence of the LRC
4) It makes it look like we're providing less data than we really are

I think it makes sense that national schemes should be custodians of national datasets and should provide data to the Gateway. I also think it makes sense for record centres to be credited when data they have provided forms a component part of the national dataset. So perhaps it would make sense if, when viewing a dataset on the Gateway, attribution were given to the organisations that contributed the component parts of a national dataset, and vice-versa? It might just solve the four problems I mentioned above and help smooth the flow of data.

Charles

Charles Roper
Digital Development Manager | Field Studies Council
http://www.field-studies-council.org | https://twitter.com/charlesroper | https://twitter.com/fsc_digital

2

Re: Data flow, attribution and the Gateway

Hi Charles
Good points. We have been working a bit on the data provider profiles and the outputs of this are now available on the test site (www.testnbn.net). This does now allow multiple organisations to be listed for a single dataset. See for example:
http://www.testnbn.net/datasetInfo/taxonDataset.jsp?refID=2&action=0&dsKey=GA000171
This may not give quite the profile you are looking for, but we are open to ideas on how it can be improved (particularly bearing in mind that potentially quite a number of record centres could have contributed to a single scheme). It does not list the dataset under the contributors page - maybe that would be an improvement?
We are just about to start to consult on the site as it stands.
Steve

3

Re: Data flow, attribution and the Gateway

I guess there are two issues here. The first is to enable multiple organisations to be listed on the Gateway for one dataset, which as Steve says is in hand. The second would be mechanisms for, and guidance on, data flow before the data reaches the Gateway, to make sure that appropriate attribution is carried with the data as it is disseminated. A scheme may have contributions from multiple record centres, so some effort is required to keep track of the list.

John van Breda
Biodiverse IT

4

Re: Data flow, attribution and the Gateway

I have sympathy with the concern that LRCs will appear to be doing less than they are. However, I do not support the idea of having them indicated specifically as a source on the Gateway when they are part of a National Recording Scheme database. Why should they be specifically singled out for such favourable treatment?

I presume it is only the appearance on the Gateway sources that is the issue. I would expect the LRC, as for any other supplier of a dataset, to be acknowledged as a source when a download of data is requested. I would also expect the metadata to reflect major sources, which in the case of the caddis database will be the Environment Agency.

I imagine that if a government funding agency requested it then the Gateway could show how many records had come via LRCs - but I accept that the public at large cannot see that.

There is an issue regarding duplication of records. Whilst the software can handle the storage of all this stuff, it does really give a poor impression when the LRC supplies data to a client. I am increasingly feeling that an approach being used by rECOrd, of offering maps of the data (including Gateway data) as an alternative to reams of records, is a pragmatic solution, and there are software tools available too. However, in the end we probably have to wait for the arrival of truly intelligent software to do the job - or a robust pathway of record flow that involves LRCs.

Ian Wallace
Caddis Recording Scheme National Organiser

5

Re: Data flow, attribution and the Gateway

trichoptera wrote:

I have sympathy with the concern that LRCs will appear to be doing less than they are. However, I do not support the idea of having them indicated specifically as a source on the Gateway when they are part of a National Recording Scheme database. Why should they be specifically singled out for such favourable treatment?

I presume it is only the appearance on the Gateway sources that is the issue. I would expect the LRC, as for any other supplier of a dataset, to be acknowledged as a source when a download of data is requested. I would also expect the metadata to reflect major sources, which in the case of the caddis database will be the Environment Agency.

I'll agree with Ian's comments here. From the point of view of running a (small?) recording scheme, we get data from multiple sources. I do have some datasets from LRCs included in my database, but the species covered by my scheme (Tachinid flies) are not as "popular" as some groups, so some of the datasets I have from LRCs are quite small. A few individuals have provided me with datasets many times larger than some I get from LRCs - should these individuals get the same or more "recognition" than is accorded to an LRC that has provided some data?

I do keep track of the source of data for each of my records, so it is possible for me to flag which of "my" records come from which contributor, be they individuals, LRCs, Local Natural History Societies etc. If we were to go down this route then the NBN would have to provide a mechanism to manage this - e.g. if we want "Borsetshire LRC" to show up as a working link on the NBN then every person or organisation who has had data from that LRC would have to code up their own dataset with the code for "Borsetshire LRC" to make sure it worked correctly. I foresee a list of contributors several times longer than the list of datasets.
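
To make the bookkeeping concrete, here is a minimal sketch in Python of tallying a scheme's records by contributor code. Everything in it is an assumption for illustration: the record layout, the "source" field, codes such as BORSETSHIRE_LRC and the helper contributor_summary are all hypothetical, and nothing here reflects how the NBN Gateway actually stores or codes contributors.

```python
from collections import Counter

# Hypothetical records: each one carries a contributor code alongside the
# sighting. Field names, species labels and codes are illustrative only.
records = [
    {"species": "Species A", "source": "BORSETSHIRE_LRC"},
    {"species": "Species B", "source": "INDIVIDUAL_A"},
    {"species": "Species A", "source": "INDIVIDUAL_A"},
    {"species": "Species C", "source": "BORSETSHIRE_LRC"},
    {"species": "Species A", "source": "INDIVIDUAL_A"},
]


def contributor_summary(records):
    """Count records per contributor code, largest contributor first."""
    counts = Counter(record["source"] for record in records)
    return counts.most_common()


summary = contributor_summary(records)
for source, count in summary:
    print(f"{source}: {count} record(s)")
```

A summary like this only works if every scheme uses the same code for the same contributor, which is the mechanism the NBN would need to manage. It also gives one line per contributor, individuals included, so the list grows with every new source.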

Dataflow was also mentioned. I have duplicate records that have arrived via various routes. What would be the priority in deciding "whose" record goes up to the NBN? If the record here from the LRC was the "duplicate", would I need to amend my procedures to make sure that the LRC got enough "recognition" when I uploaded a dataset?

Matt Smith
Tachinid Recording Scheme