1

Re: Record duplication

Hello everyone,

This might be a bit of a newbie question but I have not been able to find it discussed on the forum. 

Just wondering what the state of play is regarding avoiding a single record being duplicated while it is shared between various data custodians.  For instance a ranger might send a record to their local recorder, the national recording group, an LRC etc.  Then all of these parties may well exchange records and so this one record becomes duplicated unless measures have been put in place to avoid it. 

More than happy for this to generate a discussion but I suppose my direct question is: have such measures been put in place and are they effective enough to justify the time taken to implement them?  Or is duplication just generally accepted as better than not sharing at all?

Thanks in advance for any opinions.

Mike Beard

Mike Beard
Natural Course Project Officer
Greater Manchester Local Records Centre

2

Re: Record duplication

And does anyone else see my message twice, or is it only me?

Mike Beard
Natural Course Project Officer
Greater Manchester Local Records Centre

3

Re: Record duplication

I'm only seeing it once.

Charles Roper
Digital Development Manager | Field Studies Council
http://www.field-studies-council.org | https://twitter.com/charlesroper | https://twitter.com/fsc_digital

4

Re: Record duplication

In theory the concept behind Recorder is submit once, distribute many times. The idea is that users should be encouraged to send their records to just one place, from where they are distributed to all who need them. We are no doubt a long way from this, partly because the concept is not widely understood or promoted, and partly because not everyone is using a system where the concept can be consistently applied. My hope is that one day we will get there, and I believe that all those involved should be doing what they can to encourage the approach. For a start we should be encouraging data sharing through systems like Recorder and avoiding, where we can, importing data with new keys where the data is known to exist elsewhere with unique identifiers (e.g. in systems like Recorder and MapMate).

Having data in a system twice is probably not that big a problem apart from the impact it has on space etc. and the effect on statistics. In fact, data which is truly duplicated can be identified and action taken to remove or ignore it. However, very often the same data is recorded in different systems in different ways. For example, different taxonomy can be used or data can be consolidated in various ways, making duplicates difficult to find.
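
To make that distinction concrete, here is a minimal Python sketch. It is not how Recorder or any other system actually does it. The field names (`key`, `taxon`, `date`, `gridref`) and the one-entry synonym table are assumptions made up for illustration. True duplicates share a key; probable duplicates only show up once the taxon name and grid reference have been normalised.

```python
from collections import defaultdict

# Hypothetical synonym table mapping older names to one preferred name.
SYNONYMS = {
    "parus caeruleus": "cyanistes caeruleus",
}

def preferred_name(taxon):
    """Lower-case, collapse whitespace, then map synonyms to one name."""
    name = " ".join(taxon.lower().split())
    return SYNONYMS.get(name, name)

def find_duplicates(records):
    """Return (exact, probable) lists of duplicate groups.

    exact: records sharing the same key.
    probable: records with different keys but the same normalised
    taxon, date and grid reference.
    """
    by_key = defaultdict(list)
    by_content = defaultdict(list)
    for rec in records:
        by_key[rec["key"]].append(rec)
        fingerprint = (
            preferred_name(rec["taxon"]),
            rec["date"],
            rec["gridref"].replace(" ", "").upper(),
        )
        by_content[fingerprint].append(rec)
    exact = [group for group in by_key.values() if len(group) > 1]
    probable = [
        group for group in by_content.values()
        if len({rec["key"] for rec in group}) > 1
    ]
    return exact, probable
```

Records that have been consolidated (for example, several sightings rolled up into one) would not be caught by a fingerprint like this, which is exactly the hard case.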



Mike

Mike Weideli
R6 Consortium

5

Re: Record duplication

Apologies for the delay in replying; I had 'Bad behaviour' and proxy issues that have now been resolved. Thanks Mandy, Charles et al.

I think seeing my forum post twice only happened on that one day, and perhaps only for me?  Log it under 'strange quirk' I guess and see if it happens again.

Mike, excellent answer.
- Presumably the NBN gateway is the intended distribution mechanism? 
- Anybody have ideas regarding where the one place to receive records should be?  The NBN, BRC, local recording centres?
- Should we have a system similar to the one used for barcodes, whereby a central body hands out unique prefix numbers to organisations (including a country prefix if necessary), with the numbers thereafter being of the organisation's choosing? We could apply this universal numbering system to Recorder, MapMate, or even Excel spreadsheets.

(T'other) Mike

Mike Beard
Natural Course Project Officer
Greater Manchester Local Records Centre

6

Re: Record duplication

Mike, a typical data flow might be:

1. A local recorder sends data to their local group or LRC, often with the LRC helping to facilitate this process
2. A local group sends its data to the LRC, often with the LRC helping to facilitate this process
3. LRC exchanges data with National Schemes and Societies and NBN Gateway

The NBN Gateway is *one* distribution mechanism, depending on requirements. It typically holds a more "low-fi" version of the data and acts as a signposting mechanism to the data custodians, who should be able to provide complete versions of the data. Local Record Centres are an ideal place to send records to because they are typically resourced to gather, clean, collate and distribute data, which is no small or easy task. National Schemes and Societies can also often accept data directly, depending on resources. As a local recorder or group, it's also good to be able to visit your LRC, get to know the people working there and strike up a relationship. LRCs can then, at the request of the data provider (i.e., the local group or local recorder), send clean and hopefully verified data on to National Schemes and/or provision the data via the NBN Gateway.

Use of Recorder in all this helps because it uses NBN keys, the closest thing we have to the barcodes you mention. Every copy of Recorder goes out with a unique prefix code called the Site ID. The NBN Gateway uses a stripped-down version of the NBN Data Model, which also uses NBN keys. Increasing numbers of National Schemes and Societies use Recorder. MapMate does not use NBN keys but does use a similar unique ID system which can be converted to NBN keys (but it's tricky to convert NBN keys to MapMate). If you're not using Recorder, I think you can apply to JNCC for a Site ID code, which makes it possible to create your own unique keys.
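
As a rough illustration of the prefix idea, here is a small Python sketch of issuing keys from an organisation prefix plus a local counter. The 8-character prefix and 8-character local part are assumptions for the example, as are the prefix values used with it; check the real key format before relying on any of this.

```python
import itertools

PREFIX_LEN = 8  # assumed length of the organisation prefix
LOCAL_LEN = 8   # assumed length of the locally assigned part

class KeyGenerator:
    """Issue keys of the form <prefix><zero-padded counter>."""

    def __init__(self, prefix, start=1):
        if len(prefix) != PREFIX_LEN:
            raise ValueError(f"prefix must be {PREFIX_LEN} characters")
        self.prefix = prefix
        self._counter = itertools.count(start)

    def next_key(self):
        """Return the next key; uniqueness relies on the prefix being unique."""
        return f"{self.prefix}{next(self._counter):0{LOCAL_LEN}d}"

def split_key(key):
    """Split a key into (prefix, local part), e.g. to look up its owner."""
    if len(key) != PREFIX_LEN + LOCAL_LEN:
        raise ValueError("unexpected key length")
    return key[:PREFIX_LEN], key[PREFIX_LEN:]
```

Two organisations with different prefixes can both start counting from 1 without ever producing the same key, and anyone receiving a record can read the prefix back off the key with `split_key`. The sketch does not guard against the counter outgrowing the local part.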

Charles Roper
Digital Development Manager | Field Studies Council
http://www.field-studies-council.org | https://twitter.com/charlesroper | https://twitter.com/fsc_digital

7

Re: Record duplication

Charles,

It is also possible for there to be a more complicated data flow, particularly for an organisation which transcends both the local record centre network and the interests of recorders/groups. This possibly leads to a triplication* of data, where a recorder could supply records to that organisation, to the LRC and to a national scheme, with no organisation certain that the others have them and each claiming its own custodianship. I don't claim to have a solution, as all the possibilities seem to have problems, but, like Mike, I would be interested to know what people think.

Gordon

*if quadriplication is a word, I can think of a situation where data on our system is on at least 3 others - possibly more.

Gordon Barker
Biological Survey Data Manager
National Trust

8

Re: Record duplication

Before I saw this new thread http://forums.nbn.org.uk/viewtopic.php?id=1817 I had it in my mind that the NBNG could be an excellent way to distribute our records to the many LRCs and VC Recorders throughout Scotland.   

A unique record key is very beneficial for this, so it is a shame that not all datasets have one. Attaching a Recorder-style record key would enable identification and processing of duplicate records, assuming the key can be retained when imported into other Recorder databases.

The addition of a data provider field (or a dataset key in each record? or a lookup table of owners of Recorder/JNCC/MapMate record key prefixes?) would make it easier to identify who needs to be acknowledged, whose permission might be required, where to ask for further details, and who to tell about errors in the data. Then only the data provider makes the update, which is distributed during everyone's next update cycle. Data provider could also be used as a filter so that you only upload records onto the NBN that 'belong' to you. Another useful field for handling replication would be 'last changed', so that you can avoid exchanging entire datasets and send just new or changed records.
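
To show what I mean by those two fields, here is a minimal Python sketch. The field names (`key`, `provider`, `last_changed`) and the provider codes are hypothetical; this is not how the NBNG or Recorder works, just the shape of the idea.

```python
from datetime import datetime

def records_to_send(records, provider, last_sync):
    """Select records owned by `provider` that changed after `last_sync`."""
    return [
        rec for rec in records
        if rec["provider"] == provider and rec["last_changed"] > last_sync
    ]

def apply_updates(store, updates):
    """Merge updates into `store` (a dict keyed by record key).

    An incoming record replaces the stored copy only if it is newer,
    so re-sending the same record never creates a duplicate.
    """
    for rec in updates:
        current = store.get(rec["key"])
        if current is None or rec["last_changed"] > current["last_changed"]:
            store[rec["key"]] = rec
    return store
```

The sender would call `records_to_send` with its own provider code and the date of the last exchange; the receiver would call `apply_updates` on whatever arrives. Both steps depend on the record key surviving the import, as noted above.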

Having thought through what field changes would be needed to use the NBNG as a replication portal, and considering how big the NBNG data holding has grown, I am guessing that these changes are not likely to happen and I need to rethink!

Mike Beard
Natural Course Project Officer
Greater Manchester Local Records Centre