Re: A failsafe method to check for duplicate data entries - how?
Hi
I thought I was beginning to understand R6 and how it works, but my slight confidence has been somewhat shattered at a critical time (when I need to get sets of records, free of duplicates or worse, to the people who will write species accounts for the next bird report).
The problem:
I now have many, possibly several thousand, duplicate records (ie same species, same date, same location, same abundance, same observer). This has occurred despite my system of putting new files for import in one folder and moving them to a different folder after import.
I now know that I get some duplicate records because observers send records to several public databases that I incorporate into my R6 database (records sent directly to me, or via BBS, Birdtrack, etc). Also, I have not found a foolproof method of checking whether records have been successfully imported or not.
The other day I found that the number of records in R6 for the year 2007 was about 24,000, but the number I expected from the records in my Excel import files was around 32,000. I was unable to account for the difference and wanted to check the records in R6 to see what was missing. I used the Report Wizard and my newly acquired skill of selecting the vice counties and a year, but I have been unable to select 'Observer' from the list of variables in 'Additional filters'. I used 'Surname (Determiner)' instead and found that it does not select all the records a particular observer has had imported: for one person for whom I had imported 3,862 records, the Report Wizard identified only 204.
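For what it's worth, one way I could narrow down where the missing ~8,000 records fall is to tally both sides by observer and compare the counts. Below is a minimal sketch of that idea, assuming both the Excel import files and the R6 data can be exported to CSV; the column name "observer" and the sample data are purely illustrative, not R6's actual field names:

```python
import csv
from collections import Counter
from io import StringIO

# Hypothetical CSV exports; in practice these would be read from files
# with open(...) rather than inline strings.
EXCEL_EXPORT = """observer,species,date
J Smith,Turdus merula,2007-05-01
J Smith,Parus major,2007-05-02
A Jones,Turdus merula,2007-05-03
"""

R6_EXPORT = """observer,species,date
J Smith,Turdus merula,2007-05-01
A Jones,Turdus merula,2007-05-03
"""

def tally_by_observer(text):
    """Count records per observer in a CSV export."""
    return Counter(row["observer"] for row in csv.DictReader(StringIO(text)))

expected = tally_by_observer(EXCEL_EXPORT)   # what the import files contain
actual = tally_by_observer(R6_EXPORT)        # what actually reached R6

# Observers whose R6 count falls short of the import-file count
shortfall = {obs: expected[obs] - actual.get(obs, 0)
             for obs in expected if expected[obs] > actual.get(obs, 0)}
print(shortfall)  # {'J Smith': 1}
```

The same comparison could be run per year or per vice county instead of per observer, whichever grouping makes the missing batches easiest to spot.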
I requested some time ago that the import process be amended so that R6 checks for duplicate records between the file being imported and the existing data (not just for duplicates within the import file, which is what it seems to do currently). As far as I am aware, there has been no progress on that feature.
Question:
Is there a method, eg an XML report or some other add-on, that can have parameters set so that it checks for duplicates and weeds them out? The only method I currently know of is to select the set of records for each date and delete each duplicate, one record at a time.
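In case it helps anyone suggest an approach: outside R6, the check I have in mind amounts to building a composite key from the five fields (species, date, location, abundance, observer) and flagging any record whose key has already been seen. A minimal sketch, assuming the records can be exported to CSV; the column names and sample data are illustrative only:

```python
import csv
from io import StringIO

# Hypothetical CSV export; the first two rows are deliberate duplicates.
SAMPLE = """species,date,location,abundance,observer
Turdus merula,2007-05-01,SU1234,2,J Smith
Turdus merula,2007-05-01,SU1234,2,J Smith
Parus major,2007-05-02,SU1234,1,J Smith
"""

KEY_FIELDS = ("species", "date", "location", "abundance", "observer")

def find_duplicates(rows, key_fields=KEY_FIELDS):
    """Split rows into (unique, duplicates): the first occurrence of each
    composite key is kept, later repeats are flagged for deletion."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        # Normalise whitespace and case so trivial differences don't
        # hide a real duplicate.
        key = tuple(row[f].strip().lower() for f in key_fields)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            unique.append(row)
    return unique, duplicates

rows = list(csv.DictReader(StringIO(SAMPLE)))
unique, dupes = find_duplicates(rows)
print(len(unique), len(dupes))  # 2 1
```

The flagged rows could then be written out for review before anything is deleted, rather than deleting one record at a time by hand.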
How do colleagues with much bigger sets of records than I have check to ensure they do not have duplicate records? How do you get rid of real duplicates?
I would be grateful for any suggestions!
Cheers, Ian