A bit more consideration, Mike.
I have just looked at the data and fortunately the worst examples are all in a single survey of fungus records. There are nearly 17,000 survey events in this survey, but if I create a query which groups the events by LOCATION_KEY, LOCATION_NAME, SPATIAL_REF and the three VAGUE_DATE fields, this figure is reduced to 1094 distinct events, which gives an indication of the scale of the problem. Most of these contain multiple samples, which include both duplicate tree species as associates and duplicate biotope records. Ideally I would also like to group by COMMENT, but my test query in the Access db will not allow grouping by text fields (I think this is probably a restriction of SQL Server or of ODBC).
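To illustrate the grouping, here is a minimal Python sketch that does the same thing on rows already pulled out of the database. It is only a sketch under assumptions: the three vague date column names (VAGUE_DATE_START, VAGUE_DATE_END, VAGUE_DATE_TYPE) and SURVEY_EVENT_KEY are my guesses at the column names, and the function names are made up for the example. Working on the rows in memory also sidesteps the text-field restriction, since COMMENT can simply be added to the list of grouping fields.

```python
from collections import defaultdict

# LOCATION_KEY, LOCATION_NAME and SPATIAL_REF come from the message above.
# The three VAGUE_DATE_* names and SURVEY_EVENT_KEY are assumed column names.
GROUP_FIELDS = (
    "LOCATION_KEY", "LOCATION_NAME", "SPATIAL_REF",
    "VAGUE_DATE_START", "VAGUE_DATE_END", "VAGUE_DATE_TYPE",
)


def group_events(events, fields=GROUP_FIELDS):
    """Map each distinct combination of field values to its event keys."""
    groups = defaultdict(list)
    for event in events:
        signature = tuple(event.get(field) for field in fields)
        groups[signature].append(event["SURVEY_EVENT_KEY"])
    return dict(groups)


def suspected_duplicates(events, fields=GROUP_FIELDS):
    """Keep only the groups that contain more than one event."""
    return {
        signature: keys
        for signature, keys in group_events(events, fields).items()
        if len(keys) > 1
    }
```

Calling `suspected_duplicates(rows, GROUP_FIELDS + ("COMMENT",))` would tighten the grouping to events whose comments also match exactly.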
We are doubly fortunate, since all of this data derives from a single recorder, so that is at least one complication I do not have to worry about. I am wondering if you have any idea of a cunning way I could overcome the difficulty in the possible case where there were multiple recorders for any of the samples. I can only think that I would need to step through each of the suspected duplicate survey events, ensuring that each has exactly the same event recorders associated, or perhaps create a function which sorts and concatenates the recorder name keys, and then group on that as well.
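The sort-and-concatenate idea might look something like the following in Python. This is a rough sketch, not tested against real data: it assumes the event recorder links can be fetched as (event key, name key) pairs, and both function names are invented for the example.

```python
from collections import defaultdict


def recorder_signature(name_keys):
    """Sort and concatenate recorder name keys so that order and repeats do not matter."""
    return "|".join(sorted(set(name_keys)))


def split_by_recorders(event_keys, event_recorder_rows):
    """Split one group of suspected duplicate events by their recorder signature.

    event_recorder_rows is an iterable of (event_key, name_key) pairs
    (an assumed shape for the event recorder links).
    """
    recorders = defaultdict(list)
    for event_key, name_key in event_recorder_rows:
        recorders[event_key].append(name_key)

    subgroups = defaultdict(list)
    for event_key in event_keys:
        # Events with no recorder rows get the empty signature "".
        subgroups[recorder_signature(recorders[event_key])].append(event_key)
    return dict(subgroups)
```

Only events that land in the same subgroup have exactly the same set of recorders, so only those would remain candidates for merging.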
After that I guess I would have to step through the events, checking the samples within for similar correspondences, and where I find duplicates I would then need to transfer all the occurrences to the first sample within the first event of each group.
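As a sketch of that transfer step, assuming the duplicate samples have already been collected into lists with the intended survivor first, and that the occurrence-to-sample links are held as a simple dictionary (both shapes and the function name are assumptions for illustration):

```python
def reassign_to_first_sample(sample_groups, occurrence_sample):
    """Point every occurrence at the first sample of its duplicate group.

    sample_groups: lists of sample keys judged to be duplicates, survivor first.
    occurrence_sample: dict mapping occurrence key -> current sample key.
    Returns the updated mapping and the sample keys left empty by the move.
    """
    target = {}
    for group in sample_groups:
        for sample_key in group:
            target[sample_key] = group[0]

    # Samples that are not in any duplicate group are left where they are.
    moved = {
        occurrence_key: target.get(sample_key, sample_key)
        for occurrence_key, sample_key in occurrence_sample.items()
    }
    emptied = sorted(s for s, survivor in target.items() if s != survivor)
    return moved, emptied
```

The emptied samples (and any events left with no samples) would then be candidates for deletion once the occurrences have been dealt with.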
I would then check each resultant sample for duplicate occurrences, check each occurrence to see if it is recorded as a related occurrence, and reassign all the related occurrences of a given taxon to a single example, marking all the others for deletion (perhaps setting the determination/validation fields appropriately?).
Finally I would need to ensure that each of the marked occurrences now no longer features in any related occurrence, before deleting them and any associated determination records etc.
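The occurrence-level steps (pick one occurrence per taxon in each sample, repoint the related occurrences, then check nothing marked for deletion is still referenced) could be planned along these lines before touching the database. Again this is only a sketch: the tuple shapes, the "first one seen survives" rule and the function name are all assumptions, and it only produces a plan rather than making any changes.

```python
def plan_occurrence_merge(occurrences, relations):
    """Plan the merge of duplicate occurrences within each sample.

    occurrences: (occurrence_key, sample_key, taxon_key) tuples, after the
                 occurrences have been moved to their surviving samples.
    relations:   (from_occurrence_key, to_occurrence_key) tuples.
    Returns (keep_map, new_relations, to_delete).
    """
    survivor = {}  # (sample_key, taxon_key) -> first occurrence key seen
    keep_map = {}  # every occurrence key -> the occurrence that replaces it
    for occurrence_key, sample_key, taxon_key in occurrences:
        first = survivor.setdefault((sample_key, taxon_key), occurrence_key)
        keep_map[occurrence_key] = first

    # Repoint each relation at the survivors, dropping repeats and self-links.
    new_relations, seen = [], set()
    for source, related in relations:
        pair = (keep_map.get(source, source), keep_map.get(related, related))
        if pair[0] != pair[1] and pair not in seen:
            seen.add(pair)
            new_relations.append(pair)

    to_delete = sorted(k for k, kept in keep_map.items() if k != kept)

    # Final safety check: nothing marked for deletion is still related to anything.
    still_referenced = {key for pair in new_relations for key in pair}
    assert still_referenced.isdisjoint(to_delete)
    return keep_map, new_relations, to_delete
```

With two fungi in one sample each linked to its own copy of the same oak record, the plan keeps the first oak occurrence, points both fungi at it and lists the second oak for deletion.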
Writing this message has just reminded me why I haven't attempted it yet. And none of this answers Hilary's question as to what she should do with new data of this type. As far as I can see, the only sensible option for importing large datasets of fungus or lichen data is to relegate the associations to a comment field. This is a major deficiency in the import wizard, since it gives the impression that it can import related occurrences, but what it actually does is hugely overcomplicate the organisation of the imported data.
Rob Large
Wildlife Sites Officer
Wiltshire & Swindon Biological Records Centre