1

Topic: Corrupted taxon occurrence comments

We have just encountered a batch of records in R6 where the observation comments make no sense. They were all imported from three tab-separated text files in September 2009 using the import wizard.

Looking at the original text files, it appears that in some cases (but not all) the comment from one record has been imported into the comment field of the following record.

There seem to be two failure modes:

  • One is where a lengthy comment of one record has been truncated and re-used as the comment of the following record, replacing that record’s original comment. In this case the truncated comment is always 110 characters long and ends with three full stops, i.e. "...". These can be easily found, e.g.

    SELECT COMMENT FROM TAXON_OCCURRENCE WHERE [COMMENT] Like "*..." AND Len([COMMENT]) = 110;

    We have 86 such records, each being a truncated version of the prior record’s comment.

  • The other mode is where the comments have "slipped" by one record, each acquiring the comment of the record before it. This is impossible to check for automatically. I don’t know how many of these we have.

Both failure modes have been seen within the same import, but neither is consistent. Some long comments haven’t been re-used in truncated form and some comments haven’t slipped onto the next record.

A re-importation today of one of the three text files (the largest) has gone in perfectly, so whatever the problem was (with the import wizard?) appears to have gone away.

I'm hoping the problem was restricted to just these three imports, which can easily be reimported using the current wizard.

I just thought I'd issue an alert in case this has affected anyone else unknowingly. It took us more than five years to notice!

Keith

2

Re: Corrupted taxon occurrence comments

Did the data go through Excel at any point? This looks like something we have had in the past, caused by linefeed characters in a spreadsheet cell, which split the contents of that cell and push the remainder onto the next row.
The same could happen in a text file format, but that would tend to disrupt all the subsequent data and make the text file impossible to import.
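
One way to test this theory against the original text files is to look for physical lines whose number of tab-separated fields differs from the header row, since a line break inside a cell leaves a fragment with too few fields. The following is only a rough sketch: the function name find_ragged_lines is made up for illustration, and it assumes the file has a header row and that fields are not quoted.

```python
def find_ragged_lines(text, delimiter="\t"):
    """Return (line_number, field_count) for each physical line whose
    field count differs from the header line's field count.

    Assumption: the first line is a header and fields are unquoted,
    so an embedded line break shows up as a short fragment line.
    """
    lines = text.splitlines()
    if not lines:
        return []
    expected = lines[0].count(delimiter) + 1
    ragged = []
    # Line numbers are 1-based; the header is line 1.
    for number, line in enumerate(lines[1:], start=2):
        fields = line.count(delimiter) + 1
        if fields != expected:
            ragged.append((number, fields))
    return ragged
```

An empty result would suggest that stray line breaks in the text files were not the cause; any line reported would be worth inspecting by hand.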

Rob Large
Wildlife Sites Officer
Wiltshire & Swindon Biological Records Centre

3

Re: Corrupted taxon occurrence comments

Yes, the records started out in Excel, and carriage returns may have played a part in comments slipping onto the next record, but I don't think they have anything to do with the 110-character truncation issue. On closer examination this morning, this is weirder than I first thought.

Here's an example from two adjacent records.

Obs key: SR0001490000EVHM
Sample Location Name: Priory CP
Obs comment: Supplied_location: Warren Villas NR; Supplied_comment: one by to the gate near the bend in the River Ivel this afternoon;

Obs key: SR0001490000EVHN
Sample Location Name: Warren Villas NR
Obs comment: Supplied_location: Warren Villas NR; Supplied_comment: one by to the gate near the bend in the River Ivel t...

The full obs comment (seen on record SR0001490000EVHM) belongs on record SR0001490000EVHN, because the Location name and all the other details of that record match the original record. SR0001490000EVHN, however, actually contains the truncated comment of 110 characters ending with "...".

So the truncated version isn't on the following record as I first thought, but on the intended record. The full comment has however moved onto the record before the intended record according to the Obs key sequence numbering.
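
The pattern described above could be checked across a whole export with something like the sketch below. It is only an illustration under stated assumptions: the records are taken to be (obs key, comment) pairs pulled from TAXON_OCCURRENCE, sorting by key is taken to reproduce the sequence numbering, and the names is_truncated and find_displaced_comments are hypothetical.

```python
ELLIPSIS = "..."
TRUNCATED_LENGTH = 110  # length of the truncated comments seen in this thread


def is_truncated(comment):
    """True if the comment is exactly 110 characters and ends with '...'."""
    return len(comment) == TRUNCATED_LENGTH and comment.endswith(ELLIPSIS)


def find_displaced_comments(records):
    """Find truncated comments whose full text sits on the preceding record.

    records: iterable of (obs_key, comment) pairs (assumed export format).
    Returns a list of (truncated_key, full_key) pairs.
    """
    ordered = sorted(records, key=lambda record: record[0])
    pairs = []
    for (prev_key, prev_comment), (key, comment) in zip(ordered, ordered[1:]):
        if not is_truncated(comment):
            continue
        # Drop the trailing '...' and compare the rest with the previous comment.
        stem = comment[:-len(ELLIPSIS)]
        if len(prev_comment) > len(stem) and prev_comment.startswith(stem):
            pairs.append((key, prev_key))
    return pairs
```

For the two example records, this would be expected to pair SR0001490000EVHN (truncated) with SR0001490000EVHM (full). It says nothing about comments that have slipped without being truncated.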

We did have 474 instances of 110-character comments ending with "..." before I started reimporting records.

Keith

4

Re: Corrupted taxon occurrence comments

I've never seen that ellipsis (...) produced anywhere by Recorder or Excel, or indeed any other software I am familiar with. Clearly it is a logical way to mark something which has been truncated, but I wonder if it was something done by somebody, rather than an automated process.

Are you able to trace back to the original data source & see where in the process it occurred?

Nothing more useful than that to offer I'm afraid.

Rob Large
Wildlife Sites Officer
Wiltshire & Swindon Biological Records Centre