1

Topic: Problem with exported data

This year I have been experiencing difficulties exporting data to Excel for one of the two VCs for which I am Recorder. The data are imported to R6 in the same way and the imports have worked and I value the checking that R6 does so that the data are 'clean' when imported.

I am using Excel 2016 and am not very impressed with it. I used to extol the virtues of Excel for many years (at work I had to use Lotus 123 or whatever it was called - a pig to use) but the latest version of Excel has had many stupid problems like, at times, not letting me copy and paste. Because of this I am not sure if the issue is an Excel problem or if there is a problem getting the data out of R6 and I would very much welcome a view.

I have just exported c 303k records for VC51 without a problem. I created an export file of c 316k records for VC50 but Excel would not, or could not, open the file and I don't understand the reasons. The message mean nothing to me. They are shown in the attached file.

Is the problem in R6 and how could I correct it? Not knowing what the problem is does not help me to deal with it alone.

I will be very grateful for any guidance as to what is happening.

All the best, Ian

Post's attachments

Excel error after export from R6 20160702.docx 65.13 kb, 12 downloads since 2016-07-02 

You don't have the permssions to download the attachments of this post.

2

Re: Problem with exported data

Hi Ian,
Not sure whether this is the same problem but we sometimes have to save excel exports as .xls rather than .xlsx. There seems to be a spurious character lurking in our recorder database somewhere that is handled by the old excel file format but not the new. The file is saved by recorder without any error message but excel can't open it and says it is corrupt.
Janet

Janet Simkin
British Lichen Society

3

Re: Problem with exported data

Hi Janet
I agree with what you say. The problem, for me, only arises with data from VC50 for 2015. I have exported bits as xls but if I want a bigger export (over 65k records) xls won't do the job. From what you say, it may be that this is another bug with Excel 2016 but I can't think what kind of character would cause the problem when the data were imported successfully.

Given what you say I may try Microsoft and see what they say.

Thanks for your reply, Ian

4

Re: Problem with exported data

One thing which will cause this is a special character of a plus over a minus (+ on top of -)  to indicate approximate. Excel doesn't support the resulting xml (&plusmn) or rtf  \'b1 .  If this is the problem where do you think it might be ?  In taxon occurrence comment as rtf  ?  We should be able to find it and change it to something else.  I am not aware of any other special characters which cause the problem, but there may be others.   


You may also be able to export the file as text (export Other Options) and import this into Excel. This would also  enable us to find what is causing the problem. Not much we can do in R6 other than find and change the characters Excel doesn't like as R6 uses third part software to do the export.

Mike Weideli

5

Re: Problem with exported data

Hi Mike
Thanks for your suggestion. I don't use (knowingly) a plus-over-minus for anything. I will export the 2015 data (that contain the problem) as text and see if I can identify any strange characters and will come back with what I find.

Cheers, Ian

6

Re: Problem with exported data

Hi

I have exported 46k+ records for VC50 as a text file. I opened this with Excel and it did the opening without any problem and I then saved the file as a .xlsx. Having done so I closed down the file and then opened it without any problem at all.

I don't know if there is a funny character, and with 46K records it will take a while to find it. My idea for doing that would be to Find each letter in turn and replace with nothing, then the numbers and see what is left. Is there a better way to find a strange and unknown character?

Cheers, Ian

7

Re: Problem with exported data

I think the conversion to text will change unknown characters to an ascii character.  I think Excel  may then change it yet again.   However, I am not sure if the character causing you the problem is in fact the +/- so looking for a character when we don't know what it is will be very difficult without code.

You could start by scanning down the comment column looking for anything which doesn't appear correct. Unfortunately, it may show as a space (even though it isn't) so finding it  this way may be impossible.

Any chance of letting me having the text  file so I can use code to look for anything outside the normal ascii range.

Mike Weideli

8

Re: Problem with exported data

Hi Mike

I did the search and replace within the .xlsx file and at the end was left with the following that looked weird:

            ├®

and

             ├?┬?├?┬?├?┬?├?┬®         

but both of these disappeared when I chose to search for ? on its own. Why would R6 export these characters as they did not get imported? I will send you a text version of the file to see what you can find.

Again, many thanks for your help and suggestions, Ian

9

Re: Problem with exported data

I think you will find that you have imported these as special characters. They have been converted by Excel into something which looks different, but they will relate to special characters in your R6 data.  Have you copied and pasted data into the comments field  of your import files ?

I have tested the +- and this end up in Excel as something which looks a bit like a 7.

Mike Weideli

10

Re: Problem with exported data

I do copy and paste into the Comments field, usually text but also record reference numbers from BirdTrack and Cofnod. I don't recall seeing any special characters. Funny how the problem has only arisen with data from VC50 in 2015 when I have records for VCs 50 and 51 since 2004. Have you seen the text file I emailed to you?

Cheers, Ian

11

Re: Problem with exported data

I now think that it may be the accented 'e' on the end of cafe which is causing the problem.
 
Please can you let me know what actually appears in R6 for these records. I could be wrong about what it actually is.

I am not sure if we can anything about it. The problem could be in the way the Report Wizard interprets the RTF, in the third party software which creates the output file or in Excel (it is oK in xls files).  If we can identify the actual entries which cause the problem we may be able to find a way around the issue.

Mike

Mike Weideli

12

Re: Problem with exported data

Hi

Many thanks for the text file you sent me with the weird characters identified from my text file of VC50 2015 records. You identified three records where the observer had used the character 'e acute' at the end of the word 'cafe' (here without the accent). I found the records and removed the 'e acute' rubbish symbols in the observer's comments and changed them for a plain 'e'. Having done so I exported that set of records again and this time Excel was able to open the .xlsx file without any problems!

When preparing records for import I must remember to look for accented letters and change them to plain letters.

So, it seems that R6 changes accented letters to something else. Many thanks for helping to track down this issue.

Thanks again, Ian

13

Re: Problem with exported data

Excel (XLSX) doesn't have any problems with the report output when 'é' is either copied and pasted into a comments field or entered directly using the numpad.

To me 'Undeclared entity' suggests that the é has been entered into Recorder as an XML entity e.g.' è' which is being written to the XLSX as is. By editing the raw XML of the XLSX you can reproduce this 'Undeclared entity' error.

I would suggest looking at the raw comments field to see how the characters are encoded. The output wizard seems to handle them fine, but perhaps something is stopping it in this case?

Charlie Barnes
Information Officer
Greater Lincolnshire Nature Partnership

14

Re: Problem with exported data

Hi Charlie

I have to confess that I don't understand your second paragraph. I don't know what you mean, or how to go about "editing the raw XML of the XLSX". I would welcome knowing what this means.

I am also not sure how to check how characters are encoded in R6 comments fields. I can do a visual check of what appears in the comments field in an R6 record. I don't know how to check how each character is encoded. Again, I would be grateful for enlightenment.

I hope I am not asking you to write a book and look forward to hearing.

Cheers, Ian

15

Re: Problem with exported data

No problem Ian - that was more of a comment for a Mike :-)

An XLSX is just a ZIP file in disguise. If you change the file extension to .ZIP or open it with a WinZip etc. you can view the actual content. You'll probably want to look in the sharedStrings.xml file (e.g. searching it for "caf" and see what the é looks like; in mine it looks like an é).

To check how the comments are stored in Recorder 6 you'd need access to the SQL Server and do e.g. a simple "select comment from taxon_occurrence where taxon_occurrence_key = 'XYZ'".

Charlie Barnes
Information Officer
Greater Lincolnshire Nature Partnership

16

Re: Problem with exported data

If e acute is input in Recorder 6 using the alt key +  130 then both Recorder 6 and Excel accept this. Also copying and pasting from a word document seeme OK.  In both cases e acute is held in Recorder 6 in RTF format as \'e9.  The problem seems to occur when text is pasted in from some other sources. I suspect that the copied text may contain its own character encoding (Unicode of some sort) which Recorder can't cope with when  its own rtf formatting is added  to the pasted info.  What would be useful would be to track an example of this from the originally pasted information  through R6 to EXcel.

Mike Weideli

17

Re: Problem with exported data

Hi Mike

One of the offending characters was in the comment, below.

Nesting in roof space of café of North Wales Garden World.

Do you wish me to change this to e acute then try to import and export to Excel? Interestingly, when I put an e acute in Word I do Alt 0233.

I created a new survey Test and imported the one record from Bill Myerscough with the weird characters substituted by Alt 0233 which put in an e acute. I imported this one record to the survey, then did my usual report but from this survey and in the usual format that I use for bird reports. I opened the exported .xlsx file and all was well and the comment held cafe with an e acute.

The original record had been entered into BirdTrack so I don't know which character(s) Bill used when inputting the record. The line I copied, above, was in the file I exported from BirdTrack. Tomorrow, I shall ask at the BTO if they can track down what was input, or how BirdTrack deals with e acute.

I hope that, at least partly, answers your query.

All the best, Ian

18

Re: Problem with exported data

(it's the © that's causing the problem - it's getting converted to © in the XLSX which is undefined - hence the error. This looks like a bug in the exporter; AFAIK it should be converting it to ©; thinks like ® and € are handled correctly).

(the BTO have an encoding problem on their website which is causing the é to become é)

Charlie Barnes
Information Officer
Greater Lincolnshire Nature Partnership

19

Re: Problem with exported data

Hi Charlie

Thanks for telling me that .xlsx files are glorified .zip files. I had a look in a small file of three records that I sent to the BTO and it took me a while to find the file with the content in it. In this case it was easy to see the replacement of é by é. However, this was with just three records. It would be much harder to find them in the original exported file of 50k+ records.

I hope to hear from the BTO what they find at their end and will report back.

All the best, Ian

20

Re: Problem with exported data

Hi Mike

You used a report to identify the records in my file with 'funny' characters. Is that a report that we could use to clean data alongside your excellent duplicate with survey report?

Cheers, Ian

21

Re: Problem with exported data

Hi

I am thinking about how I might identify the records througha XML report. It was fairly easy in your text file, but not so easy directly in R6 as the characters are held as part of the rtf formatting.

Mike Weideli

22

Re: Problem with exported data

Hi Mike,

Going back a bit here, but did you ever manage to produce a report that identified odd characters? I have the same error message as here in an export of 90,000 records. I have managed to export to a text file but can't find an e-acute, copyright or A-tilde there (or caf, in case café was the problem)

Gordon Barker
Biological Survey Data Manager
National Trust