Re: Update on the import wizard in R6.13.3.182
I hope you have all had a good Christmas!
Over the festive season I decided that I needed to reconstruct my database. The prompt came from a data supplier (an LRC) that wishes to provide me with a complete, new set of all of its records every year. This immediately caused panic, as I had not fully understood the value and importance of 'surveys' when I started my database. I decided to rebuild my database of c. 95,000 records by importing all the records again, divided into different surveys named after each main data supplier, leaving a large 'catch the rest' survey that I have called 'Observer data'. Consequently, I have been using the import wizard a lot over the last few days and have found some weaknesses that I wish to report.
Firstly, not all the key checks for valid data are done before importing. A key aspect is the date. The import wizard expects each date, in Excel terms, to be an integer when shown in number format, though the date must be in dd/mm/yyyy format for the import. Depending on how thorough the user is, it is possible to have data in Excel that look like integers but are not. The only way I have found to really check them is to copy all the data from one sheet to another using Edit, Paste Special, Values and then look at what the 'integers' really are. Many are exposed in this way as still being decimal fractions, including time data that R6 does not like.
However, if the thorough check in Excel is not done, and apparently integer-type dates are used for import, the wizard does not point out any errors in the format of the dates. Yet once the data are imported, the survey contains a long list of records with the date marked as 'Unknown'. This is not especially helpful.
Also, as part of the work I have had to do to track down these small and simple details, I found that at least one file to be imported contained duplicate, or even triplicate, records. The wizard has not once identified any such duplicates during the import process, leaving me with no confidence at all that the message of '0 duplicates' is correct.
I have had one or two bits of invalid data and have found it nearly impossible to deal with them. The messages about the invalid records refer to keys that are not imported (as the data were invalid), with no message in the language of the original record to show what the problem is.
Therefore, I make some pleas for a late Recorder Christmas present:
1 For the import wizard to be modified to show, early, if the date is not in an acceptable format.
2 For the wizard to be modified so that it does identify duplicate records within an import file. I appreciate that there could be some pedantic discussion about what is a 'duplicate' and I would probably be at the more pedantic end of the discussion. However, same species, same date, same observer, same place, same sex/stage and same abundance seem to be pretty good indicators of a duplicate.
3 For the messages provided for invalid records (and maybe for duplicates, though I have never seen one) to contain the information as now, if that really is of any help to anyone, but also to show the actual record, as in the import file, so that an ordinary person, without clear knowledge of the keys and internal structure of R6, can make sense of the messages.
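The duplicate test in plea 2 and the plain-language reporting in plea 3 can be sketched together in Python. This is only a rough illustration under stated assumptions, not how the wizard works. The column names in `KEY_FIELDS` are hypothetical stand-ins for whatever headings the import file uses, and the rows are assumed to have been read into dictionaries (for example with `csv.DictReader`), with the header in row 1.

```python
from collections import defaultdict

# Hypothetical column names for the six fields suggested as duplicate indicators.
KEY_FIELDS = ("species", "date", "observer", "place", "sex_stage", "abundance")


def find_duplicates(rows):
    """Group spreadsheet row numbers whose six key fields all match.

    Values are compared as text, ignoring case and surrounding spaces.
    Returns only the groups with more than one row, in order of first
    appearance.
    """
    groups = defaultdict(list)
    for row_number, row in enumerate(rows, start=2):  # row 1 is the header
        key = tuple(str(row.get(field, "")).strip().casefold() for field in KEY_FIELDS)
        groups[key].append(row_number)
    return [numbers for numbers in groups.values() if len(numbers) > 1]


def describe(row):
    """Show a record in the words of the import file, not internal keys."""
    return ", ".join(f"{field}: {row.get(field, '')}" for field in KEY_FIELDS)


if __name__ == "__main__":
    rows = [
        {"species": "Erithacus rubecula", "date": "01/01/2013", "observer": "A. Smith",
         "place": "Mill Wood", "sex_stage": "adult", "abundance": "2"},
        {"species": "Turdus merula", "date": "01/01/2013", "observer": "A. Smith",
         "place": "Mill Wood", "sex_stage": "adult", "abundance": "1"},
        {"species": "erithacus rubecula ", "date": "01/01/2013", "observer": "A. Smith",
         "place": "Mill Wood", "sex_stage": "Adult", "abundance": "2"},
    ]
    for group in find_duplicates(rows):
        print(f"Rows {group} look like duplicates: {describe(rows[group[0] - 2])}")
```

Comparing the fields as case-insensitive text is the pedantic end of the discussion. A looser definition would simply drop entries from `KEY_FIELDS`.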
I have nearly finished re-importing my data but am sure that the above modifications could help many people who import data from Excel (or other) files. If R6 is to appear more attractive (than Mapmate, say) to volunteer recorders, then this important interface with the user needs to work better than it currently does. I do occasionally try to persuade people to join the set of Recorder users. I am never quite sure if this is just cognitive dissonance (i.e. I have put in so much effort myself to get here that I have to make R6 sound good to others) or if I really do believe it to be a useful tool. I would prefer the latter.
I do hope these pleas do not fall on deaf ears...
All the best, Ian