1

Re: Update on the import wizard in R6.13.3.182

I hope you have all had a good Christmas!

Over the festive season I have decided that I needed to reconstruct my database. The prompt came from a data supplier (a LRC) that wishes to provide me with a complete, new set of all of its records every year. This immediately caused panic as I had not fully understood the value and importance of 'surveys' when I started my database. I decided to rebuild my database of c 95,000 records by importing again, all records divided up into different surveys named after each main data supplier, leaving a large 'catch the rest' that I have called 'Observer data'. Consequently, I have been using the import wizard a lot over the last few days and have found some weaknesses that I wished to report.

Firstly, not all the key checks for valid data are done before importing. A key aspect is the date. The import wizard expects dates to be, in Excel terms, to be an integer when in number format, though the date must be in dd/mm/yyyy format for the import. Depending on how thorough the user is, it is possible to have data in Excel that look like integers but are not. The only way I have found to really check them is to copy all the data from one sheet to another and using Edit, Paste Special, values to check what the 'integers' really are. Many can be exposed in this way as still decimal fractions, including time data that R6 does not like.

However, if the thorough check in Excel is not done, and apparently integer type dates are used for import, the wizard does not point out any errors with the format of the dates, but when the data are imported, in the survey there are a long list of records with date marked as 'Unknown'. This is not especially helpful.

Also, as part of the work I have had to do to track down these little and simple details, I found that at least one file to be imported contained duplicate, or even triplicate records. The wizard has, not once, identified any such duplicates during the import process, leaving me with no confidence at all that the message of  '0 duplicates' is correct.

I have had one or two bits of invalid data and have found it near to impossible to deal with them. The messages about the invalid records are about keys that are not imported (as the data were invalid) with no message in the language of the original record to show what the problem is.

Therefore, I make some pleas for a late Recorder Christmas present:

1 For the import wizard to be modified to show, early, if the date is not in an acceptable format.

2 For the wizard to be modified so that it does identify duplicate records within an import file. I appreciate that there could be some pedantic discussion about what is a 'duplicate' and I would probably be at the more pedantic end of the discussion. However, same species, same date, same observer, same place, same sex/stage and same abundance seem to be pretty good indicators of a duplicate.

3 For the messages that are provided for Invalid records (and may be for duplicates but I have never seen one) that contain the information as now if that really is of any help to anyone, but also showing the actual record, as in the import file, so that an ordinary person, without clear knowledge of the keys and internal structure of R6, can make sense of the messages.

I have nearly finished my reimporting of data but am sure that the above modifications could help many people who import data from Excel (or other) files. If R6 is to appear more attractive (than Mapmate, say) to volunteer recorders, then this important interface with the user needs to work better than it currently does. I do, occasionally try to persuade people to join the set of Recorder users. I am never quite sure if this is just cognitive dissonance (ie I have put in so much effort myself to get here, that I have to make R6 sound good to others) or if I really do believe it to be a useful tool. I would prefer the latter.

I do hope these pleas do not fall on deaf ears...

All the best, Ian

2 (edited by stevemcbill 01-01-2009 11:56:31)

Re: Update on the import wizard in R6.13.3.182

Ian

I would strongly agree with you on ALL of those points.

The validation of input fields has always been a problem and is one of the major items which caused rECOrd to write their On-line Data Input System (RODIS - see this by clicking the large "RODIS" letters on the home page of the rECOrd website (www.rECOrd-LRC.co.uk). This on-line data capture facility ensures that ALL fields captured are correctly formatted and even species names and locations names as well as dates, grid refs., etc. are in the necessary formats to ensure easy importation via the Import Wizard in Recorder.

RODIS also checks for duplicates within the file itself, but also against all the other files entered through RODIS, AND against the main Recorder database thereby preventing duplicate records ever getting near the system and/or the main database. RODIS will import spreadsheets to allow for rapid checking of all of the above parameters and formats. It also allows users to submit the data to rECOrd for inclusion within the main Recorder database; to export the data back to themselves in clean spreadsheet format for use with their own database of choice (or as spreadsheet files); or to export the data to anybody else with an e-mail address (e.g. County Recorders, National Recording Schemes, etc., etc.).

Having said ALL the above I would still support any moves to improve the Import Wizard itself in the manner you have described.

The error messages coming out of the Import Wizard have always been nothing less than ARCANE  !!  Any improvement (especially one which allowed the User to relate the error back to the original record) would be a wonderful improvement and would greatly simplify the finding and fixing of problems !!

Cheers and a Happy New Year to you and all recorders and Recorder Users (and MapMate users too) !!

Steve     :)

Steve J. McWilliam
www.rECOrd-LRC.co.uk
www.stevemcwilliam.co.uk/guitar/

3

Re: Update on the import wizard in R6.13.3.182

Hi Ian and Steve

A couple of my own thoughts on this:
1) Agreed - if this is happening it would make sense for the import wizard to detect the date format problems during the initial linking of the column to the date column type - that's when it checks for other formatting problems such as spatial references. Having said that, it would be even nicer if it could just accept the integer type dates and fix them internally without bothering you about it.
2) The concept of duplicate in this instance is not the same as you are expecting - the import wizard is looking for duplicates with respect to the uniquely identifiable database record, not the actual occurrence. So, if a person observes a species on a place and date, then this is typed into 2 systems, when you import this data it is not a duplicate. However, if this is typed into one system, then exported and re-imported, it is a duplicate. There has been some discussion of adding the type of duplicate checking you are referring to into Recorder's import, but it has never been implemented because of concerns regarding the performance. However, there is now (in version 6.13) a report that can identify potential duplicates for you which you can run after import. Go to Reports\Run... from the menu. The report can be found under JNCC\Housekeeping report\H1 - Duplicate Records.
Perhaps a way forward would be to provide an option to run this report at the end of an import, and modify the report so that it is filtered to look for duplicates only within the newly imported data.
3) Agreed. It ought to be possible for Recorder to give you the Excel row number, or even show you the original data.

Happy New Year to you all,

John van Breda
Biodiverse IT

4

Re: Update on the import wizard in R6.13.3.182

Ian, have you tried using the temporary database to track down invalid items? If you know some Access this is not too difficult. If, for example, the import wizard flags a record as invalid and tells you that the grid reference is outside the bounding box for the survey you can look up the key in the temporary database and at least find out what the offending grid reference is, although, I agree, it would be much better if this was in the original error message. I have also suggested this to JNCC. The Help provides more information on the temporary database – see Contents – Tasks – Exchanging data – Import data – Invalid items. There is also a help topic on Duplicate items which explains the Recorder concept of duplicates.

Sally Rankin, JNCC Recorder Approved Expert
E-mail: s.rankin@btinternet.com
Telephone: 01491 578633
Mobile: 07941 207687

5

Re: Update on the import wizard in R6.13.3.182

Hi All,

Just to let you know that we have drafted some changes to the import wizard to identify some issues with date and time info in the same column - this will turn the rows red (as with other highlighted errors in the import wizard) to indicate that there is an issue with a particular row of data. I will add the issue of checking the date format itself to this CCN. This issue, however,  is not likely to be fixed in the next release (it will more than likely have to wait until next financial year now as the remainder of the budget has been allocated). As John suggested above the import wizard is looking for duplicates but not in the way that a user would like it to - ie duplicates in the data itself. These kind of checks made in the import wizard are likely to reduce performance which is not something we are keen on - it is probably simpler (and quicker) to make these checks yourself in excel as part of tidying up the data before import. The third point has been logged previously.

Thank you all for your thoughts and comments on this.
Cheers,
Lynn