1

Re: Importing Observer(s) & Determiner Names

Hi all,

I've received a couple of datasets (from MapMate, but the data source is irrelevent) which I've exported to an Excel spreadsheet; however, the contents of the Observer(s) and Determiner Name fields to import mostly require complete reformatting to conform to the Recorder 6 import specification, otherwise I get all sorts of combinations of strings that Recorder 6 parses as multiple Names from a single Name.

To make matters worse, the Observer(s) and Determiner Names I have to deal with are in all sorts of formats i.e. singularly in a field as "Les Hill", "Hill, Les", "Hill, L.J." Hill, Mr L." or as multiple recorders in a field like "Les Hill, Hill, Les, Hill, L.J. and Hill, Mr L." (these are the better examples, although Recorder 6 can't parse "Hill, L.J." correctly (it does work if there is only 1 initial)).

I've test-imported quite a few combinations of names (single or multiple names in a field) and ways of formatting them; subsequently, I have a list of formats that Recorder 6  can import; ideally though, I want to get these fields formatted in such a way that Recorder 6 imports them with minimum subsequent action from me, as I potentially will be dealing with hundreds of recorders' names in a single local dataset and thousands in the national dataset.

Does anyone have, or know of any tools to assist formatting Observer(s) and the Determiner Name entries that don't conform, so they conform to the import specification, or is this another case of having to "jump through hoops" to get the data formatted correctly prior to import? i.e. Good ol' Search & Replace!!

I know this is a long-shot as I just can't see how any SQL or Excel Function can handle this issue without being horrendously complicated!

In anticipation,
Les.

Les Evans-Hill
Senior Data Officer, National Moth Recording Scheme

2

Re: Importing Observer(s) & Determiner Names

Hi Les,

Just a quick suggestion: as I see they appear to generally be derivations of your own name, then by extension, if you do a simple find and replace within the Excel source, you should still be able to verify the validity of your records. Oh and R6 will just need to match it up the once!

I realise that you may be seen to be picky if you send out an edict to your recorders to maintain consistency when filling in recording sheets, however, these simple steps pay large dividends in the long run, not least once you get into the realm of truly historical data.


Cheers for now, Rob.

3

Re: Importing Observer(s) & Determiner Names

Les, I'm tackling the very same issue with a large MapMate dataset I'm currently dealing with. The trouble is, what makes MM so appealing (it's simplicity) is also the cause of data storage problems, particularly when it comes to names of recorders. I don't have a simple solution I'm afraid. I'm having to resort to running the data through an increasingly complex Ruby script which cleans up the mess.

The most difficult issue is the one you've identified with the multiple recorders. I've tackled this by creating a lookup the takes each occurrence of a multiple recorder and replaces it with something I've defined. So I've got an array that is full of entries like this:

"Dr. & Mrs. E & N. J. H. Bloggs" => "Dr. E. Bloggs & Mrs N. J. H. Bloggs"

It seems like an almost impossibly difficult task unpicking entries like this in an automated fashion. I've written some code that swaps back-to-front names around, so the following:

"Bloggs,  C. E. (Rev. )"
"Jones, J.(Miss)"
"Edwards Porter, H."

Becomes...

"Rev. C. E. Bloggs"
"Miss J. Jones"
"H. Edwards Porter"

I can provide this code (it's really just a regular expression) if that would help?

Charles

Charles Roper
Digital Development Manager | Field Studies Council
http://www.field-studies-council.org | https://twitter.com/charlesroper | https://twitter.com/fsc_digital

4

Re: Importing Observer(s) & Determiner Names

Rob/Charles,

Thanks for coming back to me...

Rob - I've written a small document which is imminently going to all the county moth recorders detailing "best practice" formatting of data, especially when it boils down to recorders/determiners names and that other old chestnut i.e. site names and grid references (multiple site names for a single grid reference and vice versa).

This is based on Martin Harvey's original MapMate data entry guidelines from 2003, so my interpretation shouldn't come as a shock (as Martin's has been in the domain for nearly 5 years now) and lead to a rebellion, although my tin hat is beside me in case!

Charles - Bang on! I have so many recorder entities that just don't conform to the import specification that require manual intervention.

Last night I played (unpaid!) at home with Excel 'Lookup' tables and this may part-solve the issue. Basically, I created SQL in MapMate to give me a list of all the unique recorder names based on the dataset I'm working with, pasted this into a column in Excel (not forgetting to sort alphabetically), manually typed in the next column the entry I desire and created my Lookup, which worked well. Now to try it (paid!) with a dataset of 160,000 records and 500 unique names; and this is one of the smaller datasets I'll have to deal with!

I've searched the WWW for code and the only ones I've found (Excel formulae) deal with converting a single format throughout i.e. "Smith, J." to "J. Smith" - no good when your data doesn't conform to the same format throughout.

I be delighted to see the code you are offering.

Best wishes,
Les.

Les Evans-Hill
Senior Data Officer, National Moth Recording Scheme

5

Re: Importing Observer(s) & Determiner Names

I have written a comprehensive name processor, in the Fortran language, which takes a name string such as

Mark Hill and Preston CD - TRANSFORMED TO - Hill, M. and Preston, C.D.
Hill, Les and Roper, C and Wilkinson, Steve and Mark Hill - TRANSFORMED TO - Hill, L., Hill, M., Roper, C. and Wilkinson, S.
Rev C Anderson et al - TRANSFORMED TO - Anderson, Rev. C. et al.

The input has to be a text file, the output is a text file which should be opened in Excel.  The program gives diagnosticss for ill-formed names, or if it thinks the name is a club or society.  All names are put into alphabetical order.  It also provides an "atomized" form where the names are listed in separate columns.  We find it very useful at BRC Monks Wood.  I am happy to supply it on a personal basis.  It ought at some stage to go on our website.  The executable is compiled in MS Fortran, and operates in a DOS window, but runs fine under Windows XP.

Mark Hill