1

Re: Name Formats in Import Wizard

I was having huge problems with the Import Wizard being very fussy about which name formats it accepts. Assuming that the data I receive will always have similar problems, much of it being from MapMate [MapMate allows users to decide on name formats], I decided to write a routine that would reformat names according to some rules that could be enforced each time I receive data from third parties. After all, it couldn't be that difficult.

Well, it is not that easy either. I realised that Regular Expressions [RegEx] are now available to Windows users [something I've been sad enough to want for many years]. RegEx are a world in their own right, with no absolute constants between languages. VBA is a latecomer [it uses VBScript] and has many missing bits, including split and replace, both of which would have saved me a lot of time.

I have come up with a RegEx that others may find useful, may...

    "^(\w[\']\w*|\w*)\,\s(\u004D\u0072\u0073\s?|\u004D\u0072\s?|\u004D\u0069\u0073\u0073\s?)?(\w\.\w\.\,?|\w\.\s\w\.\,?|\w*\.?\s?\w?\.?)?"

With this I can walk through a list of names in roughly this format:

Surname, A., Surname, A.B., Surname, A. B., Surname, Mrs A., Surname, Mr A.

It's then a simple case of reformatting each name to A. Surname and putting the list back together. The routine only finds the first item, so I process it, remove it, and send the remainder back until all the items have been dealt with.
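For anyone who would rather see the loop than read about it, here is a minimal sketch in Python rather than VBA. It is not the original routine. The pattern is the one above with the \u escapes written out (they spell Mrs, Mr and Miss). The function name, the stripping of separators between items, and the choice to keep the title in front of the initials are all assumptions made for illustration.

```python
import re

# The pattern from the post, with the \u escapes written out as Mrs/Mr/Miss.
NAME_RE = re.compile(
    r"^(\w'\w*|\w*),\s"                            # surname (Smith, O'Brien), then ", "
    r"(Mrs\s?|Mr\s?|Miss\s?)?"                     # optional title
    r"(\w\.\w\.,?|\w\.\s\w\.,?|\w*\.?\s?\w?\.?)?"  # initials in a few layouts
)


def reformat_names(text):
    """Turn 'Surname, A., Surname, Mrs B.' into 'A. Surname, Mrs B. Surname'.

    Hypothetical helper: matches the first name, reformats it, removes it
    from the front of the string and repeats until nothing is left.
    """
    done = []
    rest = text.strip()
    while rest:
        m = NAME_RE.match(rest)
        if m is None or not m.group(1):
            # Nothing recognisable at the front: keep the remainder as-is
            # so it can be dealt with by hand.
            done.append(rest)
            break
        surname = m.group(1)
        title = (m.group(2) or "").strip()
        initials = (m.group(3) or "").rstrip(",").strip()
        # Assumption: the title, if any, stays in front of the initials.
        done.append(" ".join(part for part in (title, initials, surname) if part))
        # Drop the matched item plus any separator left in front of the next one.
        rest = rest[m.end():].lstrip(", ")
    return ", ".join(done)


print(reformat_names("Smith, A., Jones, A.B., Brown, Mrs A."))
```

Anything the pattern cannot match is passed through unchanged at the end of the result, which makes the awkward cases easy to spot.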

Add to this a Select Case statement to deal with variants that don't conform and therefore have to be adjusted as specific cases [I had 15 in 650]. I hope to test this when my new server arrives. The old one died, probably because it could no longer cope with my attempts to import data :)
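The Select Case idea can be sketched in Python as a table of known oddities that is checked before the general parser runs. The entries and function names below are hypothetical, purely to show the shape of it.

```python
from typing import Callable

# Hypothetical exception table: raw string as received -> corrected form.
# In practice this would hold the handful of names the pattern cannot cope with.
SPECIAL_CASES = {
    "J Bloggs and family": "J. Bloggs",
    "Bloggs J (per A Smith)": "J. Bloggs",
}


def normalise(raw: str, parse: Callable[[str], str]) -> str:
    """Return the hand-written correction if there is one, else parse normally."""
    key = raw.strip()
    if key in SPECIAL_CASES:
        return SPECIAL_CASES[key]
    return parse(key)


# Usage with a stand-in parser that just returns its input unchanged.
print(normalise("  J Bloggs and family ", lambda s: s))
```

Keeping the exceptions in data rather than in code means a new oddity is a one-line addition.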

I had to make use of some software, RegexBuddy (http://www.regexbuddy.com/), written by Jan Goyvaerts, co-author of "Regular Expressions Cookbook". It would have been difficult without this help and other resources such as http://www.regular-expressions.info/, another of Jan's sites.

I hope someone finds it useful.

Tony

Data Manager
Somerset Environmental Records Centre

2

Re: Name Formats in Import Wizard

All greek to me Tony, but I suspect it is something I would find useful with a bit of explanation. One day...

Rob Large
Wildlife Sites Officer
Wiltshire & Swindon Biological Records Centre

3

Re: Name Formats in Import Wizard

I highly recommend getting to grips with Regular Expressions, Rob. They look pretty impenetrable at first, but are worth the effort. I also do similar processing on MapMate data to what Tony describes. I can vouch for RegexBuddy too; it is an essential tool when learning and working with RegExes.

Charles Roper
Digital Development Manager | Field Studies Council
http://www.field-studies-council.org | https://twitter.com/charlesroper | https://twitter.com/fsc_digital

4

Re: Name Formats in Import Wizard

It occurs to me that the problem of name parsing is being tackled in the wrong place. The problem exists not because of any fault in Recorder, or indeed in the Import Wizard, but because the 'Recorder' and 'Determiner' fields in MapMate are not structured. No matter how hard you try, it is impossible to come up with a way of handling all possibilities.

What is needed is for MapMate to have a reporting option which takes its name field and turns it into something which Recorder, and any other program which needs a structured field, can handle. All it would need to do is split up the names using similar logic to Recorder, and then put them back together in a standard format which Recorder can handle.

A carefully specified MS Access function to do this should take no more than a few days to program and test, so the cost shared between all the LRCs who need to import MapMate data should be minimal.

MapMate users could then produce a report which summarised the structured names, and investigate and correct anything which looked wrong before exporting it to Recorder.

Perhaps the LRCs should get together and pay MapMate to add such a function.


Mike.

Mike Weideli

5

Re: Name Formats in Import Wizard

I have an Excel "Lookup" table with over 20,000 recorder/determiner names in all sorts of weird and wonderful formats. As Mike states in his first paragraph, it is impossible to handle them all with 100% certainty.

This method, although it has taken some time to build (as datasets come in), does work very well for me.
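A rough idea of how such a lookup could work outside Excel, sketched in Python. The column names "raw" and "standard", the case-insensitive matching, and the function names are assumptions for illustration, not a description of the actual table.

```python
import csv
import io


def load_lookup(csv_text):
    """Build a dict from a two-column table: name as received -> standard form."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: matching ignores case and surrounding spaces.
    return {row["raw"].strip().lower(): row["standard"].strip() for row in reader}


def apply_lookup(names, lookup):
    """Split incoming names into those already in the table and those that are new."""
    resolved, unknown = [], []
    for name in names:
        key = name.strip().lower()
        if key in lookup:
            resolved.append(lookup[key])
        else:
            unknown.append(name)  # candidates to add to the table by hand
    return resolved, unknown


# Hypothetical table exported as CSV; names containing commas are quoted.
table = 'raw,standard\n"Smith, A.",A. Smith\nJ Bloggs,J. Bloggs\n'
resolved, unknown = apply_lookup(["smith, a.", "J Bloggs", "Unknown Person"],
                                 load_lookup(table))
print(resolved, unknown)
```

The list of unknown names is what grows the table as each new dataset comes in.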

Les Evans-Hill
Senior Data Officer, National Moth Recording Scheme

6

Re: Name Formats in Import Wizard

I would like to think that Mike is right, but I have this niggle in the shadows of my mind that some names simply cannot be parsed. I found this, for instance:

Cname A. Sname & Town & District Natural History Society

Without knowing the society exists this seems difficult to parse. Although on import one would hopefully recognise it.

This one is also difficult as there is no consistency. We look at it and see the answer but I could not see an obvious way to parse it.

Sname, A.B, A Sname, Sname, A.B

The best I could do was to identify those circumstances where my routine did not work and write in the exceptions.

It would be better placed within MapMate, but failing that a common set of Access or other routines [dll] that could be called upon to do the bulk of the work would be useful. I had difficulty explaining the need for me to spend the time developing a solution. It took a week to learn and implement regular expressions, following lots of dead ends in technique to get there.

As with so many of these things, if it is not the complete solution there are some advantages to developing it oneself. It can easily be tinkered with to fit changing circumstances, and the finishing touches are not quite so necessary. The downside, of course, is that there are a limited number of people with the time or resources to do it.

Data Manager
Somerset Environmental Records Centre

7

Re: Name Formats in Import Wizard

Another tool you may find useful, both for Regular Expressions and for general searching, is WinGrep (www.wingrep.com), particularly for text files such as those one might be importing that are too big for Excel.

Grep is a familiar Unix tool that has been around for decades. I have used other versions in Windows, but this is the best so far, and it supports regular expressions and soundex. It also does search and replace.

One significant reason for using it is that Windows search does not do a search in the real sense. The Windows search tool will only search in files which it understands, i.e. types that are registered with Microsoft. So Office files are OK, and many others, but the text files I use in programming are not. I only found this when searching for a variable in a MapBasic file. Windows did not find it even though I asked specifically to search the correct file type and I could find it myself. This limitation is noted by Microsoft; they just don't tell you when you do a search that it really is ignoring your request.

WinGrep will display the lines before and after a matched item, and a list of the files it finds a match in along with the number of occurrences.
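To give a feel for what that kind of output involves, here is a small Python sketch of the same idea. It is not how WinGrep itself works. The function names are made up, and the "files" are just lists of lines held in memory to keep the example self-contained.

```python
import re


def grep_with_context(lines, pattern, before=1, after=1):
    """Return (line_number, context_lines) for every line matching the pattern."""
    regex = re.compile(pattern)
    hits = []
    for i, line in enumerate(lines):
        if regex.search(line):
            start = max(0, i - before)
            end = min(len(lines), i + after + 1)
            hits.append((i + 1, lines[start:end]))  # 1-based line number
    return hits


def count_matches(files, pattern):
    """Map each file name to its number of occurrences, skipping files with none."""
    regex = re.compile(pattern)
    counts = {}
    for name, lines in files.items():
        n = sum(len(regex.findall(line)) for line in lines)
        if n:
            counts[name] = n
    return counts


# Hypothetical in-memory "files" standing in for MapBasic sources on disk.
source = ["Dim x", "Dim recorder", "recorder = 1", "Print x"]
for number, context in grep_with_context(source, r"recorder"):
    print(number, context)
print(count_matches({"a.mb": source, "b.mb": ["nothing here"]}, r"recorder"))
```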

If you can't think of a time you might use it I suggest you have a go anyway as it may open up an opportunity you can exploit in future.

Data Manager
Somerset Environmental Records Centre