26

Re: Accuracy of data migration from Gateway to Atlas

Yes, I think there has been a significant problem with regard to the linking of junior synonyms to the recommended name or taxonomic concept. Tony Irwin emailed me this evening:

Just tried to look up Tephrochlaena oraria (Heleomyzidae), and find that that species is down as a synonym of Syntormon pseudospicatum (Dolichopodidae!). The map for the species does appear when I search for Tephrochlaena halterata, although that is actually as a misidentification synonym of oraria. I don't know whether this is the same as what was on the Gateway, or whether this is a new issue that has arisen consequent to the migration.

My example was that "Alophora barbifrons" seems to be linked as a junior synonym of "Parapiophila vulgaris", which it isn't - it's the junior synonym of "Phasia barbifrons".

I've been in touch with David Martin but he says he is travelling this week so I doubt we will get a quick fix for it.

Chris Raper, Manager of the UK Species Inventory, Angela Marmont Centre for UK Biodiversity,
Natural History Museum, Cromwell Road, London, SW7 5BD.  (tel: 020 7942 5894)
also Tachinid Recording Scheme (http://tachinidae.org.uk/)

27

Re: Accuracy of data migration from Gateway to Atlas

Jim Bacon wrote:

I don't mind what matching algorithm is used so long as the results are correct.

Neither do I - but as we've seen, matching on the name isn't giving correct results, and since we have a cast-iron code, why not use it and bypass any potential slip-ups?

[The agg isn't missing from Oligia strigalis - just not displayed in the processed information]

Charlie Barnes
Information Officer
Greater Lincolnshire Nature Partnership

28

Re: Accuracy of data migration from Gateway to Atlas

Hi Jim,
I'll keep testing to see if I can provide more information to help you sort out the root of the problem.
I'll report later.

Christine

29

Re: Accuracy of data migration from Gateway to Atlas

Hi Charlie,
My current impression (which remains pure conjecture based on the anomalies we are seeing) is that, when occurrence records were imported, the TVK associated with the record was used to try to make a match with the UK Species Inventory. However, it seems as if that copy of UKSI did not contain some of the TVKs/names that the Gateway had employed, meaning the code could not be used.
Jim.

30

Re: Accuracy of data migration from Gateway to Atlas

Jim Bacon wrote:

However, it seems as if that copy of UKSI did not contain some of the TVK/names that the Gateway had employed, meaning the code could not be used.

I wonder if that information is stored anywhere (I can't see it) - i.e. see the steps it has taken (failed using Taxon Id, failed on Scientific name, using Genus). We only seem to be able to see the "last" step. Although if there are only three steps, then you can work it out from the Name match metrics (scientific name -> taxon id -> higher level ?)
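
To make the idea concrete, here is a minimal Python sketch of a matcher that keeps a log of every step it tried, not just the last one. It is purely illustrative: the function `match_with_trace`, the three lookup tables, the genus key and the order of the steps are all assumptions made for this example, and only the step labels are borrowed from the name match metrics. It is not a description of how the Atlas actually works.

```python
from typing import List, Optional, Tuple

# Hypothetical lookup tables standing in for a copy of UKSI.
TVK_INDEX = {"NHMSYS0001484537": "Hygrocybe virginea"}
NAME_INDEX = {"Hygrocybe virginea": "NHMSYS0001484537"}
GENUS_INDEX = {"Hygrocybe": "HYPOTHETICAL-GENUS-KEY"}  # made-up key

Trace = List[Tuple[str, bool]]


def match_with_trace(tvk: Optional[str], name: str) -> Tuple[Optional[str], Trace]:
    """Return the matched key (or None) plus every step tried and whether it worked."""
    trace: Trace = []

    # Step 1: the taxon id (TVK) supplied with the record.
    ok = tvk is not None and tvk in TVK_INDEX
    trace.append(("taxonIdMatch", ok))
    if ok:
        return tvk, trace

    # Step 2: exact scientific name.
    key = NAME_INDEX.get(name)
    trace.append(("exactMatch", key is not None))
    if key is not None:
        return key, trace

    # Step 3: fall back to the genus (first word of the name).
    parts = name.split()
    key = GENUS_INDEX.get(parts[0]) if parts else None
    trace.append(("higherMatch", key is not None))
    return key, trace


key, trace = match_with_trace("BMSSYS0000045739", "Gliophorus psittacinus")
print(key, trace)
```

With a trace like this stored alongside each record, you could tell a record that fell through to a genus match because its TVK was unknown from one that never had a TVK in the first place.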

Charlie Barnes
Information Officer
Greater Lincolnshire Nature Partnership

31

Re: Accuracy of data migration from Gateway to Atlas

Hi Chris

I see what you are saying about the synonymy of Alophora barbifrons.

Likewise I see Tephrochlaena oraria is down as a synonym of Syntormon pseudospicatum, not Tephrochlaena halterata.

I can't see any way of telling whether this is a problem caused in migration to the Atlas or was a pre-existing problem. Presumably you think this is new, Chris, but I can't for the moment think how that might have arisen.

I guess we have to wait for David Martin to be back on the scene to look into it all.

Jim Bacon.

32

Re: Accuracy of data migration from Gateway to Atlas

Looking further at the "hybridization" of the Larch Tortrix moth with the Dutch Elm tree (last issue on page 1 of this thread), I note from https://species.nbnatlas.org/species/NB … 2111#names that the scientific names Zeiraphera diniana (Guenée, 1845) and Zeiraphera occultana (Douglas, 1846) have also been fused into this strange union, but their preferred synonym, Zeiraphera griseana (according to UKSI), isn't in the list of names, even though this is the names URL returned by a search for it.

The UKSI recommended TVK for the moth is NHMSYS0021143278 and for Ulmus glabra x minor x plotii = U. x hollandica it is NBNSYS0000162111. I can't see any way yet for the two to have become conflated...?

Keith

33

Re: Accuracy of data migration from Gateway to Atlas

Hi Charlie

I don't see any assertions about the taxon matching process but I can get some more information about the results. The query

https://records-ws.nbnatlas.org/occurrences/search?q=*:*&facets=name_match_metric

returns eight different results for the name matching method and the number of records to which it was applied. They are

+-------------------+-------------------+
| Name match metric | Number of records |
+-------------------+-------------------+
| taxonIdMatch      | 213427156         |
+-------------------+-------------------+
| exactMatch        | 2438552           |
+-------------------+-------------------+
| higherMatch       | 675574            |
+-------------------+-------------------+
| noMatch           | 514150            |
+-------------------+-------------------+
| canonicalMatch    | 59858             |
+-------------------+-------------------+
| fuzzyMatch        | 11790             |
+-------------------+-------------------+
| vernacularMatch   | 2820              |
+-------------------+-------------------+
| Unknown           | 2080              |
+-------------------+-------------------+

I'm reeling slightly at the size of those numbers... are there really 217 million records in there?
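
Adding up the table is a quick sanity check. A few lines of Python, using only the counts listed above, give the total and each method's share of it:

```python
# Counts copied from the name_match_metric facet table above.
counts = {
    "taxonIdMatch": 213427156,
    "exactMatch": 2438552,
    "higherMatch": 675574,
    "noMatch": 514150,
    "canonicalMatch": 59858,
    "fuzzyMatch": 11790,
    "vernacularMatch": 2820,
    "Unknown": 2080,
}

total = sum(counts.values())
print(f"total records: {total:,}")

# Share of the total for each matching method.
for metric, n in counts.items():
    print(f"{metric:<16}{n:>13,}  {100 * n / total:6.2f}%")
```

The total comes to 217,131,980, with taxonIdMatch accounting for roughly 98% of the records.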
We can do a bit of digging into any of these, for example with this query

https://records-ws.nbnatlas.org/occurrences/search?q=name_match_metric:noMatch&facets=raw_name

This reveals, top of the list, 32k records of Korscheltellus lupulina, NHMSYS0021142343, which have not found a match. Since neither this species nor this genus can be found in UKSI, and consequently not the TVK either, it fits my theory that there should be no match. Example record. There is no name you can search on to access these records. Meanwhile, according to the Atlas species list, this name is accepted in UKSI.
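
For anyone wanting to repeat these facet queries from a script, a small Python helper can assemble the URLs. The base URL and the `q` and `facets` parameters are copied from the queries above. The helper name `facet_url` is made up for this example, and nothing here fetches or parses a response.

```python
from urllib.parse import urlencode

# Base URL taken from the queries quoted in this post.
BASE = "https://records-ws.nbnatlas.org/occurrences/search"


def facet_url(query: str, facet: str) -> str:
    """Build a facet search URL; '*' and ':' are left unescaped for readability."""
    return BASE + "?" + urlencode({"q": query, "facets": facet}, safe="*:")


# All records, broken down by matching method.
print(facet_url("*:*", "name_match_metric"))
# Records with no match, broken down by the name originally supplied.
print(facet_url("name_match_metric:noMatch", "raw_name"))
```

Swapping `noMatch` for any other metric in the table (say `fuzzyMatch`) gives the same breakdown for that group of records.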

Jim Bacon.

34

Re: Accuracy of data migration from Gateway to Atlas

Thanks Jim - that's really useful and more complex than I predicted!

Charlie Barnes
Information Officer
Greater Lincolnshire Nature Partnership

35 (edited by TeresaF 05-04-2017 13:04:37)

Re: Accuracy of data migration from Gateway to Atlas

Jim Bacon wrote:

I'm reeling slightly at the size of those numbers... are there really 217 million records in there?

Sounds about right - 152 million in the new BTO dataset, and there were over 100 million on the Gateway (including records with no public view, which mostly haven't been published on the Atlas, accounting for the "missing" records).

Thanks again for your detective work into the taxa matching issues everyone and particularly Jim, I hope it helps the team solve the issues quickly. There seem to be a lot of lepidoptera being affected - could it have anything to do with the new lepidoptera checklist (see threads in the species dictionary forum here and here)?

-----------------
Teresa Frost | Wetland Bird Survey National Organiser | BTO
Other hat  | National Forum for Biological Recording Council
(Old hats  | NBN Board, ALERC Board, CBDC, KMBRC)

36 (edited by Jim Bacon 05-04-2017 16:45:46)

Re: Accuracy of data migration from Gateway to Atlas

I note an issue has been reported of absence records being imported as presence records. I wonder if we should be raising our problems over there rather than here...

A query

https://records-ws.nbnatlas.org/occurrences/search?q=*:*&facets=occurrence_status

suggests that there are no absence records at all in the Atlas. Be alert if that may be relevant to your datasets.
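
If you want to run the same check on one of your own datasets from a script, something like the Python sketch below might do. Treat it with caution: it assumes the JSON response contains a `facetResults` list whose entries have `fieldName` and `fieldResult` (with `label` and `count`), and that absence records would carry a label containing "absent". Both are assumptions to verify against a real response, and the sample values are placeholders, not Atlas output.

```python
def facet_counts(response: dict, field: str) -> dict:
    """Pull {label: count} for one facet out of the ASSUMED response shape."""
    for facet in response.get("facetResults", []):
        if facet.get("fieldName") == field:
            return {r["label"]: r["count"] for r in facet.get("fieldResult", [])}
    return {}


def has_absence_records(response: dict) -> bool:
    """True if any occurrence_status label mentions 'absent' with a non-zero count."""
    counts = facet_counts(response, "occurrence_status")
    return any("absent" in str(label).lower() and n > 0 for label, n in counts.items())


# Placeholder response for illustration only (not real Atlas output).
sample = {
    "facetResults": [
        {"fieldName": "occurrence_status",
         "fieldResult": [{"label": "present", "count": 1}]}
    ]
}
print(has_absence_records(sample))
```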

Jim Bacon

PS. Wow! BTO records in the Atlas. What great news!

37

Re: Accuracy of data migration from Gateway to Atlas

Jim Bacon wrote:

I can't see any way of telling whether this is a problem caused in migration to the Atlas or was a pre-existing problem. Presumably you think this is new, Chris, but I can't for the moment think how that might have arisen.

Hi Jim

Yeah, like you I could understand if the link was just broken completely, but to establish a new, erroneous link seems weird. The UKSI database is fine and we've never had issues with the taxa that currently have problems, so I doubt that anything was wrong with UKSI as a source of their data. The export was requested in an odd format (text CSV), so it's possible that something happened during the export, but I can easily reissue the files OR just give them an original Access copy which they can use to extract what they need.

I will see what David wants to do.

Chris R.

Chris Raper, Manager of the UK Species Inventory, Angela Marmont Centre for UK Biodiversity,
Natural History Museum, Cromwell Road, London, SW7 5BD.  (tel: 020 7942 5894)
also Tachinid Recording Scheme (http://tachinidae.org.uk/)

38

Re: Accuracy of data migration from Gateway to Atlas

Hi everyone,
Going back to my Arran Carpet example, I realised that this species has had a recent taxonomic update (New Lepidoptera Checklist 2013/2014), when the genus name changed from Chloroclysta to Dysstroma and Arran Carpet became a subspecies of Marbled Carpet.

So, to test whether part of the problem was related to recent taxonomic changes, I looked at some fungi where some species had been moved from Hygrocybe to Gliophorus. This change was not made in the BMS list, which I use on Recorder, until fairly recently and was not on the Gateway until last year.

First I tried two Hygrocybe species which had not changed - the search worked fine, but again if I searched on a trinomial I only got the trinomial records, whereas if I searched on the binomial all the records were returned.

When I tried two species which had changed from Hygrocybe to Gliophorus, if I searched under the new species name, although it appeared on the results list, there were no records. If I searched under the vernacular name, records appeared under related searches, and if I looked at the original versus processed data, the taxon concept id and species id were missing - hence no match. If I searched under the old name there were no results, although in one case Hygrocybe irrigata returned two records, both showing matching taxonomic ids in the original and processed data, but these were the old TVKs. For some reason these two records had not been updated.


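+------------------------------------+------------------+------------------+------------------+------------------+
| Name                               | taxon id         | taxon concept    | species id       | Match type       |
+------------------------------------+------------------+------------------+------------------+------------------+
| Hygrocybe glutinipes glutinipes    | NHMSYS0001484471 | NHMSYS0001484471 | NHMSYS0001484470 | Taxon GUID match |
| Hygrocybe glutinipes rubra         | NHMSYS0001484472 | NHMSYS0001484472 | NHMSYS0001484470 | Taxon GUID match |
| Hygrocybe glutinipes rubra         | NHMSYS0001484470 | NHMSYS0001484470 | NHMSYS0001484470 | Taxon GUID match |
+------------------------------------+------------------+------------------+------------------+------------------+
| Hygrocybe virginea virginea        | BMSSYS0000008392 | BMSSYS0000008392 | NHMSYS0001484537 | Taxon GUID match |
| Hygrocybe virginea ochraceopallida | BMSSYS0000008389 | BMSSYS0000008389 | NHMSYS0001484537 | Taxon GUID match |
| Hygrocybe virginea fuscescens      | NHMSYS0001484539 | NHMSYS0001484539 | NHMSYS0001484537 | Taxon GUID match |
| Hygrocybe virginea                 | NHMSYS0001484537 | NHMSYS0001484537 | NHMSYS0001484537 | Taxon GUID match |
+------------------------------------+------------------+------------------+------------------+------------------+
| Gliophorus psittacinus             | BMSSYS0000045739 |                  |                  |                  |
+------------------------------------+------------------+------------------+------------------+------------------+
| Gliophorus irrigatus               | BMSSYS0000045736 |                  |                  |                  |
| Hygrocybe irrigata                 | NHMSYS0001484476 | NHMSYS0001484476 | NHMSYS0001484476 | Taxon GUID match |
+------------------------------------+------------------+------------------+------------------+------------------+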

This would suggest that there is a problem where there have been recent taxonomic changes.
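
The same check can be done mechanically. Below is a short Python sketch using a few rows copied from the table above: it flags any name whose processed taxon concept or species id is missing. The tuple layout and variable names are just my own choice for the example.

```python
# (name, taxon id, taxon concept, species id); None marks a blank cell in the table.
rows = [
    ("Hygrocybe virginea", "NHMSYS0001484537", "NHMSYS0001484537", "NHMSYS0001484537"),
    ("Hygrocybe virginea fuscescens", "NHMSYS0001484539", "NHMSYS0001484539", "NHMSYS0001484537"),
    ("Gliophorus psittacinus", "BMSSYS0000045739", None, None),
    ("Gliophorus irrigatus", "BMSSYS0000045736", None, None),
    ("Hygrocybe irrigata", "NHMSYS0001484476", "NHMSYS0001484476", "NHMSYS0001484476"),
]

# A record is treated as unmatched when either processed id is absent.
unmatched = [
    name
    for name, taxon_id, concept, species in rows
    if concept is None or species is None
]

print(unmatched)
```

Only the two Gliophorus names come out as unmatched, which is the pattern described above.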

Christine