<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Forum — Accuracy of data migration from Gateway to Atlas]]></title>
		<link>https://forums.nbn.org.uk/viewtopic.php?id=6803</link>
		<atom:link href="https://forums.nbn.org.uk/extern.php?action=feed&amp;tid=6803&amp;type=rss" rel="self" type="application/rss+xml" />
		<description><![CDATA[The most recent posts in Accuracy of data migration from Gateway to Atlas.]]></description>
		<lastBuildDate>Thu, 06 Apr 2017 09:27:02 +0000</lastBuildDate>
		<generator>PunBB 1.4.6</generator>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26673#p26673</link>
			<description><![CDATA[<p>Hi everyone,<br />Going back to my Arran Carpet example, I realised that this species has had a recent taxonomic up-date (New Lepidoptera Checklist 2013/2014), when the genus name changed from Chloroclysta to Dysstroma and Arran Carpet became a subspecies of Marbled Carpet.</p><p>So, to test whether part of the problem was related to recent taxonomic changes, I looked at some fungi where some species had been moved from Hygrocybe to Gliophorus. This change was not made in the BMS list which I use on Recorder until fairly recently and was not on the Gateway until last year. </p><p>First I tried two Hygrocybe species which had not changed - the search worked fine, but again if I searched on a trinomial I only got the trinomial records, whereas if I searched on the binomial all the records were returned.</p><p>When I tried two species which had changed from Hygrocybe to Gliophorus and searched under the new species name, it appeared in the results list but there were no records. If I searched under the vernacular name, records appeared under related searches, and when I compared the original and processed data the taxon concept id and species id were missing - hence no match. If I searched under the old name there were no results, although in one case Hygrocybe irrigata returned two records, both showing matching taxonomic ids in the original and processed data, but these were the old TVKs. 
For some reason these two records had not been up-dated.</p><table>
<tr><th>Name</th><th>Taxon id</th><th>Taxon concept</th><th>Species id</th><th>Match</th></tr>
<tr><td>Hygrocybe glutinipes glutinipes</td><td>NHMSYS0001484471</td><td>NHMSYS0001484471</td><td>NHMSYS0001484470</td><td>Taxon GUID match</td></tr>
<tr><td>Hygrocybe glutinipes rubra</td><td>NHMSYS0001484472</td><td>NHMSYS0001484472</td><td>NHMSYS0001484470</td><td>Taxon GUID match</td></tr>
<tr><td>Hygrocybe glutinipes rubra</td><td>NHMSYS0001484470</td><td>NHMSYS0001484470</td><td>NHMSYS0001484470</td><td>Taxon GUID match</td></tr>
<tr><td>Hygrocybe virginea virginea</td><td>BMSSYS0000008392</td><td>BMSSYS0000008392</td><td>NHMSYS0001484537</td><td>Taxon GUID match</td></tr>
<tr><td>Hygrocybe virginea ochraceopallida</td><td>BMSSYS0000008389</td><td>BMSSYS0000008389</td><td>NHMSYS0001484537</td><td>Taxon GUID match</td></tr>
<tr><td>Hygrocybe virginea fuscescens</td><td>NHMSYS0001484539</td><td>NHMSYS0001484539</td><td>NHMSYS0001484537</td><td>Taxon GUID match</td></tr>
<tr><td>Hygrocybe virginea</td><td>NHMSYS0001484537</td><td>NHMSYS0001484537</td><td>NHMSYS0001484537</td><td>Taxon GUID match</td></tr>
<tr><td>Gliophorus psittacinus</td><td>BMSSYS0000045739</td><td></td><td></td><td></td></tr>
<tr><td>Gliophorus irrigatus</td><td>BMSSYS0000045736</td><td></td><td></td><td></td></tr>
<tr><td>Hygrocybe irrigata</td><td>NHMSYS0001484476</td><td>NHMSYS0001484476</td><td>NHMSYS0001484476</td><td>Taxon GUID match</td></tr>
</table><p>This would suggest that there is a problem where there have been recent taxonomic changes.</p><p>Christine</p>]]></description>
			<author><![CDATA[null@example.com (chrisjohnson)]]></author>
			<pubDate>Thu, 06 Apr 2017 09:27:02 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26673#p26673</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26670#p26670</link>
			<description><![CDATA[<div class="quotebox"><cite>Jim Bacon wrote:</cite><blockquote><p>I can&#039;t see any way of telling whether this is a problem caused in migration to the Atlas or was a pre-existing problem. Presumably you think this is new, Chris, but I can&#039;t for the moment think how that might have arisen.</p></blockquote></div><p>Hi Jim </p><p>Yeah, like you I could understand it if the link was just broken completely, but to establish a new, erroneous link seems weird. The UKSI database is fine and we&#039;ve never had issues with the taxa that currently have problems, so I doubt that anything was wrong with UKSI as the source of their data. The export was requested in an odd format (Text CSV), so it&#039;s possible that something happened during the export, but I can easily reissue the files OR just give them an original Access copy which they can use to extract what they need. </p><p>I will see what David wants to do. </p><p>Chris R.</p>]]></description>
			<author><![CDATA[null@example.com (ChrisR)]]></author>
			<pubDate>Wed, 05 Apr 2017 18:47:18 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26670#p26670</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26669#p26669</link>
			<description><![CDATA[<p>I note an <a href="https://github.com/nbnuk/nbnatlas-issues/issues/215">issue</a> has been reported of absence records being imported as presence records. I wonder if we should be raising our problems over there rather than here...</p><p>A query </p><div class="codebox"><pre><code>https://records-ws.nbnatlas.org/occurrences/search?q=*:*&amp;facets=occurrence_status</code></pre></div><p> suggests that there are no absence records at all in the Atlas. Be alert to this if it may be relevant to your datasets.</p><p>Jim Bacon</p><p>PS. Wow! BTO records in the Atlas. What great news!</p>]]></description>
			<author><![CDATA[null@example.com (Jim Bacon)]]></author>
			<pubDate>Wed, 05 Apr 2017 16:32:20 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26669#p26669</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26655#p26655</link>
			<description><![CDATA[<div class="quotebox"><cite>Jim Bacon wrote:</cite><blockquote><p> I&#039;m reeling slightly at the size of those numbers... are there really 217 million records in there?</p></blockquote></div><p>Sounds about right - 152 million in the new <a href="https://registry.nbnatlas.org/public/showDataResource/dr528">BTO dataset</a>, and there were <a href="https://nbn.org.uk/news/100-million-records-on-nbn-gateway/">over 100 million</a> on the Gateway (including records with no public view, which mostly haven&#039;t been published on the Atlas, accounting for the &quot;missing&quot; records).</p><p>Thanks again, everyone and particularly Jim, for your detective work on the taxon matching issues; I hope it helps the team solve the issues quickly. There seem to be a lot of Lepidoptera being affected - could it have anything to do with the new Lepidoptera checklist (see threads in the species dictionary forum <a href="https://forums.nbn.org.uk/viewtopic.php?id=6396">here</a> and <a href="https://forums.nbn.org.uk/viewtopic.php?id=6168">here</a>)?</p>]]></description>
			<author><![CDATA[null@example.com (TeresaF)]]></author>
			<pubDate>Wed, 05 Apr 2017 13:04:20 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26655#p26655</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26654#p26654</link>
			<description><![CDATA[<p>Thanks Jim - that&#039;s really useful and more complex than I predicted!</p>]]></description>
			<author><![CDATA[null@example.com (charliebarnes)]]></author>
			<pubDate>Wed, 05 Apr 2017 12:28:51 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26654#p26654</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26653#p26653</link>
			<description><![CDATA[<p>Hi Charlie</p><p>I don&#039;t see any assertions about the taxon matching process but I can get some more information about the results. The query<br /></p><div class="codebox"><pre><code>https://records-ws.nbnatlas.org/occurrences/search?q=*:*&amp;facets=name_match_metric</code></pre></div><p> returns eight different results for the name matching method and the number of records to which it was applied. They are </p><div class="codebox"><pre><code>+-------------------+-------------------+
| Name match metric | Number of records |
+-------------------+-------------------+
| taxonIdMatch      | 213427156         |
+-------------------+-------------------+
| exactMatch        | 2438552           |
+-------------------+-------------------+
| higherMatch       | 675574            |
+-------------------+-------------------+
| noMatch           | 514150            |
+-------------------+-------------------+
| canonicalMatch    | 59858             |
+-------------------+-------------------+
| fuzzyMatch        | 11790             |
+-------------------+-------------------+
| vernacularMatch   | 2820              |
+-------------------+-------------------+
| Unknown           | 2080              |
+-------------------+-------------------+</code></pre></div><p>I&#039;m reeling slightly at the size of those numbers... are there really 217 million records in there? <br />We can do a bit of digging into any of these, for example with this query</p><div class="codebox"><pre><code>https://records-ws.nbnatlas.org/occurrences/search?q=name_match_metric:noMatch&amp;facets=raw_name</code></pre></div><p>This reveals, top of the list, 32k records of Korscheltellus lupulina, NHMSYS0021142343, for which no match was found. Since neither this species nor this genus can be found in UKSI, and consequently neither can the TVK, it fits my theory that there should be no match. <a href="https://records.nbnatlas.org/occurrences/ef2f10c5-108f-45a6-9ee6-a5a6fed260de">Example record</a>. There is no name you can search on to access these records. Meanwhile, according to the Atlas species list, this name is <a href="https://species.nbnatlas.org/species/NHMSYS0021142343#names">accepted in UKSI</a>.</p><p>Jim Bacon.</p>]]></description>
			<author><![CDATA[null@example.com (Jim Bacon)]]></author>
			<pubDate>Wed, 05 Apr 2017 12:13:29 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26653#p26653</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26652#p26652</link>
			<description><![CDATA[<p>Looking further at the &quot;hybridization&quot; of the Larch Tortrix moth with the Dutch Elm tree (last issue on page 1 of this thread), I note from <a href="https://species.nbnatlas.org/species/NBNSYS0000162111#names">https://species.nbnatlas.org/species/NB … 2111#names</a> that the scientific names Zeiraphera diniana (Guenée, 1845) and Zeiraphera occultana (Douglas, 1846) have also been fused into this strange union, but their preferred synonym, Zeiraphera griseana (according to UKSI), isn&#039;t in the list of names, even though this is the names URL returned by a search for it.</p><p>The UKSI recommended TVK for the moth is NHMSYS0021143278 and for Ulmus glabra x minor x plotii = U. x hollandica it is NBNSYS0000162111. I can&#039;t see any way yet for the two to have become conflated...?</p><p>Keith</p>]]></description>
			<author><![CDATA[null@example.com (Keith Balmer)]]></author>
			<pubDate>Wed, 05 Apr 2017 12:12:02 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26652#p26652</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26650#p26650</link>
			<description><![CDATA[<p>Hi Chris</p><p>I see what you are saying about the synonymy of <a href="https://species.nbnatlas.org/species/NBNSYS0100004453#names">Alophora barbifrons</a>.</p><p>Likewise I see <a href="https://species.nbnatlas.org/species/NBNSYS0100005674#names">Tephrochlaena oraria</a> is down as a synonym of Syntormon pseudospicatum, not Tephrochlaena halterata.</p><p>I can&#039;t see any way of telling whether this is a problem caused in migration to the Atlas or was a pre-existing problem. Presumably you think this is new, Chris, but I can&#039;t for the moment think how that might have arisen.</p><p>I guess we have to wait for David Martin to be back on the scene to look into it all.</p><p>Jim Bacon.</p>]]></description>
			<author><![CDATA[null@example.com (Jim Bacon)]]></author>
			<pubDate>Wed, 05 Apr 2017 09:59:17 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26650#p26650</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26647#p26647</link>
			<description><![CDATA[<div class="quotebox"><cite>Jim Bacon wrote:</cite><blockquote><p> However, it seems as if that copy of UKSI did not contain some of the TVK/names that the Gateway had employed, meaning that the code could not be used.</p></blockquote></div><p>I wonder if that information is stored anywhere (I can&#039;t see it) - i.e. the steps the matching has taken (failed using Taxon Id, failed on Scientific name, using Genus). We only seem to be able to see the &quot;last&quot; step. Although if there are only three steps, then you can work it out from the Name match metrics (scientific name -&gt; taxon id -&gt; higher level ?)</p>]]></description>
			<author><![CDATA[null@example.com (charliebarnes)]]></author>
			<pubDate>Wed, 05 Apr 2017 09:21:24 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26647#p26647</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26645#p26645</link>
			<description><![CDATA[<p>Hi Charlie,<br />My current impression (which remains pure conjecture based on the anomalies we are seeing) is that, when occurrence records were imported, the TVK associated with the record was used to try to make a match with the UK Species Inventory. However, it seems as if that copy of UKSI did not contain some of the TVK/names that the Gateway had employed, meaning that the code could not be used.<br />Jim.</p>]]></description>
			<author><![CDATA[null@example.com (Jim Bacon)]]></author>
			<pubDate>Wed, 05 Apr 2017 09:08:51 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26645#p26645</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26640#p26640</link>
			<description><![CDATA[<p>Hi Jim,<br />I&#039;ll keep testing to see if I can provide more information to help you sort out the root of the problem.<br />I&#039;ll report later.</p><p>Christine</p>]]></description>
			<author><![CDATA[null@example.com (chrisjohnson)]]></author>
			<pubDate>Wed, 05 Apr 2017 08:35:56 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26640#p26640</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26639#p26639</link>
			<description><![CDATA[<div class="quotebox"><cite>Jim Bacon wrote:</cite><blockquote><p>I don&#039;t mind what matching algorithm is used so long as the results are correct.</p></blockquote></div><p>Neither do I - but as we&#039;ve seen, matching on the name isn&#039;t correct, and since we have a cast-iron code, why not use it and bypass any potential slip-ups? </p><p>[The agg isn&#039;t missing from Oligia strigalis - it is just not displayed in the processed information]</p>]]></description>
			<author><![CDATA[null@example.com (charliebarnes)]]></author>
			<pubDate>Wed, 05 Apr 2017 08:33:27 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26639#p26639</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26632#p26632</link>
			<description><![CDATA[<p>Yes, I think there has been a significant problem with regard to the linking of junior synonyms to the recommended name or taxonomic concept. Tony Irwin emailed me this evening: </p><div class="quotebox"><blockquote><p>Just tried to look up Tephrochlaena oraria (Heleomyzidae), and find that that species is down as a synonym of Syntormon pseudospicatum (Dolichopodidae!). The map for the species does appear when I search for Tephrochlaena halterata, although that is actually as a misidentification synonym of oraria. I don&#039;t know whether this is the same as what was on the Gateway, or whether this is a new issue that has arisen consequent to the migration.</p></blockquote></div><p>My example was that &quot;Alophora barbifrons&quot; seems to be linked as a junior synonym of &quot;Parapiophila vulgaris&quot;, which it isn&#039;t - it&#039;s the junior synonym of &quot;Phasia barbifrons&quot;.</p><p>I&#039;ve been in touch with David Martin but he says he is travelling this week so I doubt we will get a quick fix for it.</p>]]></description>
			<author><![CDATA[null@example.com (ChrisR)]]></author>
			<pubDate>Tue, 04 Apr 2017 20:16:43 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26632#p26632</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26630#p26630</link>
			<description><![CDATA[<p>To throw another oddity into the mix I eventually got a &quot;Search for Taxa&quot; search for &quot;Larch Ladybird&quot; to work (after several timeouts) and on page 6 of the results (I now jump to later pages for a smile) I find:</p><div class="quotebox"><blockquote><p>species hybrid: Ulmus glabra x minor x plotii = U. x hollandica Mill.&nbsp; – Dutch Elm</p><p>&nbsp; &nbsp; Dutch Elm, Dutch Elm, Larch Tortrix</p><p>&nbsp; &nbsp; Larch Tortrix<br />&nbsp; &nbsp; &nbsp; &nbsp; View images of species within this species hybrid Occurrences: 465</p></blockquote></div><p>Wondering why I&#039;d been given a tree as a search result I noticed &quot;Larch Tortrix&quot; in the list of names. Isn&#039;t this a moth? (<a href="http://www.ukmoths.org.uk/species/zeiraphera-griseana">http://www.ukmoths.org.uk/species/zeiraphera-griseana</a>). Thus we have something that appears to be simultaneously a tree and a moth? I know it says it is a hybrid, but!</p><p>The link to occurrences takes me to records of the tree.</p><p>The taxon details page <a href="https://species.nbnatlas.org/species/NBNSYS0000162111#names">https://species.nbnatlas.org/species/NB … 2111#names</a> gives both Dutch Elm and Larch Tortrix as &quot;UK Species Inventory preferred&quot;. </p><p>A search for the moth species &quot;Zeiraphera griseana&quot; gives</p><div class="quotebox"><blockquote><p>species: Zeiraphera griseana (Hübner, 1799) (accepted name Ulmus glabra x minor x plotii = U. x hollandica Mill.)</p><p>Zeiraphera griseana (Hübner, 1799)<br />Zeiraphera griseana</p><p>&nbsp; &nbsp; Occurrences: 217</p></blockquote></div><p>Note the accepted name for this moth is the Elm hybrid.</p><p>Clicking on the &quot;Occurrences: 217&quot; link actually takes me to 465 records for the Elm hybrid!</p><p>I&#039;ve now tried to search for only two species on the Atlas and had two sets of bizarre results. </p><p>Keith</p>]]></description>
			<author><![CDATA[null@example.com (Keith Balmer)]]></author>
			<pubDate>Tue, 04 Apr 2017 19:25:06 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26630#p26630</guid>
		</item>
		<item>
			<title><![CDATA[Re: Accuracy of data migration from Gateway to Atlas]]></title>
			<link>https://forums.nbn.org.uk/viewtopic.php?pid=26628#p26628</link>
			<description><![CDATA[<p>Hi Christine,<br />Putting aside any peculiarities in the search, I wonder if you have uncovered any new aspects of content in the database that might be considered erroneous?<br />So far we have </p><ul><li><p>species records converted to genus records, e.g. Aglais io,</p></li><li><p>a difference between the accepted name/TVK in the species search compared to occurrences, e.g. Pasiphila chloerata</p></li><li><p>an alteration of a species name, e.g. Oligia strigalis agg</p></li></ul><p>A species search for Arran Carpet suggests <a href="https://species.nbnatlas.org/species/NHMSYS0021157753#names">Dysstroma truncata subsp. concinnata (Stephens, 1831)</a> is the preferred name with TVK NHMSYS0021157753. It is attributed to UKSI but is not a name I can find through the UKSI website.</p><p>This brings back no records as, you tell me, any record imported with that TVK has been changed to NBNSYS0000005844. I suggest that is because the match on import couldn&#039;t find the subspecies or TVK so, as we are informed, it reverted to a match on the higher taxon, lopping off the subspecies name. <a href="http://www.nhm.ac.uk/our-science/data/uk-species/species/chloroclysta_truncata.html">UKSI</a> confirms that this is the TVK for that name. </p><p>The conclusion I seem to be coming to is that occurrence records have been processed to match the current version of UKSI, while the species dictionary that the Atlas uses for searching is based on a different version of UKSI with a different set of names and TVKs.</p><p>Jim Bacon.</p>]]></description>
			<author><![CDATA[null@example.com (Jim Bacon)]]></author>
			<pubDate>Tue, 04 Apr 2017 16:33:04 +0000</pubDate>
			<guid>https://forums.nbn.org.uk/viewtopic.php?pid=26628#p26628</guid>
		</item>
	</channel>
</rss>
