1

Topic: Download issues - missing taxa and records

Hello folks, I downloaded the complete British Bryological Society dataset then allocated the records to Hertfordshire by representing the grid references as grid squares and choosing those records whose grid squares intersected our county boundary. I also downloaded what I thought would be all BBS records for my county using the LERC boundary layer on the Atlas.

LERC Boundary region download: 20,124 records
ALL BBS data represented as grid squares: 26,610 records

The difference appears to be caused by two things:

1. Region downloaded data only includes records recorded to species or subspecies and excludes varieties (e.g. Ephemerum serratum var. serratum) and 'unranked' taxa (e.g. Hypnum cupressiforme sensu lato)

2. Region downloaded data appears to be using the central point of a grid reference to determine whether or not a record is within the boundary so records for an edge grid square are only included if its central point falls within the region. This means records that I would expect to be in the download for my county are missed.

I had a quick look at drawing a polygon and using that to select records, and the same issue occurred: 100m squares (1kms, tetrads and hectads too) that overlapped my boundary were only shown if their centroid was within the boundary I'd drawn.
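To illustrate the difference between the two selection rules described above, here is a minimal Python sketch. It is not the Atlas's code: the function names, the rectangular boundary and the coordinates (in metres) are all made up for illustration, and the intersection test ignores squares that only touch the boundary along an edge.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Ray-casting test: is point p inside the polygon?"""
    x, y = p
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def _orient(a: Point, b: Point, c: Point) -> float:
    """Sign tells which side of line ab the point c lies on."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments ab and cd properly cross (touching is ignored)."""
    return (_orient(c, d, a) * _orient(c, d, b) < 0
            and _orient(a, b, c) * _orient(a, b, d) < 0)


def centroid_in_polygon(e: float, n: float, size: float, poly: List[Point]) -> bool:
    """Centroid rule: keep the square only if its centre is inside."""
    return point_in_polygon((e + size / 2, n + size / 2), poly)


def square_intersects_polygon(e: float, n: float, size: float, poly: List[Point]) -> bool:
    """Overlap rule: keep the square if any part of it overlaps the polygon.

    (e, n) is the south-west corner of the square, size its side length.
    """
    corners = [(e, n), (e + size, n), (e + size, n + size), (e, n + size)]
    # A corner of the square lies inside the boundary
    if any(point_in_polygon(c, poly) for c in corners):
        return True
    # A boundary vertex lies inside the square
    if any(e < x < e + size and n < y < n + size for x, y in poly):
        return True
    # An edge of the square crosses an edge of the boundary
    for i in range(4):
        for j in range(len(poly)):
            if segments_cross(corners[i], corners[(i + 1) % 4],
                              poly[j], poly[(j + 1) % len(poly)]):
                return True
    return False


# Hypothetical boundary: a 1400 m x 1400 m block
boundary = [(0, 0), (1400, 0), (1400, 1400), (0, 1400)]

# A 1 km square with its south-west corner at (1000, 1000) overlaps the
# boundary, but its centre (1500, 1500) falls outside it.
print(centroid_in_polygon(1000, 1000, 1000, boundary))
print(square_intersects_polygon(1000, 1000, 1000, boundary))
```

With the overlap rule the edge square is selected; with the centroid rule it is dropped, which is the behaviour I saw in the region download.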

It would be good if both behaviours could be changed or at least a warning that records might be missing as currently it doesn't seem to be possible to use the Atlas to view or download all relevant records for an area.

2

Re: Download issues - missing taxa and records

Thanks for flagging this up Ian. It is a pretty big concern for me as we have carried out a few downloads* using our LERC boundary. (*obviously not of CC-BY-NC data!)

Whilst point 2 clearly will require some work by the NBN Atlas coders, this is a problem that was tackled and dealt with long ago by LERC technical staff. Perhaps some knowledge exchange is needed within the NBN family?

Point 1 sounds to me (as a non-technical person) like it should be quick and easy to fix with a slight tweak to the code.

Do you have any idea what proportion of the "missing" c. 6.5k records is caused by each of the errors? (I know some will fall into both camps). I'd imagine that the varieties problem could be more significant for the bryophyte data you were using than it would for some other groups.

Let's hope that both of these issues can be ironed out quickly, as at the moment any data searches which include Atlas data will be incomplete.

Adam

3

Re: Download issues - missing taxa and records

Hi Ian,
Thank you for highlighting the first point - I have logged an issue and asked for it to be corrected as soon as possible.

Yes, you are correct that records are only included in a boundary if the centre of the grid square falls within the boundary. In the short term we need to make this much clearer (i.e. a message on the page and better documentation) but also investigate a solution. All records in the Atlas are stored with lat-long coordinates; if the data is supplied with a grid reference, the centroid of the GR is used as the location. The Atlas then does a point-in-polygon search to determine which records are in each boundary.
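As a rough illustration of the centroid step, the sketch below turns an OS grid reference into a square and its centre point. It is only a sketch under stated assumptions, not the Atlas code: `GridSquare` and `parse_grid_ref` are hypothetical names, it stays in OSGB eastings and northings (metres) rather than converting to lat-long, and it handles only standard references with an even number of digits, so tetrads are not covered.

```python
from typing import NamedTuple, Tuple


class GridSquare(NamedTuple):
    easting: int   # south-west corner, metres
    northing: int  # south-west corner, metres
    size: int      # side length, metres

    @property
    def centroid(self) -> Tuple[float, float]:
        """Centre point of the square, the location used by a centroid rule."""
        return (self.easting + self.size / 2, self.northing + self.size / 2)


def parse_grid_ref(ref: str) -> GridSquare:
    """Parse an OS grid reference such as 'TL1234' into a GridSquare."""
    ref = ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if len(letters) != 2 or not letters.isalpha() or "I" in letters:
        raise ValueError(f"bad grid letters in {ref!r}")
    if len(digits) % 2 or len(digits) > 10 or (digits and not digits.isdigit()):
        raise ValueError(f"bad digits in {ref!r}")

    # Letter indices on the 5 x 5 lettering grid, which skips the letter I
    l1 = ord(letters[0]) - ord("A")
    l2 = ord(letters[1]) - ord("A")
    if l1 > 7:
        l1 -= 1
    if l2 > 7:
        l2 -= 1

    # South-west corner of the 100 km square, in units of 100 km
    e100 = ((l1 - 2) % 5) * 5 + (l2 % 5)
    n100 = (19 - (l1 // 5) * 5) - (l2 // 5)

    half = len(digits) // 2
    size = 10 ** (5 - half)  # 2 digits -> 10 km, 4 -> 1 km, 6 -> 100 m, 8 -> 10 m
    e = int(digits[:half] or 0) * size
    n = int(digits[half:] or 0) * size
    return GridSquare(e100 * 100_000 + e, n100 * 100_000 + n, size)


square = parse_grid_ref("TL1234")
print(square)           # south-west corner and size of the 1 km square
print(square.centroid)  # the single point a centroid search would test
```

Once the record is reduced to that single centre point, the extent of the original square plays no part in the boundary search.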

For now, I will ask for a message to be put on the page.

Many thanks, Sophie

Sophia Ratcliffe
Technical & Data Partner Support Officer, NBN

4

Re: Download issues - missing taxa and records

Hi Sophie

I'm really glad that something can be done on 1.

On 2. ... this is pretty fundamental for good geographic data management and is an issue that LERCs have cracked over the last 10-15 years! We use the OS grid reference system for biological recording in this country and if the system adopted for the Atlas can't work with the fact that 1km or 10km records could occur in any part of those squares and not just at the central point, then it's not really fit for purpose. Records appear as squares on the Atlas Interactive maps, so I assumed that these squares would be used for any spatial selections being made.

Yours in anticipation

Adam

5

Re: Download issues - missing taxa and records

Thanks Sophie, it's good to know that issue one will be looked at soon and I would hope issue two can be solved as soon as possible as the Atlas is currently not able to provide what should be a basic data provision function. It's good that a warning about issue two can be displayed, but I do wonder what use a data download is where the end user receives only some of the data they might expect but has no way of telling what has been missed out.

It does seem bizarre to me that the system converts grids to lat long then converts back to grids for display purposes only! As Adam points out the OS grid reference is the system of choice for biological recording here, so it seems a strange choice to adopt a system that doesn't really work properly with it.

Thanks for looking at these issues and hope they can be fixed soon.

6

Re: Download issues - missing taxa and records

Hi Ian,
I have asked Reuben, our developer, to let me know exactly how the search works, just to confirm. Then we will add a note and also investigate how to resolve it.

Thanks, Sophie

Sophia Ratcliffe
Technical & Data Partner Support Officer, NBN

7

Re: Download issues - missing taxa and records

I think part of the problem is that most GIS type systems are developed to use the Lat-Long system as standard, so that the software can be used worldwide. OS Grid references are an "extra" that have to be calculated, and I'm not sure the importance / ease of use / ease of "reading by eye" of the OS grid reference is quite understood by some software developers who may have little actual experience of biological recording. This applies across more than one recording system I have come across. OS Grid references should be the default input / output format for the Atlas.

8

Re: Download issues - missing taxa and records

Thanks Sophie, does the Atlas accept records with 10m accurate grid references? There were none in my downloads - and I've had a quick look at one of the BLS datasets on screen and there don't seem to be any in the small sample I looked at?

Thanks

Ian

9

Re: Download issues - missing taxa and records

You can filter according to the "coordinate uncertainty" when searching for records (using customize filters). According to this there are 544,047 records with 10m accuracy (not necessarily grid references, but most likely to be): https://records.nbnatlas.org/occurrence … %2210.0%22
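The same check can be run on a downloaded file. This is only a sketch: the sample rows are made up, and the column names (`coordinateUncertaintyInMeters` is the Darwin Core term, `gridReference` is a guess) are assumptions that may not match the headings in an actual Atlas download.

```python
import csv
import io

# Made-up rows standing in for a downloaded CSV; column names are assumptions
SAMPLE = """scientificName,gridReference,coordinateUncertaintyInMeters
Ulota crispa,TL1234,1000.0
Hypnum cupressiforme,TL12343456,10.0
Sphagnum subsecundum,TL13,10000.0
"""


def count_by_uncertainty(csv_file, column="coordinateUncertaintyInMeters"):
    """Count records for each coordinate uncertainty value (None if blank)."""
    counts = {}
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        key = float(value) if value else None
        counts[key] = counts.get(key, 0) + 1
    return counts


counts = count_by_uncertainty(io.StringIO(SAMPLE))
print(counts.get(10.0, 0))  # number of records at 10 m in the sample
```

For a real download, pass an open file (for example `open("records.csv", newline="")`, where the file name is hypothetical) instead of the in-memory sample.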

Charlie Barnes
Information Officer
Greater Lincolnshire Nature Partnership

10

Re: Download issues - missing taxa and records

Thanks Charlie

11

Re: Download issues - missing taxa and records

Sorry Ian, I missed your message.

Yes, as Charlie says, the Atlas can accept 10m accurate grid references.

Reuben has corrected the first problem, so that records selected in the Region pages include all taxa where the name has been matched to a rank, and should include varieties, forms and unranked names. Please let me know if you find any problems.

Reuben is looking into the second issue, and I will write a message as soon as I have an update.

Thanks, Sophie

Sophia Ratcliffe
Technical & Data Partner Support Officer, NBN

12

Re: Download issues - missing taxa and records

Hi Ian,
I noticed that the species aggregate example you give, Hypnum cupressiforme sensu lato, is missing the qualifier 'sensu lato' on the Atlas: https://species.nbnatlas.org/species/NHMSYS0000310884.

We should have this resolved in the next update of the UKSI. We were not including the Attribute field from the UKSI, where the qualifier is sometimes stored.

thanks, Sophie

Sophia Ratcliffe
Technical & Data Partner Support Officer, NBN

13

Re: Download issues - missing taxa and records

Thanks Sophie. Good news: I've re-downloaded the data and it now contains varieties. It looks as though the species aggregate issue isn't confined to Hypnum cupressiforme. I've looked for a few others and the sensu lato is missing from Sphagnum recurvum, Sphagnum subsecundum, Ulota crispa... so it seems to be part of a wider issue. There is nothing in the downloaded file to indicate the records are for the aggregate, and unfortunately without it the results are misleading. It seems odd that the names include it in the complete BBS dataset download but not in the one filtered by location, as I would expect the name to come from the same place.

Ian

14

Re: Download issues - missing taxa and records

As Ian and Adam pointed out, searching on centroid points is a big issue for the accuracy of results. For example, it creates some strange results when dealing with sensitive records which have been blurred. If I use the analyse tools to search a small part of a 10km sq, the only time that species which are blurred to 10km sq resolution will appear on the species list is when the search area includes the centroid of the 10km sq. So for example Black Grouse may be present in the 10km square that your search area is within, but you won't find out unless your search area just happens to overlap the 10km centroid. It's rather misleading to a user... hopefully something can be done to resolve this.

Mark

Mark Pollitt
SWSEIC (formerly DGERC)

15

Re: Download issues - missing taxa and records

Hi Ian,
I am sorry, I missed your message.

I have asked for Rank to be added back to the download fields. We did some updates to the columns and their order in the download file and I think that Rank must have been removed. At least then you will be able to identify any species aggregates, if it's not clear from their name.

Could you give me an example of where the name includes the aggregate in the complete BBS download but not in the one filtered by location? As you say, the names come from the same place, so there shouldn't be any differences. Would you email me the two files, please?

Many thanks for your help,

Best wishes, Sophie

Sophia Ratcliffe
Technical & Data Partner Support Officer, NBN