1 (edited by Steve Corvus 08-10-2018 17:57:26)

Topic: Mis-matched download pairs

I have downloaded some British Dragonfly Society Recording Scheme data by vice-county, one complete year at a time. I download search results in pairs: the default simple spreadsheet format and a custom format with just "Miscellaneous fields" selected. I then need to match the record keys in both files to get the information I need.

The first two pairs (South Hampshire and South Wiltshire for 2010) worked perfectly, in that the record keys at the start of each line matched throughout the files (after sorting them). Since then about half of the pairs contain several lines where the record keys don't match, even though the number of records is the same. As an example, Dorset 2010 with 1,968 records has 1,407 matches but 561 record keys in each file which don't appear in the other. In all cases, the number of records in downloads matches the number shown in the search results on screen. I tried repeating one of the download pairs a few days later but the results were the same.

In addition, the custom file includes several duplicate lines.

The problem isn't limited by vice-county; for example, while South Hampshire 2010 is ok, 2011 has several mismatched records.
By using the NBN record number from the custom file to search for the record online, I can see that the data is complete for those records, in that it includes data that I would hope to find in the default file.

As a newcomer to NBN Atlas, I might be doing something wrong or expecting more than is available - but I think there is a problem with whatever creates the download from the search results.

I've listed the first few record keys from the Dorset 2010 download, in case my explanation isn't clear.

Any help appreciated, thanks.

Default file:                             Custom file:
"00072f39-a871-4718-8ba6-ff79175ac144"    "00751e6a-4d06-4ddd-8a5f-ee7ec3d5f824"
"0035ba3b-53a8-4d5a-bac2-3fe1748bcae5"    "00751e6a-4d06-4ddd-8a5f-ee7ec3d5f824"
"00796d47-5af0-413a-9b3e-0dcbcafda1d1"    "0084e83d-262f-4e85-8658-a0bf11b5bfb6"
"0084e83d-262f-4e85-8658-a0bf11b5bfb6"    "0084e83d-262f-4e85-8658-a0bf11b5bfb6"
"00a1af6b-872b-4c67-a04e-dba78fea85e4"    "00a1af6b-872b-4c67-a04e-dba78fea85e4"
"013a86c0-14d9-40d7-9a34-a0d7d5a68028"    "00a1af6b-872b-4c67-a04e-dba78fea85e4"
"01423fd4-82b5-4d3e-bb56-8807ae20ddcc"    "013a86c0-14d9-40d7-9a34-a0d7d5a68028"
"0148c019-4c72-48cf-b562-68f50313d156"    "01423fd4-82b5-4d3e-bb56-8807ae20ddcc"
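
For anyone wanting to reproduce the comparison, here is a minimal Python sketch (not the VBA I actually use) that takes the eight keys from each column of the listing above and reports matches, keys found in only one file, and duplicates in the custom file. The function name compare_keys is just an illustrative choice.

```python
from collections import Counter

# Keys copied from the listing above (first eight lines of each Dorset 2010 file).
default_keys = [
    "00072f39-a871-4718-8ba6-ff79175ac144",
    "0035ba3b-53a8-4d5a-bac2-3fe1748bcae5",
    "00796d47-5af0-413a-9b3e-0dcbcafda1d1",
    "0084e83d-262f-4e85-8658-a0bf11b5bfb6",
    "00a1af6b-872b-4c67-a04e-dba78fea85e4",
    "013a86c0-14d9-40d7-9a34-a0d7d5a68028",
    "01423fd4-82b5-4d3e-bb56-8807ae20ddcc",
    "0148c019-4c72-48cf-b562-68f50313d156",
]
custom_keys = [
    "00751e6a-4d06-4ddd-8a5f-ee7ec3d5f824",
    "00751e6a-4d06-4ddd-8a5f-ee7ec3d5f824",
    "0084e83d-262f-4e85-8658-a0bf11b5bfb6",
    "0084e83d-262f-4e85-8658-a0bf11b5bfb6",
    "00a1af6b-872b-4c67-a04e-dba78fea85e4",
    "00a1af6b-872b-4c67-a04e-dba78fea85e4",
    "013a86c0-14d9-40d7-9a34-a0d7d5a68028",
    "01423fd4-82b5-4d3e-bb56-8807ae20ddcc",
]


def compare_keys(default_keys, custom_keys):
    """Summarise how two lists of record keys overlap."""
    default_set, custom_set = set(default_keys), set(custom_keys)
    # A key is a duplicate if it appears more than once in the custom file.
    duplicates = sorted(k for k, n in Counter(custom_keys).items() if n > 1)
    return {
        "matched": sorted(default_set & custom_set),
        "default_only": sorted(default_set - custom_set),
        "custom_only": sorted(custom_set - default_set),
        "custom_duplicates": duplicates,
    }


report = compare_keys(default_keys, custom_keys)
for name, keys in report.items():
    print(f"{name}: {len(keys)}")
```

For this eight-line sample it reports 4 matched keys, 4 keys only in the default file, 1 key only in the custom file, and 3 keys duplicated in the custom file.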

2

Re: Mis-matched download pairs

Hi Steve,
I have downloaded records from Dorset in 2010 and South Hampshire in 2011 in the simple spreadsheet format and the customised format - miscellaneous fields, and in both cases the record keys match perfectly. I tried it once with the rowkey (the guid that the Atlas gives each record) and then with the occurrence ID - all records were matched in both cases.

However, I have 5,182 records in Dorset in 2010, compared to your 1,968. I wonder if there was a problem with the index over the days that you did the download. The index (on which the search and filter is based) was rebuilt on Monday evening. I will talk to Reuben about the problem - but at the moment I haven't been able to recreate it.

It sounds like you have spent quite a bit of time already on this, is there anything that I can do to help? I can download the records and match the simple and custom files for you in R - it sounds like you might be doing it by hand in Excel.
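
The matching itself is a join on the record key. A rough sketch of the idea, written here in Python rather than R, and assuming (as described above) that the key is the first column of each file. The column names and sample rows are made up for illustration; real downloads have many more columns.

```python
import csv
import io

# Hypothetical file contents standing in for a simple/custom download pair.
simple_csv = '''"Record key","Scientific name"
"k1","Anax imperator"
"k2","Aeshna cyanea"
'''
custom_csv = '''"Record key","Abundance"
"k2","3"
"k1","1"
'''


def rows_by_key(text):
    """Return the header and a dict of rows indexed by their first column."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, {row[0]: row for row in reader}


def join_on_key(simple_text, custom_text):
    """Merge rows sharing a key; also list keys present in only one file."""
    simple_header, simple_rows = rows_by_key(simple_text)
    custom_header, custom_rows = rows_by_key(custom_text)
    merged = [simple_header + custom_header[1:]]
    for key in sorted(set(simple_rows) & set(custom_rows)):
        merged.append(simple_rows[key] + custom_rows[key][1:])
    unmatched = sorted(set(simple_rows) ^ set(custom_rows))
    return merged, unmatched


merged, unmatched = join_on_key(simple_csv, custom_csv)
for row in merged:
    print(row)
print("unmatched keys:", unmatched)
```

Because rows are looked up by key, the two files do not need to be sorted first, and any key that appears in only one file ends up in the unmatched list instead of silently shifting the pairing.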

I am sorry for the inconvenience,

Sophie

3

Re: Mis-matched download pairs

Hi Sophie,

Thanks very much for looking at this.

On Thursday (11 October) the download pair for Dorset 2010 was all ok, so I thought that last Monday's index rebuild had fixed my problem. I tried another pair (South Hampshire 2011 I think) but again it had several mismatches. The next day, late afternoon on Friday, I searched for and downloaded eight vice-county/year, simple/custom-miscellaneous pairs but all of the custom-miscellaneous downloads failed. This is an example failure notice I received by email, for the South Hampshire 2010 custom file:

The download has failed. uniqueId:77b1642c-978b-35d1-851c-689fd928142c-1539359396839 path:/77b1642c-978b-35d1-851c-689fd928142c/1539359396839/NBN-11-2010-cust-misc.zip

All of the simple downloads were ok but I had no custom files to check them against for matching record keys. I tried again this morning, Monday 15 October, but with the same result - simple download ok, custom failed.

Thanks for your offer of sending me the downloads, but for the moment at least I'd like to persevere. The time I might spend on this doesn't matter; I have enough correctly matched data to continue with other work, so I'm not being held up. I'm not using Excel; I match the paired files programmatically in VBA (!!!) and check a sample of the results manually for confirmation. If I do give in, I would like all Odonata records up to and including 2012 for South Hampshire, plus 2010 records for Dorset, South Wiltshire, North Hampshire and West Sussex, in separate vice-county files; would that be possible?

I am puzzled; you mentioned 5,182 records in Dorset in 2010. My search criteria are just four: vice-county, BDS Recording Scheme, year start and year end, and I always get the same number of search results, 1,968 for Dorset 2010. I tried "odonata" in one of the taxon fields, leaving the partner field blank, but that returned just three additional records and I'm not too bothered about those. Am I missing something? As I said, I am an NBN Atlas noob so perhaps my search criteria aren't correct?

I do appreciate the level of support that you provide.

Regards

Steve

4

Re: Mis-matched download pairs

Hi Steve,
Now I understand - are you using the advanced record search: https://records.nbnatlas.org/occurrence … hartsView?

If you do a date search using the advanced search, it only matches records that have an exact date. So any month or year precision records are ignored. I will ask for a note to be added to the search to indicate that.
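
To illustrate the difference, here is a toy Python sketch. It is only a model of the behaviour described above, not how the Atlas is actually implemented, and the records and field names (event_date, year) are invented for the example.

```python
from datetime import date

# Hypothetical records: event_date is None when only the year is known.
records = [
    {"key": "a", "year": 2010, "event_date": date(2010, 6, 14)},
    {"key": "b", "year": 2010, "event_date": None},
    {"key": "c", "year": 2011, "event_date": date(2011, 7, 2)},
]


def search_by_date_range(records, start, end):
    """Date-range search: only records with an exact date can match."""
    return [
        r["key"]
        for r in records
        if r["event_date"] is not None and start <= r["event_date"] <= end
    ]


def filter_by_year(records, year):
    """Year filter: matches on the year alone, whatever the date precision."""
    return [r["key"] for r in records if r["year"] == year]


by_date = search_by_date_range(records, date(2010, 1, 1), date(2010, 12, 31))
by_year = filter_by_year(records, 2010)
print(by_date)  # ['a']
print(by_year)  # ['a', 'b']
```

Record "b" has year precision only, so it is missed by the date-range search but picked up by the year filter.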

I did the search from the BDS Recording Scheme dataset page and filtered the records by vice county and decade. In the advanced record search, search by dataset and vice county (but not date) and then filter the records by year. The filters are on the left-hand side and if the year filter is not listed, click on the 'customise filters' button at the top and select year.

Sorry... I searched by decade (2010) and not the year. A year search gave me 1,993 records.

I will investigate the problem with the custom downloads now.

Best wishes, Sophie

5

Re: Mis-matched download pairs

Hi Sophie

I had been taking this route for searches: "Data and partners/Advanced record search", then entering the four criteria as above. This results in a record set with this address:

… records.nbnatlas.org/occurrences/search?q=occurrence_date ... South+Hampshire%22&fq=data_provider_uid%3Adp97

I have now tried "Data and partners/Search for a data partner", select BDS from the map, view records, refine by checking years/vice-county, which results in a record set with this address

… records.nbnatlas.org/occurrences/search?q=data_provider ... %22South+Hampshire%22&fq=(year ... OR%20year%3A%222012%22)

... I also found the "year by decade" checkbox, so I will use this route in future, thanks.

However, the downloads suffered the same fate (simple ok, custom failed) so I couldn't check for matching pairs. I will try again later in the week.

Thanks again

Steve

6

Re: Mis-matched download pairs

Hi Steve,
I have reported the problem with the custom downloads - it failed for me as well. I will write a message as soon as I have more information.

I am sorry for the inconvenience,

Best wishes, Sophie

7

Re: Mis-matched download pairs

Hi Sophie

Just to let you know that I successfully downloaded a pair of Dorset 2010-2012 files today and they matched throughout, so thank you for getting this resolved and again for your advice on the dates. I was a bit surprised to discover that the order of the custom-miscellaneous fields has changed; no matter, I had already written code to handle the changing order of the quantity fields so I just need to extend that to take full account of the whole header line, which presumably fluctuates to accommodate changing requirements. In the meantime I will grab all the data I need (a handful of nearby vice counties) and set it aside until my load process is working again.
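
For what it's worth, the idea of reading columns by header name rather than by position can be sketched in a few lines of Python (my own code is VBA, so this is only an illustration). The column names and sample rows below are invented; the real custom-miscellaneous header is different and longer.

```python
import csv
import io

# Two hypothetical downloads with the same columns in a different order.
old_download = "Record key,Abundance,Life stage\nk1,3,adult\n"
new_download = "Life stage,Record key,Abundance\nadult,k1,3\n"

# The columns the load process needs, in the order it expects them.
WANTED = ["Record key", "Abundance", "Life stage"]


def read_fields(text, wanted=WANTED):
    """Return rows with the wanted columns in a fixed order, using the header."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        # Fail loudly if a column has been dropped or renamed.
        raise ValueError(f"missing columns: {missing}")
    return [[row[name] for name in wanted] for row in reader]


print(read_fields(old_download))
print(read_fields(new_download))
```

Both calls give the same rows, so a reordered header no longer matters, and a missing column is reported rather than silently misread.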

Best wishes

Steve