Re: Mapping all submitted datasets

One thing I can't recall any discussion about in the recent consultations is the fact that, with some datasets which are currently loaded onto the NBN, none of the records can be viewed, even at the 10k level.  Instead, we get a message saying "datasets unavailable", and if we want to know even whether there is a 10k dot in our particular area we have to apply to the dataset provider for better access.

TBH, I cannot see the point of loading a dataset onto the NBN if none of the records are visible.  I am familiar with all the arguments about restricting access to data at certain levels, but not even mapping 10k records of common species?

Could I suggest that if a dataset is uploaded to the NBN, then, as a condition of this, the data must be made visible on the maps at 10k level, and that no datasets can remain "hidden" as a matter of course?


Re: Mapping all submitted datasets

Hi Matt
That’s true, there is an additional data access control which enables data providers to put datasets on the Gateway with no public access at all.  This control was created for use in the following circumstances:
1.    Datasets that are still undergoing verification may be loaded on the Gateway with no public access as a temporary measure until the verification process is complete and public access can be granted.  One of the Scottish LRCs is currently using this control for this purpose.
2.    Datasets which consist entirely of records which are too sensitive to allow public access even at 10km resolution.  For example, some RSPB single species breeding records datasets. 
3.    Demonstration or test datasets, such as the one we use to demonstrate Record Cleaner at training sessions.
The advantage of putting these datasets on the Gateway despite the lack of public access is that access can be granted to key data users.
We do not plan to remove this control, i.e. it will still be possible to share datasets via the Gateway with no public access under the proposed new system of access controls.  However, we do plan to monitor its use more closely to ensure it is only being used in the circumstances described above.  Most importantly, we will encourage data providers who use this control to proactively grant access (i.e. at the data loading stage) to a wider group of key data users, which would include all statutory agencies and local environmental records centres.
There are currently 1.6 million records from 22 data providers with no public access via the Gateway.
You’re right, I don’t think this control has been discussed during the consultation so far, so thank you very much for raising it! 
Best wishes


Re: Mapping all submitted datasets

I don't think 1 is a valid reason - what happens if some of the records turn out to be wrong? The entire dataset has to be reloaded. Surely it makes sense to only upload validated/verified data - or am I missing something with this procedure..?

There is another reason why some datasets are not publicly available - because some organisations ask that the NBN Gateway is used as the delivery mechanism for data, e.g. SxBRC Inventories for the Environment Agency's use only.

This appears to me to be a bit of a fudge due to the constraints of the current system, and could be overcome in one of two ways: allowing filtering of records by designation/custom species list (e.g. so SxBRC could filter for Protected Species Register, Sussex Rare Species Inventory... against its other datasets and allow the EA access to these records) or hiding the entire dataset from public view.

The former would obviously require a bit of work but I believe the proposed system of access controls at the record level would make this possible.

Charlie Barnes
Information Officer
Greater Lincolnshire Nature Partnership

(edited by Matt Smith 16-03-2012 12:00:49)

Re: Mapping all submitted datasets

Regarding the access to datasets, I'm quite happy that there are controls on the access levels to individual datasets. Some of the datasets I manage have limited access (e.g. public access is limited to seeing records at a 1km level so as to "blur" the data for sensitive species such as Great Crested Newt).
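
For anyone unfamiliar with what "blurring" to 1km or 10k means in practice, here is a rough Python sketch of the idea: digits are simply dropped from the grid reference. This is only an illustration and not the Gateway's own code. The function name is made up, and it assumes standard two-letter British National Grid references (no Irish Grid, no tetrads).

```python
def coarsen_grid_ref(grid_ref: str, digits_per_axis: int) -> str:
    """Truncate a British National Grid reference to a coarser square.

    digits_per_axis=1 gives a 10km square, 2 gives a 1km square.
    Hypothetical helper for illustration only.
    """
    ref = grid_ref.replace(" ", "").upper()
    letters, digits = ref[:2], ref[2:]
    if (len(letters) != 2 or not letters.isalpha()
            or (digits and not digits.isdigit()) or len(digits) % 2):
        raise ValueError(f"not a valid grid reference: {grid_ref!r}")
    half = len(digits) // 2
    if digits_per_axis > half:
        # Truncation can only lose precision, never add it.
        raise ValueError("grid reference is already coarser than requested")
    easting, northing = digits[:half], digits[half:]
    return letters + easting[:digits_per_axis] + northing[:digits_per_axis]


# A 100m-precision record shown at 1km and at 10km resolution.
print(coarsen_grid_ref("SK 123 456", 2))
print(coarsen_grid_ref("SK 123 456", 1))
```

Truncating (rather than rounding) keeps the record inside the square it actually falls in, which is what a 10k dot on a map represents.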

The point I am making is that if you look at NBN maps on a regular basis you notice that there are a number of LRCs that have submitted data where apparently the entire submission is tagged as "dataset unavailable", even for common species. If I or anyone else wanted to look at a distribution at 10k level we would have to apply for access to those datasets.  Look at a map of Episyrphus balteatus if you want an example of the sort of thing I am talking about.  Surely ALL data submitted to the NBN should be visible at the 10k level by default, except for the records of the few sensitive species as noted above.


Re: Mapping all submitted datasets

Thanks for that example Matt, I see that there are three LRC datasets containing Episyrphus balteatus that are not available to the public at all.  I know that one of those datasets has been flagged as ‘no public access’ because it is undergoing verification and the LRC in question hopes to make it publicly available soon.   I’m not sure about the other two, but I will look into it.  We do plan to monitor use of this control more closely in future.  I agree, ALL data should be publicly available at the 10km level unless any of the three exceptions mentioned in the previous post apply.  I suspect that at least some of the 1.6 million records that currently have no public access could be made accessible at low resolution without risk of environmental harm.

The proposed new system will allow data providers to set access at the record level, which would definitely be an improvement and much less ‘constrained’.  If the system is implemented, it will initially only be possible to grant enhanced access based on taxonomy, VC boundaries, LRC boundaries, SSSI boundaries and date ranges.  However, it would be possible to allow filtering by designation as well at a later date if there is demand for this.
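
To illustrate the kind of record-level filtering described above, here is a minimal Python sketch. It is not how the Gateway implements (or will implement) access controls: the names Record, AccessRule and grants are hypothetical, and only the taxonomy, vice-county and date-range filters are modelled.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Record:
    taxon: str
    vice_county: str
    recorded_on: date


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical enhanced-access rule; None means 'no restriction'."""
    taxa: Optional[FrozenSet[str]] = None
    vice_counties: Optional[FrozenSet[str]] = None
    date_from: Optional[date] = None
    date_to: Optional[date] = None

    def grants(self, record: Record) -> bool:
        # A record is covered only if it passes every filter that is set.
        if self.taxa is not None and record.taxon not in self.taxa:
            return False
        if (self.vice_counties is not None
                and record.vice_county not in self.vice_counties):
            return False
        if self.date_from is not None and record.recorded_on < self.date_from:
            return False
        if self.date_to is not None and record.recorded_on > self.date_to:
            return False
        return True


# Example: enhanced access to Great Crested Newt records from 2000 onwards.
rule = AccessRule(taxa=frozenset({"Triturus cristatus"}),
                  date_from=date(2000, 1, 1))
newt = Record("Triturus cristatus", "East Sussex", date(2010, 5, 3))
hoverfly = Record("Episyrphus balteatus", "East Sussex", date(2010, 5, 3))
print(rule.grants(newt), rule.grants(hoverfly))
```

Filtering by designation could be added in the same way, as one more optional field on the rule.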

I know that several organisations (e.g. Natural England) require LRCs to provide data to them via the Gateway, but that in itself is not a reason for using the ‘no public access’ control – I can’t think of any datasets that are so highly specific that they could only ever be useful to a single organisation!  I think most LRCs comply with these requirements by setting public access at low resolution where necessary to prevent environmental harm or to protect their business model, but granting full access to the organisations in question.

Regarding the question of putting unverified data on the Gateway, certain users such as the Environment Agency require access to all available data and can make informed decisions as to whether or not to use particular records.  There are only a handful of datasets that are flagged as 'no public access' because they are unverified.  Yes, the whole dataset needs to be re-loaded if records are removed or corrected, but this is also necessary when new records are added.

Sorry it has taken me a long time to follow up on this post, I have been away from the office a lot and didn’t see it until today!


Re: Mapping all submitted datasets

I think Charlie's second suggestion may be more useful, viz. the ability to hide a dataset's existence from the eyes of the general user of the Gateway, with the data provider proactively giving users/organisations access.

Since the stat agencies are requiring the use of the Gateway as a delivery mechanism, some LRCs are getting round the problem of refusals from individual recorders to put data on the Gateway (in a publicly available sense) by creating datasets of these records and only allowing access to their partners. This is usually not the LRC themselves being difficult btw! As well as this case, there are other reasons for records being placed in "hidden" datasets, such as them being known duplicates of those uploaded by a national scheme (but perhaps not quickly enough or detailed enough for the stat agencies) or unverified as previously mentioned.

I don't think an assumption that records wouldn't be useful to others is ever a reason for non-public access datasets, and reasons of species sensitivity/environmental harm are probably in the minority.

It could be argued that the NBNG is not the best mechanism for this kind of non-public dissemination, but we are where we are and organisations have bought into the Gateway and developed tools to use the data, so the number of these non-public datasets is likely to increase rather than decrease, in the short term at least.

To debate with myself, I can see a reason for not hiding these datasets - the signposting opportunities that the Gateway offers for the existence of pertinent records to researchers and specialists. I think all the datasets mentioned here do clearly explain the situation in their metadata pages so perhaps the status quo is fine?

I think we are one of the "best" LRCs for putting data on the Gateway in the sense of putting a high proportion of our records on, but probably one of the "worst" compared to LRCs who only put their really super high quality, comprehensively verified data on. Regardless, I am concerned that records that fall into the categories above, which we make available to partners offline, will not be available to our national partners, so we will probably be uploading a "hidden" dataset of those records soon.

Teresa Frost | Wetland Bird Survey National Organiser | BTO
Other hat  | National Forum for Biological Recording Council
(Old hats  | NBN Board, ALERC Board, CBDC, KMBRC)


Re: Mapping all submitted datasets

Another reason for uploading unverified data as a hidden dataset would be to give access to a verifier only who could comment on the records using the commenting facility.

Teresa Frost | Wetland Bird Survey National Organiser | BTO
Other hat  | National Forum for Biological Recording Council
(Old hats  | NBN Board, ALERC Board, CBDC, KMBRC)


Re: Mapping all submitted datasets

SNH manages one dataset which falls into the 'hidden' category, the Invertebrate Site Register for Scotland. The ISR is quite an old dataset, compiled from multiple sources, and was removed from general public viewing after it was pointed out that it contains a relatively high level of errors. The ISR should not be relied on for mapping species distributions or environmental change, but may be suitable for identifying areas of search for species of conservation interest. It's appropriate that specialists who understand the limitations of the data should be able to use it to follow up old records of scarce species in order to target survey work, especially in areas of poor coverage.
Caveats over data quality are set out clearly in the ISR's metadata, but it became apparent that few users consult metadata, so there was potential for doubtful records to be accepted unquestioningly. Requiring users to request access means we can grant it with a strong 'health warning' in the covering message, making it clear what the data should & shouldn't be used for.
The alternative would have been to remove the dataset from the Gateway completely, in which case the ISR would have been accessible only to those few who somehow knew of its existence, and then only on request from SNH. This would have involved both users and ourselves in extra work handling requests, and would have increased the risk of uncontrolled copies of the data entering circulation, without the benefit of any corrections that may be made to the master version.