1

Topic: Importing Recorder data into Indicia

Hi - I hope this is a suitable place for Indicia questions not directly related to development.

I've recently been handed the poisoned chalice of looking after the BSBI records for VC63 (SW Yorkshire). I'm not a big fan of Recorder or Mapmate (which the BSBI use, I think), and don't have a suitable Windows machine to run them on anyway.

So... I took the plunge and installed an Indicia Warehouse (on Linux). I've been able to get everything working fine and have imported about 100k records from the previous recorder via spreadsheets. I've got no problem with mucking about in SQL, and it is quite good being able to access the data via QGIS (which I'm familiar with) for mapping.

I've now been given an extract of another 100k records from Recorder as part of a data-sharing agreement, and I've noticed a couple of oddities whilst trying to import the data (via the occurrences csv import tool on the warehouse) which I wonder if anyone could help with.

I'm using the latest version of Indicia (1.0.0) which I updated only a few days ago.


The first oddity is importing species with an attribute as part of the name.  Eg, "Aconitum napellus sensu lato".  I have a copy of the NHM database which more or less matches the source system, although eventually the species names will have to be matched to Stace 3 for the BSBI.

This has the matching species in it, which for this example has ID NHMSYS0000494973.

However, the importer is not able to find it - ie it throws the error [Taxon: Could not find "Aconitum napellus sensu lato" in Taxa taxon list Taxon]. Can it use the attribute at all for species matching?  If not, then presumably I'll have to add a synonym which has a matching full name and no attribute.  I have checked that the format of the attribute is the same in both the source and target, so that doesn't seem to be the problem.


The other oddity I suspect is a bug.  Previously (with version 0.9.1), you could upload vague dates of the format "yyyy-mm", but in version 1.0.0, this doesn't seem to work.  Import works if you use the full month name (eg April 2010), but for some obscure reason, records for the month of October  (eg "October 2003") fail with the error [sample:date_type: Please supply a date for your observation.]   Any ideas?

Thanks,

Tim

2

Re: Importing Recorder data into Indicia

Hi

I may be starting from the least consequential item but I have picked up the bug causing October to be a problem.
The 'to' in the middle of 'October' is being picked up by a faulty regex and taken to indicate a date range like 'to 1970'.

The bug is in line 48 of \application\helpers\vague_date.php and I think the insertion of ^ to give the following will be a solution.

          'regex' => '/^(?P<sep>to|pre|before[\.]?)/i',
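
To see the effect of the anchor in isolation, here is a minimal standalone sketch. It assumes the pre-fix pattern was the same as the one above minus the ^, and has_range_prefix is a made-up helper name, not something in vague_date.php.

```php
<?php
// Pattern before the fix: assumed identical to the fixed one, minus the ^ anchor.
const UNANCHORED = '/(?P<sep>to|pre|before[\.]?)/i';
// Pattern after the fix, as quoted above.
const ANCHORED = '/^(?P<sep>to|pre|before[\.]?)/i';

// Hypothetical helper: TRUE if the pattern finds a range prefix in the string.
function has_range_prefix(string $pattern, string $input): bool {
  return preg_match($pattern, $input) === 1;
}

// Print, for each input, whether each pattern thinks it has a range prefix.
foreach (['September 2010', 'October 2003', 'to 1970', 'pre 1970'] as $input) {
  printf(
    "%-15s unanchored: %-3s anchored: %s\n",
    $input,
    has_range_prefix(UNANCHORED, $input) ? 'yes' : 'no',
    has_range_prefix(ANCHORED, $input) ? 'yes' : 'no'
  );
}
```

Only 'October 2003' should differ between the two patterns, since the unanchored one finds the 'to' inside the month name.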

Jim Bacon.

3 (edited by VC63 01-06-2016 22:40:33)

Re: Importing Recorder data into Indicia

Thanks - well spotted - I hadn't got as far as trawling the code.

Forcing the prefix word to be the beginning of the string must be correct.

I edited vague_date.php and that definitely fixes the problem (as you'd expect).

4

Re: Importing Recorder data into Indicia

Hi

It's this commit which has broken the processing of the yyyy-mm format.

Now that hyphen-separated date ranges do not need spaces around the hyphen, yyyy-mm formats are mistaken for year ranges.
'2010-11' is treated as meaning '2010 to 2011' rather than November 2010, and '2010-01' throws up warnings and fails.

Another upshot of this commit is that a date range like '2010-02-01 - 2010-03-31' fails although '2010-02-01 to 2010-03-31' works as expected.

A possible solution might be to only allow no-space, hyphen-separated date ranges in the form yyyy-yyyy but, since I don't know what the specification is, I don't know if this is an acceptable resolution. I will confer with others.
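
A rough sketch of that idea, purely for illustration: is_compact_year_range is a hypothetical helper, not existing warehouse code, and it only covers the yyyy-yyyy case mentioned above.

```php
<?php
// Hypothetical helper, not part of Indicia: TRUE only for a no-space
// year range of exactly the form yyyy-yyyy.
function is_compact_year_range(string $input): bool {
  return preg_match('/^\d{4}-\d{4}$/', $input) === 1;
}

// Show which of the problem inputs would still be treated as a compact range.
foreach (['2010-2011', '2010-11', '2010-01', '2010-02-01'] as $input) {
  echo $input, ': ', is_compact_year_range($input) ? 'year range' : 'not a compact year range', "\n";
}
```

With a check like this, '2010-11' and '2010-01' would no longer be claimed by the range logic and could fall through to the yyyy-mm handling.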

Jim Bacon.


For info and my own future reference, I was using the following test script, saved in the warehouse root, since I haven't yet got to grips with PHPUnit:

<?php
// Constants that Kohana's index.php would normally define.
define('IN_PRODUCTION', FALSE);
define('EXT', '.php');
define('APPPATH', 'application/');
define('MODPATH', 'modules/');
define('SYSPATH', 'system/');
$kohana_pathinfo = pathinfo(__FILE__);
define('DOCROOT', $kohana_pathinfo['dirname'].DIRECTORY_SEPARATOR);
define('KOHANA',  $kohana_pathinfo['basename']);

// Load the vague date helper, the date parser and the Kohana core.
include 'application/helpers/vague_date.php';
include 'application/libraries/DateParser.php';
class DateParser extends DateParser_Core { }
include 'system/core/Kohana.php';

echo 'September 2010: ';
print_r(vague_date::string_to_vague_date('September 2010'));
echo '<br />';
echo 'October 2010: ';
print_r(vague_date::string_to_vague_date('October 2010'));
echo '<br />';

echo '2010-02-01: ';
print_r(vague_date::string_to_vague_date('2010-02-01'));
echo '<br />';
echo '2010-02-01 - 2010-03-31: ';
print_r(vague_date::string_to_vague_date('2010-02-01 - 2010-03-31'));
echo '<br />';
echo '2010-02-01 to 2010-03-31: ';
print_r(vague_date::string_to_vague_date('2010-02-01 to 2010-03-31'));
echo '<br />';
echo '2010 - 2011: ';
print_r(vague_date::string_to_vague_date('2010 - 2011'));
echo '<br />';
echo '2010-2011: ';
print_r(vague_date::string_to_vague_date('2010-2011'));
echo '<br />';
echo '2010-11: ';
print_r(vague_date::string_to_vague_date('2010-11'));
echo '<br />';
echo '2010-01: ';
print_r(vague_date::string_to_vague_date('2010-01'));
echo '<br />';

5 (edited by VC63 02-06-2016 15:25:11)

Re: Importing Recorder data into Indicia

Hmm.  It might be easiest to go back to enforcing a space either side of a date range separator, but as you say, that's down to requirements.  It is a problem having a dual-use character.  If only the long dash ( — ) was easy to type...


I tried to find the taxon matching code for occurrence import last night, but PHP isn't my native language so I didn't get very far.  I assume it is in one of the controllers?

As a workaround for this one, I'm going to concatenate the attribute onto the species name as it makes sense to do this for display purposes anyway.  I can always remove it later, and it is just a single SQL update either way.

6

Re: Importing Recorder data into Indicia

I haven't been near the importer for a long while but I think a lot of the code that does the work can be found in client_helpers/import_helper.php.

Jim Bacon.

7 (edited by Jim Bacon 03-06-2016 09:50:32)

Re: Importing Recorder data into Indicia

Regarding your species name issue, I've just had a look to see how this species is stored in our database. You've obviously done this already. If your import of the NHM Dictionary is like ours then we should have similar results.

My query

SELECT ttl.id, t.taxon, t.external_key, tr.rank, t.attribute, ttl.taxon_meaning_id, ttl.preferred
FROM taxa t
JOIN taxa_taxon_lists ttl ON ttl.taxon_id = t.id
JOIN taxon_lists tl ON tl.id = ttl.taxon_list_id
JOIN taxon_ranks tr ON tr.id = t.taxon_rank_id
WHERE t.external_key = 'NHMSYS0000494973'
AND t.deleted = false
AND tl.title = 'UK Master List';

returns

------------------------------------------------------------------------------------------------------------
|   id   |      taxon             |   external_key   |        rank        |  attribute | meaning_id | pref |
------------------------------------------------------------------------------------------------------------
| 307227 | Aconitum napellus      | NHMSYS0000494973 | Species aggregate  | sens. lat. |     131319 |    f |
| 285499 | Aconitum napellus      | NHMSYS0000494973 | Species aggregate  | sensu lato |     131319 |    t |
|  30541 | Aconitum napellus agg. | NHMSYS0000494973 | Species aggregate  |            |      13554 |    f |
| 293615 | Aconitum napellus agg. | NHMSYS0000494973 | Species sensu lato | sensu lato |     131319 |    f |
------------------------------------------------------------------------------------------------------------

It sounds to me like you are about to prove that the importer matches the incoming species name against taxon and not taxon + attribute.
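
To make that concrete, here is a small standalone sketch using the four rows above. It is not the importer's code: both match functions are hypothetical and only illustrate the difference between comparing against taxon alone and against taxon plus attribute.

```php
<?php
// Rows copied from the query result above (id, taxon, attribute only).
$rows = [
  ['id' => 307227, 'taxon' => 'Aconitum napellus', 'attribute' => 'sens. lat.'],
  ['id' => 285499, 'taxon' => 'Aconitum napellus', 'attribute' => 'sensu lato'],
  ['id' => 30541, 'taxon' => 'Aconitum napellus agg.', 'attribute' => ''],
  ['id' => 293615, 'taxon' => 'Aconitum napellus agg.', 'attribute' => 'sensu lato'],
];

// Hypothetical: compare the incoming name with the taxon column only.
function match_on_taxon(array $rows, string $name): array {
  $ids = [];
  foreach ($rows as $row) {
    if (strcasecmp($row['taxon'], $name) === 0) {
      $ids[] = $row['id'];
    }
  }
  return $ids;
}

// Hypothetical: compare the incoming name with taxon plus attribute.
function match_on_taxon_and_attribute(array $rows, string $name): array {
  $ids = [];
  foreach ($rows as $row) {
    $full = trim($row['taxon'] . ' ' . $row['attribute']);
    if (strcasecmp($full, $name) === 0) {
      $ids[] = $row['id'];
    }
  }
  return $ids;
}

$name = 'Aconitum napellus sensu lato';
print_r(match_on_taxon($rows, $name));
print_r(match_on_taxon_and_attribute($rows, $name));
```

Matching on taxon alone finds nothing for the full name, whereas matching on taxon plus attribute picks out the preferred row (id 285499).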

I can see the code in client_helpers/import_helper.php is used to set up the import job, which is then executed through a web service at the URL index.php/services/import/upload. The code for that is in modules/indicia_svc_import/controllers/services/import.php. It's too clever for me to be able to make any sense of it without a big effort and you seem to have it under control, so I'm going to leave it at that.

It's really good to have someone with your skills joining the pool of Indicia users. I hope it proves useful. Contributions to improve it are welcome!

Jim Bacon.

8

Re: Importing Recorder data into Indicia

Thanks Jim - yes, that's the format of my data.  I didn't follow the standard procedure for importing the NHM list because I altered it to make new Stace 3 species names the preferred meaning (where required), but everything is there.

I'll have another look at the code in a week or so (not going to get any time this week), but it does appear that the attribute is ignored on import.

Perhaps it should be available as an extra import column rather than allowing it to be appended to the species name?  There certainly ought to be a way to distinguish between the aggregate and a specific species when the "sensu lato" nomenclature is used, though, and this certainly seems to be a legitimate format for export from Recorder.

Anyway, now I know where to look, I'll stop pestering!  Thanks again for the assistance.


Incidentally, I've just been having a play with the "TomBio" plugin for QGIS, which looks to be very useful for generating species maps.  The Recorder now has 2000-odd to look through to see if there are any oddities (probably easier than reading through a spreadsheet).

9

Re: Importing Recorder data into Indicia

TomBio (http://www.tombio.uk/node/23) was news to me. Interesting to see how Indicia and QGIS work together for you on the desktop. I think you are a pioneer! We install GeoServer on our systems to provide web-delivered mapping.

We've got a bit of a confusing thread here with two issues in one. Just to return to the date formatting issue, I got the following feedback from John van Breda:

Regarding the specification

The vague date “requirements” were based on the vague date support in Recorder 6/2002/2000, which in turn came from Recorder 3.3 and the original NBN data model (not the exchange format, but the model that underpinned Recorder).

Here’s the original specification: http://www.recorder6.info/WebHelpR6Main/Topics/Vague_Dates_Stored.htm

Indicia has gone way beyond this spec in its support for date formats, but this is not documented apart from in the code itself. The recent change was to support yyyy-yyyy and dd.mm.yyyy-dd.mm.yyyy date ranges without spaces around the hyphen.

While enforcing spaces around the hyphen would be the quickest fix, this will then cause a problem for a user who specifically requested support for the above format.

Therefore, we need to detect strings containing a single hyphen that has no spaces around it and is not at either end of the string, and interpret them as follows:
[ul]
[li]2 digits then hyphen then 4 digits = mm-yyyy[/li]
[li]4 digits then hyphen then 2 digits = yyyy-mm[/li]
[li]Else, it’s a date to date range[/li]
[/ul]
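
As a rough illustration of the three rules above (a sketch only, not the fix that went into the warehouse; classify_hyphenated_date is a made-up name and the regexes are my own reading of the rules):

```php
<?php
// Hypothetical helper, not the warehouse's actual parser. Classifies strings
// containing exactly one hyphen, with no space either side of it and not at
// either end of the string.
function classify_hyphenated_date(string $input): string {
  // 2 digits, hyphen, 4 digits.
  if (preg_match('/^\d{2}-\d{4}$/', $input)) {
    return 'mm-yyyy';
  }
  // 4 digits, hyphen, 2 digits.
  if (preg_match('/^\d{4}-\d{2}$/', $input)) {
    return 'yyyy-mm';
  }
  // Any other single, unspaced, non-terminal hyphen.
  if (preg_match('/^[^-]*[^\s-]-[^\s-][^-]*$/', $input)) {
    return 'date range';
  }
  // Everything else is left to the other parsing rules.
  return 'not handled by these rules';
}

$inputs = ['11-2010', '2010-11', '2010-01', '2010-2011', '01.02.2010-31.03.2010', '2010-02-01', '2010 - 2011'];
foreach ($inputs as $input) {
  echo $input, ': ', classify_hyphenated_date($input), "\n";
}
```

Strings with more than one hyphen (such as '2010-02-01') or with spaces around the hyphen are deliberately left alone here, on the assumption that the existing rules already deal with them.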

I'm just documenting this here in case I am not able to commit a fix soon.

Jim Bacon
(Preview suggests my BBCode is wrong, sorry)

10

Re: Importing Recorder data into Indicia

I have committed fixes for various vague date bugs in warehouse v1.2.1.

Jim Bacon.