1

Topic: Query for complete species list

Hi

I have been trying to write a new query to get the complete species list from the database with numbers of each taxon

Using this:

SELECT     INDEX_TAXON_NAME_1.PREFERRED_NAME, INDEX_TAXON_NAME_1.COMMON_NAME, COUNT(TAXON_OCCURRENCE.TAXON_OCCURRENCE_KEY) 
                      AS CountOfOccs, TAXON_RANK.SHORT_NAME, TAXON_GROUP.TAXON_GROUP_NAME
FROM         TAXON_GROUP 
INNER JOIN                      TAXON_VERSION ON TAXON_GROUP.TAXON_GROUP_KEY = TAXON_VERSION.OUTPUT_GROUP_KEY 
INNER JOIN                      TAXON_LIST_ITEM 
INNER JOIN                      INDEX_TAXON_NAME ON TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = INDEX_TAXON_NAME.RECOMMENDED_TAXON_LIST_ITEM_KEY 
INNER JOIN                      INDEX_TAXON_NAME AS INDEX_TAXON_NAME_1 ON                      INDEX_TAXON_NAME.RECOMMENDED_TAXON_LIST_ITEM_KEY = INDEX_TAXON_NAME_1.TAXON_LIST_ITEM_KEY 
INNER JOIN                      TAXON_DETERMINATION ON TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = TAXON_DETERMINATION.TAXON_LIST_ITEM_KEY 
INNER JOIN                      TAXON_OCCURRENCE ON TAXON_DETERMINATION.TAXON_OCCURRENCE_KEY = TAXON_OCCURRENCE.TAXON_OCCURRENCE_KEY 
INNER JOIN                      TAXON_RANK ON TAXON_LIST_ITEM.TAXON_RANK_KEY = TAXON_RANK.TAXON_RANK_KEY ON 
                      TAXON_VERSION.TAXON_VERSION_KEY = TAXON_LIST_ITEM.TAXON_VERSION_KEY
WHERE     (TAXON_DETERMINATION.PREFERRED = 1)
GROUP BY INDEX_TAXON_NAME_1.PREFERRED_NAME, INDEX_TAXON_NAME_1.COMMON_NAME, INDEX_TAXON_NAME_1.SORT_ORDER, 
                      TAXON_RANK.SHORT_NAME, TAXON_GROUP.TAXON_GROUP_NAME
ORDER BY INDEX_TAXON_NAME_1.SORT_ORDER

gives me a shorter than expected list, but a record count nearly three times the size of the database. Any ideas what I am doing wrong?

Gordon Barker
Biological Survey Data Manager
National Trust

2

Re: Query for complete species list

I have re-worked this and the following seems Ok. 

Select COUNT(TOCC.TAXON_OCCURRENCE_KEY), ITN.PREFERRED_NAME , ITN1.COMMON_NAME, TR.SHORT_NAME, TG.TAXON_GROUP_NAME From TAXON_OCCURRENCE TOCC INNER JOIN TAXON_DETERMINATION TDET
ON TDET.TAXON_OCCURRENCE_KEY = TOCC.TAXON_OCCURRENCE_KEY AND TDET.PREFERRED =1
INNER JOIN TAXON_LIST_ITEM TLI ON TLI.TAXON_LIST_ITEM_KEY = TDET.TAXON_LIST_ITEM_KEY
INNER JOIN INDEX_TAXON_NAME ITN ON ITN.TAXON_LIST_ITEM_KEY = TLI.TAXON_LIST_ITEM_KEY
INNER JOIN INDEX_TAXON_NAME ITN1 ON ITN.RECOMMENDED_TAXON_LIST_ITEM_KEY = ITN1.TAXON_LIST_ITEM_KEY
INNER JOIN TAXON_RANK TR ON TR.TAXON_RANK_KEY = TLI.TAXON_RANK_KEY
INNER JOIN TAXON_VERSION TV ON TV.TAXON_VERSION_KEY = TLI.TAXON_VERSION_KEY
INNER JOIN TAXON_GROUP TG ON TG.TAXON_GROUP_KEY = TV.OUTPUT_GROUP_KEY
GROUP BY ITN.PREFERRED_NAME, ITN1.COMMON_NAME,ITN1.SORT_ORDER, TR.SHORT_NAME, TG.TAXON_GROUP_NAME
ORDER BY ITN1.SORT_ORDER

Not sure why your version  doesn't work, but you can probably work it out.

Mike Weideli

3

Re: Query for complete species list

I assume that the count was linked wrongly so that it gave multiples in my version. Your version gave the right number of occurrences but duplicates in the species list. Changing to ITN1.PREFERRED_NAME in the SELECT AND GROUP lines reduced those. Taking out the Taxon Rank reduces it further (species/subspecies matching between lists?) but I still seem to have some fungi duplicates.

Would a different approach using the ORGANISM table work? I can't work out how to link it in.

Gordon Barker
Biological Survey Data Manager
National Trust

4

Re: Query for complete species list

Reply. I think because this is TLI key based any differences in what comes from these  keys (eg preferred name, Taxon rank)  causes a duplicate name to appear.  What name were you trying to get ?.  This is always going to happen if you want the
name as input or the preferred name. If the recommended name is Ok then the query needs bit of work to get this. Let me know if this is what you want and I will have  go. Basically, you need to link to ITN and then use the recommended TLI key from this to link to tyhe tables to get  all the other information (eg rank group etc), but I need to think about how this affects 'count'.

Mike Weideli

5

Re: Query for complete species list

The question I get asked is how many species do we have records for? Up until now I have been using a modified version of one of Sally's reports, combining the preferred names to reduce duplicates. I recently noticed that this didn't account for the bracketed subgenus species (e.g. Acrotrichis (Acrotrichis) arnoldi) so attempted to write a new one as above to make use of the Recommended name. The count isn't essential but is useful as an error checker. I can't quite get my head around the order of the joins, my original came out of SQL Management Studio.

Ideally I would like Taxon Name (Latin & English), Rank, Group, Count of Occurrences but with as many duplicates as possible removed, which I thought using the loop on ITN.Recommended TLIK would achieve. Probably not actually possible to get them all out, so we might be as close as posible

Gordon Barker
Biological Survey Data Manager
National Trust

6

Re: Query for complete species list

This version works on the Recommended name

Select COUNT(TOCC.Taxon_OCcurrence_Key), ITN.ACTUAL_NAME, ITN.COMMON_NAME,TR.Short_Name as RANK, TG.TAXON_GROUP_NAME
from taxon_Occurrence TOCC
Inner Join Taxon_Determination TDET ON TDET.Taxon_Occurrence_Key = TOCC.Taxon_Occurrence_Key and TDET.Preferred = 1
INNER JOIN Index_Taxon_Name ITN ON ITN.Taxon_List_Item_Key = TDEt.Taxon_List_Item_Key
INNER JOIN Taxon_LIst_Item TLI2 ON TLI2.Taxon_List_Item_Key = ITN.Recommended_Taxon_List_Item_Key
INNER JOIN Taxon_Version TV ON TV.Taxon_Version_Key = TLI2.Taxon_Version_Key
INNER JOIN Taxon_GROUP TG ON TG.TAXON_GROUP_KEY = TV.OUTPUT_GROUP_KEY
INNER JOIN TAXON_RANK TR ON TR.TAXON_RANK_KEY = TLI2.Taxon_Rank_Key
GROUP BY ITN.actual_name,ITN.Common_Name, TR.Short_Name, TG.TAXON_GROUP_NAME,ITN.Sort_Order
ORDER BY ITN.Sort_Order

Mike Weideli

7

Re: Query for complete species list

It seems to need the ITN Recommended loop to remove duplicates. I get >21000 taxa from your last version, but 17000 with the loop built back in, although it is entirely possible that I have misunderstood something important.

Select COUNT(TOCC.Taxon_OCcurrence_Key), ITN1.PREFERRED_NAME, ITN1.COMMON_NAME,TR.Short_Name as RANK, TG.TAXON_GROUP_NAME
from taxon_Occurrence TOCC
Inner Join Taxon_Determination TDET ON TDET.Taxon_Occurrence_Key = TOCC.Taxon_Occurrence_Key
and TDET.Preferred = 1
INNER JOIN Index_Taxon_Name ITN ON ITN.Taxon_List_Item_Key = TDEt.Taxon_List_Item_Key
INNER JOIN INDEX_TAXON_NAME AS ITN1 ON ITN.RECOMMENDED_TAXON_LIST_ITEM_KEY = ITN1.TAXON_LIST_ITEM_KEY
INNER JOIN Taxon_LIst_Item TLI2 ON TLI2.Taxon_List_Item_Key = ITN.Recommended_Taxon_List_Item_Key
INNER JOIN Taxon_Version TV ON TV.Taxon_Version_Key = TLI2.Taxon_Version_Key
INNER JOIN Taxon_GROUP TG ON TG.TAXON_GROUP_KEY = TV.OUTPUT_GROUP_KEY
INNER JOIN TAXON_RANK TR ON TR.TAXON_RANK_KEY = TLI2.Taxon_Rank_Key
GROUP BY ITN1.PREFERRED_name,ITN1.Common_Name, TR.Short_Name, TG.TAXON_GROUP_NAME,ITN.Sort_Order
ORDER BY ITN.Sort_Order

Gordon Barker
Biological Survey Data Manager
National Trust

8

Re: Query for complete species list

The differences, comes from the difference between preferred names and recommended names. Recommended names are derived via Nameserver and are the names which are in the Organism table, theoretically representing the latest thinking on the  name  for the species.  Preferred names are based on individual taxon lists, so may vary depending on which list was used for the  input. Recommended names is probably best, but I think that because the group by includes the taxon name the query is grouping together taxa which have different attributes (eg sensu stricta). You can check this by putting ITN.Recommended_taxon_List_Item_Key in the output and group by .  If you get different keys and the same names, then this is the problem.    The solution is to use the UDF which outputs the taxon name and the attribute.


Select COUNT(TOCC.Taxon_OCcurrence_Key), dbo.LCRecNameWithAttribute( ITN.Recommended_taxon_List_item_key, 1), ITN.COMMON_NAME,TR.Short_Name as RANK, TG.TAXON_GROUP_NAME
from taxon_Occurrence TOCC
Inner Join Taxon_Determination TDET ON TDET.Taxon_Occurrence_Key = TOCC.Taxon_Occurrence_Key and TDET.Preferred = 1
INNER JOIN Index_Taxon_Name ITN ON ITN.Taxon_List_Item_Key = TDEt.Taxon_List_Item_Key
INNER JOIN Taxon_LIst_Item TLI2 ON TLI2.Taxon_List_Item_Key = ITN.Recommended_Taxon_List_Item_Key
INNER JOIN Taxon_Version TV ON TV.Taxon_Version_Key = TLI2.Taxon_Version_Key
INNER JOIN Taxon_GROUP TG ON TG.TAXON_GROUP_KEY = TV.OUTPUT_GROUP_KEY
INNER JOIN TAXON_RANK TR ON TR.TAXON_RANK_KEY = TLI2.Taxon_Rank_Key
GROUP BY dbo.LCRecNameWithAttribute( ITN.Recommended_taxon_List_item_key, 1),ITN.Common_Name, TR.Short_Name, TG.TAXON_GROUP_NAME,ITN.Sort_Order
ORDER BY ITN.Sort_Order

Mike Weideli

9

Re: Query for complete species list

I am still slightly confused. As far as I can see the Organism table has no names. To get one I can get from there to Taxon_Version using TVK (should there be a relation between these tables? 1:many?). Next link TV.RTVK to Nameserver.ITVK then NS.RTLIK to ITN.TLIK and get preferred & common names from there. (or can I go direct from Org.TVK to NS.ITVK?)

I can't see Recommended Name anywhere as a field, my assumption has been that this effectively means the Preferred Name derived from the Recommended TLIK.

I now seem to be going in the opposite direction

Gordon Barker
Biological Survey Data Manager
National Trust

10

Re: Query for complete species list

What seems to be wrong with the query I have given you ?  It should  just be getting the Recommended names with their attributes attached. The UDF is confusing things a bit, because it is (should be) returning the Recommended name + attribute, avoiding an extra join on Index_Taxon_Name (see below).

Everthing you need in the way of names apart from the attributes can be obtained from Index_Taxon_Name.   This table takes into Account the other tables involved like NameSdrver and Taxon_Common_Name. Within ITN,  recommended names can be  identified by the fact that the Taxon_LIst_Item_Key and Recommended_taxon_List_Item_Key are the same.

This query will return all the recommended names and their associated common name, but you don't normally need to use it in this format.  For your information, there is one entry in the Organism table for each recommended name, but you shouldn't need to use the table directly. .


Select ITN.Actual_Name, ITN.Common_Name from Index_Taxon_Name ITN where Taxon_List_Item_Key = Recommended_Taxon_List_Item_Key ORDER BY ITN.ACtual_name

The normal way of getting the recommended names is as follows. This will return a list recommended names in your database. 
Select  ITN1.Actual_Name, ITN1.Common_Name
From Taxon_Determination TDET
INNER JOIN Index_Taxon_Name ITN ON ITN.Taxon_List_Item_Key = TDET.TAxon_List_Item_Key and TDET.Preferred = 1
INNER JOIN Index_Taxon_Name ITN1 ON ITN1.Taxon_List_Item_Key = ITN.Recommended_Taxon_List_Item_Key
GROUP BY ITN1.Actual_Name, ITN1.Common_Name
ORDER BY  ITN1.Actual_Name

Apart from the effect  of attributes this should return the same number of rows as my query above. Preferred_Name doesn't feature in this.

If you want to see the effect of attributes then the following

Select  ITN1.Actual_Name, ITN1.Common_Name,TV.Attribute
From Taxon_Determination TDET
INNER JOIN Index_Taxon_Name ITN ON ITN.Taxon_List_Item_Key = TDET.TAxon_List_Item_Key and TDET.Preferred = 1
INNER JOIN Index_Taxon_Name ITN1 ON ITN1.Taxon_List_Item_Key = ITN.Recommended_Taxon_List_Item_Key
INNER JOIN TAXON_LIST_ITEM TLI ON TLI.TAXON_LIST_ITEM_KEY = ITN1.TAXON_LIST_ITEM_KEY
INNER JOIN TAXON_VERSION TV ON TV.TAXON_VERSION_KEY = TLI.TAXON_VERSION_KEY
GROUP BY ITN1.Actual_Name, ITN1.Common_Name, TV.Attribute
ORDER BY  ITN1.Actual_Name






Mike

Mike Weideli

11

Re: Query for complete species list

I have rerun all the of the queries and got the following numbers of rows for each:
St03            29950  (standard version in R6 Reports)
St03(GB)    29945  (my modified version - no real difference)
NOS            29950  (same as St03)
NOS 2            18695  (my version, matching latin names)
FP1                    10040  (from first post in this thread, completely wrong)
FP2                    19322
FP3                    17370
FP6              21447
FP7             17153
FP8             19102
FP10a        17123
FP10b        17151

Looks like I managed to overcomplicate things a bit and confuse myself. In the end I took 10a and added in the links to Group and Rank. 17153 rows with only 14 duplicates. 15737 at species rank, which is as good an answer as is probably possible.

Thanks Mike, I think I understand it all a bit better now.

Gordon Barker
Biological Survey Data Manager
National Trust