1

Re: SQL Query - Too Many Rows - Duplication

Hi I'm writing a query in MS SMS to select all taxon occurrences and I'm hopeful that I'm pretty much there. Then there is reality, D'oh. I'm getting around 30% more rows out than I have in the summary info and this also ties in with the total number reported within the reporting wizard with no filters on. So I'm not excluding unchecked etc. which is fine at present.

I have included "WHERE     (dbo.TAXON_DETERMINATION.PREFERRED = 1)" so should not be getting duplicated from non preferred taxa, but still have too many rows.

Does anyone know what is wrong with this?

SELECT     dbo.INDEX_TAXON_NAME.PREFERRED_NAME AS Latin_Preffered, dbo.INDEX_TAXON_NAME.COMMON_NAME,
                      dbo.TAXON_DETERMINATION.PREFERRED, dbo.SAMPLE.SAMPLE_REFERENCE, dbo.SAMPLE.SPATIAL_REF,
                      dbo.FormatEventRecorders(dbo.SAMPLE.SAMPLE_KEY) AS Recorders, dbo.LCReturnDate(dbo.SAMPLE.VAGUE_DATE_START,
                      dbo.SAMPLE.VAGUE_DATE_TYPE, 'F') AS Start_Date, dbo.LCReturnDate(dbo.SAMPLE.VAGUE_DATE_END, dbo.SAMPLE.VAGUE_DATE_TYPE, 'F')
                      AS End_Date, dbo.INDEX_TAXON_NAME.SORT_ORDER, dbo.INDEX_TAXON_NAME.ABBREVIATION, dbo.SURVEY.ITEM_NAME AS Survey_Name,
                      dbo.LOCATION.FILE_CODE, dbo.LOCATION.DESCRIPTION, dbo.ufn_GetDesignations(dbo.TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY, 1,
                      'SYSTEM0000000000', NULL) AS All_Designated, dbo.LCReturnEastingsV2(dbo.SAMPLE.SPATIAL_REF, 'OSGB', 0) AS Easting,
                      dbo.LCReturnNorthingsV2(dbo.SAMPLE.SPATIAL_REF, 'OSGB', 0) AS Northing, dbo.TAXON_OCCURRENCE.TAXON_OCCURRENCE_KEY
FROM         dbo.TAXON_LIST_ITEM INNER JOIN
                      dbo.TAXON_DETERMINATION ON
                      dbo.TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = dbo.TAXON_DETERMINATION.TAXON_LIST_ITEM_KEY INNER JOIN
                      dbo.TAXON_OCCURRENCE ON
                      dbo.TAXON_DETERMINATION.TAXON_OCCURRENCE_KEY = dbo.TAXON_OCCURRENCE.TAXON_OCCURRENCE_KEY INNER JOIN
                      dbo.SAMPLE ON dbo.TAXON_OCCURRENCE.SAMPLE_KEY = dbo.SAMPLE.SAMPLE_KEY INNER JOIN
                      dbo.TAXON_COMMON_NAME ON dbo.TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = dbo.TAXON_COMMON_NAME.TAXON_LIST_ITEM_KEY INNER JOIN
                      dbo.INDEX_TAXON_NAME ON
                      dbo.TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = dbo.INDEX_TAXON_NAME.RECOMMENDED_TAXON_LIST_ITEM_KEY INNER JOIN
                      dbo.SURVEY_EVENT ON dbo.SAMPLE.SURVEY_EVENT_KEY = dbo.SURVEY_EVENT.SURVEY_EVENT_KEY INNER JOIN
                      dbo.SURVEY ON dbo.SURVEY_EVENT.SURVEY_KEY = dbo.SURVEY.SURVEY_KEY INNER JOIN
                      dbo.LOCATION ON dbo.SAMPLE.LOCATION_KEY = dbo.LOCATION.LOCATION_KEY AND
                      dbo.SURVEY_EVENT.LOCATION_KEY = dbo.LOCATION.LOCATION_KEY INNER JOIN
                      dbo.TAXON_DESIGNATION ON dbo.TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = dbo.TAXON_DESIGNATION.TAXON_LIST_ITEM_KEY INNER JOIN
                      dbo.Index_Taxon_Designation ON dbo.TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY = dbo.Index_Taxon_Designation.Taxon_List_Item_Key INNER JOIN
                      dbo.TAXON_DESIGNATION_TYPE ON
                      dbo.TAXON_DESIGNATION.TAXON_DESIGNATION_TYPE_KEY = dbo.TAXON_DESIGNATION_TYPE.TAXON_DESIGNATION_TYPE_KEY AND
                      dbo.Index_Taxon_Designation.Taxon_Designation_Type_Key = dbo.TAXON_DESIGNATION_TYPE.TAXON_DESIGNATION_TYPE_KEY
WHERE     (dbo.TAXON_DETERMINATION.PREFERRED = 1)

Any pointers would be most appreciated.

Tony

Data Manger
Somerset Environmental Records Centre

2 (edited by MikeWeideli 11-05-2011 11:40:09)

Re: SQL Query - Too Many Rows - Duplication

First thought is that the taxon_designation tables will be returning mutiple records. You don't need any of these in the FROM clause as the function you are using works directly from Taxon_list_Item_Key. Also unsure about the common name, but I am not sure what you are trying to do here, so it may be OK.  Also by including Survey Event  and Sample links to Location  you may be excluding  records where the Survey_event has no Location and/or returning an additional record where the survey_event has a different location to the sample.  The following untested query,  probably isn't  what you need as it returns recommended rather than preferred names , but it may give you some ideas .

SELECT     ITN2.ACTUAL_NAME AS Latin_Preffered,
                     ITN2.COMMON_NAME,
                     dbo.TAXON_DETERMINATION.PREFERRED, dbo.SAMPLE.SAMPLE_REFERENCE, dbo.SAMPLE.SPATIAL_REF,
                      dbo.FormatEventRecorders(dbo.SAMPLE.SAMPLE_KEY) AS Recorders, dbo.LCReturnDate(dbo.SAMPLE.VAGUE_DATE_START,
                      dbo.SAMPLE.VAGUE_DATE_TYPE, 'F') AS Start_Date, dbo.LCReturnDate(dbo.SAMPLE.VAGUE_DATE_END, dbo.SAMPLE.VAGUE_DATE_TYPE, 'F')
                      AS End_Date, ITN2.SORT_ORDER, ITN2.ABBREVIATION, dbo.SURVEY.ITEM_NAME AS Survey_Name,
                      dbo.LOCATION.FILE_CODE, dbo.LOCATION.DESCRIPTION, dbo.ufn_GetDesignations(dbo.TAXON_LIST_ITEM.TAXON_LIST_ITEM_KEY, 1,
                      'SYSTEM0000000000', NULL) AS All_Designated, dbo.LCReturnEastingsV2(dbo.SAMPLE.SPATIAL_REF, 'OSGB', 0) AS Easting,
                      dbo.LCReturnNorthingsV2(dbo.SAMPLE.SPATIAL_REF, 'OSGB', 0) AS Northing, dbo.TAXON_OCCURRENCE.TAXON_OCCURRENCE_KEY
FROM        TAXON_DETERMINATION
                   INNER JOIN TAXON_OCCURRENCE
                  ON TAXON_OCCURRENCE.TAXON_OCCURRENCE_KEY = TAXON_DETERMINATION.TAXON_OCCURRENCE_KEY
INNER JOIN INDEX_TAXON_NAME ON INDEX_TAXON_NAME.TAXON_LIST_ITEM_KEY =
TAXON_DETERMINATION.TAXON_LIST_ITEM_KEY
INNER JOIN INDEX_TAXON_NAME ITN2 ON ITN2.TAXON_LIST_ITEM_KEY = INDEX_TAXON_NAME.RECOMMENDED_TAXON_LIST_ITEM_KEY
INNER JOIN SAMPLE ON SAMPLE.SAMPLE_KEY = TAXON_OCCURRENCE_SAMPLE_KEY
INNER JOIN LOCATION ON LOCATION.LOCATION_KEY = SAMPLE.LOCATION_KEY
INNER JOIN SURVEY_EVENT ON SURVEY_EVENT_KEY = SAMPLE.SURVEY_EVENT_KEY
INNER JOIN SURVEY ON SURVEY_KEY = SURVEY_EVENT_KEY
WHERE     (dbo.TAXON_DETERMINATION.PREFERRED = 1)

Mike Weideli

3

Re: SQL Query - Too Many Rows - Duplication

Hi Tone

I haven't checked Mike's version above, but the following SQL code is my best effort at a simplified version. On our system it produces almost exactly the right result (there appear to be three extra records in 949805, which I cannot explain)

The six tables are linked unambiguously and the only fields which are not produced are those relating to the location (FILE_CODE and DESCRIPTION). These two are slightly problematic as many of your records will have no LOCATION. I haven't had time to check, but you might just be able to add in table LOCATION and perhaps LOCATION NAME, or alternatively you could write a a bit more SQL which appends these details afterwards, or wirte some user defined functions to populate the fields without making the FROM clause any more complex. Of course all these options have a time overhead, but you are intending to run this unattended overnight so shouldn't be a problem.

I am not sure that this version will report designations properly, but it should because it uses the RECOMMENDED_TAXON_LIST_ITEM_KEY from the ITN table as the feed for the designation function. The use of a second ITN as in Mike's version, joined to the first by the RTLIK may be the source of duplicates as this effectively introduces all possible synonyms (I think). Of course my version is only as reliable as the ITN table is.

Call me if you want to discuss the various options. Anyway here's the code.

SELECT     dbo.INDEX_TAXON_NAME.PREFERRED_NAME, dbo.INDEX_TAXON_NAME.COMMON_NAME, dbo.SAMPLE.SAMPLE_REFERENCE,
                      dbo.SAMPLE.SPATIAL_REF, dbo.FormatEventRecorders(dbo.SAMPLE.SAMPLE_KEY) AS Expr1, dbo.LCReturnDate(dbo.SAMPLE.VAGUE_DATE_START,
                      dbo.SAMPLE.VAGUE_DATE_TYPE, 'F') AS Expr2, dbo.LCReturnDate(dbo.SAMPLE.VAGUE_DATE_END, dbo.SAMPLE.VAGUE_DATE_TYPE, 'F') AS Expr3,
                      dbo.INDEX_TAXON_NAME.ABBREVIATION, dbo.INDEX_TAXON_NAME.SORT_ORDER, dbo.SURVEY.ITEM_NAME,
                      dbo.ufn_GetDesignations(dbo.INDEX_TAXON_NAME.RECOMMENDED_TAXON_LIST_ITEM_KEY, 1, 'SYSTEM0000000000', NULL) AS Expr4,
                      dbo.LCReturnEastingsV2(dbo.SAMPLE.SPATIAL_REF, 'OSGB', 0) AS Expr5, dbo.LCReturnNorthingsV2(dbo.SAMPLE.SPATIAL_REF, 'OSGB', 0) AS Expr6,
                      dbo.TAXON_OCCURRENCE.TAXON_OCCURRENCE_KEY
FROM         dbo.SURVEY INNER JOIN
                      dbo.SURVEY_EVENT ON dbo.SURVEY.SURVEY_KEY = dbo.SURVEY_EVENT.SURVEY_KEY INNER JOIN
                      dbo.SAMPLE ON dbo.SURVEY_EVENT.SURVEY_EVENT_KEY = dbo.SAMPLE.SURVEY_EVENT_KEY INNER JOIN
                      dbo.TAXON_OCCURRENCE ON dbo.SAMPLE.SAMPLE_KEY = dbo.TAXON_OCCURRENCE.SAMPLE_KEY INNER JOIN
                      dbo.TAXON_DETERMINATION ON
                      dbo.TAXON_OCCURRENCE.TAXON_OCCURRENCE_KEY = dbo.TAXON_DETERMINATION.TAXON_OCCURRENCE_KEY INNER JOIN
                      dbo.INDEX_TAXON_NAME ON dbo.TAXON_DETERMINATION.TAXON_LIST_ITEM_KEY = dbo.INDEX_TAXON_NAME.TAXON_LIST_ITEM_KEY
WHERE     (dbo.TAXON_DETERMINATION.PREFERRED = 1)

Rob Large
Wildlife Sites Officer
Wiltshire & Swindon Biological Records Centre

4

Re: SQL Query - Too Many Rows - Duplication

Thanks Mike and Rob; I have now run you solution Rob and find I only now end up with 7 'extra records'. This would seem acceptable at this time considering over a million in the total.

I shall certainly look to extending this in the areas I need and as you say probably by an extra ufn.

Thanks, I am hopeful that the more I play the more I will grasp of this bag of spiders [R6].

Tony

Data Manger
Somerset Environmental Records Centre

5

Re: SQL Query - Too Many Rows - Duplication

You can end up with more than one preferred determination on a record under some circumstances (see http://forums.nbn.org.uk/viewtopic.php?id=1945). Could that be the source of your extra records? There's a batch update to resolve such records (see same thread).

Richard Burkmar
Biodiversity Project Officer
Field Studies Council

6

Re: SQL Query - Too Many Rows - Duplication

Often these duplicates turn out to be where you have set up an alternative name for a species. This gives mutiple entries in Index_Taxon_Name for a  Taxon_List_Item_Key.  In these cases you can overcome the problem by adding WHERE Index_Taxon_Name.System_Supplied_Data = 1 to the query.. This doesn't stop user added taxa being included, but does resolve issues where you have aded an alternative name for an existing species.

Mike Weideli

7

Re: SQL Query - Too Many Rows - Duplication

If you want to know whether you have observations with more than one preferred determination and which observations they are, run the xml reports Sy06 and Sy07 in Problems V2.zip – see http://forums.nbn.org.uk/viewtopic.php?pid=8931#p8931 . Investigating specific cases may help identify why the problem is occurring in your system. These reports will check your system for a range of problems, some of which could cause problems with reports, e.g. more than one preferred location name.

Sally Rankin, JNCC Recorder Approved Expert
E-mail: s.rankin@btinternet.com
Telephone: 01491 578633
Mobile: 07941 207687