TG2-VALIDATION_COUNTRY_FOUND #21

Open
iDigBioBot opened this issue Jan 5, 2018 · 33 comments
Labels
CODED Conformance CORE TG2 CORE tests Parameterized Test requires a parameter SPACE Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT TG2 Validation VOCABULARY

Comments

@iDigBioBot
Collaborator

iDigBioBot commented Jan 5, 2018

| TestField | Value |
| --- | --- |
| GUID | 69b2efdc-6269-45a4-aecb-4cb99c2ae134 |
| Label | VALIDATION_COUNTRY_FOUND |
| Description | Does the value of dwc:country occur in the bdq:sourceAuthority? |
| TestType | Validation |
| Darwin Core Class | dcterms:Location |
| Information Elements ActedUpon | dwc:country |
| Information Elements Consulted | |
| Expected Response | EXTERNAL_PREREQUISITES_NOT_MET if the bdq:sourceAuthority is not available; INTERNAL_PREREQUISITES_NOT_MET if dwc:country is bdq:Empty; COMPLIANT if value of dwc:country is a place type equivalent to administrative entity of "nation" in the bdq:sourceAuthority; otherwise NOT_COMPLIANT |
| Data Quality Dimension | Conformance |
| Term-Actions | COUNTRY_FOUND |
| Parameter(s) | bdq:sourceAuthority |
| Source Authority | bdq:sourceAuthority default = "The Getty Thesaurus of Geographic Names (TGN)" {[https://www.getty.edu/research/tools/vocabularies/tgn/index.html]} |
| Specification Last Updated | 2024-08-19 |
| Examples | [dwc:country="Eswatini": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:country is a valid country name in the bdq:sourceAuthority"] |
| | [dwc:country="Tasmania": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="Tasmania is not found at the level of national in the bdq:sourceAuthority"] |
| Source | ALA, GBIF |
| References | |
| Example Implementations (Mechanisms) | Kurator/FilteredPush geo_ref_qc Library DOI: 10.5281/zenodo.14064324 |
| Link to Specification Source Code | https://github.com/FilteredPush/geo_ref_qc/blob/v2.0.1/src/main/java/org/filteredpush/qc/georeference/DwCGeoRefDQ.java#L158 |
| Notes | Non-country information such as "high seas" will fail this test (High Seas should use dwc:countryCode = "XZ" and have dwc:country empty). Getty Place Types for administrative level "nation" are 81010 nation, 81011 independent sovereign nation, and 81012 independent nation. Multiple values in the dwc:country field (whether to signify on a border or in a list of possibilities) will fail this test. Locations outside of a jurisdiction covered by a country code should not have a value in the field dwc:countryCode. This test should find any matches at the Getty "nation" level including internationalized names and historical representations of that nation (where boundaries are same). This test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters. |
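
The decision order in the Expected Response can be sketched in a few lines of Python. This is only an illustration, not the geo_ref_qc implementation: `Response`, `validation_country_found` and `SAMPLE_AUTHORITY` are hypothetical names, the authority is modelled as an in-memory mapping rather than the TGN web service, the place type assignments in the sample data are made up, and bdq:Empty is assumed to mean `None` or a whitespace-only string.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional

# Getty place type IDs for administrative level "nation", as listed in the Notes.
NATION_PLACE_TYPES = frozenset({81010, 81011, 81012})

# Hypothetical stand-in for the source authority: name -> place type IDs.
# The assignments are illustrative only, not copied from TGN; 99999 is a
# made-up ID standing for any non-nation place type.
SAMPLE_AUTHORITY = {
    "Eswatini": frozenset({81011}),
    "Tasmania": frozenset({99999}),
}


@dataclass(frozen=True)
class Response:
    status: str
    result: Optional[str]
    comment: str


def validation_country_found(
    country: Optional[str],
    authority: Optional[Mapping[str, FrozenSet[int]]],
) -> Response:
    """Apply the Expected Response clauses in the order they are written."""
    if authority is None:
        return Response("EXTERNAL_PREREQUISITES_NOT_MET", None,
                        "bdq:sourceAuthority is not available")
    # Assumption: bdq:Empty is modelled as None or a whitespace-only string.
    if country is None or country.strip() == "":
        return Response("INTERNAL_PREREQUISITES_NOT_MET", None,
                        "dwc:country is bdq:Empty")
    # Exact lookup with no trimming, so values padded with whitespace or
    # non-printing characters do not match and end up NOT_COMPLIANT.
    place_types = authority.get(country, frozenset())
    if place_types & NATION_PLACE_TYPES:
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:country found at the nation level")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:country not found at the nation level")
```

With the sample data, `validation_country_found("Eswatini", SAMPLE_AUTHORITY)` yields COMPLIANT, while `"Tasmania"` and `" Eswatini"` (leading space) both yield NOT_COMPLIANT.
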
@iDigBioBot
Collaborator Author

Comment by Anonymous migrated from spreadsheet:
None

@iDigBioBot
Collaborator Author

Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet:
Is this something that should be added to NOTES column?

@iDigBioBot
Collaborator Author

Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet:
In cases where there is no country, test only useful AFTER interpretation of country from coordinates

@iDigBioBot
Collaborator Author

Comment by Paul Morris (@chicoreus) migrated from spreadsheet:
Treat "High Seas" as a valid country value. Country of origin, or high seas is critical information for Nagoya Protocol implementation.

@ArthurChapman
Collaborator

Discussion from Gainesville meeting: Should we be using the current country, or the country at the time of collection? For many reasons, it was agreed that for this test the country should be the current country.

For the sequence (see Paula's comment above): run validations, run amendments, then run validations again (Paul). This will handle a number of cases where we are concerned about sequence.
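
A minimal sketch of that validate, amend, validate-again sequence follows. Everything in it is hypothetical: `country_not_empty` is a simplified stand-in for a "country not empty" validation, `fill_country_from_code` is an invented amendment (the comment above concerns interpretation from coordinates, which would need a spatial lookup), and `CODE_TO_NAME` is a one-entry illustrative table.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Validation = Callable[[Record], str]
Amendment = Callable[[Record], Record]

# Hypothetical lookup used by the example amendment below.
CODE_TO_NAME = {"SZ": "Eswatini"}


def country_not_empty(record: Record) -> str:
    """Simplified stand-in for a 'country not empty' validation."""
    return "COMPLIANT" if record.get("dwc:country", "").strip() else "NOT_COMPLIANT"


def fill_country_from_code(record: Record) -> Record:
    """Hypothetical amendment: propose dwc:country from dwc:countryCode."""
    amended = dict(record)  # never modify the original record
    if not amended.get("dwc:country", "").strip():
        name = CODE_TO_NAME.get(amended.get("dwc:countryCode", ""))
        if name is not None:
            amended["dwc:country"] = name
    return amended


def run_sequence(
    record: Record,
    validations: Dict[str, Validation],
    amendments: List[Amendment],
) -> Tuple[Dict[str, str], Record, Dict[str, str]]:
    """Run validations, apply amendments to a copy, then run validations again."""
    before = {name: test(record) for name, test in validations.items()}
    amended = dict(record)
    for amend in amendments:
        amended = amend(amended)
    after = {name: test(amended) for name, test in validations.items()}
    return before, amended, after
```

Comparing `before` with `after` shows which failures the amendments resolved. A record with an empty dwc:country and dwc:countryCode="SZ" fails `country_not_empty` on the first pass and passes on the second.
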

John: do we require country to be a written-out version of a country code, or can it be another political entity, for example United Kingdom? Important test. Not necessarily an error, but a warning that the country is not a standard modern country. French Indo-China could mean any one of a number of current countries.

Agreed to apply to current ISO countries. Question why ISO and not Getty Thesaurus. Geonames another suggestion. Getty Thesaurus has hierarchies.

Test for now could just ask "in a vocabulary", and which vocabulary could change over time. DwC mentions Getty as a recommended vocabulary. We could thus go that way to be consistent with DwC. Put the ISO test to ISO, and the human-readable "Country" to the richer Getty Thesaurus.

@ArthurChapman ArthurChapman added the Test Tests created by TG2, either CORE, Supplementary or DO NOT IMPLEMENT label Jan 16, 2018
@tucotuco tucotuco added the Parameterized Test requires a parameter label Nov 5, 2018
@Tasilee Tasilee changed the title TG2-VALIDATION_COUNTRY_NOTSTANDARD TG2-VALIDATION_COUNTRY_STANDARD Mar 22, 2022
@tucotuco
Member

I suggest the Description:

'Does the value of dwc:country occur as the equivalent of a nation in the bdq:sourceAuthority?'

in place of:

'Does the value of dwc:country occur in bdq:sourceAuthority?'

@ArthurChapman
Collaborator

That fits with the equivalent NAME tests

@Tasilee Tasilee changed the title TG2-VALIDATION_COUNTRY_STANDARD TG2-VALIDATION_COUNTRY_FOUND Aug 28, 2022
@Tasilee
Collaborator

Tasilee commented Aug 28, 2022

From the zoom meeting today, we agreed to align this test with the taxonomic counterparts by renaming "STANDARD" to "FOUND".

@Tasilee
Collaborator

Tasilee commented Sep 12, 2022

Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters."

chicoreus added a commit to FilteredPush/geo_ref_qc that referenced this issue Jun 22, 2023
…st current (2023-06-12) test descriptions. Addressed implementation of tdwg/bdq#21 VALIDATION_COUNTRY_FOUND Adding ProvidesVersion annotations.  Removing now empty stub for checked method.  Adding singleton for managing caching of matches.  Adding generated classes for interacting with Getty TGN web service (see filteredpush/gen_gettytgn_client project), added dependencies to support this to pom. VALIDATION_COUNTRY_FOUND implementation supports lookup on TGN web service, with caching, and lookup on country name lists from NaturalEarth shapefiles and copy of country name data from datahub.io  Added unit tests.  Added wrapper for method using default value.
chicoreus added a commit to FilteredPush/geo_ref_qc that referenced this issue Jun 22, 2023
chicoreus added a commit to FilteredPush/geo_ref_qc that referenced this issue Jun 22, 2023
@chicoreus
Collaborator

"Equivalent" raises the question of equivalent in rank to nation, or equivalent to any name for the country, or both. I believe the intent was equivalent in rank to nation, with an exact match on the preferred name (as given in the examples, where the preferred name for a nation is treated as compliant, but an English form is treated as not compliant). The question is raised because we have conflicting assertions in validation data dataID 18.

@tucotuco
Member

tucotuco commented Aug 5, 2024

I think the important thing is to be able to match a name unambiguously to a current entity that has an ISO country code. In this case the second example would be incorrectly incorrect, that is, found, as the name of the test suggests. The key is to be able to assign a dwc:countryCode where one does not exist.

@Tasilee
Collaborator

Tasilee commented Aug 5, 2024

I agree that this test is happy for any match against a national name. It says nothing about 'preferred' or dwc:countryCode.

I therefore agree with @tucotuco about the second example.

We have #62 linking dwc:country with dwc:countryCode and #20 and #48 addressing dwc:countryCode.

How do we proceed?

@ArthurChapman
Collaborator

This is becoming a common theme, where many of the vocabularies we link to don't have synonyms, or don't have synonyms that are easily accessible.

@chicoreus
Collaborator

chicoreus commented Aug 6, 2024 via email

@Tasilee
Collaborator

Tasilee commented Aug 6, 2024

Tests that use Getty TGN are this one and

#95
#118
#139
#199
#200
#201

I defer to the spatial gurus @tucotuco and @ArthurChapman on changes.

@Tasilee
Collaborator

Tasilee commented Aug 6, 2024

The concept of 'current' or 'preferred' raised a degree of synonymy, but what about 'unambiguous' (unique) here? For example, the Test Data DataID #19 has dwc:country="Congo" but as the comment suggests, that is ambiguous, yet we have no explicit test for uniqueness in the Expected Response. FYI

#118 does include "unambiguous"
#139 does include "unambiguous"
#201 does include "unambiguous"

@ArthurChapman
Collaborator

@Tasilee - @tucotuco knows TGN better than I but

Nearly all of those include more than just country which makes it more difficult

It may be difficult looking for synonyms for State and Province in TGN (#200, #201) - I would think you probably need an exact match to be "unambiguous".

#139 and #95 (see comment #139 (comment) where we have the results of an earlier ZOOM discussion) - #139 "looks at only one level in the hierarchy at a time and checks the validity of what is there at the level." and as such - if looking for dwc:country would appear to duplicate #21 - maybe we need to drop "dwc:country" out of #139 - or drop #21?

#95 is looking for combinations, and as such it may be a bit much looking for combinations with synonyms - but better than #199 - the combination may be less ambiguous than would be the case for #199.

#199 - just looking for State/Province - hard to be unambiguous if you are trying to cover synonyms, etc. as I think there may be duplicates at that level anyway between different countries and thus would be ambiguous

I think the only place you can look at using synonyms is at the country level in #21, but then see comments under #139 above.

I hope that is clear!

@Tasilee
Collaborator

Tasilee commented Aug 6, 2024

Given we have designated #95, #139 and #118 as DO NOT IMPLEMENT (for reasons we have now reiterated), we can ignore them further.

The subtlety of #200 and #201 is somewhat lost on me. Could a COMPLIANT result from values of dwc:country and dwc:stateProvince in #200 also result in a NOT_COMPLIANT in #201?

#199 is a direct equivalent of this test (#21). In either case, the way I read the Expected Response, any match at the appropriate administrative level will result in COMPLIANT. Given the complement of SPACE tests, this seems fair and makes the implementation easier, one hopes.

@chicoreus comment on #62. One scenario results in simplifying the core part of the Expected Response of #62 -

"COMPLIANT if the country as determined from dwc:countryCode matches the value in dwc:country; otherwise NOT_COMPLIANT"

suggests #62 would only need an ISO3166 lookup. If on the other hand, we want to allow for the country from the ISO country code to match any national administrative level name, the Expected Response would be more like

"COMPLIANT if the dwc:countryCode matches a national administrative-level country name in the bdq:sourceAuthority; otherwise NOT_COMPLIANT" ?

Paul (implementation) and I (Test Data) need a decision on these so we can advance.

@ArthurChapman
Collaborator

@tucotuco I think you'll need to comment on some of these issues. Some beyond my knowledge.

@Tasilee
Collaborator

Tasilee commented Aug 8, 2024

In illustrating the issues I raised above with @ArthurChapman, I also note Paul's comment above, "The geo_ref_qc implementation of #62 already matches against any form of the country name in Getty TGN.", but #62 only uses ISO3166.

Given my comments, I think I have to set a NEEDS WORK against this test, #62, #199, #200, #201 to make sure the issue of the nature of name matches in TGN, and how we phrase the Expected Responses are carefully considered and agreed on.

@tucotuco
Member

The subtlety of #200 and #201 is somewhat lost on me. Could a COMPLIANT result from values of dwc:country and dwc:stateProvince in #200 also result in a NOT_COMPLIANT in #201?

No. But the reverse could happen. #201 is the strongest test. If it passes for a record, #200 must necessarily also pass and doesn't tell you anything. If #201 fails, #200 could still pass and that would tell you that there are multiple matches on the country/stateProvince combo. That is to say that it would tell you the nature of the problem. Along with #42 (Country not empty), #200 would tell you whether there was an ambiguous combination of country (not empty) and stateProvince, such as would happen with Argentina/Buenos Aires. While if country is empty, then the ambiguity is purely at the stateProvince level.
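
The relationship described here can be illustrated with a match count: a "found" test passes on one or more matches, an "unambiguous" test only on exactly one. The sketch below is an assumption-laden toy, not the specification of #200 or #201: `SAMPLE_PLACES`, `count_matches`, `found` and `unambiguous` are hypothetical names, and the rows are illustrative rather than copied from TGN.

```python
from typing import Sequence, Tuple

# Hypothetical gazetteer rows: (country, stateProvince, kind of place).
SAMPLE_PLACES: Tuple[Tuple[str, str, str], ...] = (
    ("Argentina", "Buenos Aires", "province"),
    ("Argentina", "Buenos Aires", "autonomous city"),
    ("Argentina", "Córdoba", "province"),
    ("Spain", "Córdoba", "province"),
)


def count_matches(
    country: str,
    state_province: str,
    places: Sequence[Tuple[str, str, str]] = SAMPLE_PLACES,
) -> int:
    """Count rows matching stateProvince, and country when one is given."""
    return sum(
        1
        for row_country, row_state, _kind in places
        if row_state == state_province and (not country or row_country == country)
    )


def found(country: str, state_province: str) -> bool:
    """Weaker check, in the spirit of #200: at least one match."""
    return count_matches(country, state_province) >= 1


def unambiguous(country: str, state_province: str) -> bool:
    """Stronger check, in the spirit of #201: exactly one match."""
    return count_matches(country, state_province) == 1
```

Because exactly one match implies at least one, `unambiguous` passing always implies `found` passing. With these rows, Argentina/Buenos Aires is found but not unambiguous, and Córdoba is unambiguous only when the country is supplied.
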

@chicoreus comment on #62. One scenario results in simplifying the core part of the Expected Response of #62 -

"COMPLIANT if the country as determined from dwc:countryCode matches the value in dwc:country; otherwise NOT_COMPLIANT"

suggests #62 would only need an ISO3166 lookup. If on the other hand, we want to allow for the country from the ISO country code to match any national administrative level name, the Expected Response would be more like

"COMPLIANT if the dwc:countryCode matches a national administrative-level country name in the bdq:sourceAuthority; otherwise NOT_COMPLIANT" ?

Paul (implementation) and I (Test Data) need a decision on these so we can advance.

#62 is odd. With only one lookup, the value in dwc:country has to be pretty special, and #62 does not allow for having figured out a countryCode from the value in dwc:country (a vocabulary lookup). There is no other test that does so, but it should be possible to do so. Thus, #62 as stated is less useful than it could be, but it is much simpler than the alternative of finding an unambiguous country match and then looking up its countryCode to see if they match.
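
The difference between the two readings of #62, and the reverse lookup mentioned above, can be sketched as follows. All names and data are hypothetical: `ISO_SHORT_NAME` stands in for a single ISO 3166 name per code, `NATIONAL_NAMES` stands in for every national-level name an authority such as TGN might hold for that code, and the entries are illustrative rather than copied from either source.

```python
from typing import Dict, Optional, Set

# Hypothetical single-name lookup, standing in for an ISO 3166 list.
ISO_SHORT_NAME: Dict[str, str] = {"SZ": "Eswatini"}

# Hypothetical multi-name lookup, standing in for national-level names
# (current, historical, alternate) held by a richer source authority.
NATIONAL_NAMES: Dict[str, Set[str]] = {
    "SZ": {"Eswatini", "Swaziland"},
    "CG": {"Republic of the Congo", "Congo"},
    "CD": {"Democratic Republic of the Congo", "Congo"},
}


def consistent_single_lookup(country_code: str, country: str) -> bool:
    """First reading: dwc:country must equal the one name for the code."""
    return ISO_SHORT_NAME.get(country_code) == country


def consistent_any_name(country_code: str, country: str) -> bool:
    """Second reading: dwc:country may be any national-level name for the code."""
    return country in NATIONAL_NAMES.get(country_code, set())


def code_for_country(country: str) -> Optional[str]:
    """Reverse lookup: a code only when the name matches exactly one entity."""
    codes = [code for code, names in NATIONAL_NAMES.items() if country in names]
    return codes[0] if len(codes) == 1 else None
```

Under the first reading "Swaziland" with code "SZ" is inconsistent, under the second it is consistent. The reverse lookup returns nothing for "Congo" because two entries carry that name, which is the ambiguity raised earlier in the thread.
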

@chicoreus
Collaborator

Yes, #62 seems to involve magical thinking about the source authority. #62 probably does need to consult both Getty and the ISO list.

@chicoreus
Collaborator

Note: "Non-country information such as "high seas" will fail this test." suggests that we need a solution for marking the High Seas other than dwc:country=High Seas, or we amend the test specification to explicitly allow that value.

@ArthurChapman
Collaborator

Could we use dwc:waterBody as a consulted term - but as that includes rivers and lakes - not easy. Perhaps OBIS have a vocabulary of non-country Water Bodies that could be consulted as a Source Authority. @ymgan does such a list exist?

@ymgan
Collaborator

ymgan commented Aug 14, 2024

I am lost in the discussions of this thread. May I know why should "high seas" pass this test please? I thought high seas specifically are not subject to any individual nation’s laws or control. I struggle to understand why it should not fail the test?

I have forwarded the question to OBIS slack and will keep you posted.
While waiting for the response, is any list in Marine Regions helpful in this? It's the repository OBIS community often use for marine regions.

@chicoreus
Collaborator

@ymgan yes, the "High Seas" is the portion of the seas outside the jurisdiction of any country, that is, waters beyond the exclusive economic zones of any country. The problem is that, for reasons around international treaties, it is becoming more and more important to be able to identify the national origin of any specimens (and likely observations as well, but physical specimens that may be genetic resources are the most relevant right now). Darwin Core doesn't have any explicit term or convention for using a term to identify the High Seas (other than coordinates with good enough metadata), so one option for a convention for representing this would be to use the value "High Seas" in dwc:country.

@ArthurChapman the difficulty with dwc:waterBody is that many water bodies will span the waters of both high seas and one or more EEZs.

chicoreus added a commit to FilteredPush/geo_ref_qc that referenced this issue Aug 19, 2024
chicoreus added a commit to FilteredPush/geo_ref_qc that referenced this issue Aug 19, 2024
@chicoreus
Collaborator

Getty Place Types for administrative level "nation" from http://vocabsservices.getty.edu/Schemas/TGN/tgn_place_type.xsd

81010 nation
81011 independent sovereign nation
81012 independent nation

@Tasilee
Collaborator

Tasilee commented Sep 19, 2024

Checking NEEDS WORK status: Is this still needed? We accept that synonyms are ok (and should be, given historical data), while "high seas" or similar aren't currently ok (but see the dwc:countryCode tests). A record on the continental shelf, in an EEZ, etc. (non-continental) where dwc:country is bdq:Empty seems ok as "INTERNAL_PREREQUISITES_NOT_MET".

@ArthurChapman
Collaborator

The NEEDS WORK goes a long way back to a comment from @chicoreus (#21 (comment))
