TG2-VALIDATION_COUNTRY_FOUND #21
Comments
Comment by Anonymous migrated from spreadsheet: |
Comment by Arthur Chapman (@ArthurChapman) migrated from spreadsheet: |
Comment by Paula Zermoglio (@pzermoglio) migrated from spreadsheet: |
Comment by Paul Morris (@chicoreus) migrated from spreadsheet: |
Discussion from Gainesville meeting: Should we be using the current country, or the country at time of collection? For many reasons, it was agreed that for this test the country should be the current country. For the sequence, see Paula's comment above: run validations, run amendments, run validations again (Paul). This will handle a number of cases where we are concerned about sequence. John: do we require country to be a written-out version of a country code, or can it be another political entity, for example United Kingdom? Important test. Not necessarily an error, but a warning that the country is not a standard modern country. French Indo-China could mean any one of a number of current countries. Agreed to apply to current ISO countries. Question: why ISO and not Getty Thesaurus? GeoNames is another suggestion. Getty Thesaurus has hierarchies. The test for now could just ask "in a vocabulary", and which vocabulary could change over time. DwC mentions Getty as a recommended vocabulary. We could thus go that way to be consistent with DwC. Put the ISO test to ISO, and the human readable "Country" to the richer Getty Thesaurus. |
I suggest the Description: 'Does the value of dwc:country occur as the equivalent of a nation in the bdq:sourceAuthority?' in place of: 'Does the value of dwc:country occur in bdq:sourceAuthority?' |
That fits with the equivalent NAME tests |
From the zoom meeting today, we agreed to align this test with the taxonomic counterparts by renaming "STANDARD" to "FOUND". |
Added to Notes: "This test will fail if there are leading or trailing white space or non-printing characters." |
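To illustrate the note about whitespace, here is a minimal sketch of the test logic as described in this thread. It is not the geo_ref_qc implementation. The `NATION_NAMES` set is a hypothetical stand-in for bdq:sourceAuthority, the function names are made up, and treating a whitespace-only value as bdq:Empty is an assumption.

```python
from typing import FrozenSet, Optional

# Hypothetical stand-in for bdq:sourceAuthority: a tiny set of national-level names.
NATION_NAMES: FrozenSet[str] = frozenset({"Argentina", "Brasil", "Uganda"})


def is_empty(value: Optional[str]) -> bool:
    # Assumption: None or a whitespace-only string counts as bdq:Empty.
    return value is None or value.strip() == ""


def validation_country_found(
    country: Optional[str], authority: FrozenSet[str] = NATION_NAMES
) -> str:
    if is_empty(country):
        return "INTERNAL_PREREQUISITES_NOT_MET"
    # Exact match only: the value is deliberately not trimmed here, so
    # leading or trailing whitespace makes the test fail.
    return "COMPLIANT" if country in authority else "NOT_COMPLIANT"
```

With this sketch, `"Uganda"` is COMPLIANT while `" Uganda"` is NOT_COMPLIANT, which is the behaviour the note describes.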
…st current (2023-06-12) test descriptions. Addressed implementation of tdwg/bdq#21 VALIDATION_COUNTRY_FOUND Adding ProvidesVersion annotations. Removing now empty stub for checked method. Adding singleton for managing caching of matches. Adding generated classes for interacting with Getty TGN web service (see filteredpush/gen_gettytgn_client project), added dependencies to support this to pom. VALIDATION_COUNTRY_FOUND implementation supports lookup on TGN web service, with caching, and lookup on country name lists from NaturalEarth shapefiles and copy of country name data from datahub.io Added unit tests. Added wrapper for method using default value.
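A rough sketch of the caching idea mentioned in the commit message above. This is not the actual geo_ref_qc code (which is Java). The class name, the `remote_lookup` callable standing in for a TGN web service client, and the choice to use the local name list as a fallback are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable


class CountryMatchCache:
    """Caches country-name lookups so each name hits the remote service at most once."""

    def __init__(self, remote_lookup: Callable[[str], bool], local_names: Iterable[str]):
        self._remote_lookup = remote_lookup          # hypothetical web service client
        self._local_names = frozenset(local_names)   # hypothetical local name list
        self._cache: Dict[str, bool] = {}
        self.remote_calls = 0

    def is_country(self, name: str) -> bool:
        if name in self._cache:
            return self._cache[name]
        remote_ok = True
        try:
            self.remote_calls += 1
            found = self._remote_lookup(name)
        except Exception:
            # Service unavailable: fall through to the local list.
            remote_ok = False
            found = False
        if not found:
            found = name in self._local_names
        # Do not cache a miss that may only be due to a service failure.
        if found or remote_ok:
            self._cache[name] = found
        return found
```

Not caching a miss after a service failure means a temporary outage does not permanently mark a valid name as not found.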
tdwg/bdq#21 VALIDATION_COUNTRY_FOUND.
"Equivalent" raises the question of equivalent in rank to nation, or equivalent to any name for the country, or both. I believe the intent was equivalent in rank to nation, with an exact match on the preferred name (as given in the examples, where the preferred name for a nation is treated as compliant, but an English form is treated as not compliant). The question is raised because we have conflicting assertions in validation data dataID 18. |
I think the important thing is to be able to match a name unambiguously to a current entity that has an ISO country code. In this case the second example would be incorrectly incorrect, that is, found, as the name of the test suggests. The key is to be able to assign a dwc:countryCode where one does not exist. |
This is becoming a common theme where many of the vocabularies we link to don't have synonyms, or synonyms easily accessible. |
On Mon, 05 Aug 2024 13:57:26 -0700 Lee Belbin ***@***.***> wrote:
We have #62 linking dwc:country with dwc:countryCode and #20 and #48
addressing dwc:countryCode.
The geo_ref_qc implementation of #62 already matches against any form of the country name in Getty TGN. We've been implicitly interpreting the note about country name in the original language to mean preferred or other names. We should be explicit about that in the notes for #62.
#20 and #48 don't have dwc:country as an information element, so nothing needed there.
We should review tests that include state/province and any other tests where we've specified the Getty TGN as a source authority.
|
The concept of 'current' or 'preferred' raised a degree of synonymy, but what about 'unambiguous' (unique) here? For example, the Test Data DataID #19 has dwc:country="Congo" but as the comment suggests, that is ambiguous, yet we have no explicit test for uniqueness in the Expected Response. FYI #118 does include "unambiguous" |
@Tasilee - @tucotuco knows TGN better than I do, but nearly all of those include more than just country, which makes it more difficult.
#200, #201: It may be difficult looking for synonyms for State and Province in TGN. I would think you probably need an exact match to be "unambiguous".
#139 and #95 (see comment #139 (comment), where we have the results of an earlier Zoom discussion): #139 "looks at only one level in the hierarchy at a time and checks the validity of what is there at the level." As such, if looking for dwc:country, it would appear to duplicate #21. Maybe we need to drop "dwc:country" out of #139, or drop #21?
#95 is looking for combinations, and as such it may be a bit much looking for combinations with synonyms, but it is better than #199: the combination may be less ambiguous than would be the case for #199.
#199 is just looking for State/Province. It is hard to be unambiguous if you are trying to cover synonyms, etc., as I think there may be duplicates at that level anyway between different countries, and these would thus be ambiguous.
I think the only place you can look at using synonyms at the country level is #21, but then see comments under #139 above. I hope that is clear! |
Given we have designated #95, #139 and #118 as DO NOT IMPLEMENT (for reasons we have now reiterated), we can ignore them further. The subtlety of #200 and #201 is somewhat lost on me. Could a COMPLIANT result from values of dwc:country and dwc:stateProvince in #200 also result in a NOT_COMPLIANT in #201? #199 is a direct equivalent of this test (#21). In either case, the way I read the Expected Response, any match at the appropriate administrative level will result in COMPLIANT. Given the complement of SPACE tests, this seems fair and makes the implementation easier, one hopes. @chicoreus comment on #62. One scenario results in simplifying the core part of the Expected Response of #62: "COMPLIANT if the country as determined from dwc:countryCode matches the value in dwc:country; otherwise NOT_COMPLIANT" suggests #62 would only need an ISO3166 lookup. If, on the other hand, we want to allow for the country from the ISO country code to match any national administrative level name, the Expected Response would be more like "COMPLIANT if the dwc:countryCode matches a national administrative-level country name in the bdq:sourceAuthority; otherwise NOT_COMPLIANT"? Paul (implementation) and I (Test Data) need a decision on these so we can advance. |
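The two readings of #62 in the comment above can be contrasted with a small sketch. The lookup tables are illustrative samples made up for this example, not real ISO 3166 or Getty TGN data dumps, and the function names are hypothetical. Prerequisite handling (empty values, unknown codes) is omitted for brevity.

```python
from typing import Dict, FrozenSet

# Illustrative sample only: one name per code, as a plain ISO 3166 lookup would give.
ISO_SHORT_NAME: Dict[str, str] = {"DE": "Germany", "BR": "Brazil"}

# Illustrative sample only: every national-level name a richer authority might hold.
NATION_NAMES_BY_CODE: Dict[str, FrozenSet[str]] = {
    "DE": frozenset({"Germany", "Deutschland"}),
    "BR": frozenset({"Brazil", "Brasil"}),
}


def consistent_iso_only(country_code: str, country: str) -> str:
    # Reading 1: the country determined from the code must equal dwc:country.
    match = ISO_SHORT_NAME.get(country_code) == country
    return "COMPLIANT" if match else "NOT_COMPLIANT"


def consistent_any_national_name(country_code: str, country: str) -> str:
    # Reading 2: dwc:country may be any national-level name for that code.
    match = country in NATION_NAMES_BY_CODE.get(country_code, frozenset())
    return "COMPLIANT" if match else "NOT_COMPLIANT"
```

Under the first reading `("DE", "Deutschland")` is NOT_COMPLIANT, and under the second it is COMPLIANT. That difference is exactly the decision being asked for.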
@tucotuco I think you'll need to comment on some of these issues. Some beyond my knowledge. |
In illustrating the issues I raised above with @ArthurChapman, I also note Paul's comment above " > The geo_ref_qc implementation of #62 already matches against any form of the country name in Getty TGN." but #62 only uses ISO3166. Given my comments, I think I have to set a NEEDS WORK against this test, #62, #199, #200, #201 to make sure the issue of the nature of name matches in TGN, and how we phrase the Expected Responses are carefully considered and agreed on. |
No. But the reverse could happen. #201 is the strongest test. If it passes for a record, #200 must necessarily also pass and doesn't tell you anything. If #201 fails, #200 could still pass and that would tell you that there are multiple matches on the country/stateProvince combo. That is to say that it would tell you the nature of the problem. Along with #42 (Country not empty), #200 would tell you whether there was an ambiguous combination of country (not empty) and stateProvince, such as would happen with Argentina/Buenos Aires. While if country is empty, then the ambiguity is purely at the stateProvince level.
#62 is odd. With only one lookup, the value in dwc:country has to be pretty special, and #62 does not allow for having figured out a countryCode from the value in dwc:country (a vocabulary lookup). There is no other test that does so, but it should be possible to do so. Thus, #62 as stated is less useful than it could be, but it is much simpler than the alternative of finding an unambiguous country match and then looking up its countryCode to see if they match. |
Note: "Non-country information such as "high seas" will fail this test." would suggest that we need a solution to marking High Seas other than dwc:country=High Seas, or we amend the test specification to explicitly allow that value. |
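The relationship between #200 and #201 described above (a "found" test versus a stricter "unambiguous" test) can be sketched as follows. The gazetteer rows and their ids are invented for illustration and are not taken from Getty TGN. The function names are hypothetical and prerequisite handling is left out.

```python
from typing import List, Optional, Tuple

# Invented sample rows: (hypothetical id, country, stateProvince name).
GAZETTEER: List[Tuple[int, str, str]] = [
    (1, "Argentina", "Buenos Aires"),
    (2, "Argentina", "Buenos Aires"),  # second entity sharing the same name
    (3, "Argentina", "Córdoba"),
    (4, "Spain", "Córdoba"),
]


def matches(country: Optional[str], state_province: str) -> List[Tuple[int, str, str]]:
    # When country is empty (None), match on stateProvince alone.
    return [
        row for row in GAZETTEER
        if row[2] == state_province and (country is None or row[1] == country)
    ]


def found(country: Optional[str], state_province: str) -> str:
    # Weaker test: at least one match.
    return "COMPLIANT" if len(matches(country, state_province)) >= 1 else "NOT_COMPLIANT"


def unambiguous(country: Optional[str], state_province: str) -> str:
    # Stronger test: exactly one match, so passing it implies passing found().
    return "COMPLIANT" if len(matches(country, state_province)) == 1 else "NOT_COMPLIANT"
```

Here Argentina/Buenos Aires passes `found` but fails `unambiguous`, which points to multiple matches on the combination. With country empty, "Córdoba" shows the same pattern, but the ambiguity is purely at the stateProvince level.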
Could we use dwc:waterBody as a consulted term - but as that includes rivers and lakes - not easy. Perhaps OBIS have a vocabulary of non-country Water Bodies that could be consulted as a Source Authority. @ymgan does such a list exist? |
I am lost in the discussions of this thread. May I know why should "high seas" pass this test please? I thought high seas specifically are not subject to any individual nation’s laws or control. I struggle to understand why it should not fail the test? I have forwarded the question to OBIS slack and will keep you posted. |
@ymgan yes, the "High Seas" is the portion of the seas outside the jurisdiction of any country, that is, waters beyond the exclusive economic zones of any country. The problem is that for reasons around international treaties it is becoming more and more important to be able to identify the national origin of any specimens (and likely observations as well, but physical specimens that may be genetic resources are the most relevant right now). Darwin Core doesn't have any explicit term or convention for using a term to identify the High Seas (other than coordinates with good enough metadata), so one option for a convention for representing this would be to use the value "High Seas" in dwc:country. @ArthurChapman the difficulty with dwc:waterBody is that many water bodies will span the waters of both high seas and one or more EEZs. |
…ed to implementation.
Getty Place Types for administrative level "nation" from http://vocabsservices.getty.edu/Schemas/TGN/tgn_place_type.xsd 81010 nation |
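As a sketch of how the "nation" place type could restrict matches to the national level: the record layout below (a dict with `name` and `place_type_ids`) is a hypothetical simplification and not the TGN web service response format. Only the id 81010 comes from the comment above, and the other place type id is a placeholder.

```python
from typing import Dict, Iterable, List, Set, Union

Record = Dict[str, Union[str, List[str]]]

# Place type id for "nation", as quoted in the comment above.
NATION_PLACE_TYPE_ID = "81010"


def nation_names(records: Iterable[Record]) -> Set[str]:
    # Keep only names of records carrying the "nation" place type.
    return {
        str(r["name"]) for r in records
        if NATION_PLACE_TYPE_ID in r["place_type_ids"]
    }


# Hypothetical records; "00000" is a placeholder for any non-nation place type.
sample = [
    {"name": "Uganda", "place_type_ids": ["81010"]},
    {"name": "Buenos Aires", "place_type_ids": ["00000"]},
]
```

Calling `nation_names(sample)` keeps only the record tagged with the nation place type.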
Checking NEEDS WORK status: Is this still needed? We accept that synonyms are ok (and should be, given historical data), and that "high seas" or similar aren't currently ok (but see the dwc:countryCode tests). If the record is on a continental shelf/EEZ etc. (non-continental) and dwc:country is bdq:Empty, "INTERNAL_PREREQUISITES_NOT_MET" seems ok. |
The NEEDS WORK goes a long way back to a comment from @chicoreus (#21 (comment)) |