Schematron bug, related to ISO 15511 regular expression pattern #549
Comments
Just to follow up, I tested reversing the first two characters of the current regex, and that does indeed fix the issue.
See SAA-SDT/eas-schematrons@afe49d1 for the patch. I'm still planning to update this in the new Schematron to use the country codes, however.
Hi @fordmadox, I agree that we should restrict a repository/maintenance agency code that is declared to be ISO 15511 compliant to a maximum of 16 characters. However, XX actually is a valid country code, as it falls within the ranges that can be user-assigned. What it stands for might differ from one context to another, but against ISO 3166-1 it is valid. "User-assigned codes - If users need code elements to represent country names not included in ISO 3166-1, the series of letters AA, QM to QZ, XA to XZ, and ZZ, and the series AAA to AAZ, QMA to QZZ, XAA to XZZ, and ZZA to ZZZ respectively, and the series of numbers 900 to 999 are available." (https://www.iso.org/glossary-for-iso-3166.html)
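For illustration, the alpha-2 user-assigned ranges quoted above (AA, QM to QZ, XA to XZ, ZZ) are small enough to test directly. This is only a sketch in Python; the function name `is_user_assigned_alpha2` is hypothetical and is not part of any of the schematron files.

```python
def is_user_assigned_alpha2(code: str) -> bool:
    """Return True if `code` is in the ISO 3166-1 alpha-2 user-assigned
    ranges quoted in the comment above: AA, QM to QZ, XA to XZ, and ZZ."""
    # Only two uppercase ASCII letters can be an alpha-2 code at all.
    if len(code) != 2 or not code.isascii() or not code.isalpha() or not code.isupper():
        return False
    first, second = code
    if code in ("AA", "ZZ"):
        return True
    if first == "Q":
        return "M" <= second <= "Z"  # QM to QZ
    return first == "X"              # XA to XZ


for candidate in ("XX", "QM", "QL", "US", "ZZ"):
    print(candidate, is_user_assigned_alpha2(candidate))
```

A validator could use a check like this to let such codes pass while still reporting them separately from officially assigned codes.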
@kerstarno Regarding "XX" and ISO 3166, or any codes reserved for private use (e.g. 'qab' in ISO 639-2), I wonder if we should still flag those as invalid. Given that there is no agreement about what those codes can represent, shouldn't we expect a user to record their usage within the control section, and also set the code list to "otherCountryEncoding"? That country code (not to mention the numeric equivalents and the 3-character user-assigned options) was never valid in EAD2002 or EAD3... though any 2-character A-Z code could be used in the agency code heading in EAD3, which would make this type of error especially odd:
Where the maintenanceagency element is invalid in EAD3, but the agencycode element is valid! Quite the mixed message, there 😄 Also, it looks like the regular expression test in EAC 1.0 for country codes was limited to any 2-letter or 4-letter A-Z code. Given all that, I do prefer EAD3's approach to the country code validation (not the ISIL one, though, due to the discrepancy highlighted above).
@fordmadox - I see your point about it not being clear what "XX" (or any other of these user-assigned codes) stands for specifically, but they are part of ISO 3166, so "otherCountryEncoding" would not necessarily be correct, I'd say. Also, with the officially assigned codes we only check whether they are part of the ISO standard; we don't necessarily relate them to the appropriate country names, right? I mean, for validation, we don't really care whether "XX" stands for "Country A" or "Country B", do we? Maybe there's a possibility to let these codes validate, but to flag them as user-assigned? Same as we discussed with regard to deprecated codes?
While testing the new schematron file for EAC 2.0, I noticed that the regex borrowed from the EAD3 schematron has a small bug. For example, the following value is valid according to the EAD3 schematron:
US-oclc-12345678901
However, that is a fake 19-character code, which should NOT be valid. That same 19-character code is correctly rejected by both EAD2002 and EAC 1.0.
I am going to recreate that pattern for EAC 2.0 by following, essentially, the EAD2002 model, which does validate the country code, when present. Since we are validating the country code elsewhere, it seems like we should do that here, as well, rather than just using a two-character match pattern for that. Anyhow, here's the current EAD3 regex:
(^([A-Z]{2})|([a-zA-Z]{1})|([a-zA-Z]{3,4}))(-[a-zA-Z0-9:/-]{1,11})$
Whereas that should probably be (though NOT tested):
^(([A-Z]{2})|([a-zA-Z]{1})|([a-zA-Z]{3,4}))(-[a-zA-Z0-9:/-]{1,11})$
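To see why moving the `^` matters, here is a quick check of both patterns, sketched in Python. It assumes that Python's `re.search` is a close enough stand-in for the unanchored XPath `matches()` used in Schematron; the helper name `matches` and the constant names are hypothetical. In the current pattern the `^` binds only to the first alternative, so the other two alternatives can match anywhere in the string.

```python
import re

# Both patterns are copied verbatim from the issue text above.
CURRENT_EAD3 = r"(^([A-Z]{2})|([a-zA-Z]{1})|([a-zA-Z]{3,4}))(-[a-zA-Z0-9:/-]{1,11})$"
PROPOSED = r"^(([A-Z]{2})|([a-zA-Z]{1})|([a-zA-Z]{3,4}))(-[a-zA-Z0-9:/-]{1,11})$"


def matches(pattern: str, value: str) -> bool:
    """Unanchored search, used here as an approximation of XPath matches()."""
    return re.search(pattern, value) is not None


for value in ("US-oclc-12345678901", "XX-1"):
    print(value, matches(CURRENT_EAD3, value), matches(PROPOSED, value))
```

With the current pattern, `US-oclc-12345678901` matches because the substring `oclc-12345678901` satisfies the unanchored 3-to-4-letter alternative. With the proposed pattern it no longer matches. `XX-1` matches both patterns, since neither one checks the two-letter prefix against a country code list.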
To decide:
Should we:
Another example: right now, the following is also valid in EAD3:
XX-1
Whereas that same fake code is correctly not valid in EAD2002 (though it is in EAC-CPF 1.0, which switched to a pure regex validation).
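A rough sketch of the two-step check described above (pattern first, then the country code when the prefix is two uppercase letters), written in Python. Everything here is an assumption for illustration: the `OFFICIAL_ALPHA2` set is a tiny placeholder subset rather than the real ISO 3166-1 list, the function name `check_agency_code` is hypothetical, and this is not the actual EAD2002 or EAC 2.0 rule.

```python
import re

# Placeholder subset only; a real check would load the full ISO 3166-1 list.
OFFICIAL_ALPHA2 = {"DE", "FR", "GB", "US"}

# The anchored pattern proposed in the issue text above.
AGENCY_CODE = re.compile(
    r"^(([A-Z]{2})|([a-zA-Z]{1})|([a-zA-Z]{3,4}))(-[a-zA-Z0-9:/-]{1,11})$"
)


def check_agency_code(value: str) -> str:
    """Return 'valid', 'invalid: pattern', or 'invalid: country code'."""
    match = AGENCY_CODE.match(value)
    if match is None:
        return "invalid: pattern"
    prefix = match.group(1)
    # Only two-letter uppercase prefixes are treated as country codes here.
    if re.fullmatch(r"[A-Z]{2}", prefix) and prefix not in OFFICIAL_ALPHA2:
        return "invalid: country code"
    return "valid"


for value in ("US-1", "XX-1", "US-oclc-12345678901"):
    print(value, "->", check_agency_code(value))
```

Under these assumptions `XX-1` passes the pattern but fails the country code lookup, which is the behaviour attributed to EAD2002 above. Whether user-assigned codes such as XX should instead pass with a warning is the open question in the comments.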
Creator of issue
The issue relates to
Wanted change/feature
Reporting a bug
Suggested Solution
Steps to Reproduce (for bugs)
Context
Your Environment can be a clue to a bug