Recommendations on missing/unknown/not recorded data in Darwin Core #437

Closed
ymgan opened this issue Feb 27, 2023 · 3 comments

Comments

@ymgan

ymgan commented Feb 27, 2023

This issue is inspired by Robert Mesibov's post in GBIF discourse - The vexed question of missing data in Darwin Core.
The discussions on the thread and Arctos are very insightful. (Thank you!)

In the post, Bob mentioned:

The Darwin Core recommendations don’t provide a lot of guidance. The entry “unknown” is recommended when footprintSRS, geodeticDatum, verticalDatum or verbatimSRS isn’t known. On the other hand, the recommendation for coordinateUncertaintyInMeters is Leave the value empty if the uncertainty is unknown, cannot be estimated, or is not applicable (because there are no coordinates).

Take the term geodeticDatum, for example: "unknown" and "not recorded" are each recommended as the fallback value by different sources.

From Darwin Core Quick Reference Guide

Recommended best practice is to use the EPSG code of the SRS, if known. Otherwise use a controlled vocabulary for the name or code of the geodetic datum, if known. Otherwise use a controlled vocabulary for the name or code of the ellipsoid, if known. If none of these is known, use the value unknown.

From Georeferencing Best Practices

It is thus recommended to record the EPSG code of the coordinate reference system if possible, otherwise, record the EPSG code of the datum if possible, otherwise, record the EPSG code of the ellipsoid. If none of these can be determined from the coordinate source, record "not recorded"
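The two recommendations above share the same order of preference and differ only in the final fallback string. A minimal sketch of that fallback chain, assuming the three candidate values have already been extracted from the coordinate source (the function name `choose_geodetic_datum` and its parameters are hypothetical and not part of Darwin Core):

```python
from typing import Optional


def choose_geodetic_datum(
    srs_epsg: Optional[str] = None,
    datum: Optional[str] = None,
    ellipsoid: Optional[str] = None,
    fallback: str = "unknown",
) -> str:
    """Return the first known value, in the order SRS, datum, ellipsoid.

    `fallback` is what gets recorded when none of the three is known.
    The Quick Reference Guide quoted above says "unknown"; the
    Georeferencing Best Practices quoted above says "not recorded".
    """
    for candidate in (srs_epsg, datum, ellipsoid):
        # Treat None and whitespace-only strings as "not known".
        if candidate is not None and candidate.strip():
            return candidate.strip()
    return fallback


print(choose_geodetic_datum(srs_epsg="EPSG:4326"))
print(choose_geodetic_datum(datum="WGS84"))
print(choose_geodetic_datum())
print(choose_geodetic_datum(fallback="not recorded"))
```

Making the fallback a parameter keeps the sketch neutral on which of the two strings should be used.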

Consequently, these recommendations affect downstream implementations such as:

Hence I would appreciate general guidelines on how to treat the different scenarios of NITS (Nothing Interesting To Say) in Darwin Core. I also appreciate Bob's suggestion on how to treat missing data in his post:

Here’s a possible answer to the “What to do with missing data?” question, and it’s one I regularly propose to the compilers whose Darwin Core data tables I audit: If a data item is missing, leave it blank. If you have a reason for the “missingness”, put it in a …Remarks field.
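One way the "leave it blank, explain in a remarks field" suggestion might look when cleaning a record is sketched below. The placeholder list, the function name `blank_missing`, and the choice of `georeferenceRemarks` as the target field in the usage example are all assumptions made for illustration; they are not prescribed by Darwin Core or by the quoted post.

```python
# Hypothetical set of strings treated as "nothing interesting to say".
PLACEHOLDERS = {"unknown", "not recorded", "n/a", "na", "none", "null", "-"}


def blank_missing(record, field, remarks_field, reason=None):
    """Return a copy of `record` with a placeholder in `field` blanked.

    If the field ends up empty and a `reason` is given, the reason is
    appended to `remarks_field`, separated by " | " from any existing text.
    """
    cleaned = dict(record)
    value = (cleaned.get(field) or "").strip()
    if value.lower() in PLACEHOLDERS:
        value = ""
    cleaned[field] = value
    if value == "" and reason:
        existing = (cleaned.get(remarks_field) or "").strip()
        note = f"{field}: {reason}"
        cleaned[remarks_field] = f"{existing} | {note}" if existing else note
    return cleaned


row = {"geodeticDatum": "Unknown", "decimalLatitude": "-41.2"}
print(blank_missing(row, "geodeticDatum", "georeferenceRemarks",
                    reason="datum not stated on label"))
```

The function returns a copy, so the original row is left as it was, which makes it easier to audit what was changed.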

Thanks a lot!

@qgroom
Member

qgroom commented Feb 27, 2023

In the context of transcribing labels from specimens we also made a recommendation to break down unknown into...

  • unknown
  • unknown:undigitized
  • unknown:missing
  • unknown:indecipherable
  • known:withheld

Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129, https://doi.org/10.1093/database/baz129
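A consumer of data that uses the five values listed above might want to tell them apart from real content. The following is a rough sketch only: the names `MissingStatus` and `parse_missing` are hypothetical, and the case-insensitive matching is an assumption rather than something stated in the cited paper.

```python
from typing import NamedTuple, Optional

# The five values listed in the comment above.
QUALIFIED_VALUES = {
    "unknown",
    "unknown:undigitized",
    "unknown:missing",
    "unknown:indecipherable",
    "known:withheld",
}


class MissingStatus(NamedTuple):
    known: bool            # True only for "known:withheld"
    reason: Optional[str]  # text after the colon, or None for plain "unknown"


def parse_missing(value: str) -> Optional[MissingStatus]:
    """Return a MissingStatus for a listed value, or None for real data."""
    token = value.strip().lower()
    if token not in QUALIFIED_VALUES:
        return None
    state, _, reason = token.partition(":")
    return MissingStatus(known=(state == "known"), reason=reason or None)


for example in ("unknown:missing", "known:withheld", "unknown", "WGS84"):
    print(example, "->", parse_missing(example))
```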

@Mesibov

Mesibov commented Mar 3, 2023

Here's a good summary from Data Carpentry about missing values as blanks:

https://datacarpentry.org/spreadsheet-ecology-lesson/02-common-mistakes/#null

@tucotuco
Member

This issue has been translated into an actionable change request for geodeticDatum. Other than examples or comments on specific terms, Darwin Core does not provide generalized guidelines that might cross standards. Such documentation thus far has been promulgated by the Technical Architecture Group (for an example, see Best practices for serializing booleans). If there are other specific terms for which changes are needed, please submit Term change requests for each of them.
