enh: add subject and sample aggregating for bids dandisets #92
Codecov Report
@@            Coverage Diff            @@
##           master     #92     +/-   ##
==========================================
+ Coverage   96.33%   96.37%   +0.04%
==========================================
  Files          16       16
  Lines        1417     1433     +16
==========================================
+ Hits         1365     1381     +16
  Misses         52       52
some comments on possible improvement, although not critical
if value["identifier"] not in stats["subjects"]:
    stats["subjects"].append(value["identifier"])
if value.get("identifier", None):
    subject = value["identifier"].replace("_", "-")
what is this for?
so here it seems you are trying to get the subject from two possible places: wasAttributedTo (to be extracted by dandi-cli) and then the filename. I don't think we have any consistency checks (e.g. wasAttributedTo says one thing and the filename another), so maybe at least here we could
- just assign subject = None first
- here assign it the value from identifier
- when discovering it from the filename below (I would also add a check that the part index is 0, since it shouldn't be in the middle of the name): if subject is not None and subject != subject_from_filename, issue a warning and keep the one from identifier.
This way, whenever dandi-cli later gets proper bids extraction and thus attributedTo, it would take precedence etc.
WDYT?
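The suggestion above could be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper names subject_from_filename and resolve_subject are hypothetical, and stripping the "sub-" prefix from the filename label is an assumption about how subjects are stored in the stats.

```python
import warnings
from typing import Optional


def subject_from_filename(path: str) -> Optional[str]:
    """Return the label of a leading ``sub-<label>`` entity, or None.

    Hypothetical helper: only the first underscore-separated part of the
    file name is inspected (part index 0), never one in the middle.
    """
    name = path.rsplit("/", 1)[-1]
    first_part = name.split(".")[0].split("_")[0]
    if first_part.startswith("sub-"):
        # Assumption: subjects are stored without the "sub-" prefix.
        return first_part[len("sub-"):] or None
    return None


def resolve_subject(identifier: Optional[str], path: str) -> Optional[str]:
    """Prefer the metadata identifier; fall back to the file name."""
    subject = None
    if identifier:
        # Same harmonization as in the diff: underscores become hyphens.
        subject = identifier.replace("_", "-")
    from_name = subject_from_filename(path)
    if subject is None:
        return from_name
    if from_name is not None and subject != from_name:
        warnings.warn(
            f"subject {subject!r} from identifier differs from "
            f"{from_name!r} in file name {path!r}; keeping the identifier"
        )
    return subject
```

With this shape, the identifier wins whenever it is present, so a later improvement in metadata extraction would take effect without touching the filename fallback.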
the replacement is there because dandi cli replaces underscores with hyphens to make values of keys in a filename bids compliant.
this is the aggregate function. perhaps that kind of thing should go into a validation function. it would be a good sanity check. but i don't want to turn this into a bids validator (at least not yet). the main purpose is to provide some additional summary for these dandisets so we can improve the reporting.
this also assumes there is a sample even if it's just a json file. technically it should really also have a proper microscopy extension (png/tiff/ngff/h5). very much a hack for now.
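For illustration only, a minimal sketch of what counting samples with an extension check might look like. Everything here is an assumption rather than the code in this PR: the function names, the MICROSCOPY_EXTENSIONS tuple (based loosely on the png/tiff/ngff/h5 list above), and the choice to key samples by (subject, sample) pairs.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, Tuple

# Assumed list of image extensions; not taken from any specification.
MICROSCOPY_EXTENSIONS = (".png", ".tif", ".tiff", ".ngff", ".h5")


def bids_entities(path: str) -> Dict[str, str]:
    """Split a BIDS-style file name into its key-value entities."""
    stem = PurePosixPath(path).name.split(".")[0]
    entities = {}
    for part in stem.split("_"):
        key, sep, value = part.partition("-")
        if sep and value:  # skip suffixes such as "SPIM" that have no key
            entities[key] = value
    return entities


def count_subjects_and_samples(
    paths: Iterable[str], require_image: bool = True
) -> Tuple[int, int]:
    """Count distinct subjects and (subject, sample) pairs across paths."""
    subjects, samples = set(), set()
    for path in paths:
        entities = bids_entities(path)
        if "sub" in entities:
            subjects.add(entities["sub"])
        is_image = path.lower().endswith(MICROSCOPY_EXTENSIONS)
        # With require_image=False a lone json sidecar also counts,
        # which mirrors the current hack described above.
        if "sample" in entities and (is_image or not require_image):
            samples.add((entities.get("sub"), entities["sample"]))
    return len(subjects), len(samples)
```

Calling it with require_image=False reproduces the looser behaviour where a sample is assumed even if only a json file exists.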
> the replacement is there because dandi cli replaces underscores with hyphens to make values of keys in a filename bids compliant.
ok, fair enough -- this would harmonize with a possible id within the filename. ok - let's proceed then.
@djarecka and @TheChymera - this is a quick hack to do bids counting at the dandiset level. ideally once you update the metadata extraction we can get rid of these changes.