Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enh: add subject and sample aggregating for bids dandisets #92

Merged
merged 3 commits into from
Oct 7, 2021

Conversation

satra
Copy link
Member

@satra satra commented Oct 5, 2021

@djarecka and @TheChymera - this is a quick hack to do bids counting at the dandiset level. ideally once you update the metadata extraction we can get rid of these changes.

@satra satra added patch Increment the patch version when merged release Create a release when this pr is merged labels Oct 5, 2021
@codecov
Copy link

codecov bot commented Oct 5, 2021

Codecov Report

Merging #92 (2e2cfd2) into master (5434f8b) will increase coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #92      +/-   ##
==========================================
+ Coverage   96.33%   96.37%   +0.04%     
==========================================
  Files          16       16              
  Lines        1417     1433      +16     
==========================================
+ Hits         1365     1381      +16     
  Misses         52       52              
Flag Coverage Δ
unittests 96.37% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
dandischema/metadata.py 97.11% <100.00%> (+0.17%) ⬆️
dandischema/tests/test_metadata.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5434f8b...2e2cfd2. Read the comment docs.

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments on possible improvement, although not critical

if value["identifier"] not in stats["subjects"]:
stats["subjects"].append(value["identifier"])
if value.get("identifier", None):
subject = value["identifier"].replace("_", "-")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so here it seems you are trying to get subject from two possible places -- wasAttributedTo (to be extracted by dandi-cli) and then the filename. I don't think we have any consistency checks (e.g. wasAttributedTo says one thing and filename -- another), so may be at least here we could

  • just assign subject = None first
  • here assign to the value from identifier
  • when discovering from filename below (I would also add check that part index is 0 -- shouldn't be in the middle of the name) -- if subject is not None and subject != subject_from_filename: -- issue a warning, and keep the one from identifier.
    This way later whenever dandi-cli gets proper bids extraction and thus attributedTo -- it would take precedence etc.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the replacement is there because dandi cli replaces underscores with hyphens to make values of keys in a filename bids compliant.

this is the aggregate function. perhaps that kind of thing should go into a validation function. it would be a good sanity check. but i don't want to turn this into a bids validator (at least not yet). the main purpose is to provide some additional summary for these dandisets so we can improve the reporting.

this also assumes there is a sample even if it's just a json file. technically it should really also have a proper microscopy extension (png/tiff/ngff/h5). very much a hack for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the replacement is there because dandi cli replaces underscores with hyphens to make values of keys in a filename bids compliant.

ok, fair enough -- this would harmonize with a possible id within the filename. ok - let's proceed then.

@satra satra merged commit 0933382 into master Oct 7, 2021
@satra satra deleted the enh/aggregatebids branch October 7, 2021 13:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Increment the patch version when merged release Create a release when this pr is merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants