enh: implement metadata aggregator for asset summary #34

satra · 2021-06-16T02:07:45Z

This PR adds computation of the asset summary for a dandiset to this library.

slight changes to regexes required pattern updates (so new schema will be released)

addresses part 1 of dandi/dandi-archive#337

will file a PR on API to use this.

codecov · 2021-06-16T02:08:49Z

Codecov Report

Merging #34 (9ab77b1) into master (075d8f7) will increase coverage by 1.09%.
The diff coverage is 96.87%.

@@            Coverage Diff             @@
##           master      #34      +/-   ##
==========================================
+ Coverage   94.65%   95.75%   +1.09%     
==========================================
  Files          11       11              
  Lines         879      965      +86     
==========================================
+ Hits          832      924      +92     
+ Misses         47       41       -6

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `95.75% <96.87%> (+1.09%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| dandischema/metadata.py | `97.07% <95.58%> (-0.07%)` | ⬇️ |
| dandischema/consts.py | `100.00% <100.00%> (ø)` | |
| dandischema/models.py | `93.73% <100.00%> (+2.04%)` | ⬆️ |
| dandischema/tests/test_metadata.py | `100.00% <100.00%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 075d8f7...9ab77b1.

IMHO aggregate is not sufficiently descriptive of its purpose and behavior (it mutates instead of returning the aggregated value). AFAIK this function is not (yet at least) to be used by external tools, so I made it protected. Also added some type hints while at it

I do not quite like aggregate_assets_summary and would prefer to start producing constructors such as AssetsSummary.from_metadata and the like. But I disliked toSummary more: the name is camel cased and it mutates the provided stats. Another notable difference: decided to return AssetsSummary itself, and let outside code do .json_dict() or any other desired serializer


yarikoptic · 2021-06-16T20:15:53Z

Let's proceed!

enh: implement metadata aggregator for asset summary

40fca52

satra requested a review from yarikoptic June 16, 2021 02:19

satra mentioned this pull request Jun 16, 2021

rf: change metadata aggregation to dedicated functions from dandischema dandi/dandi-archive#338

Merged

satra added the `patch` (Increment the patch version when merged) and `release` (Create a release when this pr is merged) labels Jun 16, 2021

yarikoptic and others added 16 commits June 16, 2021 11:58

RF: dissolve/inline _append_values

572c185

It is used only once, not generic (spreads logic about variableMeasured). Placing that logic directly where used IMHO makes it easier to follow the code

RF: move nwb_ bids_ standards into models

5d876ac

That is AFAIK where all models are defined and IMHO they should not be in metadata, which is supporting functionality for dealing with all those models/metadata

ENH: use Path.suffixes for more robust testing for .nii and .json

34804ea

so we somehow do not match a directory or some funkily named non-nii file
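To illustrate the idea behind this commit, here is a minimal sketch of suffix-based matching with `pathlib`. The helper names `is_nifti` and `is_json` are hypothetical and are not taken from the dandischema code; the sketch only shows why comparing `suffixes` is stricter than a substring test such as `".nii" in path`.

```python
from pathlib import PurePosixPath


def is_nifti(path: str) -> bool:
    """Return True only if the final path component ends in .nii or .nii.gz.

    Hypothetical helper for illustration; not the dandischema implementation.
    """
    suffixes = PurePosixPath(path).suffixes
    # Slicing keeps this safe for names with no suffix at all.
    return suffixes[-1:] == [".nii"] or suffixes[-2:] == [".nii", ".gz"]


def is_json(path: str) -> bool:
    """Return True only if the final path component ends in .json."""
    return PurePosixPath(path).suffixes[-1:] == [".json"]


# A substring test would wrongly accept both of these paths:
print(is_nifti("my.nii.backup/readme.txt"))  # parent directory merely contains ".nii"
print(is_nifti("notes.nii.txt"))             # oddly named non-NIfTI file
```

Only the last path component is inspected, so a directory whose name happens to contain `.nii` does not make the files inside it match.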

RF: use dict.pop instead of get + follow up with del

d311516
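As a rough illustration of this refactoring (the function names and the record contents below are made up, not taken from the PR), `dict.pop` with a default does in one call what a `get` followed by a guarded `del` does in three lines:

```python
from typing import Any, Dict, Optional


def take_with_get_and_del(record: Dict[str, Any], key: str) -> Optional[Any]:
    """Older pattern: look the value up, then delete the key separately."""
    value = record.get(key)
    if key in record:
        del record[key]
    return value


def take_with_pop(record: Dict[str, Any], key: str) -> Optional[Any]:
    """Equivalent pattern: pop returns the value and removes the key at once.

    The None default avoids a KeyError when the key is absent.
    """
    return record.pop(key, None)


# Hypothetical record, for illustration only.
record = {"id": "example", "path": "sub-01/file.nwb"}
print(take_with_pop(record, "path"), record)
```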

ENH: provide typing for stats records

7e68e75

Made them all private since AFAIK we do not want to expose them; otherwise they would become a part of the interface etc

ENH: explicitly cast "id" to str before testing with startswith

6979cd0

Detected by mypy and I think it would be a legit thing to do
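A small sketch of the kind of issue mypy flags here, assuming the value comes from a loosely typed metadata dict (the function name `id_has_prefix` and the sample identifiers are hypothetical): a value typed as `Any` or `Optional[...]` is not guaranteed to have `startswith`, so casting to `str` first makes the call type-safe.

```python
from typing import Any, Dict


def id_has_prefix(record: Dict[str, Any], prefix: str) -> bool:
    """Check the "id" field against a prefix without assuming it is a str.

    Hypothetical helper; str() guards against None or non-string
    identifiers that would otherwise raise AttributeError.
    """
    return str(record.get("id", "")).startswith(prefix)


print(id_has_prefix({"id": "dandiasset:abc"}, "dandiasset:"))
print(id_has_prefix({"id": None}, "dandiasset:"))
```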

BF(typing): only DandiBaseModel would have unvalidated we provide

0f8b5ad

linting (unused imports and comments format)

f129f15

RF: make aggregate_assets_summary return a dict (not our model)

411c697

to keep metadata module interface free of manipulating models. Later we might RF to get AssetsSummary.from_assets and alike
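The design choice described here (return a plain dict, do not mutate the inputs, leave model construction to the caller) could look roughly like the sketch below. The function name `summarize_assets` and the keys it reads and writes are assumptions chosen for illustration; this is not the actual `aggregate_assets_summary` implementation.

```python
from typing import Any, Dict, Iterable


def summarize_assets(assets: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-asset metadata into a new summary dict.

    Illustrative only: the field names are assumptions, and the real
    aggregator collects more information than this.
    """
    number_of_bytes = 0
    number_of_files = 0
    formats = set()
    for asset in assets:
        # Read from each asset without modifying it.
        number_of_bytes += asset.get("contentSize", 0)
        number_of_files += 1
        if "encodingFormat" in asset:
            formats.add(asset["encodingFormat"])
    return {
        "numberOfBytes": number_of_bytes,
        "numberOfFiles": number_of_files,
        "dataFormats": sorted(formats),
    }


assets = [
    {"contentSize": 100, "encodingFormat": "application/x-nwb"},
    {"contentSize": 50},
]
print(summarize_assets(assets))
```

Because the result is a plain dict, the caller decides whether to wrap it in a model or serialize it directly.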

RF: list current new version in ALLOWED_INPUT_SCHEMAS

b54749e

Explicit better than implicit and IMHO semantically (since we are not saying that it is ALLOWED_OLDER_INPUT_SCHEMAS) I think it would be more correct. Did it in the code to avoid hardcoding duplicates
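One way to read "did it in the code to avoid hardcoding duplicates" is sketched below. The version strings are placeholders, and `check_input_version` is a hypothetical helper, so treat this as an assumption about the approach rather than the contents of `dandischema/consts.py`.

```python
from typing import List

# Placeholder values for illustration; not the real schema versions.
DANDI_SCHEMA_VERSION = "0.4.0"
ALLOWED_INPUT_SCHEMAS: List[str] = ["0.3.0", "0.3.1"]

# Add the current version programmatically so it is never typed twice.
if DANDI_SCHEMA_VERSION not in ALLOWED_INPUT_SCHEMAS:
    ALLOWED_INPUT_SCHEMAS.append(DANDI_SCHEMA_VERSION)


def check_input_version(version: str) -> None:
    """Raise ValueError for schema versions that are not explicitly allowed."""
    if version not in ALLOWED_INPUT_SCHEMAS:
        raise ValueError(
            f"Schema version {version!r} is not one of {ALLOWED_INPUT_SCHEMAS}"
        )


check_input_version(DANDI_SCHEMA_VERSION)  # the current version is accepted
```

With the current version in the list, callers need a single membership test instead of a separate "or equals the current version" branch.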

ENH(TST): handle only allowed input schema versions for assets + test that

9771a4f

ENH(TST): test that "double migration" does not do anything but copy

1c97155

linting

a271084

fix: validate requires allowed schemas

3ce53f5

satra force-pushed the enh/aggregate branch from 359be48 to 3ce53f5 Compare June 16, 2021 19:01

Merge remote-tracking branch 'upstream/master' into enh/aggregate

9ab77b1

* upstream/master:
  Update dandischema/datacite.py
  Only attempt to remove contributorType if present

yarikoptic merged commit b38ce14 into master Jun 16, 2021

yarikoptic deleted the enh/aggregate branch June 16, 2021 20:15

yarikoptic mentioned this pull request Jun 16, 2021

Setup pre-commit black'ing and linting similarly to dandi-cli #35

Closed
