Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add remove metadata #169

Closed
wants to merge 7 commits into from

Conversation

yasirroni
Copy link
Contributor

Add new argument for clean and filter "-M" or "--remove-notebook-metadata" to "remove notebook metadata.

    clean_parser.add_argument(
        "-M",
        "--remove-notebook-metadata",
        action="store_true",
        help="remove notebook metadata",
    )

re-PR from #163

@yasirroni yasirroni force-pushed the clean_notebook_metadata branch from 8b1e1cf to 0da1136 Compare November 3, 2022 08:23
@yasirroni
Copy link
Contributor Author

Fails due to pylint:

poetry run pylint src  
************* Module nb_clean
src\nb_clean\__init__.py:147:0: R0913: Too many arguments (6/5) (too-many-arguments)

-----------------------------------
Your code has been rated at 9.95/10

@yasirroni
Copy link
Contributor Author

Maybe refactor to keyword-only-argument? See SO answer and PEP 3102.

@yasirroni
Copy link
Contributor Author

Any update, @srstevenson

The fails is not mandatory and can be fixed by making all args as keyword-only-argument

@srstevenson
Copy link
Owner

I've held off from merging this because I'm not yet convinced there's a real user requirement for it. Most of the notebook metadata doesn't change frequently, and so doesn't contribute to the noise in VCS diffs which nb-clean was written to address.

@yasirroni
Copy link
Contributor Author

Different editor such as Google Colab vs VSCode and different Python version across user produce different metadata. That is the use case.

Not convinced enough?

@yasirroni yasirroni mentioned this pull request Jan 26, 2023
@haplav
Copy link

haplav commented Jan 26, 2023

@srstevenson:

Most of the notebook metadata doesn't change frequently, and so doesn't contribute to the noise in VCS diffs which nb-clean was written to address.

I would like to support the idea of this PR (can't yet assess its quality wrt. nb-clean coding standards). As I mentioned in #157 (comment), the notebook-level metadata actually do make noise in git commits.

For example, this metadata gets changed right after opening in VS Code without even running the notebook, and populated with things like conda environment name or some weird hash.

For me, the ability to filter out those is entirely in the spirit of nb-clean.

Example

Note: nb-clean was run after each step

1) initial state

 "metadata": {
  "language_info": {
   "name": "python"
  },

2) after opening in web browser and saving

 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },

3) after opening in VS Code and saving

 "metadata": {
  "kernelspec": {
   "display_name": "scratch",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  },
  "vscode": {
   "interpreter": {
    "hash": "8951b5bc4efd704e8ce7b1700f3d36aa2c5348526bfde6695033584eb49dd3bb"
   }
  }
 },

This is on the same machine using the same conda environment (!) Needless to say that if my colleague happens to open the notebook on his computer, edit some stuff, save and do a git commit with the nb-clean hook, the commit will include his edits PLUS several metadata changes.

Copy link

@haplav haplav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found some typos

@yasirroni
Copy link
Contributor Author

@haplav

"metadata": {
"language_info": {
"name": "python"
},

Since notebook metadata language_info is indeed needed, current PR remove this metadata. How about add other parameter such as --add-language-metadata NAME, with default name to Python?

@yasirroni
Copy link
Contributor Author

Another approach is to change this PR into --preserve-metadata language_info metadata2 metadata3. But, support of hierarchy of metadata sometimes are needed.

But, based on your example, the language_info is destroyed and replaced with codemirror_mode. That is another problem. And that is not what happen on my side. Here is my case:
Original:

 "metadata": {
  "language_info": {
   "name": "python"
  }

After VSCode open:

 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.8.10 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.8.10"
  },
  "vscode": {
   "interpreter": {
    "hash": "..."
   }
  }

@haplav
Copy link

haplav commented Feb 1, 2023

"metadata": {
"language_info": {
"name": "python"
},

Since notebook metadata language_info is indeed needed, current PR remove this metadata. How about add other parameter such as --add-language-metadata NAME, with default name to Python?

I don't understand @yasirroni, I don't mind having this in the cleaned notebook with hard-wired "name": "python". It seems rather unlikely this would need to be changed.

@haplav
Copy link

haplav commented Feb 1, 2023

Another approach is to change this PR into --preserve-metadata language_info metadata2 metadata3. But, support of hierarchy of metadata sometimes are needed.

I don't think we need to complicate things right now.

But, based on your example, the language_info is destroyed and replaced with codemirror_mode. That is another problem. And that is not what happen on my side. Here is my case: Original:

Ah OK, I think you read it wrong; language_info is NOT destroyed; codemirror_mode gets just ADDED to it.

@haplav
Copy link

haplav commented Feb 1, 2023

Please let GitHub apply suggestions and resolve discussions automatically as described in https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request

@yasirroni
Copy link
Contributor Author

Please let GitHub apply suggestions and resolve discussions automatically as described in https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/incorporating-feedback-in-your-pull-request

Did not know that feature exist. Sorry, Sir.

yasirroni and others added 2 commits February 3, 2023 10:45
Co-authored-by: Vaclav Hapla <[email protected]>
@yasirroni
Copy link
Contributor Author

Ah OK, I think you read it wrong; language_info is NOT destroyed; codemirror_mode gets just ADDED to it.

Yeah, but the structure changes, since name no longer the child of language_info.

"metadata": {
"language_info": {
"name": "python"
},

Since notebook metadata language_info is indeed needed, current PR remove this metadata. How about add other parameter such as --add-language-metadata NAME, with default name to Python?

I don't understand @yasirroni, I don't mind having this in the cleaned notebook with hard-wired "name": "python". It seems rather unlikely this would need to be changed.

But, I also works in Octave and octave-based notebook did not use that metadata. Here is a new notebook in Octave created using jupyterlab:

 "metadata": {
  "kernelspec": {
   "display_name": "Octave",
   "language": "octave",
   "name": "octave"
  }
 },

@yasirroni
Copy link
Contributor Author

I think, two options might be good.

Hard clean: like this PR.
Soft clean: like this PR, but skip for

"metadata": {
  "kernelspec": {
   "language": "..."
  },
  "language_info": {
   "name": "...",
  },
 },

@haplav
Copy link

haplav commented Feb 6, 2023

Yeah, but the structure changes, since name no longer the child of language_info.

No. Please, take a good look - name is contained in kernelspec and codemirror_mode but also still directly in language_info:

 "metadata": {
  "kernelspec": {
   ...
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "python",
    ...
   },
   ...
   "name": "python",
   ...
  },
  ...
 },

I think, two options might be good.

Once again, I'm fine with keeping just metadata/language_info/name (if already present). It seems to be the same in all cases you and I mentioned. But it doesn't make any sense to preserve kernelspec exactly because

But, I also works in Octave and octave-based notebook did not use that metadata. Here is a new notebook in Octave created using jupyterlab:

 "metadata": {
  "kernelspec": {
   "display_name": "Octave",
   "language": "octave",
   "name": "octave"
  }
 },

It's changed based on what interpreter the kernel uses. Keeping just empty metadata, as this PR already does, looks also fine to me. I don't think any interpreter would depend on having anything in there.

@yasirroni yasirroni closed this Feb 7, 2023
@yasirroni yasirroni reopened this Feb 7, 2023
@yasirroni
Copy link
Contributor Author

No. Please, take a good look - name is contained in kernelspec and codemirror_mode but also still directly in language_info:

@haplav oops. i'm sorry. It will be easier than.
_
Okay, maybe just let this PR as it is and later add feature to skip language_info

@haplav
Copy link

haplav commented Apr 11, 2023

@srstevenson any chance to get this merged?

@github-actions github-actions bot added the stale label Aug 21, 2023
@github-actions
Copy link

This PR was closed due to inactivity. Please reopen if still relevant.

@github-actions github-actions bot closed this Aug 29, 2023
@yasirroni
Copy link
Contributor Author

Any chance this can be re-opened? I'm willing to fix merge conflict. @srstevenson

yasirroni added a commit to yasirroni/nb-clean that referenced this pull request May 27, 2024
@srstevenson srstevenson removed the stale label May 27, 2024
@srstevenson
Copy link
Owner

#163 is still open with the same branch, so let's use that PR instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants