Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

moving towards functional programming in sisl #665

Merged
merged 15 commits into from
Feb 13, 2024
Merged

Conversation

zerothi
Copy link
Owner

@zerothi zerothi commented Dec 20, 2023

This commit separates the geometry from the
global name-space, and puts it in a sub-module
for it self.
This enables one to more coherently add
more functionalities to the geometry, without
disrupting the entire code-base.

  • Closes #x
  • Added tests for new/changed functions?
  • Ran isort . and black . at top-level
  • Documentation for functionality in docs/
  • Changes documented in CHANGELOG.md

This is primarily a placeholder for future developments.

I think that this functional approach, much like numpy's, could be very instructive.
Methods could be added at will to the Geometry class, and patched as needed in user-codes.
I believe this functional approach would also make the passivation requests more useful as the documentation could be made simpler and more separated.

My main concerns right now are these:

  1. would it be appropriate for end-users to overwrite internal methods (possibly used by internal methods), i.e. currently one can register a translate in your own code, and then all internals would also use that one
  2. should all functions be prefixed with g_? Or something else? We don't want to type too much, but we should also consider that many of the methods also exists for Lattice.
  3. should the methods instead of having g_ prefixes, be a dispatch method based on the class of the first argument?
    I.e. translate is the exposed method, and then the real method is called based on the type of the first object.
  4. it will make it simpler to add functionality, but allow some to not be a inherent for the Geometry class

This is mainly for my own remembrance!

@zerothi
Copy link
Owner Author

zerothi commented Dec 20, 2023

Once #393 is merged, the code should be moved to sisl.geometry

@zerothi zerothi force-pushed the geometry-separation branch from da2f353 to f4e194e Compare December 20, 2023 21:19
Copy link

codecov bot commented Dec 20, 2023

Codecov Report

Attention: 44 lines in your changes are missing coverage. Please review.

Comparison is base (4f73db8) 87.62% compared to head (201298e) 86.77%.
Report is 5 commits behind head on main.

❗ Current head 201298e differs from pull request most recent head b0427f9. Consider uploading reports for the commit b0427f9 to get more accurate results

Files Patch % Lines
src/sisl/_core/_ufuncs_lattice.py 92.30% 10 Missing ⚠️
src/sisl/physics/state.py 59.09% 9 Missing ⚠️
src/sisl/_core/_ufuncs_geometry.py 96.90% 7 Missing ⚠️
src/sisl/physics/_ufuncs_state.py 92.70% 7 Missing ⚠️
src/sisl/_ufuncs.py 93.33% 4 Missing ⚠️
src/sisl/typing/_common.py 83.33% 3 Missing ⚠️
src/sisl/_core/_ufuncs_grid.py 97.56% 2 Missing ⚠️
src/sisl/_core/geometry.py 96.42% 1 Missing ⚠️
src/sisl/viz/processors/wavefunction.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #665      +/-   ##
==========================================
- Coverage   87.62%   86.77%   -0.85%     
==========================================
  Files         363      359       -4     
  Lines       48870    45739    -3131     
==========================================
- Hits        42820    39688    -3132     
- Misses       6050     6051       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pfebrer
Copy link
Contributor

pfebrer commented Dec 21, 2023

If I understand correctly, the goal is to have the methods of Geometry defined as pure functions and then plug them into the class so that they can also be used as object methods as it is done now, right?

Love it! This will make it much easier to use these functions as nodes :) And also to play with the composite geometries.

@zerothi
Copy link
Owner Author

zerothi commented Dec 21, 2023

If I understand correctly, the goal is to have the methods of Geometry defined as pure functions and then plug them into the class so that they can also be used as object methods as it is done now, right?

Love it! This will make it much easier to use these functions as nodes :) And also to play with the composite geometries.

Yes, that is the plan. Eventually, some of them should also be completely removed from the Geometry, I don't have a complete overview.

When we get to it, it would be valuable with some more feedback on the approach, see my "worries" section above.

@zerothi
Copy link
Owner Author

zerothi commented Jan 8, 2024

@pfebrer do you have anything to say about the way to implement things above? Should we use dispatch methods, or?

@pfebrer
Copy link
Contributor

pfebrer commented Jan 8, 2024

I think a dispatcher is nice to centralize everything, but it shouldn't be the way that users would be expected to use sisl. I say this because the documentation of dispatchers is really bad, specially the interactive documentation. I.e. when you open the parenthesis for a function call and the editor shows you how you should use the function, autocompletes arguments, etc... With the dispatcher they'll just see something like translate(obj, *args, **kwargs).

The type hinting is also not trivial to make it work, but it could be done.

However, it would be very nice to use the dispatcher internally. I would propose that the dispatcher exists, but when a class is registered to it, it also creates the corresponding method on the class. Some advantages that I can think of:

  • The prefix of the functions would not be important, you could just prepend the class name like _Geometry_translate.
  • The dispatcher would facilitate the handling of associated data as in Create an attribute class which can accept events #196. E.g. when you call translate on an object, you could call the same dispatcher on its associated data.
  • The method names for the same operation are standardized, since the dispatcher is responsible for creating them.
  • Daily usage of sisl would involve just calling methods, which is what users are used to in python and is very convenient because it doesn't require extra imports. The use of the dispatcher would be reserved for library code or more advanced usage, for users that already have a good understanding of sisl.

The creation of the methods could be done at runtime or on the source code if doing it at runtime causes a significant overhead.

@zerothi
Copy link
Owner Author

zerothi commented Jan 16, 2024

@pfebrer could you have a look now, I think this method seems appropriate.

It automatically creates the dispatch method, if needed, and exposes it (perhaps the way of exposing it should be done differently).
A method is registered through register_sisl_dispatch which uses the typing of the first argument to be the class that decides the dispatched method.

Hence one can now do:

sisl.tile(geometry, 2, 2)
# which is equivalent to
geometry.tile(2, 2)

Right now I have only done this for the tile method for Geometry, Grid and Lattice to make it simple. Many more routines needs adjusting if needed to be done, however, if we agree on this approach, I would suggest it'd be done quite fast as it prohibits other things to be done (changing file locations is a mess).

@zerothi zerothi force-pushed the geometry-separation branch from 3b00ab8 to fa65b5e Compare January 16, 2024 07:37
@@ -144,13 +144,13 @@
from numpy import argsort, dot, pi, sum

import sisl._array as _a
from sisl._core.grid import Grid

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
sisl._core.grid
begins an import cycle.
@@ -144,13 +144,13 @@
from numpy import argsort, dot, pi, sum

import sisl._array as _a
from sisl._core.grid import Grid
from sisl._core.lattice import Lattice

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
sisl._core.lattice
begins an import cycle.
@@ -353,11 +352,12 @@

# Since all these things depend on other elements
# we will simply import them here.
from sisl.physics.electron import wavefunction

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
sisl.physics.electron
begins an import cycle.
@pfebrer
Copy link
Contributor

pfebrer commented Jan 16, 2024

Looks awesome!

I just had a quick glance at it, later I will have a deeper look, but I have some doubts:

  1. Does core really need to be private?
  2. Shouldn't sparse_geometry go into core?
  3. I think nothing inside core should depend on things outside of core. This is what is causing all the import cycles that CodeQL warns about.

@zerothi
Copy link
Owner Author

zerothi commented Jan 16, 2024

Looks awesome!

I just had a quick glance at it, later I will have a deeper look, but I have some doubts:

1. Does `core` really need to be private?

I think so, I don't think users should get used to doing from sisl.core import Geometry, I think that would be counter-intuitive. I kind-of borrowed the idea from numpy about having the private _core, they have a core, but it is deprecated.

2. Shouldn't `sparse_geometry` go into core?

Agreed, and SparseCSR and more.

3. I think nothing inside `core` should depend on things outside of `core`. This is what is causing all the import cycles that CodeQL warns about.

Agreed!

The re-arrangements of things can be done later, that will be minor changes that won't affect end users.

@zerothi
Copy link
Owner Author

zerothi commented Jan 16, 2024

documentation looks fine, except that the Geometry.tile documentation lists the first (self) argument. A bit annoying...

@pfebrer
Copy link
Contributor

pfebrer commented Jan 16, 2024

I think so, I don't think users should get used to doing from sisl.core import Geometry, I think that would be counter-intuitive. I kind-of borrowed the idea from numpy about having the private _core, they have a core, but it is deprecated.

Ok 👍

The re-arrangements of things can be done later, that will be minor changes that won't affect end users.

Yes, my only worry is that once you rearrange things everything basically ends up inside core, which would make it superflous 😅

documentation looks fine, except that the Geometry.tile documentation lists the first (self) argument. A bit annoying...

I can probably fix this, I have a Master in modifying function signatures hahah

@zerothi
Copy link
Owner Author

zerothi commented Jan 16, 2024

Perhaps the idea would be to:

  • let _core classes and functions exist for end-users
  • let helper routines etc exist at the top-level in various private modules, for instance sisl._help is only used for internal purposes.
  • Perhaps you are right this is superfluous, my main motivation was to keep lattice and geometry together, since adding much more functionality to them would require much code at the top-level. Might not be too problematic, but... Well..


__all__ = ["Geometry", "sgeom"]
from .atom import Atom, Atoms

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
sisl._core.atom
begins an import cycle.

__all__ = ["Geometry", "sgeom"]
from .atom import Atom, Atoms
from .lattice import Lattice, LatticeChild

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
sisl._core.lattice
begins an import cycle.
__all__ = ["Geometry", "sgeom"]
from .atom import Atom, Atoms
from .lattice import Lattice, LatticeChild
from .orbital import Orbital

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
sisl._core.orbital
begins an import cycle.
from .utils.mathematics import fnorm
from sisl.utils.mathematics import fnorm

from .geometry import Geometry

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
sisl._core.geometry
begins an import cycle.
from sisl.utils.mathematics import fnorm

from .geometry import Geometry
from .lattice import BoundaryCondition, Lattice, LatticeChild

Check notice

Code scanning / CodeQL

Cyclic import Note

Import of module
sisl._core.lattice
begins an import cycle.
@pfebrer
Copy link
Contributor

pfebrer commented Jan 16, 2024

Do you think there can be a way to treat associated data in a general way?

E.g. imagine in Geometry you keep an _associated_data dictionary pointing to objects that are associated to the geometry. For example it could contain an array of forces wrapped in some custom class CartesianVector. You could define some Geometry.apply_ufunc method that first applies the method on the geometry and then on the associated data.

In some sense Geometry has already three associated objects: the coordinates, Atoms and Lattice. I guess you can't apply the ufunc independently on each of them because there is some interplay in most ufuncs.

But still for example for the tile function, you could wrap the coordinates in some CellCoordinates class and then in geometry just do:

def tile(geometry, ...):
      xyz = geometry.xyz.tile(geometry.lattice, ...)
      lattice = geometry.lattice.tile(...)
      atoms = geometry.lattice.tile(...)
      return geometry.__class__(xyz, atoms, lattice)

So in essence Geometry is just a wrapper that instructs its members to apply the ufunc on themselves. Instead of defining the method you could just define the order on which to apply the ufunc and if there are dependencies:

sisl.Geometry.associated_data = ["xyz", "lattice", "atoms"]

# For the translate ufunc just apply the ufunc on everything, the order doesn't matter
# Perhaps this isn't even needed and the ufunc dispatcher could directly look for associated 
# data if there is no implementation for Geometry.
register_ufunc_wrapper(sisl.Geometry, "translate")

# Define how to apply the tile ufunc on Geometry
register_ufunc_wrapper(sisl.Geometry, "tile", [
    ("xyz", ["lattice"]),
    ... # Meaning the rest, without a required order.
])

It doesn't need to be exactly like that, but you get the idea. Maybe it is too abstract, but I just put my ideas here because you might come up with a version of it that is less crazy :)

Comment on lines +48 to +47
def sub(
coefficient: Coefficient, index: IndexArgument, inplace: bool = False
) -> Optional[Coefficient]:

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.


@register_sisl_dispatch(module="sisl.physics")
def sub(state: State, index: IndexArgument, inplace: bool = False) -> Optional[State]:

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
Comment on lines +142 to +143
def sub(
statec: StateC, index: IndexArgument, inplace: bool = False
) -> Optional[StateC]:

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
@zerothi zerothi force-pushed the geometry-separation branch 2 times, most recently from ee13f5d to b4b04e3 Compare January 18, 2024 15:46
@zerothi
Copy link
Owner Author

zerothi commented Jan 18, 2024

Do you think there can be a way to treat associated data in a general way?

E.g. imagine in Geometry you keep an _associated_data dictionary pointing to objects that are associated to the geometry. For example it could contain an array of forces wrapped in some custom class CartesianVector. You could define some Geometry.apply_ufunc method that first applies the method on the geometry and then on the associated data.

In some sense Geometry has already three associated objects: the coordinates, Atoms and Lattice. I guess you can't apply the ufunc independently on each of them because there is some interplay in most ufuncs.

But still for example for the tile function, you could wrap the coordinates in some CellCoordinates class and then in geometry just do:

def tile(geometry, ...):
      xyz = geometry.xyz.tile(geometry.lattice, ...)
      lattice = geometry.lattice.tile(...)
      atoms = geometry.lattice.tile(...)
      return geometry.__class__(xyz, atoms, lattice)

So in essence Geometry is just a wrapper that instructs its members to apply the ufunc on themselves. Instead of defining the method you could just define the order on which to apply the ufunc and if there are dependencies:

sisl.Geometry.associated_data = ["xyz", "lattice", "atoms"]

# For the translate ufunc just apply the ufunc on everything, the order doesn't matter
# Perhaps this isn't even needed and the ufunc dispatcher could directly look for associated 
# data if there is no implementation for Geometry.
register_ufunc_wrapper(sisl.Geometry, "translate")

# Define how to apply the tile ufunc on Geometry
register_ufunc_wrapper(sisl.Geometry, "tile", [
    ("xyz", ["lattice"]),
    ... # Meaning the rest, without a required order.
])

It doesn't need to be exactly like that, but you get the idea. Maybe it is too abstract, but I just put my ideas here because you might come up with a version of it that is less crazy :)

Let's deal with that later. Right now this is already getting very complicated, it shouldn't affect this PR.

@zerothi
Copy link
Owner Author

zerothi commented Jan 18, 2024

@pfebrer you might know how to do this,

In commit b4b04e3 you can see I have disabled lots of documentation with type-hinting.

Basically, sphinx and the way the typehinting was enforced in my changes would not allow it to work.

The basic problem is that sphinx and singledispatch would forcefully track down the globally defined object that was type-annotated. However, this causes circular imports and thus prevents me from doing anything.

from __future__ import annotations won't work since the object will still not be available. I think there is no way around this but to remove the autodoc-typehints (as done in the mentioned commit) and fill in details better. When building the documentation locally, now I have issues with linking from the type-hinted names (e.g. AtomsArgument) to the actual entry in the documentation. Check out Geometry.translate which should show this problem (no link created).

I have seached around, and can find numerous bug reports on python, sphinx, sphinx-autodoc-typehints, and basically Python is at fault here due to ForwardRefs being fully resolved. So the other tools would require some hackery to fix it. I don't know how, but I also don't want type-hinting to be a major burden. So the current way (on this branch) seems the simplest to work-around these issues, however, it will have some problems for the build documentation.

@pfebrer
Copy link
Contributor

pfebrer commented Jan 20, 2024

Some day next week I'll try to build the documentation and see what it looks like.

By the way, a certain ufunc has common arguments that should work for all objects + some arguments that are specific to the object. I think it would be nice to define the interface of the ufunc for the common arguments, and add some documentation that indicates what the operation is expected to do in a generic way.

E.g. here:

Screenshot from 2024-01-20 12-57-01

I would like that the signature is something like:

tile(obj: T, reps: int, axis: int, **kwargs) -> T

And the docstring explains what the operation is expected to do in an abstract way. Maybe in the parameters section of the docstring you could have a list of what types obj support, but I'm not sure about that.

I believe that then we could expect that people really starts using sisl.tile(geometry, ...) instead of geometry.tile(...), since it would be more user-friendly. Also ufuncs could then be listed and documented in the HTML documentation.

@zerothi
Copy link
Owner Author

zerothi commented Jan 20, 2024

Some day next week I'll try to build the documentation and see what it looks like.

By the way, a certain ufunc has common arguments that should work for all objects + some arguments that are specific to the object. I think it would be nice to define the interface of the ufunc for the common arguments, and add some documentation that indicates what the operation is expected to do in a generic way.

E.g. here:

Screenshot from 2024-01-20 12-57-01

I would like that the signature is something like:

tile(obj: T, reps: int, axis: int, **kwargs) -> T

And the docstring explains what the operation is expected to do in an abstract way. Maybe in the parameters section of the docstring you could have a list of what types obj support, but I'm not sure about that.

I believe that then we could expect that people really starts using sisl.tile(geometry, ...) instead of geometry.tile(...), since it would be more user-friendly. Also ufuncs could then be listed and documented in the HTML documentation.

I would very much like that, however, the problem of inspecting the type-hints is what is causing circular imports, and hence I can't get this to work.
I tried the following patch, to no avail, only ForwardRef problems.

diff --git a/src/sisl/_ufuncs.py b/src/sisl/_ufuncs.py
index 3f2fdc622..71292988c 100644
--- a/src/sisl/_ufuncs.py
+++ b/src/sisl/_ufuncs.py
@@ -3,9 +3,10 @@
 # file, You can obtain one at https://mozilla.org/MPL/2.0/.
 from __future__ import annotations
 
+from collections import OrderedDict
 from functools import singledispatch
 from textwrap import dedent
-from typing import Callable, Optional
+from typing import Any, Callable, Dict, Optional, get_type_hints
 
 from sisl.messages import SislError, warn
 
@@ -52,6 +53,37 @@ def register_sisl_function(name, cls, module=None):
     return decorator
 
 
+def _args_intersection(args1: Dict, args2: Dict, skip: int = 1, order: bool = True):
+    """The intersection of two dictionaries containing typed arguments
+
+    Parameters
+    ----------
+    args1, args2 : Callable[..., Any]
+        the functions to compare and return the argument list from.
+        This function will only compare typed arguments (may change later)
+    skip: int
+        number of initial arguments to be skipped from the comparison
+    order: bool
+        whether the order of the arguments matter, or if they should just
+        be the overlap of arguments
+    """
+    args = OrderedDict()
+    args1 = _ordereddict_strip_first(args1, skip)
+    args2 = _ordereddict_strip_first(args2, skip)
+    if order:
+        for (n1, t1), (n2, t2) in zip(args1.items(), args2.items()):
+            if n1 == n2 and t1 == t2:
+                args[n1] = t1
+            else:
+                break
+    else:
+        inter = args1.keys() & args2.keys()
+        for n in inter:
+            if args1[n] == args2[n]:
+                args[n] = args1[n]
+    return args
+
+
 def _append_doc_dispatch(method: Callable, cls: type):
     """Append to the doc-string of the dispatch method retrieved by `method` that the `cls` class can be used"""
     global _registry
@@ -64,9 +96,47 @@ def _append_doc_dispatch(method: Callable, cls: type):
 
     # Append to doc string
     doc = (
-        f"\n{cls.__name__}.{name}: equivalent to '{name}({cls.__name__.lower()}, ...)'."
+        f"{cls.__name__}.{name}: equivalent to '{name}({cls.__name__.lower()}, ...)'.\n"
     )
     method_registry.__doc__ += doc
+    doc = method_registry.__doc__
+
+
+def _ordereddict_strip_first(d: Dict, skip: int):
+    out = OrderedDict()
+    for i, item in enumerate(d.items()):
+        if i < skip:
+            continue
+        out[item[0]] = item[1]
+    return out
+
+
+_param = "Parameters\n----------\n"
+_see_also = "See Also"
+
+
+def _args_to_doc(args):
+    return _param + "\n".join(map(lambda x: f"{x[0]}: {x[1]}", args.items()))
+
+
+def _patch_doc_parameters(method_registry):
+    """Amend the documentation by adding the parameters section, possibly overwriting the old one"""
+
+    doc = method_registry.__doc__
+    args = method_registry.__sisl_args__
+
+    # strip parameters section and put in a new one
+    iloc = [0, 0]
+    iloc[0] = doc.find(_param)
+    if iloc[0] >= 0:
+        iloc[1] = doc.find(_see_also)
+        doc = doc[: iloc[0]] + doc[iloc[1] :]
+
+    param_docs = _param + "cls:\n" + _args_to_doc(args) + "\n"
+
+    iloc = doc.find(_see_also)
+    doc = doc[:iloc] + param_docs + "\n" + doc[iloc:]
+    method_registry.__doc__ = doc
 
 
 def register_sisl_dispatch(
@@ -79,7 +149,7 @@ def register_sisl_dispatch(
     global _registry
     if method is None and module is None:
         return lambda f: register_sisl_dispatch(cls, method=f)
-    elif method is None:
+    if method is None:
         if isinstance(module, str):
             return lambda f: register_sisl_dispatch(cls, module, method=f)
         return register_sisl_dispatch(cls, method=module)
@@ -104,6 +174,10 @@ def register_sisl_dispatch(
         method_registry.__doc__ = doc
         method_registry.__name__ = name
         method_registry.__module__ = "sisl"
+        method_registry.__sisl_args__ = _ordereddict_strip_first(
+            get_type_hints(method), 1
+        )
+
         _registry[name] = singledispatch(method_registry)
 
     # Retrieve the dispatched method
@@ -121,6 +195,12 @@ def register_sisl_dispatch(
         _append_doc_dispatch(method, cls)
         register_sisl_function(name, cls)(method)
 
+    # retrieve the overlapping arguments
+    args = get_type_hints(method)
+    args = _args_intersection(args, method_registry.__sisl_args__)
+    method_registry.__sisl_args__ = args
+    _patch_doc_parameters(method_registry)
+
     return method

It simply results in:

/opt/gnu/12.3.0/python/packages/3.11/sisl-dev/0/lib/python3.11/site-packages/sisl/__init__.py:114: in <module>
    from ._core import *
/opt/gnu/12.3.0/python/packages/3.11/sisl-dev/0/lib/python3.11/site-packages/sisl/_core/__init__.py:8: in <module>
    from . import _ufuncs_geometry, _ufuncs_grid, _ufuncs_lattice, geometry, grid, lattice
/opt/gnu/12.3.0/python/packages/3.11/sisl-dev/0/lib/python3.11/site-packages/sisl/_core/_ufuncs_geometry.py:15: in <module>
    from sisl.typing import AtomsArgument, Coord, CoordOrScalar, LatticeOrGeometryLike
/opt/gnu/12.3.0/python/packages/3.11/sisl-dev/0/lib/python3.11/site-packages/sisl/typing/__init__.py:6: in <module>
    from ._common import *
/opt/gnu/12.3.0/python/packages/3.11/sisl-dev/0/lib/python3.11/site-packages/sisl/typing/_common.py:39: in <module>
    sisl.Atom,
E   AttributeError: partially initialized module 'sisl' has no attribute 'Atom' (most likely due to a circular import)
!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!
============================== 1 error in 0.67s ==============================

which is the problem mentioned above.

Lastly, with the current state, the doc will look like this (and they are added in a new segment called Functional (to be changed).

tile_new

I think the above looks kind of OK.

@pfebrer
Copy link
Contributor

pfebrer commented Jan 20, 2024

I don't understand, why do you need l inspect the typehints to define a generic signature for tile?

I think tile could be defined in _ufuncs.py with a given signature and a given docstring, and then you should register specific cases of tile for the different objects. E.g. in _ufuncs_geometry.py:

@tile.register(Geometry)
def tile(...):
    ...

Then tile could check that the registered methods adhere to the generic signature.

Creating dispatchers on the fly is also a good feature but I think the ufuncs provided as "builtins" by sisl could benefit from enforcing standards and having documentation for what the ufunc is expected to do.

@zerothi
Copy link
Owner Author

zerothi commented Jan 21, 2024

I don't understand, why do you need l inspect the typehints to define a generic signature for tile?
I don't, but it could be done automatically! :)

I think tile could be defined in _ufuncs.py with a given signature and a given docstring, and then you should register specific cases of tile for the different objects. E.g. in _ufuncs_geometry.py:

@tile.register(Geometry)
def tile(...):
    ...

Then tile could check that the registered methods adhere to the generic signature.
Even this would be problematic, I can't check type-hints, only argument names and defaults, due to the circular imports.

Creating dispatchers on the fly is also a good feature but I think the ufuncs provided as "builtins" by sisl could benefit from enforcing standards and having documentation for what the ufunc is expected to do.

Agreed, but on the other hand, you would still have to read the documentation for Geometry.tile vs State.tile since they do different things (and some similar things).

I think for now this is ok, adding this later is just a matter of defining the ufuncs the first time and change the API. So it can easily be fixed. Probably the best thing to do now would be to finalize the methods that needs moving to the functional paradigm. Then we can always take it from there.

@pfebrer
Copy link
Contributor

pfebrer commented Jan 21, 2024

Wouldn't the import problem disappear if you first import the objects (classes) and then the functions on sisl/_core/__init__.py? The problem is that Atom is unavailable because it still hasn't been imported when you try to import the ufuncs, no?

Anyways I think it makes more sense to import them in this order: objects, then functions. Because functions make no sense without objects 😅

@zerothi zerothi mentioned this pull request Jan 23, 2024
10 tasks
@pfebrer
Copy link
Contributor

pfebrer commented Jan 25, 2024

By the way, does this functional paradigm support inheritance? Should it? E.g. if you do:

class MyGeometry(Geometry):
    ...

geom = MyGeometry(...)

sisl.tile(geom)

Does it work?

@zerothi
Copy link
Owner Author

zerothi commented Jan 25, 2024

By the way, does this functional paradigm support inheritance? Should it? E.g. if you do:

class MyGeometry(Geometry):
    ...

geom = MyGeometry(...)

sisl.tile(geom)

Does it work?

Yes, check this out:

from functools import singledispatch

class A:
    pass

class B(A):
    pass
class C(B):
    pass
class D(C):
    pass


@singledispatch
def test(a):
    print('P', type(a))

@test.register(A)
def _(a):
    print('A', type(a))
@test.register(B)
def _(a):
    print('B', type(a))
@test.register(C)
def _(a):
    print('C', type(a))


test(A())
test(B())
test(C())
test(D())

which works as expected. :)

This was high on the list, otherwise it wouldn't have been feasible.

However, there are still some things that are annoying:

  • classmethods are near-impossible to make work, it requires a hack in the register_sisl_dispatch
  • I would have loved a way for it to also behave depending on the content of classes, for instance Hamiltonian functions may depend on spin content. However, this requires another way as well.

@pfebrer
Copy link
Contributor

pfebrer commented Jan 25, 2024

Great!

I would have loved a way for it to also behave depending on the content of classes, for instance Hamiltonian functions may depend on spin content. However, this requires another way as well.

Hmm that leads into thinking that the right approach would be to have a separate class for each spin type matrix 😅 I know, crazy, but that is what I thought hahah

@pfebrer
Copy link
Contributor

pfebrer commented Jan 31, 2024

Hey, while there aren't standard signatures for ufuncs, could the default signature be (obj, *args, **kwargs) instead of (*args, **kwargs). I think it would make sense, since the first argument must always be there and it is the most important one.

obj could also be mentioned in the docstring to clarify that its type will actually determine which function will be used.

@zerothi
Copy link
Owner Author

zerothi commented Feb 1, 2024

Good idea, however I played a bit with it and the documentation for the objects methods turns out to be confusing. They will mention and object argument that isn't part of the api.
So perhaps in a notes section.

@zerothi
Copy link
Owner Author

zerothi commented Feb 13, 2024

do we have more details here, or should we start with getting this in? @pfebrer have you played with the docs here?

@pfebrer
Copy link
Contributor

pfebrer commented Feb 13, 2024

I haven't looked at the docs, I thought everyhting was solved now. What are the things that are left to fix?

Apart from it I think it is good to just merge this, since for now it doesn't result in any change on the API. One thing that I believe would be very important to do later is to define a generic description and interface for sisl ufuncs, since otherwise there is nothing "universal" about them :)

@zerothi
Copy link
Owner Author

zerothi commented Feb 13, 2024

I haven't looked at the docs, I thought everyhting was solved now. What are the things that are left to fix?
Ok, great. I will probably merge as soon as I get the tests running on my new comp...

Apart from it I think it is good to just merge this, since for now it doesn't result in any change on the API. One thing that I believe would be very important to do later is to define a generic description and interface for sisl ufuncs, since otherwise there is nothing "universal" about them :)
Ok, but I don't think we'll ever get them to be universal, the interfaces will largely be very different. Let's see. :)

I hope this may be of use to somebody ;)

@pfebrer
Copy link
Contributor

pfebrer commented Feb 13, 2024

Ok, but I don't think we'll ever get them to be universal, the interfaces will largely be very different.

If not the interface, at least the description. There is some reason for the functions to share a common name, so that could be explained.

This commit separates the geometry from the
global name-space, and puts it in a sub-module
for it self.
This enables one to more coherently add
more functionalities to the geometry, without
disrupting the entire code-base.

Signed-off-by: Nick Papior <[email protected]>
Now all sisl core functionality should be
placed in the _core module.
This should adapt the needs for future additions
more easily and aid in the modularization of the code structure.

As it is, nothing needs to be changed in terms of usage.
One thing that does make it simpler to use is that
the aim of sisl is to make the methods that expose a functional
approach be a dispatch method.

For instance:

def tile(geometry: Geometry, ...)
def tile(lattice: Lattice, ...)

are now defined once, and then attached to the
appropriate class of the same name.
Simultaneously the method is hosted in a dispatch
form so that: sisl.tile(geom, ...) can be done.
A call sisl.tile(0, ...) will result in a SislError
since the first argument has not been registered.

As such, one cannot really see anything.

To gain access to the direct method (with appropriate
documentation), simply do:

  geometry_tile = sisl.tile.dispatch(sisl.Geometry)

Signed-off-by: Nick Papior <[email protected]>
These accommodations required an upgrade of
the typing system.

Sadly there is a *bug*/*feature-request*
that broke >=3.9 with missing names.
It is related to this:
 https://bugs.python.org/issue41987

Basically the AtomsArgument can't be used
in the type-hinting because the ForwardRef lookup
tries to resolve it, regardless of whether we are
in a TYPE_CHECKING state or not. Hence it *must*
be defined before we can use it.

This happens, even if __future__ annotations
has been imported.

This forces us to define them via strings manually,
*sigh*.

Signed-off-by: Nick Papior <[email protected]>
Added for the physics/state.py code.

Also fixed cases where derived classes got a method
assigned through the dispatch method.

These were wrongly warned before. Now they will be ignored.

Signed-off-by: Nick Papior <[email protected]>
Sadly, the register function of singledispatch does
not handle the ForwardRefs very elegantly
as the objects references *needs* to be available
in the global space.
This is rather unfortunate as this is exactly what
we are trying to avoid using __future__ annotations.

I'll have to see later if this causes issues in the
documentation build.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Added overwriting class name for the ufunc
registration.

Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
Signed-off-by: Nick Papior <[email protected]>
@zerothi zerothi force-pushed the geometry-separation branch from b0c91f5 to b0427f9 Compare February 13, 2024 20:53

from .sparse import *
from .sparse_geometry import *
from ._core import *

Check notice

Code scanning / CodeQL

'import *' may pollute namespace Note

Import pollutes the enclosing namespace, as the imported module
sisl._core
does not define '__all__'.
@@ -5,8 +5,8 @@

import sisl._array as _a
from sisl import SislError
from sisl._core.sparse import _rows_and_cols

Check notice

Code scanning / CodeQL

Unused import Note

Import of '_rows_and_cols' is not used.

from functools import singledispatch
from textwrap import dedent
from typing import Callable, Optional

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'Callable' is not used.
from itertools import product
from math import acos
from numbers import Integral, Real
from pathlib import Path
from typing import TYPE_CHECKING, Iterator, List, Optional, Sequence, Tuple, Union
from typing import Iterator, List, Optional, Sequence, Tuple, Union

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'List' is not used.

if scale_atoms:
# Atoms are rescaled to the maximum scale factor
atoms = geometry.atoms.scale(max_scale)

Check failure

Code scanning / CodeQL

Potentially uninitialized local variable Error

Local variable 'max_scale' may be used before it is initialized.
@zerothi zerothi merged commit d55f5c9 into main Feb 13, 2024
5 of 6 checks passed
@zerothi zerothi deleted the geometry-separation branch February 13, 2024 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants