Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of string parameters #803

Merged
merged 3 commits into from
Feb 23, 2023
Merged

Improve handling of string parameters #803

merged 3 commits into from
Feb 23, 2023

Conversation

cameronwhite
Copy link
Contributor

@cameronwhite cameronwhite commented Feb 14, 2023

  • Add tests for filename and UTF8 string parameters / return values. By explicitly setting the G_FILENAME_ENCODING environment variable to an encoding other than UTF8, we can ensure that the tests fail if a filename string is incorrectly in UTF8 or UTF16 due to marshalling issues. On Windows this doesn't work since glib always uses UTF8, but at least some platforms will produce a failing test.
  • Introduce SafeHandle types for UTF8 and filename-encoded native strings
  • UTF8 string return types cannot use the default marshalling, since the wrong memory allocator can be used (bug Verify handling of string return parameters on Windows #791).
  • Filename parameters and return values are manually marshalled since they need to be converted to and from the filename encoding using g_filename_from_utf8()
  • UTF8 string parameters are also manually marshalled for better control over their lifetime. The default marshalling almost works for in parameters, but for the girtest_string_tester_utf8_return_transfer_none() test case the default marshalled string would be freed before the return value can be converted back to a managed string. For out parameters we'll likely need to manually marshal anyways for issue Verify handling of string return parameters on Windows #791.
  • Don't support callback return values that are a string with transfer=none. This doesn't work well since we need to return a newly-allocated UTF-8 string, which would then have no owner.
  • Update manual bindings for these changes

@cameronwhite cameronwhite force-pushed the fix/string-filename branch 3 times, most recently from 6c73fc4 to 989245c Compare February 19, 2023 03:41
By explicitly setting the G_FILENAME_ENCODING environment variable to an encoding other than UTF8, we can ensure that the tests fail if a filename string is incorrectly in UTF8 or UTF16 due to marshalling issues.
On Windows this doesn't work since glib always uses UTF8, but at least some platforms will produce a failing test.
@cameronwhite cameronwhite force-pushed the fix/string-filename branch 4 times, most recently from d47e8b9 to 5ae1ca5 Compare February 19, 2023 18:10
@cameronwhite
Copy link
Contributor Author

This is getting a bit large :) so I'm thinking of implementing inout and out string parameters in a subsequent PR
These changes handle in parameters and returning strings from functions and callbacks

@cameronwhite cameronwhite marked this pull request as ready for review February 19, 2023 18:30
Copy link
Member

@badcel badcel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The concept with the string handles is a great and future proof solution. In the generator I would like to keep the separation between the different string types.

@cameronwhite cameronwhite force-pushed the fix/string-filename branch 2 times, most recently from cc22e84 to ae04882 Compare February 22, 2023 05:26
@cameronwhite
Copy link
Contributor Author

Updated with the changes - the separate NonNullable and Nullable types work pretty well and make it easier to avoid missing nullable checks
The only complexity was needing to convert between the nullable and non-nullable type or vice versa in the implementaiton of the handles

@badcel
Copy link
Member

badcel commented Feb 22, 2023

The only complexity was needing to convert between the nullable and non-nullable type or vice versa in the implementaiton of the handles

I think I have a solution for this, which avoids the complexity and improves the usablity from an internal API point of view, too. I would suggest some renaming of the classes, too. Currently the handle class hierarchy looks like:

SafeHandle -> NullableUtf8StringHandle    -> NullableUtf8StringUnownedHandle
                                          -> NullableUtfStringOwnedHandle
           -> NonNullableUtf8StringHandle -> NonNullableUtf8StringUnownedHandle
           				  -> NonNullableUtf8StringOwnedHandle

My suggestion is to move it into such a structure:

SafeHandle -> OwnedUtf8StringHandle      -> NonNullableOwnedUtf8StringHandle
           -> UnownedUtf8StringHandle    -> NonNullableUnownedUtf8StringHandle

The OwnedUtf8StringHandle and UnownedUtf8StringHandle actually are nullable string handles. If some API accepts a "nullable owned utf8 string" it would only require an OwnedUtf8StringHandle. As NonNullableOwnedUtf8StringHandle derives from OwnedUtf8StringHandle the user is free to pass in either class.

On the other hand if an API requires a NonNullableOwnedUtf8StringHandle there is no way to pass in a OwnedUtf8StringHandle as it could contain null.

This is the complex code:

public static NullablePlatformStringOwnedHandle Create(string? s)
{
     var handle = new NullablePlatformStringOwnedHandle();
     if (s is not null)
     {
         var nonNullHandle = GLib.Internal.Functions.FilenameFromUtf8(GLib.Internal.NonNullableUtf8StringOwnedHandle.Create(s), -1, out _, out _, out var error);
         GLib.Error.ThrowOnError(error);

         // FilenameFromUtf8() returns a non-nullable handle, so transfer the result into our nullable handle.
         handle.SetHandle(nonNullHandle.DangerousGetHandle());
         nonNullHandle.SetHandleAsInvalid();
     }

     return handle;
 }

It could probably be rewritten to:

public static OwnedPlatformStringHandle Create(string? s)
{
    if(s is null)
        return OwnedPlatformStringHandle.CreateNullHandle(); 

    NonNullableOwnedPlatformStringHandle handle = GLib.Internal.Functions.FilenameFromUtf8(GLib.Internal.NonNullableUtf8StringOwnedHandle.Create(s), -1, out _, out _, out var error);
    GLib.Error.ThrowOnError(error);
    return handle;
 }

What do you think?

@cameronwhite
Copy link
Contributor Author

I think one oddity is that in parameters would now require having an OwnedHandle passed in, which is a bit weird since they just need a valid string and don't care about whether the handle owns its pointer?
In practice, the generated code is always calling OwnedPlatformStringHandle.Create() so it doesn't cause much of an issue.
But I think the ConvertToString() implementation for UnownedPlatformStringHandle would run into the problem of needing to "cast" itself to an OwnedHandle when calling FilenameToUtf8()?

@badcel
Copy link
Member

badcel commented Feb 22, 2023

I see your point: From C perspective the owned / unowned information is irrelevant. But if PInvoke calls the C function there will always be only a pointer. So we do not pass this information for the sake of C but to explain the GObject requirements for the call. In this case the requirement is: Please provide a pointer to a string, which is owned by the caller as the string will not be freed by the function.

With the current design we could produce memory leaks as we could pass unowned strings to functions which do not transfer ownership (e.g. by manually binding some API in the wrong way as it requires checking the documentation to do the right thing).

With the new design we sattisfy both needs:

  • Ensure that the nullablity requirements are met as it is not possible to pass a null pointer to a non nullable function
  • Ensure that ownership requirements are met as it is not possible to pass unowned strings to functions which do not take ownership and vice versa.

Do you agree or do I have some error in my thoughts?

@badcel
Copy link
Member

badcel commented Feb 22, 2023

Hm you are right regarding your UnownedPlatformStringHandle.ConvertToString hint.

What about a small workaround (redeclaring the g_filename_to_utf8 function):

[DllImport(ImportResolver.Library, EntryPoint = "g_filename_to_utf8")]
private static extern GLib.Internal.NonNullableOwnedUtf8StringHandle FilenameToUtf8(IntPtr ptr, long len, out nuint bytesRead, out nuint bytesWritten, out GLib.Internal.ErrorOwnedHandle error);

public string? ConvertToString()
{
    if (IsInvalid)
        return null;

    NonNullableOwnedUtf8StringHandle result = FilenameToUtf8(DangerousGetHandle(), -1, out _, out _, out var error);
    GLib.Error.ThrowOnError(error);
    return result.ConvertToString();
}

Edit: Ah perhaps I missed something? If there is a function which does not transfer ownership it actually does not care if an owned or unowned pointer is provided right? As both pointers have an owner. In case of the "owned pointer" it's us, in case of the "unowned pointer" it's someone else. But C does not care if the function is called who is the owner.

@badcel
Copy link
Member

badcel commented Feb 22, 2023

Hm I think it could work if there is some casting possible from "owned and freed by C#" to "unowned by C#, meaning it will not be freed anymore by C#". But I think this would be another pull request which must be evaluated carefully.

I think I would merge your PR in the first step do you agree?

And if the implementation is complete we / I can evaluate if it can be improved. I will collect thoughts in #811.

@cameronwhite
Copy link
Contributor Author

Yeah I think using #811 to figure out any improvements sounds good 👍

If there is a function which does not transfer ownership it actually does not care if an owned or unowned pointer is provided right? As both pointers have an owner. In case of the "owned pointer" it's us, in case of the "unowned pointer" it's someone else. But C does not care if the function is called who is the owner

That's correct, the C function doesn't care whether we own the string ourselves or if we got the string from some other C function that returned us an UnownedHandle, so allowing either would be more consistent with the C side of things
I don't think there is any danger of memory leaks here with either proposed API

The manual binding of FilenameTo/FromUtf8 might be the better approach for avoiding any awkwardness in the internal implementation of the handles, and then we can just worry about what the more public API is?

@badcel
Copy link
Member

badcel commented Feb 23, 2023

The manual binding of FilenameTo/FromUtf8 might be the better approach for avoiding any awkwardness in the internal implementation of the handles, and then we can just worry about what the more public API is?

you mean you add private FilenameTo/FromUtf8 functions into your current handle implementations which does only need IntPtr parameter to avoid conversion? I think this would be a good idea and then we merge this PR.

- UTF8 string return types cannot use the default marshalling, since the wrong memory allocator can be used (bug #791).
- Filename parameters and return values are manually marshalled since they need to be converted to and from the filename encoding using `g_filename_from_utf8()`
- UTF8 string parameters are also manually marshalled for better control over their lifetime. The default marshalling almost works, but for the `girtest_string_tester_utf8_return_transfer_none()` test case the default marshalled string would be freed before the return value can be used.
- Don't support callback return values that are a string with transfer=none. This doesn't work well since we need to marshal and allocate a new UTF-8 string, which would then have no owner.
@cameronwhite
Copy link
Contributor Author

Yep, I've added that in 👍

@badcel badcel merged commit 205c345 into main Feb 23, 2023
@badcel badcel deleted the fix/string-filename branch February 23, 2023 18:48
cameronwhite added a commit that referenced this pull request Feb 25, 2023
This is largely just a refactoring to make the manual bindings consistent with the generated bindings after #803, using the new string handle classes

For string return values this adds some more type safety over IntPtr, and fixes a couple nullable warnings. `FontOptions.Variations` should have actually been a nullable string, while `ToyFontFace.Family` is correct to be a non-nullable string.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants