add permissions proposal #689

Closed
wants to merge 6 commits into from
87 changes: 87 additions & 0 deletions designdocs/permissions.md
@@ -0,0 +1,87 @@
# proposal for permissions request


### Philosophy
The user-agent will always be closer to the user than a spec. The user agent has greater access to current conditions than we (the spec writers) ever will. As a philosophy we should give the user agent *as much freedom as possible to “do the right thing”* under whatever circumstances are at the time a web application is started.
Contributor

This could be interpreted as meaning that we don't need privacy and security guidelines or related normative text.

I think it's possible for such text to be written in a way that allows applications to "do the right thing."

Author

Are you saying this statement gives the user agent too much freedom? How about "As a philosophy we should give the user agent as much freedom as possible to do the right thing under whatever circumstances are at the time a web application is started while at the same time allowing applications to create a smooth workflow for their users"

Contributor

Yes, it implies that such freedom is the priority over security and privacy. Specifically, "as much freedom as possible" could lead to this interpretation, but there should probably also be some explicit reference to being within the privacy and security guidelines/requirements.

nit: Although it's used in a reasonable way in isolation, the word "whatever" contributes to the laissez-faire tone of the statement. This can probably be fixed with: s/whatever circumstances are/the circumstances/


### scope

Currently we are having difficulties finding a solution that all parties can agree to. This is about the permissions for various properties in WebXR. This proposal addresses the following concerns

* what permissions are available to be asked for
* how granular are these permissions
* are the permissions granted at page load / application start
* or are the permissions granted at the time of use.

### stating concerns

Concerns about granting at page load time are
Contributor

I don't think "page load" is the right timeframe/term. Maybe something like "when starting the experience."

Such a term can cover explicit requests when the user clicks "Enter VR", requests resulting from requestSession(), and (in the future) seamless immersive navigation scenarios. It might even be worth mentioning those - or breaking them up if there are differences.

Author

I can change references to 'page load time' to 'request immersive session time'. Would that work for you?

Member

Actually, since we'd like it to be possible for a page to request a "local" or "local-floor" reference space for an inline session, this will need to apply more generically.

Author

what is the appropriate language here?

* users may be requested to approve multiple permissions one after another, creating click fatigue.
* if all apps request permissions at start, the user has no information to judge whether or not they should grant this, because they haven’t entered the experience yet.

Concerns about granting requests once already inside of immersive mode are:
* access requests when inside immersive mode should be done with some sort of a secure method, such as a dedicated hardware button, so that applications cannot spoof such permissions.
* some devices or scenarios may not be able to provide such a secure method, thus doing the request before entering immersive mode is preferable.

Contributor

Another critical concern is that this can change the privacy properties, which the user agent may have carefully explained to the user when creating the session. This could lead to user confusion, frustration, etc.

As an example, if the "AR Mode" entry interstitial says "the application has access" then the application is allowed to request more access - the user may be confused or not understand the implications of combining those levels of access. Browser permission UI is not generally sufficient to explain such interactions, so the initial interstitial is a unique opportunity.

/cc @johnpallett @NellWaliczek

Member

I'm still trying to read between the lines here and failing :-/

Are you saying that user agents MUST NOT request user consent both at the beginning and during a session? Even if the consent being requested mid-session is unrelated to the consent granted at the beginning?

I think the root of my confusion is that I'm struggling to differentiate between what you're saying Chrome is planning to do vs. what y'all are saying must be requirements on all user agents.

Member

(to be clear, I'm not taking a stance either way! I'm just trying to ensure I'm understanding your stance accurately)

Contributor

To clarify, my comments are general about what the spec (and this doc) say, which are related to general guidance and requirements that apply to all user agents.

I'm just raising the concern so that it can be considered and addressed. My comment was not intended to imply any requirements (i.e., "MUST NOT request..."). While that is one possible solution, there may be others.

FYI, I filed #702 for related issues.

Concerns about both systems include:
* the user agents know more about the user and the current scenario, and can make better decisions *at the time of use* than a spec can. Neither method gives the user agent freedom to adapt.
* there may be new attacks devised or new flaws discovered after we have shipped, but user agents will not be able to address these because they are constrained by the spec.

There are most likely more concerns than just what I have listed above.
Contributor

One goal of session-based consent (for non-inline sessions) is to make the scope of that consent clear to the user and to avoid persisting consent (i.e., "permissions" - see @klausw's comment below) that could be used in future non-WebXR visits to that origin. Both of these were driving factors behind "AR Mode" - see https://github.com/immersive-web/privacy-and-security/blob/master/EXPLAINER.md#augmented-reality-mode

Member

@NellWaliczek Jun 14, 2019

Yep, and I did read that when it was originally designed. But again, I'm confused about if "Instead of permissions, an alternative approach could be..." is intended as a requirement for all user agents or just an option.

Contributor

That repository's readme says, "The purpose of this repo is to explore those threat vectors and possible mitigations that may form the basis of the Privacy and Security Considerations sections for APIs related to the immersive web, as well as informing normative requirements for those APIs." The exact requirements for WebXR or AR modes are still TBD in the specifications in this and other repositories.

I believe that the AR Mode concept is critical to the experience and to developers knowing what to expect, so there would likely be normative language about it for all user agents. As a simple example, consent is session-based, which means users may need to consent each time an AR session is requested. The API design could still allow for configuration as discussed elsewhere in this PR.


Contributor

Another goal is that developers should generally know what to expect across user agents (i.e., what may cause a prompt and when) and that the user experience should be relatively consistent across user agents.

Member

Is there agreement on this point? From what I can tell, some UAs want to do just-in-time permissions but others want them upfront. As long as a single code path exists that will hit both, is there a specific reason UAs should be forced to conform?

Author

The point of this proposal is to give UAs freedom while giving developers only a single path to code.

Contributor

I think the specification needs to make it clear to developers when there might be consent prompts so they can avoid patterns that may cause undesirable results. It would also help with compatibility if there are fewer ways things can vary across implementations.

### Solution
To address these I propose the following: *do both*. Let the developers have a single code path where they /request permissions at application start/ *and* when /the permission is actually needed/. The first request is considered the upper bound of possible permissions. The user agent can decide whether to actually show user request dialogs at application start or when the permission is actually used, or some other scheme that the user agent deems to be better (such as warding off new attacks in the future).
Contributor

@johnpallett Jun 12, 2019

Thanks @joshmarinacci - during our previous conversation you'd suggested a more granular set of feature requests which I took to mean something like:

* Required - needed for the session to run at all.
* Optional - not blocking for the session to run, but desirable.
* Deferrable - not required at session creation, but might be required depending upon what the user does.

note: I made up those names because I couldn't remember exactly what you said, but some of this is covered in #424

The intent as I understood it was to give guidance to the user agent on the timing of when user consent events would occur, and whether or not a session should be created if the user said 'no' to certain features.

Do you want to add that detail here? Can a user agent make the decisions outlined in this paragraph without this type of information?

Contributor

> note: I made up those names

About terminology, have you considered using a term other than "permissions" since this is somewhat different from existing web permissions? Please ignore if you've discussed this already. Elsewhere we've used terms such as "consent" to distinguish it, especially if this is likely to be a per-session prompt instead of permanently granting access to a site.

Member

Oh interesting... that's not the interpretation I took... I thought by granular we meant something like "spatial-tracking", "real-world-understanding", "camera-access", "eye tracking", etc. The big question there is what are the buckets we should use? That said, if we do take the modularization approach, the good news is that we'd only need to define/ bikeshed the one that would cover XRPose and XRViewerPose data for now, yeah? Then, as we add other modules we can define the appropriate enum values.

The other interesting thing about this approach is that we have the opportunity to possibly integrate it with https://www.w3.org/TR/permissions/#enumdef-permissionname so that developers can request non-XR things up front as well! That would address the concerns about developers wanting to prompt for permissions for things like Microphone mid-immersive session :)

That said... I've been consistently getting the impression that folks from Google have specific concerns about the Permissions API given the number of times y'all have mentioned wanting to avoid the word "permissions" in favor of "user consent". Is there some background or context that I'm missing that makes it contentious?

Member

Actually, rereading @johnpallett's privacy doc, I'm realizing it's probably more like "spatial-tracking" and "spatial-tracking-unbounded" that would be needed for the core WebXR spec?

Contributor

> That said... I've been consistently getting the impression that folks from Google have specific concerns about the Permissions API given the number of times y'all have mentioned wanting to avoid the word "permissions" in favor of "user consent". Is there some background or context that I'm missing that makes it contentious?

Nell, the privacy explainer talks about some of the concerns and distinctions in the permissions section.

I wouldn't necessarily call it contentious, but I think it's helpful to make a distinction between the more abstract concept of "how do we get informed user consent for using these features" from a specific implementation method as in "let's use the browser's existing Permissions API support for this". While the permissions API spec is fairly general, the practical implementations in web browsers tend to be based around persistently saved per-site choices, and as the privacy doc explains this isn't necessarily a good fit to some of the AR/VR use cases. I think it's better to avoid the word "permissions" at least during the design phase to avoid conflating these two interpretations.

Personally I'm in favor of using the Permissions API where it makes sense, but AFAIK it hasn't commonly been used for use cases such as per-session temporary permissions. That seems compatible with the spec by slightly bending the "new information about the user’s intent" rule as applying when a session ends, but I think that's not what people would typically have in mind when talking about web permissions, and it may need additional implementation work in browsers to generalize internal permission handling to make this work.

Contributor

FWIW, https://w3c.github.io/encrypted-media/#dom-mediakeysrequirement is a similar tristate request for specific properties and https://w3c.github.io/encrypted-media/#navigator-extension-requestmediakeysystemaccess works to satisfy the various requested properties. This API was also intended to allow a single consent request that covered various properties and possible configurations. Something like that might be useful here, though it gets complicated.

I made a comment above similar to @klausw's. Please also see https://github.com/immersive-web/privacy-and-security/blob/master/EXPLAINER.md#augmented-reality-mode.

Author

I wasn't aware of the Permissions spec until now. Interesting. Would its query() function violate our desire to avoid fingerprinting? As long as query() is called after the initial session request would it still be okay?

Member

@ddorwin Thanks for the pointer to the MediaKeysRequirement stuff! Time to go fall down the rabbit hole of spelunking through a new spec!

Member

@NellWaliczek Jun 14, 2019

@joshmarinacci That's basically the question I'm asking. If it's ok for Navigator, would it be ok if there was an XRSession.Permissions.query()?

Contributor

In the privacy design doc the term 'permission' was intentionally avoided in part to avoid inheriting the algorithms and requirements of the Permission API, and also because 'permissions' are an established concept in browsers. It's still not clear to me whether using the term 'permissions' leaves sufficient room for user consent flows such as the screenshot from @blairmacintyre (a dialog with multiple toggles), or user consent that is valid only for the duration of a session (vs. causing a more persistent state change). For those reasons the privacy design doc instead uses the term 'user consent' to describe the requirement.

@joshmarinacci my original question got a bit lost in this thread. Did you intend to have 'required', 'optional' and 'maybe later' as three different options when requesting features?


### Details
From the developer’s perspective it would work like this.

The developer specifies the total permissions they are likely to need during the entire run of the application when requesting the XR Session, using an array with pre-defined permissions constants. Something like this:

``` javascript
const perms = [
    MICROPHONE,
    IMMERSIVE_MODE,
    HEADSET_POSE,
    // CAMERA_DATA, developer decides not to use camera
]
navigator.xr.requestSession(perms).then(session => {
    // start up the rest of the app
})
```

At this point the user agent can either prompt the user for these permissions, or merely use them as an upper bound of permissions that could potentially be requested later.


Later, when the application needs to actually use a permission, say the microphone, the developer will request this access through a promise based API. Ex:

```javascript
navigator.mediaDevices.getUserMedia(opts).then(stream => {
    console.log("i have the microphone stream")
})
```

At this point the user agent can either prompt the user for the microphone permission, or immediately grant it if the user already approved it previously.
Member

One caveat to this is that the timing of the application matters, and that cannot be known by the user agent. For example: Let's assume I have two applications that both use some plane geometry and camera access. AppA is an app where you place virtual furniture, and can get away with only having plane access until the user says they want to save a picture. AppB is a tour guide app and both plane access and camera access are absolutely essential to its function.

If we assume that the UA is capable of doing just-in-time prompts, AppA works out pretty well, in that the app starts up and asks "Planes, please?" Then waits until the user clicks the "Save image" button at which point the user is asked "Camera access: Cool, or nah?" User understands the context, clicks "Yes", everyone is happy.

However, with AppB the user first immediately sees a prompt saying "Planes!" then, upon clicking yes or no, gets an immediate follow-up prompt that says "Camera!" and now we're just pissing them off by making them click through multiple layers of prompts because the UA wanted to do the "right thing" of deferring permission till the time it was needed.

As such, we probably want the developer to be able to specify a list of permissions that MUST be requested prior to session creation, as well as permissions they MAY use later (which could then be requested up front based on UA capabilities if needed.)

Member

It's funny.. This exact thing was something I used to be pretty worried about... But this is how the rest of the web works today, yeah? So I'm kinda less worried? A couple things to consider:

  • Even if we manage to do something like you've described for XR related permissions, the same problem can occur for non-XR permission prompts while in an immersive session. Who's to say a developer won't ask for microphone and then immediately ask for geolocation?
  • Alternatively, given that promises aren't resolved immediately, perhaps user agents have the option to be clever and also delay the display of a permission prompt until after control has returned to the user agent? In which case a developer invoking two permissions requests in the same function flow would cause that consolidation to occur, yes?
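
A rough sketch of the consolidation idea in the second bullet, under loud assumptions: `createPromptBatcher` and `showPrompt` are hypothetical names, not part of any spec, and a real user agent would do this internally rather than in script. It uses a microtask to collect every request made in the same synchronous run of script and then shows one combined prompt.

```javascript
// Hypothetical sketch: collect permission requests made in the same
// synchronous run of script and show them as one combined prompt.
// `showPrompt(names)` stands in for the UA's consent UI and returns a
// Promise resolving to the array of names the user approved.
function createPromptBatcher(showPrompt) {
  let pending = [];

  function flush() {
    const batch = pending;
    pending = [];
    // De-duplicate names so each feature appears once in the prompt.
    const names = [...new Set(batch.map((entry) => entry.name))];
    showPrompt(names).then(
      (approved) => {
        const ok = new Set(approved);
        for (const { name, resolve, reject } of batch) {
          if (ok.has(name)) resolve(name);
          else reject(new Error(`${name} denied`));
        }
      },
      // If the prompt itself fails, fail every request in the batch.
      (err) => batch.forEach(({ reject }) => reject(err))
    );
  }

  return function request(name) {
    return new Promise((resolve, reject) => {
      // Schedule one flush for the first request in this run of script.
      if (pending.length === 0) queueMicrotask(flush);
      pending.push({ name, resolve, reject });
    });
  };
}
```

Two calls such as `request("microphone")` and `request("geolocation")` made back to back would lead to a single `showPrompt` call. Requests separated by an `await` would still produce separate prompts, so this only helps the "same function flow" case.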

Author

Is this the same thing @johnpallett was asking about above? A way to specify if a particular permission is must vs may (required vs optional)?

Contributor

I'd actually asked whether there should be three options for feature requests: "required", "optional" and "later, not now" (the third one I don't have the right name for). @joshmarinacci this was based upon what I thought you described during the f2f.

The idea behind these three options, as I understood it, was:
* Required: I need this. Otherwise, fail to create the session.
* Optional: I don't necessarily need this, but please get consent now so I know what I have.
* Later, Not Now: I don't necessarily need this, and I don't need to know now. You can ask later, or now, either works for me.

That way the user agent has a sense of when to ask for user consent: Required at session creation, Optional at session creation (if the platform supports the feature), 'Later not now' either at session creation, or when the feature is actually used, the choice being at the discretion of the user agent.
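
To make the three tiers concrete, here is a minimal sketch of how a user agent might act on them. Everything in it is an assumption for illustration: the request shape (`required`, `optional`, `deferrable`), the `askUser` callback standing in for the consent UI, and the `canPromptInSession` flag are all hypothetical, and platform support checks for optional features are left out.

```javascript
// Hypothetical sketch: none of these names are part of WebXR.
// `askUser(features)` stands in for the user agent's consent UI and
// returns the subset of `features` the user approved.
function resolveSessionRequest(request, askUser, canPromptInSession) {
  const { required = [], optional = [], deferrable = [] } = request;

  // Required and optional features are asked about at session creation.
  // Deferrable features are asked now only if the UA cannot prompt later.
  const askNow = canPromptInSession
    ? [...required, ...optional]
    : [...required, ...optional, ...deferrable];

  const granted = new Set(askUser(askNow));

  // A missing required feature means the session is not created.
  if (!required.every((f) => granted.has(f))) {
    return { created: false, granted: [], pending: [] };
  }

  return {
    created: true,
    granted: askNow.filter((f) => granted.has(f)),
    // Features the UA may still prompt for when they are first used.
    pending: canPromptInSession ? [...deferrable] : [],
  };
}
```

The developer-facing call stays the same in both cases. Only `canPromptInSession`, which the user agent decides, changes when the "later, not now" features get asked about.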


Note that if the application did not include MICROPHONE in the initial list of permissions passed to `requestSession` then the user agent *must* reject any later requests to `getUserMedia()`.
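
The upper-bound rule can be sketched as a simple gate. This is an illustration only: `createPermissionGate`, `promptUser`, and the string names are hypothetical, and a real user agent would apply the check inside its own implementation of `getUserMedia()` rather than in page script.

```javascript
// Hypothetical sketch of the "initial list is an upper bound" rule.
function createPermissionGate(initialList) {
  const upperBound = new Set(initialList);
  const granted = new Set();

  return {
    // Called when the application actually needs a permission.
    // `promptUser(name)` stands in for the UA's consent UI and returns a boolean.
    request(name, promptUser) {
      if (!upperBound.has(name)) {
        // Never declared at session request time: reject without prompting.
        return Promise.reject(new Error(`${name} was not in the initial list`));
      }
      if (granted.has(name)) {
        return Promise.resolve(name); // already approved earlier
      }
      if (promptUser(name)) {
        granted.add(name);
        return Promise.resolve(name);
      }
      return Promise.reject(new Error(`${name} was denied by the user`));
    },
  };
}
```

With `createPermissionGate(["MICROPHONE", "HEADSET_POSE"])`, a later request for `"CAMERA_DATA"` is rejected without ever showing a prompt, while `"MICROPHONE"` prompts at most once.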
Contributor

This starts to affect other APIs and the permission infrastructure of implementations. That would probably require greater socialization and review. (The permissions experience on the web could definitely be better, and there are probably other efforts looking at that, but it's beyond the scope of WebXR.)


*The initial list of permissions is the maximum set of permissions the application may request throughout its lifetime.* This enables, say, a VR chat client to let the user enter a move-and-talk-only mode and later request camera access because the user wants to share something, without having to bug the user initially, provided the user agent supports immersive permission requests. On user agents that do not support immersive permission requests, for whatever reason (form factor, user prefs, domain name, phase of moon, etc.), the request for camera can be done at application start.
Contributor

"lifetime" is an important concept here - do you want to specifically say "duration of the session"?

If I understand it correctly, the application would be free to end the session and start a new one with a different set of desired permissions, while still preserving JS and graphics state. That has its own issues, i.e. a desktop VR setup may not offer an easy way to resume the session, but I think it's a potential alternative for developers. An app could have distinct modes where some features are only needed in specific circumstances, for example a VR sculpting app with optional copresence support?

Member

Yeah, this is totally a key thing we need to nail down.. From what I'm gathering there are 3 possibilities under consideration for the lifetime of a permission:

  • Session
  • Browsing context
  • Origin

The big question I have is... can this be something left to the user agent to decide?

Author

Since these permissions are requested when obtaining the session, I think they should be for the lifetime of the session.

Contributor

The privacy design doc suggests that for a given origin, user consent for a set of features should last as long as the browsing context. This means the same origin doesn't have to repeatedly ask for user consent for the same set of features during the same browsing session. However, if the user closes the browser and restarts, then user consent would be required.



### Advantages
This system has two advantages over other approaches.
* the developer has a single code path that will always be followed. They will not need to write conditional code to work around different user agent authentication systems.
Member

Developers will likely want a method for determining if permission for a feature has already been granted prior to calling a potentially-permission-producing method. This is to allow for the common pattern of displaying a quick in-app explanation regarding why the user is about to see a dialog just prior to displaying the platform-level prompt. If the permission has already been granted at session start, however, they would end up showing a message saying "Hey! You're about to see something that looks like this, and here's why you should click 'Yes'" followed by nothing, which appears broken to the user.
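
A sketch of that pattern, assuming something shaped like the existing Permissions API `query()` is available. The `permissions` argument is expected to behave like `navigator.permissions`. `requestWithExplanation`, `explain`, and `use` are hypothetical names, and whether any XR-specific permission names would be queryable is exactly what this thread has not settled.

```javascript
// Sketch: only show the in-app explanation when a platform prompt is
// actually expected. `permissions` is expected to behave like
// `navigator.permissions`; `explain` and `use` are hypothetical app callbacks.
async function requestWithExplanation(permissions, name, explain, use) {
  let state = "prompt"; // assume a prompt if the query is unsupported
  try {
    state = (await permissions.query({ name })).state;
  } catch (err) {
    // Some browsers reject names they do not recognize.
  }

  if (state === "denied") return false;
  if (state === "prompt") await explain(name);
  await use(name); // e.g. a call to getUserMedia()
  return true;
}
```

When the state is already `"granted"` the explanation is skipped, so the user never sees a "you're about to see a dialog" message followed by nothing.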

Member

Ah, I meant to put that other comment here... moving it....

If we do take the approach of integrating with https://www.w3.org/TR/permissions/#enumdef-permissionname then we can also take advantage of the https://www.w3.org/TR/permissions/#dom-permissions-query API? Which to be honest, is interesting to me that this is queryable since for some reason I thought there were concerns about websites being able to differentiate between a user rejecting a permission and some other random reason a feature might not be available? 🤔

Author

yes, I thought this was a fingerprinting issue? If not then yes, let's have a query.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think of a decent way to use the query API as a fingerprint, but I'm happy to be corrected if someone else can. I'm tentatively in favor of allowing the query (because it will enable more progressive permissions--yay!)

* this gives the user agent maximum flexibility to adapt behavior to the current situation.

User agents could implement many different behaviors which would be perfectly valid under this spec.

* request all permissions of the user at start, as a series of prompts
* collapse all permissions into a single prompt
* auto-approve certain permissions based on domain (ex: anything from mycompany.com is always approved)
* auto-approve certain permissions when others are requested. (ex: if dev asks for camera, we might as well give them microphone too)
* handle some permissions using a trusted immersive method such as an un-spoofable hardware button, or a sigil; but then use pre-immersive granting for certain permissions that are deemed extra dangerous.
* ignore permissions for a certain amount of time when set to kiosk mode


### Open questions:
* what should happen if an application wishes to request permissions that were not in the initial list (say a user creates an account and now needs more than the initial permissions)? We could advise the app to reload the page, forcing the user out of immersive mode.
* how do permissions transfer when jumping from one immersive page to another? Should they continue to be granted if on the same domain? Auto-shut off? Always leave immersive mode when requesting particular permissions that are extra dangerous?