Self-Review Questionnaire: Security and Privacy

1. Introduction

Adding features to the web is a tricky thing; on the one hand, we want to provide developers with access to all the things they need in order to build amazing experiences. On the other, we need to ensure that we don’t accidentally hand over too much power to malicious folks who could abuse it, or unintentionally expose people’s private data without adequate controls. Ideally, careful review of every specification we publish will allow us to strike the right balance.

Working groups can (and should) begin this review process early, of course. It’s far easier to mitigate risks to users on the web before a feature is finalized and shipped in user agents. Changing APIs or introducing restrictions becomes nigh impossible once the web begins to depend on a particular implementation.

This document encourages early review by posing a number of questions that you, as an individual reader of a specification, can ask, and that working groups and spec editors might consider themselves before asking for more formal review. The intent is to highlight areas which have historically had interesting implications for a user’s security or privacy, and thereby to focus the attention of editors, working groups, and reviewers on areas that might previously have been overlooked.

Note: Answering these questions obviously doesn’t constitute "wide review" in and of itself, but could provide a helpful basis of understanding upon which future reviewers can build.

2. Threat Models

"Security" and "Privacy" are big concepts. In order to pare them down to something which could feasibly guide working groups' decisions, let’s consider the types of threats to both which the web makes possible:

2.1. Passive Network Attackers

A passive network attacker has read-access to the bits going over the wire between users and the servers they’re communicating with. She can’t modify the bytes, but she can collect and analyze them.

Due to the decentralized nature of the internet, and the general level of interest in user activity, it’s reasonable to assume that practically every unencrypted bit that’s bouncing around the network of proxies, routers, and servers you’re using right now is being read by someone. It’s equally likely that some of these attackers are doing their best to understand the encrypted bits as well (though that requires significantly more effort).

The IETF’s "Pervasive Monitoring Is an Attack" document [RFC7258] is useful reading, outlining some of the impacts on privacy that this assumption entails.
Governments aren’t the only concern; your local coffee shop is likely to be gathering information on its customers, and your ISP at home is likely to be doing the same.

2.2. Active Network Attackers

An active network attacker has both read- and write-access to the bits going over the wire between users and the servers they’re communicating with. She can collect and analyze data, but also modify it in-flight, injecting and manipulating JavaScript and HTML at will. This is more common than you might expect, for both benign and malicious purposes:

ISPs and caching proxies regularly cache and compress images before delivering them to users in an effort to reduce data usage. This can be especially useful for users on low-bandwidth, high-latency devices like phones.
ISPs also regularly inject JavaScript [COMCAST] and other identifiers [VERIZON] for less benign purposes.
If your ISP is willing to modify substantial amounts of traffic flowing through it for profit, it’s difficult to believe that state-level attackers will remain passive.

2.3. Same-Origin Policy Violations

The same-origin policy is the cornerstone of security on the web; one origin should not have direct access to another origin’s data (the policy is more formally defined in Section 3 of [RFC6454]). A corollary to this policy is that an origin should not have direct access to data that isn’t associated with any origin: the contents of a user’s hard drive, for instance. Various kinds of attacks bypass this protection in one way or another. For example:

Cross-site scripting attacks involve an attacker tricking an origin into executing attacker-controlled code in the context of a target origin.
Cross-site request forgery attacks trick user agents into exerting a user’s ambient authority on sites where they’ve logged in by submitting requests on their behalf.
Data leakage occurs when bits of information are inadvertently made available cross-origin, either explicitly via CORS headers [CORS], or implicitly, via side-channel attacks like [TIMING].
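
To make the notion of "origin" in this section concrete, the following is a minimal sketch of an origin comparison, assuming the simplified (scheme, host, port) tuple described in [RFC6454]. The names `originOf` and `isSameOrigin` are illustrative, not taken from any specification, and the sketch only fills in default ports for `http:` and `https:`.

```typescript
// Simplified origin tuple in the spirit of RFC 6454: (scheme, host, port).
type OriginTuple = { scheme: string; host: string; port: string };

// Default ports for the two schemes this sketch handles explicitly.
const DEFAULT_PORTS: Record<string, string> = { "http:": "80", "https:": "443" };

function originOf(input: string): OriginTuple {
  const url = new URL(input);
  // The URL parser leaves `port` empty when the scheme's default port is
  // used, so fill it in to make the tuple explicit.
  const port = url.port !== "" ? url.port : (DEFAULT_PORTS[url.protocol] ?? "");
  return { scheme: url.protocol, host: url.hostname, port };
}

function isSameOrigin(a: string, b: string): boolean {
  const x = originOf(a);
  const y = originOf(b);
  return x.scheme === y.scheme && x.host === y.host && x.port === y.port;
}

// Same scheme, host, and (default) port: same origin.
console.log(isSameOrigin("https://example.com/a", "https://example.com:443/b")); // true
// A different scheme or host is a different origin.
console.log(isSameOrigin("https://example.com/", "http://example.com/"));        // false
console.log(isSameOrigin("https://example.com/", "https://api.example.com/"));   // false
```

Note that the path plays no part in the comparison; two documents on the same origin share access to each other's data regardless of where on the site they live.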

2.4. Third-Party Tracking

Flesh this out. <https://github.com/w3ctag/security-questionnaire/issues/7>

3. Questions to Consider

3.1. Does this specification deal with personally-identifiable information?

Personally-identifiable information (PII) includes a large swath of data which could be used on its own, or in combination with other information, to identify a single person. The exact definition of what’s considered PII varies from jurisdiction to jurisdiction, but could certainly include things like a home address, an email address, birthdates, usernames, fingerprints, etc. Wikipedia has a fairly good description at [PII].

If the specification under consideration exposes PII to the web, it’s important to consider ways to mitigate the obvious impacts. For instance:

A feature which uses biometric data (fingerprints or retina scans) could refuse to expose the raw data to the web, instead using the raw data only to unlock some origin-specific and ephemeral secret and transmitting that secret instead.
User mediation could be required, in order to ensure that no data is exposed without a user’s explicit choice (and hopefully understanding).

3.2. Does this specification deal with high-value data?

Data which isn’t personally-identifiable can still be quite valuable. Sign-in credentials (like username/password pairs, or OAuth refresh tokens) can be extremely powerful in the wrong hands, as can financial instruments like credit card data. Making this data available to JavaScript, for instance, could expose it to XSS attacks and active network attackers who could inject code to read and exfiltrate the data. For instance:

Credential Management [CREDENTIAL-MANAGEMENT] allows sites to request a user’s credentials from a user agent’s password manager in order to sign the user in quickly and easily. This opens the door for abuse, as a single XSS could expose user data trivially to JavaScript. The specification mitigates the risk by only offering the username and password as an opaque FormData object which cannot be directly read by JavaScript, and strongly suggests that authors use Content Security Policy [CSP] with reasonable connect-src and form-action values to further mitigate the risk of exfiltration.
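
As an illustration of what "reasonable connect-src and form-action values" might look like, the sketch below assembles a Content-Security-Policy header value. The helper name `buildCredentialCsp` and the host `https://login.example.com` are assumptions made for this example; the right source lists depend entirely on the site in question.

```typescript
// Builds a Content-Security-Policy header value that restricts where
// script-initiated connections (connect-src) and form submissions
// (form-action) may be sent.
function buildCredentialCsp(connectSources: string[], formActions: string[]): string {
  const directives: string[][] = [
    ["connect-src", ...connectSources],
    ["form-action", ...formActions],
  ];
  return directives.map((d) => d.join(" ")).join("; ");
}

// Hypothetical policy: connections only to the page's own origin, forms
// only to the page's own origin or a dedicated sign-in host.
const header = buildCredentialCsp(["'self'"], ["'self'", "https://login.example.com"]);
console.log(header);
// connect-src 'self'; form-action 'self' https://login.example.com
```

With a policy along these lines, injected script that obtains a credential has fewer places to send it, which is the exfiltration risk the paragraph above describes.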

3.3. Does this specification introduce new state for an origin that persists across browsing sessions?

For example:

Service Workers [SERVICE-WORKERS] intercept all requests made by an origin, allowing sites to function perfectly even when offline. A maliciously-injected service worker, however, would be devastating (as documented in that spec’s security considerations section). The specification mitigates the risks that an active network attacker or XSS vulnerability presents by requiring an encrypted and authenticated connection in order to register a service worker.
Platform-specific DRM implementations might expose origin-specific information in order to help identify users and determine whether they ought to be granted access to a specific piece of media. These kinds of identifiers should be carefully evaluated to determine how abuse can be mitigated; identifiers which a user cannot easily change are very valuable from a tracking perspective, and protecting the identifiers from an active network attacker is an important concern.
Cookies, ETag, Last-Modified, Local Storage, Indexed DB, etc. all allow an origin to store information about a user, and retrieve it later, directly or indirectly. User agents mitigate the risk that these kinds of storage mechanisms will form a persistent identifier by offering users the ability to wipe out the data contained in these types of storage.

3.4. Does this specification expose persistent, cross-origin state to the web?

For example:

The GL_RENDERER string exposed by some WebGL implementations improves performance in some kinds of applications, but does so at the cost of adding persistent state to a user’s fingerprint. These kinds of device-level details should be carefully weighed to ensure that the costs are outweighed by the benefits.
The NavigatorPlugins list exposed via the DOM practically never changes for most users. Some user agents have taken steps to reduce the entropy introduced by disallowing direct enumeration of the plugin list.
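
One rough way to weigh the fingerprinting cost of a piece of exposed state is to estimate how many bits of identifying information it contributes. The sketch below uses the standard surprisal formula, -log2(p), where p is the share of users who present the same value. The shares used here are invented purely for illustration, and treating attributes as independent (so that their bits simply add) is a simplifying assumption that real-world data often violates.

```typescript
// Surprisal, in bits, of observing a value shared by `share` of users
// (0 < share <= 1). Rarer values identify a user more precisely.
function surprisalBits(share: number): number {
  if (!(share > 0 && share <= 1)) {
    throw new RangeError("share must be in (0, 1]");
  }
  return -Math.log2(share);
}

// Hypothetical shares, not measurements.
const rendererBits = surprisalBits(1 / 64);     // about 6 bits
const pluginListBits = surprisalBits(1 / 1024); // about 10 bits

// Under the independence assumption, the contributions add up.
const combinedBits = rendererBits + pluginListBits; // about 16 bits
console.log(combinedBits.toFixed(1));
```

A value that every user shares contributes zero bits, which is why reducing the variety of exposed values (for example, by disallowing enumeration) lowers the cost.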

3.5. Does this specification expose any other data to an origin that it doesn’t currently have access to?

As noted above in §2.3 Same-Origin Policy Violations, the same-origin policy is an important security barrier that new features need to carefully consider. If a specification exposes details about another origin’s state, or allows POST or GET requests to be made to another origin, the consequences can be severe.

Content Security Policy [CSP] unintentionally exposed redirect targets cross-origin by allowing one origin to infer details about another origin through violation reports (see [HOMAKOV]). The working group eventually mitigated the risk by reducing a policy’s granularity after a redirect.
Beacon [BEACON] allows an origin to send POST requests to an endpoint on another origin. The working group decided that this feature didn’t add any new attack surface above and beyond what normal form submission entails, so no extra mitigation was necessary.

3.6. Does this specification enable new script execution/loading mechanisms?

HTML Imports [HTML-IMPORTS] create a new script-loading mechanism, using link rather than script, which might be easy to overlook when evaluating an application’s attack surface. The working group noted this risk, and ensured that the specification required reasonable interactions with Content Security Policy’s script-src directive.
New string-to-script mechanism? (e.g. `eval()` or `setTimeout([string], ...)`)
What about style?

3.7. Does this specification allow an origin access to a user’s location?

A user’s location is highly-desirable information for a variety of use cases. It is also, understandably, information which many users are reluctant to share, as it can be both highly identifying, and potentially creepy. New features which make use of geolocation information, or which expose it to the web in new ways should carefully consider the ways in which the risks of unfettered access to a user’s location could be mitigated. For instance:

Geolocation information can serve many use cases at a much less granular precision than the user agent can offer. For instance, a restaurant recommendation can be generated by asking for a user’s city-level location rather than a position accurate to the centimeter.
A recent Geofencing proposal [GEOFENCING] ties itself to service workers and therefore to encrypted and authenticated origins.
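
The idea of offering location at reduced precision can be sketched as simple rounding of coordinates. The `Position` shape and the `coarsen` helper below are assumptions made for this example rather than part of any geolocation API, and plain rounding is only the most basic form of coarsening.

```typescript
interface Position {
  latitude: number;
  longitude: number;
}

// Rounds coordinates to a fixed number of decimal places. Fewer decimals
// mean a coarser position: one decimal place corresponds to roughly 11 km
// of latitude.
function coarsen(pos: Position, decimals: number): Position {
  const factor = Math.pow(10, decimals);
  return {
    latitude: Math.round(pos.latitude * factor) / factor,
    longitude: Math.round(pos.longitude * factor) / factor,
  };
}

const precise: Position = { latitude: 48.85837, longitude: 2.294481 };
const cityLevel = coarsen(precise, 1);
console.log(cityLevel); // { latitude: 48.9, longitude: 2.3 }
```

A feature that accepts a requested precision from the caller could apply this kind of reduction before any data reaches the page, so that a site asking only for city-level data never sees more.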

3.8. Does this specification allow an origin access to sensors on a user’s device?

TODO.

3.9. Does this specification allow an origin access to aspects of a user’s local computing environment?

(e.g. screen sizes, installed fonts, installed plugins, bluetooth or network interface identifiers)?

TODO.

3.10. Does this specification allow an origin access to other devices?

Specifically, it’s interesting whether or not this specification allows access to devices on a user’s local network that would be otherwise inaccessible to a web origin. In particular, connection via Bluetooth and USB should be carefully evaluated to avoid exposing devices to the web that aren’t created with the web in mind; doing so has security implications, as these devices may not be hardened against malicious input as well as they should be.

The Network Service Discovery API [DISCOVERY] recommends CORS preflights before granting access to a device, and requires user agents to involve the user with a permission request of some kind. The spec’s "Security and privacy considerations" section has more details.
Likewise, the Web Bluetooth specification [BLUETOOTH] has an extensive discussion of "Security and privacy considerations", which is worth reading as an example for similar work.

3.11. Does this specification allow an origin some measure of control over a user agent’s native UI?

(showing, hiding, or modifying certain details, especially if those details are relevant to security)?

TODO.

3.12. Does this specification expose temporary identifiers to the web?

(e.g. TLS features like Channel ID, session identifiers/tickets, etc)?

TODO.

3.13. Does this specification distinguish between behavior in first-party and third-party contexts?

Section 2.1 of [FIRST-PARTY-ONLY] defines "first-party" in line with existing browser behavior (Chrome and Firefox).

3.14. How should this specification work in the context of a user agent’s "incognito" mode?

Ideally, the feature would work in such a way that the website would not be able to determine that the user was in "incognito".
Less ideally, the feature wouldn’t work, but the website still wouldn’t be able to distinguish "incognito" from simply being denied permission to use the feature (for instance).
Least ideally, the feature wouldn’t exist at all in "incognito", which means that the user wouldn’t be exposing data, but the website can probably tell that the user is in that state.
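
The second option above, where the feature does not work but the site cannot tell why, can be sketched as follows. The `Context` fields and the `requestFeature` function are hypothetical stand-ins for user agent internals. The point is only that both refusal paths produce an identical result.

```typescript
type FeatureResult =
  | { status: "granted"; data: string }
  | { status: "denied" };

// Hypothetical inputs: whether the session is private, and whether the
// user granted permission.
interface Context {
  privateMode: boolean;
  userGranted: boolean;
}

function requestFeature(ctx: Context): FeatureResult {
  // In private mode the feature is unavailable, but the page sees exactly
  // the same result as an ordinary permission denial.
  if (ctx.privateMode || !ctx.userGranted) {
    return { status: "denied" };
  }
  return { status: "granted", data: "feature-data" };
}

const inPrivateMode = requestFeature({ privateMode: true, userGranted: true });
const userSaidNo = requestFeature({ privateMode: false, userGranted: false });
console.log(JSON.stringify(inPrivateMode) === JSON.stringify(userSaidNo)); // true
```

In practice the two paths would also need to be indistinguishable in timing and in any side effects, which a sketch like this does not address.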

3.15. Does this specification persist data to a user’s local device?

How should a user agent’s "Clear browsing data" functionality work with this data? Are there caches that the user agent needs to be particularly careful with?

3.16. Does this specification have a "Security Considerations" and "Privacy Considerations" section?

Interesting features added to the web platform generally have security and/or privacy impacts. Documenting the various concerns and potential abuses in "Security Considerations" and "Privacy Considerations" sections of a document is a good way to help implementers and web developers understand the risks that a feature presents, and to ensure that adequate mitigations are in place.

If it seems like a feature does not have security or privacy impacts, then say so inline in the spec section for that feature:

There are no known security or privacy impacts of this feature.

Saying so explicitly in the specification serves several purposes:

Shows that a spec author/editor has explicitly considered security and privacy when designing a feature.
Provides some sense of confidence that there are no such impacts.
Challenges security and privacy minded individuals to think of and find even the potential for such impacts.
Demonstrates the spec author/editor’s receptivity to feedback about such impacts.

3.17. Does this specification allow downgrading default security characteristics?

document.domain
[CORS]
[WEBMESSAGING]
referrer unsafe-always

4. Mitigation Strategies

4.1. Secure Contexts

In the presence of an active network attacker, offering a feature to an insecure origin is the same as offering that feature to every origin (as the attacker can inject frames and code at will). Requiring an encrypted and authenticated connection in order to use a feature can mitigate this kind of risk.
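
A minimal sketch of this kind of gating follows. It assumes, as a deliberate simplification, that an `https:` scheme is enough to treat the caller as secure; user agents apply a more detailed definition. Both function names are hypothetical.

```typescript
// Rough approximation only: treats https: origins as encrypted and
// authenticated, and everything else as insecure.
function looksSecure(origin: string): boolean {
  return new URL(origin).protocol === "https:";
}

// Hypothetical feature entry point that refuses insecure callers.
function usePowerfulFeature(callerOrigin: string): string {
  if (!looksSecure(callerOrigin)) {
    throw new Error("feature requires a secure context");
  }
  return "ok";
}

console.log(usePowerfulFeature("https://example.com")); // ok

try {
  usePowerfulFeature("http://example.com");
} catch (e) {
  console.log((e as Error).message); // feature requires a secure context
}
```

Refusing outright, rather than degrading quietly, keeps an active network attacker from obtaining the feature by injecting code into an unencrypted page.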

4.2. Explicit user mediation

If a feature has privacy or security impacts that are endemic to the feature itself, then one valid strategy for exposing it to the web is to require user mediation before granting an origin access. For instance, [GEOLOCATION-API] reveals a user’s location, and wouldn’t be particularly useful if it didn’t; user agents generally gate access to the feature on a permission prompt which the user may choose to accept.
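
The shape of such a gate can be sketched as below. The `askUser` callback stands in for whatever permission UI a user agent provides, and `mediatedRead` is a name invented for this example. A real prompt would be asynchronous; the sketch is synchronous only to stay short.

```typescript
type Decision = "allow" | "deny";

// Reads sensitive data only after the user has explicitly allowed it for
// the requesting origin. `read` is never invoked on denial.
function mediatedRead<T>(
  origin: string,
  askUser: (origin: string) => Decision,
  read: () => T,
): T | null {
  return askUser(origin) === "allow" ? read() : null;
}

// Hypothetical policy standing in for a user's choice at the prompt.
const userChoice = (origin: string): Decision =>
  origin === "https://maps.example.com" ? "allow" : "deny";

let reads = 0;
const readLocation = (): string => {
  reads += 1;
  return "somewhere";
};

console.log(mediatedRead("https://maps.example.com", userChoice, readLocation)); // somewhere
console.log(mediatedRead("https://ads.example.net", userChoice, readLocation));  // null
console.log(reads); // 1
```

The design point worth noting is that the sensitive read happens after the decision, not before: a denied origin never causes the data to be gathered in the first place.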

Designing such prompts is difficult. Choosers are good. Walls of text are bad.

Bring in some of felt@'s ideas here.

4.3. Drop the feature

One way to mitigate the risks that a feature presents is to remove it from a specification.

The easiest way to mitigate the potential negative security or privacy impacts of a feature, or even to avoid having to discuss them, is to drop the feature.

Every feature in a spec should be considered guilty (of harming security and/or privacy) until proven otherwise. Every specification should seek to be as small as possible, even if only for the reasons of reducing and minimizing security/privacy attack surface(s).

By doing so we can reduce the overall security (and privacy) attack surface of not only a particular feature, but of a module (related set of features), a specification, and the overall web platform.

Ideally this is one of many motivations to reduce each of those to the minimum viable:

Minimum viable feature: cut/drop values, options, or optional aspects.
Minimum viable web format/protocol/API: cut/drop a module, or even just one feature.
Minimum viable web platform: Cut/drop/obsolete entire specification(s).

Move Tantek’s thoughts somewhere. They don’t really fit well here, though the sentiment of a minimum viable web platform might fit into some other TAG finding.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:
