Access Control for GraphQL, Part 1
GraphQL is an emerging open-source API standards project that front-end developers love because it puts them in control. Developers are no longer restricted to a fixed set of API methods and URI patterns but instead get to customize their queries in whichever ways best suit their applications. Because of this added control, and because of other benefits around non-breaking version upgrades and performance optimizations, GraphQL is on its way to becoming omnipresent among web APIs.
But security and access control are often not top of mind for API developers. Since GraphQL adoption is an emerging trend (it originated with Facebook in 2012 and moved under the Linux Foundation umbrella in late 2018), well-established practices are not yet available on how to apply access control to a GraphQL API. This multi-part blog series is going to take us through these challenges and provide some guidance to help ensure any new or existing GraphQL deployment is well protected.
GraphQL Disrupts Existing Access Control Infrastructure
Access control for web APIs has been shaped by the most common API style of the last decade: representational state transfer (REST). A fundamental convention of REST is that resources are uniquely identified by HTTP URIs. This predictability fostered a generation of access control methodologies in which rules are associated with the URI (resource) being accessed, or at least with the pattern of that URI. Often, access control rules are based on a combination of the HTTP verb (GET/PUT/POST/DELETE) and the HTTP URI (the resource identifier) pattern. Identifying which data is being accessed through the URI means that rules can be applied without visibility into (and, most importantly, without the ability to understand) the payload of these API transactions. This has been particularly practical for middleware security solutions that enforce access control rules decoupled from the web API implementations themselves, either by sitting in front of them (e.g., gateways) or by acting as agents (e.g., service filters).
GraphQL is not really a substitute for REST, and both API styles will continue to co-exist. API providers should pick a style of API best suited for each new set of requirements. Nevertheless, GraphQL is an increasingly common choice and its popularity is threatening to disrupt a decade of web API access control infrastructure. That disruption is due to one major divergence from the popular REST pattern: GraphQL requests do not identify the data being accessed via the HTTP URI. Rather, GraphQL identifies the data requested using its own query language, typically embedded inside an HTTP POST body. In fact, in a GraphQL API, all resources are accessed through a single URI (e.g., /graphql). Existing web API access control systems and infrastructure often are not designed for this type of API traffic.
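To make the contrast concrete, here is a minimal sketch of the kind of verb-plus-URI rule table a gateway might apply. The rule format, paths, and scope names are hypothetical and chosen only for illustration; they are not taken from any particular product.

```javascript
// Hypothetical rule table: each rule pairs an HTTP verb and a URI pattern
// with the scope a caller must hold. Names and paths are illustrative only.
const rules = [
  { verb: "GET", pattern: /^\/accounts\/[^/]+$/, scope: "accounts:read" },
  { verb: "POST", pattern: /^\/transfers$/, scope: "transfers:write" },
  { verb: "POST", pattern: /^\/graphql$/, scope: "graphql:access" },
];

// Return the scope required for a request, or null when no rule matches.
function requiredScope(verb, path) {
  const rule = rules.find((r) => r.verb === verb && r.pattern.test(path));
  return rule ? rule.scope : null;
}

// REST: the verb and URI alone reveal which resource is being touched.
console.log(requiredScope("GET", "/accounts/1234")); // accounts:read
console.log(requiredScope("POST", "/transfers")); // transfers:write

// GraphQL: reading accounts and creating a transfer look identical at the
// HTTP level, so the same coarse rule applies to both.
console.log(requiredScope("POST", "/graphql")); // graphql:access
```

With only the verb and URI to go on, the last rule cannot tell a harmless query from one that reaches into another user's data; that decision has to be made somewhere that can see the query itself.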
A Banking API Example
Let’s start by looking at GraphQL access control through the lens of an open banking-inspired GraphQL example. Here is what a GraphQL request/response looks like for our example API:
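As a rough sketch of what such an exchange can look like, the snippet below builds a query request and shows the general shape of a response. Only the accounts query and the owner attribute are mentioned in this post; the other field names and all values are assumptions made up for illustration.

```javascript
// Build the HTTP request a client might send to a GraphQL endpoint.
// Field names other than `owner` are illustrative assumptions.
function buildAccountsRequest() {
  const query = `
    query {
      accounts {
        id
        owner
        balance
      }
    }`;
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  };
}

// General shape of a successful response: results are nested under "data"
// and mirror the structure of the query. The values here are made up.
const exampleResponse = {
  data: {
    accounts: [{ id: "acc-1", owner: "alice", balance: 1250.0 }],
  },
};
```

Removing `balance` from the query string would remove it from each returned account as well, which is the sense in which the client shapes the response.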
Changing the query parameters affects the data returned. You can try this example yourself, including the access control strategy we are about to describe, by running it in your environment. Everything you need is packaged in this public repo: https://github.com/flascelles/graphql-access-control (skip to the section labeled “Running the GraphQL server sample”).
Developers consuming this banking API declare in their query what data they want the GraphQL API to return. A complete list of attributes is available to developers via the automatically generated interactive doc. Putting front-end developers in control of what data they get back is great, but that control has limits: it should never override a privacy requirement that the API is responsible for. After all, I don’t want my personal banking information to be made available to other users just because a front-end developer decided it was a good idea to allow it.
Obviously, some sort of user authentication should be built into the application, and the GraphQL implementation should only return banking data that belongs to the user associated with the request incoming from that application. In this first part of the GraphQL access control blog series, we examine how to implement this level of access control inside the GraphQL implementation.
Implementing Data-level Access Control inside a GraphQL Implementation
Our GraphQL implementation uses Apollo Server running on top of Express.js, a Node.js web application framework that is easily installed via the Node package manager (npm). This starting point is the same as the one described in the Apollo Server getting started guide.
This post (and its associated public repo) focuses on the server-side GraphQL API. Our GraphQL API expects the API client (the requesting application) to include an OAuth token that was issued through an OAuth handshake. There is nothing GraphQL-specific about how the client-side app gets a token in the first place; thankfully, that is one aspect of access control that is not disrupted by GraphQL. I use PingFederate in my environment. If you want more information about how to get tokens from PingFederate using OAuth, you’ll find a great resource in this developer guide.
On the receiving end, our banking GraphQL API receives the OAuth token in the HTTP Authorization header. The GraphQL implementation needs to read this token to apply our access control rule as it processes requests. To this end, we populate the ApolloServer context with the incoming request (Node.js code):
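A minimal sketch of such a context function follows. It keeps the incoming request on the context, as described above, and additionally pulls out the bearer token for convenience; that extra step and the function name `buildContext` are assumptions of this sketch rather than a description of the repo's code.

```javascript
// Context factory in the shape Apollo Server accepts for its `context`
// option when running on Express: it receives an object holding the request.
function buildContext({ req }) {
  // Node.js lowercases incoming header names.
  const header = req.headers.authorization || "";
  // Expect "Bearer <token>"; anything else leaves the token null.
  const match = /^Bearer\s+(.+)$/i.exec(header);
  return { req, token: match ? match[1] : null };
}

// Wiring (not executed here; assumes Apollo Server is installed and that
// typeDefs and resolvers are defined elsewhere):
//   const server = new ApolloServer({ typeDefs, resolvers, context: buildContext });
```

Because the context is rebuilt for every request, anything stored on it is visible to every resolver that runs for that request and to no other.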
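Later, we will introspect the incoming token. This allows the GraphQL implementation to achieve two things: confirm that the token is still valid, and identify the user on whose behalf the request is being made.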
A GraphQL implementation could get this information by decoding the token itself if it were in a JWT format, but in our example, we call PingFederate’s introspection endpoint. This is what the token introspection API call looks like:
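As a sketch, an introspection request in the style of RFC 7662 can be assembled as below. The host name, client ID, and client secret are placeholders, the endpoint path is an assumption about a typical PingFederate deployment, and HTTP Basic client authentication is only one of the options a server may require.

```javascript
// Build an RFC 7662 token introspection request.
// The URL, client ID, and secret passed in below are placeholders.
function buildIntrospectionRequest(token, { url, clientId, clientSecret }) {
  const credentials = Buffer.from(`${clientId}:${clientSecret}`).toString("base64");
  return {
    url,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/x-www-form-urlencoded",
        Authorization: `Basic ${credentials}`,
      },
      // The token to inspect travels as a form-encoded body parameter.
      body: new URLSearchParams({ token }).toString(),
    },
  };
}

const request = buildIntrospectionRequest("abc123", {
  url: "https://pingfederate.example.com/as/introspect.oauth2", // assumed path
  clientId: "graphql-api",
  clientSecret: "change-me",
});
console.log(request.options.body); // token=abc123

// Illustrative response body for a valid token, limited to the two claims
// discussed in this post; a real response may carry more.
const exampleIntrospectionResponse = { active: true, Username: "alice" };
```

The resulting `url` and `options` can be handed to any HTTP client; for an expired or revoked token, RFC 7662 specifies that the server answers with `active` set to false.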
Notice the JSON response contains two claims, named ‘active’ and ‘Username.’ These are key to the GraphQL implementation achieving the two above-mentioned objectives.
Inside our GraphQL implementation, our banking data includes account records that have an ‘owner’ attribute:
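For illustration, such records might look like the following. Only the owner attribute comes from this post; every other field name and value is invented for the sketch.

```javascript
// Hypothetical in-memory records. Only the `owner` attribute comes from the
// post; every other field and value is invented for illustration.
const accounts = [
  { id: "acc-1", owner: "alice", type: "checking", balance: 1250.0 },
  { id: "acc-2", owner: "alice", type: "savings", balance: 8000.0 },
  { id: "acc-3", owner: "bob", type: "checking", balance: 310.5 },
];

// Group account IDs by owner to show how the attribute partitions the data.
const idsByOwner = {};
for (const account of accounts) {
  idsByOwner[account.owner] = idsByOwner[account.owner] || [];
  idsByOwner[account.owner].push(account.id);
}
console.log(idsByOwner);
```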
This owner attribute can be matched against the Username claim coming back from our token introspection. We therefore isolate this value and store it in the ApolloServer context. Here’s the introspection callout and the subject capture in our Node.js code:
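One way to sketch this step is shown below. The names `getSubject` and `extractBearerToken` are hypothetical, and `introspect` stands in for whatever function actually calls the introspection endpoint and returns its parsed JSON response; caching the result on the context is an assumption of this sketch.

```javascript
// Resolve the subject (user) for a request, introspecting at most once.
// `introspect` is a hypothetical async function: token -> parsed JSON claims.
async function getSubject(context, introspect) {
  // Reuse the value captured earlier in the same request, if any.
  if (context.subject !== undefined) return context.subject;

  const token = extractBearerToken(context.req);
  if (!token) throw new Error("Missing bearer token");

  const claims = await introspect(token);
  if (!claims.active) throw new Error("Token is not active");
  if (!claims.Username) throw new Error("Token has no Username claim");

  // Capture the subject on the context so later lookups skip introspection.
  context.subject = claims.Username;
  return context.subject;
}

// Pull the token out of an "Authorization: Bearer <token>" header.
function extractBearerToken(req) {
  const header = (req.headers && req.headers.authorization) || "";
  const match = /^Bearer\s+(.+)$/i.exec(header);
  return match ? match[1] : null;
}
```

Rejecting the request when `active` is false or the claim is missing keeps an invalid token from ever being matched against an owner.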
GraphQL queries are processed through GraphQL resolvers. If you look back at the ApolloServer instantiation, you may notice the ‘resolvers’ argument. This object is responsible for tying it all together. For both our accounts and transfers queries, the resolver looks up the subject (which may trigger an introspection call) and matches it against the owner attribute of each record in the raw data that it iterates through.
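The filtering pattern described here can be sketched as follows. The sample records, the `makeResolvers` factory, and the injected `getSubject` function are hypothetical stand-ins; only the accounts and transfers queries and the owner attribute come from this post.

```javascript
// Hypothetical data; only the `owner` attribute comes from the post.
const accounts = [
  { id: "acc-1", owner: "alice" },
  { id: "acc-2", owner: "bob" },
];
const transfers = [
  { id: "tr-1", owner: "alice", amount: 40 },
  { id: "tr-2", owner: "bob", amount: 75 },
];

// Build a resolver map in the shape Apollo Server expects. `getSubject` is
// an injected async function that returns the user for a request context.
function makeResolvers(getSubject) {
  // Keep only the records whose owner matches the requesting user.
  const ownedBy = (records, subject) => records.filter((r) => r.owner === subject);
  return {
    Query: {
      // Resolver arguments are (parent, args, context, info).
      accounts: async (_parent, _args, context) =>
        ownedBy(accounts, await getSubject(context)),
      transfers: async (_parent, _args, context) =>
        ownedBy(transfers, await getSubject(context)),
    },
  };
}
```

Whatever fields the client asks for, the resolver never hands back a record owned by someone else, so the front-end developer's freedom to shape the query stays inside the user's own data.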
This simple example illustrates the pattern of incorporating basic access control in your GraphQL implementation. A real-world implementation would of course not traverse the entire data set on each request, and many potential optimizations are omitted on purpose here. Please reach out to share your own implementation experience or collaborate directly via the public repo on GitHub.
Next Step: Delegated User Privacy Enforcement
In the next blog entry of the GraphQL Access Control series, we will step out of our GraphQL implementation and look at leveraging PingDataGovernance to enforce user privacy settings on behalf of the GraphQL API. Stay tuned!