GraphQL APIs backed by Neo4j, a native graph database, make it easy to model, query, and resolve complex relationships. That could be a boon for newsrooms, where usually nothing comes easily.
EDITOR’S NOTE: For many of the world’s online news publishers, name-brand commercial content management systems (CMS) are insufficient. Some news organizations have collaborated to build their own add-ons for systems like WordPress to make up for missing features. Others that lack the time or resources to collaborate end up changing their own operations just so they can use their CMS, and in the process change their own story frameworks. A few invest extraordinary sums to design and build their own all-inclusive systems, only to discover later that they must keep investing unsustainable amounts in upkeep, feature additions, and security. In the demonstration that follows, Neo4j’s William Lyon tackles these problems head-on by showing how a modern graph database uses relationships to establish contextual associations between news articles. Such an advancement could revolutionize both existing CMS platforms and home-grown newsroom management systems, giving them the versatility, adaptability, and performance that legacy CMS platforms typically lack.
GraphQL has emerged as a productive and powerful technology for building the API layer that sits between the client and the database. In this post we look at building a GraphQL API for news articles, with functionality for search, personalized recommendations, and data enrichment, using the Neo4j graph database and the Neo4j GraphQL Library. After reading this article, I hope you'll see how powerful thinking in graphs can be, especially when pairing a graph database with GraphQL, and that you'll walk away with some ideas for applying this technology to your own domain.
How GraphQL Makes Graph Databases Relevant For Full Stack Developers
Fundamentally, GraphQL is a query language for APIs and a runtime for executing those requests against a data layer. GraphQL uses a strict type system to define the data available in the API, how that data is connected, and which types of operations the API supports. Unlike REST APIs, which return a fixed set of attributes for each resource, GraphQL lets the client request exactly the data it needs, avoiding both overfetching and underfetching. GraphQL is often said to be self-documenting because of its introspection feature, which lets clients query the API schema itself, for example to generate documentation or to power developer tooling that relies on the GraphQL type system.
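To make the overfetching point concrete, here is a toy sketch in plain JavaScript. It is not how a real GraphQL executor works (a real server parses a query string and validates it against the schema); the hypothetical `selectFields` helper simply walks a nested selection and copies only the requested fields from a full record, so the response contains exactly what the client asked for.

```javascript
// Toy illustration only. A "selection" is a plain object in which `true` marks a
// requested scalar field and a nested object marks a requested related object
// (or list of related objects).
function selectFields(record, selection) {
  if (Array.isArray(record)) {
    // Apply the same selection to every item in a list
    return record.map((item) => selectFields(item, selection));
  }
  const result = {};
  for (const [field, sub] of Object.entries(selection)) {
    const value = record[field];
    if (sub === true) {
      result[field] = value; // requested scalar: copy as-is
    } else if (value !== null && value !== undefined) {
      result[field] = selectFields(value, sub); // recurse into the nested selection
    } else {
      result[field] = null;
    }
  }
  return result;
}

// A full article record, as a fixed-shape REST endpoint might return it
const article = {
  title: "Example headline",
  url: "https://example.com/article",
  abstract: "A long abstract the client does not need...",
  published: "2021-06-01",
  topics: [{ name: "Politics", id: 1 }, { name: "Elections", id: 2 }],
};

// Equivalent of the GraphQL selection { title topics { name } }
const response = selectFields(article, { title: true, topics: { name: true } });
console.log(JSON.stringify(response));
// {"title":"Example headline","topics":[{"name":"Politics"},{"name":"Elections"}]}
```

The `abstract`, `url`, `published`, and topic `id` fields never leave the server, which is the bandwidth saving GraphQL's client-driven selection provides.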
GraphQL is data-layer agnostic: we can build GraphQL APIs on top of any database or data layer, including by wrapping other APIs. Because GraphQL treats application data as a graph, Neo4j, as a graph database, is a natural fit for GraphQL backends. This lets us apply graph thinking throughout our full stack application, from the database to the GraphQL queries that fetch data on the front end.
How GraphQL APIs are Typically Built, and Challenges that Commonly Emerge
Two GraphQL-specific concepts are key to understanding how GraphQL API applications are built: type definitions and resolver functions.
GraphQL type definitions define the data available in the API and how that data is connected. They are typically written in the GraphQL Schema Definition Language (SDL), although they can also be defined programmatically. Here are type definitions for a simple conference application in which sessions are held in specific rooms and have associated themes.
type Query {
  # Root field used by the resolvers below to search sessions by keyword
  Session(searchString: String): [Session]
}
type Session {
  sessionId: ID!
  title: String!
  description: String!
  time: String
  room: Room
  theme: [Theme!]!
  recommended: [Session]
}
type Room {
  name: String
  building: String
  sessions: [Session!]!
}
type Theme {
  name: String!
  description: String
  sessions: [Session!]!
}
GraphQL resolver functions are the functions responsible for actually fulfilling GraphQL operations. In the context of a query, this means fetching data from a data layer. Let's look at an example of what the resolver functions for our conference GraphQL type definitions might look like.
const resolvers = {
  Query: {
    // Root resolver: search for sessions matching a search string
    Session: (object, params, context, info) => {
      return context.db.sessionsBySearch(params.searchString);
    },
  },
  Session: {
    // Each nested resolver makes its own call to the data layer,
    // once for every Session object in the result
    room: (object, params, context, info) => {
      return context.db.roomForSession(object.sessionId);
    },
    theme: (object, params, context, info) => {
      return context.db.themeForSession(object.sessionId);
    },
    recommended: (object, params, context, info) => {
      return context.db.recommendedSession(object.sessionId);
    },
  },
};
Here we use an imaginary data access layer to call the database from each resolver function, using the arguments passed to the resolver to look up sessions by search term or by ID. Because these resolvers are called in a nested fashion, a single GraphQL operation can trigger many requests to the database: one to fetch the list of N sessions, then one more per session for each nested field requested, such as room. Each round trip to the data layer adds overhead, so this can cause performance issues. This is known as the N+1 query problem, and it is a common issue to overcome when building GraphQL APIs.
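The sketch below makes the N+1 count concrete with a hypothetical in-memory data layer that counts round trips. Resolving `room` for each of N sessions one at a time costs 1 + N calls; collecting the session IDs first and fetching all rooms in a single call (the idea behind batching tools such as DataLoader) costs 2. The `makeFakeDb` data layer, its `roomsForSessions` batch method, and the sample sessions are all invented for illustration.

```javascript
// Hypothetical in-memory data layer that counts how many "round trips" it serves
function makeFakeDb() {
  const sessions = [
    { sessionId: "s1", title: "Intro to graphs", roomName: "A" },
    { sessionId: "s2", title: "GraphQL at scale", roomName: "B" },
    { sessionId: "s3", title: "Cypher for graph queries", roomName: "A" },
  ];
  const rooms = { A: { name: "A", building: "North" }, B: { name: "B", building: "South" } };
  const db = {
    calls: 0,
    sessionsBySearch(searchString) {
      db.calls += 1; // one round trip for the list
      const term = searchString.toLowerCase();
      return sessions.filter((s) => s.title.toLowerCase().includes(term));
    },
    roomForSession(sessionId) {
      db.calls += 1; // one round trip per session
      return rooms[sessions.find((s) => s.sessionId === sessionId).roomName];
    },
    roomsForSessions(sessionIds) {
      db.calls += 1; // one round trip for the whole batch
      return sessionIds.map((id) => rooms[sessions.find((s) => s.sessionId === id).roomName]);
    },
  };
  return db;
}

// Naive, per-field resolution: 1 query for the list + 1 query per session
function resolveNaive(db, searchString) {
  const list = db.sessionsBySearch(searchString);
  return list.map((s) => ({ title: s.title, room: db.roomForSession(s.sessionId) }));
}

// Batched resolution: 1 query for the list + 1 query for all rooms together
function resolveBatched(db, searchString) {
  const list = db.sessionsBySearch(searchString);
  const rooms = db.roomsForSessions(list.map((s) => s.sessionId));
  return list.map((s, i) => ({ title: s.title, room: rooms[i] }));
}

const naiveDb = makeFakeDb();
const naive = resolveNaive(naiveDb, "graph");
const batchedDb = makeFakeDb();
const batched = resolveBatched(batchedDb, "graph");
console.log(naiveDb.calls, batchedDb.calls); // 4 2
```

Both strategies return the same data; only the number of round trips differs, and the gap grows linearly with the number of sessions and nested fields.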
Another common problem is the need to write lots of boilerplate data-fetching code in our GraphQL resolvers. Fortunately, database integrations for building GraphQL APIs can address both of these issues and give developers other benefits as well.
The Neo4j GraphQL Library
The Neo4j GraphQL library is a Node.js package that makes it easier to build GraphQL APIs backed by the Neo4j graph database. The Neo4j GraphQL library has four main goals:
- Support GraphQL First Development
- Auto-Generate GraphQL API Operations
- Generate Database Queries From GraphQL Operations
- Extend GraphQL With The Power Of Cypher
Let's take a look at each of these points in more detail, in the context of building a news article GraphQL API using data from The New York Times.
Support GraphQL First Development
With the Neo4j GraphQL library, GraphQL type definitions drive the database data model. This means we don't need to maintain two separate schemas for our API and database. Instead, the data model is defined by GraphQL type definitions. Here are the GraphQL type definitions for our news API.
type Article {
  abstract: String
  published: Date
  title: String
  url: String!
  authors: [Author!]! @relationship(type: "BYLINE", direction: OUT)
  topics: [Topic!]! @relationship(type: "HAS_TOPIC", direction: OUT)
  people: [Person!]! @relationship(type: "ABOUT_PERSON", direction: OUT)
  organizations: [Organization!]! @relationship(type: "ABOUT_ORGANIZATION", direction: OUT)
  geos: [Geo!]! @relationship(type: "ABOUT_GEO", direction: OUT)
}
type Author {
  name: String!
  articles: [Article!]! @relationship(type: "BYLINE", direction: IN)
}
type Topic {
  name: String!
  articles: [Article!]! @relationship(type: "HAS_TOPIC", direction: IN)
}
type Person {
  name: String!
  articles: [Article!]! @relationship(type: "ABOUT_PERSON", direction: IN)
}
type Organization {
  name: String!
  articles: [Article!]! @relationship(type: "ABOUT_ORGANIZATION", direction: IN)
}
type Geo {
  name: String!
  location: Point
  articles: [Article!]! @relationship(type: "ABOUT_GEO", direction: IN)
}
Note the use of the @relationship schema directive in the type definitions. Schema directives are GraphQL's built-in extension mechanism, and the Neo4j GraphQL library uses them extensively to configure the generated GraphQL API. In this case, we use the @relationship directive to specify the type and direction of each relationship.
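These GraphQL type definitions map to a graph data model in Neo4j in which Article nodes point to Author, Topic, Person, Organization, and Geo nodes through BYLINE, HAS_TOPIC, ABOUT_PERSON, ABOUT_ORGANIZATION, and ABOUT_GEO relationships, respectively. Using the Neo4j GraphQL library, we've now defined the data model for our GraphQL API and the database at the same time.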
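To see what `direction` means in practice, here is a small hypothetical helper (not part of the Neo4j GraphQL library) that turns a field's `@relationship` arguments into the Cypher pattern it describes. `OUT` points the arrow away from the type that declares the field, and `IN` points it toward that type, so `Article.authors` and `Author.articles` describe the same `BYLINE` relationships traversed from opposite ends.

```javascript
// Hypothetical helper: build the Cypher pattern described by a @relationship field.
// `fromType` declares the field; `toType` is the field's element type.
function relationshipPattern(fromType, fieldName, relType, direction, toType) {
  const from = `(this:${fromType})`;
  const to = `(${fieldName}:${toType})`;
  if (direction === "OUT") {
    return `${from}-[:${relType}]->${to}`; // arrow points away from the declaring type
  }
  if (direction === "IN") {
    return `${from}<-[:${relType}]-${to}`; // arrow points toward the declaring type
  }
  throw new Error(`Unknown direction: ${direction}`);
}

// Article.authors: [Author!]! @relationship(type: "BYLINE", direction: OUT)
console.log(relationshipPattern("Article", "authors", "BYLINE", "OUT", "Author"));
// (this:Article)-[:BYLINE]->(authors:Author)

// Author.articles: [Article!]! @relationship(type: "BYLINE", direction: IN)
console.log(relationshipPattern("Author", "articles", "BYLINE", "IN", "Article"));
// (this:Author)<-[:BYLINE]-(articles:Article)
```

In both cases the stored relationship runs from the Article node to the Author node; only the end we start from changes.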
Auto-Generate GraphQL API Operations
Once we've created our GraphQL type definitions, we pass them to the Neo4j GraphQL library to create an executable GraphQL schema, which can then be served by a GraphQL server such as Apollo Server. We also need to create a database driver instance using the connection string and credentials for our database. I've stored these as environment variables and used the Neo4j AuraDB free tier to create a cloud Neo4j instance.
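const { Neo4jGraphQL } = require("@neo4j/graphql");
const { ApolloServer } = require("apollo-server");
const neo4j = require("neo4j-driver");
// Connection details are read from environment variables
const driver = neo4j.driver(
  process.env.NEO4J_URI,
  neo4j.auth.basic(process.env.NEO4J_USER, process.env.NEO4J_PASSWORD)
);
// The Article, Author, Topic, Person, Organization, and Geo type definitions above
const typeDefs = …
// Combine the type definitions with the driver to build the Neo4j GraphQL schema
const neoSchema = new Neo4jGraphQL({
  typeDefs,
  driver,
});
// getSchema() returns a Promise that resolves to an executable GraphQL schema
neoSchema.getSchema().then((schema) => {
  const server = new ApolloServer({
    schema,
  });
  server.listen().then(({ url }) => {
    console.log(`GraphQL server ready at ${url}`);
  });
});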
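Once the server is running, any HTTP client can call it. Following the common GraphQL-over-HTTP convention, a request is a POST whose JSON body holds `query` and, optionally, `variables`. The sketch below assumes Node.js 18 or later for the global `fetch`; `buildGraphQLRequest` and `runQuery` are hypothetical helpers, and the URL in the usage comment assumes Apollo Server's default port, so use whatever URL the server logs on startup.

```javascript
// Build fetch() options for a GraphQL request: POST with a JSON body
function buildGraphQLRequest(query, variables = {}) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  };
}

// Send the request and return the `data` field, throwing if GraphQL reports errors.
// Requires Node.js 18+ (global fetch) and a running server at `url`.
async function runQuery(url, query, variables) {
  const res = await fetch(url, buildGraphQLRequest(query, variables));
  const payload = await res.json();
  if (payload.errors) {
    throw new Error(payload.errors.map((e) => e.message).join("; "));
  }
  return payload.data;
}

// Generated `articles` query field with sorting and a variable-driven limit
const recentArticles = `
  query Recent($limit: Int!) {
    articles(options: { sort: { published: DESC }, limit: $limit }) {
      title
      url
    }
  }
`;

// Example (requires a running server):
// runQuery("http://localhost:4000/", recentArticles, { limit: 5 }).then(console.log);
```

Passing values through `variables` rather than splicing them into the query string keeps the query text constant and lets the server validate the argument types.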
With the Neo4j GraphQL library, our type definitions are the starting point for a generated API that includes Query and Mutation types with an entry point for each type defined in the schema; arguments for ordering, pagination, and complex filtering; and support for native data types such as DateTime and Point.
For example, the following GraphQL query finds the 10 most recent articles about geo regions within 10 km of San Francisco (the distance is given in meters):
{
  articles(
    where: {
      geos_SOME: {
        location_LT: {
          point: { latitude: 37.7749, longitude: -122.4194 }
          distance: 10000
        }
      }
    }
    options: { sort: { published: DESC }, limit: 10 }
  ) {
    title
    url
    topics {
      name
    }
  }
}
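As a rough in-memory analogue of what this query asks for (the library actually generates a Cypher query and Neo4j evaluates the distance itself), the sketch below keeps articles with at least one geo within 10,000 meters of a point, sorts them newest first, and takes the first 10. It uses the haversine great-circle formula with a mean Earth radius of 6,371 km, an assumption that may differ slightly from Neo4j's own distance calculation; the helper names and sample data are made up.

```javascript
// Great-circle distance in meters between two {latitude, longitude} points (haversine)
function haversineMeters(a, b) {
  const R = 6371000; // mean Earth radius in meters (assumption)
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.latitude - a.latitude);
  const dLon = toRad(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(h));
}

// In-memory analogue of:
// articles(where: { geos_SOME: { location_LT: { point, distance } } },
//          options: { sort: { published: DESC }, limit })
function recentArticlesNear(articles, point, distance, limit) {
  return articles
    .filter((article) =>
      article.geos.some((geo) => geo.location && haversineMeters(geo.location, point) < distance)
    )
    .sort((x, y) => new Date(y.published) - new Date(x.published)) // newest first
    .slice(0, limit)
    .map(({ title, url }) => ({ title, url }));
}

const sanFrancisco = { latitude: 37.7749, longitude: -122.4194 };

// Made-up sample data
const sampleArticles = [
  { title: "Oakland news", url: "https://example.com/1", published: "2021-05-01",
    geos: [{ name: "Oakland", location: { latitude: 37.8044, longitude: -122.2712 } }] },
  { title: "Mission District news", url: "https://example.com/2", published: "2021-06-01",
    geos: [{ name: "Mission District", location: { latitude: 37.7599, longitude: -122.4148 } }] },
  { title: "Downtown news", url: "https://example.com/3", published: "2021-04-01",
    geos: [{ name: "Downtown", location: { latitude: 37.7749, longitude: -122.4194 } }] },
  { title: "Los Angeles news", url: "https://example.com/4", published: "2021-07-01",
    geos: [{ name: "Los Angeles", location: { latitude: 34.0522, longitude: -118.2437 } }] },
];

console.log(recentArticlesNear(sampleArticles, sanFrancisco, 10000, 10));
```

Oakland's city center is roughly 13 km from the query point, so the Oakland article is filtered out even though it is in the Bay Area; only the Mission District and Downtown articles remain, newest first.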
Note that we didn't need to write any resolver functions to define our data fetching. This is all handled for us by the Neo4j GraphQL library, greatly reducing the quantity of code required to get our GraphQL API up and running.
Generate Database Queries From GraphQL Operations
At query time, the Neo4j GraphQL library inspects the incoming GraphQL operation (either a query or a mutation) and generates a single database query for it. This solves the N+1 query problem, because the whole operation needs only one round trip to the database. Additionally, graph databases like Neo4j are optimized for the kind of nested traversals that many GraphQL operations specify.
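The toy translator below illustrates the idea; it does not reproduce the library's actual generated Cypher. Given a nested selection and relationship metadata like that declared with `@relationship`, it emits one Cypher statement that uses a map projection with a pattern comprehension for each nested field, so related nodes come back in the same round trip instead of through one query per parent. The `relationships` table, `projection`, and `toCypher` are hypothetical.

```javascript
// Hypothetical relationship metadata, mirroring the @relationship directives
const relationships = {
  Article: {
    topics: { type: "HAS_TOPIC", direction: "OUT", target: "Topic" },
    authors: { type: "BYLINE", direction: "OUT", target: "Author" },
  },
};

// Build a Cypher map projection for `variable` (of `typeName`) from a selection object.
// `true` marks a scalar property; a nested object marks a relationship field.
function projection(typeName, variable, selection) {
  const parts = [];
  for (const [field, sub] of Object.entries(selection)) {
    if (sub === true) {
      parts.push(`.${field}`); // property selector, e.g. .title
      continue;
    }
    const rel = relationships[typeName][field];
    const child = `${variable}_${field}`;
    const arrow = rel.direction === "OUT" ? `-[:${rel.type}]->` : `<-[:${rel.type}]-`;
    // Pattern comprehension: collects related nodes inline, with no extra round trip
    parts.push(
      `${field}: [(${variable})${arrow}(${child}:${rel.target}) | ${projection(rel.target, child, sub)}]`
    );
  }
  return `${variable} { ${parts.join(", ")} }`;
}

// Translate a whole selection on a root type into a single Cypher statement
function toCypher(typeName, selection) {
  return `MATCH (this:${typeName}) RETURN ${projection(typeName, "this", selection)} AS this`;
}

// Equivalent of the GraphQL selection { articles { title url topics { name } } }
const cypher = toCypher("Article", { title: true, url: true, topics: { name: true } });
console.log(cypher);
// MATCH (this:Article) RETURN this { .title, .url, topics: [(this)-[:HAS_TOPIC]->(this_topics:Topic) | this_topics { .name }] } AS this
```

However deeply the selection nests, the output is still one statement, which is why the number of database round trips no longer grows with the number of parent objects.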
To learn more about how database integrations like the Neo4j GraphQL library work under the hood, check out this presentation from GraphQL Summit, which goes into much more detail.
Extend GraphQL With The Power Of Cypher
Cypher is a powerful graph query language used by graph databases like Neo4j. Unlike GraphQL, which is an API query language rather than a database query language, Cypher includes support for complex graph operations like pattern matching and variable length path operations.
The Neo4j GraphQL library allows us to define custom logic using Cypher within our GraphQL type definitions. To do this we use the @cypher GraphQL schema directive in our GraphQL type definitions. This means we can add custom logic to our GraphQL API using just GraphQL type definitions!
Let's see how this works by adding an article recommendation feature to our API. If a user is reading an article, let's find other similar articles that they might be interested in by looking for articles with common topics, or about the same geo regions or persons. We'll add a similar field to the Article type and add a @cypher directive with the logic for this.
extend type Article {
  # Rank other articles by how many topics, geos, and persons they share with this one
  similar(first: Int = 3): [Article]
    @cypher(
      statement: """
      MATCH (this)-[:HAS_TOPIC|ABOUT_GEO|ABOUT_PERSON]->(t)
      MATCH (t)<-[:HAS_TOPIC|ABOUT_GEO|ABOUT_PERSON]-(rec:Article)
      WHERE rec <> this
      WITH rec, COUNT(*) AS score
      ORDER BY score DESC
      LIMIT $first
      RETURN rec
      """
    )
}
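The scoring in this Cypher statement can be mirrored in memory to show what it computes: every topic, geo, or person that a candidate article shares with the current article adds one to that candidate's score, the current article itself is skipped, and the top `first` candidates are returned. The sketch below is only an analogue of the Cypher, over made-up data in which each article lists the labeled entities it is connected to.

```javascript
// Made-up data: each article lists the entities (topics, geos, persons) it links to.
// Keys include the label so a Topic and a Person with the same name stay distinct.
const articles = [
  { title: "A", entities: ["Topic:Elections", "Geo:California", "Person:Jane Doe"] },
  { title: "B", entities: ["Topic:Elections", "Geo:California"] },
  { title: "C", entities: ["Topic:Elections"] },
  { title: "D", entities: ["Topic:Sports"] },
];

// In-memory analogue of the `similar` field: score = number of shared entities
function similar(current, all, first = 3) {
  const shared = new Set(current.entities);
  return all
    .filter((rec) => rec !== current) // skip the article being read
    .map((rec) => ({ rec, score: rec.entities.filter((e) => shared.has(e)).length }))
    .filter(({ score }) => score > 0) // the MATCH only reaches articles sharing an entity
    .sort((x, y) => y.score - x.score) // ORDER BY score DESC
    .slice(0, first) // LIMIT $first
    .map(({ rec }) => rec.title);
}

console.log(similar(articles[0], articles)); // [ 'B', 'C' ]
```

Article B shares two entities with A and ranks first, C shares one, and D shares none, so it is never reached by the graph traversal at all.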
Cypher is purpose-built for graphs, but it is also an extremely powerful query language: with the APOC library it can even make calls to other APIs. We can leverage this in our GraphQL API to fetch data from other sources. Here we supplement our data by calling the Google Knowledge Graph API to find more detailed information about persons mentioned in an article, essentially using Cypher as a data federation mechanism.
extend type Person {
  description: String
    @cypher(
      statement: """
      WITH this, apoc.static.get('gcpkey') AS gcpkey,
        'https://kgsearch.googleapis.com/v1/entities:search?query=' AS baseURL
      CALL apoc.load.json(baseURL + apoc.text.urlencode(this.name) + '&limit=1&key=' + gcpkey)
      YIELD value
      RETURN value.itemListElement[0].result.detailedDescription.articleBody
      """
    )
}
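The same enrichment logic can be sketched outside the database to show what the Cypher is doing: URL-encode the person's name, request a single result, and read `itemListElement[0].result.detailedDescription.articleBody` from the response, yielding `null` when any step is missing. The helpers below are hypothetical, the API key is a placeholder, and the sample response is a hand-written stand-in shaped like the fields the Cypher reads, not real API output.

```javascript
// Build an equivalent Knowledge Graph Search API URL to the one the Cypher assembles
function knowledgeGraphUrl(name, apiKey) {
  const baseURL = "https://kgsearch.googleapis.com/v1/entities:search?query=";
  return baseURL + encodeURIComponent(name) + "&limit=1&key=" + encodeURIComponent(apiKey);
}

// Read the detailed description from a parsed response, or null if any level is absent
function detailedDescription(response) {
  return response?.itemListElement?.[0]?.result?.detailedDescription?.articleBody ?? null;
}

// Example (Node.js 18+, a real API key required):
// const res = await fetch(knowledgeGraphUrl("Ada Lovelace", process.env.GCP_KEY));
// console.log(detailedDescription(await res.json()));

// Hand-written stand-in with the shape the Cypher reads (not real API output)
const sampleResponse = {
  itemListElement: [
    { result: { name: "Jane Doe", detailedDescription: { articleBody: "Jane Doe is a fictional person." } } },
  ],
};

console.log(knowledgeGraphUrl("Jane Doe", "YOUR_KEY"));
// https://kgsearch.googleapis.com/v1/entities:search?query=Jane%20Doe&limit=1&key=YOUR_KEY
console.log(detailedDescription(sampleResponse)); // Jane Doe is a fictional person.
console.log(detailedDescription({ itemListElement: [] })); // null
```

Returning `null` for a missing description matches how the nullable `description: String` field behaves in GraphQL when the lookup finds nothing.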
The Neo4j GraphQL Library is open source and has many other features we didn't touch on here, such as a powerful authorization model, support for Relay-style pagination, subscriptions, and much more!
Resources
You can see first-hand how this example works in this CodeSandbox or find all the code on GitHub. Also be sure to check out the Neo4j GraphQL landing page and documentation.