# How to Build GraphQL APIs for Text Analytics in Python

David Mráz

## Introduction

GraphQL is a query language and a server-side runtime for building APIs. It isn't tied to any specific database or programming language, so you can build your GraphQL server in Node.js, C#, Scala, Python, and more. The JavaScript GraphQL ecosystem is the most mature, and it is usually the better choice for client-facing APIs and frontend applications. You can take a look at our blog to learn more about building software with React, GraphQL, and Node.js. However, Node.js is not the best fit for data science, so in this article we will look at building Python microservices with a GraphQL API. This approach helps us serve our machine learning models and expose other computations, such as text analytics.

Flask is a minimalist framework for building Python web servers. At Atheros, we use Flask to expose a GraphQL API that serves our machine learning models. Requests from client applications are forwarded to these services by the GraphQL gateway. Overall, a microservice architecture lets us pick the right technology for each job, and it also lets us use advanced patterns like schema federation. In this article, we will start small by implementing the so-called Levenshtein distance: we will use the well-known NLTK library and expose the Levenshtein distance functionality through a GraphQL API. We assume that you are familiar with basic GraphQL concepts such as mutations.

Let's start by cloning our example repository with the following:

In our projects, we use Pipenv to manage Python dependencies. From the project folder, we can create the virtual environment with:

and install the dependencies from the Pipfile:

We usually define a couple of script aliases in our Pipfile to ease our development workflow.

This lets us start the development environment with a single command alias:

The Flask server should then be available on port 5000 by default. You can immediately move on to GraphQL Playground, which serves as an IDE for live documentation and query execution against GraphQL servers. GraphQL Playground uses GraphQL introspection to fetch information about our GraphQL types. The following code initialises our Flask server:

It is good practice to run a production environment behind a production-grade WSGI server rather than Flask's built-in development server. Therefore, we have also set up a script alias for gunicorn:

## Levenshtein distance (edit distance)

The Levenshtein distance, also known as edit distance, is a string metric. It is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to transform one character sequence $a$ into another sequence $b$. If we denote the lengths of these sequences by $|a|$ and $|b|$, respectively, the distance is

$$\operatorname{lev}_{a,b}(|a|, |b|),$$

where

$$
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
\max(i,j) & \text{if } \min(i,j) = 0,\\
\min \begin{cases}
\operatorname{lev}_{a,b}(i-1,j) + 1\\
\operatorname{lev}_{a,b}(i,j-1) + 1\\
\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
\end{cases} & \text{otherwise.}
\end{cases}
$$

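$1_{(a_i \neq b_j)}$ is the so-called indicator function, which equals 0 when $a_i = b_j$ and 1 otherwise. $\operatorname{lev}_{a,b}(i,j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. The three terms inside the minimum correspond to a deletion, an insertion, and a substitution (or a free match), respectively. For more on the theoretical background, feel free to check out the Wikipedia article on the Levenshtein distance.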
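To make the recurrence concrete, here is a direct Python transcription of it, memoised with `functools.lru_cache`. It is an illustrative sketch only: the recursion depth grows with $|a| + |b|$, so it is not meant for long strings.

```python
from functools import lru_cache


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance computed directly from the recursive definition."""

    @lru_cache(maxsize=None)
    def lev(i: int, j: int) -> int:
        # Base case: one prefix is empty, so the other must be inserted/deleted entirely.
        if min(i, j) == 0:
            return max(i, j)
        # Indicator 1_(a_i != b_j); the math is 1-indexed, Python strings are 0-indexed.
        indicator = 0 if a[i - 1] == b[j - 1] else 1
        return min(
            lev(i - 1, j) + 1,              # deletion of a_i
            lev(i, j - 1) + 1,              # insertion of b_j
            lev(i - 1, j - 1) + indicator,  # substitution, or match at no cost
        )

    return lev(len(a), len(b))


print(levenshtein("machinlt lerning", "machine learning"))  # 3
print(levenshtein("test1", "test2"))  # 1
```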

In practice, let's say that someone misspelled "machine learning" and wrote "machinlt lerning". We would need to make the following edits:

| Edit | Edit type | Word state |
|------|--------------|------------------|
| 0 | - | machinlt lerning |
| 1 | Substitution | machinet lerning |
| 2 | Deletion | machine lerning |
| 3 | Insertion | machine learning |

For these two strings we get a Levenshtein distance equal to 3. The Levenshtein distance has many applications, such as spell checkers, correction systems for optical character recognition, or similarity calculations.

## Building a GraphQL server with Graphene in Python

In this article, we will build the following schema:

Every GraphQL schema must define at least one query. We usually use our first query as a health check for the microservice. The query can be called like this:

However, the main purpose of our schema is to calculate the Levenshtein distance. We will use variables to pass dynamic parameters in the following GraphQL document:
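A client sends such a document and its variables as a JSON body. The sketch below builds that request with the standard library only; the mutation, type, and field names (`levenshteinDistance`, `LevenshteinDistanceInput`, `s1`, `s2`, `distance`) are hypothetical, and the URL assumes Flask's default port with a `/graphql` path.

```python
import json
import urllib.request

# Hypothetical GraphQL document: names are assumptions for illustration,
# not necessarily the ones used in the example repository.
LEVENSHTEIN_MUTATION = """
mutation LevenshteinDistance($input: LevenshteinDistanceInput!) {
  levenshteinDistance(input: $input) {
    distance
  }
}
"""


def build_graphql_request(url, query, variables=None):
    """Build a POST request carrying a GraphQL document and its variables as JSON."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_graphql_request(
    "http://localhost:5000/graphql",  # default Flask port; the path is assumed
    LEVENSHTEIN_MUTATION,
    {"input": {"s1": "test1", "s2": "test2"}},
)
# Sending the request requires a running server:
# with urllib.request.urlopen(req) as response:
#     print(json.load(response))
```

Keeping the variables separate from the document lets the same mutation string be reused for any pair of strings.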

So far, we have written our schema in SDL format. In Python, we use the Graphene library, which takes a code-first approach rather than the schema-first approach of JavaScript libraries like graphql-tools, so we define the schema in code as follows:

We have followed common best practices for the overall schema and its mutations, such as passing a single input object type to the mutation. Our input object type is written in Graphene as follows:

This code defines the dynamic arguments of our mutation, which are then passed to the function responsible for calculating the Levenshtein distance.

Each time we execute our mutation in GraphQL Playground:

with the following variables:

we obtain the Levenshtein distance between our two input strings. For the strings "test1" and "test2", we get 1, because a single substitution turns one into the other. To compute the distance, we leverage the well-known NLTK library for natural language processing (NLP). The following code is executed from our resolver:
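A minimal sketch of the resolver-side call, assuming NLTK is installed: `edit_distance` from `nltk.metrics.distance` takes the two strings plus the optional `substitution_cost` and `transpositions` parameters. The `levenshtein_distance` wrapper is a hypothetical name, not necessarily the repository's function.

```python
from nltk.metrics.distance import edit_distance


def levenshtein_distance(s1: str, s2: str) -> int:
    """Hypothetical resolver helper: delegate the computation to NLTK."""
    return edit_distance(s1, s2)


print(levenshtein_distance("test1", "test2"))  # 1
print(levenshtein_distance("machinlt lerning", "machine learning"))  # 3

# Optional parameters: make a substitution cost as much as a deletion plus an
# insertion, or count swapping two adjacent characters as a single edit.
print(edit_distance("test1", "test2", substitution_cost=2))  # 2
print(edit_distance("ab", "ba"))  # 2
print(edit_distance("ab", "ba", transpositions=True))  # 1
```

With the defaults (`substitution_cost=1`, `transpositions=False`), the result is the plain Levenshtein distance defined earlier.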

It is also straightforward to implement the Levenshtein distance ourselves, for example with an iterative matrix, but I suggest not reinventing the wheel and using NLTK's built-in functions instead.
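For reference, the iterative matrix approach (the Wagner-Fischer dynamic-programming algorithm) fills the table of $\operatorname{lev}_{a,b}(i,j)$ values row by row instead of recursing. This sketch keeps only two rows in memory; it is not NLTK's implementation.

```python
def levenshtein_iterative(a: str, b: str) -> int:
    """Levenshtein distance via the dynamic-programming (Wagner-Fischer) matrix.

    Only two rows of the (len(a) + 1) x (len(b) + 1) matrix are kept in memory.
    """
    # Row 0: turning the empty prefix of `a` into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        # Column 0: turning a[:i] into the empty string takes i deletions.
        current = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            current[j] = min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            )
        previous = current
    return previous[-1]


print(levenshtein_iterative("machinlt lerning", "machine learning"))  # 3
```

The matrix version runs in $O(|a| \cdot |b|)$ time without any recursion-depth limit, which is why it is the usual choice for a hand-written implementation.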

## Conclusion

GraphQL is a great technology for building APIs, and it is very useful for exposing the output of machine learning models and other calculations. The Python ecosystem is better suited to data science, and libraries such as Graphene make it easy to build GraphQL schemas for machine learning microservices.

Did you like this post? You can clone the repository with the examples and project set-up. Feel free to send any questions about the topic to david@atheros.ai and subscribe to get more knowledge about building AI-driven systems and our updates for the upcoming Atheros cloud.