Running Flask server
Flask is a minimalist framework for building Python servers. At Atheros, we use Flask to expose the GraphQL API that serves our machine learning models; requests from client applications are then forwarded by the GraphQL gateway. Overall, a microservice architecture allows us to use the best technology for the job, and it also allows us to use advanced patterns like schema federation. In this article, we will start small with an implementation of the so-called Levenshtein distance. We will use the well-known NLTK library and expose the Levenshtein distance functionality through a GraphQL API. We assume that you are familiar with basic GraphQL concepts, such as building GraphQL mutations.
Let's start by cloning our example repository with the following:
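A typical clone command looks like this (the URL below is a placeholder, not the real repository address):

```shell
# Placeholder URL -- substitute the actual example repository
git clone https://github.com/your-org/flask-graphql-example.git
cd flask-graphql-example
```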
In our projects, we use Pipenv for managing Python dependencies. From the project folder, we can create our virtual environment with this:
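With Pipenv installed, the environment can be created like this (a sketch; newer Pipenv versions can infer the Python version from the Pipfile instead):

```shell
# Create a virtual environment for the project using a Python 3 interpreter
pipenv --three
```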
and install the dependencies from the Pipfile:
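Assuming the Pipfile sits in the current directory:

```shell
# Install all packages listed in the Pipfile into the virtual environment
pipenv install
```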
We usually define a couple of script aliases in our Pipfile to ease our development workflow. This allows us to run our dev environment easily with a command alias, as follows:
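A `[scripts]` section along these lines does the trick (the alias name and the entry-point file are illustrative, not taken from the actual project):

```toml
[scripts]
# "pipenv run dev" starts the Flask development server
dev = "python app.py"
```

With this in place, `pipenv run dev` starts the server.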
The Flask server should then be exposed by default on port 5000. You can immediately move on to GraphQL Playground, which serves as an IDE for live documentation and query execution against GraphQL servers. GraphQL Playground uses so-called GraphQL introspection to fetch information about our GraphQL types. The following code initialises our Flask server:
It is good practice to use a WSGI server when running in a production environment. Therefore, we have also set up a script alias for gunicorn:
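The alias might look like this (the `app:app` module and variable names are assumptions about the project layout):

```toml
[scripts]
# "pipenv run prod" serves the app through the gunicorn WSGI server
prod = "gunicorn app:app --bind 0.0.0.0:5000"
```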
The Levenshtein distance, also known as edit distance, is a string metric. It is defined as the minimum number of single-character edits (insertions, deletions, or substitutions) needed to change one character sequence $a$ into another one $b$. If we denote the lengths of such sequences $|a|$ and $|b|$ respectively, we get the following:

$$
\operatorname{lev}_{a,b}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0, \\
\min
\begin{cases}
\operatorname{lev}_{a,b}(i-1, j) + 1 \\
\operatorname{lev}_{a,b}(i, j-1) + 1 \\
\operatorname{lev}_{a,b}(i-1, j-1) + 1_{(a_i \neq b_j)}
\end{cases}
& \text{otherwise.}
\end{cases}
$$

Here $1_{(a_i \neq b_j)}$ is the so-called indicator function, which is equal to 0 when $a_i = b_j$ and equal to 1 otherwise, and $\operatorname{lev}_{a,b}(i, j)$ is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$; the full distance is then $\operatorname{lev}_{a,b}(|a|, |b|)$. For more on the theoretical background, feel free to check out the wiki.
In practice, let's say that someone misspelled "machine learning" and wrote "machinlt lerning". We would need to make the following edits:
| Edit | Edit type | Word state |
| --- | --- | --- |
| 1 | deletion of "l" | machint lerning |
| 2 | substitution of "t" with "e" | machine lerning |
| 3 | insertion of "a" | machine learning |
For these two strings we get a Levenshtein distance equal to 3. The Levenshtein distance has many applications, such as spell checkers, correction systems for optical character recognition, or similarity calculations.
Building a GraphQL server with graphene in Python
We will build the following schema in our article:
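In SDL, the schema could look roughly like this (the field and type names are illustrative guesses):

```graphql
type Query {
  isAlive: Boolean
}

type CalcLevenshtein {
  distance: Int
}

type Mutation {
  calcLevenshtein(s1: String!, s2: String!): CalcLevenshtein
}
```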
Each GraphQL schema is required to have at least one query. We usually define our first query in order to healthcheck our microservice. The query can be called like this:
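For example, assuming an `isAlive` health-check field, the query is as simple as:

```graphql
query {
  isAlive
}
```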
However, the main function of our schema is to enable us to calculate the Levenshtein distance. We will use variables to pass dynamic parameters in the following GraphQL document:
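A sketch of such a document (the operation, argument, and field names are assumptions):

```graphql
mutation CalcLevenshtein($s1: String!, $s2: String!) {
  calcLevenshtein(s1: $s1, s2: $s2) {
    distance
  }
}
```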
We have defined our schema so far in SDL format. In the Python ecosystem, however, we do not have libraries like graphql-tools, so we need to define our schema with a code-first approach. The schema is defined as follows using the Graphene library:
This code essentially defines dynamic arguments for our mutation. Those are then passed to the function responsible for calculating the Levenshtein distance.
Each time we execute our mutation in GraphQL Playground with the following variables:
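Using the example strings test1 and test2, the query variables are:

```json
{
  "s1": "test1",
  "s2": "test2"
}
```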
we obtain the Levenshtein distance between our two input strings. For our simple example of strings test1 and test2, we get 1. We can leverage the well-known NLTK library for natural language processing (NLP). The following code is executed from our resolver:
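The resolver body can stay tiny; a sketch using NLTK's `edit_distance` helper (the wrapper function name is ours):

```python
from nltk import edit_distance


def calculate_levenshtein(s1: str, s2: str) -> int:
    # NLTK ships a ready-made Levenshtein (edit) distance implementation
    return edit_distance(s1, s2)


print(calculate_levenshtein("test1", "test2"))  # 1
```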
It is also straightforward to implement the Levenshtein distance ourselves using, for example, an iterative matrix, but I would suggest not reinventing the wheel and using the default NLTK functions.
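For reference, here is a self-contained sketch of that iterative-matrix approach, with the usual optimisation of keeping only two rows of the matrix in memory:

```python
def levenshtein(a: str, b: str) -> int:
    """Iterative dynamic-programming Levenshtein distance (two-row matrix)."""
    # previous[j] holds the distance between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion of ca
                current[j - 1] + 1,            # insertion of cb
                previous[j - 1] + (ca != cb),  # substitution (free on match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("machinlt lerning", "machine learning"))  # 3
```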
GraphQL is a great technology for building APIs and it is very useful for exposing the output from machine learning models and other calculations. The Python ecosystem is more suitable for data science and libraries such as graphene help us build our GraphQL schema for machine learning microservices with ease.
Did you like this post? You can clone the repository with the examples and project set-up. Feel free to send any questions about the topic to firstname.lastname@example.org and subscribe to get more knowledge about building AI-driven systems and our updates for the upcoming Atheros cloud.