The Problem with Using Databases on the Edge / Serverless.

March 30, 2023 (Syndicated From dev.to)

What’s the deal with Databases and Severless? Sometimes it seems like they don’t mix well together like oil and water. What’s all the fuss about?

Here’s a deep dive video companion to this article:

What is serverless?

The dreaded term “serverless” has come with some animocity as of recently for it’s really not the best descriptior for the technology, but it overall encompases the idea that a server operation will be ephemeral. But a server still exists. Traditional servers operate with an “always on” and “ready” mentality where as serverless functions or endpoints are spun up at the time of request, and then broken down. This is important to note because when incopreating persistant data and databases into these functions or endpoints we need to connect to them and the ephemeral nature of serverless means, worst case scenario we’d be creating a client connection to a database each time we make use the serverless function.

What does “on the edge” mean?

The overall idea of the “edge” takes serverless to the next level. It’s about distributing serveless functions across the globe in many different data centers in different countries. An edge network then routes requests to the nearest datacenter to spin up that serverless operation closer to the end user.

Cloud Providers & Database Clients

There are many different cloud providers from AWS Lambda, Google Cloud Functions, Vercel Edge Functions, Deno Deploy and Cloudflare Workers. They all have the potential to handle the programming runtime differently, they all may have different TTL between requests.

In addition database clients may handle client connections, and pooling differently by default, and it may mean you need to structure your code in a specific way. The database service provider also may have connection pooling settings as well.

Breaking the edge

I first saught out to create an example to “break” the edge / database connection, by creating too many requests to an edge-run webpage. The goal of this exersize is to create too many connections without cleaning them up the worst-case scenario. For this deep dive we’ll be using Deno Deploy as an example of a severless / edge arcitecture to quickly get some Typescirpt code up and running.

Here’s some initial code. This outputs a webpage with some json that looks like this:

[{"max_conn":60,"used":13,"res_for_super":3,"res_for_normal":17}]

What does this code do? This creates a connection to a postgres database and queries how many connections the database is currently using.

import { serve } from "https://deno.land/std@0.177.0/http/server.ts";
import { Client } from "https://deno.land/x/postgres@v0.17.0/mod.ts";

// this is needed to aid the query 
BigInt.prototype['toJSON'] = function () {
    return parseInt(this.toString());
};

const query = `
select max_conn,used,res_for_super,max_conn-used-res_for_super res_for_normal 
from 
  (select count(*) used from pg_stat_activity) t1,
  (select setting::int res_for_super from pg_settings where name=$$superuser_reserved_connections$$) t2,
  (select setting::int max_conn from pg_settings where name=$$max_connections$$) t3
`

serve(async (req: Request) => {
  const client = new Client(Deno.env.get("DBURI"));
  const connection = await client.connect();
  const data = await client.queryObject(query)
  return Response.json(data.rows)
});

I then created another endpoint to “bombard” the postgres database with a bunch of connections. This sends several requests to the above url at onceusing Promise.all. The code looks like this:

import { serve } from "https://deno.land/std@0.177.0/http/server.ts";

async function ping (url: string) {
  const response = await fetch(Deno.get('PING_URL'))
  return await response.json()
}

serve(async (req: Request) => {
  const example = await Promise.all([
    ping(), ping(), ping(), ping(), 
    ping(), ping(), ping(), ping(),
    ping(), ping(), ping(), ping(),
  ])
  return Response.json([example])
});

And here’s the output from this endpoint. This is showing the used connections fluctuating fro 14 to 21.

[
  [{"max_conn":60,"used":14,"res_for_super":3,"res_for_normal":43}],
  [{"max_conn":60,"used":18,"res_for_super":3,"res_for_normal":39}],
  [{"max_conn":60,"used":21,"res_for_super":3,"res_for_normal":36}],
  [{"max_conn":60,"used":17,"res_for_super":3,"res_for_normal":40}],
  [{"max_conn":60,"used":16,"res_for_super":3,"res_for_normal":41}],
  [{"max_conn":60,"used":19,"res_for_super":3,"res_for_normal":38}],
  [{"max_conn":60,"used":15,"res_for_super":3,"res_for_normal":42}],
  [{"max_conn":60,"used":21,"res_for_super":3,"res_for_normal":36}],
  [{"max_conn":60,"used":21,"res_for_super":3,"res_for_normal":36}],
  [{"max_conn":60,"used":17,"res_for_super":3,"res_for_normal":40}],
  [{"max_conn":60,"used":21,"res_for_super":3,"res_for_normal":36}],
  [{"max_conn":60,"used":17,"res_for_super":3,"res_for_normal":40}]
]

What happens when the “used” number of connections is exhaused? The app will crash, and requests will drop.

This example may not be 100% fair, it’s happening mainly because I’m doing something really evil here, I’m calling new Client from within the serve hander which causes the connections to be created every time there’s a request. Interestingly enough when I hoist this outside of the serve callback, this fluctuation doesn’t occur (perhaps that’s because it’s because I’m hitting the same datacenter), this may be because Deno Deploy or the Deno Postgres library is keeping the root scope of this edge function alive in some way. This again will vary across providers, runtimes, and clients. But this demonstration paints a visual and tanible picture of how you can shoot yourself in the foot while working with databases within the servereless edge.

The Antidote

What does this mean for using Databases within Edge functions? It’s scary to code for the edge when it may be a situation where “it works for me” but once many requests are going through the system from different data centers everything comes to a screaching halt. It’s also scarry to think about vendor lock-in where things may be working within one edge provider but you can’t switch out of fear that the runtime handling would be different on another provider.

The solution is kind of right in front of our eyes. The problem is connection pooling and connecting to the database directly many times. What if you didn’t need to do that? What if we had a proxy that wasn’t an edge function that we could use as an intermediary between the database and our edge function? This is the classic microservice architecture, utilizing an API to interact with the database on your behalf. With http requests you can ping this proxy layer and the proxy will handle the connection pooling for you, all you app would have to do is call fetch.

ORMs like Prisma provide this baked into the client, prisma provides a service called aptly Data Proxy that does just what we described. This allows your edge functions to not have to worry about any of the aforementioned connection and pooling troubles. A middle layer between your app and the database allows your core business logic to be on the edge and allows you to interact with your database without the hassle.

Here’s a visual:

graph showing proxy layer

Wrap up

The world of serverless computing and edge functions is fresh and exciting! I think it’s a growing architecture that shifts how we think about designing software and provides really quick, snappy, and responsive websites. It’s exciting to watch this technology grow and what people will do with it in the future.

If you’re interested in learning more about Data Proxies, here are some great tutorials showing how to incorporate Data Proxy into different apps from Next.js to Deno’s Oak.