Building an observability platform

Recently I have been working on building an observability platform by myself.

This started when I tried to run Sentry locally and found out that it’s a bit complex to run, as it now requires:

Kafka
ClickHouse
Snuba
PostgreSQL
Redis

and other stuff.

This probably makes sense for their scale, as they run a multi-tenant platform that handles observability for several customers, but it got me thinking about how hard it would be to build the simplest observability tool by myself. I named it Nabatshy - نبطشي.

Generating data

I already knew that OpenTelemetry is now the industry standard for generating telemetry data, and found out that Sentry also uses it under the hood, so this part was easy: I didn’t have to implement it myself (there are OpenTelemetry SDKs for almost all programming languages).

Collecting data

All of the tracing data generated by apps is sent to a server that collects it and exports it to another piece of software, which saves it to storage so that it can be queried later.

For this step I decided to write a Golang server that collects the data. Golang is fast enough to handle this task, and is widely used for cloud tools, so it seemed like a good fit.

I searched for the easiest way to parse this data into a Golang struct, and found out there is an official package by OpenTelemetry for this, go.opentelemetry.io/proto/otlp/collector/trace/v1, so I simply used it to parse the incoming requests:

        
      
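import (
	"io"
	"net/http"

	coltrace "go.opentelemetry.io/proto/otlp/collector/trace/v1"
)

func ingestTraceHTTPRequest(w http.ResponseWriter, r *http.Request) {
	// req will hold the decoded OTLP export request.
	var req coltrace.ExportTraceServiceRequest
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "failed to read request body", http.StatusBadRequest)
		return
	}
	// rest of the code, will be discussed later
}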

The data mainly consists of traces, spans, and attributes. A trace consists of spans: you can think of a trace as a single HTTP request, and of its spans as the operations that request performs in your code, like fetching a user, updating them, etc.

Spans have attributes. Attributes attach metadata to a span, such as whether it is a database operation or an HTTP request you’re making, and what parameters were used for it.
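To make the relationship concrete, here is a minimal sketch of that data model in Go. The Span type, the helper functions, and the attribute keys are hypothetical names chosen for illustration; they are not the OTLP protobuf types or the structs Nabatshy actually uses.

```go
package main

import "sort"

// Span is a simplified, hypothetical stand-in for an OpenTelemetry span.
type Span struct {
	TraceID    string            // shared by every span in the same trace
	SpanID     string            // unique within the trace
	ParentID   string            // empty for the root span
	Name       string            // e.g. "PUT /users/42" or "fetch user"
	Attributes map[string]string // extra metadata about the operation
}

// groupByTrace buckets spans by trace ID, so each map entry is one trace.
func groupByTrace(spans []Span) map[string][]Span {
	traces := make(map[string][]Span)
	for _, s := range spans {
		traces[s.TraceID] = append(traces[s.TraceID], s)
	}
	return traces
}

// childNames returns the sorted names of the direct children of parentID.
func childNames(trace []Span, parentID string) []string {
	var names []string
	for _, s := range trace {
		if s.ParentID == parentID {
			names = append(names, s.Name)
		}
	}
	sort.Strings(names)
	return names
}

// exampleSpans models one HTTP request that fetches and then updates a user.
// The attribute keys are illustrative only.
func exampleSpans() []Span {
	return []Span{
		{TraceID: "t1", SpanID: "a", Name: "PUT /users/42",
			Attributes: map[string]string{"http.request.method": "PUT"}},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Name: "fetch user",
			Attributes: map[string]string{"db.system": "postgresql"}},
		{TraceID: "t1", SpanID: "c", ParentID: "a", Name: "update user",
			Attributes: map[string]string{"db.system": "postgresql"}},
	}
}
```

Calling groupByTrace(exampleSpans()) yields a single trace with three spans, and childNames on that trace with parent "a" lists the two database operations performed by the request.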

So now I have the data. The question is: what should I do with it?

Exporting data

Apps generate a huge volume of tracing data, so we need to save it into a database to query it later. The database needs to be very performant and able to handle large volumes of data, so we have to choose an analytical (OLAP) database instead of an OLTP database.

I chose ClickHouse. Its documentation describes it like this:

ClickHouse® is a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP).

Being a column-oriented database makes it faster for analytical queries that scan a large amount of data, because a query only reads the columns it needs. It also saves storage: columns are stored separately and all values in a column have the same type, so the data compresses well.
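The idea can be sketched in a few lines of Go. This is only an illustration of the row versus column layout, with made-up type names, and it uses run-length encoding as a simple stand-in for compression; it is not how ClickHouse stores or compresses data internally.

```go
package main

// spanRow is the row-oriented view: one struct per span.
type spanRow struct {
	Service    string
	DurationMs int64
}

// spanColumns is the column-oriented view: one slice per field.
type spanColumns struct {
	Service    []string
	DurationMs []int64
}

// toColumns pivots rows into columns.
func toColumns(rows []spanRow) spanColumns {
	var cols spanColumns
	for _, r := range rows {
		cols.Service = append(cols.Service, r.Service)
		cols.DurationMs = append(cols.DurationMs, r.DurationMs)
	}
	return cols
}

// totalDuration reads only the DurationMs column; the Service column is
// never touched, which is why aggregates are cheap in a columnar layout.
func totalDuration(cols spanColumns) int64 {
	var sum int64
	for _, d := range cols.DurationMs {
		sum += d
	}
	return sum
}

// run is one run-length-encoded entry: a value and how often it repeats.
type run struct {
	Value string
	Count int
}

// runLengthEncode collapses consecutive repeated values into runs. Columns
// with many repeated values of the same type shrink a lot this way.
func runLengthEncode(values []string) []run {
	var runs []run
	for _, v := range values {
		if n := len(runs); n > 0 && runs[n-1].Value == v {
			runs[n-1].Count++
			continue
		}
		runs = append(runs, run{Value: v, Count: 1})
	}
	return runs
}
```

For example, a Service column holding "api" three times followed by "worker" once encodes to just two runs, while summing durations never has to read the service names at all.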

Using the data

Now that the tracing data is stored in the database, we need to be able to search it, aggregate it, and visualize it to get valuable insights out of it.

So I built a Golang API that exposes endpoints to search and aggregate the data, and a React.js frontend to visualize it.

The API returns percentiles such as P50, P75, P90, P95, and P99 for traces.

Making it even simpler

So I have a web server that collects the data and exports it to be saved into ClickHouse, an API to query it, and a frontend for it. Why not make this even easier by compiling all of it into a single executable file?

Golang is a compiled language, so we’re off to a good start.

I wrote a single web server that collects the data, saves it to ClickHouse, and queries it for the frontend.
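One way to picture that single server is a lone router with three groups of routes. The sketch below uses only the standard library with placeholder handlers; the /api/ prefix and all function names are assumptions for illustration, while /v1/traces is the default path for traces in OTLP over HTTP. The real routes in Nabatshy may differ.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
)

// newMux wires ingestion, querying, and the UI into one handler.
func newMux(ingest, query, ui http.Handler) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/v1/traces", ingest) // default OTLP/HTTP path for traces
	mux.Handle("/api/", query)       // hypothetical prefix for the query API
	mux.Handle("/", ui)              // everything else goes to the frontend
	return mux
}

// stub returns a placeholder handler that only reports which part answered.
func stub(name string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, name)
	})
}

// route reports which stub handles a request, using an in-memory recorder
// instead of a real network listener.
func route(method, path string) string {
	mux := newMux(stub("ingest"), stub("query"), stub("ui"))
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Body.String()
}
```

In a real binary the mux would be passed to http.ListenAndServe, with the stubs replaced by the ingestion handler, the query API, and the static file server.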

Now the frontend: how can we embed it into the same executable? Well, the frontend transpiles to a single HTML, CSS, and JavaScript file. All we need is to serve these files from our existing web server, and it turns out Golang supports embedding arbitrary files into a virtual filesystem.

So after transpiling it, I embedded it with a single directive:

        
import "embed"

//go:embed ui/dist/*
var content embed.FS

and wrote a simple static file server for the files.
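A static file server over an embedded filesystem can be quite small. The following is a sketch under assumptions, not the code from the repo: newStaticHandler, fetch, and demoFS are hypothetical names, and an in-memory fstest.MapFS stands in for the embedded content variable so the example runs without a real ui/dist directory.

```go
package main

import (
	"io/fs"
	"net/http"
	"net/http/httptest"
	"testing/fstest"
)

// newStaticHandler serves the files under dir in fsys at the URL root.
// With the embed directive above, fsys would be content and dir "ui/dist".
func newStaticHandler(fsys fs.FS, dir string) (http.Handler, error) {
	sub, err := fs.Sub(fsys, dir) // strip the "ui/dist" prefix from paths
	if err != nil {
		return nil, err
	}
	return http.FileServer(http.FS(sub)), nil
}

// fetch runs one GET request against a handler in memory and returns the
// status code and response body.
func fetch(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

// demoFS is an in-memory stand-in for the embedded filesystem.
func demoFS() fs.FS {
	return fstest.MapFS{
		"ui/dist/index.html": {Data: []byte("<h1>Nabatshy</h1>")},
		"ui/dist/app.js":     {Data: []byte("console.log('ok')")},
	}
}
```

The fs.Sub call matters because embedded paths keep their ui/dist prefix; without it the browser would have to request /ui/dist/app.js instead of /app.js. A request for / is answered with index.html by http.FileServer.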

So now we have everything we need in a single executable file (except the database of course).

Here’s the link for the repo: https://github.com/adhamsalama/nabatshy
