Joining SQL tables on a JSON array
How to join PostgreSQL tables on a JSON array of ids.
Joining tables is a common SQL operation that allows you to combine data from two or more tables into a single result set. Normally, join operations are performed on columns that have matching values in both tables.
Many-to-many relationships are a common pattern in relational databases, where two tables are related to each other through a junction table.
Let’s take this example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_name TEXT
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
price NUMERIC
);
CREATE TABLE order_products (
id SERIAL PRIMARY KEY,
order_id int references orders(id),
product_id int references products(id)
);
INSERT INTO orders (customer_name)
VALUES ('Alice'), ('Bob');
INSERT INTO products (name, price)
VALUES ('Product 1', 10.0), ('Product 2', 20.0), ('Product 3', 30.0), ('Product 4', 40.0);
INSERT INTO order_products (order_id, product_id)
VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (2, 4);
Here we have an orders table, a products table, and an order_products table which relates each order to its products.
If we want to get each order and its products, we use the following query.
1
2
3
4
5
6
7
8
9
select
orders.id,
customer_name,
products.name,
products.price
from
orders
join order_products on orders.id = order_products.order_id
join products on product_id = products.id;
We get this result
id | customer_name | name | price |
---|---|---|---|
1 | Alice | Product 1 | 10.0 |
1 | Alice | Product 2 | 20.0 |
1 | Alice | Product 3 | 30.0 |
2 | Bob | Product 2 | 20.0 |
2 | Bob | Product 3 | 30.0 |
2 | Bob | Product 4 | 40.0 |
But what if I told you we don’t need this intermediate table, order_products?
We can store the product ids as a JSON array in the orders table.
Let’s drop the tables
1
DROP order_products, orders, products;
And create the new tables
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_name TEXT,
product_ids JSONB
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name TEXT,
price NUMERIC
);
INSERT INTO orders (customer_name, product_ids)
VALUES ('Alice', '[1, 2, 3]'), ('Bob', '[2, 3, 4]');
INSERT INTO products (name, price)
VALUES ('Product 1', 10.0), ('Product 2', 20.0), ('Product 3', 30.0), ('Product 4', 40.0);
Here we defined the products_ids in the orders table as a JSON column and inserted an array of ids into it.
So how can we join these two tables to get each order and its products as we did earlier?
We run the following query
1
2
3
4
5
6
7
8
9
10
11
SELECT
orders.id,
orders.customer_name,
products.name,
products.price
FROM
orders
CROSS JOIN jsonb_array_elements(orders.product_ids) AS product_id
JOIN products ON products.id = product_id::text::int
ORDER BY
orders.id;
We get the following result
id | customer_name | name | price |
---|---|---|---|
1 | Alice | Product 1 | 10.0 |
1 | Alice | Product 2 | 20.0 |
1 | Alice | Product 3 | 30.0 |
2 | Bob | Product 2 | 20.0 |
2 | Bob | Product 3 | 30.0 |
2 | Bob | Product 4 | 40.0 |
The results are the same, but a lot is going on here, so let’s break it down piece by piece.
jsonb_array_elements
is a PostgreSQL function that allows you to extract the elements of a JSON array and return them as a set of rows.
If we run the following query
1
SELECT jsonb_array_elements(orders.product_ids) FROM orders;
We get this result
jsonb_array_elements |
---|
1 |
2 |
3 |
2 |
3 |
4 |
As you can see, this returned all of the ids in the product_ids field as rows.
CROSS JOIN
is a type of join in SQL that returns the Cartesian products of two tables.
In other words, it returns all possible combinations of rows from the two tables, without any condition for joining them.
But it can also work on functions. You can use a function to generate a set of values, and then combine those values with another table using CROSS JOIN
.
So this query
1
2
3
4
5
6
7
SELECT
id,
customer_name,
product_id
FROM
orders
CROSS JOIN jsonb_array_elements (orders.product_ids) AS product_id;
returns this result
id | customer_name | product_id |
---|---|---|
1 | Alice | 1 |
1 | Alice | 2 |
1 | Alice | 3 |
2 | Bob | 2 |
2 | Bob | 3 |
2 | Bob | 4 |
We can join this result on the products table as we did to get all orders and their products.
product_id::text::int
casts the value to an integer.
What is a JSONB type?
You might wonder what is JSONB, JSONB is a binary format for storing JSON data.
Unlike the JSON data type, which stores JSON data as plaintext, JSONB stores the JSON data in a binary format.
The benefits are:
- Efficient storage: `JSONB` stores data in a binary format that is more compact than plain text, leading to smaller table sizes and faster data access.
- - Fast queries: `JSONB` supports indexing and querying of JSON data using various operators and functions, which can improve query performance.
- - Data validation: `JSONB` provides some level of data validation, as it only allows valid JSON data to be stored in a column with this data type.
Considerations
There are some things to consider in this approach.
Firstly, updates and deletes are a bit more complex.
In the first approach where we had an order_products table, adding or deleting products from an order was easy, we just insert a new row or delete an existing one.
But if we want to update the product_ids array we use this
1
2
3
4
5
6
UPDATE
orders
SET
product_ids = jsonb_set(product_ids, '{2}', '4')
WHERE
id = 1;
instead of
1
2
3
4
5
6
UPDATE
order_products
SET
product_id = 4
WHERE
id = 1;
Secondly, is this a good design?
JSON fields in SQL are useful for arbitrary data that is difficult to model, but if the data follows a pattern, then we should probably model the data in a structured way, in this case by using a junction table in the first example of this article.
So, you probably shouldn’t use JSON in this case, but I hope this article is useful in case you do need to join tables on a JSON field that doesn’t follow a pattern.