Common table expressions
Common table expressions are a SQL99 feature that allow you to reuse common subqueries in your queries. There is often no semantic difference between CTEs and reusing Beam queries, but backends sometimes have optimizations that will only fire when using CTEs. Common table expressions are also necessary to write recursive queries.
Let's start with an example
Common table expressions are complicated, so let's start with an example.
In the window function section, we saw how to find the genre representing the most tracks in a particular album.
let albumGenreCnts = aggregate_ (\t -> ( group_ (trackAlbumId t)
, group_ (trackGenreId t)
, as_ @Int32 countAll_ )) $
all_ (track chinookDb)
withMaxCounts = withWindow_ (\(albumId, _, _) -> frame_ (partitionBy_ albumId) noOrder_ noBounds_)
(\(albumId, genreId, trackCnt) albumWindow ->
(albumId, genreId, trackCnt, max_ trackCnt `over_` albumWindow)) $
albumGenreCnts
in filter_' (\(_, _, trackCnt, maxTrackCntPerAlbum) -> just_ trackCnt ==?. maxTrackCntPerAlbum) withMaxCounts
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t0"."res2" AS "res2",
"t0"."res3" AS "res3"
FROM
(SELECT "t0"."AlbumId" AS "res0",
"t0"."GenreId" AS "res1",
COUNT(*) AS "res2",
MAX(COUNT(*)) OVER (PARTITION BY "t0"."AlbumId") AS "res3"
FROM "Track" AS "t0"
GROUP BY "t0"."AlbumId",
"t0"."GenreId") AS "t0"
WHERE ("t0"."res2") = ("t0"."res3")
Now, suppose that, instead of the album and genre ids, we wanted the names, along with the name of the artist who produced the album. We can do this by just joining over the above
let albumGenreCnts = aggregate_ (\t -> ( group_ (trackAlbumId t)
, group_ (trackGenreId t)
, as_ @Int32 countAll_ )) $
all_ (track chinookDb)
withMaxCounts = withWindow_ (\(albumId, _, _) -> frame_ (partitionBy_ albumId) noOrder_ noBounds_)
(\(albumId, genreId, trackCnt) albumWindow ->
(albumId, genreId, trackCnt, max_ trackCnt `over_` albumWindow)) $
albumGenreCnts
albumAndRepresentativeGenres = filter_' (\(_, _, trackCnt, maxTrackCntPerAlbum) -> just_ trackCnt ==?. maxTrackCntPerAlbum) withMaxCounts
in do
(albumId, genreId, _, _) <- albumAndRepresentativeGenres
genre_ <- join_' (genre chinookDb) (\g -> genreId ==?. just_ (primaryKey g))
album_ <- join_' (album chinookDb) (\a -> albumId ==?. just_ (primaryKey a))
artistName <- fmap artistName $
join_ (artist chinookDb) (\a -> albumArtist album_ `references_` a)
pure ( artistName, albumTitle album_, genreName genre_ )
SELECT "t3"."Name" AS "res0",
"t2"."Title" AS "res1",
"t1"."Name" AS "res2"
FROM
(SELECT "t0"."AlbumId" AS "res0",
"t0"."GenreId" AS "res1",
COUNT(*) AS "res2",
MAX(COUNT(*)) OVER (PARTITION BY "t0"."AlbumId") AS "res3"
FROM "Track" AS "t0"
GROUP BY "t0"."AlbumId",
"t0"."GenreId") AS "t0"
INNER JOIN "Genre" AS "t1" ON ("t0"."res1") = ("t1"."GenreId")
INNER JOIN "Album" AS "t2" ON ("t0"."res0") = ("t2"."AlbumId")
INNER JOIN "Artist" AS "t3" ON ("t2"."ArtistId") = ("t3"."ArtistId")
WHERE ("t0"."res2") = ("t0"."res3")
Because we're using Beam, and Beam is just Haskell, we are free to compose
queries arbitrarily, and get the right results. However, the generated SQL is a
bit dense, and some backends may not optimize it well. Moreover, if we plan on
using albumAndRepresentativeGenres
more than once, then we'd like to express
this syntactically, rather than repeating the query wherever we need it. In
other words, we want to be able to signal the concept of re-use to the database
system.
The selectWith
function
We can rewrite the above query using common table expressions. Firstly, we'll
have to tell Beam that we want to write a SELECT
statement with a WITH
expression. We can do this by using selectWith
instead of
select
. selectWith
takes one argument, which is a monadic action returning a
query. The monad expected is the With
monad from
Database.Beam.Query.CTE
. Within this monad, bindings represent queries we
would like bound at the top-level.
So, let's start building our query using selectWith
. We'll want to return a
list, so we'll use runSelectReturningList
.
runSelectReturningList $ selectWith $ do
Binding subqueries with selecting
Now, we're in the With
monad, so we can start binding common table
expressions. We can bind multiple different types of results here. On all
backends that support CTEs, you can bind the results of SELECT
, but in some
backends, you can bind the results of INSERT
, DELETE
, UPDATE
, etc.
In our case, we would like to bind the results of a SELECT
statement, so we
can use the selecting
function. This function takes a query (represented by a
value of type Q
) and returns a ReusableQ
, which is a query value that can be
re-used elsewhere.
A ReusableQ
is parameterized by three paramaters
data ReusableQ be db res
be
-- This is the backend that the query is in. For example,Postgres
forbeam-postgres
orSqlite
forbeam-sqlite
.db
-- This is the type of the database the query is written overres
-- This is the type of each row returned by the query.
Notice that ReusableQ
has no scoping parameter like regular Q
expressions. This is because ReusableQ
values can be rescoped at any
level. We'll get to this in the next section.
We can introduce the albumAndRepresentativeGenres
query into the With
monad
using selecting
.
albumAndRepresentativeGenres <-
selecting $
let albumGenreCnts = aggregate_ (\t -> ( group_ (trackAlbumId t)
, group_ (trackGenreId t)
, as_ @Int32 countAll_ )) $
all_ (track chinookDb)
withMaxCounts = withWindow_ (\(albumId, _, _) -> frame_ (partitionBy_ albumId) noOrder_ noBounds_)
(\(albumId, genreId, trackCnt) albumWindow ->
(albumId, genreId, trackCnt, max_ trackCnt `over_` albumWindow)) $
albumGenreCnts
in filter_' (\(_, _, trackCnt, maxTrackCntPerAlbum) -> just_ trackCnt ==?. maxTrackCntPerAlbum) withMaxCounts
Using the query with reuse
Now that we have taken that query out of scope, we'll need the ability to refer
to its result. Queries are in the Q
monad, but albumAndRepresentativeGenres
has type ReusableQ
. In order to use the value in the Q
monad, we can use the
reuse
function. This utility function ensures that the query is able to be
used at any nesting level.
Also note that since we're in the With
monad, we'll need to inject our query
into that using pure
.
pure $ do
(albumId@(AlbumId albumIdColumn), genreId, _, _) <-
reuse albumGenreCountQ
genre_ <- join_' (genre chinookDb) (\g -> genreId ==?. just_ (primaryKey g))
album_ <- join_' (album chinookDb) (\a -> albumId ==?. just_ (primaryKey a))
artist_ <- join_ (artist chinookDb) (\a -> albumArtist album_ `references_` a)
pure ( artistName artist_, albumTitle album_, genreName genre_ )
Phew! Putting all of that together, and executing, we get...
void $ runSelectReturningList $ selectWith $ do
albumAndRepresentativeGenres <-
selecting $
let albumGenreCnts = aggregate_ (\t -> ( group_ (trackAlbumId t)
, group_ (trackGenreId t)
, as_ @Int32 countAll_ )) $
all_ (track chinookDb)
withMaxCounts = withWindow_ (\(albumId, _, _) -> frame_ (partitionBy_ albumId) noOrder_ noBounds_)
(\(albumId, genreId, trackCnt) albumWindow ->
(albumId, genreId, trackCnt, max_ trackCnt `over_` albumWindow)) $
albumGenreCnts
in filter_' (\(_, _, trackCnt, maxTrackCntPerAlbum) -> just_ trackCnt ==?. maxTrackCntPerAlbum) withMaxCounts
pure $ do
(albumId, genreId, _, _) <-
reuse albumAndRepresentativeGenres
genre_ <- join_' (genre chinookDb) (\g -> genreId ==?. just_ (primaryKey g))
album_ <- join_' (album chinookDb) (\a -> albumId ==?. just_ (primaryKey a))
artist_ <- join_ (artist chinookDb) (\a -> albumArtist album_ `references_` a)
pure ( artistName artist_, albumTitle album_, genreName genre_ )
WITH "cte0"("res0",
"res1",
"res2",
"res3") AS
(SELECT "cte0_0"."res0" AS "res0",
"cte0_0"."res1" AS "res1",
"cte0_0"."res2" AS "res2",
"cte0_0"."res3" AS "res3"
FROM
(SELECT "cte0_0"."AlbumId" AS "res0",
"cte0_0"."GenreId" AS "res1",
COUNT(*) AS "res2",
MAX(COUNT(*)) OVER (PARTITION BY "cte0_0"."AlbumId") AS "res3"
FROM "Track" AS "cte0_0"
GROUP BY "cte0_0"."AlbumId",
"cte0_0"."GenreId") AS "cte0_0"
WHERE ("cte0_0"."res2") = ("cte0_0"."res3"))
SELECT "t3"."Name" AS "res0",
"t2"."Title" AS "res1",
"t1"."Name" AS "res2"
FROM "cte0" AS "t0"
INNER JOIN "Genre" AS "t1" ON ("t0"."res1") = ("t1"."GenreId")
INNER JOIN "Album" AS "t2" ON ("t0"."res0") = ("t2"."AlbumId")
INNER JOIN "Artist" AS "t3" ON ("t2"."ArtistId") = ("t3"."ArtistId");
Refining our query
Notice that the filter_'
condition on the common table expression resulted in
a nasty subselect. We can get rid of that by introducing the filter_'
in the
outer query.
void $ runSelectReturningList $ selectWith $ do
albumAndRepresentativeGenres <-
selecting $
let albumGenreCnts = aggregate_ (\t -> ( group_ (trackAlbumId t)
, group_ (trackGenreId t)
, as_ @Int32 countAll_ )) $
all_ (track chinookDb)
in withWindow_ (\(albumId, _, _) -> frame_ (partitionBy_ albumId) noOrder_ noBounds_)
(\(albumId, genreId, trackCnt) albumWindow ->
(albumId, genreId, trackCnt, max_ trackCnt `over_` albumWindow)) $
albumGenreCnts
pure $ do
(albumId, genreId, _, _) <-
filter_' (\(_, _, trackCnt, maxTrackCntInAlbum) ->
just_ trackCnt ==?. maxTrackCntInAlbum) $
reuse albumAndRepresentativeGenres
genre_ <- join_' (genre chinookDb) (\g -> genreId ==?. just_ (primaryKey g))
album_ <- join_' (album chinookDb) (\a -> albumId ==?. just_ (primaryKey a))
artist_ <- join_ (artist chinookDb) (\a -> albumArtist album_ `references_` a)
pure ( artistName artist_, albumTitle album_, genreName genre_ )
WITH "cte0"("res0",
"res1",
"res2",
"res3") AS
(SELECT "cte0_0"."AlbumId" AS "res0",
"cte0_0"."GenreId" AS "res1",
COUNT(*) AS "res2",
MAX(COUNT(*)) OVER (PARTITION BY "cte0_0"."AlbumId") AS "res3"
FROM "Track" AS "cte0_0"
GROUP BY "cte0_0"."AlbumId",
"cte0_0"."GenreId")
SELECT "t3"."Name" AS "res0",
"t2"."Title" AS "res1",
"t1"."Name" AS "res2"
FROM "cte0" AS "t0"
INNER JOIN "Genre" AS "t1" ON ("t0"."res1") = ("t1"."GenreId")
INNER JOIN "Album" AS "t2" ON ("t0"."res0") = ("t2"."AlbumId")
INNER JOIN "Artist" AS "t3" ON ("t2"."ArtistId") = ("t3"."ArtistId")
WHERE ("t0"."res2") = ("t0"."res3");
To demonstrate that we can still use all beam's features, let's only return results for albums with tracks of more than one genre. We can do this by using a quantified comparison operator on the album id column, and a subquery to find all albums with more than one genre.
void $ runSelectReturningList $ selectWith $ do
albumAndRepresentativeGenres <-
selecting $
let albumGenreCnts = aggregate_ (\t -> ( group_ (trackAlbumId t)
, group_ (trackGenreId t)
, as_ @Int32 countAll_ )) $
all_ (track chinookDb)
in withWindow_ (\(albumId, _, _) -> frame_ (partitionBy_ albumId) noOrder_ noBounds_)
(\(albumId, genreId, trackCnt) albumWindow ->
(albumId, genreId, trackCnt, max_ trackCnt `over_` albumWindow)) $
albumGenreCnts
pure $ do
(albumId@(AlbumId albumIdColumn), genreId, _, _) <-
filter_' (\(_, _, trackCnt, maxTrackCntInAlbum) ->
just_ trackCnt ==?. maxTrackCntInAlbum) $
reuse albumAndRepresentativeGenres
genre_ <- join_' (genre chinookDb) (\g -> genreId ==?. just_ (primaryKey g))
album_ <- join_' (album chinookDb) (\a -> albumId ==?. just_ (primaryKey a))
artist_ <- join_ (artist chinookDb) (\a -> albumArtist album_ `references_` a)
-- Filter out all albums with tracks of only one genre
guard_' (albumIdColumn ==*.
anyOf_ (orderBy_ asc_ $ fmap (\(AlbumId albumIdRaw, _) -> albumIdRaw) $
filter_ (\(_, genreCntByAlbum) -> genreCntByAlbum >. 1) $
aggregate_ (\t -> let GenreId genreId = trackGenreId t
in ( group_ (trackAlbumId t)
, as_ @Int32 (countOver_ distinctInGroup_ genreId))) $
all_ (track chinookDb)))
pure ( artistName artist_, albumTitle album_, genreName genre_ )
WITH "cte0"("res0",
"res1",
"res2",
"res3") AS
(SELECT "cte0_0"."AlbumId" AS "res0",
"cte0_0"."GenreId" AS "res1",
COUNT(*) AS "res2",
MAX(COUNT(*)) OVER (PARTITION BY "cte0_0"."AlbumId") AS "res3"
FROM "Track" AS "cte0_0"
GROUP BY "cte0_0"."AlbumId",
"cte0_0"."GenreId")
SELECT "t3"."Name" AS "res0",
"t2"."Title" AS "res1",
"t1"."Name" AS "res2"
FROM "cte0" AS "t0"
INNER JOIN "Genre" AS "t1" ON ("t0"."res1") = ("t1"."GenreId")
INNER JOIN "Album" AS "t2" ON ("t0"."res0") = ("t2"."AlbumId")
INNER JOIN "Artist" AS "t3" ON ("t2"."ArtistId") = ("t3"."ArtistId")
WHERE (("t0"."res2") = ("t0"."res3"))
AND (("t0"."res0") = ANY
(SELECT "sub_t0"."AlbumId" AS "res0"
FROM "Track" AS "sub_t0"
GROUP BY "sub_t0"."AlbumId"
HAVING (COUNT(DISTINCT "sub_t0"."GenreId")) > (1)
ORDER BY "sub_t0"."AlbumId" ASC));
without CTEs,
do let albumGenreCnts = aggregate_ (\t -> ( group_ (trackAlbumId t)
, group_ (trackGenreId t)
, as_ @Int32 countAll_ )) $
all_ (track chinookDb)
albumAndRepresentativeGenres =
withWindow_ (\(albumId, _, _) -> frame_ (partitionBy_ albumId) noOrder_ noBounds_)
(\(albumId, genreId, trackCnt) albumWindow ->
(albumId, genreId, trackCnt, max_ trackCnt `over_` albumWindow)) $
albumGenreCnts
(albumId@(AlbumId albumIdColumn), genreId, _, _) <-
filter_' (\(_, _, trackCnt, maxTrackCntInAlbum) ->
just_ trackCnt ==?. maxTrackCntInAlbum) $
albumAndRepresentativeGenres
genre_ <- join_' (genre chinookDb) (\g -> genreId ==?. just_ (primaryKey g))
album_ <- join_' (album chinookDb) (\a -> albumId ==?. just_ (primaryKey a))
artist_ <- join_ (artist chinookDb) (\a -> albumArtist album_ `references_` a)
-- Filter out all albums with tracks of only one genre
guard_' (albumIdColumn ==*.
anyOf_ (orderBy_ asc_ $ fmap (\(AlbumId albumIdRaw, _) -> albumIdRaw) $
filter_ (\(_, genreCntByAlbum) -> genreCntByAlbum >. 1) $
aggregate_ (\t -> let GenreId genreId = trackGenreId t
in ( group_ (trackAlbumId t)
, as_ @Int32 (countOver_ distinctInGroup_ genreId))) $
all_ (track chinookDb)))
pure ( artistName artist_, albumTitle album_, genreName genre_ )
SELECT "t3"."Name" AS "res0",
"t2"."Title" AS "res1",
"t1"."Name" AS "res2"
FROM
(SELECT "t0"."AlbumId" AS "res0",
"t0"."GenreId" AS "res1",
COUNT(*) AS "res2",
MAX(COUNT(*)) OVER (PARTITION BY "t0"."AlbumId") AS "res3"
FROM "Track" AS "t0"
GROUP BY "t0"."AlbumId",
"t0"."GenreId") AS "t0"
INNER JOIN "Genre" AS "t1" ON ("t0"."res1") = ("t1"."GenreId")
INNER JOIN "Album" AS "t2" ON ("t0"."res0") = ("t2"."AlbumId")
INNER JOIN "Artist" AS "t3" ON ("t2"."ArtistId") = ("t3"."ArtistId")
WHERE (("t0"."res2") = ("t0"."res3"))
AND (("t0"."res0") = ANY
(SELECT "sub_t0"."AlbumId" AS "res0"
FROM "Track" AS "sub_t0"
GROUP BY "sub_t0"."AlbumId"
HAVING (COUNT(DISTINCT "sub_t0"."GenreId")) > (1)
ORDER BY "sub_t0"."AlbumId" ASC))
Reusing queries multiple times
Suppose we wanted to ask the question, "which albums have tracks under the same genre?". One easy way of doing this is to do multiple self-joins.
We can also use a common table expression.
Let's call albums A
and B
"genre-related" if A
and B
contain at least
one track under the same genre. Then, the queries above answer if A
and B
are "genre-related". A natural extension of the question above then is "Which
albums are genre-related to the same album?". For example, suppose A
has Jazz
and Rock tracks, B
has Rock and Country tracks, and C
has Country and Blues
tracks. Then, (A, B)
and (B, C)
will both be results of the query above, but
(A, C)
will not. We can output tuples like (A, C)
by self-joining on the
results of the same CTE.
Recursive queries
Okay, so the next natural extension is to extend this relation by relating any
two albums A
and D
where there exist B
and C
such that A
is related to
B
, B
is related to C
, and C
is related to D
. We can keep extending
this query forever, but the queries above require that we specify all lengths
we're interested in. Conceptually, we could express in Haskell an infinite query:
However, the query generator will loop when serializing this query, because database systems don't operate on laziness they way Haskell does!
To solve this, some RDBMS systems offer "recursive" queries 1. The canonical example of a recursive query is the Fibonacci sequence:
void $ runSelectReturningList $ selectWith $ do
rec fib <- selecting (pure (as_ @Int32 0, as_ @Int32 1) `union_`
(do (a, b) <- reuse fib
guard_ (b <. 1000)
pure (b, a + b)))
pure (reuse fib)
WITH RECURSIVE "cte0"("res0",
"res1") AS (
(SELECT 0 AS "res0",
1 AS "res1")
UNION
(SELECT "cte0_0"."res1" AS "res0",
("cte0_0"."res0") + ("cte0_0"."res1") AS "res1"
FROM "cte0" AS "cte0_0"
WHERE ("cte0_0"."res1") < (1000)))
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1"
FROM "cte0" AS "t0";
-
"Recursive" here is in quotes, because this is not true, general recursion, but rather iteration. Still the term "recursive" has stuck around, so Beam adopts the convention. ↩