Aggregates
You can use the aggregate_
function to group your result set and compute
aggregates within the group. You can think of aggregate_
as a souped up
version of Haskell's groupBy
.
You use aggregate_
by specifying an underlying query to run and a function
that produces an aggregation projection. An aggregation projection is either a
value of type QAgg syntax s a
, a value of type QGroupExpr syntax s a
, or a
tuple of such values. Any QGenExpr
that uses an aggregate function is
automatically assigned the QAgg syntax s a
type. Any QGenExpr
that contains
the group_
combinator is given the type QGroupExpr
.
During query generation, the expressions of type QGroupExpr
are added to the
GROUP BY
clause, and expressions of type QAgg
are treated as aggregation to
be computed.
The result of the aggregate_
lifts all the QAgg
s and QGroupExpr
s to
'regular' value-level QExpr
s, so the result of aggregate_
can be used in
expressions as usual.
Simple aggregate usage
Suppose we wanted to count the number of genres in our database.
aggregate_ (\_ -> as_ @Int32 countAll_) (all_ (genre chinookDb))
SELECT COUNT(*) AS "res0"
FROM "Genre" AS "t0"
SELECT COUNT(*) AS "res0"
FROM "Genre" AS "t0";
-- With values: []
Adding a GROUP BY clause
Above, SQL used the default grouping, which puts all rows in one group. We can
also specify columns and expressions to group by. For example, if we wanted to
count the number of tracks for each genre, we can use the group_
function to
group by the genre.
aggregate_ (\(genre, track) -> (group_ genre, as_ @Int32 $ count_ (trackId track))) $ do
g <- all_ (genre chinookDb)
t <- genreTracks g
pure (g, t)
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT("t1"."TrackId") AS "res2"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."GenreId")
GROUP BY "t0"."GenreId",
"t0"."Name"
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT("t1"."TrackId") AS "res2"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
WHEN (("t1"."GenreId") IS NULL)
AND (("t0"."GenreId") IS NULL) THEN ?
WHEN (("t1"."GenreId") IS NULL)
OR (("t0"."GenreId") IS NULL) THEN ?
ELSE ("t1"."GenreId")=("t0"."GenreId")
END
GROUP BY "t0"."GenreId",
"t0"."Name";
-- With values: [SQLInteger 1,SQLInteger 0]
Tip
count_
can return any Integral
type. Adding the explicit as_ @Int32
above
prevents an ambiguous type error.
SQL compatibility
Above, we demonstrated the use of count_
and countAll_
which map to the
appropriate SQL aggregates. Beam supports all of the other standard SQL92
aggregates.
In general, SQL aggregates are named similarly in beam and SQL. As usual, the
aggregate function in beam is suffixed by an underscore. For example, sum_
corresponds to the SQL aggregate SUM
.
SQL also allows you to specify set quantifiers for each aggregate. Beam supports
these as well. By convention, versions of aggregates that take in an optional
set quantifier are suffixed by Over
. For example SUM(DISTINCT x)
can be
written sumOver_ distinctInGroup_ x
. The universally quantified version of
each aggregate is obtained by using the allInGroup_
quantifier. Thus, sum_ ==
sumOver_ allInGroup_
. Because ALL
is the default set quantifier, beam does
not typically generate it in queries. If, for some reason, you would like beam
to be explicit about it, you can use the allInGroupExplicitly_
quantifier.
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (fromMaybe_ 0 (trackMilliseconds track))) `div_` 1000)) $ do
g <- all_ (genre chinookDb)
t <- genreTracks g
pure (g, t)
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL COALESCE("t1"."Milliseconds", 0)), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."GenreId")
GROUP BY "t0"."GenreId",
"t0"."Name"
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL COALESCE("t1"."Milliseconds", ?)), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
WHEN (("t1"."GenreId") IS NULL)
AND (("t0"."GenreId") IS NULL) THEN ?
WHEN (("t1"."GenreId") IS NULL)
OR (("t0"."GenreId") IS NULL) THEN ?
ELSE ("t1"."GenreId")=("t0"."GenreId")
END
GROUP BY "t0"."GenreId",
"t0"."Name";
-- With values: [SQLInteger 0,SQLInteger 0,SQLInteger 1000,SQLInteger 1,SQLInteger 0]
Tip
Most Beam aggregates (count_
and countAll_
being an exception) return a
Maybe
value, because aggregating over no rows in SQL returns a NULL
value. Use fromMaybe_
or coalesce_
to supply a default value in this case.
The beam-core
library supports the standard SQL aggregation functions.
Individual backends are likely to support the full range of aggregates available
on that backend (if not, please send a bug report).
SQL Aggregate | Relevant standard | Unquantified beam function | Quantified beam function |
---|---|---|---|
SUM | SQL92 | sum_ |
sumOver_ |
MIN | SQL92 | min_ |
minOver_ |
MAX | SQL92 | max_ |
maxOver_ |
AVG | SQL92 | avg_ |
avgOver_ |
COUNT(x) | SQL92 | count_ |
countOver_ |
COUNT(*) | SQL92 | countAll_ |
N/A |
EVERY(x) | SQL99 | every_ |
everyOver_ |
ANY(x)/SOME(x) | SQL99 | any_ , some_ |
anyOver_ , someOver_ |
The HAVING
clause
SQL allows users to specify a HAVING
condition to filter results based on the
computed result of an aggregate. Beam fully supports HAVING
clauses, but does
not use any special syntax. Simply use filter_
or guard_
as usual, and beam
will add a HAVING
clause if it forms legal SQL. Otherwise, beam will create a
subselect and add a WHERE
clause. Either way, this is transparent to the user.
-- Only return results for genres whose total track length is over 5 minutes
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?);
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000]
Beam will also handle the filter_
correctly in the presence of more
complicated queries. For example, we can now join our aggregate on genres back
over tracks.
-- Only return results for genres whose total track length is over 5 minutes
filter_ (\(genre, track, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
do (genre, priceCnt, trackLength) <-
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
track <- genreTracks genre
pure (genre, track, priceCnt, trackLength)
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t1"."TrackId" AS "res2",
"t1"."Name" AS "res3",
"t1"."AlbumId" AS "res4",
"t1"."MediaTypeId" AS "res5",
"t1"."GenreId" AS "res6",
"t1"."Composer" AS "res7",
"t1"."Milliseconds" AS "res8",
"t1"."Bytes" AS "res9",
"t1"."UnitPrice" AS "res10",
"t0"."res2" AS "res11",
"t0"."res3" AS "res12"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name") AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."res0")
WHERE ("t0"."res3") >= (300000)
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t1"."TrackId" AS "res2",
"t1"."Name" AS "res3",
"t1"."AlbumId" AS "res4",
"t1"."MediaTypeId" AS "res5",
"t1"."GenreId" AS "res6",
"t1"."Composer" AS "res7",
"t1"."Milliseconds" AS "res8",
"t1"."Bytes" AS "res9",
"t1"."UnitPrice" AS "res10",
"t0"."res2" AS "res11",
"t0"."res3" AS "res12"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name") AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
WHEN (("t1"."GenreId") IS NULL)
AND (("t0"."res0") IS NULL) THEN ?
WHEN (("t1"."GenreId") IS NULL)
OR (("t0"."res0") IS NULL) THEN ?
ELSE ("t1"."GenreId")=("t0"."res0")
END
WHERE ("t0"."res3")>=(?);
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 1,SQLInteger 0,SQLInteger 300000]
The position of filter_
changes the code generated. Above, the filter_
produced a WHERE
clause on the outermost SELECT
. If instead, we put the
filter_
clause right outside the aggregate_
, beam will produce a HAVING
clause instead.
-- Only return results for genres whose total track length is over 5 minutes
do (genre, priceCnt, trackLength) <-
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
track <- genreTracks genre
pure (genre, track, priceCnt, trackLength)
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t1"."TrackId" AS "res2",
"t1"."Name" AS "res3",
"t1"."AlbumId" AS "res4",
"t1"."MediaTypeId" AS "res5",
"t1"."GenreId" AS "res6",
"t1"."Composer" AS "res7",
"t1"."Milliseconds" AS "res8",
"t1"."Bytes" AS "res9",
"t1"."UnitPrice" AS "res10",
"t0"."res2" AS "res11",
"t0"."res3" AS "res12"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)) AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."res0")
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t1"."TrackId" AS "res2",
"t1"."Name" AS "res3",
"t1"."AlbumId" AS "res4",
"t1"."MediaTypeId" AS "res5",
"t1"."GenreId" AS "res6",
"t1"."Composer" AS "res7",
"t1"."Milliseconds" AS "res8",
"t1"."Bytes" AS "res9",
"t1"."UnitPrice" AS "res10",
"t0"."res2" AS "res11",
"t0"."res3" AS "res12"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?)) AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
WHEN (("t1"."GenreId") IS NULL)
AND (("t0"."res0") IS NULL) THEN ?
WHEN (("t1"."GenreId") IS NULL)
OR (("t0"."res0") IS NULL) THEN ?
ELSE ("t1"."GenreId")=("t0"."res0")
END;
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]
Due to the monadic structure, putting the filtered aggregate as the second clause in the JOIN causes the HAVING to be floated out, because the compiler can't prove that the conditional expression only depends on the results of the aggregate.
-- Only return results for genres whose total track length is over 5 minutes
do track_ <- all_ (track chinookDb)
(genre, priceCnt, trackLength) <-
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
guard_ (trackGenreId track_ ==. just_ (pk genre))
pure (genre, track_, priceCnt, trackLength)
SELECT "t1"."res0" AS "res0",
"t1"."res1" AS "res1",
"t0"."TrackId" AS "res2",
"t0"."Name" AS "res3",
"t0"."AlbumId" AS "res4",
"t0"."MediaTypeId" AS "res5",
"t0"."GenreId" AS "res6",
"t0"."Composer" AS "res7",
"t0"."Milliseconds" AS "res8",
"t0"."Bytes" AS "res9",
"t0"."UnitPrice" AS "res10",
"t1"."res2" AS "res11",
"t1"."res3" AS "res12"
FROM "Track" AS "t0"
CROSS JOIN
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name") AS "t1"
WHERE (("t1"."res3") >= (300000))
AND (("t0"."GenreId") IS NOT DISTINCT
FROM ("t1"."res0"))
SELECT "t1"."res0" AS "res0",
"t1"."res1" AS "res1",
"t0"."TrackId" AS "res2",
"t0"."Name" AS "res3",
"t0"."AlbumId" AS "res4",
"t0"."MediaTypeId" AS "res5",
"t0"."GenreId" AS "res6",
"t0"."Composer" AS "res7",
"t0"."Milliseconds" AS "res8",
"t0"."Bytes" AS "res9",
"t0"."UnitPrice" AS "res10",
"t1"."res2" AS "res11",
"t1"."res3" AS "res12"
FROM "Track" AS "t0"
INNER JOIN
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name") AS "t1"
WHERE (("t1"."res3")>=(?))
AND (CASE
WHEN (("t0"."GenreId") IS NULL)
AND (("t1"."res0") IS NULL) THEN ?
WHEN (("t0"."GenreId") IS NULL)
OR (("t1"."res0") IS NULL) THEN ?
ELSE ("t0"."GenreId")=("t1"."res0")
END);
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]
You can prove to the compiler that the filter_
should generate a having by
using the subselect_
combinator.
-- Only return results for genres whose total track length is over 5 minutes
do track_ <- all_ (track chinookDb)
(genre, priceCnt, trackLength) <-
subselect_ $
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
guard_ (trackGenreId track_ ==. just_ (pk genre))
pure (genre, track_, priceCnt, trackLength)
SELECT "t1"."res0" AS "res0",
"t1"."res1" AS "res1",
"t0"."TrackId" AS "res2",
"t0"."Name" AS "res3",
"t0"."AlbumId" AS "res4",
"t0"."MediaTypeId" AS "res5",
"t0"."GenreId" AS "res6",
"t0"."Composer" AS "res7",
"t0"."Milliseconds" AS "res8",
"t0"."Bytes" AS "res9",
"t0"."UnitPrice" AS "res10",
"t1"."res2" AS "res11",
"t1"."res3" AS "res12"
FROM "Track" AS "t0"
CROSS JOIN
(SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t0"."res2" AS "res2",
"t0"."res3" AS "res3"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)) AS "t0") AS "t1"
WHERE ("t0"."GenreId") IS NOT DISTINCT
FROM ("t1"."res0")
SELECT "t1"."res0" AS "res0",
"t1"."res1" AS "res1",
"t0"."TrackId" AS "res2",
"t0"."Name" AS "res3",
"t0"."AlbumId" AS "res4",
"t0"."MediaTypeId" AS "res5",
"t0"."GenreId" AS "res6",
"t0"."Composer" AS "res7",
"t0"."Milliseconds" AS "res8",
"t0"."Bytes" AS "res9",
"t0"."UnitPrice" AS "res10",
"t1"."res2" AS "res11",
"t1"."res3" AS "res12"
FROM "Track" AS "t0"
INNER JOIN
(SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t0"."res2" AS "res2",
"t0"."res3" AS "res3"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?)) AS "t0") AS "t1"
WHERE CASE
WHEN (("t0"."GenreId") IS NULL)
AND (("t1"."res0") IS NULL) THEN ?
WHEN (("t0"."GenreId") IS NULL)
OR (("t1"."res0") IS NULL) THEN ?
ELSE ("t0"."GenreId")=("t1"."res0")
END;
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]