Aggregates
You can use the aggregate_ function to group your result set and compute
aggregates within the group. You can think of aggregate_ as a souped up
version of Haskell's groupBy.
You use aggregate_ by specifying an underlying query to run and a function
that produces an aggregation projection. An aggregation projection is either a
value of type QAgg syntax s a, a value of type QGroupExpr syntax s a, or a
tuple of such values. Any QGenExpr that uses an aggregate function is
automatically assigned the QAgg syntax s a type. Any QGenExpr that contains
the group_ combinator is given the type QGroupExpr.
During query generation, the expressions of type QGroupExpr are added to the
GROUP BY clause, and expressions of type QAgg are treated as aggregation to
be computed.
The result of the aggregate_ lifts all the QAggs and QGroupExprs to
'regular' value-level QExprs, so the result of aggregate_ can be used in
expressions as usual.
Simple aggregate usage
Suppose we wanted to count the number of genres in our database.
aggregate_ (\_ -> as_ @Int32 countAll_) (all_ (genre chinookDb))
SELECT COUNT(*) AS "res0"
FROM "Genre" AS "t0"
SELECT COUNT(*) AS "res0"
FROM "Genre" AS "t0";
-- With values: []
Adding a GROUP BY clause
Above, SQL used the default grouping, which puts all rows in one group. We can
also specify columns and expressions to group by. For example, if we wanted to
count the number of tracks for each genre, we can use the group_ function to
group by the genre.
aggregate_ (\(genre, track) -> (group_ genre, as_ @Int32 $ count_ (trackId track))) $ do
g <- all_ (genre chinookDb)
t <- genreTracks g
pure (g, t)
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT("t1"."TrackId") AS "res2"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."GenreId")
GROUP BY "t0"."GenreId",
"t0"."Name"
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT("t1"."TrackId") AS "res2"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
WHEN (("t1"."GenreId") IS NULL)
AND (("t0"."GenreId") IS NULL) THEN ?
WHEN (("t1"."GenreId") IS NULL)
OR (("t0"."GenreId") IS NULL) THEN ?
ELSE ("t1"."GenreId")=("t0"."GenreId")
END
GROUP BY "t0"."GenreId",
"t0"."Name";
-- With values: [SQLInteger 1,SQLInteger 0]
Tip
count_ can return any Integral type. Adding the explicit as_ @Int32 above
prevents an ambiguous type error.
SQL compatibility
Above, we demonstrated the use of count_ and countAll_ which map to the
appropriate SQL aggregates. Beam supports all of the other standard SQL92
aggregates.
In general, SQL aggregates are named similarly in beam and SQL. As usual, the
aggregate function in beam is suffixed by an underscore. For example, sum_
corresponds to the SQL aggregate SUM.
SQL also allows you to specify set quantifiers for each aggregate. Beam supports
these as well. By convention, versions of aggregates that take in an optional
set quantifier are suffixed by Over. For example SUM(DISTINCT x) can be
written sumOver_ distinctInGroup_ x. The universally quantified version of
each aggregate is obtained by using the allInGroup_ quantifier. Thus, sum_ ==
sumOver_ allInGroup_. Because ALL is the default set quantifier, beam does
not typically generate it in queries. If, for some reason, you would like beam
to be explicit about it, you can use the allInGroupExplicitly_ quantifier.
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (fromMaybe_ 0 (trackMilliseconds track))) `div_` 1000)) $ do
g <- all_ (genre chinookDb)
t <- genreTracks g
pure (g, t)
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL COALESCE("t1"."Milliseconds", 0)), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."GenreId")
GROUP BY "t0"."GenreId",
"t0"."Name"
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL COALESCE("t1"."Milliseconds", ?)), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
WHEN (("t1"."GenreId") IS NULL)
AND (("t0"."GenreId") IS NULL) THEN ?
WHEN (("t1"."GenreId") IS NULL)
OR (("t0"."GenreId") IS NULL) THEN ?
ELSE ("t1"."GenreId")=("t0"."GenreId")
END
GROUP BY "t0"."GenreId",
"t0"."Name";
-- With values: [SQLInteger 0,SQLInteger 0,SQLInteger 1000,SQLInteger 1,SQLInteger 0]
Tip
Most Beam aggregates (count_ and countAll_ being an exception) return a
Maybe value, because aggregating over no rows in SQL returns a NULL
value. Use fromMaybe_ or coalesce_ to supply a default value in this case.
The beam-core library supports the standard SQL aggregation functions.
Individual backends are likely to support the full range of aggregates available
on that backend (if not, please send a bug report).
| SQL Aggregate | Relevant standard | Unquantified beam function | Quantified beam function |
|---|---|---|---|
| SUM | SQL92 | sum_ |
sumOver_ |
| MIN | SQL92 | min_ |
minOver_ |
| MAX | SQL92 | max_ |
maxOver_ |
| AVG | SQL92 | avg_ |
avgOver_ |
| COUNT(x) | SQL92 | count_ |
countOver_ |
| COUNT(*) | SQL92 | countAll_ |
N/A |
| EVERY(x) | SQL99 | every_ |
everyOver_ |
| ANY(x)/SOME(x) | SQL99 | any_, some_ |
anyOver_, someOver_ |
The HAVING clause
SQL allows users to specify a HAVING condition to filter results based on the
computed result of an aggregate. Beam fully supports HAVING clauses, but does
not use any special syntax. Simply use filter_ or guard_ as usual, and beam
will add a HAVING clause if it forms legal SQL. Otherwise, beam will create a
subselect and add a WHERE clause. Either way, this is transparent to the user.
-- Only return results for genres whose total track length is over 5 minutes
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)
SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?);
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000]
Beam will also handle the filter_ correctly in the presence of more
complicated queries. For example, we can now join our aggregate on genres back
over tracks.
-- Only return results for genres whose total track length is over 5 minutes
filter_ (\(genre, track, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
do (genre, priceCnt, trackLength) <-
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
track <- genreTracks genre
pure (genre, track, priceCnt, trackLength)
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t1"."TrackId" AS "res2",
"t1"."Name" AS "res3",
"t1"."AlbumId" AS "res4",
"t1"."MediaTypeId" AS "res5",
"t1"."GenreId" AS "res6",
"t1"."Composer" AS "res7",
"t1"."Milliseconds" AS "res8",
"t1"."Bytes" AS "res9",
"t1"."UnitPrice" AS "res10",
"t0"."res2" AS "res11",
"t0"."res3" AS "res12"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name") AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."res0")
WHERE ("t0"."res3") >= (300000)
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t1"."TrackId" AS "res2",
"t1"."Name" AS "res3",
"t1"."AlbumId" AS "res4",
"t1"."MediaTypeId" AS "res5",
"t1"."GenreId" AS "res6",
"t1"."Composer" AS "res7",
"t1"."Milliseconds" AS "res8",
"t1"."Bytes" AS "res9",
"t1"."UnitPrice" AS "res10",
"t0"."res2" AS "res11",
"t0"."res3" AS "res12"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name") AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
WHEN (("t1"."GenreId") IS NULL)
AND (("t0"."res0") IS NULL) THEN ?
WHEN (("t1"."GenreId") IS NULL)
OR (("t0"."res0") IS NULL) THEN ?
ELSE ("t1"."GenreId")=("t0"."res0")
END
WHERE ("t0"."res3")>=(?);
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 1,SQLInteger 0,SQLInteger 300000]
The position of filter_ changes the code generated. Above, the filter_
produced a WHERE clause on the outermost SELECT. If instead, we put the
filter_ clause right outside the aggregate_, beam will produce a HAVING clause instead.
-- Only return results for genres whose total track length is over 5 minutes
do (genre, priceCnt, trackLength) <-
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
track <- genreTracks genre
pure (genre, track, priceCnt, trackLength)
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t1"."TrackId" AS "res2",
"t1"."Name" AS "res3",
"t1"."AlbumId" AS "res4",
"t1"."MediaTypeId" AS "res5",
"t1"."GenreId" AS "res6",
"t1"."Composer" AS "res7",
"t1"."Milliseconds" AS "res8",
"t1"."Bytes" AS "res9",
"t1"."UnitPrice" AS "res10",
"t0"."res2" AS "res11",
"t0"."res3" AS "res12"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)) AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."res0")
SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t1"."TrackId" AS "res2",
"t1"."Name" AS "res3",
"t1"."AlbumId" AS "res4",
"t1"."MediaTypeId" AS "res5",
"t1"."GenreId" AS "res6",
"t1"."Composer" AS "res7",
"t1"."Milliseconds" AS "res8",
"t1"."Bytes" AS "res9",
"t1"."UnitPrice" AS "res10",
"t0"."res2" AS "res11",
"t0"."res3" AS "res12"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?)) AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
WHEN (("t1"."GenreId") IS NULL)
AND (("t0"."res0") IS NULL) THEN ?
WHEN (("t1"."GenreId") IS NULL)
OR (("t0"."res0") IS NULL) THEN ?
ELSE ("t1"."GenreId")=("t0"."res0")
END;
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]
Due to the monadic structure, putting the filtered aggregate as the second clause in the JOIN causes the HAVING to be floated out, because the compiler can't prove that the conditional expression only depends on the results of the aggregate.
-- Only return results for genres whose total track length is over 5 minutes
do track_ <- all_ (track chinookDb)
(genre, priceCnt, trackLength) <-
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
guard_ (trackGenreId track_ ==. just_ (pk genre))
pure (genre, track_, priceCnt, trackLength)
SELECT "t1"."res0" AS "res0",
"t1"."res1" AS "res1",
"t0"."TrackId" AS "res2",
"t0"."Name" AS "res3",
"t0"."AlbumId" AS "res4",
"t0"."MediaTypeId" AS "res5",
"t0"."GenreId" AS "res6",
"t0"."Composer" AS "res7",
"t0"."Milliseconds" AS "res8",
"t0"."Bytes" AS "res9",
"t0"."UnitPrice" AS "res10",
"t1"."res2" AS "res11",
"t1"."res3" AS "res12"
FROM "Track" AS "t0"
CROSS JOIN
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name") AS "t1"
WHERE (("t1"."res3") >= (300000))
AND (("t0"."GenreId") IS NOT DISTINCT
FROM ("t1"."res0"))
SELECT "t1"."res0" AS "res0",
"t1"."res1" AS "res1",
"t0"."TrackId" AS "res2",
"t0"."Name" AS "res3",
"t0"."AlbumId" AS "res4",
"t0"."MediaTypeId" AS "res5",
"t0"."GenreId" AS "res6",
"t0"."Composer" AS "res7",
"t0"."Milliseconds" AS "res8",
"t0"."Bytes" AS "res9",
"t0"."UnitPrice" AS "res10",
"t1"."res2" AS "res11",
"t1"."res3" AS "res12"
FROM "Track" AS "t0"
INNER JOIN
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name") AS "t1"
WHERE (("t1"."res3")>=(?))
AND (CASE
WHEN (("t0"."GenreId") IS NULL)
AND (("t1"."res0") IS NULL) THEN ?
WHEN (("t0"."GenreId") IS NULL)
OR (("t1"."res0") IS NULL) THEN ?
ELSE ("t0"."GenreId")=("t1"."res0")
END);
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]
You can prove to the compiler that the filter_ should generate a having by
using the subselect_ combinator.
-- Only return results for genres whose total track length is over 5 minutes
do track_ <- all_ (track chinookDb)
(genre, priceCnt, trackLength) <-
subselect_ $
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
( group_ genre
, as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
, fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
guard_ (trackGenreId track_ ==. just_ (pk genre))
pure (genre, track_, priceCnt, trackLength)
SELECT "t1"."res0" AS "res0",
"t1"."res1" AS "res1",
"t0"."TrackId" AS "res2",
"t0"."Name" AS "res3",
"t0"."AlbumId" AS "res4",
"t0"."MediaTypeId" AS "res5",
"t0"."GenreId" AS "res6",
"t0"."Composer" AS "res7",
"t0"."Milliseconds" AS "res8",
"t0"."Bytes" AS "res9",
"t0"."UnitPrice" AS "res10",
"t1"."res2" AS "res11",
"t1"."res3" AS "res12"
FROM "Track" AS "t0"
CROSS JOIN
(SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t0"."res2" AS "res2",
"t0"."res3" AS "res3"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)) AS "t0") AS "t1"
WHERE ("t0"."GenreId") IS NOT DISTINCT
FROM ("t1"."res0")
SELECT "t1"."res0" AS "res0",
"t1"."res1" AS "res1",
"t0"."TrackId" AS "res2",
"t0"."Name" AS "res3",
"t0"."AlbumId" AS "res4",
"t0"."MediaTypeId" AS "res5",
"t0"."GenreId" AS "res6",
"t0"."Composer" AS "res7",
"t0"."Milliseconds" AS "res8",
"t0"."Bytes" AS "res9",
"t0"."UnitPrice" AS "res10",
"t1"."res2" AS "res11",
"t1"."res3" AS "res12"
FROM "Track" AS "t0"
INNER JOIN
(SELECT "t0"."res0" AS "res0",
"t0"."res1" AS "res1",
"t0"."res2" AS "res2",
"t0"."res3" AS "res3"
FROM
(SELECT "t0"."GenreId" AS "res0",
"t0"."Name" AS "res1",
COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
(COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
"t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?)) AS "t0") AS "t1"
WHERE CASE
WHEN (("t0"."GenreId") IS NULL)
AND (("t1"."res0") IS NULL) THEN ?
WHEN (("t0"."GenreId") IS NULL)
OR (("t1"."res0") IS NULL) THEN ?
ELSE ("t0"."GenreId")=("t1"."res0")
END;
-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]