Aggregates

You can use the aggregate_ function to group your result set and compute aggregates within the group. You can think of aggregate_ as a souped up version of Haskell's groupBy.

You use aggregate_ by specifying an underlying query to run and a function that produces an aggregation projection. An aggregation projection is either a value of type QAgg syntax s a, a value of type QGroupExpr syntax s a, or a tuple of such values. Any QGenExpr that uses an aggregate function is automatically assigned the QAgg syntax s a type. Any QGenExpr that contains the group_ combinator is given the type QGroupExpr.

During query generation, the expressions of type QGroupExpr are added to the GROUP BY clause, and expressions of type QAgg are treated as aggregation to be computed.

The result of the aggregate_ lifts all the QAggs and QGroupExprs to 'regular' value-level QExprs, so the result of aggregate_ can be used in expressions as usual.

Simple aggregate usage

Suppose we wanted to count the number of genres in our database.

aggregate_ (\_ -> as_ @Int32 countAll_) (all_ (genre chinookDb))

SELECT COUNT(*) AS "res0"
FROM "Genre" AS "t0"

SELECT COUNT(*) AS "res0"
FROM "Genre" AS "t0";

-- With values: []

Adding a GROUP BY clause

Above, SQL used the default grouping, which puts all rows in one group. We can also specify columns and expressions to group by. For example, if we wanted to count the number of tracks for each genre, we can use the group_ function to group by the genre.

aggregate_ (\(genre, track) -> (group_ genre, as_ @Int32 $ count_ (trackId track))) $ do
  g <- all_ (genre chinookDb)
  t <- genreTracks g
  pure (g, t)

SELECT "t0"."GenreId" AS "res0",
       "t0"."Name" AS "res1",
       COUNT("t1"."TrackId") AS "res2"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."GenreId")
GROUP BY "t0"."GenreId",
         "t0"."Name"

SELECT "t0"."GenreId" AS "res0",
       "t0"."Name" AS "res1",
       COUNT("t1"."TrackId") AS "res2"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
                                 WHEN (("t1"."GenreId") IS NULL)
                                      AND (("t0"."GenreId") IS NULL) THEN ?
                                 WHEN (("t1"."GenreId") IS NULL)
                                      OR (("t0"."GenreId") IS NULL) THEN ?
                                 ELSE ("t1"."GenreId")=("t0"."GenreId")
                             END
GROUP BY "t0"."GenreId",
         "t0"."Name";

-- With values: [SQLInteger 1,SQLInteger 0]

Tip

count_ can return any Integral type. Adding the explicit as_ @Int32 above prevents an ambiguous type error.

SQL compatibility

Above, we demonstrated the use of count_ and countAll_ which map to the appropriate SQL aggregates. Beam supports all of the other standard SQL92 aggregates.

In general, SQL aggregates are named similarly in beam and SQL. As usual, the aggregate function in beam is suffixed by an underscore. For example, sum_ corresponds to the SQL aggregate SUM.

SQL also allows you to specify set quantifiers for each aggregate. Beam supports these as well. By convention, versions of aggregates that take in an optional set quantifier are suffixed by Over. For example SUM(DISTINCT x) can be written sumOver_ distinctInGroup_ x. The universally quantified version of each aggregate is obtained by using the allInGroup_ quantifier. Thus, sum_ == sumOver_ allInGroup_. Because ALL is the default set quantifier, beam does not typically generate it in queries. If, for some reason, you would like beam to be explicit about it, you can use the allInGroupExplicitly_ quantifier.

aggregate_ (\(genre, track) ->
              ( group_ genre
              , as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
              , fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (fromMaybe_ 0 (trackMilliseconds track))) `div_` 1000)) $ do
  g <- all_ (genre chinookDb)
  t <- genreTracks g
  pure (g, t)

SELECT "t0"."GenreId" AS "res0",
       "t0"."Name" AS "res1",
       COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
       (COALESCE(SUM(ALL COALESCE("t1"."Milliseconds", 0)), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."GenreId")
GROUP BY "t0"."GenreId",
         "t0"."Name"

SELECT "t0"."GenreId" AS "res0",
       "t0"."Name" AS "res1",
       COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
       (COALESCE(SUM(ALL COALESCE("t1"."Milliseconds", ?)), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
                                 WHEN (("t1"."GenreId") IS NULL)
                                      AND (("t0"."GenreId") IS NULL) THEN ?
                                 WHEN (("t1"."GenreId") IS NULL)
                                      OR (("t0"."GenreId") IS NULL) THEN ?
                                 ELSE ("t1"."GenreId")=("t0"."GenreId")
                             END
GROUP BY "t0"."GenreId",
         "t0"."Name";

-- With values: [SQLInteger 0,SQLInteger 0,SQLInteger 1000,SQLInteger 1,SQLInteger 0]

Tip

Most Beam aggregates (count_ and countAll_ being an exception) return a Maybe value, because aggregating over no rows in SQL returns a NULL value. Use fromMaybe_ or coalesce_ to supply a default value in this case.

The beam-core library supports the standard SQL aggregation functions. Individual backends are likely to support the full range of aggregates available on that backend (if not, please send a bug report).

SQL Aggregate	Relevant standard	Unquantified beam function	Quantified beam function
SUM	SQL92	`sum_`	`sumOver_`
MIN	SQL92	`min_`	`minOver_`
MAX	SQL92	`max_`	`maxOver_`
AVG	SQL92	`avg_`	`avgOver_`
COUNT(x)	SQL92	`count_`	`countOver_`
COUNT(*)	SQL92	`countAll_`	N/A
EVERY(x)	SQL99	`every_`	`everyOver_`
ANY(x)/SOME(x)	SQL99	`any_`, `some_`	`anyOver_`, `someOver_`

The `HAVING` clause

SQL allows users to specify a HAVING condition to filter results based on the computed result of an aggregate. Beam fully supports HAVING clauses, but does not use any special syntax. Simply use filter_ or guard_ as usual, and beam will add a HAVING clause if it forms legal SQL. Otherwise, beam will create a subselect and add a WHERE clause. Either way, this is transparent to the user.

-- Only return results for genres whose total track length is over 5 minutes
filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
aggregate_ (\(genre, track) ->
              ( group_ genre
              , as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
              , fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
           ((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))

SELECT "t0"."GenreId" AS "res0",
       "t0"."Name" AS "res1",
       COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
       (COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
FROM "Genre" AS "t0"
CROSS JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
         "t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)

SELECT "t0"."GenreId" AS "res0",
       "t0"."Name" AS "res1",
       COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
       (COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
FROM "Genre" AS "t0"
INNER JOIN "Track" AS "t1"
GROUP BY "t0"."GenreId",
         "t0"."Name"
HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?);

-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000]

Beam will also handle the filter_ correctly in the presence of more complicated queries. For example, we can now join our aggregate on genres back over tracks.

-- Only return results for genres whose total track length is over 5 minutes
filter_ (\(genre, track, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
do (genre, priceCnt, trackLength) <-
            aggregate_ (\(genre, track) ->
                          ( group_ genre
                          , as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
                          , fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
            ((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
   track <- genreTracks genre
   pure (genre, track, priceCnt, trackLength)

SELECT "t0"."res0" AS "res0",
       "t0"."res1" AS "res1",
       "t1"."TrackId" AS "res2",
       "t1"."Name" AS "res3",
       "t1"."AlbumId" AS "res4",
       "t1"."MediaTypeId" AS "res5",
       "t1"."GenreId" AS "res6",
       "t1"."Composer" AS "res7",
       "t1"."Milliseconds" AS "res8",
       "t1"."Bytes" AS "res9",
       "t1"."UnitPrice" AS "res10",
       "t0"."res2" AS "res11",
       "t0"."res3" AS "res12"
FROM
  (SELECT "t0"."GenreId" AS "res0",
          "t0"."Name" AS "res1",
          COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
          (COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
   FROM "Genre" AS "t0"
   CROSS JOIN "Track" AS "t1"
   GROUP BY "t0"."GenreId",
            "t0"."Name") AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."res0")
WHERE ("t0"."res3") >= (300000)

SELECT "t0"."res0" AS "res0",
       "t0"."res1" AS "res1",
       "t1"."TrackId" AS "res2",
       "t1"."Name" AS "res3",
       "t1"."AlbumId" AS "res4",
       "t1"."MediaTypeId" AS "res5",
       "t1"."GenreId" AS "res6",
       "t1"."Composer" AS "res7",
       "t1"."Milliseconds" AS "res8",
       "t1"."Bytes" AS "res9",
       "t1"."UnitPrice" AS "res10",
       "t0"."res2" AS "res11",
       "t0"."res3" AS "res12"
FROM
  (SELECT "t0"."GenreId" AS "res0",
          "t0"."Name" AS "res1",
          COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
          (COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
   FROM "Genre" AS "t0"
   INNER JOIN "Track" AS "t1"
   GROUP BY "t0"."GenreId",
            "t0"."Name") AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
                                 WHEN (("t1"."GenreId") IS NULL)
                                      AND (("t0"."res0") IS NULL) THEN ?
                                 WHEN (("t1"."GenreId") IS NULL)
                                      OR (("t0"."res0") IS NULL) THEN ?
                                 ELSE ("t1"."GenreId")=("t0"."res0")
                             END
WHERE ("t0"."res3")>=(?);

-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 1,SQLInteger 0,SQLInteger 300000]

The position of filter_ changes the code generated. Above, the filter_ produced a WHERE clause on the outermost SELECT. If instead, we put the filter_ clause right outside the aggregate_, beam will produce a HAVING clause instead.

-- Only return results for genres whose total track length is over 5 minutes
do (genre, priceCnt, trackLength) <-
            filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
            aggregate_ (\(genre, track) ->
                          ( group_ genre
                          , as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
                          , fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
            ((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
   track <- genreTracks genre
   pure (genre, track, priceCnt, trackLength)

SELECT "t0"."res0" AS "res0",
       "t0"."res1" AS "res1",
       "t1"."TrackId" AS "res2",
       "t1"."Name" AS "res3",
       "t1"."AlbumId" AS "res4",
       "t1"."MediaTypeId" AS "res5",
       "t1"."GenreId" AS "res6",
       "t1"."Composer" AS "res7",
       "t1"."Milliseconds" AS "res8",
       "t1"."Bytes" AS "res9",
       "t1"."UnitPrice" AS "res10",
       "t0"."res2" AS "res11",
       "t0"."res3" AS "res12"
FROM
  (SELECT "t0"."GenreId" AS "res0",
          "t0"."Name" AS "res1",
          COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
          (COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
   FROM "Genre" AS "t0"
   CROSS JOIN "Track" AS "t1"
   GROUP BY "t0"."GenreId",
            "t0"."Name"
   HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)) AS "t0"
LEFT JOIN "Track" AS "t1" ON ("t1"."GenreId") IS NOT DISTINCT
FROM ("t0"."res0")

SELECT "t0"."res0" AS "res0",
       "t0"."res1" AS "res1",
       "t1"."TrackId" AS "res2",
       "t1"."Name" AS "res3",
       "t1"."AlbumId" AS "res4",
       "t1"."MediaTypeId" AS "res5",
       "t1"."GenreId" AS "res6",
       "t1"."Composer" AS "res7",
       "t1"."Milliseconds" AS "res8",
       "t1"."Bytes" AS "res9",
       "t1"."UnitPrice" AS "res10",
       "t0"."res2" AS "res11",
       "t0"."res3" AS "res12"
FROM
  (SELECT "t0"."GenreId" AS "res0",
          "t0"."Name" AS "res1",
          COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
          (COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
   FROM "Genre" AS "t0"
   INNER JOIN "Track" AS "t1"
   GROUP BY "t0"."GenreId",
            "t0"."Name"
   HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?)) AS "t0"
LEFT JOIN "Track" AS "t1" ON CASE
                                 WHEN (("t1"."GenreId") IS NULL)
                                      AND (("t0"."res0") IS NULL) THEN ?
                                 WHEN (("t1"."GenreId") IS NULL)
                                      OR (("t0"."res0") IS NULL) THEN ?
                                 ELSE ("t1"."GenreId")=("t0"."res0")
                             END;

-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]

Due to the monadic structure, putting the filtered aggregate as the second clause in the JOIN causes the HAVING to be floated out, because the compiler can't prove that the conditional expression only depends on the results of the aggregate.

-- Only return results for genres whose total track length is over 5 minutes
do track_ <- all_ (track chinookDb)
   (genre, priceCnt, trackLength) <-
            filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
            aggregate_ (\(genre, track) ->
                          ( group_ genre
                          , as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
                          , fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
            ((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
   guard_ (trackGenreId track_ ==. just_ (pk genre))
   pure (genre, track_, priceCnt, trackLength)

SELECT "t1"."res0" AS "res0",
       "t1"."res1" AS "res1",
       "t0"."TrackId" AS "res2",
       "t0"."Name" AS "res3",
       "t0"."AlbumId" AS "res4",
       "t0"."MediaTypeId" AS "res5",
       "t0"."GenreId" AS "res6",
       "t0"."Composer" AS "res7",
       "t0"."Milliseconds" AS "res8",
       "t0"."Bytes" AS "res9",
       "t0"."UnitPrice" AS "res10",
       "t1"."res2" AS "res11",
       "t1"."res3" AS "res12"
FROM "Track" AS "t0"
CROSS JOIN
  (SELECT "t0"."GenreId" AS "res0",
          "t0"."Name" AS "res1",
          COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
          (COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
   FROM "Genre" AS "t0"
   CROSS JOIN "Track" AS "t1"
   GROUP BY "t0"."GenreId",
            "t0"."Name") AS "t1"
WHERE (("t1"."res3") >= (300000))
  AND (("t0"."GenreId") IS NOT DISTINCT
       FROM ("t1"."res0"))

SELECT "t1"."res0" AS "res0",
       "t1"."res1" AS "res1",
       "t0"."TrackId" AS "res2",
       "t0"."Name" AS "res3",
       "t0"."AlbumId" AS "res4",
       "t0"."MediaTypeId" AS "res5",
       "t0"."GenreId" AS "res6",
       "t0"."Composer" AS "res7",
       "t0"."Milliseconds" AS "res8",
       "t0"."Bytes" AS "res9",
       "t0"."UnitPrice" AS "res10",
       "t1"."res2" AS "res11",
       "t1"."res3" AS "res12"
FROM "Track" AS "t0"
INNER JOIN
  (SELECT "t0"."GenreId" AS "res0",
          "t0"."Name" AS "res1",
          COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
          (COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
   FROM "Genre" AS "t0"
   INNER JOIN "Track" AS "t1"
   GROUP BY "t0"."GenreId",
            "t0"."Name") AS "t1"
WHERE (("t1"."res3")>=(?))
  AND (CASE
           WHEN (("t0"."GenreId") IS NULL)
                AND (("t1"."res0") IS NULL) THEN ?
           WHEN (("t0"."GenreId") IS NULL)
                OR (("t1"."res0") IS NULL) THEN ?
           ELSE ("t0"."GenreId")=("t1"."res0")
       END);

-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]

You can prove to the compiler that the filter_ should generate a having by using the subselect_ combinator.

-- Only return results for genres whose total track length is over 5 minutes
do track_ <- all_ (track chinookDb)
   (genre, priceCnt, trackLength) <-
            subselect_ $
            filter_ (\(genre, distinctPriceCount, totalTrackLength) -> totalTrackLength >=. 300000) $
            aggregate_ (\(genre, track) ->
                          ( group_ genre
                          , as_ @Int32 $ countOver_ distinctInGroup_ (trackUnitPrice track)
                          , fromMaybe_ 0 (sumOver_ allInGroupExplicitly_ (trackMilliseconds track)) `div_` 1000 )) $
            ((,) <$> all_ (genre chinookDb) <*> all_ (track chinookDb))
   guard_ (trackGenreId track_ ==. just_ (pk genre))
   pure (genre, track_, priceCnt, trackLength)

SELECT "t1"."res0" AS "res0",
       "t1"."res1" AS "res1",
       "t0"."TrackId" AS "res2",
       "t0"."Name" AS "res3",
       "t0"."AlbumId" AS "res4",
       "t0"."MediaTypeId" AS "res5",
       "t0"."GenreId" AS "res6",
       "t0"."Composer" AS "res7",
       "t0"."Milliseconds" AS "res8",
       "t0"."Bytes" AS "res9",
       "t0"."UnitPrice" AS "res10",
       "t1"."res2" AS "res11",
       "t1"."res3" AS "res12"
FROM "Track" AS "t0"
CROSS JOIN
  (SELECT "t0"."res0" AS "res0",
          "t0"."res1" AS "res1",
          "t0"."res2" AS "res2",
          "t0"."res3" AS "res3"
   FROM
     (SELECT "t0"."GenreId" AS "res0",
             "t0"."Name" AS "res1",
             COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
             (COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000) AS "res3"
      FROM "Genre" AS "t0"
      CROSS JOIN "Track" AS "t1"
      GROUP BY "t0"."GenreId",
               "t0"."Name"
      HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), 0)) / (1000)) >= (300000)) AS "t0") AS "t1"
WHERE ("t0"."GenreId") IS NOT DISTINCT
  FROM ("t1"."res0")

SELECT "t1"."res0" AS "res0",
       "t1"."res1" AS "res1",
       "t0"."TrackId" AS "res2",
       "t0"."Name" AS "res3",
       "t0"."AlbumId" AS "res4",
       "t0"."MediaTypeId" AS "res5",
       "t0"."GenreId" AS "res6",
       "t0"."Composer" AS "res7",
       "t0"."Milliseconds" AS "res8",
       "t0"."Bytes" AS "res9",
       "t0"."UnitPrice" AS "res10",
       "t1"."res2" AS "res11",
       "t1"."res3" AS "res12"
FROM "Track" AS "t0"
INNER JOIN
  (SELECT "t0"."res0" AS "res0",
          "t0"."res1" AS "res1",
          "t0"."res2" AS "res2",
          "t0"."res3" AS "res3"
   FROM
     (SELECT "t0"."GenreId" AS "res0",
             "t0"."Name" AS "res1",
             COUNT(DISTINCT "t1"."UnitPrice") AS "res2",
             (COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?) AS "res3"
      FROM "Genre" AS "t0"
      INNER JOIN "Track" AS "t1"
      GROUP BY "t0"."GenreId",
               "t0"."Name"
      HAVING ((COALESCE(SUM(ALL "t1"."Milliseconds"), ?)) / (?))>=(?)) AS "t0") AS "t1"
WHERE CASE
          WHEN (("t0"."GenreId") IS NULL)
               AND (("t1"."res0") IS NULL) THEN ?
          WHEN (("t0"."GenreId") IS NULL)
               OR (("t1"."res0") IS NULL) THEN ?
          ELSE ("t0"."GenreId")=("t1"."res0")
      END;

-- With values: [SQLInteger 0,SQLInteger 1000,SQLInteger 0,SQLInteger 1000,SQLInteger 300000,SQLInteger 1,SQLInteger 0]