Table.aggregate

aggregate group_by columns error_on_missing_columns on_problems

Group: Calculations
Aliases: group by, summarize

Documentation

Aggregates the rows in a table, grouping them by the group_by columns. The columns argument specifies which additional aggregations to perform and return.

Arguments

  • group_by: A list of columns to group by. These will be included at the start of the resulting table. If no columns are specified, a single row will be returned with the aggregate columns.
  • columns: The aggregate operations to perform, which define the columns of the aggregated table. Expressions can be used within an aggregate column to perform more complicated calculations.
  • error_on_missing_columns: Specifies whether missing columns in aggregates should result in an error regardless of the on_problems setting. Defaults to False, meaning that problematic aggregations are left out of the result and a problem is reported.
  • on_problems: Specifies how to handle problems if they occur, reporting them as warnings by default.

Examples

Count all the rows

      table = Table.from_rows ["Name","Location"] [["John", "Massachusetts"],["Paul","London"]]
      grouped = table.aggregate columns=[Aggregate_Column.Count]

Returns a Table

Count
2

Group by the Name column, count the rows

      table = Table.from_rows ["Name","Location"] [["John", "Massachusetts"],["Paul","London"]]
      grouped = table.aggregate ["Name"] [Aggregate_Column.Count]

Returns a Table

Name    Count
John    1
Paul    1
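
Group by the Location column, count the rows and sum a column, treating problems as errors

The following is a minimal sketch of how the error_on_missing_columns and on_problems arguments might be passed alongside group_by and columns. The Amount column and its values are hypothetical. The use of Aggregate_Column.Sum and Problem_Behavior.Report_Error, and the import lines, are assumptions about the library that this page does not confirm.

```enso
from Standard.Base import all
from Standard.Table import Table, Aggregate_Column

main =
    # Hypothetical input: the Amount column is added for illustration only.
    table = Table.from_rows ["Name", "Location", "Amount"] [["John", "Massachusetts", 10], ["Paul", "London", 20], ["Ann", "London", 5]]
    # One output row per distinct Location, with a row count and a sum of Amount.
    aggregates = [Aggregate_Column.Count, Aggregate_Column.Sum "Amount"]
    # Assumption: Problem_Behavior.Report_Error turns reported problems into errors.
    table.aggregate group_by=["Location"] columns=aggregates error_on_missing_columns=True on_problems=Problem_Behavior.Report_Error
```

With error_on_missing_columns set to True, a misspelled aggregate column name should fail the call rather than silently drop that aggregate from the result.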

Errors

  • If there are no columns in the output table, a No_Output_Columns is raised as an error regardless of the problem behavior, because it is not possible to create a table without any columns.
  • If a column index is out of range, a Missing_Input_Columns is reported according to the on_problems setting, unless error_on_missing_columns is set to True, in which case it is raised as an error. Problems resolving group_by columns are reported as dataflow errors regardless of these settings, as a missing grouping will completely change the semantics of the query.
  • If a column selector is given as a Text and it neither matches any column in the input table nor is a valid expression, an Invalid_Aggregate_Column problem is reported according to the on_problems setting, unless error_on_missing_columns is set to True, in which case it is always raised as an error. Problems resolving group_by columns are reported as dataflow errors regardless of these settings, as a missing grouping will completely change the semantics of the query.
  • If an aggregation fails, an Invalid_Aggregation dataflow error is raised.
  • Additionally, the following problems may be reported according to the on_problems setting:
    • If there are invalid column names in the output table, an Invalid_Column_Names.
    • If there are duplicate column names in the output table, a Duplicate_Output_Column_Names.
    • If grouping on or computing the Mode on a floating point number, a Floating_Point_Equality.
    • If, when concatenating values, there is an unquoted delimiter, an Unquoted_Delimiter.
    • If there are more than 10 issues with a single column, an Additional_Warnings.

Returns

  • A new table with the group_by columns as well as any aggregate columns.