Table.distinct

distinct columns case_sensitivity on_problems

Group: Selections
Aliases: deduplicate, unique

Documentation

Returns the distinct set of rows within the specified columns from the input table. When multiple rows have the same values within the specified columns, the first row of each such set is returned where possible. In database backends, any row from each set may be returned (for example, if the row ordering is unspecified). For in-memory tables, the distinct rows appear in the order in which they occurred in the input; this is not guaranteed for database operations.
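The in-memory rule above (keep the first row for each distinct key, in input order) can be modelled outside Enso. The following is a minimal Python sketch of that rule only. It is not Enso's implementation, and `first_distinct_indices` is a hypothetical helper name introduced for illustration. It reports which row positions would be kept rather than building a table.

```python
def first_distinct_indices(header, rows, columns=None):
    """Return positions of the first row seen for each distinct key.

    Hypothetical helper: models the documented in-memory ordering only.
    """
    # Default to all columns, mirroring the documented default for `columns`.
    columns = header if columns is None else columns
    positions = [header.index(name) for name in columns]
    seen = set()
    kept = []
    for i, row in enumerate(rows):
        # The key is the tuple of values in the distinguishing columns.
        key = tuple(row[p] for p in positions)
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept


header = ["Name", "Location"]
rows = [["John", "Massachusetts"], ["Paul", "London"], ["John", "Boston"]]

by_name = first_distinct_indices(header, rows, ["Name"])
by_all_columns = first_distinct_indices(header, rows)
```

Here the third row shares its `Name` with the first, so distinguishing by `Name` alone keeps only the first two positions, while distinguishing by all columns keeps all three.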

Returns - A new table with the distinct rows.

Arguments

  • columns: The columns of the table to use for distinguishing the rows. Defaults to all columns.
  • case_sensitivity: Specifies if the text values should be compared case sensitively.
  • error_on_missing_columns: Specifies if a missing input column should result in an error regardless of the on_problems settings. Defaults to True.
  • on_problems: Specifies how problems are handled. By default, they are reported as warnings.
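To illustrate what comparing text case sensitively or insensitively means for deduplication, here is a small Python sketch. It assumes case-insensitive comparison can be approximated with `str.casefold`. The exact rules Enso applies are not stated here, and `kept_positions` is a hypothetical name.

```python
def kept_positions(values, case_sensitive=True):
    """Return positions of first occurrences, optionally ignoring text case."""
    seen = set()
    kept = []
    for i, value in enumerate(values):
        # Assumption: casefold() stands in for case-insensitive comparison.
        if isinstance(value, str) and not case_sensitive:
            key = value.casefold()
        else:
            key = value
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept


names = ["John", "JOHN", "Paul", "john"]
sensitive = kept_positions(names)
insensitive = kept_positions(names, case_sensitive=False)
```

With case-sensitive comparison all four values are distinct. With case-insensitive comparison the three spellings of "John" collapse to the first one.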

Examples

Select distinct by name

      table = Table.from_rows ["Name","Location"] [["John", "Massachusetts"],["Paul","London"]]
      distinct = table.distinct ["Name"]

Returns a Table

Name
John
Paul

Select distinct by name and location

      table = Table.from_rows ["Name","Location"] [["John", "Massachusetts"],["Paul","London"]]
      distinct = table.distinct ["Name", "Location"]

Returns a Table

Name  Location
John  Massachusetts
Paul  London

Errors

  • If there are no columns in the output table, a No_Output_Columns is raised as an error regardless of the problem behavior, because it is not possible to create a table without any columns.
  • If a column in columns is not in the input table, a Missing_Input_Columns is raised as an error.
  • If no valid columns are selected, a No_Input_Columns_Selected is reported as a dataflow error regardless of the on_problems setting.
  • If floating point values are present in the distinct columns, a Floating_Point_Equality is reported according to the on_problems setting.
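A likely reason for flagging floating point values is that exact equality on them is fragile: two values that should be equal mathematically can differ in their last bits and so count as distinct. The Python sketch below shows the general effect. It illustrates floating point behaviour in general and does not show how Enso detects or reports Floating_Point_Equality.

```python
import math

a = 0.1 + 0.2  # rounding error: not exactly 0.3 in binary floating point
b = 0.3

exactly_equal = a == b               # exact comparison treats them as different
approximately_equal = math.isclose(a, b)

# Deduplicating on exact values keeps both; rounding first merges them.
distinct_exact = len({a, b})
distinct_rounded = len({round(a, 9), round(b, 9)})
```

Rounding to a fixed number of decimal places before deduplicating is one common workaround. Whether it is appropriate depends on the data.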