Skip to main content

Table.parse

parsecolumnstypeformaterror_on_missing_columnson_problems

Group: Conversions

Documentation

Parses columns within a Table into a more specific value type. By default, it looks at all Text columns and attempts to deduce the type (columns with other types are not affected).

Arguments

  • columns: The columns to parse. If not specified, all text columns will be parsed.
  • type: The type to parse the columns to. Defaults to Auto meaning that the type will be inferred from the data.
  • format: The formatting settings to use when parsing the columns. For Date, Time_Of_Day and Date_Time, a Java date time style can be used. For Boolean, it should be two values that represent true and false, separated by a |. Alternatively, a Data_Formatter can be passed to provide complete customisation of the formatting. If Nothing is provided, the default formatting settings of the backend will be used. Nothing is currently the only setting accepted by the Database backends.
  • error_on_missing_columns: Specifies if a missing input column should result in an error regardless of the on_problems settings. Defaults to True.
  • on_problems: Specifies how to handle if a problem occurs, raising as a warning by default.

Returns

  • A new table with the parsed columns replacing the original text ones.

Examples

Parse the last columns containing Yes/No values as booleans.

      table = Table.from_rows ["foo","bar","buzz","bizz"] [["25", "John","abc","Yes"]]
output = table.parse columns=[-1] type=Value_Type.Boolean format="Yes|No"

Returns a Table

foobarbuzzbizz
25Johnabctrue

Parse dates in a column in the format yyyy-MM-dd (the default format).

      table = Table.from_rows ["Name","Location","Active","Date"] [["John", "Massachusetts", True, "2021-01-01"]]
output = table.parse ["Date"] ..Date

Returns a Table

NameLocationActiveDate
JohnMassachusettstrue2021-01-01

Parse dates in a column in the format dd/MM/yyyy.

      table = Table.from_rows ["Name","Location","Active","Date"] [["John", "Massachusetts", True, "21/12/2021"]]
output = table.parse ["Date"] ..Date format="dd/MM/yyyy"

Returns a Table

NameLocationActiveDate
JohnMassachusettstrue2021-12-21

Parse all columns inferring their types, using , as the decimal point for numbers.

      table = Table.from_rows ["Name","Location","Active","Value"] [["John", "Massachusetts", "True", "1.234,56"]]
output = table.parse ["Value"] ..Float

Returns a Table

NameLocationActiveValue
JohnMassachusettsTrue1234.56

Errors

  • If a column in columns is not in the input table, a Missing_Input_Columns is raised as an error, unless error_on_missing_columns is set to False, in which case the problem is reported according to the on_problems setting.
  • If a column selected for parsing is not a text column, an Invalid_Value_Type error is raised.
  • If no columns have been selected for parsing, a No_Input_Columns_Selected error is raised.
  • If some values in a column did not match the expected datatype format, an Invalid_Format problem is reported. The problematic cells are replaced with Nothing.

Remarks

Auto type detection

In the in-memory backend, if the type is `Auto, the process will attempt to work out the type of the column.

The default parser only parse values where the process is reversible (e.g., 0123 would not be converted to an integer as there is a leading 0). However, this can be overriden by providing a Data_Formatter.

In Database Support

In the Database backends, the default formatting settings of the particular database are used. Custom formats are not currently supported.

The target type must be specified explicitly, the Auto option is not currently support there.

Default Number Parsing

If the type is set to a number, the default parser will attempt to find the most appropriate format for the column. This is done by finding the format that parses the longest set without an issue from the first record.

It will try the following separators in British, German, French and Swiss order. Thousand separators must be followed by groups of 3 numbers.

The following formats are supported:

  • Sign (+/-) followed by Number (e.g. +1,234.56)
  • Using brackets to indicate a negative number (e.g. (1,234.56))
  • Currency symbols (if not in Auto mode) can be placed before or after the sign and number.
  • If using brackets, the currency symbol must be placed after the opening bracket.

Scientific notation is only allowed on decimals and must be on a value between -10 and 10. The notation is an E followed by an integer and must be enabled on a Data_Formatter.