Skip to main content

Column.parse

parsetypeformaton_problems

Group: Conversions

Documentation

Parses a text column into values. In the Database backends, the default formatting settings of the particular database are used. In the in-memory backend, the default parser options only parse values where the process is reversible (e.g., 0123 would not be converted to an integer as there is a leading 0). However, settings in the Data_Formatter can control this.

Arguments

  • type: The type to parse the column to. Defaults to Auto meaning that the type will be inferred from the data. In the Database backends, Auto is not supported, so a specific type must be selected.
  • format: The formatting settings to use when parsing the column. For Date, Time_Of_Day and Date_Time, a Java date time style can be used. For Boolean, it should be two values that represent true and false, separated by a |. Alternatively, a Data_Formatter can be passed to provide complete customisation of the formatting. If "" is provided, the default formatting settings of the backend will be used. "" is currently the only setting accepted by the Database backends.
  • on_problems: Specifies how to handle if a problem occurs, raising as a warning by default.

Examples

Parse dates in a column in the format yyyy-MM-dd (the default format).

      import Standard.Examples

example_contains = Examples.text_column_1.parse Date

Parse dates in a column in the format dd/MM/yyyy

      import Standard.Examples

example_contains = Examples.text_column_1.parse Date 'dd/MM/yyyy'

Parse a Yes/No column into a boolean column.

      import Standard.Examples

example_contains = Examples.text_column_1.parse Boolean 'Yes|No'

Errors

  • If the column is not a text column, an Invalid_Value_Type error is raised.
  • If some values in the column did not match the expected datatype format, an Invalid_Format problem is reported. The problematic cells are replaced with Nothing.

Remarks

Number Formats

If parsing a column to a number, by default, the parser will attempt to find the most appropriate format for the column. This is done by finding the format that parses the longest set without an issue from the first record.

It will try the following separators in British, German, French and Swiss order.

  • Thousand separators must be followed by groups of 3 numbers.
  • Scientific notation is only allowed on decimals and must be on a value between -10 and 10. The notation is an E followed by an integer and must be enabled on the Data_Formatter,

The following formats are supported:

  • Sign (+/-) followed by Number (e.g. +1,234.56)
  • Using brackets to indicate a negative number (e.g. (1,234.56))
  • Currency symbols (if not in Auto mode) can be placed before or after the sign and number.
  • If using brackets, the currency symbol must be placed after the opening bracket.