Table.parse
Group: Conversions
Documentation
Parses columns within a Table into a more specific value type.
By default, it looks at all Text columns and attempts to deduce the type
(columns with other types are not affected).
Arguments
columns: The columns to parse. If not specified, all text columns will be parsed.type: The type to parse the columns to. Defaults toAutomeaning that the type will be inferred from the data.format: The formatting settings to use when parsing the columns. ForDate,Time_Of_DayandDate_Time, a Java date time style can be used. ForBoolean, it should be two values that represent true and false, separated by a|. Alternatively, aData_Formattercan be passed to provide complete customisation of the formatting. IfNothingis provided, the default formatting settings of the backend will be used.Nothingis currently the only setting accepted by the Database backends.error_on_missing_columns: IfTrue, an error is raised if a specified column is not found. IfFalse, the missing column is ignored.on_problems: Specifies how to handle if a problem occurs, raising as a warning by default.
Returns
- A new table with the parsed columns replacing the original text ones.
Examples
Parse the last columns containing Yes/No values as booleans.
table = Table.from_rows ["foo","bar","buzz","bizz"] [["25", "John","abc","Yes"]]
output = table.parse columns=[-1] type=Value_Type.Boolean format="Yes|No"
Returns a Table
| foo | bar | buzz | bizz |
|---|---|---|---|
| 25 | John | abc | true |
Parse dates in a column in the format yyyy-MM-dd (the default format).
table = Table.from_rows ["Name","Location","Active","Date"] [["John", "Massachusetts", True, "2021-01-01"]]
output = table.parse ["Date"] ..Date
Returns a Table
| Name | Location | Active | Date |
|---|---|---|---|
| John | Massachusetts | true | 2021-01-01 |
Parse dates in a column in the format dd/MM/yyyy.
table = Table.from_rows ["Name","Location","Active","Date"] [["John", "Massachusetts", True, "21/12/2021"]]
output = table.parse ["Date"] ..Date format="dd/MM/yyyy"
Returns a Table
| Name | Location | Active | Date |
|---|---|---|---|
| John | Massachusetts | true | 2021-12-21 |
Parse all columns inferring their types, using , as the decimal point for numbers.
table = Table.from_rows ["Name","Location","Active","Value"] [["John", "Massachusetts", "True", "1.234,56"]]
output = table.parse ["Value"] ..Float
Returns a Table
| Name | Location | Active | Value |
|---|---|---|---|
| John | Massachusetts | True | 1234.56 |
Errors
- If a column in
columnsis not in the input table, aMissing_Input_Columnsis raised as an error, unlesserror_on_missing_columnsis set toFalse, in which case the problem is reported according to theon_problemssetting. - If a column selected for parsing is not a text column, an
Invalid_Value_Typeerror is raised. - If no columns have been selected for parsing, a
No_Input_Columns_Selectederror is raised. - If some values in a column did not match the expected datatype format,
an
Invalid_Formatproblem is reported. The problematic cells are replaced withNothing.
Remarks
Auto type detection
In the in-memory backend, if the type is Auto, the process will attempt to work out the type of the column. The default parser only parse values where the process is reversible (e.g., 0123 would not be converted to an integer as there is a leading 0). However, this can be overridden by providing a Data_Formatter`.
In Database Support
In the Database backends, the default formatting settings of the
particular database are used. Custom formats are not currently supported.
The target type must be specified explicitly, the Auto option is not
currently support there.
Default Number Parsing
If the type is set to a number, the default parser will attempt to find the most appropriate format for the column. This is done by finding the format that parses the longest set without an issue from the first record. It will try the following separators in British, German, French and Swiss order. Thousand separators must be followed by groups of 3 numbers. The following formats are supported:
- Sign (+/-) followed by Number (e.g. +1,234.56) - Using brackets to indicate a negative number (e.g. (1,234.56)) - Currency symbols (if not in Auto mode) can be placed before or after the sign and number.
- If using brackets, the currency symbol must be placed after the opening
bracket.
Scientific notation is only allowed on decimals and must be on a value
between -10 and 10. The notation is an
Efollowed by an integer and must be enabled on aData_Formatter.