Table.parse
Group: Conversions
Documentation
Parses columns within a Table
into a more specific value type. By default, it looks at all Text
columns and attempts to deduce the type (columns with other types are not affected).
Arguments
columns
: The columns to parse. If not specified, all text columns will be parsed.type
: The type to parse the columns to. Defaults toAuto
meaning that the type will be inferred from the data.format
: The formatting settings to use when parsing the columns. ForDate
,Time_Of_Day
andDate_Time
, a Java date time style can be used. ForBoolean
, it should be two values that represent true and false, separated by a|
. Alternatively, aData_Formatter
can be passed to provide complete customisation of the formatting. IfNothing
is provided, the default formatting settings of the backend will be used.Nothing
is currently the only setting accepted by the Database backends.error_on_missing_columns
: Specifies if a missing input column should result in an error regardless of theon_problems
settings. Defaults toTrue
.on_problems
: Specifies how to handle if a problem occurs, raising as a warning by default.
Returns
- A new table with the parsed columns replacing the original text ones.
Examples
Parse the last columns containing Yes/No values as booleans.
table = Table.from_rows ["foo","bar","buzz","bizz"] [["25", "John","abc","Yes"]]
output = table.parse columns=[-1] type=Value_Type.Boolean format="Yes|No"
Returns a Table
foo | bar | buzz | bizz |
---|---|---|---|
25 | John | abc | true |
Parse dates in a column in the format yyyy-MM-dd
(the default format).
table = Table.from_rows ["Name","Location","Active","Date"] [["John", "Massachusetts", True, "2021-01-01"]]
output = table.parse ["Date"] ..Date
Returns a Table
Name | Location | Active | Date |
---|---|---|---|
John | Massachusetts | true | 2021-01-01 |
Parse dates in a column in the format dd/MM/yyyy
.
table = Table.from_rows ["Name","Location","Active","Date"] [["John", "Massachusetts", True, "21/12/2021"]]
output = table.parse ["Date"] ..Date format="dd/MM/yyyy"
Returns a Table
Name | Location | Active | Date |
---|---|---|---|
John | Massachusetts | true | 2021-12-21 |
Parse all columns inferring their types, using ,
as the decimal point for numbers.
table = Table.from_rows ["Name","Location","Active","Value"] [["John", "Massachusetts", "True", "1.234,56"]]
output = table.parse ["Value"] ..Float
Returns a Table
Name | Location | Active | Value |
---|---|---|---|
John | Massachusetts | True | 1234.56 |
Errors
- If a column in
columns
is not in the input table, aMissing_Input_Columns
is raised as an error, unlesserror_on_missing_columns
is set toFalse
, in which case the problem is reported according to theon_problems
setting. - If a column selected for parsing is not a text column, an
Invalid_Value_Type
error is raised. - If no columns have been selected for parsing,
a
No_Input_Columns_Selected
error is raised. - If some values in a column did not match the expected datatype
format, an
Invalid_Format
problem is reported. The problematic cells are replaced withNothing
.
Remarks
Auto type detection
In the in-memory backend, if the type
is `Auto, the process will
attempt to work out the type of the column.
The default parser only parse values where the process is reversible
(e.g., 0123 would not be converted to an integer as there is a leading
0). However, this can be overriden by providing a Data_Formatter
.
In Database Support
In the Database backends, the default formatting settings of the particular database are used. Custom formats are not currently supported.
The target type must be specified explicitly, the Auto
option is not
currently support there.
Default Number Parsing
If the type is set to a number, the default parser will attempt to find the most appropriate format for the column. This is done by finding the format that parses the longest set without an issue from the first record.
It will try the following separators in British, German, French and Swiss order. Thousand separators must be followed by groups of 3 numbers.
The following formats are supported:
- Sign (+/-) followed by Number (e.g. +1,234.56)
- Using brackets to indicate a negative number (e.g. (1,234.56))
- Currency symbols (if not in Auto mode) can be placed before or after the sign and number.
- If using brackets, the currency symbol must be placed after the opening bracket.
Scientific notation is only allowed on decimals and must be on a value
between -10 and 10. The notation is an E
followed by an integer and
must be enabled on a Data_Formatter
.