Reading Excel files
Introduction
This post shows how to use Enso to read data from an Excel file. Enso is designed and built to make it easier to catalog, process, blend and analyse both structured and unstructured data.
The dataset used in this walkthrough is available on the Enso Community in the Enso 101 Learning Path here
The goal of this post is to cover reading Excel data in a variety of different scenarios.
Supported Excel Formats
Enso currently supports the following Excel file formats:
- Excel Workbook 2007+ (.xlsx)
- Excel Macro Enabled Workbooks (.xlsm)
- Excel 97-2004 Workbooks (.xls)
Excel Binary files (.xlsb) and Excel 5/95 Workbooks (.xls) are not supported.
Opening an Excel File
The simplest way to bring data into Enso is to drag a file onto the graph editor canvas. The component referencing the link to the file will be added to the graph:
Let's have a look at the code the node created:
Data.read '/Users/ensouser/Downloads/School-Fine-Report (FY10-19).xlsx'
.
The first part, Data.read
, is the method that allows reading different data sources.
It will attempt to deduce the format automatically based on extension. It is
also possible to choose the extension from the drop down on the right.
'/Users/ensouser/Downloads/School-Fine-Report (FY10-19).xlsx'
represents the path for the
file we dragged into the project. We can change it by choosing another file from
the file browser or by editing a text field on a component.
Now, take a quick look at the component itself:
To the left of the component is a menu with three icons. The first icon controls whether output
nodes are "live" or disabled. This post will ignore this functionality. The
second icon allows for entering the edit
mode so you can edit the whole component
as a text. The third icon opens the visualisation of the data. In Enso, components
are evaluated as you add or edit them, and you can use the visualisation to see the results as
you work.
To the component’s left, the variable name is shown (node1
in this case),
where the result of the component is stored; you can use this name to refer to this
component anywhere later in the graph.
Finally, within the component, you can see the function used to create it, the
required argument, and placeholders for the two optional parameters of the read
method (format
and on_problems
). If you hover your mouse over these (or the "School-Fine-Report (FY10-19).xlsx"), you will see a dropdown arrow at the bottom appear. If you
click here, it will allow you to choose a value.
Reading Data from Excel Files
Unlike other file formats such as CSV, Excel Workbooks are containers that have one or more Sheets, and zero or more Named Ranges.
The default format that Enso chooses when opening an Excel file is Excel Workbook. It is possible to change this to Excel Sheet
or Excel Range
if you are reading a single worksheet or named range, but it is recommended that you leave the default and use the methods below to read the data as it is more efficient.
If we open the visualization, we will see the list of Sheets contained in the Excel Wookbook.
Read One or More Sheets
To open one or more sheets, simply double click on their names in the visualization.
Read Named Range
Power users of Excel often create Named Ranges in Excel. A named range represents a collection of cells in Excel. For example, our Excel Workbook has a range called 'SummaryData' that is defined as =Summary!$A$5:$E$15
. This means on the Summary Sheet, the cells that start at A5 at the upper left and go to E15 at the lower right.
Enso can read Named Ranges, in addition to Sheets.
Below, we add a read
component to the Data.read
, and when we click on the query
parameter, our Named Range shows at the bottom of the list.
If we had multiple named ranges, we can add multiple read
components to access all of the data.
Read Multiple Sheets with the Same Format and Union Them Together
Sometimes, users will find data with the same schema in multiple worksheets, broken out by month, quarter or year. Enso simplifies this process using a component called read_many
which reads multiple sheets and unions the data into a single, logical table.
Wrapping Up
In this post, we worked through reading an Excel file, and reading data from Worksheets and Named Ranges. To learn more, please see our Getting Started and Enso 101 Learning Paths on the Enso Community.