Skip to main content

Reading Excel files

Introduction

This post shows how to use Enso to read data from an Excel file. Enso is designed and built to make it easier to catalog, process, blend and analyse both structured and unstructured data.

The dataset used in this walkthrough is available on the Enso Community in the Enso 101 Learning Path here


The goal of this post is to cover reading Excel data in a variety of different scenarios.

Supported Excel Formats

Enso currently supports the following Excel file formats:

  • Excel Workbook 2007+ (.xlsx)
  • Excel Macro Enabled Workbooks (.xlsm)
  • Excel 97-2004 Workbooks (.xls)

Excel Binary files (.xlsb) and Excel 5/95 Workbooks (.xls) are not supported.

Opening an Excel File

The simplest way to bring data into Enso is to drag a file onto the graph editor canvas. The component referencing the link to the file will be added to the graph:

Import a file

Let's have a look at the code the node created: Data.read '/Users/ensouser/Downloads/School-Fine-Report (FY10-19).xlsx'.

The first part, Data.read, is the method that allows reading different data sources. It will attempt to deduce the format automatically based on extension. It is also possible to choose the extension from the drop down on the right.

'/Users/ensouser/Downloads/School-Fine-Report (FY10-19).xlsx' represents the path for the file we dragged into the project. We can change it by choosing another file from the file browser or by editing a text field on a component.

Now, take a quick look at the component itself:


Read Node


To the left of the component is a menu with three icons. The first icon controls whether output nodes are "live" or disabled. This post will ignore this functionality. The second icon allows for entering the edit mode so you can edit the whole component as a text. The third icon opens the visualisation of the data. In Enso, components are evaluated as you add or edit them, and you can use the visualisation to see the results as you work.


To the component’s left, the variable name is shown (node1 in this case), where the result of the component is stored; you can use this name to refer to this component anywhere later in the graph.


Read Node2

Finally, within the component, you can see the function used to create it, the required argument, and placeholders for the two optional parameters of the read method (format and on_problems). If you hover your mouse over these (or the "School-Fine-Report (FY10-19).xlsx"), you will see a dropdown arrow at the bottom appear. If you click here, it will allow you to choose a value.


Format Dropdown

Reading Data from Excel Files

Unlike other file formats such as CSV, Excel Workbooks are containers that have one or more Sheets, and zero or more Named Ranges.


The default format that Enso chooses when opening an Excel file is Excel Workbook. It is possible to change this to Excel Sheet or Excel Range if you are reading a single worksheet or named range, but it is recommended that you leave the default and use the methods below to read the data as it is more efficient.


If we open the visualization, we will see the list of Sheets contained in the Excel Wookbook.


Workbook Viz



Read One or More Sheets

To open one or more sheets, simply double click on their names in the visualization.


read sheets



Read Named Range

Power users of Excel often create Named Ranges in Excel. A named range represents a collection of cells in Excel. For example, our Excel Workbook has a range called 'SummaryData' that is defined as =Summary!$A$5:$E$15. This means on the Summary Sheet, the cells that start at A5 at the upper left and go to E15 at the lower right.


Enso can read Named Ranges, in addition to Sheets.


Below, we add a read component to the Data.read, and when we click on the query parameter, our Named Range shows at the bottom of the list.


If we had multiple named ranges, we can add multiple read components to access all of the data.


named range



Read Multiple Sheets with the Same Format and Union Them Together

Sometimes, users will find data with the same schema in multiple worksheets, broken out by month, quarter or year. Enso simplifies this process using a component called read_many which reads multiple sheets and unions the data into a single, logical table.


read many



Wrapping Up

In this post, we worked through reading an Excel file, and reading data from Worksheets and Named Ranges. To learn more, please see our Getting Started and Enso 101 Learning Paths on the Enso Community.