Spreadsheet Design Philosophy

Overview

Most spreadsheets are a list which is added up. This requires little further thought. Some use large volumes of data and require some planning. Depending on your business environment, you may have several workbooks covering different aspects of your business or you may be attempting to pull numerous data blocks together from disparate elements of the business.

In all of these scenarios, you need to consider where your data comes from, how clean it is, how much you can rely on its integrity, how will you maintain it, to what purpose is the outcome to be put, to whom is the final report going, what conclusions you expect to come from your work, is the report an end in itself or part of a larger picture, any risks and consequences of errors and what mitigation is in place.

Some sheets evolve over time and others are used by several people. All these factors potentially affect their efficacy and the integrity of the spreadsheet.

Determine the Role of The Spreadsheet

One of the first things you need to do is consider the scope of your work and Identify the results you are attempting to get; if you are to find the total cost of a project, you should ensure that you understand what all the costs are and where the data for that comes from.

Is a spreadsheet the optimum solution or should you consider other reporting options? If the report your building is to be part of a larger report or the results from your report will form part of a larger report, you should ensure that you are reporting on what is required, that the data is in a format and place that is accessible, as required.

The Audience

Identify your audience. What do they need to see and what are they expecting to see. Your manager, a colleague and a third party may all expect to see different things even if the answer is the same in each instance.

A third party for example may be privy to a gross value but not to the details behind that value. If your data includes tenant names for example, you may not show your bank the raw data.

Equally, your manager or a colleague may want to zoom in on a calculation and the data source and you should build a tool that enables them to jump between data sets and report information.

All these factors determine how your results are presented.

Notice that some people respond better to and comprehend more thoroughly data that is presented in particular ways. Some people prefer to see groups of numbers; some like their groups of numbers to have several colours to differentiate elements; others prefer a graph or other diagram to depict the results; still others prefer free text as a description of the results with less or no reference to numerical results; some prefer to see percentages rather than actuals; and some may prefer to see both with one type being in the form of a graph. As many reports are sent to numerous diverse people, it is usual to have a mixture of text, data arrays, diagrams and graphs.

Preambles & Instructions

Preamble

For more involved reports such as dashboards, it can be important to add an explanation or an overview of what the viewer can expect. It’s sometimes useful to offer advice on how to move around a spreadsheet.

This can especially be true when the spreadsheet has been designed to not look like a spreadsheet such as when the tabs are hidden, the gridlines have been removed along with the headings (row & column numbers and letters) and the formula bar and access to the sheets (tabs) is through hyperlinks in a menu format.

Where a report is succinct because maybe space is tight, it can be useful to add notes on what is being represented. A dashboard frequently uses graphs which appear as images and what they represent may not be clear. In these sort of occasions, it may be useful to add notes by way of a preamble to clarify the report.

Instructions

Complex designs or workbooks with more than one designer or editor will benefit from adding instructions. These can be anything from a list of how to complete a task to where the primary data is and what a complex formula is intended to do.

These notes can be added as a separate tab (worksheet) or for simpler instructions or notes you can use the Comments element that come with every cell. To use Comments, right click on a cell and select New Comment. The cell will show a small purple triangle at the top left. The Comment becomes visible when the mouse rolls over the cell or you can edit the Comment to appear all the time. The Comments section allows a conversation between collaborators.

The other option is Notes. This can be found in the Review Ribbon. Generally, a Note starts with the user’s name. For brevity, it’s best to overwrite this. The Note becomes visible when the mouse rolls over the cell.

You should always document any scripts that you write. This relates to any VBA that you add but you can create add-ins and these should also be documented in detail.

For more complex equations and especially where they encompass multiple sheets, it can be useful to make notes to facilitate reverse engineering or even simple editing at a future date.

Data Integrity

Mining Primary Data

One of the first things to do on building a spreadsheet is to identify what data you need, where it is stored and how to access it in a useful format.

Most databases offer a download or reporting function that includes a format that is compatible with spreadsheets. These include CSV, XLS and XLSX. Incompatible versions include XLSB and CSV types where the data is delimited by length or other criteria instead of a comma.

Note that in Sage, there are two icons in reporting that purport to be Excel compatible. The one on the left produces a report that will have extra blank columns and can include merged arrays. The merged sections can be cleared by selecting the grey triangle at the intersection between the rows and columns and clicking on the Merge & Centre icon on the Home ribbon. Do this before deleting the extra columns.

It should be noted that you will need your data to arrive on your sheet in exactly the same way and place for each iteration of upload (or copy) and that having to unmerge and delete columns or headers should not be part of your methodology. For these reasons, always choose the right hand Excel button if you are going to manipulate the downloaded data.

Testing Quality

Once the data has been transferred to the workbook, you should ensure that it matches the data you believe you requested. Confirm that the totals for any column or row come back to the same totals as the primary data. In accounting, this is called a reconciliation. Basically, when you’re sense checking, you’re looking for outliers and investigating their veracity.

Add filters to headers and sense check that all dates or other parameters appear to have been met.

Check that each row adds up to what you might expect and preferably that it matches back to the primary data. Do the same thing for columns of data. Check that there are the correct number of columns and also rows.

You can check parameters aren’t breached. So you think that you imported all the data for 2021. Run an IF formula. If greater than 31/12/21 and if less than 1/1/23. To do this in one algorithm, use If And. Notice that other values may have parameters such that any text should not start with a space and you can ask Excel if the first character is a space or simply use the Trim function.

Never change data in your imported data set. Report any suspected error to the appropriate party. If the issue will have a significant impact on your dataset, re-import the dataset after the correction has been made.

If a report is regularly used or updated, it’s useful to set a version control system in place. One method is to ask that when a cell is populated that Excel adds a time stamp to the sheet: +If(A1=””,””,Now())

Notice that the file’s metadata has a date and time stamp. This should be checked when the file isn’t open or it will tell you today and now! Nevertheless, this is not a great test as the file could have been opened and saved recently but the import could still be from last time.

You can count rows: Counta(A:A) and check that that is more, less or the same as a previous version.

You should make a backup before you import and after you’re ready to publish your report. It may be worthwhile to update your timestamp at this point. To do so, go to the cell where the formula is =Now() and press F9, then save.

Cleaning Primary Data

This is sometimes known as formatting your data. Your data should meet certain criteria:

  1. it should be readily accessible
  2. it should be clean (i.e. in the correct format with no extraneous characters)
  3. it should be correct
  4. it should be up to date or at least correct for the time period being reported
  5. it should be complete

If you have issues with getting a total, the data in your sheet may be dirty. There are various tools in Excel that can help you to clean data. The Trim function will remove any unneeded spaces, so ‘Joe  Smith’ becomes ‘Joe Smith’. If you need to convert an email address to a name, the Left, Right, Len and Find functions can help (see the Managing Text section).

Sometimes the problem with an array of numbers is that they have been formatted as Text in the primary data. Selecting all the cells and clicking on the yellow box to the left offers the option to convert text to numbers.

Sometimes dates are in a different format than you want; notice that this isn’t usually important when it only affects the primary data but it’s important to report in a format that will be recognisable and not misinterpreted by the viewer (see the section on Date Management).

Protection

To protect data from being overwritten, you can ask Excel to protect some cells.

You can protect just cell A1 in the import area. This means that you cannot paste into any of that sheet if the paste area includes A1, which it generally should. You can temporarily remove the protection, paste your data and then protect the cell again.

All of the formula in your calculation area should not be changed except when you are adding new calculations. It is best to protect all of the calculation area and only remove protection if you are editing.

The report is similar to the calculation sheet and should be fully protected except when something must be changed.

To protect a cell or an array such as your calculation area, go to the Alignment section of the Home Ribbon and select the small square with an arrow in it at the bottom right (red circle). That brings up this box. Select the Protection tab on the right.

By default, everything is protected if you activate protection. You can select cells that don’t need protecting and untick the Locked box.

To activate protection, go to the Review menu and select Protect Sheet icon. To remove protection and edit cells, go to the Review menu and select the Unprotect Sheet icon.

Notice that you can add a password for further security. The password should be known to more than one person and be readily available with the correct authorisation.

To unprotect the sheet, go to the Review menu and select the Unprotect Sheet icon from the Ribbon. Notice that if you have password protected the sheet that you must enter the password to complete the protection removal. When you set protection again, the password is still active unless you delete it from the password box.

Standardisation

Identify a standard. A common standard within an organisation facilitates collaboration, aids comprehension and saves development time. The standard should include, among other things, consistent conventions on use of cell formatting. For example, you could use Cell Styles to easily apply an organisational standard formatting scheme to your workbook.

For larger exercises, it’s useful to separate out the three aspects into different tabs (worksheets):

  1. Primary or raw data
  2. The calculations area
  3. The report

Formatting

The layout should be similar for every report. This is not to say that every report must look the same but that Headers, Footers, Logos, results or end points should be distinct and similar for any report that might utilise these elements.

Logos

Check periodically that any logo on a report is the most up to date. Notice that logos should only appear on reports that will be sent to third parties.

Graphs

The data points should all be similar colours. So for example credit items might be red, debits might be green and other items in blue. Whatever colour scheme is chosen, it should be ubiquitous across all graphs. The viewer should not have to study the legend to understand the representation.

Formulas

It’s considered good practice to keep equations simple. This can sometimes mean using helper columns (an extra column with a simple formula who’s results are used in a subsequent calculation. The Month formula converts a date to a month number and this can make picking up data from a particular period simpler.

Nevertheless, some equations are going to be convoluted. Make notes and add them to the Instructions repository. With more complex equations where data is being pulled together from different sources, it is important to ensure that static data is not used. So, if this month is March, you shouldn’t have ‘March’ as a criterion. Instead chose Month(Today()) to get the value for this period.

Users & Writers

If there is more than one designer of a workbook or the workbook is evolving and new people work on the design, it’s important that these basics are adhered to both initially with the original build and later when the workbook is being edited. The original writer should understand these guidelines and have applied them. This means that the next editor understands the fundamentals of the build. Equally, the current editor may not be the next editor and they too should apply these principles. This is also important when the build is a collaboration. See the Notation section for further clarification.

Future Proofing

Many sheets are designed to complete the same task for years. This can be periodic monthly reporting with identical raw data every period.

It’s important to ensure that extra columns aren’t added or that columns aren’t dropped in the Data sheet. Equally, some formulas are setup to extract data from limited arrays. If the data array is a different size each period, the formula array selection should be bigger than the largest expected. Notice though, that in an expanding company, the data array may be significantly bigger five years after the original build. Sometimes, it’s better to select a full column or array of columns such as B:B or D:H. Bear in mind though, that this utilises resources and for larger spreadsheets, this can make processing slower.

When the primary data repository changes (e.g. you update your primary software) and the shape of the data has changed, it’s important to have good instructions on the build if you don’t want to spend considerable time reverse engineering an aspect of the sheet or having to rebuild from scratch.

Some people will notice that you can change a value in the primary data or the calculation area and treat this as reviewing different scenarios. Often they have overwritten a formula with data or removed primary clean data and the sheet for that period (or forever) is damaged. There are things that can be done to protect data and calculations.

Always keep your primary data in separate tabs.

Always keep calculations in a separate tab.

Right click on a tab and select Hide to remove key areas from those who would edit without permission.

Calculation areas can be formatted to be protected. A password can be added to further protect the cells.

You can add a password to any worksheet or to the workbook.

You can protect data areas but this can impede certain types of calculation.

Saving a workbook as a pdf and issuing that as the report secures raw data, calculations and reporting integrity.