England's PHE Undercounts COVID Cases Due to Import to Excel Bug

Usage of Excel as a database, old XLS format and lack of error handling leads to temporary undercounting of COVID cases.

Healthcare

Industry

United Kingdom

Location

Year

2019

Mistake Type

Data Entry

Resolution

1 Week

Background

Public Health England (PHE) was the executive agency of the Department of Health and Social Care in the United Kingdom. Its responsibilities were improving and protecting the UK's health and wellbeing, including advice on effective public health policies and interventions. PHE was also to respond to health emergencies, monitor the spread of diseases and issue guidance to ministers and local authorities.

PHE was called upon during the COVID crisis to provide data about the spread of the virus and the number of cases found in tests. The testing itself was conducted by commercial firms paid to analyze swab tests of the public, to discover who has the virus. Then the data was sent to PHE in CVF file format and combined together into one single spreadsheet.

These spreadsheets containing daily data were then uploaded to another system where they were accessed by the NHS Test and Trace team, government dashboards, etc. The resulting information was then used to devise policy and provide information to the government and the public.

Incident

The atmosphere of panic and haste surrounding the initial spread of COVID was unparalleled to any other event since the beginning of the century. It is thus well understood that the developers of the system to collect and analyze the test results were under a lot of pressure to build it quickly, within a matter of days, and to keep changing it to reflect the changing circumstances. In this scenario the use of human-readable and easy to debug CSV files is a sound choice.

The use of Excel for data handling was probably a more dubious decision. Given more time probably a more robust vessel could have been chosen, but again, the speed of development and the easily observable data probably weighted the decision towards using spreadsheets. There was also a legacy system available that could be modified to fit purpose, and that fact probably tipped the scales towards using Excel.

Where the real problem occurred is that the file format used was XLS dating back to the last century. Current Excel supports the far superior XLSX format - since 2007. Unfortunately for PHE, the XLS format only supports up to around 65,000 rows of data. Combined with the fact that each test was represented with multiple rows of data, the file format would max out at holding data for about 1,400 cases. That, combined with lack of error handling in the code importing data, and the lack of other procedures for validating the data, lead to the total calculated number of infections topping out at around 11,000. The sources do not specify exactly what the relation between these two numbers is, but for that to be correct, multiple Excel files had to be used to represent the cases for one day.

Discovery

Once the daily cases reached 11,000 records for the additional cases were being lost, and the number of cases reported 'stalled'. It took about a week for that to be realized. After the mistake was discovered, an additional 16,000 cases were added to the daily totals. While awaiting a replacement system which would not rely on XLS, PHE started to break down the test result data into smaller batches. That, of course, results in a larger number of Excel templates, but with less records per template. That would ensure that the limit of 1,400 cases per template is not reached. Judging by the fact that we did not see repeated publications on the topic, the solution must have worked until a more robust system was developed, or the health emergency passed altogether.

Timeline of events

1

2019

Legacy systems for monitoring infectious disease spread exists that relies on XLS files.

2

Jan 31, 2020

The first COVID case is registered in the UK.

3

early 2020

The existing system for monitoring infectious diseases is modified to track COVID cases. Individual Excel spreadsheets with case data have less than 65,000 rows / 140,000 cases

4

Sep 25, 2020

First day when the daily case volume exceeds 11,000. Any cases above that number are no longer counted.

5

Oct 2, 2020

The error is realized and alternative partitioning of the input data starts being the norm, thus avoiding the limitation of the XLS file format.

Impact

Since COVID was the most important political issue during the beginning of the epidemic, and huge debates about lockdowns and related policies were waged, having incorrect data being used when making decisions was not welcomed. The incident reached national attention where Bris Johnson, the prime minister at the time, had to make announcements. There were accusations that pivotal decisions on lockdowns were taken based on incorrect information.

This came during a time when PHE was already unpopular and there were calls made for it to be abolished. Since the incident PHE has been replaced by the UK Health Security Agency and the Office for Health Improvement and Disparities. Making such a big mistake on the most important health issue of the century certainly did not help PHE's case.

The total political, societal and financial impact of this error is hard to estimate.

References

Spreadsheet mishaps in the same country

London Olympics Oversold Tickets Due to a Spreadsheet Mishap

Lack of spreadsheet input validation and incorrect input lead to over-selling of ...

Mouchel Blames £4.3m of £8.6m Writedown on a Spreadsheet Bug

A spreadsheet error is blamed for incorrect calculation of pension fund deficit, ...

Similar spreadsheet disasters due to data entry

Babylonian Educator Makes a Data Entry Mistake in an Answer Key Table

In what constitutes the known oldest spreadsheet bluder, a Babylonian Educator e ...

Crypto.com Refunds the Amount of $10.5 Million Instead of Just $100

Excel data entry error and lack of validation rules leads to couple splurging on ...

London Olympics Oversold Tickets Due to a Spreadsheet Mishap

Lack of spreadsheet input validation and incorrect input lead to over-selling of ...