by Oliver Steele, April 25, 2022. Revised April 28, 12:10 AM Shanghai Time.
<iframe width="776" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vT35m7QXRmF4XFc3rbCh1WP9PMXJd0yKeXkEpKXpVy_h_ufIq78pF5Fccp_9WHyHlX4nYeS5kFo5SKz/pubchart?oid=384688515&format=interactive"></iframe>
Introduction
As of April 25, I’ve been in the Shanghai lockdown for more than a month.
On March 22, I started tracking Shanghai’s COVID-19 statistics, as reported in the English-language papers. I did this as a way of keeping in touch with what’s going on in the city outside this apartment, and also because I wanted a different view of the data than the one summarized in the papers.
The latest numbers and charts are in this spreadsheet. This accompanying document describes the values and graphs in the spreadsheet, the sources and methods used to collect the data, and some choices about how to present it.
I am particularly interested in what’s happening outside of the quarantine centers. I also wanted to see how many total transmissions were occurring, regardless of whether the newly infected individual ended up with a symptomatic or “asymptomatic” case.
The major difference between the summaries in the spreadsheet and the articles in the press is that the spreadsheet combines symptomatic and asymptomatic cases, while distinguishing cases detected in quarantine centers from cases detected during residential screening. My rationale for grouping the data this way is further discussed in the Presentation Choices section below.
I was interested in this way of slicing the data for three reasons: both asymptomatic and symptomatic individuals are contagious; both symptomatic and asymptomatic cases detected in the community trigger or extend a period of residential lockdown; and Long Covid appears to affect many people who had relatively mild symptoms, and who would therefore be reported as asymptomatic in China.
Note: As of April 26, Shanghai Times started including a graph of “New infections found in screening of high-risk people”. (By “high-risk”, they mean “outside of quarantine”.) This goes halfway towards what I wanted. The missing piece is combining all infections, regardless of whether they are symptomatic.
Note that I have no training as a virologist, epidemiologist, statistician, or infectious disease expert, and no one with such training has reviewed this document.
Sections
- Introduction
- Definitions
- Sources
- Presentation Choices
- Spreadsheet Tabs
- Summary Tab
- Data Tab
- Districts Tab
- Summary Graphs
- New Cases
- New Cases Detected Outside Quarantine
- New Cases (log)
- Trend Lines
- Notes
- Rationales for partitioning the data
- Data Quality
- How to Read the Sources
- “Locally Transmitted” versus “Screening of High-Risk People”
- District-Level Data
- Acknowledgements
Definitions
These are the definitions used to describe the data. Note that they are specific to Shanghai in Spring 2022. For example, in other countries (and, I believe, during other times in China), a person could be considered a case if they tested positive on an antigen test. During the current outbreak, antigen results are not considered conclusive.
Some or all of these definitions may be familiar to anyone living in Shanghai.
Symptomatic cases, in the Shanghai reports, are cases that are confirmed positive on a nucleic acid test and that also have symptoms. I believe that during the reporting period, a case is considered symptomatic only if there are respiratory symptoms confirmed by a CT scan.
Asymptomatic cases are cases that are confirmed positive on a nucleic acid test, but do not have symptoms, using the criterion above. (The term does not indicate that there are no symptoms at all. For example, a patient could have a fever, intestinal discomfort, or other non-respiratory symptoms; or, be coughing or sneezing, but not show a lung abnormality on an X-Ray.) In some reporting, this category is referred to as asymptomatic infections, not “cases”.
A close contact is an individual who was in close and extended proximity to a symptomatic or asymptomatic case. The exact meaning of “close” in this definition has changed over time; these data simply pass through the numbers that are reported under this category. The definition has always, by my understanding, included a roommate or family member with the same address; it has never, to my knowledge, included other residents of the same building, even if they share a ventilation system.
The media reports distinguish between close contacts of symptomatic cases (who are taken into quarantine), and close contacts of asymptomatic cases (who are “placed under observation”). I am unclear as to whether people in this latter category are removed from their homes.
Central quarantine is a facility to which close contacts who initially tested negative are moved from their residences, so that if they have been infected but have not yet tested positive, they won’t infect members of the community. There is a policy that specifies how many negative tests, over what timeline, a close contact needs in order to be released from central quarantine back to their residence. (There are separate quarantine facilities for asymptomatic cases, which don’t come into play in this data.)
Update: During some reporting phases, “close contacts” were not moved to any kind of quarantine facility, although the official reports still describe infections in this population as occurring in “central quarantine”. The text of this document and the column names of the related data sets do not reflect this.
[There are separate central quarantine facilities for asymptomatic positive cases, and for close contacts who have not tested positive. The data in this spreadsheet only pertains to close contacts.]
The community, in this context, is the population that is neither in a COVID-19 hospital nor in any kind of quarantine.
Local cases are cases acquired through transmission within China. This distinguishes local transmission from imported cases: travelers who arrived from other countries and tested positive at the airport or during their isolation period in a quarantine hotel. Press reports further divide local cases into local symptomatic cases and local asymptomatic cases. My data includes only local cases.
Sources
Data is sourced from Shanghai Times (SHINE) and That’s Shanghai. Each of these publishes a daily article that summarizes the test results and some other data from the preceding day.
Shanghai Times is an English-language publication of the Shanghai United Media Group, a state media company of the People’s Republic of China.
That’s Shanghai is owned by JY International Cultural Communications Co., Ltd., a Chinese company.
Each day that I update the chart, I look for the latest update from Shanghai Times and That’s Shanghai. Through April 25, I updated the data in the Data tab manually, checking the two sources against each other. Since April 26, I have entered the article URLs into a Python script that updates the spreadsheet automatically, and I double-check the results. I have used a script to update the Districts tab since I started tracking district data around April 20, and I occasionally spot-check the results.
I occasionally spot-check these two sources against Chinese-language reports such as the Tencent News dashboard. I have occasionally found small discrepancies among sources, which I suspect come from typos, or possibly from sources that use different time-of-day cutoffs for their reporting periods. I have never found discrepancies in the data that make a substantive difference.
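A minimal sketch of that kind of cross-check, in Python since the update script is in Python. It is hypothetical, not the author’s script: it assumes each source’s daily article has already been reduced to a dictionary of named counts (the field names and the `Report` type are invented), and it only lists the fields on which the two sources disagree, leaving the judgment about which number is right to a person.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: field name -> count, parsed from one day's article.
Report = Dict[str, int]

def discrepancies(shine: Report, thats: Report) -> List[Tuple[str, int, int]]:
    """List (field, Shanghai Times value, That's Shanghai value) for every
    field present in both reports whose values differ, sorted by field name."""
    shared = sorted(shine.keys() & thats.keys())
    return [(field, shine[field], thats[field])
            for field in shared if shine[field] != thats[field]]
```

An empty result means the two sources agree on every field they both report; fields that only one source reports are ignored rather than flagged.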
Presentation Choices
In online articles, the headlines and summaries make a distinction between symptomatic cases and asymptomatic infections, but combine cases detected in quarantine with cases detected outside quarantine. Symptomatic cases are clustered together, regardless of whether they appear in residential screening or general quarantine. Similarly for asymptomatic infections.
In my graphs and tables, I’ve chosen to combine symptomatic and asymptomatic cases, and instead to distinguish between cases detected inside general quarantine and cases detected during residential screening.
The media stories report all four categories, once you read past the summary. It’s just a matter of what is given prominence in titles, headlines, maps, and graphs. The Data spreadsheet tab gives all four.
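The regrouping can be sketched as follows. The class and field names are hypothetical; the sketch assumes the four reported categories for one day have already been extracted, and shows that the summary view is just a different pair of sums over the same four numbers.

```python
from dataclasses import dataclass

@dataclass
class DailyCounts:
    """The four categories reported for one day (hypothetical field names)."""
    symptomatic_in_quarantine: int
    symptomatic_outside: int
    asymptomatic_in_quarantine: int
    asymptomatic_outside: int

    def in_quarantine(self) -> int:
        # Summary view: symptomatic + asymptomatic, detected in quarantine.
        return self.symptomatic_in_quarantine + self.asymptomatic_in_quarantine

    def outside_quarantine(self) -> int:
        # Summary view: symptomatic + asymptomatic, found by residential screening.
        return self.symptomatic_outside + self.asymptomatic_outside

    def total(self) -> int:
        # Same grand total as the press summaries, just partitioned differently.
        return self.in_quarantine() + self.outside_quarantine()
```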
Symptomatic versus asymptomatic. The top-level summaries in online articles make a distinction between symptomatic cases and asymptomatic infections.
This is a useful distinction in assessing the current effect on the health system, and it might be useful for reassuring readers. Epidemiologically, it is more questionable, as described in the section Rationales for Partitioning the Data. I have chosen to combine symptomatic and asymptomatic positives in the summary data and graphs.
Quarantine versus non-quarantine. The top-level summaries in online articles combine cases detected in quarantine, and cases detected in general screening, into a single number. I’ve chosen to summarize and graph them as separate categories, as discussed in the section Rationales for Partitioning the Data.
Spreadsheet Tabs
Summary Tab
This sheet summarizes data from the Data sheet. It also includes a couple of graphs, which are described in the Summary Graphs section.
All values on this sheet are sums of symptomatic cases and asymptomatic infections, for reasons described in the Presentation Choices section.
The summary begins on March 14. The Data sheet goes back to the first reports on local cases, on February 22, but I consider the first few weeks of data too low in quality to include in the summary view.
In addition to cumulative stats for all of Shanghai, this sheet also includes case counts for Pudong, and for the sum of the other fifteen districts. See the note on Pudong under New Cases Detected Outside Quarantine, below.
Data for each of the sixteen districts is on the Districts tab of the Google sheet.
Data Tab
The Data tab has numbers taken directly from Shanghai Times and That’s Shanghai. (See the Sources section, above.) The data before March 14 is not of the same quality as subsequent days, for reasons discussed in the Data Quality section.
Districts Tab
The Districts tab shows positives per district. See the section District-Level Data, below.
Summary Graphs
The Summary tab of the spreadsheet also contains a few graphs:
New Cases
<iframe width="776" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vT35m7QXRmF4XFc3rbCh1WP9PMXJd0yKeXkEpKXpVy_h_ufIq78pF5Fccp_9WHyHlX4nYeS5kFo5SKz/pubchart?oid=1891190477&format=image"></iframe>
The New Cases graph shows new cases detected in quarantine, and new cases detected outside of quarantine.
Most of the new cases are detected among close contacts who have been moved to central quarantine. This makes it difficult to read the number of new cases detected in the community from this graph. The following two graphs address this.
New Cases Detected Outside Quarantine
<iframe width="776" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vT35m7QXRmF4XFc3rbCh1WP9PMXJd0yKeXkEpKXpVy_h_ufIq78pF5Fccp_9WHyHlX4nYeS5kFo5SKz/pubchart?oid=384688515&format=interactive"></iframe>
This graph contains the following series:
- New cases in Pudong.
- New cases in the other 15 districts. (This isn’t just “Puxi”, because it also includes Baoshan, Minhang, Jiading, Jinshan, Songjiang, Qingpu, Fengxian, and Chongming.)
- New cases in all of Shanghai. (This third series is the sum of the other two. It is also the same as the “Detected Outside Quarantine” line in the previous chart.)
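A sketch of how the three series relate, assuming a hypothetical mapping from district name to the number of cases detected outside quarantine that day (the key `"Pudong"` is an assumed label, not necessarily the one the spreadsheet uses). The citywide series is simply the sum of the other two.

```python
from typing import Dict, Tuple

def pudong_split(by_district: Dict[str, int]) -> Tuple[int, int, int]:
    """Return (Pudong, the other districts, all of Shanghai) from per-district
    counts of cases detected outside quarantine on one day."""
    pudong = by_district.get("Pudong", 0)
    others = sum(n for name, n in by_district.items() if name != "Pudong")
    return pudong, others, pudong + others
```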
Why Pudong? Originally, because I live here. When I started tracking this information, I was particularly interested in what was happening on this side of the river. It turns out, though, that Pudong is the only district whose population is so large that the per-day data is salient to the eye. (See Trend Lines, below.)
Data for each of the sixteen districts is on the Districts tab of the Google sheet.
New Cases (log)
<iframe width="776" height="371" seamless frameborder="0" scrolling="no" src="https://docs.google.com/spreadsheets/d/e/2PACX-1vT35m7QXRmF4XFc3rbCh1WP9PMXJd0yKeXkEpKXpVy_h_ufIq78pF5Fccp_9WHyHlX4nYeS5kFo5SKz/pubchart?oid=1289218452&format=interactive"></iframe>
This log-scale graph shows the same data as the first graph, but with a log y axis. (Each tick mark on the y axis represents ten times as many cases as the tick mark below it.) This choice of axis makes it possible to read both the cases detected in quarantine, and the cases detected in the community, from a single graph.
This graph is better than the other two graphs (that use linear axes) for eyeballing the trend lines of data that is generated by processes with exponential growth or decrease, such as infectious disease spread. Exponential data appears as straight lines on a log axis. (In the other graphs, which use a linear y axis, exponential growth or decrease has a curvy shape that makes it difficult to tell by eye whether the exponent is changing.)
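The straight-line property can also be checked numerically. The sketch below is an illustration, not something the spreadsheet computes: it fits a least-squares line to the logarithm of daily counts and converts the slope back into an average day-over-day growth factor, plus the corresponding doubling or halving time. It assumes every count in the window is positive, since the logarithm of zero is undefined.

```python
import math
from typing import Sequence

def daily_growth_factor(counts: Sequence[float]) -> float:
    """Least-squares slope of log(count) versus day index, returned as
    exp(slope): the average day-over-day multiplier. Above 1 means growth,
    below 1 means decline. Every count must be positive."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two days of data")
    ys = [math.log(c) for c in counts]  # raises ValueError for counts <= 0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return math.exp(num / den)

def doubling_or_halving_days(factor: float) -> float:
    """Days for counts to double (factor > 1) or halve (factor < 1)."""
    if factor == 1:
        return math.inf
    return math.log(2) / abs(math.log(factor))
```

A factor that stays roughly constant as the window slides forward corresponds to a straight trend line on the log graph; a factor drifting toward or away from 1 corresponds to a bending one.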
Short-term discontinuities in the shape of this graph could be an artifact of changes to either the testing methodology or reporting standards. I believe that sustained changes to the trend lines are almost certainly from changes to the effective reproduction number of the virus.
Trend Lines
All the graphs include trend lines. The trend lines are moving averages. The use of moving averages attempts to compensate for the fact that different kinds of testing are done in different regions and on different days. (These differences don’t appear to cancel out within any given day, but they do over time.)
The first graph uses a four-day moving average; the second graph uses a five-day moving average. In theory, the second chart ought to require more averaging: two of its series cover smaller populations, each with less diverse testing policies on any given day, and therefore with less opportunity for daily differences to average out.
The United States uses seven-day averages, but this is because reporting there differs strongly between weekends and weekdays. That difference does not exist here, because tests and reporting are done equally on weekdays and weekend days.
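A moving average of the kind described above can be sketched as follows. The exact form is an assumption: the text does not say whether each trend-line point averages the preceding days or a window centered on that day, and this sketch uses a trailing window. Positions without a full window are left as `None` rather than averaged over fewer days.

```python
from typing import List, Optional, Sequence

def trailing_moving_average(values: Sequence[float], window: int) -> List[Optional[float]]:
    """Mean of each value and the window - 1 values before it.
    The first window - 1 positions have no full window and are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[Optional[float]] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out
```

For the charts described above, `window` would be 4 for the first graph and 5 for the second.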
Why not polynomials? You may have seen graphs, of this or other data, that fit the shape to a polynomial. It is a fact of math that you can fit any finite set of data points exactly with a polynomial of sufficiently high order; a corollary is that no specific polynomial fit has good explanatory or predictive power (unless the chart-maker had a prior commitment to using only polynomials of a certain order). A polynomial is only appropriate if there is an a priori reason to believe that the data measures a phenomenon generated by a combination of linear, quadratic, cubic, etc. processes (or by a single process that computes a polynomial). Either of these is unlikely in nature; in particular, there is no reason to expect infectious disease transmission, which is known to be an exponential process, to be polynomial.
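To illustrate the point about polynomial fits, here is a sketch using Lagrange interpolation (a standard technique, not something used in the spreadsheet) on synthetic data: 30% daily growth with an artificial ±5% alternating wiggle standing in for reporting noise. With eight days of data, a degree-7 polynomial passes through every point, yet its next-day extrapolation is a negative case count, because the polynomial has no way to tell the growth process apart from the noise.

```python
from typing import Sequence

def lagrange_eval(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate, at x, the unique polynomial of degree <= len(xs) - 1 that
    passes exactly through every point (xs[i], ys[i]) (Lagrange form)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

# Synthetic data: 30% daily growth with a +/-5% day-to-day wiggle standing in
# for reporting noise. These are not Shanghai figures.
days = list(range(8))
counts = [100 * 1.3 ** d * (1 + 0.05 * (-1) ** d) for d in days]

# The degree-7 polynomial reproduces all eight observations...
fit_errors = [abs(lagrange_eval(days, counts, d) - c) for d, c in zip(days, counts)]
print("max fit error:", max(fit_errors))
# ...but its one-day-ahead "forecast" is not even a plausible case count.
print("day 8 forecast:", lagrange_eval(days, counts, 8))
```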
Notes
Rationales for partitioning the data
Why separate cases detected in central quarantine from cases detected during residential screening?
Update: I am no longer sure that it makes sense to apply this distinction, and I am no longer confident of the analysis in this section. I will update this section tomorrow.
Symptomatic versus asymptomatic. The top-level summaries in online articles make a distinction between symptomatic cases and asymptomatic infections. As mentioned previously, this is a useful distinction in assessing the current effect on the health system, and it might be useful for reassuring readers. Epidemiologically, it is more questionable.
Symptomatic cases and asymptomatic infections are caused by the same virus, and participate in the same transmission chains. Whether you develop a symptomatic or asymptomatic case has more to do with your immune system, exposure length and proximity (and other factors that affect viral load), etc., than with whether your exposure was to a symptomatic or an asymptomatic carrier.
It might be the case that symptomatic cases shed a greater viral load than asymptomatic ones. I don’t think this is clear or generally accepted; and the most infectious period appears to be prior to the emergence of symptoms in any case.
It might also be the case that symptomatic cases remain contagious for longer. This would matter more if they were allowed to remain in communities. If the screening and quarantine policies are followed (there is some question about this), neither symptomatic nor asymptomatic cases should be remaining in the community for long enough that this would make a difference.
Retrospective versus prospective transmission chains. The top-level summaries in online articles combine cases detected in general quarantine, and cases detected in general screening, into a single number. As mentioned previously, I’ve chosen to summarize and chart them as separate categories. This is because I’m interested in the epidemiology and transmission chains, and in the effect on neighborhood lockdowns. Still, it’s a judgement call.
There are quarantine facilities for people who tested negative but were close contacts of people who tested positive. A case detected in these facilities can have one of two histories: (1) the individual was at an early, undetectable stage of infection when they were moved to quarantine; (2) the individual was not infected when they were moved into quarantine, but was infected by other contagious individuals sharing the quarantine space.
[There is lots to say about the ethics of a system that can cause (2) to happen. There is also lots to say about the trolley problems of quarantine in general, even if quarantines are implemented such that they don’t create additional COVID-19 cases. This isn’t the place for me to talk about these.]
Whether, for epidemiological purposes, a close contact who tested negative in the community and then positive in general quarantine should be considered part of the general population (where they might have contracted the virus) or of a distinct population (where they also might have contracted it, and which is the only population to which they can transmit it) depends on what question you’re trying to use the data to answer.
How likely is someone to be infected? The number of cases detected in quarantine has something to do with what percentage of the general population is being infected each day. How it bears on this depends on how many cases detected in quarantine are from individuals who were infected prior to quarantine, versus how many were infected in quarantine. For example, for the 24-hour period reported on April 24, 18,392 cases were detected in quarantine and 217 cases were detected outside quarantine. If half the quarantine cases were infected (but not detected) before being moved to quarantine, then 38 residents of Shanghai out of 100,000 were infected each day around that time (assuming a locally steady infection rate for the preceding few days). If only 10% of the quarantine cases were infected before entering quarantine, the rate is 8 per 100,000; if everyone who entered quarantine was already infected, the rate is 75 per 100,000.
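The arithmetic in this paragraph can be reproduced with a few lines of Python. The population figure is an assumption: the text does not state which one it used, and the sketch takes roughly 24.87 million (approximately the 2020 census count), which reproduces the rates quoted above when rounded. The parameter is the share of quarantine cases assumed to have been infected before entering quarantine; every case detected outside quarantine is counted as a new infection.

```python
# Assumption: Shanghai's population is about 24.87 million (2020 census);
# the text does not state the figure it used.
POPULATION = 24_870_000

IN_QUARANTINE = 18_392  # cases detected in quarantine, reported April 24
OUTSIDE = 217           # cases detected outside quarantine, same report

def daily_rate_per_100k(share_infected_before_quarantine: float) -> float:
    """New infections per 100,000 residents per day: every case found outside
    quarantine, plus the given share of quarantine cases, assumed to have been
    infected before they were moved to quarantine."""
    new_infections = OUTSIDE + share_infected_before_quarantine * IN_QUARANTINE
    return new_infections / POPULATION * 100_000

for share in (0.1, 0.5, 1.0):
    rate = daily_rate_per_100k(share)
    print(f"{share:.0%} infected before quarantine: {rate:.0f} per 100,000")
```

With this population figure, the three shares print 8, 38, and 75 per 100,000.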
How many people are transmitting infections outside quarantine? The number of cases detected in quarantine has no bearing on future community transmissions. The objective of moving close contacts to central quarantine is to remove the transmission chain from the community. Individuals who test negative on PCR are extremely unlikely to be infectious at the time of the test, so these individuals were unlikely to be the source of further community transmissions.
How many cases are causing neighborhoods to re-enter, or reset the clock on, lockdown status? Cases detected in quarantine have no bearing on this either: an individual who tests positive in central quarantine does not, by my understanding, cause a change to their residence’s lockdown status.
Data Quality
Prior to about March 14, there were many inconsistencies in how the data was reported. Different language was used on different days, possibly to describe the same data set, but possibly because reporting criteria and collection methods were still changing. I made an effort to extract and normalize the data, but I don’t consider it to be of the same quality, and haven’t included it in the Summary sheet.
The number of close contacts of symptomatic cases who were taken into quarantine, and the number of close contacts of asymptomatic infections who were placed under observation, were reported up through April 15. No data was reported for these categories after that date.
On April 21, the number of severe cases increased by more than 150%. On April 24, the daily death count increased by more than 200%. I don’t have any information about what could have caused these discontinuities.
How to Read the Sources
“Locally Transmitted” versus “Screening of High-Risk People”
A typical article in Shanghai Times begins:
The city reported 2,472 locally transmitted COVID-19 cases, 16,983 local asymptomatic infections and one new asymptomatic imported case on Sunday, said the Shanghai Health Commission on Monday morning.
“Locally transmitted” and “local asymptomatic infections” in this context do not mean cases and infections that were detected outside of quarantine. These totals also include cases and infections that were detected in quarantine.
In order to find the number of cases detected via residential testing, it is necessary to read further down in the article and, in some cases, do some arithmetic (subtract the number of cases and infections “detected during central quarantine” from the totals).
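The subtraction can be sketched as follows. The function, parameter, and type names are hypothetical; the inputs are the headline totals and the counts reported as “detected during central quarantine”, however they were transcribed. The check for a negative result is a guard against transcription errors, not something the articles describe.

```python
from typing import NamedTuple

class Residential(NamedTuple):
    symptomatic: int
    asymptomatic: int

def residential_counts(symptomatic_total: int, asymptomatic_total: int,
                       symptomatic_in_quarantine: int,
                       asymptomatic_in_quarantine: int) -> Residential:
    """Cases found by residential screening: headline totals minus the
    counts reported as detected during central quarantine."""
    sym = symptomatic_total - symptomatic_in_quarantine
    asym = asymptomatic_total - asymptomatic_in_quarantine
    if sym < 0 or asym < 0:
        raise ValueError("quarantine count exceeds headline total; check the transcription")
    return Residential(sym, asym)
```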
Also, “high-risk people” just means “everyone outside of quarantine”. Reports started using that term early in the pandemic, when only a few people in special categories were screened each day. The English-language articles, at least, never updated this terminology.
District-Level Data
District-level case counts are from That's Shanghai. These case counts are for tests outside of quarantine.
Shanghai Times reports district data too, but its district-level numbers include cases that were detected in central quarantine. (This is evident because the district numbers sum to more than the total number of residential cases.) I believe that a close contact who tests positive in quarantine is credited to the district of their home address.
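The consistency check described in the parenthetical can be written as a one-line comparison. This is a sketch with hypothetical names: it assumes a mapping from district name to that district’s reported count, and the citywide number of cases detected outside quarantine for the same day.

```python
from typing import Dict

def districts_include_quarantine(by_district: Dict[str, int], residential_total: int) -> bool:
    """True if the per-district counts add up to more than the citywide number
    of cases found outside quarantine, which means the district figures must
    also include cases detected in central quarantine."""
    return sum(by_district.values()) > residential_total
```

A `False` result is consistent with district counts that cover only residential screening, as with That’s Shanghai’s figures, though it does not prove it.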
Acknowledgements
Thanks to Margaret Minsky and Charlotte Minsky for invaluable advice on early drafts of this page. Thanks also to the COVID Tracking Project, for giving Charlotte experience and training which is indirectly reflected in this advice.
– Oliver Steele, Shanghai, China