Chapter 5 Data Requirements
Important: Data loaded into CaRDO are stored locally on your computer, and all analyses are performed locally. Your data will not leave your computer while using CaRDO.
Options to publish the dashboard you develop using CaRDO (e.g., share it online) are described later. If you do publish it, it is your responsibility to ensure that all data shown in the CaRDO dashboard are appropriate for sharing. See data privacy for more details.
5.1 The basic dashboard
The type of CaRDO dashboard you’re able to create will depend on the breadth of data you have available.
The simplest dashboard available is a display of cancer incidence count data, and the data necessary to generate this minimum dashboard are listed below.
- Cancer type
- Counts (number of diagnoses)
- Diagnosis year
- Age at diagnosis (5-year age groups)
- Sex
Note: The specific names of these variables in your dataset do not matter, only that these data are available.
More expansive dashboards are possible where additional variables are available, and we cover those options in Expand Your Dashboard. But first, there are three key requirements for all cancer datasets loaded into CaRDO.
5.1.1 Data structure
- Your data must follow the structure below (variable names do not matter):
Cancer | Count | Year | Age.group | Sex |
---|---|---|---|---|
Liver | 25 | 2018 | 12 | 1 |
Breast (Female) | 52 | 2003 | 15 | 2 |
Colorectal | 28 | 2007 | 9 | 2 |
Prostate | 51 | 2019 | 6 | 1 |
All cancers | 275 | 2019 | 13 | 2 |
You must have a single column for each variable and outcome you wish to report, and each row in your dataset should correspond to a unique combination of variable values (for example, one row per cancer type, year, age group, and sex). Refer back to the Example Datasets for further guidance.
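The "one row per unique combination" rule can be checked before loading a dataset. The following is a minimal sketch using only the Python standard library; it is not part of CaRDO, and the column names (`cancer`, `year`, `age_group`, `sex`) and the helper `find_duplicate_combinations` are hypothetical names chosen for illustration. The rows mirror the example table above.

```python
from collections import Counter

# Rows mirror the example table above; the keys are illustrative names only.
rows = [
    {"cancer": "Liver", "count": 25, "year": 2018, "age_group": 12, "sex": 1},
    {"cancer": "Breast (Female)", "count": 52, "year": 2003, "age_group": 15, "sex": 2},
    {"cancer": "Colorectal", "count": 28, "year": 2007, "age_group": 9, "sex": 2},
    {"cancer": "Prostate", "count": 51, "year": 2019, "age_group": 6, "sex": 1},
    {"cancer": "All cancers", "count": 275, "year": 2019, "age_group": 13, "sex": 2},
]

# The variables that together should identify exactly one row.
KEY_COLUMNS = ("cancer", "year", "age_group", "sex")


def find_duplicate_combinations(rows, key_columns=KEY_COLUMNS):
    """Return the variable combinations that appear on more than one row."""
    seen = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return [combo for combo, n in seen.items() if n > 1]


duplicates = find_duplicate_combinations(rows)
if duplicates:
    print("Combinations with more than one row:", duplicates)
else:
    print("Every row is a unique combination of the key variables.")
```

If the check reports duplicates, the counts for those combinations likely need to be summed into a single row before the data are loaded.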
Note: For registries with minimal data, CaRDO provides the option to aggregate and report your data by fixed 5-year intervals (e.g., 2018–2022) once your data have been loaded.
5.1.2 Cancer types
- Cancer-type values must be coded as you wish them to be displayed (see example above).
For example, liver cancer should be coded as ‘Liver’ or ‘Liver cancer’ and not as a numerical value. Numerical codes will be displayed as numbers in the dashboard, which makes the cancer types uninterpretable for users who don’t know the coding scheme and inconvenient even for those who do.
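If your registry stores cancer types as numeric codes, they can be recoded to display labels before loading. The sketch below is illustrative only: the coding scheme in `CANCER_LABELS` and the function `label_cancer_type` are hypothetical and should be replaced with your own codes and preferred labels.

```python
# Hypothetical registry coding scheme; replace with your own codes and labels.
CANCER_LABELS = {1: "Liver", 2: "Colorectal", 3: "Prostate"}


def label_cancer_type(code, labels=CANCER_LABELS):
    """Translate a numeric cancer code into the label to show in the dashboard."""
    if code not in labels:
        # Fail loudly rather than let an unlabeled code reach the dashboard.
        raise ValueError(f"No display label defined for cancer code {code!r}")
    return labels[code]


coded_values = [1, 3, 2, 1]
display_values = [label_cancer_type(code) for code in coded_values]
print(display_values)
```

Raising an error for unknown codes, rather than passing them through, makes gaps in the lookup table visible before the data are loaded.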
5.1.3 5-year age groups
- Cancer counts must be aggregated by 5-year age groups, with age groups coded numerically from 1–18. You do not need to supply data for every age group listed in the table below. However, any data you do provide must use the correct age group code, as defined in the table. For example, if you don’t have data for the 0–4 years age group (code 1), but you do have data for the 5–9 years group, it must still be coded as 2 — not 1.
Age.group | Age.group.numeric |
---|---|
0-4 | 1 |
5-9 | 2 |
10-14 | 3 |
15-19 | 4 |
20-24 | 5 |
25-29 | 6 |
30-34 | 7 |
35-39 | 8 |
40-44 | 9 |
45-49 | 10 |
50-54 | 11 |
55-59 | 12 |
60-64 | 13 |
65-69 | 14 |
70-74 | 15 |
75-79 | 16 |
80-84 | 17 |
85+ | 18 |
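The coding in the table follows a regular pattern, so a code can be derived from age in whole years. This is a small sketch for preparing your own data, not a CaRDO function; the name `age_group_code` is hypothetical.

```python
def age_group_code(age):
    """Map an age in whole years to the 5-year age group code (1-18)."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # Each 5-year band advances the code by one; ages 85 and over share code 18.
    return min(age // 5 + 1, 18)


for example_age in (3, 7, 57, 84, 85, 102):
    print(example_age, "->", age_group_code(example_age))
```

Because the code depends only on the age itself, a dataset with no records for the youngest groups still yields the correct codes for the groups it does contain.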
5-year age groups are necessary for two reasons. Firstly, a standard format is required for CaRDO to aggregate data and display cancer counts by broad age groups. Secondly, where age-standardized rates are being calculated, 5-year age groups are required.
Note: Counts by 5-year age group will NOT be displayed in the dashboard or published online. CaRDO displays and publishes counts only by the following broad age groups.
Broad.age.group |
---|
0-34 |
35-49 |
50-64 |
65+ |
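Combining the two tables, codes 1 to 7 fall in 0-34, codes 8 to 10 in 35-49, codes 11 to 13 in 50-64, and codes 14 to 18 in 65+. The sketch below shows how counts by 5-year group could roll up into the broad groups. It illustrates the mapping only and is not CaRDO's own implementation; the function names and the counts are made up for the example.

```python
from collections import defaultdict


def broad_age_group(code):
    """Map a 5-year age group code (1-18) to its broad age group label."""
    if not 1 <= code <= 18:
        raise ValueError(f"age group code must be between 1 and 18, got {code!r}")
    if code <= 7:      # 0-4 up to 30-34
        return "0-34"
    if code <= 10:     # 35-39 up to 45-49
        return "35-49"
    if code <= 13:     # 50-54 up to 60-64
        return "50-64"
    return "65+"       # 65-69 up to 85+


def aggregate_to_broad_groups(counts_by_code):
    """Sum counts keyed by 5-year code into counts keyed by broad age group."""
    totals = defaultdict(int)
    for code, count in counts_by_code.items():
        totals[broad_age_group(code)] += count
    return dict(totals)


# Made-up counts for a single cancer type, year, and sex.
example_counts = {6: 3, 9: 10, 12: 25, 13: 30, 15: 40}
print(aggregate_to_broad_groups(example_counts))
```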
5.2 Other data details to note
- You will be asked whether your dataset includes an ‘All cancers’ category — a row that summarizes the total number of diagnoses or deaths across all cancer types. We strongly recommend including this in your dataset. If you do not supply an ‘All cancers’ category, one will be created automatically by summing across the cancer types you provide.
If your dataset is not comprehensive (i.e., it only includes a subset of cancer types), then the generated ‘All cancers’ total may be misleading, since it won’t reflect the true overall cancer burden.
If your dataset includes nested cancer categories (e.g., Gynecological cancers and Ovarian cancer), CaRDO requires an explicit ‘All cancers’ row. Without it, the dashboard will double count nested cancers during aggregation.
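To see why nested categories cause double counting, consider a simple summation across cancer types. The sketch below is an assumption about what "summing across the cancer types you provide" amounts to, not CaRDO's actual code; the function `derive_all_cancers` and the counts are invented for illustration.

```python
from collections import defaultdict


def derive_all_cancers(rows):
    """Sum counts across cancer types within each (year, age_group, sex) stratum."""
    totals = defaultdict(int)
    for cancer, year, age_group, sex, count in rows:
        totals[(year, age_group, sex)] += count
    return dict(totals)


# Made-up rows: (cancer, year, age_group, sex, count).
# The ovarian cases are already included in the gynecological total.
nested_rows = [
    ("Gynecological", 2019, 13, 2, 10),
    ("Ovarian", 2019, 13, 2, 4),
    ("Colorectal", 2019, 13, 2, 28),
]

derived = derive_all_cancers(nested_rows)
print(derived)
```

Here the derived total is 42, although only 38 distinct diagnoses exist (10 + 28), because the 4 ovarian cases are counted twice. Supplying an explicit ‘All cancers’ row avoids this.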
- It does not matter how sex is coded – you’ll be asked to define sex when loading your datasets.
- CaRDO will calculate counts for persons based on counts for males and females – you do not need to supply data for persons (males + females).
- By default, counts below 5 will be suppressed. These data points will appear as ‘insufficient data’ in the dashboard. You will have the option to set a custom suppression threshold.
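The last two points can be illustrated with a short sketch. It assumes that suppression is applied to the count being displayed and that "below 5" means strictly less than the threshold; the order in which CaRDO combines and suppresses counts is not specified here, and the function names are hypothetical.

```python
DEFAULT_THRESHOLD = 5  # default stated above; a custom threshold can be set


def persons_count(males, females):
    """Count for persons, derived as males plus females."""
    return males + females


def display_value(count, threshold=DEFAULT_THRESHOLD):
    """Return the count, or the suppression label if it is below the threshold."""
    if count < threshold:
        return "insufficient data"
    return count


males, females = 3, 4
print(display_value(males))                          # suppressed: 3 is below 5
print(display_value(persons_count(males, females)))  # shown: 7 is not below 5
print(display_value(7, threshold=10))                # suppressed under a custom threshold
```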
If you have additional data to include in your dashboard, such as mortality data, stick with us for just a moment longer — we’ll walk you through how to Expand Your Dashboard in the next section. Otherwise, if your dataset looks like the example above and you have no further data to add, then you’re ready to launch CaRDO and begin building your own dashboard using the following command.
- For guidance navigating the CaRDO user interface, return to launch CaRDO and follow the steps.
If you’ve finished building your CaRDO dashboard, learn how to publish it online here.