Chapter 10 Methods
10.1 Age-standardised rates
Age-standardised rates attempt to adjust for variation in age structures in different populations (either different geographical areas or the same population across time). There are two methods of age-standardisation – direct and indirect.
All trends in cancer diagnoses and cancer deaths were calculated using directly standardised rates. The method applies the age-specific rates from the population of interest (i.e. your catchment) to a standard population, which in CaRDO is the World Standard Population by default. Five-year age groups, with 85 years and above as the final group, were used for all age-standardised rate calculations.
Note: if not all 18 age groups are supplied, the standard population is re-weighted according to your supplied age groups.
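As a minimal sketch of direct standardisation with re-weighting, the R snippet below computes an age-standardised rate per 100,000. The function name `direct_asr`, the three broad age groups, and the standard-population weights are all made up for illustration; they are not CaRDO's internal code or the actual World Standard Population values.

```r
# Directly age-standardised rate per 100,000 (illustrative sketch only).
# cases, population and std_weights must be in the same age-group order.
direct_asr <- function(cases, population, std_weights) {
  stopifnot(length(cases) == length(population),
            length(cases) == length(std_weights))
  w <- std_weights / sum(std_weights)     # re-weight so the weights sum to 1
  age_rates <- cases / population * 1e5   # age-specific rates per 100,000
  sum(age_rates * w)                      # weighted sum of age-specific rates
}

# Made-up data for three broad age groups
cases      <- c(10, 50, 200)
population <- c(100000, 80000, 40000)
std        <- c(60, 30, 10)   # hypothetical standard-population weights

asr_all <- direct_asr(cases, population, std)

# If only the two older groups were supplied, the same weights are
# re-normalised over the supplied groups only.
asr_subset <- direct_asr(cases[2:3], population[2:3], std[2:3])
```

Because the weights are normalised inside the function, supplying fewer age groups automatically re-weights the standard population over the groups that are present, which is one plausible reading of the note above.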
10.2 Lifetime risk
Cumulative risk is a measure used to estimate the risk of developing or dying of cancer up to a specific age. It takes into account the removal of persons from the population of interest who have already been diagnosed with or died from cancer. Commonly expressed as a ‘1 in n’ proportion, the cumulative risk is calculated as:
\[n = \frac{1}{1-\exp\left(-\frac{5\sum_j a_j}{100{,}000}\right)}\]
where \(a_j\) are the age-specific rates per 100,000 for the 5-year age groups from age 0 up to the chosen age, for example 85. CaRDO provides the cumulative risk up to age 85 as an approximation of lifetime risk. An ‘x in 100’ version is also supplied, calculated as \(x = 100/n\). These calculations assume that a person experiences the current age-specific rates up to the age specified (e.g. 85), and so do not account for individual risk factors (such as smoking).
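The calculation can be sketched in R as follows. This assumes the rates are converted from per-100,000 to per-person before exponentiating, and that "up to age 85" means the 17 five-year groups from 0–4 to 80–84. The function name `cumulative_risk` and the flat rate of 100 per 100,000 are hypothetical.

```r
# Cumulative risk from age-specific rates (illustrative sketch only).
# rates_per_100k: one rate per age group; width: years covered by each group.
cumulative_risk <- function(rates_per_100k, width = 5) {
  cum_rate <- width * sum(rates_per_100k) / 1e5  # cumulative rate per person
  risk <- 1 - exp(-cum_rate)                     # probability between 0 and 1
  list(risk = risk,
       one_in_n = 1 / risk,     # the "1 in n" form
       x_in_100 = 100 * risk)   # the "x in 100" form, equal to 100 / n
}

# Made-up example: 17 five-year age groups, each with 100 cases per 100,000
rates <- rep(100, 17)
res <- cumulative_risk(rates)
```

Here the cumulative rate is 5 × 1,700 / 100,000 = 0.085, so the risk is 1 − exp(−0.085), or roughly 1 in 12.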
10.3 Incidence & mortality trends
Incidence and mortality trends were calculated by fitting piece-wise functions, composed of splines and linear models, to the data.
Breakpoints were identified using the strucchange package (Zeileis et al. 2002), specifically the function breakpoints() (Zeileis et al. 2003), and the data were then segmented at these breakpoints. A maximum of 3 breakpoints was allowed, with a minimum of 5 observations in each segment. A spline and a linear model were then fitted to each segment using the mgcv package (Wood 2003), and the model with the lower AIC was chosen for each segment.
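The procedure above can be sketched in R as follows, using made-up yearly rates. The call to `breakpoints()` uses `h = 5` for the minimum segment size and `breaks = 3` for the maximum number of breakpoints. The simulated data, the helper `fit_segment`, the spline basis size `k`, and fitting the linear model with `gam()` are assumptions for illustration; CaRDO's actual model specification may differ.

```r
library(strucchange)  # breakpoints()
library(mgcv)         # gam(), s()

set.seed(1)
# Made-up yearly rates: rising until 1999, falling from 2000 onwards
d <- data.frame(year = 1985:2020)
d$rate <- ifelse(d$year < 2000,
                 50 + 2 * (d$year - 1985),
                 80 - (d$year - 2000)) + rnorm(nrow(d), sd = 1)

# At most 3 breakpoints, at least 5 observations per segment
bp  <- breakpoints(rate ~ year, data = d, h = 5, breaks = 3)
idx <- bp$breakpoints     # row index of the last observation in each segment
idx <- idx[!is.na(idx)]   # NA means no breakpoint was selected

# Label each row with its segment number (a new segment starts after each break)
d$segment <- cumsum(seq_len(nrow(d)) %in% (idx + 1)) + 1

# Fit a linear and a spline model to one segment and keep the lower-AIC fit
fit_segment <- function(seg) {
  k <- min(5, nrow(seg) - 1)   # keep the spline basis small for short segments
  candidates <- list(
    linear = gam(rate ~ year, data = seg),
    spline = gam(rate ~ s(year, k = k), data = seg)
  )
  candidates[[which.min(sapply(candidates, AIC))]]
}

fits <- lapply(split(d, d$segment), fit_segment)
```

Each element of `fits` is the selected model for one segment, so `predict(fits[[1]])` would return the fitted trend for the first segment.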
10.4 Sex-specific cancers
CaRDO does not inherently know which cancer types are sex-specific. Because accurate rate calculations require matching the denominator (population count) to the appropriate sex, CaRDO determines sex specificity by scanning the loaded dataset for the sex codes recorded against each cancer type across all years. If a cancer type has data for only one sex, it is classified as sex-specific. For example, because prostate cancer occurs only in males and is absent from female data, CaRDO calculates its rates using the male population as the denominator rather than the total population. Note that this method cannot distinguish between truly sex-specific cancers and cancers where data for one sex is simply absent because of low numbers. In the latter case, CaRDO will misclassify the cancer type as sex-specific and calculate the rate using the wrong denominator. If this unlikely scenario arises, we recommend, as a short-term fix, removing from your dataset any cancer types that are not sex-specific but have data for only one sex. For an alternative solution, please reach out to us at statistics@cancerqld.org.au.
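The detection rule described above amounts to checking how many distinct sex codes appear for each cancer type. A minimal sketch in R follows; the column names `cancer_type`, `year`, `sex` and `count` and the data values are hypothetical, and this is not CaRDO's own code.

```r
# Made-up incidence data; column names are hypothetical
incidence <- data.frame(
  cancer_type = c("Prostate", "Prostate", "Lung", "Lung", "Lung"),
  year        = c(2019, 2020, 2019, 2019, 2020),
  sex         = c("Male", "Male", "Male", "Female", "Female"),
  count       = c(120, 131, 80, 75, 78)
)

# Distinct sex codes observed for each cancer type across all years
sexes_by_type <- lapply(split(incidence$sex, incidence$cancer_type), unique)

# A cancer type is treated as sex-specific if only one sex code is present
sex_specific <- Filter(function(s) length(s) == 1, sexes_by_type)
```

In this toy data only prostate cancer is flagged, with "Male" as its denominator population. A non-sex-specific cancer whose rows for one sex were all missing would be flagged in the same way, which is the misclassification described above.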
10.5 Data Privacy
Data loaded into CaRDO are stored locally on your computer, and all analyses are performed locally. Your data will not leave your computer while you are using CaRDO, which has been designed with data privacy as a top priority. Loading data into CaRDO is no different to loading data into RStudio on your local computer. In fact, that is all that is happening: CaRDO loads data into the RStudio environment and carries out data processing locally on your machine using R. It is as safe to load sensitive data into CaRDO as into any other secure data management software. Once CaRDO has finished data processing, it creates a series of summary data files (stored locally) that are used to build the dashboard. We encourage you to inspect these data files if you have any concerns. Suppression of sensitive data (low counts) is discussed in the Data Requirements section.
Important: If you choose to publish your dashboard (e.g., share it online), data will be uploaded to the internet (made public) at the resolution that it appears in the CaRDO dashboard. It is your responsibility to ensure that all displayed data is appropriate for sharing before publishing publicly.
10.6 Packages
CaRDO was built using the following R packages:
shiny software package for R (Chang et al. 2025)
tidyverse (Wickham et al. 2019)
bslib (Sievert, Cheng, and Aden-Buie 2025)
shinywidgets (Perrier, Meyer, and Granjon 2025)
plotly (Sievert 2020)
markdown (Xie, Allaire, and Horner 2024)
strucchange (Zeileis et al. 2002)
segmented (Muggeo 2008)
10.7 Example Datasets
Below are the three example datasets used in the CaRDO example. These are fictional datasets intended to illustrate how cancer incidence, mortality, and population data should be structured for CaRDO.
Cancer Incidence Dataset: An example dataset containing cancer diagnoses by cancer type, year, sex, and 5-year age group.
Cancer Mortality Dataset: An example dataset containing cancer deaths by cancer type, year, sex, and 5-year age group.
Population Dataset: An example dataset containing population estimates by year, sex, and 5-year age group, which can be used for calculating cancer age-standardised rates.