Overview

The Human Mortality Database (HMD) contains original calculations of all-cause death rates and life tables for national populations (countries or areas), as well as the input data used in constructing those tables. The input data consist of death counts from vital statistics, plus census counts, birth counts, and population estimates from various sources.

The HMD now also contains cause-specific mortality indicators for many of the database countries. The Human Cause-of-Death Data (HCD@HMD) series is further described below. In addition, the HMD provides weekly death counts and death rates for most of the countries as part of the Short-Term Mortality Fluctuations (STMF@HMD) data series. The text below summarizes information about annual data series. For an overview of the STMF, please see our dedicated page.

Scope and basic principles

We continue to add new data series to this collection. However, the database is limited by design to populations where death registration and census data are virtually complete, since this type of information is required for the uniform method used to reconstruct historical data series. As a result, the countries and areas included here are relatively wealthy and for the most part highly industrialized.

In a companion project, we are also developing the Human Lifetable Database (HLD), which includes life tables constructed by other individuals or institutions using a variety of techniques. Thus, the HLD contains mortality estimates for some countries that could not be included in the HMD.

The main goal of the Human Mortality Database is to document the longevity revolution of the modern era and to facilitate research into its causes and consequences. As much as possible, we have followed four guiding principles in creating this database: comparability, flexibility, accessibility, and reproducibility.

We have tried to provide complete documentation of the data available through this site. Users may start by reading a brief summary of how individual data sets are constructed. A complete description of our methodology is provided in the Methods Protocol (available in PDF format). Documentation that is specific to an individual population (including data sources) is provided through links within each country section.

You are welcome to download and analyze any data provided here free of charge. However, before gaining full access to the database, you must become a registered user, which requires accepting our user agreement and answering just a few questions. After receiving this information, we will immediately send you a password and more information about how to use the database.

We are still actively developing this database. Although we have been very careful in assembling and manipulating the data presented here, it is possible that some errors remain, and we would appreciate your help in identifying any inaccuracies. If you have comments or questions, or trouble accessing the database, please write to hmd@mortality.org.

Computing all-cause death rates and life tables

Our process for computing mortality rates and life tables can be described in terms of six steps, corresponding to six data types that are available from the HMD. Here is an overview of the process:

Births. Annual counts of live births by sex are collected for each population over the longest possible time period. These counts are used mainly for making population estimates at younger ages.
Deaths. Death counts are collected at the finest level of detail available. If raw data are aggregated, uniform methods are used to estimate death counts by completed age (i.e., age-last-birthday at time of death), calendar year of death, and calendar year of birth.
Population size. Annual estimates of population size on January 1st are either obtained from another source or are derived from census data plus birth and death counts.
Exposure-to-risk. Estimates of the population exposed to the risk of death during some age-time interval are based on annual (January 1st) population estimates, with a small correction that reflects the timing of deaths within the interval.
Death rates. Death rates are always the ratio of the death count for a given age-time interval to an estimate of the exposure-to-risk in the same interval.
Life tables. To build a life table, probabilities of death are computed from death rates. These probabilities are used to construct life tables, which include life expectancies and other useful indicators of mortality and longevity.
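The death-rate and life-table steps can be illustrated with a minimal Python sketch. It relies on two simplifying assumptions of our own, not the HMD's: exposure is approximated by the average of two consecutive January 1st populations (leaving out the correction for the timing of deaths), and people who die in a one-year age interval are assumed to live half of it on average (a = 0.5). The function names are hypothetical, and the last age is treated as an open interval.

```python
def exposure_approx(pop_jan1: float, pop_jan1_next: float) -> float:
    # Simplifying assumption: average of two consecutive January 1st populations.
    # The HMD adds a small correction for the timing of deaths, omitted here.
    return (pop_jan1 + pop_jan1_next) / 2.0


def death_rate(deaths: float, exposure: float) -> float:
    # A death rate is the death count divided by the exposure-to-risk.
    return deaths / exposure


def life_table(m: list[float], a: float = 0.5, radix: float = 100_000.0) -> list[dict]:
    """Single-year period life table from death rates m[0], m[1], ..., m[n-1].

    `a` is the assumed average fraction of the interval lived by those who die in it;
    the last age is treated as open-ended (q = 1, L = l / m, which requires m > 0).
    """
    rows = []
    lx = radix
    last = len(m) - 1
    for x, mx in enumerate(m):
        if x < last:
            qx = mx / (1.0 + (1.0 - a) * mx)  # convert rate to probability of dying
            dx = lx * qx
            Lx = lx - (1.0 - a) * dx          # person-years lived in [x, x+1)
        else:
            qx = 1.0                          # everyone alive at the open age dies in it
            dx = lx
            Lx = lx / mx
        rows.append({"age": x, "m": mx, "q": qx, "l": lx, "d": dx, "L": Lx})
        lx -= dx
    T = 0.0
    for row in reversed(rows):                # T: person-years lived above age x
        T += row["L"]
        row["T"] = T
        row["e"] = T / row["l"]               # life expectancy at age x
    return rows


# Hypothetical inputs: 99 deaths; January 1st populations of 1,000 and 980.
rate = death_rate(99, exposure_approx(1000, 980))  # 99 / 990 = 0.1
table = life_table([0.01, 0.02, 0.5])
print(rate, round(table[0]["e"], 2))
```

With a = 0.5, each closed interval reproduces its input rate exactly (d / L = m), which is a quick internal check on the conversion from rates to probabilities.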

Constructing homogeneous cause-specific death rates

Cause-of-death time series are severely disrupted by periodic changes in disease classifications. Existing international databases do not fix this problem.

The WHO mortality database provides death counts classified according to the revision of the International Classification of Diseases (ICD) in use when the data were collected. Periodic changes in the ICD thus create discontinuities in cause-specific data series.

The Eurostat data series are more consistent over time because the data are only provided for a short list of broad cause-of-death categories, but these data are only available for relatively short periods of time (from 1994 at best) and they do not allow for the analysis of detailed causes of death.

To reconstruct cause-specific mortality series which are consistent over time, it is necessary to establish transition coefficients between items of two successive ICDs, in order to redistribute deaths classified according to the old classification into items of the new classification.
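Once transition coefficients are available, redistributing deaths amounts to a weighted reallocation. The Python sketch below assumes the coefficients are stored as shares, coefficients[old][new], that sum to 1 for each old item; the item codes are hypothetical placeholders, and in practice coefficients may also differ by age and sex.

```python
def redistribute(old_counts: dict[str, float],
                 coefficients: dict[str, dict[str, float]]) -> dict[str, float]:
    """Re-express deaths coded to old-classification items as new-classification items.

    coefficients[old][new] is the share of deaths coded to `old` that belong to `new`.
    """
    new_counts: dict[str, float] = {}
    for old_item, count in old_counts.items():
        shares = coefficients[old_item]
        total = sum(shares.values())
        if abs(total - 1.0) > 1e-9:  # shares must sum to 1 so no deaths are gained or lost
            raise ValueError(f"shares for {old_item!r} sum to {total}, not 1")
        for new_item, share in shares.items():
            new_counts[new_item] = new_counts.get(new_item, 0.0) + count * share
    return new_counts


# Hypothetical items: old "A" splits 80/20 between new "X" and "Y"; old "B" maps wholly to "Y".
print(redistribute({"A": 100.0, "B": 50.0},
                   {"A": {"X": 0.8, "Y": 0.2}, "B": {"Y": 1.0}}))
# {'X': 80.0, 'Y': 70.0}
```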

When bridge coding (classifying a sample of deaths simultaneously under both the old and the new classification) has been performed at a detailed level by the national office in charge of processing cause-of-death data, transition coefficients can be inferred directly from the results. However, only two countries in the database have done this and made the resulting bridge-coding data available, namely England and Wales and the U.S.A., and only for the most recent transition, i.e. from ICD-9 to ICD-10.
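With bridge-coded data, a natural estimate of each coefficient is the proportion of sampled deaths coded to an old item that were coded to each new item. The sketch below computes those proportions from a hypothetical cross-tabulation of sample counts; it illustrates the idea and is not the procedure used by the national offices or by the HMD.

```python
def coefficients_from_bridge_coding(
    sample: dict[tuple[str, str], int],
) -> dict[str, dict[str, float]]:
    """Estimate coefficients[old][new] from a double-coded sample of deaths.

    sample[(old, new)] counts sampled deaths coded to `old` under the old
    classification and to `new` under the new one.
    """
    totals: dict[str, int] = {}
    for (old, _new), n in sample.items():
        totals[old] = totals.get(old, 0) + n
    coefficients: dict[str, dict[str, float]] = {}
    for (old, new), n in sample.items():
        if totals[old] > 0:
            # Share of the old item's sampled deaths that the new classification puts in `new`.
            coefficients.setdefault(old, {})[new] = n / totals[old]
    return coefficients


# Hypothetical bridge-coded sample.
print(coefficients_from_bridge_coding({("A", "X"): 80, ("A", "Y"): 20, ("B", "Y"): 50}))
# {'A': {'X': 0.8, 'Y': 0.2}, 'B': {'Y': 1.0}}
```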

For the other transitions in these two countries, and for all transitions in the other countries, coherent time series are reconstructed in our database by producing ex-post double coding. The method developed at INED in the 1980s is used as a guideline, but the work is tailored to each country.

For each classification change, the method comprises three steps (Vallin and Meslé, 1988, 1998; Meslé and Vallin, 1996):

Setting up a correspondence table that lists, for each item of one classification, all the items of the other classification that are a priori equivalent in terms of medical content.
Building fundamental associations of items that identify the smallest possible sets of items with the same medical content in both classifications, and testing the consistency of the associations over time using a statistical test (Barbieri, Chung, and Boe, 2008; Camarda, Pechholdová, and Meslé, 2015).
Setting up ex-post double-coding according to the structure of fundamental associations, to finally obtain transition coefficients.
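The second step has a convenient graph interpretation: if every a priori equivalence in the correspondence table is treated as a link between an old item and a new item, the fundamental associations are the connected groups of items. The sketch below is our illustration under that assumption; it uses hypothetical item codes and does not implement the statistical consistency test cited above.

```python
def fundamental_associations(
    correspondence: dict[str, list[str]],
) -> list[tuple[set[str], set[str]]]:
    """Group old and new items into the smallest sets that map onto each other.

    correspondence[old] lists the new items judged a priori equivalent to `old`.
    Items linked directly or through a chain of links share one association.
    """
    # Undirected bipartite graph; a node is ("old", code) or ("new", code).
    graph: dict[tuple[str, str], set[tuple[str, str]]] = {}
    for old, targets in correspondence.items():
        o = ("old", old)
        graph.setdefault(o, set())
        for new in targets:
            n = ("new", new)
            graph[o].add(n)
            graph.setdefault(n, set()).add(o)

    seen: set[tuple[str, str]] = set()
    associations = []
    for start in graph:
        if start in seen:
            continue
        old_items: set[str] = set()
        new_items: set[str] = set()
        stack = [start]
        seen.add(start)
        while stack:  # depth-first search over one connected component
            node = stack.pop()
            side, code = node
            (old_items if side == "old" else new_items).add(code)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        associations.append((old_items, new_items))
    return associations


# Hypothetical table: old A -> new X; old B -> new X or Y; old C -> new Z.
# Two associations result: {A, B} <-> {X, Y} and {C} <-> {Z}.
print(fundamental_associations({"A": ["X"], "B": ["X", "Y"], "C": ["Z"]}))
```

Because B shares X with A, A and B cannot be bridged separately, so they form a single association together with every new item either of them touches.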

The results derived from the medical logic of the classification rules have to be checked statistically, to detect and solve any remaining breaks in the series. Such checks are carried out visually by age group and sex.

In addition, national statistical offices occasionally introduce coding changes independently of the official revisions of the classification issued by the World Health Organization, which maintains the ICD. To address this problem, the statistical continuity of the series over time is systematically verified, and any artificial disruption is dealt with appropriately.
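One simple numerical complement to the visual checks is to compare deaths for a given cause, sex and age group in each classification-change year with the previous year. The function below is a hypothetical heuristic with an arbitrary 20% tolerance, shown only to make the idea concrete; it is not the verification procedure used for the database, and ordinary year-to-year fluctuation can also trigger it.

```python
def flag_breaks(deaths_by_year: dict[int, float], change_years: set[int],
                tolerance: float = 0.2) -> list[int]:
    """Return change years whose deaths differ from the prior year by more than `tolerance`.

    `deaths_by_year` holds one cause, sex and age group; `tolerance` is a relative
    change (0.2 = 20%) chosen arbitrarily for illustration.
    """
    flagged = []
    for year in sorted(change_years):
        before = deaths_by_year.get(year - 1)
        after = deaths_by_year.get(year)
        if not before or after is None:
            continue  # no usable comparison for this year
        if abs(after / before - 1.0) > tolerance:
            flagged.append(year)
    return flagged


# Hypothetical series with a jump in an assumed change year, 1998.
print(flag_breaks({1997: 100.0, 1998: 130.0, 1999: 128.0}, {1998}))  # [1998]
```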

Finally, country- and time-specific methods are used to deal with ill-defined causes (Ledermann, 1955; Vallin and Meslé, 1988).
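To make the idea of redistribution concrete, the sketch below spreads ill-defined deaths over defined causes in proportion to their counts within one age-sex-year cell. This pro-rata rule is a deliberately simple stand-in: it is not the Ledermann (1955) approach or the country- and time-specific methods used in the database. The cause labels are hypothetical.

```python
def redistribute_ill_defined(defined: dict[str, float], ill_defined: float) -> dict[str, float]:
    """Spread ill-defined deaths over defined causes pro rata, within one age-sex-year cell."""
    total = sum(defined.values())
    if total <= 0:
        raise ValueError("no defined-cause deaths to redistribute over")
    # Each cause receives ill-defined deaths in proportion to its share of defined deaths.
    return {cause: n + ill_defined * n / total for cause, n in defined.items()}


# Hypothetical cell: 100 defined deaths and 10 ill-defined ones.
print(redistribute_ill_defined({"neoplasms": 60.0, "circulatory": 40.0}, 10.0))
# {'neoplasms': 66.0, 'circulatory': 44.0}
```

The total number of deaths in the cell is preserved (110 before and after), which is the minimal property any redistribution rule must satisfy.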

In contrast to other existing databases on causes of death, the main innovation of these data is that they provide time series with a constant classification of causes. The goal is to include reconstructed data series based on the most recent classification (currently the 10th revision of the ICD). Not all countries use ICD-10 at the most detailed (4-digit) level.

For comparability purposes, we provide mortality data classified according to long, intermediate and short lists of causes of death that are the same for all countries. In addition, data classified according to a more or less "full list" are also provided, depending on their availability in each country.

A description of the available data can be found in the Explanatory notes. For a description of data formats see Formats. The cause- and age-specific death rates, crude death rates, and age-standardized death rates are calculated using the population exposures from the all-cause HMD series. For the countries that are not included in the HMD (Moldova and Romania at the time of this writing), the standard HMD methods are implemented to produce comparable population exposures.
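The three kinds of rates named above can be sketched in a few lines of Python. Cause- and age-specific rates divide deaths by exposures; the crude rate divides total deaths by total exposure; and the age-standardized rate shown here uses direct standardization, weighting age-specific rates by a standard population. The standard population and all numbers are hypothetical, since the choice of standard is not specified here.

```python
def specific_rates(deaths: list[float], exposures: list[float]) -> list[float]:
    # Cause- and age-specific rate: deaths / exposure (taken from the all-cause series).
    if len(deaths) != len(exposures):
        raise ValueError("deaths and exposures must cover the same age groups")
    return [d / e for d, e in zip(deaths, exposures)]


def crude_rate(deaths: list[float], exposures: list[float]) -> float:
    # All ages combined: total deaths / total exposure.
    return sum(deaths) / sum(exposures)


def age_standardized_rate(rates: list[float], standard_pop: list[float]) -> float:
    # Direct standardization: rates weighted by a (hypothetical) standard age structure.
    if len(rates) != len(standard_pop):
        raise ValueError("rates and standard population must cover the same age groups")
    return sum(r * w for r, w in zip(rates, standard_pop)) / sum(standard_pop)


# Hypothetical data for one cause and three age groups.
cause_deaths = [5.0, 20.0, 80.0]
exposures = [10_000.0, 8_000.0, 4_000.0]
standard = [50_000.0, 30_000.0, 20_000.0]
rates = specific_rates(cause_deaths, exposures)
print(crude_rate(cause_deaths, exposures), age_standardized_rate(rates, standard))
```

Because the weights are fixed, the age-standardized rate can be compared across countries and years without differences in age structure driving the result, which the crude rate cannot guarantee.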

For several of the countries with data available in the HMD, reconstruction is either under way or not yet initiated. For these countries, the data series published in the database are only available for the period covered by the most recent ICD Revision (ICD-10) until reconstruction has been completed.

Note that population estimates in the HCD@HMD might be updated with some delay. Thus, a temporary discrepancy between population estimates in the core HMD and the HCD is possible. For several countries, the HCD uses adjusted historical birth counts (see the country-specific documentation for details) to calculate cause-specific infant mortality rates. Thus, the HCD birth counts may differ from official statistics and core HMD estimates.

Corrections to the data

The data presented here have been corrected for gross errors (e.g., a processing error whereby 3,800 becomes 38,000 in a published statistical table would be obvious in most cases, and it would be corrected). However, we have not attempted to correct the data for systematic age misstatement (misreporting of age) or coverage errors (over- or under-enumeration of people or events).

Some studies have assessed the completeness of census coverage or death registration in the various countries, but most of their results are anecdotal, and more comprehensive work is needed in this area. However, in developing the database thus far, we did not consider it feasible or desirable to attempt corrections of this sort, especially since it would be impossible to correct the data by a uniform method across all countries.

In the cause-specific data series, we have nonetheless redistributed deaths of ill-defined or unknown causes, as further explained in the additional documentation. Adjustments have also been made in the HCD to the infant mortality rates where needed (additional details are provided in the country-specific documentation files).

Age misreporting

Populations are included here if there is a well-founded belief that the coverage of their census and vital registration systems is high, and thus, that fruitful analyses by both specialists and non-specialists should be possible with these data. Nevertheless, there is evidence of both age heaping (overreporting ages ending in "0" or "5") and age exaggeration in these data, especially for historical periods.

In general, the degree of age heaping in these data varies by time period and population, but it rarely poses a serious obstacle to scientific analysis. In most cases, it is sufficient to analyze data in five-year age groups to avoid the false impressions created by this particular form of age misstatement.
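Aggregating single-year counts into conventional five-year age groups (0-4, 5-9, ...) takes only a few lines. A minimal sketch, assuming counts are keyed by completed age; the counts, including the heaped value at age 40, are hypothetical:

```python
def to_five_year_groups(counts_by_age: dict[int, float]) -> dict[int, float]:
    """Sum single-year counts into groups keyed by their lower bound (0, 5, 10, ...)."""
    groups: dict[int, float] = {}
    for age, count in counts_by_age.items():
        start = age - age % 5  # e.g. ages 40-44 -> 40
        groups[start] = groups.get(start, 0.0) + count
    return dict(sorted(groups.items()))


# Hypothetical counts with heaping on age 40.
single = {35: 96, 36: 94, 37: 97, 38: 90, 39: 88, 40: 150, 41: 80, 42: 85, 43: 84, 44: 82}
print(to_five_year_groups(single))  # {35: 465.0, 40: 481.0}
```

The spike at age 40 largely disappears once it is pooled with neighbouring ages. People whose true age is 38 or 39 but who report 40 still cross a group boundary, so grouping reduces rather than removes the effect.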

Age exaggeration, on the other hand, is a more insidious problem. Our approach is guided by the conventional wisdom that age reporting in death registration systems is typically more reliable than in census counts or official population estimates. For this reason, we derive population estimates at older ages from the death counts themselves, by implementing the extinct cohort method. The method eliminates some, but certainly not all, of the biases in old-age mortality estimates due to age exaggeration, hence the need to remain cautious when using HMD data for investigations into old-age mortality.
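In its simplest form, the extinct cohort method rests on the fact that, once every member of a cohort has died, the number alive at exact age x equals the cohort's deaths at age x and above. The sketch below applies this to deaths tabulated by one-year age interval for a single, fully extinct cohort; the HMD's own implementation, described in the Methods Protocol, operates on finer data. The function name and the numbers are hypothetical.

```python
def extinct_cohort_population(deaths_by_age: list[float]) -> list[float]:
    """Number alive at each exact age x for a cohort with no survivors left.

    deaths_by_age[x] is the cohort's deaths between exact ages x and x+1
    (ages counted from the first age in the list).
    """
    alive = [0.0] * len(deaths_by_age)
    remaining = 0.0
    for x in range(len(deaths_by_age) - 1, -1, -1):  # cumulate from the oldest age down
        remaining += deaths_by_age[x]
        alive[x] = remaining
    return alive


# Hypothetical deaths at ages 100, 101, 102 and 103 of an extinct cohort.
print(extinct_cohort_population([1.0, 2.0, 3.0, 4.0]))  # [10.0, 9.0, 7.0, 4.0]
```

Because the population estimate is built only from reported ages at death, it inherits the (typically better) age reporting of death registration rather than that of censuses.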

Uniform set of procedures

A key goal of this project is to follow a uniform set of procedures for each population and time period. This approach does not guarantee the cross-national comparability of the data. Rather, it ensures only that we have not introduced biases by our own manipulations. Our desire for uniformity had to face the challenge that raw data come in a variety of formats (for example, 1-year versus 5-year age groups).

Our general approach to this problem is that the available raw data are used first to estimate two quantities: 1) the number of deaths by age-at-last-birthday, year of birth, and year of death; and 2) annual population estimates by single year of age. For each population, these calculations are performed separately by sex. From these two pieces of information, we compute death rates and life tables in a variety of age-time configurations. These are used in turn to construct cause-specific death rates.
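The first quantity is what links period and cohort views: a death at completed age x in calendar year t belongs to the cohort born in t - x (lower Lexis triangle) or t - x - 1 (upper Lexis triangle). The sketch below, with hypothetical counts and a hypothetical function name, aggregates such triangle data to period (age, year) and cohort (age, birth year) death counts.

```python
from collections import defaultdict


def period_and_cohort_deaths(
    deaths: dict[tuple[int, int, int], float],
) -> tuple[dict[tuple[int, int], float], dict[tuple[int, int], float]]:
    """Aggregate Lexis-triangle deaths to period and cohort tables.

    deaths[(age, year, birth_year)] counts deaths at completed age `age` in
    calendar `year`; birth_year is year - age (lower triangle) or
    year - age - 1 (upper triangle).
    """
    period: dict[tuple[int, int], float] = defaultdict(float)  # (age, year)
    cohort: dict[tuple[int, int], float] = defaultdict(float)  # (age, birth_year)
    for (age, year, birth_year), n in deaths.items():
        if birth_year not in (year - age, year - age - 1):
            raise ValueError(f"inconsistent triangle: {(age, year, birth_year)}")
        period[(age, year)] += n
        cohort[(age, birth_year)] += n
    return dict(period), dict(cohort)


# Hypothetical counts at completed age 80.
triangles = {(80, 2000, 1920): 30.0, (80, 2000, 1919): 25.0, (80, 2001, 1920): 20.0}
period_deaths, cohort_deaths = period_and_cohort_deaths(triangles)
print(period_deaths)  # {(80, 2000): 55.0, (80, 2001): 20.0}
print(cohort_deaths)  # {(80, 1920): 50.0, (80, 1919): 25.0}
```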

It is reasonable to ask whether a single procedure is the best method for treating the data from a variety of populations. Here, two points must be considered. First, our uniform methodology is based on procedures that were developed separately, though following similar principles, for various countries and by different researchers.

Earlier methods were synthesized by choosing what we considered the best among alternative procedures after careful analysis and by eliminating superficial inconsistencies. The second point is that a uniform procedure is possible only because we have not attempted to correct the data for reporting and coverage errors. Although some general principles could be followed, such problems would have to be addressed individually for each population.

Although we adhere strictly to a uniform procedure, the data for each population also receive significant individualized attention. For the all-cause data series, each country or area is assigned to an HMD staff member, who takes responsibility for assembling and checking the data for errors as well as for communicating with in-country experts (statisticians at the National Statistics Office or academics) when needed. One of the responsibilities of these country specialists is to check our data against other available sources.

For the cause-specific data series, the country specialists are not part of the HMD project team per se but in-country statisticians or academics with expertise in the country's cause-of-death data. The work of the country specialists is verified through a number of standard procedures, by the HMD cause-of-death project coordinator for the cause-specific series, and, ultimately, by the two HMD Co-Directors. These procedures guarantee a high level of data quality and consistency, but assistance from database users in identifying problems is always appreciated!

References:

Barbieri, M., Chung, R., & Boe, C. (2008). Automating the redistribution of deaths by cause over ICD changes. Second Human Mortality Database Symposium, Max Planck Institute for Demographic Research, Rostock, Germany, 13-14 June 2008.
Camarda, C.G., Pechholdová, M., & Meslé, F. (2015). Cause-specific senescence: classifying causes of death according to the rate of aging. 80th Annual Meeting of the Population Association of America. San Diego (USA), May 2015. http://paa2015.princeton.edu/uploads/153074
Ledermann, S. (1955). La répartition des décès de cause indéterminée. Revue de l’Institut international de statistique, 23 (1–3), 47–55.
Meslé, F., & Vallin, J. (1996). Reconstructing long-term series of causes of death. Historical Methods, 29 (2), 72–87.
Vallin, J., & Meslé, F. (1988). Les causes de décès en France de 1925 à 1978 (Travaux et Documents, No.115, 608 p.). Paris: INED/PUF.
Vallin, J., & Meslé, F. (1998). Comment suivre l’évolution de la mortalité par cause malgré les discontinuités de la statistique. Le cas de la France de 1925 à 1993. In G. Pavillon (Ed.), Enjeux des classifications internationales en santé (Questions en santé publique, pp. 113–156, 220 p.). Paris: Éditions INSERM.