Project Overview
This project performs a descriptive statistical analysis of global demographic data. The goal is to compare
life expectancy at birth and under-age-5 mortality rates across countries, sexes, regions, and subregions,
and to study how these indicators changed between 2002 and 2022.
The project was completed for the Introductory Case Studies course at TU Dortmund. It focuses on exploratory
and descriptive methods rather than predictive modelling, so it serves as a foundation project for statistical
data analysis and visualization.
Dataset
The dataset is an extract from the International Data Base of the U.S. Census Bureau. It includes demographic
indicators for 227 countries for the years 2002 and 2022. The countries are grouped into five regions and
21 subregions.
The two main demographic indicators are life expectancy at birth and under-age-5 mortality rate. Each indicator
is reported for both sexes combined and separately for males and females.
| Dataset Component | Description | Use in Analysis |
|---|---|---|
| Country.Area.Name | Country or area name | Country-level comparison |
| Region | Five global regions | Regional averages and comparisons |
| Subregion | 21 geographical subregions | Homogeneity and heterogeneity analysis |
| Year | 2002 and 2022 | 20-year change analysis |
| Life Expectancy | At birth, by both sexes, males, and females | Health and longevity indicator |
| Under-5 Mortality | Mortality rate by both sexes, males, and females | Child health and development indicator |
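As a rough illustration of how such an extract might be loaded and restricted to one year, here is a minimal pandas sketch. The columns `Country.Area.Name`, `Region`, `Subregion`, and `Year` follow the table above. The indicator column names `LifeExp.Both` and `U5Mort.Both`, the country names, and all values are hypothetical placeholders, not taken from the dataset.

```python
import io

import pandas as pd

# Tiny stand-in for the CSV extract. The first four column names come from the
# dataset table; "LifeExp.Both", "U5Mort.Both", and all values are hypothetical.
csv_text = """Country.Area.Name,Region,Subregion,Year,LifeExp.Both,U5Mort.Both
Country A,Europe,Western Europe,2002,78.0,6.0
Country A,Europe,Western Europe,2022,81.5,4.0
Country B,Africa,Western Africa,2002,50.0,150.0
Country B,Africa,Western Africa,2022,60.0,80.0
"""

# With a real file, the StringIO object would be replaced by the file path.
df = pd.read_csv(io.StringIO(csv_text))

# Single-year view used by the 2022-only tasks.
df_2022 = df[df["Year"] == 2022].copy()

# Sanity checks on the grouping variables.
n_regions = df_2022["Region"].nunique()
n_subregions = df_2022["Subregion"].nunique()
print(df_2022)
print(f"{n_regions} regions, {n_subregions} subregions in the 2022 subset")
```

On the full extract, the same two counts would be expected to match the five regions and 21 subregions described above.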
Research Tasks
The project was organized around the descriptive-analysis tasks listed below. All tasks except the last used only
the 2022 data; the last task compared 2002 and 2022 to study changes over a 20-year period.
- Describe frequency distributions of life expectancy and under-5 mortality variables.
- Compare differences between sexes and regions.
- Analyze homogeneity within subregions and heterogeneity between subregions.
- Investigate bivariate correlations between demographic indicators.
- Compare changes in life expectancy and under-5 mortality from 2002 to 2022.
Statistical Methods
The project used standard descriptive statistics and graphical methods. These methods were selected because the
course objective was to practice exploratory and descriptive analysis with interpretable statistical measures.
| Method | Purpose | Interpretation |
|---|---|---|
| Mean and Median | Measure central tendency | Summarize typical life expectancy or mortality values |
| Variance and Standard Deviation | Measure spread | Show how much countries differ within groups |
| Quantiles | Summarize distribution shape | Compare lower, middle, and upper parts of the distribution |
| Histograms | Display frequency distributions | Show skewness and concentration of values |
| Boxplots | Compare groups visually | Reveal median, spread, and outliers |
| Correlation Matrix | Analyze bivariate relationships | Measure linear associations between demographic variables |
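The numerical measures in this table can be computed per group with a pandas `groupby`. The sketch below is one possible way to do it, not the project's original code. The column name `LifeExp.Both` and the values are hypothetical.

```python
import pandas as pd

# Illustrative values only; the column name "LifeExp.Both" is hypothetical.
df_2022 = pd.DataFrame({
    "Region": ["Europe", "Europe", "Europe", "Africa", "Africa", "Africa"],
    "LifeExp.Both": [80.0, 82.0, 84.0, 58.0, 62.0, 66.0],
})

grouped = df_2022.groupby("Region")["LifeExp.Both"]

# Central tendency and spread. pandas uses the sample variance (ddof=1).
summary = grouped.agg(["mean", "median", "var", "std"])

# Quartiles per region, reshaped so each quantile level is a column.
quartiles = grouped.quantile([0.25, 0.5, 0.75]).unstack()

print(summary)
print(quartiles)
```

Note that `var` and `std` divide by n - 1 by default, which matters for small groups such as subregions with few countries.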
Frequency Distribution Analysis
The 2022 distribution showed clear global variation in life expectancy. The average life expectancy for both sexes
was around 74.8 years. Monaco had the highest life expectancy, while Afghanistan had the lowest value in the dataset.
- 74.8 years: average life expectancy for both sexes across regions in 2022.
- 89.52 years: highest life expectancy, observed for Monaco.
- 53.65 years: lowest life expectancy, observed for Afghanistan.
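A frequency distribution of this kind amounts to counting values per bin. The following pure-Python sketch shows the idea with a text histogram. The values and the 10-year bin width are illustrative assumptions, not figures from the dataset or the original notebook.

```python
from collections import Counter

# Illustrative life-expectancy values in years (not taken from the dataset).
life_expectancy = [53.7, 61.2, 68.4, 72.9, 74.1, 75.6, 77.3, 79.8, 82.5, 89.5]

BIN_WIDTH = 10  # hypothetical bin width in years


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the half-open bin [start, start + width)."""
    return int(value // width) * width


# Number of observations falling into each bin.
counts = Counter(bin_start(v) for v in life_expectancy)

for start in sorted(counts):
    bar = "#" * counts[start]
    print(f"[{start}, {start + BIN_WIDTH}): {bar} ({counts[start]})")
```

A long tail toward the low bins, as in this toy data, is the kind of left skew a histogram makes visible at a glance.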
Sex-Based Differences
Female life expectancy was generally higher than male life expectancy. Monaco had the highest female life expectancy,
while Afghanistan had the lowest values for both males and females. The project used bar plots to compare male and
female life expectancy across regions.
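One way to quantify the sex difference per region is to average each sex-specific column by region and subtract. This is a sketch under assumptions: the column names `LifeExp.Male` and `LifeExp.Female` and all values are hypothetical.

```python
import pandas as pd

# Illustrative values only; column names are hypothetical.
df_2022 = pd.DataFrame({
    "Region": ["Europe", "Europe", "Africa", "Africa"],
    "LifeExp.Male": [78.0, 76.0, 60.0, 62.0],
    "LifeExp.Female": [83.0, 82.0, 64.0, 65.0],
})

# Regional mean life expectancy for each sex.
regional_means = df_2022.groupby("Region")[["LifeExp.Male", "LifeExp.Female"]].mean()

# Positive values mean females live longer on average in that region.
regional_means["Gap.Female.Minus.Male"] = (
    regional_means["LifeExp.Female"] - regional_means["LifeExp.Male"]
)

print(regional_means)
```

Calling `regional_means[["LifeExp.Male", "LifeExp.Female"]].plot(kind="bar")` on such a table would produce a grouped bar plot of the two sexes per region, provided matplotlib is installed.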
Regional Comparison
The comparison showed strong differences between the five regions. Europe had the highest average life expectancy,
while Africa had the lowest. The Americas were generally second-highest after Europe, while Asia and Oceania showed
similar average levels.
| Region | Life Expectancy Pattern | Under-5 Mortality Pattern |
|---|---|---|
| Europe | Highest average life expectancy | Lowest under-5 mortality |
| Americas | High average life expectancy | Low to moderate under-5 mortality |
| Asia | Middle range, close to Oceania | Second-highest under-5 mortality after Africa |
| Oceania | Middle range, similar to Asia | Moderate under-5 mortality |
| Africa | Lowest average life expectancy | Highest under-5 mortality |
Under-5 Mortality Analysis
Under-age-5 mortality was much more unequal across regions than life expectancy. Africa showed the highest mortality
rate, followed by Asia. Europe and the Americas had the lowest mortality rates. The analysis also showed that male
under-5 mortality was higher than female under-5 mortality on average.
- 26.67: average under-5 mortality rate for both sexes in 2022.
- 29.23: average under-5 mortality rate for males in 2022.
- 24.01: average under-5 mortality rate for females in 2022.
Boxplots were used to compare mortality rates between males, females, and both sexes. The distributions showed
many high-value outliers, especially for countries with much higher child mortality rates.
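The points a boxplot draws as outliers can also be found numerically. The sketch below assumes the common 1.5 × IQR rule, which is the default whisker setting in matplotlib's `boxplot`. The mortality values are illustrative, not taken from the dataset.

```python
from statistics import quantiles

# Illustrative under-5 mortality values (not taken from the dataset).
mortality = [3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 12.0, 15.0, 20.0, 95.0]

# method="inclusive" interpolates linearly between data points, which
# matches the default quartile method of NumPy and pandas.
q1, median, q3 = quantiles(mortality, n=4, method="inclusive")
iqr = q3 - q1

# Fences of the 1.5 * IQR rule; values beyond them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in mortality if v < lower_fence or v > upper_fence]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```

Because mortality rates cannot be negative and the distribution is right-skewed, flagged points under this rule tend to appear only on the high side.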
Subregion Homogeneity and Heterogeneity
One part of the project focused on Africa and compared its subregions. Boxplots were used to assess whether values
were homogeneous within subregions and heterogeneous between different subregions.
The analysis found visible heterogeneity among African subregions. Northern Africa showed relatively high variation
in life expectancy, while Southern Africa appeared less heterogeneous. For under-5 mortality, the variation was also
large, with several outliers and different median levels across African subregions.
- Box size and median position were used to assess homogeneity.
- Subregions with similar box sizes and median lines were interpreted as more homogeneous.
- Large spreads and outliers indicated heterogeneity.
- Africa showed strong subregional differences in both life expectancy and under-5 mortality.
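The visual criteria above (box size and median position) have direct numerical counterparts: the interquartile range and the median per subregion. This is a minimal sketch with hypothetical subregion labels, a hypothetical column name `LifeExp.Both`, and made-up values.

```python
import pandas as pd

# Illustrative values only; subregion labels and column name are hypothetical.
africa_2022 = pd.DataFrame({
    "Subregion": ["Subregion X"] * 4 + ["Subregion Y"] * 4,
    "LifeExp.Both": [60.0, 61.0, 62.0, 63.0, 55.0, 62.0, 70.0, 77.0],
})


def iqr(values):
    """Interquartile range, i.e. the height of the box in a boxplot."""
    return values.quantile(0.75) - values.quantile(0.25)


# Median (line inside the box) and IQR (box size) for each subregion.
spread = africa_2022.groupby("Subregion")["LifeExp.Both"].agg(median="median", iqr=iqr)

print(spread)
```

In this toy data, Subregion X has a small IQR and would read as homogeneous, while Subregion Y has a much larger IQR and a different median.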
Bivariate Correlation Analysis
The project used correlation analysis to study linear relationships between life expectancy variables and under-5
mortality variables. A correlation matrix and heatmap were created for the 2022 data.
As expected, male, female, and both-sex life expectancy variables were strongly positively correlated with each other.
Under-5 mortality variables were also strongly positively correlated with each other. Life expectancy and under-5
mortality showed a negative relationship, meaning that countries with higher life expectancy tended to have lower
under-5 mortality.
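A correlation matrix of this kind can be obtained with `DataFrame.corr`. The sketch below assumes the Pearson coefficient, which measures linear association. The column names and values are hypothetical.

```python
import pandas as pd

# Illustrative values only; column names are hypothetical.
df_2022 = pd.DataFrame({
    "LifeExp.Male": [60.0, 65.0, 70.0, 75.0, 80.0],
    "LifeExp.Female": [64.0, 70.0, 74.0, 80.0, 85.0],
    "U5Mort.Both": [90.0, 60.0, 35.0, 15.0, 5.0],
})

# Pairwise Pearson correlations between all numeric columns.
corr = df_2022.corr(method="pearson")

print(corr.round(2))
```

Passing such a matrix to `seaborn.heatmap(corr, annot=True)` would draw it as an annotated heatmap, assuming seaborn is installed.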
Change from 2002 to 2022
The final part compared demographic indicators from 2002 and 2022. The project found a general improvement in global
demographic outcomes over the 20-year period: life expectancy increased in many countries and under-5 mortality decreased.
- Most countries and subregions showed an upward trend in life expectancy.
- Under-5 mortality generally declined from 2002 to 2022.
- Regional inequality remained visible, especially between Africa and Europe.
- The results support the broader pattern of global health improvement with persistent regional disparities.
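A per-country change between the two years can be computed by reshaping the data so that each year becomes a column. This is a sketch under assumptions, with hypothetical indicator column names and made-up values.

```python
import pandas as pd

# Illustrative values only; indicator column names are hypothetical.
df = pd.DataFrame({
    "Country.Area.Name": ["Country A", "Country A", "Country B", "Country B"],
    "Year": [2002, 2022, 2002, 2022],
    "LifeExp.Both": [78.0, 81.5, 50.0, 60.0],
    "U5Mort.Both": [6.0, 4.0, 150.0, 80.0],
})


def change_by_country(frame, column):
    """Return the 2022 value minus the 2002 value for each country."""
    wide = frame.pivot(index="Country.Area.Name", columns="Year", values=column)
    return wide[2022] - wide[2002]


life_change = change_by_country(df, "LifeExp.Both")
mortality_change = change_by_country(df, "U5Mort.Both")

print(life_change)
print(mortality_change)
```

A positive life-expectancy change together with a negative mortality change corresponds to the improvement pattern described above.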
Python Implementation
The analysis was implemented in Python using pandas for data handling and matplotlib/seaborn for visualization.
The notebook/code included data loading, filtering by year, grouping by region and subregion, summary statistics,
boxplots, histograms, and correlation heatmaps.
Implemented Steps
- Loaded the CSV dataset with pandas.
- Filtered the dataset for 2022 and compared it with 2002.
- Computed means, medians, standard deviations, quantiles, and descriptive summaries.
- Created histograms for life expectancy distributions.
- Created bar charts for regional comparisons.
- Created boxplots for mortality and subregion heterogeneity.
- Computed and visualized correlation matrices.
Limitations
This project is an introductory descriptive analysis. It does not use inferential models or causal methods, so the
results should be interpreted as descriptive patterns rather than causal explanations. Some parts of the original code
contained exploratory trial-and-error cells and minor variable-name mistakes, but the final analysis still demonstrates
the core descriptive workflow.
- The analysis is descriptive and not causal.
- Only two years, 2002 and 2022, were compared.
- Country-level demographic indicators may hide within-country inequality.
- Some results depend on regional grouping definitions.
- The project is useful as a foundation-level exploratory data analysis project.
Outcome
This project strengthened my foundation in exploratory data analysis, descriptive statistics, data visualization,
regional comparison, correlation analysis, and Python-based data workflows. It also helped connect statistical methods
with real-world demographic and public-health indicators.
It is a useful portfolio project for showing solid basics in pandas, matplotlib, statistical summaries, and visual
interpretation of global demographic data.
Tags: Descriptive Statistics, Demography, Life Expectancy, Under-5 Mortality, Python, pandas, matplotlib, Correlation Analysis, Data Visualization