ColOpenData can be used to access open demographic data from Colombia, retrieved from the National Administrative Department of Statistics (DANE). The demographic module gives access to data from the 2018 National Population and Dwelling Census (CNPV) and to DANE's Population Projections.
The available CNPV information is divided into four categories: households, persons (demographic), persons (social) and dwellings. The population projections cover 1950 to 2070 at the national level, 1985 to 2050 at the departmental level and 1985 to 2035 at the municipal level. All data documentation can be accessed as explained in Documentation and Dictionaries.
In this vignette you will learn:
Since the goal of this vignette is to show examples of how to use the data, we will load some additional libraries. They are not required to use the data in general.
To access the documentation of a module, we use the function list_datasets() and pass as a parameter the module we are interested in. It is worth reviewing this documentation carefully to get a clear picture of what is available before starting to work with the data. We start by loading the necessary libraries.
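A minimal sketch of that setup, assuming only the packages used in this section are needed: ColOpenData for the data itself, dplyr for filter(), select(), left_join(), mutate() and the %>% pipe, and ggplot2 for the final plot.

```r
# ColOpenData provides list_datasets() and download_demographic()
library(ColOpenData)
# dplyr provides the data manipulation verbs and the %>% pipe
library(dplyr)
# ggplot2 is used for the bar chart at the end
library(ggplot2)
```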
Disclaimer: all data is loaded into the environment of the user's R session, but it is not downloaded to the user's computer.
First, we access the demographic documentation to check the available datasets.
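A sketch of that call, assuming the module is identified by the string "demographic" (taken from the module's name in the text) and that list_datasets() returns a data frame. The variable name datasets_dem is only illustrative.

```r
library(ColOpenData)

# List the datasets available in the demographic module, with their
# documentation. The module string "demographic" is an assumption.
datasets_dem <- list_datasets("demographic")

# Show the first rows to see the available dataset names
head(datasets_dem)
```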
After checking the documentation, we can load the data we want to work with. To do this, we use the download_demographic() function, which takes as a parameter the dataset name shown in the documentation. For this first example, we will focus on a CNPV dataset.
As can be seen above, public_services_d presents information on the availability of public services in the country at the department level. With this data we could, for example, find the proportion of dwellings with access to a water supply system (WSS) in each department and plot it.
First, we subset the data to keep only the information on the WSS by department.
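# Keep the department-level totals ("total_departamental") for the
# water supply system ("acueducto")
wss <- public_services_d %>%
  filter(
    area == "total_departamental",
    servicio_publico == "acueducto"
  ) %>%
  select(departamento, disponible, total)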
With the subset, we can calculate the total counts by department.
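The code for this step is not shown, although the next step joins on a table called total_counts with a column total_all. A minimal sketch, assuming that summing total over every disponible category gives each department's total number of dwellings. The helper name count_total_dwellings() is hypothetical.

```r
library(dplyr)

# Hypothetical helper: sums the `total` counts over all `disponible`
# categories, returning one row per department with its total dwellings
count_total_dwellings <- function(wss) {
  wss %>%
    group_by(departamento) %>%
    summarise(total_all = sum(total), .groups = "drop")
}
```

Applied to the subset above, total_counts <- count_total_dwellings(wss) produces the table used in the join of the next step.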
Then, we calculate the proportion of dwellings reporting "yes" ("si") in each department.
# Keep the "si" rows and divide them by each department's total dwellings
proportions_wss <- wss %>%
  filter(disponible == "si") %>%
  left_join(total_counts, by = "departamento") %>%
  mutate(proportion_si = total / total_all)
For plotting purposes, we will change the name of “San Andrés”, since the complete name is too long.
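The renaming code is not shown. A minimal sketch, assuming departamento is a character column and that the department's full name contains "san andr" (matched case-insensitively so that accented and unaccented spellings both work). The exact spelling in the DANE data is an assumption, and the helper name shorten_san_andres() is hypothetical.

```r
library(dplyr)

# Hypothetical helper: replaces any department name containing
# "san andr" (case-insensitive) with the short label "San Andrés"
shorten_san_andres <- function(df) {
  df %>%
    mutate(departamento = if_else(
      grepl("san andr", departamento, ignore.case = TRUE),
      "San Andrés",
      departamento
    ))
}
```

Before plotting, proportions_wss <- shorten_san_andres(proportions_wss) would apply the change.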
Finally, we can plot the results.
# Bar chart of the proportion per department, sorted from highest to lowest
ggplot(proportions_wss, aes(
  x = reorder(departamento, -proportion_si),
  y = proportion_si
)) +
  geom_bar(stat = "identity", fill = "#10bed2", color = "black", width = 0.6) +
  labs(
    title = "Proportion of dwellings with access to WSS by department",
    x = "Department",
    y = "Proportion"
  ) +
  theme_minimal() +
  theme(
    plot.background = element_rect(fill = "white", colour = "white"),
    panel.background = element_rect(fill = "white", colour = "white"),
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(hjust = 0.5)
  )