Introduction

Every year, the United Nations Development Programme (UNDP) publishes its Human Development Index. The index measures human development by country with respect to economic output, education level, and health outcomes. The goal of this project is to reimagine such an index so that it accounts for political stability, economic stability, and environmental sustainability. The index will be calculated and analyzed for the nations of Latin America.

ggmap(map)

Materials and methods

This section describes the data sources and coding methods. Before proceeding to the code and data, however, it is important to discuss how this index is defined.

Because the index is a model, it is important to understand the key assumptions it relies on, namely how well its inputs represent what they claim to represent.

We can consider these case by case:

Political Stability - This will be represented by the level of public (or government) debt in each country. The thinking here is that as public debt levels increase, a nation’s government becomes increasingly unstable.
Economic Stability - One of the most common macroeconomic indicators of economic stability is the inflation rate. The Consumer Price Index of each country, a measure tracking the prices of a basket of goods and services purchased by consumers, will be included. An unchecked inflation rate is a danger to any economy.
Environmental Sustainability - There are a plethora of measures that may be used to track a nation’s environmental sustainability. Although the relevance of some of these measures may change over time, levels of carbon dioxide emissions are a persistent concern. Therefore CO2 emissions by country will be included.

Thus the general workflow proceeds as follows:

Gathering Requisite Data
Constructing the Index
Examining Trends

Here’s a quick look at the packages used for this project:

library(fredr)
library(wbstats)
library(dplyr)
library(ggplot2)
library(foreach)
library(spData)
library(sf)
library(plm)
library(forecast)
library(rworldmap)
library(ggmap)
library(knitr)

Download and clean data for Political Stability

Arg = fredr_series_search_text("Argentina debt GDP")
head(Arg$title)

## [1] "General government gross debt for Argentina"                                     
## [2] "General Government Gross Debt for Argentina"                                     
## [3] "Total Credit to Private Non-Financial Sector, Adjusted for Breaks, for Argentina"
## [4] "Total Credit to Private Non-Financial Sector, Adjusted for Breaks, for Argentina"
## [5] "Outstanding Total International Debt Securities to GDP for Argentina"            
## [6] "Total Credit to Households and NPISHs, Adjusted for Breaks, for Argentina"

Arg$id[5]

## [1] "DDDM07ARA156NWDB"

Arg.debt = fredr("DDDM07ARA156NWDB")
dates = c(1999:2014)
# Year labels for the series; assumes one annual observation per year, 1980-2017
dates.1 = c(1980:2017)
debtnames = c("date", "debt")
# Attach the years, keep 2000-2014, and sort newest first
Arg.debt.1 = cbind(dates.1, Arg.debt) %>%
  filter(dates.1 >= 2000) %>%
  filter(dates.1 <= 2014) %>%
  dplyr::select(dates.1, value) %>%
  arrange(desc(dates.1))
colnames(Arg.debt.1) <- c("date", "Arg.debt")
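
The cleaning step above labels each observation with a hand-typed vector of years, which only works if the series has exactly one row per year over that range. As a hedged alternative, the sketch below derives the year from the observation date itself. It assumes the downloaded table has a Date column named date and a numeric column named value; the data frame debt_raw and its values are made-up stand-ins, not FRED data.

```r
# Hypothetical stand-in for a downloaded series: annual observations
# with a Date column. Values are invented for illustration only.
debt_raw <- data.frame(
  date  = as.Date(paste0(1998:2016, "-01-01")),
  value = seq(10, 28, by = 1)
)

# Derive the year from the date rather than binding a separate year vector,
# so the code does not depend on the series starting in a particular year.
debt_raw$year <- as.integer(format(debt_raw$date, "%Y"))

# Keep 2000-2014 and sort newest first, mirroring the cleaning step above.
keep <- debt_raw$year >= 2000 & debt_raw$year <= 2014
debt_clean <- debt_raw[keep, c("year", "value")]
debt_clean <- debt_clean[order(debt_clean$year, decreasing = TRUE), ]
colnames(debt_clean) <- c("date", "Arg.debt")
rownames(debt_clean) <- NULL

head(debt_clean)
```

The same pattern would apply to the CPI series, which also relies on a typed-in range of years.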

Download and clean data for Environmental Sustainability

wbsearch(pattern = " CO2 emission")

##         indicatorID
## 5521 EN.CO2.ETOT.ZS
##                                                                                   indicator
## 5521 CO2 emissions from electricity and heat production, total (% of total fuel combustion)

CO2 = wb(indicator = "EN.ATM.CO2E.KT")

Arg.CO2 = CO2 %>%
  filter(country == "Argentina") %>%
  filter(date >= 2000) %>%
  dplyr::select(date, value) %>%
  arrange((desc(date)))
colnames(Arg.CO2) <- c("date1", "Arg.CO2")

Download and clean data for Economic Stability

Arg.CPI = fredr_series_search_text("Argentina CPI")
head(Arg.CPI$title)

## [1] "Inflation, consumer prices for Argentina"                   
## [2] "Consumer Price Index for Argentina"                         
## [3] "Consumer Price Index for Argentina"                         
## [4] "Consumer Price Index: All items: Total: Total for Argentina"
## [5] "Consumer Price Index: All items: Total: Total for Argentina"
## [6] "Consumer Price Index: All items: Total: Total for Argentina"

Arg.CPI$id[2]

## [1] "DDOE01ARA086NWDB"

Arg.CPI = fredr("DDOE01ARA086NWDB")
CPIdates.1 = c(1960:2014)
Arg.CPI.1 = cbind(CPIdates.1, Arg.CPI) %>%
  filter(CPIdates.1 >= 2000) %>%
  filter(CPIdates.1 <= 2014) %>%
  dplyr::select(CPIdates.1, value) %>%
  arrange((desc(CPIdates.1)))
colnames(Arg.CPI.1) <- c("date2", "Arg.CPI")

With all of the necessary data collected, it is now possible to construct the Development Index for each country:

Arg = cbind(Arg.debt.1, Arg.CO2) 
Arg = cbind(Arg, Arg.CPI.1) %>%
  dplyr::select(date, Arg.debt, Arg.CO2, Arg.CPI)
Arg.mean= ((Arg$Arg.debt*Arg$Arg.CO2*Arg$Arg.CPI)^(1/3))
Arg = cbind(Arg, Arg.mean)

Note the object Arg.mean. This variable is the cube root of the product of the three inputs: debt level, CO2 emissions, and the CPI are multiplied together, and because there are three inputs, the cube root is taken. This is called a geometric mean, as opposed to the familiar arithmetic mean. The geometric mean is used because it is the same measure the UNDP uses to calculate the Human Development Index.
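
As a small worked check of the geometric mean described above, the sketch below computes it two ways for a single observation. The helper name geo_mean is illustrative and not part of the original analysis; the inputs are the rounded 2014 values for Argentina reported in the combined table later in this document.

```r
# Geometric mean of a numeric vector of strictly positive values,
# computed on the log scale to avoid very large intermediate products.
geo_mean <- function(x) {
  stopifnot(all(x > 0))
  exp(mean(log(x)))
}

# Argentina, 2014: debt, CO2 emissions, CPI (rounded values from the table).
inputs <- c(debt = 9.46895, CO2 = 204024.5, CPI = 113.3800)

geo_mean(inputs)       # geometric mean via logs
prod(inputs)^(1 / 3)   # same quantity, written as in the Arg.mean line
mean(inputs)           # arithmetic mean, dominated by the CO2 figure
```

Both geometric-mean expressions come out at roughly 602.8, in line with the mean column for that row, while the arithmetic mean is driven almost entirely by the CO2 value.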

Binding the countries by row yields the following table:

str(AU)

## 'data.frame':    210 obs. of  6 variables:
##  $ country: Factor w/ 14 levels "Argentina","Bolivia",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ date   : int  2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 ...
##  $ debt   : num  9.47 9.46 9.3 9.65 12.05 ...
##  $ CO2    : num  204025 189852 192356 191634 187919 ...
##  $ CPI    : num  113 135 121 110 100 ...
##  $ mean   : num  603 623 601 587 609 ...

head(AU)

##     country date     debt      CO2      CPI     mean
## 1 Argentina 2014  9.46895 204024.5 113.3800 602.8004
## 2 Argentina 2013  9.46153 189851.6 134.6680 623.0841
## 3 Argentina 2012  9.29994 192356.2 121.3980 601.0804
## 4 Argentina 2011  9.64515 191633.8 109.5330 587.1855
## 5 Argentina 2010 12.04860 187919.1 100.0000 609.4936
## 6 Argentina 2009 15.17380 179961.7  90.1606 626.7539
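
The code that assembles AU is not shown in this report. A minimal sketch of how per-country tables with matching columns could be stacked by row is given below. The helper make_country, the object names, and all values are hypothetical placeholders chosen for illustration.

```r
# Build a small per-country table with the same columns as AU.
# All numbers are placeholders, not real data.
make_country <- function(name, debt, CO2, CPI) {
  df <- data.frame(country = name, date = c(2014, 2013),
                   debt = debt, CO2 = CO2, CPI = CPI)
  # Geometric mean of the three inputs, row by row
  df$mean <- (df$debt * df$CO2 * df$CPI)^(1 / 3)
  df
}

arg <- make_country("Argentina", c(9.5, 9.4), c(204000, 190000), c(113, 135))
bol <- make_country("Bolivia",   c(5.0, 5.5), c(20000, 19000),   c(120, 114))

# Stack the country tables; rbind requires identical column names.
panel <- rbind(arg, bol)
panel$country <- factor(panel$country)

str(panel)
```

Renaming each country's columns to a shared set (country, date, debt, CO2, CPI, mean) before binding is what lets rbind line the tables up.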

Results

First, let’s examine what the individual country-level trends look like:

ggplot(AU, aes(date, mean, col = country))+
  geom_point()+
  geom_line()+
  facet_wrap(~country)

Note that a higher index value reflects higher debt, emissions, and prices, so as the indicator goes up, utility goes down.

Next, is it possible to identify a trend?

There isn’t enough data yet to build a time-series model of any single country. However, if we use a panel data model, it is possible to extend our analysis across borders.

Fixed Effects

AU.fe <- plm(mean~date, data = AU, model = "within")
summary(AU.fe)

## Oneway (individual) effect Within Model
## 
## Call:
## plm(formula = mean ~ date, data = AU, model = "within")
## 
## Balanced Panel: n = 14, T = 15, N = 210
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -166.1575  -27.5008   -4.4719   26.9320  210.0539 
## 
## Coefficients:
##          Estimate Std. Error t-value  Pr(>|t|)    
## date2001   13.430     19.787  0.6787 0.4981908    
## date2002   49.486     19.787  2.5009 0.0132704 *  
## date2003   61.328     19.787  3.0994 0.0022475 ** 
## date2004   63.496     19.787  3.2089 0.0015747 ** 
## date2005   50.068     19.787  2.5303 0.0122447 *  
## date2006   54.455     19.787  2.7520 0.0065221 ** 
## date2007   56.297     19.787  2.8451 0.0049482 ** 
## date2008   45.442     19.787  2.2965 0.0227865 *  
## date2009   65.880     19.787  3.3294 0.0010535 ** 
## date2010   66.697     19.787  3.3707 0.0009155 ***
## date2011   76.852     19.787  3.8839 0.0001437 ***
## date2012  106.069     19.787  5.3605 2.492e-07 ***
## date2013  130.377     19.787  6.5889 4.622e-10 ***
## date2014  153.367     19.787  7.7508 6.226e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    804370
## Residual Sum of Squares: 498820
## R-Squared:      0.37986
## Adj. R-Squared: 0.28786
## F-statistic: 7.96308 on 14 and 182 DF, p-value: 4.3339e-13

The Fixed Effects model has an adjusted R-squared of 0.288, so the year effects explain just under 30% of the within-country variation in the index.
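
To illustrate what the Fixed Effects fit is doing, the sketch below uses an ordinary least-squares regression with country and year dummies (the least-squares dummy variable form), which yields the same year coefficients as a within estimator. It runs on simulated data, not on AU; the object names and the simulated effect sizes are assumptions made for the example.

```r
set.seed(1)

# Simulated balanced panel (hypothetical data): 4 units observed over 5 years.
units <- paste0("C", 1:4)
years <- 2000:2004
sim <- expand.grid(country = units, date = years)

# Each unit has its own level; every unit shares the same yearly shift.
unit_level <- c(C1 = 100, C2 = 300, C3 = 500, C4 = 700)
year_shift <- c(0, 10, 20, 30, 40)
sim$index <- unit_level[as.character(sim$country)] +
  year_shift[match(sim$date, years)] +
  rnorm(nrow(sim), sd = 1)

# Country dummies absorb the level differences between units;
# year dummies estimate each year's shift relative to the first year.
lsdv <- lm(index ~ factor(date) + factor(country), data = sim)
year_coefs <- coef(lsdv)[grep("date", names(coef(lsdv)))]
round(year_coefs, 1)
```

The year coefficients should land near the simulated shifts of 10, 20, 30, and 40, which is the same reading as the date2001 to date2014 rows in the summary above: each is a change relative to the base year after removing country-specific levels.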

Random Effects

AU.ra <- plm(mean~date, data = AU, model = "random")
summary(AU.ra)

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## 
## Call:
## plm(formula = mean ~ date, data = AU, model = "random")
## 
## Balanced Panel: n = 14, T = 15, N = 210
## 
## Effects:
##                    var  std.dev share
## idiosyncratic  2740.76    52.35 0.054
## individual    48342.93   219.87 0.946
## theta: 0.9386
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -169.3128  -31.5876   -5.9096   21.6465  229.1669 
## 
## Coefficients:
##             Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept)  280.594     60.406  4.6452 3.398e-06 ***
## date2001      13.430     19.787  0.6787 0.4973287    
## date2002      49.486     19.787  2.5009 0.0123871 *  
## date2003      61.328     19.787  3.0994 0.0019393 ** 
## date2004      63.496     19.787  3.2089 0.0013324 ** 
## date2005      50.068     19.787  2.5303 0.0113970 *  
## date2006      54.455     19.787  2.7520 0.0059226 ** 
## date2007      56.297     19.787  2.8451 0.0044393 ** 
## date2008      45.442     19.787  2.2965 0.0216474 *  
## date2009      65.880     19.787  3.3294 0.0008703 ***
## date2010      66.697     19.787  3.3707 0.0007497 ***
## date2011      76.852     19.787  3.8839 0.0001028 ***
## date2012     106.069     19.787  5.3605 8.300e-08 ***
## date2013     130.377     19.787  6.5889 4.430e-11 ***
## date2014     153.367     19.787  7.7508 9.133e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    840000
## Residual Sum of Squares: 534450
## R-Squared:      0.36375
## Adj. R-Squared: 0.31807
## Chisq: 111.483 on 14 DF, p-value: < 2.22e-16

The Random Effects model has an adjusted R-squared of 0.318, so it explains just over 30% of the variation in the index.

Conclusions

Indices are an interesting way of measuring trends over time and space. Many exist, and they provide us with new information every year. Creating a new index may not be too difficult conceptually, but there are challenges to building one that is effective. There should be a strong case that a combination of quantitative variables represents a changing process in the world. An issue in this case is the scale of the indicators. For instance, CO2 emissions as a number will be far larger than the Consumer Price Index. In the future, the combination may provide a more meaningful representation if all input observations are first standardized and then combined using the geometric mean. Next, ensuring that a sufficient amount of data exists is necessary for building the strongest time-series or panel data models. Finally, a Principal Component Analysis may be worth consideration.
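
One way the standardization suggested above might look is sketched below. Because a geometric mean needs strictly positive inputs, the sketch uses min-max rescaling with a small positive floor rather than z-scores, which can be negative. This is an assumed design choice rather than the method used in this project, and the helper names, the floor of 0.01, and the toy values are all illustrative.

```r
# Rescale a vector to the interval [floor, 1]. The smallest value maps to
# a small positive floor instead of 0 so the geometric mean stays positive.
rescale <- function(x, floor = 0.01) {
  floor + (1 - floor) * (x - min(x)) / (max(x) - min(x))
}

geo_mean <- function(x) exp(mean(log(x)))

# Placeholder inputs on very different scales (illustrative values only).
toy <- data.frame(
  debt = c(9.5, 12.0, 15.2, 30.1),
  CO2  = c(204000, 188000, 180000, 20000),
  CPI  = c(113, 100, 90, 120)
)

# Rescale each column, then combine the rescaled inputs row by row.
scaled <- as.data.frame(lapply(toy, rescale))
scaled$index <- apply(scaled, 1, geo_mean)
scaled
```

After rescaling, each input contributes on a comparable scale, so no single variable such as CO2 dominates the combined value simply because of its units.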

References:

Federal Reserve Bank of St. Louis and U.S. Office of Management and Budget, retrieved from FRED, Federal Reserve Bank of St. Louis.

World Bank. World Development Indicators. World Bank Group.

A New Development Index

A Case Study in Data Visualization and Analysis in R

Michael Monzillo