I. Introduction

The State University of New York (SUNY) is one of the largest public higher education systems in the country, with 64 total institutions spread across 10 state regions. These regions are highly varied, and include a megacity, the rural periphery, and several old industrial cities experiencing population loss. The institutions are varied too, and include research universities, comprehensive colleges, community colleges, and technical institutes.

SUNY’s heterogeneous geographic and institutional structure is designed to facilitate access and cultivate human capital at several degree levels. This is explicit in SUNY’s stated mission: to recognize the “fundamental role of its responsibilities in undergraduate education and [to provide] a full range of graduate and professional education that reflects the opportunity for individual choice and the needs of society” (suny.edu).

How well does the SUNY system fulfill this mission at the regional and state scales? The following study offers a preliminary assessment of SUNY’s role in attracting, forming, and retaining human capital in different parts of the state. The regional scale is used to capture both the variety of needs across New York, but also because it demonstrates the interrelationships of different SUNY institutions in a given area. The purpose of this exercise is 1) to identify strengths and weaknesses in the system at the regional scale, and 2) to offer some preliminary conclusions and identify areas for further research.

II. Methods and Materials

This project uses data compiled from the Census Tigris package, US Census American Community survey, the Rockefeller Institute, and the SUNY gradwages dashboard. I use a series of line, plot, and bar graphs, some with reference lines, to visually convey the different roles the SUNY system plays within each region. A few linear regressions are also run to test for the strength of the relationship between graduate retention and wages earned 10 years after graduating.

#load necessary packages

library(readr)
library(ggplot2)
library(dplyr)
library(tidyverse)
library(RColorBrewer)
library(tidycensus)
library(readr)
library(knitr)
library(lattice)
library(spData)
library(sf)
library(tidyverse)
library(tigris)
library(rgdal)
library(sp)
library(rgeos)
library(ggmap)

census_api_key("7912510ff91a51de560cf1e9160860875f062880")

III. Who SUNY serves

SUNY has campuses in every part of the state. Institutions tend to congregate near one another in the state’s largest metropolitan areas: New York City and its environs, the capital region of Albany, Syracuse in Central New York, and Buffalo in Western New York. In each of these areas, comprehensive college and community or technical colleges complement one or more doctoral universities.

#load new york counties from tigris
ny <- counties("NY")
ny <- counties("NY", cb = TRUE, resolution = "5m")

#load regions and school locations
regions <- read_csv("Census_geography.csv")

#join region attribute with spatial data
ny_regions <- merge(ny, regions, by.x = "NAME", by.y = "County")
class(ny_regions)

#convert sp to sf
ny_regions <- st_as_sf(ny_regions)

#dissolve county borders
ny_regions_dissolved <-
    ny_regions %>%
    group_by(Region) %>%
  summarise(status = 'dissolved')

#load school locations
schools <- read_csv("SUNY_locations_and_student_counts.csv")
#join school locations to ny regions
schools2 <- merge(ny_regions_dissolved, schools, by = "Region")

#plot regions on map
ggplot() +
  geom_sf(ny_regions_dissolved, mapping = aes(fill = Region))+
  scale_fill_brewer(palette = "Paired")+
  geom_point(data = schools, mapping = aes(x= lon, y = lat, shape = Level))+
  ggtitle("SUNY Institutional Distribution by Region")+
  theme_bw()+
  theme(axis.title.x = element_blank(),
           axis.title.y = element_blank()
  )

The stacked bar plot below demonstrates that a majority of SUNY students attend college within their own region. This is likely due to cost and information symmetries, as well as individual preference. The two exceptions are New York City, which draws the most out-of-state students, and Southern Tier, which is home to Cornell University and attracts the largest number of international students. Both of these outliers are intuitive; New York is a global city, and Cornell is also an ivy league college with excellent name recognition. Mohawk Valley happens to draw a large number of non-regional in-state students, but that’s likely because its comprehensive college, SUNY Oneonta, straddles the border with the Southern Tier region (and is also just upstate from New York City).

#load student origin data
origins <- read_csv("suny_student_origins.csv")

#build stacked bar plot
ggplot(origins, aes(x = Region, y = Value, 
               fill = Key))+
  geom_col(position = position_stack(reverse = FALSE))+
  scale_fill_brewer(palette = "Paired")+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust = 1),
        axis.text = element_text(size = 7),
        legend.text = element_text(size = 7),
        legend.title = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 7),
        axis.ticks = element_blank(),
        plot.caption = element_text(size = 6)
          )+
  labs(title = "SUNY Student Origins by Region",
       y = "Percent of Total Students",
       caption = "Data: Rockefeller Institute"
       )

IV. How important are SUNY graduates to each region’s workforce?

The relative importance of SUNY alumni - and regional schools - to sub-state labor markets varies widely. For example, SUNY graduates make up a disproportionate number of total graduates in rural parts of the state like Mohawk Valley and North Country. Graduates of regional SUNY schools make up over 50% of the North Country’s total pool of residents with a degree, but total SUNY alums comprise more than 70% of area graduates.

Despite making up a smaller share of total graduates in the Capital District, SUNY graduates are about 35% of the total workforce there (which is highly educated on the whole).

SUNY graduates in New York City were mostly educated outside of the region, and the city succeeds in “pulling” college graduates into town. The opposite is true of Western New York; 60% of area graduates are SUNY alums, and about 55% were educated at regional schools. This may be because SUNY has monopolized higher education in the region (though that seems unlikely as there are several other private colleges in and around Buffalo), or because a high proportion of graduates decide to remain in the area after graduation, or because the region struggles to attract college graduates from elsewhere.

#load grad retention data
grad_retention <- read_csv("grad_retention_percent_ttl_grads.csv")

#plot grad retention total graduates
ggplot(grad_retention, aes(x = Region, y = Value,
                           fill = Key))+
  geom_col(position = position_stack(reverse = FALSE))+
  scale_fill_brewer(palette = "Paired")+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust = 1),
        axis.text = element_text(size = 7),
        legend.text = element_text(size = 7),
        legend.title = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 7),
        axis.ticks = element_blank(),
        plot.caption = element_text(size = 6),
        legend.key.size = unit(0.4, "cm")
  )+
  labs(title = "SUNY Alums as a Percent of Graduates in Region",
       y = "Percent of Total Graduates",
       caption = "Data: Rockefeller Institute")

#load workforce grad retention data
grad_retention <- read_csv("grad_retention_percent_ttl_workforce.csv")

#build stacked bar plot
ggplot(grad_retention, aes(x = Region, y = Value,
                           fill = Key))+
  geom_col(position = position_stack(reverse = FALSE))+
  scale_fill_brewer(palette = "Paired")+
  theme_bw()+
  theme(axis.text.x = element_text(angle = 90, 
                                   hjust = 1),
        axis.text = element_text(size = 7),
        legend.text = element_text(size = 7),
        legend.title = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_text(size = 7),
        axis.ticks = element_blank(),
        plot.caption = element_text(size = 6),
        legend.key.size = unit(0.4, "cm")
  )+
  labs(title = "SUNY Alums as a Percent of Regional Workforce",
       y = "Percent of Total Workforce",
       caption = "Data: Rockefeller Institute")

V. Matching SUNY Graduates’ Skills to Local Labor Markets

An important dimension of any public university’s impact is whether the skills of their graduates match demand. Here, I compare degrees held by alums living in the region as a percent of total regional jobs requiring a related degree to SUNY alums as a percent of total regional graduates (the dashed reference line). This data is from the Rockefeller Institute, which compiled graduate major information from the International Postsecondary Education Data System (IPEDS) and job demand estimates from the NY Department of Labor.
The series of bar plots below depict the skills (mis)match.

#load Skills Match Data from CSV file
skills <- read_csv("suny_skills_match_by_region.csv")

#build faceted bar graphs w ref lines
ggplot(skills) + 
  geom_col(aes(x = Field , y = Degrees, fill = Field), 
           position = "dodge")+
  scale_fill_brewer(palette = "Paired")+
  labs(x = "Field", 
       y = "SUNY Degrees as a % of Jobs Requiring One",
       caption = "--- SUNY graduates as a proportion of total regional graduates")+
  theme_bw()+
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank()
        )+
  geom_hline(aes(yintercept = Percent), 
             col = "Black", linetype = "dashed")+
  facet_wrap(~Region)+
  theme(legend.position = "bottom", 
        legend.box = "horizontal", 
        legend.text = element_text(size = 8),
        legend.key.size = unit(0.4, "cm"),
        legend.title = element_blank(),
        plot.caption = element_text(size = 7),
        axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10)
        )+
  ggtitle("Matching SUNY Graduates' Skills to Area Labor Markets")

Overall, there is impressive consistency between SUNY graduates’ skills and regional labor market demand. But there are some notable mismatches. In all cases, there is an oversupply of agriculture and art degrees. The largest and most consistent undersupply is in hospitality. Mohawk Valley, Long Island, and Western New York appear to have the least perfect relationship between skills supplied and skills demanded. North Country, Central New York, Southern Tier, New York City, and the Capital District all fare better, though in different ways.

VI. Investigating the Skills Gap

Are these skills gaps primarily caused by broader market trends, such as higher demand for certain kinds of skills outside of the region? Wage rates are a good proxy for the level of market power and degree of specialization alums hold, and help to illuminate whether students may have better job options elsewhere. Mapping them according to degree level and alongside graduate retention rates helps to reveal a few trends.

#load wage data
wages <- read_csv("SUNY_locations_enrollment_wages.csv")

ggplot(wages, aes(x = Wages, y = Residents, col = Level))+
  geom_point()+
  scale_fill_brewer(palette = "Paired")+
  theme_bw()+
  theme(legend.key.size = unit(0.4, "cm"))+
  ggtitle("Graduate Wages and New York Residency after 10 Years")+
  labs(x = "Wages 10 Years After Graduating", y = "% Graduates Living in New York After 10 Years",
       caption = "Data: SUNY Office of Institutional Research and Data Analytics")+
  theme(plot.caption = element_text(size = 7))

From the above scatterplot, it appears as though graduates of Community, Comprehensive, and Technical schools are all more likely to remain in NY state after graduating. They also tend to make less money.

Linear regressions reveal a significant and negative relationship between wages and percent of graduates who remain in New York State after graduating (the higher the grad’s wages, the less likely they are to stay and work in NY), but, perhaps due to outliers, the overall explanatory power of these models is extremely weak (R2 is less than 0.03). “Wage Premiums” are the percent above the regional median wage graduates of a given school/program make.

programs <- read_csv("schools_programs_wages_residency.csv")

#filter schools that don't have programs or have fewer than 10 graduates
programs1 <- programs %>%
  filter(Wages != 0)
#linear regression wages residency status
residentwages <- lm(Residents~Wages, data = programs1)
#print summary
summary(residentwages)
## 
## Call:
## lm(formula = Residents ~ Wages, data = programs1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.884  -5.922   0.477   7.246  26.249 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.281e+01  1.863e+00  39.089  < 2e-16 ***
## Wages       -1.006e-04  3.042e-05  -3.307  0.00105 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.3 on 324 degrees of freedom
## Multiple R-squared:  0.03266,    Adjusted R-squared:  0.02967 
## F-statistic: 10.94 on 1 and 324 DF,  p-value: 0.001048
programs <- read_csv("schools_programs_wages_residency.csv")

#calculate wage premiums by school and field of study
programs_new <- programs %>%
  filter(Wages != 0) %>%
  mutate(wage_premium1 = Wages - Regional_Wages)%>%
  mutate(wage_premium2 = wage_premium1/Regional_Wages)

#linear regression wage premiums and NY residency
residentwagepremium <- lm(Residents~wage_premium2, data = programs_new)

#print summary
summary(residentwagepremium)
## 
## Call:
## lm(formula = Residents ~ wage_premium2, data = programs_new)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.877  -5.875   0.560   7.307  23.067 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     69.375      1.032  67.240  < 2e-16 ***
## wage_premium2   -2.974      1.049  -2.834  0.00488 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.34 on 324 degrees of freedom
## Multiple R-squared:  0.02419,    Adjusted R-squared:  0.02118 
## F-statistic: 8.032 on 1 and 324 DF,  p-value: 0.004883

Though the skills gap is at least partially a retention issue, it may also be a supply side issue. After all, not all SUNY schools offer degree programs in all fields, their program size may be small, or they may be missing one or more degree levels in a given region. To dig into both retention and degree programs offered, I look at field of study, NY state residency, and wage rates in a few regions of interest (which I selected based on the skills gap visualization above): Western New York, Mohawk Valley, and Long Island. Color is mapped onto type of institution, and size is mapped onto cohort numbers 10 years after graduation. Cohort size is intended to stand in for program size.

In Western New York, SUNY is undersupplying graduates in business and finance, computers and math, and hospitality. One issue appears to be small cohort size and weak retention rates for graduates of the University at Buffalo (the only doctoral university in the area) in computers and math. And no SUNY school in the area offers a degree in hospitality.

programs <- read_csv("schools_programs_wages_residency.csv")
## Warning: Missing column names filled in: 'X9' [9], 'X10' [10]
#calculate average wages by program of study in each region
programs2 <- programs_new %>%
  group_by(Region) %>%
filter(Region == "Western New York")
  
#organize graphics
ggplot(programs2, aes(x = wage_premium2, y = Residents, col = Level, size = Sample))+
  geom_point()+
  scale_fill_brewer(palette = "Paired")+
  theme_bw()+
  theme(legend.key.size = unit(0.4, "cm"))+
  ggtitle("Western New York: Graduate Wages and New York Residency after 10 Years")+
  labs(x = "Wage Premium* 10 Years After Graduating", y = "% Graduates Living in New York After 10 Years")+
   theme_bw()+
  facet_wrap(~Program)+
  theme(legend.position = "bottom", 
        legend.box = "horizontal", 
        legend.text = element_text(size = 8),
        legend.key.size = unit(0.4, "cm"),
        legend.title = element_blank(),
        plot.caption = element_text(size = 7),
        axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10)
        )

Mohawk Valley, a rural area in the center of the state, presents skills gaps in business and finance, education, health, hospitality, law and public service, and social services.

There are no doctoral universities in the region, so the gap in law and public service may be partially due to the absence of a law school (the only SUNY law school is at UB in Western New York). A similar challenge is likely happening in the local healthcare industry. Despite several degree programs at community, comprehensive, and technical schools, the absence of a medical school means demand for certain high-level skills is outstripping supply.

Retention rates are quite low for education and social service degree recipients, which may be driving the gap there.

#calculate average wages by program of study in Mohawk Valley region
programs3 <- programs_new %>%
  group_by(Region) %>%
filter(Region == "Mohawk Valley")
  
#organize graphics
ggplot(programs3, aes(x = wage_premium2, y = Residents, col = Level, size = Sample))+
  geom_point()+
  scale_fill_brewer(palette = "Paired")+
  theme_bw()+
  theme(legend.key.size = unit(0.4, "cm"))+
  ggtitle("Mohawk Valley: Graduate Wages and New York Residency after 10 Years")+
  labs(x = "Wage Premium 10 Years After Graduating", y = "% Graduates Living in New York After 10 Years",
       caption = "Data: SUNY Office of Institutional Research and Data Analytics")+
   theme_bw()+
  facet_wrap(~Program)+
  theme(legend.position = "bottom", 
        legend.box = "horizontal", 
        legend.text = element_text(size = 8),
        legend.key.size = unit(0.4, "cm"),
        legend.title = element_blank(),
        plot.caption = element_text(size = 7),
        axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10)
        )

Long Island also presents skills gaps in business and finance, computers and math, education, health, hospitality, and technical trades. Its doctoral institution, Stony Brook, is well known for its math and computer science programs, and its many graduates may make more money outside of the state. Education graduates tend to stay in the region and make about 1.5x the region’s median wage. No SUNY schools in the area offer degrees in hospitality. While retention rates and wages are quite high for technical trades, cohort sizes are small.

programs <- read_csv("schools_programs_wages_residency.csv")
## Warning: Missing column names filled in: 'X9' [9], 'X10' [10]
#calculate average wages by program of study in each region
programs4 <- programs_new %>%
  group_by(Region) %>%
filter(Region == "Long Island")
  
#organize graphics
ggplot(programs4, aes(x = wage_premium2, y = Residents, col = Level, size = Sample))+
  geom_point()+
  scale_fill_brewer(palette = "Paired")+
  theme_bw()+
  theme(legend.key.size = unit(0.4, "cm"))+
  ggtitle("Long Island: Graduate Wages and New York Residency after 10 Years")+
  labs(x = "Wage Premium 10 Years After Graduating", y = "% Graduates Living in New York After 10 Years",
       caption = "Data: SUNY Office of Institutional Research and Data Analytics")+
   theme_bw()+
  facet_wrap(~Program)+
  theme(legend.position = "bottom", 
        legend.box = "horizontal", 
        legend.text = element_text(size = 8),
        legend.key.size = unit(0.4, "cm"),
        legend.title = element_blank(),
        plot.caption = element_text(size = 7),
        axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10)
        )

VII. Conclusions and Additional Research Questions

In many ways, the SUNY state system is a model for human capital formation. About 1.5 million SUNY alums live in New York, and the state’s median educational attainment and median income have consistently exceeded national rates.

regional_ed <- read_csv("suny_regional_education_tidy.csv")

#isolate NY and USA

NY <- regional_ed %>%
  filter(Region == "NY State")

USA <- regional_ed %>%
  filter(Region == "USA")

regional_ed1 <- regional_ed %>%
  filter(Region != "NY State") %>%
  filter(Region != "USA")

#plot education rates in each region

plot <- ggplot(data = regional_ed1, 
       aes(x = Year, 
           y = bach_or_higher, 
           col = Region))+ geom_line()+
  scale_color_brewer(palette = "Paired")

plot + geom_line(data = NY, 
                 aes(x = Year, y = bach_or_higher), 
                 col = "Black", size = 1)+
  geom_line(data = USA, 
            aes(x = Year, y = bach_or_higher), 
            col = "Black", linetype = "dashed", size = 1)+
  theme_bw()+
  theme(panel.grid.major = element_blank())+
  labs(x = "Year", 
       y = "% Population with Bachelor's or Higher", 
       title = "Educational Attainment in New York State",
       caption = "___ New York; -- US"
         )

#load income data from .csv file
regional_income <- read_csv("suny_regional_income_data_tidy.csv")

#isolate NY and USA

NY <- regional_income %>%
  filter(Region == "NY State")

USA <- regional_income%>%
  filter(Region == "USA")

regional_income1 <- regional_income %>%
  filter(Region != "NY State") %>%
  filter(Region != "USA")

#build line graphs by year, with NY and USA as reference lines

base_plot <- ggplot(data = regional_income1, 
                    aes(x = Year, y = Income, col = Region))+
  geom_line()

#add NY State and USA as reference lines

base_plot +
  geom_line(data = NY, 
            aes(x = Year, y = Income), 
            col= "Black", size = 1)+
  scale_color_brewer(palette = "Paired")+
  geom_line(data = USA, 
            aes(x = Year, y = Income), 
            col = "Black", linetype="dashed", size = 1)+
  theme_bw()+
  theme(panel.grid.major = element_blank())+
  labs(x = "Year", 
       y = "Median Income", 
       title = "Median Incomes in New York State",
       caption = "__ US; --- NY"
  )

Yet there is significant variation between regions, and SUNY’s role in helping to attract, form, and redistribute human capital also varies widely across the state. While the system has a relatively small effect on New York City, it plays a more central role in rural (Mohawk Valley, North Country) and post-industrial areas (Western and Central New York). These regions have low median incomes and educational attainment rates relative to both the state and the country. It also appears that these regions struggle to attract non-SUNY college graduates. Without access to SUNY schools, it is likely that these areas would be even further behind.

However, because of the outsize role SUNY plays in places like Mohawk Valley and Western New York, the interface between degrees conferred and local labor markets is even more critically important than in economically advantaged areas. This study takes a first step towards understanding some of the drivers behind the relationship between skills supplied and regional labor demand. Earnings after graduation by degree type help to explain some of the mismatch. Available programs and cohort size also matter.

VIII. References

How SUNY Matters: Economic Impacts of the State of New York. (2014). Rockefeller Institute Report.