---
title: "Higher and Graduate Education: Census, ENADE, IDD, CPC, IGC, and CAPES"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Higher and Graduate Education: Census, ENADE, IDD, CPC, IGC, and CAPES}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE,
  message = FALSE,
  warning = FALSE
)
suppressPackageStartupMessages(library(systemfonts))
suppressPackageStartupMessages(library(textshaping))
```

This vignette covers higher education and graduate education datasets
available in educabR. These datasets allow you to analyze institutions,
courses, student performance, and quality indicators across Brazilian
higher education.

```{r setup}
library(educabR)
library(dplyr)
library(ggplot2)
```

## Higher Education Census

The Higher Education Census is an annual survey covering all Brazilian
higher education institutions (IES), including data on institutions,
courses, student enrollment, and faculty.

### Available data types

| Type | Description |
|------|-------------|
| `"ies"` | Institutions (location, administrative type, accreditation) |
| `"cursos"` | Undergraduate courses (area, modality, enrollment) |
| `"alunos"` | Student enrollment (demographics, enrollment status) |
| `"docentes"` | Faculty (qualifications, employment type) |

### Downloading data

```{r censo-superior-download}
# Institution data
ies_2023 <- get_censo_superior(year = 2023, type = "ies")

# Course data filtered by state
cursos_sp <- get_censo_superior(year = 2023, type = "cursos", uf = "SP")

# Faculty data with limited rows
docentes_sample <- get_censo_superior(
  year = 2023,
  type = "docentes",
  n_max = 10000
)
```

### Available years

Data is available from 2009 to 2024.

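
Because the census is published once per year, a multi-year analysis needs
one download per year, stacked into a single table. The sketch below shows
one way to do that. `stack_years()` and `fake_fetch()` are hypothetical
helpers, not part of educabR, and the toy columns are invented so the code
runs offline.

```{r censo-superior-panel-sketch}
# Stack one data frame per year, tagging each row with its year.
# `fetch` is any function that takes a year and returns a data frame.
stack_years <- function(years, fetch) {
  pieces <- lapply(years, function(y) {
    df <- fetch(y)
    df$year <- y
    df
  })
  do.call(rbind, pieces)
}

# Stand-in fetcher with invented data (assumption: illustrative columns only).
# With educabR one would pass something like:
#   function(y) get_censo_superior(year = y, type = "ies")
fake_fetch <- function(y) {
  data.frame(co_ies = 1:3, tp_categoria_administrativa = c(1, 4, 5))
}

ies_panel <- stack_years(2021:2023, fake_fetch)

# Number of rows contributed by each year
table(ies_panel$year)
```

`rbind()` requires identical columns in every year. If the real files
differ between editions, `dplyr::bind_rows()` is a more tolerant
alternative because it fills missing columns with `NA`.
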
### Exploring ZIP contents

```{r censo-superior-files}
# See what files are inside the downloaded ZIP
list_censo_superior_files(2023)
```

### Example analysis: Institutions by administrative type

```{r censo-superior-analysis}
ies <- get_censo_superior(2023, type = "ies")

# Recode the numeric administrative category into readable labels
ies_summary <- ies |>
  mutate(
    admin_type = case_when(
      tp_categoria_administrativa == 1 ~ "Public Federal",
      tp_categoria_administrativa == 2 ~ "Public State",
      tp_categoria_administrativa == 3 ~ "Public Municipal",
      tp_categoria_administrativa == 4 ~ "Private For-Profit",
      tp_categoria_administrativa == 5 ~ "Private Non-Profit",
      TRUE ~ "Other"
    )
  ) |>
  count(admin_type, sort = TRUE)

ggplot(ies_summary, aes(x = reorder(admin_type, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Higher Education Institutions by Type (2023)",
    x = NULL,
    y = "Number of Institutions"
  ) +
  theme_minimal() +
  scale_y_continuous(labels = scales::number_format(big.mark = ".", decimal.mark = ","))
```

![](../man/figures/vignette-higher-ies-type.png)

---

## ENADE - National Student Performance Exam

ENADE (Exame Nacional de Desempenho dos Estudantes) is an annual exam
assessing undergraduate student performance. It follows a rotating cycle
in which a different set of course areas is evaluated each year.

### Downloading ENADE data

```{r enade-download}
# Download ENADE microdata
enade_2023 <- get_enade(year = 2023)

# Sample for exploration
enade_sample <- get_enade(year = 2023, n_max = 5000)
```

### Available years

Data is available from 2004 to 2024.

### Data structure

```{r enade-structure}
enade <- get_enade(2023, n_max = 5000)

glimpse(enade)
```

---

## IDD - Value-Added Indicator

IDD (Indicador de Diferença entre os Desempenhos Observado e Esperado)
measures the value added by an undergraduate course. It compares students'
observed ENADE scores with the scores expected given their ENEM admission
results.

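
To make the "observed minus expected" idea concrete, here is a minimal
sketch on invented student-level data. It fits a simple linear regression
of ENADE on ENEM and averages the residuals per course. This is only an
illustration of the concept and should not be read as INEP's official IDD
methodology; all values and column names are made up.

```{r idd-sketch}
# Toy student-level data (fabricated for illustration only)
students <- data.frame(
  course = rep(c("A", "B", "C"), each = 4),
  enem   = c(520, 560, 600, 640, 500, 540, 580, 620, 610, 650, 690, 730),
  enade  = c(48, 52, 57, 61, 40, 43, 47, 50, 55, 58, 62, 66)
)

# Expected ENADE score given the ENEM score, from a simple linear fit
fit <- lm(enade ~ enem, data = students)
students$expected <- predict(fit, newdata = students)

# Value added per student = observed minus expected
students$value_added <- students$enade - students$expected

# Course-level indicator: mean value added across the course's students
idd_sketch <- aggregate(value_added ~ course, data = students, FUN = mean)
idd_sketch
```

A positive course mean indicates that students scored above what their
admission scores alone would predict, and a negative mean the opposite.
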
### Downloading IDD data

```{r idd-download}
# Download IDD data
idd_2023 <- get_idd(year = 2023)

# Sample for exploration
idd_sample <- get_idd(year = 2023, n_max = 5000)
```

### Available years

Data is available for 2014-2019 and 2021-2023 (there was no 2020 edition
because of the COVID-19 pandemic).

### Data structure

```{r idd-analysis}
idd <- get_idd(2023)

glimpse(idd)
```

---

## CPC - Preliminary Course Concept

CPC (Conceito Preliminar de Curso) is a quality indicator for undergraduate
courses. It combines ENADE scores, IDD, faculty qualifications, pedagogical
resources, and student perceptions. CPC scores range from 1 to 5, and
courses scoring 1 or 2 are considered unsatisfactory.

### Downloading CPC data

```{r cpc-download}
# Download CPC data (Excel format, requires readxl)
cpc_2023 <- get_cpc(year = 2023)

# Sample for exploration
cpc_sample <- get_cpc(year = 2023, n_max = 5000)
```

### Available years

Data is available for 2007-2019 and 2021-2023 (no 2020 edition).

### Example analysis: Course quality distribution

```{r cpc-analysis}
cpc <- get_cpc(2023)

# Distribution of CPC scores
cpc |>
  filter(!is.na(cpc_faixa)) |>
  count(cpc_faixa) |>
  ggplot(aes(x = factor(cpc_faixa), y = n)) +
  geom_col(fill = "coral") +
  labs(
    title = "CPC 2023 - Course Quality Distribution",
    x = "CPC Score (1-5)",
    y = "Number of Courses"
  ) +
  theme_minimal() +
  scale_y_continuous(labels = scales::number_format(big.mark = ".", decimal.mark = ","))
```

![](../man/figures/vignette-higher-cpc-dist.png)

---

## IGC - General Courses Index

IGC (Índice Geral de Cursos) is a quality indicator for higher education
institutions. It is calculated as a weighted average of the CPC scores of
an institution's undergraduate courses and the CAPES scores of its graduate
programs. IGC scores range from 1 to 5, providing an overall quality
measure for each institution.

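
The weighted-average idea can be illustrated with a small worked example.
Everything in it is an assumption made for illustration: the scores are
invented, the weights stand in for something like enrollment counts, and
the official weighting scheme and the rescaling of CAPES scores are not
reproduced here.

```{r igc-sketch}
# Hypothetical institution: three undergraduate courses and one graduate
# program (all numbers invented for illustration)
components <- data.frame(
  component = c("Course A (CPC)", "Course B (CPC)", "Course C (CPC)",
                "Graduate program (CAPES, rescaled)"),
  score     = c(3.2, 2.8, 4.1, 4.5),
  weight    = c(400, 250, 150, 200)  # assumed weights, e.g. enrollment
)

# Weighted average: sum(score * weight) / sum(weight)
igc_example <- weighted.mean(components$score, components$weight)
igc_example
```

Larger courses pull the institution-level value toward their own score,
which is why two institutions with the same set of course scores can end
up with different continuous values.
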
### Downloading IGC data

```{r igc-download}
# Download IGC data (Excel format, requires readxl)
igc_2023 <- get_igc(year = 2023)

# Sample for exploration
igc_sample <- get_igc(year = 2023, n_max = 1000)
```

### Available years

Data is available for 2007-2019 and 2021-2023 (no 2020 edition).

Note: IGC 2007 comes as a 7z archive containing an Excel file.

### Example analysis: Institution quality

```{r igc-analysis}
igc <- get_igc(2023)

# Top institutions by continuous IGC
igc |>
  filter(!is.na(igc_continuo), !is.na(sigla_da_ies)) |>
  arrange(desc(igc_continuo)) |>
  head(20) |>
  ggplot(aes(x = reorder(sigla_da_ies, igc_continuo), y = igc_continuo)) +
  geom_col(fill = "darkblue") +
  coord_flip() +
  labs(
    title = "Top 20 Institutions by IGC (2023)",
    x = NULL,
    y = "IGC (Continuous)"
  ) +
  theme_minimal()
```

![](../man/figures/vignette-higher-igc-top20.png)

---

## CAPES - Graduate Education Data

CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior)
provides open data on Brazilian graduate programs (stricto sensu: master's
and doctoral programs).

### Available data types

| Type | Description |
|------|-------------|
| `"programas"` | Graduate programs (area, institution, CAPES score) |
| `"discentes"` | Students (enrollment, demographics, funding) |
| `"docentes"` | Faculty (qualifications, research output) |
| `"cursos"` | Graduate courses within programs |
| `"catalogo"` | Theses and dissertations catalog |

### Downloading CAPES data

```{r capes-download}
# Graduate programs
programas_2023 <- get_capes(year = 2023, type = "programas")

# Students (large dataset!)
discentes_sample <- get_capes(year = 2023, type = "discentes", n_max = 10000)

# Theses and dissertations catalog
catalogo_2023 <- get_capes(year = 2023, type = "catalogo")
```

### Available years

Data is available from 2013 to 2024. Data is retrieved from the CAPES Open
Data Portal via the CKAN API.

### Example analysis: Graduate programs by knowledge area

```{r capes-analysis}
programas <- get_capes(2023, type = "programas")

# Count programs by broad knowledge area
programas |>
  count(nm_grande_area_conhecimento, sort = TRUE) |>
  head(10) |>
  ggplot(aes(
    x = reorder(nm_grande_area_conhecimento, n),
    y = n
  )) +
  geom_col(fill = "purple4") +
  coord_flip() +
  labs(
    title = "Graduate Programs by Knowledge Area (2023)",
    x = NULL,
    y = "Number of Programs"
  ) +
  theme_minimal()
```

![](../man/figures/vignette-higher-capes-area.png)

---

## Combining quality indicators

CPC, IGC, IDD, and ENADE are closely related. The example below combines
two of them, CPC and IGC, to compare course-level and institution-level
quality for the same year.

```{r combined-analysis}
# Load CPC and IGC for the same year
cpc <- get_cpc(2023)
igc <- get_igc(2023)

# Compare institution-level quality
# IGC gives the overall institution score
# CPC gives individual course scores within each institution
igc_summary <- igc |>
  filter(!is.na(igc_faixa)) |>
  select(codigo_da_ies, sigla_da_ies, igc_continuo, igc_faixa)

# Average the course-level CPC within each institution
cpc_summary <- cpc |>
  filter(!is.na(cpc_continuo)) |>
  group_by(codigo_da_ies) |>
  summarise(
    n_courses = n(),
    mean_cpc = mean(cpc_continuo, na.rm = TRUE),
    .groups = "drop"
  )

# Keep only institutions present in both tables
combined <- inner_join(igc_summary, cpc_summary, by = "codigo_da_ies")

ggplot(combined, aes(x = mean_cpc, y = igc_continuo, size = n_courses)) +
  geom_point(alpha = 0.4, color = "steelblue") +
  labs(
    title = "IGC vs Average CPC by Institution (2023)",
    x = "Average CPC (Continuous)",
    y = "IGC (Continuous)",
    size = "Courses Evaluated"
  ) +
  theme_minimal()
```

![](../man/figures/vignette-higher-igc-vs-cpc.png)
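
An `inner_join()` silently drops institutions that appear in only one of
the two tables, so it can be worth checking how many rows are lost before
interpreting the plot. The sketch below shows such a check on toy
stand-ins for `igc_summary` and `cpc_summary`. The values are invented, and
only the `codigo_da_ies`, `igc_continuo`, and `mean_cpc` column names are
taken from the example above.

```{r combined-coverage-sketch}
library(dplyr)

# Toy stand-ins for igc_summary and cpc_summary (values invented)
igc_demo <- tibble(
  codigo_da_ies = c(1, 2, 3, 4),
  igc_continuo  = c(2.9, 3.4, 4.1, 2.2)
)
cpc_demo <- tibble(
  codigo_da_ies = c(1, 2, 3, 5),
  mean_cpc      = c(2.7, 3.1, 3.9, 3.0)
)

# Institutions with an IGC but no matching row in the CPC summary
igc_only <- anti_join(igc_demo, cpc_demo, by = "codigo_da_ies")

# Institutions present in both tables, as kept by inner_join()
both <- inner_join(igc_demo, cpc_demo, by = "codigo_da_ies")

# Strength of the linear association between the two indicators
cor(both$mean_cpc, both$igc_continuo)
```

If `igc_only` contains many rows on real data, the scatter plot describes
only a subset of institutions, and that should be stated alongside it.
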