Database content

The database contains the following tables, listed here with their row counts, column counts and column names:

library(institutions)
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(knitr)

institutions_download()

# describe a table: its name, dimensions and column names

desc_table <- function(x) {
  tbl <- institutions_table(x)  # fetch the table once and reuse it below
  tibble(
    tablename = x,
    rowcount = nrow(tbl),
    colcount = ncol(tbl),
    colnames = paste0(names(tbl), collapse = ", ")
  )
}

# iterate over existing tables and display results

tables <- institutions_tables()

# named table_desc so the result does not mask dplyr::desc()
table_desc <-
  tables %>% map_df(desc_table) %>%
  arrange(desc(rowcount))

knitr::kable(table_desc)

| tablename | rowcount | colcount | colnames |
|:--|--:|--:|:--|
| external_ids | 229321 | 3 | grid.id, external_id_type, external_id |
| addresses | 102392 | 14 | grid_id, line_1, line_2, line_3, lat, lng, postcode, primary, city, state, state_code, country, country_code, geonames_city_id |
| grid | 102392 | 5 | ID, Name, City, State, Country |
| institutes | 102392 | 5 | grid_id, name, wikipedia_url, email_address, established |
| types | 102362 | 2 | grid_id, type |
| links | 100857 | 2 | grid_id, link |
| acronyms | 42840 | 2 | grid_id, acronym |
| relationships | 40868 | 3 | grid_id, relationship_type, related_grid_id |
| labels | 27092 | 3 | grid_id, iso639, label |
| aliases | 24225 | 2 | grid_id, alias |
| geonames | 15007 | 14 | geonames_city_id, city, nuts_level1_code, nuts_level1_name, nuts_level2_code, nuts_level2_name, nuts_level3_code, nuts_level3_name, geonames_admin1_code, geonames_admin1_name, geonames_admin1_ascii_name, geonames_admin2_code, geonames_admin2_name, geonames_admin2_ascii_name |

The tables in the database are based on the CSV files in the downloaded archive. This format is described in more detail at https://grid.ac/format.
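
Most of the tables listed above share a grid_id column, so they can presumably be combined with an ordinary join on that key. The following is a minimal sketch in base R. To stay self-contained it uses two small stand-in data frames with a subset of the column names shown above. The rows and identifiers are invented for illustration and are not real records.

```r
# Toy stand-ins for two of the tables listed above. The rows are invented
# for illustration only and are NOT real records from the database.
institutes <- data.frame(
  grid_id = c("grid.0001", "grid.0002"),
  name = c("Example University", "Sample Institute"),
  stringsAsFactors = FALSE
)

addresses <- data.frame(
  grid_id = c("grid.0001", "grid.0002"),
  city = c("Springfield", "Shelbyville"),
  country_code = c("US", "US"),
  stringsAsFactors = FALSE
)

# Join on the shared key so each institute row gains its address columns
joined <- merge(institutes, addresses, by = "grid_id")

print(joined)
```

With the real data, the stand-ins could be replaced by institutions_table("institutes") and institutions_table("addresses"), assuming both return ordinary data frames. Note that the external_ids table is listed with a grid.id column rather than grid_id, so a join against it would need the key names given explicitly.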

Size on disk

The sizes on disk of the downloaded archive and the generated database are shown below. The SQLite database is larger than the downloaded zipped CSVs because of the full-text search tables that are generated and added to it:

cfg <- institutions_cfg()

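# format a file's size in human-readable units; this goes through the
# exported format() generic rather than the unexported utils:::format.object_size
print_fs <- function(x) 
  sprintf("File %s has size %s", basename(x), 
          format(structure(file.size(x), class = "object_size"), units = "auto"))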

files <- c(cfg$zip, cfg$db)

sizes <- map_chr(files, print_fs)

File grid.zip has size 37.3 Mb

File grid.db has size 58.7 Mb