2023-04-28

Mini-hackathon

The first time we meet in this setting (first deep dive into DiVA/Invenio):

  • Gaël Dubus, Anders Wändahl, Sam Al-Arbid, Nazar Dino, Markus Skyttner

Discussion points:

  • When doing analyses of research outputs from KTH, what would be an ergonomic data flow?
  • What system is most suitable as a primary/main data source for research outputs / publication metadata?
  • What can data about research outputs be used for at KTH?

What do we need to support an ergonomic data flow for analysis?

Depending on the analyst or user, different functionality is required:

  • Researcher: A good web UI for entering, maintaining and updating data
  • Analyst: APIs for system integrations and automatable workflows
  • Analyst: Toolbox / utilities for using and combining data in analysis
  • ITA: Reliable components that can be adapted to workflows at KTH

Main data source for research outputs – currently DiVA

Current DiVA - future Invenio … probably?

The current DiVA repository has some strengths and presents some challenges:

  • Used nationally in many institutions
  • API lacking, slow progress towards CORA (DiVA API v3)
  • Difficult to extend for KTH specific purposes

Future: Invenio

Invenio has some strengths in areas where DiVA does not:

  • Invenio is similar to Zenodo, but more flexible, and is therefore known “globally”
  • Good API, documentation
  • Global community, actively maintained
  • Extensible with custom fields
  • Support for links / minted DOIs
  • Presentation / GUI can be customized (can we make it look like DiVA, do we even want this?)
  • Control over the data and the system, more empowered to quickly make adjustments
  • Adopted by other institutions
  • Complies with EU standards for open data, with a regional/global community

Outlook - data flows & services

  • What is the point of data flows?
    • transparency & accessibility
    • reproducibility
    • analytics
    • standardization

Example of a data flow

  • Infrastructure
    • Data availability
    • Intermediate storage for data (S3)
    • Linking data sources together

BIBMET schematics

## Other available data sources and tools

Workshop #2 - Migrate DiVA to Invenio

Tasks

  • Migrate a few different records from DiVA to Invenio@KTH

    • Upload: Thesis, article, conference proceeding
    • Evaluate what is missing (for a specific usage / use case)?
    • Field mappings - Nazar has made a mapping for a complete view (including optional fields)
    • We could here first focus on required fields in DiVA
    • How to deal with missing values - a value is missing in DiVA, how will it be filled?
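
The questions about required fields and missing values could be explored with a small script. The following is a minimal sketch under stated assumptions, not the mapping Nazar has made: the DiVA-style field names (`PID`, `Title`, `Year`, `PublicationType`, `Name`), the "Family, Given; Family, Given" name format and the resource type ids are illustrative guesses that would need to be checked against a real DiVA export and the vocabularies loaded in Invenio@KTH. Here a missing required value stops the record instead of being filled silently, which is only one possible policy.

```python
# Assumed set of required DiVA-style fields (hypothetical names).
REQUIRED_FIELDS = ("Title", "Year", "PublicationType", "Name")

# Assumed mapping from DiVA publication types to Invenio resource type ids.
RESOURCE_TYPES = {
    "Doctoral thesis": "publication-thesis",
    "Article in journal": "publication-article",
    "Conference paper": "publication-conferencepaper",
}


def find_missing(diva_record):
    """Return the required fields that are absent or blank in the record."""
    return [f for f in REQUIRED_FIELDS if not str(diva_record.get(f) or "").strip()]


def to_invenio_metadata(diva_record):
    """Map one DiVA-style record (a dict) to an Invenio-style metadata dict.

    Raises ValueError when a required field is missing or the publication
    type has no mapping, so that a curator decides how to fill the gap.
    """
    missing = find_missing(diva_record)
    if missing:
        raise ValueError(f"PID {diva_record.get('PID')}: missing {missing}")
    pub_type = diva_record["PublicationType"]
    if pub_type not in RESOURCE_TYPES:
        raise ValueError(f"PID {diva_record.get('PID')}: unmapped type {pub_type!r}")

    creators = []
    for name in diva_record["Name"].split(";"):
        # Assumes "Family, Given"; anything after the first comma is the given name.
        family, _, given = name.strip().partition(",")
        creators.append({
            "person_or_org": {
                "type": "personal",
                "family_name": family.strip(),
                "given_name": given.strip(),
            }
        })

    return {
        "title": diva_record["Title"].strip(),
        "publication_date": str(diva_record["Year"]).strip(),
        "resource_type": {"id": RESOURCE_TYPES[pub_type]},
        "creators": creators,
    }
```

Running `find_missing` over a whole export first gives a quick overview of which required fields are most often empty, before deciding on a fill strategy.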

Pre-populating Invenio with “vocabularies”

  • Pre-populating or “seeding” Invenio with “vocabularies” requires YAML for
    • Organizations (ROR and internal KTH orgs)
    • Names/People/Authors (creators)
    • Resource types
    • Funding
    • Subjects (FOS, CESSDA (controlled vocabularies from the consortium for European social sciences), MeSH), namespaced with a specific … SwePub requires UKÄ classification (at least UKÄ, Scopus, WoS etc.)
    • Users
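
As an illustration of what seeding from YAML could involve, the sketch below turns a small CSV of organizations into a YAML list. It is a hedged example only: the CSV columns (`id`, `name`, `ror`), the sample rows and the ROR value are made up, and the YAML layout (`id`, `name`, `identifiers` with `scheme` and `identifier`) is an assumption about what the Invenio vocabulary fixtures expect, to be verified against the Invenio documentation. Values are written as JSON strings, which are also valid double-quoted YAML scalars, so no YAML library is needed.

```python
import csv
import io
import json


def rows_to_vocabulary_yaml(rows):
    """Render organization rows as a YAML list of vocabulary entries."""
    lines = []
    for row in rows:
        lines.append(f"- id: {json.dumps(row['id'])}")
        lines.append(f"  name: {json.dumps(row['name'], ensure_ascii=False)}")
        if row.get("ror"):  # internal organizations may lack a ROR id
            lines.append("  identifiers:")
            lines.append("    - scheme: ror")
            lines.append(f"      identifier: {json.dumps(row['ror'])}")
    return "\n".join(lines) + "\n"


# Made-up sample data; the ROR id is a placeholder, not a real identifier.
SAMPLE_CSV = """id,name,ror
org-a,Example University,0abcdef12
org-b,"Skolan för exempel, internal unit",
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
yaml_text = rows_to_vocabulary_yaml(rows)
print(yaml_text)
```

The same pattern could be repeated for people, subjects and funders, with one function per vocabulary so that each layout can be checked separately.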

Discussion points

  • Q: You can add, but you cannot delete… So that is a difference…
  • Q: Are all “funders” pre-seeded/loaded - right now Swecris funders - ROR provides a list of funders… and we can add Swecris and … What are the DiVA funders?
  • Q: How do we match DiVA keywords to controlled vocabularies?
  • Q: For Swedish keywords UKÄ and possibly a European one
  • Q: Is it possible to load name variations? Investigate.
  • Q: How would it be possible to use different UI views for different resource types for a) data entry b) curation?
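
On the keyword matching question, one simple starting point is exact matching after normalization, with everything that does not match set aside for manual curation. This is a minimal sketch and not a decided approach: the vocabulary is passed in as a plain dict of label to term id, and the labels and ids used in any call are up to the caller. Fuzzy matching or translation between Swedish and English labels is deliberately left out.

```python
import unicodedata


def normalize(label):
    """Apply Unicode NFKC, casefold, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", label).casefold()
    return " ".join(text.split())


def match_keywords(keywords, vocabulary):
    """Split keywords into matched ones (keyword -> term id) and unmatched ones.

    `vocabulary` maps a label to a term id; several labels (for example a
    Swedish and an English one) may point to the same id.
    """
    index = {normalize(label): term_id for label, term_id in vocabulary.items()}
    matched, unmatched = {}, []
    for keyword in keywords:
        term_id = index.get(normalize(keyword))
        if term_id is None:
            unmatched.append(keyword)
        else:
            matched[keyword] = term_id
    return matched, unmatched
```

The unmatched list is the interesting output: its size indicates how much free-text keyword material would need curation or a looser matching strategy.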

Migrating data from DiVA to Invenio

Example - migrate PID 1750711
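
Once a record like this has been mapped to Invenio metadata, it could be pushed through the Invenio REST API. The sketch below only builds the request for creating a draft (`POST /api/records` with a bearer token) and does not send it. The base URL and token in the usage note are placeholders, and the `access` and `files` settings are assumptions for a public, metadata-only draft.

```python
import json
import urllib.request


def build_create_draft_request(base_url, token, metadata):
    """Build (but do not send) a POST request that creates a draft record."""
    payload = {
        "access": {"record": "public", "files": "public"},
        "files": {"enabled": False},  # assumed: metadata-only draft, no uploads
        "metadata": metadata,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/records",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A call such as `build_create_draft_request("https://invenio.example.org", "TOKEN", metadata)` returns a request that can be sent with `urllib.request.urlopen`. The JSON response should contain the id of the new draft, which is then published in a separate call.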

Migration Plan

  • Existing tools
    • Time to research and evaluate migration tools, for example…
    • invenio-rdm-migrator (from Invenio Software)
    • more?
  • “Preseeding” Invenio from YAML files
    • Tool - CSV-to-YAML (possibly)
    • Load YAML files for affiliations/orgs, people/authors/creators, subjects, users and other “entities”
  • Tool - CSV to JSON-file
    • Cleaning
    • Untangle mixed vocabularies…
    • Rate limiting of calls - can be managed individually
    • Quality control - checks for things like character encoding etc… and content
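
The CSV-to-JSON step with cleaning and quality control could look roughly like the sketch below. It is illustrative only: the `PID` column name, the list of suspicious character sequences and the rule that a missing PID is an issue are all assumptions. The encoding check is a heuristic that flags typical traces of UTF-8 text read as Latin-1, so it can give false positives on legitimate text.

```python
import csv
import io
import json

# Heuristic markers of encoding damage (assumed list, extend as needed).
MOJIBAKE_MARKERS = ("Ã", "Â", "\ufffd")


def clean_row(row):
    """Strip whitespace and drop empty or non-string values from one CSV row."""
    return {k: v.strip() for k, v in row.items() if isinstance(v, str) and v.strip()}


def quality_issues(row):
    """Return human-readable issues found in a cleaned row."""
    issues = []
    for field, value in row.items():
        if any(marker in value for marker in MOJIBAKE_MARKERS):
            issues.append(f"{field}: possible encoding damage in {value!r}")
    if "PID" not in row:
        issues.append("PID: missing")
    return issues


def csv_to_json(csv_text):
    """Convert CSV text to a JSON string plus a report of issues per row.

    Report keys are CSV line numbers, assuming a header on line 1 and no
    fields that span several lines.
    """
    records, report = [], {}
    for line_no, raw in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        row = clean_row(raw)
        issues = quality_issues(row)
        if issues:
            report[line_no] = issues
        records.append(row)
    return json.dumps(records, ensure_ascii=False, indent=2), report
```

Keeping the report separate from the converted records makes it possible to load the clean rows first and hand the flagged ones to a curator.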

UI - Data Entry and Curation in Invenio