Agenda

  • Status update for the DAUF project
  • New ABM 2025 with overall results and updates
  • New beta version of subject-based KTH Research Information app
  • OpenAlex Database on Sunet
  • News related to data curation - new version of DiVA coming
  • Future directions and your questions and feedback

About the DAUF project

  • Creating services and tools for presenting research information data, improving data flows and connecting data sources within KTH
  • Agile model with 2-week sprints
  • Collaboration between KTH Library, RSO and ITA
  • Part of the IT portfolio for Research (Delportfölj forskning), within the object “Publicering och analys”

Status and progress update

Progress overview - since last demo

  • This year's version of ABM was released a couple of weeks ago

  • Recently released beta version of topic-based KTH Research Information

  • Assisting with graphs and summary for the KTH Indicator report, based on consolidated indicators collected from across KTH

  • Tests and prep for GDP 2.0 (Gemensamma dataprojektet) - new standard for Swedish project data

  • Work to use OpenAlex to update DiVA, and to construct bibliometric database

Annual Bibliometric Monitoring 2025

Changes in ABM 2025

  • More interactive graphs (Plotly)
  • Changed OA graph
  • Enabled selection of number of rows for co-publication tables
  • Some cosmetic changes

Brief ABM results for KTH

  • Number of publications seems to have stabilized
    • proceedings have returned to pre-COVID levels
  • Tendency for citation indicators to decrease
    • will be evaluated further
    • seems to be spread across several schools and subjects
  • Journal indicators stable but slightly increasing over the last 5 years
  • Small changes in co-publication patterns
  • Share of Open Access publications decreased sharply last year
    • reasons unclear at the moment; may partly be due to lagging green OA

KTH Research Information - Topics (beta)

Swedish bibliometrics & OpenAlex

Background

  • Bibliometric analysis has historically been based on commercial data sources (Web of Science, SciVal, InCites, Dimensions)
  • Sweden lacks a national effort or common system
  • The latest research bill points towards more openness in research evaluation

OpenAlex

  • Open resource, stemming from Microsoft Academic (closed 2021)
  • Web interface and API (JSON)
  • Harvesting from Crossref, PubMed, arXiv, Zenodo and local repositories

Current status

  • Server running on Sunet for snapshot download and data processing
  • Flattening of JSON to Parquet
  • Database using DuckDB
  • Access through database connections and SQL requests via a basic API
  • Model for long-term funding and development currently missing
  • Currently developed by KTHB bibliometrics and the DAUF project, together with KI
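Flattening turns nested OpenAlex JSON into flat rows that fit a columnar format such as Parquet. A minimal sketch, assuming the standard OpenAlex work fields (`id`, `doi`, `publication_year`, `authorships[].author`, `authorships[].institutions`); which fields the Sunet pipeline actually keeps is not stated, so the selection below is illustrative and `flatten_authorships` is a hypothetical helper:

```python
from typing import Any, Iterator


def flatten_authorships(work: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat row per (work, author, institution) of an OpenAlex work."""
    for position, authorship in enumerate(work.get("authorships") or [], start=1):
        author = authorship.get("author") or {}
        # An author without affiliations still yields one row, with null org fields
        for inst in authorship.get("institutions") or [{}]:
            yield {
                "work_id": work.get("id"),
                "doi": work.get("doi"),
                "publication_year": work.get("publication_year"),
                "author_position": position,
                "author_id": author.get("id"),
                "orcid": author.get("orcid"),
                "institution_id": inst.get("id"),
                "ror": inst.get("ror"),
            }
```

Rows like these could, for example, be collected with `pyarrow.Table.from_pylist` and written with `pyarrow.parquet.write_table`; DuckDB can then query the file directly in a `FROM` clause, as in `SELECT ror, count(*) FROM 'authorships.parquet' GROUP BY ror` (file name hypothetical).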

Data contents

About 200 million articles, 24 million book chapters, 10 million proceedings

Type                          Example
Basic bibliographic data      Title, publication channel, year, issue…
Authors                       Name, person ID (ORCID), affiliation
Organisations                 Address, organisation ID (ROR), organisation type
Funders                       Funder type, roles, grant number (sometimes)
Subject area                  Keywords, hierarchical topic structure, MeSH terms
Open Access                   Status, URLs
Sustainability class          SDG
Reference lists & citations   Citation counts, some basic indicators

DiVA works in OpenAlex

Preliminary/WIP: matching DiVA PIDs against OpenAlex IDs using DOIs/PMIDs; coverage:

  n_yes   n_no                   type_diva                           coverage           
                                                                                        
  65712    8544   Article in journal                         ██████████████████░░ 88 %  
  19163   18698   Conference paper                           ██████████░░░░░░░░░░ 51 %  
   2460    2881   Chapter in book                            █████████░░░░░░░░░░░ 46 %  
   1383      86   Article, review/survey                     ███████████████████░ 94 %  
    172     596   Book                                       ████░░░░░░░░░░░░░░░░ 22 %  
    157    4822   Manuscript (preprint)                      █░░░░░░░░░░░░░░░░░░░  3 %  
    152     313   Article, book review                       ███████░░░░░░░░░░░░░ 33 %  
     89     241   Collection (editor)                        █████░░░░░░░░░░░░░░░ 27 %  
     54    4109   Report                                     ░░░░░░░░░░░░░░░░░░░░  1 %  
     52     216   Conference proceedings (editor)            ████░░░░░░░░░░░░░░░░ 19 %  
     20     699   Other                                      █░░░░░░░░░░░░░░░░░░░  3 %  
      8      33   Data set                                   ████░░░░░░░░░░░░░░░░ 20 %  
      5    1281   Doctoral thesis, monograph                 ░░░░░░░░░░░░░░░░░░░░  0 %  
      4    5547   Doctoral thesis, comprehensive summary     ░░░░░░░░░░░░░░░░░░░░  0 %  
      1      54   Artistic output                            ░░░░░░░░░░░░░░░░░░░░  2 %  
      1    2369   Licentiate thesis, comprehensive summary   ░░░░░░░░░░░░░░░░░░░░  0 %  
      1   42086   Student thesis                             ░░░░░░░░░░░░░░░░░░░░  0 %  
      0     960   Licentiate thesis, monograph               ░░░░░░░░░░░░░░░░░░░░  0 %  
      0     290   Manuscript                                 ░░░░░░░░░░░░░░░░░░░░  0 %  
      0     621   Patent                                     ░░░░░░░░░░░░░░░░░░░░  0 %  

Future

  • Aim for common system and data flows, enabling more transparent and comparable analyses
  • Working on project idea with Vinnova, KI and VR
  • Evaluation of data
    • strengths and weaknesses
    • where is additional curation needed?
  • More filters for selection of source and work types
  • Bibliometric indicators
  • Multiple ways for access, depending on different needs and use cases

Data Curation

Data Curation - overview

  • Preparations are under way for migration to the upcoming new version of DiVA
    • Launch of the new DiVA (with API) has been postponed and has a new timeline
    • All records need review to meet new (not yet finalized) requirements
    • Updates are required for KTH curation tools and processes
  • Broader discussions on a future data lake for “KTH Works”
    • Publication data mirrored in a separate system under KTH's control
    • Ability to cross-reference other research outputs and auxiliary data from external sources
    • Revamped curation process: more automation and data enrichment from external sources, synced to the DiVA repository

Data Curation and data flows

Object storage (S3)

General Dataflow

+--------------------------------+
|                                |
|          Data Sources          |
|                                |
+--------------------------------+
                 |                
  Clean / Crosscheck / Transform  
                 v                
+--------------------------------+
|                                |
|          Curated Data          |
|                                |
+--------------------------------+
                 |                
           Write / POST           
                 v                
+--------------------------------+
|                                |
|    [S3] Bronze/Silver/Gold     |
|                                |
+--------------------------------+
                 |                
            Read / GET            
                 v                
+--------------------------------+
|                                |
|     Data Consumer / Client     |
|                                |
+--------------------------------+
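The Bronze/Silver/Gold stage follows the common "medallion" layering: raw data as harvested, then cleaned data, then consumer-ready aggregates. A toy sketch where an in-memory dict stands in for the S3 bucket; the object keys, the de-duplication rule and the per-year aggregate are hypothetical choices for illustration:

```python
import json
from collections import Counter

# Hypothetical stand-in for an S3 bucket: object key -> bytes
bucket: dict[str, bytes] = {}


def put(key: str, obj) -> None:
    """Write / POST: serialise an object to JSON and store it under key."""
    bucket[key] = json.dumps(obj).encode("utf-8")


def get(key: str):
    """Read / GET: load and deserialise the object stored under key."""
    return json.loads(bucket[key])


def run_pipeline(raw_records: list[dict]) -> None:
    # Bronze: raw records exactly as harvested from the source
    put("bronze/diva/records.json", raw_records)

    # Silver: cleaned and de-duplicated (trimmed titles, last record per pid wins)
    silver: dict[str, dict] = {}
    for rec in get("bronze/diva/records.json"):
        silver[rec["pid"]] = {**rec, "title": rec["title"].strip()}
    put("silver/diva/records.json", list(silver.values()))

    # Gold: consumer-ready aggregate (records per year; JSON object keys are strings)
    counts = Counter(rec["year"] for rec in get("silver/diva/records.json"))
    put("gold/diva/records_per_year.json",
        {str(year): n for year, n in sorted(counts.items())})
```

Against real S3-compatible storage, `put` and `get` would map to calls such as boto3's `put_object` and `get_object`.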

DiVA curation

The DAUF project now harvests KTH publication data from DiVA using the OAI-PMH protocol; the harvest regularly updates DuckDB databases that are openly available from object storage:

This is work in progress, jocularly codenamed “KaTHarsis”.

  • Harvest of KTH works in DiVA now available as relational database
  • Ambition to decouple importing and curation from DiVA in preparation for the new DiVA
  • Works can be curated and annotated using this database (so-called “stoplists”)
  • Preparations to use APIs to sync data between DiVA repository and this database

DiVA curation stats

Journal articles in DiVA 2015 - 2025

                                                                                              
    y     art_n_pi         pi          art_n_r          r                shr           pct    
                                                                                              
   2015        932   ░░░░                  879   ░░░░              ████████░░░░░░░    51 %    
   2016       1142   ░░░░░                1260   ░░░░░░            ███████░░░░░░░░    48 %    
   2017       1922   ░░░░░░░░░             855   ░░░░              ██████████░░░░░    69 %    
   2018       2570   ░░░░░░░░░░░░          683   ░░░               ████████████░░░    79 %    
   2019       3329   ░░░░░░░░░░░░░░░      1218   ░░░░░░            ███████████░░░░    73 %    
   2020       3240   ░░░░░░░░░░░░░░░       880   ░░░░              ████████████░░░    79 %    
   2021       2715   ░░░░░░░░░░░░░        1167   ░░░░░             ███████████░░░░    70 %    
   2022       2767   ░░░░░░░░░░░░░         898   ░░░░              ███████████░░░░    75 %    
   2023       3141   ░░░░░░░░░░░░░░░       734   ░░░               ████████████░░░    81 %    
   2024       2530   ░░░░░░░░░░░░          668   ░░░               ████████████░░░    79 %    
   2025       2670   ░░░░░░░░░░░░░         516   ░░                █████████████░░    84 %    
                                                                                              

DiVA curation stats …

Conference papers in DiVA 2015 - 2025

                                                                                              
    y     con_n_pi         pi          con_n_r          r                shr           pct    
                                                                                              
   2015        454   ░░░░░                 923   ░░░░░░░░░         █████░░░░░░░░░░    33 %    
   2016        675   ░░░░░░░               743   ░░░░░░░           ███████░░░░░░░░    48 %    
   2017        757   ░░░░░░░░              828   ░░░░░░░░          ███████░░░░░░░░    48 %    
   2018        844   ░░░░░░░░              596   ░░░░░░            █████████░░░░░░    59 %    
   2019        993   ░░░░░░░░░░            813   ░░░░░░░░          ████████░░░░░░░    55 %    
   2020        804   ░░░░░░░░              659   ░░░░░░░           ████████░░░░░░░    55 %    
   2021        816   ░░░░░░░░              670   ░░░░░░░           ████████░░░░░░░    55 %    
   2022        919   ░░░░░░░░░             486   ░░░░░             ██████████░░░░░    65 %    
   2023       1231   ░░░░░░░░░░░░          548   ░░░░░             ██████████░░░░░    69 %    
   2024       1182   ░░░░░░░░░░░░          299   ░░░               ████████████░░░    80 %    
   2025        799   ░░░░░░░░              415   ░░░░              ██████████░░░░░    66 %    
                                                                                              

DiVA curation stats …

Journal articles in 2025, by month

                                                                                                
     t      art_n_pi         pi          art_n_r          r                shr           pct    
                                                                                                
  2025-01        223   ░░░░                   59   ░                 ████████████░░░    79 %    
  2025-02        197   ░░░░                   39   ░                 ████████████░░░    83 %    
  2025-03        192   ░░░░                   30   ░                 █████████████░░    86 %    
  2025-04        241   ░░░░░                  28   ░                 ██████████████░    90 %    
  2025-05        151   ░░░                    57   ░                 ███████████░░░░    73 %    
  2025-06        187   ░░░░                  114   ░░                █████████░░░░░░    62 %    
  2025-07        756   ░░░░░░░░░░░░░░         44   ░                 ██████████████░    95 %    
  2025-08        203   ░░░░                   54   ░                 ████████████░░░    79 %    
  2025-09        236   ░░░░                   46   ░                 █████████████░░    84 %    
  2025-10        163   ░░░                    38   ░                 ████████████░░░    81 %    
  2025-11        121   ░░                      7                     ██████████████░    95 %    
                                                                                                

DiVA curation stats …

Conference papers in 2025, by month

                                                                                                
     t      con_n_pi         pi          con_n_r          r                shr           pct    
                                                                                                
  2025-01        148   ░░░░░░░░░░░            38   ░░░               ████████████░░░    80 %    
  2025-02         81   ░░░░░░                 14   ░                 █████████████░░    85 %    
  2025-03         92   ░░░░░░░                35   ░░░               ███████████░░░░    72 %    
  2025-04         90   ░░░░░░░                30   ░░                ███████████░░░░    75 %    
  2025-05         48   ░░░░                   21   ░░                ███████████░░░░    70 %    
  2025-06         22   ░░                     52   ░░░░              █████░░░░░░░░░░    30 %    
  2025-07        112   ░░░░░░░░               85   ░░░░░░            █████████░░░░░░    57 %    
  2025-08         38   ░░░                    59   ░░░░              ██████░░░░░░░░░    39 %    
  2025-09         71   ░░░░░                  48   ░░░░              █████████░░░░░░    60 %    
  2025-10         68   ░░░░░                  17   ░                 ████████████░░░    80 %    
  2025-11         29   ░░                     16   ░                 ██████████░░░░░    64 %    
                                                                                                

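The curation stats tables do not define `pi` and `r`, but their `pct` column is consistent with `pi / (pi + r)` rounded half-up to a whole percent (e.g. 2025-07 journal articles: 756 / (756 + 44) = 94.5 %, shown as 95 %). A small sketch for recomputing and cross-checking that column, assuming this reading of the columns; the function names are hypothetical:

```python
def share_pct(n_pi: int, n_r: int) -> int:
    """pi / (pi + r) as a whole percent, rounded half-up with integer arithmetic."""
    total = n_pi + n_r
    if total == 0:
        raise ValueError("no records in this row")
    return (200 * n_pi + total) // (2 * total)


def mismatches(rows: list[tuple[str, int, int, int]]) -> list[tuple[str, int, int, int]]:
    """Return (label, n_pi, n_r, pct) rows whose printed pct differs from the recomputed one."""
    return [row for row in rows if share_pct(row[1], row[2]) != row[3]]


# A few rows copied from the journal articles table
journal_articles = [("2015", 932, 879, 51), ("2022", 2767, 898, 75), ("2025", 2670, 516, 84)]
print(mismatches(journal_articles))  # → []
```

Integer arithmetic is used because Python's built-in `round()` rounds halves to even, so `round(94.5)` gives 94 rather than the 95 shown in the table.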
Curation tools for KTH Works

SQL workbench “Pond Pilot” (WebAssembly) for combining DiVA, OpenAlex and other data sources
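Combining sources in a SQL workbench largely comes down to joins on shared identifiers, with stoplists applied as anti-joins. A sketch using Python's built-in `sqlite3` as a self-contained stand-in for the DuckDB-based tooling; the table and column names (`diva_works`, `openalex_works`, `stoplist`) are hypothetical, and the query is plain SQL that should also run in DuckDB:

```python
import sqlite3

SCHEMA = """
CREATE TABLE diva_works (pid TEXT PRIMARY KEY, doi TEXT, type TEXT);
CREATE TABLE openalex_works (id TEXT PRIMARY KEY, doi TEXT, cited_by_count INTEGER);
CREATE TABLE stoplist (pid TEXT PRIMARY KEY, reason TEXT);
"""

# DiVA works matched to OpenAlex by DOI, skipping works flagged in the stoplist
MATCHED_NOT_STOPLISTED = """
SELECT d.pid, o.id AS openalex_id, o.cited_by_count
FROM diva_works AS d
JOIN openalex_works AS o
  ON lower(d.doi) = lower(replace(o.doi, 'https://doi.org/', ''))
WHERE NOT EXISTS (SELECT 1 FROM stoplist AS s WHERE s.pid = d.pid)
ORDER BY d.pid
"""


def connect() -> sqlite3.Connection:
    """Open an in-memory database with the hypothetical schema."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    return con


def matched_works(con: sqlite3.Connection) -> list[tuple]:
    return con.execute(MATCHED_NOT_STOPLISTED).fetchall()
```

Keeping the stoplist as its own table, rather than deleting records, leaves the harvested data intact and makes each exclusion reviewable together with its stated reason.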

GDP

GDP (Gemensamma data för projekt) is an initiative by several Swedish research funders to create a common data model for project data. The five funding agencies Energimyndigheten, Formas, Forte, Vetenskapsrådet and Vinnova are developing a standard that enables sharing of open data about funding and related information.

The standard is being developed in cooperation with a reference group that includes universities and other organisations in the university sector; KTH is a member of the reference group.

GDP data mobilization

Future work and discussion

Future work and directions

  • Evaluation of subject-based RI
  • Continued collaboration on OpenAlex

Related activities

  • KTH CRIS/RIMS
  • KTH Insights / datastyrning (MS Fabric/Power BI)

Questions and Answers

Please provide your input in chat or verbally.

  • Questions, suggestions or comments?

If you prefer to give feedback later, or think of questions after this demo, you are always welcome to email us at biblioteket@kth.se.

Thank you for attending!