Welcome To ReDrug-KG!
Messy data to Explainable Drug repurposing via Knowledge Graphs
Tutorial at ACM womENcourage™ 2025

Brașov, Romania, September 17-19, 2025

about

From Messy Data to Medical Insights: This tutorial demonstrates data wrangling techniques designed for the challenges of biomedical data. It shows how to transform fragmented pharmaceutical data into Knowledge Graphs that reveal hidden relationships between drugs, diseases, and biological mechanisms.

Traditional tabular data structures fail to capture the complex, interconnected relationships that are crucial for understanding drug mechanisms and disease pathways. Pharmaceutical information scattered across multiple databases, formats, and terminologies makes it difficult to identify potential therapeutic connections using conventional approaches.

This tutorial introduces our 'MedJsonify' methodology - a Python-based framework that simplifies the process of wrangling and integrating data from diverse pharmaceutical sources. Participants will learn practical techniques for entity resolution across multiple data sources, handling medical terminology inconsistencies, and creating knowledge representations that support novel drug discovery insights.
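As a rough illustration of what entity resolution across sources can involve, the sketch below normalizes drug names and groups records that refer to the same substance. It is not the MedJsonify API: the function names `normalize_name` and `resolve_entities`, the source labels, and the small `SYNONYMS` table are all hypothetical and chosen only for this example.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical synonym table for illustration only; a real pipeline would
# draw these mappings from curated vocabularies rather than a literal dict.
SYNONYMS = {
    "acetylsalicylic acid": "aspirin",
    "asa": "aspirin",
    "paracetamol": "acetaminophen",
}


def normalize_name(raw: str) -> str:
    """Lowercase, strip accents and punctuation, collapse spaces, map synonyms."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    text = re.sub(r"\s+", " ", text).strip()
    return SYNONYMS.get(text, text)


def resolve_entities(records):
    """Group (source, name) pairs under one canonical name."""
    groups = defaultdict(list)
    for source, name in records:
        groups[normalize_name(name)].append((source, name))
    return dict(groups)


# Illustrative records; the source labels are placeholders.
records = [
    ("source_a", "Aspirin"),
    ("source_b", "ACETYLSALICYLIC ACID"),
    ("source_c", "  aspirin. "),
    ("source_a", "Paracetamol"),
]
resolved = resolve_entities(records)
for canonical, mentions in resolved.items():
    print(canonical, len(mentions))
```

Exact matching after normalization is the simplest possible strategy. Handling misspellings or brand names would likely need fuzzy matching or identifier-based lookups on top of it.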

Using prepared datasets derived from public resources like DailyMed, Purple Book, and Orange Book, together with ontologies from DrugBank, the Disease Ontology (DO), ChEBI, and Orphanet, participants will practice key data transformation steps and visualization techniques within the Neo4j graph database environment, focusing on the most critical aspects of knowledge graph creation.
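One way to picture the transformation from tabular rows to graph data is to turn each row into a parameterized Cypher `MERGE` statement. The sketch below only builds the statements and does not connect to a database. The node labels `Drug` and `Disease`, the relationship type `TREATS`, and the row field names are assumptions made for this example, not the schema used in the tutorial, and the rows are illustrative rather than taken from the prepared datasets.

```python
# Cypher template: MERGE creates the nodes and the relationship only if
# they do not exist yet, so repeated loads do not produce duplicates.
# Labels and relationship type are hypothetical.
MERGE_TREATS = (
    "MERGE (d:Drug {name: $drug}) "
    "MERGE (x:Disease {name: $disease}) "
    "MERGE (d)-[:TREATS]->(x)"
)


def rows_to_statements(rows):
    """Turn tabular rows into (query, parameters) pairs, skipping blanks and repeats."""
    statements = []
    seen = set()
    for row in rows:
        drug = row["drug"].strip().lower()
        disease = row["indication"].strip().lower()
        if not drug or not disease or (drug, disease) in seen:
            continue
        seen.add((drug, disease))
        statements.append((MERGE_TREATS, {"drug": drug, "disease": disease}))
    return statements


# Illustrative rows with a duplicate and an incomplete entry.
rows = [
    {"drug": "Metformin", "indication": "Type 2 diabetes mellitus"},
    {"drug": "metformin ", "indication": "type 2 diabetes mellitus"},
    {"drug": "Imatinib", "indication": "Chronic myeloid leukemia"},
    {"drug": "", "indication": "Unknown"},
]
statements = rows_to_statements(rows)
for query, params in statements:
    print(params)
```

With the official Neo4j Python driver, each pair could then be passed to `session.run(query, params)` inside an open session. Keeping values in parameters rather than formatting them into the query string avoids quoting problems with drug and disease names.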


Participants will leave with hands-on experience in biomedical data wrangling and a functional knowledge graph that reveals drug-disease relationships invisible in traditional tabular formats, complete with Neo4j visualization capabilities and reusable code templates.

schedule

90-minute tutorial

  • 00:00 - 00:15 - Introduction to Knowledge Graphs for Drug Repurposing

    Challenges in biomedical data integration, advantages of graph-based representations, overview of public pharmaceutical data sources

  • 00:15 - 00:45 - Key Techniques for Biomedical Data Wrangling

    Named entity recognition (NER) for drugs and diseases, relationship extraction from unstructured text, demonstration of the MedJsonify methodology

  • 00:45 - 01:15 - Hands-on: Creating and Visualizing Drug-Disease Graphs

    Loading prepared datasets into Neo4j, writing basic Cypher queries, visualization techniques, identifying potential drug repurposing candidates

  • 01:15 - 01:30 - Q&A and Resources for Continued Learning

    Access to code templates and datasets, recommended tools and libraries, community resources and further reading
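To give a flavor of the hands-on goal of identifying potential drug repurposing candidates, here is a minimal in-memory sketch, assuming a graph with drug-target, disease-gene, and known drug-disease links. All entity names are placeholders, and the shared-target heuristic is only one simple possibility, not necessarily the approach used in the session.

```python
# Placeholder graph data; none of these names refer to real entities.
targets = {  # drug -> genes it targets
    "drug_a": {"gene_1", "gene_2"},
    "drug_b": {"gene_3"},
}
associated = {  # disease -> genes associated with it
    "disease_x": {"gene_1"},
    "disease_y": {"gene_1", "gene_2"},
    "disease_z": {"gene_3"},
}
treats = {("drug_a", "disease_x"), ("drug_b", "disease_z")}  # known indications


def repurposing_candidates(targets, associated, treats):
    """List (drug, disease, shared_genes) for pairs linked through a gene
    but not already connected by a known indication."""
    candidates = []
    for drug, drug_targets in targets.items():
        for disease, genes in associated.items():
            if (drug, disease) in treats:
                continue  # already a known indication, not a new candidate
            shared = drug_targets & genes
            if shared:
                candidates.append((drug, disease, sorted(shared)))
    # More shared genes first, then alphabetical for a stable order.
    candidates.sort(key=lambda c: (-len(c[2]), c[0], c[1]))
    return candidates


candidates = repurposing_candidates(targets, associated, treats)
print(candidates)
# [('drug_a', 'disease_y', ['gene_1', 'gene_2'])]
```

The shared genes returned with each candidate act as the explanation for the suggestion: they are the intermediate nodes on the path from drug to disease, which is the kind of connection that is hard to see in separate tables.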

Slides

Download all resources from the GitHub repository.

Tutorial slides will be made available later.

Matilde Pato

Biomedical Engineering

Carolina Pereira

Computer Science and Engineering

Nuno Datia

Computer Science

acknowledgements

This work is supported by UID/04516/NOVA Laboratory for Computer Science and Informatics (NOVA LINCS), IBEB Research Unit, UID/BIO/00645/2025, LASIGE Research Unit, ref. UID/00408/2025 with the financial support of Fundação para a Ciência e a Tecnologia, I.P.

Copyright © ReDrug-KG