ReDrug-KG

About

From Messy Data to Medical Insights: This tutorial demonstrates data wrangling techniques designed for biomedical data challenges, transforming fragmented pharmaceutical data into knowledge graphs that reveal hidden relationships between drugs, diseases, and biological mechanisms.

Traditional tabular data structures fail to capture the complex, interconnected relationships that are crucial for understanding drug mechanisms and disease pathways. Pharmaceutical information scattered across multiple databases, formats, and terminologies makes it difficult to identify potential therapeutic connections using conventional approaches.

This tutorial introduces our 'MedJsonify' methodology, a Python-based framework that simplifies wrangling and integrating data from diverse pharmaceutical sources. Participants will learn practical techniques for resolving entities across multiple data sources, handling inconsistencies in medical terminology, and creating knowledge representations that support novel drug discovery insights.
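As a rough illustration of what entity resolution across sources can involve, the sketch below groups drug names from different sources under a normalized key. It is not the MedJsonify API: the function names, the list of dropped words, and the sample records are all hypothetical, and real pipelines usually rely on curated identifiers rather than string cleanup alone.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny list of salt/dosage-form words to ignore.
# A real pipeline would use a curated vocabulary instead.
DROPPED_WORDS = {"hydrochloride", "hcl", "tablet", "tablets", "injection"}


def normalize_drug_name(name: str) -> str:
    """Lowercase the name, strip punctuation, and drop salt/form words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens if t not in DROPPED_WORDS]
    return " ".join(kept)


def resolve_entities(records):
    """Group (source, name) pairs that share the same normalized name."""
    groups = defaultdict(list)
    for source, name in records:
        groups[normalize_drug_name(name)].append((source, name))
    return dict(groups)


# Made-up sample records, for illustration only.
records = [
    ("DailyMed", "Metformin Hydrochloride Tablets"),
    ("Orange Book", "METFORMIN HCL"),
    ("DrugBank", "Metformin"),
    ("Purple Book", "Adalimumab"),
]
resolved = resolve_entities(records)
for key, members in resolved.items():
    print(key, "<-", members)
```

Here the three spellings of metformin collapse into one group, while adalimumab stays on its own. String normalization like this is only a first pass; ambiguous names still need identifier-based matching.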

Using prepared datasets derived from public resources such as DailyMed, Purple Book, and Orange Book, together with ontologies from DrugBank, the Disease Ontology (DO), ChEBI, and Orphanet, participants will practice key data transformation steps and visualization techniques within the Neo4j graph database environment, focusing on the most critical aspects of knowledge graph creation.
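To give a sense of how wrangled drug-disease pairs might reach Neo4j, here is a minimal sketch that prepares a parameterized Cypher statement and its rows. The node labels (`Drug`, `Disease`), the `TREATS` relationship, the property names, and the sample triples are assumptions made for illustration, not the schema used in the tutorial.

```python
# Cypher template with assumed labels and relationship type.
# MERGE avoids creating duplicate nodes or relationships on reload.
MERGE_TREATS = (
    "UNWIND $rows AS row "
    "MERGE (d:Drug {name: row.drug}) "
    "MERGE (s:Disease {name: row.disease}) "
    "MERGE (d)-[:TREATS {source: row.source}]->(s)"
)


def build_rows(triples):
    """Turn (drug, disease, source) tuples into deduplicated parameter rows."""
    seen = set()
    rows = []
    for drug, disease, source in triples:
        key = (drug.lower(), disease.lower(), source)
        if key in seen:
            continue  # skip case-insensitive duplicates from the same source
        seen.add(key)
        rows.append({"drug": drug, "disease": disease, "source": source})
    return rows


# Made-up sample triples, for illustration only.
triples = [
    ("Metformin", "Type 2 diabetes mellitus", "DailyMed"),
    ("metformin", "type 2 diabetes mellitus", "DailyMed"),
    ("Adalimumab", "Rheumatoid arthritis", "Purple Book"),
]
rows = build_rows(triples)
print(MERGE_TREATS)
print(rows)
```

With the official Neo4j Python driver, the statement and rows could then be sent in one call such as `session.run(MERGE_TREATS, rows=rows)`; the connection setup is omitted here because it depends on the local database.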

Participants will leave with hands-on experience in biomedical data wrangling and a functional knowledge graph that reveals drug-disease relationships invisible in traditional tabular formats, complete with Neo4j visualization capabilities and reusable code templates.

Schedule

90-minute tutorial

00:00 - 00:15 - Introduction to Knowledge Graphs for Drug Repurposing

Challenges in biomedical data integration, advantages of graph-based representations, overview of public pharmaceutical data sources

00:15 - 00:45 - Key Techniques for Biomedical Data Wrangling

Named entity recognition (NER) for drugs and diseases, relationship extraction from unstructured text, demonstration of the MedJsonify methodology

00:45 - 01:15 - Hands-on: Creating and Visualizing Drug-Disease Graphs

Loading prepared datasets into Neo4j, writing basic Cypher queries, visualization techniques, identifying potential drug repurposing candidates

01:15 - 01:30 - Q&A and Resources for Continued Learning

Access to code templates and datasets, recommended tools and libraries, community resources and further reading

Slides

Download all resources from the GitHub repository.

Tutorial slides will be made available later.

Our Amazing Team

Lisbon School of Engineering, Portugal

Matilde Pato

Biomedical Engineering

Carolina Pereira

Computer Science and Engineering

Nuno Datia

Computer Science

Brașov, Romania, September 17-19, 2025