Traditional tabular data structures fail to capture the complex, interconnected relationships that are crucial for understanding drug mechanisms and disease pathways. Pharmaceutical information scattered across multiple databases, formats, and terminologies makes it difficult to identify potential therapeutic connections using conventional approaches.
This tutorial introduces our 'MedJsonify' methodology, a Python-based framework that simplifies wrangling and integrating data from diverse pharmaceutical sources. Participants will learn practical techniques for resolving entities across multiple data sources, handling inconsistencies in medical terminology, and creating knowledge representations that support novel drug discovery insights.
Using prepared datasets derived from public resources such as DailyMed, the Purple Book, and the Orange Book, together with ontologies from DrugBank, the Disease Ontology (DO), ChEBI, and Orphanet, participants will practice key data transformation steps and visualization techniques in the Neo4j graph database environment. The session focuses on the most critical aspects of knowledge graph creation.
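To give a flavor of what entity resolution across sources can involve, here is a minimal sketch in Python. It is not the MedJsonify implementation: the function names, the list of salt terms, and the sample records are all hypothetical and chosen only for illustration. The idea is to reduce each drug name to a normalized key so that spelling variants from different sources fall into the same group.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of salt/form terms to ignore when matching names.
# A real pipeline would rely on a curated vocabulary instead.
SALT_TERMS = {"hydrochloride", "hcl", "sodium", "sulfate", "citrate"}


def normalize_name(name: str) -> str:
    """Lowercase, strip accents and punctuation, and drop common salt terms."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in SALT_TERMS]
    # Fall back to all tokens if the name consisted only of salt terms.
    return " ".join(kept or tokens)


def resolve_entities(records):
    """Group (source, name) records that share the same normalized key."""
    groups = defaultdict(list)
    for source, name in records:
        groups[normalize_name(name)].append((source, name))
    return dict(groups)


# Illustrative records only; not taken from the tutorial datasets.
records = [
    ("DailyMed", "Metformin Hydrochloride"),
    ("Orange Book", "METFORMIN HCL"),
    ("DrugBank", "Metformin"),
    ("Purple Book", "Adalimumab"),
]
resolved = resolve_entities(records)
for key, members in resolved.items():
    print(key, "<-", members)
```

Exact matching on a normalized key is the simplest possible strategy. It misses synonyms and brand names, which is where identifier cross-references from ontologies become useful.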
00:00 - 00:15 - Introduction to Knowledge Graphs for Drug Repurposing
Challenges in biomedical data integration, advantages of graph-based representations, overview of public pharmaceutical data sources
00:15 - 00:45 - Key Techniques for Biomedical Data Wrangling
Named entity recognition (NER) for drugs and diseases, relationship extraction from unstructured text, demonstration of the MedJsonify methodology
00:45 - 01:15 - Hands-on: Creating and Visualizing Drug-Disease Graphs
Loading prepared datasets into Neo4j, writing basic Cypher queries, visualization techniques, identifying potential drug repurposing candidates
01:15 - 01:30 - Q&A and Resources for Continued Learning
Access to code templates and datasets, recommended tools and libraries, community resources and further reading
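As a rough preview of the hands-on part, the sketch below prepares drug-disease pairs for loading into Neo4j with a parameterized Cypher statement. The node labels (`Drug`, `Disease`), the `TREATS` relationship type, the property names, and the sample pairs are assumptions made for this example, not the schema used in the tutorial.

```python
# Parameterized Cypher statement. Labels, relationship type, and property
# names are hypothetical; MERGE avoids creating duplicate nodes and edges.
MERGE_TREATS = (
    "UNWIND $rows AS row "
    "MERGE (d:Drug {name: row.drug}) "
    "MERGE (c:Disease {name: row.disease}) "
    "MERGE (d)-[r:TREATS]->(c) "
    "SET r.source = row.source"
)


def build_rows(pairs):
    """Turn (drug, disease, source) tuples into de-duplicated parameter rows.

    Pairs are compared case-insensitively; the first occurrence is kept.
    """
    seen = set()
    rows = []
    for drug, disease, source in pairs:
        key = (drug.lower(), disease.lower())
        if key in seen:
            continue
        seen.add(key)
        rows.append({"drug": drug, "disease": disease, "source": source})
    return rows


# Illustrative pairs only; not taken from the tutorial datasets.
pairs = [
    ("Metformin", "Type 2 diabetes mellitus", "DailyMed"),
    ("metformin", "type 2 diabetes mellitus", "Orange Book"),
    ("Adalimumab", "Rheumatoid arthritis", "Purple Book"),
]
rows = build_rows(pairs)
print(len(rows), "rows ready to load")

# With the official neo4j Python driver and a running database, loading
# would look roughly like this (URI and credentials are placeholders):
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#   with driver.session() as session:
#       session.run(MERGE_TREATS, rows=rows)
#   driver.close()
```

Passing the rows as a query parameter, rather than formatting values into the Cypher string, keeps the statement reusable and avoids quoting problems in drug and disease names.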
Tutorial slides will be made available later.