I'm constructing a biomedical knowledge graph from data I collected from different open sources. All values in each node are unique, and there are no duplicate rows in the relationships (verified thoroughly).
These are my nodes: assays, cells, clinicals, compounds, disorders, drugs, foods, genes, metabolites, organisms, pathways, peptides, proteins, targets, therapeutics.
These are my relationships: cell_FROM_species, clinical_IS_ASSOCIATED_disorder, clinical_IS_ASSOCIATED_drug, compound_IS_ASSOCIATED_protein, drug_CAUSES_disorder, drug_INTERACTS_target, food_IS_ASSOCIATED_compound, metabolite_IS_ASSOCIATED_pathway, peptide_TESTED_IN_assay, peptide_BINDS_TO_protein, peptide_IS_ASSOCIATED_therapeutics, protein_IS_ASSOCIATED_disorder, protein_IS_ASSOCIATED_gene, protein_COMES_FROM_organism, protein_IS_EXPRESSED_IN_pathway.
I imported the data with neo4j-admin import using the command below (it is long, so I only show a sample):
C:/Users/mypc/.Neo4jDesktop/relate-data/dbmss/bin/neo4j-admin import --database=db1 --nodes=import/assays.csv --nodes=import/cells.csv --nodes=import/clinicals.csv --………………………………………. --relationships=import/ cell_FROM_species.csv --relationships=import/ clinical_IS_ASSOCIATED_disorder.csv …………………………………………………………………………--multiline-fields=true
I ended up with this schema, in which I can see that some new relationships have been created between nodes. For example:
- peptide IS_ASSOCIATED with compound, which I never specified.
- protein IS_ASSOCIATED with compound, but I gave the opposite direction: compound IS_ASSOCIATED with protein.
- compound IS_ASSOCIATED with compound (the same node). Why does this exist?
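One thing I have not ruled out yet: values are unique within each node file, but the same ID value could still appear in two different node files. As far as I understand, neo4j-admin import treats all node IDs as one global ID space unless the headers declare ID spaces such as `:ID(Peptide)`, so a shared value could make a relationship resolve to a node of another type. Below is a minimal sketch of how I could check for such overlaps. The function name `shared_ids` and the sample IDs are hypothetical and only for illustration; the real IDs would come from the ID column of each node CSV.

```python
from itertools import combinations


def shared_ids(node_ids):
    """Return {(file_a, file_b): sorted shared IDs} for each pair of node files that overlap."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(node_ids.items(), 2):
        common = set(ids_a) & set(ids_b)
        if common:
            overlaps[(name_a, name_b)] = sorted(common)
    return overlaps


# Hypothetical sample data, not taken from my real files.
sample = {
    "peptides": ["1", "2", "3"],
    "compounds": ["3", "4"],
    "proteins": ["P1", "P2"],
}

for pair, ids in shared_ids(sample).items():
    print(pair, ids)
```

If this reports any pair for the real files, the unexpected relationships might come from ID collisions rather than from the relationship files themselves. That is only a guess on my part.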
Can someone point out where I'm going wrong? Thanks in advance.
#neo4j-admin #relationships