Fastest way to read in a DirectoryTree?

Thypari
Node

I am reading a directory tree from disk. While traversing the directories, I stream them via IEnumerable directly to my asynchronous database methods. Because the writes are asynchronous, I can't rely on the order (parent before child directory), so I create all the nodes first and connect them with relationships in a later step. This uses all the Neo4j threads and seems a lot faster than writing synchronously.

But it's still really slow.

Would it be faster to write the nodes to disk first, e.g. into a CSV file, and then bulk-insert them into Neo4j?

Any suggestions would be appreciated.

1 ACCEPTED SOLUTION

charlotte_skard
Graph Buddy

Hi Thypari,

If I were you, I would collect the directories into batches of up to about 2,000 and then make one query call per batch, using UNWIND to process the whole batch in one go.

For example, if creating a Dir was:

CREATE (:Directory $param)

it would now be:

UNWIND $param AS dir
CREATE (d:Directory) SET d = dir

That's typically a lot faster.
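
As a rough illustration of the batching step, here is a minimal sketch of a helper that splits a streamed sequence into lists of a fixed maximum size. The names `Batching` and `InBatches` are made up for this example and are not part of the Neo4j driver. On .NET 6 or later, the built-in `Enumerable.Chunk` does the same job.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Batching
{
    // Lazily splits a sequence into lists of at most batchSize items.
    // Hypothetical helper for illustration; not a driver API.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        // Because this is an iterator, the check runs when enumeration starts.
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, partially filled batch if there is one.
        if (batch.Count > 0) yield return batch;
    }
}
```

Each list returned by `InBatches(directories, 2000)` would then be passed as `$param` to the UNWIND query above, one query call per batch.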

All the best

Chris


3 REPLIES


Do you still have to convert all user-defined types to Dictionaries?

Directory { long size, string name, ShareInformation shareInformation }
ShareInformation {string someProperty1, int someProperty2}

So I can't just pass a List<Directory> into the Cypher query because it contains a ShareInformation property? Does the same go for built-in types like GUIDs?

Is the recommended approach still to convert objects into nested Dictionaries?

charlotte_skard
Graph Buddy

Yep, unfortunately so. You'd need to convert the objects before passing them in, something like this:

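using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Neo4j.Driver;

async Task Main()
{
	var directory = new DirectoryInfo("d:\\Projects\\");
	var directories = directory
		.GetDirectories()
		// Size here is the number of direct children (files plus subdirectories), not bytes.
		.Select(d => new Directory { Name = d.Name, Size = d.GetFiles().Length + d.GetDirectories().Length, ShareInformation = new ShareInformation { PropInt = 1, PropString = d.FullName } })
		.ToList();

	// A single UNWIND query creates one node per element of $directories.
	var query = new Query(
	@"UNWIND $directories AS dir 
	  CREATE (d:Directory) SET d = dir", 
	  new Dictionary<string, object> { 
	  	{ "directories", ConvertToDriverFormatFromCollection(directories)}
	  });
	
	using var driver = GraphDatabase.Driver("neo4j://localhost:7687", AuthTokens.Basic("neo4j", "neo"), config => config.WithEncryptionLevel(EncryptionLevel.None));
	var session = driver.AsyncSession();
	try
	{
		var cursor = await session.RunAsync(query);
		await cursor.ConsumeAsync();
	}
	finally
	{
		// Release the session's connection even if the query fails.
		await session.CloseAsync();
	}
}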

public IEnumerable<IDictionary<string, object>> ConvertToDriverFormatFromCollection<T>(IEnumerable<T> items)
{
	return items.Select(i => ConvertToDriverFormat(i));
}

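public IDictionary<string, object> ConvertToDriverFormat<T>(T item)
{
	// Keep only readable properties that are value types or strings.
	// Nested objects such as ShareInformation are dropped, since a node property cannot hold a map.
	return item.GetType().GetProperties()
		.Where(i => i.CanRead && (i.PropertyType.IsValueType || i.PropertyType == typeof(string)))
		.ToDictionary(x => x.Name, x => x.GetValue(item));
}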

public class Directory
{
	public long Size { get; set; }
	public string Name { get; set; }
	public ShareInformation ShareInformation { get; set; }
}

public class ShareInformation
{
	public string PropString { get; set; }
	public int PropInt { get; set; }
}
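
The converter above leaves out the nested ShareInformation object, because a Neo4j node property cannot hold a map. If those values should end up on the node anyway, one option is to flatten nested objects into prefixed keys and to turn GUIDs into strings. The following is only a sketch under a few assumptions: the `DriverFormat.Flatten` name and the `_` key separator are made up for this example, the object graph has no cycles or collections, and value types other than primitives and GUIDs (DateTime, decimal, enums) are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class DriverFormat
{
    // Flattens an object into a single-level dictionary.
    // A nested property such as ShareInformation.PropInt becomes the key
    // "ShareInformation_PropInt". Hypothetical helper, not a driver API.
    public static IDictionary<string, object> Flatten(object item, string prefix = "")
    {
        var result = new Dictionary<string, object>();
        foreach (var prop in item.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
        {
            // Skip write-only properties and indexers.
            if (!prop.CanRead || prop.GetIndexParameters().Length > 0) continue;

            var value = prop.GetValue(item);
            if (value == null) continue; // null values are left out of the map

            var key = prefix + prop.Name;
            var type = value.GetType();

            if (value is Guid guid)
                result[key] = guid.ToString();      // store GUIDs as strings
            else if (value is string || type.IsPrimitive)
                result[key] = value;                // strings, numbers, booleans
            else if (type.IsClass)
                foreach (var pair in Flatten(value, key + "_"))
                    result[pair.Key] = pair.Value;  // nested object: recurse with a prefix
            // Any other value type is skipped in this sketch.
        }
        return result;
    }
}
```

With this in place, `directories.Select(d => DriverFormat.Flatten(d)).ToList()` could be passed as the `directories` parameter instead of the output of ConvertToDriverFormatFromCollection.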