Barebones HTTP requests much faster than python neo4j-driver and py2neo?

Hey all,

I was testing the speed of neo4j-driver (1.6.2 and 1.7.1) and py2neo (4.1.3) and I found that the simple HTTP requests that I was doing are 2-5 times faster for medium sized queries and up. I'll take you through what I did, so we can hopefully figure out what's going on here and when it makes sense to use the libraries.

Now that I hopefully have your attention, let's take a step back and give you some background info. When I started working with neo4j, I learned about it by sending JSON HTTP requests to the API at the hostname:7474/db/data/transaction/commit endpoint. I like knowing the guts of what I'm dealing with, so processing the raw JSON responses works well for me, and over time I've added my own thin wrapper around the python requests library for some quality of life improvements.
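For readers who haven't used that endpoint, here is a minimal sketch of what such a request could look like. It is not my actual wrapper: the function names are made up for illustration, it uses the standard library's urllib instead of requests so that it has no dependencies, and it assumes a server without authentication.

```python
import json
import urllib.request


def build_commit_request(host, statement, parameters=None):
    """Build (but do not send) a POST for the transactional commit endpoint."""
    url = f"http://{host}:7474/db/data/transaction/commit"
    payload = {
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON body (needs a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no server; only send() touches the network.
request = build_commit_request("localhost", "MATCH (n) RETURN count(n)")
print(request.get_method(), request.full_url)
```

Calling send(request) against a running instance returns the raw JSON document that the rest of this post refers to.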

I saw that some colleagues are using py2neo, so I wondered if using a driver library would make sense for me. For one, they provide interactions through bolt, which sounds to me like it would be more efficient than sending raw JSON. Also, the results are a bit nicer to work with (I got a bit bored of writing for datum in response['results'][0]['data']), but the downside is that it seems a bit awkward to execute many queries in a single transaction (e.g. one type of query but with varying parameters).

Setup

So I set up a small test bench. In python (3.7) I created a script that iterates through the different drivers and test scenarios. It executes the queries 10 times and averages the result (while also showing the time for each individual execution of a test scenario). The graph for these tests is a copy of our 'production' database, with on the order of millions of nodes, running version 3.0.6 of neo4j (outdated yes, but it's realistic for my use case). As my driver libraries I tested py2neo 4.1.3, neo4j-driver 1.6.2 (which comes with py2neo) and neo4j-driver 1.7.1.
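The measuring loop itself is nothing special. A rough sketch of the idea, with hypothetical names (time_scenario, format_result) and a dummy workload in place of a real query so that it runs without a database:

```python
import time


def time_scenario(run_query, repetitions=10):
    """Run `run_query` repeatedly and return each duration in milliseconds."""
    timings_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def format_result(name, timings_ms):
    """Format one result line: the average first, then every individual run."""
    average = sum(timings_ms) / len(timings_ms)
    individual = ", ".join(f"{t:.2f}" for t in timings_ms)
    return f"{name:<19} average {average:.2f}ms ({individual})"


# Stand-in workload (an assumption for this sketch); a real run would pass a
# function that executes one test scenario against the database.
timings = time_scenario(lambda: sum(range(1000)))
print(format_result("dummy workload", timings))
```

Keeping the individual timings next to the average makes it easy to spot a slow first repetition or a single outlier.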

I created 3 scenarios:

    • retrieve the total number of nodes in the graph. This is a simple 'ping' to see how fast a single query with a single (pre-calculated) answer returns a result
    • retrieve nodes with a specific label, run with a limit of either 10, 100, 1k or 10k nodes in the response.
    • retrieve nodes by an indexed property, the property is passed as a parameter (so the query can be cached), executed as transactions containing either 10, 100 or 1k queries
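To make the third scenario concrete: with the HTTP API, many parameterised queries in one transaction are just one request body with a list of statements. A sketch under assumptions: the label and property names are invented, and the `{user_id}` placeholder is the parameter syntax accepted by 3.x servers (newer servers use `$user_id`).

```python
import json


def build_batch_payload(statement, parameter_sets):
    """One request body holding the same statement once per parameter set."""
    return {
        "statements": [
            {"statement": statement, "parameters": parameters}
            for parameters in parameter_sets
        ]
    }


# Hypothetical label and indexed property, chosen only for illustration.
LOOKUP = "MATCH (n:User {user_id: {user_id}}) RETURN n"

payload = build_batch_payload(LOOKUP, [{"user_id": i} for i in range(100)])
body = json.dumps(payload)
print(len(payload["statements"]), "statements in one request body")
```

Because the statement text is identical for every entry and only the parameters vary, the server can reuse the cached query plan.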

Note: I figured that the difference between reading and writing would be mostly down to neo4j, and not down to the library that I used to execute the commands, so I didn't go through the trouble of generating data to push.

Note 2: I chose the upper limits for scenarios 2 and 3 based on how long I had to wait for the 10 repetitions. I could've gone orders of magnitude larger on each scenario, but I didn't feel like waiting minutes.

Results

Let me start by saying that I know that the order in which drivers are tested probably matters because of caching. I indeed see the response times dropping after the first query. This only seems to make a difference of a few ms though, which isn't so important when we look at the queries that return more than 10 results.

Scenario 1

Average request time in milliseconds after 10 repetitions.
HTTP requests wrapper: 2.2ms
py2neo (bolt): 0.5ms
neo4j-driver (1.6.2): 1.0ms
neo4j-driver (1.7.1): 0.7ms

As you can see, it's close together, but there's no question that py2neo and neo4j-driver are always faster than going through the HTTP requests wrapper. If we need many different, single queries then it would make a big difference, but the difference isn't human noticeable for a few queries.

Scenario 2

Average request time in milliseconds, after 10 repetitions. Response times given in order for response limits of 10, 100, 1k and 10k nodes.
HTTP requests wrapper: 3.0, 4.8, 14.3, 95.1
py2neo (bolt): 1.6, 5.6, 42.9, 435.5
neo4j-driver (1.6.2): 2.3, 5.2, 29.0, 283.2
neo4j-driver (1.7.1): 2.5, 4.5, 30.8, 296.3

This came as a huge surprise to me. It seems that the simple HTTP requests are a lot faster than both other libraries, and also faster than py2neo running over HTTP (results not shown, as it was a bit slower than py2neo over bolt). The difference between the simple requests and neo4j-driver is a factor of 2.5-3x and py2neo is 4.5x slower. The driver libraries seem to scale linearly above 1k nodes, whereas the requests scale better than linear.
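The factors can be recomputed from the averages listed for this scenario. A small sketch (the helper names are mine) that derives the slowdown relative to the HTTP wrapper and the growth from the 1k to the 10k limit:

```python
# Average times in ms from the Scenario 2 results, for limits of
# 10, 100, 1k and 10k nodes.
timings_ms = {
    "HTTP requests wrapper": [3.0, 4.8, 14.3, 95.1],
    "py2neo (bolt)": [1.6, 5.6, 42.9, 435.5],
    "neo4j-driver (1.6.2)": [2.3, 5.2, 29.0, 283.2],
    "neo4j-driver (1.7.1)": [2.5, 4.5, 30.8, 296.3],
}


def slowdown_vs_baseline(timings, baseline="HTTP requests wrapper"):
    """Ratio of each library's time to the baseline, per response limit."""
    base = timings[baseline]
    return {
        name: [t / b for t, b in zip(values, base)]
        for name, values in timings.items()
        if name != baseline
    }


def growth_last_step(timings):
    """How much slower the 10k limit is than the 1k limit (10x more nodes)."""
    return {name: values[-1] / values[-2] for name, values in timings.items()}


for name, ratios in slowdown_vs_baseline(timings_ms).items():
    print(f"{name:<22} slowdown at 10k: {ratios[-1]:.1f}x")
for name, growth in growth_last_step(timings_ms).items():
    print(f"{name:<22} 1k -> 10k growth: {growth:.1f}x")
```

A growth factor close to 10 for ten times as many nodes means roughly linear scaling; a factor clearly below 10 means better than linear.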

Scenario 3

Average request time in milliseconds, after 10 repetitions. Response times given in order for 10, 100 and 1k queries in 1 transaction.
HTTP requests wrapper: 2.8, 7.5, 44.0
py2neo (bolt): 2.8, 26.6, 243.0
neo4j-driver (1.6.2): 2.9, 14.0, 147.3
neo4j-driver (1.7.1): 3.3, 17.6, 170.8

Again, quite a big difference between the simple HTTP requests and the libraries. The neo4j-drivers are around 3.5x slower than simple requests, and py2neo is a whopping 5.5x slower. Again the driver libraries seem to scale linearly with the size of the test.

Discussion

My first instinct was that the driver libraries present the results in a more usable way (list of dicts) than the raw JSON response through the HTTP API. So I built some additional logic to see how costly this transformation is. The result is that it takes 5-10% longer to return the nice list of dictionaries.
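For reference, the transformation I mean is along these lines. This is a sketch rather than my exact code: the function name is invented, and the sample response is hand-written in the shape the transactional endpoint returns (a "results" list where each entry has "columns" and "data", and each datum has a "row").

```python
def rows_as_dicts(response):
    """Turn a raw transactional-endpoint response into one list of dicts per statement."""
    per_statement = []
    for result in response["results"]:
        columns = result["columns"]
        per_statement.append(
            [dict(zip(columns, datum["row"])) for datum in result["data"]]
        )
    return per_statement


# Hand-written sample response, not real data from my graph.
sample = {
    "results": [
        {
            "columns": ["name", "age"],
            "data": [{"row": ["Ann", 31]}, {"row": ["Bob", 27]}],
        }
    ],
    "errors": [],
}

print(rows_as_dicts(sample)[0])
```

For a request with a single statement you would take element 0 of the returned list; with several statements you get one inner list per statement.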

So, what is going on here? I imagine that the version of neo4j might play a role. Also the driver libraries might do some additional fancy processing, which can be nice but also costly if you just want to retrieve a lot of nodes and their properties.

Thanks a lot for the feedback.
I think the difference is mostly in the network & stream processing code. For requests this will go down to C code, while the bolt drivers are doing it in python.

It would be good if you could re-run it in 3.5.x to see if the server makes a difference.

Also as far as I can remember the python driver has an option to use the "compile python to C" thingy, not sure if that's activated by default.

In the future we might switch to the c-connector (seabolt) as an optional underlying connector for other languages too (like python) which should alleviate this issue.

Hey hey,

I copied the data that I was testing with to the current docker version of neo4j:3.5 and the results are very similar. This is only tested with neo4j-driver 1.6.2, because this is the version that py2neo 4.1.3 requires, so I can test both drivers with the same virtual env.

For completeness I now also include the results of each repetition, to get a better feel of the variation between them (and see the effect of the 1st query).

For all scenarios I added the result for the variant of HTTP requests where the result is transformed from the raw JSON to a list of dicts for single queries per transaction and a list of a list of dicts for multiple queries per transaction. You'll see that it's faster for simple queries because it can use caching from the raw-response test that was run just before it, but for bigger queries it takes a small hit.

For scenario 2 I added results for a limit of 100k nodes in the response

For scenario 3 I added results for 'naively' performing the queries each in their own transaction with the py2neo library and the HTTP requests. py2neo actually doesn't perform all that badly, unlike HTTP requests, which is a total disaster if you take the naive approach. I guess that's the price of setting up a new HTTP connection over and over.

Scenario 1, retrieve total node count
	HTTP requests       average 3.69ms (14.26, 2.69, 3.12, 2.04, 2.02, 2.07, 2.23, 2.29, 3.20, 2.97)
	HTTP requests dict  average 2.44ms (2.77, 1.91, 1.97, 2.19, 2.45, 2.80, 2.56, 2.46, 2.58, 2.73)
	py2neo-bolt         average 0.61ms (1.57, 0.56, 0.50, 0.51, 0.50, 0.50, 0.52, 0.51, 0.46, 0.46)
	py2neo-http         average 1.52ms (2.50, 1.86, 1.76, 1.84, 1.71, 1.13, 1.05, 1.17, 1.14, 1.08)
	neo4j-driver        average 0.85ms (1.27, 0.60, 0.61, 0.62, 0.58, 0.54, 2.24, 0.57, 0.97, 0.50)

Scenario 2, retrieve all nodes with a specific label, limit 10
	HTTP requests       average 2.94ms (3.82, 3.61, 3.54, 2.87, 2.09, 2.13, 2.43, 2.59, 3.18, 3.15)
	HTTP requests dict  average 2.83ms (3.71, 2.31, 2.37, 2.19, 1.89, 2.05, 3.11, 4.12, 3.51, 3.03)
	py2neo-bolt         average 0.95ms (1.96, 0.80, 0.93, 0.79, 0.82, 0.97, 0.97, 0.71, 0.83, 0.67)
	py2neo-http         average 1.90ms (2.04, 2.51, 2.10, 2.10, 2.18, 1.69, 1.47, 1.22, 1.57, 2.07)
	neo4j-driver        average 1.19ms (2.39, 1.12, 1.16, 0.98, 1.01, 0.97, 1.07, 1.03, 1.16, 0.96)

Scenario 2, retrieve all nodes with a specific label, limit 100
	HTTP requests       average 3.18ms (4.45, 3.62, 3.67, 3.43, 3.11, 2.80, 2.70, 2.59, 2.56, 2.88)
	HTTP requests dict  average 3.61ms (4.35, 3.88, 3.95, 3.45, 2.82, 2.95, 2.81, 3.55, 4.19, 4.18)
	py2neo-bolt         average 5.29ms (11.01, 6.69, 4.00, 4.38, 5.53, 4.66, 4.09, 4.04, 3.66, 4.86)
	py2neo-http         average 5.78ms (9.05, 5.64, 5.12, 5.15, 5.17, 5.26, 4.84, 4.86, 5.43, 7.26)
	neo4j-driver        average 3.40ms (4.41, 3.77, 3.74, 3.76, 3.42, 3.26, 2.84, 2.88, 2.97, 2.94)

Scenario 2, retrieve all nodes with a specific label, limit 1k
	HTTP requests       average 11.68ms (21.70, 8.75, 8.52, 8.86, 9.07, 9.47, 9.73, 10.22, 10.06, 20.44)
	HTTP requests dict  average 10.77ms (13.03, 9.85, 9.51, 9.35, 9.15, 9.21, 8.86, 9.25, 20.41, 9.07)
	py2neo-bolt         average 36.86ms (52.81, 34.70, 35.79, 30.40, 30.35, 30.24, 30.22, 58.32, 36.02, 29.78)
	py2neo-http         average 43.15ms (44.46, 38.60, 38.10, 38.32, 76.22, 42.08, 37.55, 38.31, 40.00, 37.83)
	neo4j-driver        average 27.01ms (41.80, 25.98, 26.46, 25.23, 25.65, 25.36, 26.14, 25.27, 24.18, 24.00)

Scenario 2, retrieve all nodes with a specific label, limit 10k
	HTTP requests       average 97.91ms (118.27, 101.47, 99.13, 93.75, 94.61, 92.37, 105.23, 92.71, 91.13, 90.40)
	HTTP requests dict  average 101.82ms (98.47, 93.92, 114.48, 100.92, 98.12, 96.43, 94.82, 114.12, 101.78, 105.17)
	py2neo-bolt         average 405.88ms (537.93, 316.30, 320.56, 276.29, 537.42, 468.17, 279.08, 546.40, 495.03, 281.60)
	py2neo-http         average 711.81ms (755.73, 710.34, 726.62, 732.32, 706.83, 763.08, 640.70, 670.36, 681.37, 730.70)
	neo4j-driver        average 265.41ms (292.06, 274.46, 259.34, 261.72, 256.50, 258.42, 257.39, 272.20, 261.35, 260.72)

Scenario 2, retrieve all nodes with a specific label, limit 100k
	HTTP requests       average 1261.74ms (1349.77, 1276.86, 1302.08, 1256.24, 1188.57, 1224.66, 1235.40, 1270.25, 1242.93, 1270.70)
	HTTP requests dict  average 1405.74ms (1407.45, 1426.14, 1354.19, 1435.64, 1372.40, 1462.19, 1383.32, 1416.45, 1398.17, 1401.40)
	py2neo-bolt         average 5400.10ms (6125.14, 4507.51, 5423.34, 5682.48, 6076.66, 5330.21, 4629.22, 5446.57, 4215.68, 6564.16)
	py2neo-http         average 8652.15ms (8579.51, 8528.91, 8868.20, 8682.91, 8513.26, 8761.23, 8653.73, 8673.72, 8606.97, 8653.10)
	neo4j-driver        average 3084.31ms (3216.46, 3189.50, 3053.85, 3034.93, 3069.99, 3010.55, 3003.43, 3060.32, 3119.85, 3084.24)

Scenario 3, 10 queries in 1 transaction
	HTTP requests       average 2.47ms (3.48, 3.25, 3.36, 2.29, 1.98, 2.21, 2.05, 2.09, 1.98, 2.01)
	HTTP requests dict  average 3.43ms (4.34, 3.54, 3.16, 2.63, 2.96, 3.63, 3.42, 3.46, 3.61, 3.55)
	HTTP requests naive average 19.6ms (27.48, 21.54, 18.54, 16.77, 16.18, 20.57, 17.61, 18.90, 18.00, 19.94)
	py2neo-bolt         average 4.16ms (6.10, 5.25, 3.51, 4.14, 3.47, 4.00, 3.68, 4.16, 3.48, 3.85)
	py2neo-bolt-naive   average 3.78ms (4.34, 3.87, 3.90, 3.31, 3.54, 3.45, 4.43, 3.86, 3.61, 3.50)
	py2neo-http         average 4.31ms (6.59, 4.39, 4.37, 5.29, 3.89, 3.60, 3.48, 3.92, 3.79, 3.81)
	neo4j-driver        average 3.09ms (3.98, 3.93, 3.89, 3.93, 3.45, 2.53, 2.29, 2.43, 2.34, 2.10)

Scenario 3, 100 queries in 1 transaction
	HTTP requests       average 7.74ms (9.36, 7.85, 7.77, 6.69, 6.87, 6.70, 7.16, 7.98, 9.86, 7.18)
	HTTP requests dict  average 8.46ms (10.77, 10.32, 8.02, 6.97, 6.89, 6.45, 7.37, 10.21, 10.53, 7.08)
	HTTP requests naive average 196.5ms (208.14, 201.61, 213.52, 197.20, 183.17, 188.99, 178.00, 193.86, 200.97, 199.24)
	py2neo-bolt         average 23.41ms (35.94, 24.34, 20.70, 23.06, 21.63, 22.00, 20.55, 19.49, 23.58, 22.81)
	py2neo-bolt-naive   average 38.09ms (41.16, 39.47, 40.22, 38.37, 33.41, 40.30, 40.21, 36.62, 36.01, 35.10)
	py2neo-http         average 19.41ms (25.11, 17.31, 17.69, 19.93, 19.69, 17.91, 20.26, 17.98, 20.10, 18.12)
	neo4j-driver        average 12.96ms (15.12, 12.38, 12.29, 12.74, 13.25, 12.83, 12.33, 12.76, 12.35, 13.55)

Scenario 3, 1k queries in 1 transaction
	HTTP requests       average 65.89ms (266.10, 52.71, 39.58, 38.81, 53.00, 39.07, 39.89, 53.34, 38.16, 38.20)
	HTTP requests dict  average 47.20ms (42.49, 50.50, 40.36, 39.03, 53.77, 62.25, 47.82, 43.14, 52.22, 40.40)
	HTTP requests naive average 1910.4ms (1934.91, 1887.20, 1867.37, 1874.45, 1945.37, 1945.95, 1912.87, 1899.61, 1894.48, 1941.94)
	py2neo-bolt         average 257.31ms (301.57, 218.61, 288.41, 250.30, 243.90, 277.40, 259.34, 239.23, 274.16, 220.14)
	py2neo-bolt-naive   average 389.61ms (427.79, 365.23, 365.64, 432.78, 350.89, 404.32, 371.76, 376.86, 431.84, 368.98)
	py2neo-http         average 204.95ms (222.85, 219.66, 198.05, 154.01, 222.29, 219.18, 221.86, 208.73, 170.42, 212.47)
	neo4j-driver        average 140.06ms (144.31, 121.12, 149.71, 149.34, 122.40, 161.14, 121.69, 157.97, 122.00, 150.92)

Also, I just noticed that if you're married to py2neo, then it's better to use it with the http scheme if you're firing multiple queries in a single transaction, whereas the bolt scheme is preferable for retrieving many nodes in 1 query.

I have now also tested the neo4jrestclient library (v 2.1.1), and it's pretty similar to the HTTP requests, except for larger queries where it's almost exactly a factor 2 slower.

I've also tested what happens when I add the 'X-Stream: true' header in the HTTP request (as suggested in the developer documentation), but for the scenarios above this doesn't make a difference.

X-Stream: true was only needed for the old (now removed) REST API.

The transactional endpoint streams automatically (in and out) (but you should stream-process the results to benefit from that).

neo4jrestclient is no longer maintained though.

And you can use the tx endpoint with multiple transactions.

You can start a tx by posting against /db/data/transaction
then you get back a tx id (url) that you continue to post against /db/data/transaction/<id>
until at the end you finish with /db/data/transaction/<id>/commit or /db/data/transaction/<id>/rollback
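A hedged sketch of that flow in Python, assuming the same /db/data/ prefix as the commit endpoint from the opening post and assuming the reply to the opening request carries a "commit" URL for the new transaction. The function names are hypothetical, there is no authentication, and error handling and rollback are left out. The `post` argument exists only so the flow can be exercised without a server.

```python
import json
import urllib.request


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON reply (needs a server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def transaction_url_from_commit(commit_url):
    """Derive the transaction URL by dropping the trailing '/commit'."""
    suffix = "/commit"
    if not commit_url.endswith(suffix):
        raise ValueError(f"unexpected commit URL: {commit_url}")
    return commit_url[: -len(suffix)]


def run_in_one_transaction(base_url, statements, post=post_json):
    """Open a transaction, send each statement in its own request, then commit."""
    opened = post(f"{base_url}/db/data/transaction", {"statements": []})
    tx_url = transaction_url_from_commit(opened["commit"])
    results = []
    for statement in statements:
        reply = post(tx_url, {"statements": [statement]})
        results.extend(reply["results"])
    post(f"{tx_url}/commit", {"statements": []})
    return results
```

Usage would be something like run_in_one_transaction("http://localhost:7474", [{"statement": "RETURN 1"}]). Note that this costs one HTTP round trip per statement, on top of the requests that open and commit the transaction.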

Hi @jjbankert

Thank you massively for the time and effort you've put into this. There's a lot of interesting data to sift through there and it's extremely valuable for us, and no doubt for other people too.

Firstly, it's worth pointing out that the original Bolt project was never about trying to outperform HTTP. The main focus, at least during the 3.x series, has been to embed a clean, Cypher-like type system throughout the stack (which JSON isn't ideal for) as well as promote usage of Neo4j in a number of significant non-Java ecosystems by providing officially supported drivers. Indeed HTTP is an extremely mature technology with a lot of high-performing implementations; we would have had to go a long way to beat it!

That said, we shouldn't settle for performance differences of the scale that you identify (and that we have seen internally as well). We definitely need to make sure we nail performance properly during the 4.x series of Neo4j.

It's worth noting a couple of things. There are a lot of variables at play that can affect performance; to that end, it's not always clear where to optimise. There are huge differences by client language, by size and shape of result and by the data types used (integers and nodes are significantly different on the wire, for example). There are also differences in how transactions are used and when network sync occurs. And then of course there's routing, which is available for clusters with Enterprise Edition. This brings extra processing complexity, and doesn't have a direct equivalent in any of the available HTTP drivers.

You may have seen our new Seabolt project (https://github.com/neo4j-drivers/seabolt). This is a low-level C Connector library that we've introduced as a high-performance component on which drivers can be built. This already powers our Go driver and we plan to underpin the Python driver with it as well in an upcoming release.

We're also working on plans for future versions of the Bolt protocol itself. We can definitely take on board the extensive information you've provided here as part of those designs.

All that said, it's still probably not unreasonable to assert that there are a multitude of reasons to choose one driver over another. Raw performance can certainly be one of those reasons, but isn't always the bottleneck. Type safety, feature set, usability, maturity, availability of support, documentation, existing skill sets, network policies and licensing are just some of the possible reasons to pick one over the other. Hopefully there's an option for everyone, and over time we can improve the overall experience across the board!

Thanks again

Nigel

I'll check out seabolt!

I agree that there are a multitude of reasons to choose one driver over another. My use-case is to perform big updates on many nodes at once. Batch queries that return 10k-100k nodes and batch updates of 10k-100k statements are pretty standard for me. I was hoping that there would be a faster/better maintained option than my solution with HTTP requests, but my current conclusion is that raw post requests in a single transaction outperform all the drivers that I tested. I'm thinking about open sourcing my driver, but it's only 50 lines of code, so shouldn't be too hard to build for yourself either.

As suggested, I also tested performing scenario 3 (executing 10^x queries) with multiple statements each in their own request, but all together in a single transaction. This is different from the 'HTTP requests naive' result in the sense that I now maintain a single transaction instead of performing each query in a separate transaction. This performs even worse though, because for some reason there are times when it slows down a lot (even when running it 10 times more).

Data:

	HTTP requests              took 42.5ms (50.12, 35.68, 31.74, 30.64, 32.42, 33.02, 33.19, 68.16, 45.67, 64.58)
	HTTP requests naive        took 2239.2ms (2305.16, 2330.41, 2262.66, 2224.94, 2233.65, 2204.41, 2224.47, 2223.69, 2176.05, 2206.72)
	HTTP requests transaction  took 4734.4ms (2293.53, 3181.96, 9448.37, 9627.36, 9914.09, 3024.93, 2967.48, 2230.68, 2457.48, 2197.90)

With regard to the X-Stream header, that explains why I didn't see any difference haha. Maybe the documentation on the HTTP API should be updated then (since that's where I found the suggestion)?

Is there somewhere I can see your test code?

Summary

I performed some similar testing for API endpoints, comparing a py2neo Bolt + Lambda/Chalice app in AWS against the barebones HTTP request, and found the following results. The request being tested pulls entire subgraphs of up to 4 hops away from a specific Node Id in the database. The Ids are indexed and most subgraphs are below 1000 nodes. I believe the last graph is the most telling about the performance difference between these two APIs.

Results

[4 images not preserved]

Results Summary and Outcomes

Metric                                                 Lambda/Chalice  Native Http  Best Method
Average time across all observations                   0.62666         0.30027      Native Http
Average Time of the sample means (s)                   0.62666         0.30027      Native Http
Number of users not in graph or with runtime error is  6871            6945
The average cliquesize of the observations was         60.7148         144.287      Native Http
The average cliquesize of the sample means was         60.7148         144.287      Native Http
The fastest sample mean time was                       0.43533         0.21119      Native Http
The fastest time was                                   0.40276         0.19285      Native Http
The largest cliquesize was                             40912           141456
The number of users with cliquesize 1 is               38768           38949
The percent of users with cliquesize <= 5 is           0.92956         0.92728
The percent of users with cliquesize <= 25 is          0.97772         0.97612
The percent of users with cliquesize <= 50 is          0.98422         0.9831
The percent of users with cliquesize <= 75 is          0.9872          0.98562
The percent of users with cliquesize <= 150 is         0.98956         0.98786
The percent of users with cliquesize <= 300 is         0.99072         0.9894
The percent of users with cliquesize > 3000 is         0.00374         0.00518
The slowest sample mean time was                       1.74987         31.6964      Lambda/Chalice
The slowest time was                                   35.637          76.9479      Lambda/Chalice
The smallest cliquesize was                            0               0
The std.dev cliquesize of the observations was         1066.65         2619.64      Native Http
The std.dev cliquesize of the sample means was         150.148         354.536      Native Http
The std.dev time across all observations was           1.36392         1.59279      Lambda/Chalice
The std.dev time of the sample means was               0.19972         0.99563      Lambda/Chalice

[2 images not preserved]

Conclusion

The native HTTP REST API is significantly faster than Lambda/Chalice for accessing the Identity Graph. Despite the HTTP call being more verbose than a more cleanly designed API such as the Lambda/Chalice app, if latency is the main concern then using the native HTTP REST API will be faster. The experiments also indicate that the size of the cliques returned has a direct effect on response time, hence further reducing hyper nodes and limiting graph expansion is important to lower latency.

Interesting stuff @benjamin.squire!

I'm in the final stages of open-sourcing my driver (got the go-ahead from management), so expect that soon. First time making 'production' python code, so lots to figure out, and of course there's the problem of coming up with a good name.

@technige somehow I missed your question. I've put the most important code into a single file on pastebin.
The code is written for python 3.6+ (due to f-strings). Also it doesn't contain comments or documentation, but hopefully it has meaningful enough names. This is not 'production' code, but I think it could help. Please also give me feedback on code quality if you notice anything (through pm if it's not useful to a broad audience), as I'm interested in improving!
Test 'framework' (under MIT licence): https://pastebin.com/wP5u4Yk0

As promised, our project is now open-source and available: Announcing neo4j-connector 1.0.0 (python 3.5+)

Cool, make sure to also share it in #projects-collaboration