Math calculations are incorrect- Math is wrong

07-27-2020 01:50 PM

Hi,

Playing with NLP and doing some simple metric calculations for TF-IDF (term frequency - inverse document frequency).

The query and calculations are very straightforward.

The problem is that the answers from Neo4j are wrong.

```
MATCH (:patent{num:'7547899'})<-[r:Is_in]-(a:Word)-[c:Is_in]->(b:patent)
RETURN a.term, r.count as Num, count(b), sum(c.count), r.count*(log(358/count(b))) as TFidf ORDER BY Num DESC
```

The first item is pretty obvious, but I double-checked the others. I exported the results and recalculated them within JMP (my standard statistical package). Using:

The results are close except for the first entry, but I was expecting much closer given it is a simple calculation.

Andy

5 REPLIES

07-28-2020 03:25 AM

I wonder if there's a rounding error somewhere in the Neo4j calculation. Sometimes I find that I need to multiply by 1.0 to make it do the calculation in floating point rather than with integers.

So if we take the numbers from the first row:

Whereas with the `*1.0`

Without the `*1.0`, the integer division truncates the ratio before `log` is applied.

Update your query to read like this:

```
MATCH (:patent{num:'7547899'})<-[r:Is_in]-(a:Word)-[c:Is_in]->(b:patent)
RETURN a.term, r.count as Num, count(b), sum(c.count), r.count*(log(358*1.0/count(b))) as TFidf ORDER BY Num DESC
```
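The effect of the `*1.0` can be checked in isolation with a query that touches no graph data. This is a minimal sketch, assuming the first row has `count(b)` = 225 (the figure quoted elsewhere in this thread); the column aliases are made up for illustration, and `toFloat()` is shown only as an alternative way to force floating-point division.

```cypher
// Both operands are integers, so the division is truncated before log() sees it.
RETURN 358 / 225               AS intQuotient,      // 1
       log(358 / 225)          AS logOfInt,         // 0.0
       // Introducing a float operand switches to floating-point division.
       358 * 1.0 / 225         AS floatQuotient,    // roughly 1.5911
       log(358 * 1.0 / 225)    AS logOfFloat,       // roughly 0.4644
       // toFloat() should give the same result as multiplying by 1.0.
       log(toFloat(358) / 225) AS logViaToFloat
```

In the TF-IDF query itself, `count(b)` is an integer, so either `358*1.0/count(b)` or `toFloat(358)/count(b)` should avoid the truncation.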

07-28-2020 06:11 AM

Hi Mark,

Thanks, that resolved it. I was more concerned by the first returned value, since the difference seemed much greater than just rounding error. The fix seems to have resolved both concerns.

Thanks

Andy

07-28-2020 06:44 AM

07-28-2020 08:05 AM

If your hypothesis is correct, I am guessing this calculation does integer math to get 358/225 = 1 and then does log(1), which equals 0. It looks like all other count(b) values are less than half of 358, so they returned a non-zero number.

Andy
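The integer-math guess above can be worked through by hand. This is a sketch under stated assumptions: `count(b)` = 225 for the first row (the figure quoted in this thread), Cypher's `/` truncates when both operands are integers, and `log` is the natural logarithm. The symbol $n$ is notation introduced here for `count(b)`.

```latex
\begin{align*}
\text{integer math:}\quad
  & \left\lfloor \tfrac{358}{225} \right\rfloor = 1,
  \qquad \ln 1 = 0 \\
\text{floating point:}\quad
  & \tfrac{358}{225} \approx 1.591,
  \qquad \ln 1.591 \approx 0.464 \\
\text{for } 1 \le n \le 179:\quad
  & \left\lfloor \tfrac{358}{n} \right\rfloor \ge 2,
  \qquad \ln \left\lfloor \tfrac{358}{n} \right\rfloor \ge \ln 2 \approx 0.693
\end{align*}
```

So any row with `count(b)` of at most 179 stays non-zero under truncation, but its ratio is still rounded down to a whole number before the log is taken. That would be consistent with the other rows being close to, but not exactly matching, the values recalculated outside Neo4j.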

08-17-2020 02:20 AM
