Is Neo4j the Best Choice for a Dictionary or Foreign Language App?

I am building a foreign language app. I want to build a database which will store connections and relationships between foreign language words, audio, pictures, icons, and word meanings.

How do I know if Neo4j is the best option for this?
Is there a website where I could ask this question and get unbiased opinions?

Phillippe Talbot
Founder & CEO
Fonetic About Language
Language at the Speed of a Synaptic Spark

Yes, you can do this, as long as you have data that translates the
foreign language into the local language.

// Create (or reuse) the Italian word and its local-language meaning, then link them
MERGE (a:Italian {word: "xyz"})
MERGE (b:LocalLanguage {meaning: "pst"})
MERGE (a)-[:IN_LOCAL_LANGUAGE]->(b)
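MERGE matches an existing pattern or creates it if it is missing, so running the same statements twice does not duplicate the word or its link. The sketch below mimics that get-or-create behaviour with plain Python dictionaries. It is an in-memory analogy, not the Neo4j driver, and the class name `TinyGraph` is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TinyGraph:
    """Hypothetical in-memory stand-in that mimics Cypher MERGE (get-or-create)."""
    nodes: dict = field(default_factory=dict)  # (label, key, value) -> node id
    rels: set = field(default_factory=set)     # (start id, rel type, end id)

    def merge_node(self, label: str, key: str, value: str) -> int:
        # Reuse the node if one with this label and property already exists.
        ident = (label, key, value)
        if ident not in self.nodes:
            self.nodes[ident] = len(self.nodes)
        return self.nodes[ident]

    def merge_rel(self, start: int, rel_type: str, end: int) -> None:
        # A set ignores duplicates, much as MERGE will not create a second identical relationship.
        self.rels.add((start, rel_type, end))


g = TinyGraph()
for _ in range(2):  # run the same "script" twice to show it is idempotent
    a = g.merge_node("Italian", "word", "xyz")
    b = g.merge_node("LocalLanguage", "meaning", "pst")
    g.merge_rel(a, "IN_LOCAL_LANGUAGE", b)

print(len(g.nodes), len(g.rels))  # 2 1
```

By contrast, CREATE in Cypher would add new nodes and relationships on every run.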

Hi @phillip.talbot

Different cultures divide up meaning differently in their languages.
Right now, what I am most interested in is translation.
The same idea cannot simply be translated word for word on a one-to-one basis.
As an experiment, I am trying to translate between English, Japanese, and Klingon using Neo4j.
I believe that a graph-based implementation would produce better translations.
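One possible way to model translations that are not one-to-one (an assumption, not something stated in this post) is to link each word to language-neutral "concept" nodes instead of linking words directly to each other. The sketch below uses English and Japanese only: English "rice" covers both uncooked rice (米) and cooked rice (ご飯), while ご飯 also means "meal". The concept names, the `EXPRESSES` edge list, and `translate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical edges: (word, language) -[:EXPRESSES]-> concept.
EXPRESSES = [
    (("rice", "en"), "rice_grain_uncooked"),
    (("rice", "en"), "rice_cooked"),
    (("meal", "en"), "meal"),
    (("米", "ja"), "rice_grain_uncooked"),
    (("ご飯", "ja"), "rice_cooked"),
    (("ご飯", "ja"), "meal"),
]

by_word = defaultdict(set)     # (word, language) -> concepts it expresses
by_concept = defaultdict(set)  # concept -> (word, language) pairs that express it
for word, concept in EXPRESSES:
    by_word[word].add(concept)
    by_concept[concept].add(word)


def translate(text: str, src: str, dst: str) -> dict:
    """Return {concept: sorted target-language words}, traversing word -> concept -> word."""
    result = {}
    for concept in sorted(by_word[(text, src)]):
        targets = sorted(w for w, lang in by_concept[concept] if lang == dst)
        if targets:
            result[concept] = targets
    return result


print(translate("rice", "en", "ja"))  # {'rice_cooked': ['ご飯'], 'rice_grain_uncooked': ['米']}
print(translate("ご飯", "ja", "en"))  # {'meal': ['meal'], 'rice_cooked': ['rice']}
```

Grouping results by concept keeps the reason for each alternative visible, which a flat word-to-word list would lose.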

There are not many examples of Neo4j in this field, but I think GraphAware's NLP and Hume can be helpful.

Interesting idea, and you might be right, but I think you also need to look at AI-based NLP.
The "graph" has its "gates". Neo4j has fast reads, but Elasticsearch might be better for the job. Neo4j lags behind on writes, and I think that is a serious problem. In my view it is simply not good enough for general use. It is good enough for almost static data, where everything is inserted up front and you can afford a day or two for the load. So if your dictionary never changes, go for it, but if you want your users to insert entries, it might be a big no.

Hi @fssrepository

I think Elasticsearch is a good product.
I've used it in combination with Neo4j.

Because Neo4j provides ACID transactions, it is at a disadvantage in write speed compared with NoSQL databases that do not.
However, I think ACID guarantees are necessary for safe writes.
I have tried Neo4j for IoT and network device log collection.
There are tens of millions of records, and new records arrive at millisecond intervals.
Even so, it handled the load without any problems.

I think you can make it run faster with careful Cypher query tuning, indexing, and memory tuning.
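One commonly used Cypher tuning technique for write-heavy loads (not mentioned in the post, so treat it as a suggestion) is to send many rows per transaction with `UNWIND` rather than one statement per row. The sketch below only prepares the batches and the parameterised query string. The label `Word`, its properties, and the batch size of 1,000 are assumptions, and actually sending each batch to the database would go through the official Neo4j driver, which is not shown.

```python
from typing import Iterable, Iterator

# Hypothetical parameterised query: one statement handles a whole list of rows.
UPSERT_WORDS = (
    "UNWIND $rows AS row "
    "MERGE (w:Word {name: row.name, lang: row.lang})"
)


def batches(rows: Iterable[dict], size: int = 1000) -> Iterator[list]:
    """Yield lists of at most `size` rows so each transaction stays a manageable size."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


rows = ({"name": f"word{i}", "lang": "it"} for i in range(2500))
plan = [(UPSERT_WORDS, {"rows": b}) for b in batches(rows)]
print([len(params["rows"]) for _, params in plan])  # [1000, 1000, 500]
```

Each `(query, parameters)` pair would then be run in its own transaction; the best batch size depends on memory settings and is worth measuring.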

I think you need to do a whiteboard exercise where you draw out the connections between "entities" and their relationships and how they will be used.

Neo4j is probably a good choice, because words in different languages have a many-to-many relationship. (Graph databases are excellent at dealing with many-to-many relationships.)

For example, the English word "fall" has a multitude of meanings. Is it the season? The act of going from high to low? Or one of its other meanings (e.g. a hairpiece)? And of course, some words have different meanings as verbs and as nouns.

I think the bigger issue is how you actually want to connect your underlying data and how you expect it to be used.

Do you have multiple nodes with the identical name "fall"? Or do you have one "fall" node with relationships to all the different meanings as separate nodes, and then have those nodes point to the various foreign-language translations?

For example, if you query the English word "fall", do you want a list of all the different French words it could translate to?
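The second option above (one "fall" node pointing to separate sense nodes, which in turn point to translations) can be sketched with plain Python objects standing in for nodes and relationships. The `Sense` structure, the glosses, and the French words chosen for each sense (automne, chute, tomber, postiche) are illustrative assumptions; the optional part-of-speech filter shows how verb and noun senses can be told apart.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sense:
    """Hypothetical sense node: one meaning of a word, with its part of speech."""
    gloss: str
    pos: str
    translations: dict = field(default_factory=dict)  # language code -> words


# One "fall" word node with relationships to several sense nodes.
LEXICON = {
    ("fall", "en"): [
        Sense("autumn (season)", "noun", {"fr": ["automne"]}),
        Sense("a drop from high to low", "noun", {"fr": ["chute"]}),
        Sense("to move from high to low", "verb", {"fr": ["tomber"]}),
        Sense("hairpiece", "noun", {"fr": ["postiche"]}),
    ],
}


def translations(word: str, src: str, dst: str, pos: Optional[str] = None) -> list:
    """Traverse word -> senses -> translations, optionally filtering by part of speech."""
    found = []
    for sense in LEXICON.get((word, src), []):
        if pos is None or sense.pos == pos:
            found.extend(sense.translations.get(dst, []))
    return found


print(translations("fall", "en", "fr"))          # ['automne', 'chute', 'tomber', 'postiche']
print(translations("fall", "en", "fr", "verb"))  # ['tomber']
```

Keeping senses as separate nodes means a translation is always attached to a specific meaning, rather than to the ambiguous spelling "fall".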

Another complication I can imagine is idioms, where a string of words forms a phrase with an unexpected meaning.
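One way to handle idioms (an assumption, not something the thread settles) is to store each idiom as its own phrase node that points to its words in order and carries its own meaning. The matcher below scans a tokenised sentence for such phrases; the idiom list and the function name `find_idioms` are hypothetical.

```python
# Hypothetical phrase nodes: an ordered word sequence plus its non-literal meaning.
IDIOMS = {
    ("kick", "the", "bucket"): "to die",
    ("break", "the", "ice"): "to ease initial social tension",
}


def find_idioms(sentence: str) -> list:
    """Return sorted (start index, idiom, meaning) tuples for every idiom in the sentence."""
    tokens = sentence.lower().split()
    hits = []
    for phrase, meaning in IDIOMS.items():
        n = len(phrase)
        # Slide a window of the idiom's length across the tokens.
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                hits.append((i, " ".join(phrase), meaning))
    return sorted(hits)


print(find_idioms("He told a joke to break the ice"))
# [(5, 'break the ice', 'to ease initial social tension')]
```

This naive whitespace tokenisation misses inflected forms ("kicked the bucket") and phrases followed by punctuation; a lemmatising NLP library would be needed for those.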

One advantage of Neo4j is that the environment is very flexible, so you can quickly rejigger your schema when you figure out something isn't working for you.

I do suggest trying something out on a small scale to see what sorts of challenges you face. It doesn't make sense to import the OED into a schema and then discover that you've overlooked something.


Excuse me if you (the OP) know this already, but...

I'm starting to get more into NLP. One useful NLP library I've come across is spaCy. Its makers also have a component called "sense2vec", which addresses some of what I was talking about: word ambiguity. (spaCy also supports multiple languages to varying degrees.)

See: Sense2vec with spaCy and Gensim · Explosion

So, one thing that could be useful is to combine spaCy with Neo4j in some clever way.

Further thought: if you do use spaCy and need to store some of its binary data, you might not want to store that binary data within Neo4j itself. See: storing binary objects in neo4j
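A common pattern for such binary data (and for the audio and pictures mentioned in the original question) is to keep the bytes in a file system or object store and put only a small reference on the node. The sketch below stores bytes under their SHA-256 hash in a local directory and returns the properties a node might carry. The directory layout and the property names `sha256`, `path`, and `size` are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


def store_blob(data: bytes, root: Path, suffix: str = "") -> dict:
    """Write bytes to a content-addressed file and return small node properties pointing to it."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest[:2] / (digest + suffix)  # shard by hash prefix to keep directories small
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # identical content is written only once
        path.write_bytes(data)
    return {"sha256": digest, "path": str(path), "size": len(data)}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    props = store_blob(b"fake audio bytes", root, ".mp3")
    again = store_blob(b"fake audio bytes", root, ".mp3")
    print(props["size"], props == again)  # 16 True
```

The returned dictionary is small enough to set as properties on, say, an audio node, while the graph itself stays lean.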

Another thought:

You can use the part of speech as a node label, which might make searching for a word faster:

E.g.

MATCH (v:Verb {name: 'fall'}) RETURN v

Since Neo4j nodes can have multiple labels (a great but little-known feature), you could create:

CREATE (v:Verb:English {name: 'fall'})
...
MATCH (v:English {name: 'fall'}) RETURN v // Matches 'fall' whether it is labeled Verb or Noun
...
MATCH (n {name: 'demand'}) RETURN n // No label: matches Verb or Noun in any language, FR or EN (but cannot use a label index, so it scans all nodes)
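To see why these multi-label queries behave as described: a node pattern with labels matches any node that has at least those labels, and a pattern with no label matches on properties alone. The sketch below mimics that rule over a list of (labels, properties) pairs. The node data are hypothetical and simply mirror the examples above, and this is an analogy, not how Neo4j executes queries.

```python
# Hypothetical nodes: each has a set of labels and a property map.
NODES = [
    ({"Verb", "English"}, {"name": "fall"}),
    ({"Noun", "English"}, {"name": "fall"}),
    ({"Verb", "English"}, {"name": "demand"}),
    ({"Noun", "French"}, {"name": "demand"}),
]


def match(labels: set, props: dict) -> list:
    """Mimic a Cypher node pattern: all requested labels present and all properties equal."""
    return [
        (node_labels, node_props)
        for node_labels, node_props in NODES
        if labels <= node_labels  # subset test: extra labels on the node are fine
        and all(node_props.get(k) == v for k, v in props.items())
    ]


print(len(match({"Verb"}, {"name": "fall"})))     # 1
print(len(match({"English"}, {"name": "fall"})))  # 2
print(len(match(set(), {"name": "demand"})))      # 2
```

The empty label set matches every node before the property filter runs, which is why the label-less query is the most expensive of the three.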