How to write neo4j in python with neo4j-spark-connector

Hi,

I am using a spark cluster for batch processing now. I just want to ask if anyone knows how to write Neo4j with neo4j-spark-connector in python. I read the repo README GitHub - neo4j-contrib/neo4j-spark-connector: Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs, I can't find anything on how to do the writing in python, neither in scala.

Appreciate your help, thanks.

Shuai

Hi Shui,

I believe we can't use the neo4j spark connector in python.
for python, we can do this with the help of pyspark and we can integrate neo4j.

Spark-neo4j connector basically compatible with java and scala so if you want to do coding in java/scala then you must use this connector otherwise pyspark is a better option.

Please let me know if some confusions are there

Thanks for your reply Kunal.

Could you tell me how should I use pyspark for writing data into neo4j. Now I have 200,000,000 relationships to write into the database, but can't find a way to make the writing faster.

Right now I am using neo4j bolt driver for python, my data only has 2 columns, I am using this Cypher query to write data:

query='''WITH $names AS nested
UNWIND nested AS x
MERGE (w:Patent {name: x[0]})
MERGE (n:Patent {name: x[1]})
MERGE (w)-[r:NIHAO]-(n)
'''

Hi Shui,

Please find below code that can help you to write data in neo4j

sc = SparkContext()
batch = []
max = None
processed = 0

def writeBatchData(b):
    
    session = driver.session()
    session.run('UNWIND {batch} AS row CREATE (n:Node {v: row})', {'batch': b})
    session.close()

def write2neo(v):
    batch.append(v)
    global processed
    processed += 1
    if len(batch) >= 1000 or processed >= max:
        writeBatchData(batch)
        batch[:] = []

dt = sc.parallelize(range(1, 100000))
max = dt.count()
dt.foreach(write2neo)

Thank you very much Kunal! I'll have a try~

Hi Kunal,

I tried the code your provided above but got the error below

PicklingError: Could not serialize object: TypeError: Cannot serialize socket object
``
Do you have any idea how should I fix this?

Thanks,
Shuai

Please post full error and code

def writeBatchData(b):
       session = gradb.session()
       session.run('UNWIND {batch} AS row CREATE (n:Node {v: row})', {'batch': b})
       session.close()
   
   uri = "****"
   gradb=GraphDatabase.driver(uri, auth=(os.environ['NEO4JUSER'], os.environ['NEO4JPASS']), encrypted=False)
   
   sc = SparkContext()
   
   dt = sc.parallelize(range(1, 1000))
   
   dt.foreach(writeBatchData)
Traceback (most recent call last):
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 710, in dumps
    return cloudpickle.dumps(obj, pickle_protocol)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 1097, in dumps
    cp.dump(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 357, in dump
    return Pickler.dump(self, obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
    save(tmp[0])
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
    save(tmp[0])
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
    save(tmp[0])
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 496, in save_function
    self.save_function_tuple(obj)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 890, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 890, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 656, in save_reduce
    self._batch_appends(listitems)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
    save(tmp[0])
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
  File "/home/ubuntu/anaconda3/lib/python3.7/socket.py", line 192, in __getstate__
    raise TypeError("Cannot serialize socket object")
TypeError: Cannot serialize socket object
Traceback (most recent call last):
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 710, in dumps
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 1097, in dumps
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 357, in dump
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 843, in _batch_appends
    save(x)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
    save(tmp[0])
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
    save(tmp[0])
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 501, in save_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 819, in save_list
    self._batch_appends(obj)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
    save(tmp[0])
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 496, in save_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 730, in save_function_tuple
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 890, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 890, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 656, in save_reduce
    self._batch_appends(listitems)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 846, in _batch_appends
    save(tmp[0])
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ubuntu/anaconda3/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
  File "/home/ubuntu/anaconda3/lib/python3.7/socket.py", line 192, in __getstate__
    raise TypeError("Cannot serialize socket object")
TypeError: Cannot serialize socket object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ubuntu/dev/birdview-patent-landscape/database-scripts/database.py", line 43, in <module>
    dt.foreach(writeBatchData)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 862, in foreach
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1128, in count
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 1119, in sum
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 990, in fold
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 889, in collect
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2616, in _jrdd
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2504, in _wrap_function
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 2490, in _prepare_for_python_RDD
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 720, in dumps
_pickle.PicklingError: Could not serialize object: TypeError: Cannot serialize socket object```

Sorry, above is the complete code and error message, thank you.
1 Like

Hi shao-shuai

Do you have some tutorial to writting code neo4j in pyspark?

I don't know how start

Thanks a lot

We are working on an updated spark connector that uses the datasource APIs. A pre-release is going to be available September 30th, and it will support pyspark. If this is something you're interested in trying out, let me know.

Hi david

Great news!

Yes, I would like to try it, tell me where I find documentation and if it exists some example to try.

Thanks

1 Like

@monroy.nelson have a look at this link - you can download the package, and the documentation is linked in the description of the release. We'd love to hear what people think.

The new connector is now available -- all the details here: