Discover Aura Free — Week 27 — GraphConnect Talks

Discover Aura Free: Week 27 — GraphConnect Talks

This week we’re looking at at a sneak preview of our conference sessions at GraphConnect 2022, which takes place next week (June 5–7) in Austin, Texas.

GraphConnect 2022

If you haven’t decided to come, now is your chance, if you drop us an email to michael@neo4j.com, we are give away 10 free conference tickets for our viewers / readers.

I sometimes reminisce about the past GraphConnects all the way since 2012 with our Neo4j Flickr photo albums.

Neo4j Graph Database

If you rather want to watch our livestream, it is right here, otherwise follow the blog post for all the details.

https://medium.com/media/9c89e017360b3015f9a7b68124e2a9d5/href

You can find our data and write-ups on our GitHub repository and all videos on the Neo4j Website

But let’s get started, by setting up our AuraDB Free database.

Create a Neo4j AuraDB Free Instance

Go to https://dev.neo4j.com/neo4j-aura to register or log into the service (you might need to verify your email address).

After clicking Create Database you can create a new Neo4j AuraDB Free instance. Select a Region close to you and give it a name, e.g. graphconnect.

Choose the “blank database” option as we want to import our data ourselves.

On the credentials pop-up, make sure to save the password somewhere safe. The default username is always neo4j.

Then wait 3–5 minutes for your instance to be created.

Afterwards you can connect via the Query Button with Neo4j Browser (you’ll need the password), or use Import to load your data with the Data Importer and Explore for Neo4j Bloom.

Once your AuraDB instance is running the connection URL: neo4j+s://xxx.databases.neo4j.io is available and you can copy it to your credentials as you might need it later.

If you want to see examples for programmatically connecting to the database go to the “Connect” tab of your instance and pick the language of your choice

The Data

The conference agenda isn’t available at the website, but we just snug it from the conference planning doc :)

So here are the two CSVs one with the talk details, and one with the schedule.

We put them in google sheets to clean them up and could have loaded them from there (by publishing the sheets to the web: Sessions, Schedule) using LOAD CSV, but we wanted to use our built-in data importer for modeling and loading.

The basic structure of the sessions is:

  • ID
  • Company
  • Industry
  • Topic
  • Audience
  • Type
  • UseCase
  • Title
  • Abstract
  • Experience
  • FirstName
  • LastName
  • FullName
  • JobTitle
  • Biography

In the schedule we have:

  • Date
  • Title
  • StartTime
  • EndTime
  • Type
  • Start
  • End

The Data Model

We want to extract the Session and the Presenter, as well as attributes like: UseCase, Audience, Experience, Topic, Company into their own nodes.

The data importer allows us to load the CSV’s and map them to our graph model, one by one.

Mapping each attribute requires each time to create the extra node with a singular attribute and the link it to the session via that attribute and the session’s title.

To save you some work we’ve provided a zip file with model and data that you can load into the data importer directly.

We can use a trick for merging the additional CSV by declaring a separate node with the same label Session, and mapping the CSV file to it, while using the matching value for Title as the ID.

Data Import

We now just have to click Run Import and supply our password to get the data imported in a few seconds.

If we head over to Query, i.e. Neo4j Browser, we can look at all our data there:

The final bit of data import is some post-processing:

As the UseCase data is stored as a comma separated string property, we need to split that into a list and then turn the list into rows with UNWIND. Then we can create UseCase node for each entry uniquely with MERGE and connect them to our Session.

MATCH (s:Session)
UNWIND split(s.UseCase,', ') AS uc
WITH * WHERE trim(uc) <> ''
MERGE (u:UseCase {name:uc})
MERGE (s)-[:FOR_USECASE]->(u)

We also want to turn the Date and the Start, End, StartTime, EndTime properties into proper datetime values instead of strings.

MATCH (s:Session)
SET s.Date = date(s.Date)
SET s.Start = localdatetime(s.Start)
SET s.End = localdatetime(s.End)
SET s.StartTime = localtime(s.StartTime)
SET s.EndTime = localtime(s.EndTime)

Data Exploration

We can see a single session with it related context.

MATCH (s:Session)
WITH s LIMIT 1
MATCH path = (s)--()
RETURN path

Let’s look at some breakdowns across the attributes, e.g. which Use-Case has the most sessions:

MATCH (uc:UseCase)<-[:FOR_USECASE]-()
RETURN uc.name, count(*) as c
ORDER BY c DESC

Data Science is Leading with App Development as a close second. Note that a single session, can have multiple use-cases.

╒═════════════════════╤═══╕
│"uc.name" │"c"│
╞═════════════════════╪═══╡
│"Data Science" │44 │
├─────────────────────┼───┤
│"Apps & APIs" │39 │
├─────────────────────┼───┤
│"Cypher" │28 │
├─────────────────────┼───┤
│"Best Practices" │28 │
├─────────────────────┼───┤
│"Language-specific" │26 │
├─────────────────────┼───┤
│"Graph Visualization"│23 │
├─────────────────────┼───┤
│"Cloud" │22 │
├─────────────────────┼───┤
│"Aura" │19 │
├─────────────────────┼───┤
│"Knowledge Graphs" │17 │
├─────────────────────┼───┤
│"Operations" │16 │
├─────────────────────┼───┤
│"Security" │15 │
├─────────────────────┼───┤
│"Full Stack" │12 │
├─────────────────────┼───┤
│"GraphQL" │5 │
├─────────────────────┼───┤
│"Use Case" │4 │
├─────────────────────┼───┤
│"Telecom Networks" │1 │
├─────────────────────┼───┤
│"Digital Twin" │1 │
├─────────────────────┼───┤
│"Fraud detection" │1 │
├─────────────────────┼───┤
│"Knowledge Graph" │1 │
└─────────────────────┴───┘

Or what is the overlap between use-cases and industries (we need to exclude General as industry):

MATCH (uc:UseCase)<-[:FOR_USECASE]-(s)-[:FOR_INDUSTRY]->(i:Industry)
WHERE NOT i.Industry in ['General']
RETURN uc.name, i.Industry, count(*) as c
ORDER BY c DESC
LIMIT 5

Great to see that there are a lot of data-science and healthcare related talks.

╒══════════════════╤═══════════════════════╤═══╕
│"uc.name" │"i.Industry" │"c"│
╞══════════════════╪═══════════════════════╪═══╡
│"Data Science" │"Healthcare" │7 │
├──────────────────┼───────────────────────┼───┤
│"Data Science" │"Financial" │4 │
├──────────────────┼───────────────────────┼───┤
│"Data Science" │"Biotechnology" │3 │
├──────────────────┼───────────────────────┼───┤
│"Knowledge Graphs"│"Healthcare" │3 │
├──────────────────┼───────────────────────┼───┤
│"Data Science" │"Aerospace and Defense"│3 │
└──────────────────┴───────────────────────┴───┘

We can also compute percentages of all talks for a given use-case, per industry:

MATCH ()-[:FOR_INDUSTRY]->(i:Industry)
WITH i, count(*) as all
MATCH (n:UseCase)<-[:FOR_USECASE]-(session)-[:FOR_INDUSTRY]->(i)
WHERE NOT i.Industry IN ['General']
RETURN n.name as useCase, i.Industry, count(*) as count, count(*)*100/all as percent, all
ORDER BY count DESC LIMIT 10

Which gives us a better understanding of focus areas than the absolute numbers.

╒═════════════════════╤═══════════════════════╤═══════╤═════════╤═══│"useCase"            │"i.Industry"           |count  │percent  │all
╞═════════════════════╪═══════════════════════╪═══════╪═════════╪═══│"Data Science" │"Healthcare" │7 │77 │9 │
├─────────────────────┼───────────────────────┼───────┼─────────┼───│"Data Science" │"Financial" │4 │100 │4 │
├─────────────────────┼───────────────────────┼───────┼─────────┼───│"Data Science" │"Biotechnology" │3 │100 │3 │
├─────────────────────┼───────────────────────┼───────┼─────────┼───│"Knowledge Graphs" │"Healthcare" │3 │33 │9 │
├─────────────────────┼───────────────────────┼───────┼─────────┼───│"Data Science" │"Aerospace and Defense"│3 │100 │3 │

Or between Topic and Industry:

MATCH (t:Topic)<-[:ON_TOPIC]-(session)-[:FOR_INDUSTRY]->(i:Industry)
WHERE NOT i.Industry IN ['General']
RETURN t.Topic, i.Industry, count(*) as count
ORDER BY count DESC LIMIT 5

Which allows to see us the healthcare applications.

╒═════════════════╤═══════════════════════╤═══════╕
│"t.Topic" │"i.Industry" │"count"│
╞═════════════════╪═══════════════════════╪═══════╡
│"Knowledge Graph"│"Healthcare" │3 │
├─────────────────┼───────────────────────┼───────┤
│"Knowledge Graph"│"Aerospace and Defense"│2 │
├─────────────────┼───────────────────────┼───────┤
│"GDS-UC" │"Telecom" │2 │
├─────────────────┼───────────────────────┼───────┤
│"GDS-UC" │"Healthcare" │2 │
├─────────────────┼───────────────────────┼───────┤
│"GDS-UC" │"Financial" │2 │
└─────────────────┴───────────────────────┴───────┘

Schedule

We can also look at the schedule for one day, e.g. by sorting by start time and aggregating the titles into a list.

MATCH (s:Session)
WHERE date(s.Start) = date('2022-06-06')
RETURN localtime(s.Start) as start, collect(s.Title) as titles
ORDER BY start ASC
Optionally we could create Nodes for Times and Rooms, if we wanted to represent them in our graph model.

We can also just see if we can fill our day with talks from specific use-cases.

MATCH (s:Session)-[:FOR_USECASE]->(uc)
WHERE uc.name IN ['Best Practices','Aura']
WITH distinct s
WHERE date(s.Start) = date('2022-06-06')
RETURN localtime(s.Start) as start, collect(s.Title) as titles
ORDER BY start ASC

Ah, nice, that is our day then:

╒══════════╤═══════════════════════════════════════════════╕
│"start" │"titles" │
╞══════════╪═══════════════════════════════════════════════╡
│"10:00:00"│["Accelerating ML Ops with Graphs and Ontology-│
│ │Driven Design"] │
├──────────┼───────────────────────────────────────────────┤
│"10:50:00"│["Introduction to Neo4j AuraDB: Your Fastest Pa│
│ │th to Graph"] │
├──────────┼───────────────────────────────────────────────┤
│"11:10:00"│["Node Art","Taming Large Databases"] │
├──────────┼───────────────────────────────────────────────┤
│"11:30:00"│["Tracking Data Sources of Fused Entities in La│
│ │w Enforcement Graphs"] │
├──────────┼───────────────────────────────────────────────┤
│"13:00:00"│["Discovery and Insights with Graph Visualizati│
│ │on Using Neo4j Bloom","New! Monitoring and Admi│
│ │nistration with Neo4j Ops Manager"] │
├──────────┼───────────────────────────────────────────────┤
│"13:45:00"│["Introducing Workspaces, a New Experience for │
│ │Neo4j Developer Tools","Trucks on a Graph: How │
│ │JB Hunt Uses Neo4j"] │
├──────────┼───────────────────────────────────────────────┤
│"14:30:00"│["The Inside Scoop on Neo4j: Meet the Builders"│
│ │,"Knowledge Graphs for Pharma: Powered by Data │
│ │& AI Using KG4P","Getting the Most From Today's│
│ │ Java Tooling With Neo4j"] │
├──────────┼───────────────────────────────────────────────┤
│"15:15:00"│["Neo4j Drivers Best Practices"] │
└──────────┴───────────────────────────────────────────────┘

Recommendation

Now we could look at recommending talks by shared attributes, so the more attributes a talk shares with a talk I like, the more similar it is (of course we can also do the opposite, sorting ASC to get the least overlap and more diversity).

Or we fix any attribute we like to stay the same, by adding or excluding relationship-types.

We also don’t want the recommended talk to overlap in terms of start time, so we add this as an exclusion.

After finding our top 5 recommendations, we can fetch the speaker and their company as additional information to return.

MATCH (s:Session {Title:'Using Graph Analytics to Solve Cloud Security Problems'})
// overlap across attributes
MATCH (s)--(attr)--(s2:Session)
WHERE date(s.Start) <> date(s2.Start)
OR localtime(s.Start) <> localtime(s2.Start)
// compute occurrence frequency
WITH s2, count(*) as freq, collect(attr) as overlap
ORDER BY freq DESC LIMIT 5
// fetch additional data for the session
MATCH (s2)<-[:PRESENTS]-(sp:Speaker)-[:WORKS_AT]->(c:Company)
RETURN s2.Title, s2.Start, sp.FullName, c.Company, freq, overlap

Visualization

For visualization we can look at Explore aka Neo4j Bloom.

We open it from our Aura Console and generate a perspective to be used.

The we can e.g. query all the Data Science sessions, their speakers and companies.

We can also take our recommendation query from above and turn it into a saved search phrase in Bloom, so that we can visualize each recommendation. We can pick the parameter from suggestions from the actual title data and return paths instead of scalar data.

I hope you’re coming to GraphConnect, don’t forget our 10 free ticket offer (email to michael@neo4j.com) or use code “Community50” for 50% off on your tickets! Register here.

You can use similar graph structures for other conferences, or curriculum schedules.

Happy Graphing


Discover Aura Free — Week 27 — GraphConnect Talks was originally published in Neo4j Developer Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.