International Salmon Data Laboratory

ScottAkenhead · October 12, 2018, 8:24pm

A search and rescue team for salmon data

I wrote to Matt Jones, a database guru at the National Center for Ecological Analysis and Synthesis (NCEAS), introducing ISDL and asking for advice, specifically about compliance with Darwin Core protocols and ontology as we build knowledge graphs and analyses to share across the environmental sector. ISDL has much to learn from the NCEAS project State of Alaska's Salmon and People, or SASAP.

js1 · October 13, 2018, 3:37am

Part of the challenge will be defining the standardization of the data. Darwin Core is a good starting point, but as I have been working through the transformation of the attached data, even the simple data sets feel like fitting a square peg into a round hole. We still want to preserve the "natural" way of thinking about the data that Neo4j and a graph model encourage, so that the graph can show its true value. What that will probably mean is that we will have to define or extend a core model to fit more naturally into the domain of analytics. Working through the NUSED data with its metadata really pushes the Darwin Core mapping to its limits.

I will share the data graphs on Graph Commons in the next day or two, to show what the graph containing the data looks like. Working through the NUSED data may take longer.

We should start thinking about working inside Neo4j itself soon and load data into a fully queryable instance. We can then query through the Neo4j desktop.

js1 · October 13, 2018, 4:14pm

NPAFC catch data. You can search for "NPAFC" to get a list of all the available public graphs.

js1 · October 14, 2018, 2:49am

The beauty of graphs. This is the hatchery release data for US and Canada, NPAFC Hatchery Statistics Species Data - Canada, United States, graph 2b1841ac-73d1-4603-9492-2f281f375171.

ScottAkenhead · October 14, 2018, 7:44pm

Be careful what you ask for
Bewilderingly fast development by unseen people with strange tools in unfamiliar territory. Seriously, who would ask for that? Me. Now I am pressing frantically on this button marked “Genius on Demand” but nothing is coming out of the machine (translation: struggling to keep up).

At this point, we need to define the components of a salmon graph: a set of ~15 "resources," with subtypes. Things like:
(:Organization {label:"NMFS", name:"National Marine Fisheries Service", type:"federal government"})
If, and how, these nodes and fields within nodes are compliant with Darwin Core is to be determined. A specific ISDL example would probably not involve all resources.
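
A minimal sketch of how such node definitions might be generated consistently. Everything named here is an assumption for illustration: the label whitelist, the `node_statement` helper, and the default for `type` are not an agreed ISDL schema. The helper returns a parameterized Cypher statement plus its parameters, rather than pasting values into the query text.

```python
import re

# Hypothetical subset of the ~15 resource labels; the real list is still to be agreed.
ALLOWED_LABELS = {"Organization", "Person", "Place"}

# Hypothetical optional fields with defaults, used when a caller omits them.
DEFAULTS = {"type": "unknown"}


def node_statement(resource, fields):
    """Return (cypher, params) that would create one node of the given resource label."""
    # Labels cannot be passed as Cypher parameters, so check them against a whitelist.
    if resource not in ALLOWED_LABELS:
        raise ValueError(f"unknown resource label: {resource}")
    # Property names must look like plain identifiers.
    for key in fields:
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", key):
            raise ValueError(f"bad field name: {key}")
    props = {**DEFAULTS, **fields}  # caller's values override the defaults
    cypher = f"CREATE (n:{resource}) SET n += $props RETURN n"
    return cypher, {"props": props}


cypher, params = node_statement(
    "Organization",
    {
        "label": "NMFS",
        "name": "National Marine Fisheries Service",
        "type": "federal government",
    },
)
print(cypher)
print(params)
```

The returned pair could be handed to whatever Neo4j client is in use; nothing here talks to a database.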

Knowledge: knowing how things are connected. Thus, a constrained set of links between these nodes. We do not need n(n+1)/2 types of edge (node-link-node) for n nodes. Nor links based on types within nodes. Some links are simple and widely used,
(:Place)-[:In]-(:Place), (:Organization)-[:In]-(:Organization).
Some are not,
(:Person)-[:HasOrganization {label:"employed by", job_description:"research biologist"}]-(:Organization)
Likely to see a large number of optional fields, with defaults, in all nodes and links. In addition to ID and date-stamps created by neo4j.
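
To make "a constrained set of links" concrete, here is a small sketch under stated assumptions: the whitelist holds only the three link types shown above, and `is_allowed` and `max_undirected_pairs` are hypothetical helper names. It checks a proposed link against the whitelist and computes the n(n+1)/2 upper bound for comparison.

```python
# Hypothetical whitelist of link types: (source label, relationship type, target label).
ALLOWED_LINKS = {
    ("Place", "In", "Place"),
    ("Organization", "In", "Organization"),
    ("Person", "HasOrganization", "Organization"),
}


def is_allowed(source, rel_type, target):
    """True if this link type is part of the constrained set."""
    return (source, rel_type, target) in ALLOWED_LINKS


def max_undirected_pairs(n):
    """Number of unordered label pairs, self-pairs included: n(n+1)/2."""
    return n * (n + 1) // 2


print(is_allowed("Person", "HasOrganization", "Organization"))  # True
print(is_allowed("Place", "HasOrganization", "Person"))         # False
print(max_undirected_pairs(15))                                 # 120
```

With ~15 resources the unconstrained bound is 120 pairings, so a short whitelist like this is a large reduction.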

ScottAkenhead · October 14, 2018, 7:55pm

Neo4j Sandbox - Needed Immediately
ISDL is happening. We have salmon datasets recast as graphs. A database schema is emerging. We are moving toward pipelines/workflows and analyses. ISDL needs a server and stack for neo4j.

And now a word from our sponsors!

js1 · October 16, 2018, 4:21am

There is some affordance for this in Darwin Core with the RecordLevel class, but this class holds foreign keys or IDs for contributors etc. This can be expanded with relationships that point to more detailed information regarding the contributor or the organization. I'll add those relationships and nodes to the schema.
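
A rough sketch of that expansion, assuming a flat record that carries the Darwin Core term `institutionCode` as its only pointer to the organization. The `ORGANIZATIONS` lookup, the `expand_institution` helper, the `HasOrganization` link type, and the sample identifier are illustrative assumptions, not part of Darwin Core or of the schema being built.

```python
# Hypothetical lookup of organization details, keyed by the code stored on the record.
ORGANIZATIONS = {
    "NMFS": {"name": "National Marine Fisheries Service", "type": "federal government"},
}


def expand_institution(record):
    """Return (organization_node, link) derived from a record's institutionCode."""
    code = record["institutionCode"]
    details = ORGANIZATIONS.get(code, {})  # no extra detail if the code is unknown
    org_node = {"labels": ["Organization"], "label": code, **details}
    # The record keeps its code; the link points from the record to the richer node.
    link = {"type": "HasOrganization", "start": record["occurrenceID"], "end": code}
    return org_node, link


record = {
    "occurrenceID": "example-001",  # made-up identifier for illustration
    "basisOfRecord": "HumanObservation",
    "institutionCode": "NMFS",
    "scientificName": "Oncorhynchus kisutch",
}
org_node, link = expand_institution(record)
print(org_node)
print(link)
```

The original record is left untouched, so the Darwin Core fields stay available for export while the graph gains the extra node and relationship.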

js1 · October 17, 2018, 10:25am

In terms of the Neo4j sandbox, I believe the default sandbox expires after 3 days. Will we have access to a sandbox with more longevity for the data?

js1 · November 5, 2018, 5:49am

Inspired by Mike Bostock's population pyramid, global salmon hatchery release data plotted as an interactive pyramid. https://isdl-220916.appspot.com/release

ScottAkenhead · November 8, 2018, 10:12pm

2015 paper: Trends in IT Innovation to Build a Next Generation Bioinformatics Solution to Manage and Analyse Biological Big Data ... "The future, as data will continue to grow in size and complexity, lies in online analysis and storage for collaborative work, with methods that allow high interactivity with data for analysis and visualization." They see graph databases as the way out of the box. The real message is how quickly the technology moved.

ScottAkenhead · November 8, 2018, 10:26pm

Video from my ISDL talk at GraphConnect.
Abstract and slides.
Announcement: first IYS Salmon Data Workshop, Vancouver BC, 2019-01-23/24
Ecologists to contribute datasets and say what they need, technologists then figure out how.

ScottAkenhead · November 21, 2018, 11:39pm

Fretting about workflows that bounce between R and Neo4j.
Excited by Max De Marzi's talk at GraphConnect: decision trees and workflows WITHIN Neo4j. Big step forward, thanks!
"The code is the data" -- oh yeah, now we're getting somewhere.
Find the rest of the story at decision tree | Max De Marzi

ScottAkenhead · November 28, 2018, 6:05pm

So on.
Confirmed: ISDL Workshop, 2019-01-25, 0800-1600, Pacific Salmon Commission Boardroom, 600-1155 Robson Street, Vancouver, BC, V6E 1B5.
That was the last obstacle. Oh yeah, it is on!
Attendance by Neo4j confirmed. Attendance by neo4j experts confirmed. Attendance by DFO database managers confirmed.

Definitely an interesting location. The Pacific Salmon Commission (USA + Canada) runs perhaps the most data-intensive -- and just plain intense -- fisheries management process on the planet. Consider: DNA samples from salmon catches are processed and returned as population composition in 36 to 48 hours, informing weekly decisions about which fisheries will open in which fishing areas.

This first ISDL workshop will consider graph-based solutions to the data integration, processing, and analysis problems raised by salmon ecologists in a preceding two-day workshop, also sponsored by the International Year of the Salmon. "Designated survivors" of the preceding ecologists' workshop will attend, and may or may not survive this one.

ScottAkenhead · December 5, 2018, 9:08pm

Salmon data integration is critical:
“We collated smolt survival and smolt-to-adult (marine) survival data for all regions of the Pacific coast of North America excluding California to examine the forces shaping salmon returns. A total of 3,055 years of annual survival estimates were available for Chinook (Oncorhynchus tshawytscha) and steelhead (O. mykiss). This dataset provides a fundamentally different perspective on west coast salmon conservation problems from the previously accepted view.”
https://www.researchgate.net/publication/329214582_The_coast-wide_collapse_in_marine_survival_of_west_coast_Chinook_and_steelhead_slow_moving_catastrophe_or_deeper_failure

ScottAkenhead · December 5, 2018, 10:55pm

Building toward a graph db schema for salmon data integration. Leveraging and extending the Darwin Core ontology, an international standard. Thank you, John Song.

How will salmon ecologists react to the need for elaborate standards for the salmon knowledge graph and attendant glossary: barriers worth removing, or barriers to participation?
But, ...
Are you sure that my "1SW coho" is your "grilse"? They're both age 1.1 right? I mean age 2_1.
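
A glossary entry for age could be backed by an explicit conversion. The sketch below assumes one common reading of the two notations, which this thread does not define: European "F.O" counts winters in fresh water after emergence and winters at sea, while Gilbert-Rich "T_S" gives the year of life at capture and the year of life at seaward migration. The function names are hypothetical.

```python
def european_to_gilbert_rich(age):
    """Convert 'F.O' (freshwater winters . ocean winters) to 'T_S'."""
    fresh, ocean = (int(part) for part in age.split("."))
    total = fresh + ocean + 1  # year of life at capture
    smolt = fresh + 1          # year of life at seaward migration
    return f"{total}_{smolt}"


def gilbert_rich_to_european(age):
    """Convert 'T_S' back to 'F.O' under the same assumptions."""
    total, smolt = (int(part) for part in age.split("_"))
    fresh = smolt - 1
    ocean = total - smolt
    if fresh < 0 or ocean < 0:
        raise ValueError(f"inconsistent Gilbert-Rich age: {age}")
    return f"{fresh}.{ocean}"


print(european_to_gilbert_rich("1.1"))  # 3_2
print(gilbert_rich_to_european("2_1"))  # 0.1
```

Under these assumptions "1.1" and "2_1" do not describe the same fish, which is exactly the kind of mismatch a shared glossary would have to catch.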

ScottAkenhead · December 9, 2018, 8:43pm

You might imagine my excitement upon discovering that the Instituut voor Natuur- en Bosonderzoek in Brussels -- Institute for Nature and Forest Research -- records fish counts back to 1753! Alas, only 3 observations of Atlantic salmon: 2 in 2012, 1 in 2017. Which seems rather odd.


ScottAkenhead · December 9, 2018, 8:53pm

WorkFlow, finally!
Extensive search revealed several promising workflow packages, but either (A) hopeless as tools for persons other than professional computer programmers/sysadmins, or (B) not being maintained and, frankly, old-fashioned. Australia to the rescue, thank-you CSIRO, for WorkSpace.
An excellent GUI for building workflows, and, to my delight, nested workflows. "Nesting does make working with complex workflow arrangements significantly easier."
I have pounced on this like a Velociraptor on a tourist.

neo4j_devrel · December 18, 2018, 5:41pm

Thanks for keeping everyone in the community updated on this project's progress. This is awesome!!!! It's really cool to be able to see the process as it moves along.

yelkamikolji · August 23, 2020, 9:26pm

Thanks for this. I am very interested in what happened with this project, whether it's still ongoing, and how I can help, @js1 @ScottAkenhead. I was reading through @js1's Google doc, where he mentions the Cal Academy of Sciences data begging to be a graph, and that is exactly what I'd like to do. I am new to graphs and Neo4j but very willing to do the work and learn. Any pointing in the right direction is appreciated.

ScottAkenhead · August 26, 2021, 12:58am

This first update since 2018-12 is to (a) tell you this project has been steadily developing, (b) point to some presentations and papers about those developments, and (c) apologize for rudely ignoring the 3,600 viewers of this topic.

– The January 2019 workshop, International Salmon Data Laboratory, was part of the North Pacific Anadromous Fish Commission (NPAFC.org) program International Year of the Salmon. See NPAFC Technical Report #14, with presentation slides but not videos.

A review of progress 2017-2021 is in this 19-minute video, "The Salmon Of Knowledge." I expected to deliver this in Hakodate, Japan, in May 2020, but actually delivered it in my kitchen in May 2021 because of expletive deleted COVID-19. See also the 1,500 word extended abstract.

– Lastly, today we presented this project at the webinar Neo4j Connections - A Virtual Event: Accelerating Innovation with Graphs, Wed, Aug 25, 2021. If our video is posted (YouTube?) I will let you know.
