Thanks! We are pretty excited about how the work has been progressing, but even more about the community that has been emerging around graph.
Reading through your federated wiki: so many parallels and synergies that I am not even sure where to start!
Ubiquitous Language - This is fascinating and is a tremendous challenge inside organizations. Language is the foundation of culture. What I have observed over many years of working with professional services engineering firms is that this is actually the crux of what prevents effective data value streams from being created. However, it is also an essential part of how these (and all) organizations need to work, and the reality is that it is nearly impossible to create a fully overarching language. Our goal therefore is to accept the language as it is, and make the system adaptable, fluid and flexible, because language is emergent and will constantly change.
The ‘Menome’ part of the equation stems from: all organizations are in the business of managing the information and knowledge associated with what makes their business successful. Organizations therefore denote ‘information environments’.
People who work in these ‘information environments’ are the force of natural selection that drives the evolution of information and knowledge inside them, by selecting the information that helps their business survive. These environments are dynamic and change in response to external pressures and to internal pressures such as growth.
When a company is small, a ubiquitous language that makes up the information elements that describe the business forms organically. Conventions people use to describe their work (projects, sites, observations, samples, legal documents etc.) emerge, typically using some form of code convention. Over time as the organization grows, more people means more teams, more teams means more process, more process means more systems. The language used to describe the business starts to fragment into silos, driven by the emergence of team-specific terms coupled with the feedback and constraints of the systems brought in to manage the needs of each team. The language becomes fractured, so that a person in one system no longer maps to the representation of a person in another system that supports a different team. (lots more observations/thoughts here!!)
The thing is though that both aspects are valid: the specific language of the team is essential for the smooth operation of that team, but the unified representation of the overall business is also needed. Part of the design of theLink and the reason for the graph database is that ultimately, we want to be able to support both aspects: the ‘bounded context’ source language and the overall unified language, and be able to translate smoothly between the two when needed. We are not there yet on that front, but the atomization/faceting pattern is a step in that direction. This is an area where we are deriving insight from Stafford Beer's work on the Viable Systems Model, which recognizes the fractal nature of organizations, and how language/messaging patterns are what bind the parts of the organization together. In this model, the internal communications and language of a team can be different than the messages sent between teams.
As such, we have gone down a different path than that of ETL. We have landed on continuously integrating data from many sources using a multi-agent system (which parallels the viable system) and an eventually consistent approach. Thus far we are gaining considerable benefits from the approach, as it addresses many of the challenges I have run into with the more rules/pipeline based ETL pattern.
Our current approach is that the harvesters translate the source model to a central model through the atomization process. Each harvester is completely separate and independent from the others. The data messages produced are rendered in a JSON format, and sent out over AMQP (we debated Kafka; for now we have been using AMQP via RabbitMQ as it's simpler given the current use case).
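To make the harvester step a little more concrete, here is a minimal Python sketch of what one atomized message might look like. The field names (`source`, `node_type`, `conformed_keys`, `properties`) and the `atomize` helper are assumptions made for illustration, not theLink's actual message schema, and the AMQP publish step is left out so the sketch stays self-contained.

```python
import json


def atomize(source_name, record, key_fields, node_type):
    """Translate one source record into a hypothetical harvester message.

    key_fields are the fields treated as conformed dimensions; they are
    normalized (trimmed, lower-cased) so different sources can agree on them.
    """
    conformed = {k: str(record[k]).strip().lower() for k in key_fields}
    properties = {k: v for k, v in record.items() if k not in key_fields}
    return {
        "source": source_name,
        "node_type": node_type,
        "conformed_keys": conformed,
        "properties": properties,
    }


def to_wire(message):
    """Render the message as the JSON text that would be sent to the broker."""
    return json.dumps(message, sort_keys=True)


# Illustrative record from an imaginary HR system.
record = {"Email": "Ada@Example.com ", "Name": "Ada", "Office": "Calgary"}
message = atomize("hr_system", record, key_fields=["Email"], node_type="Person")
payload = to_wire(message)
```

Because each harvester only needs to produce messages of this shape, it can stay independent of every other harvester and of whatever consumes the queue.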
A data refinery for Neo4j currently renders the messages into a graph – the graph is emergent based on the gradual merge of the messages from the various sources over time.
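The gradual merge can be pictured with a small in-memory stand-in for the graph. This is only a sketch under assumptions: the message shape (`source`, `node_type`, `conformed_keys`, `properties`) and the `merge_message` function are hypothetical, and a plain dictionary stands in for Neo4j.

```python
def merge_message(graph, message):
    """Merge one message into the graph, keyed by its conformed dimensions.

    Messages from different sources that share the same node type and
    conformed keys land on the same node, so the node fills in over time.
    """
    key = (message["node_type"], tuple(sorted(message["conformed_keys"].items())))
    node = graph.setdefault(key, {"properties": {}, "sources": []})
    node["properties"].update(message["properties"])
    if message["source"] not in node["sources"]:
        node["sources"].append(message["source"])
    return node


graph = {}
merge_message(graph, {
    "source": "hr",
    "node_type": "Person",
    "conformed_keys": {"email": "ada@example.com"},
    "properties": {"name": "Ada"},
})
person = merge_message(graph, {
    "source": "projects",
    "node_type": "Person",
    "conformed_keys": {"email": "ada@example.com"},
    "properties": {"role": "Engineer"},
})
```

The order in which messages arrive does not matter for which node they land on, which is what lets the approach be eventually consistent.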
The merge is currently done using a defined set of conformed dimensions or patterns. As you observe, it is not always possible to have a fully reliable conformed dimension to perform merges from different contexts. We have therefore adopted a pattern in which we use the graph edges to denote the ‘strength’ of the relationship: a strength of 1 is a hard match, and a lower value gives a confidence level based on a range of string/pattern matching methods. This way, we can still merge partially conformed data, but signal the uncertainty to users.
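A minimal sketch of the edge ‘strength’ idea, in Python. Here `difflib.SequenceMatcher` from the standard library stands in for the range of string/pattern matching methods; its ratio is a similarity score rather than a calibrated probability, and the `match_strength` and `link` helpers and the 0.6 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def match_strength(a, b):
    """Return 1.0 for a hard match, otherwise a similarity score in [0, 1)."""
    left, right = a.strip().lower(), b.strip().lower()
    if left == right:
        return 1.0
    return round(SequenceMatcher(None, left, right).ratio(), 2)


def link(edges, source_id, target_id, a, b, threshold=0.6):
    """Add an edge carrying its strength, but only above the threshold."""
    strength = match_strength(a, b)
    if strength >= threshold:
        edges.append({"from": source_id, "to": target_id, "strength": strength})
    return strength


edges = []
hard = link(edges, "hr:1", "crm:9", "Ada Lovelace", "ada lovelace ")
soft = link(edges, "hr:1", "wiki:4", "Ada Lovelace", "A. Lovelace")
miss = link(edges, "hr:1", "geo:2", "Ada Lovelace", "Zurich")
```

Keeping the score on the edge, rather than deciding yes or no at merge time, means the partial matches stay visible and can be re-scored later as the matching improves.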
Right now this is a bit limited, but will improve over time as we continue to refine the bot’s ability to use various matching algorithms. We are not currently preserving the original domain structure beyond the core properties. We think that the way to do this would be to have the central agnostic overriding domain language act as 'anchors' that then are linked to the original domain context boundary language.
Organizational design 'as a graph' is a larger aspect of something we have been investigating through discovering and surfacing the 'hidden' organization that exists beyond the typical hierarchy. We have been working with a local improv community who have grown organically using a 'circle' model akin to holacracy in which each circle is self-organizing, and exists for a specific purpose. Once that purpose is accomplished, the circle can dissolve. Members of the circle can move to other circles. People can be members of more than one circle. The 'leadership' circle is also transient in that members can move in and out of that circle as needed. (Some of this is based on the Viable Systems Model as envisioned by Stafford Beer).
I'll have to spend some time exploring what you have on the federated wiki on this front..
'Hyper Knowledge' (for lack of a better term): the effective capture, curation and organization of personal knowledge, and sharing of that personal knowledge with a larger community is a huge personal focus of several of our team members, and is the thing that provides the drive for the work we are doing as we strive to build funding through bootstrapping off the integration work.
Our goal thus far has been to provide the knowledge graph context by harvesting explicit (structured) data and implicit (unstructured/files) data, in order to provide a foundation for making it easy for people to submit tacit knowledge.
We have been studying literature on the subject of hypertext starting with As We May Think and reading/tracing any and all sources we can find on the subject (JCR Licklider, Douglas Engelbart, Alan Kay, Ted Nelson, Tim Berners-Lee, your work on the Wiki...etc..)
What I have found is that, outside of a few notable exceptions, we seem to have fallen into the Marshall McLuhan trap of ‘we shape our tools and then our tools shape us’. A lot of the innovations associated with knowledge have been lost or sidelined because of the way tools have manifested themselves through the drive for commercialization, which has become the overriding factor in much of the development of consumer technology since it ‘escaped’ from the PARC labs.
Our knowledge tools seem to have devolved over time as the large players that have emerged are driven by the overriding goal of harvesting people’s data and attention to their own benefit.
This is one reason why the graph database has been such an exciting development. It has enabled us to start to ‘go back in time’ and seek to harness some of the original thinking that existed prior to file folders, relational databases, and cloud platforms, the latest 'lock in' pattern.
We have been experimenting with the idea of ‘graph documents’: documents that are composed of the ‘smallest bit of information that can stand by itself’ linked into chains. This combines the As We May Think metaphor of index cards and associative trails with Robert Pirsig's description in Lila of Phaedrus's ‘Zettelkasten’: http://members.optusnet.com.au/charles57/Creative/Idea_Recording/lila.htm
Blending this with the Wiki concepts of allowing for rapid and dynamic linking of the cards together in a way that allows people to quickly and easily capture, organize and more importantly, re-organize their thoughts dynamically and constantly is the goal.
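A rough Python sketch of the ‘graph document’ idea: small cards, links between them, and named trails that can be re-ordered without touching the cards themselves. The `Card` and `CardGraph` names and their methods are hypothetical, chosen only to illustrate the idea, and do not describe an existing implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Card:
    """The smallest bit of information that can stand by itself."""
    card_id: str
    text: str
    links: set = field(default_factory=set)


class CardGraph:
    def __init__(self):
        self.cards = {}
        self.trails = {}

    def add_card(self, card_id, text):
        self.cards[card_id] = Card(card_id, text)

    def link(self, a, b):
        """Associate two cards in both directions."""
        self.cards[a].links.add(b)
        self.cards[b].links.add(a)

    def set_trail(self, name, card_ids):
        """Define or re-order a trail: an ordered chain of existing cards."""
        missing = [c for c in card_ids if c not in self.cards]
        if missing:
            raise KeyError(f"unknown cards: {missing}")
        self.trails[name] = list(card_ids)

    def render(self, name):
        """Read a trail as a document, one card per line."""
        return "\n".join(self.cards[c].text for c in self.trails[name])


g = CardGraph()
g.add_card("c1", "Language fragments as teams grow.")
g.add_card("c2", "Both team and unified language are valid.")
g.add_card("c3", "Edges can carry match strength.")
g.link("c1", "c2")
g.set_trail("draft", ["c1", "c2", "c3"])
first = g.render("draft")
g.set_trail("draft", ["c3", "c1"])  # re-organize without editing any card
second = g.render("draft")
```

The document is just a path through the cards, so re-organizing a thought is a change to the trail rather than a rewrite of the content.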
Key knowledge activities:
- knowledge consolidation. A large amount of knowledge is in front of me and I want to classify and organize it so as to give it more meaningful structure in my mind
- the knowledge is not in front of me or doesn't exist. I have to discern it through conversation, experimentation, reading, etc.
- Sharing or collaborating on knowledge with others
PageRank is great for case 1 and crunchy in case 2. Evolution is the opposite. There's a similarity here to breadth first vs depth first discovery. Breadth first is likely optimal in case 1: all the information is there, so to get a high level view it's faster to skim and identify the high value stuff than to read each article seriously and in depth.
Depth first is likely optimal in case 2. Every opportunity to learn should be maximized instead of trying to rush off from one conversation to the next trying to piece it all together. The evolutionary approach affords you the time to meaningfully engage in each opportunity and develop your understanding over time, whereas the context is not conducive to high frequency switching of the knowledge source you are engaging with. The evolution of knowledge is a graph that ‘spreads out’, with each version of a ‘card’ pushed back in a time-chain. Richard Dawkins' biomorphs concept is a way I visualize this: people evolving the individual knowledge elements as they naturally select the elements that meet their needs at that time. The result is a dimensional space in which people are rapidly able to navigate and select sub-sets of knowledge using the key characteristics of that knowledge.
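One way to picture a card's time-chain is as a linked list of immutable versions, where two people revising the same version create a branch. This is a small Python sketch under that assumption; `CardVersion`, `revise` and `history` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CardVersion:
    """One immutable version of a card, pointing back at its predecessor."""
    text: str
    author: str
    previous: Optional["CardVersion"] = None


def revise(current, text, author):
    """Create a new version; the old one is pushed back in the chain."""
    return CardVersion(text=text, author=author, previous=current)


def history(version):
    """Walk the time-chain from newest to oldest."""
    texts = []
    while version is not None:
        texts.append(version.text)
        version = version.previous
    return texts


v1 = CardVersion("Language is emergent.", "ann")
# Two people evolve the same card in different directions: the graph spreads out.
branch_a = revise(v1, "Language is emergent and fragments with growth.", "bob")
branch_b = revise(v1, "Language is emergent, so systems must adapt.", "cat")
```

Nothing is overwritten, so each person can select the branch that meets their needs while the shared ancestry remains traceable.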
Current knowledge platforms seem to be either designed for open sharing and collaboration (typically open source), or are closed systems that enable you to take notes, but are not well suited to sharing (typically commercial).
We want to handle both cases. Imagine an organization really being a whole bunch of ‘myLink’ systems, one for each person, that are connected into a larger knowledge graph.
A person can draw from the collective knowledge into their own graph, classify, organize, blend and extend it as much as they desire using their own context, and then share/publish the knowledge back into their circles, or the overall organizational circle…
hm..drat - this is getting kind of long...