The Practitioner’s Guide to Graph Data
Preface
Applying Graph Thinking and Graph Technologies to Solve Complex Problems
Think about the last time you searched for someone on a social media platform. What did you look at on the results page? Most likely, you started scanning down the names in the list of profile results. And you probably spent most of your time inspecting the “shared friends” section to understand how you knew someone. Our innate human behavior of reasoning about our shared friends on social media is what inspired us to write this book. That shared inspiration, however, gave us two very different reasons for writing it. First, have you ever stopped to think about how an app creates the “shared friends” section?
The engineering required to deliver your “shared friends” in search results involves an intricate orchestration of tools and data to solve an extremely complex, distributed problem. We have either built those sections or created the tools that deliver them. Our passion for understanding and teaching others from our collective experiences is the first reason we chose to write this book together. The second reason is that anyone who uses social media intuitively derives personal context directly from the “shared friends” section. This process of reasoning and thinking about relationships within data is called graph thinking; it is the name we give to the human approach to understanding life through connected data. How did we all learn to do this?
There wasn’t a specific point in time when we all were taught this skill. Processing relationships among people, places, or things is just how we think. It is the ease with which people infer context from relationships, be it in real life or from data, that has ignited the wave of graph thinking.
And when it comes to understanding graph thinking, most people fall into one of two camps: those who think graphs are about bar charts, or those who think graphs are way too complicated. Either way, these thought processes apply legacy approaches to thinking about data and technology. The problem is that the art of the possible has changed, our tools have improved, and there are new lessons to learn. We believe that graphs are powerful and deployable. Graph technology can make you more productive; we have worked with teams that told us so. This book brings these two mindsets together.
Graph thinking closes the gap between how we humans operate/see/live and how we use data to inform a decision. Imagine seeing your whole world as a spreadsheet with rows and columns of data and trying to make sense of it all. For the majority of us, the exercise is unnatural and counterproductive. This is because relationships are how people navigate and reason about life. It is computers that need databases and operate in the world of rows and columns of data. Graph thinking is a way to solve complex problems by taking a relationship-centric approach. Graph technology bridges the gap between “relationships” and the linear memory constraints of modern computer infrastructure.
As more people learn how to build with graph technology by applying graph thinking, imagine what the next wave of innovation will bring.
Who Should Read This Book
This book aims to teach you two things. First, we will teach you about graph thinking and the graph mindset through asking questions and reasoning about data. Second, we will walk you through writing code that solves the most common, complex graph problems.
These new concepts are intertwined within the tasks commonly performed across a few different engineering functions.
Data engineers and architects sit at the heart of transitioning an idea from development into production. We organized this book to show you how to resolve common assumptions that can occur when moving from development into production with graph data and graph tools. Another benefit to the data engineer or data architect will be learning the world of possibilities that come from understanding graph thinking. Synthesizing the breadth of problems that can be solved with graph data will also help you invent new patterns for their use in production applications.
Data scientists and data analysts may most benefit from reasoning about how to use graph data to answer interesting questions. All the examples throughout this text were constructed to apply a query-first approach to graph data. A secondary benefit for a data scientist or analyst will be to understand the complexity of using distributed graph data within a production application. We teach and build upon the common development pitfalls and their production resolution processes throughout the book so that you can formulate new types of problems to solve.
Computer scientists will learn how to use techniques in functional programming and distributed systems to query and reason about graph data. We will outline fundamental approaches to procedurally traversing graph data and step through their application with graph tools. Along the way we will learn about distributed technologies, too. We will be working within the intersection of graph data and distributed, complex problems, a fascinating combination of engineering topics with something to learn for any technologist.
Goals of This Book
The first goal of this book is to create a new foundation that exists at a very diverse intersection. We will be working with concepts from graph theory, database schema, distributed systems, data analysis, and many other fields. This unique intersection forms what we refer to in this book as graph thinking. A new application domain requires new terms, examples, and techniques. This book serves as your foundation for understanding this emerging field.
From the past decade of graph technology emerged a common set of patterns for using graph data in production applications. The second goal of this book is to teach you those patterns. We define, illustrate, build, and implement the most popular ways teams use graph technology to solve complex problems. After studying this book, you will have a set of templates for building with graph technology to solve this common set of problems.
The third goal of this book is to transform how you think. Understanding and applying graph data to your problem introduces a paradigm shift into your thought processes. Through many upcoming examples, we aim to teach you the common ways that others think and reason about graph data within an application. This book teaches you what you need to know to apply graph thinking to a technology decision.
Navigating This Book
This book is organized roughly as follows:
Chapter 1
Chapter 1 discusses graph thinking and provides detailed processes for its application to complex problems.
Chapters 2 and 3
Chapters 2 and 3 introduce fundamental graph concepts that will be used throughout the rest of the book.
Chapters 4 and 5
Chapters 4 and 5 apply graph thinking and distributed graph technology to building a Customer 360 banking application, the most popular use case for graph data today.
Chapters 6 and 7
Chapters 6 and 7 take you into the world of hierarchical data and nested graph data through a telecommunications use case. Chapter 6 sets the stage for a common error that is resolved in Chapter 7.
Chapters 8 and 9
Chapters 8 and 9 discuss pathfinding across graph data in detail, using an example of quantifying trust in social transaction networks via paths.
Chapters 10, 11, and 12
Chapters 10 and 12 teach you how to use collaborative filtering on graph data to design a Netflix-inspired recommendation system.
Chapter 11 can be thought of as a bonus chapter that illustrates how to apply entity resolution to the merging of multiple datasets into one large graph for collective analysis.
Each chapter pair (4 and 5, 6 and 7, 8 and 9, 10 and 12) follows the same structure. The first chapter in each pair introduces new concepts and a new example use case in a development environment. The second chapter delves into the details of production issues, such as