Mastering linked data in PHP: Graphs and algorithms - Presentation
Video transcript
We utilized ChatGPT to enhance the grammar and syntax of the transcript.
Introduction: Exploring graphs beyond the surface
Today, we're going to talk about graphs. Just to be precise, not the colorful kind of graphs but the kind that considers points and how they are connected. My name is Christian, and I currently work at Shopware, where I'm the resident madman doing all kinds of crazy experiments with whatever technology might be interesting for the future or that we haven't explored yet, like graphs.
Understanding tangles: The complexity of connected structures
Today, it's all about tangles—how things are when you put them together and how you can get them back out. There are different kinds, like webs, tangled structures, cities from above, and maps. All of them share something: they are visually busy and very connected. Untangling structures like these isn't a new issue; it's been around for hundreds of years.
The seven bridges of Königsberg: A historical perspective
One of the first problems where it occurred is the Seven Bridges problem of Königsberg, which is now called Kaliningrad. This was in 1736 when the question came up: How can I move across the city and cross each bridge only once? This seemingly simple question turned out to be mathematically profound. Mathematician Leonard Euler figured out that you can't solve this problem with normal tools like geometry or algebra. Instead, he thought about the parts of the problem: the places you can be, and how you move between these parts. He showed that if a node has an uneven number of connections, you cannot move across all of them without using one of those connecting edges twice, proving it is mathematically impossible to find such a path.
Defining graphs: Vertices and edges
Mathematicians today describe graphs simply as a function of vertices and edges. Vertices are just points, and edges are how those points are connected. Graphs are all about how data points are connected; they don't have coordinates and can be arbitrarily placed in space.
Graphs in web development: Connecting data and structure
Now, how does this relate to our work? As web developers, we deal with data all the time, and understanding the structure of that data is crucial. Tools like JQ, a command-line tool for reading and modifying JSON data, allow us to navigate through a piece of JSON, treating it as a graph. This kind of structure is what unites maps, cities, and even navigation through data.
Practical applications: Handling dependencies with directed graphs
For instance, at Shopware, we faced a challenge when writing an API to insert data in a developer-friendly format into our database. The issue was that you cannot create a product with certain categories without those categories already existing. This problem reminded me of a directed graph, where the direction of connections (or dependencies) matters.
Directed graphs help solve such dependencies by allowing us to determine the correct order in which to insert data. By applying algorithms like Kahn's algorithm, we can ensure that data is inserted without violating any foreign key constraints. This approach also helps in identifying and handling cycles in data, which can break the insertion process.
Analyzing code structure: Graphs in your codebase
Another practical application of graphs is in understanding the structure of your codebase. For example, Git is actually a tree, which is a type of graph. By analyzing the commit history and how files are connected, you can optimize the architecture of your project. You can even transform this data into a social network of files, showing how changes in one file affect others.
Optimizing with graph algorithms: Betweenness centrality and more
Using graph algorithms, you can calculate metrics like betweenness centrality, which helps identify which files are critical in connecting different parts of your system. This kind of analysis can highlight potential bottlenecks or areas where your codebase might be more fragile.
Conclusion: The ubiquity and power of graphs
In conclusion, graphs are everywhere, and understanding them can provide powerful insights into both your data and your code structure. I hope this talk has inspired you to look at your work through the lens of graphs. Thank you, and enjoy the rest of the day.
Q&A
Question: What are some good reasons to use graph structures instead of other data structures? Can you provide more examples?
Answer: One example is using graph structures to analyze how classes are allowed to depend on each other. By displaying these dependencies as a graph and applying layout algorithms, you can see which parts of your system are strongly coupled. This can help in identifying how closely related different parts of your system are, such as how closely your checkout process is related to your database or payment systems.
Question: Is there any toolset to analyze repositories and generate graphs with the depth of relationships between classes? What did you use to generate the graphs?
Answer: The standard answer would be to use Python and NetworkX, but this can be slow for large graphs. For a whole repository, I rewrote the Python algorithm in Rust and ran it in parallel, which significantly reduced the processing time. The tool I used is available on my GitHub, though it's an unfinished tool that outputs CSV files, which you can then visualize using tools like Excel or a graphing library.
Question: Do you have the code where you calculated the centrality for certain parts of your system?
Answer: Yes, it's available on my GitHub under the name "rawal." It's a bit of an unusual name, but I can share it in the Symfony Slack if you're interested.