Introduction to Persistent Homology
By Matthew Wright
Summary
## Key takeaways - **Persistent homology finds topological features**: Persistent homology is an algebraic method used to identify topological features in data, such as connected components and holes, by analyzing the relationships between points. [00:11] - **Connecting points reveals data structure**: By connecting nearby points with a chosen distance parameter 'd' and forming a simplicial complex, we can begin to discern the topological features of a point cloud. [00:51] - **Barcodes visualize feature persistence**: Persistent homology represents topological features as intervals (bars) indicating the distance range over which they exist, with longer bars signifying more significant features. [04:15] - **Short bars mean noise, long bars mean features**: In persistent homology, short bars in a barcode typically represent noise or sampling irregularities, while long bars correspond to significant, persistent topological features of the data. [05:04] - **Stability is key for real-world data**: A crucial property of persistent homology is its stability: small changes in the input data lead to only small changes in the resulting barcode, making it robust to measurement errors. [05:29] - **Applications in image and signal analysis**: Persistent homology has been applied to analyze image data, revealing that collections of image patches can approximate a Klein bottle, and to detect periodicity in time-series data, even for damped signals. [06:07]
Topics Covered
- Why single-scale analysis fails to reveal true data topology.
- How barcode length reveals significant data features from noise.
- Persistent homology detects hidden patterns in images and time series.
- Are barcodes just visualizations, or deep algebraic structures?
Full Transcript
Hello, I am Matthew Wright, postdoctoral fellow at the Institute for Mathematics and its Applications.
In this video, I give an introduction to persistent homology, a popular tool in topological
data analysis and a subject of my research.
Persistent homology is an algebraic method of discerning the topological features of data.
By “topological features,” I mean things like components (or clusters) and holes.
Data can be, for example, a set of points with a metric that gives distances between pairs of points.
For example, consider the following point cloud, consisting of nineteen points in a plane.
What topological features does the data exhibit? It appears that these points
were roughly sampled from an annulus, but how can we detect the annulus from the points alone?
The problem is that discrete points have trivial topology.
One idea is to connect nearby points. We first choose a distance, or scale parameter, d.
We draw a ball of diameter d around each point. Two balls intersect exactly when two points
are no further apart than distance d, in which case we connect the two points with an edge.
This creates a graph whose vertices are the original points. The graph shows us that the
points form a single cluster at scale parameter d, but it doesn't tell us about higher-order
features, such as holes. For example, this graph has many cycles, but the graph doesn't
help us identify the central hole in the data.
To introduce a more sophisticated idea, we need a bit of background information.
First, a simplicial complex is an object built from points, edges, triangular faces, and so on.
A point is a zero-dimensional simplex, an edge between two points is a one-dimensional
simplex, a triangular face is a two-dimensional simplex, a solid tetrahedron is a three-dimensional
simplex, and so on for higher-dimensional simplices. If we glue many simplices together
in such a way that the intersection between any two simplices is also a simplex, we obtain
a simplical complex. Here is an example of a simplicial complex.
In topology, we define something called homology for simplicial complexes. While the formal
definition of homology is a bit too technical for this video, we can think of homology as
counting the connected components, holes, and voids of a simplicial complex.
Moreover, homology is computable via linear algebra. To learn more about homology, consult your
nearest algebraic topology textbook.
OK, back to our point cloud. Let’s connect nearby points and then build a simplicial
complex. We start with distance d, as before, and connect pairs of points that are no further
apart than d. This time, however, we fill in complete simplices. That is, if we see
three points connected by edges that form a triangle, we fill in the triangle with a
2-dimensional face. Any four points that are all pairwise connected get filled in with
a 3-simplex, and so on. The resulting simplicial complex is called the Rips complex, or the
Vietoris-Rips complex. We then apply homology to this complex, which reveals the presence
of the central hole.
There is still a problem, however. How do we choose the distance d? If d is too small,
we might see multiple connected components and small holes that are artifacts of the
sampling; in short, we detect noise. On the other hand, if d is too large, then any two
points get connected and we get a giant simplex, which has trivial homology. We don't know
what distance to choose without some special insight. This choice of d reveals a single
hole, but how do we know it is significant and not noise? Rather than choosing a single
distance d, we will consider all distances d.
Observe that each hole appears at some particular value of d and disappears at another value of d.
For example, consider these four points. There is some smallest distance, this d1,
which is just large enough for these four edges to appear, creating a hole in the middle.
For the same configuration of points, there is another distance, this d2, which is just
large enough for an edge to appear between opposite points. This completes two triangles
that get filled in, and the hole disappears. So the hole appears at distance d1 and disappears
at distance d2.
We can represent the persistence of this hole as a pair (d1, d2). We can also visualize
this pair as an interval, or bar, from d1 to d2. This bar is a visual representation
of the persistence of the hole. A collection of such bars is called a barcode, and barcodes
are a central object of study in persistent homology.
Let's return to our point cloud example. We will create growing balls around each point
and record the barcode. As the balls start to grow, edges begin to appear, and the complex
gradually becomes connected. A few small holes appear, but quickly become filled in.
The large, central hole appears, and remains as more and more 2-simplices appear around the
outside. Eventually, as the balls get large, edges appear across the center of the hole,
and it eventually gets filled in as well.
Now rewind the movie and play it again. The small holes that appear first are due to sampling
irregularities, or noise, and are represented by short bars in the barcode. The large hole,
which we regard a significant feature of the data, is represented by a long bar in the barcode.
This is the general interpretation of barcodes: short bars represent noise, long bars represent features.
A key property of barcodes is that they are stable under perturbations of the data. In
other words, if you adjust the points a little bit, the barcode only changes a little bit.
This stability is important in applications, in which measurements always have some margin of error.
Persistent homology is also computable via linear algebra. Standard algorithms exist to compute barcodes.
The worst-case runtime of these algorithms is cubic in the number of simplices, although the computation is rarely worst-case.
In addition, clever data structures and topological simplification can speed up the computation significantly.
Persistent homology has been applied to various problems. For example, Carlsson and others
studied the collection of high-contrast patches, 3-by-3 pixels in size, from a large set of
grayscale photographs. Treating the patches as 9-dimensional vectors and applying persistent
homology, they found that the resulting point cloud was close to the surface of a Klein
bottle embedded in the higher-dimensional space. This observation is useful for image-recognition applications.
For another example, Perea and others used persistent homology to detect periodicity
in time-series data. Using sliding-window embeddings, the researchers converted measurements
of a periodic signal into a point cloud in a high-dimensional space. Persistent homology,
applied to the point cloud, detects cycles, which correspond to periodicity in the original signal.
Applying this methodology to gene expression time-series data, the researchers
detected periodicity that was missed by other methods—especially periodicity in the presence of damping.
Persistent homology has been studied extensively from an algebraic perspective.
Specifically, a barcode is really a visualization of an algebraic structure. In our barcode animation,
we were really dealing with a sequence of simplicial complexes, each a subcomplex of the next.
That is, the simplicial complex constructed from the data for some small distance
is a subset of the simplicial complex constructed for a larger distance.
Equivalently, there is an inclusion map from each simplicial complex to the next.
This sequence of simplicial complexes, with inclusion maps, is called a filtration.
When we apply homology to a filtration, we obtain an algebraic structure called a persistence module.
Suppose we want to compute ith homology with coefficients from a field k. The homology
of any complex Cj is a vector space, and the inclusion maps between complexes induce linear
maps between homology vector spaces. The direct sum of the homology vector spaces is an algebraic
module – in fact, a graded module over the polynomial ring k[x]. The variable x acts
as a shift map, taking each homology generator to its image in the next vector space.
Furthermore, a structure theorem tells us that a persistence module decomposes nicely into a direct sum
of simple modules, each corresponding to a bar in the barcode.
In other words, a barcode really is an algebraic structure.
Persistent homology is a fascinating mathematical tool that continues to be studied, developed, and applied.
To learn more, consult the books and papers by Carlsson, Edelsbrunner, Ghrist, and others.
Or, see my website to learn about my work in multidimensional persistent homology.
Thank you for watching this video! I am Matthew Wright.
Loading video analysis...