TDA and Statistics using Gudhi Python Library

Part 3 - Bootstrap and Subsampling Procedures

In [1]:
from IPython.display import Image

In this third part of the tutorial, we introduce bootstrap procedures for persistent homology. We start with confidence regions for the persistent homology of filtrations of simplicial complexes defined directly on point clouds.

In [ ]:
import numpy as np
import pandas as pd
import pickle
import gudhi as gd
from pylab import *
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
from IPython.display import Image
from sklearn.model_selection import ShuffleSplit
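from sklearn.neighbors import KDTree
# KernelDensity is exposed by sklearn.neighbors; the sklearn.neighbors.kde path was removed in recent releases.
from sklearn.neighbors import KernelDensity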
import ipyparallel as ipp
%matplotlib inline

We will need additional functionality for plotting confidence regions for persistent homology (planned for upcoming releases of Gudhi).

Download the Python file and save it in your working directory (or somewhere on your Python path).

In [ ]:
from persistence_graphical_tools_Bertrand import *

We illustrate the bootstrap procedure on the crater dataset with a filtration of alpha complexes.

In [ ]:
# Load the pickled crater point cloud; the context manager closes the file.
with open("crater_tuto", "rb") as f:
    crater = pickle.load(f)
In [ ]:
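# Kernel density estimate of the point cloud.
# Recent seaborn releases use `fill` and `bw_method`/`bw_adjust` in place of `shade` and `bw`.
sns.kdeplot(crater, shade=True, cmap="PuBu", bw=.3)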

We define a filtration of alpha complexes (this takes a few seconds).

In [ ]:
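# Build the alpha complex on the point cloud, then its filtered simplex tree.
Alpha_complex_crater = gd.AlphaComplex(points=crater)
Alpha_simplex_tree_crater = Alpha_complex_crater.create_simplex_tree(max_alpha_square=2)
# Persistence diagram as a list of (dimension, (birth, death)) pairs.
diag_crater = Alpha_simplex_tree_crater.persistence()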

Removing points close to the diagonal

In many applications of persistent homology, we observe many topological features close to the diagonal.

Since they correspond to topological structures that die very soon after they appear in the filtration, these points are generally considered noise. We will see that confidence regions for persistence diagrams provide a rigorous framework for this idea.

Exercise. Give the number of persistence intervals per dimension in the filtration.
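
One possible way to approach this exercise, sketched on a small made-up diagram rather than on `diag_crater`: `persistence()` returns a list of `(dimension, (birth, death))` pairs, so counting intervals per dimension amounts to tallying the first entry of each pair. The values in `toy_diag` and the helper name `count_intervals_per_dimension` are illustrative assumptions, not part of Gudhi.

```python
from collections import Counter

# Hypothetical toy diagram, in the same format as the output of
# SimplexTree.persistence(): a list of (dimension, (birth, death)) pairs.
toy_diag = [
    (0, (0.0, float("inf"))),
    (0, (0.0, 0.3)),
    (0, (0.0, 0.1)),
    (1, (0.5, 1.7)),
    (1, (0.6, 0.65)),
]


def count_intervals_per_dimension(diag):
    """Return a dict mapping each homology dimension to its number of intervals."""
    return dict(Counter(dim for dim, _ in diag))


counts = count_intervals_per_dimension(toy_diag)
print(counts)
```

The same helper should apply unchanged to `diag_crater`, since it only relies on the list-of-pairs format.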

In [ ]:

Representing all the topological features in the diagram is not useful, since most of them have very short persistence. Moreover, plotting all the points takes too much time. We want to select only the most persistent features of the filtration.

Exercise. For a given value k, compute a truncated version of the persistence of the alpha complex filtration by keeping only the k most persistent intervals in each dimension.
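
As a hint, here is a minimal sketch of one way to truncate a diagram, again on a made-up example. It groups the intervals by dimension, ranks them by persistence (death minus birth), and keeps the first k of each group. The helper name `truncate_diagram` and the values in `toy_diag` are hypothetical; only the `(dimension, (birth, death))` format is taken from Gudhi.

```python
from collections import defaultdict

# Hypothetical toy diagram in the format returned by SimplexTree.persistence().
toy_diag = [
    (0, (0.0, float("inf"))),
    (0, (0.0, 0.3)),
    (0, (0.0, 0.1)),
    (1, (0.5, 1.7)),
    (1, (0.6, 0.65)),
]


def truncate_diagram(diag, k):
    """Keep the k most persistent intervals in each homology dimension."""
    by_dim = defaultdict(list)
    for dim, (birth, death) in diag:
        by_dim[dim].append((dim, (birth, death)))

    truncated = []
    for dim in sorted(by_dim):
        # Persistence is death - birth; infinite intervals are ranked first.
        ranked = sorted(by_dim[dim], key=lambda p: p[1][1] - p[1][0], reverse=True)
        truncated.extend(ranked[:k])
    return truncated


top1 = truncate_diagram(toy_diag, k=1)
print(top1)
```

The result keeps the list-of-pairs format, so it can in principle be passed to the same plotting functions as the full diagram.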

In [ ]:

Confidence regions for persistent homology

Confidence regions for persistence diagrams provide a rigorous framework for selecting significant topological features in a persistence diagram.

We use the bottleneck distance $d_b$ to define confidence regions.
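
For reference, the usual definition of the bottleneck distance between two persistence diagrams can be written as follows. The symbols $D_1$, $D_2$ and $\gamma$ are notation introduced here for illustration: $D_1$ and $D_2$ are the two diagrams, each augmented with the points of the diagonal counted with infinite multiplicity, and $\gamma$ ranges over the bijections from $D_1$ to $D_2$.

$$
d_b(D_1, D_2) = \inf_{\gamma} \, \sup_{p \in D_1} \| p - \gamma(p) \|_\infty
$$

In words, $d_b$ is the smallest achievable value of the largest displacement, measured in the sup norm, needed to match the points of one diagram to the points of the other, where points may also be matched to the diagonal.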

In [2]: