TDA and Statistics using Gudhi Python Library

Part 2 - The bottleneck distance

In this second part of the tutorial we compute bottleneck distances between persistence diagrams.

from IPython.display import Image
Image("SlidesGudhi/GeneralPipeLine_bottleneck.png")

import numpy as np
import pandas as pd
import pickle
import gudhi as gd
from pylab import *
import matplotlib.pyplot as plt  # plt is used by the plotting cells below
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D
from sklearn import manifold
%matplotlib inline

We consider the fourteen MBP structures. For each configuration we load its correlation matrix and compute the associated distance matrix, defined entrywise as 1 - |correlation|:

path_file = "./Peter corr_ProteinBinding/"
files_list = [
'1anf.corr_1.txt',
'1ez9.corr_1.txt',
'1fqa.corr_2.txt',
'1fqb.corr_3.txt',
'1fqc.corr_2.txt',
'1fqd.corr_3.txt',
'1jw4.corr_4.txt',
'1jw5.corr_5.txt',
'1lls.corr_6.txt',
'1mpd.corr_4.txt',
'1omp.corr_7.txt',
'3hpi.corr_5.txt',
'3mbp.corr_6.txt',
'4mbp.corr_7.txt']
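# Read each whitespace-separated correlation matrix (the files have no header row).
corr_list = [pd.read_csv(path_file + u, header=None, sep=r"\s+") for u in files_list]
# Turn correlations into dissimilarities: strongly (anti)correlated pairs are close.
dist_list = [1 - np.abs(c) for c in corr_list]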
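To see what the transform 1 - |correlation| does, here is a minimal sketch on a toy matrix. The 3x3 correlation values are hypothetical and are not taken from the MBP files.

```python
import numpy as np

# Toy 3x3 correlation matrix (hypothetical values, not from the MBP data).
corr = np.array([
    [ 1.0,  0.8, -0.6],
    [ 0.8,  1.0,  0.1],
    [-0.6,  0.1,  1.0],
])

# Same transform as in the tutorial: 1 - |correlation|.
dist = 1 - np.abs(corr)

# The result is symmetric, has a zero diagonal, and takes values in [0, 1].
print(dist)
```

A correlation of -0.6 and a correlation of 0.6 both give a distance of 0.4: only the strength of the correlation matters, not its sign.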

Exercise. Compute the fourteen Rips complex filtrations and store them in a list.

Bottleneck distance

Image("SlidesGudhi/Bottleneck0.png")

Image("SlidesGudhi/Bottleneck.png")

Documentation

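We can compute the bottleneck distance between two persistence diagrams using the function bottleneck_distance(). In the call below, persistence_list0 is assumed to hold the dimension-0 persistence intervals of the fourteen filtrations: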

gd.bottleneck_distance(persistence_list0[0], persistence_list0[1])

Exercise. Compute the matrix of bottleneck distances for dimensions 0 and 1 (it will take a few seconds).

Visualization via Multidimensional Scaling

We can apply a dimension reduction method to represent the fourteen structures as a configuration of points in $\mathbb R^2$ whose pairwise distances approximately match the matrix of bottleneck distances. We use the Multidimensional Scaling (MDS) method implemented in the scikit-learn library.

mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9,
                   dissimilarity="precomputed", n_jobs=1)
pos = mds.fit(B0).embedding_

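# The first seven structures are the "closed" ones.
plt.scatter(pos[0:7, 0], pos[0:7, 1], color='red', label="closed")
# Remaining structures; assumed here to be the "open" conformations.
plt.scatter(pos[7:, 0], pos[7:, 1], color='blue', label="open")
plt.legend(loc=2, borderaxespad=1)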

mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9,
                   dissimilarity="precomputed", n_jobs=1)
pos = mds.fit(B1).embedding_

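# The first seven structures are the "closed" ones.
plt.scatter(pos[0:7, 0], pos[0:7, 1], color='red', label="closed")
# Remaining structures; assumed here to be the "open" conformations.
plt.scatter(pos[7:, 0], pos[7:, 1], color='blue', label="open")
plt.legend(loc=2, borderaxespad=1)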
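To gauge how closely an MDS embedding matches the input distances, we can compare the pairwise Euclidean distances of the embedded points with the original matrix. The sketch below does this on a hypothetical toy matrix built from six points in the plane; the names pts, D_toy and emb_dist are illustrative, and random_state=0 is an added assumption for reproducibility.

```python
import numpy as np
from sklearn import manifold

# Six planar points (hypothetical) and their Euclidean distance matrix.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
D_toy = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Same MDS settings as in the tutorial, plus a fixed seed (assumption).
mds = manifold.MDS(n_components=2, max_iter=3000, eps=1e-9,
                   dissimilarity="precomputed", n_jobs=1, random_state=0)
pos_toy = mds.fit(D_toy).embedding_

# Pairwise distances between the embedded points.
emb_dist = np.linalg.norm(pos_toy[:, None, :] - pos_toy[None, :, :], axis=-1)

# Largest absolute gap between embedded and input distances.
max_gap = np.abs(emb_dist - D_toy).max()
print(max_gap)
```

A small gap means the planar picture is a faithful summary of the distance matrix. With bottleneck distances, which need not be realizable in the plane, a larger gap is to be expected and the plot should be read as an approximation.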