4.13.1. Plot loop length distribution

The loop length distribution plot is similar to the contact distance plot, but has different information. For example, we can explore the difference in the “loop extrusion” pattern [Fudenberg et al., Cell Reports, 2016] by looking at the loop length difference.

[1]:
%matplotlib inline
import pandas as pd
import numpy as np
import scipy.stats as sp
import seaborn as sns
import matplotlib.pyplot as plt
import os
[2]:
# Specify samples to plot
samplelist = ['Control', 'siCTCF', 'siRad21']

The obtained loops are stored in the file loops/SCALE/merged_loops.simple.bedpe. Read this file and show the loop length as a histogram.

[3]:
for sample in samplelist:
    path = "CustardPyResults_Hi-C/Juicer_hg38/" + sample + "/loops/SCALE/merged_loops.simple.bedpe"
    d = pd.read_csv(path, sep="\t", header=None)
    a = d[4] - d[2]
    hist, bins = np.histogram(a, bins=np.logspace(start=np.log10(200000), stop=np.log10(20000000), num=20))
    plt.plot(bins[1:], hist/hist.sum(),label=sample)

plt.xlabel("Loop length")
plt.ylabel("Cumulative proportion")
plt.xscale('log')
plt.legend()
[3]:
<matplotlib.legend.Legend at 0x7f73130b1ff0>
../_images/content_LoopLengthPlot_4_1.png

Alternatively, you can show the distribution with the heatmap, same with the contact distance plot.

[4]:
dfarray = []
for sample in samplelist:
    path = "CustardPyResults_Hi-C/Juicer_hg38/" + sample + "/loops/SCALE/merged_loops.simple.bedpe"
    d = pd.read_csv(path, sep="\t", header=None)
    a = d[4] - d[2]
    hist, bins = np.histogram(a, bins=np.logspace(start=np.log10(200000), stop=np.log10(20000000), num=20))
    df = pd.DataFrame(data=hist/hist.sum(), index=bins[1:], columns=[sample])
    dfarray.append(df)

dfarray = pd.concat(dfarray, axis=1).T
dfarray
[4]:
2.548550e+05 3.247553e+05 4.138276e+05 5.273302e+05 6.719637e+05 8.562665e+05 1.091119e+06 1.390386e+06 1.771734e+06 2.257676e+06 2.876900e+06 3.665961e+06 4.671443e+06 5.952703e+06 7.585380e+06 9.665860e+06 1.231696e+07 1.569520e+07 2.000000e+07
Control 0.175041 0.170808 0.196576 0.152770 0.107859 0.085956 0.053562 0.030738 0.015277 0.005706 0.002761 0.001657 0.000736 0.000184 0.000184 0.000000 0.000000 0.000000 0.000184
siCTCF 0.128997 0.130435 0.174991 0.144808 0.125045 0.112469 0.081567 0.052461 0.025153 0.014732 0.005390 0.000719 0.001797 0.000359 0.000359 0.000000 0.000000 0.000000 0.000719
siRad21 0.060250 0.054372 0.115356 0.109478 0.095518 0.120500 0.108009 0.095518 0.061719 0.065393 0.034533 0.030125 0.016165 0.007348 0.011021 0.005878 0.005143 0.000735 0.002939
[5]:
g = sns.clustermap(dfarray, cmap="hot", vmax=0.2, col_cluster=False, row_cluster=True, figsize=(5, 3))
plt.setp(g.ax_heatmap.get_yticklabels(), rotation=0)
[5]:
[None, None, None, None, None, None]
../_images/content_LoopLengthPlot_7_1.png