4.13.1. Plot loop length distribution¶
The loop length distribution plot is similar to the contact distance plot, but has different information. For example, we can explore the difference in the “loop extrusion” pattern [Fudenberg et al., Cell Reports, 2016] by looking at the loop length difference.
[1]:
%matplotlib inline
import pandas as pd
import numpy as np
import scipy.stats as sp
import seaborn as sns
import matplotlib.pyplot as plt
import os
[2]:
# Specify samples to plot
samplelist = ['Control', 'siCTCF', 'siRad21']
The obtained loops are stored in the file loops/SCALE/merged_loops.simple.bedpe
. Read this file and show the loop length as a histogram.
[3]:
for sample in samplelist:
path = "CustardPyResults_Hi-C/Juicer_hg38/" + sample + "/loops/SCALE/merged_loops.simple.bedpe"
d = pd.read_csv(path, sep="\t", header=None)
a = d[4] - d[2]
hist, bins = np.histogram(a, bins=np.logspace(start=np.log10(200000), stop=np.log10(20000000), num=20))
plt.plot(bins[1:], hist/hist.sum(),label=sample)
plt.xlabel("Loop length")
plt.ylabel("Cumulative proportion")
plt.xscale('log')
plt.legend()
[3]:
<matplotlib.legend.Legend at 0x7f73130b1ff0>
Alternatively, you can show the distribution with the heatmap, same with the contact distance plot.
[4]:
dfarray = []
for sample in samplelist:
path = "CustardPyResults_Hi-C/Juicer_hg38/" + sample + "/loops/SCALE/merged_loops.simple.bedpe"
d = pd.read_csv(path, sep="\t", header=None)
a = d[4] - d[2]
hist, bins = np.histogram(a, bins=np.logspace(start=np.log10(200000), stop=np.log10(20000000), num=20))
df = pd.DataFrame(data=hist/hist.sum(), index=bins[1:], columns=[sample])
dfarray.append(df)
dfarray = pd.concat(dfarray, axis=1).T
dfarray
[4]:
2.548550e+05 | 3.247553e+05 | 4.138276e+05 | 5.273302e+05 | 6.719637e+05 | 8.562665e+05 | 1.091119e+06 | 1.390386e+06 | 1.771734e+06 | 2.257676e+06 | 2.876900e+06 | 3.665961e+06 | 4.671443e+06 | 5.952703e+06 | 7.585380e+06 | 9.665860e+06 | 1.231696e+07 | 1.569520e+07 | 2.000000e+07 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Control | 0.175041 | 0.170808 | 0.196576 | 0.152770 | 0.107859 | 0.085956 | 0.053562 | 0.030738 | 0.015277 | 0.005706 | 0.002761 | 0.001657 | 0.000736 | 0.000184 | 0.000184 | 0.000000 | 0.000000 | 0.000000 | 0.000184 |
siCTCF | 0.128997 | 0.130435 | 0.174991 | 0.144808 | 0.125045 | 0.112469 | 0.081567 | 0.052461 | 0.025153 | 0.014732 | 0.005390 | 0.000719 | 0.001797 | 0.000359 | 0.000359 | 0.000000 | 0.000000 | 0.000000 | 0.000719 |
siRad21 | 0.060250 | 0.054372 | 0.115356 | 0.109478 | 0.095518 | 0.120500 | 0.108009 | 0.095518 | 0.061719 | 0.065393 | 0.034533 | 0.030125 | 0.016165 | 0.007348 | 0.011021 | 0.005878 | 0.005143 | 0.000735 | 0.002939 |
[5]:
g = sns.clustermap(dfarray, cmap="hot", vmax=0.2, col_cluster=False, row_cluster=True, figsize=(5, 3))
plt.setp(g.ax_heatmap.get_yticklabels(), rotation=0)
[5]:
[None, None, None, None, None, None]