10. CustardPy API¶
Here we provide some small Python pieces that use the functions of CustardPy. This notebook assumes that the output of the CustardPy tutorial is stored in the CustardPyResults_Hi-C/Juicer_hg38/
directory.
[30]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2
from custardpy.loadData import *
from custardpy.HiCmodule import *
from custardpy.PlotModule import *
from custardpy.plotHiCfeature_module import *
from custardpy.DirectionalRelativeFreq import *
cm = generate_cmap(['#1310cc', '#FFFFFF', '#d10a3f'])
[31]:
# Specify samples to plot
samplelist = ['Control', 'siCTCF', 'siRad21']
Define the function to read the contact matrix and PC1 value from the CustardPyResults_Hi-C
directory.
The JuicerMatrix
function loads the contact matrix and the eigenvector (PC1). Here we define the get_samples
function that returns the list of the JuicerMatrix object.
[32]:
dirname = "CustardPyResults_Hi-C/Juicer_hg38"
def get_samples(samplelist, chr, norm, resolution):
samples = []
for sample in samplelist:
observed = f"{dirname}/{sample}/Matrix/intrachromosomal/{resolution}/observed.{norm}.{chr}.matrix.gz"
eigenfile = f"{dirname}/{sample}/Eigen/{resolution}/eigen.{norm}.{chr}.txt.gz"
samples.append(JuicerMatrix("RPM", observed, resolution, eigenfile=eigenfile))
return samples
10.1. Chromosome 14, 20 Mbp to 107 Mbp¶
As an example, we plot the contact matrix of the entire chromosome 14 (from 20 Mbp to 107 Mbp) with 100 kbp resolution. This setting is mainly for viewing the compartment structure.
[33]:
# Load contact maps for chromosome 14
chr = "chr14"
resolution = 100000
norm = "SCALE"
samples = get_samples(samplelist, chr, norm, resolution)
nsample = len(samples)
CustardPyResults_Hi-C/Juicer_hg38/Control/Matrix/intrachromosomal/100000/observed.SCALE.chr14.matrix.gz
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/Matrix/intrachromosomal/100000/observed.SCALE.chr14.matrix.gz
CustardPyResults_Hi-C/Juicer_hg38/siRad21/Matrix/intrachromosomal/100000/observed.SCALE.chr14.matrix.gz
[34]:
s = 20000000 # start position
e = 107000000 # end position
vmin = 0 # minimum value of the color scale
vmax = 50 # maximum value of the color scale
plt.figure(figsize=(10, 5))
for i, sample in enumerate(samples):
label = samplelist[i]
ax = plt.subplot(1, 3, i+1)
mat = sample.getmatrix().loc[s:e,s:e]
im = ax.imshow(mat, clim=(vmin, vmax), cmap=generate_cmap(['#FFFFFF', '#d10a3f']))
ax.set_title(label)
pltxticks_subplot2grid(0, mat.shape[1], s, e, 5, ax=ax)
ax.set_yticks([])
plt.tight_layout()
cbar = plt.colorbar(im, ax=plt.gcf().axes, orientation='vertical', fraction=0.015, pad=0.04)
cbar.set_label('Concact frequency')
Alternatively, you can use the log scale contact frequency with the getlog
function to emphasize the TAD and compartment structure.
[35]:
s = 20000000 # start position
e = 107000000 # end position
vmin = 0 # minimum value of the color scale
vmax = 6 # maximum value of the color scale
plt.figure(figsize=(10, 5))
for i, sample in enumerate(samples):
label = samplelist[i]
ax = plt.subplot(1, 3, i+1)
mat = sample.getlog().loc[s:e,s:e]
im = ax.imshow(mat, clim=(vmin, vmax), cmap=generate_cmap(['#FFFFFF', '#d10a3f']))
ax.set_title(label)
pltxticks_subplot2grid(0, mat.shape[1], s, e, 5, ax=ax)
ax.set_yticks([])
plt.tight_layout()
cbar = plt.colorbar(im, ax=plt.gcf().axes, orientation='vertical', fraction=0.015, pad=0.04)
cbar.set_label('Concact frequency')
10.2. chromosome 21, 25 Mbp to 31 Mbp¶
Next, we plot the region from 25 Mbp to 31 Mbp of chromosome 21 at 25 kbp resolution. This region corresponds to the Figure 1A of [Haarhuis et al. Cell, 2017].
[36]:
resolution = 25000
norm = "SCALE"
chr="chr21"
samples = get_samples(samplelist, chr, norm, resolution)
nsample = len(samples)
CustardPyResults_Hi-C/Juicer_hg38/Control/Matrix/intrachromosomal/25000/observed.SCALE.chr21.matrix.gz
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/Matrix/intrachromosomal/25000/observed.SCALE.chr21.matrix.gz
CustardPyResults_Hi-C/Juicer_hg38/siRad21/Matrix/intrachromosomal/25000/observed.SCALE.chr21.matrix.gz
[37]:
s = 25000000
e = 31000000
vmax = 100
vmin = 0
plt.figure(figsize=(12, 10))
for i, sample in enumerate(samples):
label = samplelist[i]
ax = plt.subplot(1, 3, i+1)
mat = sample.getmatrix().loc[s:e,s:e]
im = ax.imshow(mat, clim=(vmin, vmax), cmap=generate_cmap(['#FFFFFF', '#d10a3f']))
ax.set_title(label)
pltxticks_subplot2grid(0, mat.shape[1], s, e, 5, ax=ax)
ax.set_yticks([])
plt.tight_layout()
cbar = plt.colorbar(im, ax=plt.gcf().axes, orientation='vertical', fraction=0.015, pad=0.04)
cbar.set_label('Concact frequency')
The heatmaps clearly show the difference in TAD and loop structures between samples.
10.3. Plotting a sample pair in a single heatmap¶
Use the plot_SamplePair_triu
function to plot a sample pair in a single heatmap. Here we visualize Control
and siRad21
.
[38]:
resolution=25000
s = 25000000
e = 31000000
vmax = 100
vmin = 0
sample1 = samples[0]
sample2 = samples[2]
label1 = samplelist[0]
label2 = samplelist[2]
plot_SamplePair_triu(sample1, sample2, label1, label2,
resolution, s, e, vmax, vmin)
10.4. Triangle heatmap with TADs and loops¶
Let’s visualize triangle heatmaps. CustardPy has a plot_HiC_Map
function to visualize the Hi-C triangle heatmap.
Here we also visualize TADs and loops by specifying the path of the TADs and the loops called in the CustardPy workflow (see the tutorial for details). The loaded file paths are displayed in the logs, so you can verify that the correct files are being visualized.
[39]:
chr="chr21"
s = 25000000
e = 31000000
resolution = 25000
norm = "SCALE"
# Parameters for drawing
nrow_now = 0
nrow_heatmap = 4
nrow = nrow_heatmap * 3
colspan_plot = 12
colspan_colorbar = 2
colspan_full = (colspan_plot + colspan_colorbar)
vmax = 100 # maximum value of the color scale
vmin = 0 # minimum value of the color scale
distancemax = 0
plt.figure(figsize=(8, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
The black dashed lines and blue circles indicate TADs and loops, respectively.
If you don’t want to plot TADs and loops, simply skip loading them.
[40]:
chr="chr21"
s = 25000000
e = 31000000
resolution = 25000
norm = "SCALE"
# Parameters for drawing
nrow_now = 0
nrow_heatmap = 4
nrow = nrow_heatmap * 3
colspan_plot = 12
colspan_colorbar = 2
colspan_full = (colspan_plot + colspan_colorbar)
vmax = 100 # maximum value of the color scale
vmin = 0 # minimum value of the color scale
distancemax = 0
plt.figure(figsize=(8, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full)
nrow_now += nrow_heatmap
plt.tight_layout()
When visualizing a broader region, the triangle becomes too large to see each TAD and loop. In this case, set the distance_max
parameter in the drawHeatmapTriangle_subplot2grid
function to limit the maximum distance of the triangles.
[41]:
chr="chr21"
s = 25000000
e = 40000000
resolution = 25000
norm = "SCALE"
distancemax = 3000000 # Maximum distance for visualization
# Parameters for drawing
nrow_now = 0
nrow_heatmap = 4
nrow = nrow_heatmap * 3
colspan_plot = 12
colspan_colorbar = 2
colspan_full = (colspan_plot + colspan_colorbar)
vmax = 100 # maximum value of the color scale
vmin = 0 # minimum value of the color scale
plt.figure(figsize=(10, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
10.4.1. Adding 3D Features¶
Here is the way to visualize some 3D features.
10.5. Compartment PC1¶
To distinguish the genomic positions from bin positions, let’s define figstart
/figend
and sbin
/ebin
. The resolution is 25 kbp.
[42]:
figstart = 25000000
figend = 40000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
The plot_PC1
function plots the compartment PC1. The positive and negative values indicate the compartments A and B, respectively.
[43]:
norm = "SCALE"
nrow_heatmap = 4
nrow_eigen = 2
nrow = (nrow_heatmap +1 + nrow_eigen +1) *3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 2
colspan_full = (colspan_plot + colspan_colorbar)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 10))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap + 1
plot_PC1(nrow, nrow_now, nrow_eigen, sample, label, sbin, ebin, colspan_plot, colspan_full)
nrow_now += nrow_eigen + 1
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
10.6. Single insulation score¶
[44]:
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 2
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 12))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap + 1
plot_single_insulation_score(sample, label, nrow, nrow_now, nrow_feature,
figstart, figend, sbin, ebin,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full)
nrow_now += nrow_feature + 1
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
You can view the insulation scores of all samples in this way:
[45]:
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 2
nrow = (nrow_heatmap +1) * 3 + nrow_feature
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap +1
plot_single_insulation_score(samples, samplelist, nrow, nrow_now, nrow_feature,
figstart, figend, sbin, ebin,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full)
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
If you want to display the insulation score of the samples as a heatmap, use the options heatmap=True, lineplot=False
. In the heatmap, red regions indicate lower values (i.e. insulated regions).
[46]:
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 3
nrow = (nrow_heatmap +1) * 3 + nrow_feature
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap +1
plot_single_insulation_score(samples, samplelist, nrow, nrow_now, nrow_feature,
figstart, figend, sbin, ebin,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full,
heatmap=True, lineplot=False)
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
10.7. Multi-scale insulation scores¶
The plot_multi_insulation_score
function plots multi-scale insulation scores. The plot is similar to the one above (single insulation score for multiple samples), but note that this heatmap shows the insulation score at the different scale for each sample.
[47]:
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 2
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(9, 10))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap + 1
plot_multi_insulation_score(sample, label, nrow, nrow_now, nrow_feature,
figstart, figend, sbin, ebin,
colspan_plot, colspan_colorbar, colspan_full)
nrow_now += nrow_feature + 1
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
10.8. Directionality index¶
Use the plot_directionality_index
function to plot the directionality index.
[48]:
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distance = 500000 # distance for DI
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 2
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
DI_vmin = -2000 # Mimnimum value of Directionality index
DI_vmax = 2000 # Maximum value of Directionality index
plt.figure(figsize=(10, 12))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, figstart, figend, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap + 1
plot_directionality_index(samples, samplelist, nrow, nrow_now, nrow_feature,
sbin, ebin, figstart, figend, distance,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full,
vmin=DI_vmin, vmax=DI_vmax)
nrow_now += nrow_feature + 1
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
[49]:
figstart = 25000000
figend = 32000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distance = 500000 # distance for DI
distancemax = 3000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 3
nrow = (nrow_heatmap +1) * 3 + nrow_feature
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 1
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 100
vmin = 0
plt.figure(figsize=(10, 8))
for i, sample in enumerate(samples):
label = samplelist[i]
tadfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/TAD/" + norm + "/" + str(resolution) + "_blocks.bedpe"
loopfile = "CustardPyResults_Hi-C/Juicer_hg38/" + label + "/loops/" + norm + "/merged_loops.bedpe"
plot_HiC_Map(nrow, nrow_now, nrow_heatmap, sample, label, chr,
type, resolution, vmax, vmin, s, e, distancemax,
colspan_plot, colspan_colorbar, colspan_full,
tadfile=tadfile, loopfile=loopfile)
nrow_now += nrow_heatmap +1
plot_directionality_index(samples, samplelist, nrow, nrow_now, nrow_feature,
sbin, ebin, figstart, figend, distance,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full,
vmin=DI_vmin, vmax=DI_vmax, heatmap=True, lineplot=False)
plt.tight_layout()
CustardPyResults_Hi-C/Juicer_hg38/Control/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/Control/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siCTCF/loops/SCALE/merged_loops.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/TAD/SCALE/25000_blocks.bedpe
CustardPyResults_Hi-C/Juicer_hg38/siRad21/loops/SCALE/merged_loops.bedpe
10.8.1. Log ratio heatmap¶
To directly examine the depletion effect in the contact matrix, we visualize the log ratio heatmap here. First, create log-ratio matrix using the make3dmatrixRatio
function. The smooth_median_filter
parameter is used to smooth the data to mitigate the low coverage of the sparse matrix.
[50]:
smooth_median_filter = 3
Combined = make3dmatrixRatio(samples, smooth_median_filter)
labels = ["siCTCF", "siRad21"]
The Combined
object stores the log-ratio matrix of siCTCF/Control
and siRad21/Control
. You can use it to view the log-ratio heatmap as shown below.
[51]:
figstart = 27500000
figend = 32500000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
vmax = 1
vmin = -1
plt.figure(figsize=(6, 3))
for i in range(len(labels)):
ax = plt.subplot(1, 2, i+1)
im = ax.imshow(Combined[i, sbin:ebin,sbin:ebin], clim=(vmin, vmax), cmap=cm)
ax.set_title(labels[i])
pltxticks_subplot2grid(0, ebin-sbin, figstart, figend, 5, ax=ax)
ax.set_yticks([])
plt.tight_layout()
cbar = plt.colorbar(im, ax=plt.gcf().axes, orientation='vertical', fraction=0.015, pad=0.04)
cbar.set_label('log enrichment')
Red and blue regions indicate increased and decreased contact frequency, respectively, after depletion.
To draw the triangle heatmap, use the drawHeatmapTriangle_subplot2grid
function.
[52]:
nrow = 8
nrow_now = 0
colspan_full = 30
figstart = 27500000
figend = 32500000
plt.figure(figsize=(6, 8))
for i, sample in enumerate(Combined):
ax = plt.subplot2grid((nrow, colspan_full), (nrow_now, 0), rowspan=2, colspan=24)
drawHeatmapTriangle_subplot2grid(sample, resolution,
figstart=figstart, figend=figend,
vmax=1, vmin=-1, cmap=cm,
label=labels[i], xticks=True,
logratio=True, control_label='Control')
nrow_now += 2
plt.tight_layout()
10.9. Directional Relative Frequency¶
Recently, we have defined a new score Directional Relative Frequency (DRF) that captures the asymmetric changes in inter-TAD interactions after depletion [Nakato et al, Nat Commun., 2023].
The plot_directional_relative_frequency
function visualizes the line plot and heatmap of the DRF.
[53]:
figstart = 25000000
figend = 40000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 5000000 # Maximum distance for visualization
norm = "SCALE"
nrow_heatmap = 4
nrow_feature = 3
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 2
colspan_legend = 2
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmax = 1
vmin = -1
plt.figure(figsize=(10, 12))
for i, sample in enumerate(Combined):
label = samplelist[i]
heatmap_ax = plt.subplot2grid((nrow, colspan_full), (nrow_now, 0),
rowspan=nrow_heatmap, colspan=colspan_plot)
colorbar_ax = plt.subplot2grid((nrow, colspan_full), (nrow_now, colspan_plot +1),
rowspan=nrow_heatmap, colspan=colspan_colorbar)
drawHeatmapTriangle_subplot2grid(sample, resolution, figstart=figstart, figend=figend,
vmax=vmax, vmin=vmin, cmap=cm, label=label, xticks=True,
logratio=True, control_label='Control',
heatmap_ax=heatmap_ax, colorbar_ax=colorbar_ax)
nrow_now += nrow_heatmap + 1
plot_directional_relative_frequency(samples, samplelist, nrow, nrow_now, nrow_feature,
sbin, ebin, figstart, figend, resolution,
True, True,
colspan_plot, colspan_colorbar, colspan_legend, colspan_full)
plt.tight_layout()
The DRF does not distinguish whether the right side is enriched or the left side is depleted when the score is high. By using the plot_triangle_ratio_multi
function, it is possible to visualize the scores on both the left and right sides.
[54]:
figstart = 25000000
figend = 40000000
resolution = 25000
sbin = int(figstart / resolution)
ebin = int(figend / resolution)
distancemax = 5000000 # Maximum distance for visualization
nrow_heatmap = 4
nrow_feature = 3
nrow = (nrow_heatmap +1 + nrow_feature + 1) * 3
nrow_now = 0
colspan_plot = 12
colspan_colorbar = 1
colspan_legend = 3
colspan_full = (colspan_plot + colspan_colorbar + colspan_legend)
vmin = -1
vmax = 1
plt.figure(figsize=(10, 12))
plot_triangle_ratio_multi(samples, samplelist, nrow, nrow_now, nrow_heatmap, nrow_feature,
sbin, ebin, figstart, figend, distancemax, resolution,
vmin, vmax, colspan_plot, colspan_colorbar, colspan_legend, colspan_full)
plt.tight_layout()
To obtain the distribution of the DRF, you can use the get_drf_matrix
function.
[55]:
drf_right = True
drf_left = True
DRFMatrix = get_drf_matrix(samples, resolution, drf_right, drf_left)
plt.figure(figsize=(10, 3))
for i, sample in enumerate(DRFMatrix):
plt.plot(sample, label=labels[i])
plt.legend()
[55]:
<matplotlib.legend.Legend at 0x7f6514989e10>
[56]:
import session_info
session_info.show()
[56]:
Click to view session information
----- custardpy 0.7.1 matplotlib 3.8.3 numpy 1.25.2 pandas 1.5.3 scipy 1.10.1 session_info 1.0.0 sklearn 1.2.2 -----
Click to view modules imported as dependencies
PIL 10.2.0 anyio NA arrow 1.3.0 asttokens NA attr 23.1.0 attrs 23.1.0 babel 2.11.0 bottleneck 1.3.7 brotli 1.0.9 certifi 2024.02.02 cffi 1.16.0 charset_normalizer 2.0.4 colorama 0.4.6 comm 0.2.1 cycler 0.12.1 cython_runtime NA dateutil 2.8.2 debugpy 1.6.7 decorator 5.1.1 defusedxml 0.7.1 exceptiongroup 1.2.0 executing 0.8.3 fastjsonschema NA fqdn NA idna 3.4 ipykernel 6.28.0 isoduration NA jedi 0.18.1 jinja2 3.1.3 joblib 1.3.2 json5 NA jsonpointer 2.1 jsonschema 4.19.2 jsonschema_specifications NA jupyter_events 0.8.0 jupyter_server 2.10.0 jupyterlab_server 2.25.1 kiwisolver 1.4.4 markupsafe 2.0.1 matplotlib_inline 0.1.6 mpl_toolkits NA nbformat 5.9.2 numexpr 2.8.7 overrides NA packaging 23.1 parso 0.8.3 pexpect 4.8.0 pkg_resources NA platformdirs 3.10.0 prometheus_client NA prompt_toolkit 3.0.43 psutil 5.9.8 ptyprocess 0.7.0 pure_eval 0.2.2 pydev_ipython NA pydevconsole NA pydevd 2.9.5 pydevd_file_utils NA pydevd_plugins NA pydevd_tracing NA pygments 2.15.1 pyparsing 3.0.9 pythonjsonlogger NA pytz 2023.3.post1 referencing NA requests 2.31.0 rfc3339_validator 0.1.4 rfc3986_validator 0.1.1 rpds NA ruamel NA send2trash NA simplejson 3.17.6 six 1.16.0 sniffio 1.3.0 sphinxcontrib NA stack_data 0.2.0 threadpoolctl 2.2.0 tornado 6.3.3 traitlets 5.7.1 typing_extensions NA uri_template NA urllib3 2.1.0 wcwidth 0.2.5 webcolors 1.13 websocket 1.7.0 yaml 6.0.1 zmq 24.0.1 zoneinfo NA zstandard 0.19.0
----- IPython 8.20.0 jupyter_client 8.6.0 jupyter_core 5.5.0 jupyterlab 4.0.11 notebook 7.0.8 ----- Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:36:39) [GCC 12.3.0] Linux-6.2.0-39-generic-x86_64-with-glibc2.35 ----- Session information updated at 2024-03-03 18:43