4. Visualization#

This page introduces the commands in CustardPy to plot the 3D visualization. These commands use the output data generated by custardpy_juicer or custardpy_process_hic.

Here we use the example data used in Step-by-Step Workflow of Hi-C Analysis.

4.1. plotHiCMatrix#

plotHiCMatrix visualizes a contact map as a simple square heatmap. The input data is a dense matrix output from makeMatrix_intra.sh. The contact level is normalized by the total number of mapped reads for the chromosome.

plotHiCMatrix <matrix> <output name> <start> <end> <title in figure>

Example:

chr=chr20
start=8000000
end=16000000
resolution=25000
norm=SCALE

matrix=CustardPyResults_Hi-C/Juicer_hg38/Control/Matrix/intrachromosomal/$resolution/observed.$norm.$chr.matrix.gz

plotHiCMatrix \
     $matrix \
     ContactMap.Control.$chr.$start-$end.pdf \
     $start $end Control
Alternate

Fig. 4.1 plotHiCMatrix#

4.2. drawSquareMulti#

drawSquareMulti visualizes the contact map of multiple Hi-C samples as triangle heatmaps. The input data is a dense matrix output from makeMatrix_intra.sh.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

# linear scale
drawSquareMulti \
  $Resdir/Control:Control \
  $Resdir/siCTCF:siCTCF \
  $Resdir/siRad21:siRad21 \
  -o SquareMulti.$chr \
  -c $chr --start $start --end $end --type $norm -r $resolution
Alternate

Fig. 4.2 SquareMulti#

The dashed squares indicate TADs.

Add --log option to visualize a log-scale heatmap.

# log scale
drawSquareMulti \
  $Resdir/Control:Control \
  $Resdir/siCTCF:siCTCF \
  $Resdir/siRad21:siRad21 \
  -o SquareMulti.$chr \
  -c $chr --start $start --end $end \
  --type $norm -r $resolution --log
Alternate

Fig. 4.3 SquareMulti (log scale)#

4.3. drawSquareRatioMulti#

drawSquareRatioMulti visualizes a relative contact frequency (log scale) of 2nd to the last samples against the first sample. The input data is a dense matrix output from makeMatrix_intra.sh.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

drawSquareRatioMulti \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -o SquareRatioMulti.$chr \
     -c $chr --start $start --end $end --type $norm -r $resolution
Alternate

Fig. 4.4 drawSquareRatioMulti#

This figure shows the relative contact frequency of 2nd (siCTCF) and 3rd (siRad21) against 1st (Control).

4.4. drawSquarePair#

drawSquarePair shows two samples in a single square heatmap. The first and second samples are visualzed in the upper and bottom triagles, respectively.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

drawSquarePair \
     $Resdir/Control/Matrix/intrachromosomal/$resolution/observed.$norm.$chr.matrix.gz:Control \
     $Resdir/siRad21/Matrix/intrachromosomal/$resolution/observed.$norm.$chr.matrix.gz:siRad21 \
     -o SquarePair.$chr --start $start --end $end -r $resolution
Alternate

Fig. 4.5 drawSquarePair#

The command visualizes Control and siRad21 in the upper and bottom triagles, respectively.

4.5. drawSquareRatioPair#

Similar to drawSquarePair, drawSquareRatioPair shows the relative contact map of two sample pairs in a single square heatmap. This command visualize the log-scale frequency of sample2/sample1 and sample4/sample3 in the upper and bottom triagles, respectively.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

drawSquareRatioPair \
     $Resdir/Control/Matrix/intrachromosomal/$resolution/observed.$norm.$chr.matrix.gz:Control \
     $Resdir/siRad21/Matrix/intrachromosomal/$resolution/observed.$norm.$chr.matrix.gz:siRad21 \
     $Resdir/Control/Matrix/intrachromosomal/$resolution/observed.$norm.$chr.matrix.gz:Control \
     $Resdir/siCTCF/Matrix/intrachromosomal/$resolution/observed.$norm.$chr.matrix.gz:siCTCF \
     -o drawSquareRatioPair.$chr --start $start --end $end -r $resolution
Alternate

Fig. 4.6 drawSquareRatioPair#

In this case, siRad21/Control and siCTCF/Control are visualized in the upper and bottom triagles, respectively.

4.6. drawTriangleMulti#

drawTriangleMulti visualizes the contact map of multiple Hi-C samples as triangle heatmaps. The input data is a dense matrix output from makeMatrix_intra.sh.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

drawTriangleMulti \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -o TriangleMulti.$chr \
     -c $chr --start $start --end $end --type $norm -d 5000000 -r $resolution
Alternate

Fig. 4.7 drawTriangleMulti#

The black dashed lines and blue circles indicate TADs and loops, respectively.

4.7. drawTrianglePair#

drawTrianglePair visualizes a contact map of the first and second sample in upper and lower triangles, respectively.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

drawTrianglePair \
     $Resdir/Control:Control \
     $Resdir/siRad21:siRad21 \
     -o TrianglePair.$chr \
     -c $chr --start $start --end $end --type $norm -d 5000000 -r $resolution
Alternate

Fig. 4.8 drawTrianglePair#

The black dashed lines and blue circles indicate TADs and loops, respectively.

4.8. plotHiCfeature#

plotHiCfeature can visualize various features values of chromatin folding for multiple samples.

plotHiCfeature [-h] [-o OUTPUT] [-c CHR] [--type TYPE]
               [--distance DISTANCE] [-r RESOLUTION] [-s START]
               [-e END] [--multi] [--multidiff] [--compartment] [--di]
               [--drf] [--drf_right] [--drf_left]
               [--triangle_ratio_multi] [-d VIZDISTANCEMAX] [--v4c]
               [--vmax VMAX] [--vmin VMIN] [--vmax_ratio VMAX_RATIO]
               [--vmin_ratio VMIN_RATIO] [--anchor ANCHOR]
               [--xsize XSIZE]
               [input [input ...]]

positional arguments:
     input                 <Input directory>:<label>

optional arguments:
     -h, --help            show this help message and exit
     -o OUTPUT, --output OUTPUT
                         Output prefix
     -c CHR, --chr CHR     chromosome
     --type TYPE           normalize type (default: SCALE)
     --distance DISTANCE   distance for DI (default: 500000)
     -r RESOLUTION, --resolution RESOLUTION
                         resolution (default: 25000)
     -s START, --start START
                         start bp (default: 0)
     -e END, --end END     end bp (default: 1000000)
     --multi               plot MultiInsulation Score
     --multidiff           plot differential MultiInsulation Score
     --compartment         plot Compartment (eigen)
     --di                  plot Directionaly Index
     --drf                 plot Directional Relative Frequency
     --drf_right           (with --drf) plot DirectionalRelativeFreq (Right)
     --drf_left            (with --drf) plot DirectionalRelativeFreq (Left)
     --triangle_ratio_multi
                         plot Triangle ratio multi
     -d VIZDISTANCEMAX, --vizdistancemax VIZDISTANCEMAX
                         max distance in heatmap
     --v4c                 plot virtual 4C from Hi-C data
     --vmax VMAX           max value of color bar (default: 50)
     --vmin VMIN           min value of color bar (default: 0)
     --vmax_ratio VMAX_RATIO
                         max value of color bar for logratio (default: 1)
     --vmin_ratio VMIN_RATIO
                         min value of color bar for logratio (default: -1)
     --anchor ANCHOR       (for --v4c) anchor point
     --xsize XSIZE         xsize for figure (default: max(length/2M, 10))
  • Input should be “<sample directory>:<label>”.

    • <sample directory> is the output directory by custardpy_juicer.

    • <label> is the label used in the figure.

  • In default, plotHiCfeature uses a 25-kbp bin matrix. Supply -r option to change the resolution.

  • type is the normalization type defined by Juicer (SCALE/KR/VC_SQRT/NONE).

4.8.1. Insulation score#

By default, plotHiCfeature outputs a single insulation score (500 kbp distance).

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -c $chr --start $start --end $end -r $resolution \
     --type $norm -d 5000000 \
     -o IS.$chr.$start-$end
Alternate Text

Fig. 4.9 Insulation score#

plotHiCfeature always draws compartment PC1 (line plot in the second row) to roughly estimate compartments A and B. The third and fourth rows are the heatmap and line plot for the feature value specified (insulation score in this case).

4.8.2. Multi-insulation score#

plotHiCfeature can also calculate a “multi-scale insulation score” [Crane et al., Nature, 2015] by supplying --multi option.

Alternate Text

Fig. 4.10 Schematic illustration of multi-insulation score#

Red regions in the heatmap indicate the insulated regions (TAD boundary-like). The lower and upper sides of the heatmap are 100 kbp to 1 Mbp distances, respectively.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -c $chr --start $start --end $end -r $resolution \
     --multi --type $norm -d 5000000 \
     -o MultiIS.$chr.$start-$end
Alternate Text

Fig. 4.11 Multi-insulation score#

4.8.3. Differential multi-insulation score#

To directory investigate the difference of multi-insulation score, we provide differential multi-insulation score by --multidiff option.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -c $chr --start $start --end $end -r $resolution \
     --multidiff --type $norm -d 5000000 \
     -o MultiISdiff.$chr.$start-$end
Alternate Text

Fig. 4.12 Differential multi-insulation score#

The heatmap shows the difference against the first sample (siCTCF - Control and siRad21 - Control in this case).

4.8.4. Compartment PC1#

While the line plot in the second row shows the PC1 value of the first sample, plotHiCfeature --compartment visualizes the PC1 values for multiple samples. This plot can be used to explore compartment switching.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -c $chr --start $start --end $end -r $resolution \
     --compartment --type $norm -d 5000000 \
     -o Compartment.$chr.$start-$end
Alternate Text

Fig. 4.13 Compartment PC1#

4.8.5. Directionality index#

The directionality index identifies TAD boundaries by capturing the bias in contact frequency up- and downstream of a TAD [Dixon et al., Nature, 2012]. The “left side” and “right side” of a TAD are likely to have positve and negative values, respectively.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -c $chr --start $start --end $end -r $resolution \
     --di --type $norm -d 5000000 \
     -o DI.$chr.$start-$end
Alternate Text

Fig. 4.14 Directionality index#

4.8.6. Directional relative frequency#

Directional relative frequency (DRF) is a score that our group previously proposed [Nakato et al, Nature Communications, 2023]. This score estimates the inconsistency of relative contact frequence (log scale) between the left side (3′) and right side (5′).

Alternate Text

Fig. 4.15 Schematic illustration of directional relative frequency#

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -c $chr --start $start --end $end -r $resolution \
     --drf --type $norm -d 5000000 \
     -o DRF.$chr.$start-$end
Alternate Text

Fig. 4.16 Directional relative frequency#

4.8.7. drawTriangleRatioMulti#

--triangle_ratio_multi visualizes a relative contact frequency (log scale) of 2nd to the last samples against the first sample. Directional relative frequency is also shown.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -o TriangleRatioMulti.$chr \
     -c $chr --start $start --end $end -r $resolution \
     --triangle_ratio_multi --type $norm -d 5000000
Alternate Text

Fig. 4.17 TriangleRatioMulti#

“Right” and “left” shown as blue and orange line plots in the second row indicate the “B” and “A” in Fig. 4.15.

4.8.8. Output the logfold change matrices#

--triangle_ratio_multi has an option --output_logfc_matrix to output the logfold change matrices used for the visualization. This command is useful if you want to do a downstream analysis using the matrices.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -o TriangleRatioMulti.$chr \
     -c $chr --start $start --end $end -r $resolution \
     --triangle_ratio_multi --type $norm -d 5000000 \
     --output_logfc_matrix

4.8.9. Virtual 4C#

Virtual 4C visualizes a 4C-like plot (interation from the anchor site) using Hi-C data. Use --anchor option to specify the anchor site.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotHiCfeature \
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -o virtual4C.$chr \
     -c $chr --start $start --end $end -r $resolution \
     --v4c --anchor 10400000 --vmax 100 --type $norm
Alternate Text

Fig. 4.18 Virtual 4C#

4.9. plotCompartmentGenome#

Plot a PC1 value of multiple samples for the whole genome.

Resdir=CustardPyResults_Hi-C/Juicer_hg38
norm=SCALE
resolution=25000
chr=chr20
start=8000000
end=16000000

plotCompartmentGenome
     $Resdir/Control:Control \
     $Resdir/siCTCF:siCTCF \
     $Resdir/siRad21:siRad21 \
     -o CompartmentGenome -r 25000 --type VC_SQRT
Alternate

Fig. 4.19 plotCompartmentGenome#

4.10. plotInsulationScore#

Plot a line graph of insulation score. The input data is a dense matrix output from makeMatrix_intra.sh.

plotInsulationScore [-h] [--num4norm NUM4NORM] [--distance DISTANCE]
                            [--sizex SIZEX] [--sizey SIZEY]
                            matrix output resolution
Example:
   plotInsulationScore WT/intrachromosomal/25000/observed.KR.chr7.matrix.gz InsulationScore_WT.chr7.png 25000
Alternate

Fig. 4.20 InsulationScore#

4.11. plotMultiScaleInsulationScore#

Plot multi-scale insulation scores from Juicer matrix.

plotMultiScaleInsulationScore [-h] [--num4norm NUM4NORM]
                              [--sizex SIZEX] [--sizey SIZEY]
                              matrix output resolution
Example:
   plotInsulationScore WT/intrachromosomal/25000/observed.KR.chr7.matrix.gz MultiInsulationScore_WT.chr7.png 25000
Alternate

Fig. 4.21 Multi-Insulation Score (chr7)#

4.12. (Jupyter notebook) Plot Contact frequency distribution#

4.13. (Jupyter notebook) Plot Loop length distribution#