7. Quality check, reproducibility, and comparative analysis

CustardPy provides a script to run 3DChromatin_ReplicateQC, which assesses the quality and reproducibility of Hi-C data. The reproducibility scores can also be used to compare the overall similarity among Hi-C samples.

3DChromatin_ReplicateQC runs QuASAR-QC, QuASAR-Rep, GenomeDISCO, HiCRep, and HiC-Spector for quality checks and similarity calculation. See the original website for detailed usage.

This tutorial shows how to use 3DChromatin_ReplicateQC in CustardPy. The data are the same as those used in Step-by-Step Workflow of Hi-C Analysis. You can also use the 07.QualityCheck.sh script in the tutorial on GitHub.

7.1. Step-by-step tutorial to run 3DChromatin_ReplicateQC in CustardPy

7.1.1. Set the parameters

Here we compare three Hi-C samples (Control, siCTCF, siRad21). To save time, this tutorial only uses chromosomes 21 and 22 for calculation. All results will be output to $outputdir.

build=hg38
gt=genometable.$build.txt

outputdir=3DChromatin_ReplicateQC  # output directory
mkdir -p $outputdir

samples="Control siCTCF siRad21" # Samples to be compared

chrs="chr21 chr22" # chromosomes to be considered
resolution=50000
norm=SCALE
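
Before going further, it can help to confirm that every chromosome in $chrs is actually listed in the genome table. The following is a minimal sketch, not part of CustardPy: check_chrs is a hypothetical helper, and it assumes the genome table has the chromosome name in its first column and the length in the second.

```shell
# Hypothetical helper (not part of CustardPy): verify that each requested
# chromosome appears in the first column of a genome table.
# Assumed table layout: <chromosome> <length>, whitespace-delimited.
check_chrs() {
    local gt="$1"; shift
    local chr missing=0
    for chr in "$@"; do
        # awk exits 0 only if a row with this chromosome name was seen
        if ! awk -v c="$chr" '$1 == c { found = 1 } END { exit !found }' "$gt"; then
            echo "missing in $gt: $chr" >&2
            missing=1
        fi
    done
    return $missing
}
```

With the variables defined above, it could be called as `check_chrs "$gt" $chrs || exit 1`.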

7.1.2. Prepare the metadata and input data

Create metadata.pairs, which lists the sample pairs to compare (one tab-separated pair per line).

pairlist=$outputdir/metadata.pairs
rm -rf $pairlist
for sample1 in $samples; do
    for sample2 in $samples; do
        echo -e $sample1"\t"$sample2 >> $pairlist
    done
done
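
The nested loop enumerates every ordered pair, including each sample paired with itself, so three samples yield nine lines. The sketch below reproduces the enumeration in a temporary file (a stand-in for $outputdir/metadata.pairs, chosen here only for illustration) and counts the result. It uses printf instead of echo -e, which behaves the same way across shells.

```shell
# Stand-alone illustration of the pair enumeration; the temporary file is a
# hypothetical stand-in for $outputdir/metadata.pairs.
samples="Control siCTCF siRad21"
pairlist=$(mktemp)
: > "$pairlist"
for sample1 in $samples; do
    for sample2 in $samples; do
        printf '%s\t%s\n' "$sample1" "$sample2" >> "$pairlist"
    done
done

# $(( ... )) strips any padding that wc adds on some systems
n_samples=$(( $(echo $samples | wc -w) ))
n_pairs=$(( $(wc -l < "$pairlist") ))
echo "$n_samples samples, $n_pairs pairs"
```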

Generate contact data for all samples.

rm -rf $outputdir/data
mkdir -p $outputdir/data
for cell in $samples
do
    hic=CustardPyResults_Hi-C/Juicer_hg38/$cell/aligned/inter_30.hic

    echo "preparing $cell..."
    for chr in $chrs; do
        # Drop NaN rows first, then emit: chr, pos1, chr, pos2, value
        # (%d truncates the value to an integer)
        $sing juicertools.sh dump observed $norm $hic $chr $chr BP $resolution \
        | grep -v NaN \
        | awk -v chr=$chr '{printf("%s\t%d\t%s\t%d\t%d\n", chr, $1, chr, $2, $3)}' \
        > $outputdir/data/$cell.$chr.txt
    done

    cat $outputdir/data/$cell.*.txt > $outputdir/data/$cell.res$resolution
    $sing pigz $outputdir/data/$cell.res$resolution
    rm $outputdir/data/$cell.*.txt
done
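
To see what the reformatting step does, here is a small sketch that runs the same awk conversion on mock input. The three input lines imitate the three-column output of dump (two bin positions and a value); the values are made up for illustration, and mock_dump is a hypothetical stand-in for the juicertools.sh call. The NaN row is removed before formatting so that the result does not depend on how a particular awk implementation converts NaN to an integer.

```shell
# Hypothetical stand-in for "juicertools.sh dump": <pos1> <pos2> <value>
# (made-up values, one of them NaN)
mock_dump() {
    printf '0\t0\t12.7\n0\t50000\tNaN\n50000\t50000\t3.2\n'
}

# Drop NaN rows, then emit five columns: chr, pos1, chr, pos2, value.
# %d truncates the value, so 12.7 becomes 12.
converted=$(mock_dump \
    | grep -v NaN \
    | awk -v chr=chr21 '{printf("%s\t%d\t%s\t%d\t%d\n", chr, $1, chr, $2, $3)}')
echo "$converted"
```

Two rows remain after filtering, each with the chromosome name repeated in the first and third columns.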

Generate metadata.samples.

samplelist=$outputdir/metadata.samples
rm -rf $samplelist
for cell in $samples; do
    # pigz has compressed the contact file, so the path on disk ends in .gz
    echo -e "$cell\t$(pwd)/$outputdir/data/$cell.res$resolution.gz" >> $samplelist
done
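
Because metadata.samples stores absolute paths, a mistyped or stale path only surfaces once the tools start running. The sketch below checks that every listed file exists and is non-empty. check_sample_files is a hypothetical helper, not a CustardPy command, and it assumes the two-column tab-separated layout written by the loop above.

```shell
# Hypothetical helper (not part of CustardPy): confirm that each file listed
# in a two-column, tab-separated sample list exists and is non-empty.
check_sample_files() {
    local list="$1" name path status=0
    local tab; tab=$(printf '\t')
    while IFS=$tab read -r name path; do
        [ -n "$name" ] || continue          # skip blank lines
        if [ ! -s "$path" ]; then
            echo "sample $name: missing or empty file $path" >&2
            status=1
        fi
    done < "$list"
    return $status
}
```

It could be called as `check_sample_files "$samplelist" || exit 1` before starting the run.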

Generate the bin list.

binlist=$outputdir/data/Bins.$resolution.bed
rm -rf $binlist
for chr in $chrs; do
    $sing generate_binlist_from_gtfile.py $gt $chr $resolution >> $binlist
done
gzip -f $binlist
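
For orientation, the sketch below shows one way fixed-width bins can be derived from a genome table. It is not the implementation of generate_binlist_from_gtfile.py, whose exact output may differ. make_bins is a hypothetical function, and the four-column layout (chromosome, start, end, and a bin name equal to the start position) is an assumption based on the BED-style bin files that 3DChromatin_ReplicateQC accepts.

```shell
# Hypothetical sketch (not the CustardPy script): emit fixed-width bins for
# one chromosome as <chr> <start> <end> <name>, with name = start.
# $1: genome table (<chromosome> <length>), $2: chromosome, $3: bin size (bp)
make_bins() {
    awk -v chr="$2" -v res="$3" 'BEGIN { OFS = "\t" }
        $1 == chr {
            for (s = 0; s < $2; s += res) {
                # clip the last bin to the chromosome length
                e = (s + res < $2) ? s + res : $2
                print chr, s, e, s
            }
        }' "$1"
}
```

For a chromosome of length 120000 and a bin size of 50000, this yields three bins, the last one ending at 120000.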

7.1.3. Run 3DChromatin_ReplicateQC

run_3DChromatin_ReplicateQC.sh run_all runs all tools and writes the results to $outputdir/output.

run_3DChromatin_ReplicateQC.sh run_all \
    --metadata_samples $samplelist --bins $binlist.gz --metadata_pairs $pairlist --outdir $outputdir/output

7.1.4. Plot figures from the output

visualize_QC.py plots figures for each tool. The PDF files are written to 3DChromatin_ReplicateQC/pdf.

visualize_QC.py 3DChromatin_ReplicateQC/