7. Quality check, reproducibility, and comparative analysis
CustardPy provides a script to run 3DChromatin_ReplicateQC, which assesses the quality and reproducibility of Hi-C data. The reproducibility scores can also be used to compare the overall similarity among Hi-C samples.
3DChromatin_ReplicateQC runs several tools, including GenomeDISCO, for quality check and similarity calculation. See the original website for the detailed usage.
Here is a tutorial showing how to use 3DChromatin_ReplicateQC in CustardPy.
The data are the same as those used in Step-by-Step Workflow of Hi-C Analysis.
You can also use the 07.QualityCheck.sh script in the tutorial on GitHub.
7.1. Step-by-step tutorial to run 3DChromatin_ReplicateQC in CustardPy
7.1.1. Set the parameters
Here we compare three Hi-C samples (Control, siCTCF, siRad21).
To save time, this tutorial only uses chromosomes 21 and 22 for calculation.
All results will be output to $outputdir.
build=hg38
gt=genometable.$build.txt
outputdir=3DChromatin_ReplicateQC # output directory
mkdir -p $outputdir
samples="Control siCTCF siRad21" # Samples to be compared
chrs="chr21 chr22" # chromosomes to be considered
resolution=50000
norm=SCALE
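Before going further, it can help to confirm that every chromosome in $chrs is actually present in the genome table. The following is a minimal sketch, not part of CustardPy. It assumes the genome table is a two-column, tab-separated file (chromosome name, length), and it uses a temporary demo table (gt_demo, a hypothetical name) so that it runs on its own. To check the real file, point it at $gt instead.

```shell
# Hypothetical sanity check; gt_demo is a stand-in for the real genome table ($gt).
gt_demo=$(mktemp)
printf 'chr21\t46709983\nchr22\t50818468\n' > "$gt_demo"
chrs="chr21 chr22"

missing=""
for chr in $chrs; do
    # Require an exact match in the first column of the genome table.
    if ! awk -v c="$chr" '$1 == c {found=1} END {exit !found}' "$gt_demo"; then
        missing="$missing $chr"
    fi
done

if [ -n "$missing" ]; then
    echo "not found in genome table:$missing" >&2
else
    echo "all chromosomes found"
fi
rm -f "$gt_demo"
```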
7.1.2. Prepare the metadata and input data
Create metadata.pairs.
pairlist=$outputdir/metadata.pairs
rm -rf $pairlist
for sample1 in $samples; do
    for sample2 in $samples; do
        echo -e $sample1"\t"$sample2 >> $pairlist
    done
done
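The nested loop writes every ordered combination of samples, including each sample paired with itself, so N samples give N × N lines. The sketch below rebuilds the list in a temporary file (pairlist_demo, a hypothetical name) and counts lines and malformed rows. It is an illustrative check only, not part of the CustardPy scripts.

```shell
# Illustrative check of the pair list layout; writes to a temporary file.
samples="Control siCTCF siRad21"
pairlist_demo=$(mktemp)

for sample1 in $samples; do
    for sample2 in $samples; do
        printf '%s\t%s\n' "$sample1" "$sample2"
    done
done > "$pairlist_demo"

# $(( ... )) strips any padding that wc may add on some systems.
n_samples=$(( $(echo $samples | wc -w) ))
n_pairs=$(( $(wc -l < "$pairlist_demo") ))
# Count lines that do not have exactly two tab-separated fields.
n_bad=$(( $(awk -F'\t' 'NF != 2' "$pairlist_demo" | wc -l) ))

echo "samples: $n_samples, pairs: $n_pairs, malformed lines: $n_bad"
```

The same three counts can be taken on the real $pairlist to confirm that no sample name was split or dropped.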
Generate contact data for all samples.
rm -rf $outputdir/data
mkdir -p $outputdir/data
for cell in $samples
do
    hic=CustardPyResults_Hi-C/Juicer_hg38/$cell/aligned/inter_30.hic
    echo "preparing $cell..."
    for chr in $chrs; do
        # Dump intra-chromosomal contacts and reshape them into five columns:
        # chr, bin1, chr, bin2, value
        $sing juicertools.sh dump observed $norm $hic $chr $chr BP $resolution \
            | awk -v chr=$chr '{printf("%s\t%d\t%s\t%d\t%d\n", chr, $1, chr, $2, $3)}' \
            | grep -v NaN > $outputdir/data/$cell.$chr.txt
    done
    # Merge the per-chromosome files into one compressed file per sample
    cat $outputdir/data/$cell.*.txt > $outputdir/data/$cell.res$resolution
    $sing pigz $outputdir/data/$cell.res$resolution
    rm $outputdir/data/$cell.*.txt
done
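The loop above leaves one gzip-compressed file per sample, with five tab-separated columns per line (chromosome, first bin, chromosome, second bin, value). A quick format check can catch an empty or truncated dump before the long run starts. The sketch below is one way to do that. check_contacts is a hypothetical helper, not a CustardPy command, and the demo file holds made-up values.

```shell
# check_contacts is a hypothetical helper, not part of CustardPy.
check_contacts() {
    # Succeeds only if the file is non-empty, every line has five
    # tab-separated fields, and columns 2, 4 and 5 are non-negative integers.
    gzip -dc < "$1" | awk -F'\t' '
        NF != 5 || $2 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ || $5 !~ /^[0-9]+$/ {bad++}
        END {exit (bad > 0 || NR == 0)}'
}

# Demo on a tiny synthetic file (made-up contact values).
demo=$(mktemp)
printf 'chr21\t5000000\tchr21\t5050000\t12\n' | gzip -c > "$demo"

if check_contacts "$demo"; then
    echo "format OK"
else
    echo "format problem"
fi
```

For the real data, the helper would be called on each $outputdir/data/$cell.res$resolution.gz.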
Generate metadata.samples.
samplelist=$outputdir/metadata.samples
rm -rf $samplelist
for cell in $samples; do
    echo -e "$cell\t$(pwd)/$outputdir/data/$cell.res$resolution" >> $samplelist
done
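Note that pigz appends .gz to the contact files, while metadata.samples lists the path without that suffix. The following sketch checks that each listed input exists in either form. check_samplelist is a hypothetical helper, and accepting both spellings is an assumption made here. Whether the suffix matters depends on how the downstream tool opens the file.

```shell
# check_samplelist is a hypothetical helper, not part of CustardPy.
check_samplelist() {
    tab=$(printf '\t')
    status=0
    # Each line is expected to be: sample name <TAB> path to the contact file.
    while IFS="$tab" read -r name path; do
        if [ ! -e "$path" ] && [ ! -e "$path.gz" ]; then
            echo "missing input for $name: $path" >&2
            status=1
        fi
    done < "$1"
    return "$status"
}

# Demo with a temporary directory and an empty placeholder file.
demo_dir=$(mktemp -d)
: > "$demo_dir/Control.res50000.gz"   # stands in for the pigz output
printf 'Control\t%s\n' "$demo_dir/Control.res50000" > "$demo_dir/metadata.samples"

if check_samplelist "$demo_dir/metadata.samples"; then
    echo "all inputs found"
fi
```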
Generate the bin list.
binlist=$outputdir/data/Bins.$resolution.bed
rm -rf $binlist
for chr in $chrs; do
    $sing generate_binlist_from_gtfile.py $gt $chr $resolution >> $binlist
done
gzip -f $binlist
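As a rough cross-check, the number of bins to expect can be estimated from the chromosome lengths. The sketch below assumes fixed-width, non-overlapping bins that start at position 0, with a shorter final bin, which gives ceil(length / resolution) bins per chromosome. The exact output of generate_binlist_from_gtfile.py may differ from this assumption. The genome table here is a temporary demo file (gt_demo, a hypothetical name).

```shell
# Estimate the bin count under the stated assumption (ceil(length / resolution)).
gt_demo=$(mktemp)
printf 'chr21\t46709983\nchr22\t50818468\n' > "$gt_demo"
resolution=50000
chrs="chr21 chr22"

expected_bins=0
for chr in $chrs; do
    # Integer ceiling division: (length + resolution - 1) / resolution
    n=$(awk -v c="$chr" -v r="$resolution" \
        '$1 == c {print int(($2 + r - 1) / r)}' "$gt_demo")
    expected_bins=$((expected_bins + n))
done

echo "expected bins: $expected_bins"
rm -f "$gt_demo"
```

The estimate can then be compared with the line count of the compressed bin list, for example with gzip -dc $binlist.gz | wc -l.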
7.1.3. Run 3DChromatin_ReplicateQC
run_3DChromatin_ReplicateQC.sh run_all runs all tools and outputs the results to $outputdir/output.
run_3DChromatin_ReplicateQC.sh run_all \
--metadata_samples $samplelist --bins $binlist.gz --metadata_pairs $pairlist --outdir $outputdir/output
7.1.4. Plot figures from the output
visualize_QC.py plots figures for each tool. The PDF files are output to 3DChromatin_ReplicateQC/pdf.
visualize_QC.py 3DChromatin_ReplicateQC/