2. Quickstart#
CustardPy provides the functions for the analysis using Cooler, Juicer, and HiC-Pro pipelines. Here we describe how to run the analysis using these pipelines.
2.1. CustardPy using Docker or Apptainer#
Since CustardPy is provided as a Docker image, you need to use docker or apptainer commands. We recommended to use Apptainer.
The command below is a general example of how to run CustardPy using Docker or Apptainer.
It will mount the /work directory of the host machine to the /work directory of the container.
--gpus all option in docker or --nv option in Apptainer is required to use GPU in the container.
If you do not have a GPU or do not want to use it, simply omit this option.
# For docker
$ docker run --rm -it [--gpus all] -v /work:/work rnakato/custardpy <command>
## Example
$ docker run --rm -it [--gpus all] -v /work:/work rnakato/custardpy \
custardpy_cooler -g $gt -f $genome -b $build -e $enzyme \
-i $index_bwa -p $ncore fastq/$cell $cell
# For Apptainer
$ apptainer exec [--nv] --bind /work custardpy.sif <command>
## Example
$ apptainer exec [--nv] --bind /work custardpy.sif \
custardpy_cooler -g $gt -f $genome -b $build -e $enzyme \
-i $index_bwa -p $ncore fastq/$cell $cell
See also the sample scripts in the CustardPy Tutorial on GitHub.
2.2. Available Genome Builds#
The CustardPy docker image includes restriction enzyme files for the following genome builds, so users can perform Hi-C analysis without preparing them in advance.
Species |
Builds |
Human |
hg38, hg39, T2T |
Mouse |
mm10, mm39 |
Rat |
rn7 |
Zebrafish |
danRer11 |
Chicken |
galGal5, galGal6 |
Xenopus_tropicalis |
xenLae2 |
Fly |
dm6 |
C.elegans |
ce10, ce11 |
S.serevisiae |
sacCer3 |
2.3. Hi-C analysis using Juicer#
2.3.1. Analysis from FASTQ files#
The custardpy_juicer command performs the Hi-C analysis from FASTQ files using Juicer.
This command includes the custardpy_process_hic command described below.
build=hg38 # genome build
gt=genometable.$build.txt # genome_table file
gene=refFlat.$build.txt # gene annotation (refFlat format)
bwaindex=bwa-indexes/$build # BWA index file
ncore=64 # number of CPUs
cell=Control
fastq_post="_" # "_" or "_R"
enzyme=MboI
fqdir=fastq/$cell
custardpy_juicer -p $ncore -a $gene -b $build -g $gt \
-i $bwaindex -e $enzyme -z $fastq_post $fqdir $cell
custardpy_juicerassumes that the paired-end fastq files are stored infastq/$cell(herefastq/Control). The outputs are stored inCustardPyResults/Juicer_$build/$cell.$fastq_postindicates the filename of input fastqs is*_[1|2].fastq.gzor*_[R1|R2].fastq.gz.Available Enzymes: HindIII, DpnII, MboI, Sau3AI, Arima, AluI
2.3.2. Analysis from a .hic file#
If you start the Hi-C analysis from a .hic file, use the custardpy_process_hic command.
build=hg38 # genome build
gt=genometable.$build.txt # genome_table file
gene=refFlat.$build.txt # gene annotation (refFlat format)
ncore=64 # number of CPUs
odir=outputdir
hic=sample.hic
custardpy_process_hic -p $ncore -n $norm -g $gt -a $gene $hic $odir
The outputs are stored in
$odir.
Note
Due to the backward incompatibility of Juicertools , custardpy_process_hic fails with an error when processing .hic files created by older versions of Juicertools. In this case, try the -o option which uses older versions of Juicertools in CustardPy.
2.3.3. Output data#
These commands create the directories Eigen/, InsulationScore/, Matrix/, TAD/, and loops/ in $odir and save the results there.
The
Eigen/directory contains the eigenvectors and compartment structures for each chromosome.The
InsulationScore/directory contains the insulation scores and TAD boundaries for each chromosome.The
Matrix/directory contains the normalized contact matrices (observed and O/E).The
TAD/directory contains the TADs called by ArrowHead.The
loops/directory contains the chromatin loops called by HiCCUPS. If the GPU is unavailable, the chromatin loops will not be calculated andloops/directory will be blank.
The custardpy_juicer command also creates the distance/ directory and saves the distance decay curve plot there.
The fastq/, aligned/, and splits/ directories are direct outputs from Juicer, while several large intermediate files are automatically gzipped.
2.4. Hi-C/Micro-C analysis using Cooler#
Although the Juicer pipeline is widely used for Hi-C analysis, it is not suitable for Micro-C analysis. CustardPy also allows the Hi-C and Micro-C analysis by Cooler and cooltools.
2.4.1. Hi-C Analysis from FASTQ files#
Note
Starting from version 3.0.0, custardpy_cooler_HiC and custardpy_cooler_MicroC were merged into a single command, custardpy_cooler.
The custardpy_cooler command performs the analysis from FASTQ files, generates a .cool file, and converts it to a .hic file.
The outputs are stored in CustardPyResults/Cooler_$build/$cell.
build=hg38
gt=genometable.hg38.txt
index_bwa=bwa-indexes/hg38
gene=refFlat.$build.txt
genome=genome.$build.fa
fastq_post="_" # "_" or "_R"
ncore=64
cell=Control
enzyme=MboI
# Generate .cool and .hic files from FASTQ
custardpy_cooler -g $gt -f $genome -b $build -e $enzyme \
-i $index_bwa -z $fastq_post -p $ncore fastq/$cell $cell
custardpy_coolerassumes that the paired-end fastq files are stored infastq/$cell(herefastq/Control). The outputs are stored inCustardPyResults/Cooler_$build/$cell.$fastq_postindicates the filename of input fastqs is*_[1|2].fastq.gzor*_[R1|R2].fastq.gz.Available Enzymes: HindIII, DpnII, MboI, Sau3AI, Arima, AluI, None (for Micro-C)
2.4.2. Micro-C Analysis from FASTQ files#
For the Micro-C analysis, use -e None not to specify the enzyme in the custardpy_cooler command.
build=mm39
gt=genome_table.$build.txt # genome_table file
bwa_index=bwa-indexes/UCSC-$build
genome=genome.$build.fa
fastq_post="_" # "_" or "_R"
ncore=64
cell=Control # modify this for your FASTQ data
enzyme=None
# Generate .hic file from FASTQ
custardpy_cooler -g $gt -f $genome -b $build -e $enzyme \
-i $index_bwa -z $fastq_post -p $ncore fastq/$cell $cell
2.4.3. Analysis from a .cool file#
The custardpy_process_cool command to analyze a .cool file created by the custardpy_cooler command or any other tools.
odir=CustardPyResults/Cooler_$build/$cell
cool=$odir/cool/cooler.dedup.q30.multires.cool
pair=$odir/pairs/dedup.bwa.q30.pairs.gz
$sing custardpy_process_cool -t $ncore -g $gt -p $pair -f $genome $cool $odir $cell
2.4.4. Analysis from a .hic file#
You can apply the custardpy_process_hic command to the .hic file (see Analysis from a .hic file).
# Downstream analysis using .hic
odir=CustardPyResults/Cooler_$build/$cell
hic=$odir/hic/contact_map.q30.hic
norm=SCALE
custardpy_process_hic -p $ncore -n $norm -g $gt -a $gene $hic $odir
2.4.5. Output data#
custardpy_coolerThe
mapfile/andpairs/directories contain the mapfile (.cram) and pairs (.pairs.gz) files, respectively.The
cool/directory contains the.coolfiles generated by thecustardpy_coolercommand.The
hic/directory contains the.hicfiles generated by thecustardpy_coolercommand.The
qc_report/directory contains the QC report generated by MultiQC.
custardpy_process_coolThe
Eigen/directory contains the eigenvectors, compartment structures and saddle plot profiles.The
InsulationScore/directory contains the insulation scores, TADs and TAD boundaries for each chromosome.The
Matrix/directory contains the contact matrices with the matrix balancing method.The
loops/directory contains the chromatin loops called by cooltools.
2.5. Hi-C analysis using HiC-Pro#
CustardPy also allows the Hi-C analysis by HiC-Pro.
First, edit config-hicpro.txt.
Set the BOWTIE2_IDX_PATH and GENOME_SIZE variables to absolute paths that match your environment.
The CustardPy docker image stores the restriction files for HiC-Pro in /opt/restriction_sites/HiCPro-restriction_sites/.
You can set GENOME_FRAGMENT accordingly without preparing them. An example is shown below.
BOWTIE2_IDX_PATH = /home/youraccount/CustardPy/tutorial/Hi-C/03.HiC-Pro/bowtie2-indexes
REFERENCE_GENOME = hg38
GENOME_SIZE = /home/youraccount/CustardPy/tutorial/Hi-C/03.HiC-Pro/genometable.hg38.txt
GENOME_FRAGMENT = /opt/restriction_sites/HiCPro-restriction_sites/MboI_resfrag_hg38.bed
Then, run the command below to perform the Hi-C analysis from FASTQ files using HiC-Pro.
CustardPy provices the hicpro command to run HiC-Pro.
fqdir=fastq/
hicpro -i $fqdir -o HiCProResults -c config-hicpro.txt
Analysis will be run for all subdirectories in $fqdir.
The output files are the same as that of HiC-Pro. For details, please refer to the official website.