1 Introduction

This page is to exhibit examples and results for 4 preset pipeline in esATAC pakcage. For each pipeline, the toy example in the package document and at least one example run on representative dataset will be shown. In the examples, the directory for storing genome annotation is “~/ref”. Only report files(HTML files and their relative files) will be linked and shown.

2 Single Sample

2.1 Example in esATAC package

2.1.1 Dataset

The datasets in this package (esATAC/extdata/) are generated from GEO: SRR891271 (from GSE47753)[1]. The data is ATAC-seq paired end sequencing for GM12878 cell line. We random sampling 20000 mapped fragments from chr20 and rebuild raw paired-end FASTQ files(file names with chr20 prefix).

2.1.2 Script

Download the script

library(esATAC)
conclusion <-
  atacPipe(
       # MODIFY: Change these paths to your own case files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
       fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"),
       # MODIFY: Set the genome for your data
       genome = "hg19",
       refdir = "~/ref",
       motifPWM = getMotifPWM(motif.file = system.file("extdata", "CTCF.txt", package="esATAC"), is.PWM = FALSE))

2.2 Example of GM12878/hg19 from GEO

2.2.1 Data Description

The real dataset is from GEO (accession number GSE47753, Buenrostro, et al., 2013).We downloaded SRR891268.sra which is one of four replicates of ATAC-seq paired end sequencing data from human GM12878 cell line containing nearly 400M paired end reads in total.

2.2.2 Data Preprocess

And then, we used NCBI SRA Toolkit to extract FASTQ files with command like fastq-dump --split-3 SRR8912XX.sra. Two FASTQ files will be generated (SRR891268_1.fastq, SRR891268_2.fastq) under current working directory.

2.2.3 R Scripts

Download the script

library(esATAC)

conclusion <- 
  atacPipe(
    fastqInput1 = "SRR891268_1.fastq",
    fastqInput2 = "SRR891268_2.fastq",
    genome = "hg19",
    threads = 28,
    refdir = "~/ref")

3 Case and Control Single Sample

3.1 Example in esATAC package

3.1.1 Dataset

The datasets in this package (esATAC/extdata/) are generated from GEO: SRR891271 (from GSE47753)[1]. The data is ATAC-seq paired end sequencing for GM12878 cell line. We random sampling 20000 mapped fragments from chr20 and rebuild raw paired-end FASTQ files(file names with chr20 prefix).

3.1.2 Script

Download the script

library(esATAC)
conclusion <-
   atacPipe2(
       # MODIFY: Change these paths to your own case files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       case=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
                fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")),
       # MODIFY: Change these paths to your own control files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       control=list(fastqInput1 = system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),
                    fastqInput2 = system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
       # MODIFY: Set the genome for your data
       genome = "hg19",
       refdir = "~/ref",
       motifPWM = getMotifPWM(motif.file = system.file("extdata", "CTCF.txt", package="esATAC"), is.PWM = FALSE))

3.2 Example of T cells/mm10 from GEO

3.2.1 Dataset Description

The dataset is from GEO((assertion number GSE88987). All of them are from mouse. We can use GSM2356780 as case and GSM2356795 as control to run esATAC._

3.2.2 Data preprocess

Download raw data from GEO(assertion number GSE88987 - GSM2356780: SRR4435490.sra) and (GSM2356795: SRR4435505.sra)).And then, Use NCBI SRA Toolkit to extract fastq files with command like fastq-dump –split-3 SRR44354XX.sra. Four files will be generated (SRR4435490_1.fastq, SRR4435490_2.fastq, SRR4435505_1.fastq, SRR4435505_2.fastq).

3.2.3 Script

Download the script

library(esATAC)
conclusion2 <-
    atacPipe2(
        case=list(fastqInput1 = "./SRR4435490_1.fastq", #case mate1 file at current working directory
                  fastqInput2 = "./SRR4435490_2.fastq"), #case mate2 file at current working directory
        control=list(fastqInput1 = "./SRR4435505_1.fastq", #control mate1 file at current working directory
                     fastqInput2 = "./SRR4435505_2.fastq"), #control mate2 file at current working directory
        genome = "mm10", #hg19/hg38/mm9/mm10
        refdir = "~/ref",
        motifPWM = getMotifPWM(motif.file = system.file("extdata", "CTCF.txt", package="esATAC"), is.PWM = FALSE)) #scan single motif CTCF

4 Sample with Replicates

4.1 Example in esATAC Package

4.1.1 Dataset

The datasets in this package (esATAC/extdata/) are generated from GEO: SRR891271 (from GSE47753)[1]. The data is ATAC-seq paired end sequencing for GM12878 cell line. We random sampling 20000 mapped fragments from chr20 and rebuild raw paired-end FASTQ files(file names with chr20 prefix).

4.1.2 Script

Download the script

library(esATAC)

conclusion <-
  atacRepsPipe(
       # MODIFY: Change these paths to your own case files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       fastqInput1 = list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
                          system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")),
       fastqInput2 = list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"),
                          system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
       # MODIFY: Set the genome for your data
       genome = "hg19",
       refdir = "~/ref",
       motifPWM = getMotifPWM(motif.file = system.file("extdata", "CTCF.txt", package="esATAC"), is.PWM = FALSE))

4.2 Example of GM12878/hg19 from GEO

4.2.1 Data Description

To test on real data from GEO (accession number GSE47753, Buenrostro, et al., 2013), we downloaded SRR891268.sra,SRR891269.sra,SRR891270.sra andSRR891271.sra. They are 4 replicates of ATAC-seq paired end sequencing data from human GM12878 cell line containing nearly 400M paired end reads in total.

4.2.2 Data Preprocess

And then, we used NCBI SRA Toolkit to extract FASTQ files with command like fastq-dump --split-3 SRR8912XX.sra. Eight FASTQ files will be generated (SRR891268_1.fastq, SRR891268_2.fastq, SRR891269_1.fastq, SRR891269_2.fastq, SRR891270_1.fastq, SRR891270_2.fastq, SRR891271_1.fastq, SRR891271_2.fastq) under current working directory.

4.2.3 Script

Download the script

options(java.parameters = "-Xmx8000m")
library(esATAC)

conclusion <-
  atacRepsPipe(
    fastqInput1 = list("SRR891268_1.fastq", "SRR891269_1.fastq", "SRR891270_1.fastq", "SRR891271_1.fastq"),
    fastqInput2 = list("SRR891268_2.fastq", "SRR891269_2.fastq", "SRR891270_2.fastq", "SRR891271_2.fastq"),
    genome = "hg19",
    threads = 28)

5 Case and Control Sample with Replicates

5.1 Example in esATAC package

5.1.1 Dataset

The datasets in this package (esATAC/extdata/) are generated from GEO: SRR891271 (from GSE47753)[1]. The data is ATAC-seq paired end sequencing for GM12878 cell line. We random sampling 20000 mapped fragments from chr20 and rebuild raw paired-end FASTQ files(file names with chr20 prefix).

5.1.2 Script

Download the script

library(esATAC)

conclusion <-
    atacRepsPipe2(
        # MODIFY: Change these paths to your own case files!
        # e.g. fastqInput1 = "your/own/data/path.fastq"
     caseFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz"),
                          system.file(package="esATAC", "extdata", "chr20_1.1.fq.gz")),
     # MODIFY: Change these paths to your own case files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     caseFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz"),
                          system.file(package="esATAC", "extdata", "chr20_2.1.fq.gz")),
     # MODIFY: Change these paths to your own control files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     ctrlFastqInput1=list(system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2"),
                          system.file(package="esATAC", "extdata", "chr20_1.2.fq.bz2")),
     # MODIFY: Change these paths to your own control files!
     # e.g. fastqInput1 = "your/own/data/path.fastq"
     ctrlFastqInput2=list(system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2"),
                          system.file(package="esATAC", "extdata", "chr20_2.2.fq.bz2")),
     # MODIFY: Set the genome for your data
     genome = "hg19",
     refdir = "~/ref",
     motifPWM = getMotifPWM(motif.file = system.file("extdata", "CTCF.txt", package="esATAC"), is.PWM = FALSE))

5.2 Example of T cells/mm10 from GEO

5.2.1 Dataset Description

The dataset is from GEO((assertion number GSE88987). All of them are from mouse. We can use GSM2356780, GSM2356781 as case and GSM2356795, GSM2356796 as control to run esATAC.

5.2.2 Data preprocess

Download raw data from GEO(assertion number GSE88987 - (GSM2356780: SRR4435490.sra), (GSM2356781: SRR4435491.sra), (GSM2356795: SRR4435505.sra) and (GSM2356796: SRR4435506.sra)).And then, Use NCBI SRA Toolkit to extract fastq files with command like fastq-dump –split-3 SRR44354XX.sra. Eight files will be generated (SRR4435490_1.fastq, SRR4435490_2.fastq,SRR4435491_1.fastq, SRR4435491_2.fastq, SRR4435505_1.fastq, SRR4435505_2.fastq, SRR4435506_1.fastq, SRR4435506_2.fastq).

5.2.3 Script

Download the script

library(esATAC)
conclusion <-
   atacRepsPipe2(
       # MODIFY: Change these paths to your own case files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       caseFastqInput1=list("./SRR4435490_1.fastq",
                            "./SRR4435491_1.fastq"),
       # MODIFY: Change these paths to your own control files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       caseFastqInput2=list("./SRR4435490_2.fastq",
                            "./SRR4435491_2.fastq"),
       # MODIFY: Change these paths to your own case files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       ctrlFastqInput1=list("./SRR4435505_1.fastq",
                            "./SRR4435506_1.fastq"),
       # MODIFY: Change these paths to your own control files!
       # e.g. fastqInput1 = "your/own/data/path.fastq"
       ctrlFastqInput2=list("./SRR4435505_2.fastq",
                            "./SRR4435506_2.fastq"),
       # MODIFY: Set the genome for your data
       genome = "mm10",
       refdir = "~/ref",
       threads=28,
       motifPWM = getMotifPWM(motif.file = system.file("extdata", "CTCF.txt", package="esATAC"), is.PWM = FALSE))

6 Reference

[1] Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y., & Greenleaf, W. J. (2013). Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nature methods, 10(12), 1213-1218.

[2] Mognol GP, Spreafico R, Wong V, Scott-Browne JP et al. Exhaustion-associated regulatory regions in CD8+ tumor-infiltrating T cells. Proc Natl Acad Sci U S A 2017 Mar 28;114(13):E2776-E2785. PMID: 28283662