load("Report.Rdata")

1 Summary Table

Sequence files below are set as inputs of the pipeline.

Summerized infomation on sequence files has been shown showed below. You can see details in later sections

Item	Value	Reference
Sequence Files Type	paired end (PE)	SE / PE
Original total reads	192.9M
– Reads after adapter removing (ratio)	192.6M (99.82%)	>99%
– – Total mapped reads (ratio of original reads)	187.6M (97.24%)	>95%
– – – Unique locations mapped uniquely by reads	41.8M
– – – Uniquely mappable reads	97.1M
– – – Non-Redundant Fraction (NRF)	0.43	>0.7
– – – Locations with only 1 reads mapping uniquely	25.1M
– – – Locations with only 2 reads mapping uniquely	10.8M
– – – PCR Bottlenecking Coefficients 1 (PBC1)	0.60	>0.7
– – – PCR Bottlenecking Coefficients 2 (PBC2)	2.31	>3
– – – Non-mitochondrial reads (ratio)	82.7M (44.11%)	>70%
– – – – Unique mapped reads (ratio)	61.3M (32.66%)
– – – – – Duplicate removed reads (final for use)	39.5M (21.05%)	>25M
– – – – – – Nucleosome free reads (<100bp)	12.9M (32.77%)
– – – – – – – Total peaks	272083
– – – – – – – Peaks overlaped with union DHS ratio	57.0%
– – – – – – – Peaks overlaped with blacklist ratio	0.2%
– – – – – – Fraction of reads in peaks (FRiP)	30.7%

A nucleosome free region (NFR) must be present. A mononucleosome peak must be present in the fragment length distribution. These are reads that span a single nucleosome, so they are longer than 147 bp but shorter than 147*2 bp. Good ATAC-seq datasets have reads that span nucleosomes (which allows for calling nucleosome positions in addition to open regions of chromatin).

2 Sequence Statistics

2.1 FastQC

Quality control for the sequence data

Click to Visit Report

2.2 Remove adapter

The adapter sequence are shown below. For paired end reads, if adapters were not setted, the adapters below are identified by AdapterRemoval.

Item	Value
adapter1	CTGTCTCTTATACACATCTCCGAGCCCACGAGACTAAAG
adapter2	CTGTCTCTTATACACATCTGACGCTGCCGACGAGTGTAG

The statistic of adapter removing are show below.

Item	Value
Total number of read pairs	192904649
Number of unaligned read pairs	84858773
Number of well aligned read pairs	108045876
Number of discarded mate 1 reads	346944
Number of singleton mate 1 reads	0
Number of discarded mate 2 reads	346944
Number of singleton mate 2 reads	0
Number of reads with adapters[1]	52711764
Number of retained reads	385115410
Number of retained nucleotides	18839676442
Average length of retained reads	48.9196

For detail, you can visit Website of AdapterRemoval on Github.

3 Reads Alignment Statistics

3.1 Bowtie2 alignment log

## [1] "192557705 reads; of these:"                               
## [2] "  192557705 (100.00%) were paired; of these:"             
## [3] "    4969895 (2.58%) aligned concordantly 0 times"         
## [4] "    97110035 (50.43%) aligned concordantly exactly 1 time"
## [5] "    90477775 (46.99%) aligned concordantly >1 times"      
## [6] "97.42% overall alignment rate"

3.2 Library complexity

Item	Value	Reference
Total mapped reads (ratio of original reads)	187.6M (187587810)
Unique locations mapped uniquely by reads	41.8M (41838886)
Uniquely mappable reads	97.1M (97077305)
Non-Redundant Fraction (NRF)	0.43	>0.7
Locations with only 1 reads mapping uniquely	25.1M (25070774)
Locations with only 2 reads mapping uniquely	10.8M (10836857)
PCR Bottlenecking Coefficients 1 (PBC1)	0.60	>0.7
PCR Bottlenecking Coefficients 2 (PBC2)	2.31	>3
The annotation you can see in section 1.

3.3 Filtering statistics

Item	Value	Reference
Original total reads	192.9M (192904649)
Reads after adapter removing (ratio)	192.6M (99.82%, 192557705 / 192904649)	>99%
Total mapped reads (ratio of original reads)	187.6M (97.24%, 187587810 / 192904649)	>95%
– Non-mitochondrial reads (ratio)	82.7M (44.11%, 82737403 / 187587810)	>70%
– – Unique mapped reads (ratio)	61.3M (32.66%, 61264560 / 187587810)	>60%
– – – Duplicate removed reads (ratio final for use)	39.5M (21.05%, 39483933 / 187587810)	>25M,>60%

3.4 Fragment size distribution

library(ggplot2)
load("Report.Rdata")
readsCounts<-getReportVal(atacProcs$fragLenDistr,"readsCounts")
ggplot(readsCounts[1:1000,], aes(length,counts))+geom_path(color="Red")+xlab("Fragment length (bp)")+ylab("Read counts") + theme_bw() + theme(panel.grid =element_blank()) + labs(title = "Fragment Length Distribution") + theme(plot.title = element_text(hjust = 0.5))

library(stats)
strength<-Mod(fft(readsCounts$counts))/length(readsCounts$counts)
periodx<-length(readsCounts$counts)/(1:(length(readsCounts$counts)-1))
strength<-strength[2:length(strength)]

rs1<-as.data.frame(cbind(periodx[periodx<20&periodx>2],strength[periodx<20&periodx>2],0))
rs2<-as.data.frame(cbind(periodx[periodx<400&periodx>2],strength[periodx<400&periodx>2],1))
rs<-rbind(rs1,rs2)
colnames(rs)<-c("period","strength","check")

g1<-ggplot(rs[rs["check"]==0,]) + geom_vline(xintercept = 10.4, linetype=2)+ geom_line(aes(x=period,y=strength),color="Red")+ theme_bw() + theme(panel.grid =element_blank()) + annotate("text", x = 10.4, y = max(rs[rs["check"]==0,2]), label = "10.4bp") +xlab("period") + ylab("strength") + labs(title = "the Pitch of the DNA Helix") + theme(plot.title = element_text(hjust = 0.5))

g2<-ggplot(rs[rs["check"]==1,]) + geom_vline(xintercept = 186, linetype=2)+ geom_line(aes(x=period,y=strength),color="Red")+ theme_bw() + theme(panel.grid =element_blank()) + annotate("text", x = 186, y = max(rs[rs["check"]==1,2]), label = "186bp") +xlab("period") + ylab("strength") + labs(title = "Nucleosome") + theme(plot.title = element_text(hjust = 0.5))
library(gridExtra)
grid.arrange(g1, g2, ncol=2)

3.5 TSS enrichment

The nucleosome free reads (<100bp) and monnucleosome span reads (180~247bp) enrichment around transcription starting site (TSS) are shown below.

library(ggplot2)
load("Report.Rdata")
df<-getReportVal(atacProcs$tssqc100,"tss")
g1<-ggplot(df,aes(pos,counts))+geom_line()+ geom_vline(xintercept = 0, linetype=2)+xlab("upstream<-TSS->downstream")+ylab("reads count")+theme_bw() + theme(panel.grid =element_blank()) + labs(title = "Nucleosome Free Reads") + theme(plot.title = element_text(hjust = 0.5))
df<-getReportVal(atacProcs$tssqc180_247,"tss")
g2<-ggplot(df,aes(pos,counts))+geom_line()+ geom_vline(xintercept = 0, linetype=2)+xlab("upstream<-TSS->downstream")+ylab("reads count")+theme_bw() + theme(panel.grid =element_blank()) + labs(title = "Monnucleosome Span Reads") + theme(plot.title = element_text(hjust = 0.5))
grid.arrange(g1, g2, ncol=2)

4 Peak Statistics

4.1 Blacklist ratio

Item	Value
Total peaks	272083
Blacklist regions	486
Ratio	0.00

4.2 DHS ratio

Item	Value
Total peaks	272083
DHS regions	155209
Ratio	0.57

4.3 Fraction of reads in peaks (FRiP)

Item	Value
The number of reads in peak	3971468
The number of total reads	12940484
The number of total peaks	272083
FRiP	0.31

4.4 Peak Annotation

5 Gene Ontology Analysis

Gene ontology analysis for all genes around peak regions.

ID	Description	GeneRatio	pvalue	qvalue
GO:0061564	axon development	462/15497	1.03e-12	3.72e-09
GO:0007409	axonogenesis	424/15497	1.63e-12	3.72e-09
GO:0048568	embryonic organ development	413/15497	4.7e-12	7.16e-09
GO:0048511	rhythmic process	312/15497	1.35e-11	1.37e-08
GO:0001701	in utero embryonic development	311/15497	1.5e-11	1.37e-08
GO:0042391	regulation of membrane potential	394/15497	2.89e-11	2.2e-08
GO:0022604	regulation of cell morphogenesis	421/15497	4.71e-11	3.08e-08
GO:0051962	positive regulation of nervous system development	462/15497	7.34e-11	4.19e-08
GO:0010975	regulation of neuron projection development	414/15497	8.96e-11	4.3e-08
GO:0001503	ossification	365/15497	9.41e-11	4.3e-08
GO:0016311	dephosphorylation	411/15497	1.18e-10	4.89e-08
GO:0198738	cell-cell signaling by wnt	468/15497	1.57e-10	5.99e-08
GO:0016055	Wnt signaling pathway	466/15497	1.87e-10	6.57e-08
GO:0051961	negative regulation of nervous system development	264/15497	2.51e-10	8.2e-08
GO:0032147	activation of protein kinase activity	319/15497	2.93e-10	8.5e-08

Click to Visit Go Analysis file

6 Footprint Analysis

The following figure is CTCF footprint.

All motif footprint figures are saved as pdf files. The absolute path is /home/zwei/esATAC_test/hg19C/esATAC_pipeline/intermediate_results/Footprint_ATAC_CutSite/FP_PDF_ATAC_CutSite.

7 Annotation of items in table

For single end sequencing data, esATAC will counts reads number.

For paired end sequencing data, esATAC will counts read pairs or fragment number.

Sequence files type is the type of sequencing data: single end data and paired end data. If paired end reads are stored in one file interleavedly rather than two files, “it is call”interleaved.
Original total reads is the sample’s raw total reads (pairs) number.
Reads after adapter removing (ratio) is the reads (pairs) number after adapter removing and the percentage of retained reads in original total reads. The larger value shows the better quality.
Total mapped reads (ratio)
is the reads (pairs) number mapped to reference genome and the percentage of mapped reads in original total reads (alignment rate). ENCODE recommend that the alignment rate, or percentage of mapped reads, should be greater than 95%, though values >80% may be acceptable.
Unique locations mapped uniquely
is the number of distinct uniquely mapping reads (i.e. after removing duplicates).
Non-Redundant Fraction (NRF) is the value of: Unique locations mapped uniquely (the number of positions in the genome that uniquely mappable reads map to) / the total number of uniquely mappable reads.

ENCODE recommend NRF>0.9 for ATAC-seq data.

\(NRF\) value range	Complexity
\(NRF<0.7\)	Concerning
\(0.7\le NRF \le 0.9\)	Acceptable
\(NRF>0.7\)	Ideal

Locations with only 1 reads mapping uniquely is the number of genomic locations where exactly one read maps uniquely.
Locations with only 2 reads mapping uniquely is the number of genomic locations where two reads map uniquely.
PCR Bottlenecking Coefficients 1 (PBC1) is the value of: Locations with only 1 reads mapping uniquely / Unique locations mapped uniquely. ENCODE recommend that PBC1>0.9 for ATAC-seq data.

\(PBC1\) value range	Bottlenecking level
\(PBC1<0.7\)	Severe
\(0.7\le PBC1 \le 0.9\)	Moderate
\(PBC1>0.7\)	None

PCR Bottlenecking Coefficients 2 (PBC2) is the value of: Locations with only 1 reads mapping uniquely / Locations with only 2 reads mapping uniquely. ENCODE recommend that PBC2>3 for ATAC-seq data.

\(PBC2\) value range	Bottlenecking level
\(PBC2<1\)	Severe
\(1\le PBC2 \le 3\)	Moderate
\(PBC2>3\)	None

Non-mitochondrial reads (ratio) is the percentage of non-mitochondrial read in total mapped reads. (mitochondrial reads removed).The larger value shows the better quality.
Unique mapped reads (ratio) is the percentage of non-mitochondrial unique mapped read in total mapped reads (multi-mapped reads removed). The larger value shows the better quality.
Duplicate removed reads (final for use) is the percentage of non-mitochondrial, unique mapped and non-duplicate reads in total mapped reads (duplicate reads removed). These reads are ready to use and storage at final. The larger value shows the better quality.
Nucleosome free reads (<100bp) is the nucleosome free reads reads shorter than 100bp for peak calling
Total peaks is the number of peak called by using nucleosome free reads (<100bp)
Peaks overlaped with union DHS (ratio) is the percentage of called peak overlaped with blacklist. The larger value shows the better quality.
Peaks overlaped with blacklist (ratio) is the percentage of called peak overlaped with blacklist. The smaller value shows the better quality.
Fraction of reads in peaks (FRiP) is the fraction of nucleosome free reads (<100bp) in peak. The larger value shows the better quality.

8 Session Info

sessionInfo()

## R version 3.4.3 (2017-11-30)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux Server 7.2 (Maipo)
## 
## Matrix products: default
## BLAS: /home/software/R/R-3.4.3/lib64/R/lib/libRblas.so
## LAPACK: /home/software/R/R-3.4.3/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.utf8          LC_NUMERIC=C                
##  [3] LC_TIME=en_US.utf8           LC_COLLATE=en_US.utf8       
##  [5] LC_MONETARY=en_US.utf8       LC_MESSAGES=en_US.utf8      
##  [7] LC_PAPER=en_US.utf8          LC_NAME=en_US.utf8          
##  [9] LC_ADDRESS=en_US.utf8        LC_TELEPHONE=en_US.utf8     
## [11] LC_MEASUREMENT=en_US.utf8    LC_IDENTIFICATION=en_US.utf8
## 
## attached base packages:
## [1] stats4    parallel  methods   stats     graphics  grDevices utils    
## [8] datasets  base     
## 
## other attached packages:
##  [1] ChIPseeker_1.14.1                      
##  [2] gridExtra_2.3                          
##  [3] ggplot2_2.2.1                          
##  [4] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
##  [5] GenomicFeatures_1.30.3                 
##  [6] org.Hs.eg.db_3.5.0                     
##  [7] AnnotationDbi_1.40.0                   
##  [8] BSgenome.Hsapiens.UCSC.hg19_1.4.0      
##  [9] BSgenome_1.46.0                        
## [10] rtracklayer_1.38.3                     
## [11] esATAC_1.0.15                          
## [12] ShortRead_1.36.0                       
## [13] GenomicAlignments_1.14.1               
## [14] SummarizedExperiment_1.8.1             
## [15] DelayedArray_0.4.1                     
## [16] matrixStats_0.53.1                     
## [17] Biobase_2.38.0                         
## [18] BiocParallel_1.12.0                    
## [19] Rsamtools_1.30.0                       
## [20] Biostrings_2.46.0                      
## [21] XVector_0.18.0                         
## [22] GenomicRanges_1.30.1                   
## [23] GenomeInfoDb_1.14.0                    
## [24] IRanges_2.12.0                         
## [25] S4Vectors_0.16.0                       
## [26] BiocGenerics_0.24.0                    
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.2             fastmatch_1.1-0            
##   [3] corrplot_0.84               VGAM_1.0-5                 
##   [5] plyr_1.8.4                  igraph_1.1.2               
##   [7] lazyeval_0.2.1              splines_3.4.3              
##   [9] gridBase_0.4-7              TFBSTools_1.16.0           
##  [11] digest_0.6.15               BiocInstaller_1.28.0       
##  [13] htmltools_0.3.6             GOSemSim_2.4.1             
##  [15] viridis_0.5.0               GO.db_3.5.0                
##  [17] gdata_2.18.0                magrittr_1.5               
##  [19] memoise_1.1.0               JASPAR2016_1.6.0           
##  [21] readr_1.1.1                 annotate_1.56.1            
##  [23] R.utils_2.6.0               prettyunits_1.0.2          
##  [25] colorspace_1.3-2            blob_1.1.0                 
##  [27] dplyr_0.7.4                 RCurl_1.95-4.10            
##  [29] jsonlite_1.5                bindr_0.1                  
##  [31] TFMPvalue_0.0.6             brew_1.0-6                 
##  [33] glue_1.2.0                  gtable_0.2.0               
##  [35] zlibbioc_1.24.0             UpSetR_1.3.3               
##  [37] Rook_1.1-1                  scales_0.5.0               
##  [39] DOSE_3.4.0                  futile.options_1.0.0       
##  [41] DBI_0.7                     Rcpp_0.12.15               
##  [43] plotrix_3.7                 viridisLite_0.3.0          
##  [45] xtable_1.8-2                progress_1.1.2             
##  [47] bit_1.1-12                  htmlwidgets_1.0            
##  [49] httr_1.3.1                  DiagrammeR_0.9.2           
##  [51] fgsea_1.4.1                 gplots_3.0.1               
##  [53] RColorBrewer_1.1-2          pkgconfig_2.0.1            
##  [55] XML_3.98-1.9                rJava_0.9-9                
##  [57] R.methodsS3_1.7.1           labeling_0.3               
##  [59] rlang_0.1.6                 reshape2_1.4.3             
##  [61] munsell_0.4.3               tools_3.4.3                
##  [63] visNetwork_2.0.3            downloader_0.4             
##  [65] DirichletMultinomial_1.20.0 RSQLite_2.0                
##  [67] evaluate_0.10.1             stringr_1.2.0              
##  [69] yaml_2.1.16                 knitr_1.19                 
##  [71] bit64_0.9-7                 caTools_1.17.1             
##  [73] purrr_0.2.4                 KEGGREST_1.18.1            
##  [75] bindrcpp_0.2                R.oo_1.21.0                
##  [77] poweRlaw_0.70.1             DO.db_2.9                  
##  [79] biomaRt_2.34.2              compiler_3.4.3             
##  [81] rstudioapi_0.7              rgexf_0.15.3               
##  [83] png_0.1-7                   Rbowtie2_1.0.1             
##  [85] tibble_1.4.2                stringi_1.1.6              
##  [87] highr_0.6                   futile.logger_1.4.3        
##  [89] lattice_0.20-35             CNEr_1.14.0                
##  [91] Matrix_1.2-12               pillar_1.1.0               
##  [93] data.table_1.10.4-3         bitops_1.0-6               
##  [95] qvalue_2.10.0               R6_2.2.2                   
##  [97] latticeExtra_0.6-28         hwriter_1.3.2              
##  [99] RMySQL_0.10.13              KernSmooth_2.23-15         
## [101] lambda.r_1.2                boot_1.3-20                
## [103] gtools_3.5.0                assertthat_0.2.0           
## [105] seqLogo_1.44.0              rprojroot_1.3-2            
## [107] GenomeInfoDbData_1.0.0      hms_0.4.1                  
## [109] influenceR_0.1.0            clusterProfiler_3.6.0      
## [111] VennDiagram_1.6.18          grid_3.4.3                 
## [113] tidyr_0.8.0                 rmarkdown_1.8              
## [115] rvcheck_0.0.9

esATAC Report

Zheng Wei and Wei Zhang

2018-02-14