Question

FourCseq: getZScores errors, first case 6 digenerated cutterm second case 4 base cutter

0

Entering edit mode

Theo ▴ 10

@theodoregeorgomanolis-7993

Last seen 6 months ago

Germany

Hi Felix,

Hi to all,

I am getting trouble at the getZscores step

First situation
> fcf<-getZScores(fc19)
[1] "SAMD4A_ApoI"
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates
Error in getZScores(fc19) :
  Failed to estimate the parameters of the Variance stabilizing transformation.

Second Situation

> fcf<-getZScores(fc19, fitFun = "distFitMonotone")
[1] "SAMD4A_NlaIII"
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  :
  every gene contains at least one zero, cannot compute log geometric means
>

My code is as follows

metaData19 <- list(projectPath = "fastq/FourCSeq_Analysis",
                   fragmentDir = "fastq/FourCSeq_Analysis",
                   referenceGenomeFile=referenceGenomeFile,
                   reSequence1 = "RAATTY",
                   reSequence2 = "GATC",
                   primerFile = primerFile,
                   bamFilePath = bamFilePath)
colData <- DataFrame(viewpoint = "SAMD4A_ApoI",
                     condition = factor(rep(c("Con1","Con2"),each=2),
                                        levels = c("Con1","Con2")),
                     replicate=rep(c(1,2),
                                   2),
                     bamFile = c("Con1_rep1.bam",
                                 "Con1_rep2.bam",
                                 "Con2_rep1.bam",
                                 "Con2_rep2.bam"),
                     sequencingPrimer="first")
fc19<-FourC(colData,metaData19)
fc19<-addFragments(fc19)
fc19<-countFragmentOverlaps(fc19)
fc19<- combineFragEnds(fc19)
fc19<-smoothCounts(fc19)
fcf<-getZScores(fc19)

In the final getZscores I used various options for the fitFun = "distFitMonotoneSymmetric" as mentioned at the help file for the function, "local" or "mean" with the same output, for both error codes.

Is there any Hope???

General Information

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8     
[8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] splines   parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.25.15                 AnnotationDbi_1.35.4                    FourCSeq_1.7.2                        
[5] LSD_3.0                                 DESeq2_1.13.8                           SummarizedExperiment_1.3.7              Biobase_2.33.0                        
[9] ggplot2_2.1.0                           GenomicRanges_1.25.9                    GenomeInfoDb_1.9.4                      IRanges_2.7.11                        
[13] S4Vectors_0.11.9                        BiocGenerics_0.19.2                    

loaded via a namespace (and not attached):
[1] httr_1.2.1                    AnnotationHub_2.5.5           gtools_3.5.0                  Formula_1.2-1                 shiny_0.13.2                  interactiveDisplayBase_1.11.3
[7] latticeExtra_0.6-28           RBGL_1.49.1                   BSgenome_1.41.2               Rsamtools_1.25.0              RSQLite_1.0.0                 lattice_0.20-33             
[13] biovizBase_1.21.0             chron_2.3-47                  digest_0.6.9                  RColorBrewer_1.1-2            XVector_0.13.6                colorspace_1.2-6            
[19] ggbio_1.21.2                  htmltools_0.3.5               httpuv_1.3.3                  Matrix_1.2-6                  plyr_1.8.4                    OrganismDbi_1.15.1          
[25] XML_3.98-1.4                  biomaRt_2.29.2                fda_2.4.4                     genefilter_1.55.2             zlibbioc_1.19.0               xtable_1.8-2                
[31] scales_0.4.0                  BiocParallel_1.7.4            annotate_1.51.0               nnet_7.3-12                   survival_2.39-5               magrittr_1.5                
[37] mime_0.5                      GGally_1.2.0                  foreign_0.8-66                graph_1.51.0                  BiocInstaller_1.23.6          tools_3.3.1                 
[43] data.table_1.9.6              stringr_1.0.0                 munsell_0.4.3                 locfit_1.5-9.1                cluster_2.0.4                 ensembldb_1.5.9             
[49] Biostrings_2.41.4             grid_3.3.1                    RCurl_1.95-4.8                dichromat_2.0-0               VariantAnnotation_1.19.8      bitops_1.0-6                
[55] gtable_0.2.0                  DBI_0.4-1                     reshape_0.8.5                 reshape2_1.4.1                R6_2.1.2                      GenomicAlignments_1.9.6     
[61] gridExtra_2.2.1               rtracklayer_1.33.10           Hmisc_3.17-4                  stringi_1.1.1                 Rcpp_0.12.6                   geneplotter_1.51.0          
[67] rpart_4.1-10                  acepack_1.3-3.3

fourcseq • 1.9k views

ADD COMMENT • link 8.4 years ago Theo ▴ 10

Wolfgang Huber · Answer 1 · 2016-07-23

Hi Theodore,

how do the scatter plots between your replicates look like? Do the libraries show sufficient agreement or are they quite different?

As a straightforward approach you can try to increase the number in minCount. This will set aside low count fragments, which tend to have a higher noise levels, and might help you with the analysis.

In the second situation: for every fragment there is one library which does not have any count. A possible explanation is that one library does not contain any counts at all. You can test this with the following command:

apply(counts(fc), 2, sum)

Best regards,

Felix

score 0 · Answer 2 · 2016-07-26

0

Entering edit mode

Theo ▴ 10

@theodoregeorgomanolis-7993

Last seen 6 months ago

Germany

Hi, Felix

here is the scatterplot of the two replicates
https://drive.google.com/open?id=0ByX6rNLnP_9qbTNtQWIya244Q1U
https://drive.google.com/open?id=0ByX6rNLnP_9qemc1REM2YmRJVTQ

So, I change the minCount on the getZscore function and I fount that the minCount in my case is 2390

> fcf<-getZScores(fc19, minCount = 2390)
[1] "SAMD4A_ApoI"
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates

Results
Iter.   PENSSE   Grad Length Intercept   Slope
0        0.0045      2e-04      19.8126      -0.0093
1        0.0027      0      20.0663      -0.0031
Results
Iter.   PENSSE   Grad Length Intercept   Slope
0        0.0017      1e-04      19.7234      -0.0045
1        0.0011      0      19.8851      -3e-04
 Results
Iter.   PENSSE   Grad Length Intercept   Slope
0        0.0028      1e-04      19.7559      -0.0062
1        0.0018      1e-04      19.966      -3e-04
 Results
Iter.   PENSSE   Grad Length Intercept   Slope
0        0.0011      0      19.6941      -0.0037
1        7e-04      0      19.8193      -5e-04

The question I got is that if I mix that dataset with an other one that has a different minCount expectancy I will be forced to use the high one. Will the DEseq2 (ANOVA test if I am not mistaken) produce trustworthy results?

Best

Theodore

ADD COMMENT • link 8.4 years ago Theo ▴ 10

0

Entering edit mode

Hello Theodore, this seems like a quite high number. How many fragments are actually left after you used this threshold? You should definetly double check the results in this case because you probably remove quite a lot of fragments. Especially watch out if you have a switch in the interaction differences, e.g. away from the viewpoint all interactions of cond 1 are stronger and close to the viewpoint all interactions of cond 2. This would hint to a problem with the normalization. Otherwise the results should be correct given the inputs. Best regards, Felix

ADD REPLY • link 8.4 years ago felix.klein ▴ 150

0

Entering edit mode

Hey Felix,

SO, I tried to mix the data set with an other dataset and I could lower significantly the minCount to 100, which is reasonable. Scatter plot looks the same when I am comparing the same datasets, but I was wondering if I will get more false "significant" contacts.

the QQ plots are looking different though from using only the dataset that required a high minCount to fit a model.

"Especially watch out if you have a switch in the interaction differences"
I do not get that part, could you please elaborate?

Thank you for your time and your help

Theodoros

"You should definetly double check the results in this case because you probably remove quite a lot of fragments"
Any suggestions on how I can procced with that?

ADD REPLY • link 8.4 years ago Theo ▴ 10

0

Entering edit mode

Hello Theodoros, there should be not much more false "significat" contacts if you change the threshold within a reasonable range. What I meant with the switch in interaction differences is the following: If you use the plotDifferences function and see a clear trend that the log fold change is always above 0 around the viewpoint and below 0 after a certain distance away from the viewpoint (or vice-versa) than this means that there is a problem with the normalization of the libraries. In detail the domains away from the viewpoint and close to the viewpoint show differences between the libraries which make up the observed differences. Best regards, Felix On 07/27/2016 06:24 PM, theodore.georgomanolis [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User theodore.georgomanolis <https: support.bioconductor.org="" u="" 7993=""/> > wrote Comment: FourCseq: getZScores errors, first case 6 digenerated > cutterm second case 4 base cutter > <https: support.bioconductor.org="" p="" 85332="" #85488="">: > > Hey Felix, > > SO, I tried to mix the data set with an other dataset and I could > lower significantly the |minCount| to 100, which is reasonable. > Scatter plot looks the same when I am comparing the same datasets, but > I was wondering if I will get more false "significant" contacts. > > the QQ plots are looking different though from using only the dataset > that required a high |minCount| to fit a model. > > "Especially watch out if you have a switch in the interaction differences" > I do not get that part, could you please elaborate? > > Thank you for your time and your help > > Theodoros > > "You should definetly double check the results in this case because > you probably remove quite a lot of fragments" > Any suggestions on how I can procced with that? > > ------------------------------------------------------------------------ > > Post tags: fourcseq > > You may reply via email or visit > C: FourCseq: getZScores errors, first case 6 digenerated cutterm second case 4 base >

ADD REPLY • link 8.4 years ago felix.klein ▴ 150

score 0 · Answer 3 · 2016-07-28

Regarding the same set, I encounter another problem while I was

fcf <- getDifferences(fcf, referenceCondition="0min")

the output was

> fcf <- getDifferences(fcf,
+ referenceCondition="0min")
[1] SAM_ApoI
Levels: SAM_ApoI
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates

I

MY whole commands till that point are:

library(FourCSeq)
library(GenomicFeatures)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(S4Vectors)

referenceGenomeFile = "/media/trotos/4TB/WORK/UCSC/hg19/Sequence/WholeGenomeFasta/hg19_genome.fa"
bamFilePath = "/media/trotos/4TB/WORK/archive/Native4C/fastq/bam_files/SAMD4a/ApoI"

primerFile = "/media/trotos/4TB/WORK/archive/Native4C/fastq/FourCSeq_Analysis/primer_file"
metaData19 <- list(projectPath = "/media/trotos/4TB/WORK/archive/Native4C/fastq/FourCSeq_Analysis",
                   fragmentDir = "/media/trotos/4TB/WORK/archive/Native4C/fastq/FourCSeq_Analysis",
                   referenceGenomeFile=referenceGenomeFile,
                   reSequence1 = "RAATTY",
                   reSequence2 = "GATC",
                   primerFile = primerFile,
                   bamFilePath = bamFilePath)

colData<-DataFrame(read.csv("/media/trotos/4TB/WORK/archive/Native4C/fastq/bam_files/FourCSeq.csv"))

fc19<-FourC(colData,metaData19)
fc19<-addFragments(fc19)

colData(fc19)$chr = "chr14"
colData(fc19)$start =55032526
colData(fc19)$end =55035790

fc19 <- countFragmentOverlaps(fc19, trim=0, minMapq=10)
fc19 <- combineFragEnds(fc19)

fc19<-smoothCounts(fc19)

fcf <-getZScores(fc19[,c("SAM_ApoI_0min_1",
                                "SAM_ApoI_0min_2",
                                "SAM_ApoI_0min_3",
                                "SAM_ApoI_60min_1",
                                "SAM_ApoI_60min_2",
                                "SAM_ApoI_60min_3")],
                        minCount = 1845)

zScore<-assay(fcf,"zScore")

fcf<-addPeaks(fcf,zScoreThresh = 3, fdrThresh = 0.05)

fcf <- getDifferences(fcf,
                      referenceCondition="0min")