FourCseq: getZScores errors, first case 6 digenerated cutterm second case 4 base cutter
3
0
Entering edit mode
@theodoregeorgomanolis-7993
Last seen 14 months ago
Germany

Hi Felix,

Hi to all,

I am getting trouble at the getZscores step

First situation
> fcf<-getZScores(fc19)
[1] "SAMD4A_ApoI"
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates
Error in getZScores(fc19) :
Failed to estimate the parameters of the Variance stabilizing transformation.

Second Situation

> fcf<-getZScores(fc19, fitFun = "distFitMonotone")
[1] "SAMD4A_NlaIII"
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  :
every gene contains at least one zero, cannot compute log geometric means
>

My code is as follows

metaData19 <- list(projectPath = "fastq/FourCSeq_Analysis",
fragmentDir = "fastq/FourCSeq_Analysis",
referenceGenomeFile=referenceGenomeFile,
reSequence1 = "RAATTY",
reSequence2 = "GATC",
primerFile = primerFile,
bamFilePath = bamFilePath)
colData <- DataFrame(viewpoint = "SAMD4A_ApoI",
condition = factor(rep(c("Con1","Con2"),each=2),
levels = c("Con1","Con2")),
replicate=rep(c(1,2),
2),
bamFile = c("Con1_rep1.bam",
"Con1_rep2.bam",
"Con2_rep1.bam",
"Con2_rep2.bam"),
sequencingPrimer="first")
fc19<-countFragmentOverlaps(fc19)
fc19<- combineFragEnds(fc19)
fc19<-smoothCounts(fc19)
fcf<-getZScores(fc19)


In the final getZscores I used various options for the  fitFun = "distFitMonotoneSymmetric" as mentioned at the help file for the function, "local" or "mean" with the same output, for both error codes.

Is there any Hope???

General Information

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=de_DE.UTF-8
[8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] splines   parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.25.15                 AnnotationDbi_1.35.4                    FourCSeq_1.7.2
[5] LSD_3.0                                 DESeq2_1.13.8                           SummarizedExperiment_1.3.7              Biobase_2.33.0
[9] ggplot2_2.1.0                           GenomicRanges_1.25.9                    GenomeInfoDb_1.9.4                      IRanges_2.7.11
[13] S4Vectors_0.11.9                        BiocGenerics_0.19.2

loaded via a namespace (and not attached):
[1] httr_1.2.1                    AnnotationHub_2.5.5           gtools_3.5.0                  Formula_1.2-1                 shiny_0.13.2                  interactiveDisplayBase_1.11.3
[7] latticeExtra_0.6-28           RBGL_1.49.1                   BSgenome_1.41.2               Rsamtools_1.25.0              RSQLite_1.0.0                 lattice_0.20-33
[13] biovizBase_1.21.0             chron_2.3-47                  digest_0.6.9                  RColorBrewer_1.1-2            XVector_0.13.6                colorspace_1.2-6
[19] ggbio_1.21.2                  htmltools_0.3.5               httpuv_1.3.3                  Matrix_1.2-6                  plyr_1.8.4                    OrganismDbi_1.15.1
[25] XML_3.98-1.4                  biomaRt_2.29.2                fda_2.4.4                     genefilter_1.55.2             zlibbioc_1.19.0               xtable_1.8-2
[31] scales_0.4.0                  BiocParallel_1.7.4            annotate_1.51.0               nnet_7.3-12                   survival_2.39-5               magrittr_1.5
[37] mime_0.5                      GGally_1.2.0                  foreign_0.8-66                graph_1.51.0                  BiocInstaller_1.23.6          tools_3.3.1
[43] data.table_1.9.6              stringr_1.0.0                 munsell_0.4.3                 locfit_1.5-9.1                cluster_2.0.4                 ensembldb_1.5.9
[49] Biostrings_2.41.4             grid_3.3.1                    RCurl_1.95-4.8                dichromat_2.0-0               VariantAnnotation_1.19.8      bitops_1.0-6
[55] gtable_0.2.0                  DBI_0.4-1                     reshape_0.8.5                 reshape2_1.4.1                R6_2.1.2                      GenomicAlignments_1.9.6
[61] gridExtra_2.2.1               rtracklayer_1.33.10           Hmisc_3.17-4                  stringi_1.1.1                 Rcpp_0.12.6                   geneplotter_1.51.0
[67] rpart_4.1-10                  acepack_1.3-3.3  
fourcseq • 806 views
1
Entering edit mode
felix.klein ▴ 150
@felixklein-6900
Last seen 3.1 years ago
Germany

Hi Theodore,

how do the scatter plots between your replicates look like? Do the libraries show sufficient agreement or are they quite different?

As a straightforward approach you can try to increase the number in minCount. This will set aside low count fragments, which tend to have a higher noise levels, and might help you with the analysis.

In the second situation: for every fragment there is one library which does not have any count. A possible explanation is that one library does not contain any counts at all. You can test this with the following command:

apply(counts(fc), 2, sum)

Best regards,

Felix

0
Entering edit mode
@theodoregeorgomanolis-7993
Last seen 14 months ago
Germany

Hi, Felix

here is the scatterplot of the two replicates

So, I change the minCount on the getZscore function and I fount that the minCount in my case is 2390

> fcf<-getZScores(fc19, minCount = 2390)
[1] "SAMD4A_ApoI"
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates

Results
Iter.   PENSSE   Grad Length Intercept   Slope
0        0.0045      2e-04      19.8126      -0.0093
1        0.0027      0      20.0663      -0.0031
Results
Iter.   PENSSE   Grad Length Intercept   Slope
0        0.0017      1e-04      19.7234      -0.0045
1        0.0011      0      19.8851      -3e-04
Results
Iter.   PENSSE   Grad Length Intercept   Slope
0        0.0028      1e-04      19.7559      -0.0062
1        0.0018      1e-04      19.966      -3e-04
Results
Iter.   PENSSE   Grad Length Intercept   Slope
0        0.0011      0      19.6941      -0.0037
1        7e-04      0      19.8193      -5e-04

The question I got is that if I mix that dataset with an other one that has a different minCount expectancy I will be forced to use the high one. Will the DEseq2 (ANOVA test if I am not mistaken) produce trustworthy results?

Best

Theodore

0
Entering edit mode
Hello Theodore, this seems like a quite high number. How many fragments are actually left after you used this threshold? You should definetly double check the results in this case because you probably remove quite a lot of fragments. Especially watch out if you have a switch in the interaction differences, e.g. away from the viewpoint all interactions of cond 1 are stronger and close to the viewpoint all interactions of cond 2. This would hint to a problem with the normalization. Otherwise the results should be correct given the inputs. Best regards, Felix
0
Entering edit mode

Hey Felix,

SO, I tried to mix the data set with an other dataset and I could lower significantly the minCount to 100, which is reasonable. Scatter plot looks the same when I am comparing the same datasets, but I was wondering if I will get more false "significant" contacts.

the QQ plots are looking different though from using only the dataset that required a high minCount to fit a model.

"Especially watch out if you have a switch in the interaction differences"
I do not get that part, could you please elaborate?

Theodoros

"You should definetly double check the results in this case because you probably remove quite a lot of fragments"
Any suggestions on how I can procced with that?

0
Entering edit mode
Hello Theodoros, there should be not much more false "significat" contacts if you change the threshold within a reasonable range. What I meant with the switch in interaction differences is the following: If you use the plotDifferences function and see a clear trend that the log fold change is always above 0 around the viewpoint and below 0 after a certain distance away from the viewpoint (or vice-versa) than this means that there is a problem with the normalization of the libraries. In detail the domains away from the viewpoint and close to the viewpoint show differences between the libraries which make up the observed differences. Best regards, Felix On 07/27/2016 06:24 PM, theodore.georgomanolis [bioc] wrote: > Activity on a post you are following on support.bioconductor.org > <https: support.bioconductor.org=""> > > User theodore.georgomanolis <https: support.bioconductor.org="" u="" 7993=""/> > wrote Comment: FourCseq: getZScores errors, first case 6 digenerated > cutterm second case 4 base cutter > <https: support.bioconductor.org="" p="" 85332="" #85488="">: > > Hey Felix, > > SO, I tried to mix the data set with an other dataset and I could > lower significantly the |minCount| to 100, which is reasonable. > Scatter plot looks the same when I am comparing the same datasets, but > I was wondering if I will get more false "significant" contacts. > > the QQ plots are looking different though from using only the dataset > that required a high |minCount| to fit a model. > > "Especially watch out if you have a switch in the interaction differences" > I do not get that part, could you please elaborate? > > Thank you for your time and your help > > Theodoros > > "You should definetly double check the results in this case because > you probably remove quite a lot of fragments" > Any suggestions on how I can procced with that? > > ------------------------------------------------------------------------ > > Post tags: fourcseq > > You may reply via email or visit > C: FourCseq: getZScores errors, first case 6 digenerated cutterm second case 4 base >
0
Entering edit mode
@theodoregeorgomanolis-7993
Last seen 14 months ago
Germany

Regarding the same set, I encounter another problem while I was

fcf <- getDifferences(fcf, referenceCondition="0min")

the output was

> fcf <- getDifferences(fcf,
+ referenceCondition="0min")
[1] SAM_ApoI
Levels: SAM_ApoI
found already estimated dispersions, replacing these
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
final dispersion estimates

I

MY whole commands till that point are:

library(FourCSeq)
library(GenomicFeatures)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(S4Vectors)

referenceGenomeFile = "/media/trotos/4TB/WORK/UCSC/hg19/Sequence/WholeGenomeFasta/hg19_genome.fa"
bamFilePath = "/media/trotos/4TB/WORK/archive/Native4C/fastq/bam_files/SAMD4a/ApoI"

primerFile = "/media/trotos/4TB/WORK/archive/Native4C/fastq/FourCSeq_Analysis/primer_file"
fragmentDir = "/media/trotos/4TB/WORK/archive/Native4C/fastq/FourCSeq_Analysis",
referenceGenomeFile=referenceGenomeFile,
reSequence1 = "RAATTY",
reSequence2 = "GATC",
primerFile = primerFile,
bamFilePath = bamFilePath)

colData(fc19)$chr = "chr14" colData(fc19)$start =55032526
colData(fc19)\$end =55035790

fc19 <- countFragmentOverlaps(fc19, trim=0, minMapq=10)
fc19 <- combineFragEnds(fc19)

fc19<-smoothCounts(fc19)

fcf <-getZScores(fc19[,c("SAM_ApoI_0min_1",
"SAM_ApoI_0min_2",
"SAM_ApoI_0min_3",
"SAM_ApoI_60min_1",
"SAM_ApoI_60min_2",
"SAM_ApoI_60min_3")],
minCount = 1845)

zScore<-assay(fcf,"zScore")

fcf<-addPeaks(fcf,zScoreThresh = 3, fdrThresh = 0.05)

fcf <- getDifferences(fcf,
referenceCondition="0min")

0
Entering edit mode
Hello Theodoros, have a look at the dispersion fit whether this makes sense to you (plotDispEsts(fcf)). And have a look at the DESeq2 package which is behind all these calculations. Best regards, Felix
0
Entering edit mode

Should I send a new question with a DESeq2 tag?