Search
Question: Sample label swap in HTML results
0
gravatar for maximinio
4 months ago by
maximinio0
San Gerardo Hospital, Monza, Italy
maximinio0 wrote:

This is my sample data frame:

sampleTable = data.frame(  condition = c("ctrl","ctrl","ctrl","resistant","resistant","resistant"),
                           replicate = c(1:3,1:3),
                           row.names = c("Ctrl1","Ctrl2","Ctrl4","AS4","BD1","BS1"),
                           stringsAsFactors = TRUE, check.names = FALSE,
                           libType = c("paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end"),
                           countName = c("Ctrl1_DEXSeq.tabular", "Ctrl2_DEXSeq.tabular", "Ctrl4_DEXSeq.tabular", "AS4_DEXSeq.tabular", "BD1_DEXSeq.tabular", "BS1_DEXSeq.tabular") )


I am sure the tabular files are correct because I checked them by comparing the RPMs with those obtained with htseq.
Nevertheless, when I test for DEU and estimate exon FC, the results seem to be fine but the 3 controls and the 3 experimental groups are evidently swapped.

In this picture for instance the RPMs belonging to the last exons are clearly higher in the control tab files as compared to the others:
dexseq

Could it be that the samples are erroneously processed alphabetically and so the AS4, BD1 and BS1 come first to the Ctrls?

Thank you in advance!
Best,
Luca

ADD COMMENTlink modified 4 months ago by Alejandro Reyes1.5k • written 4 months ago by maximinio0

Please post the complete R code that you used to get from the sample and count tables to the plot you posted. Thanks.

ADD REPLYlink written 4 months ago by Simon Anders3.4k

Could you also the output of your sessionInfo() please?

ADD REPLYlink written 4 months ago by Alejandro Reyes1.5k

It'll take some time...BRB!

Thank you all!!

ADD REPLYlink written 4 months ago by maximinio0
> sessionInfo()
R version 3.4.0 Patched (2017-06-02 r72765)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.1252 
[2] LC_CTYPE=Italian_Italy.1252   
[3] LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.1252    

attached base packages:
[1] stats4    parallel  stats    
[4] graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.22.0             
 [2] RColorBrewer_1.1-2        
 [3] AnnotationDbi_1.38.1      
 [4] DESeq2_1.16.1             
 [5] SummarizedExperiment_1.6.3
 [6] DelayedArray_0.2.7        
 [7] matrixStats_0.52.2        
 [8] GenomicRanges_1.28.3      
 [9] GenomeInfoDb_1.12.2       
[10] IRanges_2.10.2            
[11] S4Vectors_0.14.3          
[12] Biobase_2.36.2            
[13] BiocGenerics_0.22.0       
[14] BiocParallel_1.10.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11           
 [2] locfit_1.5-9.1         
 [3] lattice_0.20-35        
 [4] Biostrings_2.44.1      
 [5] Rsamtools_1.28.0       
 [6] digest_0.6.12          
 [7] plyr_1.8.4             
 [8] backports_1.1.0        
 [9] acepack_1.4.1          
[10] RSQLite_2.0            
[11] ggplot2_2.2.1          
[12] zlibbioc_1.22.0        
[13] rlang_0.1.1            
[14] lazyeval_0.2.0         
[15] data.table_1.10.4      
[16] annotate_1.54.0        
[17] blob_1.1.0             
[18] rpart_4.1-11           
[19] Matrix_1.2-10          
[20] checkmate_1.8.2        
[21] splines_3.4.0          
[22] statmod_1.4.30         
[23] geneplotter_1.54.0     
[24] stringr_1.2.0          
[25] foreign_0.8-69         
[26] htmlwidgets_0.8        
[27] biomaRt_2.32.1         
[28] RCurl_1.95-4.8         
[29] bit_1.1-12             
[30] munsell_0.4.3          
[31] compiler_3.4.0         
[32] base64enc_0.1-3        
[33] htmltools_0.3.6        
[34] nnet_7.3-12            
[35] tibble_1.3.3           
[36] gridExtra_2.2.1        
[37] htmlTable_1.9          
[38] GenomeInfoDbData_0.99.0
[39] Hmisc_4.0-3            
[40] XML_3.98-1.9           
[41] bitops_1.0-6           
[42] grid_3.4.0             
[43] xtable_1.8-2           
[44] gtable_0.2.0           
[45] DBI_0.7                
[46] magrittr_1.5           
[47] scales_0.4.1           
[48] stringi_1.1.5          
[49] XVector_0.16.0         
[50] hwriter_1.3.2          
[51] genefilter_1.58.1      
[52] latticeExtra_0.6-28    
[53] Formula_1.2-1          
[54] tools_3.4.0            
[55] bit64_0.9-7            
[56] survival_2.41-3        
[57] colorspace_1.3-2       
[58] cluster_2.0.6          
[59] memoise_1.1.0          
[60] knitr_1.16
ADD REPLYlink written 4 months ago by maximinio0

Thank you in advance for your help!

ADD REPLYlink written 4 months ago by maximinio0

Please also post the relevant lines of your count tables that lead you to think that 'control' is expressed above 'resistant', and the output of sizeFactors(dxd) 

ADD REPLYlink written 4 months ago by Simon Anders3.4k
> sizeFactors(dxd)
 [1] 0.9904217 0.8435785 0.9805609
 [4] 1.2139886 1.1632014 0.9221617
 [7] 0.9904217 0.8435785 0.9805609
[10] 1.2139886 1.1632014 0.9221617
ADD REPLYlink written 4 months ago by maximinio0
1
gravatar for Alejandro Reyes
4 months ago by
Alejandro Reyes1.5k
Dana-Farber Cancer Institute, Boston, USA
Alejandro Reyes1.5k wrote:

Hi Maximino. Thanks for sending your files. I think the the sample swap is in your code. Specifically the line below is not returning the tabular files in the same order as you are specifying 

list.files(full.names=TRUE,pattern=".tabular")

[1] "./AS4_DEXSeq.tabular"   "./BD1_DEXSeq.tabular"   "./BS1_DEXSeq.tabular"

[4] "./Ctrl1_DEXSeq.tabular" "./Ctrl2_DEXSeq.tabular" "./Ctrl4_DEXSeq.tabular"

Is not returning the files in the same order as your annotation table:

> sampleTable

      condition replicate    libType            countName

Ctrl1      ctrl         1 paired-end Ctrl1_DEXSeq.tabular

Ctrl2      ctrl         2 paired-end Ctrl2_DEXSeq.tabular

Ctrl4      ctrl         3 paired-end Ctrl4_DEXSeq.tabular

AS4   resistant         1 paired-end   AS4_DEXSeq.tabular

BD1   resistant         2 paired-end   BD1_DEXSeq.tabular

BS1   resistant         3 paired-end   BS1_DEXSeq.tabular
ADD COMMENTlink written 4 months ago by Alejandro Reyes1.5k

Hi Alejandro, thank you for getting back to the issue.

So in other words you are saying that since

countFiles <- list.files(full.names=TRUE,pattern=".tabular")

list the files in alphabetical order, the sampleTable should list the files in the same order...right?

ADD REPLYlink written 4 months ago by maximinio0

Yes! So either change the order of the sampleTable rows or the order of the file names. The important thing is that these are matching. 

ADD REPLYlink written 4 months ago by Alejandro Reyes1.5k

Thank you!!!!

Best,

Luca

ADD REPLYlink written 4 months ago by maximinio0
0
gravatar for maximinio
4 months ago by
maximinio0
San Gerardo Hospital, Monza, Italy
maximinio0 wrote:
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_AS4_S59_RNASeq_hg38Aligned.sortedByCoord.out.bam AS4_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_BD1_S61_RNASeq_hg38Aligned.sortedByCoord.out.bam BD1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_BS1_S60_RNASeq_hg38Aligned.sortedByCoord.out.bam BS1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl1Aligned.sortedByCoord.out.bam Ctrl1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl2Aligned.sortedByCoord.out.bam Ctrl2_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl4Aligned.sortedByCoord.out.bam Ctrl4_DEXSeq.tabular -f bam

 


sampleTable = data.frame( condition = c("ctrl","ctrl","ctrl","resistant","resistant","resistant"),

replicate = c(1:3,1:3),

row.names = c("Ctrl1","Ctrl2","Ctrl4","AS4","BD1","BS1"),

stringsAsFactors = TRUE, check.names = FALSE,

libType = c("paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end"),

countName = c("Ctrl1_DEXSeq.tabular", "Ctrl2_DEXSeq.tabular", "Ctrl4_DEXSeq.tabular", "AS4_DEXSeq.tabular", "BD1_DEXSeq.tabular", "BS1_DEXSeq.tabular")

)


countFiles <- list.files(full.names=TRUE,pattern=".tabular")
countFiles


flattenedFile = file.path("gencode.v26.annotation_DEXSeq.gff")
flattenedFile


fullModel <- ~ sample + exon + condition:exon

dxd = DEXSeqDataSetFromHTSeq( countfiles=countFiles,

sampleData=sampleTable,

design=fullModel,

flattenedfile=flattenedFile)


dxd = estimateSizeFactors(dxd)


dxd = estimateDispersions(dxd)


dxd <- testForDEU(dxd)
dxd <- estimateExonFoldChanges(dxd, fitExpToVar="condition")


dxr1 <- DEXSeqResults(dxd)


DEXSeqHTML( dxr1, FDR=0.1, color=c("#FF000080", "#0000FF80") )
ADD COMMENTlink written 4 months ago by maximinio0

Hi Maximino. I made a couple of tests and I could not reproduce the sample swap that you mentioned. Sorry for the multiple request, but could you include the normalized counts (norCounts) in the plotDEXSeq function?

ADD REPLYlink written 4 months ago by Alejandro Reyes1.5k

Not really sure I understood the Q...

The code I used for the plot above is:

plotDEXSeq( dxr1, "ENSG00000006576.16", legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )

Ant these are the counts in the tabular files:

As you can see, the 38th and the 39th exons have higher expression in the ctrls according to the tabular files and viceversa in the graph above.

ADD REPLYlink written 4 months ago by maximinio0

Sorry for not being clear. Could you post the plot resulting from the following line of code?

plotDEXSeq( dxr1, "ENSG00000006576.16", norCounts=TRUE, legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )
ADD REPLYlink written 4 months ago by Alejandro Reyes1.5k

Here you go:

ADD REPLYlink written 4 months ago by maximinio0

I believe the answer is there: the samples are not swapped. The count matrix that you are showing is not normalized for sequencing depth. If you do normalize for sequencing depth, you should get consistent values to the normalized counts plot.

counts(dxd, normalized=TRUE)

 

ADD REPLYlink written 4 months ago by Alejandro Reyes1.5k

My bad, I thought it was done by default.

Nevertheless, even with the normalized counts exons 38 and 39 are still plotted higher than controls...maybe I am missing something here.

ADD REPLYlink written 4 months ago by maximinio0

From the raw count matrix that you posted, you will see that the counts for the controls are generally higher than the resistant samples (not only for exons 38 and 39, but for all exons). Once you normalize for sequencing depth, the counts for the resistant samples are higher than in the resistant samples only for exons 38 and 39. This means that the higher raw counts in the control samples are due to sequencing depth.

ADD REPLYlink modified 4 months ago • written 4 months ago by Alejandro Reyes1.5k

Clear, although I am still missing something...

It appears to me that exons 38 and 39 (which I took as an example, many more exons are actually different) display higher values in resistants either without or with normalization.

ADD REPLYlink written 4 months ago by maximinio0

I see... in that case my previous explanation is not valid. But still, I can't reproduce these results. Would you mind sending me your raw files so I can try to reproduce the swap?

Alejandro 

ADD REPLYlink written 4 months ago by Alejandro Reyes1.5k

Thank you!

At which email address should I send them?

ADD REPLYlink written 4 months ago by maximinio0

alejandro.reyes.ds at gmail please!

ADD REPLYlink written 4 months ago by Alejandro Reyes1.5k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 112 users visited in the last hour