Question

Sample label swap in HTML results

0

Entering edit mode

maximinio • 0

@maximinio-13364

Last seen 8.6 years ago

San Gerardo Hospital, Monza, Italy

This is my sample data frame:

sampleTable = data.frame(  condition = c("ctrl","ctrl","ctrl","resistant","resistant","resistant"),
                           replicate = c(1:3,1:3),
                           row.names = c("Ctrl1","Ctrl2","Ctrl4","AS4","BD1","BS1"),
                           stringsAsFactors = TRUE, check.names = FALSE,
                           libType = c("paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end"),
                           countName = c("Ctrl1_DEXSeq.tabular", "Ctrl2_DEXSeq.tabular", "Ctrl4_DEXSeq.tabular", "AS4_DEXSeq.tabular", "BD1_DEXSeq.tabular", "BS1_DEXSeq.tabular") )

I am sure the tabular files are correct because I checked them by comparing the RPMs with those obtained with htseq.
Nevertheless, when I test for DEU and estimate exon FC, the results seem to be fine but the 3 controls and the 3 experimental groups are evidently swapped.

In this picture for instance the RPMs belonging to the last exons are clearly higher in the control tab files as compared to the others:
dexseq

Could it be that the samples are erroneously processed alphabetically and so the AS4, BD1 and BS1 come first to the Ctrls?

Thank you in advance!
Best,
Luca

dexseq dexseqhtml • 3.7k views

ADD COMMENT • link updated 8.5 years ago by Alejandro Reyes ★ 1.9k • written 8.6 years ago by maximinio • 0

0

Entering edit mode

Please post the complete R code that you used to get from the sample and count tables to the plot you posted. Thanks.

ADD REPLY • link 8.6 years ago Simon Anders ★ 3.8k

0

Entering edit mode

Could you also the output of your sessionInfo() please?

ADD REPLY • link 8.6 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

It'll take some time...BRB!

Thank you all!!

ADD REPLY • link 8.6 years ago maximinio • 0

0

Entering edit mode

> sessionInfo()
R version 3.4.0 Patched (2017-06-02 r72765)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Italian_Italy.1252 
[2] LC_CTYPE=Italian_Italy.1252   
[3] LC_MONETARY=Italian_Italy.1252
[4] LC_NUMERIC=C                  
[5] LC_TIME=Italian_Italy.1252    

attached base packages:
[1] stats4    parallel  stats    
[4] graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] DEXSeq_1.22.0             
 [2] RColorBrewer_1.1-2        
 [3] AnnotationDbi_1.38.1      
 [4] DESeq2_1.16.1             
 [5] SummarizedExperiment_1.6.3
 [6] DelayedArray_0.2.7        
 [7] matrixStats_0.52.2        
 [8] GenomicRanges_1.28.3      
 [9] GenomeInfoDb_1.12.2       
[10] IRanges_2.10.2            
[11] S4Vectors_0.14.3          
[12] Biobase_2.36.2            
[13] BiocGenerics_0.22.0       
[14] BiocParallel_1.10.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.11           
 [2] locfit_1.5-9.1         
 [3] lattice_0.20-35        
 [4] Biostrings_2.44.1      
 [5] Rsamtools_1.28.0       
 [6] digest_0.6.12          
 [7] plyr_1.8.4             
 [8] backports_1.1.0        
 [9] acepack_1.4.1          
[10] RSQLite_2.0            
[11] ggplot2_2.2.1          
[12] zlibbioc_1.22.0        
[13] rlang_0.1.1            
[14] lazyeval_0.2.0         
[15] data.table_1.10.4      
[16] annotate_1.54.0        
[17] blob_1.1.0             
[18] rpart_4.1-11           
[19] Matrix_1.2-10          
[20] checkmate_1.8.2        
[21] splines_3.4.0          
[22] statmod_1.4.30         
[23] geneplotter_1.54.0     
[24] stringr_1.2.0          
[25] foreign_0.8-69         
[26] htmlwidgets_0.8        
[27] biomaRt_2.32.1         
[28] RCurl_1.95-4.8         
[29] bit_1.1-12             
[30] munsell_0.4.3          
[31] compiler_3.4.0         
[32] base64enc_0.1-3        
[33] htmltools_0.3.6        
[34] nnet_7.3-12            
[35] tibble_1.3.3           
[36] gridExtra_2.2.1        
[37] htmlTable_1.9          
[38] GenomeInfoDbData_0.99.0
[39] Hmisc_4.0-3            
[40] XML_3.98-1.9           
[41] bitops_1.0-6           
[42] grid_3.4.0             
[43] xtable_1.8-2           
[44] gtable_0.2.0           
[45] DBI_0.7                
[46] magrittr_1.5           
[47] scales_0.4.1           
[48] stringi_1.1.5          
[49] XVector_0.16.0         
[50] hwriter_1.3.2          
[51] genefilter_1.58.1      
[52] latticeExtra_0.6-28    
[53] Formula_1.2-1          
[54] tools_3.4.0            
[55] bit64_0.9-7            
[56] survival_2.41-3        
[57] colorspace_1.3-2       
[58] cluster_2.0.6          
[59] memoise_1.1.0          
[60] knitr_1.16

ADD REPLY • link 8.6 years ago maximinio • 0

0

Entering edit mode

Thank you in advance for your help!

ADD REPLY • link 8.6 years ago maximinio • 0

0

Entering edit mode

Please also post the relevant lines of your count tables that lead you to think that 'control' is expressed above 'resistant', and the output of sizeFactors(dxd)

ADD REPLY • link 8.6 years ago Simon Anders ★ 3.8k

0

Entering edit mode

> sizeFactors(dxd)
 [1] 0.9904217 0.8435785 0.9805609
 [4] 1.2139886 1.1632014 0.9221617
 [7] 0.9904217 0.8435785 0.9805609
[10] 1.2139886 1.1632014 0.9221617

ADD REPLY • link 8.6 years ago maximinio • 0

score 1 · Answer 1 · 2017-07-12

1

Entering edit mode

Alejandro Reyes ★ 1.9k

@alejandro-reyes-5124

Last seen 18 months ago

Novartis Institutes for BioMedical Rese…

Hi Maximino. Thanks for sending your files. I think the the sample swap is in your code. Specifically the line below is not returning the tabular files in the same order as you are specifying

list.files(full.names=TRUE,pattern=".tabular")

[1] "./AS4_DEXSeq.tabular"   "./BD1_DEXSeq.tabular"   "./BS1_DEXSeq.tabular"

[4] "./Ctrl1_DEXSeq.tabular" "./Ctrl2_DEXSeq.tabular" "./Ctrl4_DEXSeq.tabular"

Is not returning the files in the same order as your annotation table:

> sampleTable

      condition replicate    libType            countName

Ctrl1      ctrl         1 paired-end Ctrl1_DEXSeq.tabular

Ctrl2      ctrl         2 paired-end Ctrl2_DEXSeq.tabular

Ctrl4      ctrl         3 paired-end Ctrl4_DEXSeq.tabular

AS4   resistant         1 paired-end   AS4_DEXSeq.tabular

BD1   resistant         2 paired-end   BD1_DEXSeq.tabular

BS1   resistant         3 paired-end   BS1_DEXSeq.tabular

ADD COMMENT • link 8.5 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Hi Alejandro, thank you for getting back to the issue.

So in other words you are saying that since

countFiles <- list.files(full.names=TRUE,pattern=".tabular")

list the files in alphabetical order, the sampleTable should list the files in the same order...right?

ADD REPLY • link 8.5 years ago maximinio • 0

0

Entering edit mode

Yes! So either change the order of the sampleTable rows or the order of the file names. The important thing is that these are matching.

ADD REPLY • link 8.5 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Thank you!!!!

Best,

Luca

ADD REPLY • link 8.5 years ago maximinio • 0

score 0 · Answer 2 · 2017-06-30

0

Entering edit mode

maximinio • 0

@maximinio-13364

Last seen 8.6 years ago

San Gerardo Hospital, Monza, Italy

python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_AS4_S59_RNASeq_hg38Aligned.sortedByCoord.out.bam AS4_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_BD1_S61_RNASeq_hg38Aligned.sortedByCoord.out.bam BD1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Exp2_2015_BS1_S60_RNASeq_hg38Aligned.sortedByCoord.out.bam BS1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl1Aligned.sortedByCoord.out.bam Ctrl1_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl2Aligned.sortedByCoord.out.bam Ctrl2_DEXSeq.tabular -f bam
python /home/maximinio/R/i686-pc-linux-gnu-library/3.3/DEXSeq/python_scripts/dexseq_count.py gencode.v26.annotation_DEXSeq.gff Ctrl4Aligned.sortedByCoord.out.bam Ctrl4_DEXSeq.tabular -f bam

sampleTable = data.frame( condition = c("ctrl","ctrl","ctrl","resistant","resistant","resistant"),

replicate = c(1:3,1:3),

row.names = c("Ctrl1","Ctrl2","Ctrl4","AS4","BD1","BS1"),

stringsAsFactors = TRUE, check.names = FALSE,

libType = c("paired-end", "paired-end", "paired-end", "paired-end", "paired-end", "paired-end"),

countName = c("Ctrl1_DEXSeq.tabular", "Ctrl2_DEXSeq.tabular", "Ctrl4_DEXSeq.tabular", "AS4_DEXSeq.tabular", "BD1_DEXSeq.tabular", "BS1_DEXSeq.tabular")

)


countFiles <- list.files(full.names=TRUE,pattern=".tabular")
countFiles


flattenedFile = file.path("gencode.v26.annotation_DEXSeq.gff")
flattenedFile


fullModel <- ~ sample + exon + condition:exon

dxd = DEXSeqDataSetFromHTSeq( countfiles=countFiles,

sampleData=sampleTable,

design=fullModel,

flattenedfile=flattenedFile)


dxd = estimateSizeFactors(dxd)


dxd = estimateDispersions(dxd)


dxd <- testForDEU(dxd)
dxd <- estimateExonFoldChanges(dxd, fitExpToVar="condition")


dxr1 <- DEXSeqResults(dxd)


DEXSeqHTML( dxr1, FDR=0.1, color=c("#FF000080", "#0000FF80") )

ADD COMMENT • link 8.6 years ago maximinio • 0

0

Entering edit mode

Hi Maximino. I made a couple of tests and I could not reproduce the sample swap that you mentioned. Sorry for the multiple request, but could you include the normalized counts (norCounts) in the plotDEXSeq function?

ADD REPLY • link 8.6 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Not really sure I understood the Q...

The code I used for the plot above is:

plotDEXSeq( dxr1, "ENSG00000006576.16", legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )

Ant these are the counts in the tabular files:

As you can see, the 38th and the 39th exons have higher expression in the ctrls according to the tabular files and viceversa in the graph above.

ADD REPLY • link 8.6 years ago maximinio • 0

0

Entering edit mode

Sorry for not being clear. Could you post the plot resulting from the following line of code?

plotDEXSeq( dxr1, "ENSG00000006576.16", norCounts=TRUE, legend=TRUE, cex.axis=1.2, cex=1.3, lwd=2 )

ADD REPLY • link 8.6 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Here you go:

ADD REPLY • link 8.6 years ago maximinio • 0

0

Entering edit mode

I believe the answer is there: the samples are not swapped. The count matrix that you are showing is not normalized for sequencing depth. If you do normalize for sequencing depth, you should get consistent values to the normalized counts plot.

counts(dxd, normalized=TRUE)

ADD REPLY • link 8.6 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

My bad, I thought it was done by default.

Nevertheless, even with the normalized counts exons 38 and 39 are still plotted higher than controls...maybe I am missing something here.

ADD REPLY • link 8.6 years ago maximinio • 0

0

Entering edit mode

From the raw count matrix that you posted, you will see that the counts for the controls are generally higher than the resistant samples (not only for exons 38 and 39, but for all exons). Once you normalize for sequencing depth, the counts for the resistant samples are higher than in the resistant samples only for exons 38 and 39. This means that the higher raw counts in the control samples are due to sequencing depth.

ADD REPLY • link 8.6 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Clear, although I am still missing something...

It appears to me that exons 38 and 39 (which I took as an example, many more exons are actually different) display higher values in resistants either without or with normalization.

ADD REPLY • link 8.6 years ago maximinio • 0

0

Entering edit mode

I see... in that case my previous explanation is not valid. But still, I can't reproduce these results. Would you mind sending me your raw files so I can try to reproduce the swap?

Alejandro

ADD REPLY • link 8.6 years ago Alejandro Reyes ★ 1.9k

0

Entering edit mode

Thank you!

At which email address should I send them?

ADD REPLY • link 8.6 years ago maximinio • 0

0

Entering edit mode

alejandro.reyes.ds at gmail please!

ADD REPLY • link 8.6 years ago Alejandro Reyes ★ 1.9k