DESEq2 rerun gives different results
2
0
Entering edit mode
Mia • 0
@8a01aee2
Last seen 19 months ago
South Africa

Hi everyone

I'm experiencing a problem with DESEq2. I downloaded matched tumour-normal cancer data from the TCGA (111 patients, 222 samples), and am attempting to perform differential expression. The ultimate aim of my project is to use machine learning to identify a select few genes that can distinguish between normal and cancerous tissue, for the purpose of biomarker discovery.

But as a preliminary, I wanted to use DESeq2 to visualise clustering between tumour and normal samples using all coding genes (i.e. ~30,000 genes).

My code:

# read files
countData <- read_tsv("chosen_TCGA_gene_count_matrix.tsv")
colData <- read_tsv("TCGA_chosen_colData.tsv")

#make sure factors are factors
colData$patient <- as.factor(colData$patient)
colData$sample_type <- as.factor(colData$sample_type)

countData.mat <- as.matrix(countData)

## make dds 
dds.tcga<- DESeqDataSetFromMatrix(countData = countData.mat,
                                     colData = colData, design = ~ patient + sample_type)


# re-level so that base level is Normal tissue
dds.tcga$sample_type <- relevel(dds.tcga$sample_type, ref = "Solid Tissue Normal")

#filter out low count genes
keep <- rowSums((counts(dds.tcga)) >= 10) >=3
dds.tcga <- dds.tcga[keep,]


# run DESeq2
dds.tcga <- DESeq(dds.tcga)


# vst transform
vst <- vst(dds.tcga)

#plot 
plotPCA(vst, intgroup=c("sample_type"), ntop = 200)

An example of my metadata: Coldata

This worked fine a week ago (Good clustering), and clustering was perfect. However, I was re-running the exact same code today to double check that everything worked fine, and now there is absolutely no clustering Bad clustering.

Luckily, I saved the dds object that I made a week ago, so I compared the new one vs the old one, and they are identical in terms of the samples and genes used to construct colData and countData, as well as the vst assays after DESeq2 has run.

i.e.

#all samples are the same
all(dds.old$sample %in% dds.new$sample) #TRUE

assay.old <- assay(vst.old)
assay.new <- assay(vst.new)

#gene count transformations done by DESeq2 are the same
identical (assay.old, assay.new) #TRUE

#all size factors are the same
all(sizeFactors(dds.old) == sizeFactors(dds.new)) #TRUE


# but Mu, Cooks distance, and dispersions are not the same somehow
identical(assays(dds.old)[["mu"]],assays(dds.new)[["mu"]]) #FALSE

identical(assays(dds.old)[["cooks"]],assays(dds.new)[["cooks"]]) #FALSE

identical(dispersions(dds.new),dispersions(dds.old)) #FALSE

So it seems the input values are the same, but the inner workings of DESeq2 was maybe different this time?

Another strange thing, is that the results from the two runs are very different. In the old run, some of the genes had p-adjusted values of < 0.05, whereas many of the new run's p-adjusted values were close to 1.

Example of the first few genes shown below:

DESEq2 results from two runs

As you can see here, the basemean values are the same, but somehow the LFC and p's are way different...

I didn't update my version of R or of DESEq2. I was wondering whether anyone could help me out?

R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DescTools_0.99.45           readr_2.1.2                 readxl_1.4.0               
 [4] EnhancedVolcano_1.14.0      ggrepel_0.9.1               apeglm_1.18.0              
 [7] dplyr_1.0.9                 org.Hs.eg.db_3.15.0         AnnotationDbi_1.58.0       
[10] ggplot2_3.3.6               RColorBrewer_1.1-3          pheatmap_1.0.12            
[13] tibble_3.1.8                biomaRt_2.52.0              DESeq2_1.36.0              
[16] SummarizedExperiment_1.26.1 Biobase_2.56.0              MatrixGenerics_1.8.1       
[19] matrixStats_0.62.0          GenomicRanges_1.48.0        GenomeInfoDb_1.32.2        
[22] IRanges_2.30.0              S4Vectors_0.34.0            BiocGenerics_0.42.0        

loaded via a namespace (and not attached):
 [1] colorspace_2.0-3       ellipsis_0.3.2         class_7.3-20           XVector_0.36.0        
 [5] gld_2.6.5              rstudioapi_0.13        proxy_0.4-27           farver_2.1.1          
 [9] bit64_4.0.5            fansi_1.0.3            mvtnorm_1.1-3          xml2_1.3.3            
[13] codetools_0.2-18       splines_4.2.0          cachem_1.0.6           rootSolve_1.8.2.3     
[17] geneplotter_1.74.0     annotate_1.74.0        dbplyr_2.2.1           png_0.1-7             
[21] compiler_4.2.0         httr_1.4.3             assertthat_0.2.1       Matrix_1.4-1          
[25] fastmap_1.1.0          cli_3.3.0              prettyunits_1.1.1      tools_4.2.0           
[29] coda_0.19-4            gtable_0.3.0           glue_1.6.2             lmom_2.9              
[33] GenomeInfoDbData_1.2.8 rappdirs_0.3.3         Rcpp_1.0.9             bbmle_1.0.25          
[37] cellranger_1.1.0       vctrs_0.4.1            Biostrings_2.64.0      stringr_1.4.0         
[41] lifecycle_1.0.1        XML_3.99-0.10          zlibbioc_1.42.0        MASS_7.3-58.1         
[45] scales_1.2.0           vroom_1.5.7            hms_1.1.1              parallel_4.2.0        
[49] expm_0.999-6           curl_4.3.2             Exact_3.1              memoise_2.0.1         
[53] emdbook_1.3.12         bdsmatrix_1.3-6        stringi_1.7.8          RSQLite_2.2.15        
[57] genefilter_1.78.0      e1071_1.7-11           filelock_1.0.2         boot_1.3-28           
[61] BiocParallel_1.30.3    rlang_1.0.4            pkgconfig_2.0.3        bitops_1.0-7          
[65] lattice_0.20-45        purrr_0.3.4            labeling_0.4.2         bit_4.0.4             
[69] tidyselect_1.1.2       plyr_1.8.7             magrittr_2.0.3         R6_2.5.1              
[73] generics_0.1.3         DelayedArray_0.22.0    DBI_1.1.3              pillar_1.8.0          
[77] withr_2.5.0            survival_3.3-1         KEGGREST_1.36.3        RCurl_1.98-1.8        
[81] crayon_1.5.1           utf8_1.2.2             BiocFileCache_2.4.0    tzdb_0.3.0            
[85] progress_1.2.2         locfit_1.5-9.6         grid_4.2.0             data.table_1.14.2     
[89] blob_1.2.3             digest_0.6.29          xtable_1.8-4           numDeriv_2016.8-1.1   
[93] munsell_0.5.0

Thanks! Mia

DESeq2 • 837 views
ADD COMMENT
0
Entering edit mode
swbarnes2 ★ 1.3k
@swbarnes2-14086
Last seen 8 hours ago
San Diego

I think it's pretty obvious that the count data is unchanged, but the colData got changed. Figure out what happened there.

ADD COMMENT
0
Entering edit mode

Seconding this. The plot is even the same when it comes to coordinates of the points, something (on you to find out) altered colData. It is not a DESeq2 problem.

ADD REPLY
0
Entering edit mode
Mia • 0
@8a01aee2
Last seen 19 months ago
South Africa

Thank you! I fixed an issue I didn't see in my colData, and it worked :)

ADD COMMENT

Login before adding your answer.

Traffic: 551 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6