Question

Inverted Log2FC values after DEseq2 analysis

0

Entering edit mode

Iwan • 0

@505d46ae

Last seen 2.9 years ago

Sweden

Hello. I am running DEseq2 analysis on published RNAseq data consisting of 4 control samples and 4 cKO samples. However my log2FC values are inverted compared to the original publication. I think the issue is with how I define the control group (#set factor level) as the plotted data is always appears the same irrespective of what group I select (Control or cKO). I am struggling to find a solution here as I am new to DEseq2 so I am reaching out for some help. I attach the volcano plot images for comparison and my script. Thanks in advance.

Published volcano Script volcano

#load libraries
library(dplyr)
library(tidyverse)
library(GEOquery)

#prepare data
reads <-read.csv(file = "GSE172457.csv")
colnames(reads) [1] <- "gene"
colnames(reads) [2] <- "GSM5257109"
colnames(reads) [3] <- "GSM5257110"
colnames(reads) [4] <- "GSM5257111"
colnames(reads) [5] <- "GSM5257112"
colnames(reads) [6] <- "GSM5257113"
colnames(reads) [7] <- "GSM5257114"
colnames(reads) [8] <- "GSM5257115"
colnames(reads) [9] <- "GSM5257116"
reads <- reads %>% separate(gene, into = c("ENSEMBL", "gene","gene_biotype"), convert = TRUE)
reads <- reads[, -c(2,3)]
rownames(reads) <- reads$ENSEMBL
reads <- reads[, -c(1)]

#get metadata
gse<- getGEO(GEO = 'GSE172457', GSEMatrix = TRUE)
metadata <- pData(phenoData(gse[[1]]))

#select column subsets from metadata and rename columns
metadata.subset <- metadata %>%
  select(1) %>%
  rename(genotype = title) %>%
  mutate(genotype = gsub("replicate 1", "", genotype)) %>%
  mutate(genotype = gsub("replicate 2", "", genotype)) %>%
  mutate(genotype = gsub("replicate 3", "", genotype)) %>%
  mutate(genotype = gsub("replicate 4", "", genotype))

#check sample information
all(colnames(reads) %in% rownames(metadata.subset))
all(colnames(reads) == rownames(metadata.subset))

#construct a DESeqDataSet
library(DESeq2)
dataset <- DESeqDataSetFromMatrix(countData = reads,
                                  colData = metadata.subset,
                                  design = ~ genotype)

#filtering data
#only keep rows that have at least 20 reads
dataset.filter <- rowSums(counts(dataset)) >= 20
dataset <- dataset[dataset.filter,]

#set factor level
metadata.subset$genotype <- as.factor(metadata.subset$genotype)
metadata.subset$genotype <- relevel(metadata.subset$genotype, ref = "Control ")

#run DEseq with p-value of 0.0001
dataset <- DESeq(dataset)
results <- results(dataset, alpha = 0.0001)
summary(results)

#load libraries for volcano plotting
library(org.Mm.eg.db)
library(EnhancedVolcano)

#convert results to a dataframe and add gene names
results.df <- as.data.frame(results)
results.df$symbol <- mapIds(org.Mm.eg.db, keys = rownames(results.df), keytype = "ENSEMBL", column = "SYMBOL")

#volcano plot
EnhancedVolcano(results.df, x = "log2FoldChange", y = "padj", lab = results.df$symbol,
                pCutoff = 1e-4, FCcutoff = 1)


> sessionInfo( )
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EnhancedVolcano_1.16.0      ggrepel_0.9.2               org.Mm.eg.db_3.16.0         AnnotationDbi_1.60.0        DESeq2_1.38.2              
 [6] SummarizedExperiment_1.28.0 MatrixGenerics_1.10.0       matrixStats_0.63.0          GenomicRanges_1.50.2        GenomeInfoDb_1.34.4        
[11] IRanges_2.32.0              S4Vectors_0.36.1            GEOquery_2.66.0             Biobase_2.58.0              BiocGenerics_0.44.0        
[16] forcats_0.5.2               stringr_1.5.0               purrr_1.0.0                 readr_2.1.3                 tidyr_1.2.1                
[21] tibble_3.1.8                ggplot2_3.4.0               tidyverse_1.3.2             dplyr_1.0.10                biomaRt_2.54.0             

loaded via a namespace (and not attached):
 [1] bitops_1.0-7           fs_1.5.2               lubridate_1.9.0        bit64_4.0.5            RColorBrewer_1.1-3     filelock_1.0.2        
 [7] progress_1.2.2         httr_1.4.4             tools_4.2.2            backports_1.4.1        utf8_1.2.2             R6_2.5.1              
[13] DBI_1.1.3              colorspace_2.0-3       withr_2.5.0            tidyselect_1.2.0       prettyunits_1.1.1      bit_4.0.5             
[19] curl_4.3.3             compiler_4.2.2         cli_3.5.0              rvest_1.0.3            xml2_1.3.3             DelayedArray_0.24.0   
[25] labeling_0.4.2         scales_1.2.1           rappdirs_0.3.3         digest_0.6.31          R.utils_2.12.2         XVector_0.38.0        
[31] pkgconfig_2.0.3        dbplyr_2.2.1           fastmap_1.1.0          limma_3.54.0           rlang_1.0.6            readxl_1.4.1          
[37] rstudioapi_0.14        RSQLite_2.2.20         farver_2.1.1           generics_0.1.3         jsonlite_1.8.4         BiocParallel_1.32.5   
[43] R.oo_1.25.0            googlesheets4_1.0.1    RCurl_1.98-1.9         magrittr_2.0.3         GenomeInfoDbData_1.2.9 Matrix_1.5-3          
[49] Rcpp_1.0.9             munsell_0.5.0          fansi_1.0.3            R.methodsS3_1.8.2      lifecycle_1.0.3        stringi_1.7.8         
[55] zlibbioc_1.44.0        BiocFileCache_2.6.0    grid_4.2.2             blob_1.2.3             parallel_4.2.2         crayon_1.5.2          
[61] lattice_0.20-45        Biostrings_2.66.0      haven_2.5.1            annotate_1.76.0        hms_1.1.2              KEGGREST_1.38.0       
[67] locfit_1.5-9.6         pillar_1.8.1           geneplotter_1.76.0     codetools_0.2-18       reprex_2.0.2           XML_3.99-0.13         
[73] glue_1.6.2             data.table_1.14.6      modelr_0.1.10          png_0.1-8              vctrs_0.5.1            tzdb_0.3.0            
[79] cellranger_1.1.0       gtable_0.3.1           assertthat_0.2.1       cachem_1.0.6           xtable_1.8-4           broom_1.0.2           
[85] googledrive_2.0.0      gargle_1.2.1           memoise_2.0.1          timechange_0.1.1       ellipsis_0.3.2        
>

DESeq2 Log2FC RNASeqData • 2.6k views

ADD COMMENT • link 3.0 years ago Iwan • 0

score 0 · Answer 1 · 2023-01-05

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 11 hours ago

United States

See vignette, note on factor levels.

ADD COMMENT • link 3.0 years ago Michael Love 43k

0

Entering edit mode

Thanks for the reply, Michael. Doesn't the following line in my script define the factor level?

metadata.subset$genotype <- relevel(metadata.subset$genotype, ref = "Control ")

ADD REPLY • link 3.0 years ago Iwan • 0

1

Entering edit mode

No because you are changing the factor levels of a column of a data.frame that was already used to create the dds.

R does not connect these objects. When you make the dds the information you provide as colData is copied into the dds. Modifying the original data.frame doesn't propagate changes to the the dds.

You can modify the information in the dds with:

colData(dds)$variable <- ...
# or equivalently, the easier/shorter form:
dds$variable <- ...

ADD REPLY • link 3.0 years ago Michael Love 43k

0

Entering edit mode

Thanks again for the reply. I understand now. I set the factor levels prior to creating the dataset and it worked. Cheers!

ADD REPLY • link 3.0 years ago Iwan • 0