Question: Getting different results with DESeq2
0
gravatar for Lillyhchen
4.2 years ago by
Lillyhchen10
United States
Lillyhchen10 wrote:

Hi I'm having an issue with DESeq2 results.  I'm working with 2 cell lines from different patients and originally I created a .csv excel file with both cell lines in it and then contrasted the results from the DESEq Analysis:

setwd("C:/Users/lchen/Documents/transfer.lilly/TMZ.052215")
sample=read.csv("sample.csv", header=T)
#Get the paths to the file
#Make sure the sample names in the sample.csv match up with the end path text
path=paste(getwd(),"/",sample$sample,".sorted.rmdup.bam",sep="")
library(GenomicRanges)
#Clump the bam files into a BamFileList with the same samples as listed in the .csv
list=BamFileList(path, index=character())
library(GenomicFeatures)
#Obtain Reference genome from UCSC 
Txdb=makeTxDbFromUCSC(
  genome="hg19",
  tablename="ensGene",
  transcript_ids=NULL,
  circ_seqs=DEFAULT_CIRC_SEQS,
  url="http://genome.ucsc.edu/cgi-bin/",
  goldenPath_url="http://hgdownload.cse.ucsc.edu/goldenPath",
  miRBaseBuild=NA)
#Sort the exons of the Txdb by gene
exons<-exonsBy(Txdb, by="gene")
#SummarizeOverlaps to count unique reads
library(BiocParallel)
se <- summarizeOverlaps(features=exons, reads=list,
                        mode="Union",
                        singleEnd=FALSE,
                        ignore.strand=FALSE,
                        fragments=TRUE, BPPARAM = SerialParam())
#Add column as listed in sample.csv
colData(se)$group= sample$group
#Run DESeq2 with variables
library(DESeq2)
dds<-DESeqDataSet(se, design=~group)
dds<-DESeq(dds)
#Results that contrast the Control and Treated groups
res<-results(dds, contrast=c("group", "HF2303.Control", "HF2303.TMZ"))
#Match Gene Names to the Ensembl IDs from the Txdb
res$ensembl <- sapply( strsplit( rownames(res), split="\\+" ), "[", 1 )
library( "biomaRt" )
ensembl = useMart( "ensembl", dataset = "hsapiens_gene_ensembl" )
genemap <- getBM( attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                  filters = "ensembl_gene_id",
                  values = res$ensembl,
                  mart = ensembl )
idx <- match( res$ensembl, genemap$ensembl_gene_id )
res$entrez <- genemap$entrezgene[ idx ]
res$hgnc_symbol <- genemap$hgnc_symbol[ idx ]
#Create a MAPlot and Dispersion Estimate Plot
plotMA(res, ylim=c(-1, 1), main="DESeq2")
plotDispEsts(dds)
#Filter by subsetting results for padj<0.1 and log2FoldChange>=2
res=as.data.frame(res)
sig=subset(res, padj<=0.1)
sig=subset(res, abs(log2FoldChange)>=2)
#Save sig and gene list as csv file
write.csv(sig, "sig.HF2303.csv")
gene=unique(sig$hgnc_symbol)
gene=gene[!is.na(gene)]
write.csv(gene, "gene.HF2303.csv")

The code runs and gives me about 81 significant genes

However, when I separate the 2 cell lines into separate 2 .csv excel files (one for each cell line) and run the same code, I only get about 5 significant genes.  At first I thought it would be better to work the cell lines separately, but I didn't think that it would make a huge difference if only the cell lines were contrasted in the results, so I'm confused why I am getting different results.  

Here is a code of when I separated the 2 cell lines into 2 excel files.

setwd("C:/Users/lchen/Documents/transfer.lilly/TMZ.052215")
sample=read.csv("HF2927.csv", header=T)
#Get the paths to the file
#Make sure the sample names in the sample.csv match up with the end path text
path=paste(getwd(),"/",sample$sample,".HF2927.sorted.rmdup.bam",sep="")
library(GenomicRanges)
#Clump the bam files into a BamFileList with the same samples as listed in the .csv
list=BamFileList(path, index=character())
library(GenomicFeatures)
#Obtain Reference genome from UCSC 
Txdb=makeTxDbFromUCSC(
  genome="hg19",
  tablename="ensGene",
  transcript_ids=NULL,
  circ_seqs=DEFAULT_CIRC_SEQS,
  url="http://genome.ucsc.edu/cgi-bin/",
  goldenPath_url="http://hgdownload.cse.ucsc.edu/goldenPath",
  miRBaseBuild=NA)
#Sort the exons of the Txdb by gene
exons<-exonsBy(Txdb, by="gene")
#SummarizeOverlaps to count unique reads
library(BiocParallel)
se <- summarizeOverlaps(features=exons, reads=list,
                        mode="Union",
                        singleEnd=FALSE,
                        ignore.strand=FALSE,
                        fragments=TRUE, BPPARAM = SerialParam())
#Add column as listed in sample.csv
colData(se)$group= sample$group
#Run DESeq2 with variables
library(DESeq2)
dds<-DESeqDataSet(se, design=~group)
dds<-DESeq(dds)
#Results that contrast the Control and Treated groups
res<-results(dds, contrast=c("group", "Control", "TMZ"))
#Match Gene Names to the Ensembl IDs from the Txdb
res$ensembl <- sapply( strsplit( rownames(res), split="\\+" ), "[", 1 )
library( "biomaRt" )
ensembl = useMart( "ensembl", dataset = "hsapiens_gene_ensembl" )
genemap <- getBM( attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                  filters = "ensembl_gene_id",
                  values = res$ensembl,
                  mart = ensembl )
idx <- match( res$ensembl, genemap$ensembl_gene_id )
res$entrez <- genemap$entrezgene[ idx ]
res$hgnc_symbol <- genemap$hgnc_symbol[ idx ]
#Create a MAPlot and Dispersion Estimate Plot
plotMA(res, ylim=c(-1, 1), main="DESeq2")
plotDispEsts(dds)
#Filter by subsetting results for padj<0.1 and log2FoldChange>=2
res=as.data.frame(res)
sig=subset(res, padj<=0.1)
sig=subset(res, abs(log2FoldChange)>=2)
#Save sig and gene list as csv file
write.csv(sig, "sig.HF2927.csv")
gene=unique(sig$hgnc_symbol)
gene=gene[!is.na(gene)]
write.csv(gene, "gene.HF2927.csv")

 

Any help or tips as to why would be very appreciated! Thank you!

 

deseq2 rstudio results • 1.3k views
ADD COMMENTlink modified 4.2 years ago by Michael Love25k • written 4.2 years ago by Lillyhchen10
Answer: Getting different results with DESeq2
0
gravatar for Michael Love
4.2 years ago by
Michael Love25k
United States
Michael Love25k wrote:

"At first I thought it would be better to work the cell lines separately, but I didn't think that it would make a huge difference if only the cell lines were contrasted in the results, so I'm confused why I am getting different results."

I can't tell exactly from your post, but just to make clear: one time you had four groups in the DESeqDataSet, and another time you had two groups in two separate DESeqDataSets?

Yes, it is expected that this will change the results.

The estimation steps for the model parameters look at all the samples in the DESeqDataSet. This often helps to have all the samples from all groups, because it improves the accuracy estimates (more residual degrees of freedom).

ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by Michael Love25k

Sorry for the confusion, but yes that's what I'm working with. Basically, I have two tumor cell lines each from a different patient, and each one of the tumor cell lines has a control and treated group, so as you were saying that would be 4 groups all together when I created a .csv file with both tumor cell lines.  I'm trying to look at the changes in gene expression that occur in each tumor cell line between untreated and those treated with Treatment A.

That's when i tried to just work with the two tumor cell lines separately.  Is there a way to contrast the tumor cell lines when working with samples from all the groups?  I've been trying to edit the design when creating a DESeqDataSet but i've been getting an error saying that "-- standard model matrices are used for factors with two levels and an interaction,
   where the main effects are for the reference level of other factors.
   see the 'Interactions' section of the vignette for more details: vignette('DESeq2')"

Here's the code i've been working with:

> exons<-exonsBy(Txdb, by="gene")
> library(BiocParallel)
> se <- summarizeOverlaps(features=exons, reads=list,
+                         mode="Union",
+                         singleEnd=FALSE,
+                         ignore.strand=FALSE,
+                         fragments=TRUE, BPPARAM = SerialParam())
> colData(se)=DataFrame(sample)
> library(DESeq2)
Loading required package: Rcpp
Loading required package: RcppArmadillo
> dds<-DESeqDataSet(se, design=~Cell + Treat + Cell:Treat)
> #Relevel Treat column so that Control is first
> dds$Treat <- relevel(dds$Treat, "Cont")
> #Run DESeq pipeline
> dds<-DESeq(dds)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- standard model matrices are used for factors with two levels and an interaction,
   where the main effects are for the reference level of other factors.
   see the 'Interactions' section of the vignette for more details: vignette('DESeq2')

 

ADD REPLYlink written 4.2 years ago by Lillyhchen10

answer posted on the new post:

A: Running DESeqDataSet with 4 factors?

ADD REPLYlink written 4.2 years ago by Michael Love25k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 252 users visited in the last hour