Question

GFOLD file as input for DESeq2

0

Entering edit mode

alantb_cederj • 0

@alantb_cederj-6768

Last seen 2.1 years ago

United States

I am trying to do some analysis with DESEq2. I can run the analysis and get my results. However, I am concern about the table I have been using to input the read counting data into DESeq2. The read counts was done by another person using GFOLD. The GFOLD output is a table with Gene Symbol, Gene Name, Read Count, Exon Length, and RPKM. I deleted the unwanted rows and created a .txt file with Gene Symbol and Read Count to use as input to DESeq2. Is is a proper way to run the DESeq2 analysis, or I should count the reads again using one of the packages suggested in the "Begginer's guide to using the DESeq2 package".

Thanks a lot.

deseq2 • 3.8k views

ADD COMMENT • link 11.3 years ago alantb_cederj • 0

score 0 · Answer 1 · 2014-09-22

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

These files should be fine if you want to hack it together in R. Make sure that the gene symbols are in the same order in each file, and
that the colData sample information matches the columns of your count matrix.

If you want to use the R packages mentioned in the beginner's guide, they are quite easy as well. I recommend summarizeOverlaps from the GenomicAlignments package or featureCounts from the Rsubread package.

ADD COMMENT • link 11.3 years ago Michael Love 43k

0

Entering edit mode

Thanks a lot Michael

ADD REPLY • link 11.3 years ago alantb_cederj • 0

score 0 · Answer 2 · 2014-09-22

0

Entering edit mode

Dario Strbenac ★ 1.6k

@dario-strbenac-5916

Last seen 25 days ago

Australia

If you have no biological replicates, then it is not worth the effort of reformatting the data and using DESeq2 to analyse it. Simply report the results of GFOLD.

ADD COMMENT • link 11.3 years ago Dario Strbenac ★ 1.6k

score 0 · Answer 3 · 2014-09-23

0

Entering edit mode

alantb_cederj • 0

@alantb_cederj-6768

Last seen 2.1 years ago

United States

Hi Dario, I have 2 replicates for each treatment.

ADD COMMENT • link 11.3 years ago alantb_cederj • 0

score 0 · Answer 4 · 2014-09-23

I really do not have any experience with R, and basically I am going through some tutorials, reading the manuals and papers about this subject.... and it has been hard for me to understand all steps of the analysis...

I have a simple experiment with control and exposure. For each treatment I have 2 replicates. As I already mentioned, I got the output of the GFOLD, which is a table with 5 columns: Gene Symbol, Gene Name, Read Count, Exon Length, and RPKM. I deleted the unwanted rows and created a new .txt file with Gene Symbol and Read Count. This table has NO headers...

I also created a sample table, which is a .txt file with 3 columns: sample name, file name, and condition.

Following this tutorial "http://dwheelerau.com/2014/02/17/how-to-use-deseq2-to-analyse-rnaseq-data/" I got this script:

library('DESeq2')

directory<-"/Users/Alan/Documents/NGS_Data"

sampleFiles<-c("sample1.csv", "sample2.csv", "sample3.csv", "sample4.csv")

sampleCondition<-c("untreated", "untreated", "treated","treated")

sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)

ddsHTSeq

colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("untreated","treated"))

dds<-DESeq(ddsHTSeq)

res<-results(dds)

res<-res[order(res$padj),]

head(res)

plotMA(dds,ylim=c(-2,2),main="DESeq2")

dev.copy(png,"deseq2_MAplot.png")

dev.off()

mcols(res,use.names=TRUE)

write.csv(as.data.frame(res),file="results_deseq2.csv")

It runs fine and I can get the table. The big question is: Is this a proper way to analyze my data? Because I have no experience with DESeq2 and very little knowledge about R, I am concern about making errors, being unable to detect it, and generating a fake result. Another thing that bugs me is the facts that there are a few different ways to generate count tables to input in R, and I do not know if "DESeqDataSetFromHTSeqCount" is the best call for me.

Thanks in advance,

Alan