GFOLD file as input for DESeq2
4
0
Entering edit mode
@alantb_cederj-6768
Last seen 3 months ago
United States

I am trying to do some analysis with DESEq2. I can run the analysis and get my results. However, I am concern about the table I have been using to input the read counting data into DESeq2. The read counts was done by another person using GFOLD. The GFOLD output is a table with Gene Symbol, Gene Name, Read Count, Exon Length, and RPKM. I deleted the unwanted rows and created a .txt file with Gene Symbol and Read Count to use as input to DESeq2. Is is a proper way to run the DESeq2 analysis, or I should count the reads again using one of the packages suggested in the "Begginer's guide to using the DESeq2 package".

Thanks a lot.

deseq2 • 3.0k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 16 hours ago
United States

These files should be fine if you want to hack it together in R. Make sure that the gene symbols are in the same order in each file, and
that the colData sample information matches the columns of your count matrix. 

If you want to use the R packages mentioned in the beginner's guide, they are quite easy as well. I recommend summarizeOverlaps from the GenomicAlignments package or featureCounts from the Rsubread package.

ADD COMMENT
0
Entering edit mode

Thanks a lot Michael

ADD REPLY
0
Entering edit mode
Dario Strbenac ★ 1.5k
@dario-strbenac-5916
Last seen 23 hours ago
Australia

If you have no biological replicates, then it is not worth the effort of reformatting the data and using DESeq2 to analyse it. Simply report the results of GFOLD.

ADD COMMENT
0
Entering edit mode
@alantb_cederj-6768
Last seen 3 months ago
United States

Hi Dario, I have 2 replicates for each treatment.

ADD COMMENT
0
Entering edit mode
@alantb_cederj-6768
Last seen 3 months ago
United States

I really do not have any experience with R, and basically I am going through some tutorials, reading the manuals and papers about this subject.... and it has been hard for me to understand all steps of the analysis...

I have a simple experiment with control and exposure. For each treatment I have 2 replicates. As I already mentioned, I got the output of the GFOLD, which is a table with 5 columns: Gene Symbol, Gene Name, Read Count, Exon Length, and RPKM. I deleted the unwanted rows and created a new .txt file with Gene Symbol and Read Count. This table has NO headers...

I also created a sample table, which is a .txt file with 3 columns: sample name, file name, and condition.

Following this tutorial "http://dwheelerau.com/2014/02/17/how-to-use-deseq2-to-analyse-rnaseq-data/"  I got this script:

 

library('DESeq2')

directory<-"/Users/Alan/Documents/NGS_Data"

sampleFiles<-c("sample1.csv", "sample2.csv", "sample3.csv", "sample4.csv")

sampleCondition<-c("untreated", "untreated", "treated","treated")

sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)

ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)

ddsHTSeq

colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c("untreated","treated"))

dds<-DESeq(ddsHTSeq)

res<-results(dds)

res<-res[order(res$padj),]

head(res)

plotMA(dds,ylim=c(-2,2),main="DESeq2")

dev.copy(png,"deseq2_MAplot.png")

dev.off()

mcols(res,use.names=TRUE)

write.csv(as.data.frame(res),file="results_deseq2.csv")

 

It runs fine and I can get the table. The big question is: Is this a proper way to analyze my data? Because I have no experience with DESeq2 and very little knowledge about R, I am concern about making errors, being unable to detect it, and generating a fake result. Another thing that bugs me is the facts that there are a few different ways to generate count tables to input in R, and I do not know if "DESeqDataSetFromHTSeqCount" is the best call for me.

Thanks in advance,

Alan

ADD COMMENT
0
Entering edit mode

hi Alan,

The other ways to generate count matrices involve having access to BAM files. You haven't mentioned if you have access to these.

Nevertheless, the DESeqDataSetFromHTSeqCount works on the output from htseq-count, which are files - one for each sample - with two columns: gene ID and count. So if that's what you've created, then it should work for you.

ADD REPLY
0
Entering edit mode

It seems I am good to go... Wonderful..

Thanks a lot.

Alan

ADD REPLY

Login before adding your answer.

Traffic: 788 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6