Question: GFOLD file as input for DESeq2
alantb_cederj (United States) wrote 4.6 years ago:

I am trying to do some analysis with DESeq2. I can run the analysis and get my results. However, I am concerned about the table I have been using to input the read count data into DESeq2. The read counting was done by another person using GFOLD. The GFOLD output is a table with Gene Symbol, Gene Name, Read Count, Exon Length, and RPKM. I deleted the unwanted columns and created a .txt file with Gene Symbol and Read Count to use as input to DESeq2. Is this a proper way to run the DESeq2 analysis, or should I count the reads again using one of the packages suggested in the "Beginner's guide to using the DESeq2 package"?

Thanks a lot.

modified 4.6 years ago • written 4.6 years ago by alantb_cederj
Answer: GFOLD file as input for DESeq2
Michael Love (United States) wrote 4.6 years ago:

These files should be fine if you want to hack it together in R. Make sure that the gene symbols are in the same order in each file, and that the colData sample information matches the columns of your count matrix.
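
The two checks above can be sketched in a few lines of R. This is only an illustration under stated assumptions: the sample names, gene names, and counts are made-up toy data written to a temporary folder, and each file is assumed to be tab-delimited with gene symbol and read count and no header. The DESeq2 step runs only if the package is installed.

```r
# Toy stand-ins for the per-sample files (hypothetical names and counts)
tmp <- tempfile("counts_")
dir.create(tmp)
genes <- c("geneA", "geneB", "geneC")
toy <- list(sample1 = c(10, 0, 5), sample2 = c(12, 1, 7),
            sample3 = c(30, 2, 4), sample4 = c(28, 0, 6))
for (s in names(toy)) {
  write.table(data.frame(genes, toy[[s]]), file.path(tmp, paste0(s, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Read each two-column file (gene symbol, read count; no header)
files <- file.path(tmp, paste0(names(toy), ".txt"))
tabs <- lapply(files, read.table, sep = "\t", header = FALSE,
               col.names = c("gene", "count"), stringsAsFactors = FALSE)

# Check 1: gene symbols are identical and in the same order in every file
same_order <- all(vapply(tabs, function(x) identical(x$gene, tabs[[1]]$gene),
                         logical(1)))
stopifnot(same_order)

# Assemble the count matrix: genes in rows, samples in columns
counts <- sapply(tabs, function(x) x$count)
dimnames(counts) <- list(tabs[[1]]$gene, names(toy))

# Sample information: one row per column of the count matrix
coldata <- data.frame(
  condition = factor(c("untreated", "untreated", "treated", "treated"),
                     levels = c("untreated", "treated")),
  row.names = names(toy))

# Check 2: colData rows match the count matrix columns, in the same order
stopifnot(identical(rownames(coldata), colnames(counts)))

# Build the DESeq2 object only if the package is available
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                                        design = ~ condition)
}
```

If either `stopifnot` fails, the files should be reordered (for example by sorting every table on gene symbol) before the matrix is assembled.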

If you want to use the R packages mentioned in the beginner's guide, they are quite easy as well. I recommend summarizeOverlaps from the GenomicAlignments package or featureCounts from the Rsubread package.

written 4.6 years ago by Michael Love

Thanks a lot Michael

written 4.6 years ago by alantb_cederj
Answer: GFOLD file as input for DESeq2
Dario Strbenac (Australia) wrote 4.6 years ago:

If you have no biological replicates, then it is not worth the effort of reformatting the data and using DESeq2 to analyse it. Simply report the results of GFOLD.

written 4.6 years ago by Dario Strbenac
Answer: GFOLD file as input for DESeq2
alantb_cederj (United States) wrote 4.6 years ago:

Hi Dario, I have 2 replicates for each treatment.

written 4.6 years ago by alantb_cederj
Answer: GFOLD file as input for DESeq2
alantb_cederj (United States) wrote 4.6 years ago:

I really do not have any experience with R. I am basically going through some tutorials and reading the manuals and papers on this subject, and it has been hard for me to understand all the steps of the analysis.

I have a simple experiment with control and exposure, with 2 replicates for each treatment. As I already mentioned, I got the GFOLD output, which is a table with 5 columns: Gene Symbol, Gene Name, Read Count, Exon Length, and RPKM. I deleted the unwanted columns and created a new .txt file with Gene Symbol and Read Count. This table has NO headers.
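
The column-trimming step described above could also be done in R instead of by hand. The following is a rough sketch, not a verified recipe for GFOLD output: it assumes the file is tab-delimited with the five columns in the order listed and no header row, and the gene rows are invented toy values.

```r
# Toy GFOLD-style table (hypothetical values); assumed tab-delimited, no header
gfold_file <- tempfile(fileext = ".txt")
writeLines(c("Actb\tactin beta\t1520\t1800\t95.2",
             "Gapdh\tglyceraldehyde-3-phosphate dehydrogenase\t980\t1300\t84.9",
             "Xist\tX inactive specific transcript\t0\t17000\t0"),
           gfold_file)

# Read the five columns in the order given in the post
gfold <- read.delim(gfold_file, header = FALSE, quote = "",
                    stringsAsFactors = FALSE,
                    col.names = c("symbol", "name", "count",
                                  "exon_length", "rpkm"))

# Keep only gene symbol and raw read count (DESeq2 expects raw counts, not RPKM)
two_col <- gfold[, c("symbol", "count")]

# Sanity checks: whole, non-negative counts and unique gene symbols
stopifnot(all(two_col$count >= 0),
          all(two_col$count == round(two_col$count)),
          !anyDuplicated(two_col$symbol))

# Write a headerless, tab-delimited two-column file
out_file <- tempfile(fileext = ".txt")
write.table(two_col, out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Doing this in a script rather than a spreadsheet keeps the gene order untouched and makes the same trimming repeatable for every sample.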

I also created a sample table, which is a .txt file with 3 columns: sample name, file name, and condition.

Following this tutorial (http://dwheelerau.com/2014/02/17/how-to-use-deseq2-to-analyse-rnaseq-data/), I put together this script:

 

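library('DESeq2')

# Folder holding one two-column count file per sample
directory <- "/Users/Alan/Documents/NGS_Data"

# Count files and the condition of each sample, in the same order
sampleFiles <- c("sample1.csv", "sample2.csv", "sample3.csv", "sample4.csv")
sampleCondition <- c("untreated", "untreated", "treated", "treated")
sampleTable <- data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)

# Build the data set from the per-sample count files
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
ddsHTSeq

# Make "untreated" the reference level, so fold changes are treated vs. untreated
colData(ddsHTSeq)$condition <- factor(colData(ddsHTSeq)$condition, levels=c("untreated","treated"))

# Fit the model, extract the results, and sort them by adjusted p-value
dds <- DESeq(ddsHTSeq)
res <- results(dds)
res <- res[order(res$padj),]
head(res)

# Draw the MA plot and save a copy as a PNG file
plotMA(dds, ylim=c(-2,2), main="DESeq2")
dev.copy(png, "deseq2_MAplot.png")
dev.off()

# Show the description of each result column, then write the full table to disk
mcols(res, use.names=TRUE)
write.csv(as.data.frame(res), file="results_deseq2.csv")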

 

It runs fine and I get the results table. The big question is: is this a proper way to analyze my data? Because I have no experience with DESeq2 and very little knowledge of R, I am concerned about making errors, being unable to detect them, and generating a false result. Another thing that bugs me is the fact that there are a few different ways to generate count tables to input into R, and I do not know if "DESeqDataSetFromHTSeqCount" is the best call for me.

Thanks in advance,

Alan

written 4.6 years ago by alantb_cederj

Hi Alan,

The other ways to generate count matrices involve having access to BAM files. You haven't mentioned if you have access to these.

Nevertheless, DESeqDataSetFromHTSeqCount works on the output of htseq-count, which is a set of files, one per sample, each with two columns: gene ID and count. So if that's what you've created, then it should work for you.
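
One way to confirm that a hand-made file has this two-column shape before passing it on is a small checker like the one below. It is a sketch only: `check_count_file` is a hypothetical helper written for this illustration, not part of DESeq2, and it assumes tab-delimited files without a header row. The two toy files are invented.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_count_file <- function(path) {
  tab <- read.table(path, sep = "\t", header = FALSE, quote = "",
                    stringsAsFactors = FALSE)
  if (ncol(tab) != 2) {
    return(sprintf("expected 2 columns, found %d", ncol(tab)))
  }
  problems <- character(0)
  cnt <- tab[[2]]
  # A stray header row or RPKM-like decimals will trip this check
  if (!is.numeric(cnt) || anyNA(cnt) || any(cnt < 0) || any(cnt != round(cnt))) {
    problems <- c(problems, "second column is not non-negative whole numbers")
  }
  if (anyDuplicated(tab[[1]]) > 0) {
    problems <- c(problems, "duplicated gene IDs in first column")
  }
  problems
}

# Toy files: one well-formed, one with a header row and a repeated gene ID
good <- tempfile(fileext = ".txt")
writeLines(c("geneA\t10", "geneB\t0"), good)
bad <- tempfile(fileext = ".txt")
writeLines(c("Gene Symbol\tRead Count", "geneA\t10", "geneA\t3"), bad)

good_problems <- check_count_file(good)
bad_problems <- check_count_file(bad)
print(good_problems)
print(bad_problems)
```

Running the checker over every sample file and stopping on the first non-empty result catches the most common slips, such as a leftover header line or the wrong column being kept.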

written 4.6 years ago by Michael Love

It seems I am good to go... Wonderful..

Thanks a lot.

Alan

written 4.6 years ago by alantb_cederj