I have almost no experience with R. I have been working through tutorials, manuals, and papers on this subject, but it has been hard for me to understand all the steps of the analysis.
I have a simple experiment with control and exposure. For each treatment I have 2 replicates. As I already mentioned, I got the GFOLD output, which is a table with 5 columns: Gene Symbol, Gene Name, Read Count, Exon Length, and RPKM. I deleted the unwanted columns and created a new .txt file with only Gene Symbol and Read Count. This table has NO headers.
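For reference, the trimming step described above could be sketched in R roughly like this. This is only a sketch under assumptions: the file names are made up, the tiny example table is invented just so the snippet runs on its own, and I am assuming the GFOLD output is tab-delimited with the 5 columns in the order listed above.

```r
# Hypothetical example: build a tiny GFOLD-like file so the snippet is self-contained.
# Assumed layout (tab-delimited, no header):
# gene symbol, gene name, read count, exon length, RPKM
gfold_file <- file.path(tempdir(), "sample1.read_cnt")
writeLines(c("GeneA\tNameA\t10\t1500\t1.2",
             "GeneB\tNameB\t0\t800\t0.0",
             "GeneC\tNameC\t257\t2300\t8.9"), gfold_file)

# Read the 5-column table; the column names are my own labels, not GFOLD's
gfold <- read.delim(gfold_file, header = FALSE, stringsAsFactors = FALSE,
                    col.names = c("symbol", "name", "count", "exon_length", "rpkm"))

# DESeq2 works on raw integer counts, so check that before writing anything
stopifnot(all(gfold$count == round(gfold$count)))

# Keep only gene symbol and read count; write without header, row names, or quotes
out_file <- file.path(tempdir(), "sample1.txt")
write.table(gfold[, c("symbol", "count")], out_file,
            sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Doing it this way instead of by hand would at least make the step repeatable for all four samples, and it confirms that the RPKM column is not the one being passed on.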
I also created a sample table, which is a .txt file with 3 columns: sample name, file name, and condition.
Following this tutorial "http://dwheelerau.com/2014/02/17/how-to-use-deseq2-to-analyse-rnaseq-data/" I put together this script:
library(DESeq2)
# folder containing the count files (placeholder path)
directory<-"/path/to/your/files/"
sampleFiles<-c("sample1.csv", "sample2.csv", "sample3.csv", "sample4.csv")
sampleCondition<-c("untreated", "untreated", "treated","treated")
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory=directory, design=~condition)
It runs fine and I can get the table. The big question is: is this a proper way to analyze my data? Because I have no experience with DESeq2 and very little knowledge of R, I am concerned about making errors, being unable to detect them, and generating false results. Another thing that bugs me is the fact that there are a few different ways to generate count tables to load into R, and I do not know if "DESeqDataSetFromHTSeqCount" is the best call for me.
Thanks in advance,