I have a very large pancreatic cancer dataset that I downloaded from TCGA. It has one normal sample and 184 patient samples, so it has 186 columns (no technical replicates) and about 20,500 rows, one per gene. I am trying to get log2 fold changes from it using DESeq2. To begin my analysis, I extracted four patient samples (raw read counts) and the normal control, so I now have 5 separate files. I tried both .txt and .csv file formats with the following code:
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory='/Users/dorothy/Desktop/PAAD/', design=~condition)
But when I run DESeqDataSetFromHTSeqCount(), I get the following error:
Error in Ops.factor(a$V1, l[[1]]$V1) : level sets of factors are different
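One way to narrow this down is to check whether the first column of every file holds the same gene IDs in the same order, which DESeqDataSetFromHTSeqCount expects of HTSeq-style count files. The sketch below reads each file with read.table's default whitespace separator and no header, on the assumption that this roughly mirrors how that function reads them. `check_gene_ids` is a hypothetical helper, not part of DESeq2. Note that a comma-separated line such as g1,5 contains no whitespace, so it is read as a single field and will never match a plain gene ID.

```r
# Hypothetical helper (not part of DESeq2): for each count file, report
# whether its first column matches the first file's gene IDs, in order.
check_gene_ids <- function(files, directory = ".") {
  ids <- lapply(files, function(fn) {
    # Whitespace-separated, no header (assumed to mirror how
    # DESeqDataSetFromHTSeqCount reads the files). Read as character so
    # mismatching IDs show up as FALSE instead of a factor-level error.
    tab <- read.table(file.path(directory, fn), header = FALSE,
                      fill = TRUE, stringsAsFactors = FALSE)
    as.character(tab[[1]])
  })
  ref <- ids[[1]]
  # TRUE only if the file has the same IDs as the first file, same order
  same <- vapply(ids, function(x) length(x) == length(ref) && all(x == ref),
                 logical(1))
  names(same) <- files
  same
}
```

Calling check_gene_ids(sampleFiles, '/Users/dorothy/Desktop/PAAD/') would then flag any file whose IDs, order, or separator differ from the first one.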
Is there a way I can solve this?
Also, is it possible to build the DESeq2 dataset directly from the count table, so that R treats each column as a separate sample? In short, I do not want to keep a separate metadata file describing the columns.
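A minimal sketch of that idea, assuming the whole TCGA table is loaded as a genes-by-samples count matrix: DESeq2's DESeqDataSetFromMatrix() takes the counts plus a colData data frame, and that data frame can be built in R from the column names, so no metadata file is needed. `make_col_data`, `build_dds`, and the `normal_cols` argument are hypothetical names for illustration. The rounding step is there because DESeq2 rejects non-integer counts; whether your table contains non-integer estimates is an assumption to check.

```r
# Hypothetical helper: build colData from the count matrix's column names.
# `normal_cols` names the normal sample column(s); all others are "tumor".
make_col_data <- function(counts, normal_cols) {
  samples <- colnames(counts)
  stopifnot(all(normal_cols %in% samples))
  condition <- ifelse(samples %in% normal_cols, "normal", "tumor")
  # "normal" is the first level, so DESeq2 uses it as the reference group
  data.frame(condition = factor(condition, levels = c("normal", "tumor")),
             row.names = samples)
}

# Hypothetical wrapper: create a DESeqDataSet straight from the matrix.
# Requires the Bioconductor package DESeq2 when it is called.
build_dds <- function(counts, normal_cols) {
  counts <- as.matrix(counts)
  # DESeq2 only accepts whole-number counts; rounding assumes the table
  # may hold non-integer estimates (check your file)
  counts <- round(counts)
  storage.mode(counts) <- "integer"
  col_data <- make_col_data(counts, normal_cols)
  DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                 colData = col_data,
                                 design = ~ condition)
}

# Usage (file name and sample ID are placeholders):
# counts <- read.table("counts.txt", header = TRUE, row.names = 1,
#                      sep = "\t", check.names = FALSE)
# dds <- DESeq2::DESeq(build_dds(counts, normal_cols = "NORMAL_ID"))
# res <- DESeq2::results(dds)  # log2FoldChange: tumor vs normal
```

With only one normal sample, the normal group has no replicate, so the resulting fold changes rest on that single sample and should be read with caution.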
Thanks in advance!