Hello,
I have a very large dataset which I downloaded from TCGA for pancreatic cancer. It has one normal sample but 184 patient samples, so it has 186 columns (no technical replicates) and 20500 rows, one per gene. I am trying to get log2 fold change data from this using DESeq2. To begin my analysis I have extracted four patient samples (raw reads) and the normal control, and now have 5 separate files. I tried both .txt and .csv file formats with the following code:
library('DESeq2')
setwd("/Users/dorothy/Desktop/PAAD")
getwd()
sampleFiles<-grep ('.txt',list.files('/Users/dorothy/Desktop/PAAD'),value=TRUE)
sampleCondition<-c('control','patient','patient','patient','patient')
sampleTable<-data.frame(sampleName=sampleFiles, fileName=sampleFiles, condition=sampleCondition)
sampleFiles
sampleCondition
sampleTable
ddsHTSeq<-DESeqDataSetFromHTSeqCount(sampleTable=sampleTable, directory='/Users/dorothy/Desktop/PAAD/', design=~condition)
colData(ddsHTSeq)$condition<-factor(colData(ddsHTSeq)$condition, levels=c('control','patient'))
dds<-DESeq(ddsHTSeq)
res<-results(dds)
res<-res[order(res$padj),]
But once I reach the DESeqDataSetFromHTSeqCount call (the ddsHTSeq line), it shows the following error: Error in Ops.factor(a$V1, l[[1]]$V1) :
level sets of factors are different.
Is there a way I can solve this?
Also, is it possible to build the DESeq2 dataset (the ddsHTSeq step) from a single table, telling R to treat each column as a separate sample? In short, I do not want to keep a separate metadata file that contains information about the columns.
Thanks in advance!
Dorothy
For us to reproduce the problem, can you share the download link for TCGA as well as any additional steps to get your 5 sample files?
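In the meantime, one thing that may be worth checking: as far as I can tell, this error comes up when DESeqDataSetFromHTSeqCount compares the first column (the gene IDs) of each file against the first file, so it often means the files do not list the same IDs in the same order, for example because some files have a header line or a different number of rows. Below is a rough base-R sketch of such a check. The helper name compare_gene_ids and the toy files are made up for illustration, and it assumes tab-separated files without a header; point it at your own 5 files instead.

```r
# Hypothetical helper (not part of DESeq2): for each file, report whether its
# first column (gene IDs) is identical, in content and order, to the first file.
compare_gene_ids <- function(files) {
  ids <- lapply(files, function(f) {
    # Read everything as character so factor levels cannot interfere.
    # Assumes tab-separated files with no header line.
    read.table(f, header = FALSE, sep = "\t", stringsAsFactors = FALSE,
               colClasses = "character")[[1]]
  })
  names(ids) <- basename(files)
  vapply(ids, function(x) identical(x, ids[[1]]), logical(1))
}

# Toy demonstration with two temporary files; the second one has a header
# line, which is enough to make its ID column differ from the first file.
d <- tempfile("paad_demo_")
dir.create(d)
writeLines(c("A1BG\t10", "A2M\t25"), file.path(d, "control.txt"))
writeLines(c("gene_id\traw_count", "A1BG\t7", "A2M\t31"),
           file.path(d, "patient1.txt"))

files <- list.files(d, pattern = "\\.txt$", full.names = TRUE)
print(compare_gene_ids(files))
```

Any file reported as FALSE is one to open and compare by hand. On the second question: if the counts are already in a single matrix, DESeqDataSetFromMatrix takes the count matrix together with a colData data frame that can be built directly in R (for instance from the column names), so a separate metadata file should not be needed.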