DESeq2 for differential expression between species
Emma • 0
@emma-20891

I am trying to use DESeq2 to identify differentially expressed genes across different rodent species as part of a larger project on rodent evolution. I have carried out alignment using STAR, and generated my counts with HTSeq. I am only interested in one-to-one orthologues that span all of the rodent species I am working with, so these count files have been amended to show the mouse orthologue gene ID for their respective genes, but the format of the file remains the same, with ID on the left and count on the right.

Here is what I have at the moment:

library(DESeq2)

# Folder containing one HTSeq count file per sample
directory = "/Users/emma/Desktop/Differential Expression Analysis/Data"
sampleFiles = list.files(directory)
# Strip the "Ortho.txt" suffix to get the sample names
sampleName = unlist(strsplit(sampleFiles, "Ortho.txt", fixed = TRUE))
# Split each name into pieces at the underscores; the third piece is the species
condition = strsplit(sampleName,  "^[^_]*(?:_[^_]*){0}\\K_", perl=TRUE)
species = sapply(condition, "[[", 3)
fileInfo = data.frame(sampleName, sampleFiles, species)

ddsHTSeq = DESeqDataSetFromHTSeqCount(sampleTable = fileInfo, directory = directory, design= ~ species)

Which gives me the following fileInfo table:

          sampleName                sampleFiles species
1  SRR594397_1_Mouse SRR594397_1_MouseOrtho.txt   Mouse
2  SRR594397_2_Mouse SRR594397_2_MouseOrtho.txt   Mouse
3  SRR594405_1_Mouse SRR594405_1_MouseOrtho.txt   Mouse
4  SRR594405_2_Mouse SRR594405_2_MouseOrtho.txt   Mouse
5  SRR594414_1_Mouse SRR594414_1_MouseOrtho.txt   Mouse
6  SRR594414_2_Mouse SRR594414_2_MouseOrtho.txt   Mouse
7    SRR594423_1_Rat   SRR594423_1_RatOrtho.txt     Rat
8    SRR594423_2_Rat   SRR594423_2_RatOrtho.txt     Rat
9    SRR594432_1_Rat   SRR594432_1_RatOrtho.txt     Rat
10   SRR594432_2_Rat   SRR594432_2_RatOrtho.txt     Rat
11   SRR594441_1_Rat   SRR594441_1_RatOrtho.txt     Rat
12   SRR594441_2_Rat   SRR594441_2_RatOrtho.txt     Rat

But I get the following error when I try to create ddsHTSeq:

Error in Ops.factor(a$V1, l[[1]]$V1) : 
  level sets of factors are different
In addition: Warning messages:
1: In `==.default`(a$V1, l[[1]]$V1) :
  longer object length is not a multiple of shorter object length
2: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
3: In `==.default`(a$V1, l[[1]]$V1) :
  longer object length is not a multiple of shorter object length
4: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length
5: In `==.default`(a$V1, l[[1]]$V1) :
  longer object length is not a multiple of shorter object length
6: In is.na(e1) | is.na(e2) :
  longer object length is not a multiple of shorter object length

I have tried a number of approaches and read many documents and forum posts online, but I can't seem to get past this stage. I am trying to work out whether there is a problem with what I am doing in RStudio, or whether there is a bigger problem in my experimental design that I am missing. Being new to DESeq2 and bioinformatics in general, I am in way over my head! Any help understanding what to do with this error would be very much appreciated.

@mikelove

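Likely what is happening is that the files are not comparable. This importer requires that every file has been counted across the same set of genes, listed in the same order. The "longer object length is not a multiple of shorter object length" warnings suggest that the files do not even have the same number of rows.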
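
One way to check this is to compare the ID column of every file against the first file. The following is only a minimal sketch: checkGeneIds is a hypothetical helper (not part of DESeq2), it assumes the usual two-column, tab-separated HTSeq layout with no header, and the demo files are made up so that the example runs on its own.

```r
# Hypothetical helper (not a DESeq2 function): report how many rows each
# count file has and whether its gene IDs match those of the first file.
checkGeneIds <- function(directory, sampleFiles) {
  ids <- lapply(file.path(directory, sampleFiles), function(f) {
    # Assumed layout: tab-separated, no header, gene ID in the first column
    read.table(f, sep = "\t", header = FALSE, stringsAsFactors = FALSE)$V1
  })
  data.frame(
    file = sampleFiles,
    nGenes = lengths(ids),
    sameAsFirst = vapply(ids, function(x) identical(x, ids[[1]]), logical(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up demo data: the second file is missing geneB
tmp <- tempfile("counts")
dir.create(tmp)
writeLines(c("geneA\t10", "geneB\t5", "geneC\t0"),
           file.path(tmp, "s1_MouseOrtho.txt"))
writeLines(c("geneA\t7", "geneC\t3"),
           file.path(tmp, "s2_RatOrtho.txt"))

report <- checkGeneIds(tmp, list.files(tmp))
print(report)
```

With real data, pass the same directory and sampleFiles used in the question. Any file with a different nGenes, or with sameAsFirst equal to FALSE, would be enough to make the importer fail.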

You can also construct the count matrix yourself using base R and then provide it to DESeqDataSetFromMatrix.
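
A rough sketch of that approach, under some assumptions: buildCountMatrix is a hypothetical helper, the files are two-column tab-separated HTSeq output, gene IDs are unique within each file, and only genes present in every file are kept. The demo files are made up so the example runs on its own, and the DESeq2 call is skipped if the package is not installed.

```r
# Hypothetical helper: read HTSeq count files and return an integer matrix
# (genes x samples) restricted to the genes shared by all files.
buildCountMatrix <- function(directory, sampleFiles, sampleNames) {
  tabs <- lapply(file.path(directory, sampleFiles), function(f) {
    x <- read.table(f, sep = "\t", header = FALSE, stringsAsFactors = FALSE,
                    col.names = c("gene", "count"))
    # Drop HTSeq summary rows such as __no_feature
    x <- x[!grepl("^__", x$gene), ]
    # Assumption: one-to-one orthologues, so each ID appears once per file
    stopifnot(!anyDuplicated(x$gene))
    x
  })
  # Keep only the genes present in every file, in one fixed order
  shared <- sort(Reduce(intersect, lapply(tabs, function(x) x$gene)))
  counts <- do.call(cbind, lapply(tabs, function(x) {
    x$count[match(shared, x$gene)]
  }))
  storage.mode(counts) <- "integer"
  dimnames(counts) <- list(shared, sampleNames)
  counts
}

# Made-up demo data: files differ in gene order and gene content
tmp <- tempfile("counts")
dir.create(tmp)
writeLines(c("geneA\t10", "geneB\t5", "geneC\t0", "__no_feature\t100"),
           file.path(tmp, "s1_MouseOrtho.txt"))
writeLines(c("geneA\t12", "geneB\t4", "geneC\t1", "__no_feature\t90"),
           file.path(tmp, "s2_MouseOrtho.txt"))
writeLines(c("geneC\t3", "geneA\t7", "__no_feature\t80"),
           file.path(tmp, "s3_RatOrtho.txt"))
writeLines(c("geneA\t9", "geneC\t2", "geneD\t6"),
           file.path(tmp, "s4_RatOrtho.txt"))

sampleFiles <- list.files(tmp)
sampleName <- sub("Ortho.txt", "", sampleFiles, fixed = TRUE)
# Species is taken as the text after the last underscore
species <- factor(sub(".*_", "", sampleName))

counts <- buildCountMatrix(tmp, sampleFiles, sampleName)
colData <- data.frame(species = species, row.names = sampleName)

# Only build the DESeq2 object if the package is available
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = colData,
                                        design = ~ species)
}
```

Intersecting the IDs silently drops any gene that is missing from at least one file, so it is worth comparing nrow(counts) with the number of orthologues you expected before going further.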

Thank you for your help. I did what you suggested, and everything is working fine.
