Question: DEseq normalization Small RNA-seq data using rRNA
0
2.5 years ago by
mosharrofgeb0 wrote:

Hello: I am using DEseq2 for Small RNA-seq data. I have four replicates for control and two replicates for treated. I want to see differential expression of different kinds of loci (like expression of all gypsy elements in control vs treated). But I want to normalize my two datasets using All ribosomal RNAs across the two samples (I have small RNAs generated from ribosomal RNA in both of my samples). Any idea how to do that? I am using these commands:

> countTable = read.table("ABC.txt")
> condition = factor(c("un", "un", "un", "un", "t", "t"))
> library ("DESeq")
>cds = newCountDataSet(CountTable, condition )
>estimateSizeFactors(cds[which(cds$feature=="rRNA")]) >sizeFactors( cds ) >head(counts( cds, normalized=TRUE)) But when I use: sizeFactors( cds ) command, I get this error: un1 un2 un3 un4 t1 t2 NA NA NA NA NA NA  Please help me out! ADD COMMENTlink modified 2.5 years ago • written 2.5 years ago by mosharrofgeb0 First of all, if you want to use DESeq2, you should be using DESeq2, not DESeq. They are two different packages. ADD REPLYlink written 2.5 years ago by Ryan C. Thompson7.3k Well, I am actually using DEseq. Not DEseq2. Do you think I should use DEseq2 instead of Deseq? Thanks. ADD REPLYlink written 2.5 years ago by mosharrofgeb0 If you're just starting and haven't committed to one or the other, DESeq2 is definitely the right choice. The only reason to use DESeq instead would be if you had previously performed a substantial amount of analysis using DESeq and wanted to be able to compare your new results to that analysis. ADD REPLYlink written 2.5 years ago by Ryan C. Thompson7.3k I have just started. So, I think I can use DEseq2 then. But still, I need solution for my problem! ADD REPLYlink written 2.5 years ago by mosharrofgeb0 In order to get a solution, you'll need to post code for your new analysis using DESeq2. Scanning your above code I'd guess the problem is that you didn't put a comma after the which() command. ADD REPLYlink written 2.5 years ago by Michael Love23k I am not an expert on R. I just need to get my things done! Probably I am not using the commands in the right way......I used these commands below (I have 4 untreated (un) replicates, 2 treated (t) replicates), sorry for all these silly questions: > countdata <- read.table("Dcr2.txt", header=TRUE, row.names=1) > countdata <- as.matrix(countdata) > head(countdata) un1 un2 un3 un4 t1 t2 AmnL2-1 230 216 231 210 8 7 AmnL2-2 1271 1160 1250 1193 19 13 AmnL2-3 3413 3336 3349 3136 20 20 AmnL2-4 255 234 251 213 26 25 AmnSINE1 476 438 480 451 35 30 Chap1a_Mam 3477 3456 3382 3256 94 57 > (condition <- factor(c(rep("un", 4), rep("t", 2)))) [1] un un un un t t Levels: t un > library(DESeq2) > (coldata <- data.frame(row.names=colnames(countdata), condition)) condition un1 un un2 un un3 un un4 un t1 t t2 t > dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition) > estimateSizeFactors(cds[which(cds$feature=="rRNA")])
Error in estimateSizeFactors(cds[which(cds\$feature == "rRNA")]) :
error in evaluating the argument 'object' in selecting a method for function 'estimateSizeFactors': Error: object 'cds' not found

Probably I am not using the commands in right order....

Thanks!!

The problem is what goes in the [ ] brackets. You have a 2 dimensional thing so you need to put a comma in between the [ and ].

dim(dds)

It has rows and columns, so it is 2 dimensional. See "Selecting Observations" here:
http://www.statmethods.net/management/subset.html

Do you want to normalize the whole dds using just a subset of genes, and then continue to do testing on all the genes? You could tell estimateSizeFactors() which genes to normalize with using the 'controlGenes' argument. See the help page for estimateSizeFactors.

Well, here is what I want:

My control library has 400 million reads whereas my experimental library has ~5 million reads. So, the normal R studio usage is: normalize both libraries by their relative depth (what I have understood) and use that factor for normalizing different loci (their expression). But, as my experimental library is very small,  ribosomal RNAs (which are pretty abundant in my samples) in my experimental library is very abundant compared to my control library. So, what I want is: normalize both libraries (control and experimental) first by the ribosomal RNAs (total 72 loci) followed by the relative depths of both libraries.

Here is what I have done lastly:

> countdata <- read.table("ABC.txt", header=TRUE, row.names=1)
> countdata <- as.matrix(countdata)
un1   un2   un3   un4   t1   t2
rrna1 51662 49671 49477 48694 1236 1405
rrna2 50625 49557 49793 48160 1215 1382
rrna3 50923 49879 49620 48455 1261 1340
rrna4 50955 49571 49436 48164 1287 1345
rrna5 51110 49541 49780 48422 1298 1420
rrna6 51054 49385 49431 48100 1274 1405
> (condition <- factor(c(rep("un", 4), rep("t", 2))))
[1] un un un un t  t
Levels: t un
> library(DESeq2)
> (coldata <- data.frame(row.names=colnames(countdata), condition))
condition
un1        un
un2        un
un3        un
un4        un
t1          t
t2          t
> dds <- DESeqDataSetFromMatrix(countData=countdata, colData=coldata, design=~condition)
> dds <- estimateSizeFactors(dds)
> sizeFactors(dds)
un1        un2        un3        un4         t1
3.38146929 3.30537121 3.29616968 3.21102454 0.09315123
t2
0.09538495
> dds <- estimateSizeFactors(dds, controlGenes=1:72)
Error in .local(object, ...) : unused argument (controlGenes = 1:72)
> ​

Could you please enlighten me with the commands that I should use? My file is ABC.txt and here are the first few lines from the file:

un1        un2        un3        un4        t1         t2
rrna1    51662    49671    49477    48694    1236    1405
rrna2    50625    49557    49793    48160    1215    1382
rrna3    50923    49879    49620    48455    1261    1340
rrna4    50955    49571    49436    48164    1287    1345
rrna5    51110    49541    49780    48422    1298    1420
rrna6    51054    49385    49431    48100    1274    1405
rrna7    51303    49709    49757    48208    1210    1422
rrna8    50813    49533    49661    48135    1261    1396
rrna9    54794    53071    53108    51775    1429    1538

.....

...........

Charlie25    5056    4964    5026    4771    129    121
Charlie26    5609    5474    5551    5329    148    141
Charlie27    7953    7881    7982    7639    193    196
Charlie28    7159    7377    7153    6913    176    197

Thank you very much!!

I moved this to a Comment because it is not an Answer.

I don't understand exactly what you want to do here. You can't normalize using one set of genes and then another. That's not how our software works. You either use all the genes (default) or you choose a set of genes to normalize by and then that's it. You can take a look at the DESeq2 paper to read up on the Methods.

Also very problematic for DESeq2 is that the library sizes are vastly different (much more than x10) while being confounded with the condition. This would be problematic for nearly any software testing for differences in mean across condition.

Well, thank you very much for your patience and time. Initially I used normal DESeq pipeline. But I noticed that there were enrichments for ribosomal RNA expression in my experimental library. The reason is: probably as the rRNA are pretty abundant in my libraries (both my control and exp), as the exp library is less depth, it captured more rRNA than my control library. So, first I need to adjust this bias. Then I can proceed to normal normalization by means of the entire library depth (normal DESeq pipeline).

I am sorry for all these confusions and thanks for bearing with me!!