Entering edit mode
mosharrofgeb
▴
10
@mosharrofgeb-11643
Last seen 8.1 years ago
Hello: I am using DEseq2 for Small RNA-seq data. I have four replicates for control and two replicates for treated. I want to see differential expression of different kinds of loci (like expression of all gypsy elements in control vs treated). But I want to normalize my two datasets using All ribosomal RNAs across the two samples (I have small RNAs generated from ribosomal RNA in both of my samples). Any idea how to do that? I am using these commands:
> countTable = read.table("ABC.txt") > condition = factor(c("un", "un", "un", "un", "t", "t")) > library ("DESeq") >cds = newCountDataSet(CountTable, condition ) >estimateSizeFactors(cds[which(cds$feature=="rRNA")]) >sizeFactors( cds ) >head(counts( cds, normalized=TRUE))
But when I use: sizeFactors( cds ) command, I get this error:
un1 un2 un3 un4 t1 t2 NA NA NA NA NA NA
Please help me out!
First of all, if you want to use DESeq2, you should be using DESeq2, not DESeq. They are two different packages.
Well, I am actually using DEseq. Not DEseq2. Do you think I should use DEseq2 instead of Deseq? Thanks.
If you're just starting and haven't committed to one or the other, DESeq2 is definitely the right choice. The only reason to use DESeq instead would be if you had previously performed a substantial amount of analysis using DESeq and wanted to be able to compare your new results to that analysis.
I have just started. So, I think I can use DEseq2 then. But still, I need solution for my problem!
In order to get a solution, you'll need to post code for your new analysis using DESeq2. Scanning your above code I'd guess the problem is that you didn't put a comma after the which() command.
I am not an expert on R. I just need to get my things done! Probably I am not using the commands in the right way......I used these commands below (I have 4 untreated (un) replicates, 2 treated (t) replicates), sorry for all these silly questions:
Probably I am not using the commands in right order....
Thanks!!
The problem is what goes in the [ ] brackets. You have a 2 dimensional thing so you need to put a comma in between the [ and ].
It has rows and columns, so it is 2 dimensional. See "Selecting Observations" here:
http://www.statmethods.net/management/subset.html
Do you want to normalize the whole dds using just a subset of genes, and then continue to do testing on all the genes? You could tell estimateSizeFactors() which genes to normalize with using the 'controlGenes' argument. See the help page for estimateSizeFactors.
Well, here is what I want:
My control library has 400 million reads whereas my experimental library has ~5 million reads. So, the normal R studio usage is: normalize both libraries by their relative depth (what I have understood) and use that factor for normalizing different loci (their expression). But, as my experimental library is very small, ribosomal RNAs (which are pretty abundant in my samples) in my experimental library is very abundant compared to my control library. So, what I want is: normalize both libraries (control and experimental) first by the ribosomal RNAs (total 72 loci) followed by the relative depths of both libraries.
Here is what I have done lastly:
Could you please enlighten me with the commands that I should use? My file is ABC.txt and here are the first few lines from the file:
un1 un2 un3 un4 t1 t2
rrna1 51662 49671 49477 48694 1236 1405
rrna2 50625 49557 49793 48160 1215 1382
rrna3 50923 49879 49620 48455 1261 1340
rrna4 50955 49571 49436 48164 1287 1345
rrna5 51110 49541 49780 48422 1298 1420
rrna6 51054 49385 49431 48100 1274 1405
rrna7 51303 49709 49757 48208 1210 1422
rrna8 50813 49533 49661 48135 1261 1396
rrna9 54794 53071 53108 51775 1429 1538
.....
...........
Charlie25 5056 4964 5026 4771 129 121
Charlie26 5609 5474 5551 5329 148 141
Charlie27 7953 7881 7982 7639 193 196
Charlie28 7159 7377 7153 6913 176 197
Thank you very much!!
I moved this to a Comment because it is not an Answer.
I don't understand exactly what you want to do here. You can't normalize using one set of genes and then another. That's not how our software works. You either use all the genes (default) or you choose a set of genes to normalize by and then that's it. You can take a look at the DESeq2 paper to read up on the Methods.
Also very problematic for DESeq2 is that the library sizes are vastly different (much more than x10) while being confounded with the condition. This would be problematic for nearly any software testing for differences in mean across condition.
Well, thank you very much for your patience and time. Initially I used normal DESeq pipeline. But I noticed that there were enrichments for ribosomal RNA expression in my experimental library. The reason is: probably as the rRNA are pretty abundant in my libraries (both my control and exp), as the exp library is less depth, it captured more rRNA than my control library. So, first I need to adjust this bias. Then I can proceed to normal normalization by means of the entire library depth (normal DESeq pipeline).
I am sorry for all these confusions and thanks for bearing with me!!