Normalization in edgeR
@myprogramming2016-9741
Last seen 7.5 years ago

Hi,

I am not quite clear about how normalization works in edgeR. I do not see any change in the actual read counts.

They are the same numbers, except that the genes are filtered for >10 reads, the library sizes are reduced, and every norm.factor is 1.

Could you please comment on this?

Code:

x<-read.delim("counts.txt",header=T,sep="\t")

y<-DGEList(counts=x[,2:151])
y <- calcNormFactors(y)
str(y)
keep<-rowSums(cpm(y)>10)>=3  
y<-y[keep,,keep.lib.sizes=FALSE]
str(y)
head(y$counts)
y$counts
# recalculate the library size
y<-DGEList(counts=y)
y$samples  

edger rnaseq bioconductor • 2.0k views
@gordon-smyth
Last seen 47 minutes ago
WEHI, Melbourne, Australia

Well, you reset all the normalization factors back to 1 when you ran DGEList() again at the end. Why did you do that? The library sizes had been recalculated already.

Have you read the section on normalization in the edgeR User's Guide? Amongst other things, that tells you why the read counts themselves are never changed.
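To make the point about unchanged counts concrete, here is a minimal sketch. It assumes simulated counts in place of counts.txt (the matrix and the names `raw`, `eff.lib` and `cpm.manual` are made up for illustration). It shows that calcNormFactors() only fills in y$samples$norm.factors, and that the factors take effect through the effective library size, lib.size * norm.factors, which cpm() uses.

```r
library(edgeR)

# Simulated counts standing in for counts.txt (hypothetical data)
set.seed(1)
counts <- matrix(rnbinom(1000 * 6, mu = 50, size = 2), ncol = 6,
                 dimnames = list(paste0("gene", 1:1000), paste0("sample", 1:6)))

y <- DGEList(counts = counts)
raw <- y$counts                      # copy of the counts before normalization

y <- calcNormFactors(y)              # TMM is the default method

# The count matrix is untouched; only y$samples$norm.factors changes
identical(raw, y$counts)

# Normalization enters through the effective library sizes
eff.lib <- y$samples$lib.size * y$samples$norm.factors

# cpm() on a DGEList divides by the effective library sizes
cpm.manual <- t(t(y$counts) / eff.lib) * 1e6
all.equal(unname(cpm(y)), unname(cpm.manual))
```

So the place to look for the effect of normalization is y$samples and anything derived from it, such as cpm(y), not y$counts.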

Minor points:

I think it is usually better to run calcNormFactors() after filtering rather than before.

There is no need for str(y). Just type show(y) or just y by itself. It's far more informative.
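Putting these points together, the workflow might look like the sketch below. It uses simulated counts as a stand-in for x[,2:151] from the question (the simulated matrix and `mu` are assumptions for illustration). It filters first, normalizes second, and does not call DGEList() a second time.

```r
library(edgeR)

# Simulated counts with gene-specific means (hypothetical stand-in for x[,2:151])
set.seed(1)
mu <- rexp(2000, rate = 1 / 20)
counts <- matrix(rnbinom(2000 * 6, mu = mu, size = 2), ncol = 6,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))

y <- DGEList(counts = counts)

# Filter first: keep genes with CPM > 10 in at least 3 libraries
keep <- rowSums(cpm(y) > 10) >= 3
y <- y[keep, , keep.lib.sizes = FALSE]   # library sizes are recomputed here

# Then compute the normalization factors on the filtered data
y <- calcNormFactors(y)

# No second DGEList() call: that would reset norm.factors to 1
y$samples
y
```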

Thanks for your help. I have rewritten the code.

I would like to subset the read counts without converting them into CPM, that is, filter on the raw read counts. I am planning to use the median method of normalization rather than TMM.

I want to keep genes that have >10 counts in at least 3 libraries.

Could you please suggest some code?

Thanks
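One possible sketch of what is being asked for, using simulated counts in place of the real data (the matrix and `mu` are made up for illustration). The filter is applied directly to y$counts rather than to cpm(y). For the normalization step, the sketch assumes that "median method" can be read as the "upperquartile" method of calcNormFactors() with p = 0.5. That reading is an assumption, not something confirmed in this thread.

```r
library(edgeR)

# Simulated counts with gene-specific means (hypothetical data)
set.seed(1)
mu <- rexp(2000, rate = 1 / 50)
counts <- matrix(rnbinom(2000 * 6, mu = mu, size = 2), ncol = 6,
                 dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))

y <- DGEList(counts = counts)

# Filter on raw counts: more than 10 reads in at least 3 libraries
keep <- rowSums(y$counts > 10) >= 3
y <- y[keep, , keep.lib.sizes = FALSE]

# Assumed reading of "median" normalization: quantile method with p = 0.5
y <- calcNormFactors(y, method = "upperquartile", p = 0.5)
y$samples
```

Note that a raw-count threshold does not account for differences in library size, whereas a CPM threshold does.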

 

 
