Question

Normalization in edgeR

myprogramming2016 • 0

Hi,

I am not quite clear about the normalization in edgeR. I am not seeing any change in the actual read counts.

They are the same numbers, except that they are filtered for >10 reads, the library size is reduced, and every norm.factors value is 1.

Could you please comment on this?

Code:

x<-read.delim("counts.txt",header=T,sep="\t")

y<-DGEList(counts=x[,2:151])
y <- calcNormFactors(y)
str(y)
keep<-rowSums(cpm(y)>10)>=3
y<-y[keep,,keep.lib.sizes=FALSE]
str(y)
head(y$counts)
y$counts
# recalculate the library size
y<-DGEList(counts=y)
y$samples

edger rnaseq bioconductor • 1.8k views

ADD COMMENT • link updated 8.2 years ago by Gordon Smyth 50k • written 8.2 years ago by myprogramming2016 • 0

score 2 · Answer 1 · 2016-02-29

Gordon Smyth 50k

WEHI, Melbourne, Australia

Well, you reset all the normalization factors back to 1 when you ran DGEList() again at the end. Why did you do that? The library sizes had been recalculated already.

Have you read the section on normalization in the edgeR User's Guide? Amongst other things, that tells you why the read counts themselves are never changed.
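To make the two points above concrete, here is a minimal sketch on simulated data (the matrix, seed, and object names are assumptions for illustration, not the poster's data). It checks that calcNormFactors() leaves the count matrix alone, that the factors are stored in y$samples and act through the effective library size, and that calling DGEList() again on the counts starts over with factors of 1.

```r
library(edgeR)

# Simulated counts (hypothetical data): 200 genes x 4 samples
set.seed(1)
counts <- matrix(rnbinom(800, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:4)))

y <- DGEList(counts = counts)
y <- calcNormFactors(y)   # TMM is the default method

# Normalization does not touch the counts themselves
counts.unchanged <- isTRUE(all.equal(y$counts, counts, check.attributes = FALSE))

# The factors are stored per sample and rescale the library sizes
eff.lib <- y$samples$lib.size * y$samples$norm.factors

# cpm() on a DGEList divides by these effective library sizes
manual.cpm <- t(t(y$counts) / eff.lib) * 1e6
cpm.matches <- isTRUE(all.equal(unname(cpm(y)), unname(manual.cpm)))

# Building a fresh DGEList discards the factors: they start again at 1
y2 <- DGEList(counts = y$counts)
y2$samples$norm.factors
```

Comparing y$samples before and after the second DGEList() call is usually the quickest way to see where the factors went.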

Minor points:

I think it is usually better to run calcNormFactors() after filtering rather than before.

There is no need for str(y). Just type show(y) or just y by itself. It's far more informative.
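Put together, the suggestions in this answer might look like the sketch below. The simulated matrix stands in for counts.txt and is purely hypothetical; the CPM cutoff of 10 in at least 3 libraries is taken from the question.

```r
library(edgeR)

# Hypothetical stand-in for counts.txt: 500 genes x 6 samples,
# half of the genes very lowly expressed so that the filter removes some
set.seed(2)
mu <- rep(c(0.05, 50), each = 250)
counts <- matrix(rnbinom(500 * 6, mu = mu, size = 2), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))

y <- DGEList(counts = counts)

# Filter first; keep.lib.sizes = FALSE recomputes the library sizes
keep <- rowSums(cpm(y) > 10) >= 3
y <- y[keep, , keep.lib.sizes = FALSE]

# Then normalize, and do not call DGEList() again afterwards
y <- calcNormFactors(y)

# Printing the object (or y$samples) is more informative than str(y)
y$samples
```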

ADD COMMENT • link 8.2 years ago Gordon Smyth 50k

Thanks for your help. I have rewritten the code.

I would like to subset the read counts without converting them into CPM, that is, to subset on the raw read counts. I am planning to use the median method of normalization rather than TMM.

I want to keep genes that have >10 counts in at least 3 libraries.

Could you please suggest some code?

Thanks

ADD REPLY • link 8.2 years ago myprogramming2016 • 0
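A filter on raw counts, as asked for in the reply above, can be sketched by applying the same rowSums() idiom to y$counts instead of cpm(y). The data below are simulated and hypothetical. The thread does not say which "median method" is meant, so method = "RLE" (a median-of-ratios scheme that calcNormFactors() does provide) is used here only as a placeholder assumption.

```r
library(edgeR)

# Hypothetical data: 500 genes x 6 samples with mixed expression levels
set.seed(3)
mu <- rep(c(2, 15, 100), length.out = 500)
counts <- matrix(rnbinom(500 * 6, mu = mu, size = 2), nrow = 500,
                 dimnames = list(paste0("gene", 1:500), paste0("s", 1:6)))

y <- DGEList(counts = counts)

# Keep genes with more than 10 raw reads in at least 3 libraries
keep <- rowSums(y$counts > 10) >= 3
y <- y[keep, , keep.lib.sizes = FALSE]

# Normalize after filtering; "RLE" is an assumed stand-in for the
# unspecified "median method"
y <- calcNormFactors(y, method = "RLE")
y$samples
```

Note that a raw-count cutoff treats libraries of different sizes unequally, which is the usual reason for filtering on CPM instead.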