CQN normalization
@sermsawat-tunlaya-anukit-4848
I would like to normalize Populus RNA-seq data with the cqn package. I followed the code shown in the manual, but after running it I got negative RPKM values (see details below). What do negative RPKM values mean? Or did I do something wrong in my code? I tried to visualize the fit with cqnplot, and I think my data are not biased by GC content or length. I have attached the cqnplot output for your information. Do you think I need to use your method to normalize the data?

Sincerely yours,
Sermsawat T.

#####normalize by CQN
> #find GC content from phytozome populus genome
> library(Rsamtools)
> seq <- scanFa("pop.fa")
> alph <- alphabetFrequency(seq, as.prob=TRUE)
> gc <- rowSums(alph[,c("G", "C")])
>
> library(cqn)
> library(scales)
> raw <- read.table("raw.txt", header=FALSE)
> cqn.raw <- cqn(raw, lengths = width(seq), x = gc, sizeFactors = colSums(raw), verbose = TRUE)
RQ fit ..................
SQN .
>
> #cqn plot
> par(mfrow=c(1,2))
> cqnplot(cqn.raw, n = 1, xlab = "GC content", lty = 1, ylim = c(1,7))
> cqnplot(cqn.raw, n = 2, xlab = "length", lty = 1, ylim = c(1,7))
>
> #normalized values
> RPKM.cqn <- cqn.raw$y + cqn.raw$offset
> head(RPKM.cqn)
             V1          V2         V3        V4         V5         V6
[1,] -2.3157759 -0.05011048 -1.5628979 -2.792141 -1.5042294 -2.1660771
[2,]  3.3638330  3.46092091  3.5447488  3.333357  3.1484934  3.4531245
[3,] -0.1924864  0.38536951  0.8976638 -1.134989 -0.1924864  0.4854206
[4,] -2.2808497 -0.50606822 -1.5613805 -2.802241 -1.4820398 -3.1508760
[5,] -2.2813593 -0.50602611 -1.5615417 -2.802341 -1.4822924 -3.1510596
[6,]  4.6875943  4.68565806  2.6239807  3.531895  3.5440903  1.9465045
            V7          V8          V9       V10        V11        V12
[1,] -3.417837 -0.05011048 -0.05011048 -1.203393 -0.5058032 -1.4579155
[2,]  2.900329  3.07780614  3.41975117  3.693540  3.3387816  3.3901923
[3,] -1.369314 -0.05011048  0.58067142  0.437804  1.0192374  0.8445695
[4,] -3.459816 -0.38013895 -0.05011048 -1.182797 -0.5064885 -1.4613012
[5,] -3.459801 -0.38013895 -0.05011048 -1.183303 -0.5066405  0.1602583
[6,]  2.282431  2.19639308  1.39946079  4.753675  4.8337840  2.7983964
           V13       V14        V15        V16        V17        V18
[1,] -4.303745 -1.844632 -3.1044275 -3.4759833 -0.1924864 -1.2015133
[2,]  3.116199  2.942757  3.2217711  2.9451114  2.8716179  3.4992808
[3,] -1.627192 -1.779755 -1.5114489 -0.8413003 -0.1924864  0.3091839
[4,] -3.289903 -2.870566 -3.1055409 -3.5218687 -0.3801390 -1.2399416
[5,] -4.290241 -2.870453 -2.1055097 -3.5218168 -0.3801390 -1.2400014
[6,]  3.685705  3.460004  0.8445695  2.4490306  2.6279081  0.5358322

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] scales_0.2.1          Rsamtools_1.8.5       Biostrings_2.24.1
 [4] GenomicRanges_1.8.7   IRanges_1.14.4        BiocGenerics_0.2.0
 [7] cqn_1.2.0             quantreg_4.81         SparseM_0.96
[10] preprocessCore_1.18.0 nor1mix_1.1-3         mclust_3.5

loaded via a namespace (and not attached):
[1] bitops_1.0-4.1     colorspace_1.1-1   dichromat_1.2-4    labeling_0.1
[5] munsell_0.3        plyr_1.7.1         RColorBrewer_1.0-5 stats4_2.15.1
[9] stringr_0.6        zlibbioc_1.2.0

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cqnplot.pdf
Type: application/pdf
Size: 18583 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20120717/0b43c6ee/attachment.pdf>
@kasper-daniel-hansen-2979
I have been away.

The RPKM values are on the log2 scale, hence the negative values. Your plot shows, in my opinion, a large GC bias effect.

Kasper

On Tuesday, July 17, 2012, Sermsawat Tunlaya-Anukit wrote:
> What is the meaning of minus in RPKM? Or did I do something wrong in my code?
> I tried to visualize with cqn plot, and I think my data are not biased by GC
> content or length. [...]
> Do you think I need to use your method to normalize the data?
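
To make the log2 point concrete, here is a minimal sketch in base R. It assumes only what the answer states, namely that `cqn.raw$y + cqn.raw$offset` is on the log2 scale. The variable names `log2_rpkm` and `rpkm` are illustrative, and the numbers are a few entries copied from the `head(RPKM.cqn)` output in the question.

```r
# A few values copied from head(RPKM.cqn) in the question
# (rows 1-3, columns V1-V3); these are log2(RPKM), not RPKM.
log2_rpkm <- matrix(
  c(-2.3157759, -0.05011048, -1.5628979,
     3.3638330,  3.46092091,  3.5447488,
    -0.1924864,  0.38536951,  0.8976638),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("V1", "V2", "V3"))
)

# Undo the log2 transform to get values on the ordinary RPKM scale.
rpkm <- 2^log2_rpkm

# Every back-transformed value is positive.
all(rpkm > 0)

# A negative log2 value corresponds exactly to an RPKM below 1.
all((log2_rpkm < 0) == (rpkm < 1))

print(round(rpkm, 3))
```

So a negative entry simply marks a gene whose normalized expression is below 1 RPKM in that sample; it does not by itself indicate a mistake in the code.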