Principal component analysis produced by DiffBind, but PC1 with smaller variation than PC2
Entering edit mode
Gary ▴ 20
Last seen 3.2 years ago


I am not sure whether this question here is proper or not. If it is not, I will remove this post.

I have a principal component analysis (PCA) figure below produced by DiffBind. After discussing with my colleagues, we wonder why the variation of Principal Component #1 is smaller than Principal Component #2. Please note that the horizontal axis (PC1) is from -0.365 to -0.345 (range is only 0.019), and the vertical axis (PC2) is from 0.4 to -0.4 (range is 0.8). Is it normal? Below are command lines I adopted. Any command lines I used were not correct? Many thanks.

Example of MCAS2 commands I used

macs2 callpeak -t MSailH3K27ac_C153.bam -c MSailInput_C160_C161.bam --format BAM --gsize 1000000000 --keep-dup auto --outdir MSailH3K27ac_C153 --name MSailH3K27ac_C153 --verbose 3 --tsize 75 --nomodel --extsize 200 --qvalue 0.00001 --slocal 1000 --llocal 10000 --cutoff-analysis

DiffBind commands I used

> h3k27ac <- dba(sampleSheet = "h3k27ac.csv")

MSailH3K27ac_C153 Sail H3K27ac Male MaleSail 1 macs2

MSailH3K27ac_C156 Sail H3K27ac Male MaleSail 2 macs2

MFlightH3K27ac_C159 Flight H3K27ac Male MaleFlight 1 macs2

MFlightH3K27ac_C165 Flight H3K27ac Male MaleFlight 2 macs2

FSailH3K27ac_C168 Sail H3K27ac Female FemaleSail 1 macs2

FSailH3K27ac_C171 Sail H3K27ac Female FemaleSail 2 macs2

FFlightH3K27ac_C177 Flight H3K27ac Female FemaleFlight 1 macs2

FFlightH3K27ac_C180 Flight H3K27ac Female FemaleFlight 2 macs2

> h3k27ac

8 Samples, 13047 sites in matrix (19937 total):

                   ID Tissue  Factor Condition    Treatment Replicate Caller Intervals

1   MSailH3K27ac_C153   Sail H3K27ac      Male     MaleSail         1  macs2     10373

2   MSailH3K27ac_C156   Sail H3K27ac      Male     MaleSail         2  macs2      9158

3 MFlightH3K27ac_C159 Flight H3K27ac      Male   MaleFlight         1  macs2      9622

4 MFlightH3K27ac_C165 Flight H3K27ac      Male   MaleFlight         2  macs2      9059

5   FSailH3K27ac_C168   Sail H3K27ac    Female   FemaleSail         1  macs2     10853

6   FSailH3K27ac_C171   Sail H3K27ac    Female   FemaleSail         2  macs2     15717

7 FFlightH3K27ac_C177 Flight H3K27ac    Female FemaleFlight         1  macs2     12572

8 FFlightH3K27ac_C180 Flight H3K27ac    Female FemaleFlight         2  macs2     11963

> h3k27ac <- dba.count(h3k27ac, minOverlap = 1, score = DBA_SCORE_TMM_MINUS_FULL, fragmentSize = 0, bRemoveDuplicates = TRUE, bScaleControl = TRUE, mapQCth = 10, bCorPlot = TRUE, bUseSummarizeOverlaps = TRUE, bParallel = TRUE)

> savefile <-, file = "h3k27ac_counts")

> h3k27ac <- dba.load(file = "h3k27ac_counts")

> h3k27ac

8 Samples, 19937 sites in matrix:

                   ID Feather.type  Factor    Sex    Treatment Replicate Caller Intervals FRiP

1   MSailH3K27ac_C153         Sail H3K27ac   Male     MaleSail         1 counts     19937 0.09

2   MSailH3K27ac_C156         Sail H3K27ac   Male     MaleSail         2 counts     19937 0.08

3 MFlightH3K27ac_C159       Flight H3K27ac   Male   MaleFlight         1 counts     19937 0.09

4 MFlightH3K27ac_C165       Flight H3K27ac   Male   MaleFlight         2 counts     19937 0.08

5   FSailH3K27ac_C168         Sail H3K27ac Female   FemaleSail         1 counts     19937 0.09

6   FSailH3K27ac_C171         Sail H3K27ac Female   FemaleSail         2 counts     19937 0.11

7 FFlightH3K27ac_C177       Flight H3K27ac Female FemaleFlight         1 counts     19937 0.10

8 FFlightH3K27ac_C180       Flight H3K27ac Female FemaleFlight         2 counts     19937 0.10

> h3k27ac$config$condition <- "Sex"

> h3k27ac$config$tissue <- "Feather type"

> dba.plotPCA(h3k27ac, attributes = DBA_CONDITION, dotSize = 3, cor = TRUE, vColors = c("deepskyblue", "deeppink"))

diffbind PCA • 892 views
Entering edit mode
Rory Stark ★ 4.4k
Last seen 3 days ago
CRUK, Cambridge, UK

The axes are the loading values, as returned by prcomp() in the rotation field. They are not actually rotated as I understand it; see the man page for prcomp().

I wouldn't read too much into the specific values used on these axes. The first component accounts for more of the variance than the second despite the narrower value range. Perhaps someone with a better understanding of the scale of the eigenvectors can explain?



Entering edit mode

Thanks a lot. May I know the prcomp() parameters DiffBind adopted? More importantly, may I set prcomp() parameters in DiffBind by myself, e.g. scale. = TRUE? I really appreciate your help.

Entering edit mode

By default (bLog=TRUE), the read scores are first converted using log(). Then prcomp() is called with the log2 read score matrix. Only the data matrix is passed, no other parameters are set. Setting bLog=FALSE will pass in the read scores without calling log2() first.

There is currently no way to change how DiffBind calls prcomp(). You can however retrieve the read scores (using dba.peakset() with bRetrieve=TRUE for the global binding matrix, or with bCounts=TRUE for a specific contrast) and do your own PCA.




Login before adding your answer.

Traffic: 342 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6