Question

Swish log2FC NA output

0

Entering edit mode

jvandinter • 0

@05c09cf1

Last seen 3.1 years ago

Utrecht

Hi all,

I am interested in differential transcript expression in two conditions (fusion-positive vs fusion-negative) from two different datasets. I quantified using Salmon and imported using tximeta, following the swish vignette. I run swish using tumor subtype as variable of interest and the two batches as covariate (see code).


se <- tximeta(metadata, useHub = FALSE)
se <- scaleInfReps(se)

y <- se
y <- y[,y$subtype %in% c("PAX-FOXO","None")]
y$subtype <- factor(y$subtype, levels=c("None","PAX-FOXO"))
y$source <- factor(y$source, levels = c("inhouse","stjude"))
y <- labelKeep(y)
y <- y[mcols(y)$keep,]
y <- swish(y, x = "subtype", cov = "source")

However, when I check the output, the column for Log2FC is empty.

> head(mcols(y))
DataFrame with 6 rows and 10 columns
                    tx_id         gene_id         tx_name log10mean      keep      stat    log2FC    pvalue    locfdr    qvalue
                <integer> <CharacterList>     <character> <numeric> <logical> <numeric> <numeric> <numeric> <numeric> <numeric>
ENST00000456328         1 ENSG00000223972 ENST00000456328  1.073122      TRUE  0.718018        NA 0.3084146  1.000000  0.644582
ENST00000488147     10811 ENSG00000227232 ENST00000488147  2.382419      TRUE -1.596396        NA 0.0259436  0.415494  0.190842
MSTRG.12.28             3        MSTRG.12     MSTRG.12.28  2.330690      TRUE  1.676577        NA 0.0194736  0.364477  0.162842
ENST00000417324     10813 ENSG00000237613 ENST00000417324  0.643785      TRUE  1.072973        NA 0.1309100  0.840882  0.437874
ENST00000466430     10815 ENSG00000238009 ENST00000466430  1.480677      TRUE  0.422523        NA 0.5464961  1.000000  0.813149
ENST00000495576     10816 ENSG00000239945 ENST00000495576  0.876378      TRUE -0.243243        NA 0.7275535  1.000000  0.902662

I would like to include differences in Log2FC to look for biological differences, so I was wondering why it's not in the output.

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] circlize_0.4.15             ComplexHeatmap_2.12.0       reshape2_1.4.4              GenomicFeatures_1.48.3     
 [5] AnnotationDbi_1.58.0        PCAtools_2.8.0              ggrepel_0.9.1               ggplot2_3.3.6              
 [9] RColorBrewer_1.1-3          uwot_0.1.11                 Matrix_1.4-1                tximeta_1.14.1             
[13] tximport_1.24.0             rtracklayer_1.56.1          DESeq2_1.36.0               SummarizedExperiment_1.26.1
[17] Biobase_2.56.0              MatrixGenerics_1.8.1        matrixStats_0.62.0          GenomicRanges_1.48.0       
[21] GenomeInfoDb_1.32.2         IRanges_2.30.0              S4Vectors_0.34.0            BiocGenerics_0.42.0        
[25] magrittr_2.0.3              dplyr_1.0.9                 fishpond_2.2.0

Thank you in advance for the help!

swish fishpond DifferentialExpression • 2.1k views

ADD COMMENT • link 3.2 years ago jvandinter • 0

0

Entering edit mode

I also have a different cohort where I want to test DTE for. For some reason, the log2fold change is provided here.

> y <- se
> y$subtype <- ifelse(y$subtype %in% c("MYCN:ALK","MYCN"),"MYCN","other")
> y$subtype <- factor(y$subtype, levels=c("other","MYCN"))
> y$source <- factor(y$source, levels = c("inhouse","stjude"))
> y <- labelKeep(y)
> y <- y[mcols(y)$keep,]
> y <- swish(y, x = "subtype", cov = "source")
> head(mcols(y))
DataFrame with 6 rows and 10 columns
                    tx_id         gene_id         tx_name log10mean      keep
                <integer> <CharacterList>     <character> <numeric> <logical>
ENST00000456328         1 ENSG00000223972 ENST00000456328  0.971563      TRUE
MSTRG.11.5              3 ENSG00000223972      MSTRG.11.5  1.518454      TRUE
ENST00000488147     10901 ENSG00000227232 ENST00000488147  2.408130      TRUE
MSTRG.11.17             4 ENSG00000223972     MSTRG.11.17  1.933863      TRUE
MSTRG.11.24             5        MSTRG.11     MSTRG.11.24  1.968760      TRUE
ENST00000473358         6 ENSG00000243485 ENST00000473358  0.494373      TRUE
                     stat     log2FC    pvalue    locfdr    qvalue
                <numeric>  <numeric> <numeric> <numeric> <numeric>
ENST00000456328  0.509689  0.7906810  0.511639  1.000000  0.769889
MSTRG.11.5       0.261014  1.3074623  0.735965  1.000000  0.893553
ENST00000488147  0.507950 -0.0371601  0.513071  1.000000  0.770696
MSTRG.11.17      0.888957  1.3429038  0.256604  1.000000  0.572066
MSTRG.11.24      0.994961  0.9122093  0.205494  0.953114  0.516503
ENST00000473358  0.940655  0.1373825  0.230634  0.998923  0.544839

Maybe it has to do with too little samples per group? Just a guess.

ADD REPLY • link 3.2 years ago jvandinter • 0

score 0 · Answer 1 · 2022-08-02

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 4 days ago

United States

Can you try:

table(y$source)

ADD COMMENT • link 3.2 years ago Michael Love 43k

0

Entering edit mode

For the cohort that gives NA Log2FC:

> table(y$source)

inhouse stjude
     7     36
> table(y$subtype)

None  ALK
  24   19

For the cohort with Log2FC output:

> table(y$subtype)

other  MYCN
  178    11

> table(y$source)

inhouse stjude
    84    105

For the second cohort: these are slightly imbalanced groups subtype-wise, might tweak them still a bit as this group 'other' includes samples with known mutations.

ADD REPLY • link 3.2 years ago jvandinter • 0

0

Entering edit mode

Oh can you also do cross-tabs for the NA one:

table(y$source, y$subtype)

ADD REPLY • link 3.2 years ago Michael Love 43k

0

Entering edit mode

Does it have to do with the 0 samples in the inhouse:None group? If so, I think I understand why it returned NA.

>table(y$source, y$subtype)
         None ALK
  inhouse  0   7
  stjude   24  12

ADD REPLY • link 3.2 years ago jvandinter • 0

0

Entering edit mode

Yes, and this highlights that I need to do more checking inside the function to detect this and error out (as the user is expecting results from each strata).

As one strata has no samples for differential analysis, you are just getting results for the strata with a defined test statistic.

ADD REPLY • link 3.2 years ago Michael Love 43k

0

Entering edit mode

This has been added to the devel branch if you want to try it out (that the error works)

devtools::install_github("mikelove/fishpond")

ADD REPLY • link 3.2 years ago Michael Love 43k

0

Entering edit mode

It does seem to work! Thank you for the quick reply and great help, as always

> table(y$source,y$subtype)

         None ALK
  inhouse    0   7
  stjude   24  12
> y <- labelKeep(y)
> y <- y[mcols(y)$keep,]
> y <- swish(y, x = "subtype", cov = "source")
Error in swish(y, x = "subtype", cov = "source") :
  some strata do not have complete data for computing LFC

ADD REPLY • link 3.2 years ago jvandinter • 0