We illustrate fishash on the Schraivogel et al 2020 CRISPRi dataset which is also bundled with the Python crispat package.
library(magrittr)
library(fishash)
library(ggplot2)
library(dplyr)
library(SummarizedExperiment)
library(Matrix)Loading the data and printing out the first few entries, we can see it is a sparse count matrix:
data(crispat_schraivogel)
crispat_schraivogel[1:10,1:10]
#> 10 x 10 sparse Matrix of class "dgCMatrix"
#> [[ suppressing 10 column names 'TAP1.AAACCTGAGCCTATGT', 'TAP1.AAACCTGAGGCGTACA', 'TAP1.AAACCTGAGGGCTTGA' ... ]]
#>
#> CROPseq_dCas9_DS_CCNE2_+_95907328.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_+_95907382.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_+_95907406.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_-_95907017.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_+_97657557.23-P1P2 . 26 . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_+_97657573.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_-_97657567.23-P1P2 . . . . . . . . 1 .
#> CROPseq_dCas9_DS_CPQ_-_97657591.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_DSCC1_+_120867694.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_DSCC1_+_120868042.23-P1P2 . . . . . . . . . .We pass this matrix to the fishash() function to call
presence/absence at 5% FDR:
res_fishash <- fishash(crispat_schraivogel, padj_cutoff = .05)
res_fishash
#> class: SummarizedExperiment
#> dim: 86 21977
#> metadata(2): log_pval_cutoff num_iter
#> assays(4): assigned log_pval odds_ratio odds_ratio_regularized
#> rownames(86): CROPseq_dCas9_DS_CCNE2_+_95907328.23-P1P2
#> CROPseq_dCas9_DS_CCNE2_+_95907382.23-P1P2 ...
#> CROPseq_dCas9_DS_non-targeting_00028 CROPseq_dCas9_DS_non-targeting_00029
#> rowData names(0):
#> colnames(21977): TAP1.AAACCTGAGCCTATGT TAP1.AAACCTGAGGCGTACA ...
#> TAP2.TTTGTCATCGCAGGCT TAP2.TTTGTCATCTACCTGC
#> colData names(2): demux_type assignmentThe return value is a SummarizedExperiment. The assigned
assay is a sparse boolean matrix indicating whether we call the guide
present:
assay(res_fishash, 'assigned')[1:10, 1:10]
#> 10 x 10 sparse Matrix of class "lgCMatrix"
#> [[ suppressing 10 column names 'TAP1.AAACCTGAGCCTATGT', 'TAP1.AAACCTGAGGCGTACA', 'TAP1.AAACCTGAGGGCTTGA' ... ]]
#>
#> CROPseq_dCas9_DS_CCNE2_+_95907328.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_+_95907382.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_+_95907406.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_-_95907017.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_+_97657557.23-P1P2 . | . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_+_97657573.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_-_97657567.23-P1P2 . . . . . . . . : .
#> CROPseq_dCas9_DS_CPQ_-_97657591.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_DSCC1_+_120867694.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_DSCC1_+_120868042.23-P1P2 . . . . . . . . . .The colData shows the classification of each cell; the
demux_type column indicates whether the cell contained 0,
1, or 2+ assigned guides, while the assignment column is a
comma-delimited string containing the guides called:
colData(res_fishash)
#> DataFrame with 21977 rows and 2 columns
#> demux_type assignment
#> <character> <character>
#> TAP1.AAACCTGAGCCTATGT doublet CROPseq_dCas9_DS_non..
#> TAP1.AAACCTGAGGCGTACA singlet CROPseq_dCas9_DS_CPQ..
#> TAP1.AAACCTGAGGGCTTGA singlet CROPseq_dCas9_DS_MYC-D
#> TAP1.AAACCTGAGGGTGTTG singlet CROPseq_dCas9_DS_PHF..
#> TAP1.AAACCTGAGGTGCTAG singlet CROPseq_dCas9_DS_LRR..
#> ... ... ...
#> TAP2.TTTGTCAGTTGTACAC singlet CROPseq_dCas9_DS_non..
#> TAP2.TTTGTCATCAAGATCC singlet CROPseq_dCas9_DS_non..
#> TAP2.TTTGTCATCCGTCAAA singlet CROPseq_dCas9_DS_OXR..
#> TAP2.TTTGTCATCGCAGGCT singlet CROPseq_dCas9_DS_non..
#> TAP2.TTTGTCATCTACCTGC doublet CROPseq_dCas9_DS_LRR..We can also inspect the log p-values from the Fisher test. Since we use a one-sided test, this matrix is also sparse:
assay(res_fishash, 'log_pval')[1:10, 1:10]
#> 10 x 10 sparse Matrix of class "dgCMatrix"
#> [[ suppressing 10 column names 'TAP1.AAACCTGAGCCTATGT', 'TAP1.AAACCTGAGGCGTACA', 'TAP1.AAACCTGAGGGCTTGA' ... ]]
#>
#> CROPseq_dCas9_DS_CCNE2_+_95907328.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_+_95907382.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_+_95907406.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CCNE2_-_95907017.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_+_97657557.23-P1P2 . -118.8247 . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_+_97657573.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_CPQ_-_97657567.23-P1P2 . . . . . . . . -0.7195586 .
#> CROPseq_dCas9_DS_CPQ_-_97657591.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_DSCC1_+_120867694.23-P1P2 . . . . . . . . . .
#> CROPseq_dCas9_DS_DSCC1_+_120868042.23-P1P2 . . . . . . . . . .Finally, the metadata shows the p-value cutoff selected by the FDR procedure:
Below, we plot the nonzero log p-values together with the FDR cutoff, coloring by the number of counts in the entry:
crispat_schraivogel %>%
as('TsparseMatrix') %>%
{data.frame(
count=.@x,
row_idx=.@i+1,
col_idx=.@j+1
)} %>%
mutate(log_pval=assay(res_fishash, 'log_pval')[cbind(row_idx, col_idx)]) %>%
mutate(log_pval_trunc=pmax(log_pval, -20)) %>%
mutate(count=cut(count, c(0, 1, 2, 3, 5, 10, 30, 50, 100, Inf))) %>%
ggplot(aes(x=-log_pval, fill=count)) +
geom_histogram(bins=100) +
geom_vline(xintercept=-metadata(res_fishash)$log_pval_cutoff, lty='dashed') +
scale_x_continuous(trans='log1p', breaks=c(0,1,3,10,30,100,300,1000)) +
scale_fill_viridis_d() +
theme_classic(base_size=16) +
theme(legend.position='bottom')