MarioniLab/RegressionBASiCS2017: First release for the correction of the paper

2020-01-12T16:56:55Z (GMT) by Nils Eling, Catalina Vallejos

While revising the code used to analyze the data described in Eling et al. (2018), Cell Systems, for inclusion in a Bioconductor workflow associated with the BASiCS package, we discovered a problem with the analyses of datasets generated using the Fluidigm C1 system (Antolović et al., 2017; Lönnberg et al., 2017; Martinez-Jimenez et al., 2017). Specifically, we observed that the number of spike-in molecules present in the cell lysis volume had been miscomputed in our original analyses, such that all true spike-in numbers were inadvertently scaled by the same constant factor. Where available, these quantities are used as an input for BASiCS, and therefore some of the outputs originally reported in Eling et al. were incorrect.

In principle, the original error in calculating the exact number of spike-in molecules per reaction only scales the arbitrary units in which gene expression is measured and thus should not alter any interpretation or downstream analysis. However, when we re-analyzed these data to correct Eling et al., we noted that the mis-scaling caused a poor initialization of the MCMC sampler, which in turn produced less stable estimates of the mean and dispersion for lowly expressed genes. In essence, more iterations were needed to achieve good convergence. Consequently, using the correct input spike-in numbers leads to changes in the set of differentially expressed and variable genes identified.
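To illustrate why a miscomputed input of this kind acts as a single constant factor, the following minimal sketch converts spike-in stock concentrations into expected molecule counts in the lysis volume. It is not the code from this repository: the function name, the concentrations, the dilution factors and the 9 nl volume are all hypothetical values chosen for illustration, and the "wrong" dilution is an assumed example, not the actual error made in the original analyses.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def spike_in_molecules(conc_attomol_per_ul, dilution_factor, volume_nl):
    """Expected number of molecules of one spike-in species in the lysis volume.

    All arguments are illustrative assumptions, not values from the paper.
    """
    mol_per_ul = conc_attomol_per_ul * 1e-18  # attomoles -> moles
    volume_ul = volume_nl * 1e-3              # nanolitres -> microlitres
    return mol_per_ul * AVOGADRO / dilution_factor * volume_ul


# Hypothetical stock concentrations (attomoles per microlitre) for three spike-ins.
concentrations = [30000.0, 937.5, 58.6]

# Same species, computed with an assumed correct and an assumed wrong dilution.
correct = [spike_in_molecules(c, 50_000, 9.0) for c in concentrations]
wrong = [spike_in_molecules(c, 5_000, 9.0) for c in concentrations]

# Every species is off by the same constant factor ...
scale_factors = [w / c for w, c in zip(wrong, correct)]

# ... so the relative abundances of the spike-ins are unchanged.
correct_fractions = [x / sum(correct) for x in correct]
wrong_fractions = [x / sum(wrong) for x in wrong]

print("scale factors:", scale_factors)
print("relative abundances (correct):", correct_fractions)
print("relative abundances (wrong):  ", wrong_fractions)
```

In this toy setting the error changes only the overall unit of the inputs, which matches the "in principle" argument above. The sketch does not capture the effect on MCMC initialization and convergence.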