API Documentation

class sodirac.utils.RBFWeight(alpha: float | None = None)

Bases: object

set_alpha(X: numpy.ndarray, n_max: int | None = None, dm: numpy.ndarray | None = None) -> None

Set the alpha parameter of a Gaussian RBF kernel as the median distance between points in an array of observations.

Parameters

X : np.ndarray

[N, P] matrix of observations and features.

n_max : int, optional

Maximum number of observations to use for median distance computation.

dm : np.ndarray, optional

[N, N] distance matrix for setting the RBF kernel parameter. Speeds computation if pre-computed.

Returns

None. Sets self.alpha.

References

A Kernel Two-Sample Test Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander Smola. JMLR, 13(Mar):723−773, 2012. http://jmlr.csail.mit.edu/papers/v13/gretton12a.html
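The median heuristic described above can be sketched in plain NumPy. This is a minimal illustration, not the sodirac implementation: the helper name median_pairwise_distance, the Euclidean metric, the random subsampling when n_max is exceeded, and the exclusion of the zero diagonal are all assumptions.

```python
import numpy as np

def median_pairwise_distance(X, n_max=None, seed=0):
    """Median Euclidean distance between distinct rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if n_max is not None and X.shape[0] > n_max:
        # Assumption: subsample rows so the [N, N] distance matrix stays small.
        rng = np.random.default_rng(seed)
        X = X[rng.choice(X.shape[0], size=n_max, replace=False)]
    # Pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    dm = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once and drop the zero diagonal.
    upper = dm[np.triu_indices_from(dm, k=1)]
    return float(np.median(upper))

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
alpha = median_pairwise_distance(X)  # pairwise distances are 5, 10 and 5
```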

sodirac.utils.adata_to_cluster_expression(adata, cluster_label, scale=True, add_density=True)

Convert an AnnData to a new AnnData with cluster expressions. Clusters are based on cluster_label in adata.obs. The returned AnnData has one observation per cluster, with the cluster-level expression equal to the average expression for that cluster. All annotations in adata.obs except cluster_label are discarded in the returned AnnData.

Args:

adata (AnnData): single cell data.

cluster_label (String): field in adata.obs used for aggregating values.

scale (bool): Optional. Whether to weight the input single cells by the number of cells in their cluster. Default is True.

add_density (bool): Optional. If True, the normalized number of cells in each cluster is added to the returned AnnData as obs.cluster_density. Default is True.

Returns:

AnnData: aggregated single cell data
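The aggregation described above amounts to a per-cluster average plus a normalized cell count. The sketch below works on a plain array and a label vector rather than an AnnData; the helper name cluster_mean_expression is hypothetical, and the effect of the scale option is not modelled because the source does not spell it out.

```python
import numpy as np

def cluster_mean_expression(X, labels):
    """Average rows of X within each cluster (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters, inverse, counts = np.unique(
        labels, return_inverse=True, return_counts=True
    )
    # Sum the rows belonging to each cluster, then divide by cluster size.
    sums = np.zeros((clusters.size, X.shape[1]))
    np.add.at(sums, inverse, X)
    means = sums / counts[:, None]
    # Normalized number of cells per cluster, analogous to obs.cluster_density.
    density = counts / counts.sum()
    return clusters, means, density

X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])
labels = np.array(["a", "a", "b"])
clusters, means, density = cluster_mean_expression(X, labels)
```

Here cluster "a" averages its two cells and cluster "b" keeps its single cell unchanged.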

sodirac.utils.append_categorical_to_data(X: numpy.ndarray | scipy.sparse.csr.csr_matrix, categorical: numpy.ndarray)

Convert categorical to a one-hot vector and append this vector to each sample in X.

Parameters

X : np.ndarray, sparse.csr.csr_matrix

[Cells, Features]

categorical : np.ndarray

[Cells,]

Returns

Xa : np.ndarray

[Cells, Features + N_Categories]

categories : np.ndarray

[N_Categories,] str category descriptors.
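For a dense input, the one-hot append can be sketched as follows. The helper name append_one_hot is hypothetical, the category order (sorted, as returned by np.unique) is an assumption, and sparse inputs are not handled here.

```python
import numpy as np

def append_one_hot(X, categorical):
    """Append a one-hot encoding of `categorical` to each row of dense X."""
    # `codes` maps every sample to the index of its category.
    categories, codes = np.unique(categorical, return_inverse=True)
    one_hot = np.eye(categories.size, dtype=X.dtype)[codes]
    return np.hstack([X, one_hot]), categories

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
categorical = np.array(["b", "a", "b"])
Xa, categories = append_one_hot(X, categorical)
# Xa has shape [Cells, Features + N_Categories] = (3, 4)
```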

sodirac.utils.argmax_pred_class(grouping: numpy.ndarray, prediction: numpy.ndarray)

Assign class to elements in groups based on the most common predicted class for that group.

Parameters

grouping : np.ndarray

[N,] partition values defining groups to be classified.

prediction : np.ndarray

[N,] predicted values for each element in grouping.

Returns

assigned_classes : np.ndarray

[N,] class labels based on the most common class assigned to elements in the group partition.

Examples

>>> grouping = np.array([0,0,0,1,1,1,2,2,2,2])
>>> prediction = np.array(['A','A','A','B','A','B','C','A','B','C'])
>>> argmax_pred_class(grouping, prediction)
np.array(['A','A','A','B','B','B','C','C','C','C'])

Notes

scNym classifiers do not incorporate neighborhood information. This simple heuristic leverages cluster information obtained by an orthogonal method and assigns all cells in a given cluster the majority class label within that cluster.
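The majority-vote heuristic can be reproduced in a few lines of NumPy. This is a sketch rather than the library's code: the helper name majority_class_per_group is hypothetical, and ties are broken in favour of the class that sorts first, which is an assumption.

```python
import numpy as np

def majority_class_per_group(grouping, prediction):
    """Give every element the most common predicted class of its group."""
    grouping = np.asarray(grouping)
    prediction = np.asarray(prediction)
    assigned = prediction.copy()
    for group in np.unique(grouping):
        mask = grouping == group
        classes, counts = np.unique(prediction[mask], return_counts=True)
        # argmax returns the first maximum, so ties go to the first sorted class.
        assigned[mask] = classes[np.argmax(counts)]
    return assigned

grouping = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
prediction = np.array(["A", "A", "A", "B", "A", "B", "C", "A", "B", "C"])
assigned = majority_class_per_group(grouping, prediction)
```

With the inputs from the example above, group 1 becomes all 'B' and group 2 becomes all 'C'.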

sodirac.utils.build_classification_matrix(X: numpy.ndarray | scipy.sparse.csr.csr_matrix, model_genes: numpy.ndarray, sample_genes: numpy.ndarray, gene_batch_size: int = 512) -> numpy.ndarray | scipy.sparse.csr.csr_matrix

Build a matrix for classification using only genes that overlap between the current sample and the pre-trained model.

Parameters

X : np.ndarray, sparse.csr_matrix

[Cells, Genes] count matrix.

model_genes : np.ndarray

Gene identifiers in the order expected by the model.

sample_genes : np.ndarray

Gene identifiers for the current sample.

gene_batch_size : int

Number of genes to copy between arrays per batch. Controls a speed vs. memory trade-off.

Returns

N : np.ndarray, sparse.csr_matrix

[Cells, len(model_genes)] count matrix. Values where a model gene was not present in the sample are left as zeros. type(N) will match type(X).
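The gene alignment can be illustrated for a dense matrix as below. The helper name align_to_model_genes is hypothetical; the sketch copies one gene at a time and ignores both gene_batch_size and sparse inputs, so it shows the result rather than the batching strategy.

```python
import numpy as np

def align_to_model_genes(X, model_genes, sample_genes):
    """Reorder the columns of X to follow model_genes; missing genes stay zero."""
    N = np.zeros((X.shape[0], len(model_genes)), dtype=X.dtype)
    # Column index of every gene present in the sample.
    sample_index = {str(gene): j for j, gene in enumerate(sample_genes)}
    for i, gene in enumerate(model_genes):
        j = sample_index.get(str(gene))
        if j is not None:
            N[:, i] = X[:, j]
    return N

X = np.array([[1, 2, 3], [4, 5, 6]])
sample_genes = np.array(["g1", "g2", "g3"])
model_genes = np.array(["g3", "g0", "g1"])
N = align_to_model_genes(X, model_genes, sample_genes)
```

The model gene "g0" is absent from the sample, so its column is left as zeros.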

sodirac.utils.compute_entropy_of_mixing(X: numpy.ndarray, y: numpy.ndarray, n_neighbors: int, n_iters: int | None = None, **kwargs) -> numpy.ndarray

Compute the entropy of mixing among groups given a feature matrix and group labels.

Parameters

X : np.ndarray

[N, P] feature matrix.

y : np.ndarray

[N,] group labels.

n_neighbors : int

Number of nearest neighbors to draw for each iteration of the entropy computation.

n_iters : int, optional

Number of iterations to perform. If n_iters is None, uses every point.

Returns

entropy_of_mixing : np.ndarray

[n_iters,] entropy values for each iteration.

Notes

The entropy of batch mixing is computed by sampling n_neighbors cells from a local neighborhood in the nearest neighbor graph and constructing a probability vector based on their group membership. The entropy of this probability vector is computed as a metric of intermixing between groups.

If groups are more mixed, the probability vector will have higher entropy, and vice-versa.
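The computation in these notes can be sketched as follows. Several choices here are assumptions rather than documented behavior: the helper name neighborhood_entropy, brute-force Euclidean neighbors, counting each point as its own nearest neighbor, the natural logarithm, and evaluating every point (the n_iters=None case).

```python
import numpy as np

def neighborhood_entropy(X, y, n_neighbors):
    """Entropy of group membership among each point's nearest neighbors."""
    X = np.asarray(X, dtype=float)
    groups, codes = np.unique(y, return_inverse=True)
    # Brute-force pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dm = np.sqrt((diff ** 2).sum(axis=-1))
    entropies = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        nn = np.argsort(dm[i], kind="stable")[:n_neighbors]
        # Probability vector over groups within the neighborhood.
        p = np.bincount(codes[nn], minlength=groups.size) / n_neighbors
        p = p[p > 0]  # treat 0 * log(0) as 0
        entropies[i] = -(p * np.log(p)).sum()
    return entropies

# Two well-separated groups: every neighborhood is pure.
X_sep = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
y_sep = np.array(["a", "a", "a", "b", "b", "b"])
entropy_sep = neighborhood_entropy(X_sep, y_sep, n_neighbors=3)

# Alternating groups: every neighborhood of size 2 holds one cell per group.
X_mix = np.array([[0.0], [1.0], [2.0], [3.0]])
y_mix = np.array(["a", "b", "a", "b"])
entropy_mix = neighborhood_entropy(X_mix, y_mix, n_neighbors=2)
```

The separated data give an entropy of zero at every point, while the alternating data give log(2) at every point, matching the statement that more mixed groups yield higher entropy.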

sodirac.utils.get_adata_asarray(adata: anndata.AnnData) -> numpy.ndarray | scipy.sparse.csr.csr_matrix

Get the gene expression matrix .X of an AnnData object as an array rather than a view.

Parameters

adata : anndata.AnnData

[Cells, Genes] AnnData experiment.

Returns

X : np.ndarray, sparse.csr.csr_matrix

[Cells, Genes] .X attribute as an array in memory.

Notes

Returned X will match the type of adata.X view.

sodirac.utils.get_multi_edge_index(pos, regions, graph_methods='knn', n_neighbors=None, n_radius=None)
sodirac.utils.get_single_edge_index(pos, graph_methods='knn', n_neighbors=None, n_radius=None)
sodirac.utils.knn_smooth_pred_class(X: numpy.ndarray, pred_class: numpy.ndarray, grouping: numpy.ndarray | None = None, k: int = 15) -> numpy.ndarray

Smooths class predictions by taking the modal class from each cell’s nearest neighbors.

Parameters

X : np.ndarray

[N, Features] embedding space for calculation of nearest neighbors.

pred_class : np.ndarray

[N,] array of predicted class labels.

grouping : np.ndarray, optional

[N,] grouping labels, e.g. clusters. If provided, only considers nearest neighbors within the cluster.

k : int

Number of nearest neighbors to use for smoothing.

Returns

smooth_pred_class : np.ndarray

[N,] unique class labels, smoothed by kNN.

Examples

>>> smooth_pred_class = knn_smooth_pred_class(
...     X = X,
...     pred_class = raw_predicted_classes,
...     grouping = louvain_cluster_groups,
...     k = 15,)

Notes

scNym classifiers do not incorporate neighborhood information. By using a simple kNN smoothing heuristic, we can leverage neighborhood information to improve classification performance, smoothing out cells that have an outlier prediction relative to their local neighborhood.
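The smoothing heuristic in these notes can be sketched with brute-force neighbors. The helper name knn_modal_class is hypothetical, each cell is counted among its own k nearest neighbors, ties go to the class that sorts first, and the optional grouping restriction is left out; all of these are assumptions.

```python
import numpy as np

def knn_modal_class(X, pred_class, k):
    """Replace each prediction with the modal class of its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    pred_class = np.asarray(pred_class)
    # Brute-force pairwise Euclidean distances in the embedding space.
    diff = X[:, None, :] - X[None, :, :]
    dm = np.sqrt((diff ** 2).sum(axis=-1))
    smoothed = pred_class.copy()
    for i in range(X.shape[0]):
        nn = np.argsort(dm[i], kind="stable")[:k]
        classes, counts = np.unique(pred_class[nn], return_counts=True)
        smoothed[i] = classes[np.argmax(counts)]
    return smoothed

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
raw = np.array(["A", "A", "B", "B", "B", "B"])
smoothed = knn_modal_class(X, raw, k=3)
```

The third cell sits among cells predicted as 'A', so its outlier 'B' prediction is smoothed to 'A'.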

sodirac.utils.knn_smooth_pred_class_prob(X: numpy.ndarray, pred_probs: numpy.ndarray, names: numpy.ndarray, grouping: numpy.ndarray | None = None, k: Callable | int = 15, dm: numpy.ndarray | None = None, **kwargs) -> numpy.ndarray

Smooths class predictions by taking the modal class from each cell’s nearest neighbors.

Parameters

X : np.ndarray

[N, Features] embedding space for calculation of nearest neighbors.

pred_probs : np.ndarray

[N, C] array of class prediction probabilities.

names : np.ndarray

[C,] names of predicted classes in pred_probs.

grouping : np.ndarray, optional

[N,] grouping labels, e.g. clusters. If provided, only considers nearest neighbors within the cluster.

k : int or Callable

Number of nearest neighbors to use for smoothing.

dm : np.ndarray, optional

[N, N] distance matrix for setting the RBF kernel parameter. speeds computation if pre-computed.

Returns

smooth_pred_class : np.ndarray

[N,] unique class labels, smoothed by kNN.

Examples

>>> smooth_pred_class = knn_smooth_pred_class_prob(
...     X = X,
...     pred_probs = predicted_class_probs,
...     names = class_names,
...     grouping = louvain_cluster_groups,
...     k = 15,)

Notes

scNym classifiers do not incorporate neighborhood information. By using a simple kNN smoothing heuristic, we can leverage neighborhood information to improve classification performance, smoothing out cells that have an outlier prediction relative to their local neighborhood.

sodirac.utils.lsi(adata: anndata.AnnData, n_comps: int = 20, use_highly_variable: bool | None = None, **kwargs) -> None

LSI analysis (following the Seurat v3 approach)

Parameters

adata

Input dataset

n_comps

Number of dimensions to use

use_highly_variable

Whether to use highly variable features only, stored in adata.var['highly_variable']. By default uses them if they have been determined beforehand.

**kwargs

Additional keyword arguments are passed to sklearn.utils.extmath.randomized_svd()

Returns

None. LSI results are stored in adata.obsm["X_lsi"] of the input AnnData object.
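As a rough picture of what an LSI embedding involves, the sketch below applies TF-IDF weighting, a log scaling, and a truncated SVD to a small dense count matrix. It is not the sodirac implementation: the helper name lsi_sketch is hypothetical, the log(1 + TF * IDF * 1e4) scaling is an assumption, and an exact np.linalg.svd stands in for sklearn.utils.extmath.randomized_svd().

```python
import numpy as np

def lsi_sketch(X, n_comps=2):
    """TF-IDF weighting followed by a truncated SVD (illustrative only)."""
    X = np.asarray(X, dtype=float)
    tf = X / X.sum(axis=1, keepdims=True)   # term frequency per cell
    idf = X.shape[0] / X.sum(axis=0)        # inverse document frequency per feature
    X_norm = np.log1p(tf * idf * 1e4)       # assumed log scaling
    # Exact SVD; the left singular vectors give the cell embedding.
    U, S, Vt = np.linalg.svd(X_norm, full_matrices=False)
    return U[:, :n_comps]

counts = np.array(
    [[5, 0, 1, 2], [4, 1, 0, 3], [0, 6, 2, 1], [1, 5, 3, 0], [2, 2, 2, 2]],
    dtype=float,
)
X_lsi = lsi_sketch(counts, n_comps=2)  # one row per cell, one column per component
```

The sketch assumes every cell and every feature has a nonzero total; otherwise the divisions would produce non-finite values.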

sodirac.utils.make_one_hot(labels: torch.LongTensor, C=2) -> torch.FloatTensor

Converts an integer label torch.autograd.Variable to a one-hot Variable.

Parameters

labels : torch.LongTensor or torch.cuda.LongTensor

[N, 1], where N is batch size. Each value is an integer representing correct classification.

C : int

number of classes in labels.

Returns

target : torch.FloatTensor or torch.cuda.FloatTensor

[N, C], where C is class number. One-hot encoded.

sodirac.utils.mclust_R(adata, num_cluster, modelNames='EEE', used_obsm='emb_pca', random_seed=2020, key_added='mclust')

Clustering using the mclust algorithm. The parameters are the same as those in the R package mclust.

sodirac.utils.pp_adatas(adata_sc, adata_sp, genes=None, gene_to_lowercase=True)

Pre-process AnnDatas so that they can be mapped. Specifically:

- Remove genes for which all entries are zero.
- Find the intersection between adata_sc, adata_sp and the given marker gene list, and save the intersected markers in both AnnDatas.
- Calculate density priors and save them with adata_sp.

Args:

adata_sc (AnnData): single cell data.

adata_sp (AnnData): spatial expression data.

genes (List): Optional. List of genes to use. If None, all genes are used.

Returns:

Updates adata_sc by creating the uns fields training_genes and overlap_genes. Updates adata_sp by creating the uns fields training_genes and overlap_genes and the obs fields rna_count_based_density and uniform_density.
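The pre-processing steps can be illustrated on plain arrays. This sketch does not touch AnnData objects and is not the sodirac code: the helper name shared_genes_and_density is hypothetical, and the density formulas (per-spot counts normalized to sum to one, and a constant 1/N) are assumptions inferred from the field names.

```python
import numpy as np

def shared_genes_and_density(X_sc, genes_sc, X_sp, genes_sp):
    """Overlapping non-zero genes plus two density priors over spatial spots."""
    X_sc = np.asarray(X_sc, dtype=float)
    X_sp = np.asarray(X_sp, dtype=float)
    # Lowercase gene names so that 'Actb' and 'ACTB' match.
    genes_sc = np.char.lower(np.asarray(genes_sc, dtype=str))
    genes_sp = np.char.lower(np.asarray(genes_sp, dtype=str))
    # Drop genes whose entries are all zero (counts are non-negative).
    keep_sc = genes_sc[X_sc.sum(axis=0) > 0]
    keep_sp = genes_sp[X_sp.sum(axis=0) > 0]
    overlap = np.intersect1d(keep_sc, keep_sp)
    # Assumed priors: share of total counts per spot, and a flat prior.
    counts = X_sp.sum(axis=1)
    rna_count_based_density = counts / counts.sum()
    uniform_density = np.full(counts.size, 1.0 / counts.size)
    return overlap, rna_count_based_density, uniform_density

X_sc = np.array([[1, 0, 2], [3, 0, 0]])
genes_sc = ["Actb", "Gapdh", "Cd3e"]
X_sp = np.array([[1, 1, 0], [2, 4, 0]])
genes_sp = ["ACTB", "CD3E", "Xist"]
overlap, rna_density, uniform = shared_genes_and_density(X_sc, genes_sc, X_sp, genes_sp)
```

Gapdh and Xist are all-zero in their respective matrices, so only actb and cd3e remain in the overlap.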

sodirac.utils.tfidf(X)

TF-IDF normalization (following the Seurat v3 approach)

Parameters

X

Input matrix

Returns

X_tfidf

TF-IDF normalized matrix
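A worked example may help fix the definition. The sketch below computes a plain TF-IDF for a dense matrix, with term frequency taken per row and inverse document frequency per column. The helper name tfidf_dense is hypothetical, and whether sodirac applies any further scaling inside tfidf is not stated in this reference.

```python
import numpy as np

def tfidf_dense(X):
    """TF-IDF for a dense [Cells, Features] count matrix (illustrative only)."""
    X = np.asarray(X, dtype=float)
    tf = X / X.sum(axis=1, keepdims=True)  # each row sums to 1
    idf = X.shape[0] / X.sum(axis=0)       # rarer features get larger weights
    return tf * idf

X = np.array([[1.0, 1.0], [3.0, 1.0]])
X_tfidf = tfidf_dense(X)
# tf = [[0.5, 0.5], [0.75, 0.25]] and idf = [0.5, 1.0]
```

The second feature has the smaller column total, so it receives the larger IDF weight.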