API Documentation¶
- class sodirac.utils.RBFWeight(alpha: float | None = None)[source]¶
Bases: object
- set_alpha(X: numpy.ndarray, n_max: int | None = None, dm: numpy.ndarray | None = None) None[source]¶
Set the alpha parameter of a Gaussian RBF kernel as the median distance between points in an array of observations.
Parameters¶
- X : np.ndarray
[N, P] matrix of observations and features.
- n_max : int, optional
Maximum number of observations to use for median distance computation.
- dm : np.ndarray, optional
[N, N] distance matrix for setting the RBF kernel parameter. Speeds computation if pre-computed.
Returns¶
None. Sets self.alpha.
References¶
A Kernel Two-Sample Test Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, Alexander Smola. JMLR, 13(Mar):723−773, 2012. http://jmlr.csail.mit.edu/papers/v13/gretton12a.html
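The median heuristic described for set_alpha can be sketched in plain NumPy. This is an illustration under assumptions, not the sodirac implementation: the helper name `median_pairwise_distance`, the `seed` argument, the random subsampling used for `n_max`, and the choice to take the median over distinct pairs only are all hypothetical.

```python
import numpy as np

def median_pairwise_distance(X, n_max=None, seed=0):
    """Median Euclidean distance between distinct rows of X (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    if n_max is not None and X.shape[0] > n_max:
        # Assumption: subsample rows at random so the [N, N] matrix stays small.
        rng = np.random.default_rng(seed)
        X = X[rng.choice(X.shape[0], size=n_max, replace=False)]
    # Pairwise Euclidean distances via broadcasting.
    diff = X[:, None, :] - X[None, :, :]
    dm = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep the upper triangle only, which excludes the zero diagonal.
    iu = np.triu_indices(dm.shape[0], k=1)
    return float(np.median(dm[iu]))

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
alpha = median_pairwise_distance(X)
```

Passing a pre-computed distance matrix, as the dm parameter allows, would skip the broadcasting step, which is the expensive part for large N.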
- sodirac.utils.adata_to_cluster_expression(adata, cluster_label, scale=True, add_density=True)[source]¶
Convert an AnnData to a new AnnData with cluster expressions. Clusters are based on cluster_label in adata.obs. The returned AnnData has one observation per cluster, with the cluster-level expression equal to the average expression for that cluster. All annotations in adata.obs except cluster_label are discarded in the returned AnnData.
- Args:
adata (AnnData): single cell data.
cluster_label (String): field in adata.obs used for aggregating values.
scale (bool): Optional. Whether to weight input single cells by the number of cells in their cluster. Default is True.
add_density (bool): Optional. If True, the normalized number of cells in each cluster is added to the returned AnnData as obs.cluster_density. Default is True.
- Returns:
AnnData: aggregated single cell data
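The aggregation can be illustrated on a bare expression matrix. The sketch below is an assumption-laden stand-in that works on NumPy arrays rather than AnnData objects; `cluster_mean_expression` is a hypothetical name, and the effect of the scale flag is not modeled because the source does not spell it out.

```python
import numpy as np

def cluster_mean_expression(X, labels):
    """Average rows of X per cluster and return normalized cluster sizes."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters, inverse, counts = np.unique(
        labels, return_inverse=True, return_counts=True
    )
    # Sum the expression of all cells that belong to the same cluster.
    sums = np.zeros((clusters.size, X.shape[1]))
    np.add.at(sums, inverse, X)
    means = sums / counts[:, None]
    # Analogue of obs.cluster_density: fraction of cells in each cluster.
    density = counts / counts.sum()
    return clusters, means, density

X = np.array([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]])
clusters, means, density = cluster_mean_expression(X, ["a", "a", "b"])
```

Here cluster "a" averages its two cells and cluster "b" keeps its single cell unchanged, while density holds the share of cells per cluster.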
- sodirac.utils.append_categorical_to_data(X: numpy.ndarray | scipy.sparse.csr.csr_matrix, categorical: numpy.ndarray)[source]¶
Convert categorical to a one-hot vector and append this vector to each sample in X.
Parameters¶
- X : np.ndarray, sparse.csr.csr_matrix
[Cells, Features]
- categorical : np.ndarray
[Cells,]
Returns¶
- Xa : np.ndarray
[Cells, Features + N_Categories]
- categories : np.ndarray
[N_Categories,] str category descriptors.
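A minimal dense-only sketch of the one-hot append follows. The name `append_one_hot` is hypothetical, sparse input is not handled, and the column order (sorted unique categories) is an assumption rather than documented behavior.

```python
import numpy as np

def append_one_hot(X, categorical):
    """Append a one-hot encoding of `categorical` to each row of dense X."""
    X = np.asarray(X, dtype=float)
    # `codes` maps each cell to the index of its category in `categories`.
    categories, codes = np.unique(np.asarray(categorical), return_inverse=True)
    one_hot = np.zeros((X.shape[0], categories.size))
    one_hot[np.arange(X.shape[0]), codes] = 1.0
    return np.hstack([X, one_hot]), categories

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
Xa, categories = append_one_hot(X, np.array(["B", "A", "B"]))
```

The result has shape [Cells, Features + N_Categories], matching the documented return value.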
- sodirac.utils.argmax_pred_class(grouping: numpy.ndarray, prediction: numpy.ndarray)[source]¶
Assign class to elements in groups based on the most common predicted class for that group.
Parameters¶
- grouping : np.ndarray
[N,] partition values defining groups to be classified.
- prediction : np.ndarray
[N,] predicted values for each element in grouping.
Returns¶
- assigned_classes : np.ndarray
[N,] class labels based on the most common class assigned to elements in the group partition.
Examples¶
>>> grouping = np.array([0,0,0,1,1,1,2,2,2,2])
>>> prediction = np.array(['A','A','A','B','A','B','C','A','B','C'])
>>> argmax_pred_class(grouping, prediction)
np.array(['A','A','A','B','B','B','C','C','C','C'])
Notes¶
scNym classifiers do not incorporate neighborhood information. This simple heuristic leverages cluster information obtained by an orthogonal method and assigns all cells in a given cluster the majority class label within that cluster.
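The majority-vote heuristic can be sketched as follows. `majority_class_per_group` is a hypothetical stand-in, not the library function, and its tie-breaking (first class in sorted order) is an assumption.

```python
import numpy as np

def majority_class_per_group(grouping, prediction):
    """Give every element the most common predicted class of its group."""
    grouping = np.asarray(grouping)
    prediction = np.asarray(prediction)
    assigned = prediction.copy()
    for group in np.unique(grouping):
        mask = grouping == group
        classes, counts = np.unique(prediction[mask], return_counts=True)
        # Ties resolve to the first class in sorted order (assumption).
        assigned[mask] = classes[np.argmax(counts)]
    return assigned

grouping = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
prediction = np.array(["A", "A", "A", "B", "A", "B", "C", "A", "B", "C"])
assigned = majority_class_per_group(grouping, prediction)
```

With the inputs from the example above, group 1 becomes all 'B' and group 2 becomes all 'C'.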
- sodirac.utils.build_classification_matrix(X: numpy.ndarray | scipy.sparse.csr.csr_matrix, model_genes: numpy.ndarray, sample_genes: numpy.ndarray, gene_batch_size: int = 512) numpy.ndarray | scipy.sparse.csr.csr_matrix[source]¶
Build a matrix for classification using only genes that overlap between the current sample and the pre-trained model.
Parameters¶
- X : np.ndarray, sparse.csr_matrix
[Cells, Genes] count matrix.
- model_genes : np.ndarray
gene identifiers in the order expected by the model.
- sample_genes : np.ndarray
gene identifiers for the current sample.
- gene_batch_size : int
number of genes to copy between arrays per batch. Controls a speed vs. memory trade-off.
Returns¶
- N : np.ndarray, sparse.csr_matrix
[Cells, len(model_genes)] count matrix. Values where a model gene was not present in the sample are left as zeros. type(N) will match type(X).
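The gene alignment can be illustrated with a dense, unbatched sketch. `align_to_model_genes` is a hypothetical name; the batching controlled by gene_batch_size and the sparse code path are deliberately left out.

```python
import numpy as np

def align_to_model_genes(X, model_genes, sample_genes):
    """Reorder columns of X into model gene order, zero-filling missing genes."""
    X = np.asarray(X)
    # Map each sample gene to its column index in X.
    column_of = {gene: j for j, gene in enumerate(sample_genes)}
    N = np.zeros((X.shape[0], len(model_genes)), dtype=X.dtype)
    for i, gene in enumerate(model_genes):
        j = column_of.get(gene)
        if j is not None:
            N[:, i] = X[:, j]  # copy the matching column; others stay zero
    return N

X = np.array([[1, 2, 3], [4, 5, 6]])
N = align_to_model_genes(X, ["g1", "g2", "g3"], ["g3", "g1", "g9"])
```

Gene "g2" is absent from the sample, so its column stays zero, and "g9" is dropped because the model does not expect it.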
- sodirac.utils.compute_entropy_of_mixing(X: numpy.ndarray, y: numpy.ndarray, n_neighbors: int, n_iters: int | None = None, **kwargs) numpy.ndarray[source]¶
Compute the entropy of mixing among groups given a distance matrix.
Parameters¶
- X : np.ndarray
[N, P] feature matrix.
- y : np.ndarray
[N,] group labels.
- n_neighbors : int
number of nearest neighbors to draw for each iteration of the entropy computation.
- n_iters : int, optional
number of iterations to perform. If n_iters is None, uses every point.
Returns¶
- entropy_of_mixing : np.ndarray
[n_iters,] entropy values for each iteration.
Notes¶
The entropy of batch mixing is computed by sampling n_neighbors cells from a local neighborhood in the nearest neighbor graph and constructing a probability vector based on their group membership. The entropy of this probability vector is computed as a metric of intermixing between groups.
If groups are more mixed, the probability vector will have higher entropy, and vice-versa.
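The computation in the notes can be sketched with a brute-force neighbor search. This is a rough illustration only: `entropy_of_mixing` below is a hypothetical helper that evaluates every point (the n_iters is None case), counts each point as its own nearest neighbor, and uses the natural logarithm, none of which is confirmed by the source.

```python
import numpy as np

def entropy_of_mixing(X, y, n_neighbors):
    """Entropy of group labels within each point's nearest-neighbor set."""
    X = np.asarray(X, dtype=float)
    groups, codes = np.unique(np.asarray(y), return_inverse=True)
    # Brute-force pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dm = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the closest points; each point is included in its own set.
    nn = np.argsort(dm, axis=1)[:, :n_neighbors]
    entropies = np.zeros(X.shape[0])
    for i in range(X.shape[0]):
        # Probability vector of group membership in the neighborhood.
        p = np.bincount(codes[nn[i]], minlength=groups.size) / n_neighbors
        p = p[p > 0]  # skip empty groups, treating 0 * log(0) as 0
        entropies[i] = -(p * np.log(p)).sum()
    return entropies

X = np.array([[0.0], [0.1], [10.0], [10.1]])
separated = entropy_of_mixing(X, ["a", "a", "b", "b"], n_neighbors=2)
mixed = entropy_of_mixing(X, ["a", "b", "a", "b"], n_neighbors=2)
```

When the two groups occupy separate regions, every neighborhood is pure and the entropy is zero; when they alternate, each neighborhood is split evenly and the entropy reaches log(2).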
- sodirac.utils.get_adata_asarray(adata: anndata.AnnData) numpy.ndarray | scipy.sparse.csr.csr_matrix[source]¶
Get the gene expression matrix .X of an AnnData object as an array rather than a view.
Parameters¶
- adata : anndata.AnnData
[Cells, Genes] AnnData experiment.
Returns¶
- X : np.ndarray, sparse.csr.csr_matrix
[Cells, Genes] .X attribute as an array in memory.
Notes¶
Returned X will match the type of adata.X view.
- sodirac.utils.get_multi_edge_index(pos, regions, graph_methods='knn', n_neighbors=None, n_radius=None)[source]¶
- sodirac.utils.get_single_edge_index(pos, graph_methods='knn', n_neighbors=None, n_radius=None)[source]¶
- sodirac.utils.knn_smooth_pred_class(X: numpy.ndarray, pred_class: numpy.ndarray, grouping: numpy.ndarray | None = None, k: int = 15) numpy.ndarray[source]¶
Smooths class predictions by taking the modal class from each cell’s nearest neighbors.
Parameters¶
- X : np.ndarray
[N, Features] embedding space for calculation of nearest neighbors.
- pred_class : np.ndarray
[N,] array of unique class labels.
- grouping : np.ndarray, optional
[N,] grouping labels, e.g. clusters. If provided, only considers nearest neighbors within the cluster.
- k : int
number of nearest neighbors to use for smoothing.
Returns¶
- smooth_pred_class : np.ndarray
[N,] unique class labels, smoothed by kNN.
Examples¶
>>> smooth_pred_class = knn_smooth_pred_class(
...     X = X,
...     pred_class = raw_predicted_classes,
...     grouping = louvain_cluster_groups,
...     k = 15,)
Notes¶
scNym classifiers do not incorporate neighborhood information. By using a simple kNN smoothing heuristic, we can leverage neighborhood information to improve classification performance, smoothing out cells that have an outlier prediction relative to their local neighborhood.
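The kNN smoothing heuristic can be sketched with a brute-force neighbor search. `knn_modal_class` is a hypothetical helper, not the library function: it ignores the optional grouping restriction, includes each cell in its own neighborhood, and breaks ties by sorted class order, all of which are assumptions.

```python
import numpy as np

def knn_modal_class(X, pred_class, k):
    """Replace each prediction with the modal class of its k nearest points."""
    X = np.asarray(X, dtype=float)
    pred_class = np.asarray(pred_class)
    # Brute-force pairwise Euclidean distances in the embedding space.
    diff = X[:, None, :] - X[None, :, :]
    dm = np.sqrt((diff ** 2).sum(axis=-1))
    nn = np.argsort(dm, axis=1)[:, :k]
    smoothed = pred_class.copy()
    for i in range(X.shape[0]):
        classes, counts = np.unique(pred_class[nn[i]], return_counts=True)
        smoothed[i] = classes[np.argmax(counts)]
    return smoothed

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
raw = np.array(["A", "B", "A", "B", "B", "A"])
smoothed = knn_modal_class(X, raw, k=3)
```

The single outlier prediction in each of the two tight neighborhoods is overwritten by the local majority.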
- sodirac.utils.knn_smooth_pred_class_prob(X: numpy.ndarray, pred_probs: numpy.ndarray, names: numpy.ndarray, grouping: numpy.ndarray | None = None, k: Callable | int = 15, dm: numpy.ndarray | None = None, **kwargs) numpy.ndarray[source]¶
Smooths class predictions by taking the modal class from each cell’s nearest neighbors.
Parameters¶
- X : np.ndarray
[N, Features] embedding space for calculation of nearest neighbors.
- pred_probs : np.ndarray
[N, C] array of class prediction probabilities.
- names : np.ndarray
[C,] names of predicted classes in pred_probs.
- grouping : np.ndarray, optional
[N,] grouping labels, e.g. clusters. If provided, only considers nearest neighbors within the cluster.
- k : int
number of nearest neighbors to use for smoothing.
- dm : np.ndarray, optional
[N, N] distance matrix for setting the RBF kernel parameter. Speeds computation if pre-computed.
Returns¶
- smooth_pred_class : np.ndarray
[N,] unique class labels, smoothed by kNN.
Examples¶
>>> smooth_pred_class = knn_smooth_pred_class_prob(
...     X = X,
...     pred_probs = predicted_class_probs,
...     names = class_names,
...     grouping = louvain_cluster_groups,
...     k = 15,)
Notes¶
scNym classifiers do not incorporate neighborhood information. By using a simple kNN smoothing heuristic, we can leverage neighborhood information to improve classification performance, smoothing out cells that have an outlier prediction relative to their local neighborhood.
- sodirac.utils.lsi(adata: anndata.AnnData, n_comps: int = 20, use_highly_variable: bool | None = None, **kwargs) None[source]¶
LSI analysis (following the Seurat v3 approach)
Parameters¶
- adata
Input dataset
- n_comps
Number of dimensions to use
- use_highly_variable
Whether to use highly variable features only, stored in adata.var['highly_variable']. By default, uses them if they have been determined beforehand.
- **kwargs
Additional keyword arguments are passed to sklearn.utils.extmath.randomized_svd()
Returns¶
- adata : anndata.AnnData
The input AnnData object with LSI results stored in adata.obsm["X_lsi"].
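A TF-IDF plus SVD pipeline of the kind commonly called LSI can be sketched in NumPy. This is only an approximation under assumptions: the exact normalization used by sodirac is not stated here, the log-scaled TF-IDF with a 1e4 factor is assumed, an exact `np.linalg.svd` stands in for the randomized SVD the function delegates to, and `lsi_sketch` is a hypothetical name. The input must have no all-zero rows or columns.

```python
import numpy as np

def lsi_sketch(counts, n_comps=2):
    """TF-IDF transform followed by a truncated SVD (illustrative only)."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: each cell's counts divided by its total.
    tf = counts / counts.sum(axis=1, keepdims=True)
    # Inverse document frequency: number of cells over per-feature totals.
    idf = counts.shape[0] / counts.sum(axis=0, keepdims=True)
    tfidf = np.log1p(tf * idf * 1e4)
    # Exact SVD here; the documented function passes kwargs to randomized_svd.
    U, S, Vt = np.linalg.svd(tfidf, full_matrices=False)
    # Assumption: scale the left singular vectors by the singular values.
    return U[:, :n_comps] * S[:n_comps]

rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(6, 5)) + 1  # +1 avoids empty rows/columns
X_lsi = lsi_sketch(counts, n_comps=2)
```

In the documented function the analogous embedding is what ends up in adata.obsm["X_lsi"], with n_comps controlling the number of retained dimensions.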
- sodirac.utils.make_one_hot(labels: torch.LongTensor, C=2) torch.FloatTensor[source]¶
Converts an integer label torch.autograd.Variable to a one-hot Variable.
Parameters¶
- labels : torch.LongTensor or torch.cuda.LongTensor
[N, 1], where N is batch size. Each value is an integer representing the correct class.
- C : int
number of classes in labels.
Returns¶
- target : torch.FloatTensor or torch.cuda.FloatTensor
[N, C], where C is the number of classes. One-hot encoded.
- sodirac.utils.mclust_R(adata, num_cluster, modelNames='EEE', used_obsm='emb_pca', random_seed=2020, key_added='mclust')[source]¶
Clustering using the mclust algorithm. The parameters are the same as those in the R package mclust.
- sodirac.utils.pp_adatas(adata_sc, adata_sp, genes=None, gene_to_lowercase=True)[source]¶
Pre-process AnnDatas so that they can be mapped. Specifically:
- Remove genes whose entries are all zero.
- Find the intersection between adata_sc, adata_sp and the given marker gene list, and save the intersected markers in both AnnDatas.
- Calculate density priors and save them with adata_sp.
- Args:
adata_sc (AnnData): single cell data.
adata_sp (AnnData): spatial expression data.
genes (List): Optional. List of genes to use. If None, all genes are used.
- Returns:
Updates adata_sc by creating the uns fields training_genes and overlap_genes. Updates adata_sp by creating the uns fields training_genes and overlap_genes and the obs fields rna_count_based_density and uniform_density.
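The preprocessing steps can be sketched on plain arrays and gene lists. Everything below is an assumption made for illustration: `shared_genes_and_density` is a hypothetical helper, gene names are lowercased unconditionally (mirroring gene_to_lowercase=True), and the count-based density is taken as each spot's share of total counts over all spatial genes, which the source does not confirm.

```python
import numpy as np

def shared_genes_and_density(X_sc, genes_sc, X_sp, genes_sp):
    """Find overlapping non-empty genes and two simple spot density priors."""
    X_sc = np.asarray(X_sc, dtype=float)
    X_sp = np.asarray(X_sp, dtype=float)
    genes_sc = [g.lower() for g in genes_sc]
    genes_sp = [g.lower() for g in genes_sp]
    # Drop genes whose entries are all zero in the respective dataset.
    kept_sc = {g for g, total in zip(genes_sc, X_sc.sum(axis=0)) if total > 0}
    kept_sp = {g for g, total in zip(genes_sp, X_sp.sum(axis=0)) if total > 0}
    overlap = sorted(kept_sc & kept_sp)
    # Density priors over spots: count-based and uniform.
    spot_counts = X_sp.sum(axis=1)
    rna_count_based_density = spot_counts / spot_counts.sum()
    uniform_density = np.full(X_sp.shape[0], 1.0 / X_sp.shape[0])
    return overlap, rna_count_based_density, uniform_density

X_sc = [[1, 2, 0], [3, 0, 0]]  # third gene is all zero and gets dropped
X_sp = [[1, 1, 2], [2, 2, 2]]
overlap, rna_density, uniform = shared_genes_and_density(
    X_sc, ["Actb", "Gapdh", "Cd3e"], X_sp, ["ACTB", "gapdh", "Xist"]
)
```

Both density vectors sum to one, so either can serve as a prior over spatial spots.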