Similarity
merging
similarity
chi_square(pu, pv)
Chi-square distance, also called histogram distance:
d(u,v) = \frac{1}{2} \sum_{i=0}^{N} \frac{(p(u_i) - p(v_i))^2}{p(u_i) + p(v_i)}
Parameters:
-
pu
(array
) –vector of probabilities of observing elements of u
-
pv
(array
) –vector of probabilities of observing elements of v
Returns:
-
float
–distance between u and v
Source code in src/simnetpy/similarity/similarity.py
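As a rough illustration of the formula above, the distance can be written in a few lines of NumPy. This is a minimal sketch, not the library's source: the name `chi_square_sketch` is hypothetical, and skipping bins where both probabilities are zero is an assumption made here to avoid division by zero.

```python
import numpy as np

def chi_square_sketch(pu, pv):
    """Chi-square (histogram) distance between two probability vectors."""
    pu = np.asarray(pu, dtype=float)
    pv = np.asarray(pv, dtype=float)
    denom = pu + pv
    # Assumption: bins that are empty in both vectors contribute nothing.
    mask = denom > 0
    return 0.5 * np.sum((pu[mask] - pv[mask]) ** 2 / denom[mask])

d_same = chi_square_sketch([0.5, 0.5], [0.5, 0.5])      # identical histograms
d_disjoint = chi_square_sketch([1.0, 0.0], [0.0, 1.0])  # no shared support
```

For probability vectors the value lies between 0 (identical) and 1 (disjoint support).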
multi_modal_similarity(data, N, method, idxmap=None, norm=True)
Compute pairwise similarity for each data modality, using the same metric on every modality.
Note: all matrices in data must have N rows if idxmap is not specified.
Parameters:
-
data
(dict
) –dictionary of Nmodality feature matrices (N x d_i np.ndarrays)
-
N
(int
) –number of individuals in the pairwise calculation
-
idxmap
(dict
) –dictionary containing the index of each matrix in the larger set
Returns:
-
–
np.ndarray: (Nmodality, N, N) pairwise similarity matrix
Source code in src/simnetpy/similarity/similarity.py
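The shape of the returned array may be easier to see in a small example. The sketch below is an assumption-laden stand-in, not the library's code: `pairwise_euclidean` and `stack_modalities_sketch` are hypothetical names, Euclidean distance stands in for whatever `method` selects, and `idxmap` and `norm` are ignored.

```python
import numpy as np

def pairwise_euclidean(X):
    """N x N matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def stack_modalities_sketch(data):
    """Apply the same metric to every modality and stack the results."""
    return np.stack([pairwise_euclidean(X) for X in data.values()])

rng = np.random.default_rng(0)
# Two hypothetical modalities measured on the same 4 individuals.
data = {"modality_a": rng.normal(size=(4, 3)),
        "modality_b": rng.normal(size=(4, 5))}
S = stack_modalities_sketch(data)  # one N x N matrix per modality
```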
partial_mm_similarity(data, metric, norm=True, snf_aff=False, K=20, mu=0.5)
Calculate multi-modal similarity where rows in certain modalities might be missing. Can use a normal distance metric or the affinity proposed in SNF (Similarity Network Fusion).
Note: the function returns an m x N x N dissimilarity matrix. If the affinity is used, then S = -(Affinity Matrix).
Parameters:
-
data
(list
) –list of data matrices, one per modality. Each array should be N x d. If data are missing for an individual, include NaN rows in the input.
-
metric
(_type_
) –metric to use in distance calculation.
-
norm
(bool
, default:True
) –description. Defaults to True.
-
snf_aff
(bool
, default:False
) –if True, use the affinity proposed in SNF instead of a plain distance metric. Defaults to False.
-
K
(int
, default:20
) –description. Defaults to 20.
-
mu
(float
, default:0.5
) –description. Defaults to 0.5.
Returns:
-
ndarray
–m x N x N dissimilarity matrix
Source code in src/simnetpy/similarity/similarity.py
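To make the NaN-row convention concrete, here is a minimal sketch for a single modality. It is not the library's implementation: `partial_pairwise_sketch` is a hypothetical name, Euclidean distance is assumed as the metric, and leaving pairs that involve a missing individual as NaN is an assumption about how such pairs could be represented.

```python
import numpy as np

def partial_pairwise_sketch(X):
    """N x N distances; pairs involving a NaN row are left as NaN."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Individuals with a fully observed feature row in this modality.
    present = np.flatnonzero(~np.isnan(X).any(axis=1))
    D = np.full((n, n), np.nan)
    Xp = X[present]
    diff = Xp[:, None, :] - Xp[None, :, :]
    D[np.ix_(present, present)] = np.sqrt((diff ** 2).sum(axis=-1))
    return D

# The second individual is missing in this modality.
X = np.array([[0.0, 0.0], [np.nan, np.nan], [3.0, 4.0]])
D = partial_pairwise_sketch(X)
```

Stacking one such matrix per modality would give the m x N x N layout described in the note.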
threshold
combined_adj(D, K, t)
Create a network from a dissimilarity matrix through a mixture of KNN and global threshold.
Parameters:
-
D
(ndarray
) –nxn dissimilarity matrix. smaller values => more similar.
-
K
(int
) –Number of Neighbours to find for each individual
-
t
(float
) –fraction between 0 and 1; the top 100*t% of edges are kept. 0.01 keeps the 1% most similar connections.
Returns:
-
–
np.ndarray: Adjacency matrix of 0 and 1s
Source code in src/simnetpy/similarity/threshold.py
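The description does not say how the two rules are mixed. One plausible reading, sketched below, is a union: every node keeps its K nearest neighbours, and any pair at or below the global t-quantile is connected as well. The function name `combined_adj_sketch`, the union rule, the `<=` comparison and the final symmetrisation are all assumptions.

```python
import numpy as np

def combined_adj_sketch(D, K, t):
    """Union of a KNN graph and a global-threshold graph (assumed rule)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    off_diag = D.copy()
    np.fill_diagonal(off_diag, np.inf)  # a node is never its own neighbour

    # KNN part: mark each node's K smallest dissimilarities.
    knn = np.zeros((n, n), dtype=int)
    nearest = np.argsort(off_diag, axis=1)[:, :K]
    knn[np.arange(n)[:, None], nearest] = 1

    # Threshold part: keep pairs at or below the t-quantile of all pairs.
    cutoff = np.quantile(D[np.triu_indices(n, k=1)], t)
    thr = (off_diag <= cutoff).astype(int)

    A = np.maximum(knn, thr)
    return np.maximum(A, A.T)  # symmetrise

points = np.array([0.0, 1.0, 3.0, 10.0, 11.0])
D = np.abs(points[:, None] - points[None, :])
A = combined_adj_sketch(D, K=1, t=0.4)
```

In this toy example the edge between nodes 0 and 2 comes only from the threshold rule, while the KNN rule guarantees that no node is left isolated.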
knn_adj(D, K)
Create a network from a dissimilarity matrix by finding the top K most similar neighbours for each node. Note: uses a brute-force algorithm that checks all possible values, so it is slow for large matrices.
Parameters:
-
D
(ndarray
) –nxn dissimilarity matrix. smaller values => more similar.
-
K
(int
) –Number of Neighbours to find for each individual
Returns:
-
–
np.ndarray: Adjacency matrix of 0 and 1s
Source code in src/simnetpy/similarity/threshold.py
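A brute-force KNN construction of the kind described can be sketched as follows. This is an illustration only: `knn_adj_sketch` is a hypothetical name, and excluding self-matches and symmetrising the result are assumptions not confirmed by the description.

```python
import numpy as np

def knn_adj_sketch(D, K):
    """Brute-force KNN adjacency from an n x n dissimilarity matrix."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        row = D[i].copy()
        row[i] = np.inf                 # assumption: no self-loops
        neighbours = np.argsort(row)[:K]  # full sort of every row
        A[i, neighbours] = 1
    # Assumption: an edge exists if either endpoint selected the other.
    return np.maximum(A, A.T)

points = np.array([0.0, 1.0, 3.0, 10.0])
D = np.abs(points[:, None] - points[None, :])
A = knn_adj_sketch(D, K=1)
```

Sorting every row costs O(n^2 log n) overall, which is the reason such an approach becomes slow for large matrices.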
log_skewed_knn_adj(D, K, statNN=10, stat='mean', Kquantile=1.0)
Create a network from a dissimilarity matrix through a mixture of KNN and global threshold.
Parameters:
-
D
(ndarray
) –nxn dissimilarity matrix. smaller values => more similar.
-
K
(int
) –Controls the number of neighbours in the neighbour distribution. Coupled with Kquantile: e.g. K=5, Kquantile=0.5 means 5 will be the median of the distribution; Kquantile=1.0 means 5 will be the max.
-
statNN
(int
, default:10
) –Number of neighbours in stat calc. Defaults to 10.
-
stat
(str
, default:'mean'
) –Stat to calculate from nearest neighbours, one of mean, median, std. Defaults to 'mean'.
-
Kquantile
(float
, default:1.0
) –Quantile of stat dist to map K to. Defaults to 1.0.
Returns:
-
–
np.ndarray: Adjacency matrix of 0 and 1s
Source code in src/simnetpy/similarity/threshold.py
network_from_sim_mat(D, method='knn', **kwargs)
Sparsify a dissimilarity matrix into an adjacency matrix and build a graph from it.
Parameters:
-
D
(ndarray
) –nxn Dissimilarity matrix. Smaller => more similar
-
method
(str
, default:'knn'
) –method to use to sparsify matrix. one of [knn, threshold, combined, skewed_knn]. Defaults to 'knn'.
-
**kwargs
–keyword arguments for sparsifying functions
Returns:
-
–
ig.Graph: Graph created from similarity matrix
Source code in src/simnetpy/similarity/threshold.py
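The last step, turning a 0/1 adjacency matrix into a graph, can be illustrated with an edge list. The helper `adjacency_to_edges` is hypothetical and is not claimed to be how the library builds its graph; it only shows the conversion under the assumption of a symmetric adjacency matrix.

```python
import numpy as np

def adjacency_to_edges(A):
    """Undirected edge list (i < j) from a symmetric 0/1 adjacency matrix."""
    A = np.asarray(A)
    rows, cols = np.nonzero(np.triu(A, k=1))  # upper triangle: each edge once
    return list(zip(rows.tolist(), cols.tolist()))

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
edges = adjacency_to_edges(A)
# With python-igraph installed, such a list can be passed to
# ig.Graph(n=len(A), edges=edges) to obtain a graph object.
```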
nn_distribution(D, K, statNN=10, stat='mean', Kquantile=1.0, mapping='linear')
Parameters:
-
D
(ndarray
) –nxn dissimilarity matrix. smaller values => more similar.
-
K
(int
) –Controls the number of neighbours in the neighbour distribution. Coupled with Kquantile: e.g. K=5, Kquantile=0.5 means 5 will be the median of the distribution; Kquantile=1.0 means 5 will be the max.
-
statNN
(int
, default:10
) –Number of neighbours in stat calc. Defaults to 10.
-
stat
(str
, default:'mean'
) –Stat to calculate from nearest neighbours, one of mean, median, std. Defaults to 'mean'.
-
Kquantile
(float
, default:1.0
) –Quantile of stat dist to map K to. Defaults to 1.0.
Returns:
-
_type_
–description
Source code in src/simnetpy/similarity/threshold.py
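The parameter descriptions suggest a two-step idea: compute a statistic over each node's statNN nearest neighbours, then map that statistic to a per-node neighbour count so that the node at quantile Kquantile receives K. The sketch below is one possible reading of a 'linear' mapping, not the library's code: the function names, the proportional formula, the rounding and the lower bound of one neighbour are all assumptions.

```python
import numpy as np

def nn_stat_sketch(D, statNN=10, stat="mean"):
    """Statistic of each node's distances to its statNN nearest neighbours."""
    D = np.asarray(D, dtype=float).copy()
    np.fill_diagonal(D, np.inf)            # ignore self-distances
    k = min(statNN, D.shape[0] - 1)        # cannot exceed n - 1 neighbours
    nearest = np.sort(D, axis=1)[:, :k]
    funcs = {"mean": np.mean, "median": np.median, "std": np.std}
    return funcs[stat](nearest, axis=1)

def linear_k_sketch(stats, K, Kquantile=1.0):
    """Assumed linear mapping: the Kquantile of stats is assigned K."""
    ref = np.quantile(stats, Kquantile)
    k_per_node = np.rint(K * stats / ref).astype(int)
    return np.clip(k_per_node, 1, None)    # assumption: at least one neighbour

points = np.array([0.0, 1.0, 3.0, 10.0])
D = np.abs(points[:, None] - points[None, :])
stats = nn_stat_sketch(D, statNN=2, stat="mean")
k_per_node = linear_k_sketch(stats, K=4, Kquantile=1.0)
```

With Kquantile=1.0 the largest statistic maps to K, so K is the maximum of the resulting neighbour counts, in line with the description of the K parameter.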
skewed_knn_adj(D, K, statNN=10, stat='mean', Kquantile=1.0)
Create a network from a dissimilarity matrix through a mixture of KNN and global threshold.
Parameters:
-
D
(ndarray
) –nxn dissimilarity matrix. smaller values => more similar.
-
K
(int
) –Controls the number of neighbours in the neighbour distribution. Coupled with Kquantile: e.g. K=5, Kquantile=0.5 means 5 will be the median of the distribution; Kquantile=1.0 means 5 will be the max.
-
statNN
(int
, default:10
) –Number of neighbours in stat calc. Defaults to 10.
-
stat
(str
, default:'mean'
) –Stat to calculate from nearest neighbours, one of mean, median, std. Defaults to 'mean'.
-
Kquantile
(float
, default:1.0
) –Quantile of stat dist to map K to. Defaults to 1.0.
Returns:
-
–
np.ndarray: Adjacency matrix of 0 and 1s
Source code in src/simnetpy/similarity/threshold.py
sparsify_sim_matrix(D, method='knn', **kwargs)
Sparsify a dissimilarity matrix into an adjacency matrix.
Parameters:
-
D
(ndarray
) –nxn Dissimilarity matrix. Smaller => more similar
-
method
(str
, default:'knn'
) –method to use to sparsify matrix. one of [knn, threshold, combined, skewed_knn]. Defaults to 'knn'.
-
**kwargs
–keyword arguments for sparsifying functions
Returns:
-
–
np.ndarray: nxn symmetric Adjacency matrix of 0s and 1s
Source code in src/simnetpy/similarity/threshold.py
threshold_adj(D, t)
Threshold a dissimilarity matrix using a quantile of its values. Assumes D holds distances: the edges retained are those whose values fall in the smallest 100*t%.
Parameters:
-
D
(ndarray
) –nxn dissimilarity matrix. smaller values => more similar.
-
t
(float
) –fraction between 0 and 1; the top 100*t% of edges are kept. 0.01 keeps the 1% most similar connections.
Returns:
-
–
np.ndarray: Adjacency matrix of 0 and 1s
Source code in src/simnetpy/similarity/threshold.py
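Quantile thresholding of this kind can be sketched with `np.quantile`. The code is illustrative rather than the library's source: `threshold_adj_sketch` is a hypothetical name, and taking the quantile over the upper triangle only, using `<=` at the cutoff, and zeroing the diagonal are assumptions.

```python
import numpy as np

def threshold_adj_sketch(D, t):
    """Keep the pairs whose dissimilarity is within the smallest 100*t%."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Each unordered pair is counted once when computing the cutoff.
    pair_values = D[np.triu_indices(n, k=1)]
    cutoff = np.quantile(pair_values, t)
    A = (D <= cutoff).astype(int)
    np.fill_diagonal(A, 0)  # assumption: no self-loops
    return A

points = np.array([0.0, 1.0, 3.0, 10.0])
D = np.abs(points[:, None] - points[None, :])
A = threshold_adj_sketch(D, t=0.5)  # keeps the closest half of the pairs
```

Unlike a KNN rule, a global cutoff can leave outlying nodes with no edges at all, as happens to the last point in this example.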
threshold_graph(D, t)
Threshold a dissimilarity matrix using a quantile of its values and create an igraph network. Assumes D holds distances: the edges retained are those whose values fall in the smallest 100*t%.
Parameters:
-
D
(ndarray
) –nxn dissimilarity matrix. smaller values => more similar.
-
t
(float
) –fraction between 0 and 1; the top 100*t% of edges are kept. 0.01 keeps the 1% most similar connections.
Returns:
-
–
ig.Graph: Graph created from thresholding connections.