utils module

hidef.utils.containment_indices_boolean(A, B)[source]

Calculate the matrix of containment indices between two lists of clusters.

Parameters
  • A (2D np.array) – axis 0 for clusters, axis 1 for nodes in network

  • B (2D np.array) – same layout as A

Returns

CI

Return type

2D np.array
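
The following is a minimal NumPy sketch of what a containment index matrix looks like, not the library's implementation. It assumes the common definition CI[i, j] = |A_i ∩ B_j| / |A_i|; the helper name containment_sketch and the guard against empty clusters are illustrative assumptions.

```python
import numpy as np

def containment_sketch(A, B):
    """Hypothetical helper: CI[i, j] = |A_i & B_j| / |A_i| (assumed definition)."""
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    # Pairwise intersection sizes via a matrix product over the node axis.
    overlap = A.astype(int) @ B.astype(int).T
    # Size of each cluster in A, kept as a column for broadcasting.
    sizes = A.sum(axis=1, keepdims=True)
    # Guard against division by zero for empty clusters (an assumption).
    return overlap / np.maximum(sizes, 1)

# Two clusters in A and one cluster in B over four nodes.
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])
B = np.array([[1, 1, 1, 0]])
CI = containment_sketch(A, B)
```

Under this definition the first cluster of A is fully contained in the cluster of B, while only half of the second cluster is.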

hidef.utils.data2graph(datafile, outfile=None, k=15, snn=-1, mydist='cosine')[source]

Take a data frame [n_samples x n_features] as input and output a k-nearest-neighbor (kNN) or shared-nearest-neighbor (SNN) graph.

Parameters
  • datafile (str) – input tsv file

  • outfile (str) – file name of output edge list

  • k (int) – number of neighbors for each sample

  • snn (float) – Jaccard index threshold to apply when calculating a shared nearest neighbor graph

  • mydist (string or callable) – distance metric to calculate neighbors

Returns

idx – the indices of entries of the output matrix

Return type

tuple of two numpy.array
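
As a rough illustration of the kNN step only, the sketch below builds neighbor indices from an in-memory array using cosine distance. It is not the library's code: the helper name knn_edges_sketch is hypothetical, file reading and the SNN filtering are left out, and the exclusion of self-neighbors is an assumption.

```python
import numpy as np

def knn_edges_sketch(X, k):
    """Hypothetical helper: return (rows, cols) of a directed kNN graph."""
    X = np.asarray(X, dtype=float)
    # Normalize rows so the dot product equals cosine similarity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.maximum(norms, 1e-12)
    dist = 1.0 - unit @ unit.T
    # Exclude each sample from its own neighbor list (an assumption).
    np.fill_diagonal(dist, np.inf)
    # Indices of the k closest samples for every row.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(X.shape[0]), k)
    cols = nbrs.ravel()
    return rows, cols

# Four samples with two features; samples 0/1 and 2/3 point in similar directions.
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0],
              [0.1, 0.9]])
rows, cols = knn_edges_sketch(X, k=1)
```

The two returned arrays play the role of the row and column indices of the non-zero entries of the adjacency matrix.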

hidef.utils.jaccard_matrix(matA, matB, threshold=0.75, return_mat=False)[source]

Calculate the Jaccard index for all pairs of clusters between two sets of clusters.

Parameters
  • matA (2D array or scipy.sparse.csr_matrix) – axis 0 for clusters, axis 1 for nodes in network

  • matB (2D array or scipy.sparse.csr_matrix) – same layout as matA

  • threshold (float) – similarity cutoff for the Jaccard index

  • return_mat (bool) – if True, also return the full pairwise Jaccard matrix

Returns

  • index ((np.array, np.array)) – two arrays of indices; the cluster pairs given by those indices satisfy the threshold

  • jac (np.array) – the full matrix of pairwise Jaccard indices; returned only if return_mat is True
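
The pairwise computation can be sketched with dense NumPy arrays as follows. This is an illustration under stated assumptions rather than the library's implementation: the helper name jaccard_sketch is hypothetical, sparse input is not handled, and whether the real cutoff is strict (>) or inclusive (>=) is not stated in this page, so the strict comparison here is a guess.

```python
import numpy as np

def jaccard_sketch(matA, matB, threshold=0.75):
    """Hypothetical helper: pairwise Jaccard index between rows of matA and matB."""
    A = np.asarray(matA, dtype=int)
    B = np.asarray(matB, dtype=int)
    # Intersection sizes for every (cluster in A, cluster in B) pair.
    inter = A @ B.T
    # |A_i| + |B_j| - |A_i & B_j| gives the union sizes.
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
    jac = inter / np.maximum(union, 1)
    # Strict comparison is an assumption; the real cutoff may be inclusive.
    index = np.where(jac > threshold)
    return index, jac

matA = np.array([[1, 1, 1, 0],
                 [0, 0, 1, 1]])
matB = np.array([[1, 1, 1, 0],
                 [0, 1, 1, 1]])
index, jac = jaccard_sketch(matA, matB)
```

In this toy input only the pair (row 0 of matA, row 0 of matB) is identical, so it is the only pair above the default threshold of 0.75.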

hidef.utils.network_perturb(G, sample=0.8)[source]

Perturb the network by randomly deleting some edges.

Parameters
  • G – input network

  • sample – the fraction of edges to retain

Returns

the perturbed graph
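
The idea of retaining a random fraction of edges can be sketched on a plain edge list with the standard library. This does not use the graph type that network_perturb expects, which this page does not specify; the helper name perturb_edges_sketch, the seed argument, and the rounding of the number of retained edges are all assumptions.

```python
import random

def perturb_edges_sketch(edges, sample=0.8, seed=None):
    """Hypothetical helper: keep a random fraction `sample` of the edges."""
    rng = random.Random(seed)
    # Number of edges to retain; rounding is an assumption.
    n_keep = int(round(len(edges) * sample))
    # Sample without replacement so no edge is duplicated.
    return rng.sample(edges, n_keep)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
kept = perturb_edges_sketch(edges, sample=0.8, seed=0)
```

With five edges and sample=0.8, four edges are retained and one is dropped.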

hidef.utils.node2mat(f, g2ind, format='node', has_persistence=False)[source]

Convert a text file to a binary matrix (the input of weaver).

Parameters
  • f (str) – the input TSV file

  • g2ind (dict) – a dictionary to index genes

  • format (str) – accepted values are ‘node’ or ‘clixo’

  • has_persistence (bool) – if True, the input file has an extra column, usually the persistence

Returns

data

Return type

dict containing the fields ‘cluster’ and ‘name’, and optionally ‘extra.data’
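
To show how a gene index such as g2ind maps cluster memberships onto a binary matrix, here is a small in-memory sketch. It does not parse the ‘node’ or ‘clixo’ file formats, whose layout is not described in this page; the helper name membership_matrix_sketch and the example gene names are hypothetical.

```python
import numpy as np

def membership_matrix_sketch(clusters, g2ind):
    """Hypothetical helper: rows are clusters, columns are genes indexed by g2ind."""
    mat = np.zeros((len(clusters), len(g2ind)), dtype=bool)
    for row, members in enumerate(clusters):
        for gene in members:
            # Mark the gene's column as present in this cluster.
            mat[row, g2ind[gene]] = True
    return mat

# Illustrative gene index and cluster memberships (made-up names).
g2ind = {"geneA": 0, "geneB": 1, "geneC": 2}
clusters = [["geneA", "geneB"], ["geneC"]]
mat = membership_matrix_sketch(clusters, g2ind)
```

The resulting matrix follows the same layout used elsewhere in this module: axis 0 for clusters, axis 1 for nodes.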