hidef_finder module

class hidef.hidef_finder.Cluster(binary, gamma)[source]

The base class representing a cluster in hidef.

Parameters
  • binary (np.array) – a binary vector indicating which objects belong to this cluster

  • gamma (float) – the resolution parameter which generated this cluster

calculate_similarity(cluster2)[source]

Calculate the Jaccard similarity between this cluster and another cluster.

Parameters

cluster2 (hidef_finder.Cluster) – the cluster to compare with this one

Returns

Return type

float
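The Jaccard similarity of two clusters can be illustrated on their binary membership vectors. The following is a minimal sketch rather than the hidef implementation: the function name `jaccard_similarity` is hypothetical, and returning 0.0 for two empty clusters is an assumption.

```python
import numpy as np

def jaccard_similarity(binary_a, binary_b):
    """Jaccard index of two equal-length binary membership vectors."""
    a = np.asarray(binary_a, dtype=bool)
    b = np.asarray(binary_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two empty clusters are treated as dissimilar.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)

a = np.array([1, 1, 1, 0, 0])
b = np.array([0, 1, 1, 1, 0])
# Two shared objects out of four objects in the union.
print(jaccard_similarity(a, b))  # 0.5
```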

class hidef.hidef_finder.ClusterGraph(incoming_graph_data=None, **attr)[source]

Extends the nx.Graph class; each node is a hidef_finder.Cluster object

add_clusters(resolution_graph, new_resolution)[source]

Add new clusters to cluster graph once a new resolution is finished by the CD algorithm

Parameters
  • resolution_graph (nx.Graph) – a graph in which nodes represent resolutions in the scan, and edges connect resolutions that are considered ‘neighbors’

  • new_resolution (float) – the resolution just visited by the CD algorithm

hidef.hidef_finder.collapse_cluster_graph(cluG, components, p=100)[source]

Take the cluster graph and collapse each component based on a consensus metric

Parameters
  • cluG (hidef_finder.ClusterGraph) –

  • components (list of list of int) – elements of the inner list are numbers to query a cluster in the cluster graph

  • p (int or float) – remove nodes that did not appear in more than p percent of the clusters in one component (the p parameter in the paper)

Returns

collapsed_clusters – binary numpy arrays indicating the members of each cluster after filtering by consensus

Return type

a list of np.array

See also

consensus
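The consensus filter described for p can be sketched on a single component. This is an illustration under stated assumptions, not the library code: `collapse_component` is a hypothetical helper, the component is given directly as a list of binary vectors, and the comparison is taken as "at least p percent" so that the default p=100 keeps objects present in every cluster.

```python
import numpy as np

def collapse_component(binaries, p=100):
    """Collapse the clusters of one component into a single binary vector.

    An object is kept when it appears in at least p percent of the clusters
    (assumption: inclusive comparison).
    """
    stacked = np.asarray(binaries, dtype=bool)  # shape: (n_clusters, n_objects)
    counts = stacked.sum(axis=0)                # clusters containing each object
    keep = counts * 100 >= p * stacked.shape[0]
    return keep.astype(int)

component = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
]
print(collapse_component(component, p=100))  # [1 1 0 0]
print(collapse_component(component, p=50))   # [1 1 1 0]
```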

hidef.hidef_finder.consensus(cluG, k=5, f=1.0, p=100)[source]

Create a more parsimonious set of results from the cluster graph

Parameters
  • cluG (hidef_finder.ClusterGraph) –

  • k (int) – delete components smaller than this threshold; the ‘chi’ parameter in the paper

  • f (float) – take this fraction of clusters (ordered by degree in cluster graph)

  • p (int) – nodes that do not participate in the majority of clusters in a component will be removed

Returns

cluG_collapsed_w_len – each array indicates the members of a cluster; each integer is the persistence of that cluster

Return type

a list of tuple (np.array, int)

hidef.hidef_finder.output_edges(wv, names, out, leaf=False, original_cluster_names=None)[source]

Output the hierarchy in the DDOT format; the 1st column is parents and the 2nd column is children. Note that this output is the ‘forward’ state.

Parameters
  • wv (weaver.Weaver) –

  • names (list of string) – a list of ordered gene names

  • out (string) – prefix of the output file

  • leaf (bool) – if True, then write genes into the result

  • original_cluster_names (list) – if not None, do not use the weaver renames, but use a list of specified names; should be equal to the number of clusters in “wv”

hidef.hidef_finder.output_gml(out)[source]

Write a GML file for the hierarchy.

Parameters

out (string) – prefix of the output file

hidef.hidef_finder.output_nodes(wv, names, out, extra_data=None, original_cluster_names=None)[source]

Write a .nodes file according to the weaver result. Four columns are: community names, sizes, member genes, and persistence

Parameters
  • wv (weaver.Weaver) –

  • names (list of string) – a list of ordered gene names

  • out (string) – prefix of the output file

  • extra_data (list of int) – a list of numbers to show on the last column of the output file

  • original_cluster_names (list) – if not None, do not use the weaver renames, but use a list of specified names; should be equal to the number of clusters in “wv”

hidef.hidef_finder.partition_to_membership_matrix(partition, minsize=4)[source]

Parameters
  • partition (class partition in the louvain-igraph package) –

  • minsize (int) – minimum size of clusters; smaller clusters will be deleted afterwards

Returns

C – a matrix recording the membership of each cluster

Return type

scipy.sparse.csr_matrix
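A membership matrix of this kind can be sketched from a plain list of cluster labels. This is a hedged illustration, not the hidef code: `membership_to_matrix` is a hypothetical name, the input is a non-empty list of integer labels (one per node) rather than a partition object, and the orientation (rows are clusters, columns are nodes) is an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix

def membership_to_matrix(membership, minsize=4):
    """Build a clusters-by-nodes 0/1 matrix and drop clusters below minsize."""
    membership = np.asarray(membership)
    n_nodes = membership.shape[0]
    n_clusters = int(membership.max()) + 1
    data = np.ones(n_nodes, dtype=int)
    # Entry (c, i) is 1 when node i belongs to cluster c.
    C = csr_matrix((data, (membership, np.arange(n_nodes))),
                   shape=(n_clusters, n_nodes))
    sizes = np.asarray(C.sum(axis=1)).ravel()
    # Keep only the rows (clusters) with at least minsize members.
    return C[np.flatnonzero(sizes >= minsize)]

# Cluster 0 has 4 nodes, cluster 1 has 2 nodes, cluster 2 has 5 nodes.
labels = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2]
C = membership_to_matrix(labels, minsize=4)
print(C.shape)  # (2, 11)
```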

hidef.hidef_finder.run(Gs, jaccard=0.75, sample=0.9, minres=0.01, maxres=10, alg='leiden', maxn=None, density=0.1, neighbors=10, numthreads=2, layer_weights=None)[source]

Main function to run the Finder program

Parameters
  • Gs (a list of igraph.Graph) – input network(s)

  • jaccard (float) – a value in the range 0.5-1.0; the cutoff for calling two clusters similar; the ‘tau’ parameter in the paper

  • sample (float) – parameter to perturb input network in each run by deleting edges; lower values delete more

  • minres (float) – minimum resolution parameter

  • maxres (float) – maximum resolution parameter

  • maxn (float) – explore the resolution parameter until the number of clusters is similar to this number; overrides ‘maxres’

  • alg (str) – can choose between ‘louvain’ or ‘leiden’

  • density (float) – inverse density of resolution parameter sampling. Use a smaller value to increase the sampling density (this will increase running time)

  • neighbors (int) – also affects sampling density; a larger value may have the additional benefit of stabilizing clustering results

  • bisect (deprecated) –

  • numthreads (int) – number of threads to run in parallel. The signature above sets the default to 2.

hidef.hidef_finder.run_alg(Gs, alg, gamma=1.0, sample=1.0, layer_weights=None)[source]

Run the community detection algorithm at a given resolution parameter. Currently only RB is used, in Louvain/Leiden.

Parameters
  • Gs (a list of igraph.Graph) –

  • alg (str) – choose between ‘louvain’ and ‘leiden’

  • gamma (float) – resolution parameter

  • sample (float) – if smaller than 1, randomly delete a fraction of edges each time

  • layer_weights (a list of float) – specifying layer weights in the multilayer setting

Returns

C – a matrix recording the membership of each cluster

Return type

scipy.sparse.csr_matrix
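The edge perturbation controlled by sample can be sketched on a plain edge list. This is a minimal sketch, assuming that sample is the fraction of edges retained (consistent with "lower values delete more" in run); the helper `sample_edges` and its `seed` argument are hypothetical and not part of hidef.

```python
import random

def sample_edges(edges, sample=0.9, seed=None):
    """Return a random subset holding roughly `sample` of the edges."""
    edges = list(edges)
    if sample >= 1:
        return edges  # no perturbation requested
    rng = random.Random(seed)
    n_keep = int(round(len(edges) * sample))
    return rng.sample(edges, n_keep)

edges = [(i, i + 1) for i in range(10)]
kept = sample_edges(edges, sample=0.9, seed=0)
print(len(kept))  # 9
```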

hidef.hidef_finder.update_resolution_graph(G, new_resolution, neighborhood_size, neighbor_density_threshold)[source]

Update the “resolution graph”, which connects resolutions that are close enough

Parameters
  • G (nx.Graph) – the “resolution graph”

  • new_resolution (float) – the resolution just visited by the CD algorithm

  • neighborhood_size (float) – if two resolutions (on a log scale) differ by less than this value, they are called ‘neighbors’

  • neighbor_density_threshold (int) – if a resolution has more neighbors than this number, it is called “padded”. No more sampling will happen between two padded resolutions
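The neighbor and "padded" rules can be sketched with networkx. This is an illustration only, not the library implementation: `add_resolution` is a hypothetical name, base-10 logarithms are an assumption (the documentation only says "log-scale"), and storing the padded flag as a node attribute is also an assumption.

```python
import math
import networkx as nx

def add_resolution(G, new_resolution, neighborhood_size, neighbor_density_threshold):
    """Add a resolution, link it to its log-scale neighbors, refresh 'padded' flags."""
    G.add_node(new_resolution, padded=False)
    for other in list(G.nodes):
        if other == new_resolution:
            continue
        # Neighbors: log-scale distance smaller than neighborhood_size.
        if abs(math.log10(other) - math.log10(new_resolution)) < neighborhood_size:
            G.add_edge(new_resolution, other)
    for node in G.nodes:
        # Padded: more neighbors than the threshold.
        G.nodes[node]["padded"] = G.degree(node) > neighbor_density_threshold
    return G

G = nx.Graph()
for gamma in [1.0, 1.1, 1.2, 10.0]:
    add_resolution(G, gamma, neighborhood_size=0.1, neighbor_density_threshold=1)

print(G.number_of_edges())     # 3
print(G.nodes[1.0]["padded"])  # True
print(G.nodes[10.0]["padded"]) # False
```

In this example the resolutions 1.0, 1.1 and 1.2 are mutual neighbors and each has two neighbors, so all three are padded, while 10.0 is isolated.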