Commit

code cells added, to be documented
evenmatencio committed May 31, 2024
1 parent f11f68e commit eec9efa
Showing 1 changed file with 207 additions and 8 deletions.
215 changes: 207 additions & 8 deletions docs/examples/graph-signal-cpd.ipynb
@@ -19,7 +19,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Illustration and objectives\n",
"### Illustration\n",
"\n",
"First, we briefly illustrate the behavior of the GFSS and we justify the usage of the cost function `CostGFSSL2`. For a formal definition, please see [CostGFSSL2](../user-guide/costs/costgfssl2.md). \n",
"\n",
@@ -51,9 +51,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As explained in the [introduction](#introduction), applying a low-pass filter to a graph signal shrinks the high-frequency components, thus attenuating the signal components that are not smooth with respect to the graph structure. Based on this statement, we deduce two potential (related) benefits of applying the cost `CostGFSSL2`:\n",
"### Objectives\n",
"\n",
"1. as in [[Ferrari2019](#Ferrari2019)], detecting mean changes that are localized on clusters of the graph only. Formally if we denote $m_t(i)$ the mean of the process at node $i$ and $\\mathcal{C}$ a well connected subset of $\\mathcal{V}$ (a cluster), one may try to detect $t_r$ such that:\n",
"As explained above, applying a low-pass filter to a graph signal shrinks the high-frequency components, thus attenuating the signal components that are not smooth with respect to the graph structure. Based on this statement, we deduce two potential (related) benefits of applying the cost `CostGFSSL2`:\n",
"\n",
"1. *Scenario 1*: as in [[Ferrari2019](#Ferrari2019)], detecting mean changes that are localized on clusters of the graph only. Formally if we denote $m_t(i)$ the mean of the process at node $i$ and $\\mathcal{C}$ a well connected subset of $\\mathcal{V}$ (a cluster), one may try to detect $t_r$ such that:\n",
"\n",
"$$\n",
"y_t = m_t + e_t \\quad \\text{ with } ~ m_t = \n",
@@ -68,7 +70,7 @@
"\\end{cases}\n",
"$$\n",
"\n",
"2. attenuating changes induced by spatially white noise (with high variance) or that may be due to individual dysfunctions of the observed system, for instance a geographical censors network, a social network..."
"2. *Scenario 2*: attenuating changes induced by spatially white noise (with high variance) or that may be due to individual dysfunctions of the observed system, for instance a geographical censors network, a social network... in graphs showing clusters."
]
},
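{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell is an optional, self-contained sketch: it is not part of the analysis below and builds its own toy two-cluster graph (all variable names in it are illustrative). It shows the frequency-domain intuition behind both scenarios: a mean shift supported on one cluster concentrates its energy on the low graph frequencies, whereas spatially white noise spreads its energy over the whole spectrum, so a low-pass filter such as the one underlying `CostGFSSL2` preserves the former and attenuates the latter."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: graph Fourier energy of a cluster-localized shift vs. white noise.\n",
"# Everything below is illustrative and independent from the rest of the notebook.\n",
"import networkx as nx\n",
"import numpy as np\n",
"from scipy.linalg import eigh\n",
"\n",
"toy_graph = nx.planted_partition_graph(l=2, k=10, p_in=0.9, p_out=0.02, seed=0)\n",
"toy_eigvals, toy_eigvects = eigh(nx.laplacian_matrix(toy_graph).toarray())\n",
"\n",
"rng = np.random.default_rng(0)\n",
"cluster_shift = np.zeros(20)\n",
"cluster_shift[:10] = 1.0 # mean shift supported on the first cluster (nodes 0..9)\n",
"white_noise = rng.normal(size=20) # spatially white perturbation\n",
"\n",
"# graph Fourier transform: projection onto the Laplacian eigenvectors\n",
"gft_shift = toy_eigvects.T @ cluster_shift\n",
"gft_noise = toy_eigvects.T @ white_noise\n",
"\n",
"low_freq = slice(0, 5) # the few smoothest graph frequencies\n",
"print(\n",
"    \"low-frequency energy fraction (cluster shift):\",\n",
"    np.sum(gft_shift[low_freq] ** 2) / np.sum(gft_shift ** 2),\n",
")\n",
"print(\n",
"    \"low-frequency energy fraction (white noise):\",\n",
"    np.sum(gft_noise[low_freq] ** 2) / np.sum(gft_noise ** 2),\n",
")"
]
},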
{
@@ -77,7 +79,7 @@
"source": [
"## Setup\n",
"\n",
"We generate a synthetic graph matching the above description and we define a signal over it."
"We generate a synthetic graph matching the above description, namely with a highly clusterized structure."
]
},
{
@@ -87,7 +89,9 @@
"outputs": [],
"source": [
"import ruptures as rpt # our package\n",
"import networkx as nx # for graph utils"
"import networkx as nx # for graph utils\n",
"\n",
"from ruptures.costs.costgfssl2 import CostGFSSL2 # the gfss based cost function"
]
},
{
@@ -102,7 +106,7 @@
"cluster_nb = 6\n",
"mean_cluster_size = 20\n",
"inter_density = 0.02 # density of inter-clusters edges\n",
"intra_density = 0.9 # density of intra-clusters edges\n",
"intra_density = 0.95 # density of intra-clusters edges\n",
"graph_seed = 9 # for reproducibility\n",
"G = nx.gaussian_random_partition_graph(\n",
" n=nb_nodes,\n",
@@ -112,7 +116,7 @@
" p_out=inter_density,\n",
" seed=graph_seed,\n",
")\n",
"coord = nx.spring_layout(G) # for plotting"
"coord = nx.spring_layout(G, seed=graph_seed) # for plotting consistency"
]
},
{
@@ -139,6 +143,196 @@
"nx.draw_networkx(G, pos=coord, with_labels=True, node_color=colors_l, ax=ax)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Robustness with respect to noise in Scenario 1\n",
"\n",
"We aim at detecting mean changes localized in a cluster only, as described in the scenario 1 presented in [the objectives](#objectives)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"here explain PELT & penalty coefficient & rho"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- to explain that we use PELT, assuming we do not know the number of ruptures\n",
"- say that we do not bother to correctly compute the penalty coefficient\n",
"- say that we look for a good cut-sparsityz"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.linalg import eigh"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"eigvals, eigvects = eigh(nx.laplacian_matrix(G).toarray())\n",
"print(eigvals)"
]
},
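{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional check (not required for the rest of the notebook), the sketch below looks at the shape of the spectrum we just computed: for a graph with well-separated clusters, the Laplacian typically exhibits roughly one small eigenvalue per cluster. This is what makes the spectrum-based choices of the cut-sparsity $\\rho$ used in the last experiment meaningful."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: visualize the Laplacian spectrum computed above.\n",
"import matplotlib.pyplot as plt\n",
"\n",
"fig, ax = plt.subplots(figsize=(6, 2))\n",
"ax.plot(eigvals, marker=\".\", linestyle=\"none\")\n",
"ax.set_xlabel(\"eigenvalue index\")\n",
"ax.set_ylabel(\"eigenvalue\")\n",
"ax.set_title(\"Laplacian spectrum of G (note the few small eigenvalues)\")\n",
"plt.show()"
]
},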
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## SIGNAL GENERATION\n",
"\n",
"n_dims = nb_nodes\n",
"dims_with_change = np.array([dim for dim in clusters[-1]])\n",
"n_dims_with_change = len(dims_with_change)\n",
"n_samples = 100\n",
"signal_seed = 10\n",
"\n",
"print(f\"Number of nodes: {nb_nodes}, \")\n",
"print(f\"The dimensions with changes are: {[dim for dim in dims_with_change]} \")\n",
"\n",
"bic_L2_pen = n_dims / 2 * np.log(n_samples)\n",
"pen = 2 * bic_L2_pen\n",
"\n",
"noise_std_values = [0.1, 1, 2, 3, 4]\n",
"\n",
"rho = 1\n",
"print(f\"We set the cut-sparsity rho = {rho}. \\n\")\n",
"\n",
"for noise_std in noise_std_values:\n",
"\n",
" signal_with_change, gt_bkps = rpt.pw_constant(\n",
" n_samples, n_dims_with_change, 1, noise_std=noise_std, seed=signal_seed\n",
" )\n",
" signal, _ = rpt.pw_constant(\n",
" n_samples, n_dims, 0, noise_std=noise_std, seed=signal_seed\n",
" )\n",
" signal[:, dims_with_change] = signal_with_change\n",
"\n",
" ## CHANGE POINT DETECTION\n",
"\n",
" # with graph and GFSS\n",
" laplacian_matrix = nx.laplacian_matrix(\n",
" G\n",
" ).toarray() # extract the laplacian matrix as an numpy array\n",
" cost_rpt_pelt = CostGFSSL2(laplacian_matrix, rho)\n",
" graph_algo = rpt.Pelt(custom_cost=cost_rpt_pelt, jump=1, min_size=1).fit(signal)\n",
" my_graph_bkps = graph_algo.predict(pen=pen)\n",
"\n",
" # without graph\n",
" algo = rpt.Pelt(model=\"l2\", jump=1, min_size=1).fit(signal)\n",
" my_bkps = algo.predict(pen=pen)\n",
"\n",
" print(\n",
" \"NOISE_STD=\",\n",
" noise_std,\n",
" \"\\tGROUNDTRUTH\",\n",
" gt_bkps,\n",
" \"\\tWITH GRAPH: \",\n",
" my_graph_bkps,\n",
" \"\\tWITHOUT GRAPH: \",\n",
" my_bkps,\n",
" )"
]
},
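{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond reading the breakpoint lists, one may quantify the agreement with the ground truth. The next cell is a small optional sketch using the `randindex` metric shipped with `ruptures`; it relies on the variables left in memory by the last iteration of the loop above (i.e. the highest noise level), which is a convenience assumption rather than part of the original analysis."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: Rand index between detected and ground-truth segmentations,\n",
"# computed on the last run of the loop above (highest noise level).\n",
"from ruptures.metrics import randindex\n",
"\n",
"print(\"Rand index, with graph (GFSS cost):\", randindex(gt_bkps, my_graph_bkps))\n",
"print(\"Rand index, without graph (L2 cost):\", randindex(gt_bkps, my_bkps))"
]
},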
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Influence of $\\rho$ in Scenario 2"
]
},
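{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running the experiment, the following optional sketch gives a feel for the role of $\\rho$. It assumes the usual GFSS filter response $h(\\lambda) = \\min(1, \\sqrt{\\rho / \\lambda})$ for $\\lambda > 0$ (the zero frequency being discarded); see [CostGFSSL2](../user-guide/costs/costgfssl2.md) for the exact definition used by the cost, and note that the helper `gfss_filter_response` below is purely illustrative. With this convention, $\\rho$ acts as a spectral cutoff: graph frequencies below $\\rho$ pass essentially unattenuated, while higher frequencies are shrunk."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: how many graph frequencies pass (almost) unattenuated for a given rho,\n",
"# assuming the GFSS filter response h(lambda) = min(1, sqrt(rho / lambda)) for lambda > 0.\n",
"import numpy as np\n",
"\n",
"\n",
"def gfss_filter_response(eigenvalues, rho):\n",
"    positive = eigenvalues[eigenvalues > 1e-12] # discard the zero frequency\n",
"    return np.minimum(1.0, np.sqrt(rho / positive))\n",
"\n",
"\n",
"for rho_candidate in [eigvals[1] / 10, eigvals[1], eigvals[-1]]:\n",
"    gains = gfss_filter_response(eigvals, rho_candidate)\n",
"    nearly_passing = int(np.sum(gains > 0.99))\n",
"    print(\n",
"        f\"rho = {rho_candidate:.3f}: {nearly_passing} / {gains.size} \"\n",
"        \"frequencies pass almost unattenuated\"\n",
"    )"
]
},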
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## SIGNAL GENERATION AND HYPER-PARAMETERS TUNING\n",
"\n",
"rng = np.random.default_rng(seed=10)\n",
"\n",
"# signal hyper-parameters\n",
"n_dims = nb_nodes # number of dimensions equals the number of nodes\n",
"n_dims_with_change = n_dims // 20 # number of nodes undergoing a mean change\n",
"dims_with_change = np.arange(n_dims)\n",
"rng.shuffle(dims_with_change) # randomizing the nodes with change\n",
"dims_with_change = dims_with_change[:n_dims_with_change]\n",
"\n",
"# building the signal itself\n",
"n_samples = 100\n",
"signal_seed = 10\n",
"noise_std = 1\n",
"\n",
"signal_with_change, gt_bkps = rpt.pw_constant(\n",
" n_samples, n_dims_with_change, 1, noise_std=noise_std, seed=signal_seed\n",
")\n",
"signal, _ = rpt.pw_constant(n_samples, n_dims, 0, noise_std=noise_std, seed=signal_seed)\n",
"signal[:, dims_with_change] = signal_with_change\n",
"\n",
"print(f\"The dimensions with changes are: {[dim for dim in dims_with_change]}\")\n",
"\n",
"# algorithm hyper-parameters\n",
"bic_L2_pen = n_dims / 2 * np.log(n_samples)\n",
"pen = 2 * bic_L2_pen\n",
"\n",
"rho_values = [\n",
" eigvals[1] / 10,\n",
" eigvals[1] / 2,\n",
" eigvals[1],\n",
" 2 * eigvals[1],\n",
" eigvals[60],\n",
" eigvals[-1],\n",
"]\n",
"\n",
"for rho in rho_values:\n",
"\n",
" ## CHANGE POINT DETECTION\n",
"\n",
" # with graph and GFSS\n",
" laplacian_matrix = nx.laplacian_matrix(\n",
" G\n",
" ).toarray() # extract the laplacian matrix as an numpy array\n",
" cost_rpt_pelt = CostGFSSL2(laplacian_matrix, rho)\n",
" graph_algo = rpt.Pelt(custom_cost=cost_rpt_pelt, jump=1, min_size=1).fit(signal)\n",
" my_graph_bkps = graph_algo.predict(pen=pen)\n",
"\n",
" # without graph\n",
" algo = rpt.Pelt(model=\"l2\", jump=1, min_size=1).fit(signal)\n",
" my_bkps = algo.predict(pen=pen)\n",
"\n",
" print(\n",
" \"RHO=\",\n",
" round(rho, ndigits=3),\n",
" \"\\tGROUNDTRUTH\",\n",
" gt_bkps,\n",
" \"\\tWITH GRAPH: \",\n",
" my_graph_bkps,\n",
" \"\\tWITHOUT GRAPH: \",\n",
" my_bkps,\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conclusions"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -158,6 +352,11 @@
"Ljubisa Stankovic, Danilo P. Mandic, Milos Dakovic, Ilia Kisil, Ervin Sejdic, and Anthony G. Constantinides (2019). Understanding the Basis of Graph Signal Processing via an Intuitive Example-Driven Approach [Lecture Notes]. IEEE Signal Processing Magazine, 36(6):133–145.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {