From eec9efaec094235988fd0c1d09241209b1fe44d8 Mon Sep 17 00:00:00 2001
From: evenmatencio
Date: Fri, 31 May 2024 14:45:10 +0200
Subject: [PATCH] code cells added, to be documented

---
 docs/examples/graph-signal-cpd.ipynb | 215 ++++++++++++++++++++++++++-
 1 file changed, 207 insertions(+), 8 deletions(-)

diff --git a/docs/examples/graph-signal-cpd.ipynb b/docs/examples/graph-signal-cpd.ipynb
index c0ccb892..ab050704 100644
--- a/docs/examples/graph-signal-cpd.ipynb
+++ b/docs/examples/graph-signal-cpd.ipynb
@@ -19,7 +19,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Illustration and objectives\n",
+    "### Illustration\n",
     "\n",
     "First, we briefly illustrate the behavior of the GFSS and we justify the usage of the cost function `CostGFSSL2`. For a formal definition, please see [CostGFSSL2](../user-guide/costs/costgfssl2.md). \n",
     "\n",
@@ -51,9 +51,11 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "As explained in the [introduction](#introduction), applying a low-pass filter to a graph signal shrinks the high-frequency components, thus attenuating the signal components that are not smooth with respect to the graph structure. Based on this statement, we deduce two potential (related) benefits of applying the cost `CostGFSSL2`:\n",
+    "### Objectives\n",
     "\n",
-    "1. as in [[Ferrari2019](#Ferrari2019)], detecting mean changes that are localized on clusters of the graph only. Formally if we denote $m_t(i)$ the mean of the process at node $i$ and $\mathcal{C}$ a well connected subset of $\mathcal{V}$ (a cluster), one may try to detect $t_r$ such that:\n",
+    "As explained above, applying a low-pass filter to a graph signal shrinks the high-frequency components, thus attenuating the signal components that are not smooth with respect to the graph structure. Based on this statement, we deduce two potential (related) benefits of applying the cost `CostGFSSL2`:\n",
+    "\n",
+    "1. *Scenario 1*: as in [[Ferrari2019](#Ferrari2019)], detecting mean changes that are localized on clusters of the graph only. Formally, if we denote $m_t(i)$ the mean of the process at node $i$ and $\mathcal{C}$ a well connected subset of $\mathcal{V}$ (a cluster), one may try to detect $t_r$ such that:\n",
     "\n",
     "$$\n",
     "y_t = m_t + e_t \quad \text{ with } ~ m_t = \n",
     "\begin{cases}\n",
@@ -68,7 +70,7 @@
     "\end{cases}\n",
     "$$\n",
     "\n",
-    "2. attenuating changes induced by spatially white noise (with high variance) or that may be due to individual dysfunctions of the observed system, for instance a geographical censors network, a social network..."
+    "2. *Scenario 2*: in graphs showing clusters, attenuating changes induced by spatially white noise (with high variance) or by individual dysfunctions of the observed system, for instance in a geographical sensor network or a social network."
    ]
   },
   {
@@ -77,7 +79,7 @@
    "source": [
     "## Setup\n",
     "\n",
-    "We generate a synthetic graph matching the above description and we define a signal over it."
+    "We generate a synthetic graph matching the above description, namely with a highly clustered structure."
    ]
   },
   {
@@ -87,7 +89,9 @@
    "outputs": [],
    "source": [
     "import ruptures as rpt # our package\n",
-    "import networkx as nx # for graph utils"
+    "import networkx as nx # for graph utils\n",
+    "\n",
+    "from ruptures.costs.costgfssl2 import CostGFSSL2 # the GFSS-based cost function"
    ]
   },
   {
@@ -102,7 +106,7 @@
     "cluster_nb = 6\n",
     "mean_cluster_size = 20\n",
     "inter_density = 0.02 # density of inter-clusters edges\n",
-    "intra_density = 0.9 # density of intra-clusters edges\n",
+    "intra_density = 0.95 # density of intra-clusters edges\n",
     "graph_seed = 9 # for reproducibility\n",
     "G = nx.gaussian_random_partition_graph(\n",
     "    n=nb_nodes,\n",
@@ -112,7 +116,7 @@
     "    p_out=inter_density,\n",
     "    seed=graph_seed,\n",
     ")\n",
-    "coord = nx.spring_layout(G) # for plotting"
+    "coord = nx.spring_layout(G, seed=graph_seed) # for plotting consistency"
    ]
   },
   {
@@ -139,6 +143,196 @@
     "nx.draw_networkx(G, pos=coord, with_labels=True, node_color=colors_l, ax=ax)"
    ]
   },
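+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check, the sketch below prints the composition of each cluster (it assumes that the `clusters` variable built above together with the node colors is a list of node lists). In Scenario 1 below, only the nodes of the last cluster, `clusters[-1]`, undergo a mean change."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# composition of the clusters: Scenario 1 applies a mean change to the last cluster only\n",
+    "for cluster_id, cluster_nodes in enumerate(clusters):\n",
+    "    print(f\"cluster {cluster_id}: {sorted(cluster_nodes)}\")"
+   ]
+  },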
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Robustness with respect to noise in Scenario 1\n",
+    "\n",
+    "We aim at detecting mean changes that are localized in a single cluster, as described in Scenario 1 of [the objectives](#objectives)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We now specify the detection procedure: the search method (PELT), the penalty coefficient and the cut-sparsity $\rho$."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "- We use the PELT search method, assuming that the number of change points is unknown.\n",
+    "- We do not try to finely tune the penalty coefficient: we simply set it to twice the BIC penalty used with the `l2` cost.\n",
+    "- We look for a good value of the cut-sparsity $\rho$, which is naturally compared to the eigenvalues of the graph Laplacian computed below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from scipy.linalg import eigh"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "eigvals, eigvects = eigh(nx.laplacian_matrix(G).toarray())\n",
+    "print(eigvals)"
+   ]
+  },
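+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The optional cell below is a minimal sketch (it only assumes `matplotlib`, already used for the graph drawing above): it plots the sorted Laplacian eigenvalues. For a strongly clustered graph, one expects roughly one very small eigenvalue per cluster, which gives a visual hint about where to place the cut-sparsity $\rho$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt  # most likely already imported for the graph drawing\n",
+    "\n",
+    "# scipy's eigh returns the eigenvalues in ascending order: the number of very small\n",
+    "# eigenvalues roughly matches the number of well-separated clusters\n",
+    "fig, ax = plt.subplots(figsize=(8, 3))\n",
+    "ax.plot(eigvals, \"o\", markersize=3)\n",
+    "ax.set_xlabel(\"eigenvalue index\")\n",
+    "ax.set_ylabel(\"eigenvalue\")\n",
+    "ax.set_title(\"Spectrum of the graph Laplacian\")"
+   ]
+  },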
\\n\")\n", + "\n", + "for noise_std in noise_std_values:\n", + "\n", + " signal_with_change, gt_bkps = rpt.pw_constant(\n", + " n_samples, n_dims_with_change, 1, noise_std=noise_std, seed=signal_seed\n", + " )\n", + " signal, _ = rpt.pw_constant(\n", + " n_samples, n_dims, 0, noise_std=noise_std, seed=signal_seed\n", + " )\n", + " signal[:, dims_with_change] = signal_with_change\n", + "\n", + " ## CHANGE POINT DETECTION\n", + "\n", + " # with graph and GFSS\n", + " laplacian_matrix = nx.laplacian_matrix(\n", + " G\n", + " ).toarray() # extract the laplacian matrix as an numpy array\n", + " cost_rpt_pelt = CostGFSSL2(laplacian_matrix, rho)\n", + " graph_algo = rpt.Pelt(custom_cost=cost_rpt_pelt, jump=1, min_size=1).fit(signal)\n", + " my_graph_bkps = graph_algo.predict(pen=pen)\n", + "\n", + " # without graph\n", + " algo = rpt.Pelt(model=\"l2\", jump=1, min_size=1).fit(signal)\n", + " my_bkps = algo.predict(pen=pen)\n", + "\n", + " print(\n", + " \"NOISE_STD=\",\n", + " noise_std,\n", + " \"\\tGROUNDTRUTH\",\n", + " gt_bkps,\n", + " \"\\tWITH GRAPH: \",\n", + " my_graph_bkps,\n", + " \"\\tWITHOUT GRAPH: \",\n", + " my_bkps,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Influence of $\\rho$ in Scenario 2" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "## SIGNAL GENERATION AND HYPER-PARAMETERS TUNING\n", + "\n", + "rng = np.random.default_rng(seed=10)\n", + "\n", + "# signal hyper-parameters\n", + "n_dims = nb_nodes # number of dimensions equals the number of nodes\n", + "n_dims_with_change = n_dims // 20 # number of nodes undergoing a mean change\n", + "dims_with_change = np.arange(n_dims)\n", + "rng.shuffle(dims_with_change) # randomizing the nodes with change\n", + "dims_with_change = dims_with_change[:n_dims_with_change]\n", + "\n", + "# building the signal itself\n", + "n_samples = 100\n", + "signal_seed = 10\n", + "noise_std = 1\n", + "\n", + "signal_with_change, gt_bkps = rpt.pw_constant(\n", + " n_samples, n_dims_with_change, 1, noise_std=noise_std, seed=signal_seed\n", + ")\n", + "signal, _ = rpt.pw_constant(n_samples, n_dims, 0, noise_std=noise_std, seed=signal_seed)\n", + "signal[:, dims_with_change] = signal_with_change\n", + "\n", + "print(f\"The dimensions with changes are: {[dim for dim in dims_with_change]}\")\n", + "\n", + "# algorithm hyper-parameters\n", + "bic_L2_pen = n_dims / 2 * np.log(n_samples)\n", + "pen = 2 * bic_L2_pen\n", + "\n", + "rho_values = [\n", + " eigvals[1] / 10,\n", + " eigvals[1] / 2,\n", + " eigvals[1],\n", + " 2 * eigvals[1],\n", + " eigvals[60],\n", + " eigvals[-1],\n", + "]\n", + "\n", + "for rho in rho_values:\n", + "\n", + " ## CHANGE POINT DETECTION\n", + "\n", + " # with graph and GFSS\n", + " laplacian_matrix = nx.laplacian_matrix(\n", + " G\n", + " ).toarray() # extract the laplacian matrix as an numpy array\n", + " cost_rpt_pelt = CostGFSSL2(laplacian_matrix, rho)\n", + " graph_algo = rpt.Pelt(custom_cost=cost_rpt_pelt, jump=1, min_size=1).fit(signal)\n", + " my_graph_bkps = graph_algo.predict(pen=pen)\n", + "\n", + " # without graph\n", + " algo = rpt.Pelt(model=\"l2\", jump=1, min_size=1).fit(signal)\n", + " my_bkps = algo.predict(pen=pen)\n", + "\n", + " print(\n", + " \"RHO=\",\n", + " round(rho, ndigits=3),\n", + " \"\\tGROUNDTRUTH\",\n", + " gt_bkps,\n", + " \"\\tWITH GRAPH: \",\n", + " my_graph_bkps,\n", + " \"\\tWITHOUT GRAPH: \",\n", + " my_bkps,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "### Conclusions" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -158,6 +352,11 @@ "Ljubisa Stankovic, Danilo P. Mandic, Milos Dakovic, Ilia Kisil, Ervin Sejdic, and Anthony G. Constantinides (2019). Understanding the Basis of Graph Signal Processing via an Intuitive Example-Driven Approach [Lecture Notes]. IEEE Signal Processing Magazine, 36(6):133–145.\n", "\n" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] } ], "metadata": {