Add a few pages to user guide

lverweijen · Jan 18, 2025 · ee8d393
1 parent 37baf23
commit ee8d393
Showing 13 changed files with 316 additions and 65 deletions.
diff --git a/docs/source/apriori.md b/docs/source/apriori.md
@@ -0,0 +1,35 @@
+# Apriori
+
+Apriori files can be used to control how TauArgus suppresses cells.
+These files can mark individual cells as safe or protected, or change their suppression cost or protection level.
+
+## Use an existing file
+
+```python
+import piargus as pa
+
+apriori = pa.Apriori.from_hst("apriori.hst")
+```
+
+## Create an apriori file programmatically
+
+```python
+import piargus as pa
+
+apriori = pa.Apriori(expand_trivial=True)
+apriori.change_status(['A', 'ExampleDam'], pa.SAFE)
+apriori.change_status(['A', 'ExampleCity'], pa.SAFE)
+apriori.change_cost(['C', 'ExampleDam'], 10)
+apriori.change_protection_level(['C', 'ExampleCity'], 5)
+
+apriori.to_hst("apriori.hst")
+```
+
+## Attaching apriori to a table
+
+Simply pass it as a parameter when creating a `Table` or `TableData` instance:
+
+```python
+table = pa.Table(['symbol', 'regio'], 'income', ...,
+                 apriori=apriori)
+```
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -40,7 +40,7 @@
 html_theme = 'sphinx_rtd_theme'
 html_static_path = ['_static']
 
-myst_enable_extensions = ["colon_fence"]
+myst_enable_extensions = ["colon_fence", "dollarmath"]
 
 # Code is in src. Make sure sphinx can find it
 import sys

diff --git a/docs/source/hierarchies.md b/docs/source/hierarchies.md
@@ -1,12 +1,12 @@
 # Hierarchies #
 
-For explanatory variables, it is recommended to supply a hierarchy.
-There are 3 kinds of hierarchy supported by PiArgus.
+Hierarchies are important for explanatory variables.
+There are three types supported by PiArgus.
 
 ## FlatHierarchy ##
 
-This is the default if no hierarchy is supplied.
-All labels are of the same level with a single total.
+The `FlatHierarchy` is used by default if no hierarchy is specified.
+All codes add up to a single total.
 
 ```python
 import piargus as pa
@@ -15,6 +15,8 @@ datacol = ["A", "B", "C", "B", "A"]
 hierarchy = pa.FlatHierarchy(total_code="Total")
 ```
 
+This creates a simple structure where all values are aggregated into one total.
+
 ```{mermaid}
 graph LR;
 Total --> A;
@@ -24,7 +26,7 @@ Total --> C;
 
 ## LevelHierarchy ##
 
-A level hierarchy is useful when the hierarchy is encoded within the code itself.
+A `LevelHierarchy` is used when the hierarchy is encoded in the codes themselves, with fixed-width groups of characters marking each level.
 
 ```python
 import piargus as pa
@@ -33,6 +35,9 @@ datacol = ["11123", "11234", "23456"]
 hierarchy = pa.LevelHierarchy(levels=[2, 3], total_code="Total")
 ```
 
+In this example, the first two digits represent a higher-level grouping,
+and the next three digits represent a more detailed level within that group.
+
 ```{mermaid}
 graph LR;
 Total --> 11;
@@ -44,8 +49,7 @@ Total --> 23;
 
 ## TreeHierarchy ##
 
-For complex hierarchies, a TreeHierarchy can be used.
-These are typically stored in a hrc-file.
+A `TreeHierarchy` is used for complex hierarchies, typically stored in `.hrc` files.
 
 ```python
 import piargus as pa
@@ -54,6 +58,8 @@ datacol = ["PV20", "PV21", "PV22"]
 hierarchy = pa.TreeHierarchy.from_hrc("provinces.hrc", total_code="NL01")
 ```
 
+These hierarchies have a tree-like structure.
+
 ```{mermaid}
 graph LR;
 NL01 --> LD01;
@@ -63,22 +69,30 @@ LD01 --> PV21;
 LD02 --> PV22;
 ```
 
-The file provinces.hrc may look like this:
-```hrc
-LD01
-@PV20
-@PV21
-LD02
-@PV22
-```
+### Creating a TreeHierarchy programmatically
+
+You can also create a TreeHierarchy programmatically, without relying on an external `.hrc` file.
 
-It can also be created programmatically:
 ```python
 import piargus as pa
 
 hierarchy = pa.TreeHierarchy(total_code="NL01")
-hierarchy.create_node(["NL01", "LD01", "PV20"])
-hierarchy.create_node(["NL01", "LD01", "PV21"])
-hierarchy.create_node(["NL01", "LD02", "PV22"])
+hierarchy.create_node(["LD01", "PV20"])
+hierarchy.create_node(["LD01", "PV21"])
+hierarchy.create_node(["LD02", "PV22"])
 hierarchy.to_hrc('provinces.hrc')
 ```
+
+## Attaching a hierarchy to input data
+
+To apply a hierarchy to your data, pass it in the `hierarchies` argument of the
+`MicroData` or `TableData` constructor:
+
+```python
+import piargus as pa
+
+pa.MicroData(data_df, ...,
+             hierarchies={"region": region_hierarchy})
+```
+
+This will apply the specified `region_hierarchy` to the `region` column in your data.
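As a reading aid for the `LevelHierarchy` description in hierarchies.md above, the following minimal sketch shows how `levels=[2, 3]` cuts each code into nested prefixes. It is plain Python and does not use piargus; `split_levels` is a hypothetical helper written for this illustration, not part of the library, and it assumes every code has exactly `sum(levels)` characters.

```python
from itertools import accumulate


def split_levels(code, levels):
    """Return the code's prefixes, one per level, from coarsest to finest."""
    if len(code) != sum(levels):
        raise ValueError(f"{code!r} does not have {sum(levels)} characters")
    # accumulate([2, 3]) yields the cut positions 2 and 5.
    return [code[:end] for end in accumulate(levels)]


datacol = ["11123", "11234", "23456"]
for code in datacol:
    # Prepend the total code to show the full path from the root.
    print(" --> ".join(["Total", *split_levels(code, levels=[2, 3])]))
```

Each printed path runs from `Total` through the two-digit group down to the full five-digit code, for example `Total --> 11 --> 11123`.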
diff --git a/docs/source/installation.rst b/docs/source/installation.rst
@@ -3,13 +3,13 @@ Installation
 
 Download and install the latest version of `τ-ARGUS <https://github.com/sdcTools/tauargus/releases>`_.
 Make sure to setup the location of the program on your path.
-For example (powershell):
+For example, in PowerShell:
 
 .. code-block:: powershell
 
     $env:path += ";\Path\To\Folder\Containing\TauArgus\Program"  # Please adapt locally to put your own path here
 
-Use `pip <https://pip.pypa.io/en/stable/getting-started/>`_ to install piargus.
+Next, use `pip <https://pip.pypa.io/en/stable/getting-started/>`_ to install piargus.
 
 .. code-block:: powershell
 

diff --git a/docs/source/result.md b/docs/source/result.md
@@ -0,0 +1,88 @@
+# Result analysis #
+
+## Working directory
+
+A `Job` accepts an optional `directory` argument.
+When provided, all temporary files and output tables will be created and stored in the specified location.
+After the job completes, this directory can be inspected for further analysis.
+
+```python
+import piargus as pa
+
+job = pa.Job(directory="argus_workdir")
+```
+
+If no directory is provided, a temporary directory will be created automatically.
+This directory will be cleaned up once the job is finished.
+
+## Report
+
+Running TauArgus returns a report that can be printed.
+Printing it displays all output written to the logbook.
+
+```python
+import piargus as pa
+
+tau = pa.TauArgus()
+report = tau.run(job)
+print(report)
+```
+
+## Table result
+
+The resulting tables can be loaded from the `Table` specification after the job has run.
+
+```python
+import piargus as pa
+
+table_spec = pa.Table(...)
+
+job = pa.Job(inputdata, [table_spec])
+
+try:
+    tau.run(job)
+except pa.TauArgusException as err:
+    print("An error occurred:")
+    print(err.result)
+else:
+    print("Job completed successfully")
+    table_result = table_spec.load_result()
+```
+
+### TableResult methods
+
+The `TableResult` object provides three key methods:
+
+```python
+table_result.unsafe()
+table_result.safe(unsafe_marker='X')
+table_result.status()
+```
+
+Each of these methods returns a Pandas [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html).
+The index is a multi-index containing the explanatory variables.
+You can reshape the result into a preferred format using Pandas methods like `stack`, `unstack`, and `swaplevel`.
+
+#### `unsafe()`
+
+This returns the aggregated data in unprotected form.
+
+#### `safe(unsafe_marker='X')`
+
+This returns the aggregated data in its protected form.
+Unsafe cells are replaced by a marker, `X` by default.
+Because a string marker converts the resulting `pd.Series` to a string data type, pass `pd.NA` or a numeric dummy value instead to keep the result numeric.
+
+#### `status()`
+
+This method returns the safety status of each cell as a `pd.Series`.
+
+The following status codes are used:
+
+| Code | Meaning          |
+|------|------------------|
+| S    | Safe             |
+| P    | Protected        |
+| U    | Primary unsafe   |
+| M    | Secondary unsafe |
+| Z    | Empty            |
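The status codes listed in result.md above can be tallied with a few lines of plain Python. This is only a sketch: `summarize_status` and the example values are hypothetical, and the mapping is copied from the table rather than taken from piargus. Since iterating a pandas `Series` yields its values, passing `table_result.status()` instead of the example list should work the same way, but that is an assumption not checked here.

```python
from collections import Counter

# Meanings copied from the status table in result.md.
STATUS_MEANING = {
    "S": "Safe",
    "P": "Protected",
    "U": "Primary unsafe",
    "M": "Secondary unsafe",
    "Z": "Empty",
}


def summarize_status(status_codes):
    """Count cells per status, keyed by meaning, in the table's order.

    Codes that are not in STATUS_MEANING are ignored.
    """
    counts = Counter(status_codes)
    return {
        meaning: counts[code]
        for code, meaning in STATUS_MEANING.items()
        if counts[code] > 0
    }


# Hypothetical status values standing in for table_result.status().
example = ["S", "S", "U", "M", "S", "Z"]
print(summarize_status(example))
```

A summary like this gives a quick impression of how many cells were lost to primary and secondary suppression before inspecting the full table.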
diff --git a/docs/source/suppression.md b/docs/source/suppression.md
@@ -0,0 +1,55 @@
+# Safety methods and suppression #
+
+TauArgus performs two kinds of suppression:
+
+1. Primary suppression suppresses cells that violate safety rules.
+2. Secondary suppression suppresses cells to protect other cells.
+
+## Safety rules ##
+
+Cells directly violating one of these rules are suppressed during primary suppression.
+
+| Rule                    | Meaning                           |
+|-------------------------|-----------------------------------|
+| pa.percent_rule(p, n)   | $p\%$-rule                        |
+| pa.dominance_rule(n, k) | $N,K$ dominance rule              |
+| pa.frequency_rule(n)    | Every cell needs $n$ contributors |
+
+## Suppression methods ##
+
+Methods for secondary suppression aim to minimize the suppression cost while protecting the data.
+
+| Method       | Description                                       | Optimality | Speed  |
+|--------------|---------------------------------------------------|------------|--------|
+| `pa.OPTIMAL` | Minimizes suppression costs                       | High       | Slow   |
+| `pa.MODULAR` | Protects sub-tables first and combines the result | Medium     | Medium |
+| `pa.GH`      | Hypercube method                                  | Low        | Fast   |
+
+## Specifying rules ##
+
+Safety rules can be set for individual observations.
+If some of the observations belong to the same unit, a safety rule can also be set at the holding level.
+In that case the microdata should have a `holding` column.
+If there is no holding information, safety rules can only be set at the individual level (per cell).
+Suppression methods are also set per table.
+
+```python
+import piargus as pa
+
+table = pa.Table(explanatory, response, ...,
+                 safety_rule={"individual": pa.percent_rule(20),
+                              "holding": pa.percent_rule(30)},
+                 suppress_method=pa.MODULAR)
+```
+
+If there are multiple linked tables, a suppression method for the linked tables can also be set on the job:
+
+```python
+job = pa.Job(tables, ...,
+             linked_suppress_method=pa.MODULAR)
+```
+
+## Disclaimer
+
+For a more official and theoretical explanation of suppression in τ-ARGUS, please consult the [τ-ARGUS manual](https://research.cbs.nl/casc/Software/TauManualV4.1.pdf).
+This page is meant as a practical overview, but is not authoritative.
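To make the three safety rules from suppression.md above more concrete, here is a minimal sketch of their usual textbook definitions in plain Python. It is not how piargus or TauArgus implements them: the function names are hypothetical, contributions are assumed to be non-negative, and the p% rule is shown only in its basic form, where the second-largest contributor tries to estimate the largest (the second argument of `pa.percent_rule(p, n)` is not modelled). The manual linked in the disclaimer remains the authoritative source.

```python
def violates_frequency_rule(contributions, n):
    """Frequency rule: a cell needs at least n contributors."""
    return len(contributions) < n


def violates_dominance_rule(contributions, n, k):
    """(n, k) dominance: the n largest contributions exceed k% of the cell total."""
    largest = sorted(contributions, reverse=True)[:n]
    # Integer arithmetic avoids rounding issues with percentages.
    return 100 * sum(largest) > k * sum(contributions)


def violates_percent_rule(contributions, p):
    """p% rule: the rest of the cell is smaller than p% of the largest contribution."""
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False
    # What remains after removing the two largest contributions.
    remainder = sum(ordered[2:])
    return 100 * remainder < p * ordered[0]


cell = [80, 10, 6, 4]  # hypothetical contributions to a single cell
print(violates_frequency_rule(cell, n=3))        # False: four contributors
print(violates_dominance_rule(cell, n=1, k=70))  # True: 80 is more than 70% of 100
print(violates_percent_rule(cell, p=20))         # True: remainder 10 is below 20% of 80
```

In this example the cell passes the frequency rule but is dominated by a single contributor, so it would be a candidate for primary suppression under either of the other two rules.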
diff --git a/docs/source/tauargus.md b/docs/source/tauargus.md
@@ -0,0 +1,54 @@
+# TauArgus
+
+The `TauArgus` class wraps the `tauargus.exe` program.
+You can either add the directory containing `tauargus.exe` to your `PATH` environment variable or pass the executable’s location as follows:
+
+```python
+import piargus as pa
+
+tau = pa.TauArgus(r"C:\path\to\tauargus.exe")
+```
+
+To test the setup:
+
+```python
+print("Tau:", tau.version_info())
+```
+
+## Running jobs
+
+If you have created a job, it can be run as follows:
+
+```python
+job = pa.Job(...)
+tau.run(job)
+```
+
+Multiple jobs can be run at the same time by passing them as a list:
+
+```python
+tau.run([job1, job2, ...])
+```
+
+## Running batch files
+
+If you have created a batch file, it can be run as follows:
+
+```python
+tau.run("myjob.arb")
+```
+
+To simplify the creation of batch files, `BatchWriter` may be used.
+
+```python
+with open("myjob.arb", "w") as output_file:
+    batch_writer = pa.BatchWriter(output_file)
+
+    batch_writer.open_microdata("microdata.csv")
+    batch_writer.open_metadata("metadata.rda")
+    batch_writer.specify_table(["explanatory1", "explanatory2"], "response")
+    batch_writer.safety_rule(individual="NK(3, 70)")
+    batch_writer.read_microdata()
+    batch_writer.suppress("MOD")
+    batch_writer.write_table(1, 2, "AS+", "protected.csv")
+```
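The two ways of locating the executable mentioned at the top of tauargus.md (via `PATH`, or via an explicit location) can be combined in a small helper. This is a sketch using only the standard library; `find_tauargus`, the default name `"tauargus"`, and the fallback path are assumptions made for illustration and are not part of piargus.

```python
import shutil


def find_tauargus(name="tauargus", fallback=None):
    """Return the executable's full path from PATH, or `fallback` if it is not found."""
    # On Windows, shutil.which also tries the extensions in PATHEXT such as .exe.
    return shutil.which(name) or fallback


# Hypothetical fallback location; adapt it to the local installation.
location = find_tauargus(fallback=r"C:\path\to\tauargus.exe")
print("TauArgus executable:", location)
```

The resulting `location` could then be passed to `pa.TauArgus(location)`, which makes a misconfigured `PATH` visible before any job is run.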