Add a few pages to user guide
Verweijen committed Jan 18, 2025
1 parent 37baf23 commit ee8d393
Showing 13 changed files with 316 additions and 65 deletions.
35 changes: 35 additions & 0 deletions docs/source/apriori.md
@@ -0,0 +1,35 @@
# Apriori

Apriori files can be used to control how TauArgus suppresses cells.
These files can mark individual cells as protected or safe, or modify their suppression cost.

## Use an existing file

```python
import piargus as pa

apriori = pa.Apriori.from_hst("apriori.hst")
```

## Create an apriori file programmatically

```python
import piargus as pa

apriori = pa.Apriori(expand_trivial=True)
apriori.change_status(['A', 'ExampleDam'], pa.SAFE)
apriori.change_status(['A', 'ExampleCity'], pa.SAFE)
apriori.change_cost(['C', 'ExampleDam'], 10)
apriori.change_protection_level(['C', 'ExampleCity'], 5)

apriori.to_hst("apriori.hst")
```

## Attaching apriori to a table

Simply pass it as a parameter when creating a `Table` or `TableData` instance:

```python
table = pa.Table(['symbol', 'regio'], 'income', ...,
                 apriori=apriori)
```
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -40,7 +40,7 @@
html_theme = 'sphinx_rtd_theme'
html_static_path = ['_static']

myst_enable_extensions = ["colon_fence"]
myst_enable_extensions = ["colon_fence", "dollarmath"]

# Code is in src. Make sure sphinx can find it
import sys
52 changes: 33 additions & 19 deletions docs/source/hierarchies.md
@@ -1,12 +1,12 @@
# Hierarchies #

For explanatory variables, it is recommended to supply a hierarchy.
There are 3 kinds of hierarchy supported by PiArgus.
Hierarchies are important for explanatory variables.
There are three types supported by PiArgus.

## FlatHierarchy ##

This is the default if no hierarchy is supplied.
All labels are of the same level with a single total.
The `FlatHierarchy` is used by default if no hierarchy is specified.
All codes add up to a single total.

```python
import piargus as pa
@@ -15,6 +15,8 @@ datacol = ["A", "B", "C", "B", "A"]
hierarchy = pa.FlatHierarchy(total_code="Total")
```

This creates a simple structure where all values are aggregated into one total.

```{mermaid}
graph LR;
Total --> A;
@@ -24,7 +26,7 @@ Total --> C;

## LevelHierarchy ##

A level hierarchy is useful when the hierarchy is encoded within the code itself.
A `LevelHierarchy` is used when the hierarchical relationships are encoded directly within the data.

```python
import piargus as pa
@@ -33,6 +35,9 @@ datacol = ["11123", "11234", "23456"]
hierarchy = pa.LevelHierarchy(levels=[2, 3], total_code="Total")
```

In this example, the first two digits represent a higher-level grouping,
and the next 3 digits represent a more detailed level within that group.
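
To make the `levels=[2, 3]` setting concrete, the sketch below splits each code into its cumulative prefixes. The helper `split_levels` is hypothetical and only illustrates the idea; it is not part of PiArgus and makes no claim about how PiArgus handles levels internally.

```python
def split_levels(code, levels):
    """Return the cumulative prefix of `code` at each level width."""
    prefixes = []
    position = 0
    for width in levels:
        position += width
        prefixes.append(code[:position])
    return prefixes


# With levels=[2, 3], the first 2 characters form the group
# and the first 2 + 3 characters form the detailed code.
for code in ["11123", "11234", "23456"]:
    print(code, "->", split_levels(code, [2, 3]))
```

Each code therefore belongs to a two-character group, which in turn rolls up into the total.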

```{mermaid}
graph LR;
Total --> 11;
@@ -44,8 +49,7 @@ Total --> 23;

## TreeHierarchy ##

For complex hierarchies, a TreeHierarchy can be used.
These are typically stored in a hrc-file.
A `TreeHierarchy` is used for complex hierarchies, typically stored in `.hrc` files.

```python
import piargus as pa
@@ -54,6 +58,8 @@ datacol = ["PV20", "PV21", "PV22"]
hierarchy = pa.TreeHierarchy.from_hrc("provinces.hrc", total_code="NL01")
```

These hierarchies have a tree-like structure.

```{mermaid}
graph LR;
NL01 --> LD01;
@@ -63,22 +69,30 @@ LD01 --> PV21;
LD02 --> PV22;
```

The file provinces.hrc may look like this:
```hrc
LD01
@PV20
@PV21
LD02
@PV22
```
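
As a rough illustration of how the `@` prefixes encode depth, the following sketch turns the text above into parent/child pairs. The function `parse_hrc` is a hypothetical helper written for this page, not the PiArgus reader. It assumes that each leading `@` adds one level and that the file never skips a level.

```python
def parse_hrc(text, total_code, indent="@"):
    """Return (parent, child) pairs from .hrc-style text (illustrative only)."""
    edges = []
    ancestors = [total_code]  # ancestors[d] is the most recent node at depth d
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        code = line.lstrip(indent)
        depth = len(line) - len(code) + 1  # codes without a prefix sit at depth 1
        del ancestors[depth:]              # drop nodes that are no longer ancestors
        edges.append((ancestors[depth - 1], code))
        ancestors.append(code)
    return edges


hrc_text = "LD01\n@PV20\n@PV21\nLD02\n@PV22\n"
for parent, child in parse_hrc(hrc_text, total_code="NL01"):
    print(parent, "-->", child)
```

The printed pairs correspond to the arrows in the diagram above.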
### Creating a TreeHierarchy programmatically

You can also create a TreeHierarchy programmatically, without relying on an external `.hrc` file.

It can also be created programmatically:
```python
import piargus as pa

hierarchy = pa.TreeHierarchy(total_code="NL01")
hierarchy.create_node(["NL01", "LD01", "PV20"])
hierarchy.create_node(["NL01", "LD01", "PV21"])
hierarchy.create_node(["NL01", "LD02", "PV22"])
hierarchy.create_node(["LD01", "PV20"])
hierarchy.create_node(["LD01", "PV21"])
hierarchy.create_node(["LD02", "PV22"])
hierarchy.to_hrc('provinces.hrc')
```

## Attaching a hierarchy to input data

To apply a hierarchy to your data, simply pass the hierarchy as part of the
`MicroData` or `TableData` constructor:

```python
import piargus as pa

pa.MicroData(data_df, ...,
             hierarchies={"region": region_hierarchy})
```

This will apply the specified `region_hierarchy` to the `region` column in your data.
4 changes: 2 additions & 2 deletions docs/source/installation.rst
@@ -3,13 +3,13 @@ Installation

Download and install the latest version of `τ-ARGUS <https://github.com/sdcTools/tauargus/releases>`_.
Make sure to add the location of the program to your ``PATH``.
For example (powershell):
For example, in PowerShell:

.. code-block:: powershell

    $env:path += ";\Path\To\Folder\Containing\TauArgus\Program"  # Please adapt locally to put your own path here

Use `pip <https://pip.pypa.io/en/stable/getting-started/>`_ to install piargus.
Next, use `pip <https://pip.pypa.io/en/stable/getting-started/>`_ to install piargus.

.. code-block:: powershell
88 changes: 88 additions & 0 deletions docs/source/result.md
@@ -0,0 +1,88 @@
# Result analysis #

## Working directory

A `Job` accepts an optional `directory` argument.
When provided, all temporary files and output tables will be created and stored in the specified location.
After the job completes, this directory can be inspected for further analysis.

```python
import piargus as pa

job = pa.Job(directory="argus_workdir")
```

If no directory is provided, a temporary directory will be created automatically.
This directory will be cleaned up once the job is finished.

## Report

When TauArgus is run, it returns a report that can be printed.
The report displays all output written to the logbook.

```python
import piargus as pa

tau = pa.TauArgus()
report = tau.run(job)
print(report)
```

## Table result

After the job has run, the resulting table can be loaded from the `Table` specification.

```python
import piargus as pa

table_spec = pa.Table(...)

job = pa.Job(inputdata, [table_spec])
tau = pa.TauArgus()

try:
    tau.run(job)
except pa.TauArgusException as err:
    print("An error occurred:")
    print(err.result)
else:
    print("Job completed successfully")
    table_result = table_spec.load_result()
```

### TableResult methods

The `TableResult` object provides three key methods:

```python
table_result.safe(unsafe_marker='X')
table_result.status()
table_result.unsafe()
```

Each of these methods returns a Pandas [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html).
The index is a multi-index containing the explanatory variables.
You can reshape the result into a preferred format using Pandas methods like `stack`, `unstack`, and `swaplevel`.
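
The reshaping step can be tried without running TauArgus. The sketch below builds a small stand-in `Series` by hand; the variable names and values are made up for illustration, and only standard pandas calls are used.

```python
import pandas as pd

# Hand-built stand-in for a table result: a Series indexed by two
# explanatory variables (names and values are illustrative).
index = pd.MultiIndex.from_product(
    [["A", "B"], ["North", "South"]], names=["symbol", "regio"]
)
result = pd.Series([10, 20, 30, 40], index=index, name="income")

# One row per symbol, one column per regio.
wide = result.unstack("regio")

# Same data with the index levels in the opposite order.
swapped = result.swaplevel().sort_index()

print(wide)
print(swapped)
```

A real result returned by `safe()`, `unsafe()`, or `status()` can be reshaped in the same way.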

#### `unsafe()`

This returns the aggregated data in unprotected form.

#### `safe(unsafe_marker='X')`

This returns the aggregated data in its protected form.
Unsafe cells are replaced by a marker value, which defaults to `X`.
Because a string marker converts the resulting `pd.Series` to a string data type, you can pass `pd.NA` or a numeric dummy value as `unsafe_marker` to keep the result numeric.
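
The effect of the marker on the data type can be seen with plain pandas. This sketch does not call PiArgus; the values and the unsafe mask are invented, and `mask` merely imitates what a marker does to a numeric column.

```python
import pandas as pd

values = pd.Series([10, 20, 30], dtype="Int64")
is_unsafe = pd.Series([False, True, False])  # made-up mask of unsafe cells

# A string marker forces a non-numeric (object) column.
marked = values.astype(object).mask(is_unsafe, "X")

# A missing-value marker keeps the nullable integer type.
numeric = values.mask(is_unsafe, pd.NA)

print(marked.dtype, numeric.dtype)
print(numeric.sum())  # sums skip missing values by default
```

Keeping the column numeric is convenient when the protected table feeds into further calculations.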

#### `status()`

This method returns the safety status of each cell as a `pd.Series`.

The following status codes are used:

| Code | Meaning |
|------|------------------|
| S | Safe |
| P | Protected |
| U | Primary unsafe |
| M | Secondary unsafe |
| Z | Empty |
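
A status series can be summarised with ordinary pandas operations. In the sketch below the series is a hand-made stand-in for the output of `status()`, and the mapping simply repeats the table above.

```python
import pandas as pd

STATUS_MEANING = {
    "S": "Safe",
    "P": "Protected",
    "U": "Primary unsafe",
    "M": "Secondary unsafe",
    "Z": "Empty",
}

# Stand-in for table_result.status(); the codes are illustrative.
status = pd.Series(["S", "U", "M", "S", "Z"])

# Number of cells per status, with readable labels.
counts = status.map(STATUS_MEANING).value_counts()

# Cells hidden by primary or secondary suppression.
suppressed = status.isin(["U", "M"])

print(counts)
print("Suppressed cells:", int(suppressed.sum()))
```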
55 changes: 55 additions & 0 deletions docs/source/suppression.md
@@ -0,0 +1,55 @@
# Safety methods and suppression #

TauArgus performs two kinds of suppression:

1. Primary suppression suppresses cells that violate safety rules.
2. Secondary suppression suppresses cells to protect other cells.

## Safety rules ##

Cells directly violating one of these rules are protected during primary suppression.

| Rule | Meaning |
|-------------------------|-----------------------------------|
| pa.percent_rule(p, n)   | $p\%$-rule                        |
| pa.dominance_rule(n, k) | $N,K$ dominance rule |
| pa.frequency_rule(n) | Every cell needs $n$ contributors |
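
To give a feeling for what these rules test, here is a minimal sketch based on the usual textbook definitions. The three functions are hypothetical helpers, not the `pa.*_rule` functions and not TauArgus's implementation. The percent rule is shown only in its simplest form, where the second-largest contributor tries to estimate the largest one.

```python
def violates_frequency_rule(contributions, n):
    """Unsafe if the cell has fewer than n contributors."""
    return len(contributions) < n


def violates_dominance_rule(contributions, n, k):
    """Unsafe if the n largest contributors exceed k percent of the cell total."""
    top = sorted(contributions, reverse=True)[:n]
    return sum(top) > k / 100 * sum(contributions)


def violates_percent_rule(contributions, p):
    """Unsafe if the total minus the two largest contributions is below
    p percent of the largest contribution."""
    ordered = sorted(contributions, reverse=True)
    remainder = sum(ordered[2:])
    return remainder < p / 100 * ordered[0]


cell = [80, 10, 5, 5]  # made-up contributions to a single cell
print(violates_frequency_rule(cell, n=3))
print(violates_dominance_rule(cell, n=1, k=70))
print(violates_percent_rule(cell, p=20))
```

In this made-up cell the single largest contributor dominates, so the dominance and percent checks flag it even though it has enough contributors.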

## Suppression methods ##

Methods for secondary suppression aim to minimize the suppression cost while protecting the data.

| Method | Description | Optimality | Speed |
|--------------|---------------------------------------------------|------------|--------|
| `pa.OPTIMAL` | Minimizes suppression costs (slowest) | High | Slow |
| `pa.MOD` | Protects sub-tables first and combines the result | Medium | Medium |
| `pa.GH` | Hypercube method | Low | Fast |

## Specifying rules ##

Safety rules can be set for individual observations.
If several observations belong to the same unit, a safety rule can also be set at the holding level.
In that case, the microdata should have a `holding` column.
Without holding information, safety rules can only be set at the individual level (per cell).
Suppression methods are also set per table.

```python
import piargus as pa

table = pa.Table(explanatory, response, ...,
                 safety_rule={"individual": pa.percent_rule(20),
                              "holding": pa.percent_rule(30)},
                 suppress_method=pa.MODULAR)
```

If there are multiple linked tables, a suppression method can also be set on the job:

```python
job = pa.Job(tables, ...,
             linked_suppress_method=pa.MODULAR)
```

## Disclaimer

For an authoritative, theoretical explanation of suppression in TauArgus, please consult the [TauArgus manual](https://research.cbs.nl/casc/Software/TauManualV4.1.pdf).
This page is meant as a practical overview and is not authoritative.
54 changes: 54 additions & 0 deletions docs/source/tauargus.md
@@ -0,0 +1,54 @@
# TauArgus

The `TauArgus` class wraps the `tauargus.exe` program.
You can either add the directory containing `tauargus.exe` to your `PATH` environment variable or pass the executable’s location as follows:

```python
import piargus as pa

tau = pa.TauArgus(r"C:\path\to\tauargus.exe")
```

To test the setup:

```python
print("Tau:", tau.version_info())
```

## Running jobs

If you have created a job, it can be run as follows:

```python
job = pa.Job(...)
tau.run(job)
```

Multiple jobs can be run at the same time by passing them as a list:

```python
tau.run([job1, job2, ...])
```

## Running batch files

If you have created a batch file, it can be run as follows:

```python
tau.run("myjob.arb")
```

To simplify the creation of batch files, `BatchWriter` may be used.

```python
with open("myjob.arb", "w") as output_file:
    batch_writer = pa.BatchWriter(output_file)

    batch_writer.open_microdata("microdata.csv")
    batch_writer.open_metadata("metadata.rda")
    batch_writer.specify_table(["explanatory1", "explanatory2"], "response")
    batch_writer.safety_rule(individual="NK(3, 70)")
    batch_writer.read_microdata()
    batch_writer.suppress("MOD")
    batch_writer.write_table(1, 2, "AS+", "protected.csv")
```