Commit
Verweijen committed Jan 18, 2025
1 parent 37baf23, commit ee8d393
Showing 13 changed files with 316 additions and 65 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
# Apriori | ||
|
||
Apriori files can be used to control how TauArgus suppresses cells. | ||
These files can mark individual cells as protected, safe, or modify the suppression cost. | ||
|
||
## Use an existing file | ||
|
||
```python | ||
import piargus as pa | ||
|
||
apriori = pa.Apriori.from_hst("apriori.hst") | ||
``` | ||
|
||
## Create an apriori file programmatically | ||
|
||
```python | ||
import piargus as pa | ||
|
||
apriori = pa.Apriori(expand_trivial=True) | ||
apriori.change_status(['A', 'ExampleDam'], pa.SAFE) | ||
apriori.change_status(['A', 'ExampleCity'], pa.SAFE) | ||
apriori.change_cost(['C', 'ExampleDam'], 10) | ||
apriori.change_protection_level(['C', 'ExampleCity'], 5) | ||
|
||
apriori.to_hst("apriori.hst") | ||
``` | ||
|
||
## Attaching apriori to a table | ||
|
||
Simply pass it as a parameter when creating a `Table` or `TableData` instance: | ||
|
||
```python | ||
table = pa.Table(['symbol', 'regio'], 'income', ..., | ||
                 apriori=apriori) | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
# Result analysis # | ||
|
||
## Working directory | ||
|
||
The job can accept a `directory` argument. | ||
When provided, all temporary files and output tables will be created and stored in the specified location. | ||
After the job completes, this directory can be inspected for further analysis. | ||
|
||
```python | ||
import piargus as pa | ||
|
||
job = pa.Job(directory="argus_workdir") | ||
``` | ||
|
||
If no directory is provided, a temporary directory will be created automatically. | ||
This directory will be cleaned up once the job is finished. | ||
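
To see what ended up in an explicit working directory, the files can be listed with the standard library. This is a minimal sketch: the helper name `list_workdir` is made up for illustration, and which files TauArgus actually writes there is not specified on this page.

```python
from pathlib import Path


def list_workdir(directory):
    """Return the sorted relative paths of all files below `directory`."""
    root = Path(directory)
    if not root.is_dir():
        return []  # nothing to inspect (yet)
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


# "argus_workdir" is the directory name used in the example above.
for name in list_workdir("argus_workdir"):
    print(name)
```

The helper returns an empty list when the directory does not exist, so it is safe to call before the job has run.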
|
||
## Report | ||
|
||
When TauArgus is run, it returns a report that can be printed. | ||
The report displays all output written to the logbook. | ||
|
||
```python | ||
import piargus as pa | ||
|
||
tau = pa.TauArgus() | ||
report = tau.run(job) | ||
print(report) | ||
``` | ||
|
||
## Table result | ||
|
||
The resulting tables can be obtained from the specification `Table`. | ||
|
||
```python | ||
import piargus as pa | ||
|
||
tau = pa.TauArgus() | ||
table_spec = pa.Table(...) | ||
|
||
||
job = pa.Job(inputdata, [table_spec]) | ||
|
||
||
try: | ||
    tau.run(job) | ||
except pa.TauArgusException as err: | ||
    print("An error occurred:") | ||
    print(err.result) | ||
else: | ||
    print("Job completed successfully") | ||
    table_result = table_spec.load_result() | ||
``` | ||
|
||
### TableResult methods | ||
|
||
The `TableResult` object provides three key methods: | ||
|
||
```python | ||
table_result.unsafe() | ||
table_result.safe(unsafe_marker='X') | ||
table_result.status() | ||
``` | ||
|
||
Each of these methods returns a Pandas [Series](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html). | ||
The index is a multi-index containing the explanatory variables. | ||
You can reshape the result into a preferred format using Pandas methods like `stack`, `unstack`, and `swaplevel`. | ||
|
||
#### `unsafe()` | ||
|
||
This returns the aggregated data in unprotected form. | ||
|
||
#### `safe(unsafe_marker='X')` | ||
|
||
This returns the aggregated data in protected form. | ||
Unsafe cells are replaced by a marker, `X` by default. | ||
A string marker converts the resulting `pd.Series` to a string data type; pass `pd.NA` or a numeric dummy value as the marker to keep the result numeric. | ||
|
||
#### `status()` | ||
|
||
This method returns the safety status of each cell as a `pd.Series`. | ||
|
||
The following status codes are used: | ||
|
||
| Code | Meaning | | ||
|------|------------------| | ||
| S | Safe | | ||
| P | Protected | | ||
| U | Primary unsafe | | ||
| M | Secondary unsafe | | ||
| Z | Empty | |
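
A quick way to summarize a status result is to count how often each code occurs. The sketch below uses a plain list of codes in place of the `pd.Series` returned by `status()`; the sample values and the helper name `summarize_status` are made up for illustration.

```python
from collections import Counter

# Status codes and meanings, copied from the table above.
STATUS_MEANING = {
    "S": "Safe",
    "P": "Protected",
    "U": "Primary unsafe",
    "M": "Secondary unsafe",
    "Z": "Empty",
}


def summarize_status(codes):
    """Count the cells per status, keyed by the human-readable meaning."""
    counts = Counter(codes)
    return {STATUS_MEANING[code]: n for code, n in counts.items()}


# Hypothetical status values for six cells.
example_codes = ["S", "S", "U", "M", "Z", "S"]
summary = summarize_status(example_codes)
print(summary)
```

With an actual result, pandas' `Series.value_counts()` gives the same tally directly on the output of `status()`.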
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
# Safety methods and suppression # | ||
|
||
TauArgus performs two kinds of suppression: | ||
|
||
1. Primary suppression suppresses cells that violate safety rules. | ||
2. Secondary suppression suppresses cells to protect other cells. | ||
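
Secondary suppression is needed because published totals can give away a suppressed value. The toy row below is invented for illustration and is not tied to any piargus API; it only shows the subtraction attack that secondary suppression prevents.

```python
# A toy table row with a published total (values invented for illustration).
row = {"A": 12, "B": 40, "C": 48}
total = sum(row.values())


def recover_hidden(published, total):
    """Return the hidden value if exactly one cell is suppressed, else None."""
    hidden = [name for name, value in published.items() if value is None]
    if len(hidden) != 1:
        return None  # zero or several unknowns: subtraction alone is not enough
    return total - sum(v for v in published.values() if v is not None)


# Primary suppression only: cell A is hidden, but the total gives it away.
primary_only = {"A": None, "B": 40, "C": 48}
leaked = recover_hidden(primary_only, total)

# With a secondary suppression of cell B, subtraction only reveals A + B.
with_secondary = {"A": None, "B": None, "C": 48}
still_hidden = recover_hidden(with_secondary, total)

print(leaked, still_hidden)
```

Even with a secondary suppression, the sum of the hidden cells remains known, which is one reason suppression methods also take costs and protection levels into account.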
|
||
## Safety rules ## | ||
|
||
Cells directly violating one of these rules are protected during primary suppression. | ||
|
||
| Rule | Meaning | | ||
|-------------------------|-----------------------------------| | ||
| pa.percent_rule(p, n) | $p\%$-rule | | ||
| pa.dominance_rule(n, k) | $N,K$ dominance rule | | ||
| pa.frequency_rule(n) | Every cell needs $n$ contributors | | ||
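
To make the three rules concrete, here is a minimal sketch of their textbook definitions applied to the contributions of a single cell. The function names are hypothetical and do not come from piargus, and the exact conventions TauArgus uses (for example the second argument of `pa.percent_rule`) are described in the manual rather than here.

```python
def violates_frequency_rule(contributions, n):
    """Unsafe when the cell has fewer than n contributors."""
    return len(contributions) < n


def violates_dominance_rule(contributions, n, k):
    """Unsafe when the n largest contributions exceed k percent of the total."""
    total = sum(contributions)
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > total * k / 100


def violates_percent_rule(contributions, p):
    """Unsafe when the total minus the two largest contributions
    is less than p percent of the largest contribution."""
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False  # an empty cell has nothing to disclose
    remainder = sum(ordered[2:])
    return remainder < ordered[0] * p / 100


# Invented example cells: one dominated by a single contributor, one balanced.
dominated = [80, 10, 5, 5]
balanced = [30, 30, 20, 20]

for cell in (dominated, balanced):
    print(
        cell,
        violates_frequency_rule(cell, 3),
        violates_dominance_rule(cell, 1, 70),
        violates_percent_rule(cell, 20),
    )
```

In the dominated cell the largest contributor holds 80% of the total, so both the dominance rule and the p%-rule flag it, while the frequency rule does not.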
|
||
## Suppression methods ## | ||
|
||
Methods for secondary suppression aim to minimize the suppression cost while protecting the data. | ||
|
||
| Method | Description | Optimality | Speed | | ||
|--------------|---------------------------------------------------|------------|--------| | ||
| `pa.OPTIMAL` | Minimizes suppression costs | High | Slow | | ||
| `pa.MODULAR` | Protects sub-tables first and combines the result | Medium | Medium | | ||
| `pa.GH` | Hypercube method | Low | Fast | | ||
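
The notion of suppression cost can be illustrated in the simplest possible case: one row in which a single cell is already primary-suppressed and one partner cell has to be chosen. The costs below are invented, and real methods work on the whole table and respect protection levels, so this is only a sketch of the idea, not of any TauArgus algorithm.

```python
# Hypothetical suppression cost per cell in one table row; cell "A" is
# already primary-suppressed and needs one partner cell to hide it.
costs = {"A": 12, "B": 40, "C": 48, "D": 7}
primary = "A"

# Every other cell is a candidate for secondary suppression.
candidates = {cell: cost for cell, cost in costs.items() if cell != primary}

# Pick the candidate whose suppression costs the least.
secondary = min(candidates, key=candidates.get)
total_cost = costs[primary] + costs[secondary]

print(secondary, total_cost)
```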
|
||
## Specifying rules ## | ||
|
||
Safety rules are set per table and apply at the level of individual observations. | ||
If several observations belong to the same unit, a safety rule can also be set at the holding level. | ||
In that case the microdata must contain a `holding` column. | ||
Without holding information, safety rules can only be set at the individual level. | ||
Suppression methods are also set per table. | ||
|
||
```python | ||
import piargus as pa | ||
|
||
table = pa.Table(explanatory, response, ..., | ||
                 safety_rule={"individual": pa.percent_rule(20), | ||
                              "holding": pa.percent_rule(30)}, | ||
                 suppress_method=pa.MODULAR) | ||
``` | ||
|
||
If there are multiple linked tables, a suppression method can also be set on the job: | ||
|
||
```python | ||
job = pa.Job(tables, ..., | ||
             linked_suppress_method=pa.MODULAR) | ||
``` | ||
|
||
## Disclaimer | ||
|
||
For a more official and theoretical explanation of suppression in TauArgus, please consult the [tau-manual](https://research.cbs.nl/casc/Software/TauManualV4.1.pdf). | ||
This page is meant as a practical overview, but is not authoritative. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,54 @@ | ||
# TauArgus | ||
|
||
The `TauArgus` class wraps the `tauargus.exe` program. | ||
You can either add the directory containing `tauargus.exe` to your `PATH` environment variable or pass the executable’s location as follows: | ||
|
||
```python | ||
import piargus as pa | ||
|
||
tau = pa.TauArgus(r"C:\path\to\tauargus.exe") | ||
``` | ||
|
||
To test the setup: | ||
|
||
```python | ||
print("Tau:", tau.version_info()) | ||
``` | ||
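
Before constructing `TauArgus` without an explicit path, it can help to check whether the executable is actually on `PATH`. The sketch below uses only the standard library; the helper name `find_executable` is made up for illustration, and how piargus itself locates the program is not shown here.

```python
import shutil


def find_executable(name="tauargus"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


location = find_executable()
if location is None:
    print("tauargus was not found on PATH; pass its location explicitly.")
else:
    print("Found:", location)
```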
|
||
## Running jobs | ||
|
||
If you have created a job, it can be run as follows: | ||
|
||
```python | ||
job = pa.Job(...) | ||
tau.run(job) | ||
``` | ||
|
||
Multiple jobs can be run at the same time by passing them as a list: | ||
|
||
```python | ||
tau.run([job1, job2, ...]) | ||
``` | ||
|
||
## Running batch files | ||
|
||
If you have created a batch file, it can be run as follows: | ||
|
||
```python | ||
tau.run("myjob.arb") | ||
``` | ||
|
||
To simplify the creation of batch files, `BatchWriter` may be used. | ||
|
||
```python | ||
with open("myjob.arb", "w") as output_file: | ||
    batch_writer = pa.BatchWriter(output_file) | ||
|
||
    batch_writer.open_microdata("microdata.csv") | ||
    batch_writer.open_metadata("metadata.rda") | ||
    batch_writer.specify_table(["explanatory1", "explanatory2"], "response") | ||
    batch_writer.safety_rule(individual="NK(3, 70)") | ||
    batch_writer.read_microdata() | ||
    batch_writer.suppress("MOD") | ||
    batch_writer.write_table(1, 2, "AS+", "protected.csv") | ||
``` |