josef-pkt edited this page Jun 6, 2012 · 6 revisions

R: Notes

Some collected notes for using R.

Warning: I don't know much R, so this might not be the best way to do it.

TODO: SUR example, sink, commit cat_items

Introspection

Assuming we have already called systemfit and assigned the result to SUR:

> names(SUR)
 [1] "eq"           "call"         "coefficients" "coefCov"      "residCovEst"  "residCov"     "method"       "rank"
 [9] "df.residual"  "iter"         "control"      "panelLike"
> attributes(SUR)
$names
 [1] "eq"           "call"         "coefficients" "coefCov"      "residCovEst"  "residCov"     "method"       "rank"
 [9] "df.residual"  "iter"         "control"      "panelLike"

$class
[1] "systemfit"

> cc = SUR$coefCov
> is.numeric(cc)
[1] TRUE
> class(cc)
[1] "matrix"
> is.matrix(cc)
[1] TRUE
> class(SUR$eq)
[1] "list"

Looping over names and mkarray

A for loop that prints all numeric components of SUR as Python code that creates NumPy arrays.
  • SUR[[name]], or equivalently getElement(SUR, name), extracts the component called name from SUR.
  • mkarray is one of our helper functions in statsmodels/tools that prints the data as an np.array definition.
  • The loop is written as a one-liner because that was easier to work with in the R shell.
> for (name in names(SUR)) {if (is.numeric(SUR[[name]])) {mkarray(SUR[[name]], name)}}; cat("\n")
coefficients = np.array([0.9979991848420328,0.06886083327936214,...,0.0429020916196108])

coefCov = np.array([157.3943509170185,-0.2165142902938106,...,0.002035467551712387]).reshape(15,15, order='F')

residCovEst = np.array([176.3202565715889,-25.14782439226425,...,104.3078782568039]).reshape(5,5, order='F')

residCov = np.array([180.2786473970981,3.703259980763286,...,111.6549965340746]).reshape(5,5, order='F')

rank = np.array([15])

df.residual = np.array([85])

iter = np.array([1])
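R stores matrices in column-major order, which is why the generated code reshapes the flat vector with order='F' (Fortran order). A small check with toy numbers, assuming an R matrix created as matrix(1:6, nrow = 2), whose values R lists as 1, 2, ..., 6:

```python
import numpy as np

# Toy values: R stores matrix(1:6, nrow = 2) column by column as 1, 2, ..., 6
flat = np.array([1, 2, 3, 4, 5, 6])

# order='F' fills the result column by column, reproducing the R matrix
m_r = flat.reshape(2, 3, order='F')   # rows [1, 3, 5] and [2, 4, 6]

# The default order='C' fills row by row and gives a different matrix
m_c = flat.reshape(2, 3)              # rows [1, 2, 3] and [4, 5, 6]

# R's m[1, 2] (1-based) is 3; only the order='F' version agrees
assert m_r[0, 1] == 3
assert m_c[0, 1] == 2
```

For a symmetric matrix such as coefCov, order='C' would produce the transpose, which is the same matrix, so the mistake would go unnoticed there; for non-symmetric arrays the order matters.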

Creating a named list and saving it to a Python module with R2nparray

> aa = list(covparams=SUR$coefCov, rank=SUR$rank)
> R2nparray(aa, fname="temp3.py")

The content of the temp3.py module is then:

------------ temp3.py ----------
import numpy as np

covparams = np.array([157.3943509170185,-0.2165142902938106,...,0.002035467551712387]).reshape(15,15, order='F')

rank = np.array([15])
--------------------------------
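The generated file is an ordinary Python module. If it is not on sys.path, one way to load it is importlib.util.spec_from_file_location. The sketch below writes a tiny stand-in for temp3.py (toy 2x2 values, not the real SUR output) into a temporary directory and then imports it from that path:

```python
import importlib.util
import os
import tempfile

# Toy stand-in for temp3.py; the real file holds the 15x15 covparams
module_text = """import numpy as np

covparams = np.array([1.0, 0.5, 0.5, 2.0]).reshape(2,2, order='F')

rank = np.array([15])
"""

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "temp3.py")
with open(path, "w") as fh:
    fh.write(module_text)

# Load the module from its file path without modifying sys.path
spec = importlib.util.spec_from_file_location("temp3", path)
temp3 = importlib.util.module_from_spec(spec)
spec.loader.exec_module(temp3)

print(temp3.covparams.shape)  # (2, 2)
print(int(temp3.rank[0]))     # 15
```

If the file sits in the current directory or on sys.path, a plain import is enough, as in the temp5 session further down.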

Saving a data frame

f is a data frame with the fitted values from the SUR model.

> class(f)
[1] "data.frame"
> f
                Chrysler   General.Electric     General.Motors          US.Steel      Westinghouse
X1935  32.98546930516650  34.82254735597956  208.2453286635445 247.5131792455174 12.27690563625844
X1936  61.83516118316266  66.98918588257341  420.2793547553419 300.2827737683187 30.52156144761057
...

Calling R2nparray on the data frame writes each of its data series (columns) into a Python module:

> R2nparray(f, fname="temp4.py")

------------ temp4.py ----------
import numpy as np

Chrysler = np.array([32.9854693051665,...,177.371048256085])

General_Electric = np.array([34.82254735597956,...,195.5150518056073])

General_Motors = np.array([208.2453286635445,...,1364.599470457204])

US_Steel = np.array([247.5131792455174,...,566.277048536767])

Westinghouse = np.array([12.27690563625844,...,77.5688631853628])
--------------------------------
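Comparing the column names of f with the generated variable names shows that dots, which are not valid in Python identifiers, were replaced by underscores (General.Electric became General_Electric). The function below is a hypothetical Python sketch of that rule, handy for checking which names a data frame will produce; it is not the code inside R2nparray, which is an R function.

```python
import keyword

def clean_name(r_name):
    """Hypothetical sketch: turn an R column name into a Python identifier."""
    name = r_name.replace(".", "_")
    # Guard against names that are still not usable as Python variables
    if not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError("cannot use %r as a Python name" % r_name)
    return name

columns = ["Chrysler", "General.Electric", "General.Motors", "US.Steel", "Westinghouse"]
cleaned = [clean_name(c) for c in columns]
# ['Chrysler', 'General_Electric', 'General_Motors', 'US_Steel', 'Westinghouse']
```

Applied to the cat_items output further down, the same rule would turn df.residual into df_residual.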

We can also combine the two, the named list aa and the data frame f, and save them at the same time:

> R2nparray(c(aa, f), fname="temp5.py")

The resulting Python module contains the merged content:

>>> import temp5
>>> dir(temp5)
['Chrysler', 'General_Electric', 'General_Motors', 'US_Steel',
'Westinghouse', '__builtins__', '__doc__', '__file__', '__name__',
'__package__', 'covparams', 'np', 'rank']
>>> temp5.covparams.shape
(15, 15)
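Modules like temp5 are convenient as stored reference results: a Python test can import the R values and compare them with its own estimates using numpy.testing.assert_allclose. The sketch below uses toy numbers; results_r is a hypothetical stand-in for an imported module such as temp5, and estimated stands in for whatever the Python code computes:

```python
import types

import numpy as np
from numpy.testing import assert_allclose

# Stand-in for an imported R results module such as temp5 (toy values)
results_r = types.SimpleNamespace(
    covparams=np.array([1.0, 0.5, 0.5, 2.0]).reshape(2, 2, order='F'),
    rank=np.array([2]),
)

# Hypothetical Python-side estimate that should reproduce the R result
estimated = np.array([[1.0, 0.5], [0.5, 2.0]]) * (1 + 1e-9)

# Compare elementwise with a relative tolerance
assert_allclose(estimated, results_r.covparams, rtol=1e-7)
assert np.linalg.matrix_rank(estimated) == results_r.rank[0]
```

The stored values carry about 16 significant digits, so the usable tolerance is limited by numerical differences between the R and Python computations rather than by the printed precision.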

Saving all items with cat_items

cat_items is a new version that saves every item that is not blacklisted, although currently mainly the numerical types are useful. (TODO: not yet committed to statsmodels/tools, and it does no name cleaning.)

> cat_items(SUR, prefix="sur.", blacklist=c("eq", "control"))
sur.call = '''systemfit(formula = formula, method = "SUR", data = panel)'''
sur.coefficients = np.array([0.9979991848420328,...,0.0429020916196108]).reshape(15,1, order='F')


sur.coefCov = np.array([157.3943509170185,...,0.002035467551712387]).reshape(15,15, order='F')


sur.residCovEst = np.array([176.3202565715889,...,104.3078782568039]).reshape(5,5, order='F')


sur.residCov = np.array([180.2786473970981,...,111.6549965340746]).reshape(5,5, order='F')


sur.method = SUR
sur.rank = 15
sur.df.residual = 85
sur.iter = 1
sur.panelLike = '''TRUE'''
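Because of the sur. prefix, these lines are attribute assignments, so pasting them into Python requires an existing object named sur. Without name cleaning, sur.df.residual would also need an attribute df, and the unquoted sur.method = SUR raises a NameError unless SUR is defined. A minimal sketch of a hand-edited version, assuming a plain types.SimpleNamespace as the container (this is not something cat_items produces):

```python
import types

# Container object so that the "sur.<name> = ..." lines are valid assignments
sur = types.SimpleNamespace()

# Scalar items copied from the cat_items output above, edited by hand
sur.call = '''systemfit(formula = formula, method = "SUR", data = panel)'''
sur.method = 'SUR'           # quoted by hand; the raw output is unquoted
sur.rank = 15
sur.df_residual = 85         # renamed from sur.df.residual
sur.iter = 1
sur.panelLike = '''TRUE'''   # R's TRUE arrives as a string, not a bool
```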

Redirecting output to a file with sink

Our helper functions use cat to write their output, and cat prints the strings to standard output. The output can be redirected to a file with sink, for example:

fname = "tmp_sur.py"
append = TRUE

sink(file=fname, append=append)
mkarray(SUR$coefficients, "params")
mkarray(SUR$coefCov, "cov_params")
mkarray(SUR$residCovEst, "resid_cov_est")
mkarray(SUR$residCov, "resid_cov")
mkarray(SUR$df.residual, "df_resid")
sink()

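Calling sink() with no arguments ends the redirection. If an error occurs before that call, sink() is never reached and the R shell stops printing output. Typing sink() once or several times brings the standard output back to the shell: each call removes one diversion, so one call is needed for every sink that was left open. Wrapping the calls in tryCatch({ ... }, finally = sink()) avoids the problem, because the finally expression runs even when an error occurs.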
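For comparison, Python's counterpart of sink is contextlib.redirect_stdout. Used in a with statement, it restores standard output even when an exception is raised, so output cannot get stuck in the file. A sketch that appends mkarray-style lines to a file (the line contents and the file name are only illustrative):

```python
import contextlib
import os
import tempfile

fname = os.path.join(tempfile.mkdtemp(), "tmp_out.py")

# Append mode, like sink(file = fname, append = TRUE)
with open(fname, "a") as fh, contextlib.redirect_stdout(fh):
    print("rank = 15")
    print("df_resid = 85")
# stdout is restored here, even if a print above had raised

with open(fname) as fh:
    content = fh.read()
```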