GSoC ideas
For general information see here
The range of topics that statsmodels does not yet cover is still pretty wide. So, if a student has a strong preference for a particular topic, it might well be possible to cover it.
The basic idea is to pick your favorite chapters in an econometrics or statistics book, an R package, a Stata topic, or any other package for statistical analysis, see what is missing in statsmodels, and decide what would be most useful to have available first.
Of course, support for a topic will also depend on the availability of a mentor with sufficient expertise to advise.
The following are some ideas. If you are interested in one of the topics, we can also help with additional information.
Convenient support for categorical explanatory variables is still largely lacking in statsmodels. This can follow up on the existing formula implementations by Jonathan and by Nathaniel, and on the start of the integration in the statsmodels account on GitHub. The topic is pretty complex, and I would recommend it only to someone familiar with the formula framework in R.
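To make concrete what formula support would automate, here is a minimal sketch of treatment (dummy) coding done by hand with plain NumPy. The helper `treatment_dummies` and the toy data are hypothetical and only for illustration; this is not the statsmodels or formula API.

```python
import numpy as np


def treatment_dummies(values):
    """Hypothetical helper: treatment-code a 1-D categorical array.

    The first level (in sorted order) is the reference category and gets no column.
    """
    levels, codes = np.unique(values, return_inverse=True)
    full = np.eye(len(levels))[codes]  # one indicator column per level
    return levels, full[:, 1:]         # drop the reference level


# Toy data, made up for this example.
group = np.array(["a", "b", "c", "a", "b", "c", "a", "b"])
y = np.array([1.0, 3.0, 6.0, 1.2, 2.8, 6.1, 0.8, 3.2])

levels, dummies = treatment_dummies(group)
X = np.column_stack([np.ones(len(y)), dummies])  # intercept + dummy columns

# Ordinary least squares on the hand-built design matrix.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0] is the mean of the reference group; the remaining entries are
# differences between each other group's mean and the reference mean.
```

With a formula framework, the user would only name the categorical variable, and the coding, the choice of reference level, and the column labels would be handled automatically.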
Linear_model, robust_linear_model and generalized_linear_model could all take a given non-linear function y = f(x, parameters) instead of the current linear version y = X*beta. Technically, this can mostly follow the pattern of the current linear versions, but it requires getting familiar with all three models.
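As a rough illustration of the least-squares case only, the following sketch fits y = f(x, parameters) by Gauss-Newton iteration with step halving, using NumPy alone. The mean function `f`, its `jacobian`, the function `gauss_newton`, and the noise-free toy data are all assumptions made for this example; none of them is existing statsmodels code, and the robust and generalized linear variants would need different objective functions.

```python
import numpy as np


def f(x, params):
    """Hypothetical non-linear mean function: a * exp(b * x)."""
    a, b = params
    return a * np.exp(b * x)


def jacobian(x, params):
    """Derivatives of f with respect to (a, b), one row per observation."""
    a, b = params
    e = np.exp(b * x)
    return np.column_stack([e, a * x * e])


def gauss_newton(x, y, start, max_iter=100, tol=1e-10):
    """Minimize the sum of squared residuals by Gauss-Newton with step halving."""
    params = np.asarray(start, dtype=float)
    ssr = np.sum((y - f(x, params)) ** 2)
    for _ in range(max_iter):
        resid = y - f(x, params)
        J = jacobian(x, params)
        # Each step is a linear least-squares problem, like the current y = X*beta case,
        # with the Jacobian playing the role of X.
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        if np.max(np.abs(step)) < tol:
            break
        scale = 1.0
        for _ in range(30):  # halve the step until the fit improves
            candidate = params + scale * step
            new_ssr = np.sum((y - f(x, candidate)) ** 2)
            if new_ssr < ssr:
                break
            scale *= 0.5
        params, ssr = candidate, new_ssr
    return params


# Noise-free toy data, so the true parameters should be recovered.
x = np.linspace(0.0, 2.0, 25)
true_params = np.array([2.0, 0.5])
y = f(x, true_params)

estimate = gauss_newton(x, y, start=[1.5, 0.3])
```

The point of the sketch is that every iteration reuses a linear least-squares solve, which is why the non-linear versions can mostly follow the pattern of the existing linear models.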
Generic GMM is mostly implemented in the sandbox, but it has missing pieces. Except for the two-stage least squares case, no specific models that use GMM are implemented. The possible application areas are wide; one possibility that has been popular in recent years would be support for weak instruments.
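For orientation, the two-stage least squares case mentioned above can be sketched in a few lines of NumPy. This is a stand-alone illustration on simulated data, not the sandbox GMM code; the function name `two_stage_least_squares` and the data-generating process are assumptions made for this example.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """Hypothetical helper: 2SLS estimate of beta in y = X @ beta + u with instruments Z."""
    # First stage: project the regressors on the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress y on the fitted regressors.
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]


# Simulated data with one endogenous regressor (x is correlated with the error u).
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                      # instrument, independent of u
u = rng.normal(size=n)                      # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * x + u                       # true slope is 2.0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # biased because of endogeneity
beta_iv = two_stage_least_squares(y, X, Z)

# In this exactly identified case the sample moment conditions Z'(y - X beta) / n
# are zero at the 2SLS estimate, which is the GMM view of the same estimator.
moments = Z.T @ (y - X @ beta_iv) / n
```

With weak instruments the first-stage projection carries little information about x, which is where the specialized methods mentioned above come in.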
These are models with an additional random component, which can be implemented from either a statistics or an econometrics viewpoint. The topic is large, so some selection has to be made.
Similar ideas but a different implementation, from either a statistics or an econometrics viewpoint. Estimation and inference are based on moment conditions or estimating equations, using a panel or longitudinal structure of the data.
A wide range of models that statsmodels is completely lacking. Examples would be threshold models, Markov switching models, ...
Mainly Stock and Watson and offspring. It would also be interesting to link this up with some of the variable selection procedures in sklearn, similar to Bai and Ng.
Extend the current vector_ar models to include the VECM representation and estimation, and the corresponding cointegration estimation.
Adapt and integrate Wes's DLM code (JP: I don't know what the status is.)
Large parts for univariate GARCH are written and in the sandbox, but they need cleanup, enhancements and verification.
statsmodels includes some plots based on matplotlib, but compared to other statistical packages there are still gaps. One idea would be to implement graphics with a coverage similar to that of other statistical packages, in a user-friendly way.
survival, duration
two stage models (e.g. Heckman sample selection)
system of equations
multivariate models (several endogenous/response variables)
extension to discrete models
non-parametric estimation
....