Commit e84c2b0: Update README.md (marcjermaine-pontiveros, Mar 17, 2020)

Data scientists often find it difficult to choose the right features to maximize accuracy, especially when dealing with many features. There are currently many ways to select features, but the task becomes a struggle when the feature space is large. A genetic algorithm is one solution: it searches the space of feature subsets for one that attains high accuracy.

#### Usage:
```
from sklearn.datasets import make_classification
from sklearn import linear_model
from feature_selection_ga import FeatureSelectionGA
import fitness_function as ff

X, y = make_classification(n_samples=100, n_features=15, n_classes=3,
                           n_informative=4, n_redundant=1, n_repeated=2,
                           random_state=1)
model = linear_model.LogisticRegression(solver='lbfgs', multi_class='auto')
fsga = FeatureSelectionGA(model, X, y, ff_obj=ff.FitnessFunction())
pop = fsga.generate(100)
# print(pop)
```
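Each individual in `pop` encodes a candidate feature subset; a common GA encoding (assumed here) is a 0/1 mask over the columns of `X`. Applying such a mask is a one-liner with NumPy; the individual below is hypothetical:

```python
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=15, n_classes=3,
                           n_informative=4, n_redundant=1, n_repeated=2,
                           random_state=1)

# Hypothetical individual from the final population: 1 = keep the feature
best_individual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1]
mask = np.array(best_individual, dtype=bool)
X_selected = X[:, mask]      # keep only the 7 selected columns
print(X_selected.shape)      # (100, 7)
```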

#### Usage (Advanced):

By default, `FeatureSelectionGA` uses its own fitness function class, but we can also define a custom one.
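A minimal skeleton of such a class, reconstructed from the interface used later in this README (a sketch, not the package's exact code):

```python
class FitnessFunction:
    def __init__(self, n_total_features, n_splits=5, alpha=0.01, *args, **kwargs):
        self.n_splits = n_splits                   # cross-validation splits
        self.alpha = alpha                         # accuracy vs. subset-size trade-off
        self.n_total_features = n_total_features   # N_t

    def calculate_fitness(self, model, x, y):
        # must return a scalar fitness for `model` on features x, labels y
        raise NotImplementedError
```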

With this, we can design our own fitness function by defining `calculate_fitness`.
Consider the following example from [Vieira, Mendonça, Sousa, et al. (2013)](http://www.sciencedirect.com/science/article/pii/S1568494613001361):
```f(X) = \alpha(1-P) + (1-\alpha) \left(1 - \dfrac{N_f}{N_t}\right)```

where `P` is the classification accuracy, `N_f` the number of selected features, `N_t` the total number of features, and `alpha` a weight balancing the two terms.


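Plugging in concrete numbers makes the trade-off visible (the values here are illustrative): with accuracy `P = 0.9`, `N_f = 5` of `N_t = 15` features kept, and `alpha = 0.05`:

```python
def fitness(P, n_f, n_t, alpha=0.05):
    # f(X) = alpha * (1 - P) + (1 - alpha) * (1 - N_f / N_t)
    return alpha * (1 - P) + (1 - alpha) * (1 - n_f / n_t)

print(round(fitness(0.9, 5, 15), 4))   # 0.6383
```

With a small `alpha`, the subset-size term dominates, so subsets with fewer features score very differently from large ones even at equal accuracy.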
Define the constructor `__init__` with the needed parameters: `alpha` and `N_t`.
```
class FitnessFunction:
    def __init__(self, n_total_features, n_splits=5, alpha=0.01, *args, **kwargs):
        # n_total_features: total number of features (N_t)
        # n_splits: number of cross-validation splits
        # alpha: trade-off between accuracy (P) and subset size (N_f / N_t)
        self.n_splits = n_splits
        self.alpha = alpha
        self.n_total_features = n_total_features
```

Next, we define the fitness function; its name has to be
`calculate_fitness`. One straightforward way to implement the formula above is to estimate `P` with `cross_val_score` (the package's own implementation may differ in detail):
```
def calculate_fitness(self, model, x, y):
    # assumes: import numpy as np; from sklearn.model_selection import cross_val_score
    alpha = self.alpha
    total_features = self.n_total_features
    # P: mean cross-validated accuracy of the model on the selected features
    P = np.mean(cross_val_score(model, x, y, cv=self.n_splits))
    # f(X) = alpha * (1 - P) + (1 - alpha) * (1 - N_f / N_t)
    fitness = alpha * (1.0 - P) + (1.0 - alpha) * (1.0 - x.shape[1] / total_features)
    return fitness
```
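As a quick standalone sanity check of the formula (hypothetical values, sketched outside the class): when all `N_t` features are selected, the second term vanishes and the fitness reduces to `alpha * (1 - P)`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, n_features=15, n_classes=3,
                           n_informative=4, random_state=1)
alpha = 0.05

# Estimate P with 3-fold cross-validation on the full feature set
P = np.mean(cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=3))
fitness = alpha * (1 - P) + (1 - alpha) * (1 - X.shape[1] / 15)
print(0.0 <= fitness <= alpha)   # True: the second term is zero here
```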

Example (you may also see `example2.py`):
```
from sklearn.datasets import make_classification
from sklearn import linear_model
from feature_selection_ga import FeatureSelectionGA

X, y = make_classification(n_samples=100, n_features=15, n_classes=3,
                           n_informative=4, n_redundant=1, n_repeated=2,
                           random_state=1)
# Define the model
model = linear_model.LogisticRegression(solver='lbfgs', multi_class='auto')
# Define the fitness function object (the custom class defined above)
ff = FitnessFunction(n_total_features=X.shape[1], n_splits=3, alpha=0.05)
fsga = FeatureSelectionGA(model, X, y, ff_obj=ff)
pop = fsga.generate(100)
# Select the best individual from the final population and fit the initialized model
```
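The last comment above leaves the final step to the reader. A sketch of what it might look like, assuming the individual-as-mask encoding; the population contents and fitness scores below are hypothetical:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=15, n_classes=3,
                           n_informative=4, n_redundant=1, n_repeated=2,
                           random_state=1)
model = LogisticRegression(max_iter=1000)

# Hypothetical final population: each individual is a 0/1 mask over the features
pop = [[1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1],
       [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]]
scores = [0.64, 0.58]            # assumed fitness value per individual

best = pop[int(np.argmax(scores))]
X_best = X[:, np.array(best, dtype=bool)]
model.fit(X_best, y)             # refit the model on the selected features
print(X_best.shape)              # (100, 7)
```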


Example adapted from [pyswarms](https://pyswarms.readthedocs.io/en/latest/examples/usecases/feature_subset_selection.html).
