Ensembles module

`StreamingEnsemble`(base_estimator, n_estimators)	Abstract, base ensemble streaming class
`AUE`([base_estimator, n_estimators, ...])	Accuracy Updated Ensemble
`AWE`([base_estimator, n_estimators, n_splits])	Accuracy Weighted Ensemble
`DWM`([base_estimator, beta, theta, p, weighted])	Dynamic Weighted Majority
`KMC`([base_estimator, n_estimators])	Wang, Yi, Yang Zhang, and Yong Wang. "Mining data streams
`CDS`([base_estimator, n_estimators, a, b])	Ditzler, Gregory, and Robi Polikar.
`NIE`([base_estimator, n_estimators, param_a, ...])	Ditzler, Gregory, and Robi Polikar.
`OnlineBagging`([base_estimator, n_estimators])	Online Bagging.
`OOB`([base_estimator, n_estimators, ...])	Oversampling-Based Online Bagging.
`OUSE`([base_estimator, n_estimators, n_chunks])	Gao, Jing, et al. "Classifying Data Streams with Skewed Class Distributions and Concept Drifts." IEEE Internet Computing 12.6 (2008): 37-49.
`REA`([base_estimator, n_estimators, ...])	Recursive Ensemble Approach.
`SEA`([base_estimator, n_estimators, metric])	Streaming Ensemble Algorithm.
`UOB`([base_estimator, n_estimators, ...])	Undersampling-Based Online Bagging.
`WAE`([base_estimator, n_estimators, theta, ...])	Weighted Aging Ensemble.
`KUE`([base_estimator, n_estimators, n_candidates])	Kappa Updated Ensemble

class strlearn.ensembles.AUE(base_estimator=None, n_estimators=10, n_splits=5, epsilon=1e-10)

Bases: StreamingEnsemble

Accuracy Updated Ensemble

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.AWE(base_estimator=None, n_estimators=10, n_splits=5)

Bases: StreamingEnsemble

Accuracy Weighted Ensemble

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.CDS(base_estimator=None, n_estimators=10, a=2, b=2)

Bases: StreamingEnsemble

Ditzler, Gregory, and Robi Polikar. “Incremental learning of concept drift from streaming imbalanced data.” IEEE Transactions on Knowledge and Data Engineering 25.10 (2013): 2283-2301.

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.DWM(base_estimator=None, beta=0.5, theta=0.01, p=1, weighted=False)

Bases: StreamingEnsemble

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.KMC(base_estimator=None, n_estimators=10)

Bases: StreamingEnsemble

Wang, Yi, Yang Zhang, and Yong Wang. “Mining data streams: with skewed distribution by static classifier ensemble.” Opportunities and Challenges for Next-Generation Applied Intelligence. Springer, Berlin, Heidelberg, 2009. 65-71.

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.KUE(base_estimator=None, n_estimators=10, n_candidates=1)

Bases: StreamingEnsemble

Kappa Updated Ensemble

ensemble_support_matrix(X): Ensemble support matrix.

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.NIE(base_estimator=None, n_estimators=5, param_a=1, param_b=1)

Bases: StreamingEnsemble

Ditzler, Gregory, and Robi Polikar. “Incremental learning of concept drift from streaming imbalanced data.” IEEE Transactions on Knowledge and Data Engineering 25.10 (2013): 2283-2301.

ensemble_support_matrix(X): Ensemble support matrix.

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.OOB(base_estimator=None, n_estimators=5, time_decay_factor=0.9)

Bases: StreamingEnsemble

Oversampling-Based Online Bagging.

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.OUSE(base_estimator=None, n_estimators=10, n_chunks=10)

Bases: ClassifierMixin, BaseEnsemble

Gao, Jing, et al. “Classifying Data Streams with Skewed Class Distributions and Concept Drifts.” IEEE Internet Computing 12.6 (2008): 37-49.

ensemble_support_matrix(X): Ensemble support matrix.

fit(X, y): Fitting.

minority_majority_name(y)

Returns minority and majority class names

Parameters: y (array-like, shape (n_samples)) – The target values.
Return type: tuple (object, object)
Returns: Tuple of minority and majority class names.

minority_majority_split(X, y, minority_name, majority_name)

Returns minority and majority data

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples)) – The target values.

Return type:

tuple (array-like, shape = [n_samples, n_features], array-like, shape = [n_samples, n_features])

Returns:

Tuple of minority and majority class samples

partial_fit(X, y, classes=None): Partial fitting.

predict(X)

Predict classes for X.

Parameters: X (array-like, shape (n_samples, n_features)) – The input samples.
Return type: array-like, shape (n_samples, )
Returns: The predicted classes.

class strlearn.ensembles.OnlineBagging(base_estimator=None, n_estimators=10)

Bases: StreamingEnsemble

Online Bagging.

partial_fit(X, y, classes=None): Partial fitting
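
Online Bagging (Oza and Russell) approximates bootstrap resampling on a stream by presenting each incoming sample to every base estimator k times, where k is drawn from a Poisson(1) distribution. The block below is a minimal sketch of that sampling step only. The helpers `poisson` and `replication_counts` are hypothetical and not part of strlearn, and the library's own implementation may differ.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def replication_counts(n_estimators, rng, lam=1.0):
    """Number of times each base estimator would be shown one sample."""
    return [poisson(lam, rng) for _ in range(n_estimators)]


rng = random.Random(0)
counts = replication_counts(10, rng)
print(counts)  # one non-negative integer per base estimator
```

An estimator that draws 0 skips the sample entirely, which is what makes the members of the ensemble differ from one another.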

class strlearn.ensembles.REA(base_estimator=None, n_estimators=10, post_balance_ratio=0.5, k_parameter=10, weighted=False, pruning=False)

Bases: StreamingEnsemble

Recursive Ensemble Approach.

Sheng Chen, and Haibo He. “Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach.” Evolving Systems 2.1 (2011): 35-50.

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.ROSE(base_estimator=None, n_estimators=10, n_candidates=1, subspace_mean=0.7, buffer_limit=1000, min_lambda=4)

Bases: StreamingEnsemble

Robust Online Self-Adjusting Ensemble

ensemble_support_matrix(X): Ensemble support matrix.

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.SEA(base_estimator=None, n_estimators=10, metric=<function accuracy_score>)

Bases: StreamingEnsemble

Streaming Ensemble Algorithm.

Ensemble classifier composed of estimators trained on a fixed number of previously seen data chunks, pruning the worst one in the pool.

Parameters:

n_estimators (integer, optional (default=10)) – The maximum number of estimators trained using consecutive data chunks and maintained in the ensemble.
metric (function, optional (default=accuracy_score)) – The metric used to prune the worst classifier in the pool.

Variables:

ensemble (list of classifiers) – The collection of fitted sub-estimators.
classes (array-like, shape (n_classes, )) – The class labels.
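
The update described above (train one estimator per chunk, keep at most `n_estimators`, drop the worst according to `metric`) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `MajorityClassifier`, `accuracy` and `sea_step` are hypothetical names, and scoring every member on the newest chunk is an assumption about how "worst" is determined.

```python
from collections import Counter


class MajorityClassifier:
    """Toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sea_step(ensemble, X, y, n_estimators=10, metric=accuracy):
    """Add an estimator trained on the chunk, then prune if over capacity."""
    ensemble.append(MajorityClassifier().fit(X, y))
    if len(ensemble) > n_estimators:
        # Score every member on the newest chunk and drop the weakest one.
        scores = [metric(y, clf.predict(X)) for clf in ensemble]
        del ensemble[scores.index(min(scores))]
    return ensemble


X = [[0], [1], [2]]
ensemble = []
for y in ([0, 0, 1], [1, 1, 0], [1, 1, 1]):
    sea_step(ensemble, X, y, n_estimators=2)
labels = [clf.label_ for clf in ensemble]
```

After the third chunk the pool briefly holds three members; the one trained on the first chunk scores lowest on the newest labels and is removed.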

Example:

>>> import strlearn as sl
>>> stream = sl.streams.StreamGenerator()
>>> clf = sl.ensembles.SEA()
>>> evaluator = sl.evaluators.TestThenTrainEvaluator()
>>> evaluator.process(clf, stream)
>>> print(evaluator.scores_)
...
[[0.92       0.91879699 0.91848191 0.91879699 0.92523364]
[0.945      0.94648779 0.94624912 0.94648779 0.94240838]
[0.925      0.92364329 0.92360881 0.92364329 0.91017964]
...
[0.925      0.92427885 0.924103   0.92427885 0.92890995]
[0.89       0.89016179 0.89015879 0.89016179 0.88297872]
[0.935      0.93569212 0.93540766 0.93569212 0.93467337]]

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.StreamingEnsemble(base_estimator, n_estimators, weighted=False)

Bases: ClassifierMixin, BaseEstimator

Abstract, base ensemble streaming class

ensemble_support_matrix(X): Ensemble support matrix.

fit(X, y): Fitting.

minority_majority_name(y)

Returns minority and majority class names

Parameters: y (array-like, shape (n_samples)) – The target values.
Return type: tuple (object, object)
Returns: Tuple of minority and majority class names.

minority_majority_split(X, y, minority_name, majority_name)

Returns minority and majority data

Parameters:

X (array-like, shape (n_samples, n_features)) – The training input samples.
y (array-like, shape (n_samples)) – The target values.

Return type:

tuple (array-like, shape = [n_samples, n_features], array-like, shape = [n_samples, n_features])

Returns:

Tuple of minority and majority class samples
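
As a rough illustration of what `minority_majority_name` and `minority_majority_split` are documented to return, the sketch below reimplements both on plain Python lists for a two-class problem. The `_sketch` functions are hypothetical stand-ins; the library methods operate on array-like inputs and may handle edge cases (ties, a single class in a chunk) differently.

```python
from collections import Counter


def minority_majority_name_sketch(y):
    """Return (minority, majority) labels of a two-class label sequence."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, _), (minority, _) = counts.most_common(2)
    return minority, majority


def minority_majority_split_sketch(X, y, minority_name, majority_name):
    """Return (minority samples, majority samples) selected by label."""
    minority = [x for x, label in zip(X, y) if label == minority_name]
    majority = [x for x, label in zip(X, y) if label == majority_name]
    return minority, majority


X = [[0.1], [0.2], [0.3], [0.9]]
y = [0, 0, 0, 1]
minority_name, majority_name = minority_majority_name_sketch(y)
minority, majority = minority_majority_split_sketch(
    X, y, minority_name, majority_name
)
```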

msei(clf, X, y): MSEi score from original AWE algorithm.

mser(y): MSEr score from original AWE algorithm.

partial_fit(X, y, classes=None): Partial fitting

predict(X)

Predict classes for X.

Parameters: X (array-like, shape (n_samples, n_features)) – The input samples.
Return type: array-like, shape (n_samples, )
Returns: The predicted classes.
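
One common way to turn an ensemble support matrix into class predictions is to average the supports over the estimators and pick the class with the highest mean. The sketch below assumes a support matrix laid out as (n_estimators, n_samples, n_classes) and unweighted averaging; both are assumptions for illustration, and `predict_from_supports` is a hypothetical helper rather than a library function.

```python
def predict_from_supports(esm, classes):
    """Average supports over estimators, then take the arg-max class."""
    n_estimators = len(esm)
    n_samples = len(esm[0])
    predictions = []
    for s in range(n_samples):
        mean_support = [
            sum(esm[e][s][c] for e in range(n_estimators)) / n_estimators
            for c in range(len(classes))
        ]
        best = max(range(len(classes)), key=mean_support.__getitem__)
        predictions.append(classes[best])
    return predictions


# Two estimators, two samples, two classes.
esm = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.1, 0.9]],
]
predicted = predict_from_supports(esm, classes=["neg", "pos"])
```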

predict_proba(X): Predict proba.

prior_proba(y): Calculate prior probability for given labels

class strlearn.ensembles.UOB(base_estimator=None, n_estimators=5, time_decay_factor=0.9)

Bases: StreamingEnsemble

Undersampling-Based Online Bagging.

partial_fit(X, y, classes=None): Partial fitting

class strlearn.ensembles.WAE(base_estimator=None, n_estimators=10, theta=0.1, post_pruning=False, pruning_criterion='accuracy', weight_calculation_method='kuncheva', aging_method='weights_proportional', rejuvenation_power=0.0)

Bases: StreamingEnsemble

Weighted Aging Ensemble.

The method was inspired by the Accuracy Weighted Ensemble (AWE) algorithm, to which it introduces two main modifications: (I) classifier weights depend on the individual classifier accuracies and the time they have spent in the ensemble, (II) individual classifiers are chosen on the basis of a non-pairwise diversity measure.

Parameters:

base_estimator (ClassifierMixin class object) – Classification algorithm used as a base estimator.
n_estimators (integer, optional (default=10)) – The maximum number of estimators trained using consecutive data chunks and maintained in the ensemble.
theta (float, optional (default=0.1)) – Threshold for weight calculation method and aging procedure control.
post_pruning (boolean, optional (default=False)) – Whether the pruning is conducted before or after adding the classifier.
pruning_criterion (string, optional (default='accuracy')) – Selection of pruning criterion.
weight_calculation_method (string, optional (default='kuncheva')) – One of: same_for_each, proportional_to_accuracy, kuncheva, pta_related_to_whole, bell_curve.
aging_method (string, optional (default='weights_proportional')) – One of: weights_proportional, constant, gaussian.
rejuvenation_power (float, optional (default=0.0)) – Rejuvenation dynamics control of classifiers with high prediction accuracy.

Variables:

ensemble (list of classifiers) – The collection of fitted sub-estimators.
classes (array-like, shape (n_classes, )) – The class labels.
weights (array-like, shape (n_estimators, )) – Classifier weights.
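
To give a feel for how accuracy-based weighting schemes differ, the sketch below implements three simple rules suggested by the option names listed for `weight_calculation_method`. This is one plausible reading of those names and is not taken from the library source: the function names, the log-odds formula, and the `eps` guard are all assumptions.

```python
import math


def same_for_each(accuracies):
    """Every classifier gets the same weight."""
    return [1.0 for _ in accuracies]


def proportional_to_accuracy(accuracies):
    """Weights are accuracies normalised to sum to one."""
    total = sum(accuracies)
    return [a / total for a in accuracies]


def log_odds(accuracies, eps=1e-10):
    """Log-odds weighting, log(acc / (1 - acc)); eps avoids division by zero."""
    return [math.log((a + eps) / (1 - a + eps)) for a in accuracies]


accuracies = [0.5, 0.75, 0.75]
uniform = same_for_each(accuracies)
proportional = proportional_to_accuracy(accuracies)
odds = log_odds([0.1, 0.5, 0.9])
```

Under the log-odds rule a classifier at chance level (accuracy 0.5) receives zero weight and one below chance receives a negative weight, whereas the proportional rule keeps every weight positive.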

Examples:

>>> import strlearn as sl
>>> from sklearn.naive_bayes import GaussianNB
>>> stream = sl.streams.StreamGenerator()
>>> clf = sl.ensembles.WAE(GaussianNB())
>>> ttt = sl.evaluators.TestThenTrain(
...     metrics=(sl.metrics.balanced_accuracy_score,))
>>> ttt.process(stream, clf)
>>> print(ttt.scores)
[[[0.91386218]
  [0.93032581]
  [0.90907219]
  [0.90544872]
  [0.90466186]
  [0.91956783]
  [0.90776942]
  [0.92685422]
  [0.92895186]
  ...

partial_fit(X, y, classes=None): Partial fitting