Ensembles module

StreamingEnsemble(base_estimator, n_estimators)

Abstract, base ensemble streaming class

AUE([base_estimator, n_estimators, ...])

Accuracy Updated Ensemble

AWE([base_estimator, n_estimators, n_splits])

Accuracy Weighted Ensemble

DWM([base_estimator, beta, theta, p, weighted])

KMC([base_estimator, n_estimators])

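Wang, Yi, Yang Zhang, and Yong Wang. "Mining data streams with skewed distribution by static classifier ensemble." Opportunities and Challenges for Next-Generation Applied Intelligence. Springer, Berlin, Heidelberg, 2009. 65-71.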

CDS([base_estimator, n_estimators, a, b])

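Ditzler, Gregory, and Robi Polikar. "Incremental learning of concept drift from streaming imbalanced data." IEEE Transactions on Knowledge and Data Engineering 25.10 (2013): 2283-2301.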

NIE([base_estimator, n_estimators, param_a, ...])

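Ditzler, Gregory, and Robi Polikar. "Incremental learning of concept drift from streaming imbalanced data." IEEE Transactions on Knowledge and Data Engineering 25.10 (2013): 2283-2301.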

OnlineBagging([base_estimator, n_estimators])

Online Bagging.

OOB([base_estimator, n_estimators, ...])

Oversampling-Based Online Bagging.

OUSE([base_estimator, n_estimators, n_chunks])

Gao, Jing, et al. "Classifying Data Streams with Skewed Class Distributions and Concept Drifts." IEEE Internet Computing 12.6 (2008): 37-49.

REA([base_estimator, n_estimators, ...])

Recursive Ensemble Approach.

SEA([base_estimator, n_estimators, metric])

Streaming Ensemble Algorithm.

UOB([base_estimator, n_estimators, ...])

Undersampling-Based Online Bagging.

WAE([base_estimator, n_estimators, theta, ...])

Weighted Aging Ensemble.

KUE([base_estimator, n_estimators, n_candidates])

Kappa Updated Ensemble

class strlearn.ensembles.AUE(base_estimator=None, n_estimators=10, n_splits=5, epsilon=1e-10)

Bases: StreamingEnsemble

Accuracy Updated Ensemble

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.AWE(base_estimator=None, n_estimators=10, n_splits=5)

Bases: StreamingEnsemble

Accuracy Weighted Ensemble

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.CDS(base_estimator=None, n_estimators=10, a=2, b=2)

Bases: StreamingEnsemble

Ditzler, Gregory, and Robi Polikar. “Incremental learning of concept drift from streaming imbalanced data.” IEEE Transactions on Knowledge and Data Engineering 25.10 (2013): 2283-2301.

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.DWM(base_estimator=None, beta=0.5, theta=0.01, p=1, weighted=False)

Bases: StreamingEnsemble

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.KMC(base_estimator=None, n_estimators=10)

Bases: StreamingEnsemble

Wang, Yi, Yang Zhang, and Yong Wang. “Mining data streams with skewed distribution by static classifier ensemble.” Opportunities and Challenges for Next-Generation Applied Intelligence. Springer, Berlin, Heidelberg, 2009. 65-71.

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.KUE(base_estimator=None, n_estimators=10, n_candidates=1)

Bases: StreamingEnsemble

Kappa Updated Ensemble

ensemble_support_matrix(X)

Ensemble support matrix.

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.NIE(base_estimator=None, n_estimators=5, param_a=1, param_b=1)

Bases: StreamingEnsemble

Ditzler, Gregory, and Robi Polikar. “Incremental learning of concept drift from streaming imbalanced data.” IEEE Transactions on Knowledge and Data Engineering 25.10 (2013): 2283-2301.

ensemble_support_matrix(X)

Ensemble support matrix.

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.OOB(base_estimator=None, n_estimators=5, time_decay_factor=0.9)

Bases: StreamingEnsemble

Oversampling-Based Online Bagging.

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.OUSE(base_estimator=None, n_estimators=10, n_chunks=10)

Bases: ClassifierMixin, BaseEnsemble

Gao, Jing, et al. “Classifying Data Streams with Skewed Class Distributions and Concept Drifts.” IEEE Internet Computing 12.6 (2008): 37-49.

ensemble_support_matrix(X)

Ensemble support matrix.

fit(X, y)

Fitting.

minority_majority_name(y)

Returns the minority and majority class names.

Parameters:

y (array-like, shape (n_samples)) – The target values.

Return type:

tuple (object, object)

Returns:

Tuple of minority and majority class names.

minority_majority_split(X, y, minority_name, majority_name)

Returns minority and majority data

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples)) – The target values.

Return type:

tuple (array-like, shape = [n_samples, n_features], array-like, shape = [n_samples, n_features])

Returns:

Tuple of minority and majority class samples

partial_fit(X, y, classes=None)

Partial fitting.

predict(X)

Predict classes for X.

Parameters:

X (array-like, shape (n_samples, n_features)) – The input samples to classify.

Return type:

array-like, shape (n_samples, )

Returns:

The predicted classes.

class strlearn.ensembles.OnlineBagging(base_estimator=None, n_estimators=10)

Bases: StreamingEnsemble

Online Bagging.

partial_fit(X, y, classes=None)

Partial fitting
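
For orientation, online bagging in the sense of Oza and Russell replaces bootstrap resampling with per-sample Poisson(1) counts: each ensemble member trains on each incoming sample k times, with k drawn independently from Poisson(1). The sketch below illustrates only that sampling step. Whether strlearn's OnlineBagging follows it in every detail is an assumption, and `poisson` and `replication_counts` are hypothetical helpers, not part of the library.

```python
import math
import random


def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def replication_counts(n_samples, n_estimators, rng):
    """For each ensemble member, how many times it should see each sample."""
    return [
        [poisson(1.0, rng) for _ in range(n_samples)]
        for _ in range(n_estimators)
    ]


rng = random.Random(0)
counts = replication_counts(n_samples=5, n_estimators=3, rng=rng)
# counts[m][i] is the number of times member m trains on sample i;
# a count of 0 means the member skips that sample.
```

Because the counts are drawn per sample, no chunk has to be stored and resampled, which is what makes the scheme usable through partial_fit.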

class strlearn.ensembles.REA(base_estimator=None, n_estimators=10, post_balance_ratio=0.5, k_parameter=10, weighted=False, pruning=False)

Bases: StreamingEnsemble

Recursive Ensemble Approach.

Sheng Chen, and Haibo He. “Towards incremental learning of nonstationary imbalanced data stream: a multiple selectively recursive approach.” Evolving Systems 2.1 (2011): 35-50.

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.ROSE(base_estimator=None, n_estimators=10, n_candidates=1, subspace_mean=0.7, buffer_limit=1000, min_lambda=4)

Bases: StreamingEnsemble

Robust Online Self-Adjusting Ensemble

ensemble_support_matrix(X)

Ensemble support matrix.

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.SEA(base_estimator=None, n_estimators=10, metric=<function accuracy_score>)

Bases: StreamingEnsemble

Streaming Ensemble Algorithm.

Ensemble classifier composed of estimators trained on a fixed number of previously seen data chunks. When the pool is full, the worst estimator is pruned.

Parameters:
  • n_estimators (integer, optional (default=10)) – The maximum number of estimators trained using consecutive data chunks and maintained in the ensemble.

  • metric (function, optional (default=accuracy_score)) – The metric used to prune the worst classifier in the pool.

Variables:
  • ensemble (list of classifiers) – The collection of fitted sub-estimators.

  • classes (array-like, shape (n_classes, )) – The class labels.

Example:

>>> import strlearn as sl
>>> stream = sl.streams.StreamGenerator()
>>> clf = sl.ensembles.SEA()
>>> evaluator = sl.evaluators.TestThenTrainEvaluator()
>>> evaluator.process(clf, stream)
>>> print(evaluator.scores_)
...
[[0.92       0.91879699 0.91848191 0.91879699 0.92523364]
 [0.945      0.94648779 0.94624912 0.94648779 0.94240838]
 [0.925      0.92364329 0.92360881 0.92364329 0.91017964]
 ...
 [0.925      0.92427885 0.924103   0.92427885 0.92890995]
 [0.89       0.89016179 0.89015879 0.89016179 0.88297872]
 [0.935      0.93569212 0.93540766 0.93569212 0.93467337]]

partial_fit(X, y, classes=None)

Partial fitting
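
The pool maintenance described for SEA (one estimator trained per chunk, at most n_estimators kept, the worst one pruned according to metric) can be sketched as follows. This is a minimal illustration, not strlearn's implementation: `MajorityClassifier`, `accuracy` and `update_pool` are hypothetical names, and scoring every member on the newest chunk is an assumption about how "worst" is decided.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in for a base estimator: predicts the most frequent label it saw."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


def accuracy(y_true, y_pred):
    """Fraction of matching labels (plays the role of the `metric` parameter)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def update_pool(pool, X, y, n_estimators=10, metric=accuracy):
    """Add a model trained on the current chunk; if the pool is too large, drop the worst."""
    pool.append(MajorityClassifier().fit(X, y))
    if len(pool) > n_estimators:
        # Assumption: members are scored on the newest chunk.
        scores = [metric(y, clf.predict(X)) for clf in pool]
        pool.pop(scores.index(min(scores)))
    return pool


# Three chunks; the pool may hold at most two members.
chunks = [
    ([[0.0], [0.1], [0.2]], [0, 0, 1]),
    ([[0.3], [0.4], [0.5]], [1, 1, 0]),
    ([[0.6], [0.7], [0.8]], [1, 1, 1]),
]
pool = []
for X, y in chunks:
    update_pool(pool, X, y, n_estimators=2)

kept_labels = [clf.label_ for clf in pool]
```

In this toy run the member trained on the first chunk scores lowest on the third chunk and is the one removed.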

class strlearn.ensembles.StreamingEnsemble(base_estimator, n_estimators, weighted=False)

Bases: ClassifierMixin, BaseEstimator

Abstract, base ensemble streaming class

ensemble_support_matrix(X)

Ensemble support matrix.

fit(X, y)

Fitting.

minority_majority_name(y)

Returns the minority and majority class names.

Parameters:

y (array-like, shape (n_samples)) – The target values.

Return type:

tuple (object, object)

Returns:

Tuple of minority and majority class names.

minority_majority_split(X, y, minority_name, majority_name)

Returns minority and majority data

Parameters:
  • X (array-like, shape (n_samples, n_features)) – The training input samples.

  • y (array-like, shape (n_samples)) – The target values.

Return type:

tuple (array-like, shape = [n_samples, n_features], array-like, shape = [n_samples, n_features])

Returns:

Tuple of minority and majority class samples
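
What the two helpers above return can be pictured with plain Python lists. The functions below reuse the documented names but are standalone illustrations written for a binary problem; they are not the library code, and the tie-breaking behaviour for equally frequent classes is left unspecified.

```python
from collections import Counter


def minority_majority_name(y):
    """Return (minority, majority) class labels of a binary target vector."""
    counts = Counter(y).most_common()  # sorted from most to least frequent
    majority, minority = counts[0][0], counts[-1][0]
    return minority, majority


def minority_majority_split(X, y, minority_name, majority_name):
    """Return (minority samples, majority samples) as two lists of rows."""
    minority = [x for x, label in zip(X, y) if label == minority_name]
    majority = [x for x, label in zip(X, y) if label == majority_name]
    return minority, majority


X = [[0.1, 1.0], [0.2, 0.9], [0.8, 0.1], [0.3, 0.7], [0.25, 0.95]]
y = [0, 0, 1, 0, 0]

minority_name, majority_name = minority_majority_name(y)
X_min, X_maj = minority_majority_split(X, y, minority_name, majority_name)
```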

msei(clf, X, y)

MSEi score from original AWE algorithm.

mser(y)

MSEr score from original AWE algorithm.
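
In the AWE paper by Wang et al. (2003), MSEr is the error of a classifier that predicts according to the class priors, sum over classes c of p(c) * (1 - p(c))^2, and MSEi is the mean over the chunk of (1 - f_c(x))^2, where f_c(x) is the probability member i assigns to the true class c of sample x. A member's weight is then MSEr - MSEi. The sketch below works on plain lists. The function names mirror the documented methods, but the signatures differ: `msei` here takes precomputed probabilities instead of a classifier, an assumption made to keep the example self-contained.

```python
from collections import Counter


def prior_proba(y):
    """Class priors p(c) estimated as label frequencies."""
    counts = Counter(y)
    return {label: n / len(y) for label, n in counts.items()}


def mser(y):
    """MSE of a classifier that predicts according to the class priors."""
    return sum(p * (1 - p) ** 2 for p in prior_proba(y).values())


def msei(probas, y):
    """MSE of one member: mean of (1 - probability given to the true class)^2.

    `probas` is a list of {label: probability} dicts, one per sample.
    """
    return sum((1 - p[label]) ** 2 for p, label in zip(probas, y)) / len(y)


y = [0, 0, 1, 1]
probas = [
    {0: 0.9, 1: 0.1},
    {0: 0.8, 1: 0.2},
    {0: 0.3, 1: 0.7},
    {0: 0.4, 1: 0.6},
]

baseline = mser(y)         # 0.5 * 0.25 + 0.5 * 0.25 = 0.25
error = msei(probas, y)    # (0.01 + 0.04 + 0.09 + 0.16) / 4, about 0.075
weight = baseline - error  # positive: better than the prior-based guess
```

A member whose weight is zero or negative does no better than guessing from the priors.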

partial_fit(X, y, classes=None)

Partial fitting

predict(X)

Predict classes for X.

Parameters:

X (array-like, shape (n_samples, n_features)) – The input samples to classify.

Return type:

array-like, shape (n_samples, )

Returns:

The predicted classes.

predict_proba(X)

Predict proba.
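
One way to picture how ensemble_support_matrix, predict_proba and predict relate: the support matrix holds one block of class supports per ensemble member, with shape (n_estimators, n_samples, n_classes); averaging over members gives per-sample probabilities, and the class with the highest average is the prediction. The sketch below assumes plain unweighted averaging, which is not confirmed here as the library's combination rule, and `average_support` and `predict_from_support` are hypothetical names.

```python
def average_support(support_matrix):
    """Average per-member supports; input shape (n_estimators, n_samples, n_classes)."""
    n_estimators = len(support_matrix)
    n_samples = len(support_matrix[0])
    n_classes = len(support_matrix[0][0])
    return [
        [
            sum(member[i][c] for member in support_matrix) / n_estimators
            for c in range(n_classes)
        ]
        for i in range(n_samples)
    ]


def predict_from_support(support_matrix, classes):
    """Pick, for each sample, the class with the highest averaged support."""
    averaged = average_support(support_matrix)
    return [classes[row.index(max(row))] for row in averaged]


# Two members, three samples, two classes.
support_matrix = [
    [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]],
    [[0.7, 0.3], [0.7, 0.3], [0.1, 0.9]],
]
classes = ["neg", "pos"]

proba = average_support(support_matrix)
labels = predict_from_support(support_matrix, classes)
```

The second sample shows why the members are combined: they disagree individually, and the averaged support decides the label.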

prior_proba(y)

Calculate prior probability for given labels

class strlearn.ensembles.UOB(base_estimator=None, n_estimators=5, time_decay_factor=0.9)

Bases: StreamingEnsemble

Undersampling-Based Online Bagging.

partial_fit(X, y, classes=None)

Partial fitting

class strlearn.ensembles.WAE(base_estimator=None, n_estimators=10, theta=0.1, post_pruning=False, pruning_criterion='accuracy', weight_calculation_method='kuncheva', aging_method='weights_proportional', rejuvenation_power=0.0)

Bases: StreamingEnsemble

Weighted Aging Ensemble.

The method is inspired by the Accuracy Weighted Ensemble (AWE) algorithm, to which it introduces two main modifications: (I) classifier weights depend on the individual classifier accuracies and on the time the classifiers have spent in the ensemble, and (II) individual classifiers are chosen on the basis of a non-pairwise diversity measure.

Parameters:
  • base_estimator (ClassifierMixin class object) – Classification algorithm used as a base estimator.

  • n_estimators (integer, optional (default=10)) – The maximum number of estimators trained using consecutive data chunks and maintained in the ensemble.

  • theta (float, optional (default=0.1)) – Threshold for weight calculation method and aging procedure control.

  • post_pruning (boolean, optional (default=False)) – Whether the pruning is conducted before or after adding the classifier.

  • pruning_criterion (string, optional (default='accuracy')) – Selection of pruning criterion.

  • weight_calculation_method (string, optional (default='kuncheva')) – Weight calculation method; one of same_for_each, proportional_to_accuracy, kuncheva, pta_related_to_whole, bell_curve.

  • aging_method (string, optional (default='weights_proportional')) – Aging method; one of weights_proportional, constant, gaussian.

  • rejuvenation_power (float, optional (default=0.0)) – Rejuvenation dynamics control of classifiers with high prediction accuracy.

Variables:
  • ensemble (list of classifiers) – The collection of fitted sub-estimators.

  • classes (array-like, shape (n_classes, )) – The class labels.

  • weights (array-like, shape (n_estimators, )) – Classifier weights.

Examples:

>>> import strlearn as sl
>>> from sklearn.naive_bayes import GaussianNB
>>> stream = sl.streams.StreamGenerator()
>>> clf = sl.ensembles.WAE(GaussianNB())
>>> ttt = sl.evaluators.TestThenTrain(
...     metrics=(sl.metrics.balanced_accuracy_score,))
>>> ttt.process(stream, clf)
>>> print(ttt.scores)
[[[0.91386218]
  [0.93032581]
  [0.90907219]
  [0.90544872]
  [0.90466186]
  [0.91956783]
  [0.90776942]
  [0.92685422]
  [0.92895186]
  ...

partial_fit(X, y, classes=None)

Partial fitting