
class spkit.ml.DecisionTree(min_samples_split=2, min_impurity=1e-07, max_depth=inf, thresholdFromMean=False)

Super class of RegressionTree and ClassificationTree.

Optimizing depth:
  • In this version, a very large max_depth can be used to build the tree, which can later be shrunk to a lower depth (d) using “.updateTree(shrink=True, max_depth=d)”.

  • The optimal depth for the given data can be chosen by analysing the learning curve, using the “.getLcurve” method and/or “plotTree(Measures=True)”.

max_depth: int, > 0
  • maximum depth of the tree; default is inf, which tends to lead to overfitting

  • decrease max_depth to reduce overfitting.

min_samples_split: int
  • minimum number of samples to split further

min_impurity: float
  • minimum impurity (or gain) required to split

thresholdFromMean: bool, default = False.
  • if True, the threshold is selected from the mean of two consecutive unique values of the feature; otherwise it is selected as one of the unique values of the feature.

    Only applicable to float or int type features, not to categorical type.
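The two threshold choices described for thresholdFromMean can be illustrated with a small standalone sketch. It does not use spkit; the function name candidate_thresholds and its from_mean argument are hypothetical, and the library's internal implementation may differ.

```python
def candidate_thresholds(values, from_mean=False):
    """Return candidate split thresholds for one numeric feature.

    Hypothetical helper for illustration only (not part of spkit).
    """
    uniq = sorted(set(values))
    if not from_mean:
        # Each unique value of the feature is itself a candidate threshold.
        return uniq
    # Midpoints between each pair of consecutive unique values.
    return [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]


feature = [3, 1, 2, 2, 5]
print(candidate_thresholds(feature))                  # [1, 2, 3, 5]
print(candidate_thresholds(feature, from_mean=True))  # [1.5, 2.5, 4.0]
```

Midpoint thresholds fall between observed values, which is why the option only makes sense for numeric features.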


DictDepth(DT, n=0)

Get the maximum depth of a dictionary
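As a rough illustration of what "maximum depth of a dictionary" means, the sketch below computes the nesting depth of a plain nested dict recursively. The function name dict_depth and the keys "T", "F" and "leaf" in the toy tree are assumptions for this example; the layout of self.tree in spkit is not documented here and may differ.

```python
def dict_depth(d, n=0):
    """Maximum nesting depth of a dictionary.

    Hypothetical stand-in for illustration; a non-dict (or empty dict)
    ends the recursion and reports the depth reached so far.
    """
    if not isinstance(d, dict) or not d:
        return n
    return max(dict_depth(v, n + 1) for v in d.values())


# Toy nested dictionary with an assumed layout (not spkit's actual format).
toy_tree = {"T": {"T": {"leaf": 1}, "F": {"leaf": 0}}, "F": {"leaf": 1}}
print(dict_depth(toy_tree))  # 3
```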

fit(X, y)

Build a tree and save it as a dictionary in self.tree

X: ndarray
  • (number of samples, number of features)

y: list or 1D array
  • labels

verbose: int (default=0)
  • 0 - no progress or tree

  • 1 - show progress

  • 2 - show tree

getLcurve(Xt=None, yt=None, Xs=None, ys=None, measure='acc')

Get the learning curve. Given training and testing data, compute the given measure for each level of depth.

Xt, yt: training data
Xs, ys: testing data
measure: str, metric

default=’acc’ for accuracy

Lcurve: dict,
  • as training and testing measures at each level of depth
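One simple way to read a learning curve is to take the smallest depth at which the testing measure peaks. The sketch below assumes the testing scores are available as a mapping from depth to score; that layout, the function name best_depth, and the toy numbers (made up purely for illustration) are assumptions and not the documented format of the returned Lcurve dict.

```python
def best_depth(test_score_by_depth):
    """Smallest depth whose testing score equals the best testing score.

    Hypothetical helper; expects a {depth: score} mapping.
    """
    best = max(test_score_by_depth.values())
    return min(d for d, s in test_score_by_depth.items() if s == best)


# Toy testing accuracies per depth, invented for this example only.
toy_test_acc = {1: 0.70, 2: 0.82, 3: 0.88, 4: 0.88, 5: 0.85}
print(best_depth(toy_test_acc))  # 3
```

Preferring the smallest depth among ties favours the simpler tree, in line with the advice to reduce max_depth to limit overfitting.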


Get the maximum depth of the tree


Extract built tree

plotLcurve(ax=None, title=True)

Plotting Learning Curve

After computing the learning curve using getLcurve, the learning curve can be plotted

ax: axis to plot
title: bool, whether to show the title

plotTree(scale=True, show=True, showtitle=True, showDirection=False, DiffBranchColor=True, legend=True, showNodevalues=True, showThreshold=True, hlines=False, Measures=False, dcml=0, leaf_labels=None)

Plot Decision Tree

plotTreePath(path, ax=None, fig=None)

Plotting the path of the tree

predict(X, max_depth=inf, treePath=False)

Predict labels

Use “max_depth” to limit the depth of the tree for expected values

Compute the expected value for each sample in X, up to depth=max_depth of the tree

For classification: the expected value is the label with maximum probability among the samples at the leaf node. For the probability and count of each label at the leaf node, use the “.predict_proba” method

For regression: the expected value is the mean value of the samples at the leaf node. For the standard deviation and number of samples at the leaf node, use the “.predict_proba” method

X: ndarray
  • (number of samples, number of features)

max_depth: int, default=np.inf
  • maximum depth of the tree to use for the prediction

treePath: bool, Default=False
  • if True, path of the tree is also returned, such as ‘TTFTFT…’

y: list or 1D array
  • labels

  • and paths, if treePath is True
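The expected value described above can be sketched without spkit: given the training targets that ended up at one leaf, classification takes the most frequent label and regression takes the mean. The function name leaf_expected_value and its task argument are hypothetical and only illustrate the rule, not the library's code.

```python
from collections import Counter
from statistics import mean


def leaf_expected_value(leaf_targets, task="classification"):
    """Expected value at a leaf, given the training targets that reached it.

    Hypothetical helper for illustration (not part of spkit).
    """
    if task == "classification":
        # Label with the highest count, i.e. the highest probability at the leaf.
        return Counter(leaf_targets).most_common(1)[0][0]
    # Regression: mean of the target values at the leaf.
    return mean(leaf_targets)


print(leaf_expected_value(["a", "b", "a"]))                  # a
print(leaf_expected_value([1.0, 2.0, 3.0], task="regression"))  # 2.0
```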

predict_proba(X, label_counts=False, max_depth=inf, treePath=False)

Predict probabilities of labels

Use “max_depth” to limit the depth of the tree for expected values

Compute the probability/SD for labels at the leaf, up to the max_depth level, for each sample in X

For classification: returns the probability of each label for each sample, along with the set of labels. With label_counts=True, the return also includes the counts of labels at the leaf

For regression: returns the standard deviation of the values at the leaf node; the mean value is returned by the “.predict” method. With label_counts=True, the return also includes the number of samples at the leaf

treePath=True: includes the path of the tree for each sample as a string

X: ndarray
  • (number of samples, number of features)

max_depth: int, default=np.inf
  • maximum depth of the tree to use for the prediction

label_counts: bool, default=False
  • If true
    • count of each class labels are returned (classification)

    • count of samples at the leaf returned (regression)

treePath: bool, Default=False
  • if True, path of the tree is also returned, such as ‘TTFTFT…’

y_prob: 1D/2D array
  • probabilities of labels (classification) or standard deviations (regression)

y_counts: count of labels
y_paths: path of each sample
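The quantities returned at a leaf can be illustrated with a standalone sketch: label probabilities and counts for classification, and standard deviation with sample count for regression. The function names are hypothetical, and the use of the population standard deviation (pstdev) is an assumption, since the text does not say which variant the library computes.

```python
from collections import Counter
from statistics import pstdev


def leaf_label_probabilities(leaf_labels):
    """Return (labels, probabilities, counts) for the labels at one leaf.

    Hypothetical helper for illustration (not part of spkit).
    """
    counts = Counter(leaf_labels)
    labels = sorted(counts)
    total = len(leaf_labels)
    probs = [counts[label] / total for label in labels]
    return labels, probs, [counts[label] for label in labels]


def leaf_spread(leaf_values):
    """Return (standard deviation, number of samples) for one regression leaf.

    Assumes the population standard deviation; the library may differ.
    """
    return pstdev(leaf_values), len(leaf_values)


print(leaf_label_probabilities(["a", "b", "a", "a"]))  # (['a', 'b'], [0.75, 0.25], [3, 1])
print(leaf_spread([2.0, 4.0]))                         # (1.0, 2)
```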

Pruning the tree

DT: A decision Tree
DT: Pruned Tree

Set Feature names

set_xyNode(DT, lxy=[0, 1], xy=[1, 1], rxy=[2, 1], ldiff=1)

Setting the xy location of each node in the tree

showTree(DT, DiffBranchColor=False, showNodevalues=True, showThreshold=True)

Helper function for plotTree

shrinkTree(DT, max_depth=inf)

Shrinking the tree

DT: A decision Tree
max_depth: int
  • depth to which the tree is to be shrunk

DT: Shrunk Tree

treeDepth(DT, mx=0)

Compute the maximum depth of the tree

updateTree(DT=None, shrink=False, max_depth=inf)

Updating Tree with Shrinking and Pruning

DT: A decision Tree,
  • if None, root tree self.tree is used

max_depth: int
  • depth to which the tree is to be shrunk

shrink: bool,
  • whether to shrink the tree

DT: Updated Tree
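Shrinking a tree to a given depth can be pictured as cutting off every branch below that depth, so that the nodes at max_depth act as leaves. The sketch below works on a toy nested dict in which every node stores its own "value" and optional "T"/"F" children; this layout and the function name shrink are assumptions for illustration and are not spkit's internal tree format.

```python
import copy


def shrink(node, max_depth, depth=0):
    """Return a copy of node with all branches below max_depth removed.

    Hypothetical helper; the input tree is left unchanged.
    """
    node = copy.copy(node)  # shallow copy, children are replaced below
    if depth >= max_depth:
        # Drop the children so this node becomes a leaf.
        node.pop("T", None)
        node.pop("F", None)
        return node
    for key in ("T", "F"):
        if key in node:
            node[key] = shrink(node[key], max_depth, depth + 1)
    return node


# Toy tree with an assumed layout (not spkit's actual format).
toy = {
    "value": 0.5,
    "T": {"value": 0.9, "T": {"value": 1.0}, "F": {"value": 0.8}},
    "F": {"value": 0.1},
}
print(shrink(toy, 1))  # {'value': 0.5, 'T': {'value': 0.9}, 'F': {'value': 0.1}}
```

Because each node keeps its own value in this sketch, a truncated node can still produce a prediction, which mirrors predicting with a smaller max_depth.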