|This class implements the CHAID algorithm proposed by Kass (1980) for decision tree learning. CHAID consists of three steps: merging, splitting and stopping. A tree is grown by repeatedly using these three steps on each node starting from the root node. CHAID accepts nominal or ordinal categorical predictors only. If predictors are continuous, they have to be transformed into ordinal predictors before tree growing. |
CONVERTING CONTINUOUS PREDICTORS TO ORDINAL :
Continuous predictors are converted to ordinal predictors by binning. The number of bins (K) must be supplied by the user. Given K, the predictor's distinct values are divided so that each bin contains approximately the same number of distinct values. The maximum predictor value in each bin is used as that bin's breakpoint.
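The binning rule can be illustrated with a minimal Python sketch. It is not this class's implementation: the function names `ordinal_breakpoints` and `to_ordinal`, the rounding used to balance bin sizes, and the clamping of values above the last breakpoint are all assumptions made for illustration.

```python
import bisect
from typing import List, Sequence


def ordinal_breakpoints(values: Sequence[float], k: int) -> List[float]:
    """Return one breakpoint per bin: the largest distinct value in that bin.

    Bins are formed over the sorted distinct values so that each bin holds
    roughly the same number of distinct values (hypothetical helper).
    """
    if k < 1 or len(values) == 0:
        raise ValueError("need at least one bin and one value")
    distinct = sorted(set(values))
    n = len(distinct)
    k = min(k, n)  # cannot have more bins than distinct values
    breakpoints = []
    for i in range(1, k + 1):
        end = (i * n) // k  # exclusive end index of the i-th bin
        breakpoints.append(distinct[end - 1])
    return breakpoints


def to_ordinal(value: float, breakpoints: Sequence[float]) -> int:
    """Map a continuous value to the index of the first bin whose breakpoint is >= value."""
    index = bisect.bisect_left(breakpoints, value)
    # Assumption: values above the last breakpoint fall into the last bin.
    return min(index, len(breakpoints) - 1)


data = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10]
cuts = ordinal_breakpoints(data, 3)
print(cuts)                                # [3, 6, 10]
print([to_ordinal(v, cuts) for v in data])
```

Duplicates in the data do not shift the breakpoints, because the bins are balanced over distinct values rather than over observations.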
During the merging step, allowable pairs of categories of a predictor are evaluated for similarity. If the similarity of a pair exceeds a threshold, its two categories are merged into a single category. The process is repeated until no remaining pair is sufficiently similar. Similarity between categories is evaluated using the p_value: the larger the p_value, the more similar the two categories.
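A rough Python sketch of the merging loop follows, under several stated assumptions that are not confirmed by this documentation: the target is binary, the predictor is ordinal so only adjacent categories are allowable pairs, the p_value comes from a Pearson chi-squared test on the 2x2 table of the pair against the target, and the threshold is a hypothetical parameter named `alpha_merge`.

```python
import math
from typing import List, Tuple

Counts = Tuple[int, int]  # (class-0 count, class-1 count) for one category


def pair_p_value(a: Counts, b: Counts) -> float:
    """Pearson chi-squared p_value (1 degree of freedom) for two categories."""
    total = sum(a) + sum(b)
    col = (a[0] + b[0], a[1] + b[1])
    if sum(a) == 0 or sum(b) == 0 or 0 in col:
        return 1.0  # degenerate table: treat the pair as indistinguishable
    chi2 = 0.0
    for row in (a, b):
        for j in (0, 1):
            expected = sum(row) * col[j] / total
            chi2 += (row[j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def merge_ordinal(groups: List[List[str]], counts: List[Counts],
                  alpha_merge: float = 0.05):
    """Repeatedly merge the most similar adjacent pair while its p_value > alpha_merge."""
    groups = [list(g) for g in groups]
    counts = list(counts)
    while len(groups) > 1:
        p_values = [pair_p_value(counts[i], counts[i + 1])
                    for i in range(len(groups) - 1)]
        best = max(range(len(p_values)), key=p_values.__getitem__)
        if p_values[best] <= alpha_merge:
            break  # no remaining pair is similar enough
        merged = (counts[best][0] + counts[best + 1][0],
                  counts[best][1] + counts[best + 1][1])
        groups[best:best + 2] = [groups[best] + groups[best + 1]]
        counts[best:best + 2] = [merged]
    return groups, counts


merged_groups, merged_counts = merge_ordinal(
    [["low"], ["mid"], ["high"]], [(40, 10), (38, 12), (10, 40)])
print(merged_groups)  # [['low', 'mid'], ['high']]
```

In this example "low" and "mid" have nearly identical class distributions, so they merge, while "high" stays separate.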
The splitting step selects the predictor that best splits the node. Selection is accomplished by comparing the adjusted p_value associated with each predictor; the predictor with the smallest adjusted p_value is chosen to split the node.
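The selection rule amounts to taking a minimum over the predictors, as in the short Python sketch below. The function name `choose_split_predictor` and the dictionary input are hypothetical, and the adjusted p_values are assumed to have been computed already after merging.

```python
from typing import Dict, Optional


def choose_split_predictor(adjusted_p_values: Dict[str, float]) -> Optional[str]:
    """Return the predictor with the smallest adjusted p_value, or None if there are none."""
    if not adjusted_p_values:
        return None
    # Ties resolve to the predictor that appears first in the dictionary.
    return min(adjusted_p_values, key=adjusted_p_values.get)


candidates = {"age": 0.03, "income": 0.001, "region": 0.2}
print(choose_split_predictor(candidates))  # income
```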
The tree growing process stops if any of the following conditions is satisfied :