Hierarchical Clustering

Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters. We apply a “bottom up” approach: each observation starts in its own cluster, and pairs of clusters are then merged step by step.

The merges are determined in a greedy manner. We start by constructing a pairwise distance matrix. Then, at each iteration, the two clusters with the smallest distance between them are merged.
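
The greedy procedure described above can be sketched in plain Python. This is only an illustration, not the CHierarchical implementation: the function name `agglomerate`, the single-linkage rule (distance between the closest members of two clusters), and the convention of giving each merged cluster a fresh id are all assumptions made for the example.

```python
import math
from itertools import combinations

def agglomerate(points, merges):
    """Greedy bottom-up clustering with single linkage (an assumed linkage rule)."""
    # Every observation starts in its own cluster, keyed by a cluster id.
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    history = []  # (id_a, id_b, distance) for each merge, in order

    def linkage(a, b):
        # Single linkage: distance between the closest members of two clusters.
        return min(math.dist(points[i], points[j])
                   for i in clusters[a] for j in clusters[b])

    for _ in range(min(merges, len(points) - 1)):
        # Pick the pair of clusters with the smallest distance.
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: linkage(*p))
        history.append((a, b, linkage(a, b)))
        # Replace the two clusters with their union under a fresh id (assumed convention).
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return history, list(clusters.values())

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0), (20.0, 20.0)]
history, clusters = agglomerate(points, merges=3)
for a, b, dist in history:
    print(f"merged {a} and {b} at distance {dist:.3f}")
```

With three merges on these five points, the two tight pairs are joined first, then the two resulting clusters are joined, and the outlying point remains on its own.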


Imagine we have files with the training data. We create CDenseFeatures (here 64-bit floats, a.k.a. RealFeatures) as:

features_train = RealFeatures(f_feats_train)
features_train = RealFeatures(f_feats_train);
RealFeatures features_train = new RealFeatures(f_feats_train);
features_train = Shogun::RealFeatures.new f_feats_train
features_train <- RealFeatures(f_feats_train)
features_train = shogun.RealFeatures(f_feats_train)
RealFeatures features_train = new RealFeatures(f_feats_train);
auto features_train = some<CDenseFeatures<float64_t>>(f_feats_train);

In order to run CHierarchical, we need to choose a distance, for example CEuclideanDistance or another sub-class of CDistance. The distance is initialized with the data we want to cluster.

distance = EuclideanDistance(features_train, features_train)
distance = EuclideanDistance(features_train, features_train);
EuclideanDistance distance = new EuclideanDistance(features_train, features_train);
distance = Shogun::EuclideanDistance.new features_train, features_train
distance <- EuclideanDistance(features_train, features_train)
distance = shogun.EuclideanDistance(features_train, features_train)
EuclideanDistance distance = new EuclideanDistance(features_train, features_train);
auto distance = some<CEuclideanDistance>(features_train, features_train);
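
Passing the same features as both arguments means distances are taken between every pair of training vectors. As a rough illustration of what such a pairwise Euclidean distance matrix contains, here is a minimal pure-Python sketch; the helper `pairwise_euclidean` is hypothetical and is not part of the library.

```python
import math

def pairwise_euclidean(points):
    """Return the symmetric matrix D with D[i][j] = Euclidean distance of points i and j."""
    n = len(points)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each distance once and mirror it across the diagonal.
            D[i][j] = D[j][i] = math.dist(points[i], points[j])
    return D

D = pairwise_euclidean([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)])
for row in D:
    print(row)
```

The diagonal is zero and the matrix is symmetric, so only the upper triangle needs to be computed.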

We then create an instance of CHierarchical, specifying the number of merge steps we expect to have in the training.

merges = 3
hierarchical = Hierarchical(merges, distance)
merges = 3;
hierarchical = Hierarchical(merges, distance);
int merges = 3;
Hierarchical hierarchical = new Hierarchical(merges, distance);
merges = 3
hierarchical = Shogun::Hierarchical.new merges, distance
merges <- 3
hierarchical <- Hierarchical(merges, distance)
merges = 3
hierarchical = shogun.Hierarchical(merges, distance)
int merges = 3;
Hierarchical hierarchical = new Hierarchical(merges, distance);
auto merges = 3;
auto hierarchical = some<CHierarchical>(merges, distance);

For each merging step, we can extract the two elements that were merged, as well as the distance between them:

d = hierarchical.get_merge_distances()
cp = hierarchical.get_cluster_pairs()
d = hierarchical.get_merge_distances();
cp = hierarchical.get_cluster_pairs();
DoubleMatrix d = hierarchical.get_merge_distances();
DoubleMatrix cp = hierarchical.get_cluster_pairs();
d = hierarchical.get_merge_distances 
cp = hierarchical.get_cluster_pairs 
d <- hierarchical$get_merge_distances()
cp <- hierarchical$get_cluster_pairs()
d = hierarchical:get_merge_distances()
cp = hierarchical:get_cluster_pairs()
double[] d = hierarchical.get_merge_distances();
int[,] cp = hierarchical.get_cluster_pairs();
auto d = hierarchical->get_merge_distances();
auto cp = hierarchical->get_cluster_pairs();
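
To show how a list of merged pairs can be turned back into groups of observations, here is a small pure-Python sketch. The id convention is an assumption made for the example (ids below the number of points are single observations, and the cluster created by merge k gets id n_points + k); it is not taken from the library, so check how get_cluster_pairs numbers clusters before reusing it. The helper `clusters_after_merges` and the sample `pairs` are hypothetical.

```python
def clusters_after_merges(n_points, pairs):
    """Replay merge pairs and return the resulting groups of observation indices.

    Assumed convention (hypothetical): ids 0..n_points-1 are single observations,
    and the cluster created by merge k receives id n_points + k.
    """
    members = {i: [i] for i in range(n_points)}
    for k, (a, b) in enumerate(pairs):
        # Remove the two merged clusters and store their union under the new id.
        members[n_points + k] = sorted(members.pop(a) + members.pop(b))
    return sorted(members.values())

# Made-up merge pairs for five observations and three merges.
pairs = [(0, 1), (2, 3), (5, 6)]
groups = clusters_after_merges(5, pairs)
print(groups)
```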