Synthetic

Cluster Analysis

SD1: 700 data points in five well-separated clusters.

SD2: 1400 data points with no clear cluster structure.

Semi-Supervised Clustering (SSC)

Must-link (ML) and cannot-link (CL) pairs: sample data generated randomly from a Gaussian distribution, with 70 data points, six ML pairs, and six CL pairs. “Data_SSC_points” contains the data points, while “Data_SSC_ML” and “Data_SSC_CL” give the locations of the data points in the ML and CL pairs, respectively. The figure below illustrates the data set: the data points are grouped into four clusters, and the points in each cluster share a color (orange for the first cluster, green for the second, purple for the third, and black for the fourth). Points in an ML pair are joined by solid blue lines, and points in a CL pair by dashed red lines.
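To make the structure of such a data set concrete, the following is a minimal sketch of how a comparable sample could be generated and checked. It is not the procedure used to produce the files above. The function names (make_ssc_sample, count_violations), the two-dimensional points, the cluster centers, the unit standard deviation, and the representation of each pair as two point indices are all assumptions made for illustration.

```python
import numpy as np


def make_ssc_sample(n_points=70, n_clusters=4, n_ml=6, n_cl=6, seed=0):
    """Draw Gaussian points around a few centers and sample ML/CL pairs.

    All defaults mirror the counts in the description; the geometry
    (2-D points, centers on a circle of radius 10, unit standard
    deviation) is an assumption, not taken from the data files.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical cluster centers, evenly spaced on a circle.
    angles = 2 * np.pi * np.arange(n_clusters) / n_clusters
    centers = 10.0 * np.column_stack([np.cos(angles), np.sin(angles)])

    # Assign points to clusters in round-robin order so none is empty.
    labels = np.arange(n_points) % n_clusters
    points = centers[labels] + rng.normal(scale=1.0, size=(n_points, 2))

    # Sample distinct index pairs: same cluster -> ML, different -> CL.
    ml, cl = set(), set()
    while len(ml) < n_ml or len(cl) < n_cl:
        i, j = rng.choice(n_points, size=2, replace=False)
        pair = (int(min(i, j)), int(max(i, j)))
        if labels[i] == labels[j] and len(ml) < n_ml:
            ml.add(pair)
        elif labels[i] != labels[j] and len(cl) < n_cl:
            cl.add(pair)
    return points, labels, sorted(ml), sorted(cl)


def count_violations(assignment, ml_pairs, cl_pairs):
    """Count ML pairs split across clusters and CL pairs placed together."""
    ml_broken = sum(assignment[i] != assignment[j] for i, j in ml_pairs)
    cl_broken = sum(assignment[i] == assignment[j] for i, j in cl_pairs)
    return int(ml_broken), int(cl_broken)


points, labels, ml_pairs, cl_pairs = make_ssc_sample()
violations = count_violations(labels, ml_pairs, cl_pairs)
```

A function like count_violations can be applied to the output of any clustering method to see how many of the ML and CL constraints it respects. If the pair files store point coordinates rather than indices, the pairs would first need to be matched back to rows of the point set.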

Regression Analysis