- flink-project: it contains the source code and the JAR file (which can be submitted to the Flink cluster). The project can also be run without the cluster, directly from the IDE
- results: it contains the results of some runs of the algorithm (more info in the Jupyter notebook)
- flink_k-means.ipynb: it is a Jupyter notebook used to show data, plots and results
- openable with Jupyter
- it requires tabulate
- files:
- points: string, path of the input file containing the points
- centroids: string, path of the input file containing the centroids (if any)
- pointsout: string, path of the output file containing the points with the associated cluster
- centroidsout: string, path of the output file containing the computed centroids
- objfunout: string, path of the output file containing the value of the objective function
- iteration params:
- iterations: int, max number of iterations
- custconvergence: boolean, whether the custom convergence criterion is used
- centroids:
- numcentroids: int, number of centroids to generate. If specified, the centroids input file is ignored
- minc: int, min value for centroid x, y
- maxc: int, max value for centroid x, y
- recompnearest: int, number of nearest centroids that are recomputed
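
As a rough illustration of how `numcentroids`, `minc` and `maxc` fit together, the Python sketch below draws random initial centroids whose x and y both fall between `minc` and `maxc`. It is not the project's code: the function name `generate_centroids`, the uniform sampling and the fixed seed are assumptions made only for this example.

```python
import random


def generate_centroids(numcentroids, minc, maxc, seed=0):
    """Return `numcentroids` random (x, y) pairs with minc <= x, y <= maxc.

    Hypothetical helper: uniform sampling and the seed are assumptions,
    not necessarily what the Flink job does.
    """
    if numcentroids < 1:
        raise ValueError("numcentroids must be positive")
    if minc > maxc:
        raise ValueError("minc must not exceed maxc")
    rng = random.Random(seed)
    return [
        (rng.uniform(minc, maxc), rng.uniform(minc, maxc))
        for _ in range(numcentroids)
    ]


# Eight centroids in the square [0, 100] x [0, 100]
centroids = generate_centroids(numcentroids=8, minc=0, maxc=100)
```
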
Example:
-numcentroids 8 -recompnearest 3 -iterations 10 -custconvergence false
-points "points.csv" -centroids "centroids.csv"
-pointsout "new_points.csv" -centroidsout "new_centroids.csv" -objfunout "objfun.csv"
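
To make the argument format concrete, here is a minimal Python sketch that splits the example above into `-key value` pairs. It only mimics the format for illustration; the helper `parse_args` is hypothetical and is not how the Flink job itself reads its parameters.

```python
import shlex


def parse_args(argv):
    """Parse a flat list of '-key value' tokens into a dict of strings."""
    if len(argv) % 2 != 0:
        raise ValueError("expected '-key value' pairs")
    params = {}
    for key, value in zip(argv[0::2], argv[1::2]):
        if not key.startswith("-"):
            raise ValueError(f"expected an option name, got {key!r}")
        params[key.lstrip("-")] = value
    return params


example = """
-numcentroids 8 -recompnearest 3 -iterations 10 -custconvergence false
-points "points.csv" -centroids "centroids.csv"
-pointsout "new_points.csv" -centroidsout "new_centroids.csv" -objfunout "objfun.csv"
"""

# shlex.split handles the quoted file names and the line breaks
params = parse_args(shlex.split(example))

iterations = int(params["iterations"])
custconvergence = params["custconvergence"].lower() == "true"
# Per the parameter list: when numcentroids is given, the centroids file is ignored
use_centroids_file = "numcentroids" not in params
```

In this example `numcentroids` is set, so the `centroids.csv` input would be ignored and 8 centroids would be generated instead.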
Download Flink, start the cluster, and submit the JAR file.