Compiling
=========

The `Makefile` is generated by `build.py`, which specifies the possible combinations of algorithms and their variants. After the `Makefile` is generated, the `CXXFLAGS` can be adjusted to specify extra flags that apply to all algorithms. The following flags are available:

| Flags                 | Explanation                                                                                                                                                                    |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| DEBUG_CORRECTNESS     | Sends hashes of all clusters to `stderr` after every iteration.                                                                                                                |
| DUMP_CENTER_POSITIONS | Sends the positions of all cluster centers to `stderr` after every iteration.                                                                                                  |
| ENABLE_STATS          | Sends detailed statistics to `stdout`.                                                                                                                                         |
| ENABLE_DATASET_STATS  | Includes statistics that are specific to the dataset in the statistics. These dataset statistics are the same for exact algorithms if the starting configuration is identical. |

Simply invoking `make` will build all variants within the `target/` directory.

A `Dockerfile` installing the necessary environment and compiling the binaries is available.

Executing
=========

Executing the resulting binaries works like this:

```
target/kmeans_…
```

with the following parameters:

| Parameter | Explanation                                                       |
| --------- | ----------------------------------------------------------------- |
| ``        | Either Ascii or Ubytes depending on the format of the ``.         |
| ``        | Filename of the dataset to cluster.                               |
| ``        | Number of clusters.                                               |
| ``        | Required for KMEANSPP variants. Specifies the seed for the PRNG.  |

Output will be sent to `stdout` / `stderr`. It can be redirected to a file using the usual shell operators.

Evaluating
==========

The scripts in the `contrib/` folder might be helpful to evaluate the raw data generated by the binaries.
They use a combination of JavaScript on the [node.js platform](https://nodejs.org/en/) and [jq](https://stedolan.github.io/jq/) scripts, tied together with bash and [fish](https://fishshell.com/) scripts. Depending on the script, the output format is human-readable plaintext, CSV, TSV, or LaTeX code.
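
As a complement to the shell redirection mentioned under Executing, the following is a minimal sketch of how a run could be captured from a script so that `stdout` and `stderr` end up in separate files before they are passed to the evaluation scripts. The helper name `run_and_capture` and the output file names are hypothetical and not part of this repository; the binary path and its parameters must be supplied by the caller.

```python
import subprocess


def run_and_capture(command, stdout_path, stderr_path):
    """Run `command` and write its stdout and stderr to two separate files.

    `command` is a list: the path of the binary followed by its parameters.
    Returns the exit code of the process.
    """
    # Binary mode avoids any re-encoding of the raw output.
    with open(stdout_path, "wb") as out, open(stderr_path, "wb") as err:
        completed = subprocess.run(command, stdout=out, stderr=err)
    return completed.returncode
```

Keeping the two streams apart can be useful because, per the flag table, `ENABLE_STATS` writes to `stdout` while `DEBUG_CORRECTNESS` and `DUMP_CENTER_POSITIONS` write to `stderr`. A call would pass one of the binaries built under `target/` together with its parameters as the `command` list.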