Go implementation of Self-Organizing Maps (SOM), also known as Kohonen maps. Provides a command line tool and a library for training and visualizing SOMs.
- Multi-layered SOMs, a.k.a. XYF or super-SOMs.
- Visualization-induced SOMs, a.k.a. ViSOMs.
- Training from CSV files without any manual preprocessing.
- Supports continuous and discrete data.
- Fully customizable training and SOM parameters.
- Visualization of SOMs through a wide range of flexible plots.
- Usable as a command line tool or as a Go library.
Please note that the built-in visualizations are not intended for publication-quality output. Instead, they serve as quick tools for inspecting training and prediction results. For high-quality visualizations, we recommend exporting the SOM and other results to CSV files. You can then use dedicated visualization libraries in languages such as Python or R to create more refined and customized graphics.
Pre-compiled binaries for Linux, Windows and macOS are available in the Releases.
Alternatively, install the latest version using Go:
go install github.com/mlange-42/som/cmd/som@latest
Get help for the command line tool:
som --help
Here are some examples of how to use the command line tool, using the World Countries dataset.
Train an SOM with the dataset:
som train _examples/countries/untrained.yml _examples/countries/data.csv > trained.yml
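Training iterates over the data rows many times: for each row, the best-matching unit (BMU) is found, and the BMU and its grid neighbors are pulled towards the row's values. The following Go snippet is a minimal, self-contained sketch of that idea for a single continuous layer; it is illustrative only and does not use this library's actual API (all type and function names are made up for the example).

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// miniSOM is an illustrative toy, not a type from this library.
type miniSOM struct {
	w, h  int         // grid size
	codes [][]float64 // one code vector per node, row-major
}

// bmu returns the index of the node whose code vector is closest to x
// (squared Euclidean distance).
func (s *miniSOM) bmu(x []float64) int {
	best, bestDist := 0, math.Inf(1)
	for i, c := range s.codes {
		d := 0.0
		for j := range x {
			diff := x[j] - c[j]
			d += diff * diff
		}
		if d < bestDist {
			best, bestDist = i, d
		}
	}
	return best
}

// trainStep pulls every node towards x, weighted by a Gaussian
// neighborhood around the BMU in map space.
func (s *miniSOM) trainStep(x []float64, alpha, radius float64) {
	b := s.bmu(x)
	bx, by := b%s.w, b/s.w
	for i, c := range s.codes {
		nx, ny := i%s.w, i/s.w
		d2 := float64((nx-bx)*(nx-bx) + (ny-by)*(ny-by))
		hNeigh := math.Exp(-d2 / (2 * radius * radius)) // Gaussian neighborhood
		for j := range c {
			c[j] += alpha * hNeigh * (x[j] - c[j])
		}
	}
}

func main() {
	s := &miniSOM{w: 8, h: 6}
	for i := 0; i < s.w*s.h; i++ {
		s.codes = append(s.codes, []float64{rand.Float64(), rand.Float64()})
	}
	row := []float64{0.3, 0.7}
	s.trainStep(row, 0.25, 3.0) // in real training, alpha and radius decay over the epochs
	fmt.Println("BMU of row:", s.bmu(row))
}
```

The real tool additionally handles multiple layers with individual metrics and weights, data normalization, categorical data, and decaying training parameters.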
Visualize the trained SOM as heatmaps of components, showing labels of data points (i.e. countries):
som plot heatmap trained.yml heatmap.png --data-file _examples/countries/data.csv --label Country
Export the trained SOM to a CSV file:
som export trained.yml > nodes.csv
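If you want to build custom graphics from such exports (as recommended above for publication-quality output), a few lines of code are enough for a rough component heatmap. The sketch below renders one variable's per-node values as a grayscale PNG using only Go's standard library; the values slice, grid size and file name are placeholders, and this is not how the built-in `som plot heatmap` command works internally.

```go
package main

import (
	"image"
	"image/color"
	"image/png"
	"os"
)

// writeHeatmap renders per-node values (row-major, length w*h) as a
// grayscale image with one pixel per node. Illustrative only.
func writeHeatmap(values []float64, w, h int, path string) error {
	min, max := values[0], values[0]
	for _, v := range values {
		if v < min {
			min = v
		}
		if v > max {
			max = v
		}
	}
	if max == min {
		max = min + 1 // avoid division by zero for constant values
	}
	img := image.NewGray(image.Rect(0, 0, w, h))
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			v := (values[y*w+x] - min) / (max - min)
			img.SetGray(x, y, color.Gray{Y: uint8(v * 255)})
		}
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	return png.Encode(f, img)
}

func main() {
	// Dummy values for an 8×6 map; in practice, use one column of the exported CSV.
	values := make([]float64, 8*6)
	for i := range values {
		values[i] = float64(i)
	}
	if err := writeHeatmap(values, 8, 6, "component.png"); err != nil {
		panic(err)
	}
}
```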
Determine the best-matching unit (BMU) for each row in the dataset:
som bmu trained.yml _examples/countries/data.csv --preserve Country,code,continent > bmu.csv
Taken from the CLI help, here is a tree representation of all currently available (sub)commands:
som           Self-organizing maps command line tool.
├─train       Trains an SOM on the given dataset.
├─quality     Calculates various quality metrics for a trained SOM.
├─label       Classifies SOM nodes using label propagation.
├─export      Exports an SOM to a CSV table of node vectors.
├─predict     Predicts entire layers or table columns using a trained SOM.
├─bmu         Finds the best-matching unit (BMU) for each table row in a dataset.
├─fill        Fills missing data in the data file based on a trained SOM.
└─plot        Plots visualizations for an SOM in various ways. See sub-commands.
  ├─heatmap   Plots heat maps of multiple SOM variables, a.k.a. components plot.
  ├─codes     Plots SOM node codes in different ways. See sub-commands.
  │ ├─line    Plots SOM node codes as line charts.
  │ ├─bar     Plots SOM node codes as bar charts.
  │ ├─pie     Plots SOM node codes as pie charts.
  │ ├─rose    Plots SOM node codes as rose alias Nightingale charts.
  │ └─image   Plots SOM node codes as images.
  ├─u-matrix  Plots the u-matrix of an SOM, showing inter-node distances.
  ├─xy        Plots for pairs of SOM variables as scatter plots.
  ├─density   Plots the data density of an SOM as a heatmap.
  └─error     Plots (root) mean-squared node error as a heatmap.
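As background for the `u-matrix` plot listed above: a U-matrix typically assigns each node the mean distance between its code vector and those of its direct grid neighbors, so ridges of high values mark cluster boundaries. Here is a minimal, generic sketch of that computation in Go; it is not the library's internal implementation, and the neighborhood definition and metric may differ.

```go
package main

import (
	"fmt"
	"math"
)

// uMatrix computes, for each node of a w×h grid, the mean Euclidean distance
// of its code vector to those of its 4-neighbors. codes[y*w+x] is node (x, y).
func uMatrix(codes [][]float64, w, h int) []float64 {
	dist := func(a, b []float64) float64 {
		d := 0.0
		for i := range a {
			d += (a[i] - b[i]) * (a[i] - b[i])
		}
		return math.Sqrt(d)
	}
	u := make([]float64, w*h)
	for y := 0; y < h; y++ {
		for x := 0; x < w; x++ {
			sum, n := 0.0, 0
			for _, off := range [][2]int{{1, 0}, {-1, 0}, {0, 1}, {0, -1}} {
				nx, ny := x+off[0], y+off[1]
				if nx < 0 || ny < 0 || nx >= w || ny >= h {
					continue // neighbor outside the grid
				}
				sum += dist(codes[y*w+x], codes[ny*w+nx])
				n++
			}
			u[y*w+x] = sum / float64(n)
		}
	}
	return u
}

func main() {
	// Tiny 2×2 grid with one-dimensional code vectors.
	codes := [][]float64{{0}, {1}, {0}, {1}}
	fmt.Println(uMatrix(codes, 2, 2)) // high values where neighboring nodes differ
}
```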
The command line tool uses a YAML configuration file to specify the SOM parameters.
Here is an example of a configuration file for the Iris dataset.
The dataset has these columns: species, sepal_length, sepal_width, petal_length, and petal_width.
som:                        # SOM definitions
  size: [8, 6]              # Size of the SOM
  neighborhood: gaussian    # Neighborhood function
  metric: manhattan         # Distance metric in map space
  visom-metric: euclidean   # Distance metric for ViSOM update

  layers:                   # Layers of the SOM
    - name: Scalars         # Name of the layer. Has no meaning for continuous layers
      columns:              # Columns of the layer
        - sepal_length      # Column names as in the dataset
        - sepal_width
        - petal_length
        - petal_width
      norm: [gaussian]      # Normalization function(s) for columns
      metric: euclidean     # Distance metric
      weight: 1             # Weight of the layer
    - name: species         # Name of the layer. Use column name for categorical layers
      metric: hamming       # Distance metric
      categorical: true     # Layer is categorical. Omit columns
      weight: 0.5           # Weight of the layer

training:                   # Training parameters. Optional. Can be overwritten by CLI arguments
  epochs: 2500              # Number of training epochs
  alpha: polynomial 0.25 0.01 2       # Learning rate decay function
  radius: polynomial 6 1 2            # Neighborhood radius decay function
  weight-decay: polynomial 0.5 0.0 3  # Weight decay coefficient function
  lambda: 0.33              # ViSOM resolution parameter
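The alpha, radius and weight-decay entries each name a decay function followed by its parameters. A common reading of `polynomial <start> <end> <exponent>` is a value that moves from the start value to the end value over the training epochs along a polynomial curve. The sketch below illustrates that interpretation only; the exact formula used by the library may differ, so treat it as an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// polynomialDecay interpolates from start to end over tMax steps along a
// polynomial curve with the given exponent. This mirrors one common reading
// of "polynomial <start> <end> <exponent>"; the library's formula may differ.
func polynomialDecay(start, end, exponent float64, t, tMax int) float64 {
	frac := float64(t) / float64(tMax)
	return end + (start-end)*math.Pow(1-frac, exponent)
}

func main() {
	// "alpha: polynomial 0.25 0.01 2" — learning rate over 2500 epochs.
	for _, t := range []int{0, 625, 1250, 1875, 2500} {
		fmt.Printf("epoch %4d: alpha = %.4f\n", t, polynomialDecay(0.25, 0.01, 2, t, 2500))
	}
}
```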
See the examples folder for more examples.
This project is distributed under the MIT license.