Metaxa is a deep learning–based classifier for metagenomic data that predicts taxonomic labels at the species and genus levels.
-
Clone the repository:
git clone https://github.com/lbcb-sci/metaxa.git cd metaxa
-
Install dependencies:
pip install -r requirements.txt pip install flash-attn --no-build-isolation
Note: The provided requirements.txt
includes dependencies for CUDA 12.4.
If you're using a different CUDA version, please install the appropriate version of torch
, torchvision
, and torchaudio
manually, and remove their entries from requirements.txt
before running pip install
.
- Download model:
wget -O metaxa-model.v0.1.ckpt https://zenodo.org/records/15062544/files/metaxa-model.v0.1.ckpt?download=1
Once installed, you can run Metaxa from the command line:
python metaxa/inference.py -c model.ckpt -d cuda:0 -b 1024 --n_workers 16 -o output.tsv reads.fastq
Argument | Description | Example |
---|---|---|
--checkpoint, -c | Path to model checkpoint | -c checkpoint.ckpt |
--device, -d | Device to run inference on | -d cuda:0 |
--batch_size, -b | Batch size | -b 1024 |
--n_workers | Number of data loading workers | --n_workers 16 |
--output, -o | Path to output classification file | -o output.tsv |
Input FASTQ/A file with sequences to classify | reads.fastq |
The output is a TSV file where each row contains:
- Read identifier (
read_id
), - Predicted species-level taxonomic ID (
species_taxid
), - Predicted genus-level taxonomic ID (
genus_taxid
).
This research is supported by the Singapore Ministry of Health’s National Medical Research Council under its Open Fund – Individual Research Grants (NMRC/OFIRG/MOH-000649-00).
Coming soon.