WuKong is an easy-to-use framework for LLM inference and agent serving, written in Go from scratch.
sudo apt-get install cuda-toolkit # CUDA runtime and compiler
sudo apt-get install golang # Go toolchain
sudo apt-get install cmake make # build tools
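A quick sanity check that the packages landed (assuming they put go, cmake and make on your PATH, as the Ubuntu packages normally do):
go version # Go toolchain
cmake --version # CMake
make --version # GNU Make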
It is best to put these exports in your ~/.bashrc
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export PATH=$PATH:$(go env GOPATH)/bin
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
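Reload the shell configuration and verify that the CUDA toolchain is now visible (a minimal check, assuming CUDA was installed under /usr/local/cuda as above):
source ~/.bashrc # pick up the new PATH and LD_LIBRARY_PATH
nvcc --version # CUDA compiler should now resolve from /usr/local/cuda/bin
echo $LD_LIBRARY_PATH # should include /usr/local/cuda/lib64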
git clone [email protected]:liuy/wukong.git # get the source
cd wukong/
make # build the project
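If the build succeeds you should end up with the wk binary in the repository root (an assumption based on the run commands below; adjust the path if your build output lands elsewhere):
ls -l ./wk # the CLI used in the examples below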
Right now we support pulling models from huggingface.co, modelscope.cn and ollama.com
./wk run hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF:f16 # Download and run a model from huggingface.co
./wk run modelscope.cn/bartowski/Llama-3.2-1B-Instruct-GGUF:f16 # Download and run a model from modelscope.cn (Alibaba)
./wk run llama3.2:1b # Download and run a model from ollama.com (defaults to Q8_0)