Llama2通过llama.cpp模型量化 Windows&Linux本地部署
fc&&fl:
教程可行,全程无报错,但是我不确定的是,我这里是卡住了吗,发不了图片,输入
.\build\bin\Release\main.exe -m .\models\13B\ggml-model-q4_0.gguf -n 256 -t 18 --repeat_penalty 1.0 --color -i -r "User:" -f .\prompts\chat-with-bob.txt
后终端输出
...................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.12 MiB
llama_new_context_with_model: CPU compute buffer size = 70.50 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 1
然后就不动了
|