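Qwen2 Fine-Tuning and Usage Workflow

LoRA fine-tuning

Fine-tune by following the reference article; this produces checkpoint files such as .pt and .pth.

Part of the code is shown below: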

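import pandas as pd
import swanlab
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from swanlab.integration.huggingface import SwanLabCallback
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Trainer,
    TrainingArguments,
)

# Load the tokenizer and model weights with Transformers
tokenizer = AutoTokenizer.from_pretrained("./qwen/Qwen2-0___5B-Instruct/", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("./qwen/Qwen2-0___5B-Instruct/", device_map={"": 0}, torch_dtype=torch.bfloat16, max_memory={0: "4GB"})
model.enable_input_require_grads()  # Required when gradient checkpointing is enabled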

train_dataset_path = "Muice-Dataset/train.jsonl"

# Build the training set (process_func is not shown in this excerpt)
train_df = pd.read_json(train_dataset_path, lines=True)
train_ds = Dataset.from_pandas(train_df)
train_dataset = train_ds.map(process_func, remove_columns=train_ds.column_names)

config = LoraConfig(
    # ...
)

model = get_peft_model(model, config)

args = TrainingArguments(
    # ...
)

swanlab_callback = SwanLabCallback(
    # ...
    config={
        "model": "qwen/Qwen2_05B_Instruct",
        "dataset": "huangjintao/Muice-Dataset",
    }
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
    callbacks=[swanlab_callback],
)

trainer.train()
trainer.save_model("./trained")  # Saves the model weights to the given directory
tokenizer.save_pretrained("./trained")  # Keep the tokenizer in the same directory as the weights
model.config.save_pretrained("./trained")

swanlab.finish()

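First load the tokenizer and the model, then set up what the trainer needs (such as the LoRA config and TrainingArguments), call train() to start training, and save the model and tokenizer once training finishes.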
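The `process_func` passed to `train_ds.map(...)` is not shown in the excerpt. As a rough sketch of what such a function usually does for instruction tuning, the snippet below joins prompt and response token ids and masks the prompt positions in `labels` with -100, so the loss only covers the response. The helper name `build_example`, the toy token ids, and the `max_length` value are assumptions for illustration, not the reference article's code.

```python
from typing import Dict, List

IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_example(
    prompt_ids: List[int],
    response_ids: List[int],
    eos_id: int,
    max_length: int = 384,
) -> Dict[str, List[int]]:
    """Hypothetical helper: join prompt and response ids and mask the prompt in labels."""
    input_ids = prompt_ids + response_ids + [eos_id]
    attention_mask = [1] * len(input_ids)
    # Only the response (and the closing EOS token) contributes to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [eos_id]
    # Truncate all three lists the same way so they stay aligned.
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": attention_mask[:max_length],
        "labels": labels[:max_length],
    }


# Toy ids stand in for real tokenizer output.
example = build_example(prompt_ids=[11, 12, 13], response_ids=[21, 22], eos_id=2)
print(example["labels"])
```

In a real `process_func`, the id lists would come from the tokenizer applied to the chat-formatted prompt and the target reply.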

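Merging with the base model

Following the reference article, use its merge.py to merge the adapter into the base model; this yields safetensors weights plus their accompanying files.

Part of the code is shown below: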

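import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_path, lora_path, output_path and config are defined outside this excerpt
base = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
base_tokenizer = AutoTokenizer.from_pretrained(model_path)
# Attach the trained LoRA adapter to the base model
lora_model = PeftModel.from_pretrained(
    base,
    lora_path,
    torch_dtype=torch.float16,
    config=config
)
# Fold the adapter weights into the base weights and drop the LoRA wrappers
model = lora_model.merge_and_unload()
model.save_pretrained(output_path)
base_tokenizer.save_pretrained(output_path)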

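First load the base model and the matching LoRA produced by training (the checkpoint files), then merge them with the merge_and_unload method.

The merged output is saved as safetensors files.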
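Conceptually, merging folds the low-rank update into each adapted weight matrix: W' = W + (lora_alpha / r) * B * A. The toy example below works through that arithmetic with plain Python lists. The matrices and the `lora_alpha` and `r` values are made up, and this is only an illustration of the idea, not PEFT's actual implementation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two small dense matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, lora_alpha: float, r: int) -> Matrix:
    """Return W + (lora_alpha / r) * (B @ A)."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 weight with a rank-1 adapter: B is 2x1, A is 1x2.
w = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]
lora_b = [[0.5], [1.0]]
merged = merge_lora(w, lora_a, lora_b, lora_alpha=2.0, r=1)
print(merged)
```

After the merge the adapter matrices are no longer needed at inference time, which is why the result can be saved as an ordinary model.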

Converting to a .bin file

Use convert-legacy-llama.py from llama.cpp for the conversion.

`python ./examples/convert-legacy-llama.py G:\Git\qwen-tmp-1\merged\merged-2 --outtype f16 --outfile G:\Git\qwen-tmp-1\merged\merged-2.bin --vocab-type bpe --pad-vocab`

Note the last two arguments: --vocab-type bpe and --pad-vocab.

Without the former, you may hit:

`FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']`

Without the latter, you may hit a vocab size mismatch:

`ValueError: Vocab size mismatch (model has 151936, but G:\Git\qwen-tmp-1\merged\merged-2\vocab.json has 151646). Add the --pad-vocab option and try again.`
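The mismatch arises because the model's embedding matrix has more rows (151936) than the tokenizer has entries (151646). As an illustration of the idea behind `--pad-vocab` (not llama.cpp's actual code), the sketch below fills the gap with placeholder tokens. The function name and the placeholder format are assumptions.

```python
from typing import List


def pad_vocab(tokens: List[str], model_vocab_size: int) -> List[str]:
    """Append placeholder tokens until the list matches the model's vocab size."""
    missing = model_vocab_size - len(tokens)
    if missing < 0:
        raise ValueError("tokenizer has more tokens than the model has embedding rows")
    # The placeholder naming is hypothetical; any unused, unique strings would do.
    return tokens + [f"<dummy{i:05d}>" for i in range(missing)]


tokens = [f"tok{i}" for i in range(151646)]  # stand-in for the real vocabulary
padded = pad_vocab(tokens, 151936)
print(len(padded) - len(tokens))  # number of placeholder tokens added
```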

For Qwen, however, use convert-hf-to-gguf.py instead of convert-legacy-llama.py.

`python ./convert-hf-to-gguf.py G:\Git\qwen-tmp-1\merged\merged-3 --outtype f16 --outfile G:\Git\qwen-tmp-1\merged\merged-3-3.bin`

In this case the two extra arguments above are no longer needed.

The remaining steps are the same.
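If the conversion is run repeatedly, the command can be assembled from Python. The sketch below only builds the argument list, using the flags shown above. The helper name and the example paths are assumptions, and the actual call is left commented out.

```python
import subprocess
import sys
from typing import List


def build_convert_cmd(model_dir: str, outfile: str, outtype: str = "f16") -> List[str]:
    """Assemble the convert-hf-to-gguf.py invocation (run from the llama.cpp checkout)."""
    return [
        sys.executable,
        "./convert-hf-to-gguf.py",
        model_dir,
        "--outtype", outtype,
        "--outfile", outfile,
    ]


# Placeholder paths; substitute the merged model directory and the desired output file.
cmd = build_convert_cmd("merged/merged-3", "merged/merged-3-3.bin")
print(" ".join(cmd[1:]))
# subprocess.run(cmd, check=True)  # uncomment to actually run the conversion
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain backslashes or spaces.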

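Quantizing the .bin file

This step also uses llama.cpp.

`./bin/llama-quantize.exe G:\Git\qwen-tmp-1\merged\merged-2.bin q5_k_m`

Several quantization types are available. Quantization produces a .gguf file. Alternatively, COPY skips quantization and simply copies the tensors into a gguf file.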

Allowed quantization types:
   2  or  Q4_0    :  4.34G, +0.4685 ppl @ Llama-3-8B
   3  or  Q4_1    :  4.78G, +0.4511 ppl @ Llama-3-8B
   8  or  Q5_0    :  5.21G, +0.1316 ppl @ Llama-3-8B
   9  or  Q5_1    :  5.65G, +0.1062 ppl @ Llama-3-8B
  19  or  IQ2_XXS :  2.06 bpw quantization
  20  or  IQ2_XS  :  2.31 bpw quantization
  28  or  IQ2_S   :  2.5  bpw quantization
  29  or  IQ2_M   :  2.7  bpw quantization
  24  or  IQ1_S   :  1.56 bpw quantization
  31  or  IQ1_M   :  1.75 bpw quantization
  10  or  Q2_K    :  2.96G, +3.5199 ppl @ Llama-3-8B
  21  or  Q2_K_S  :  2.96G, +3.1836 ppl @ Llama-3-8B
  23  or  IQ3_XXS :  3.06 bpw quantization
  26  or  IQ3_S   :  3.44 bpw quantization
  27  or  IQ3_M   :  3.66 bpw quantization mix
  12  or  Q3_K    : alias for Q3_K_M
  22  or  IQ3_XS  :  3.3 bpw quantization
  11  or  Q3_K_S  :  3.41G, +1.6321 ppl @ Llama-3-8B
  12  or  Q3_K_M  :  3.74G, +0.6569 ppl @ Llama-3-8B
  13  or  Q3_K_L  :  4.03G, +0.5562 ppl @ Llama-3-8B
  25  or  IQ4_NL  :  4.50 bpw non-linear quantization
  30  or  IQ4_XS  :  4.25 bpw non-linear quantization
  15  or  Q4_K    : alias for Q4_K_M
  14  or  Q4_K_S  :  4.37G, +0.2689 ppl @ Llama-3-8B
  15  or  Q4_K_M  :  4.58G, +0.1754 ppl @ Llama-3-8B
  17  or  Q5_K    : alias for Q5_K_M
  16  or  Q5_K_S  :  5.21G, +0.1049 ppl @ Llama-3-8B
  17  or  Q5_K_M  :  5.33G, +0.0569 ppl @ Llama-3-8B
  18  or  Q6_K    :  6.14G, +0.0217 ppl @ Llama-3-8B
   7  or  Q8_0    :  7.96G, +0.0026 ppl @ Llama-3-8B
   1  or  F16     : 14.00G, +0.0020 ppl @ Mistral-7B
  32  or  BF16    : 14.00G, -0.0050 ppl @ Mistral-7B
   0  or  F32     : 26.00G              @ 7B
          COPY    : only copy tensors, no quantizing

The fine-tuned models officially released by Alibaba use Q4_0 quantization.
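To give a feel for what quantization does to the weights, here is a simplified sketch in the spirit of Q8_0, which stores weights in blocks of 32 with one scale per block. It is a plain-Python illustration under that assumption, not llama.cpp's implementation (which, among other things, stores the scale in half precision); the function names and the toy weights are made up.

```python
from typing import List, Tuple

BLOCK_SIZE = 32  # Q8_0 groups weights into blocks of 32 values


def quantize_block(values: List[float]) -> Tuple[float, List[int]]:
    """Quantize one block to int8 codes with a single per-block scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 127 if amax > 0 else 0.0
    quants = [round(v / scale) if scale else 0 for v in values]
    return scale, quants


def dequantize_block(scale: float, quants: List[int]) -> List[float]:
    """Recover approximate float values from the scale and the int8 codes."""
    return [scale * q for q in quants]


block = [(i - 16) / 16 for i in range(BLOCK_SIZE)]  # toy weights in [-1, 1)
scale, quants = quantize_block(block)
restored = dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(f"scale={scale:.6f}, max abs error={max_error:.6f}")
```

The round-trip error per weight is bounded by half the block scale. Lower-bit types in the list above trade a larger error for a smaller file.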

Using the model in ollama

Run ollama create <modelname> -f <modelfile>

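Here, modelname is a name of your choosing, the same one you later pass to ollama run. modelfile defaults to the file named Modelfile in the current directory, whose content should be FROM <path to the gguf file>; a different file can be specified with -f.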
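A minimal Modelfile of that form can also be generated from a script. The sketch below writes one into a temporary directory for demonstration; the helper name and the GGUF file name are placeholders.

```python
import tempfile
from pathlib import Path


def write_modelfile(gguf_path: str, directory: str) -> Path:
    """Write a minimal Modelfile that points ollama at a local GGUF file."""
    modelfile = Path(directory) / "Modelfile"
    modelfile.write_text(f"FROM {gguf_path}\n", encoding="utf-8")
    return modelfile


# Demo in a temporary directory; the GGUF file name is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    path = write_modelfile("./merged-q5_k_m.gguf", tmp)
    content = path.read_text(encoding="utf-8")
print(content, end="")
```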

After that, the model can be used like any other, either with ollama run or through the API.
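For API access, ollama serves an HTTP endpoint on port 11434 by default. The sketch below builds a non-streaming request for `/api/generate` with the standard library only; the model name `my-qwen2` is a placeholder for whatever was passed to ollama create, and `generate` only works when a local ollama server is running.

```python
import json
import urllib.request


def build_generate_request(
    model: str, prompt: str, host: str = "http://localhost:11434"
) -> urllib.request.Request:
    """Build a non-streaming request for ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request to a running ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


request = build_generate_request("my-qwen2", "Hello")  # "my-qwen2" is a placeholder name
print(request.full_url)
```

With the server running, `generate("my-qwen2", "Hello")` would return the model's reply as a string.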

Summary

If all you want is to train a LoRA and merge it, the process is not complicated and involves no changes to the model's internal layers.

http://blog.wspdwzh.space/2024/07/06/qwen2微调与使用过程/

Author: peter？

Published: July 6, 2024
