lora_structeval_t_qwen3_4b_0118

This repository provides a LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth (QLoRA, 4-bit base).

Contents: LoRA adapter weights (PEFT) + tokenizer files (if present)
Does not include: Base model weights, training dataset files

Training Objective

This adapter was trained for structured output quality (format conversion / structured serialization) while avoiding learning verbose chain-of-thought.

Loss design

The model sees the full conversation context (system + user + assistant).
Loss is applied only to the final assistant turn ("assistant-only loss").
Additionally, when an Output marker is present, loss is applied only to:
- OUTPUT_LEARN_MODE="after_marker"
- Markers searched: Output:, OUTPUT:, Final:, Answer:, Result:, Response:
- With MASK_COT=True this typically means learning the content after Output: (suppressing CoT-style "Approach:" text).

This setup is intended to improve final answer correctness and formatting without encouraging the model to emit chain-of-thought.

Training Configuration (Key)

Run stamp (UTC): 2026-01-18_062458Z
Base model: unsloth/Qwen3-4B-Instruct-2507
Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
Method: QLoRA (4-bit base) + LoRA adapters (PEFT)
Max sequence length: 512
Seed: 3407
Train/Val split: val_ratio=0.05

Hyperparameters

Epochs: 2
LR: 0.0001
Warmup ratio: 0.1
Weight decay: 0.05
Per-device train batch: 2
Gradient accumulation: 8 (effective batch ≈ 16)
LR scheduler: cosine
Precision: fp16 (T4-friendly)

LoRA

r: 64
alpha: 128
dropout: 0.0
target_modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

Prompt / Output Style (Dataset-aligned)

The training dataset uses chat messages and often includes a short reasoning header followed by a final structured output. With the default masking setup, the adapter is optimized primarily for the final structured segment.

Typical assistant response shape:

Approach: (may be present, but often masked from loss)
Output: (structured data begins here; primary training target)

You can encourage concise responses by explicitly requesting: "Return ONLY the final structured output."

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "unsloth/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/lora_structeval_t_qwen3_4b_0118"

tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Example: run generation with your preferred chat template usage.

Limitations / Notes

This is a LoRA adapter, not a standalone model. You must load unsloth/Qwen3-4B-Instruct-2507 separately.
Format correctness depends on your decoding settings and prompt discipline. For strict tasks, consider:
- temperature=0 (or low), top_p=1.0
- Post-validation (JSON/YAML/TOML/XML parsers) where applicable
The adapter is specialized for structured serialization/format conversion; it may not improve general chat ability.

Sources & Terms (IMPORTANT)

Training dataset: u-10bei/structured_data_with_cot_dataset_512_v2 (referenced on Hugging Face Hub)
This repository contains LoRA adapter weights only and does not redistribute the training dataset.
You are responsible for complying with:
- The dataset license/terms as stated in the dataset repository.
- The base model license/terms for unsloth/Qwen3-4B-Instruct-2507 (these apply to derivatives/adapters as well).

License

Adapter repo license field: other (model card metadata)
Important: Base model terms for unsloth/Qwen3-4B-Instruct-2507 apply. Dataset terms for u-10bei/structured_data_with_cot_dataset_512_v2 apply.

Downloads last month: -

Model tree for daichira/test-lora-repo

Base model

Qwen/Qwen3-4B-Instruct-2507

Finetuned

unsloth/Qwen3-4B-Instruct-2507

Adapter

(436)

this model

daichira
/

test-lora-repo