Qwen2.5-1.5B-SQL-Assistant

This is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct for text-to-SQL generation, trained using the b-mc2/sql-create-context dataset.

Model Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Fine-tuning Method: LoRA (PEFT)
  • Dataset: b-mc2/sql-create-context
  • Task: Text-to-SQL Generation

How to Use

Installation

pip install transformers peft trl accelerate bitsandbytes datasets torch

Example Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"
FINE_TUNED_MODEL_NAME = "YOUR_HF_USERNAME/Qwen2.5-1.5B-SQL-Assistant" # Replace YOUR_HF_USERNAME

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

# Load fine-tuned adapters
model = PeftModel.from_pretrained(base_model, FINE_TUNED_MODEL_NAME)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(FINE_TUNED_MODEL_NAME, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

def generate_sql(schema: str, question: str) -> str:
    system = (
        "You are a SQL expert. Given a context and a question, "
        "reply with ONLY the SQL query, nothing else."
    )
    user = f"Schema:
{schema}

Question: {question}"
    messages = [
        {"role": "system", "content": system},
        {"role": "user",   "content": user},
    ]
    text_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

    inputs = tokenizer(text_input, return_tensors="pt").to(model.device)
    model.eval()
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.9,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )
    input_len = inputs["input_ids"].shape[-1]
    generated_text = tokenizer.decode(outputs[0][input_len:], skip_special_tokens=True, cleanup_tokenization_spaces=True)
    return generated_text.strip()


# Example usage:
context = "CREATE TABLE employees (id INT, name TEXT, department TEXT, salary INT);"
question = "List the names of employees in the Engineering department earning more than 100000."

sql_query = generate_sql(context, question)
print(f"Generated SQL:
{sql_query}")

Training Details

This model was fine-tuned for 1 epoch on a subset of the b-mc2/sql-create-context dataset. Quantization (4-bit) and LoRA were used to efficiently train the model.

Evaluation

BERTScore was used to evaluate the semantic similarity between generated and reference SQL queries.

Inference Example

Here's an example of an inference query using the fine-tuned model:

Input Schema (DDL)

CREATE TABLE buyers (
    cid SERIAL PRIMARY KEY,
    FNAME VARCHAR(100) NOT NULL,
    SURNAME VARCHAR(100) NOT NULL,
    email VARCHAR(255) NOT NULL UNIQUE,
    TELEPHONE VARCHAR(25),
    modify_date TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE TABLE order_header (
    oid SERIAL PRIMARY KEY,
    cid INTEGER NOT NULL,
    oe_number VARCHAR(50) NOT NULL UNIQUE,
    order_entry_date DATE NOT NULL DEFAULT CURRENT_DATE,
    order_status VARCHAR(50) NOT NULL DEFAULT 'pending',
    order_total_amount NUMERIC(12, 2) NOT NULL DEFAULT 0,
    modify_date TIMESTAMP NOT NULL DEFAULT NOW(),
    CONSTRAINT fk_order_header_customer
        FOREIGN KEY (cid) REFERENCES buyers(cid)
        ON UPDATE CASCADE
        ON DELETE RESTRICT
);

CREATE TABLE order_detail (
    order_detail_id SERIAL PRIMARY KEY,
    oid INTEGER NOT NULL,
    line_number INTEGER NOT NULL,
    catalog_sku VARCHAR(100) NOT NULL,
    description TEXT,
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price NUMERIC(10, 2) NOT NULL CHECK (unit_price >= 0),
    line_item_total_amount NUMERIC(12, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED,
    modify_date TIMESTAMP NOT NULL DEFAULT NOW(),
    CONSTRAINT fk_order_detail_order
        FOREIGN KEY (oid) REFERENCES order_header(oid)
        ON UPDATE CASCADE
        ON DELETE CASCADE,
    CONSTRAINT uq_order_detail_line UNIQUE (oid, line_number)
);

Input Question

Find order details for buyer with FNAME "Girish" and SURNAME with "Manure"?

Generated SQL

SELECT *
FROM order_detail
WHERE oid IN
(SELECT oid FROM order_header WHERE cid IN
(SELECT cid FROM buyers WHERE firstname = "Girish" AND surname = "Manure") ORDER BY oid ASC LIMIT 1);
Downloads last month
47
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for girishmanure/Qwen2.5-1.5B-SQL-Assistant

Adapter
(1024)
this model