neon-360-0.1

Instructions to use nomadicsynth/neon-360-0.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use nomadicsynth/neon-360-0.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="nomadicsynth/neon-360-0.1")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("nomadicsynth/neon-360-0.1")
model = AutoModelForCausalLM.from_pretrained("nomadicsynth/neon-360-0.1")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use nomadicsynth/neon-360-0.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nomadicsynth/neon-360-0.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nomadicsynth/neon-360-0.1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/nomadicsynth/neon-360-0.1

SGLang

How to use nomadicsynth/neon-360-0.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nomadicsynth/neon-360-0.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nomadicsynth/neon-360-0.1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nomadicsynth/neon-360-0.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nomadicsynth/neon-360-0.1",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use nomadicsynth/neon-360-0.1 with Docker Model Runner:
```
docker model run hf.co/nomadicsynth/neon-360-0.1
```

Neon-360 v0.1

Note: This is not fully trained and will be replaced with either a fine-tuned version or a new model soon-ish. Don't expect anything useful from it rn. Only download it if you're curious.

I'm working on retraining this, trying to find out what can be achieved on consumer hardware, namely my RTX 4090. Hopefully i can make a tiny agentic model, maybe a nice fast one. Self-improvement? Can I teach it to make itself better?

Suggestions wanted! What tasks would you want from a tiny model? Let me know in the Community Tab

This is currently a copy of the below:

mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast

This repository contains the mini-mistral-360M model, a 360 million parameter version of the Mistral architecture, trained for a single epoch. The model was trained on a diverse dataset comprising Wikipedia articles and the OpenHermes dataset. While this model is still in its early stages and not particularly useful as of now, it serves as an experimental showcase of integrating the Grokfast algorithm into the training process.

Model Details

Architecture: Mistral
Parameters: 360 million
Training Duration: 1 epoch
Training Dataset: Wikipedia articles and OpenHermes dataset
Training Method: Transformers Trainer with grokfast-adamw as the optimiser
Training Hardware: 2 x Nvidia RTX 3060 12GB

Purpose

The primary goal of this experiment was to observe the impact of the Grokfast algorithm on the training dynamics of a 360M parameter Mistral model. During training, it was noted that the evaluation loss followed the training loss closely, which is an intriguing behavior warranting further investigation.

Usage

To use this model, you can load it with the transformers library from HuggingFace:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast")
model = AutoModel.from_pretrained("RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast")

# Example usage
input_text = "Hello, world!"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs)

Training Insights

This experiment was inspired by the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" by Jaerin Lee, Bong Gyun Kang, Kihoon Kim, and Kyoung Mu Lee, aims to accelerate the generalization of models under the grokking phenomenon. The paper is available at https://arxiv.org/abs/2405.20233

Acknowledgments

Special thanks to the YouTube channel Tunadorable for bringing the Grokfast paper to my attention in his video "Accelerated Training by Amplifying Slow Gradients". Tunadorable reads and discusses AI papers from arXiv, providing valuable insights into the latest research.

Disclaimer

This model is not optimized for practical use and should be considered experimental. It has only been trained for a single epoch, and its performance is not guaranteed to be reliable or accurate. Future iterations and more extensive training may improve its capabilities.

Contributing

If you are interested in discussing, contributing or have any suggestions, please reach out or open an issue on the repository.

License

This model is licensed under the OpenRAIL License.

Feel free to check out the model and experiment with it here. Your feedback and insights are welcome as I try and figure out wtf I'm doing.

Downloads last month: 116

Safetensors

Model size

0.4B params

Tensor type

BF16

Model tree for nomadicsynth/neon-360-0.1

Base model

neoncortex/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast

Finetuned

(1)

this model

Datasets used to train nomadicsynth/neon-360-0.1

Paper for nomadicsynth/neon-360-0.1

Grokfast: Accelerated Grokking by Amplifying Slow Gradients

Paper • 2405.20233 • Published May 30, 2024 • 7