Instructions to use nomadicsynth/neon-360-0.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nomadicsynth/neon-360-0.1 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="nomadicsynth/neon-360-0.1")# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("nomadicsynth/neon-360-0.1") model = AutoModelForCausalLM.from_pretrained("nomadicsynth/neon-360-0.1") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use nomadicsynth/neon-360-0.1 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "nomadicsynth/neon-360-0.1" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nomadicsynth/neon-360-0.1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/nomadicsynth/neon-360-0.1
- SGLang
How to use nomadicsynth/neon-360-0.1 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "nomadicsynth/neon-360-0.1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nomadicsynth/neon-360-0.1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "nomadicsynth/neon-360-0.1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nomadicsynth/neon-360-0.1", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use nomadicsynth/neon-360-0.1 with Docker Model Runner:
docker model run hf.co/nomadicsynth/neon-360-0.1
Neon-360 v0.1
Note: This is not fully trained and will be replaced with either a fine-tuned version or a new model soon-ish. Don't expect anything useful from it rn. Only download it if you're curious.
I'm working on retraining this, trying to find out what can be achieved on consumer hardware, namely my RTX 4090. Hopefully i can make a tiny agentic model, maybe a nice fast one. Self-improvement? Can I teach it to make itself better?
Suggestions wanted! What tasks would you want from a tiny model? Let me know in the Community Tab
This is currently a copy of the below:
mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast
This repository contains the mini-mistral-360M model, a 360 million parameter version of the Mistral architecture, trained for a single epoch. The model was trained on a diverse dataset comprising Wikipedia articles and the OpenHermes dataset. While this model is still in its early stages and not particularly useful as of now, it serves as an experimental showcase of integrating the Grokfast algorithm into the training process.
Model Details
- Architecture: Mistral
- Parameters: 360 million
- Training Duration: 1 epoch
- Training Dataset: Wikipedia articles and OpenHermes dataset
- Training Method: Transformers Trainer with grokfast-adamw as the optimiser
- Training Hardware: 2 x Nvidia RTX 3060 12GB
Purpose
The primary goal of this experiment was to observe the impact of the Grokfast algorithm on the training dynamics of a 360M parameter Mistral model. During training, it was noted that the evaluation loss followed the training loss closely, which is an intriguing behavior warranting further investigation.
Usage
To use this model, you can load it with the transformers library from HuggingFace:
from transformers import AutoModel, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast")
model = AutoModel.from_pretrained("RoboApocalypse/mini-mistral-360M-wikipedia-20231101.en-science-sci-fi-OpenHermes-2.5-chatML-Grokfast")
# Example usage
input_text = "Hello, world!"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model(**inputs)
Training Insights
This experiment was inspired by the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" by Jaerin Lee, Bong Gyun Kang, Kihoon Kim, and Kyoung Mu Lee, aims to accelerate the generalization of models under the grokking phenomenon. The paper is available at https://arxiv.org/abs/2405.20233
Acknowledgments
Special thanks to the YouTube channel Tunadorable for bringing the Grokfast paper to my attention in his video "Accelerated Training by Amplifying Slow Gradients". Tunadorable reads and discusses AI papers from arXiv, providing valuable insights into the latest research.
Disclaimer
This model is not optimized for practical use and should be considered experimental. It has only been trained for a single epoch, and its performance is not guaranteed to be reliable or accurate. Future iterations and more extensive training may improve its capabilities.
Contributing
If you are interested in discussing, contributing or have any suggestions, please reach out or open an issue on the repository.
License
This model is licensed under the OpenRAIL License.
Feel free to check out the model and experiment with it here. Your feedback and insights are welcome as I try and figure out wtf I'm doing.
- Downloads last month
- 116