This document provides a step-by-step guide for fine-tuning an uncensored large language model, referred to as CAI Proprietary LLM. The training pipeline consists of dataset curation, supervised fine-tuning (SFT), and direct preference optimization (DPO).
CAI Proprietary LLM is a highly steerable instruct- and chat-tuned model built on a Llama 3.1 base model. The model follows system prompts precisely and can be configured to respond to various requests without censorship. It supports structured reasoning, retrieval-augmented generation (RAG), tool use, and long-context interactions.
Supported model sizes: 8B, 70B, and 405B, matching the Llama 3.1 base models. Each version retains the core steerability properties and can be fine-tuned for specific requirements.
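Steerability through system prompts is typically exercised via the standard chat message format. A minimal sketch of assembling such a request (the prompt strings and helper name are illustrative, not part of the model's API):

```python
def build_chat(system_prompt: str, user_message: str) -> list[dict]:
    """Assemble a chat payload with the steering system prompt first.

    This is the common OpenAI-style messages structure; the model's
    behavior is steered by whatever appears in the system role.
    """
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]

# Illustrative usage; prompt contents are placeholders.
messages = build_chat(
    "You are a concise technical assistant.",
    "Summarize the SFT stage in one sentence.",
)
```

The same structure feeds directly into a tokenizer's chat template during both training and inference, which is what makes system-prompt adherence trainable.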
Training CAI Proprietary LLM requires a high-quality instruction dataset. The dataset should cover a broad range of domains to enhance model adaptability. The data mixture should be structured as follows:
To ensure dataset quality:
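The quality checklist itself is elided in this draft; two steps that commonly appear in such pipelines are exact deduplication and length filtering. A minimal pure-Python sketch (the field names and character thresholds are illustrative assumptions, not the project's actual criteria):

```python
import hashlib

def clean_dataset(
    examples: list[dict], min_chars: int = 32, max_chars: int = 8192
) -> list[dict]:
    """Drop exact duplicates and out-of-range samples.

    Each example is assumed to carry 'prompt' and 'response' fields;
    both the field names and the length bounds are illustrative.
    """
    seen: set[str] = set()
    kept: list[dict] = []
    for ex in examples:
        text = ex["prompt"] + "\n" + ex["response"]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier sample
        if not (min_chars <= len(text) <= max_chars):
            continue  # too short or too long to be a useful sample
        seen.add(digest)
        kept.append(ex)
    return kept
```

In practice, fuzzy deduplication (e.g. MinHash) and model-based quality scoring are often layered on top of these basic filters.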
SFT is conducted on the base models (Llama 3.1 8B, 70B, and 405B) using a mixture of instruction-tuning datasets.
Note: Packing multiple samples into each training sequence, combined with Flash Attention 2, maximizes training throughput.
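Sample packing concatenates short tokenized examples into full-length sequences so that fewer pad tokens are wasted; Flash Attention 2 then handles the resulting block-diagonal attention efficiently. A greedy packing sketch in pure Python (the separator token ID and all token values are illustrative):

```python
def pack_sequences(
    samples: list[list[int]], max_len: int, sep_id: int = 0
) -> list[list[int]]:
    """Greedily pack tokenized samples into sequences of at most max_len.

    Samples longer than max_len are truncated; sep_id marks sample
    boundaries so the attention mask can stay block-diagonal (which is
    what the real trainer delegates to Flash Attention 2). All IDs here
    are illustrative.
    """
    packed: list[list[int]] = []
    current: list[int] = []
    for sample in samples:
        sample = sample[:max_len]
        if current and len(current) + 1 + len(sample) > max_len:
            packed.append(current)  # sequence is full; start a new one
            current = []
        if current:
            current.append(sep_id)  # boundary between packed samples
        current.extend(sample)
    if current:
        packed.append(current)
    return packed
```

Libraries such as TRL expose this as a `packing` option on their SFT trainer, so in practice you rarely implement it by hand.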
DPO is applied after SFT using a LoRA-based approach, aligning the model with user preferences while reducing computational overhead.
DPO fine-tuning yields moderate improvements in preference-aligned responses while preserving the model's uncensored behavior.
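DPO trains directly on preference pairs: for each pair it rewards the policy for increasing the probability of the chosen response, relative to a frozen reference model (the SFT checkpoint), more than that of the rejected one. A pure-Python sketch of the per-pair loss (the log-probability inputs are illustrative; real trainers compute them per batch):

```python
import math

def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen - rejected margins))).

    Each margin is the log-prob difference between the policy being
    trained and the frozen reference; beta controls how far the policy
    may drift from the reference.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid)
```

In practice this objective is provided by libraries such as TRL's DPO trainer, and wrapping the policy in a LoRA adapter (via peft) keeps the trainable parameter count small, which is the computational saving the text refers to.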
For efficient inference, models are quantized to FP8 using the llm-compressor library and served with vLLM.
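The core of FP8 (e4m3) quantization is rescaling each tensor so its largest value maps to the format's maximum (448) and then rounding every element to 3 mantissa bits. A simplified pure-Python illustration of that rounding step (normals only, subnormals flushed to zero; llm-compressor performs the real, bit-exact version of this when applying an FP8 scheme to a model's linear layers):

```python
import math

E4M3_MAX = 448.0  # largest normal value representable in float8 e4m3

def round_to_e4m3(x: float) -> float:
    """Round a float to the nearest e4m3-representable value.

    Simplified illustration: values are clamped to +/-E4M3_MAX and
    subnormals are flushed to zero, so this is not bit-exact FP8.
    """
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = min(abs(x), E4M3_MAX)
    e = max(math.floor(math.log2(mag)), -6)  # e4m3 min normal exponent
    step = 2.0 ** (e - 3)  # 3 mantissa bits -> 8 steps per binade
    return sign * round(mag / step) * step

def quantize_dequantize(values: list[float]) -> list[float]:
    """Per-tensor scaled FP8 round-trip: scale to the e4m3 range,
    round, and scale back (what 'FP8 quantization' costs in precision)."""
    scale = max(abs(v) for v in values) / E4M3_MAX
    return [round_to_e4m3(v / scale) * scale for v in values]
```

In llm-compressor, the equivalent operation is configured declaratively (an FP8 quantization scheme targeting the model's Linear modules, typically excluding the LM head) and the resulting checkpoint loads directly into vLLM.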
Final model evaluations are conducted using the following benchmarks:
These benchmarks validate the model’s performance on general reasoning, factual knowledge, and instruction-following tasks.
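The benchmark list is not enumerated in this draft; a common way to run such evaluations is EleutherAI's lm-evaluation-harness. A hedged sketch using its Python entry point (the checkpoint path and task names are placeholders, not the document's actual benchmark set):

```python
# Sketch: evaluating a local checkpoint with lm-evaluation-harness.
# The model path and task names below are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=/path/to/checkpoint,dtype=bfloat16",
    tasks=["mmlu", "arc_challenge", "gsm8k"],  # placeholder tasks
    batch_size=8,
)
print(results["results"])  # per-task metrics dictionary
```

Running the same task list before and after DPO makes the "moderate improvements" claim above directly measurable.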
By following this guide, developers can train an uncensored LLM in the style of CAI Proprietary LLM. The training pipeline delivers strong performance across reasoning, generation, and tool-augmented tasks while keeping responses neutral.
For further optimizations, developers can explore:
For additional inquiries or implementation support, refer to the open-source repositories used in training or customize the pipeline to fit specific project needs.