- A new command `ilab model upload` has been introduced so users can now upload their trained models to Hugging Face via the `ilab` CLI.
- `ilab model serve` now has separate `--host` and `--port` options, replacing the `host_port` configuration. The default values are `127.0.0.1` for `--host` and `8000` for `--port`, allowing users to configure the server's binding address and port independently through the configuration file or command-line flags.
- `ilab train --pipeline=accelerated --strategy=lab-skills-only` supports training with only the skills phase (leaving out knowledge).
- Previously, System Profile auto-detection was done by reading the names of the YAML files and matching them to your hardware. We now depend on the `Metadata` class stored in the configuration file itself. Please select `y` when prompted to overwrite your existing system profiles to use the new auto-detection system.
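The `host_port` replacement described above can be sketched as a small migration helper. This is only an illustration, not code from the `ilab` CLI: the function names `split_host_port` and `serve_args` are hypothetical, the legacy value is assumed to be a plain `host:port` string, and IPv6 literals are not handled.

```python
# Hypothetical migration helper; not part of the ilab CLI.
DEFAULT_HOST = "127.0.0.1"  # default for --host, per the changelog
DEFAULT_PORT = 8000         # default for --port, per the changelog


def split_host_port(host_port: str) -> tuple[str, int]:
    """Split a legacy 'host:port' string into separate host and port values."""
    host, sep, port = host_port.rpartition(":")
    if not sep:
        # No colon present: treat the whole value as a host, use the default port.
        return host_port or DEFAULT_HOST, DEFAULT_PORT
    return host or DEFAULT_HOST, int(port) if port else DEFAULT_PORT


def serve_args(host_port: str) -> list[str]:
    """Build an `ilab model serve` argument list using the separate flags."""
    host, port = split_host_port(host_port)
    return ["ilab", "model", "serve", "--host", host, "--port", str(port)]


print(serve_args("0.0.0.0:9000"))
```

Passing the result of `serve_args` to `subprocess.run` would start the server; the sketch only prints the argument list.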
- train-profiles have been deprecated and replaced with system-profiles. These profiles follow the format of the config file and apply to all commands. They live in `~/.local/share/instructlab/internal/system_profiles`.
- The default model has been changed from Merlinite to Granite; see instructlab#2238 for more details.
- Removed the `--greedy-mode` flag from `ilab model chat`. Please update any scripts or workflows relying on `--greedy-mode` to ensure compatibility.
- `ilab` now supports system profiles. These profiles apply entire configuration files tailored to specific hardware configurations. We support a set of auto-detected profiles for CPU-enabled Linux machines, M-series Apple Silicon chips, Nvidia GPUs, and Intel Gaudi 3. When you run `ilab config init`, one of these profiles should be selected for you. If there is no direct match, a menu is displayed allowing you to choose one.
- Add support for inferencing with IBM Granite architecture models.
- `ilab model chat` now includes a temperature setting feature, allowing users to adjust the response generation behavior. A default temperature of 1.0 has been added to the configuration file, which can be customized for each chat session using the `--temperature` or `-t` flag. Lower values produce more deterministic and focused responses, while higher values increase variability.
- The `full` training pipeline now fits on devices with 16 and 32 GB of RAM! If you are on a Mac, these optimizations are done for you. If you are on Linux, try using `--optimize-memory`; results vary per CPU vendor.
- `ilab data generate` now has `--max-num-tokens`, which defaults to 4096. This flag can be used to generate less data per SDG run. Specifying a value like `512` results in a faster run with less data generated. This works well with consumer hardware and will reduce training time.
- `ilab model download` uses the `hf_transfer` library for faster model downloads, reducing the average download time by 60%. This only applies to models that are hosted on Hugging Face Hub. This can be disabled by setting the environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `0`.
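To make the effect of the `--temperature` setting above concrete, the following sketch applies the standard temperature-scaled softmax to a handful of made-up token scores. It is a generic illustration of how temperature reshapes a probability distribution, not the code path `ilab` or its serving backends use; the function name and the sample logits are assumptions chosen for the example.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert raw scores to probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At low temperatures almost all probability mass lands on the highest-scoring token, which is why responses become more deterministic; at higher temperatures the distribution flattens and sampling varies more.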
- vLLM has been upgraded to v0.6.2 and will need to be reinstalled if you are upgrading `ilab` from an older version.
- Intel Gaudi software has been updated to 1.18.0 with Python 3.11 and Torch 2.4.0.
- InstructLab now uses XDG-based directories on macOS, similar to Linux. Users are advised to re-initialize their config files and remove cached models.
- Removed unused argument `--rouge-threshold` of `ilab data generate`.
- Removed the following aliased commands:
  - `convert`
  - `diff`
  - `download`
  - `evaluate`
  - `init`
  - `list`
  - `sysinfo`
  - `test`
- Intel Gaudi software has been updated to 1.17.1 with Python 3.11 and Torch 2.3.1 support.
- `--legacy` has been removed and replaced with `--pipeline=simple` in `ilab model train`.
- `ilab data generate` now defaults to `--pipeline full` and uses the `TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf` model as the teacher. This provides increased performance and better generated data.
- `ilab config init` now auto-detects your hardware when running on Nvidia-enabled systems and chooses the best train profile. It does this by first checking whether your system directly matches one of our supported train profiles and then attempting to match the vRAM for each profile to the total vRAM on your system.
- Add `log_format` to the `config.yaml` file to allow for customizing the log format.
- `ilab model evaluate --max-workers=auto` is now supported and is the default option. When `auto` is specified, the optimal value is determined based on your GPUs, CPUs, and configuration.
- `ilab model train` now supports `--pipeline`. The supported pipelines are `simple`, `full`, and `accelerated`. `simple` preserves the functionality found in `--legacy` and current macOS training. `full` introduces a new training loop optimized for CPU and macOS performance. `accelerated` allows users with dedicated graphics cards to run the full fine-tuning and multi-phase training found in our training library.
- `--device` in `ilab model train` now supports `mps`, which stands for Metal Performance Shaders. This is a PyTorch device for macOS training that allows us to use the same code path for Linux and macOS.
- Multi-phase training with `ilab model train --strategy lab-multiphase` is now resumable! In the case of a machine failure or an incidental stop, progress is tracked in a new `journal` file, rendered in YAML. Upon restarting training, the user can confirm whether they would like to proceed with a pre-existing training run (one that might have only evaluated a few checkpoints of the first eval phase, for instance) or restart from scratch.
- Allow users to pick a distributed training backend framework for GPU-accelerated training between 'fsdp' and 'deepspeed'. Also add support for FSDP-specific configuration options.
- `ilab data generate` and `ilab taxonomy diff` now support `--taxonomy-base=empty` to allow specifying that all taxonomy files in the supplied repo should be included.
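The two-step train profile selection described for `ilab config init` above (direct hardware match first, then a vRAM comparison) might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `TrainProfile` fields, the profile names, and in particular the fallback rule of picking the largest profile whose vRAM does not exceed the system total. The changelog does not specify how the vRAM match is actually scored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainProfile:
    name: str           # hypothetical profile label
    gpu_model: str      # GPU model the profile was written for
    gpu_count: int      # number of GPUs the profile expects
    total_vram_gb: int  # combined vRAM the profile expects


def pick_profile(
    profiles: list[TrainProfile],
    gpu_model: str,
    gpu_count: int,
    total_vram_gb: int,
) -> Optional[TrainProfile]:
    """Return a profile for the detected hardware, or None if nothing fits."""
    # Step 1: look for a profile that matches the hardware directly.
    for profile in profiles:
        if profile.gpu_model == gpu_model and profile.gpu_count == gpu_count:
            return profile
    # Step 2 (assumed rule): largest profile that still fits in the system's vRAM.
    fitting = [p for p in profiles if p.total_vram_gb <= total_vram_gb]
    return max(fitting, key=lambda p: p.total_vram_gb, default=None)


# Made-up profiles, not the ones shipped with InstructLab.
PROFILES = [
    TrainProfile("small", "gpu-a", 1, 24),
    TrainProfile("medium", "gpu-b", 2, 96),
    TrainProfile("large", "gpu-c", 8, 640),
]

print(pick_profile(PROFILES, "gpu-b", 2, 96))   # direct match
print(pick_profile(PROFILES, "gpu-x", 4, 128))  # falls back to the vRAM comparison
```

Returning `None` when nothing fits leaves room for the caller to fall back to a default or prompt the user.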
- `ilab data generate` now supports parallelized data generation across batches of the seed data when running with vLLM serving. The `--batch-size` argument can be used to control this behavior.
- `ilab model download` now supports downloading models from OCI registries. Repositories that are prefixed by "docker://" and specified against `--repository` are treated as OCI registries.
- `ilab` now uses dedicated directories for storing config and data files. On Linux, these will generally be the XDG directories: `~/.config/instructlab` for config, `~/.local/share/instructlab` for data, and `~/.cache` for temporary files, including downloaded models. On macOS, both the config and data are located at `~/Library/Application Support/instructlab`.
- A new `ilab config show` command is introduced as a convenience feature, which prints out the contents of the actively loaded config, not just the contents of the config file.
- `ilab system`: A new command group named `ilab system` has been added which will serve as the basis for all system-related commands. This currently contains `ilab system info` as its only sub-command.
- Add vLLM backend to serve, chat and generate commands.
- Add `--backend` flag to `ilab model serve` command to allow for specifying the backend to use when serving a model. This is useful when you have multiple backends installed and want to specify which one to use. Currently, the only supported backends are `llama-cpp` and `vllm`.
- Update `llama-cpp-python` to latest upstream release 0.2.79 to address poor results of synthetic data generation and local training.
- Adding `ilab model evaluate` which uses the new backend serving functionality. Evaluate offers two standard benchmarks (mt_bench and mmlu) as well as two variations (mt_bench_branch and mmlu_branch) which are integrated with the ilab workflow to evaluate new skills and knowledge. Includes `--gpus` option for specifying number of GPUs to utilize when serving models for evaluation (currently applicable for vLLM only). Also includes `--merge-system-user-message` flag to enable Mistral based judge models and a `--enable-serving-output` flag that configures whether the output of the model serving backend is suppressed.
- The `ilab` command now accepts a `-v` / `--verbose` option to enable debug logging. `ilab -vv` or `ilab --verbose --verbose` enables more verbose debug logging.
- `ilab model test` generic support
- Add `--chat-template` option to `ilab model serve` to support customization of the chat template for both vLLM and llama.cpp backends. Options include 'auto' (current behavior, ilab provides its own template), 'tokenizer' (uses the model's tokenizer config), and an external file name.
- Default log format changes to include the logger name in the logs.
- `ilab data generate` now supports a new and more extensive pipeline with the option `--pipeline full`. This option requires `mixtral-8x7b-instruct` as the teacher model.
- The `instructlab` package now uses optional dependencies for each supported hardware `cpu`, `cuda`, `hpu`, `mps`, and `rocm`. To install InstructLab for e.g. NVIDIA CUDA, use `pip install instructlab[cuda]`.
- Add a `--enable-serving-output` flag for `ilab data generate`. This flag determines whether vLLM will have its output suppressed when it serves the teacher model in the background.
- The `generate` section of the config now has a `teacher` section. This section configures the teacher model when it is automatically served in the background. This new section has the same values as the `serve` section of the config.
- Support for `ILAB_GLOBAL_CONFIG` environment variable: When set, this environment variable specifies a global configuration file that serves as the template for the `~/.config/instructlab/config.yaml` user space config. This bypasses the interactive mode in `ilab config init` and can be used to specify alternative configurations for any command, ensuring that defaults such as taxonomy repositories and base models are honored from the global config.
- `ilab model list`: a new command which lists all GGUF and Safetensor Models on the system.
- `ilab data list`: a new command which lists the generated datasets in the user's datasets directory.
- Legacy Linux training now supports the new messages format. When a dataset is provided in the HuggingFace messages format, `ilab` will automatically convert it back into the legacy format.
- Legacy Linux training is now compatible with the phase07 pretraining format.
- Add support for `ILAB_TRAIN_PROFILE_DIR` which will point to the template train profiles to be brought into the `train_configuration` directory.
- Add interactive prompt for users to choose their train profile.
- The `generate` section of the config now has a `pipeline` value. This value sets a default value and can be overridden by the `--pipeline` flag. The default for this value is 'simple'.
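The Linux directory layout listed above follows the XDG Base Directory convention, which can be sketched in a few lines. This is an illustration of the convention only; `ilab` may resolve its directories differently (for example through a helper library), and the function name `linux_dirs` is hypothetical. The environment variables `XDG_CONFIG_HOME`, `XDG_DATA_HOME`, and `XDG_CACHE_HOME` come from the XDG specification rather than from this changelog.

```python
import os
from pathlib import Path

APP = "instructlab"


def linux_dirs(env=None, home=None) -> dict[str, Path]:
    """Resolve XDG-style config, data, and cache locations on Linux."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    def base(var: str, fallback: str) -> Path:
        value = env.get(var, "")
        # Per the XDG spec, unset, empty, or relative values fall back to the default.
        return Path(value) if value and os.path.isabs(value) else home / fallback

    return {
        "config": base("XDG_CONFIG_HOME", ".config") / APP,
        "data": base("XDG_DATA_HOME", ".local/share") / APP,
        # The changelog lists ~/.cache itself as the location for temporary files.
        "cache": base("XDG_CACHE_HOME", ".cache"),
    }


for kind, path in linux_dirs().items():
    print(f"{kind}: {path}")
```

Accepting `env` and `home` as parameters keeps the function easy to exercise without touching the real environment.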
- `ilab`: Deprecation of Python 3.9 support and withdrawal of Python 3.12 support. Due to changes to training requiring the usage of GPTDolomite, Python 3.9 is no longer supported and Python 3.12 support is currently withdrawn. If you are using either of these versions, you will need to start using either Python 3.10 or Python 3.11 to use this and subsequent versions of the CLI.
- `ilab model train`: The '--device' parameter no longer supports specifying a GPU index (e.g., 'cuda:0'). To use a specific GPU, set the visible GPU before running the train command.
- `ilab init`: With the introduction of a dedicated storage system within the `ilab` CLI, `ilab init` and `ilab config init` will now output and read the config file from the platform's config directory under the `instructlab` package.
- `ilab taxonomy` and `ilab data`: The `ilab` CLI now uses the platform's dedicated data directory to store the taxonomy under the `instructlab/taxonomy` directory as a default.
- `ilab data`: The default directory for new datasets is now under `instructlab/datasets` in the platform's dedicated data directory under the `instructlab` package.
- `ilab model`: The default location for saved and downloaded models is now under `instructlab/models` in the platform's dedicated data directory under the `instructlab` package. Outputted checkpoints now live in the `instructlab/checkpoints` directory under the platform's dedicated program cache directory.
- `ilab model chat`: Chatlogs are now stored under the `instructlab/checkpoints` directory in the platform's dedicated data directory under the `instructlab` package.
- The `--num-instructions` option to `ilab data generate` has been deprecated. See `--sdg-scale-factor` for an updated option providing similar functionality.
- `ilab model train --legacy`: Trained GGUF models are now saved in the global user checkpoints directory. Previously, checkpoints were always saved into a directory local to where the user called it from.
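For the `--device` change above, one common way to "set the visible GPU" on NVIDIA systems is the `CUDA_VISIBLE_DEVICES` environment variable. The changelog does not name a mechanism, so treat that variable, the `--device cuda` value, and the helper name `train_invocation` as assumptions in this sketch.

```python
import os


def train_invocation(gpu_index: int, base_env=None) -> tuple[list[str], dict[str, str]]:
    """Return (command, environment) for training on one specific GPU."""
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: CUDA_VISIBLE_DEVICES limits which GPUs the process can see,
    # so the single visible GPU takes the place of the old 'cuda:<index>' form.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    command = ["ilab", "model", "train", "--device", "cuda"]
    return command, env


command, env = train_invocation(1)
print(command, "CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"])
# To launch for real: subprocess.run(command, env=env, check=True)
```

The helper copies the environment instead of mutating `os.environ`, so other subprocesses started from the same script keep seeing every GPU.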
- `ilab config`: Fixed a bug where `ilab` didn't recognize `train.lora_quantize_dtype: null` as a valid value.
- `ilab model chat`: Fixed an issue where the default served model couldn't be resolved when running a model besides the default `merlinite-7b-lab-Q4_K_M.gguf`.
The ilab command redesign included in v0.17 introduces a new command structure that follows a resource group design. This means that commands that once were something like `ilab chat` are now `ilab model chat`. The new groups are model, data, taxonomy, and config. The commands that fall under these are all of the pre-existing `ilab` commands, now grouped by the resource that each command commonly deals with.

The old command structure is still aliased to work but will be removed in 0.19.0. This means for 0.17.0 and 0.18.0 the aliases will exist and work as expected.
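
Scripts that still call the old top-level commands can be updated mechanically. The sketch below rewrites an argument list from the old structure to the grouped one. Only the `ilab chat` to `ilab model chat` pair is stated explicitly in this changelog; the other entries are inferred from grouped command names that appear elsewhere in it and should be checked against `ilab --help`. The `REGROUPED` table and the `regroup` function are hypothetical names, and global options placed before the subcommand (such as `ilab -v chat`) are not handled.

```python
# Old top-level command -> resource-grouped command.
# Assumption: inferred pairings, except "chat", which the changelog states directly.
REGROUPED = {
    "chat": ("model", "chat"),
    "download": ("model", "download"),
    "evaluate": ("model", "evaluate"),
    "test": ("model", "test"),
    "diff": ("taxonomy", "diff"),
    "init": ("config", "init"),
    "sysinfo": ("system", "info"),
}


def regroup(argv: list[str]) -> list[str]:
    """Rewrite ['ilab', 'chat', ...] as ['ilab', 'model', 'chat', ...]."""
    if len(argv) >= 2 and argv[0] == "ilab" and argv[1] in REGROUPED:
        return [argv[0], *REGROUPED[argv[1]], *argv[2:]]
    # Already grouped, unknown, or not an ilab command: return an unchanged copy.
    return list(argv)


for old in (["ilab", "chat"], ["ilab", "init"], ["ilab", "model", "chat"]):
    print(" ".join(old), "->", " ".join(regroup(old)))
```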