Troubleshooting

This document describes common problems encountered when using ilab and how to resolve them. It also includes a section on fine-tuning your model and troubleshooting the quality of its responses.

ilab troubleshooting

ilab data generate --endpoint-url with llama-cpp fails with openai.InternalServerError: Service Unavailable

llama-cpp does not support batching, which is enabled by default with remote endpoints. To resolve this error, disable batching using --batch-size=0.

See this issue.

ilab data generate command running slow on macOS

If ilab data generate runs for several hours or more on an M-series Mac, first check the available memory on your system (see Activity Monitor for details). If less than 8 GB of RAM is available before serving a model, try to free up some memory.

If generation is still slow, see this discussion. The suggestion there is to raise the macOS GPU memory limit. By default, the limit is around 60%-70% of your total RAM, which is expressed as 0:

sudo sysctl iogpu.wired_limit_mb
iogpu.wired_limit_mb = 0

You can set it to any number (the value is in MB), although it's advisable to leave 4-6 GB of RAM for macOS itself.

For example, on an M1 with 16 GB of RAM, the ilab data generate command finished in less than an hour with the limit raised to 12 GB. Previously, it took several hours.

sudo sysctl iogpu.wired_limit_mb=12288

Once done, make sure to reset the limit back to 0, which is the default.

Note: This value will reset to the default after the machine reboots.
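The 12288 value used above is simply 12 GB expressed in MB. The following minimal sketch shows that arithmetic for other machines. The helper name `wired_limit_mb` and its parameters are hypothetical and not part of ilab or macOS; it assumes 1 GB is counted as 1024 MB and only prints the command rather than running it.

```python
def wired_limit_mb(total_ram_gb: int, reserved_for_os_gb: int) -> int:
    """Return a candidate iogpu.wired_limit_mb value in MB.

    Hypothetical helper for illustration; not part of ilab or macOS.
    """
    if reserved_for_os_gb < 0 or reserved_for_os_gb >= total_ram_gb:
        raise ValueError("reserved RAM must be non-negative and smaller than total RAM")
    # The sysctl value is in megabytes; 1 GB is treated as 1024 MB here.
    return (total_ram_gb - reserved_for_os_gb) * 1024


if __name__ == "__main__":
    # 16 GB machine, leaving 4 GB for macOS, as in the example above.
    limit = wired_limit_mb(total_ram_gb=16, reserved_for_os_gb=4)
    print(f"sudo sysctl iogpu.wired_limit_mb={limit}")
```

Reserving 6 GB instead of 4 GB on the same machine would give 10240.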

Model fine tuning and response optimization

If you want to improve the quality of the model's outputs, you can adjust a number of steps and parameters at various stages of the CLI workflow. Some of them are discussed in the following sections.

Note that improved response quality comes at a cost: increased compute requirements, increased time, or both. The steps described here give you the best chance of improving your model's responses, but they cannot guarantee an improvement.

Skill composition

Composing and contributing effective and impactful skills is an iterative process. The typical workflow looks something like this:

  1. Compose skill examples.

  2. Run the ilab data generate command.

  3. Examine the generated examples based on the supplied skill (found in the generated folder).

  4. If the generated examples are not satisfactory in quality, edit the skill examples.

  5. Repeat the process until you are satisfied with the generated data.

How to improve the skill YAML

  1. Increase the number of examples in your skill YAML file. The more examples the model has to work from, the faster it can generate synthetic data. The generated data will also be better if the input covers a wider range of examples.

  2. Improve the quality of the provided examples. Review your examples and see whether they can be rephrased to align better with what you hope the model will generate. This improves the chances of the model generating higher-quality synthetic data in large quantities.

Data generation

The data generation step is executed via the ilab data generate command, and is responsible for generating synthetic data. This forms the basis for what the model will end up learning.

NOTE: The data produced by the generation step is used only within the user's local workflow, to train the model and to help the user fine-tune their skill examples. A separate data generation process runs in the backend once a user's skill is actually merged into the taxonomy repository.

How to improve the quality of generated data

  1. Increase the number of instructions generated by passing the --num-instructions flag to the ilab data generate command, for example: ilab data generate --num-instructions 1000. This generates 1000 synthetic data points based on your provided examples. The more instructions generated, the better the model will be trained (within reasonable limits).

  2. Use a better model via --model. Larger models can lead to better data generation. This option requires you to be familiar with existing models and to know which ones suit your needs. It could mean using a model with more parameters than the default InstructLab merlinite-7b-lab-GGUF model, such as the Mixtral-8x7B-Instruct-v0.1 model, or using an unquantized version of the InstructLab merlinite-7b-lab model. It can be used as follows: ilab model serve --model-path models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf and ilab data generate --model models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf

  3. Set the number of CPU cores used to generate data via --num-cpus. This defaults to 10; increasing the value may lead to better generated data. It can be used as follows: ilab data generate --num-cpus 15
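The three options above are shown one at a time. As a rough sketch, the snippet below assembles a single ilab data generate invocation from them. It assumes the flags can be combined in one call, which this document does not state explicitly. The helper `build_generate_command` is hypothetical; it only builds and prints the command line and does not invoke ilab.

```python
import shlex
from typing import List, Optional


def build_generate_command(
    num_instructions: Optional[int] = None,
    model: Optional[str] = None,
    num_cpus: Optional[int] = None,
) -> List[str]:
    """Assemble an `ilab data generate` argument list (hypothetical helper)."""
    command = ["ilab", "data", "generate"]
    # Only flags mentioned in this document are used; unset options are skipped.
    if num_instructions is not None:
        command += ["--num-instructions", str(num_instructions)]
    if model is not None:
        command += ["--model", model]
    if num_cpus is not None:
        command += ["--num-cpus", str(num_cpus)]
    return command


if __name__ == "__main__":
    command = build_generate_command(
        num_instructions=1000,
        model="models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
        num_cpus=15,
    )
    # shlex.join quotes arguments safely for copy-pasting into a shell.
    print(shlex.join(command))
```

Keeping the command as a list of arguments avoids quoting mistakes if a model path ever contains spaces.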

Training

The training step is run with the ilab model train command. This step trains the model on the synthetic data that was generated. The output of this step is a set of adapter files with the general format adapters-xxx.npz, where xxx is a number. These adapter files represent a snapshot of the model's trained state and are periodically written to disk.

Ways to train the model better

  1. Increase the number of training iterations via the --iters flag. A larger number of iterations usually means a better-trained model.

    NOTE: Diminishing returns might kick in around 300 or so iterations. Increasing the number of iterations comes at the cost of having to wait longer for the training to complete.

  2. Pick the adapter file with the lowest validation loss. The training process periodically generates and saves an adapter file, and the terminal output shows the validation loss associated with each one. How often adapter files are written is controlled by the --save-every flag. For example, ilab model train --save-every 10 outputs an adapter file every 10th iteration.
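Picking the adapter with the lowest validation loss can be sketched as follows. Because this document does not specify the format of the training output, the sketch assumes you have copied the loss values from the terminal by hand. The file names and loss values are hypothetical placeholders, and `best_adapter` is an illustrative helper, not part of ilab.

```python
from typing import Dict


def best_adapter(validation_losses: Dict[str, float]) -> str:
    """Return the adapter file name with the lowest validation loss."""
    if not validation_losses:
        raise ValueError("no adapter losses were provided")
    # min() over the keys, ranked by each key's loss value.
    return min(validation_losses, key=validation_losses.get)


if __name__ == "__main__":
    # Hypothetical values, copied by hand from the training output.
    losses = {
        "adapters-010.npz": 1.92,
        "adapters-020.npz": 1.47,
        "adapters-030.npz": 1.53,
    }
    print(best_adapter(losses))
```

In this made-up example the loss rises again after the second snapshot, so the latest adapter file is not necessarily the best one.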

Additional resources

  • InstructLab Community FAQ
  • InstructLab Taxonomy FAQ
  • Discussion board