Fine-tune Falcon 7B and other LLMs on Amazon SageMaker with @remote decorator

Today, generative AI models cover a variety of tasks, from text summarization and Q&A to image and video generation. To improve the quality of output, approaches like n-shot learning, prompt engineering, Retrieval Augmented Generation (RAG), and fine-tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific tasks.

With Amazon SageMaker, you can now run a SageMaker training job simply by annotating your Python code with the @remote decorator. The SageMaker Python SDK automatically translates your existing workspace environment, along with any associated data processing code and datasets, into a SageMaker training job that runs on the training platform. This has the advantage of writing code in a more natural, object-oriented way, while still using SageMaker capabilities to run training jobs on a remote cluster with minimal changes.

In this post, we showcase how to fine-tune the Falcon-7B Foundation Model (FM) using the @remote decorator from the SageMaker Python SDK. The solution also uses Hugging Face's parameter-efficient fine-tuning (PEFT) library and quantization techniques through bitsandbytes to support fine-tuning. The code presented in this blog can also be used to fine-tune other FMs, such as Llama-2 13B.

The full precision representations of this model might not fit into memory on a single or even multiple Graphics Processing Units (GPUs), or might even need a bigger instance. Hence, in order to fine-tune this model without increasing cost, we use the technique known as Quantized LLMs with Low-Rank Adapters (QLoRA). QLoRA is an efficient fine-tuning approach that reduces the memory usage of LLMs while maintaining very good performance.

Advantages of using the @remote decorator

Before going further, let's understand how the @remote decorator improves developer productivity while working with SageMaker; a short usage sketch follows the list:

  • The @remote decorator triggers a training job directly using local Python code, without the explicit invocation of SageMaker Estimators and SageMaker input channels.

  • Low barrier to entry for developers training models on SageMaker.

  • No need to change integrated development environments (IDEs). Continue writing code in your IDE of choice and invoke SageMaker training jobs from there.

  • No need to learn about containers. Continue providing dependencies in a requirements.txt file and supply it to the @remote decorator.
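To make this concrete, here is a minimal sketch of the programming model, assuming default AWS credentials are configured. The function itself (a trivial divide) is hypothetical and only illustrates that any plain Python function can be annotated; instance_type and dependencies are arguments accepted by the @remote decorator.

from sagemaker.remote_function import remote

# Annotating a plain Python function with @remote runs it as a SageMaker training job.
@remote(instance_type="ml.m5.xlarge", dependencies="./requirements.txt")
def divide(x, y):
    return x / y

# Called like a normal function; the call launches a remote job and returns the result locally.
print(divide(10, 2))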

Prerequisites

An AWS account is required with an AWS Identity and Access Management (IAM) role that has permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account.

In this post, we use Amazon SageMaker Studio with the Data Science 3.0 image and an ml.t3.medium fast launch instance. However, you can use any integrated development environment (IDE) of your choice. You just need to set up your AWS Command Line Interface (AWS CLI) credentials correctly. For more information, refer to Configure the AWS CLI.

For fine-tuning the Falcon-7B model, an ml.g5.12xlarge instance is used in this post. Please ensure sufficient capacity for this instance type in your AWS account.

You need to clone this GitHub repository to replicate the solution demonstrated in this post.

Solution overview

  1. Install prerequisites for fine-tuning the Falcon-7B model

  2. Set up remote decorator configurations

  3. Preprocess the dataset containing AWS services FAQs

  4. Fine-tune Falcon-7B on the AWS services FAQs

  5. Test the fine-tuned model on sample questions related to AWS services

1. Install prerequisites for fine-tuning the Falcon-7B model

Launch the notebook falcon-7b-qlora-remote-decorator_qa.ipynb in SageMaker Studio, selecting the Image as Data Science and the Kernel as Python 3. Install all the required libraries mentioned in requirements.txt. A few of the libraries need to be installed on the notebook instance itself. Then perform the other operations needed for dataset processing and for triggering a SageMaker training job.

%pip install -r requirements.txt

%pip install -q -U transformers==4.31.0
%pip install -q -U datasets==2.13.1
%pip install -q -U peft==0.4.0
%pip install -q -U accelerate==0.21.0
%pip install -q -U bitsandbytes==0.40.2
%pip install -q -U boto3
%pip install -q -U sagemaker==2.154.0
%pip install -q -U scikit-learn

2. Set up remote decorator configurations

Create a configuration file where all the configurations related to the Amazon SageMaker training job are specified. This file is read by the @remote decorator while running the training job. It contains settings like dependencies, training image, instance type, and the execution role to be used for the training job. For a detailed reference of all the settings supported by the config file, check out Configuring and using defaults with the SageMaker Python SDK.

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: ./requirements.txt
        ImageUri: '{aws_account_id}.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04'
        InstanceType: ml.g5.12xlarge
        RoleArn: arn:aws:iam::111122223333:role/ExampleSageMakerRole

It's not mandatory to use a config.yaml file in order to work with the @remote decorator; it is just a cleaner way to supply all configurations to it. This keeps SageMaker and AWS related parameters outside of the code, with a one-time effort to set up the config file that is then shared across team members. All the configurations could also be supplied directly in the decorator arguments, but that reduces readability and maintainability of changes in the long run. Also, the configuration file can be created by an administrator and shared with all the users in an environment.
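For illustration, the equivalent settings could be passed as decorator arguments instead of through config.yaml. This is a sketch only; it assumes the keyword arguments dependencies, image_uri, instance_type, and role of the @remote decorator, with the same placeholder values as in the config file above.

from sagemaker.remote_function import remote

# Same job settings as config.yaml, supplied inline on the decorator.
@remote(
    dependencies="./requirements.txt",
    image_uri="{aws_account_id}.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04",
    instance_type="ml.g5.12xlarge",
    role="arn:aws:iam::111122223333:role/ExampleSageMakerRole"
)
def train_fn(model_name, train_ds, test_ds):
    ...  # training logic as defined in step 4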

3. Preprocess the dataset containing AWS services FAQs

The next step is to load and preprocess the dataset to make it ready for the training job. First, let us have a look at the dataset.

The dataset contains FAQs for the AWS services. In addition to QLoRA, bitsandbytes is used to quantize the frozen LLM to 4-bit precision and attach LoRA adapters to it.
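As a rough sketch, loading and inspecting such a question-answer dataset with the Hugging Face datasets library could look like the following; the file name and CSV format are assumptions, while the question and answers columns match the prompt template used below.

from datasets import load_dataset

# Hypothetical: load a CSV of AWS services FAQs with "question" and "answers" columns.
dataset = load_dataset("csv", data_files="aws_faqs.csv", split="train")

# Inspect one sample to verify the columns referenced by the prompt template.
print(dataset[0]["question"])
print(dataset[0]["answers"])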

Create a prompt template to convert each FAQ sample to a prompt format:

from random import randint

# custom instruct prompt start
# layout: the question, a separator, then the answer followed by the EOS token
# (placeholders inferred from the format call below)
prompt_template = f"{{question}}\n---\nAnswer:\n{{answer}}{{eos_token}}"

# template dataset to add the prompt to each sample
def template_dataset(sample):
    sample["text"] = prompt_template.format(question=sample["question"],
                                            answer=sample["answers"],
                                            eos_token=tokenizer.eos_token)
    return sample

The next step is to convert the inputs (text) to token IDs. This is done by a Hugging Face Transformers tokenizer.

from transformers import AutoTokenizer

model_id = "tiiuae/falcon-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Set the Falcon tokenizer's padding token to its EOS token
tokenizer.pad_token = tokenizer.eos_token
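As a quick, hypothetical illustration of what the tokenizer produces (the sample sentence is arbitrary):

# Encode a short question into token IDs and decode it back.
sample_ids = tokenizer("What is Amazon SageMaker?")["input_ids"]
print(sample_ids)
print(tokenizer.decode(sample_ids))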

Now simply use the template_dataset function to convert all the FAQs to the prompt format and set up the train and test datasets.
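The exact mapping code from the notebook is not listed in this section; a minimal sketch, assuming the dataset loaded in the earlier sketch and a 90/10 split, could look like this:

# Apply the prompt template to every sample, then split into train and test sets.
dataset = dataset.map(template_dataset)
dataset = dataset.train_test_split(test_size=0.1)

train_dataset = dataset["train"]
test_dataset = dataset["test"]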

4. Fine-tune Falcon-7B on the AWS services FAQs

Now you can prepare the training script, define the training function train_fn, and put the @remote decorator on the function.

The training function does the following:

  • Tokenizes and chunks the dataset

  • Sets up BitsAndBytesConfig, which specifies that the model should be loaded in 4-bit but computation should be converted to bfloat16

  • Loads the model

  • Finds the target modules and updates the necessary matrices by using the utility method find_all_linear_names

  • Creates LoRA configurations that specify the rank of the update matrices (r), the scaling factor (lora_alpha), the modules to apply the LoRA update matrices to (target_modules), the dropout probability for LoRA layers (lora_dropout), task_type, and so on

  • Starts the training and evaluation

import bitsandbytes as bnb

def find_all_linear_names(hf_model):
    # Collect the names of all 4-bit linear modules so LoRA adapters can be attached to them
    lora_module_names = set()
    for name, module in hf_model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:
        lora_module_names.remove("lm_head")
    return list(lora_module_names)

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from sagemaker.remote_function import remote
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import transformers

# Start training
@remote(volume_size=50)
def train_fn(
        model_name,
        train_ds,
        test_ds,
        lora_r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        learning_rate=2e-4,
        num_train_epochs=1
):
    # tokenize and chunk dataset
    lm_train_dataset = train_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, batch_size=24, remove_columns=list(train_ds.features)
    )

    lm_test_dataset = test_ds.map(
        lambda sample: tokenizer(sample["text"]), batched=True, remove_columns=list(test_ds.features)
    )

    # Print total number of samples
    print(f"Total number of train samples: {len(lm_train_dataset)}")

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16
    )
    # Falcon requires you to allow remote code execution. This is because the model uses a new architecture that is not part of transformers yet.
    # The code is provided by the model authors in the repo.
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        quantization_config=bnb_config,
        device_map="auto")

    model.gradient_checkpointing_enable()
    model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=True)

    # get lora target modules
    modules = find_all_linear_names(model)
    print(f"Found {len(modules)} modules to quantize: {modules}")

    config = LoraConfig(
        r=lora_r,
        lora_alpha=lora_alpha,
        target_modules=modules,
        lora_dropout=lora_dropout,
        bias="none",
        task_type="CAUSAL_LM"
    )

    model = get_peft_model(model, config)
    # report how many parameters are trainable after attaching the LoRA adapters
    model.print_trainable_parameters()

    trainer = transformers.Trainer(
        model=model,
        train_dataset=lm_train_dataset,
        eval_dataset=lm_test_dataset,
        args=transformers.TrainingArguments(
            per_device_train_batch_size=per_device_train_batch_size,
            per_device_eval_batch_size=per_device_eval_batch_size,
            logging_steps=2,
            num_train_epochs=num_train_epochs,
            learning_rate=learning_rate,
            bf16=True,
            save_strategy="no",
            output_dir="outputs"
        ),
        data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    model.config.use_cache = False

    trainer.train()
    trainer.evaluate()

    model.save_pretrained("/opt/ml/model")

And invoke the train_fn():

train_fn(model_id, train_dataset, test_dataset)

The tuning job will run on the Amazon SageMaker training cluster. Wait for the tuning job to finish.

5. Test the fine-tuned model on sample questions related to AWS services

Now, it's time to run some tests on the model. First, let us load the model:

from peft import PeftModel, PeftConfig
import torch
from transformers import AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "mps" if torch.backends.mps.is_available() else "cpu"

config = PeftConfig.from_pretrained("./model")
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, trust_remote_code=True)
model = PeftModel.from_pretrained(model, "./model")
model.to(device)

Now load a sample question from the training dataset to see the original answer, and then ask the same question to the tuned model to see the answer in comparison.

Here is a sample question from the training set and the original answer:
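A minimal sketch of selecting one such sample at random (using the randint imported earlier), assuming the question and answers columns are still present in train_dataset:

# Pick a random FAQ from the training set and look at the original answer.
sample = train_dataset[randint(0, len(train_dataset) - 1)]
print(sample["question"])
print(sample["answers"])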

Now, the same question is asked to the tuned Falcon-7B model:
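A minimal sketch of querying the tuned model, assuming the tokenizer from earlier is still loaded and reusing the prompt layout the model was fine-tuned on:

# Build the same prompt format used during fine-tuning, up to the "Answer:" marker.
prompt = f"{sample['question']}\n---\nAnswer:\n"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

# Generate an answer with the fine-tuned adapters attached.
with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        max_new_tokens=200,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))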

This concludes the implementation of fine-tuning Falcon-7B on an AWS services FAQ dataset using the @remote decorator from the Amazon SageMaker Python SDK.

Cleaning up

Complete the following steps to clean up your resources:

  • Shut down the Amazon SageMaker Studio instances to avoid incurring additional costs.

  • Clean up your Amazon Elastic File System (Amazon EFS) directory by clearing the Hugging Face cache directory: rm -R ~/.cache/huggingface/hub

Conclusion

In this post, we showed you how to effectively use the @remote decorator's capabilities to fine-tune the Falcon-7B model using QLoRA and Hugging Face PEFT with bitsandbytes, without making significant changes to the training notebook, while using Amazon SageMaker capabilities to run the training jobs on a remote cluster.

All the code shown as part of this post to fine-tune Falcon-7B is available in the GitHub repository. The repository also contains a notebook showing how to fine-tune Llama-13B.

As a next step, we encourage you to check out the @remote decorator functionality and the Python SDK API, and use it in your choice of environment and IDE. Additional examples are available in the amazon-sagemaker-examples repository to get you started quickly. You can also check out the following posts:

About the Authors

Bruno Pistone is an AI/ML Specialist Solutions Architect for AWS based in Milan. He works with large customers, helping them to deeply understand their technical needs and design AI and machine learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His expertise includes machine learning end to end, machine learning industrialization, and generative AI. He enjoys spending time with his friends and exploring new places, as well as travelling to new destinations.

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from the financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.
