In July 2021, AWS and Hugging Face announced a collaboration to make Hugging Face a first-party framework within SageMaker. Previously, you had to use a PyTorch container and install the packages manually. With the new Hugging Face Deep Learning Containers (DLCs) available in Amazon SageMaker, training and deploying models is greatly simplified.
In this post, we will go through a high-level overview of the Hugging Face Transformers library before looking at how to use the newly announced Hugging Face DLCs within SageMaker.
Introduction to Hugging Face Transformers
Hugging Face Transformers is a library that makes it easy to use NLP models. It lets developers leverage hundreds of pretrained models for Natural Language Understanding (NLU) tasks and makes it simple to train new transformer models. The API of this library is built around three broad classes:
- Model - PyTorch or Keras models that we can use in a training loop or for prediction
- Configuration - Stores all the configuration required to build a model
- Tokenizer - Stores the vocabulary and methods for encoding and decoding between strings and tokens
The transformers library offers a simple abstraction over the above three classes through the pipeline method. This is the simplest way to get started with the pre-trained models from the model hub.
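A minimal sketch of such a call, assuming the `transformers` package and a backend such as PyTorch are installed. The helper name `classify` is hypothetical, and the import sits inside the function so the module loads even where the library is missing.

```python
TASK = "sentiment-analysis"  # task name passed as the first argument to pipeline


def classify(texts):
    """Run the default sentiment-analysis pipeline on a list of strings."""
    # Imported lazily so this module can be loaded without transformers.
    from transformers import pipeline

    classifier = pipeline(TASK)  # downloads a default model on first use
    return classifier(texts)     # one {"label": ..., "score": ...} dict per input
```

Calling `classify(["I love this library!"])` should return a list with one dictionary holding a label and a score.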
The first argument is a Hugging Face NLP task; in this case, it is sentiment analysis. Some of the supported tasks are:
- Sequence Classification
- Sentiment Analysis
- Question Answering
- Language Modelling
- Text Generation
- Named Entity Recognition (NER)
- Summarization
- Translation
See here for an overview of the tasks supported by the library.
Under the hood, calling the pipeline method roughly covers the following steps:
The downloaded models are stored in ~/.cache/huggingface/transformers.
Here is the process:
- Use AutoTokenizer to download the tokenizer associated with the model we picked and instantiate it
- Use AutoModelForSequenceClassification to download the model itself
- Build a sequence from the input sentence, using the correct model-specific separators, token type ids and attention masks
- Pass this sequence through the model to get the logits
- Compute the softmax of the result to get probabilities over the classes
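The steps above can be sketched roughly as follows. This is an illustration rather than the pipeline's actual implementation: the checkpoint name is an assumption (any sequence classification checkpoint from the model hub would do), the function names are hypothetical, and `transformers` and `torch` are imported lazily. The softmax step is written in plain Python so it can be tried on its own.

```python
import math

# Assumption: example checkpoint; replace with the model you picked.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def classify_step_by_step(text, model_name=MODEL_NAME):
    """Walk through the five steps; needs `transformers` and `torch`."""
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)  # step 1
    model = AutoModelForSequenceClassification.from_pretrained(model_name)  # step 2
    inputs = tokenizer(text, return_tensors="pt")  # step 3: ids and attention mask
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()  # step 4
    probs = softmax(logits)  # step 5
    # Map each class index to its human-readable label.
    return {model.config.id2label[i]: p for i, p in enumerate(probs)}
```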
Tokenizer
A tokenizer’s job is to preprocess your text into tokens suitable for training or inference. A token can be a word (predict) or a subword (##ly). For example, a tokenizer may split the word Transformers into (transform, ##ers) so that the model’s vocabulary doesn’t explode. The tokenizer can also take care of other pre-processing tasks such as normalizing case and punctuation.
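To make the subword idea concrete, here is a toy greedy longest-match splitter in the spirit of WordPiece. It is a simplified sketch, not the algorithm of any particular pretrained tokenizer, and the vocabulary is made up for illustration.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one lowercase word into subwords."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1  # try a shorter candidate
        if piece is None:
            return [unk]  # nothing matched, so the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


# Hypothetical toy vocabulary.
TOY_VOCAB = {"transform", "##ers", "predict", "##ly", "##able"}

print(wordpiece("transformers", TOY_VOCAB))  # ['transform', '##ers']
print(wordpiece("predictable", TOY_VOCAB))   # ['predict', '##able']
print(wordpiece("xyz", TOY_VOCAB))           # ['[UNK]']
```

Because rare words are assembled from shared pieces, the vocabulary stays small while still covering unseen words.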
The tokenization logic is tied to the model we use. That is why in our example we derived the model and tokenizer from the same model name. The AutoTokenizer and AutoModelForxxx classes ensure that the tokenizers and models are paired correctly.
When we apply a tokenizer to an input text, it returns a dictionary containing the ids of the tokens and an attention mask. The ids are the numerical representation of the tokens. To learn about the attention mask and other details related to tokenizers, refer here.
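The shape of that dictionary can be illustrated with a toy encoder. The vocabulary and the whitespace splitting are assumptions made for the example; the key names `input_ids` and `attention_mask` and the BERT-style special tokens mirror what many Transformers tokenizers return, but the real values depend on the model.

```python
# Hypothetical toy vocabulary; real vocabularies ship with each pretrained model.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "i": 4, "love": 5, "nlp": 6}


def encode(sentences, vocab=VOCAB):
    """Return BERT-style ids and attention masks, padded to the longest sentence."""
    rows = []
    for sentence in sentences:
        tokens = ["[CLS]"] + sentence.lower().split() + ["[SEP]"]
        rows.append([vocab.get(tok, vocab["[UNK]"]) for tok in tokens])
    width = max(len(row) for row in rows)
    input_ids, attention_mask = [], []
    for row in rows:
        padding = width - len(row)
        input_ids.append(row + [vocab["[PAD]"]] * padding)
        attention_mask.append([1] * len(row) + [0] * padding)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = encode(["I love NLP", "love"])
print(batch["input_ids"])       # [[2, 4, 5, 6, 3], [2, 5, 3, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

The attention mask tells the model which positions hold real tokens and which are padding added to make the batch rectangular.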
Note that the tokens also include some special tokens that encode special meaning in the sentence. They differ from model to model. In our model, they are:
Hugging Face Model
Once the input text has been preprocessed by the tokenizer, we can pass it directly to the model.
The contents of the model output depend on the task. For SequenceClassification we get back logits, an optional loss, hidden_states and attentions attributes.
The model class can also be used for transfer learning on custom NLP tasks. The Transformers library provides a Trainer API that takes this model as input and fine-tunes its pre-trained weights.
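A rough sketch of fine-tuning with the Trainer API, under several assumptions: the hyperparameter values are illustrative, `fine_tune` is a hypothetical helper, and `train_dataset` is expected to be an already tokenized dataset with labels. The imports are lazy because `transformers` and `torch` are needed only when the function is called.

```python
# Hypothetical hyperparameters, chosen only for illustration.
TRAINING_CONFIG = {
    "num_train_epochs": 1,
    "per_device_train_batch_size": 16,
    "learning_rate": 5e-5,
}


def fine_tune(model_name, train_dataset, output_dir, num_labels=2):
    """Fine-tune a pretrained checkpoint; needs `transformers` and `torch`."""
    from transformers import (
        AutoModelForSequenceClassification,
        Trainer,
        TrainingArguments,
    )

    # Loads the pre-trained weights with a classification head of num_labels classes.
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels
    )
    args = TrainingArguments(output_dir=output_dir, **TRAINING_CONFIG)
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()                 # updates the weights on train_dataset
    trainer.save_model(output_dir)  # writes the fine-tuned model to output_dir
    return trainer
```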
This covers a high-level overview of the Hugging Face Transformers library. Next, we will see how to use the library with SageMaker.
Using Hugging Face on SageMaker
In collaboration with AWS, Hugging Face released SageMaker Hugging Face Deep Learning Containers (DLCs) that make it easy to train and deploy Hugging Face models on the AWS platform. In the following sections, we will see how to use these DLCs to train and deploy Hugging Face Transformer models on AWS.
Running a Training job
Preparing a training script
First we need to prepare the training script. This would be similar to any Transformers training script. A minimal training script would look like this:
For a more complete version of the script covering model evaluation, logging and additional training arguments, refer to this sample script.
As with any SageMaker training job, we need to ensure that this script reads data from the DLC’s data input directory and saves the model to the model directory. The following lines of code take care of this.
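A minimal sketch of that wiring, assuming the job is started with channels named train and test. SageMaker exposes the directories through the SM_MODEL_DIR and SM_CHANNEL_<NAME> environment variables; the hyperparameter names and local fallback paths here are hypothetical.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and the directories SageMaker provides to the script."""
    parser = argparse.ArgumentParser()
    # Hyperparameters arrive as command-line arguments (names are hypothetical).
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--model_name", type=str, default="distilbert-base-uncased")
    # Directories come from environment variables set inside the container;
    # the fallbacks let the script also run outside SageMaker.
    parser.add_argument("--model_dir", default=os.environ.get("SM_MODEL_DIR", "./model"))
    parser.add_argument("--train_dir", default=os.environ.get("SM_CHANNEL_TRAIN", "./data/train"))
    parser.add_argument("--test_dir", default=os.environ.get("SM_CHANNEL_TEST", "./data/test"))
    args, _ = parser.parse_known_args(argv)  # ignore arguments we do not know
    return args
```

The training code then loads its datasets from `args.train_dir` and `args.test_dir` and saves the final model under `args.model_dir`, which SageMaker packages as the job's output.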
Run training using Hugging Face estimator
First, we create a Hugging Face estimator, which exposes methods similar to other SageMaker Estimators. Note that the entry_point attribute matches the file name of our training script.
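A sketch of how such an estimator might be configured with the SageMaker Python SDK. The script name, source folder, instance type, hyperparameters and framework versions are all assumptions; choose a version combination supported by the available DLCs. `build_estimator` is a hypothetical helper and needs the `sagemaker` package plus AWS credentials when called.

```python
# Illustrative settings only; adjust them to your own project.
ESTIMATOR_KWARGS = {
    "entry_point": "train.py",         # file name of the training script
    "source_dir": "./scripts",         # hypothetical folder holding train.py
    "instance_type": "ml.p3.2xlarge",
    "instance_count": 1,
    "transformers_version": "4.6",     # assumed versions
    "pytorch_version": "1.7",
    "py_version": "py36",
    "hyperparameters": {"epochs": 1, "model_name": "distilbert-base-uncased"},
}


def build_estimator(role=None):
    """Create the estimator; needs the `sagemaker` SDK and AWS credentials."""
    import sagemaker
    from sagemaker.huggingface import HuggingFace

    # get_execution_role() works inside SageMaker notebooks; pass a role ARN elsewhere.
    role = role or sagemaker.get_execution_role()
    return HuggingFace(role=role, **ESTIMATOR_KWARGS)
```

The entries under `hyperparameters` are passed to the training script as command-line arguments.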
Training is invoked by calling the fit method on the Hugging Face Estimator.
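As a sketch, the call could look like the following. The bucket and prefix arguments and both helper names are hypothetical; the dictionary keys are channel names, which SageMaker exposes inside the container under /opt/ml/input/data/<channel>.

```python
def build_channels(bucket, prefix):
    """Map channel names to the S3 locations of the datasets."""
    return {
        "train": f"s3://{bucket}/{prefix}/train",
        "test": f"s3://{bucket}/{prefix}/test",
    }


def run_training(estimator, bucket, prefix):
    """Start the training job and wait until it finishes."""
    estimator.fit(build_channels(bucket, prefix))
    return estimator.model_data  # S3 URI of the resulting model.tar.gz
```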
The trained model is a tarball with all the resources needed for inference.
Deploying the model for inference
Once the training is completed, we can deploy a Hugging Face model directly from the Estimator object.
Alternatively, if we already have a completed training job, we can use its output model to create a new Hugging Face model and deploy it.
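Both deployment routes can be sketched as below. The helper names, the instance type and the framework versions are assumptions; the versions should match those used in training. The second function needs the `sagemaker` SDK and AWS credentials when called.

```python
def deploy_from_estimator(estimator, instance_type="ml.m5.xlarge"):
    """Deploy straight from a fitted estimator and return a predictor."""
    return estimator.deploy(initial_instance_count=1, instance_type=instance_type)


def deploy_from_artifact(model_data, role, instance_type="ml.m5.xlarge"):
    """Deploy the model.tar.gz of an earlier training job."""
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        model_data=model_data,       # S3 URI of model.tar.gz
        role=role,                   # IAM role ARN with SageMaker permissions
        transformers_version="4.6",  # assumed versions
        pytorch_version="1.7",
        py_version="py36",
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```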
We can use this deployed model to make predictions on input text. The default inference script in the Hugging Face DLC expects a dictionary with inputs as a key. For details on the default input formats for various tasks, refer to this.
The above method uses the SageMaker SDK to invoke the model. In a production ML application, invocation is often handled by calling the InvokeEndpoint API via boto3 or another SDK. A sample boto3-based invocation would look like this:
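A minimal sketch, assuming AWS credentials and a region are configured and that the endpoint name is known (for example from the predictor returned at deployment). The helper names are hypothetical, and the optional `client` parameter exists only so the function can be tried with a stand-in client.

```python
import json


def build_request(text):
    """Serialise the input as a JSON dictionary with "inputs" as the key."""
    return json.dumps({"inputs": text})


def invoke(endpoint_name, text, client=None):
    """Call the endpoint through the InvokeEndpoint API and decode the JSON reply."""
    if client is None:
        import boto3  # imported lazily; needs AWS credentials and a region

        client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_request(text),
    )
    # The response body is a stream of bytes containing JSON.
    return json.loads(response["Body"].read())
```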
Remember to delete the endpoint at the end of experiments.
Advanced features of the Inference toolkit
We can also pass additional environment variables to the inference model to simplify deployment.
Here, the HF_TASK variable defines the task for the Transformers pipeline and HF_MODEL_ID defines the model id to load from huggingface.co/models. For the full list of supported environment variables, refer here.
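These two variables could be passed through the `env` argument of the model, roughly as follows. The model id, task, instance type and framework versions are assumptions for the example, and `deploy_from_hub` is a hypothetical helper that needs the `sagemaker` SDK and AWS credentials when called.

```python
# Assumption: example model id and task; any hub model whose task is supported
# by the Transformers pipeline could be used instead.
HUB_ENV = {
    "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",
    "HF_TASK": "text-classification",
}


def deploy_from_hub(role, env=HUB_ENV, instance_type="ml.m5.xlarge"):
    """Deploy a hub model directly, without a model.tar.gz of our own."""
    from sagemaker.huggingface import HuggingFaceModel

    model = HuggingFaceModel(
        env=env,                     # read by the inference toolkit at start-up
        role=role,
        transformers_version="4.6",  # assumed versions
        pytorch_version="1.7",
        py_version="py36",
    )
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```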
Customizing Inference script
When creating an inference model, we can specify user-defined code/modules that allow us to customize the inference process.
For example, here is a barebones inference script which we will call inference.py:
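One possible shape for such a script, as a sketch. It overrides the `model_fn` and `predict_fn` hooks of the inference toolkit; the task name and the custom `threshold` request field are assumptions invented for illustration.

```python
def model_fn(model_dir):
    """Load the model once when the endpoint starts; needs `transformers`."""
    from transformers import pipeline

    # model_dir holds the unpacked contents of model.tar.gz.
    return pipeline("text-classification", model=model_dir, tokenizer=model_dir)


def predict_fn(data, model):
    """Handle one request; `data` is the already deserialised request body."""
    texts = data["inputs"]
    threshold = data.get("threshold", 0.0)  # hypothetical custom input field
    predictions = model(texts)
    # Keep only the predictions whose score reaches the requested threshold.
    return [p for p in predictions if p["score"] >= threshold]
```

With this script in place, a request such as {"inputs": [...], "threshold": 0.5} returns only the sufficiently confident predictions.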
To use this script, we need to place it under a source directory along with any additional files required.
Next, when we create the HuggingFaceModel we need to set the source_dir and entry_point attributes. These attributes are derived from the SageMaker Estimator Framework, so they are available under all Frameworks.
This has the effect of setting the environment variables SAGEMAKER_SUBMIT_DIRECTORY to source and SAGEMAKER_PROGRAM to inference.py on the inference model. The inference model also has the files packaged with the following directory structure:
Now when we deploy the model, we can pass custom inputs to it.
For further instructions on how to customize inference, refer to this
Additional resources
To learn more, you can refer to: