Finetuner Update 0.6.7

Finetuner makes neural network fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure requirements in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models and make them production-ready without expensive hardware.

GitHub: https://github.com/jina-ai/finetuner - Task-oriented finetuning for better embeddings on neural search

This release covers Finetuner version 0.6.7, including the dependencies finetuner-api 0.4.8 and finetuner-core 0.11.4.

This release contains 4 new features.

🆕 Features

Add support for cross-modal evaluation in the EvaluationCallback (#615)

In previous versions of Finetuner, when using the EvaluationCallback to calculate IR metrics, you could only use a single model to encode both the query and the index data. This meant that when training multiple models at the same time, as in CLIP fine-tuning, only one of the encoders could be used for evaluation. It is now possible to perform cross-modal evaluation, where one model encodes the query data and a second model encodes the index data. This is useful in multi-modal tasks like text-to-image search.

To perform cross-modal evaluation, you just need to specify both the model and index_model arguments in the EvaluationCallback, like so:

import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='openai/clip-vit-base-patch32',
    train_data=train_data,
    eval_data=eval_data,
    loss='CLIPLoss',
    callbacks=[
        EvaluationCallback(
            query_data=query_data,
            index_data=index_data,
            model='clip-text',
            index_model='clip-vision'
        )
    ]
)

See the EvaluationCallback section of the Finetuner documentation for details on using this callback. See also the sections Text-to-Image Search via CLIP and Using MCLIP for concrete examples of cross-modal evaluation.
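
Once the run finishes, the metrics computed by the callback appear in the run logs. The snippet below is a minimal sketch of how you might retrieve them; it assumes the standard Finetuner client accessors finetuner.get_run(), run.status() and run.logs(), which are not part of this release note, and uses a hypothetical run name.

import finetuner

finetuner.login()

# Fetch the finished run by name and print its logs, which contain
# the IR metrics reported by the EvaluationCallback.
run = finetuner.get_run('my-clip-run')  # hypothetical run name
print(run.status())
print(run.logs())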

Add support for Multilingual CLIP (#611)

Finetuner now supports a Multilingual CLIP model from the OpenCLIP project. Multilingual CLIP models are trained on large text and image datasets from different languages using the CLIP contrastive learning approach.

They are a good fit for text-to-image applications where texts are in languages other than English.

The currently supported Multilingual CLIP model - xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k - uses a ViT Base32 image encoder and an XLM Roberta Base text encoder.

You can find details on how to fine-tune this specific model in the Multilingual Text-to-Image search with MultilingualCLIP section of the documentation.

import finetuner
run = finetuner.fit(
    model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',
    train_data=train_data,
    eval_data=eval_data,
    epochs=5,
    learning_rate=1e-6,
    loss='CLIPLoss',
    device='cuda',
)
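
If you want cross-modal evaluation for this model as well, the EvaluationCallback pattern shown above applies unchanged. The following sketch assumes that the text and vision towers of the multilingual model are addressed with the same 'clip-text' and 'clip-vision' names used in the CLIP example; check the Using MCLIP section of the documentation for the exact names.

import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',
    train_data=train_data,
    eval_data=eval_data,
    epochs=5,
    learning_rate=1e-6,
    loss='CLIPLoss',
    device='cuda',
    callbacks=[
        EvaluationCallback(
            query_data=query_data,      # e.g. non-English text queries
            index_data=index_data,      # e.g. the image collection to search
            model='clip-text',          # assumed name of the text encoder
            index_model='clip-vision',  # assumed name of the image encoder
        )
    ],
)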

Filter models by task in finetuner.describe_models() (#610)

The finetuner.describe_models() function, which provides an overview of supported model backbones, now accepts an optional task argument that filters the models by task.

To display all models you can omit the argument.

import finetuner
finetuner.describe_models()

To filter based on task, you need to provide a valid task name. For example:

finetuner.describe_models(task='image-to-image')

or

finetuner.describe_models(task='text-to-image')

Currently valid task names are text-to-text, text-to-image and image-to-image.
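
For completeness, the remaining task name filters text-to-text models in the same way:

finetuner.describe_models(task='text-to-text')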

Configure the num_items_per_class argument in finetuner.fit() (#614)

The finetuner.fit() method now includes a new argument num_items_per_class that lets you set the number of items per label included in each batch. This gives you finer control over how batches are constructed. If not set, the argument defaults to 4, which matches the behavior of previous versions of Finetuner.

You can easily set this when calling finetuner.fit():

import finetuner
run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    eval_data=eval_data,
    batch_size=128,
    num_items_per_class=8,
)
⚠️
The batch size needs to be a multiple of the number of items per class, in other words batch_size % num_items_per_class == 0 must hold. Otherwise Finetuner cannot respect the given num_items_per_class and throws an error.
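
If you want to guard against this before submitting a run, a quick sanity check in your own code is enough; a minimal sketch:

batch_size = 128
num_items_per_class = 8

# 128 % 8 == 0, so each batch contains items from 128 / 8 = 16 distinct labels.
assert batch_size % num_items_per_class == 0, (
    'batch_size must be a multiple of num_items_per_class'
)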

🤟 Contributors

We would like to thank all contributors to this release: