Finetuner Update 0.6.5

Finetuner makes neural network fine-tuning easier and faster by streamlining the workflow and handling all the complexity and infrastructure requirements in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models and make them production-ready without expensive hardware.

GitHub - jina-ai/finetuner: Task-oriented finetuning for better embeddings on neural search

This release covers Finetuner version 0.6.5, including dependencies finetuner-api 0.4.7 and finetuner-core 0.11.4.

This release contains 6 new features, 1 bug fix, 2 refactorings, and 2 documentation improvements.

🆕 Features

Support loading training data and evaluation data from CSV files (#592)

We now support CSV files in the finetuner.fit() method. This simplifies training because it is no longer necessary to construct a DocumentArray object to hold the training data. Instead, you can pass a CSV file that contains the training data or pointers (i.e. URIs) to the relevant data objects.

import finetuner

# Point train_data at a CSV file instead of building a DocumentArray.
train_data = 'path/to/some/data.csv'

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data
)

See the Finetuner documentation page for preparing CSV files for more information.

You can also provide CSV files for evaluation data, as well as for query and index data when using EvaluationCallback. See the EvaluationCallback page in the Finetuner documentation for more information.
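
As a rough sketch, evaluation and callback data can be supplied the same way. This assumes the eval_data argument of finetuner.fit() and the query_data/index_data parameters of EvaluationCallback accept CSV paths as described above; all file paths are placeholders:

import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='efficientnet_b0',
    train_data='path/to/train.csv',            # training data as CSV
    eval_data='path/to/eval.csv',              # evaluation data as CSV
    callbacks=[
        EvaluationCallback(
            query_data='path/to/queries.csv',  # queries to evaluate with
            index_data='path/to/index.csv',    # index to search against
        )
    ],
)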

Support for data in lists when encoding (#598)

The finetuner.encode() method now takes lists of texts or image URIs as well as DocumentArray objects as inputs. This simplifies encoding because it is no longer necessary to construct a DocumentArray object to contain data.

import finetuner

# Load a fine-tuned model artifact from a local path.
model = finetuner.get_model('/path/to/YOUR-MODEL.zip')

texts = ['some text to encode']

embeddings = finetuner.encode(model=model, data=texts)

See the Finetuner documentation page for encoding documents for more information.

Artifact sharing (#602)

Users can now share their model artifacts with anyone who has access to Jina and has the artifact ID by adding the public=True flag to finetuner.fit(). By default, artifacts are set to private, equivalent to public=False.

# public=True makes the resulting artifact shareable by its artifact ID.
finetuner.fit(
    model=model_name,
    train_data=data,
    public=True,
)
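
A collaborator with access to Jina can then pull the shared model by its artifact ID. The snippet below is a sketch only: it assumes finetuner.get_model() accepts an artifact ID in place of a local path, and the ID shown is a placeholder.

import finetuner

finetuner.login()

# Load a publicly shared artifact by its ID (placeholder value).
model = finetuner.get_model('<shared-artifact-id>')
embeddings = finetuner.encode(model=model, data=['some text to encode'])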

See the Finetuner documentation for advanced job options for more information.

Allow access_paths for FinetunerExecutor

The FinetunerExecutor now takes an optional argument access_paths that allows users to specify a traversal path through an array of nested Document instances. The executor only processes those document chunks specified by the traversal path.

See the FinetunerExecutor documentation and the DocArray documentation for information on constructing document paths.
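
As an illustration, the argument can be set when adding the executor to a Jina Flow. The Hub URI and the '@c' path (which restricts processing to each document's chunks) are placeholders, and any other configuration the executor needs is omitted here:

from jina import Flow

# Pass access_paths via uses_with; '@c' limits processing to document chunks.
f = Flow().add(
    uses='jinahub+docker://FinetunerExecutor',
    uses_with={'access_paths': '@c'},
)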

Allow logger callback for Weights & Biases during Finetuner runs

You can now use the Weights & Biases logger callback to track metrics for your Finetuner runs in anonymous mode. When a fine-tuning run finishes, the logs contain a URL that points to a Weights & Biases web page with the run's tracked metrics. This page is temporary (automatically deleted after seven days if unclaimed), and you can claim it by logging in with your Weights & Biases account credentials.
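
A minimal sketch of attaching the logger, assuming the callback class is exposed as WandBLogger in finetuner.callback:

import finetuner
from finetuner.callback import WandBLogger

run = finetuner.fit(
    model='efficientnet_b0',
    train_data='path/to/train.csv',
    callbacks=[WandBLogger()],  # metrics are tracked in anonymous mode
)

The log output looks something like this: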

wandb: Currently logged in as: anony-mouse-279369. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.13.5
wandb: Run data is saved locally in [YOUR-PATH]
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run cool-wildflower-2
wandb:  View project at https://wandb.ai/anony-mouse-279369/[YOUR-PROJECT-URL]
wandb:  View run at https://wandb.ai/anony-mouse-279369/[YOUR-RUN-URL]

See the Finetuner documentation page on callbacks for more information.

Support for image blobs

We now support DocumentArray image blobs in Finetuner. It is no longer necessary to directly convert images into tensors before sending them to the cloud.

You can convert image filepaths or URIs to blobs with the Document.load_uri_to_blob() method.

This saves a lot of memory and bandwidth since blobs are stored in their native, typically compressed format. Blobs are usually as small as 10% of the size of their corresponding tensor.

from docarray import Document

d = Document(uri='tests/resources/lena.png')
d.load_uri_to_blob()  # stores the compressed image bytes in d.blob

If you use CSV to input local image files to Finetuner, this conversion happens automatically by default.
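
If you build the DocumentArray yourself, a blob-backed training set might look like the sketch below. The image paths are placeholders, and it assumes load_uri_to_blob() returns the Document so the calls can be chained:

from docarray import Document, DocumentArray

# Placeholder image paths; each Document keeps the compressed bytes as a blob.
image_paths = ['images/cat.jpg', 'images/dog.jpg']

train_data = DocumentArray(
    Document(uri=path).load_uri_to_blob() for path in image_paths
)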

⚙ Refactoring

Bump Hubble SDK version to 0.23.3 (#594)

We have updated Finetuner to the latest version of the Hubble SDK, which improves functionality overall and in particular improves access from code running in notebooks.

We will deprecate the finetuner.notebook_login() method starting from version 0.7 of Finetuner. finetuner.login() now detects the environment automatically, including inside notebooks.
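
In practice, the same call now works everywhere; a minimal sketch:

import finetuner

# Works in scripts and notebooks alike; the environment is detected
# automatically, so finetuner.notebook_login() is no longer needed.
finetuner.login()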

Remove connect function (#596)

We have removed the finetuner.connect() method, since Finetuner no longer requires you to log in to Jina again if you are already logged in.

🐞 Bug Fixes

Fix executor _finetuner import

A broken _finetuner import caused the Finetuner executor to fail to start; the underlying issue is now fixed.

📗 Documentation Improvements

Document the force argument to finetuner.login() (#596)

We have documented the force parameter to finetuner.login(), which forces users to log in to Jina again, even if already logged in.
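
For example, to discard the cached session and re-authenticate:

import finetuner

# Forces a fresh login even if a valid session already exists.
finetuner.login(force=True)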

Update Image-to-Image example (#599)

We have changed the configuration and training sets in the examples in the Image-to-Image Search via ResNet50 documentation page.

🤟 Contributors

We would like to thank all contributors to this release: