Posit AI Blog: safetensors 0.1.0

safetensors is a new, simple, fast, and safe file format for storing tensors. The design of the file format and its original implementation are being led by Hugging Face, and it is being widely adopted in their popular ‘transformers’ framework. The safetensors R package is a pure-R implementation that lets you both read and write safetensors files.

The initial version (0.1.0) of safetensors is now on CRAN.

Motivation

The main motivation for safetensors in the Python community is security. As noted in the official documentation:

The main rationale for this crate is to remove the need to use pickle on PyTorch which is used by default.

Pickle is considered an unsafe format, as the action of loading a Pickle file can trigger the execution of arbitrary code. This has never been a concern for torch for R users, since the Pickle parser that is included in LibTorch only supports a subset of the Pickle format, which doesn’t include executing code.

However, the file format has additional advantages over other commonly used formats, including:

  • Support for lazy loading: You can choose to read a subset of the tensors stored in the file.

  • Zero copy: Reading the file does not require more memory than the file itself. (Technically the current R implementation does make a single copy, but that can be optimized out if we really need it in the future.)

  • Simple: Implementing the file format is simple, and doesn’t require complex dependencies. This means that it’s a good format for exchanging tensors between ML frameworks and between different programming languages. For instance, you can write a safetensors file in R and load it in Python, and vice-versa.

There are additional advantages compared to other file formats common in this space, and you can see a comparison table here.

Format

The safetensors format is described in the figure below. It’s basically a header file containing some metadata, followed by raw tensor buffers.
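To make the layout concrete, here is a minimal base-R sketch that writes a tiny file by hand and reads its header back. It relies on the format's specification, where the file starts with an 8-byte little-endian unsigned integer giving the header length, followed by a JSON header and then the raw bytes. The tensor name `x` and its values are made up for illustration, and the sketch assumes a header small enough that only the low 4 bytes of the length are non-zero. It is not how the safetensors package itself is implemented.

```r
# Toy example: one 2x2 float32 tensor named "x" (hypothetical data).
path <- tempfile(fileext = ".safetensors")
x <- c(1, 2, 3, 4)
header <- '{"x":{"dtype":"F32","shape":[2,2],"data_offsets":[0,16]}}'
header_raw <- charToRaw(header)

con <- file(path, "wb")
# Header length as an 8-byte little-endian integer:
# low 4 bytes hold the value, high 4 bytes are zero for small headers.
writeBin(length(header_raw), con, size = 4L, endian = "little")
writeBin(raw(4), con)
writeBin(header_raw, con)                        # JSON header
writeBin(x, con, size = 4L, endian = "little")   # raw float32 buffer
close(con)

# Read it back: length, then header text, then the tensor bytes.
con <- file(path, "rb")
header_len <- readBin(con, "integer", n = 1L, size = 4L, endian = "little")
invisible(readBin(con, "raw", n = 4L))           # skip the high 4 bytes
header_json <- rawToChar(readBin(con, "raw", n = header_len))
values <- readBin(con, "numeric", n = 4L, size = 4L, endian = "little")
close(con)

cat(header_json, "\n")
```

A real reader would parse `header_json` with a JSON library to find each tensor's dtype, shape, and byte offsets instead of hard-coding them.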

Basic usage

safetensors can be installed from CRAN using:

install.packages("safetensors")

We can then write any named list of torch tensors:

library(torch)
library(safetensors)

tensors <- list(
  x = torch_randn(10, 10),
  y = torch_ones(10, 10)
)

str(tensors)
#> List of 2
#>  $ x:Float [1:10, 1:10]
#>  $ y:Float [1:10, 1:10]

tmp <- tempfile()
safe_save_file(tensors, tmp)

It’s possible to pass additional metadata to the saved file by providing a metadata parameter containing a named list.
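As a rough sketch of what that could look like, the snippet below saves a small tensor with two extra entries and reads the file back with safe_load_file to inspect the parsed header. The keys `author` and `note` are hypothetical, and using plain string values is an assumption made here to stay on the safe side; check the package documentation for what is actually accepted.

```r
library(torch)
library(safetensors)

tensors <- list(x = torch_randn(2, 2))

# Hypothetical metadata entries; string values are an assumption.
meta <- list(author = "example", note = "toy tensors")

path <- tempfile(fileext = ".safetensors")
safe_save_file(tensors, path, metadata = meta)

loaded <- safe_load_file(path)
# Inspect the parsed header to see how the extra entries were stored.
str(attr(loaded, "metadata"))
```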

Reading safetensors files is handled by safe_load_file, and it returns the named list of tensors along with the metadata attribute containing the parsed file header.

tensors <- safe_load_file(tmp)
str(tensors)
#> List of 2
#>  $ x:Float [1:10, 1:10]
#>  $ y:Float [1:10, 1:10]
#>  - attr(*, "metadata")=List of 2
#>   ..$ x:List of 3
#>   .. ..$ shape       : int [1:2] 10 10
#>   .. ..$ dtype       : chr "F32"
#>   .. ..$ data_offsets: int [1:2] 0 400
#>   ..$ y:List of 3
#>   .. ..$ shape       : int [1:2] 10 10
#>   .. ..$ dtype       : chr "F32"
#>   .. ..$ data_offsets: int [1:2] 400 800
#>  - attr(*, "max_offset")= int 929
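The data_offsets entries in the header are what make partial reads possible: each tensor occupies a known byte range in the buffer that follows the header. The base-R sketch below builds a file by hand with the same shapes and offsets as above, then reads only `y` by seeking to its range. The helper `read_f32` is hypothetical (not part of the safetensors package), it handles only F32 data, and it assumes a small header as well as the row-major element order used by the format.

```r
# Hand-written file with two 10x10 float32 tensors (toy data).
path <- tempfile(fileext = ".safetensors")
header <- paste0(
  '{"x":{"dtype":"F32","shape":[10,10],"data_offsets":[0,400]},',
  '"y":{"dtype":"F32","shape":[10,10],"data_offsets":[400,800]}}'
)
header_raw <- charToRaw(header)

con <- file(path, "wb")
writeBin(length(header_raw), con, size = 4L, endian = "little")  # low 4 bytes of length
writeBin(raw(4), con)                                            # high 4 bytes (zero)
writeBin(header_raw, con)
writeBin(as.numeric(1:100), con, size = 4L, endian = "little")   # x
writeBin(rep(1, 100), con, size = 4L, endian = "little")         # y
close(con)

# Hypothetical helper: read one F32 tensor given its offsets and shape.
read_f32 <- function(path, offsets, shape) {
  con <- file(path, "rb")
  on.exit(close(con))
  header_len <- readBin(con, "integer", n = 1L, size = 4L, endian = "little")
  # Offsets are relative to the byte buffer, which starts after
  # the 8-byte length and the header itself.
  seek(con, where = 8 + header_len + offsets[1])
  values <- readBin(con, "numeric", n = (offsets[2] - offsets[1]) / 4,
                    size = 4L, endian = "little")
  # Data is stored row-major, R arrays are column-major: fill reversed, then permute.
  aperm(array(values, dim = rev(shape)))
}

y <- read_f32(path, offsets = c(400, 800), shape = c(10, 10))
dim(y)
```

Only the bytes for `y` are read from disk; `x` is never touched, which is the idea behind lazy loading.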

Currently, safetensors only supports writing torch tensors, but we plan to add support for writing plain R arrays and tensorflow tensors in the future.

Future directions

The next version of torch will use safetensors as its serialization format, meaning that when calling torch_save() on a model, list of tensors, or other types of objects supported by torch_save, you will get a valid safetensors file.

This is an improvement over the previous implementation because:

  1. It’s much faster: more than 10x for medium-sized models, and possibly even more for large files. This also improves the performance of parallel dataloaders by ~30%.

  2. It enhances cross-language and cross-framework compatibility. You can train your model in R and use it in Python (and vice-versa), or train your model in tensorflow and run it with torch.

If you want to try it out, you can install the development version of torch with:

remotes::install_github("mlverse/torch")

Photo by Nick Fewings on Unsplash


Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: “Figure from …”.

Citation

For attribution, please cite this work as

Falbel (2023, June 15). Posit AI Blog: safetensors 0.1.0. Retrieved from https://blogs.rstudio.com/tensorflow/posts/2023-06-15-safetensors/

BibTeX citation

@misc{safetensors,
  author = {Falbel, Daniel},
  title = {Posit AI Blog: safetensors 0.1.0},
  url = {https://blogs.rstudio.com/tensorflow/posts/2023-06-15-safetensors/},
  year = {2023}
}
