• AIPressRoom

Fast intro to multi-modal ML with OpenAI’s CLIP

OpenAI’s CLIP is a “multi-modal” model capable of understanding the relationships and shared concepts between text and images. As we’ll see, CLIP is very capable, and when used via the Hugging Face library, it could not be easier to work with.
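As a taste of how simple the Hugging Face interface is, a minimal sketch might look like the following. It assumes the `transformers`, `torch`, and `Pillow` packages and the public `openai/clip-vit-base-patch32` checkpoint; the random-noise image is a stand-in so the example is self-contained:

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the pretrained CLIP model and its matching processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Stand-in image (random noise) so the sketch runs without any downloads
image = Image.fromarray(
    np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)
)
texts = ["a photo of a dog", "a photo of random noise"]

# The processor tokenizes the text and preprocesses the image in one call
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# One similarity logit per (image, text) pair: shape (n_images, n_texts)
print(outputs.logits_per_image)
```

The same `outputs` object also exposes `text_embeds` and `image_embeds` if you want the raw embedding vectors rather than pairwise similarity scores.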

70% discount on the NLP With Transformers in Python course: https://bit.ly/3DFvvY5

00:00 Intro
00:15 What is CLIP?
02:13 Getting started
05:38 Creating text embeddings
07:23 Creating image embeddings
10:26 Embedding a lot of images
15:08 Text-image similarity search
21:38 Alternative image and text search
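The similarity-search chapters above reduce to one core operation: L2-normalize the embeddings and take dot products, so that a dot product equals cosine similarity. A hedged NumPy sketch of that step (the function name and the toy array shapes are illustrative assumptions, not part of CLIP's API):

```python
import numpy as np

def top_k_matches(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k image embeddings most similar to the query.

    query_emb: (d,) query embedding; image_embs: (n, d) candidate embeddings.
    """
    # L2-normalize so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ q  # (n,) cosine similarities
    return np.argsort(-scores)[:k]

# Toy demo: embedding 2 is identical to the query, so it must rank first
rng = np.random.default_rng(0)
embs = rng.normal(size=(5, 8))
query = embs[2].copy()
print(top_k_matches(query, embs, k=2))
```

In practice the `image_embs` matrix would hold CLIP image embeddings computed in a batch, and `query_emb` would be a CLIP text embedding, which is what makes text-to-image search possible.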