Smashing Computer Vision Models
===============================

This tutorial demonstrates how to use the ``pruna`` package to optimize any custom computer vision (CV) model. We will use torchvision's ``vit_b_16`` Vision Transformer as an example.

Loading the CV Model
--------------------

First, load your computer vision model and move it to the GPU.

.. code-block:: python

    import torchvision
    from torchvision.models import ViT_B_16_Weights

    # Load a pretrained ViT-B/16 and move it to the GPU
    model = torchvision.models.vit_b_16(weights=ViT_B_16_Weights.DEFAULT).cuda()

Initializing the Smash Config
-----------------------------

Next, initialize the ``smash_config`` and set the task, the compiler, and the weight precision.

.. code-block:: python

    from pruna_engine.SmashConfig import SmashConfig

    # Initialize the SmashConfig, then select the task, compiler, and weight precision
    smash_config = SmashConfig()
    smash_config['task'] = 'image_classification'
    smash_config['compilers'] = 'cv-fast'
    smash_config['weight_quantization_bits'] = 16

Smashing the Model
------------------

Now, smash the model.

.. code-block:: python

    from pruna.smash import smash

    # Smash the model
    smashed_model = smash(
        model=model,
        api_key='<your-api-key>',  # replace <your-api-key> with your actual API key
        smash_config=smash_config,
    )

Don't forget to replace ``<your-api-key>`` with the API key provided by PrunaAI.
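
To keep the key out of source control, one option is to read it from an environment variable instead of hard-coding it. The sketch below is only an illustration: the variable name ``PRUNA_API_KEY`` and the helper ``load_api_key`` are hypothetical and not part of ``pruna``.

.. code-block:: python

    import os

    # Hypothetical variable name chosen for this example; pruna does not prescribe one.
    API_KEY_ENV_VAR = "PRUNA_API_KEY"


    def load_api_key(env=os.environ, var_name=API_KEY_ENV_VAR):
        """Return the API key stored under ``var_name`` or raise a clear error."""
        key = env.get(var_name, "").strip()
        if not key:
            raise RuntimeError(
                f"Set the {var_name} environment variable to your Pruna API key."
            )
        return key

With this helper in place, the ``smash`` call above could pass ``api_key=load_api_key()`` rather than a literal string.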

Preparing the Input
-------------------

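Then, prepare an input tensor for the model. A random image stands in for real data here.

.. code-block:: python

    import numpy as np
    from torchvision import transforms

    # Generate a random 224x224 RGB image (height, width, channels)
    image = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
    # Convert to a float tensor in [0, 1], add a batch dimension, and move it to the GPU
    input_tensor = transforms.ToTensor()(image).unsqueeze(0).cuda()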

Running the Model
-----------------

Finally, run the smashed model on the input tensor to classify the image.

.. code-block:: python

    # Run inference and display the result
    output = smashed_model(input_tensor)
    print(output)
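
For a classifier such as ``vit_b_16``, the raw output is a vector of logits, one per class. The following is a minimal sketch of turning such a vector into the most likely classes. It assumes the smashed model returns the same ``(1, num_classes)`` tensor as the original model, and the helper ``top_k_from_logits`` is a hypothetical name written in plain Python for illustration.

.. code-block:: python

    import math


    def top_k_from_logits(logits, k=5):
        """Return the k most likely (class_index, probability) pairs."""
        # Subtract the largest logit before exponentiating for numerical stability.
        max_logit = max(logits)
        exps = [math.exp(x - max_logit) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Sort class indices by probability, highest first.
        ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
        return ranked[:k]


    # Example call, assuming a tensor output of shape (1, num_classes):
    # top_k_from_logits(smashed_model(input_tensor)[0].tolist(), k=5)

Since the input in this tutorial is random noise, the predicted classes carry no meaning; the sketch becomes useful once a real image is fed to the model.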

Wrap Up
-------

Congratulations! You have successfully smashed a CV model. You can now use the ``pruna`` package to optimize any custom CV model. The only parts you should need to modify are the first step (loading the model) and the last step (running the model) to fit your use case. Additionally, you can use the ``all`` compiler, which explores many compression methods and also supports CPU optimization, although it supports fewer CV models.
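
To check whether smashing actually helped on your hardware, one option is to time the original and the smashed model on the same input. The helper below is a rough sketch, not a rigorous benchmark; ``mean_latency_ms`` and its parameters are hypothetical names introduced for this example.

.. code-block:: python

    import time


    def mean_latency_ms(fn, n_warmup=3, n_runs=10, sync=None):
        """Return the average wall-clock time of ``fn()`` in milliseconds."""
        # Warm-up runs absorb one-off costs such as compilation or caching.
        for _ in range(n_warmup):
            fn()
        if sync is not None:
            sync()
        start = time.perf_counter()
        for _ in range(n_runs):
            fn()
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        return (time.perf_counter() - start) * 1000 / n_runs


    # Example calls, assuming the objects defined earlier in this tutorial:
    # import torch
    # mean_latency_ms(lambda: model(input_tensor), sync=torch.cuda.synchronize)
    # mean_latency_ms(lambda: smashed_model(input_tensor), sync=torch.cuda.synchronize)

Passing ``torch.cuda.synchronize`` as ``sync`` matters on the GPU because CUDA kernels run asynchronously, so without it the timer may stop before the work has finished.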