Course project for VT CS5824 Spring '23
- Clone the repository:
git clone
- Data collection: Run the data_collection Jupyter notebook to download the image data from Google Open Images and sort the images into their respective categories.
- Data preprocessing: Run the data_preprocessing Jupyter notebook to preprocess the data and prepare the images and their associated masks for inpainting model inference.
- Clone the LaMa repository:
git clone
- Set up the environment with the required dependencies mentioned here
- Change
in the config file given here
- Run the inference as per the steps given here for each category. Example:
python --config configs/prediction/default.yaml --input_dir /home/ram/CS5824/images_and_masks/validation/food --output_dir /home/ram/CS5824/output/lama/validation/food
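To run the same inference step for several categories, the command can be generated per category. The following is a minimal sketch, not part of the project code: the flags and directory layout are copied from the example above, while `predict_script` is a hypothetical placeholder for the LaMa inference script (its name is not given here) and the category list is whatever the data_collection notebook produced.

```python
import shlex
from pathlib import PurePosixPath


def lama_inference_commands(predict_script, data_root, output_root, categories,
                            split="validation",
                            config="configs/prediction/default.yaml"):
    """Build one inference command (as an argument list) per category.

    predict_script is a placeholder for the LaMa inference script;
    the flags mirror the example command in this README.
    """
    commands = []
    for category in categories:
        input_dir = PurePosixPath(data_root) / split / category
        output_dir = PurePosixPath(output_root) / "lama" / split / category
        commands.append([
            "python", str(predict_script),
            "--config", config,
            "--input_dir", str(input_dir),
            "--output_dir", str(output_dir),
        ])
    return commands


if __name__ == "__main__":
    # "PREDICT_SCRIPT" is a hypothetical stand-in; replace it with the real script path.
    for cmd in lama_inference_commands(
        "PREDICT_SCRIPT",
        "/home/ram/CS5824/images_and_masks",
        "/home/ram/CS5824/output",
        ["food"],
    ):
        print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` from inside the cloned LaMa repository instead of being printed.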
- Clone the Deepfill v2 repository:
git clone
- Set up the environment with the required dependencies mentioned here
- Run the inference as per the steps given here for each category. Example:
python --image /home/ram/CS5824/images_and_masks/validation/food/000000000139.jpg --mask /home/ram/CS5824/images_and_masks/validation/food/000000000139_mask001.png --output /home/ram/CS5824/output/deepfill/validation/food/000000000139_mask001.png --checkpoint pretrained/states_tf_places2.pth
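Because the Deepfill v2 command takes one image and one mask at a time, each mask has to be paired with its source image. The sketch below assumes the naming seen in the example above (`<id>.jpg` for images, `<id>_maskNNN.png` for masks) and builds one command per mask. `test_script` is a hypothetical placeholder for the Deepfill v2 inference script, and the helper names are illustrative only.

```python
from pathlib import PurePosixPath


def pair_images_with_masks(filenames):
    """Return (image, mask) filename pairs, assuming <id>.jpg / <id>_maskNNN.png naming."""
    names = set(filenames)
    pairs = []
    for name in sorted(names):
        stem, sep, _ = name.partition("_mask")
        # Keep only mask files whose source image is also present.
        if sep and name.endswith(".png") and f"{stem}.jpg" in names:
            pairs.append((f"{stem}.jpg", name))
    return pairs


def deepfill_command(test_script, input_dir, output_dir, image, mask,
                     checkpoint="pretrained/states_tf_places2.pth"):
    """Build the per-image command; the output file reuses the mask's filename."""
    input_dir, output_dir = PurePosixPath(input_dir), PurePosixPath(output_dir)
    return [
        "python", str(test_script),
        "--image", str(input_dir / image),
        "--mask", str(input_dir / mask),
        "--output", str(output_dir / mask),
        "--checkpoint", checkpoint,
    ]


if __name__ == "__main__":
    files = ["000000000139.jpg", "000000000139_mask001.png"]
    for image, mask in pair_images_with_masks(files):
        # "TEST_SCRIPT" is a hypothetical stand-in for the real script path.
        print(" ".join(deepfill_command(
            "TEST_SCRIPT",
            "/home/ram/CS5824/images_and_masks/validation/food",
            "/home/ram/CS5824/output/deepfill/validation/food",
            image, mask,
        )))
```

In practice the filename list could come from `os.listdir()` on a category directory.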
- Go to the cloned LaMa repository
- Change
in the config file given here
- Run the evaluation script given here for the inputs and outputs to get the LPIPS, FID, and SSIM metrics. Example:
python bin/ configs/eval2_gpu.yaml /home/ram/CS5824/images_and_masks/validation/food /home/ram/CS5824/output/lama/validation/food /home/ram/CS5824/output/lama/food.csv
- Repeat the evaluation with the same LaMa evaluation script for the Deepfill v2 outputs.
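Evaluating both models over every category follows the same pattern, so the commands can be generated in a loop. This is a rough sketch under stated assumptions: the argument order (config, input directory, output directory, CSV path) is taken from the LaMa example above, `eval_script` is a hypothetical placeholder for the evaluation script under `bin/`, and the CSV location for Deepfill v2 is assumed by analogy with the LaMa one.

```python
from pathlib import PurePosixPath


def evaluation_commands(eval_script, data_root, output_root, categories,
                        models=("lama", "deepfill"),
                        split="validation",
                        config="configs/eval2_gpu.yaml"):
    """Build one evaluation command per (model, category) pair."""
    commands = []
    for model in models:
        for category in categories:
            originals = PurePosixPath(data_root) / split / category
            predictions = PurePosixPath(output_root) / model / split / category
            # Assumed layout: one CSV per category next to the model's outputs.
            csv_path = PurePosixPath(output_root) / model / f"{category}.csv"
            commands.append([
                "python", str(eval_script), config,
                str(originals), str(predictions), str(csv_path),
            ])
    return commands


if __name__ == "__main__":
    # "EVAL_SCRIPT" is a hypothetical stand-in for the real script path.
    for cmd in evaluation_commands(
        "EVAL_SCRIPT",
        "/home/ram/CS5824/images_and_masks",
        "/home/ram/CS5824/output",
        ["food"],
    ):
        print(" ".join(cmd))
```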
- The results of the evaluation done for the report can be found here
- Run the results_visualization Jupyter notebook to visualize the results of the inpainting models.
- Please note that the metrics may vary slightly from those in the report because of differences in GPU configuration and because the report used a downsized version of the dataset (1500 images per category) for the evaluation.