Johannes Schmidt
1 min read · Mar 19, 2022


Hi Mahesh,

the UNet is a fully convolutional neural network. As a consequence, the network's weights consist only of the kernels (the 3x3 matrices that slide over the image or the feature maps). This means that you can train with varying input sizes, e.g. a batch of 512x512 images in one pass and a batch of 1024x1024 images in the next.
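To illustrate the point, here is a minimal sketch (not the UNet itself, just a tiny fully convolutional block in PyTorch): the same 3x3 kernels process inputs of different spatial sizes, and the output spatial size simply follows the input.

```python
import torch
import torch.nn as nn

# A tiny fully convolutional block: its learnable weights are only
# 3x3 kernels, so the same module accepts any spatial input size.
fcn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)

small = torch.randn(1, 3, 64, 64)    # stand-in for a 512x512 batch
large = torch.randn(1, 3, 128, 128)  # stand-in for a 1024x1024 batch

# Both passes use the exact same weights; only the output size differs.
print(tuple(fcn(small).shape))  # (1, 1, 64, 64)
print(tuple(fcn(large).shape))  # (1, 1, 128, 128)
```

A fully connected layer, by contrast, would fix the input size, which is exactly why fully convolutional architectures allow this flexibility.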

For inference, the input size also does not matter as long as the sizes are compatible (take a look at the compute_possible_shapes function and the accompanying text in Part 2).
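The idea behind that compatibility check can be sketched as follows (a hypothetical simplification, not the actual compute_possible_shapes code): with `depth` pooling steps that each halve the spatial size, the input must be divisible by 2**depth so the encoder and decoder shapes line up again.

```python
# Hypothetical size check in the spirit of compute_possible_shapes:
# each of the `depth` max-pooling steps halves the spatial size, so a
# size is only "compatible" if it is divisible by 2**depth.
def is_compatible(size: int, depth: int) -> bool:
    return size % (2 ** depth) == 0

print(is_compatible(512, 4))   # True: 512 / 16 = 32, all halvings are exact
print(is_compatible(1000, 4))  # False: 1000 is not divisible by 16
```

Both 512x512 and 1024x1024 pass this check for a UNet with 4 pooling levels, which is why the switch between training and inference sizes works.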

I've created an example that shows such a scenario:

https://gist.github.com/johschmidt42/73f548a3b6eb3a6aa7c504f76dc51b25

For my thesis I was training on smaller image patches, but inference was on full-size images (more than 10x the size). The resolution was the same, but training was only possible with smaller patches due to hardware limitations (3D images).

Please don't forget that if you train on 512x512 images and run inference on 1024x1024 images, your output will be 1024x1024, not 512x512.

Let me know if that helps,

Johannes
