Hi! Thanks,
the format is dictated by the faster_rcnn implementation from PyTorch.
It expects torch tensors in a certain layout when you feed it batches of data. How your data gets transformed into that layout is up to you. So if you have the COCO format (https://cocodataset.org/#format-data), I'd start by modifying the dataset class (ObjectDetectionDataSet) so that it reads in the data (assuming JSON files here) and extracts the important information into the two variables the model needs: boxes and labels.
Then the rest of the code pretty much stays the same.
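To make the conversion a bit more concrete, here is a minimal sketch of how the boxes and labels could be pulled out of a COCO-style JSON. The function name coco_to_boxes_and_labels and the example values are made up for illustration, and this is not the actual code of ObjectDetectionDataSet. One thing to keep in mind: COCO stores a box as [x, y, width, height], while torchvision's Faster R-CNN expects [x1, y1, x2, y2], so the sketch converts between the two.

```python
import json
from collections import defaultdict


def coco_to_boxes_and_labels(coco):
    """Group COCO annotations per image id as {"boxes": [...], "labels": [...]}.

    `coco` is a dict parsed from a COCO-style JSON file (hypothetical helper,
    not part of the repo). COCO boxes are [x, y, width, height]; they are
    converted here to corner format [x1, y1, x2, y2].
    """
    per_image = defaultdict(lambda: {"boxes": [], "labels": []})
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        target = per_image[ann["image_id"]]
        target["boxes"].append([x, y, x + w, y + h])
        target["labels"].append(ann["category_id"])
    return dict(per_image)


# Tiny in-memory example instead of a real file (made-up values).
example = json.loads("""
{
  "images": [{"id": 1, "file_name": "img_1.png"}],
  "annotations": [
    {"image_id": 1, "category_id": 3, "bbox": [10, 20, 30, 40]},
    {"image_id": 1, "category_id": 5, "bbox": [0, 0, 5, 5]}
  ]
}
""")
targets = coco_to_boxes_and_labels(example)
```

Inside the dataset class you would then look up the entry for the current image and wrap the lists, e.g. torch.tensor(boxes, dtype=torch.float32) and torch.tensor(labels, dtype=torch.int64). Depending on your categories you may also need to remap category_id, since torchvision treats label 0 as background.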
If you’re struggling to get the transformation or the code to work, you can open an issue on the GitHub repo with a minimal example and we can work it out together.
best,
Johannes