Thanks for your excellent work. In your `models/latte.py`, there is the following code:
```python
def unpatchify(self, x):
    """
    x: (N, T, patch_size**2 * C)
    imgs: (N, H, W, C)
    """
    c = self.out_channels
    p = self.x_embedder.patch_size[0]
    h = w = int(x.shape[1] ** 0.5)
    assert h * w == x.shape[1]

    x = x.reshape(shape=(x.shape[0], h, w, p, p, c))
    x = torch.einsum('nhwpqc->nchpwq', x)
    imgs = x.reshape(shape=(x.shape[0], c, h * p, h * p))
    return imgs
```
For the input `x`, why can't it be reshaped to `(x.shape[0], h, w, c, p, p)`? Is there any particular reason for this specific reshaping order?
The order has to match the way the channel (C) and spatial (p, p) dimensions were laid out by the original patchify process. If you change the reshape order without keeping everything else consistent, the model may no longer generate valid videos; you can simply try it and see.
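To make the layout issue concrete, here is a small self-contained check with a toy patchify that flattens each patch in (p, q, c) order, which is the layout the `unpatchify` above assumes. The shapes are made up for the example; in the real model the tokens come from the final linear layer, so the layout is whatever convention the network was trained with:

```python
import torch

# Toy sizes for illustration only (not taken from Latte's configs).
N, C, H, W, p = 2, 4, 16, 16, 2
h, w = H // p, W // p

imgs = torch.randn(N, C, H, W)

# Toy patchify: split the image into non-overlapping p x p patches and
# flatten each patch in (p, q, c) order, mirroring what unpatchify expects.
x = imgs.reshape(N, C, h, p, w, p)
x = torch.einsum('nchpwq->nhwpqc', x)
tokens = x.reshape(N, h * w, p * p * C)

# Original unpatchify order: reshape to (..., p, p, c) matches the memory
# layout written above, so the image is recovered exactly.
y = tokens.reshape(N, h, w, p, p, C)
y = torch.einsum('nhwpqc->nchpwq', y)
rec = y.reshape(N, C, h * p, w * p)
print(torch.allclose(rec, imgs))  # True

# Reshaping to (..., c, p, p) reinterprets the same flat vector in a
# different order, so even with an einsum adjusted to that labelling the
# values end up in the wrong positions.
z = tokens.reshape(N, h, w, C, p, p)
z = torch.einsum('nhwcpq->nchpwq', z)
wrong = z.reshape(N, C, h * p, w * p)
print(torch.allclose(wrong, imgs))  # False
```

In other words, `(h, w, p, p, c)` and `(h, w, c, p, p)` read the same flat per-token vector in different orders. Either convention could work in principle, but only if patchify, the final projection, and unpatchify all agree, and a pretrained checkpoint already fixes that convention.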
I noticed that your patchify operation uses a patch embedding similar to that of Vision Transformers (ViT), with the specific structure as follows:
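The exact snippet from `latte.py` is not reproduced here, but a ViT-style patch embedding usually follows the timm `PatchEmbed` pattern; below is a minimal sketch for reference, with layer and size choices that are illustrative rather than copied from `latte.py`:

```python
import torch.nn as nn

class PatchEmbed(nn.Module):
    """Minimal ViT-style patch embedding sketch: a Conv2d with
    kernel_size == stride == patch_size, followed by flattening the
    spatial grid into a sequence of tokens. Sizes are illustrative."""
    def __init__(self, img_size=32, patch_size=2, in_chans=4, embed_dim=1152):
        super().__init__()
        self.patch_size = (patch_size, patch_size)
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                 # x: (N, C, H, W)
        x = self.proj(x)                  # (N, embed_dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)  # (N, num_patches, embed_dim)
        return x
```

Because the kernel size equals the stride, each output position of the Conv2d corresponds to exactly one non-overlapping patch, which is what lets the later `unpatchify` be a pure reshape-and-permute.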