The issue of reshape in unpatchify #132

Open
prsigma opened this issue Mar 15, 2025 · 4 comments
Labels
automatic-stale question Further information is requested

Comments

@prsigma

prsigma commented Mar 15, 2025

Thanks for your excellent work. In your models/latte.py, there is the following code:

def unpatchify(self, x):
    """
    x: (N, T, patch_size**2 * C)
    imgs: (N, C, H, W)
    """
    c = self.out_channels
    p = self.x_embedder.patch_size[0]
    h = w = int(x.shape[1] ** 0.5)
    assert h * w == x.shape[1]

    # Read each token as a (p, p, c) patch, then move channels to the front.
    x = x.reshape(shape=(x.shape[0], h, w, p, p, c))
    x = torch.einsum('nhwpqc->nchpwq', x)
    imgs = x.reshape(shape=(x.shape[0], c, h * p, w * p))
    return imgs

For the input x, why can't it be reshaped to (x.shape[0], h, w, c, p, p)? Is there any particular reason for this specific reshaping order?

maxin-cn added the question label Mar 15, 2025
@maxin-cn
Collaborator

This order depends on whether the order of channels (C) and spatial dimensions (p, p) is consistent with the original patchify process. If you change it, the model may no longer be able to generate a video; you can simply try it.
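To make the consistency point concrete, here is a minimal sketch, not code from the Latte repository. The `patchify` helper is hypothetical and simply assumes each token is flattened in (p, p, C) order, and `unpatchify` is a standalone version of the method above with an extra `order` argument added for the comparison. Only the matching order recovers the original tensor.

```python
import torch

def patchify(imgs, p):
    """Hypothetical inverse: (N, C, H, W) -> (N, T, p*p*C), tokens laid out as (p, p, C)."""
    n, c, H, W = imgs.shape
    h, w = H // p, W // p
    x = imgs.reshape(n, c, h, p, w, p)
    x = torch.einsum('nchpwq->nhwpqc', x)
    return x.reshape(n, h * w, p * p * c)

def unpatchify(x, p, c, order="ppc"):
    """Standalone unpatchify; `order` picks how each token is interpreted."""
    n, t, _ = x.shape
    h = w = int(t ** 0.5)
    assert h * w == t
    if order == "ppc":
        # Layout used in the issue: (p, p, c)
        x = x.reshape(n, h, w, p, p, c)
        x = torch.einsum('nhwpqc->nchpwq', x)
    else:
        # Alternative asked about in the issue: (c, p, p)
        x = x.reshape(n, h, w, c, p, p)
        x = torch.einsum('nhwcpq->nchpwq', x)
    return x.reshape(n, c, h * p, w * p)

# Distinct values so any misplaced element is detected.
imgs = torch.arange(2 * 4 * 8 * 8, dtype=torch.float32).reshape(2, 4, 8, 8)
tokens = patchify(imgs, p=2)

same_order = unpatchify(tokens, p=2, c=4, order="ppc")
swapped_order = unpatchify(tokens, p=2, c=4, order="cpp")

print(torch.equal(same_order, imgs))     # matching layout round-trips
print(torch.equal(swapped_order, imgs))  # mismatched layout scrambles values
```

Both variants return a tensor of the right shape, so a mismatched order would not raise an error. It would just place values at the wrong pixels and channels, which is presumably why pretrained weights stop producing sensible output after the change.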

@prsigma
Author

prsigma commented Mar 15, 2025

> This order depends on whether the order of channels (C) and spatial dimensions (p, p) is consistent with the original patchify process. If you change it, the model may no longer be able to generate a video; you can simply try it.

I noticed that your patchify operation uses a patch embedding similar to that of Vision Transformers (ViT), with the specific structure as follows:

(x_embedder): PatchEmbed(
    (proj): Conv2d(4, 1152, kernel_size=(2, 2), stride=(2, 2))
    (norm): Identity()
)

How do you determine the order of (p, p) and C in the patchify process?
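One way to look at the input side of this question is sketched below, under stated assumptions: the embedding width is shrunk from 1152 to 8 and the input size is arbitrary, purely for the demo. A `Conv2d` with `kernel_size == stride == p` gives the same result as cutting out each patch, flattening it in (C, p, p) order, and applying a linear layer built from the reshaped conv weight.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
p, c_in, dim = 2, 4, 8  # dim reduced from 1152 for illustration only
proj = nn.Conv2d(c_in, dim, kernel_size=p, stride=p)

x = torch.randn(1, c_in, 6, 6)

with torch.no_grad():
    # ViT-style patch embedding: conv, then flatten the spatial grid into tokens.
    conv_tokens = proj(x).flatten(2).transpose(1, 2)  # (N, T, dim)

    # Manual equivalent: cut patches, flatten each one in (c, p, q) order.
    n, c, H, W = x.shape
    h, w = H // p, W // p
    patches = x.reshape(n, c, h, p, w, p)
    patches = torch.einsum('nchpwq->nhwcpq', patches).reshape(n, h * w, c * p * p)

    # The conv weight (dim, C, p, p) reshaped to (dim, C*p*p) acts as a linear layer.
    linear_tokens = patches @ proj.weight.reshape(dim, -1).T + proj.bias

print(torch.allclose(conv_tokens, linear_tokens, atol=1e-5))
```

So on the input side the (C, p, p) order comes from the `Conv2d` weight layout. As far as I understand, the (p, p, C) order in `unpatchify` is a separate convention: it only defines how the `p * p * C` values emitted per token by the final layer are read, and the pretrained weights were learned under that convention.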

@maxin-cn
Collaborator

Please see here.


Hi There! 👋

This issue has been marked as stale due to inactivity for 7 days.

We would like to inquire if you still have the same problem or if it has been resolved.

If you need further assistance, please feel free to respond to this comment within the next 7 days. Otherwise, the issue will be automatically closed.

We appreciate your understanding and would like to express our gratitude for your contribution to Latte. Thank you for your support. 🙏

2 participants