[Discussion] Section 3 (of the paper) should have pseudocode #1

Open
dinhanhx opened this issue May 30, 2024 · 3 comments

@dinhanhx

Discussion

I know the paper is under review and will likely be modified. However, I think some pseudocode would be a good addition. A few dense paragraphs on their own are a bit hard to follow. Pseudocode would also help other people implement this technique in their own models.

@dinhanhx changed the title from "[Discussion] Section 3 (of the paper) should pseudocode" to "[Discussion] Section 3 (of the paper) should have pseudocode" on May 30, 2024
@mu-cai
Owner

mu-cai commented Jun 3, 2024

Thanks for the suggestion!
The core of M3 is implemented here: https://github.com/mu-cai/matryoshka-mm/blob/main/llava/model/llava_arch.py#L147

Let me know if you have further questions!

@dinhanhx
Author

dinhanhx commented Jun 3, 2024

import numpy as np
import torch.nn.functional as F

def matryoshka_vis_token_process(self, image_features, matryoshka_vis_token_scale):
    # image_features: (N, H*W, C), with the tokens forming a square H x W grid
    N, H_W, C = image_features.shape
    H = W = int(H_W ** 0.5)
    # Restore the 2D grid and move channels first: (N, C, H, W)
    reshaped_tensor = image_features.view(N, H, W, C)
    reshaped_tensor = reshaped_tensor.permute(0, 3, 1, 2)
    # Non-overlapping average pooling; the window side is sqrt(H*W / scale)
    pool_size = stride = int(np.sqrt(H_W / matryoshka_vis_token_scale))
    pooled_tensor = F.avg_pool2d(reshaped_tensor, kernel_size=pool_size, stride=stride)
    # Back to channels last, then flatten the grid into a token sequence
    image_features = pooled_tensor.permute(0, 2, 3, 1)
    image_features = image_features.reshape(N, -1, C)
    return image_features

What is the value range of matryoshka_vis_token_scale? From 1 to infinity? Or 0.0 to 1.0?

@mu-cai
Owner

mu-cai commented Jun 4, 2024

Hi, the range is shown here: https://github.com/mu-cai/matryoshka-mm/blob/main/scripts/v1_5/finetune.sh#L36
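
For readers who do not want to open the script: going only by the function quoted above, matryoshka_vis_token_scale seems to act as the target number of visual tokens after pooling, not as a ratio between 0.0 and 1.0. The sketch below reproduces just the arithmetic of that function in plain Python. The helper name pooled_token_count and the 576-token (24 x 24) input are assumptions made for illustration, and the output-size rule is the usual one for average pooling without padding.

```python
import math


def pooled_token_count(num_tokens: int, scale: float) -> int:
    """Hypothetical helper: tokens left after the pooling step quoted above.

    Mirrors only the arithmetic of matryoshka_vis_token_process; no tensors involved.
    """
    side = int(num_tokens ** 0.5)                 # H = W in the quoted code
    pool = int(math.sqrt(num_tokens / scale))     # pool_size = stride
    if pool < 1 or pool > side:
        # avg_pool2d would fail here: the window is empty or larger than the grid
        raise ValueError(f"scale={scale} gives an unusable pool size {pool}")
    out_side = (side - pool) // pool + 1          # pooling output size, no padding
    return out_side * out_side


# Assumed 24 x 24 = 576 input tokens.
for scale in (1, 9, 36, 144, 576):
    print(scale, "->", pooled_token_count(576, scale))
# 1 -> 1, 9 -> 9, 36 -> 36, 144 -> 144, 576 -> 576

# A scale that does not give an integer window side is rounded, not matched:
print(pooled_token_count(576, 100))  # 144, because int(sqrt(5.76)) == 2
```

Under these assumptions the usable range runs from 1 up to the number of input tokens. Values below 1 or above that number produce a pooling window that is larger than the grid or has size 0, and a value only yields exactly that many tokens when the resulting window side is an integer that divides the grid side.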
