Mmm, the requirement is unclear. What should be done to slices that have different partition 1 and partition 2 indices? So far I understand your requirement as:
From your response I'm guessing I can do the following. Is this correct?
a = torch.randn(8, 6, 4, 32)
m = EinMix("b n1 n2 c -> b n1 n2 c0", weight_shape="n1 n2 c c0", bias_shape="c0", c=32, c0=32, n1=6, n2=4)
assert m(a).shape == torch.Size([8, 6, 4, 32])
Yes, that looks close.
You may also want bias_shape='n1 n2 c0' so that the bias is individual for every partition.
On Sat, 11 Feb 2023, 13:58, Xujin Chris Liu wrote:
From your response I'm guessing I can do the following, is this correct?
a = torch.randn(8, 6, 4, 32)
m = EinMix("b n1 n2 c -> b n1 n2 c0", weight_shape="n1 n2 c c0", bias_shape="c0", c=32, c0=32, n1=6, n2=4)
assert m(a).shape == torch.Size([8, 6, 4, 32])
Hi, I'm working on an architecture that involves a lot of multi-head operations. For example, I would like to apply linear layers to a tensor of size (batch, partition 1, partition 2, hiddens). Here I want all slices that have the same index for partition 1 and partition 2 to be multiplied by the same weight matrix of size (hiddens, hiddens), and slices with different indices to get different weight matrices. What I do now is just initialize a bunch of weight and bias tensors and use torch.addmm. It works, but the code is very ugly.
I'd love to use EinMix here, but it looks like this functionality is not present, or at least not trivial to work out from the docs. Is this correct?
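For concreteness, the manual approach described above might look roughly like the sketch below. The variable names (weights, biases, out_loop) and the random initialization are hypothetical and chosen only for illustration. The last line shows the same computation written as a single einsum.

```python
import torch

torch.manual_seed(0)
batch, n1, n2, hiddens = 8, 6, 4, 32
x = torch.randn(batch, n1, n2, hiddens)

# Hypothetical parameters: one weight matrix and one bias vector
# per (partition 1, partition 2) pair.
weights = torch.randn(n1, n2, hiddens, hiddens)
biases = torch.randn(n1, n2, hiddens)

# Manual version: one torch.addmm call per pair of partition indices.
out_loop = torch.empty(batch, n1, n2, hiddens)
for i in range(n1):
    for j in range(n2):
        # addmm computes biases[i, j] + x[:, i, j] @ weights[i, j]
        out_loop[:, i, j] = torch.addmm(biases[i, j], x[:, i, j], weights[i, j])

# The same computation as one einsum over all partitions at once.
out_einsum = torch.einsum("bijc,ijcd->bijd", x, weights) + biases
```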