Question related to #106 (Weight Initialization) #302
Hi @jiwonjae, as a side note: I'm not sure what you are trying to do, but it would be much more convenient to set the parameters when defining the meta parameters in
Hi @jiwonjae,
Just adding to my above reply: Each of the hidden parameters can be set individually as one would expect. For instance, one can set the decay (lifetime) of each of the arrays separately as in the following example. Note that
```python
from torch import ones, full_like, allclose, Tensor
from torch.nn.functional import mse_loss
from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import (
    TransferCompound,
    ReferenceUnitCell,
    SoftBoundsDevice)
from aihwkit.optim import AnalogSGD

# The Tiki-taka learning rule can be implemented using the transfer device.
def SBdevice():
    """Custom soft-bounds device without device-to-device variation."""
    return SoftBoundsDevice(w_max_dtod=0.0, w_min_dtod=0.0, w_max=1.0, w_min=-1.0,
                            lifetime=100., lifetime_dtod=0.0)

# arbitrary
gamma = 0.2

rpu_config = UnitCellRPUConfig(
    device=TransferCompound(
        # Devices that compose the Tiki-taka compound.
        unit_cell_devices=[
            ReferenceUnitCell([SBdevice(), SBdevice()]),  # fast "A" matrix
            ReferenceUnitCell([SBdevice(), SBdevice()])   # slow "C" matrix
        ],
        gamma=gamma,
    )
)

# print the config
print(rpu_config)

# construct the model
in_features = 2
out_features = 1
model = AnalogLinear(in_features, out_features, bias=True, rpu_config=rpu_config)

# set the hidden weights (including the reference devices) to some values
params = model.analog_tile.get_hidden_parameters()
shape = params['hidden_weights_0_0'].shape

# just dummy settings
a, b, c, d = 0.4, 0.2, 0.6, 0.1
params['hidden_weights_0_0'] = a * ones(*shape)  # A
params['hidden_weights_1_0'] = b * ones(*shape)  # A ref
params['hidden_weights_0_1'] = c * ones(*shape)  # C
params['hidden_weights_1_1'] = d * ones(*shape)  # C ref

# explicitly set the decay scales (which are 1 - 1/lifetime)
a_dcy, b_dcy, c_dcy, d_dcy = 0.95, 0.78, 0.93, 0.92
params['decay_scales_0_0'] = a_dcy * ones(*shape)  # A
params['decay_scales_1_0'] = b_dcy * ones(*shape)  # A ref
params['decay_scales_0_1'] = c_dcy * ones(*shape)  # C
params['decay_scales_1_1'] = d_dcy * ones(*shape)  # C ref
model.analog_tile.set_hidden_parameters(params)

# LR set to zero: only the lifetime decay will be applied
opt = AnalogSGD(model.parameters(), lr=0.0)

x_b = Tensor([[0.1, 0.2], [0.2, 0.4]])
y_b = Tensor([[0.3], [0.6]])

batches = 2
for _ in range(batches):
    opt.zero_grad()
    pred = model(x_b)
    loss = mse_loss(pred, y_b)
    loss.backward()
    opt.step()

weight, bias = model.get_weights()

# values are decayed once per batch
a = a * pow(a_dcy, batches)
b = b * pow(b_dcy, batches)
c = c * pow(c_dcy, batches)
d = d * pow(d_dcy, batches)

# should be true for all weights (compare with a tolerance, not exact float equality)
expected = gamma * (a - b) + c - d
print(allclose(weight, full_like(weight, expected)))
```
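The expected value checked on the last line of the script can also be worked out by hand. The plain-Python sketch below (no aihwkit needed) assumes, as the script's comments do, that each hidden weight is multiplied by its decay scale once per mini-batch, that the decay scale is `1 - 1/lifetime`, and that the effective weight is `gamma*(A - A_ref) + (C - C_ref)`. The helper names are hypothetical, not aihwkit functions.

```python
def decay_scale_from_lifetime(lifetime: float) -> float:
    """Per-batch decay factor, assuming decay_scale = 1 - 1/lifetime."""
    return 1.0 - 1.0 / lifetime


def lifetime_from_decay_scale(decay_scale: float) -> float:
    """Inverse of the relation above (hypothetical helper)."""
    return 1.0 / (1.0 - decay_scale)


def expected_weight(gamma, values, decays, batches):
    """values and decays are (A, A_ref, C, C_ref); each value decays once per batch."""
    a, b, c, d = (v * s ** batches for v, s in zip(values, decays))
    return gamma * (a - b) + (c - d)


# Same dummy numbers as in the script above.
w = expected_weight(0.2, (0.4, 0.2, 0.6, 0.1), (0.95, 0.78, 0.93, 0.92), batches=2)
print(round(w, 6))
print(decay_scale_from_lifetime(100.0))  # lifetime=100. as used in SBdevice()
print(lifetime_from_decay_scale(0.95))   # lifetime implied by a_dcy = 0.95
```

Computing the expectation separately like this makes it easy to see which term (fast, slow, or reference array) is responsible when the comparison in the script fails.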
Thank you so much for the help!
Hi,

I have questions related to #106.

It was answered there that the weights in `TransferCompound` can be initialized with `set_hidden_parameters`, yet I couldn't change the weight values of each array inside the unit cell. Here is the code I used:

1) Device used

2) Setting up hidden weights (`gamma = 1`)

The result of the code above is:
Parameters like `max_bound_0` and `dwmin_up_1` were updated as expected, yet the hidden weights were not.

Q1) Is my method of setting hidden weights wrong?

Q2) Is there any other way of changing the values of the hidden weights of each array in `TransferCompound`?

Q3) In the reply, it is written that the effective weight `w` in `TransferCompound` can be written as `w = gamma*A + (1-gamma)*C`. I'm a little confused by the concept of `gamma`. Shouldn't I get `w = 0` if I set `gamma = 1` in `TransferCompound`, since `A` (the first device) is initialized to zero and `C` (the second device) is ignored? Yet I got nonzero values when I set `gamma = 1`, while `A`, `C`, and `get_weight` all become zero when I set `gamma = 0`. Am I getting the formula for the effective `w` in `TransferCompound` wrong?

cf) If I set `gamma = 0` in `TransferCompound`:

Thank you.