Question related to #106 (Weight Initialization) #302

Closed
wonjae-ji opened this issue Nov 5, 2021 · 5 comments

Labels
question Further information is requested
Comments

@wonjae-ji

wonjae-ji commented Nov 5, 2021

Hi,

I have questions related to #106.
It was answered there that the weights in a TransferCompound can be initialized with "set_hidden_parameters", yet I could not change the weight values of each array inside the unit cell.

Here is the code I used:

1) Device used

import torch

from aihwkit.simulator.tiles import AnalogTile
from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, LinearStepDevice

device = TransferCompound(
    # Devices that compose the Tiki-taka compound.
    unit_cell_devices=[
        LinearStepDevice(),
        LinearStepDevice()
    ],
    # Make some adjustments to the way Tiki-taka is performed.
    units_in_mbatch=True,    # batch_size=1 anyway
    transfer_every=1,        # do a transfer-read every batch
    n_cols_per_transfer=1,   # one forward read for each transfer
    gamma=1,
    scale_transfer_lr=False, # in relative terms to SGD LR
    transfer_lr=0.01,        # same transfer LR as for SGD
)
rpu = UnitCellRPUConfig(device=device)

2) setting up hidden weights - gamma = 1

analog_tile = AnalogTile(out_size=1, in_size=1, rpu_config=rpu)

temp = analog_tile.get_hidden_parameters()
print("original value for weight1: ",temp["hidden_weights_0"])
print("original value for weight2: ",temp["hidden_weights_1"])
print("original value for max bound1: ", temp["max_bound_0"])
print("original value for dwmin_up2: ", temp["dwmin_up_1"])
print(" ")
temp["hidden_weights_0"] = torch.tensor(1.0).repeat((1,1)).transpose(0,1)
temp["hidden_weights_1"] = torch.tensor(1.0).repeat((1,1)).transpose(0,1)
temp["max_bound_0"] = torch.tensor(123.0).repeat((1,1)).transpose(0,1)
temp["dwmin_up_1"] = torch.tensor(987.0).repeat((1,1)).transpose(0,1)
print("setting for weights saved in 'temp' :",temp["hidden_weights_1"])
print("setting for max bound1 saved in 'temp' :",temp["max_bound_0"])
print("setting for dwmin_up2 saved in 'temp' :",temp["dwmin_up_1"])
print(" ")
print("'temp'")
print(temp)

analog_tile.set_hidden_parameters(temp)
print(" ")
print("initial setting done for weight1: ",analog_tile.get_hidden_parameters()["hidden_weights_0"])
print("initial setting done for weight2: ", analog_tile.get_hidden_parameters()["hidden_weights_1"])
print("initial setting done for max_bound1: ",analog_tile.get_hidden_parameters()["max_bound_0"])
print("initial setting done for dwmin_up2: ",analog_tile.get_hidden_parameters()["dwmin_up_1"])

The result of the code above is:

[screenshot of the printed output]

Parameters like max_bound_0 and dwmin_up_1 were updated as expected, yet the hidden weights were not.

Q1) Is my method of setting hidden weights wrong?

Q2) Would there be any other way of changing values of hidden weights of each array in TransferCompound?

Q3) In the reply, it is written that the effective weight w in TransferCompound can be written as w = gamma*A + (1-gamma)*C. I'm a little confused about the concept of gamma.

Shouldn't I get w = 0 if I set gamma = 1 in TransferCompound, since A (the first device) is initialized to zero and C (the second device) would be ignored? Yet I got non-zero values when I set gamma = 1, while A, C, and get_weights all become zero when I set gamma = 0.
Am I misunderstanding how the effective w is formed in TransferCompound?

cf) If I set gamma = 0 in TransferCompound:

[screenshot of the printed output with gamma = 0]

Thank you.

@maljoras
Collaborator

maljoras commented Nov 5, 2021

Hi @jiwonjae,
many thanks for the question! It is a little counter-intuitive, but in the TransferCompound with gamma=0 (fully hidden "fast" array A) the overall weight W of the compound is identical to the "slow" array (also sometimes called C). Because W = C always holds, the two are handled internally with the same pointer as a small memory optimization, so the C weights cannot be set through the hidden parameters (that entry is a nullptr). Instead, all you need to do is set/get the C weights with the usual set_weights/get_weights methods (since W is just C). This should indeed be the default way to set the weights; the hidden parameter setting is only meant for parameters that are truly hidden.
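A rough sketch of what this looks like (untested; it assumes gamma=0 and a small 1x1 tile as in your snippet, with an arbitrary weight value):

from torch import ones

from aihwkit.simulator.tiles import AnalogTile
from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, LinearStepDevice

rpu = UnitCellRPUConfig(device=TransferCompound(
    unit_cell_devices=[LinearStepDevice(), LinearStepDevice()],
    gamma=0.0))
analog_tile = AnalogTile(out_size=1, in_size=1, rpu_config=rpu)

# with gamma=0 the visible weight W is just the slow array C, so the usual
# weight setter/getter acts directly on C
analog_tile.set_weights(0.5 * ones(1, 1))
weights, _ = analog_tile.get_weights()
print(weights)  # reflects the value just set (W = C)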

As a side note, I am not sure what you are trying to do, but it is usually much more convenient to set parameters when defining the meta-parameters in the rpu_config. Note that the symmetry point is guaranteed to be zero for e.g. SoftBoundsDevice if up_down_dtod=0.0, and the device-to-device spread of all the parameters can conveniently be given there as well. So in most cases there is no need to modify hidden parameters manually.
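For example, a minimal (untested) sketch of doing this through the config; the parameter values are made up and my_device is just a helper name:

from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, SoftBoundsDevice

def my_device():
    # symmetry point fixed at zero via up_down_dtod=0.0; device-to-device
    # spreads of the bounds and update step are given by the *_dtod fields
    return SoftBoundsDevice(w_max=1.0, w_min=-1.0,
                            w_max_dtod=0.1, w_min_dtod=0.1,
                            dw_min=0.001, dw_min_dtod=0.3,
                            up_down=0.0, up_down_dtod=0.0)

rpu_config = UnitCellRPUConfig(
    device=TransferCompound(unit_cell_devices=[my_device(), my_device()]))
print(rpu_config)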

@maljoras
Collaborator

maljoras commented Nov 8, 2021

Hi @jiwonjae ,
regarding Q3, indeed the gamma setting was not described correctly in my earlier post #106. For two devices the weighting is set to gamma*A + C when specifying gamma (see here). Sorry about the confusion.
By the way, you can also set the weighting explicitly for A and C by setting the gamma_vec field to something like [0.1, 0.2] to get 0.1*A + 0.2*C (see here).
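For illustration, a minimal sketch of such a config (the weighting values and devices are arbitrary):

from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, SoftBoundsDevice

# explicit weighting: W = 0.1*A + 0.2*C
rpu_config = UnitCellRPUConfig(
    device=TransferCompound(
        unit_cell_devices=[
            SoftBoundsDevice(),  # fast "A" array
            SoftBoundsDevice(),  # slow "C" array
        ],
        gamma_vec=[0.1, 0.2],
    )
)
print(rpu_config)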

@maljoras
Collaborator

maljoras commented Nov 9, 2021

Hi @jiwonjae ,
The explicit setting of the hidden weight parameter in the hierarchical TransferCompound with ReferenceUnitCell was indeed not functioning correctly. It should be fixed now once #313 is merged.

@maljoras
Collaborator

Just adding to my above reply: each of the hidden parameters can be set individually as one would expect. For instance, one can set the decay (lifetime) of each of the arrays separately, as in the following example. Note that the decay_scales hidden parameter is defined as 1 - 1/lifetime.

from torch import ones, Tensor
from torch.nn.functional import mse_loss

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import (
    TransferCompound,
    ReferenceUnitCell,
    SoftBoundsDevice)
from aihwkit.optim import AnalogSGD

# The Tiki-taka learning rule can be implemented using the transfer device.
def SBdevice():
    """Custom device """
    return SoftBoundsDevice(w_max_dtod=0.0, w_min_dtod=0.0, w_max=1.0, w_min=-1.0, 
                            lifetime=100., lifetime_dtod=0.0)

# arbitrary
gamma = 0.2

rpu_config = UnitCellRPUConfig(
    device=TransferCompound(

        # Devices that compose the Tiki-taka compound.
        unit_cell_devices=[
            ReferenceUnitCell([SBdevice(), SBdevice()]),  # fast "A" matrix
            ReferenceUnitCell([SBdevice(), SBdevice()])   # slow "C" matrix
        ],
        gamma=gamma,
    )
)

# print the config
print(rpu_config)

# construct the model
in_features = 2
out_features = 1
model = AnalogLinear(in_features, out_features, bias=True, rpu_config=rpu_config)

# set the reference devices to some values
params = model.analog_tile.get_hidden_parameters()
shape = params['hidden_weights_0_0'].shape

# just dummy settings
a, b, c, d = 0.4, 0.2, 0.6, 0.1
params['hidden_weights_0_0'] = a*ones(*shape)  # A
params['hidden_weights_1_0'] = b*ones(*shape)  # A ref
params['hidden_weights_0_1'] = c*ones(*shape)  # C
params['hidden_weights_1_1'] = d*ones(*shape)  # C_ref


# explicitly set the decay scales (which is 1-1/lifetime)
a_dcy, b_dcy, c_dcy, d_dcy = 0.95, 0.78, 0.93, 0.92
params['decay_scales_0_0'] = a_dcy * ones(*shape)  # A
params['decay_scales_1_0'] = b_dcy * ones(*shape)  # A ref
params['decay_scales_0_1'] = c_dcy * ones(*shape)  # C
params['decay_scales_1_1'] = d_dcy * ones(*shape)  # C_ref

model.analog_tile.set_hidden_parameters(params)

# LR set to zero. Only lifetime will be applied
opt = AnalogSGD(model.parameters(), lr=0.0)

x_b = Tensor([[0.1, 0.2], [0.2, 0.4]])
y_b = Tensor([[0.3], [0.6]])

batches = 2
for _ in range(batches):
    opt.zero_grad()
    pred = model(x_b)
    loss = mse_loss(pred, y_b)
    loss.backward()
    opt.step()

weight, bias = model.get_weights()

#  values are decayed for each batch
a = a * pow(a_dcy, batches)
b = b * pow(b_dcy, batches)
c = c * pow(c_dcy, batches)
d = d * pow(d_dcy, batches)


# should be true for all weights
print(weight == gamma*(a - b) + c - d)

@wonjae-ji
Author

Thank you so much for the help!
