Question related to #106 (Weight Initialization) #302

Closed
wonjae-ji opened this issue Nov 5, 2021 · 5 comments

Labels
question Further information is requested
Comments

@wonjae-ji

wonjae-ji commented Nov 5, 2021

Hi,

I have questions related to #106.
It was answered there that the weights in a TransferCompound can be initialized with "set_hidden_parameters", yet I could not change the weight values of each array inside the unit cell.

Here is the code I used:

1) Device used

import torch

from aihwkit.simulator.tiles import AnalogTile
from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, LinearStepDevice

device = TransferCompound(
    # Devices that compose the Tiki-taka compound.
    unit_cell_devices=[
        LinearStepDevice(),
        LinearStepDevice()
    ],
    # Make some adjustments to the way Tiki-taka is performed.
    units_in_mbatch=True,    # batch_size=1 anyway
    transfer_every=1,        # do a transfer-read every batch
    n_cols_per_transfer=1,   # one forward read for each transfer
    gamma=1,
    scale_transfer_lr=False, # in relative terms to SGD LR
    transfer_lr=0.01,        # same transfer LR as for SGD
)
rpu = UnitCellRPUConfig(device=device)

2) setting up hidden weights - gamma = 1

analog_tile = AnalogTile(out_size=1, in_size=1, rpu_config=rpu)

temp = analog_tile.get_hidden_parameters()
print("original value for weight1: ",temp["hidden_weights_0"])
print("original value for weight2: ",temp["hidden_weights_1"])
print("original value for max bound1: ", temp["max_bound_0"])
print("original value for dwmin_up2: ", temp["dwmin_up_1"])
print(" ")
temp["hidden_weights_0"] = torch.tensor(1.0).repeat((1,1)).transpose(0,1)
temp["hidden_weights_1"] = torch.tensor(1.0).repeat((1,1)).transpose(0,1)
temp["max_bound_0"] = torch.tensor(123.0).repeat((1,1)).transpose(0,1)
temp["dwmin_up_1"] = torch.tensor(987.0).repeat((1,1)).transpose(0,1)
print("setting for weights saved in 'temp' :",temp["hidden_weights_1"])
print("setting for max bound1 saved in 'temp' :",temp["max_bound_0"])
print("setting for dwmin_up2 saved in 'temp' :",temp["dwmin_up_1"])
print(" ")
print("'temp'")
print(temp)

analog_tile.set_hidden_parameters(temp)
print(" ")
print("initial setting done for weight1: ",analog_tile.get_hidden_parameters()["hidden_weights_0"])
print("initial setting done for weight2: ", analog_tile.get_hidden_parameters()["hidden_weights_1"])
print("initial setting done for max_bound1: ",analog_tile.get_hidden_parameters()["max_bound_0"])
print("initial setting done for dwmin_up2: ",analog_tile.get_hidden_parameters()["dwmin_up_1"])

The result of the code above is:

[screenshot of the printed output]

Parameters like max_bound_0 and dwmin_up_1 were updated as expected, yet the hidden weights were not.

Q1) Is my method of setting hidden weights wrong?

Q2) Would there be any other way of changing values of hidden weights of each array in TransferCompound?

Q3) In the reply, it is written that the effective weight w in TransferCompound can be written as w = gamma*A + (1-gamma)*C. I'm a little confused about the concept of gamma.

Shouldn't I get w = 0 if I set gamma = 1 in TransferCompound, since A (the first device) is initialized to zero and C (the second device) would be ignored? Yet I got non-zero values when I set gamma = 1, while A, C, and get_weights all become zero when I set gamma = 0.
Am I misunderstanding how the effective w is formed in TransferCompound?

cf) If I set gamma = 0 in TransferCompound:

[screenshot of the printed output with gamma = 0]

Thank you.

@maljoras
Collaborator

maljoras commented Nov 5, 2021

Hi @jiwonjae,
many thanks for the question! It is a little counter-intuitive, but in the TransferCompound with gamma=0 (fully hidden "fast" array A) the overall weight W of the compound is identical to the "slow" array (also sometimes called C). Because W = C always holds, the two are handled internally with the same pointer as a small memory optimization, so the C weights cannot be set through the hidden parameters (that entry is a nullptr). Instead, all you need to do is set/get the C weights with the usual set_weights/get_weights methods (since W is just C). This should indeed be the default way to set the weights; the hidden parameter setting is only meant for parameters that are truly hidden.
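A rough sketch of what this looks like (untested; it assumes gamma=0 and a small 1x1 tile as in your snippet, with an arbitrary weight value):

from torch import ones

from aihwkit.simulator.tiles import AnalogTile
from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, LinearStepDevice

rpu = UnitCellRPUConfig(device=TransferCompound(
    unit_cell_devices=[LinearStepDevice(), LinearStepDevice()],
    gamma=0.0))
analog_tile = AnalogTile(out_size=1, in_size=1, rpu_config=rpu)

# with gamma=0 the visible weight W is just the slow array C, so the usual
# weight setter/getter acts directly on C
analog_tile.set_weights(0.5 * ones(1, 1))
weights, _ = analog_tile.get_weights()
print(weights)  # reflects the value just set (W = C)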

As a side note, I am not sure what you are trying to do, but it is usually much more convenient to set parameters when defining the meta-parameters in the rpu_config. Note that the symmetry point is guaranteed to be zero for e.g. SoftBoundsDevice if up_down_dtod=0.0, and the device-to-device spread of all the parameters can conveniently be given there as well. So in most cases there is no need to modify hidden parameters manually.
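For example, a minimal (untested) sketch of doing this through the config; the parameter values are made up and my_device is just a helper name:

from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, SoftBoundsDevice

def my_device():
    # symmetry point fixed at zero via up_down_dtod=0.0; device-to-device
    # spreads of the bounds and update step are given by the *_dtod fields
    return SoftBoundsDevice(w_max=1.0, w_min=-1.0,
                            w_max_dtod=0.1, w_min_dtod=0.1,
                            dw_min=0.001, dw_min_dtod=0.3,
                            up_down=0.0, up_down_dtod=0.0)

rpu_config = UnitCellRPUConfig(
    device=TransferCompound(unit_cell_devices=[my_device(), my_device()]))
print(rpu_config)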

@maljoras
Collaborator

maljoras commented Nov 8, 2021

Hi @jiwonjae ,
regarding Q3, indeed the gamma setting was not described correctly in my earlier post #106. For two devices the weighting is set to gamma*A + C when specifying gamma (see here). Sorry about the confusion.
By the way, you can also set the weighting explicitly for A and C by setting the gamma_vec field to something like [0.1, 0.2] to get 0.1*A + 0.2*C (see here).
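For illustration, a minimal sketch of such a config (the weighting values and devices are arbitrary):

from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import TransferCompound, SoftBoundsDevice

# explicit weighting: W = 0.1*A + 0.2*C
rpu_config = UnitCellRPUConfig(
    device=TransferCompound(
        unit_cell_devices=[
            SoftBoundsDevice(),  # fast "A" array
            SoftBoundsDevice(),  # slow "C" array
        ],
        gamma_vec=[0.1, 0.2],
    )
)
print(rpu_config)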

@maljoras
Collaborator

maljoras commented Nov 9, 2021

Hi @jiwonjae ,
The explicit setting of the hidden weight parameter in the hierarchical TransferCompound with ReferenceUnitCell was indeed not functioning correctly. It should be fixed now once #313 is merged.

@maljoras
Collaborator

Just adding to my above reply: each of the hidden parameters can be set individually as one would expect. For instance, one can set the decay (lifetime) of each of the arrays separately, as in the following example. Note that the decay_scales hidden parameter is defined as 1 - 1/lifetime.

from torch import ones, Tensor
from torch.nn.functional import mse_loss

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import UnitCellRPUConfig
from aihwkit.simulator.configs.devices import (
    TransferCompound,
    ReferenceUnitCell,
    SoftBoundsDevice)
from aihwkit.optim import AnalogSGD

# The Tiki-taka learning rule can be implemented using the transfer device.
def SBdevice():
    """Custom device """
    return SoftBoundsDevice(w_max_dtod=0.0, w_min_dtod=0.0, w_max=1.0, w_min=-1.0, 
                            lifetime=100., lifetime_dtod=0.0)

# arbitrary
gamma = 0.2

rpu_config = UnitCellRPUConfig(
    device=TransferCompound(

        # Devices that compose the Tiki-taka compound.
        unit_cell_devices=[
            ReferenceUnitCell([SBdevice(), SBdevice()]),  # fast "A" matrix
            ReferenceUnitCell([SBdevice(), SBdevice()])   # slow "C" matrix
        ],
        gamma=gamma,
    )
)

# print the config
print(rpu_config)

# construct the model
in_features = 2
out_features = 1
model = AnalogLinear(in_features, out_features, bias=True, rpu_config=rpu_config)

# set the reference devices to some values
params = model.analog_tile.get_hidden_parameters()
shape = params['hidden_weights_0_0'].shape

# just dummy settings
a, b, c, d = 0.4, 0.2, 0.6, 0.1
params['hidden_weights_0_0'] = a*ones(*shape)  # A
params['hidden_weights_1_0'] = b*ones(*shape)  # A ref
params['hidden_weights_0_1'] = c*ones(*shape)  # C
params['hidden_weights_1_1'] = d*ones(*shape)  # C_ref


# explicitly set the decay scales (which is 1-1/lifetime)
a_dcy, b_dcy, c_dcy, d_dcy = 0.95, 0.78, 0.93, 0.92
params['decay_scales_0_0'] = a_dcy * ones(*shape)  # A
params['decay_scales_1_0'] = b_dcy * ones(*shape)  # A ref
params['decay_scales_0_1'] = c_dcy * ones(*shape)  # C
params['decay_scales_1_1'] = d_dcy * ones(*shape)  # C_ref

model.analog_tile.set_hidden_parameters(params)

# LR set to zero. Only lifetime will be applied
opt = AnalogSGD(model.parameters(), lr=0.0)

x_b = Tensor([[0.1, 0.2], [0.2, 0.4]])
y_b = Tensor([[0.3], [0.6]])

batches = 2
for _ in range(batches):
    opt.zero_grad()
    pred = model(x_b)
    loss = mse_loss(pred, y_b)
    loss.backward()
    opt.step()

weight, bias = model.get_weights()

#  values are decayed for each batch
a = a * pow(a_dcy, batches)
b = b * pow(b_dcy, batches)
c = c * pow(c_dcy, batches)
d = d * pow(d_dcy, batches)


# should be true for all weights
print(weight == gamma*(a - b) + c - d)

@wonjae-ji
Author

Thank you so much for the help!
