
Major update #512

Merged
merged 17 commits into IBM:master from version-update on Jun 27, 2023
Conversation

@maljoras (Collaborator) commented Jun 26, 2023

Summary

  • Major update from the development branch.
  • New training algorithms: Chopped Tiki-taka, AGAD
  • Major re-organization of AnalogTiles for increased modularity (TileWithPeriphery, SimulatorTile, SimulatorTileWrapper). Analog tile modules (which might hold multiple simulator tiles) are now also torch Modules and are used by the analog layers. New analog tiles can be customized.
  • Added CustomTile
  • Added TorchInferenceTile for a fully torch-based analog tile for inference (not using the C++ RPUCuda engine), supporting a subset of MVM nonidealities
  • New inference preset: StandardHWATrainingPreset
  • New inference noise model: ReRamWan2022NoiseModel (see also #394: Noise model fitted and characterized on RRAM devices, covering programming noise and output-referred read noise)
  • Improved HWA-training for inference featuring input and output range learning and more
  • Improved memory management (using torch cached GPU memory for internal RPUCuda buffer, significantly reducing the memory needed for running analog models)
  • Change in generators: analog_model.analog_tiles() now loops over all available tiles (in all modules)
  • Generator: analog_layers() loops over layer modules (except AnalogSequential) and replaces analog_modules()
  • New correlation detection example for comparing specialized analog SGD algorithms
  • Simplified build_rpu_config script for generating RPUConfigs for analog in-memory SGD
  • Import and file location changes. However, users can import all RPUConfig-related classes from aihwkit.simulator.configs
  • convert_to_digital utility
  • Change: convert_to_analog now also considers mapping. Set mapping.max_input_size = 0 and mapping.max_output_size = 0 to avoid this.
  • Added logical TileModuleArray for logical weight matrices larger than a single tile. Mapped layers now use this tile array
  • Change: Checkpoint structure is different. utils.legacy_load provides a way to load old checkpoints
  • realistic_read_write is removed from some high-level functions. Use program_weights (after setting the weights) or read_weights for realistic reading (using a weight estimation technique); see the sketch after this list.
  • New training preset: ReRamArrayOMPresetDevice, ReRamArrayHfO2PresetDevice, ChoppedTTv2*, AGAD*
  • Pulse counters for pulsed analog training
  • Dumping of all C++ fields for accurate analog training saving and training continuation after checkpoint load.
  • apply_write_noise_on_set for pulsed devices
  • Reset now also for simple devices
  • SoftBoundsReference, PowStepReference for explicit reference subtraction of symmetry point in Tiki-taka
  • Analog MVM with output-to-output std-deviation variability (output_noise_std)
  • per_batch_sample weight noise injections for TorchInferenceRPUConfig
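
For the realistic_read_write replacement mentioned above, here is a minimal sketch (the AnalogLinear/SingleRPUConfig setup and tensor shapes are illustrative, and it assumes read_weights returns a (weights, biases) tuple analogous to get_weights):

import torch

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import SingleRPUConfig

model = AnalogLinear(4, 2, rpu_config=SingleRPUConfig())
tile = next(model.analog_tiles())

target = torch.rand(2, 4)   # target weight matrix [out_size, in_size]
tile.set_weights(target)    # set the weights digitally (exact)
tile.program_weights()      # realistic write: SGD with pulsed updates
weights, biases = tile.read_weights()  # realistic read via weight estimation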

Details

  • make black for code formatting.
  • Switched to Python 3.10, torch >= 1.9, C++14, numpy >= 1.22

FIXED: read_weights could have applied the scales wrongly (when learning of the out scaling was used)

Now forward is called instead of analog_forward. To avoid confusion, analog_forward is renamed to joint_forward, since it calls pre_forward, tile.forward, and post_forward. It applies the mapping scales but not the learnable parts, such as the digital bias and the out-scaling alpha.
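
A minimal sketch of the distinction (the layer sizes and the InferenceRPUConfig choice are illustrative):

import torch

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig

model = AnalogLinear(4, 2, bias=True, rpu_config=InferenceRPUConfig())
x = torch.rand(3, 4)

y_full = model(x)  # full forward, including digital bias and learned out scaling

tile = next(model.analog_tiles())
# joint_forward chains pre_forward, tile.forward, and post_forward; it applies
# the mapping scales but not the digital bias or out-scaling alphas.
y_tile = tile.joint_forward(x)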

ADDED: Output noise per output column support

Here the forward.out_noise_std parameter is introduced, which enables a systematic output-to-output variation of the output noise std deviation. It is given in relative terms, e.g. 0.3 means 30% relative variation around out_noise.

The parameter values are drawn at instantiation; however, they can be modified afterwards, e.g.:

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import InferenceRPUConfig

rpu_config = InferenceRPUConfig()
rpu_config.forward.out_noise_std = 0.1  # 10% variation around out_noise
rpu_config.forward.out_noise = 0.1
model = AnalogLinear(4, 2, bias=True, rpu_config=rpu_config)

analog_tile = next(model.analog_tiles())  # get an individual tile
dic = analog_tile.get_forward_parameters()
dic['out_noise_values'][0] = 0.23  # e.g. set the output noise std-dev of the first output line
analog_tile.set_forward_parameters(dic)

CHANGED: Generators

  • analog_modules generator loops through all AnalogModuleBase instances including AnalogSequential and AnalogWrapper
  • analog_layers generator loops through all AnalogModuleBase instances excluding AnalogSequential and AnalogWrapper
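
A minimal sketch of the difference (the model itself is illustrative):

from aihwkit.nn import AnalogLinear, AnalogSequential
from aihwkit.simulator.configs import InferenceRPUConfig

model = AnalogSequential(
    AnalogLinear(4, 3, rpu_config=InferenceRPUConfig()),
    AnalogLinear(3, 2, rpu_config=InferenceRPUConfig()),
)

# analog_layers skips container modules such as AnalogSequential:
print([type(layer).__name__ for layer in model.analog_layers()])
# -> ['AnalogLinear', 'AnalogLinear']

# analog_tiles loops over all available tiles in all modules:
num_tiles = sum(1 for _ in model.analog_tiles())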

ADDED: Plot the error for a particular device

Each analog training tile has a program_weights method, which uses SGD and pulsed updates to program the weights according to the device properties. This can be used to create a weight-error plot that can be compared to inference results with the phenomenological noise_model.

Example:

import matplotlib.pyplot as plt

from aihwkit.utils.visualization import plot_programming_error

from aihwkit.simulator.presets import (
    MixedPrecisionEcRamMOPreset,
    MixedPrecisionReRamESPreset,
    StandardHWATrainingPreset,
)

figure = plt.figure(figsize=[10, 5])
ax = figure.add_subplot(1, 1, 1)

for preset, name in [(StandardHWATrainingPreset(), 'PCM model (1h)'),
                     (MixedPrecisionReRamESPreset(), 'ReRAM [GDP]'),
                     (MixedPrecisionEcRamMOPreset(), 'EcRAM MO [GDP]'),
                     ]:
    plot_programming_error(
        preset,
        w_range=(-0.8, 0.8),
        n_bins=51,
        t_inference=3600.,
        realistic_read=True,  # otherwise no drift compensation
        label=name
    )

plt.legend()
plt.show()

This would produce a plot of the programming error over the weight range for the three presets.

ADDED: presets for new training algorithms and convenient way to define new presets

Here a new function, build_config, is added to generate an RPUConfig for any analog training algorithm and device.

For example,

from aihwkit.simulator.configs import build_config
from aihwkit.simulator.presets import StandardIOParameters, ReRamSBPresetDevice

algorithm = 'agad'  # one of: 'sgd', 'tiki-taka', 'ttv2', 'mp', 'c-ttv2', 'agad'
rpu_config = build_config(algorithm, ReRamSBPresetDevice, StandardIOParameters)

would generate a valid configuration for the given training algorithm, which can be sgd, tiki-taka, ttv2, mp, c-ttv2, or agad.
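
The resulting rpu_config can then be used like any other RPUConfig, e.g. (a minimal sketch with illustrative layer sizes):

from aihwkit.nn import AnalogLinear

model = AnalogLinear(10, 5, rpu_config=rpu_config)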

ADDED: Torch inference tile

It was hard to make quick changes to the C++ backend, so we created a pure torch-based version. Note, however, that it supports only a subset of the nonidealities of the RPUCuda-based tile.

  • New torch analog tile (similar to the base tile) and a new torch simulator tile that is similar to the C++ tile.
  • Example showing how to use the new torch tile.
  • Tests comparing the functionality against the C++-based tile.
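
A minimal usage sketch via TorchInferenceRPUConfig (the import path follows the aihwkit.simulator.configs convention used above; the sizes are illustrative):

import torch

from aihwkit.nn import AnalogLinear
from aihwkit.simulator.configs import TorchInferenceRPUConfig

rpu_config = TorchInferenceRPUConfig()  # fully torch-based, no C++ RPUCuda engine
model = AnalogLinear(8, 4, bias=True, rpu_config=rpu_config)
y = model(torch.rand(2, 8))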

@maljoras maljoras requested review from kaoutar55 and kkvtran June 26, 2023 21:25
kkvtran previously approved these changes Jun 26, 2023
@maljoras maljoras requested a review from kkvtran June 27, 2023 12:03
@maljoras maljoras merged commit 79e4900 into IBM:master Jun 27, 2023
@maljoras maljoras deleted the version-update branch June 27, 2023 13:51