
RF results #63

Open
kdruart29 opened this issue Dec 8, 2023 · 2 comments

@kdruart29

Hello there,

I am currently experimenting with RandomForests from sklearn on a ZCU102 board. I first tried the classic HLS/Vivado/Vitis flow but was struggling with the results. I then tried pynq + the HLS accelerator, and my results are still weird.

So, for the example I am using the basic wine dataset from sklearn, with an RF (100 trees with a max depth of 100).
With sklearn I obtain these predictions (using clf.predict_proba), which are fine:
[0.97 0.03 0. ]
[0.93 0.05 0.02]
[0.06 0.12 0.82]
[0.91 0.08 0.01]
[0.07 0.85 0.08]

Then, with the model converted and compiled, I obtain this (using model.decision_function):
[ 8.59375000e-01 6.23525391e+01 2.60214844e+01]
[ 7.51953125e-01 -3.56474609e+01 2.61230469e+01]
[ 1.75781250e-01 8.43525391e+01 2.62246094e+01]
[ 7.03125000e-01 -8.66474609e+01 2.63261719e+01]
[ 2.83203125e-01 -9.96474609e+01 2.64277344e+01]
These results are strange and I don't understand them. What would explain them?

Finally, on the PL, here are the results from accelerator.decision_function(np.float32(X_test)):
[0.859375 0. 0. ]
[0.7519531 0. 0. ]
[0.17578125 0. 0. ]
[0.703125 0. 0. ]
[0.28320312 0. 0. ]
These correspond to the previous results given by the converted model (the first column matches).

For the conversion I followed the examples:
clf = RandomForestClassifier(n_estimators=100, max_depth=100)
clf.fit(X_train, y_train)

cfg = conifer.backends.xilinxhls.auto_config()
accelerator_config = {'Board' : 'zcu102',
'InterfaceType': 'float'}
cfg['AcceleratorConfig'] = accelerator_config
cfg['OutputDir'] = 'prj_{}'.format(int(datetime.datetime.now().timestamp()))

model = conifer.converters.convert_from_sklearn(clf, cfg)
model.compile()

y_hls = model.decision_function(X_test)
y_skl = clf.predict_proba(X_test)

model.build(bitfile=True, package=True)
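
For reference, here is how I would sanity-check the converted scores against sklearn (a sketch; scores_to_proba is my own helper, and it assumes the HLS output is a per-class sum over the trees, which an RF would average to get predict_proba):

```python
import numpy as np

# Assumption (not from conifer's docs): if the RF conversion were correct,
# the summed per-tree class scores divided by the number of trees should
# approximate sklearn's predict_proba, which averages per-tree probabilities.
def scores_to_proba(y_scores, n_estimators):
    proba = np.asarray(y_scores, dtype=float) / n_estimators
    # Renormalise each row to sum to 1, absorbing fixed-point rounding.
    return proba / proba.sum(axis=1, keepdims=True)

# Usage would be e.g.: np.allclose(scores_to_proba(y_hls, 100), y_skl, atol=0.05)
```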

What am I doing wrong ?
Thank you in advance

@thesps
Owner

thesps commented Jan 17, 2024

Hi, thanks for reaching out.

I think there are a few things going on, but it seems to me that the Random Forest conversion is not working correctly, at least for multi-class problems. I tried the same wine dataset and see similar nonsense results to yours, and I can see 'missing' trees (missing tree indices) in the converted model firmware under firmware/parameters.h. For a binary classification example the results looked more compatible between sklearn and the conifer HLS.

A smaller effect, but one that would eventually need to be taken into account for this dataset, is the data types. The defaults probably don't work well for the features in this case. In general this is dataset dependent, but for the wine example a better configuration might be:

# Create a conifer config
cfg = conifer.backends.xilinxhls.auto_config(granularity='full')
cfg['InputPrecision'] = 'ap_fixed<18,16>'
cfg['ThresholdPrecision'] = 'ap_fixed<18,16>'
cfg['ScorePrecision'] = 'ap_fixed<18,8,AP_RND_CONV,AP_SAT>'

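To pick the integer width for those ap_fixed types, one option is to derive it from the data range (a sketch; integer_bits_needed is a hypothetical helper, not part of conifer):

```python
import numpy as np

# Assumption: for ap_fixed<W, I>, I integer bits (including sign) must cover
# the largest magnitude among features/thresholds to avoid saturation.
def integer_bits_needed(values):
    max_abs = np.max(np.abs(values))
    # Bits to represent the magnitude, plus one sign bit.
    return int(np.ceil(np.log2(max_abs + 1))) + 1

# Wine features span very different scales (e.g. proline is in the hundreds),
# so the integer part must be sized for the largest feature.
X = np.array([[0.5, 120.0, 3.2], [1.1, 680.0, 2.4]])
print(integer_bits_needed(X))  # suggested I for InputPrecision
```

The fractional width (W minus I) then controls threshold resolution, which is why a tight score precision like ap_fixed<18,8,AP_RND_CONV,AP_SAT> can use rounding and saturation to stay safe.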
Besides your issue, it seems that you used the accelerator support and ran on a device. Since this is quite a new feature, I'm also looking for feedback on that part of the workflow. Was it easy enough to make the bitfile and run it on the board?

@kdruart29
Author

kdruart29 commented Feb 2, 2024

Hi!

Actually the conversion is working fine: the trees are correctly saved in the parameters.h file. The issue is with how RF and BDT are implemented in sklearn.
In sklearn, a BDT has one subtree per class in each estimator, whereas an RF uses a single tree per estimator, so BDT_rolled.cpp can't handle the other classes because it expects a subtree per class. I solved this by modifying the way the value field is converted and adapting the BDT header and cpp file to accept multiclass RFs. The issue is that it's now incompatible with BDTs, just RFs for my case. I plan on committing my code when it is fully compatible.
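
To illustrate the structural difference with sklearn alone (a minimal check; load_wine just provides a 3-class dataset):

```python
# A multiclass GradientBoosting ensemble holds one regression tree per class
# per boosting stage, while each RandomForest tree stores all class counts
# inside a single tree's value array, which is what the converter must handle.
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = load_wine(return_X_y=True)  # 3 classes

rf = RandomForestClassifier(n_estimators=2, max_depth=3, random_state=0).fit(X, y)
gbdt = GradientBoostingClassifier(n_estimators=2, max_depth=3, random_state=0).fit(X, y)

# RF: one tree per estimator; value has shape (n_nodes, 1, n_classes)
print(rf.estimators_[0].tree_.value.shape)
# BDT: estimators_ is an (n_stages, n_classes) array of single-output trees
print(gbdt.estimators_.shape)
```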

The accelerator workflow is surprisingly easy and works very well. The only difficulty was finding a compatible pynq image for my ZCU102.
