Commit cd6b0cea authored by Oded Gabbay

habanalabs/gaudi: increase default cs timeout to 10 minutes

To improve scalability and reduce host overhead, increase the default
TDR (timeout detection and recovery) timeout of Gaudi1 from 30 seconds
to 10 minutes.

This will allow DL frameworks (e.g. PyTorch, TensorFlow) to remove
the host sync they currently use and improve overall performance in
scale-out training.

Note that one can always set the timeout to a custom value via
a kernel module parameter given during driver load.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
parent 913bd417