PREREQUISITES:
The first step is to clone the ascend-utils repository from GitHub. Open a new Terminal and run:
git clone https://github.com/br-ai-ns-institute/ascend-utils
Then, in VS Code, click the Open Folder button, select the ascend-utils folder, and click OK (entering your password if prompted).
Next, open the Terminal menu at the top of the window and select New Terminal. This opens a terminal on the Ubuntu machine, with the prompt username@ubuntu-brains:~/ascend-utils$.
In the terminal, run the command
npu-smi info
to see the status of the Ascend devices. Check the HBM-Usage(MB) column of the table: a value of 0 in a device's row means that device is currently idle, while any non-zero value means it is in use. This tells us which devices are available.
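If you prefer to check availability programmatically rather than by eye, the sketch below polls each device with npu-smi and reports the idle ones. It is a minimal sketch, not part of ascend-utils, and it assumes that your driver supports npu-smi info -t usages -i <id> and that the output contains a line such as "HBM Usage Rate(%) : 0"; field names can vary between driver versions, so treat it as a starting point.

import subprocess

def hbm_usage_rate(npu_id):
    """Return the HBM usage rate (%) of one NPU, or None if not found.

    Assumption: `npu-smi info -t usages -i <id>` prints a line like
    "HBM Usage Rate(%) : 0"; the field name may differ per driver version.
    """
    result = subprocess.run(
        ["npu-smi", "info", "-t", "usages", "-i", str(npu_id)],
        capture_output=True,
        text=True,
    )
    for line in result.stdout.splitlines():
        if "HBM Usage Rate" in line:
            return int(line.split(":")[1].strip())
    return None

# Treat a device with 0% HBM usage as free (assuming an 8-NPU node).
for i in range(8):
    if hbm_usage_rate(i) == 0:
        print(f"NPU {i} appears to be free")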
You can also use the command
npu-smi info watch -i 4 -c 0 -d 5 -s ptaicmb
which continuously prints monitoring data for all chips or for a single chip; here -i selects the NPU, -c the chip on that NPU, -d the refresh interval in seconds, and -s the set of attributes to monitor. More about this and the other Huawei npu-smi commands can be found at this link.
We need to use Docker in order to get all the necessary packages for our training run. The command
docker image ls
lists all of the available Docker images.
To use an image, we need to run a container based on it. The command
docker ps
prints the containers that are currently running.
If our container is not running already, start one with
docker run --name demo_seminar -d -it -e ASCEND_VISIBLE_DEVICES=1,4 --mount type=bind,source=/home,target=/home df8e2e867f54 /bin/bash
Replace df8e2e867f54 with the correct IMAGE ID, which you can read from the output of docker image ls above, and adjust the --mount source and target paths to your environment (a bind mount requires both).
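Because the container is started in the background (-d), it stays up after the command returns. Whenever you need a shell inside it, you can attach with the standard Docker command
docker exec -it demo_seminar /bin/bash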
After this, run the command
make python file=tutorials/tf_qs.py
The training executes, but on the CPU rather than the NPU, so we gain no benefit from the Ascend hardware yet.
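If you want to see where the operations are actually placed, TensorFlow 1.x can log the device chosen for every op. The following is a small standalone sketch using only the standard TF 1.x API (it is not part of the repository); with no NPU session configured, every op lands on the CPU:

import tensorflow as tf

# log_device_placement makes the runtime print the device assigned to
# each op; without an NPU-aware session config this will be the CPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0], name="a")
    b = tf.constant([3.0, 4.0], name="b")
    print(sess.run(a + b))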
So, we need to copy the helpers.py file from the GitHub sentinel repository, https://github.com/br-ai-ns-institute/sentinel, to our local device, into the tutorials folder of the cloned ascend-utils repository. In the VS Code explorer on the left, open the tutorials folder, click the New File button, and create a new Python file named helpers.py.
Copy in the code from the original helpers.py and save the file with Ctrl+S. Then we need to modify helpers.py so it runs on the NPU device. The modified code is shown here:
import os

import tensorflow as tf
from npu_bridge.npu_init import *
from tensorflow.core.protobuf.rewriter_config_pb2 import RewriterConfig
from tensorflow.python.keras import backend as K


class NpuHelperForTF:
    """Initialize NPU session for TF on Ascend platform."""

    def __init__(self, device_id, rank_id, rank_size, job_id, rank_table_file):
        # Init Ascend: the NPU runtime reads its configuration from
        # these environment variables.
        os.environ["ASCEND_DEVICE_ID"] = device_id
        os.environ["JOB_ID"] = job_id
        os.environ["RANK_ID"] = rank_id
        os.environ["RANK_SIZE"] = rank_size
        os.environ["RANK_TABLE_FILE"] = rank_table_file

        # Register the NpuOptimizer so the TF graph is rewritten to run
        # on the NPU instead of the CPU.
        sess_config = tf.ConfigProto()
        custom_op = sess_config.graph_options.rewrite_options.custom_optimizers.add()
        custom_op.name = "NpuOptimizer"
        custom_op.parameter_map["use_off_line"].b = True
        custom_op.parameter_map["precision_mode"].s = tf.compat.as_bytes("force_fp16")
        custom_op.parameter_map["graph_run_mode"].i = 0
        custom_op.parameter_map["dynamic_input"].b = True
        custom_op.parameter_map["dynamic_graph_execute_mode"].s = tf.compat.as_bytes("lazy_recompile")

        # These two TF graph rewrites are not supported on Ascend and
        # must be switched off.
        sess_config.graph_options.rewrite_options.remapping = RewriterConfig.OFF
        sess_config.graph_options.rewrite_options.memory_optimization = RewriterConfig.OFF

        # Create the session and register it as the Keras backend session.
        self._sess = tf.Session(config=sess_config)
        K.set_session(self._sess)

    def sess(self):
        return self._sess
Copy this into the helpers.py file and save it with Ctrl+S. The step is spelled out here so you can see exactly which modifications are necessary to run our model on the NPU.
You also need to modify the tf_qs.py file that you cloned. The modified code is shown here; copy it into tf_qs.py:
"""TF quickstart example running on NPU devices."""
import tensorflow as tf
from helpers import NpuHelperForTF
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.models.Sequential(
[
tf.keras.layers.Flatten(input_shape=(28, 28)),
tf.keras.layers.Dense(128, activation="relu"),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10),
]
)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
npu_config = {
"device_id": "0",
"rank_id": "0",
"rank_size": "0",
"job_id": "10385",
"rank_table_file": "",
}
print("________________________ INIT NPU SESSION ________________________")
sess = NpuHelperForTF(**npu_config).sess()
print("________________________ COMPILE MODEL ___________________________")
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
print("________________________ TRAINING ________________________________")
model.fit(x_train, y_train, epochs=5)
print("________________________ EVALUATE ________________________________")
model.evaluate(x_test, y_test, verbose=2)
print("________________________ CLOSE NPU SESSION _______________________")
sess.close()
Save the file with Ctrl+S.
Now, run the command again:
make python file=tutorials/tf_qs.py
This time the model runs on the NPU and prints out the results, with the CLOSE NPU SESSION banner marking that the process has finished. While training runs, you can re-run npu-smi info in another terminal and confirm that the HBM-Usage(MB) of the selected device is now non-zero.