I recently installed an NVIDIA Tesla P4 GPU in the machine I run CodeProject.AI Server (CPAI) on. After a fresh install, CPAI uses the GPU for a few minutes, then crashes and stops using the GPU, with the following messages in the log:
18:58:50:Object Detection (YOLOv5 6.2): [RuntimeError] : Traceback (most recent call last):
  File "C:\Program Files\CodeProject\AI\modules\ObjectDetectionYOLOv5-6.2\detect.py", line 140, in do_detection
    det = detector(img, size=640)
  File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\nn\modules\module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\models\common.py", line 669, in forward
    with dt[0]:
  File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\utils\general.py", line 158, in __enter__
    self.start = self.time()
  File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\yolov5\utils\general.py", line 167, in time
    torch.cuda.synchronize()
  File "C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv\lib\site-packages\torch\cuda\__init__.py", line 566, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
What I have tried:
I uninstalled CPAI, CUDA, and all NVIDIA drivers, then reinstalled CUDA 11.7 and NVIDIA driver 516.01. cuDNN 8.9.4 is also installed. After that I reinstalled CPAI, and the result is the same: it uses the GPU correctly for a few minutes, then stops with the error above.
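One way to narrow this down is to put load on the GPU outside CPAI and see whether it fails after a similar few minutes. The Python sketch below is only an illustration under assumptions: it assumes PyTorch is importable from the Python you run it with (for example the venv under C:\Program Files\CodeProject\AI\runtimes\bin\windows\python37\venv seen in the traceback), and the helper names describe_cuda_stack and soak are hypothetical. It sets CUDA_LAUNCH_BLOCKING=1 before torch is imported, as the error message suggests, and reports the CUDA and cuDNN versions PyTorch itself was built against. CUDA-enabled PyTorch wheels typically bundle their own CUDA runtime, so these can differ from the CUDA 11.7 toolkit installed system-wide. It then repeatedly runs matrix multiplications followed by torch.cuda.synchronize(), the call that raised in the log.

```python
import os

# CUDA_LAUNCH_BLOCKING must be set before torch initializes CUDA, so that
# kernel errors are reported at the call that actually caused them.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import time

try:
    import torch
except ImportError:  # report a missing PyTorch install instead of crashing
    torch = None


def describe_cuda_stack():
    """Return the versions PyTorch actually sees (hypothetical helper)."""
    if torch is None:
        return {"torch": None}
    info = {
        "torch": torch.__version__,
        "built_for_cuda": torch.version.cuda,       # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),    # None if cuDNN is unavailable
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


def soak(minutes=10.0, size=2048):
    """Run matmuls plus synchronize until time runs out (hypothetical helper).

    Requires a CUDA device. Raises RuntimeError if CUDA fails during the run;
    otherwise returns the number of completed iterations.
    """
    a = torch.randn(size, size, device="cuda")
    deadline = time.monotonic() + minutes * 60
    iterations = 0
    while time.monotonic() < deadline:
        _ = a @ a
        torch.cuda.synchronize()  # same call that raised in the CPAI traceback
        iterations += 1
    return iterations
```

With CPAI stopped, run print(describe_cuda_stack()) and, if cuda_available is True, print(soak(minutes=10)). If soak also raises a CUDA error after a few minutes, the problem is probably below CPAI (driver, power, or cooling) rather than in the YOLOv5 module.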