
6. RKLLM

6.1 Introduction to RKLLM

The RKLLM software stack can help users quickly deploy AI models on Rockchip chips. The overall framework is as follows:

(Figure: rknn-llm overall framework)

6.1.1 Introduction to RKLLM toolchain

6.1.1.1 Introduction to RKLLM-Toolkit

RKLLM-Toolkit is a development kit that lets users quantize and convert large language models on a PC. Through the Python interface it provides, the following functions can be completed easily:

  1. Model conversion: supports converting large language models (LLMs) in Hugging Face format into RKLLM models. Currently supported models include LLaMA, Qwen/Qwen2, Phi2, etc. The converted RKLLM model can be loaded and run on the Rockchip NPU platform.
  2. Quantization: supports quantizing floating-point models to fixed-point models. Currently supported quantization types include w4a16 (4-bit weights, 16-bit activations) and w8a8 (8-bit weights and activations).

6.1.1.2 Introduction to RKLLM Runtime Function

RKLLM Runtime is mainly responsible for loading the RKLLM model converted by RKLLM-Toolkit and performing inference on the Rockchip NPU by calling the NPU driver on RK3576/RK3588 boards. During inference, users can set the model's inference parameters, define different text-generation strategies, and continuously obtain the model's inference results through pre-defined callback functions.

6.1.2 Introduction to RKLLM Development Process

The overall RKLLM development flow is divided into two stages: model conversion, and board-side deployment and operation.

  1. Model conversion: in this stage, the large language model in Hugging Face format provided by the user is converted into the RKLLM format for efficient inference on the Rockchip NPU platform (a minimal code sketch follows this list). This stage includes:
  • a. Obtain the original model: obtain a large language model in Hugging Face format, or use a model trained by yourself, provided the saved model structure is consistent with the corresponding model structure on the Hugging Face platform.
  • b. Load the model: load the original model through the rkllm.load_huggingface() function.
  • c. Configure quantization and build: build the RKLLM model through the rkllm.build() function. During the build, you can choose whether to quantize the model to improve performance on the target hardware, and select the optimization level and quantization type.
  • d. Export the model: export the RKLLM model to a .rkllm file through the rkllm.export_rkllm() function for subsequent deployment.
  2. Board-side deployment and operation: this stage covers the actual deployment and execution of the model. It usually includes the following steps:
  • a. Model initialization: load the RKLLM model onto the Rockchip NPU platform, set the model parameters that define the desired text-generation behavior, and register in advance the callback function that receives real-time inference results.
  • b. Model inference: pass input data to the model and run inference. Users continuously obtain inference results through the pre-defined callback function.
  • c. Model release: after inference completes, release the model resources so that other tasks can use the NPU's computing resources.

These two stages constitute the complete RKLLM development process, ensuring that a large language model can be converted, debugged, and finally deployed efficiently on the Rockchip NPU.
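
As a reference for steps b-d of the conversion stage, below is a minimal Python sketch built around the three functions named above. The model path, the quantization settings, and the exact keyword-argument names are illustrative assumptions and may differ between toolkit versions; the authoritative usage is the test.py example shipped under rkllm-toolkit/examples/huggingface.

from rkllm.api import RKLLM

# Path to a Hugging Face-format model directory (illustrative)
modelpath = './Qwen-1_8B-Chat'

llm = RKLLM()

# b. Load the original Hugging Face model
ret = llm.load_huggingface(model=modelpath)
if ret != 0:
    raise RuntimeError('load_huggingface failed')

# c. Build the RKLLM model; quantization is configured here
#    (quantized_dtype can be 'w8a8' or 'w4a16', per section 6.1.1.1)
ret = llm.build(do_quantization=True, optimization_level=1, quantized_dtype='w8a8')
if ret != 0:
    raise RuntimeError('build failed')

# d. Export the converted model for board-side deployment
ret = llm.export_rkllm('./qwen.rkllm')
if ret != 0:
    raise RuntimeError('export_rkllm failed')

The exported qwen.rkllm file is what is later pushed to the board and passed to the runtime example (see section 6.2.2).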

6.1.3 Applicable hardware platforms

The hardware platforms covered by this document mainly include RK3576 and RK3588.

6.2 Development environment preparation

The released RKLLM toolchain archive includes the RKLLM-Toolkit whl installation package, the RKLLM Runtime library files, and reference example code. The folder structure is as follows:

doc
└── Rockchip_RKLLM_SDK_CN.pdf                  # RKLLM SDK description document
rkllm-runtime
├── example
│   ├── src
│   │   └── main.cpp
│   ├── build-android.sh
│   ├── build-linux.sh
│   ├── CMakeLists.txt
│   └── Readme.md
└── runtime
    ├── Android
    │   └── librkllm_api
    │       ├── arm64-v8a
    │       │   └── librkllmrt.so              # RKLLM Runtime library
    │       └── include
    │           └── rkllm.h                    # Runtime header file
    └── Linux
        └── librkllm_api
            ├── aarch64
            │   └── librkllmrt.so
            └── include
                └── rkllm.h
rkllm-toolkit
├── examples
│   └── huggingface
│       └── test.py
└── packages
    ├── md5sum.txt
    └── rkllm_toolkit-1.0.0-cp38-cp38-linux_x86_64.whl
rknpu-driver
└── rknpu_driver_0.9.6_20240322.tar.bz2

This chapter introduces the installation of RKLLM-Toolkit and RKLLM Runtime in detail. For specific usage, please refer to the instructions in Chapter 3.

6.2.1 RKLLM-Toolkit Installation

This section describes how to install RKLLM-Toolkit through pip. Users can follow the steps below to complete the installation of the RKLLM-Toolkit toolchain.

6.2.1.1 Installation through pip

Install miniforge3 tool

To avoid maintaining multiple conflicting Python environments on the system, it is recommended to use miniforge3 to manage Python environments. First check whether miniforge3 and conda are already installed; if they are, this step can be skipped.

conda -V
# "conda: command not found" means conda is not installed
# Otherwise the version is printed, for example: conda 23.9.0

Download miniforge3 installation package

wget -c https://mirrors.bfsu.edu.cn/github-release/conda-forge/miniforge/LatestRelease/Miniforge3-Linux-x86_64.sh

Install miniforge3

chmod 777 Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh

Create RKLLM-Toolkit Conda environment

Enter Conda base environment

source ~/miniforge3/bin/activate # miniforge3 is the installation directory
# (base) xxx@xxx-pc:~$

Create a Conda environment named RKLLM-Toolkit with Python 3.8 (the recommended version)

conda create -n RKLLM-Toolkit python=3.8

Enter the RKLLM-Toolkit Conda environment

conda activate RKLLM-Toolkit
# (RKLLM-Toolkit) xxx@xxx-pc:~$

6.2.1.2 Install RKLLM-Toolkit

In the RKLLM-Toolkit Conda environment, use pip to install the provided toolchain whl package directly. During installation, pip will automatically download the dependency packages required by RKLLM-Toolkit.

pip3 install rkllm_toolkit-1.0.0-cp38-cp38-linux_x86_64.whl

If the following commands execute without errors, the installation was successful.

python
from rkllm.api import RKLLM
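
Equivalently, the same check can be run non-interactively from the shell:

python -c "from rkllm.api import RKLLM"  # exits silently on success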

6.2.2 Use of RKLLM Runtime Library

The released RKLLM toolchain package contains all the files of RKLLM Runtime:

  • lib/librkllmrt.so: the RKLLM Runtime library, called on RK3576/RK3588 boards for RKLLM model deployment and inference;
  • include/rkllm.h: the header file corresponding to the librkllmrt.so library, containing the declarations of the relevant structures and functions.

When building board-side inference code with the RKLLM toolchain, pay attention to including the header file and linking the library above to ensure correct compilation. When the code actually runs on an RK3576/RK3588 board, make sure the library file has been pushed to the board as well, and make the library visible through the following settings:

ulimit -Sn 50000                # raise the soft limit on open file descriptors
export LD_LIBRARY_PATH=./lib    # directory on the board containing librkllmrt.so
./llm_demo qwen.rkllm           # run the example with the converted model

6.2.3 Compilation requirements for RKLLM Runtime

When using RKLLM Runtime, pay attention to the version of the gcc cross-compiler. It is recommended to use the cross-compilation toolchain gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu; the download path is: GCC_10.2 cross-compilation tool download address.

Please note that cross-compilation toolchains are generally backward compatible but not forward compatible, so do not use a version lower than 10.2.

If you target the Android platform, you need to build Android executables. It is recommended to cross-compile with the Android NDK (the r18b version is recommended); the download path is: Android_NDK cross-compilation tool download address.

For specific compilation methods, you can also refer to the build-linux.sh and build-android.sh scripts under rkllm-runtime/example in the toolchain package.

6.2.4 Chip kernel update

Since the kernel driver in the currently released public firmware does not support the RKLLM tool, the kernel needs to be updated. The rknpu driver package supports two major kernel versions: kernel-5.10 and kernel-6.1. For kernel-5.10, version 5.10.198 is recommended (repo: GitHub - rockchip-linux/kernel at develop-5.10); for kernel-6.1, version 6.1.57 is recommended. The exact version number can be confirmed in the Makefile in the kernel root directory.

The update steps are as follows:
  • a. Download the compressed package rknpu_driver_0.9.6_20240322.tar.bz2.
  • b. Unzip the package and use it to overwrite the rknpu driver code in the current kernel source tree.
  • c. Recompile the kernel.
  • d. Flash the newly compiled kernel to the device.

6.3 Usage environment

Development board: ArmSoM-Sige7 / ArmSoM-Sige5

Code repository: rknn-llm

6.4 Sample Purchase

ArmSoM Independent Site: https://www.armsom.org/product-page/sige7

ArmSoM AliExpress Official Store: https://www.aliexpress.com/store/1102800175

ArmSoM Taobao Official Store: https://item.taobao.com/item.htm?id=757023687970

OEM&ODM, please contact: sales@armsom.org