In this paper, we propose DeepRec, a novel LLM-based recommender that enables autonomous multi-turn interaction between LLMs and traditional recommendation models (TRMs) for deep exploration of the item space. In each interaction turn, the LLM reasons over user preferences and collaborates with the TRM to retrieve candidate items. After the multi-turn interaction, the LLM ranks the aggregated candidates to produce the final recommendations. We optimize the model with reinforcement learning (RL) and introduce novel contributions in three key aspects: recommendation-model-based data rollout, recommendation-oriented hierarchical rewards, and a two-stage RL training strategy. For data rollout, we design a preference-aware TRM with which the LLM interacts to construct trajectory data. For reward design, we propose a hierarchical reward function that combines process-level and outcome-level rewards to optimize the interaction process and the recommendation quality, respectively. For RL training, our two-stage strategy first guides the LLM to learn effective interactions with the TRM, and then applies recommendation-oriented RL to enhance recommendation performance.
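As a rough illustration of the reward design described above, the sketch below combines a process-level term (rewarding valid, productive interaction turns) with an outcome-level term (rewarding the rank of the ground-truth item in the final list). All function names, fields, signals, and weights here are illustrative assumptions, not the implementation used in DeepRec.

```python
# Hedged sketch of a hierarchical reward: illustrative only, not DeepRec's actual code.

def process_reward(trajectory):
    """Process-level term (assumed signal): reward each interaction turn whose
    retrieval query is well-formed and actually returns candidates."""
    return sum(
        1.0 for turn in trajectory
        if turn.get("valid_query") and turn.get("candidates")
    )

def outcome_reward(ranked_items, target_item, k=10):
    """Outcome-level term (assumed signal): reciprocal rank of the ground-truth
    item if it appears in the top-k of the final ranking, else 0."""
    topk = ranked_items[:k]
    if target_item in topk:
        return 1.0 / (topk.index(target_item) + 1)
    return 0.0

def hierarchical_reward(trajectory, ranked_items, target_item, alpha=0.1):
    """Combine the two levels: the outcome term dominates, while the process term
    shapes how the LLM interacts with the TRM (alpha is an assumed weight)."""
    return outcome_reward(ranked_items, target_item) + alpha * process_reward(trajectory)
```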
pip install torch==2.5.1
pip install transformers==4.46.3
pip install vllm==0.6.5
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install deepspeed
pip install accelerate
pip install datasets
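After installation, a quick check like the one below (illustrative, assuming a CUDA-capable GPU; not part of the repository) confirms that the core packages import correctly:

```python
# Optional post-install sanity check.
import torch
import transformers
import vllm

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("vllm:", vllm.__version__)
```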
You can find all the datasets we used on Google Drive. Please download the file and unzip it to the data/ folder.
You can download the model parameters of the preference-aware TRM for both datasets here. Please download the file and unzip it to the server/ folder.
You can find all the run scripts in the scripts/ folder.
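# Recall Server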
bash scripts/recall.sh
# Reward Server
bash scripts/reward.sh 5001 cold
# Training
bash scripts/cold_train.sh
# Reward Server
bash scripts/reward.sh 5002 rec
# Training
# ckpt_dir_of_cold_start is the model checkpoint directory from the cold-start RL stage
bash scripts/rec_train.sh ckpt_dir_of_cold_start
# Generation
# start_idx and end_idx are the start and end indexes of the test data, respectively
# final_ckpt_dir is the model checkpoint directory after the two-stage RL training
bash scripts/eval_generate.sh gpu_id start_idx end_idx final_ckpt_dir
# Metric Calculation
# test_dir is the directory containing the test results generated by the model
python evaluation/metric_calc_rec.py --test_results_dir test_dir
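For reference, the evaluation reduces to standard top-K ranking metrics over the generated results. The sketch below shows how HR@K and NDCG@K could be computed; the file layout and field names (ranked_items, target_item) are assumptions for illustration and do not reflect the actual format expected by metric_calc_rec.py.

```python
# Hedged sketch of top-K metric computation; the result format is assumed.
import glob
import json
import math

def hr_ndcg_at_k(ranked_items, target, k=10):
    """Return (HR@K, NDCG@K) for one test case."""
    topk = ranked_items[:k]
    if target not in topk:
        return 0.0, 0.0
    rank = topk.index(target)  # 0-based position of the ground-truth item
    return 1.0, 1.0 / math.log2(rank + 2)

hits, ndcgs = [], []
for path in glob.glob("test_dir/*.json"):  # assumed: one JSON file per generation shard
    with open(path) as f:
        for record in json.load(f):
            hr, ndcg = hr_ndcg_at_k(record["ranked_items"], record["target_item"], k=10)
            hits.append(hr)
            ndcgs.append(ndcg)

if hits:
    print(f"HR@10: {sum(hits) / len(hits):.4f}  NDCG@10: {sum(ndcgs) / len(ndcgs):.4f}")
```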