RUCAIBox/DeepRec
# DeepRec

## Overview

In this paper, we propose DeepRec, a novel LLM-based recommender that enables autonomous multi-turn interaction between LLMs and traditional recommendation models (TRMs) for deep item-space exploration. In each interaction turn, the LLM reasons over user preferences and collaborates with the TRM to retrieve candidate items. After the multi-turn interaction, the LLM ranks the aggregated candidates to generate the final recommendations. We optimize the model with reinforcement learning (RL) and introduce novel contributions in three key aspects: recommendation-model-based data rollout, recommendation-oriented hierarchical rewards, and a two-stage RL training strategy. For data rollout, we design a preference-aware TRM with which LLMs interact to construct trajectory data. For reward design, we propose a hierarchical reward function comprising both process-level and outcome-level rewards to optimize the interaction process and the recommendation quality, respectively. For RL training, our two-stage strategy first guides LLMs to learn effective interaction with TRMs, followed by recommendation-oriented RL for performance enhancement.
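The multi-turn interaction loop described above can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual code: the function names (`llm_reason`, `trm_retrieve`, `llm_rank`) and the way interaction history is carried are all hypothetical stand-ins.

```python
def deeprec_interaction(user_profile, llm_reason, trm_retrieve, llm_rank,
                        num_turns=3, top_k=10):
    """Sketch of DeepRec-style multi-turn LLM-TRM interaction.

    In each turn the LLM reasons over user preferences to produce a
    retrieval query, the TRM retrieves candidate items for that query,
    and the candidates are aggregated; finally the LLM ranks all
    aggregated candidates to produce the top-k recommendations.
    """
    candidates = []
    context = user_profile
    for _ in range(num_turns):
        query = llm_reason(context, candidates)          # LLM preference reasoning
        retrieved = trm_retrieve(query)                  # TRM candidate retrieval
        candidates.extend(i for i in retrieved if i not in candidates)
        context = (context, query)                       # carry interaction history (hypothetical)
    return llm_rank(user_profile, candidates)[:top_k]    # final LLM ranking
```

Any callables with matching signatures can be plugged in, which is also convenient for inspecting trajectories during RL data rollout.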

*(Figure: model)*

## Environment

```shell
pip install torch==2.5.1
pip install transformers==4.46.3
pip install vllm==0.6.5
pip install packaging
pip install ninja
pip install flash-attn --no-build-isolation
pip install deepspeed
pip install accelerate
pip install datasets
```
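The same pins can be collected in a single requirements file; note that `flash-attn` is usually installed separately because it needs `--no-build-isolation` (and `packaging`/`ninja` already present) to build:

```text
torch==2.5.1
transformers==4.46.3
vllm==0.6.5
packaging
ninja
deepspeed
accelerate
datasets
```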

## Datasets

You can find all the datasets we used on Google Drive. Please download the file and unzip it into the `data/` folder.

## Preference-Aware TRM

You can download the model parameters of the preference-aware TRM for both datasets here. Please download the file and unzip it into the `server/` folder.

## Quick Start

You can find all the run scripts in the `scripts/` folder.

### Retrieval Server

```shell
bash scripts/recall.sh
```

### Cold Start RL

```shell
# Reward Server
bash scripts/reward.sh 5001 cold

# Training
bash scripts/cold_train.sh
```

### Recommendation-Oriented RL

```shell
# Reward Server
bash scripts/reward.sh 5002 rec

# Training
# ckpt_dir_of_cold_start is the model checkpoint directory during cold start RL
bash scripts/rec_train.sh ckpt_dir_of_cold_start
```
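The hierarchical reward optimized in this stage combines process-level and outcome-level terms, as described in the overview. A minimal illustrative sketch is below; the specific reward terms and the weights `w_process`/`w_outcome` are assumptions for illustration, not the repository's actual implementation:

```python
def hierarchical_reward(process_scores, recall_at_k, ndcg_at_k,
                        w_process=0.2, w_outcome=0.8):
    """Combine process-level and outcome-level rewards (illustrative).

    process_scores : per-turn scores for the interaction process, e.g.
                     whether the LLM issued a well-formed TRM query.
    recall_at_k, ndcg_at_k : outcome-level recommendation quality.
    The weights and the averaging scheme are hypothetical.
    """
    # Average the per-turn process scores (0 if there were no turns).
    process_reward = sum(process_scores) / max(len(process_scores), 1)
    # Blend two standard top-k quality metrics into one outcome reward.
    outcome_reward = 0.5 * recall_at_k + 0.5 * ndcg_at_k
    return w_process * process_reward + w_outcome * outcome_reward
```

Keeping the process term small relative to the outcome term reflects the idea that interaction quality shapes behavior early on, while final recommendation quality dominates the optimization.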

## Evaluation

```shell
# Generation
# start_idx and end_idx are the starting and ending indexes of the test data, respectively
# final_ckpt_dir is the model checkpoint directory after two-stage RL
bash scripts/eval_generate.sh gpu_id start_idx end_idx final_ckpt_dir

# Metric Calculation
# test_dir is the directory of test results generated by the model
python evaluation/metric_calc_rec.py --test_results_dir test_dir
```
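The metric script computes top-K recommendation metrics over the generated rankings. As a reference, here is a minimal sketch of Recall@K and NDCG@K assuming a single ground-truth item per test case (common in next-item recommendation, but verify against `evaluation/metric_calc_rec.py` for the exact definitions used):

```python
import math

def recall_at_k(ranked_items, target, k):
    """1 if the ground-truth item appears in the top-k list, else 0."""
    return 1.0 if target in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, target, k):
    """DCG of the single relevant item at its rank, normalized by the
    ideal DCG (which is 1 when there is exactly one relevant item)."""
    if target in ranked_items[:k]:
        rank = ranked_items.index(target)      # 0-based position in the ranking
        return 1.0 / math.log2(rank + 2)       # 1/log2(rank+1) in 1-based terms
    return 0.0
```

Averaging these per-case scores over the whole test split yields the reported Recall@K and NDCG@K.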
