Interaction-Weighted Resampling (IWR)

TL;DR

Observation

The CRL representation is smooth in locomotion tasks but piecewise nonlinear in manipulation tasks.

Analysis

With a Factorized MDP (FMDP) and Gaussian interpolation:

In locomotion, the representation dynamics can be approximated by a linear transformation.
In manipulation, the representation dynamics can only be approximated locally by an affine transformation.
In manipulation, the energy approximation error propagates after the first interaction.

Method

Interaction-Weighted Resampling (IWR) adds sample coverage near interactions and controls error propagation.

Result

IWR improves manipulation performance by 19.8% on average across Box2D, Meta-World, and Air Hockey (including sim-to-real transfer).

Locomotion vs. Manipulation

Goal-Conditioned Reinforcement Learning (GCRL)

Sampled Future Occupancy

$$\rho^\pi(g \mid s,a) = (1-\gamma)\sum_{k=1}^{\infty}\gamma^{k-1}p^\pi(s_{t+k}=g \mid s_t=s, a_t=a)$$
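The weights $(1-\gamma)\gamma^{k-1}$ in the sum are the probability mass function of a geometric distribution over the offset $k \ge 1$, so a goal can be drawn from the occupancy by sampling $k$ and reading off $s_{t+k}$ from a stored trajectory. A minimal sketch, assuming trajectories are stored as arrays; the function name `sample_future_goal` and the clipping at the end of the trajectory are illustrative assumptions, not something this section specifies:

```python
import numpy as np

def sample_future_goal(trajectory, t, gamma, rng):
    """Draw g = s_{t+k} with k ~ Geometric(1 - gamma), k >= 1.

    P(k) = (1 - gamma) * gamma**(k - 1), matching the weights in the sum.
    Offsets past the end of the trajectory are clipped to the final state
    (an assumption for finite trajectories; it biases toward the last state).
    """
    k = rng.geometric(1.0 - gamma)          # support is {1, 2, ...}
    idx = min(t + int(k), len(trajectory) - 1)
    return trajectory[idx], idx

rng = np.random.default_rng(0)
trajectory = np.arange(50, dtype=float).reshape(-1, 1)  # toy 1-D states 0..49
goal, idx = sample_future_goal(trajectory, t=10, gamma=0.9, rng=rng)
```

With $\gamma = 0.9$ the expected offset is $1/(1-\gamma) = 10$ steps, so sampled goals concentrate on the near future.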

Contrastive Reinforcement Learning (CRL)

Energy Function

$$E^*(s,a,g) = \log \rho^\pi(g \mid s,a) - \log \bar\rho_B(g) = \phi(s,a)^\top \psi(g)$$

InfoNCE Contrastive

$$\max_{\phi,\psi}\ \mathbb{E}\left[\log \frac{\exp(\phi(s,a)^\top \psi(g^+))}{\sum_{j=1}^{N}\exp(\phi(s,a)^\top \psi(g_j^-))}\right]$$
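The objective above can be evaluated on a batch as follows. This is a sketch in NumPy, assuming in-batch negatives (row $i$ of `psi` is the positive goal for row $i$ of `phi`, and the remaining rows act as negatives); that batching scheme and the helper names are assumptions, since this section does not state how negatives are drawn:

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def infonce_objective(phi, psi):
    """phi: (B, d) embeddings of (s, a); psi: (B, d) goal embeddings.

    Returns the batch mean of
        phi_i^T psi(g_i^+) - log sum_{j != i} exp(phi_i^T psi(g_j^-)),
    i.e. the log-ratio in the displayed objective with in-batch negatives.
    """
    logits = phi @ psi.T                       # (B, B) energies phi_i^T psi_j
    B = logits.shape[0]
    pos = np.diag(logits)                      # positive pairs on the diagonal
    neg_mask = ~np.eye(B, dtype=bool)          # everything off the diagonal
    neg = logits[neg_mask].reshape(B, B - 1)   # negatives for each row
    return np.mean(pos - logsumexp(neg, axis=1))

phi = 5.0 * np.eye(4)
aligned = infonce_objective(phi, 5.0 * np.eye(4))               # matched goals
shuffled = infonce_objective(phi, np.roll(5.0 * np.eye(4), 1, axis=0))
```

Matched pairs give a higher objective than shuffled pairs, which is the signal the encoders $\phi$ and $\psi$ are trained to maximize.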

t-SNE visualization of $\phi(s,a)$

Locomotion

Smooth representation landscape

Manipulation

Nonlinear representation landscape
Limited control accuracy

Analysis: Piecewise Nonlinearity

Lemma 1. When an interaction occurs, the next representation point $\psi_{t+1}$ can only be approximated locally by an affine transformation $A_1\psi_t + b_t$, which makes the representation piecewise nonlinear.

Proposition 1. In manipulation, actions affect the object only during an interaction. Otherwise the object moves passively, so errors in the energy function propagate across subsequent steps.

$$\sup |\widehat{E}_k - E_k| \;\propto\; \|A_0^k\|\,\|e\| \;+\; \tfrac{1}{2}\|A_0^k\|^2\|e\|^2$$
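To see how the right-hand side behaves with the horizon $k$, it can be evaluated numerically. The sketch below reads $A_0$ as the passive (no-interaction) transition matrix in representation space and $e$ as the approximation error at the first interaction; both readings are assumptions, since the symbols are not defined in this section, and the proportionality constant is left out:

```python
import numpy as np

def bound_rhs(A0, e, k):
    """Evaluate ||A0^k|| * ||e|| + 0.5 * ||A0^k||^2 * ||e||^2.

    The matrix norm is the spectral norm; the vector norm is Euclidean.
    """
    Ak = np.linalg.matrix_power(A0, k)
    a = np.linalg.norm(Ak, 2)
    err = np.linalg.norm(e)
    return a * err + 0.5 * a**2 * err**2

e = np.array([0.1, 0.0])            # toy initial error
contracting = 0.9 * np.eye(2)       # toy passive dynamics, ||A0|| < 1
expanding = 1.1 * np.eye(2)         # toy passive dynamics, ||A0|| > 1
for k in (1, 5, 10):
    print(k, bound_rhs(contracting, e, k), bound_rhs(expanding, e, k))
```

Under these toy matrices the term shrinks with $k$ when $\|A_0\| < 1$ and grows when $\|A_0\| > 1$, so the size of the initial error $\|e\|$ matters most when passive dynamics do not contract.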

Theory illustration of piecewise affine representation dynamics around interactions
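The Method summary says IWR adds sample coverage near interactions. One way this could look in code is sketched below. Everything here is hypothetical: the exact weighting used by IWR is not given in this section, so the Gaussian kernel over distance to the nearest interaction, the names `interaction_weights` and `resample_indices`, and the parameters `sigma` and `boost` are illustrative assumptions only.

```python
import numpy as np

def interaction_weights(interaction_flags, sigma=2.0, boost=4.0):
    """Hypothetical weighting: 1 + boost * exp(-d^2 / (2 sigma^2)),
    where d is the distance in timesteps to the nearest interaction."""
    flags = np.asarray(interaction_flags, dtype=bool)
    t = np.arange(len(flags))
    hits = t[flags]
    if hits.size == 0:
        return np.ones(len(flags))           # no interactions: uniform weights
    d = np.min(np.abs(t[:, None] - hits[None, :]), axis=1)
    return 1.0 + boost * np.exp(-(d**2) / (2.0 * sigma**2))

def resample_indices(weights, n, rng):
    """Draw n timestep indices with probability proportional to weights."""
    p = weights / weights.sum()
    return rng.choice(len(weights), size=n, replace=True, p=p)

rng = np.random.default_rng(0)
flags = np.zeros(100, dtype=bool)
flags[40] = True                              # toy trajectory, one interaction
w = interaction_weights(flags)
idx = resample_indices(w, 1000, rng)
```

In this toy trajectory, timesteps close to the interaction at step 40 are drawn more often than under uniform sampling, while distant timesteps keep a nonzero baseline weight so passive segments are still covered.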

Experiments

Overall Results

Task	PPO	SAC	SAC+HER	SAC+HINT	CRL	CRTR	IWR (Ours)
Air Hockey (Simulation)	0.617	0.145	0.398	0.422	0.695	0.727	0.742 (+2.1%)
Air Hockey (real-transfer)	0.160	0.215	0.129	0.125	0.477	0.465	0.500 (+4.8%)
Air Hockey (Real Robot)	0/20	0/20	0/20	0/20	5/20	2/20	12/20 (+140.0%)

Box2D (center)	0.086	0.058	0.088	0.088	0.278	0.274	0.288 (+3.6%)
Box2D (goal)	0.089	0.046	0.086	0.064	0.450	0.558	0.709 (+27.1%)
Box2D (hard)	0.060	0.042	0.064	0.076	0.317	0.365	0.565 (+54.8%)
Box2D (hard velocity)	0.148	0.149	0.152	0.139	0.387	0.377	0.436 (+12.7%)
Box2D (maze)	0.033	0.012	0.031	0.035	0.217	0.206	0.223 (+2.8%)

Meta-World (peg insert)	0.000	0.000	0.000	0.000	0.430	0.367	0.438 (+1.9%)
Meta-World (pick place)	0.000	0.000	0.004	0.000	0.266	0.305	0.570 (+86.9%)
Meta-World (push)	0.000	0.000	0.004	0.000	0.699	0.750	0.730
Meta-World (sweep into)	0.000	0.004	0.020	0.004	0.805	0.910	0.926 (+1.8%)

Average IWR improvement							+19.8%
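The percentages in the IWR column appear to be relative gains over the strongest baseline in each row. That reading is inferred from the numbers rather than stated in the table. A short check on three rows, with scores copied from the table above and a hypothetical helper `relative_gain`:

```python
def relative_gain(iwr, baselines):
    """Percent gain of IWR over the best baseline score in the row."""
    best = max(baselines)
    return 100.0 * (iwr - best) / best

# (IWR score, [PPO, SAC, SAC+HER, SAC+HINT, CRL, CRTR]) copied from the table
rows = {
    "Box2D (goal)": (0.709, [0.089, 0.046, 0.086, 0.064, 0.450, 0.558]),
    "Box2D (hard)": (0.565, [0.060, 0.042, 0.064, 0.076, 0.317, 0.365]),
    "Meta-World (pick place)": (0.570, [0.000, 0.000, 0.004, 0.000, 0.266, 0.305]),
}
gains = {task: round(relative_gain(iwr, base), 1)
         for task, (iwr, base) in rows.items()}
print(gains)
```

For these rows the computed gains are 27.1, 54.8, and 86.9, matching the reported values.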