Lemma 1. When interaction happens, the next representation point $\psi_{t+1}$ can only be locally approximated by an affine transformation $A_t\psi_t + b_t$, which leads to piecewise nonlinearity.
Proposition 1. In manipulation, actions only take effect if there is an interaction. Otherwise, object movement is passive, which leads to error propagation in the energy function.
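
The two statements above can be illustrated with a toy system. The following is a minimal sketch, not the paper's model: it assumes a hypothetical 1-D pusher and object, uses the raw state in place of the learned representation $\psi_t$, and the names `step`, `contact_radius`, and `damping` are invented for illustration. Each branch is an affine (here linear) update, but the switch between branches makes the overall map piecewise, and outside contact the action never reaches the object.

```python
def step(obj_pos, obj_vel, pusher_pos, action, contact_radius=0.1, damping=0.9):
    """One step of a hypothetical 1-D pusher/object system (illustrative only)."""
    new_pusher = pusher_pos + action
    in_contact = abs(new_pusher - obj_pos) <= contact_radius
    if in_contact:
        # Interaction regime: the action enters the object's update.
        new_vel = action
    else:
        # Passive regime: the object only coasts; the action does not affect it.
        new_vel = damping * obj_vel
    new_obj = obj_pos + new_vel
    return new_obj, new_vel, new_pusher, in_contact


# Far from the object, two different actions give the same object motion.
passive_a = step(1.0, 0.5, 0.0, 0.2)
passive_b = step(1.0, 0.5, 0.0, 0.3)
print(passive_a[:2] == passive_b[:2])

# Close to the object, the action determines the object's motion.
contact = step(1.0, 0.5, 0.8, 0.15)
print(contact[3], contact[1])
```

Under these assumptions, a single global affine map cannot fit both regimes, which is the sense in which the approximation holds only locally.
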
| Task | PPO | SAC | SAC+HER | SAC+HINT | CRL | CRTR | IWR (Ours) |
|---|---|---|---|---|---|---|---|
| Air Hockey (Simulation) | 0.617 | 0.145 | 0.398 | 0.422 | 0.695 | 0.727 | 0.742 (+2.1%) |
| Air Hockey (real-transfer) | 0.160 | 0.215 | 0.129 | 0.125 | 0.477 | 0.465 | 0.500 (+4.8%) |
| Air Hockey (Real Robot) | 0/20 | 0/20 | 0/20 | 0/20 | 5/20 | 2/20 | 12/20 (+140.0%) |
| Box2D (center) | 0.086 | 0.058 | 0.088 | 0.088 | 0.278 | 0.274 | 0.288 (+3.6%) |
| Box2D (goal) | 0.089 | 0.046 | 0.086 | 0.064 | 0.450 | 0.558 | 0.709 (+27.1%) |
| Box2D (hard) | 0.060 | 0.042 | 0.064 | 0.076 | 0.317 | 0.365 | 0.565 (+54.8%) |
| Box2D (hard velocity) | 0.148 | 0.149 | 0.152 | 0.139 | 0.387 | 0.377 | 0.436 (+12.7%) |
| Box2D (maze) | 0.033 | 0.012 | 0.031 | 0.035 | 0.217 | 0.206 | 0.223 (+2.8%) |
| Meta-World (peg insert) | 0.000 | 0.000 | 0.000 | 0.000 | 0.430 | 0.367 | 0.438 (+1.9%) |
| Meta-World (pick place) | 0.000 | 0.000 | 0.004 | 0.000 | 0.266 | 0.305 | 0.570 (+86.9%) |
| Meta-World (push) | 0.000 | 0.000 | 0.004 | 0.000 | 0.699 | 0.750 | 0.730 |
| Meta-World (sweep into) | 0.000 | 0.004 | 0.020 | 0.004 | 0.805 | 0.910 | 0.926 (+1.8%) |
| Average IWR improvement | | | | | | | +19.8% |
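
The table does not state how the parenthesized percentages are computed. The per-task values are consistent with the relative gain of IWR over the strongest baseline in the same row, and the sketch below reproduces a few of them under that assumption. The helper name `relative_improvement` is hypothetical; the numbers are copied from the rows above.

```python
def relative_improvement(ours, best_baseline):
    """Relative gain over the strongest baseline, in percent."""
    return 100.0 * (ours - best_baseline) / best_baseline


# (baseline scores in table order: PPO, SAC, SAC+HER, SAC+HINT, CRL, CRTR), IWR score
rows = {
    "Box2D (goal)": ([0.089, 0.046, 0.086, 0.064, 0.450, 0.558], 0.709),
    "Meta-World (pick place)": ([0.000, 0.000, 0.004, 0.000, 0.266, 0.305], 0.570),
    "Meta-World (push)": ([0.000, 0.000, 0.004, 0.000, 0.699, 0.750], 0.730),
}

for task, (baselines, ours) in rows.items():
    best = max(baselines)
    print(f"{task}: {relative_improvement(ours, best):+.1f}%")
```

Under this reading, the first two rows give +27.1% and +86.9%, matching the table. The push row, where CRTR scores higher than IWR, gives a negative value, which would explain why the table lists no improvement there.
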