Rotary Positional Embedding (RoPE)
Top Half (Sine) vs Bottom Half (Cosine).
Trough
Peak
d_model:
64
Seq Length:
1000
Theta (θ):
10000
Token Position Slice:
Position:
0
Note: High values can impact performance.