Tomasulo's Algorithm: A loop-based example

Loop:  LD     F0, 0(R1)
       MULTD  F4, F0, F2
       SD     F4, 0(R1)
       ADDI   R1, R1, 8
       BNE    R1, R2, Loop   ; branch if R1 is not equal to R2
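Tomasulo's algorithm can overlap two iterations of this loop because it renames registers. The Python sketch below is illustrative only. It replays issue for both iterations against a register status table: each source operand is either read from the register file or tied to the tag of a pending producer. The station names (Load1, Mult1, Int1, ...) are hypothetical labels. The sketch also simplifies by assuming no result is broadcast while the ten instructions issue, so every producer is still pending when its readers issue.

```python
from typing import Dict, List, Optional, Tuple

# Loop body as (opcode, destination register or None, source registers).
LOOP_BODY: List[Tuple[str, Optional[str], Tuple[str, ...]]] = [
    ("LD", "F0", ("R1",)),
    ("MULTD", "F4", ("F0", "F2")),
    ("SD", None, ("F4", "R1")),
    ("ADDI", "R1", ("R1",)),
    ("BNE", None, ("R1", "R2")),
]

# Hypothetical station-name prefixes, one per instruction class.
STATION = {"LD": "Load", "MULTD": "Mult", "SD": "Store", "ADDI": "Int", "BNE": "Branch"}


def rename(iterations: int) -> List[Tuple[int, str, str, Tuple[str, ...]]]:
    """Issue `iterations` copies of the loop body; record (iteration, op, tag, sources)."""
    status: Dict[str, str] = {}  # register status table: register -> tag of pending producer
    used: Dict[str, int] = {}    # stations of each class occupied so far
    trace = []
    for it in range(1, iterations + 1):
        for op, dest, srcs in LOOP_BODY:
            kind = STATION[op]
            used[kind] = used.get(kind, 0) + 1
            tag = f"{kind}{used[kind]}"
            # A source names its pending producer's tag if there is one, else the register itself.
            operands = tuple(status.get(r, r) for r in srcs)
            if dest is not None:
                status[dest] = tag  # later readers of dest wait on this tag only (no WAR/WAW)
            trace.append((it, op, tag, operands))
    return trace


for row in rename(2):
    print(*row)
```

In the resulting trace, the second LD writes F0 under tag Load2 while the first MULTD still waits on Load1. Reusing F0 and F4 across iterations therefore causes no WAR or WAW stall. Only the true dependence through R1 (tag Int1, from the first ADDI) links the two iterations.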

Assume that the instructions for two successive iterations of the loop are all issued before either of the load operations completes. If MULTD takes 4 clock cycles to execute, how long does it take to complete these two loop iterations?
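The question leaves several parameters open, including load latency, reservation-station counts and CDB width, so any cycle count depends on assumptions. The Python sketch below is one hypothetical timing model, not a canonical answer. Only the 4-cycle MULTD latency comes from the question. Its other assumptions are:

- One instruction issues per cycle in program order, with no structural stalls, and the branch is assumed taken.
- Execution starts the cycle after issue, or the cycle after the last source operand appears on the common data bus (CDB).
- There is a single CDB, granted to the oldest waiting instruction.
- ADDI, BNE and SD each take 1 execution cycle. SD is simplified to one execute cycle that starts once both R1 and F4 are ready, followed by a memory write in the next cycle.
- LD takes 10 cycles. This is an assumed cache-miss-like value, chosen so the first load does not write its result until after all ten instructions have issued, as the question requires.

```python
from typing import Dict, List, Optional, Set, Tuple

# Loop body as (opcode, destination register or None, source registers).
LOOP_BODY: List[Tuple[str, Optional[str], Tuple[str, ...]]] = [
    ("LD", "F0", ("R1",)),
    ("MULTD", "F4", ("F0", "F2")),
    ("SD", None, ("F4", "R1")),
    ("ADDI", "R1", ("R1",)),
    ("BNE", None, ("R1", "R2")),
]

# Execution latencies in cycles; only MULTD = 4 is given, the rest are assumptions.
LATENCY = {"LD": 10, "MULTD": 4, "SD": 1, "ADDI": 1, "BNE": 1}


def schedule(iterations: int, latency: Dict[str, int]) -> List[Tuple[int, str, int, int, int, int]]:
    """Return (iteration, op, issue, exec_start, exec_end, done) for each instruction."""
    ready: Dict[str, int] = {}  # register -> cycle its newest value is on the CDB (absent = ready at 0)
    cdb_busy: Set[int] = set()  # cycles in which the single CDB is already taken
    rows = []
    # Processing in program order lets older instructions reserve the CDB first.
    for it in range(iterations):
        for k, (op, dest, srcs) in enumerate(LOOP_BODY):
            issue = it * len(LOOP_BODY) + k + 1  # in-order, one per cycle, no stalls assumed
            operands = max((ready.get(r, 0) for r in srcs), default=0)
            start = max(issue, operands) + 1     # cycle after issue / after the CDB broadcast
            end = start + latency[op] - 1
            if dest is not None:
                done = end + 1                   # Write Result on the CDB...
                while done in cdb_busy:          # ...delayed while an older instruction holds it
                    done += 1
                cdb_busy.add(done)
                ready[dest] = done               # renaming: later readers wait on this producer only
            elif op == "SD":
                done = end + 1                   # memory write in the following cycle
            else:
                done = end                       # branch resolves at the end of execute
            rows.append((it + 1, op, issue, start, end, done))
    return rows


rows = schedule(2, LATENCY)
for row in rows:
    print(*row)
print("total cycles:", max(r[5] for r in rows))
```

Under these assumptions the sketch reports 25 cycles for the two iterations, with the second SD writing memory in cycle 25. The second LD finishes executing in cycle 16 but loses the CDB in cycle 17 to the first MULTD, which shows one visible cost of a single bus. More generally, for any assumed load latency of 9 cycles or more, this model gives the load latency plus 15 cycles. A different load latency, a second CDB or a more detailed store model would change the count.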