And a flat number of iterations, with the time simply read before and after, doesn't account for timer rollover or jitter, either of which can skew the results wildly from run to run. It often helps to wait for the timer to roll over to its next tick and then count how many runs complete in a fixed amount of time.
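A minimal sketch of that approach, assuming `Date.now()` as the clock. The helper names `waitForRollover`, `runsInWindow` and `benchmark`, the 200 ms window and the 5 rounds are all hypothetical choices for illustration, not from any library. Several windows are run and the median count is taken so that a single jittery window doesn't dominate.

```javascript
// Hypothetical helpers: wait for a clock tick boundary, then count how many
// calls of fn fit inside a fixed time window.

// Spin until Date.now() changes, so the window starts right on a tick
// boundary instead of partway through one.
function waitForRollover() {
  const start = Date.now();
  let now = start;
  while (now === start) {
    now = Date.now();
  }
  return now;
}

// Count how many times fn completes before windowMs milliseconds elapse.
function runsInWindow(fn, windowMs = 200) {
  const start = waitForRollover();
  const end = start + windowMs;
  let runs = 0;
  while (Date.now() < end) {
    fn();
    runs++;
  }
  return runs;
}

// Repeat the window several times and report the sorted counts plus the
// median, which is less sensitive to one noisy window than the mean.
function benchmark(fn, windowMs = 200, rounds = 5) {
  const counts = [];
  for (let r = 0; r < rounds; r++) {
    counts.push(runsInWindow(fn, windowMs));
  }
  counts.sort((a, b) => a - b);
  return { counts, median: counts[Math.floor(rounds / 2)] };
}
```

Comparing the median counts of two variants gives a rough relative speed (higher is faster). Reading the clock inside the hot loop adds its own overhead, so for very cheap functions it may be better to call them in small batches between clock checks.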
But you're right, it's better to bench than to assume. I should build a full test and then make a proper article out of that.
Also funny that you called it "optimised" for using "let", something that should make them all SLOWER, given that derpy, useless, waste-of-time construct creates a new scope for every block (and a fresh binding on every loop iteration).
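A small sketch of the per-iteration binding that `let` introduces in a `for` loop header, compared with `var`. The function names are hypothetical. Closures make the difference visible: with `var` every closure shares one function-scoped `i`, while with `let` each iteration gets its own copy. This shows the extra scoping work the language requires. Whether it costs measurable time in a given engine is something only a benchmark can settle.

```javascript
// With var, all three closures capture the same function-scoped binding,
// which has reached 3 by the time they are called.
function captureWithVar() {
  var fns = [];
  for (var i = 0; i < 3; i++) {
    fns.push(() => i);
  }
  return fns.map((f) => f());
}

// With let, each loop iteration creates a fresh binding for i, so each
// closure sees the value from its own iteration.
function captureWithLet() {
  const fns = [];
  for (let i = 0; i < 3; i++) {
    fns.push(() => i);
  }
  return fns.map((f) => f());
}

console.log(captureWithVar()); // [ 3, 3, 3 ]
console.log(captureWithLet()); // [ 0, 1, 2 ]
```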