Using speculative decoding with something like Llama 3.1 70B as the draft model, you'd need another ~140 GB of memory just for its weights (70B parameters × 2 bytes at 16-bit precision) on top of ...
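The back-of-the-envelope math behind that figure can be sketched as a tiny helper. This is illustrative only; the function name and the 16-bit default are assumptions, not anything from a real library:

```python
def model_memory_gb(num_params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight-only memory footprint of a model.

    bytes_per_param: 2 for fp16/bf16, 1 for int8, 4 for fp32.
    Ignores KV cache, activations, and runtime overhead.
    """
    return num_params_billions * 1e9 * bytes_per_param / 1e9


# Draft model from the text: Llama 3.1 70B at 16-bit precision
print(model_memory_gb(70))     # 140.0 (GB)

# The same model quantized to int8 would halve that
print(model_memory_gb(70, 1))  # 70.0 (GB)
```

Note this counts weights only; a serving deployment also needs memory for the KV cache and activations, so the real total is higher.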