[AArch64][CostModel] Fix cost for mul <2 x i64>

This was modeled to have a cost of 1, but since we do not have a MUL.2d this is
scalarized into vector inserts/extracts and scalar muls.
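
As a rough illustration of why a cost of 1 is too low, the sketch below counts
the operations in the scalarized sequence. The unit costs and the helper name
scalarizedMulCost are assumptions made for this example; they are not the
values or the code used by the AArch64 cost model.

```cpp
// Hypothetical unit costs, for illustration only.
constexpr unsigned ExtractCost = 1;   // move one i64 lane into a scalar register
constexpr unsigned ScalarMulCost = 1; // one scalar 64-bit multiply
constexpr unsigned InsertCost = 1;    // move one i64 product back into a lane

// Rough cost of a scalarized multiply of two vectors with NumElts i64 lanes:
// extract every lane of both operands, multiply lane by lane, then insert
// every product into the result vector.
constexpr unsigned scalarizedMulCost(unsigned NumElts) {
  return 2 * NumElts * ExtractCost // lanes of both operands
         + NumElts * ScalarMulCost // one mul per lane
         + NumElts * InsertCost;   // one insert per lane
}

// For <2 x i64>: 4 extracts + 2 muls + 2 inserts.
static_assert(scalarizedMulCost(2) == 8, "expected 4 + 2 + 2 operations");
```

Under these assumed unit costs a mul <2 x i64> comes out at 8 rather than 1,
which is enough to make vectorizing such a multiply look unprofitable.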

Motivating precommitted test is test/Transforms/SLPVectorizer/AArch64/mul.ll,
which we don't want to SLP vectorize.

Test Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll
unfortunately needed changing, but the reason is documented in
LoopVectorize.cpp:6855:

  // The cost of executing VF copies of the scalar instruction. This opcode
  // is unknown. Assume that it is the same as 'mul'.

which I will address next as a follow-up to this.

Differential Revision: https://reviews.llvm.org/D92208