Fix irq route entries exceeding KVM_MAX_IRQ_ROUTES
Last month, we experienced several guests crash(6cores-8cores), qemu logs
display the following messages:
qemu-system-x86_64: /build/qemu-2.1.2/kvm-all.c:976:
kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
After analysis and verification, we can confirm it's irq-balance
daemon(in guest) leads to the assertion failure. Start a 8 core guest with
two disks, execute the following scripts will reproduce the BUG quickly:
irq_affinity.sh
========================================================================
vda_irq_num=25
vdb_irq_num=27
while [ 1 ]
do
for irq in {1,2,4,8,10,20,40,80}
do
echo $irq > /proc/irq/$vda_irq_num/smp_affinity
echo $irq > /proc/irq/$vdb_irq_num/smp_affinity
dd if=/dev/vda of=/dev/zero bs=4K count=100 iflag=direct
dd if=/dev/vdb of=/dev/zero bs=4K count=100 iflag=direct
done
done
========================================================================
QEMU setup static irq route entries in kvm_pc_setup_irq_routing(), PIC and
IOAPIC share the first 15 GSI numbers, take up 23 GSI numbers, but take up
38 irq route entries. When change irq smp_affinity in guest, a dynamic route
entry may be setup, the current logic is: if allocate GSI number succeeds,
a new route entry can be added. The available dynamic GSI numbers is
1021(KVM_MAX_IRQ_ROUTES-23), but available irq route entries is only
986(KVM_MAX_IRQ_ROUTES-38), GSI numbers greater than route entries.
irq-balance's behavior will eventually leads to total irq route entries
exceed KVM_MAX_IRQ_ROUTES, ioctl(KVM_SET_GSI_ROUTING) fail and
kvm_irqchip_commit_routes() trigger assertion failure.
This patch fix the BUG.
Signed-off-by: Wenshuang Ma <kevinnma@tencent.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/kvm-all.c b/kvm-all.c
index 53e01d4..e98b08d 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1099,9 +1099,17 @@
uint32_t *word = s->used_gsi_bitmap;
int max_words = ALIGN(s->gsi_count, 32) / 32;
int i, zeroes;
- bool retry = true;
-again:
+ /*
+ * PIC and IOAPIC share the first 16 GSI numbers, thus the available
+ * GSI numbers are more than the number of IRQ route. Allocating a GSI
+ * number can succeed even though a new route entry cannot be added.
+ * When this happens, flush dynamic MSI entries to free IRQ route entries.
+ */
+ if (!s->direct_msi && s->irq_routes->nr == s->gsi_count) {
+ kvm_flush_dynamic_msi_routes(s);
+ }
+
/* Return the lowest unused GSI in the bitmap */
for (i = 0; i < max_words; i++) {
zeroes = ctz32(~word[i]);
@@ -1111,11 +1119,6 @@
return zeroes + i * 32;
}
- if (!s->direct_msi && retry) {
- retry = false;
- kvm_flush_dynamic_msi_routes(s);
- goto again;
- }
return -ENOSPC;
}