block: add a CoMutex to synchronous read drivers

The big conversion of bdrv_read/write to coroutines caused the two
homonymous callbacks in BlockDriver to become reentrant.  It goes
like this:

1) bdrv_read is now called in a coroutine, and calls bdrv_read or
bdrv_pread.

2) the nested bdrv_read goes through the fast path in bdrv_rw_co_entry;

3) in the common case when the protocol is file, bdrv_co_do_readv calls
bdrv_co_readv_em (and from here goes to bdrv_co_io_em), which yields
until the AIO operation is complete;

4) if bdrv_read had been called from a bottom half, the main loop
is free to iterate again: a device model or another bottom half
can then come and call bdrv_read again.

This applies to all four of read/write/flush/discard.  It would also
apply to is_allocated, but it is not used from within coroutines:
besides qemu-img.c and qemu-io.c, which operate synchronously, the
only user is the monitor.  Copy-on-read will introduce a use in the
block layer, and will require converting it.

The solution is "simply" to convert all drivers to coroutines!  We
just need to add a CoMutex that is taken around affected operations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
diff --git a/block/cow.c b/block/cow.c
index 4cf543c..2f426e7 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -42,6 +42,7 @@
 };
 
 typedef struct BDRVCowState {
+    CoMutex lock;
     int64_t cow_sectors_offset;
 } BDRVCowState;
 
@@ -84,6 +85,7 @@
 
     bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
     s->cow_sectors_offset = (bitmap_size + 511) & ~511;
+    qemu_co_mutex_init(&s->lock);
     return 0;
  fail:
     return -1;