c - How can I make code that concurrently reads and modifies an array well-defined without introducing locking? -
i'm writing program computes endgame tablebase chess variant. algorithm filling tablebase works this:
- start huge array of
unsigned char
, each member representing 1 position (we assume it's white's turn). array member if position lost, odd if won,0xff
if invalid,0xfe
if draw. - iterate on array, marking each illegal position
0xff
, each position white mated0x00
, other positions0x0fe
. - iterate on array, considering positions marked
0xfe
. check if there move leads position array member even, if there one, write 1 plus number of position member corresponding curren position. if moves lead positions indicated odd numbers (i.e. loosing position), mark position 1 plus highest of these numbers indicate how long strongest defence takes. - repeat step 3 until array doesn't change anymore.
for speed, i'd parallelize step three. careful reading yields in each iteration, ever write 1 value (the number of iteration) array. following strategy obtains:
- split array n parts, let 1 thread work on each part. let current iteration i.
- if thread has read member array , equal i, treat if set
0xfe
because means member written concurrently thread.
now there race condition in program doesn't matter right result if there aren't pink elephants (which can't exist if char
written atomically). however, since on paper there race condition, c compiler may declare program undefined , format hard disk.
what can parallelize algorithm without violating constraints of c memory model , without causing massive slowdown (e.g. adding locks)?
simplified problem description
here simplified algorithm demonstrates same concept stripped of unimportant stuff:
- start out array
unsigned char a[n]
. each array member either 0 or 1. - for each array member set 0: if other array members equal 0 or 2, set array member 2. array members checked depend on index of array member want update. there no simple relationship between index , array members need check, it's random.
since ever change 0 2, doesn't matter in order process array entries, though there technically race condition if in parallel (since concurrently read/write same object). how can tell compiler shouldn't care race condition without sacrificing performance?
this _atomic
type qualifier in c11. declare array as
_atomic unsigned char a[n];
which means each element of array can read or written atomically.
prior c11, there's no standard way this, generally, depending on implementation, datatypes atomic reads , writes. know are, you'll have @ documentation implementation using.
note default memory ordering c11 _atomic
accesses memory_order_seq_cst
(sequential consistency), , if don't need that, can use atomic_load_explicit
, atomic_store_explicit
actions weaker memory ordering (ie memory_order_relaxed
in example)
Comments
Post a Comment