Currently, we only support Linux on the x86-64 architecture. For instance, the latest stable release of Debian (or Ubuntu) Linux for x86-64 should be fine. We have not recently tested on Mac OS X on 64-bit Intel, and it will probably not work.
How many threads do you run in parallel? Does your program perform a lot of memory accesses? Threads running in parallel share the memory bandwidth, so each thread gets less of it; you can try to reduce the number of memory accesses made in parallel.
How much does your program allocate? If it allocates a lot, keep in mind that our GC is not as performant as INRIA's.
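To get a rough idea of how much a piece of code allocates, one option is to read the allocation counter of the standard `Gc` module before and after running it. The following is only a minimal sketch: `measure_allocation` and `workload` are hypothetical names introduced for illustration, and it assumes that `Gc.allocated_bytes` behaves as it does in the standard OCaml distribution, which may not hold for a modified runtime.

```ocaml
(* Hypothetical helper: runs [f ()] and returns its result together with
   the number of bytes allocated in the meantime, as reported by
   Gc.allocated_bytes from the standard library. *)
let measure_allocation f =
  let before = Gc.allocated_bytes () in
  let result = f () in
  let after = Gc.allocated_bytes () in
  (result, after -. before)

(* Hypothetical workload: builds a list of [n] boxed floats, so it
   allocates on every iteration. *)
let workload n =
  let rec build i acc =
    if i = 0 then acc else build (i - 1) (float_of_int i :: acc)
  in
  build n []

let () =
  let (l, bytes) = measure_allocation (fun () -> workload 100_000) in
  Printf.printf "%d elements, about %.0f bytes allocated\n"
    (List.length l) bytes
```

If the reported figure is large compared to the amount of useful work done, allocation is a likely suspect for the slowdown.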
With n CPU cores, you can expect a speed-up factor of n, provided that memory accesses do not kill your program's performance. Sometimes you can get an even greater speed-up if you manage to reduce thread context switches or page faults.
With intensive memory access, the speed-up provided by the additional cores is reduced by the high cost of parallel or concurrent memory accesses.