I’ve been trying to get a good compile of Mesos on the Raspberry Pi for about 3 1/2 weeks now. I decided to document the challenges in case it helps anyone else who is trying to do the same thing.
My first approach was to try compiling on a single Raspberry Pi. I used the recommended “make -j”, which, with no number after it, puts no limit on how many jobs run at once and so uses an exorbitant amount of memory. (It was a big mistake not to look up the “-j” option.) I frequently got the nearly useless error “g++: internal compiler error: Killed (program cc1plus)” mixed in with what appeared to be continued attempts to compile. After a great deal of searching, I found this was caused by the system running out of memory.
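For anyone hitting the same “Killed” error, a quick way to confirm the out-of-memory diagnosis is to look in the kernel log, where the OOM killer normally records what it did. This is only a small sketch: the function name is my own invention, and the exact wording of the kernel message varies between kernel versions, so the pattern is deliberately loose.

```shell
# Filter kernel log lines that point at the out-of-memory killer.
# Reads log text on stdin. The function name is hypothetical.
oom_lines() {
    grep -i -E 'out of memory|oom-killer'
}

# Typical use on the Pi (not run here):
#   dmesg | oom_lines
```

If this prints lines that mention cc1plus, the compiler was killed for lack of memory rather than crashing on its own.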
My impression at the time was that simply adding more swap memory wouldn’t be enough to allow the job to proceed to the end, so I decided to try to cross-compile instead. I spent about a week trying to get Crosstool-NG to build a cross-compile toolchain that would run under Windows. It appears to be unable to do so, in spite of the fact that it tries so very hard. I managed to get it to run to completion with settings that should have generated a working toolchain, but the files it generated were garbage. I intend to pick up this effort again later, after I get a few other things compiled. If I get it to work, I intend to document the process from start to finish, which apparently no one has done yet. (There are a lot of bits-and-pieces sets of instructions out there, but even the best of them skip steps that are necessary to make it work.)
After giving up on Crosstool-NG, I went back to trying to compile on a single Pi. I expanded the swap memory available to it. After some experimentation, I arrived at the conclusion that a 6 GB swap file was necessary. This seemed to be enough for the compile to run. I allowed it to continue to run for 3 1/2 days like this, but finally decided it was taking too long.
In hindsight, adding a 512 MB swap file probably would have been enough if I’d used “make -j2” instead of “make -j”.
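A rough sketch of that combination is below. It uses the generic Linux commands for creating a swap file, which may not be how your distribution prefers to manage swap. The /swapfile path and the helper names are assumptions for illustration. By default the script only prints the commands, so it is safe to try before running anything as root.

```shell
# Print (default) or run the commands that create and enable a swap file.
# Set DRY_RUN=0 and run as root to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

make_swap() {
    path=$1      # e.g. /swapfile (hypothetical location)
    size_mb=$2   # e.g. 512
    run dd if=/dev/zero of="$path" bs=1M count="$size_mb" &&
    run chmod 600 "$path" &&
    run mkswap "$path" &&
    run swapon "$path"
}

# Example: preview the commands for a 512 MB swap file, then build with two jobs.
#   make_swap /swapfile 512
#   make -j2
```

The point of “-j2” is that the number caps how many compiler processes run at once, which is what keeps memory use within reach of a small swap file.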
This is when I decided to try getting distcc going on a subset of my 40 Pis. Even with the troubles I’ve had with distcc, I probably should have done this sooner. I set it up on 16 Pis (1 master + 15 slaves). Using this distcc cluster, I succeeded in compiling Mesos for the first time on 3/6. It can compile mesos-0.16.0 in about 3 hours. At the time, this was a wonderful thing to see:
Before this will even work, the Mesos source code has to be edited in several places to remove or replace Intel-specific assembly code. I also had to install gcc-4.7 (it is not necessary to remove gcc-4.6). I will document the code changes in greater detail at a later date.
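For readers who have not used distcc, a build like the 1 master + 15 slaves setup described above is usually driven by a host list plus a compiler wrapper. The sketch below is not my exact configuration: the pi01..pi15 host names, the subnet, and the job count are all placeholders, and passing CC and CXX on the make command line is just one common way to route compiles through distcc.

```shell
# Build a DISTCC_HOSTS value: localhost plus N slaves named pi01..piNN.
# The pi%02d host names are hypothetical; substitute real names or IPs.
make_hosts() {
    count=$1
    hosts="localhost"
    i=1
    while [ "$i" -le "$count" ]; do
        hosts="$hosts $(printf 'pi%02d' "$i")"
        i=$((i + 1))
    done
    echo "$hosts"
}

# On each slave (not run here; the subnet is an assumption):
#   distccd --daemon --allow 192.168.1.0/24
# On the master, from the build directory (job count is an assumption):
#   export DISTCC_HOSTS="$(make_hosts 15)"
#   make -j16 CC="distcc gcc-4.7" CXX="distcc g++-4.7"
```

Naming gcc-4.7 explicitly matters here, since gcc-4.6 is still installed alongside it and every machine needs to use the same compiler.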
A pure distcc build results in the following errors during “make check”:
I discovered that it’s possible to get ExamplesTest.PythonFramework to pass by deleting the mesos-0.16.0/build/python directory and re-running “make -j2” without distcc:
It only takes a few hours to finish it again. But it still baffles me that this would make any difference at all. With every instance of distcc running on a Pi (with the same OS, compiler, and configuration), I should get the same result regardless of which machine is compiling which part. At present, my only guess is that distcc is deciding not to pass along my CFLAGS and CXXFLAGS. But the only documented precedents I found for this would not apply in this situation.
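One way to test the flags guess would be to compile a single source file twice, once locally and once through distcc, and compare the results. The sketch below is untested against my cluster, and foo.cpp and the helper name are placeholders. DISTCC_VERBOSE=1 turns on distcc’s debugging output, and DISTCC_FALLBACK=0 makes a failed remote compile an error instead of a silent local retry.

```shell
# Report whether two object files are byte-identical.
# The function name is hypothetical.
same_object() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Hypothetical usage (not run here; foo.cpp is a placeholder):
#   g++-4.7 $CXXFLAGS -c foo.cpp -o local.o
#   DISTCC_FALLBACK=0 DISTCC_VERBOSE=1 distcc g++-4.7 $CXXFLAGS -c foo.cpp -o remote.o
#   same_object local.o remote.o
```

A “different” result is only a hint, not proof, because object files can also differ through embedded paths or debug information. The verbose output should at least show the command line distcc actually used.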
I’m currently trying to find the source of the ReplicaTest.Promise test failure. I believe this failure is also related to discrepancies between how distcc compiles code remotely and a purely local compile. However, I don’t know yet which parts of Mesos will need to be compiled locally before it will work, or whether compiling it all locally will even be enough.