Progress at Compiling Mesos

By David Guill | March 11, 2014

I’ve been in the process of trying to get a good compile of Mesos on the Raspberry Pi for about 3 1/2 weeks now. I decided to document the challenges of this, in case it might help anyone else who is trying to do the same thing.

My first approach was to try compiling on a single Raspberry Pi. I used the recommended “make -j”. With no number after it, “-j” puts no limit on how many jobs make runs at once, so the build used an exorbitant amount of memory. (It was a big mistake not to look up the “-j” option.) I frequently got the nearly useless error “g++: internal compiler error: Killed (program cc1plus)” mixed in with what appeared to be continued attempts to compile. After a great deal of searching, I found this was caused by the system running out of memory.

[Image: Memory Error]

My impression at the time was that simply adding more swap space wouldn’t be enough to allow the job to proceed to the end, so I decided to try to cross-compile instead. I spent about a week trying to get Crosstool-NG to build a cross-compile toolchain that would run under Windows. It appears to be unable to do so, in spite of the fact that it tries so very hard. I managed to get it to run to completion with settings that should have generated a working toolchain, but the files it generated were garbage. I intend to pick up this effort again later, after I get a few other things compiled. If I get it to work, I intend to document the process from start to finish, which apparently no one has done yet. (There are a lot of piecemeal sets of instructions out there, but even the best of them skip steps that are necessary to make it work.)

After giving up on Crosstool-NG, I went back to trying to compile on a single Pi. I expanded the swap memory available to it. After some experimentation, I arrived at the conclusion that a 6 GB swap file was necessary. This seemed to be enough for the compile to run. I allowed it to continue to run for 3 1/2 days like this, but finally decided it was taking too long.

[Image: Memory While Compiling Mesos]

In hindsight, adding a 512 MB swap file probably would have been enough if I’d used “make -j2” instead of “make -j”.
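The trade-off between memory and the “-j” value can be sketched as simple arithmetic. The snippet below is only an illustration: the function name and the per-job memory figure are assumptions of mine, not measurements from this build, so treat the numbers as placeholders.

```python
def max_parallel_jobs(ram_mb, swap_mb, per_job_mb, job_cap=None):
    """Estimate how many compile jobs fit in RAM plus swap at once.

    per_job_mb is a guess at the peak memory of one compiler process
    (hypothetical here; measure it on your own build).
    Always returns at least 1, since make needs one job to progress.
    """
    if per_job_mb <= 0:
        raise ValueError("per_job_mb must be positive")
    jobs = max(1, (ram_mb + swap_mb) // per_job_mb)
    if job_cap is not None:
        jobs = min(jobs, job_cap)
    return jobs


if __name__ == "__main__":
    # Illustrative figures only: 512 MB RAM, 512 MB swap,
    # and an assumed 400 MB peak per compiler process.
    print("make -j%d" % max_parallel_jobs(512, 512, 400))
```

The point is only that the job count should be bounded by memory rather than left unlimited; swap keeps the compiler from being killed, but jobs that live mostly in swap will be slow.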

This is when I decided to try getting distcc going on a subset of my 40 Pis. Even with the troubles I’ve had with distcc, I probably should have done this sooner. I set it up on 16 Pis (1 master + 15 slaves). Using this distcc cluster, I succeeded in compiling Mesos for the first time on March 6. It can compile mesos-0.16.0 in about 3 hours. At the time, this was a wonderful thing to see:

[Image: Mesos 0.16.0 Compile Finished]
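For anyone setting up a similar cluster, the host list that distcc reads from the DISTCC_HOSTS environment variable can be generated rather than typed by hand. This is a rough sketch, not my actual configuration: the hostnames pi01 through pi15 are hypothetical, the one-job-per-Pi limit is an assumption, and the “twice the number of CPUs” value for “-j” is a commonly cited rule of thumb that may not suit every build.

```python
def distcc_hosts(workers, jobs_per_worker=1, local_jobs=1):
    """Build a DISTCC_HOSTS string using distcc's host/LIMIT syntax."""
    entries = ["localhost/%d" % local_jobs]
    entries += ["%s/%d" % (host, jobs_per_worker) for host in workers]
    return " ".join(entries)


def suggested_jobs(workers, jobs_per_worker=1, local_jobs=1):
    """Rule-of-thumb -j value: twice the total number of job slots."""
    return 2 * (local_jobs + jobs_per_worker * len(workers))


if __name__ == "__main__":
    # Hypothetical hostnames for the 15 slave Pis.
    workers = ["pi%02d" % n for n in range(1, 16)]
    print("DISTCC_HOSTS='%s'" % distcc_hosts(workers))
    print("make -j%d" % suggested_jobs(workers))
```

Every host in the list still needs distcc and the same compiler version installed for the results to be comparable.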

Before Mesos will compile on the Pi at all, its source code has to be edited in several places to remove or replace Intel-specific assembly code. I also had to install gcc-4.7 (it is not necessary to remove gcc-4.6). I will document the code changes in greater detail at a later date.

A pure distcc build results in the following errors during “make check”:

[Image: Python Remote ReplicaTest.Promise Error]

I discovered that it’s possible to get ExamplesTest.PythonFramework to pass by deleting the mesos-0.16.0/build/python directory and re-running “make -j2” without distcc:

[Image: Python Local]

It only takes a few hours to finish it again. But it still baffles me that this would make any difference at all. With every instance of distcc running on a Pi (with the same OS, compiler, and configuration), I should get the same result regardless of which machine is compiling which part. At present, my only guess is that distcc is deciding not to pass along my CFLAGS and CXXFLAGS. But the only documented precedents I found for this would not apply in this situation.

I’m currently trying to find the source of the ReplicaTest.Promise test failure. I believe this failure is also related to discrepancies between how distcc compiles code remotely and how a purely local build compiles it. However, I don’t know yet which parts of Mesos will need to be compiled locally before it will work, or if compiling it all locally will even be enough.
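One way to narrow this down would be to keep two build trees, one built locally and one built through distcc, and list the object files whose contents differ. The script below is a sketch of that idea, not something I have run against Mesos; the function names are mine and the two directory arguments are whatever build paths you choose. Object files can also differ for harmless reasons (embedded paths in debug information, for example), so a mismatch is a lead to investigate rather than proof of a bad compile.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def object_digests(root, suffix=".o"):
    """Map each object file's path (relative to root) to its digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                digests[os.path.relpath(full, root)] = file_digest(full)
    return digests


def differing_objects(local_root, distcc_root):
    """List object files present in both trees whose contents differ."""
    local = object_digests(local_root)
    remote = object_digests(distcc_root)
    shared = local.keys() & remote.keys()
    return sorted(rel for rel in shared if local[rel] != remote[rel])
```

Calling differing_objects with the two build directories returns a sorted list of relative paths, which could then be rebuilt locally one group at a time to see which ones affect the failing tests.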

4 thoughts on “Progress at Compiling Mesos”

  1. Thomas

    Hi, I really think compiling would work better when cross-compiling on Linux, for example on Ubuntu. That way you have much more memory and speed available. I have done this myself with great success. I understand you use Windows at this time; you can even run Ubuntu as a virtual machine if needed.

    1. David Guill Post author

      I can’t say I have enough experience with cross-compiling to know if you’re correct – you most likely are. But it’s also possible that the root cause of the difficulties I’ve had with distcc would still apply if I were cross-compiling on Ubuntu. Eventually, I’ll try to figure that out; I intend to go back to cross-compiling after I’ve made more progress on getting software installed that I don’t need to do anything special to get working.

      I will say from experience that 16 Pis running distcc can compile a lot faster than one Pi can. I haven’t tried firing all 40 of them up with distcc yet, but it would not be difficult for me to do at this stage and it’s probably what I’ll do next time I need to compile something big.

  2. Benjamin Mahler

    Hey Dave! Glad to see you’re playing around with Mesos. I’ve never seen the replica test failure before. Are you compiling and running on machines with different byte orders?

    A while back I wrote a distcc framework for Mesos to make this a bit easier (no need for the static host sets, and you can run it along with other frameworks):
    https://github.com/mesos/mesos-distcc

    But it will require that you install distcc on each host.
