Mesos on Raspbian: My Current Status

Compiling Mesos on Raspbian (ARM) is still difficult. But it can be done, even if it doesn’t quite work as intended yet. I can tell there are other people who have been working on it for quite a while. I was able to get versions 0.15.0 and 0.16.0 to barely run in February of 2014; I wouldn’t claim it was stable, but it ran.

Eventually, I expect the instructions in Apache Mesos: Getting Started will work on Raspbian. Soon after, .deb packages for Raspbian will probably also become available. Until then, I’m hoping that the following guide might help those who are trying to get it to work.

Prerequisites

To start with, you’ll need a Raspberry Pi (or a Raspbian-compatible platform, such as a Banana Pi) that has Raspbian installed. At minimum, you’ll need either a 16GB SD card or an 8GB card and some additional external storage. Use raspi-config to expand your image to fill the SD card; the fresh image doesn’t allow enough space to finish this process.

Even better is to have a Pi cluster (multiple Raspberry Pis or compatibles) with a working installation of distcc, or the ability to cross-compile on something a lot faster. However, neither of these is absolutely necessary to compile Mesos.
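Before going further, it may be worth confirming how much free space you actually have after expanding the image. The following is only a sketch: the function name has_free_gb is made up for illustration, and the 8 GB threshold in the usage comment is a guess, not a measured requirement.

```shell
# Succeeds if the filesystem holding DIR has at least MIN_GB gigabytes free.
# has_free_gb is a hypothetical helper name, not part of any package.
has_free_gb() {
    dir=$1
    min_gb=$2
    # df -P prints POSIX-format output; field 4 of line 2 is available KiB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example (the 8 GB figure is an assumption, adjust to taste):
#   has_free_gb "$HOME" 8 || echo "Probably not enough room to build here"
```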

Update Your System

Run the following:

sudo apt-get update; sudo apt-get upgrade

It’ll take a while, potentially a long time if your Internet connection isn’t very fast. Be prepared to busy yourself elsewhere for a while.

Set Up the Environment

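The official guide lists openjdk-6-jdk as a prerequisite for installing Mesos. Until I learn otherwise, I plan to assume that openjdk-7-jdk (the default for Raspbian) will also work fine. Note that the Java paths used further down point at /usr/lib/jvm/jdk-7-oracle-armhf, so adjust them if your JDK is installed somewhere else.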

You’ll need to update /etc/environment. (Simply updating the environment in your current terminal session may not give the desired result.) Type the following to take you into a simple editor with /etc/environment loaded:

sudo nano /etc/environment

Your /etc/environment file will likely be empty if this is a fresh installation. Add the following lines:

export JAVA_HOME="/usr/lib/jvm/jdk-7-oracle-armhf"
export JAVA_LDFLAGS="-L/usr/lib/jvm/jdk-7-oracle-armhf/jre/lib/arm/server \
 -R/usr/lib/jvm/jdk-7-oracle-armhf/jre/lib/arm/server -Wl,-ljvm"
export JAVA_CPPFLAGS="-I/usr/lib/jvm/jdk-7-oracle-armhf/include \
 -I/usr/lib/jvm/jdk-7-oracle-armhf/include/linux"
export CFLAGS="-O2 -pipe -march=armv6j -mtune=arm1176jzf-s -mfpu=vfp \
 -mfloat-abi=hard"
export CXXFLAGS="-O2 -pipe -march=armv6j -mtune=arm1176jzf-s -mfpu=vfp \
 -mfloat-abi=hard"

Use Ctrl-O to write the file and Ctrl-X to exit.

To be sure the new environment settings take effect, log out and back in.
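After logging back in, a quick check like the one below can confirm that the settings are visible in your session. This is a sketch, not part of the official guide: check_java_env is a made-up helper name, and it only checks that the variables are non-empty and that JAVA_HOME contains an include directory.

```shell
# Report whether the settings added to /etc/environment are visible in the
# current session. Prints one line per problem; returns non-zero if any.
# check_java_env is a hypothetical helper name.
check_java_env() {
    status=0
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        status=1
    elif [ ! -d "$JAVA_HOME/include" ]; then
        echo "JAVA_HOME ($JAVA_HOME) has no include directory"
        status=1
    fi
    for var in JAVA_LDFLAGS JAVA_CPPFLAGS CFLAGS CXXFLAGS; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set"
            status=1
        fi
    done
    return $status
}

# Example:
#   check_java_env && echo "Environment looks ready"
```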

Alternately

In the configure step (further down), you can use the following alternative to the above lines related to Java:

./configure --disable-java

But this will disable Mesos Java support, which I always felt would be rather bad for Mesos, given how much we all depend on Java nowadays.

Install Prerequisite Packages

Do the following:

sudo apt-get install build-essential libcurl4-nss-dev libsasl2-dev maven libapr1-dev libsvn-dev

All of these packages will be needed before Mesos will successfully compile. Unfortunately, they have a fair number of prerequisites, so nearly 200 packages will be installed, totaling ~250MB.

Install gcc-4.7

You’ll need gcc-4.7 because otherwise you’ll get these errors:

./.libs/libmesos.so: undefined reference to `__sync_add_and_fetch_8'
./.libs/libmesos.so: undefined reference to `__sync_fetch_and_add_8'
collect2: ld returned 1 exit status
Makefile:2387: recipe for target 'mesos-log' failed
make[2]: *** [mesos-log] Error 1
make[2]: *** Waiting for unfinished jobs....
./.libs/libmesos.so: undefined reference to `__sync_add_and_fetch_8'
./.libs/libmesos.so: undefined reference to `__sync_fetch_and_add_8'
collect2: ld returned 1 exit status
Makefile:2381: recipe for target 'mesos-local' failed

First, use apt-get to install packages for gcc-4.7:

sudo apt-get install gcc-4.7 g++-4.7 cpp-4.7

You’ll need to update alternatives for gcc, etc. First, you’ll want to remove any existing alternatives:

sudo update-alternatives --remove-all gcc
sudo update-alternatives --remove-all g++

On a fresh installation of Raspbian, it’s most likely that no alternatives were set. So don’t worry if you get an error.

Use the following commands to set up alternatives for gcc and g++ 4.6 and 4.7:

sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.7 20

sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.6 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.7 20

sudo update-alternatives --install /usr/bin/cc cc /usr/bin/gcc 30
sudo update-alternatives --set cc /usr/bin/gcc

sudo update-alternatives --install /usr/bin/c++ c++ /usr/bin/g++ 30
sudo update-alternatives --set c++ /usr/bin/g++

Run the following two commands individually and check that the selected options are gcc-4.7 and g++-4.7:

sudo update-alternatives --config gcc
sudo update-alternatives --config g++

Now the system is set up to compile Mesos.
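If you’d like to double-check that the alternatives took effect, something along these lines may help. It’s a minimal sketch: is_gcc_4_7 is a made-up helper name, and it relies on the -dumpversion option of gcc and g++ printing a bare version number such as 4.7 or 4.7.2.

```shell
# Succeeds if the given version string is 4.7 or 4.7.x.
# is_gcc_4_7 is a hypothetical helper name.
is_gcc_4_7() {
    case $1 in
        4.7|4.7.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not run here, needs the compilers installed):
#   for tool in gcc g++ cc c++; do
#       is_gcc_4_7 "$("$tool" -dumpversion)" || echo "$tool is not 4.7"
#   done
```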

Download Mesos Source Code

Before downloading, check the official guide to see what version it recommends. This guide is written to work for version 0.20.0, but may work for a newer version.

As the guide says, do the following, in the path of your choice:

wget http://www.apache.org/dist/mesos/0.20.0/mesos-0.20.0.tar.gz
tar -zxf mesos-0.20.0.tar.gz

This will give you the vanilla source code for Mesos. You’ll need to modify it a bit before you’re ready to compile.

Modify Mesos Source Code for ARM Compatibility

Generally, ZooKeeper is decompressed into the 3rdparty directory by a script before it’s compiled. But if an uncompressed copy is already there, the build scripts will assume they decompressed it during a previous attempt to compile Mesos, and that copy will be used.

So, do the following:

cd mesos-0.20.0/3rdparty; tar -zxf zookeeper-3.4.5.tar.gz

Open zookeeper-3.4.5/src/c/src/mt_adaptor.c for editing:

nano zookeeper-3.4.5/src/c/src/mt_adaptor.c

Go to line 483 or search for the phrase “fetch_and_add” to find the right position in code.

Replace:

int32_t fetch_and_add(volatile int32_t* operand, int incr)
{
#ifndef WIN32
    int32_t result;
    asm __volatile__(
        "lock xaddl %0,%1\n"
        : "=r"(result), "=m"(*(int *)operand)
        : "0"(incr)
        : "memory");
    return result;
#else

With this:

int32_t fetch_and_add(volatile int32_t* operand, int incr){
#ifndef WIN32
    return __sync_fetch_and_add(operand,incr);
#else

This fix is an adaptation of a similar fix I found posted elsewhere for another program that was being ported to ARM. I no longer recall where I found it, and tracking down the original source would probably be impossible now. I wish I could, because I’d prefer to give credit.

This code change should be enough for Mesos 0.20.0 to compile on ARM. However, it may break the build on other architectures.
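Since a build can run for days, it may be worth confirming the edit was saved before starting. The sketch below uses a made-up helper name, patch_applied, and assumes that the x86 "lock xaddl" assembly appears nowhere else in the file, which I haven’t verified.

```shell
# Succeeds if FILE uses the GCC builtin and no longer contains the x86 asm.
# patch_applied is a hypothetical helper name.
patch_applied() {
    grep -q '__sync_fetch_and_add' "$1" && ! grep -q 'lock xaddl' "$1"
}

# Example, run from the 3rdparty directory:
#   patch_applied zookeeper-3.4.5/src/c/src/mt_adaptor.c || echo "Not patched"
```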

Make a Big Swap File

Compiling Mesos will require more physical memory than is available on your Pi. I don’t know how much exactly. I recommend allowing at least 1GB of swap per compile thread (selected later with the make -jN option). More is better. I generally give it 6GB per thread, just to be on the safe side.

In my opinion, the best option is to put the swap file on an external hard drive. But if you have a large SD card (16GB or larger), you should have enough space for the compile and swap files. Just be mindful of the fact that using the SD card for swap memory will use up its write cycles quickly.

General advice related to making a swap file can be found here.
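For reference, here is a rough sketch of one way to size and create a swap file with the standard dd, mkswap and swapon tools. The helper names swap_size_mb and make_swap_file are made up, and the path in the usage comment is hypothetical; point it at wherever your external drive (or SD card) is mounted.

```shell
# Swap size in MiB for a number of make jobs and GiB of swap per job.
# swap_size_mb is a hypothetical helper name.
swap_size_mb() {
    n_jobs=$1
    gb_per_job=$2
    echo $((n_jobs * gb_per_job * 1024))
}

# Create and enable a swap file. Must be run as root.
# $1 = path of the swap file, $2 = size in MiB
# make_swap_file is a hypothetical helper name.
make_swap_file() {
    swap_path=$1
    size_mb=$2
    dd if=/dev/zero of="$swap_path" bs=1M count="$size_mb" &&
    chmod 600 "$swap_path" &&
    mkswap "$swap_path" &&
    swapon "$swap_path"
}

# Example, in a root shell (the path is an assumption):
#   make_swap_file /mnt/external/mesos.swap "$(swap_size_mb 2 1)"
```

When the build is finished, swapoff followed by deleting the file undoes this.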

Configure and Make

Running the following should configure the source directory to be compiled:

./configure

To compile, use the following command:

make

Alternately, to use 2 compile threads (which you want to do if your ARM platform has 2 cores), you can use this command instead:

make -j2

The number following the -j switch will determine the maximum number of threads for the compile. If you’re using distcc with a Pi cluster, it could potentially make sense to use this command instead:

make -j40

I don’t recommend using the -j switch without a number. This will cause the compile to create a number of threads only bounded by the limitations of how many files can be compiled at once without violating prerequisite requirements of each part of the job. For a large compile like Mesos, this could potentially be over a hundred threads. Unless you have a very large cluster, just don’t do it. Even if you’re cross-compiling on a system with many cores and a huge amount of RAM, there isn’t really any benefit to letting it spawn more than 2 threads per core.
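If you’d rather not hard-code the number, it can be derived from the core count. This is only a sketch: make_jobs is a made-up helper name, and nproc (from coreutils) is assumed to be available on your system.

```shell
# Number of make jobs for CORES cores at PER_CORE jobs each (default 1).
# make_jobs is a hypothetical helper name.
make_jobs() {
    cores=$1
    per_core=${2:-1}
    echo $((cores * per_core))
}

# Examples (not run here):
#   make -j"$(make_jobs "$(nproc)")"      # one job per core
#   make -j"$(make_jobs "$(nproc)" 2)"    # the 2-per-core ceiling
```

On a memory-starved board, one job per core is probably the safer choice, since every extra job needs its own share of swap.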

With a distcc cluster of 16 Raspberry Pis, I’ve been able to compile Mesos in about 4-6 hours. On a Banana Pi, it will finish using the -j2 switch in about a day. I’ve never finished compiling Mesos on a single Raspberry Pi, but the longest I ever let it run was roughly 3 days. However, by the above instructions, I believe it should finish on a single RPi in 3-4 days if it’s configured well for compile-time performance.

distcc Builds Only

The Mesos source code has the peculiar quirk that a distcc build and a local build won’t be entirely equivalent. As far as I know, this is only a problem for the Python egg. The following is what I posted to Twitter about it after I figured it out:

distcc-Mesos Python Egg Problem

As my screenshots imply, ExamplesTest.PythonFramework will only pass if the Python egg is compiled locally.

To do a distcc build and still get a working Python egg, my procedure has been to compile Mesos completely using distcc, delete the Python portion, and run make again. make will recompile the Python parts (a fairly small part of the build) and leave everything else alone. If you know this, it’s not a terrible inconvenience. If you don’t, it’s something you could bang your head against for days without making any progress.

I don’t know if this issue is a problem with every version of distcc and Mesos, but it was a problem for versions 0.15.0 and 0.16.0 of Mesos when I was working with them in February and March of 2014. If you’re unsure, you can test to see if this is still an issue by running make check before rebuilding the Python parts locally. I never had a problem rebuilding the Python parts and getting a pass on a new run of make check after an initial failed run.

Make Check

Once Mesos has finished compiling, run:

make check

This will compile and run the current test suite that’s used to verify that the compiled version of Mesos works correctly. Additionally, I believe it’s supposed to compile a number of examples.

Depending on which board it’s running on, make check will run for roughly 4 to 6 hours before reaching the point where it always crashes. It may sometimes need to be restarted, but it should eventually pass two groups of tests (146 and 114 tests) before failing on DRFAllocatorTest.DRFAllocatorProcess, which is the first test in a set of 398:

[==========] Running 398 tests from 66 test cases.
[----------] Global test environment set-up.
[----------] 3 tests from DRFAllocatorTest
[ RUN      ] DRFAllocatorTest.DRFAllocatorProcess
F1105 11:22:46.579359  5797 leveldb.cpp:160] Check failed: leveldb::BytewiseComparator()->Compare(one, two) < 0
*** Check failure stack trace: ***
    @ 0xb6416f30  google::LogMessage::Fail()
    @ 0xb6418d18  google::LogMessage::SendToLog()
    @ 0xb6416b54  google::LogMessage::Flush()
    @ 0xb64194a8  google::LogMessageFatal::~LogMessageFatal()
    @ 0xb62f0924  mesos::internal::log::LevelDBStorage::restore()
    @ 0xb6360574  mesos::internal::log::ReplicaProcess::restore()
    @ 0xb6360d88  mesos::internal::log::ReplicaProcess::ReplicaProcess()
    @ 0xb6360f38  mesos::internal::log::Replica::Replica()
    @ 0xb62f2284  mesos::internal::log::LogProcess::LogProcess()
    @ 0xb62f2510  mesos::internal::log::Log::Log()
    @   0x3a6a48  mesos::internal::tests::Cluster::Masters::start()
    @   0x3a6eac  mesos::internal::tests::Cluster::Masters::start()
    @   0x3a0c4c  mesos::internal::tests::MesosTest::StartMaster()
    @   0x12e70c  DRFAllocatorTest_DRFAllocatorProcess_Test::TestBody()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @   0x51b3bc  testing::internal::HandleExceptionsInMethodIfSupported<>()
Makefile:5576: recipe for target 'check-local' failed
make[3]: *** [check-local] Aborted

This is as far as I’ve ever been able to get with make check on anything newer than Mesos 0.15.0. (Version 0.15.0 seems to get a little further, but I don’t think it had the DRFAllocatorTest.DRFAllocatorProcess test.) I’ve tried looking over the related code, and the main obstacle to finding a solution is that the output it gives appears to be correct. However, I’m not sure how this code was actually intended to work; it’s likely its author had some other output in mind that just isn’t obvious.

I’ve tried to figure out how to skip individual tests manually, but didn’t turn up anything useful. If I could skip tests, I could go further in the test suite to figure out if there are any other failures. As it stands now, I have no idea if there are any other failures beyond DRFAllocatorTest.DRFAllocatorProcess. It’s possible there aren’t any more. It’s also possible a failure later in the suite would give a much better indication of why DRFAllocatorTest.DRFAllocatorProcess fails. For now, that’s purely speculative.
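One avenue that may be worth trying: Google Test, which the Mesos test suite is built on, reads a GTEST_FILTER environment variable (equivalent to its --gtest_filter flag), and patterns after a leading "-" are excluded. The sketch below builds such a filter. The helper name exclude_filter is made up, and whether make check in Mesos 0.20.0 passes the variable through to the test binaries untouched is an assumption I haven’t confirmed.

```shell
# Build a Google Test filter that runs everything except the named patterns.
# Patterns after the leading "-" are exclusions, separated by ":".
# exclude_filter is a hypothetical helper name.
exclude_filter() {
    filter=""
    for name in "$@"; do
        if [ -z "$filter" ]; then
            filter="-$name"
        else
            filter="$filter:$name"
        fi
    done
    # printf avoids echo treating a leading "-" as an option.
    printf '%s\n' "$filter"
}

# Example (not run here):
#   GTEST_FILTER="$(exclude_filter 'DRFAllocatorTest.*')" make check
```

Since the crash happens while starting a master, other tests that start one would probably hit the same failure, so the exclusion list may need to grow.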

Try Running It

When I compile Mesos 0.20.0 by the above instructions, create the directory ~/temp, and run it, the following is what I get as output:

~/mesos-0.20.0/bin $ ./mesos-master.sh --work_dir="~/temp"
I1105 22:23:15.330361  6072 main.cpp:155] Build: 2014-11-04 17:08:44 by bananapi
I1105 22:23:15.331166  6072 main.cpp:157] Version: 0.20.0
F1105 22:23:15.334403  6072 leveldb.cpp:160] Check failed: leveldb::BytewiseComparator()->Compare(one, two) < 0
*** Check failure stack trace: ***
    @ 0xb63f2f30  google::LogMessage::Fail()
    @ 0xb63f4d18  google::LogMessage::SendToLog()
    @ 0xb63f2b54  google::LogMessage::Flush()
    @ 0xb63f54a8  google::LogMessageFatal::~LogMessageFatal()
    @ 0xb62cc924  mesos::internal::log::LevelDBStorage::restore()
    @ 0xb633c574  mesos::internal::log::ReplicaProcess::restore()
    @ 0xb633cd88  mesos::internal::log::ReplicaProcess::ReplicaProcess()
    @ 0xb633cf38  mesos::internal::log::Replica::Replica()
    @ 0xb62ce284  mesos::internal::log::LogProcess::LogProcess()
    @ 0xb62ce510  mesos::internal::log::Log::Log()
    @    0x40804  main
    @ 0xb516c82c  (unknown)
Aborted

This is a known issue with Mesos on ARM. I’m sure it will eventually be fixed. I had hoped to be the one to find the solution (as early as 8 months ago), but I suspect the one who figures it out will have a bit better understanding of how LevelDB is supposed to work.

Versions 0.15.0 and 0.16.0 would actually run if modified a little more before they were compiled. However, the make check tests still didn’t all run, which I assume means either version would eventually crash. Getting any version working properly will be a consequence of getting the bundled version of LevelDB (and possibly other components) to work correctly on ARM.

Once it Works

You can also install it using the following command in the same directory you compiled it in:

make install

But I don’t recommend installing it until it can at least finish running the entire make check test suite. Even if it doesn’t pass every test, it should at least be able to finish running them.

You’ll probably want to review the official Mesos configuration guide too.
