The journey of running Arch Linux as an enterprise production server

So… when my most recent contract job reached the deployment phase, we found that our client is the only department in their entire enterprise that runs Windows Server, and Windows Server only! Yet our code only runs on Linux and FreeBSD (and HaikuOS if you try really hard). We never expected to have to deal with Windows. I eventually got the code running under MSYS2, but the performance was dreadful.

After a month of explaining that WSL is a thing and discovering that their IT department has zero Linux experience, we got the green light to deploy on Linux. But which distro? They insisted on deploying on whatever development environment we were using, supposedly to “minimize deployment difficulty”. This is how I deployed Arch Linux as a production server in an enterprise: the story, what I had to do, and how.

I have modified various details to protect myself from being sued under the NDA.

Achievement unlocked!

More background

Our client is (a department of) a legit two-digit-billion-USD enterprise, and we (NSYSU’s research lab) were contracted to build a customized high-performance search engine with AI tech integrated. We worked hard, and our client was happy with what we had built. Then, in a routine meeting right before delivery, our client’s face suddenly froze. We were demonstrating how admins should manage the engine and its settings: features like sane systemd configs, support for BSD-ports-like builds, microservices via DBus, etc. I knew we were in deep trouble right after that. They explained that they use Windows Server exclusively and weren’t expecting such a UNIX-style hierarchy. Oops. Both parties are totally screwed, aren’t we (● ∸ ●).

We started porting our code base immediately. We targeted both MSYS2 and HaikuOS. MSYS2 because it also uses pacman as its package manager, giving us some access to the AUR and making dependencies easier to handle. HaikuOS served as an intermediate target because it is a basic UNIX without fancy Linux extensions, which ensured we wouldn’t run into unknown trouble on MSYS2. Days later our customer phoned back saying that they had found a spare server on which we could deploy Linux. But which distro? They strongly insisted on using whatever we were developing on, i.e. Arch.

Arch Linux doesn’t have the best reputation as a server OS. Arch is known for being on the bleeding edge: you get updates about 3 days after upstream releases a new version. This is not what you want on a server, because an update that slips through CI may break things. Yet our client insisted, since they had next to zero Linux experience. Dealing with the differences between Arch and other distros was too big a challenge for them, and no amount of polite persuasion deterred them. Frankly, this made my job easier: I had already written a working PKGBUILD to handle dependencies on MSYS2, and we could reuse code from the PKGBUILD to assist packaging on another distro should they ever switch.

That’s how we ended up deploying Arch Linux in the server room of an enterprise. There were more problems to solve, too. Enterprise servers don’t have access to the Internet; all code and files have to be sent in on a hard drive and scanned by security. Getting the AUR working without Internet access was going to be difficult. And Windows Server admins expect things to work very differently.

Getting Arch running

All of the servers live in an intranet due to the very strict security requirements. We can’t install software and dependencies the usual way by downloading them from mirrors. All files have to be submitted by email or by parcel and then checked by their security team. Thus the practical way is to download all the required packages, compress them into an archive, and literally mail them.

The Arch wiki has an article on doing this, but it didn’t give me the whole picture. It instructs you to use pacman -Sp <package_name> to retrieve the package URL.

❯ pacman -Sp glibc
file:///var/cache/pacman/pkg/glibc-2.32-4-x86_64.pkg.tar.zst

On a machine where the dependencies are already installed, it neither lists the URLs of the dependencies nor generates a list of them! We need pactree to get the list of dependencies. pactree -l <package_name> gives you the list of dependencies of the package (including the package itself). Then just combine the two the UNIX way.

❯ pactree -l glibc
glibc
linux-api-headers
tzdata
filesystem
iana-etc

And thus we can use pactree in conjunction with the PKGBUILD (in case you don’t know, a PKGBUILD is a bash script) to extract the download links.

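# dump_deps.sh: print the download URL of every package a PKGBUILD needs
source "$1"    # a PKGBUILD is plain bash; this defines depends and makedepends
pkgs=("${depends[@]}" "${makedepends[@]}")
urls=()
for pkg in "${pkgs[@]}"; do
    pkg=${pkg%%[<>=]*}    # drop version constraints such as "foo>=1.2"
    # pactree -l: the package plus all of its dependencies, one per line
    urls+=($(pacman -Sp $(pactree -l "$pkg")))
done

# One URL per line, each listed exactly once
printf '%s\n' "${urls[@]}" | sort -u

❯ bash dump_deps.sh /path/to/PKGBUILD > deps.lst
❯ wget -i deps.lst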

Remember to clear your local package cache before running it. Otherwise pacman prints file:// URLs pointing into your local cache for anything already downloaded, which wget doesn’t like.
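
If wiping the cache is inconvenient, the list could be split instead. This is only a sketch, assuming one URL per line as in deps.lst; the function names remote_urls and cached_paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: both read a URL list (one URL per line) from stdin.

# Keep only the URLs wget can actually fetch.
remote_urls() {
    grep -v '^file://' || true    # grep exits non-zero when nothing matches
}

# Print the local paths of packages that are already in the pacman cache,
# i.e. the file:// entries with the scheme stripped.
cached_paths() {
    sed -n 's|^file://||p'
}

# Possible usage (not executed here):
#   remote_urls  < deps.lst > remote.lst && wget -i remote.lst
#   cached_paths < deps.lst | xargs -r cp -t ./pkgs/
```

The cached packages then only need to be copied next to the downloaded ones before everything is archived.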

Installing Arch, build and run

Downloading everything is the difficult part. Installing Arch itself isn’t any different from a normal install. I downloaded everything I needed onto a hard drive, pointed pacman at the hard drive as a mirror, launched makepkg to build and install our engine, and used systemctl to start it. Bam, done!
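
For the “hard drive as a mirror” step, one way it can be done is a local repository built with repo-add and a file:// Server entry in pacman.conf. The sketch below is an assumption about the setup, not a record of the real one: the repo name offline and the mount point /mnt/offline are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: "offline" and /mnt/offline are placeholder names.

# Print a pacman.conf stanza for a repository served from a local directory.
local_repo_stanza() {
    local name=$1 dir=$2
    printf '[%s]\n' "$name"
    printf 'SigLevel = Optional TrustAll\n'
    printf 'Server = file://%s\n' "$dir"
}

# On the server (as root), roughly:
#   repo-add /mnt/offline/offline.db.tar.gz /mnt/offline/*.pkg.tar.zst
#   local_repo_stanza offline /mnt/offline >> /etc/pacman.conf
#   pacman -Sy
local_repo_stanza offline /mnt/offline
```

repo-add builds the package database that pacman expects to find next to the packages; without it the directory is just a pile of files.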

Be aware of the CPU if you are using advanced SIMD optimizations. My build optimizes for AMD Zen by default. I forgot to switch it to Skylake for the first deployment, which led to a ~2% performance loss (a lot in the HPC world).
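
One way to avoid forgetting is to make the target CPU an explicit argument instead of a default. A minimal sketch, assuming GCC or Clang; the target names and the mapping to -march=znver1 and -march=skylake are my guesses at which CPU generations are meant, so adjust them to the real hardware.

```shell
#!/usr/bin/env bash
# Sketch only: the target names and the znver1/skylake mapping are assumptions.

# Map a deployment target to a compiler tuning flag, and fail loudly on
# anything unknown instead of silently tuning for the developer's own CPU.
march_flags() {
    case "$1" in
        zen)     echo '-march=znver1' ;;
        skylake) echo '-march=skylake' ;;
        *)       echo "unknown target CPU: '$1'" >&2; return 1 ;;
    esac
}

# Example: print the flag to put into the CFLAGS line of makepkg.conf.
echo "CFLAGS for this deployment: $(march_flags skylake)"
```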

Bonus: AArch64 Linux is not friendly for casual devs

As a bonus feature, our engine supports AArch64 Linux. And that’s a huge mess. Arch Linux ARM is missing some HPC packages, which have to be compiled from the AUR or git. And the AArch64 ISA covers a huge range of processors, from embedded systems to very high-end servers, each with a different pipeline, issue width, out-of-order window, etc. Don’t expect generic aarch64 binaries to push the CPU to its max.

UNIX vs Windows/Enterprise traditions

Seeing how differently Windows developers act and think taught me a lot about the different philosophies. There’s a lot I take for granted on Linux that simply isn’t the case on Windows: for example, user-local configuration overriding global settings, a well-defined root directory structure, a powerful command line, and a good init like systemd or OpenRC. I’m not sure if this is the common case on Windows servers, but in my experience, the Windows devs think:

  • Global settings should override local settings
  • Markdown is not a valid way to document stuff
  • There should be a GUI for configuration
  • Log to a file. systemd’s journal is not good enough
  • An application should only access its executable’s directory

I can’t figure out why Markdown isn’t a valid way to document stuff. They complained that Markdown has to be read using special editors… but Markdown is designed to be readable in plain text (ノಠ益ಠ)ノ彡┻━┻.

This block of text is intentionally displayed in **plain text** using [Markdown][1]. Not difficult to read, is it?

[1]: https://www.markdownguide.org

I had to communicate every detail very clearly and couldn’t assume common sense. Unexpected things always come up in the places you least expect.

The fear of the command line

It should go without saying, but you don’t need the command line to work on Windows. Visual Studio has everything in its GUI, and Windows manages the entire system using a GUI of some sort. So even the senior developers I met were not familiar with the console and had trouble navigating with it.

I had to take my time teaching their admins how to use the command line and its conventions. For instance, -f is an option and --file is the long form of the same option. And UNIX commands generally output nothing if everything works, e.g. rm file.txt prints something only when there’s an error.
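
The “silence means success” convention is easy to demonstrate. A small sketch that works in a throwaway directory, so nothing real gets deleted; the variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so the demo cannot touch real files.
workdir=$(mktemp -d)
touch "$workdir/file.txt"

# Success: rm prints nothing; exit status 0 is the only signal.
if ok_output=$(rm "$workdir/file.txt" 2>&1); then ok_status=0; else ok_status=$?; fi

# Failure: the file is already gone, so rm complains and returns non-zero.
if err_output=$(rm "$workdir/file.txt" 2>&1); then err_status=0; else err_status=$?; fi

echo "first rm:  status=$ok_status output='$ok_output'"
echo "second rm: status=$err_status output='$err_output'"

rmdir "$workdir"
```

Scripts rely on exactly this: they check the exit status ($?) rather than parse a success message.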

System management/knowledge

The way Windows is administered is surprisingly different from Linux. Windows Servers can be configured to work together like a VMS DECnet cluster (at least files and permissions can be shared across servers). I wonder how they got the idea. They expected account syncing between their local WSL installation and the server, so I had to explain that there’s no such feature out of the box in the UNIX world (it can be done, but it’s not common) and that we use SSH public keys instead. I then spent hours educating their IT on how public-key crypto and chains of trust work.

The notion of chaining system services also seemed to be new to them. DBus can launch a systemd service with a matching name. Then the service may make another DBus call to acquire information from another service. For example, a diagnostic script may query the engine for statistics; if the engine isn’t running (dead or stopped), systemd may launch it, and it may in turn launch other services. The situation gets even worse when talking about DBus over TCP.
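
For readers who haven’t seen bus activation before, it comes down to two small files: one tells the DBus daemon which systemd unit owns a bus name, the other is the unit itself. The sketch below writes both into a staging directory. Every name in it (org.example.Engine, example-engine.service, /usr/bin/example-engine) is a placeholder, not what our engine actually uses.

```shell
#!/usr/bin/env bash
# Sketch only: all service names and paths below are placeholders.

# Write the two files that make a service bus-activatable into a staging
# directory ($1), so they can be reviewed before being installed.
write_activation_files() {
    local stage=$1
    mkdir -p "$stage/dbus-1/system-services" "$stage/systemd/system"

    # Read by the DBus daemon: maps a bus name to a systemd unit.
    cat > "$stage/dbus-1/system-services/org.example.Engine.service" <<'EOF'
[D-BUS Service]
Name=org.example.Engine
Exec=/bin/false
User=root
SystemdService=example-engine.service
EOF

    # Read by systemd: the unit counts as started once it owns BusName.
    cat > "$stage/systemd/system/example-engine.service" <<'EOF'
[Unit]
Description=Example search engine (placeholder)

[Service]
Type=dbus
BusName=org.example.Engine
ExecStart=/usr/bin/example-engine
EOF
}

stage=$(mktemp -d)
write_activation_files "$stage"
echo "activation files written under $stage"
```

On a real system the first file would typically live under /usr/share/dbus-1/system-services/ and the second under /etc/systemd/system/. After that, the first message sent to org.example.Engine makes the DBus daemon ask systemd to start the unit.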

I feel that Windows really hides system details from its users and even admins.


That’s the story and the how. I hope you have learned something useful from my experience. Hail Arch! I wish more Arch servers existed.

11 thoughts on “The journey of running Arch Linux as an enterprise production server”

  1. Fun read, thanks for sharing.

    I am curious how ongoing production is going for this Arch instance?

    I recently dealt with a Windows dev shop and I particularly like your observations on Windows vs UNIX traditions. 🙂

    1. Hi! Thanks for liking it. (Sorry for all the misspellings, I only found them when re-reading.)

      I don’t know for sure, but I think it’s still running. Our contract was on a per-project basis and didn’t include maintenance (they wanted technology transfer, that’s why I spent hours teaching UNIX), and it ended a few months ago. We did ship a few security updates because of vulnerabilities in dependencies, and the customer does ask whenever a major CVE pops up for Linux. So I think they are still using it.

      Yeah, Windows traditions feel very alien to a UNIX/OSX dev. What kinds of Windows traditions have you observed? I haven’t worked with Windows developers since that project. Just curious.
