Dudemanguy's Musings

Wayland Isn't Going to Save The Linux Desktop

June 10, 2022

As you all know, the Linux desktop is dominated by Xorg. The X11 protocol originated in the 80s and was later adopted by Linux desktop systems. It survived all the way to this day as Xorg, and I'm using it right now to make this post. While it is remarkable that X11 has survived so long, it's certainly not flawless. There are some questionable parts of the core protocol and old legacy baggage that is still carried to this day. Developers, unsurprisingly, wanted a fresh start from scratch, free from all the legacy, dated decisions that X11 made. This is what led to the birth of Wayland, way back in 2008.

In recent years, Wayland's push to the Linux desktop space accelerated, and users could actually start using this "new" (not really new but...) protocol as their daily driver on the desktop. I, myself, was one of the early adopters in those days. The promise was a clean, lean protocol designed for modern systems that could handle all the complexities people expect in today's computing. The major GUI toolkits have all implemented Wayland support. There are several big name Wayland compositors. Other major applications like Firefox, Chromium, etc. support it now and have for years. So with all of these pieces in place, why aren't we all living in a glorious Wayland future? Sure, those guys on Nvidia may have some issues, but it should be smooth sailing for everyone else. Right?

The best statistics we have on this come from the February 2022 Phoronix article on telemetry data from Firefox users. It found that fewer than 10% of Firefox users on Linux are using Wayland in any form. Those stats also include XWayland, so that isn't an excuse either. Serious work on desktop compositors began several years ago and as far as we can tell, Wayland adoption rates are, to put it bluntly, pathetic. Now at this point, Wayland advocates may come up with several explanations such as "it's not enabled by default" and things along those lines. That can explain some of this, but if Wayland were truly a technically superior solution, adoption would not be as glacially slow.

Now, I'm no stranger to Wayland. Sway was my daily driver for many years. My primary reason for switching was merely curiosity to see what it was like. I found some things lacking, so I went and patched/fixed a few bugs. The experience was definitely worthwhile since I fixed enough mpv Wayland bugs that I eventually gained commit rights (eternally grateful for that). Thus, I became the de-facto mpv Wayland developer. It's fair to say that I am more experienced with Wayland client code than most people and reasonably know my way around the API. In the earlier days, I was more of a Wayland advocate but my attitude eventually soured as time went on.

The problems with Wayland are numerous. That, in and of itself, wouldn't be too big of a deal if the problems were clearly fixable. Unfortunately, many of them are not. Some of the issues are technical but reasonably fixable. Other problems are technical but not really fixable due to bad design decisions made years ago. And then there are issues that are simply more ideological in nature. I'll try to explain what I think are the various shortcomings of Wayland and why I don't think the future is very bright for it. My perspective is primarily from the client side of things. Perhaps the server API is way nicer. I've only ever done trivial bugfixes for compositors (i.e. sway/wlroots), but I've done plenty of reasonably complex things as a client API user.

What even is Wayland anyway?

This is the first thing that should probably be addressed. Strictly speaking, Wayland is "just a protocol" (this phrase is often used, justified or not, when pointing out some shortcomings). The implementation of this is libwayland, which is what everyone uses. It's basically an asynchronous IPC library which defines several things useful to run a mini-display server. You get abstractions for outputs, inputs, and all sorts of stuff with it. This is fine and works well for what it is. The only issue is that the Wayland protocol, by itself, is far too minimal to run a modern desktop. You're also going to need several additional extensions to do this.

That is also fine. But here we get into what I consider the first big blunder of the Wayland project. Nobody sat down and wrote a strong reference implementation that included all the extensions you need for a usable desktop. There is, of course, Weston. It is OK for maybe a kiosk, but for your actual computer it was never (and still isn't) anything beyond a toy to play with. Contrast that with X11, which quickly gained a de facto standard implementation (at least on the Linux desktop; proprietary Unix systems are another matter) in the form of XFree86 (which later became Xorg). Anybody can write their own X11 implementation from scratch, but why would you bother? Xorg has everything you need already and implements all the extensions you care about. Everyone targets the exact same Xorg server, which leads to predictable, consistent behavior.

In contrast, Wayland never gained such a strong reference implementation. What ended up happening was that everyone wrote their own compositors from scratch. This meant people reinvented the wheel multiple times and reimplemented the same thing over and over. For clients, this is actually rather annoying. Because compositors subtly do things differently under the hood, it is entirely possible for your totally valid, completely correct, Wayland code to randomly break on some other compositor. And no joke, I literally just got a bug report of this nature where some compositor apparently no longer properly handles an mpv request. It's just a mess. Fortunately, some guys realized this may be a problem and came up with wlroots which mostly stopped the bleeding. Most people writing their own compositor will probably just use wlroots as a library nowadays which helps stop the fragmentation. This still doesn't change the fact that there are 4 different Wayland implementations you have to support as a Wayland client though: Mutter, Plasma, wlroots, and Weston. You can omit Weston if you really want since nobody really uses it, but it's still an annoying problem that doesn't exist on Xorg. A random Xorg window manager may not support some particular window hint or extension, but that's just simply the window manager developer not adding a certain feature. It's not the same as the exact same code acting differently everywhere for no real reason. I have encountered this multiple times.

Wayland's client API is gimped.

This one is more of a design choice as opposed to an actual technical limitation, but it's another downside of Wayland in comparison to Xorg. In Xorg you can do almost anything. Arguably, it has too many features, but as a user and developer you have the power and choice. Move a window to an exact location? Draw silly decorations on another window? Know when exactly the window is out of view (yes, you really can't do this on Wayland)? No problem. Xorg has nearly everything under the sun you can think of. The documentation of xcb may be pretty bad (probably why most people still use xlib), but it's an extremely sophisticated asynchronous API that basically has no limits.

To Wayland's credit, they took the good parts of xcb and made it easier to use (i.e. wrote actual good documentation), but you'll quickly run into the opinion of the developers. Xorg is essentially "mechanism not policy" in regards to development. It gives you a bunch of tools and then it's up to you to do whatever you want. Wayland is exactly the opposite. It's "policy not mechanism". Wayland gives you tools, but you're expected to use them in a certain way. It steers you and guides you to manage your windows in the "right" way. If you ask developers "why can't I do XYZ", you'll likely get an answer along the lines of "why do you need to do that" or "justify your usecase". Understandably, any piece of software has limits on its scope. I'm not criticizing that, but the scope of the Wayland ecosystem is far too narrow. There's a ton of stuff you can do client-side on Xorg that you straight-up can't do on Wayland and probably will never be able to do because of the development philosophy.

For a concrete example, you can, on the client side, choose a specific output to fullscreen to, but you cannot choose a specific output to start up on in non-fullscreen mode. A long time ago, I made this request since it's not uncommon for users to want to start an mpv window on a different monitor or something like that. The answers I got back from Wayland developers were not very encouraging and one of them even thought it was a mistake to allow fullscreen to go to a specific output! I ended up abandoning that since I wasn't in the mood to try and write an essay justifying how people should use their computers. Sway offers a way to do this via swaymsg and I'm sure the other compositors have some sort of server-side method, but this is one of those things that should just be a standard interface like it is in Xorg. I'm not sure if the reasoning is supposed to be "security" or something (seems like a bogus argument since you can just run a shell command in the program to do the same thing), but as a whole Wayland developers have some issue with applications placing themselves at a certain location on a screen. PTSD from too many popup ads?

I don't know if I have a development philosophy, but my personal approach is pretty much the opposite. I've added a ton of features for users that I personally don't use nor even really care about. I'm not in the business of policing whatever users do with their system. They should have the tools they need to do what they want.

Wayland's lack of feature parity with Xorg cripples it.

One of the things Wayland developers were quick to say is that they aren't going to implement a feature just because it exists in Xorg. That's fair enough. Nobody really needs to use Xorg's server drawing for anything besides demo applications after all. Just use cairo or whatever. The problem is that they took out many features the users depend on with no real replacement in sight. The goal of Wayland is to completely replace X11. Breaking people's workflows is never a good selling point. Compare Wayland adoption to some other FOSS projects like Systemd or Pipewire. Regardless of your opinion of their quality, both of those offered full compatibility with the legacy, old way while adding lots of new features and functionalities. That was why people rapidly switched to those projects. To be fair, XWayland does exist, and it works well for what it is. But it's not going to save you when you want to do things with native Wayland applications. For a typical user, switching to Wayland breaks their workflow and offers nothing aside from meager gains like theoretically saving a GPU->GPU copy in some cases. That's nice, but let's be honest, no one really cares.

A particularly amusing example is screen recording not working. By design, Wayland doesn't allow clients to see the contents of another client. This is something I mostly regard as "security theater" (I don't install malware), but it's fine to make that the default. Just add some protocol to allow it, right? Well, in almost 14 years, they never came up with such a thing. There are many obvious, valid reasons why a user may want to record their screen, but somehow Wayland developers never bothered to address it. Hilariously enough, they didn't even fix this. The Pipewire guy did. To be fair, there were various implementations beforehand (all compositor-specific), but in the end the solution that won was using Pipewire. Of course on the Pipewire side, it doesn't even use Wayland code at all. It communicates over Dbus with an xdg-desktop-portal. There are compositor backends for this, but the standard interface for fixing this problem is Dbus, not Wayland.

That's kind of how it goes in the Wayland world. If you want to do something that doesn't fit the vision of the developers, good luck. You'll probably have to come up with some workaround (likely with Dbus) because the Wayland protocols and their various extensions are too limited to do what you need. As a reflection of this, you can see that every compositor implements its own custom protocols to do various things. That's great for them, but as a client it's extremely annoying. Do you want to write a Wayland screenshot tool? Well, enjoy writing a different backend for every single compositor because somehow this still isn't standardized in any way. There is a wayland-protocols upstream, but stuff gets added there at a glacial pace. Often protocol proposals are just bikeshed into oblivion with no end in sight.

Another limitation worth addressing specifically is the inability to turn off vsync. Now people use this term differently depending on the context, but to be precise, in Wayland, only mailbox mode is supported. If a client draws with no regard for vsync, what happens is that the compositor will pick one frame and throw away all the other frames (i.e. like mailbox mode in Vulkan). The reason for doing this is to avoid screen tearing. That's one of Wayland's selling points. Frames are perfect with no screen tearing. That's great, but this also has the side effect of increasing latency. I'm not personally a gamer, so it doesn't matter to me, but there are people that need this. Immediate presentation will always have less latency than the composited case (sans some hypothetical method that no one has actually implemented yet). There are people that want to have that super low latency. Their use case is valid and works fine on Xorg, but to Wayland they don't matter. There's a protocol proposal that allows for tearing updates, but it's over a year old with no progress in sight.
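To make the mailbox behavior concrete, here is a toy model in Python. It is only a sketch of the idea described above (one slot, newest frame wins, everything else is discarded), not how any real compositor is written. The function name, the millisecond timestamps, and the 16 ms refresh period are all assumptions made for illustration.

```python
def mailbox_present(frames, vblanks):
    """Toy model of mailbox presentation.

    frames:  list of (submit_time_ms, frame_id), sorted by submit time
    vblanks: list of vblank times in ms, sorted

    Returns (shown, dropped) where shown is a list of
    (vblank_time, frame_id, latency_ms) and dropped is a list of frame ids
    that were overwritten before any vblank could display them.
    """
    shown = []
    dropped = []
    slot = None  # the single "mailbox" slot holding the newest frame
    i = 0
    for vblank in vblanks:
        # Every frame submitted before this vblank competes for the slot.
        while i < len(frames) and frames[i][0] <= vblank:
            if slot is not None:
                dropped.append(slot[1])  # older frame is thrown away
            slot = frames[i]
            i += 1
        if slot is not None:
            submit_time, frame_id = slot
            # Latency here is only the wait between submission and vblank.
            shown.append((vblank, frame_id, vblank - submit_time))
            slot = None
    return shown, dropped


if __name__ == "__main__":
    # A client that ignores vsync and submits a frame every 5 ms.
    frames = [(t, n) for n, t in enumerate(range(0, 50, 5))]
    shown, dropped = mailbox_present(frames, vblanks=[16, 32, 48])
    print("shown:  ", shown)
    print("dropped:", dropped)
```

In this model, the frame that reaches the screen always waits until the next vblank, which is the extra latency that immediate (tearing) presentation avoids.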

Wayland's render loop design is ridiculous.

This is one of those parts where the limited design of Wayland really hits you in the face. You must design your render loop in a certain way. If you don't, prepare to face the pain. How this works on Wayland is that you get a callback from the compositor (the infamous frame callback), which essentially acts as a hint telling you when to draw. That's cool. The idea is that the compositor tells you when to draw, you obey like a good client, and everything works. Unfortunately, life is not actually that simple. Most compositors throttle rendering by not sending frame callbacks when the client is hidden. Internally in Mesa, a swap interval of 1 on EGL and FIFO mode on Vulkan (AKA basically vsync) work by waiting for frame callbacks. Do you see where this is going? If your client tries to draw while it is hidden, woops it gets indefinitely stalled until you bring it back into view. Well surely the client can just check if it is visible before drawing, right? Actually no, you can't. That's right, clients have no way of knowing if they are hidden or not. It's common to come up with some heuristic to deal with this.

This is the part where Wayland developers will say something like "just draw with the frame callback". If you build a client from the ground up specifically with Wayland in mind, sure, this is easy. But many applications are cross-platform and internally driven. Refactoring a render loop to operate completely differently, just for the sake of one platform that acts like a special snowflake, is not appealing to anybody. At this point, many applications just give up and set the swap interval to 0 or Vulkan to mailbox and call it a day. In mpv, I came up with a weird hack that essentially implements a blocking call with a timeout since we need vsync obviously but blocking forever isn't acceptable. There's a heuristic that guesses when mpv is hidden (been wrong before) and it doesn't draw in that case, so we have the same idealized efficiency, just not exactly in the "Wayland-approved" way.
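The general shape of "block on the frame callback, but only up to a timeout, and treat a timeout as a hint that the window is hidden" can be sketched like this. To be clear, this is not mpv's actual code. It is a minimal Python model in which the class name, the method names, and the use of threading.Event in place of real Wayland event dispatch are all assumptions made for illustration.

```python
import threading


class FrameCallbackWaiter:
    """Toy model: wait for a compositor frame callback without blocking forever."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._callback = threading.Event()
        self.probably_hidden = False  # heuristic guess, can be wrong

    def on_frame_callback(self):
        # In a real client this would run when the compositor signals
        # that it is a good time to draw the next frame.
        self._callback.set()

    def wait_for_frame(self):
        # Block like vsync would, but give up after timeout_s seconds.
        arrived = self._callback.wait(self.timeout_s)
        if arrived:
            self._callback.clear()
        # No callback within the timeout: guess that the surface is hidden.
        self.probably_hidden = not arrived
        return arrived

    def should_draw(self):
        # Skip rendering while the heuristic says nobody can see the window.
        return not self.probably_hidden
```

A render loop built on this would call wait_for_frame() after each swap and check should_draw() before rendering the next frame, so a hidden window costs one timeout per iteration instead of stalling indefinitely.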

In the past, there was actually an externally driven render loop specifically for mpv which worked like how upstream Wayland developers say it should work. It was extremely buggy, lacked features, and was totally brittle. Reverting it was definitely one of the better commits I made. And in retrospect, this actually makes perfect sense. mpv has a crapload of timing code specifically designed to deliver the frame at exactly the right time. It's been battle tested and used over many years. Why would the compositor know when to draw better than mpv? Of course, it wouldn't. And indeed, when I squeezed Wayland's weird rendering design into mpv's internal workflow, frame timings dramatically improved (even before presentation was implemented) and tons of bugs were fixed in the process. The dislike of internally driven render loops is purely driven by ideology, not any actual technically sound reasoning. There's nothing wrong with an application managing how it should render internally. It's a natural choice for any program that operates in a cross-platform manner. Wayland is the only platform like this, and it makes operating in this way needlessly difficult for no particular reason. We can't "waste" frames but never mind that you can just set the swap interval to 0 and belt it away anyway.

Wayland's Mesa implementations are leagues behind Xorg's.

I don't like speculating much about people's thoughts, but it feels like Wayland's frame callback notion was meant to be a centerpiece of how the whole thing operated. You could technically ignore it, but it's a central part of how compositors and Mesa operate, so as a client developer you probably have to deal with it. Instead of being a helpful hint, it's mostly just annoying unfortunately. The frame callback mechanism/workflow has several obvious limitations and one needs to look no further than Mesa itself.

Both the EGL and Vulkan Mesa implementations are, quite frankly, bugged and lacking when compared to their Xorg DRI3 counterparts. In EGL's case, the spec isn't violated, but swap intervals greater than 1 are completely broken. Now in practice, no one probably uses anything other than 0 or 1, but the fact that the implementation is that inflexible speaks volumes. This is simply because of how the frame callback works and indefinitely blocks on the swap buffers call. There's no way to tell how many vsyncs have passed.

The situation on Vulkan is more dire. The indefinite blocking behavior outright violates the Vulkan spec. Giving a timeout to vkAcquireNextImageKHR does nothing in practice because the blocking is done in vkQueuePresentKHR. Only two presentation modes on Wayland actually work: FIFO (well, this works by breaking the spec) and mailbox. Additionally, radv currently has 4 swapchain images for seemingly no actual reason, which causes input lag. amdvlk apparently uses 2 somehow, which admittedly doesn't make sense to me. I would expect 3. This last one may get fixed soon since it's simple enough, but it's just an example of how far behind the implementation is.

In contrast, the Xorg code in Mesa is dramatically more sophisticated and mature. Of course, there's no weird opinionated blocking behavior (no one is crazy enough to do that besides Wayland anyway), but there are also way more supported features. Xorg in Mesa makes heavy use of the Present extension, which allows it to specify exactly when the pixmap should be submitted as well as to operate on fences. It all works via xcb, which is appropriately asynchronous and gives Mesa basically full range to do anything. Not surprisingly, Xorg supports swap intervals greater than 1 (admittedly, this might get weird if you submit a huge number) as well as all four Vulkan presentation modes.

The Wayland situation is not all doom. One glaringly obvious improvement would be to simply use the presentation time protocol instead of frame callbacks. This is not as good as Xorg's Present extension since it does not allow for a way to schedule at a specific msc nor have any support for fencing, but it would allow you to get rid of the insane blocking behavior and improve other things. I'm not particularly motivated to do it, but in a few weekends, I could definitely improve the current implementations. It's not because I'm some genius, but rather that the current behavior is so bad that anyone with some experience with graphics APIs could fix it. However, the real burning question I have is simply: why is this still half-assed? You don't need to be an expert to tell that Xorg's EGL and Vulkan implementations are vastly superior on a technical level. This isn't just some random program. This is Mesa. We all (minus Nvidia users of course) use it either directly or indirectly every day.

An implementation like this is OK as an initial/first pass. You want to just draw some things on the screen and aren't worried about the advanced details just yet. But clearly, there is a laundry list of things that need to be fixed in the back of your mind. How is it still in this state after all these years? I don't know exactly the date when Wayland support in Mesa landed, but it has to be like 10+ years at this point. This just isn't acceptable. Especially not when you consider that there are paid developers that work on this.

Wayland itself has bad core decisions.

One of the selling points of Wayland is that it's a clean protocol designed for the modern age. There's no legacy baggage to deal with and since the core of it is good, it can easily be extended for whatever new things arrive in the future. Unfortunately, that's simply not true. Wayland's core protocol has made many bad decisions that we are all still dealing with today. Any legacy stuff about X11 can be reasonably excused with "well they came up with it in the 80s". Wayland was designed in 2008, and it's already failed to correctly predict several use cases.

The big and obvious mistake to point out is fractional scaling. For some reason unknown to me, the Wayland protocol only supports integer scale values. To be frank, this is asinine and everyone pays the price for it. As higher resolution displays became common, users naturally wanted to scale the display to fractional values (1.5 and so on). Because telling users "you can't do this" to something as basic as this was a non-starter, all compositors implement a hack for this. They tell clients to scale up to the next integer and then the compositor downscales the result to the correct size. So in the case of 1.5x scaling, clients are sent a scale value of 2 so they paint at 2x the resolution. Then, the compositor scales that down by a factor of 0.75 to reach 1.5x. This is just, to be frank, incredibly stupid and wasteful. Clients (such as mpv under heavier settings) unnecessarily tax the GPU and then the end result is worse anyway. With text rendering in particular, it's noticeably more blurry.
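The cost of this workaround is easy to put into numbers. The sketch below assumes the compositor advertises the next integer up (the ceiling of the fractional scale) as the buffer scale. The function name and the example surface size are made up for illustration.

```python
import math


def integer_scale_cost(logical_w, logical_h, scale):
    """Compare what a client must render under integer-only scaling
    with what it would render if it knew the real fractional scale.

    Returns (buffer_scale, rendered_size, ideal_size, overdraw) where
    overdraw is the ratio of pixels rendered to pixels actually needed.
    """
    # Assumption: the compositor rounds the fractional scale up.
    buffer_scale = math.ceil(scale)
    rendered = (logical_w * buffer_scale, logical_h * buffer_scale)
    ideal = (round(logical_w * scale), round(logical_h * scale))
    overdraw = (rendered[0] * rendered[1]) / (ideal[0] * ideal[1])
    return buffer_scale, rendered, ideal, overdraw


if __name__ == "__main__":
    # A 1280x720 logical surface on an output scaled by 1.25.
    print(integer_scale_cost(1280, 720, 1.25))
```

For a 1280x720 logical surface at 1.25x, the client paints 2560x1440 where 1600x900 would have been enough, which is 2.56 times as many pixels, and the compositor then has to resample the result.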

So how does Xorg handle this? Well, you can just set DPI globally or use DPI values per monitor via RANDR, so actually it works just fine (GTK is notably broken on purpose). Clients can calculate exactly what the physical pixels are and can make the correct decisions while rendering. Why on earth wasn't the scale value just a wl_fixed_t to begin with? Who knows. This unfortunate choice greatly impacted the design of all compositors, however, which all operate in logical pixels. I'm not sure why this is the case, but the developers are extremely opposed to transforming to physical pixels inside the compositor. This strikes me as more ideologically motivated than anything technical. Instead of just deprecating the old buffer scale value and adding a new value, the leading proposal is to introduce a new fractional scale protocol and use a hack with viewporter. Presumably this one will be adopted sometime in the nearish future (wayland-protocols is slow as hell though). The fact that it took them almost 14 years to finally get around to fixing this is not encouraging, and there's no guarantee that this approach will not have some kind of unforeseen pitfall given how odd it is.
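For context, wl_fixed_t is Wayland's signed 24.8 fixed-point type: an integer whose low 8 bits hold the fractional part. The Python sketch below only illustrates the arithmetic. It is not libwayland's implementation, and the function names are made up for this example.

```python
FRACTION_BITS = 8  # 24.8 fixed point: 8 bits of fraction


def fixed_from_double(value):
    """Encode a float as a 24.8 fixed-point integer (nearest step)."""
    return round(value * (1 << FRACTION_BITS))


def fixed_to_double(fixed):
    """Decode a 24.8 fixed-point integer back into a float."""
    return fixed / (1 << FRACTION_BITS)


if __name__ == "__main__":
    for scale in (1.25, 1.5, 1.75, 1.1):
        encoded = fixed_from_double(scale)
        print(scale, "->", encoded, "->", fixed_to_double(encoded))
```

Common scale factors such as 1.25, 1.5, and 1.75 are multiples of 1/256 and survive the round trip exactly, while a value like 1.1 is stored as the nearest step (282/256).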

Was it really worth it?

Xorg is by no means perfect, but it has one big thing going in its favor: it works. Yeah, multi-monitor stuff is a pain. I'll even concede that multi-monitor VRR is currently broken in Xorg but works in (some) Wayland compositors. Note that moving the mouse amusingly breaks VRR in several Wayland compositors anyway. Developers, some paid and others unpaid, put years of effort into working on Wayland. Here we are, nearly 14 years into this mess, and Wayland is still leagues behind Xorg in several different ways. It's not just a matter of "oh we have a few bugs to fix". The Wayland ecosystem is woefully inferior overall, and there's really no sign of that changing anytime soon. While I'm sure Wayland works great on someone's smart TV, it's still not ready for the Linux desktop. We were told all along that Xorg is so bad and terrible that it needed to be started from scratch, but at this point people need to be looking in the mirror and asking questions. If those 14 years of effort had instead been focused solely on improving Xorg, what would the result be? Surely, much more tangible results would have been gained at the end of the day.

One negative point that Xorg does have is that overall developer activity is much lower. For now, that doesn't matter too much. There are some things that would be nice (like the multi-monitor VRR mentioned earlier), but given how far behind Wayland is anyway, Xorg could have no development for another 10 years and still be more functional. That said, who knows what the future holds. Maybe Wayland will manage to get a decent HDR solution that outstrips what's currently available on Xorg (i.e. nothing). There is at least a protocol in the works for this, although it has been stuck in development hell. In any case, I am fairly convinced at this point that the push for Wayland on the desktop was ultimately a wasted effort. It would have been better to fix whatever was lacking about Xorg. The most likely reality I see is that applications will simply support both X11 and Wayland, well, forever. Nobody is going to drop their X11 code because so many users are still going to be using it. Perhaps someone in the future will come up with a new, actually good display server that will be the thing to switch to. For now, Xorg isn't going anywhere anytime soon no matter what the Wayland advocates may claim.