Developing a Cloud Rendering Platform
Developing a full-scale platform can be a daunting task. What language is best for your needs, and what services should make up this platform? At what point should you justify a release, and what features should you save for future versions? If your platform is a solo project, what should you do to keep maintenance minimal, and how can you allot the required time for feature development? These were the questions I was asking myself when developing the Barista Cloud Rendering platform. Let me cover those questions, and the answers I have relied on ever since.
The Inception of the Barista Platform
The idea for the platform came from a need to use cloud computing for 3D rendering with Blender. While this existed on a small scale when I started developing my cloud rendering platform, automation and control did not. In addition, there were no solutions to help reduce costs, even though there were plenty of opportunities to do so. The idea was to offer a platform that removed the setup, gave users the power they would have when rendering manually, and kept the cost minimal.
The platform required at least two parts: a client and the cloud rendering instance. While the two would work on their own, there needed to be another part to handle the environment setup, as well as a custom OS to streamline the process. In the end, Barista ended up with four parts: a client, a custom Linux distribution, a server-side agent, and a log server.
A Quick Introduction to Barista
The Barista cloud-rendering platform allows users of Blender 3D to render their projects on AWS. It consists of a local client, a Linux distribution, a server-side agent, and its log server. The client lets the user interface with AWS to place their project on S3, choose EC2 servers, and choose settings for their project. When the user initiates the render, the client starts the AWS instances and sends all of the settings to the agent on the running instances. The agent takes those settings, prepares the rendering environment, and starts the render. The log server returns its results to the client, and the agent performs render cleanup and uploads the renders to S3. Once everything is complete, the agent shuts down the servers.
Developing a Cloud Rendering Platform and its Frameworks
Before starting any development, it was important to decide the frameworks and languages to use for rapid development. What would allow for easy creation and maintenance? The original plan was to write the client in a dynamic language, but to ship it as a local application so the client software could work independently. Redux had just been released, and its compositional style showed many benefits for developing a cloud rendering platform. React gave the client a snappy, reactive front-end and avoided annoying DOM reloads. Redux enforced declarative composition while helping to avoid side effects. Lastly, the client front-end consisted of a single bundle served on an Electron base. This gave users a local copy and avoided the need for a remote server to host the app for each client.
The agent handles the environment setup, so I could use synchronous processing. It needed to process tasks in a specific order and return results, while staying easy to maintain. For this reason Ruby was the language of choice, with a subprocess to handle logs.
The OS was a simple decision: Debian Linux as a base, with all required libraries and GPU drivers installed and Xorg set up. This way, the only files the user had to provide were the ones the platform needed from them. This included Blender add-ons, custom Blender builds, and of course their project.
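To illustrate the "specific order, return results" idea, here is a minimal sketch of a synchronous step runner. The real agent was written in Ruby; this sketch uses TypeScript to match the client stack, and the step names and the `runInOrder` helper are hypothetical, not taken from Barista's code.

```typescript
// Hypothetical types: the real agent's steps and result format are not shown in this post.
type Step = { label: string; run: () => string };
type StepResult = { label: string; ok: boolean; detail: string };

// Runs each step in order and records its result.
// Stops at the first failure, since later steps depend on earlier ones.
function runInOrder(steps: Step[]): StepResult[] {
  const results: StepResult[] = [];
  for (const step of steps) {
    try {
      results.push({ label: step.label, ok: true, detail: step.run() });
    } catch (err) {
      const detail = err instanceof Error ? err.message : String(err);
      results.push({ label: step.label, ok: false, detail });
      break;
    }
  }
  return results;
}

// Illustrative run: the render step fails, so the upload step never starts.
const stepResults = runInOrder([
  { label: "prepare-environment", run: () => "environment ready" },
  { label: "render", run: () => { throw new Error("render failed"); } },
  { label: "upload", run: () => "uploaded" },
]);
console.log(stepResults.map((r) => `${r.label}: ${r.ok ? "ok" : r.detail}`));
```

Because each step runs to completion before the next one begins, there is no scheduling or callback logic to maintain, which is the main appeal of the synchronous approach.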
Developing a Cloud Rendering Platform
Taking precautions to reduce development needs is incredibly important. One obvious but major benefit comes from locking libraries to minor releases, so updates would not include potentially breaking changes. Another was covering all of the logic with tests, which kept each section easy to verify. The last point was to try to keep everything as simple as possible. The development of a platform like Barista required every shortcut I could give it.
The first release was decent but, like most apps, looked completely different from the version that came years later. A simplified process is key, so that a fluid UX can avoid any confusion. You also cannot rely on users to read documentation, or even the error messages you put in front of them. So making the flow intuitive is by far the most important thing.
As an example, in the first release of Barista, users had to spin up their own instances from the client, which offered both EC2 spot (cheap) and on-demand (expensive) instances. Since the client required a lot of information for spot instances, users would flock to on-demand instances because they were easier to understand. Removing on-demand instances and automating spot instances made the process much easier than in the first release. Taking these options away from the user’s immediate view removed a large roadblock, allowing focus on other areas.
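As a rough illustration of what locking to a minor release means, the sketch below checks whether a candidate version stays inside the pinned minor line. This post does not say which mechanism Barista used; with npm, a tilde range such as `~1.4.2` in `package.json` expresses the same rule (at least 1.4.2, below 1.5.0). The function names here are hypothetical.

```typescript
// Parses a plain "x.y.z" version; pre-release tags are out of scope for this sketch.
function parseVersion(text: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (!match) throw new Error(`not a plain x.y.z version: ${text}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// True when `candidate` is a patch-level update of `pinned`:
// same major, same minor, and a patch number that is not older.
function withinMinor(pinned: string, candidate: string): boolean {
  const [pMajor, pMinor, pPatch] = parseVersion(pinned);
  const [cMajor, cMinor, cPatch] = parseVersion(candidate);
  return cMajor === pMajor && cMinor === pMinor && cPatch >= pPatch;
}

console.log(withinMinor("1.4.2", "1.4.9")); // patch update inside the 1.4 line
console.log(withinMinor("1.4.2", "1.5.0")); // new minor release, outside the pin
```

Patch releases inside the pinned line are accepted, while a new minor or major release has to be adopted deliberately.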
Like any first release, Barista’s had many bugs, but thanks to the declarative style, it rarely impeded the use of the client itself. Often a bug consisted of issues with settings and the agent, or a feature issue with state. With many ways to avoid such issues, another major release focused on making sure everything worked as intended without issue.
Since feature development was the main focus for the first two years, bug fixing consisted mostly of quick fixes, which often resolved the issue but at times allowed an underlying issue to surface. To combat potential problems, I added type-checking to avoid issues with type-casting. While the front-end React components already used React’s type-checking system, the Redux logic was lacking types. By adding types, I was able to verify that each part of the state object used the same type throughout the entire process, removing any side-effects. In many cases, it put a spotlight on functions that were casting types on purpose. With tests written for every Redux action and reducer, bugs have since been non-existent.
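A minimal sketch of what typed reducer logic can look like is shown below, written as a plain TypeScript function so it runs without the Redux library. The state fields and action names are hypothetical, and this post does not say which type system Barista used for its Redux code.

```typescript
// Hypothetical state shape: every field has one fixed type for its whole lifetime.
interface RenderState {
  readonly projectKey: string | null; // S3 key of the uploaded project, if any
  readonly frameStart: number;
  readonly frameEnd: number;
  readonly instanceCount: number;
}

// Each action declares exactly which typed payload it carries.
type RenderAction =
  | { type: "project/selected"; projectKey: string }
  | { type: "frames/changed"; frameStart: number; frameEnd: number }
  | { type: "instances/changed"; instanceCount: number };

const initialState: RenderState = {
  projectKey: null,
  frameStart: 1,
  frameEnd: 1,
  instanceCount: 1,
};

// Pure reducer: returns a new state object and never mutates its input.
function renderReducer(state: RenderState, action: RenderAction): RenderState {
  switch (action.type) {
    case "project/selected":
      return { ...state, projectKey: action.projectKey };
    case "frames/changed":
      return { ...state, frameStart: action.frameStart, frameEnd: action.frameEnd };
    case "instances/changed":
      return { ...state, instanceCount: action.instanceCount };
    default:
      return state;
  }
}

const nextState = renderReducer(initialState, {
  type: "frames/changed",
  frameStart: 1,
  frameEnd: 250,
});
console.log(nextState.frameEnd);
```

With these types in place, dispatching `frameEnd: "250"` as a string fails at compile time instead of quietly changing the type held in state, and because the reducer is a pure function, a test only needs to compare its input and output.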
Avoiding Support Rabbit Holes
In addition to simplifying the UX, avoiding potentially complex features helps with user support, which is important in developing a cloud rendering platform. Early on, I was developing a custom AWS instance image setting so users could create their own OS image. Since this feature could lead to detailed support requests about creating images, I decided to remove it prior to release. Features that lead to complex support requests lead to longer support times and, later, shorter development periods. For that reason, avoiding complex features and trading them for automated solutions was important.
This was another reason for the removal of on-demand instances. With spot requests being the preferred instance type, I found myself spending many hours explaining to users how these instances worked. Once they became the default, I no longer had to explain this feature.
Long Term Support
With this platform depending on another product, specifically Blender, long-term support has been a challenge. During the first two years of Barista’s development, it was not entirely apparent to me how much I needed to account for changes in Blender itself. Nothing changed for many years, but once Blender hit version 2.8, many changes were flying under the radar. Since then, most issues have revolved around changes in the way the render system worked, or how the scripting system was meant to work. While these were not drastic changes, most were side effects of various subsystem rewrites. Because of this, it was imperative to offer some layer of customization in how the application was started. This way users could start new builds manually, without requiring an update to the Barista client or image.
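One way to picture that customization layer is a launch command assembled from user-editable settings. The sketch below is an assumption about how this could look, not Barista's actual code: the `LaunchSettings` shape, the `buildBlenderArgs` helper, and the example paths are hypothetical. The Blender command-line flags themselves are real: `-b` (background), `-o` (output path), `-s` and `-e` (start and end frame), `-E` (render engine), and `-a` (render animation).

```typescript
// Hypothetical settings shape; only the Blender flags are real.
interface LaunchSettings {
  blenderPath: string;   // user-supplied executable, e.g. a custom build
  projectFile: string;   // .blend file to render
  outputPattern: string; // value passed to Blender's -o flag
  frameStart: number;
  frameEnd: number;
  extraArgs?: string[];  // user-supplied arguments for newer builds
}

// Blender handles its arguments in the order given, so every option
// has to appear before "-a", which starts the animation render.
function buildBlenderArgs(s: LaunchSettings): string[] {
  return [
    s.blenderPath,
    "-b", s.projectFile,
    "-o", s.outputPattern,
    "-s", String(s.frameStart),
    "-e", String(s.frameEnd),
    ...(s.extraArgs ?? []),
    "-a",
  ];
}

const launchArgs = buildBlenderArgs({
  blenderPath: "/opt/blender-custom/blender", // hypothetical install location
  projectFile: "scene.blend",
  outputPattern: "//render_",
  frameStart: 1,
  frameEnd: 250,
  extraArgs: ["-E", "CYCLES"],
});
console.log(launchArgs.join(" "));
```

Keeping the executable path and extra arguments in settings means a new Blender build, or a changed flag, can be handled by the user without shipping a new client or image.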
I like to think I have shown that it’s possible to handle large projects as a solo developer, so I hope this helps anyone else looking to do the same or similar. While it is not a simple process, I hope this shows that various methods can help make sure it’s not a complete nightmare. This is the first time I have mentioned any of my side-projects by name in my technical blog, but you can find many articles related to the development of this particular platform here.