Ensuring Perl's Viability on FreeBSD: A NYCBUG-NY.PM Collaboration

Authors

James E Keenan (jkeenan@cpan.org)

Andrew Villano

Location

    New York City BSD User Group / New York Perlmongers
    November 07 2018

Synopsis

How have NYCBUG and New York Perlmongers collaborated to ensure the continued viability of Perl 5 on FreeBSD?

Introduction

Tonight Andrew Villano and I report on the progress of a collaboration between the New York City BSD User Group (NYCBUG) and New York Perlmongers (NY.pm) to ensure the continued viability of the Perl 5 programming language and ecosystem on the FreeBSD operating system. We'll consider:

The Perl 5 Core Distribution and Development Process

When we speak of "Perl" tonight we're talking about Perl 5, the version of the language introduced by Larry Wall in 1994 and under continuous development ever since. Perl's development takes place on an annual cycle. Every year a new production version of Perl is released in late spring. The current production version is perl-5.28.0, released in June of this year. The fact that the 28 in the middle of that version is an even number indicates a production version. With the release of 5.28.0 we began a new annual development cycle in which we issue monthly development releases whose middle number is the next higher odd number: 5.29.0 came out in July, 5.29.1 in August, and so forth -- so that we're now up to 5.29.4. On a daily basis changes to the source code are made to a git repository housed at perl.org. Once a month those changes are rolled up into a tarball for the next development release of the Perl 5 core distribution.
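The even/odd convention is easy to check mechanically. The following is a minimal sketch in POSIX shell; the function name perl_release_type is made up for illustration and is not part of any Perl tooling.

```shell
#!/bin/sh
# Hypothetical helper: classify a Perl 5 version string by the parity of its
# middle number (even = production, odd = development).
perl_release_type() {
    # Strip an optional "perl-" prefix, then take the second dotted field.
    minor=$(printf '%s\n' "${1#perl-}" | cut -d. -f2)
    if [ $((minor % 2)) -eq 0 ]; then
        echo production
    else
        echo development
    fi
}

perl_release_type perl-5.28.0   # prints "production"
perl_release_type 5.29.4        # prints "development"
```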

When we speak of the "Perl 5 core distribution" we mean the source code required to build the perl executable plus a set of libraries -- usually referred to as modules -- that are shipped with the core distribution because they are essential or highly useful for use of Perl in production environments. If, say, you are a sysadmin who primarily uses Perl to automate system administration tasks, chances are that almost all the functionality you need can be found in the core distribution.

Testing the Perl 5 Core Distribution

The Perl 5 core distribution comes with a test suite that has been built up over a 24-year period. People who have a commit bit to the core distribution are expected to run the test suite on their local machines before pushing commits to the repository. That provides a basic proof that the code changes do what they claim to do and do no harm to the rest of the core distribution -- at least on that committer's platform.

But Perl 5 has been ported to over 100 different platforms over the course of its lifetime. How do we guarantee that changes in the core distribution work on those platforms? The answer is smoke-testing. A network of volunteers maintains machines -- mostly virtual machines, in all likelihood -- which are set up with different operating systems, different versions of those operating systems and a variety of C compilers on those different OS versions. Those volunteers then listen for updates to the core distribution and run the test suite for different permutations of OS, OS version, C compiler and perl configuration options. The results are transmitted to a central website, test-smoke.org. The code providing this functionality is largely found in the Test-Smoke library on CPAN -- a library which some of the best Perl hackers in the world have been working on at our Perl QA Hackathons since 2002.

If you were to turn the clock back two-and-a-half years and go to this website, you would see that the overwhelming majority of smoke-test reports we were receiving were generated on Linux. More specifically, you would see that none had been received from FreeBSD since the current version of the website first appeared in May 2011. We knew from other sources that our annual production releases were passing all their tests on various versions of FreeBSD. But we lacked data as to whether our monthly development releases -- much less individual git commits -- were working or not.

I set out to tackle this problem by learning how to install virtual machines on my Ubuntu Linux laptop. That meant learning VirtualBox, VMware and Vagrant -- non-trivial tasks -- but eventually I was able to install a FreeBSD-10.3 VM on my laptop, install Test-Smoke and generate my first Perl 5 smoke-test report on FreeBSD. Happily, the grade on my first smoke-test was PASS.

When FreeBSD-11.0 came out later in 2016 I set up a separate VM for that version on my laptop and began smoke-testing the core distribution there. Unfortunately, my first smoke-test report on FreeBSD-11 received a grade of FAIL. The tests that failed dealt with locale-related code which had been added to the core distribution in the five months between the May 2016 production release of perl-5.24 and the date of my first smoke-test on FreeBSD-11 in October 2016. Those same tests were PASSing on Linux and, for the most part, on FreeBSD-10.3 during those five months. But since we weren't getting smoke tests on FreeBSD-11 during that period, we had no idea we were "breaking" Perl on that OS version. It took three months of collaboration between Karl Williamson, Perl's locales expert, and myself to get all our tests to steadily PASS on FreeBSD-11.

Smoke-testing the Perl 5 core distribution over the course of an annual development cycle is therefore crucial for averting bugs in the annual production release.

CPAN: The Perl 5 Ecosystem

The Perl core distribution is, however, a relatively small part of the overall Perl ecosphere. That ecosphere largely consists of the open source modules found on CPAN -- the Comprehensive Perl Archive Network, an archive founded in 1995 and now consisting of 175,000 modules in 39,000 distributions contributed by more than 13,000 authors. Many people consider CPAN to be Perl's true "killer app".

The usefulness of any given CPAN module depends, however, not just on its own functionality but on whether it can be used with different versions of perl and on different operating systems. For nearly two decades the CPANtesters project has provided a way to determine just that. If, for example, I want to see whether my CPAN module List::Compare works on FreeBSD, I can enter data into a simple web form and get the results.

The Perl 5 Development Process in Relation to CPAN

When I ask, "Can a given CPAN module work with different versions of perl on FreeBSD?", I can refer not only to past production releases of perl but to future versions as well -- at least insofar as a "future" version of perl is reflected in the latest monthly development release. This enables us to ask an important quality-assurance question: "Do the changes we've made in the core distribution since our last production release 'break' any CPAN modules?" If they do, then we need to weigh the benefits of changes to the core distribution against the potential disruption to users of such CPAN modules.

In the Perl 5 core repository, the main development branch is known, for arcane historical reasons, as blead rather than, say, master. So among Perl developers this question is often referred to in shorthand as "Does blead break CPAN?" If so, to what extent? Under which configurations and on which operating systems? Is the cause of the breakage found solely within changes in the core distribution? Or have changes in the core exposed flaws in the code of a "broken" CPAN module? How do we address such breakage?

Over the past five years the Perl community has addressed these questions by formulating a concept called the "CPAN river". CPAN modules depend upon the core distribution, but they also depend on other CPAN modules. Imagine the core distribution as a stream which rises high in the mountains. As it flows down to the sea, other streams -- those are the modules on CPAN -- feed into it and it becomes a mighty river. Eventually the river reaches the sea -- which is all the Perl code in production all over the world.

But now imagine that there's pollution "upstream" or that a dam is built which impedes the flow of the CPAN river. Then all the "downstream" users suffer. What that suggests is that if we want to find out whether blead has broken CPAN, we start by testing modules "high upstream" on the CPAN river against Perl monthly development releases, then proceed downstream from there. It's probably not feasible to test all 39,000 CPAN distributions against blead, but if we test a large sample of them we're likely to get a good picture of the impact which the perl of the future will have on the CPAN of the present.

And that is precisely what we have done at NYCBUG.
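"Upstream first" can be pictured as a topological sort of a dependency graph. The sketch below uses the standard tsort utility to show the idea; the distribution names are placeholders, and this is not a description of how the project itself computes its ordering.

```shell
#!/bin/sh
# Each input line is "prerequisite dependent". The names are placeholders,
# not real CPAN distributions.
river_order() {
    tsort <<'EOF'
perl-core Upstream-Dist
Upstream-Dist Midstream-Dist
Midstream-Dist Downstream-Dist
EOF
}

# Prints the names one per line, prerequisites before their dependents.
river_order
```

Testing in the printed order means that a failure high upstream is seen before the many downstream distributions that would fail because of it.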

The NYCBUG-NY.pm collaboration

From the fall of 2017 into the spring of 2018, I was using my Debian Linux Linode to test 1000 CPAN modules at the top of the CPAN river against Perl 5 monthly development releases. I posted that data on the Perl 5 Porters mailing list and it was used to evaluate our readiness for the perl-5.28.0 production release. However, I came to feel (i) that 1000 was too small a sample; and (ii) that running this QA exercise just on Linux was likely to hide problems which could occur on other operating systems. I decided to test monthly development releases against the 3000 CPAN distributions highest upstream on the CPAN river and to do so on FreeBSD. I further recognized that the FreeBSD VMs sitting on my laptop were inadequate for this task and that I needed to collaborate with people who knew more about system management and VM management than I ever would.

So early this year I approached the NYCBUG Admin team about securing server space in the NYCBUG rack at NYInternet. George Rosamond thought this was a good idea and Mark Saad came forward to be the point person for NYCBUG on this collaboration. Mark installed a server whose hostname is perlmonger.nycbug.org large enough to hold a variety of virtual machines that could be configured for a variety of tasks with different versions of FreeBSD, OpenBSD and so forth. I spoke about this project at the March technical meeting of New York Perlmongers (NY.pm). Andrew Villano, an experienced system administrator on Windows and Linux, was in attendance that evening. He was eager to learn BSD sysadmin skills and so he stepped forward to help. We quickly realized that the FreeBSD VMs you get "off the rack" from Vagrant's website were too small to hold the data that would accumulate over the course of an annual development cycle. Andrew figured out how to enlarge a VM; I'm now going to ask him to describe that process.

Setting Up and Enlarging a FreeBSD VM on a FreeBSD Host

Most Vagrant images are purposely kept small for portability. We want to enlarge our virtual machine so that it can accommodate more storage. First, we log in to the FreeBSD host via ssh and su to a shared user. We created a shared user because we found it difficult to share Vagrantfiles among multiple users. We cd to the directory in which we keep the Vagrantfile. At this point we have to stop the virtual machine, as you cannot do an online resize -- issue vagrant halt to do that. Now we cd to the directory of the VirtualBox VM image; let's assume the path is /home/vmuser/Virtualbox/VMs/mybox_default_12345678. Next, confirm the location of the virtual machine's disk file with this command:

    VBoxManage showvminfo mybox_default_12345678 | grep ".vmdk"

Another issue we'll run into is that .vmdk files cannot be resized; they must be converted to .vdi files first. A VMDK or VDI file is a file that contains the VirtualBox VM image used by Vagrant. It is analogous to an ISO for a CD. Staying in the same directory (and assuming your .vmdk file is named box-disk1.vmdk), run the command:

    VBoxManage clonehd "box-disk1.vmdk" "clone-disk1.vdi" --format vdi

Let's verify that the conversion was successful and check the size of the new file with the command:

    VBoxManage showhdinfo "clone-disk1.vdi"

We're finally going to resize the virtual machine's disk. Keeping in mind that the size is given in megabytes, the following example resizes the disk to 100 gigabytes:

    VBoxManage modifyhd "clone-disk1.vdi" --resize 102400
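
The 102400 figure is simply 100 multiplied by 1024. If you want a different target size, a small helper (the name gb_to_mb is made up for illustration) saves doing the arithmetic by hand:

```shell
#!/bin/sh
# Hypothetical helper: convert a size in gigabytes to the megabyte figure
# that the --resize option expects (1 GB counted as 1024 MB).
gb_to_mb() {
    echo $(( $1 * 1024 ))
}

gb_to_mb 100   # prints 102400
```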

Let's confirm the name of the storage controller we're going to be attaching the .vdi file to with the command:

    VBoxManage showvminfo mybox_default_12345678 | grep "Storage"

Assuming the name of the storage controller retrieved in the previous step is "SATA Controller", we will attach the new .vdi file to the VM via:

    VBoxManage storageattach mybox_default_12345678 \
        --storagectl "SATA Controller" \
        --port 0 \
        --device 0 \
        --type hdd \
        --medium clone-disk1.vdi

Let's bring the VM back up: cd back to the directory of the Vagrantfile, issue the command vagrant up, and then log into the VM with the command vagrant ssh.
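To review the whole sequence before touching a real VM, it can help to print the commands rather than run them. The following is a sketch only: print_resize_steps is a made-up name, the arguments are the example values used in this section, and the vagrant commands must still be issued from the directory of the Vagrantfile and the VBoxManage commands from the directory of the disk image.

```shell
#!/bin/sh
# Prints the resize sequence described in this section without executing it.
# Arguments: VM name, existing .vmdk, new .vdi, new size in MB, controller.
print_resize_steps() {
    vm=$1 old_disk=$2 new_disk=$3 size_mb=$4 controller=$5
    cat <<EOF
vagrant halt
VBoxManage clonehd "$old_disk" "$new_disk" --format vdi
VBoxManage showhdinfo "$new_disk"
VBoxManage modifyhd "$new_disk" --resize $size_mb
VBoxManage storageattach $vm --storagectl "$controller" --port 0 --device 0 --type hdd --medium $new_disk
vagrant up
EOF
}

print_resize_steps mybox_default_12345678 box-disk1.vmdk clone-disk1.vdi \
    102400 "SATA Controller"
```

Check the VM name and controller name against your own showvminfo output before running any of the printed commands.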

Logical Volume Management (LVM)

Logical Volume Management (LVM) is a method of utilizing storage that is often more convenient than conventional disk partitioning schemes. LVM is able to use disks or partitions (Physical Volumes) and pool them into groups (Volume Groups) which can be further divided up into smaller logical groups (Logical Volumes). LVM has the benefit that it can be modified online and, if designed properly, can be fault tolerant.

FreeBSD does not install any Logical Volume Management (LVM) by default. However, should you use an OS that does, you would use the requisite pvcreate, vgextend, lvextend and resize2fs commands.

On FreeBSD we install the firstboot-growfs package because we cannot run growfs while the filesystems are mounted, and we cannot drop to another runlevel from Vagrant because that would terminate our network connectivity. This package automatically performs a growfs on the next boot. growfs resizes the filesystem much as resize2fs would on Linux. It scans the disk, examines the existing partition layout and, if given no options, expands the filesystem to the size of the partition, utilizing all available space.

Runlevels

Runlevels are modes of operation that exist in *NIX operating systems. Each runlevel exists for a distinct purpose.

In the directory of the Vagrantfile, run

    vagrant reload --provision

to kick off the growfs process. If everything succeeds, then after the next boot a df -k command should show a large increase in storage on your primary volume.
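Since df -k reports sizes in 1K blocks, a short filter makes the result easier to read. This is a minimal sketch: fs_size_gb is a made-up name, and it assumes the usual df column layout with the total size in the second column and the mount point in the last.

```shell
#!/bin/sh
# Hypothetical helper: read `df -k` output on stdin and print the total size,
# in whole gigabytes, of the filesystem mounted at the given mount point.
fs_size_gb() {
    awk -v mp="$1" 'NR > 1 && $NF == mp { print int($2 / 1024 / 1024) }'
}

# Usage (the -P flag asks df for one line per filesystem):
#   df -kP / | fs_size_gb /
```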

Testing Perl monthly development releases against CPAN on FreeBSD

Once we had a properly sized FreeBSD VM sitting on the host, we had to prepare it for use in our QA process. If you're trying to install 3000 CPAN modules, chances are that many of them have external dependencies -- mainly C libraries which would have to be installed with the FreeBSD pkg utility. We conducted several dry-runs to learn the scope of packages that we needed to install. We added those packages to the Vagrantfile we used to govern the VM, as well as adding the CPAN modules which we needed to run the test-against-dev program. We installed a crontab entry which runs daily and listens for a new Perl 5 monthly developmental release.
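The talk does not spell out how the daily job decides that a release is new. One plausible approach, sketched here with made-up function names, is to compare the candidate version against the last version tested:

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the test-against-dev program.

# Turn "5.29.4" into a single comparable integer (5029004).
version_key() {
    echo "$1" | awk -F. '{ print $1 * 1000000 + $2 * 1000 + $3 }'
}

# Succeeds (exit status 0) when the first version is newer than the second.
is_newer_release() {
    [ "$(version_key "$1")" -gt "$(version_key "$2")" ]
}

if is_newer_release 5.29.4 5.29.3; then
    echo "new development release: start a test run"
fi
```

A daily cron job could call such a check and exit quietly on the days when nothing new has appeared.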

Once a month, generally soon after the 20th, the program downloads a tarball of a monthly development release, installs it, then works through a list of 3000 CPAN modules, trying to install them in dependency order. When we're done, we parse the installation log to write the results of that installation to a JSON file for each module. Then we tabulate the results into a monthly pipe-separated-values file, which is in turn appended to a master PSV file tallying the results of this year's annual development cycle. We store the results in our GitHub repository and run certain analytics programs whose output we report to the Perl 5 Porters mailing list.
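To give a feel for what working with a pipe-separated-values file looks like, here is a minimal sketch that tallies grades. The two-column layout (distribution, then grade) and the sample rows are assumptions made for illustration; they are not the project's actual schema or data.

```shell
#!/bin/sh
# Hypothetical tally: read a PSV file on stdin (header line, then one
# "distribution|grade" record per line) and print "grade|count" lines.
tally_grades() {
    awk -F'|' 'NR > 1 { count[$2]++ }
               END { for (g in count) print g "|" count[g] }' | sort
}

# Invented sample input with placeholder distribution names.
printf 'dist|grade\nDist-A|PASS\nDist-B|FAIL\nDist-C|PASS\n' | tally_grades
```

Comparing such tallies from one month to the next is one simple way to spot a development release that has caused new failures.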

Impacts

From the Perl point-of-view, the purpose of the test-against-dev project is to promote the viability of Perl and CPAN as an ecosphere in which technological applications can be implemented and run in production. The project alerts core language developers to possible adverse impacts of changes in the core distribution on important CPAN libraries. It also enables us to alert key CPAN developers as to where their libraries need to change to adapt to the ongoing development of the Perl 5 language. In particular, by running this QA project on FreeBSD we bring to light places where overly Linux-centric developers need to adapt their code to work on a wide variety of OSes.

In a more subtle way, this project has a benefit for BSD as well. If we use FreeBSD to demonstrate that the Perl/CPAN ecosphere continues to be viable for application development as the core language develops, then we also demonstrate that FreeBSD continues to be a viable platform on which to run applications in what is still one of the most popular, "high-level", dynamic programming languages. In short, we enhance the viability of FreeBSD itself.

And that's the result of a collaboration between NYCBUG and New York Perlmongers.

Thank you very much.