In attendance were:
Student keyholders: '(turino14)
Associate keyholders: '(amigdal, kenta, price)
Members: '()
Guests: '()

price: I have been refreshing myself on the code and the changes made after I left.
turino14: I don't think current students know about the code.
price: Indeed, everyone who has touched the code is now cruft.
turino14: Of current students, amigdal is the one who knows the most about scripts and XVM.
amigdal: I know nothing about XVM. I've only heard about the pains of starting it back up when it all lost power.
price: When was that?
amigdal: I don't recall, but look up the manhole cover explosion near Tang Hall.
price: I remember XVM being in W20.
amigdal: I don't know when it was moved. XVM is now hosted on several machines in W91.
price: It's been several machines since we launched it.
[formal start]
turino14: Hello everyone. Thanks for coming. If you don't know, XVM was created in 2008 by price and others. It's been a while, so a lot of it has been forgotten. It has a really good use case, and there is a lot of potential benefit for MIT if we can upgrade it from what it was, or just gain some insight for new projects we could come up with, or upgrade it so we don't end up with two server projects.
turino14: I can give the floor to you, Greg, so you can tell us about the history and design. Afterwards, we can have discussion and questions about the design; I'm sure many here will have several. At the end, we can decide on the direction we want to move forward with for XVM.
price: Regarding the cluedump slides that Evan and I gave in 2008, I hope some of y'all have read them. They describe a couple of aspects of how it works, and we can sift through them as a prompt for what to talk about.
turino14 [in chat]: http://invirt.mit.edu/cluedump.pdf
price: For virtualization, it uses Xen. There are two different forms of hardware...
price: [p.11] Much of the software wants its configuration in /etc.
Of course, we could have made a file for each of those every time a VM is created, and kept them updated. Instead, we took an approach where it's all dynamic. Xen config files [p.10] are actually Python. It turns out [p.11] you can do full-blown Python in these config files.
kenta: []
price: []
price: The normal way, you would run something like `xm create myvm`, and the config file would correspond to the name myvm. But it doesn't have to correspond 1:1: `xm create invirt-database machine_name='inode'` [p.11]
kenta: xm has been superseded by xl now. Was it upgraded?
price: This is not in the cluedump: we launched XVM with Ubuntu 8.04 (hardy), because it was 2008. From the git history, I see there was some work a few years later to upgrade to 12.04 (precise). I haven't probed too much, but it doesn't look like the OS has been upgraded since then. So part of resuming maintenance would be to upgrade to a modern OS release, especially one that is still supported and getting security updates.
price: That would probably require some changes to how this works, one of a number of different changes. I don't know anything about xl. Honestly, this is probably not something I would go for if I were designing the XVM configuration system; I wouldn't want it to be arbitrary Python code. It was nice that someone was able to take advantage of this. If whoever made xl had the same opinion, this design may not work, and we may have to do something a little different.
xy: Still using Python 2?
price: It's gotta be. Back then, Python 3 was not yet ready for real work. By the time Python 3 was preferable for new work, people were no longer working on XVM.
price: In total, it's not that much code, so the Python 3 upgrade would probably not be a lot of work.
price: This is Xen.
price: [p.15] Another aspect of the design is `remctl`, which is a pretty neat piece of software.
If you are in an environment where Kerberos is available, which you all are, it's for when you want to run a command on some server, but SSH access is ferociously hard to control. remctl is much more focused, and its configuration is transparent to read. This is an example that might be opaque now [p.19]: the SIPB office has a stereo system, and there was a system that let you play music from a shared queue, with people queuing from their own machines/laptops. For the volume, someone built a cool system where you could run the command from your laptop.
amigdal (in chat): zsr still exists, but it's not running gutenbach, nor does it have remctl set up at the moment.
price: [p.19] The config says that anyone can get/set the volume. XVM uses a similar system [p.23]. The XVM website itself is running on a VM, which is probably why it's such a pain to boot it all up: the dependency graph has a lot of cycles. The way the website tells the hosts to do any of these things is with remctl. [p.27] There is a server in between, which knows what is running and which host to talk to. Command [p.28]. Arguments get passed [p.29].
kenta: The verb `create` in the XVM world does not mean create the disk image; it means boot the machine up.
price: Yes, it corresponds to `boot` normally.
price: One of the ways we made things dynamic was code for Xen. But most software does not accept code as config files. I don't know if you have used FUSE before; that's what we chose, in particular pyfuse. Whenever the respective programs try to read their files from the filesystem, the Python code looks at the database, reads it, and spits out the lines as the contents of the file. [p.49]
price: We gave the software a name (invirt) with the idea that it could be used at non-MIT installs.
price: The way the console proxy is set up, it wants to actually have a user as the thing you console into.
So we make one of these virtual users per VM on the console proxy server (xvm-console.mit.edu), which is also a VM running on one of the various hosts. We need the users to exist, so we do it again in this dynamic way. It turns out the way the system configures which users exist is extremely flexible, so we gave it the PostgreSQL plugin (pgsql) [/etc/nsswitch.conf, p.68], and we told the plugin what to do: if you want to know the password and name for some user, here is the SQL statement that you should run. [p.69] The `getent` command does the lookup the way most software does it, through NSS, instead of grepping /etc/passwd.
kenta: In that command, it wasn't clear to me the first time I looked, but remus is not the name of an Athena user; it's the name of a guest machine.
price: Yes [p.63], remus is the host and the user is broder. This was his personal VM. Evan Broder co-wrote the code with me.
kenta: You seem to be fans of PostgreSQL. Was it because you liked it, or was there some specific fit with the requirements?
price: I'm sure MySQL would've worked. Postgres is still what I would recommend for a web service today. It's got a good query planner (which doesn't matter that much here, since there's not a lot of data), it has a concept of different data types, and it has cleaner semantics. On some other services, you'd have been sad if you'd chosen MySQL.
price: [p.73] There is a mistake; it seems it meant invirt.master.yaml. It is one YAML file with all of the host machines, and all of the configuration comes from it. In practice, the really important thing it does is allow us to have a dev cluster and a production cluster with different hostnames and IPs. Hardcoding wouldn't have worked. Early on, we did hardcode, right up until we were preparing for launch.
price: We had an ambition of this being something that could be used at MIT, so we wanted to make it more configurable [p.84, what's next]. We never finished either, because it was not the priority.
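The nsswitch trick can be sketched in a few lines of Python 3. This is not XVM's code or the pgsql plugin's configuration syntax: sqlite3 stands in for PostgreSQL, and the `machines` table, its columns, the UID offset, and the console shell path are all hypothetical. What it illustrates is the idea described above: a passwd entry for a VM's console user is computed on demand by a SQL statement from a database row, rather than read from a line someone wrote into /etc/passwd.

```python
import sqlite3

# Hypothetical schema standing in for XVM's PostgreSQL database;
# the real tables and columns are different.
SCHEMA = """
CREATE TABLE machines (
    machine_id INTEGER PRIMARY KEY,
    name       TEXT UNIQUE NOT NULL,
    owner      TEXT NOT NULL
);
"""

# Hypothetical values: where console-user UIDs start, and a login shell
# that would attach the user to the VM's console.
UID_BASE = 10000
CONSOLE_SHELL = "/usr/local/bin/console-shell"

# The kind of statement an NSS database plugin is told to run for
# "look up user <name>": it yields the seven passwd(5) fields.
PASSWD_BY_NAME = """
SELECT name, 'x', ? + machine_id, ? + machine_id,
       'console for ' || name || ' owned by ' || owner,
       '/nonexistent', ?
FROM machines
WHERE name = ?
"""


def getpwnam(conn, name):
    """Return a passwd(5)-style line for a VM's console user, or None."""
    row = conn.execute(
        PASSWD_BY_NAME, (UID_BASE, UID_BASE, CONSOLE_SHELL, name)
    ).fetchone()
    if row is None:
        return None
    return ":".join(str(field) for field in row)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO machines (name, owner) VALUES (?, ?)", ("remus", "broder")
    )
    # No line for remus was ever written to a file; it is computed here.
    print(getpwnam(conn, "remus"))
```

On a real host the same lookup would be reached through something like `getent passwd remus`: with `pgsql` listed as a source on the `passwd:` line of /etc/nsswitch.conf, glibc asks the plugin, which runs the query from its own configuration file, so every program that resolves users through NSS sees the per-VM users without any file being regenerated.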
price: [p.74] Configuration in Python. [p.75] For other stuff, a templating language called Mako; it works fine, and we probably don't need to change it. Not on these slides: there are a few little CLI scripts in the tree that basically just do the equivalent of printing the hostname [p.74].
price: I think this covers the stuff that is in the slides. There are other aspects not in the slides, but first, questions?
kenta: In one of the slides, you said something is used in 10 of our packages. Did you create *.deb packages to help you install stuff on the host systems?
price: Yes. This is one of the other things I wanted to talk about. Are there any other questions, though?
price: http://xvm.mit.edu/gitweb has our repos. There are a lot of repos. I wouldn't recommend it, but it was done this way when people migrated from Subversion; it was in fashion to have lots of tiny repos. We used git submodules, so there are repos that contain other repos.
price: Some commands:
    git clone -b precise-prod git://xvm.mit.edu/invirt/packages.git
    git clone -b precise-prod git://xvm.mit.edu/invirt/third.git
    git clone -b master git://xvm.mit.edu/invirt/scripts.git
    for d in packages scripts third; do
        git -C "$d" submodule update --init
    done
    git clone git://xvm.mit.edu/invirt/doc/xvm.git doc
rgabriel: Oddly enough, I already have many of these cloned. Are these commands documented?
price: No, but it would be good to put them in a guide.
price: We have deb files, installed with apt. This style worked great for us at the time, in part because of parallel work on Debathena among SIPB members of the time. If you look at one of these packages [xvm-prodconfig], you can see its metadata. In principle, the rules file controls mostly everything else, but in practice it delegates to the Makefile. The `.install` file says which files are installed and where. You may also see .postinst files in lots of packages; these are scripts that run after the package is installed.
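To make the `.install` convention concrete, here is a simplified Python 3 reader for that file format, following the rules described in the dh_install(1) man page: each line lists one or more source paths and ends with the destination directory (relative to the package's build directory), while a line with a single path gets its destination guessed from the path itself after stripping debian/tmp/. It only illustrates the format; it is not something debhelper or XVM uses, it ignores wildcard expansion, executable install files, and other corner cases, and the file names in the example are hypothetical.

```python
import posixpath


def parse_install_file(text, sourcedir="debian/tmp"):
    """Return (source, destination_dir) pairs from a debian/<pkg>.install file.

    Simplified reading of dh_install(1):
      * blank lines and lines starting with '#' are skipped;
      * on a line with several paths, the last one is the destination
        directory and every earlier path is installed there;
      * a line with a single path has its destination guessed: strip the
        sourcedir prefix if present and use the remaining dirname.
    Wildcards are NOT expanded; they are returned verbatim.
    """
    prefix = sourcedir.rstrip("/") + "/"
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if len(words) == 1:
            src = words[0]
            rel = src[len(prefix):] if src.startswith(prefix) else src
            entries.append((src, posixpath.dirname(rel)))
        else:
            *sources, dest = words
            entries.extend((src, dest) for src in sources)
    return entries


if __name__ == "__main__":
    # Hypothetical file contents; not taken from any real XVM package.
    example = """\
# debian/example-package.install
code/app.py code/static usr/share/example-package
config/example.conf etc/example
debian/tmp/usr/bin/example-tool
"""
    for src, dest in parse_install_file(example):
        print(f"{src:35} -> {dest}/")
```

Reading a package's `.install` file this way is a quick route to answering "where does this file end up on the host?", which is most of what you need when tracing a config file on a running machine back to the source package that ships it.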
price (in chat): http://web.archive.org/web/20130527194802/http://debathena.mit.edu/trac/wiki/Cluedump has information about Debathena.
price: [in invirt-base/debian/rules] dh_install comes from debhelper.
price: I'm still mystified at how it is that scripts/... It must work somehow.
price: Back in 2008, sysvinit (aka /etc/init.d/ shell scripts) was the thing; nowadays you will need to migrate to systemd (systemctl, etc.).
kenta: Using deb packages is an alternative to Ansible or Puppet, if you are familiar with those.
price: xvm-meta is the source package for a bunch of parent binary packages that help install a host machine from scratch. Run `cat xvm-meta/debian/control` to see the list of binary packages.
amigdal: [asks about postgres being a VM]
price: Yes, the postgres server runs on one of the system servers. But a few special servers have their config outside the DB, to avoid insane dependencies; see e.g. xvm-prodconfig/sysvms/s_master.
amigdal: [asks about the storage server, especially dom0]
price: dom0, I think, lives on local disk. There is a SAN for the user VMs and the 3 system VMs, hooked up with Fibre Channel.
amigdal: I heard of a certain packet you need to send to the SAN to get it to work, but I could be misremembering.
amigdal/price: The SAN is called iSCSI in the doc, and it is just a ping, documented at the end of the doc text file.
amigdal: XVM exclusively allocates memory and disk, but not CPU time.
price: A continuous delivery service: in invirt-dev, the "Invirtibuilder". It handles how to change an invirt package and deliver it to systems (better than building packages locally and scp'ing the deb to one or more machines, which is what we originally did). It uses git branches on the source package's code to tell the build daemon to build a package and deploy it by publishing to the apt repository (xvm.mit.edu/invirt).
amigdal: For scripts, we use AFS (and web.mit.edu) as the package repository.
price: Some of the ACL stuff is in master.yaml.
amigdal: How are secrets shared?
price: Mostly Kerberos, e.g., .k5login. For communicating between machines, remctl (which uses Kerberos).
kenta: There is a server keytab.
kenta: ParaVMs were created with xen-create-image, which is poorly maintained or unmaintained nowadays. This will need research into other containerization systems.
price: Among other things to do: upgrade the OS, which hasn't been getting updates.
amigdal: I recall also seeing ksplice updates on a different server...
price: I worked for Ksplice. Soon after Oracle acquired Ksplice, it made it so you could only purchase live patches for Oracle Linux. But if you had a subscription before that, Oracle has continued to publish live patches for it, so long as the original OS vendor was publishing kernel updates [which is not relevant to XVM, since Ubuntu is no longer publishing kernel updates for the Ubuntu release XVM runs].
amigdal: The website says "Ksplice legacy" ended in 2023.
price: Moving forward, the key thing is to start up a dev cluster; see packages/xvm-devconfig. A bunch of IP addresses will need to be changed to no longer use the 18.* IP addresses that MIT sold.
price: I'm not certain whether dev was completely separate hardware or shared some resources. They were VMs (but not XVM VMs), so it was a way to do it without underutilizing a bunch of physical hardware.
price: For deb development on the dev cluster, skip the source package build and just modify config files directly. Then use `debsums -ac` to detect files that were edited, and incorporate your changes into the source package.
price: I don't recall what we did for storage for the dev cluster. The iSCSI IP addresses are different from prod. I think Linux software can provide an iSCSI server.
price: Git repo access was done through the xvm-dev moira group; look at its parents, xvm-root and xvm-root-users. Put the SIPB vice chair on it.
price: We had an XVM architect and an XVM project leader(?); those were intended as structures so that current students are in charge.
That sort of worked for a few generations, then petered out.
kenta: My 2 cents: XVM (and SIPB) should advertise to users for new maintainers, because they have skin in the game.
turino14: Is there a mailing list of users?
price: Probably no moira mailing list, but there is a database. Owners of machines are Athena lockers, and there is some way to get the email contact for lockers, so you could compile a list of all of those.
turino14: One of the things driving migration is certificate deprecation.
price: The invirt-web package has the webserver; its code/ subdirectory has the webapp. Apache: see /etc/apache2/ for how certificates are configured. AuthTypeSSLCert(?), then mod_auth_sslcert. (Incidentally, there is also "use your Kerberos tickets with your web browser".) Browsing the libapache2-mod-auth-sslcert package, it's some C code. Obviously, if you were writing this today, you'd use Rust, not C; if anders were writing this today, that's how it would be written.
price: The webapp was written using CherryPy; not sure if that is maintained these days.
price: Security discussion: monitor syslog.
price and turino14: Nowadays, attackers drop a crypto miner.
kenta: Continuing to use Xen, or using a hypervisor other than Xen: both are options.
price: Migrating users is a thing you should consider. ParaVM users probably cannot easily be migrated without continuing to use Xen.
kenta and price: Users not keeping their guest systems updated/upgraded is a known problem for both scripts and XVM; vaguely similar to how XVM itself still runs today despite no OS updates for 8 and a half years.
turino14: You started with 4 hosts, then added more. Do you remember how streamlined the process was?
price: See doc/xvm-host-setup-notes and some of prodconfig; check the git log for modifications of those files. I remember there are some LVM clustering options so that different clusters accessing the same SAN don't stomp on each other.
kenta, passing along notes from the minutes: How well known was XVM among general non-SIPB MIT affiliates?
price: You could look at the database.
kenta: In my memory, there were a good number of non-SIPB users, because I remember conversations of "need more RAM, need more disk".
price: XVM was not originally a SIPB project. It came from ISDA (the D stands for Development) inside IS&T: someone in charge there thought "there really ought to be a cloud at MIT" (perhaps not the term at the time). He really got SIPB. He could have budgeted for and hired MIT staff to run a private cloud, probably costing millions of dollars, but instead (motivated by scripts, Debathena, and Athena, where SIPB had done a lot/everything) he asked SIPB. It was also a way to take risks, in contrast to using IS&T staff.
price: He bought the hardware, about $30K.
price: Of course, XVM benefited SIPB for misc SIPB projects, but also the MIT community as a whole.
price: Addressing what someone added to the minutes below, "todo libvirt": I don't know much about libvirt. It is an abstraction layer with backends for different hypervisors, including Xen. It seems like something useful, or at least worth playing with.
Question and answer section of the meeting:
- libvirt (addressed above)
- How would you recommend managing such a project, where you likely want to consult alumni (who are busy) with regards to changes you'd want to make? Where you need to contact the alumni to politely "transfer" the project to current students? Would you recommend someone take point, be the 'project leader' as per the documents in AFS, etc.?
price: I would frame it differently from "consult alumni". What has always been key to SIPB's success is that it is a student organization (though some alumni need reminding of this fact). The workflow should not be that alumni are consulted on every step. You are at least entitled to get the vice chair on the root ACL; then that person distributes it to other current students. A slightly less blunt option is to ask to add some current people to the dev ACL.
price: The reason to get on the root ACL (vs. the dev ACL) sooner rather than later is to be able to browse the database to see what resources are being used. I suggest saying: "I would like root in order to browse the database, rather than planning to make changes now."
- Related question: how do you determine when you trust someone enough to give them root access? To lead the project?
price: The project leader and architect structure didn't exactly succeed. The hope was that it would be a self-sustaining structure for passing the project on, but it wasn't. Often someone stayed on as an "expert", which was explicitly forbidden in the documents.
price: Scripts was originally jbarnold, then one other person (someone in his dorm) for sql.mit.edu, and that worked well for a while; it was clear who the BDFL (to borrow the Python community's terminology) was, essentially.
price: XVM was different; it was brought to SIPB by IS&T. I joined XVM after it already existed, and stepped into a leadership role when I became SIPB Chair. It ended up being Evan Broder and me who wrote most of the code, so we were clearly the people in charge. One thing that is important: these are two fairly different patterns in terms of who ended up in charge. There were people who felt responsibility, because it was kind of theirs, and I think that requires a small number of people whose project it is, however that comes about. That's because if something goes wrong, there'd be somebody who drops everything else and gets it fixed (this happened several times for Scripts and XVM). I wouldn't try to name a leader right out of the gate; I would let that emerge. Whoever is interested and gets involved, it will likely be somewhat evident who has been doing the work, and you want to make sure they feel empowered to be in charge, etc.
Probably, at some point, when it becomes clear who's doing the work, that's when you can identify that this is the person in charge, the person who has the power to make decisions (for when you can't get consensus).
kenta: If I may add my 2 cents here as cruft: in my recollection of SIPB's history, there have been very few instances of people in SIPB being given bits and abusing them. There have been a few instances of people using them incompetently, but almost no instances of actually malicious usage. I've seen a trend toward the opposite, where projects would end up dying because maintainers were too conservative in granting bits, and I would recommend that current students be a little less conservative about giving bits to people. At worst, people will do nothing; at best, they will keep a project alive.
turino14: How would you get current students to learn enough about the codebase / become competent enough to help out?
price: It looked like you were all doing the key things. I noticed the chair saw that there were three new people, asked them their names, was welcoming, and engaged with them. Try to pitch new members on how great SIPB is, how it's valuable, educational, etc. That's sort of step 1: getting people engaged and showing up. Intermingled with that would be trying to find a project they would be interested in: either find something fresh and separate, or try to convince them that an existing service would be a good match for them. One step in this chain is instructions for how to start developing, which, as we noticed in this call, don't exist for XVM (e.g., the series of commands to clone all the repos). [Have an onboarding procedure, essentially.] This is how you push a change to the git repo and tell Invirtibuilder to build and deploy it. You definitely want to write it down and make sure it exists.
Compile some of the resources we've been discussing in this call: what is a Debian package, what should I look at and understand, etc. That way, you can get them over the things they need to figure out, and make it easy for them to do something that feels satisfying, like making a change that then goes live.
kenta: On the technical side of whom to trust, we have a whole bunch of tools for "trust but verify", like continuous integration, git commit history, et cetera. You have a whole bunch of infrastructure, technical solutions to social problems, available.
price: There's a bit of a difference there: for pure software, it's easy to do that. When it's running a service that holds data belonging to users, it's not always the case that you can fix it if someone messes up. I think the bar is a little higher there. I totally agree with you that it's a classic mistake to not allow anybody to touch it so that it dies, but you want to make sure the people you give root access to will be reasonably conscientious: if they're not sure what something will do, they won't try it in prod.
turino14: I agree. Something that makes sense is to give them bits on the development cluster, maybe code review for the cluster, etc.
price: You give them development bits, they deploy there, and somebody else deploys it to prod.
turino14: Documentation is _very_ important to make this easy, especially for projects that end up without current students maintaining them. I think documentation makes it easier to onboard new members; I think that's a priority.
price: One useful tactic for that documentation: you're all going to be figuring stuff out, and when you figure something out is a great time to write it down and put it into some documentation.