OSL Migration

The Open Source Lab environment is drastically different from our solution with Central Web Services. The largest problem we ran into was bugs caused by using a managed Postgres cluster as opposed to running the server locally. It took a lot of time to work out which permissions were needed, since we normally use superuser access on the current local server.

The OSL also uses WSGI (mod_wsgi) instead of mod_python for Apache. This required us to change our vhost configurations, but it was not a big change.

Because of the managed environment, we had to change our file structure and where things are located. Most of these differences have been resolved with symlinks, but a few paths still differ.

OSL Environment

  • Managed Gentoo
  • Cfengine
  • Managed Postgres cluster
  • WSGI

This guide details the steps taken to get all of Beaversource up and running in the new OSL environment.

DB Info

  • Postgres 8.2
  • Host: dogwood.osuosl.org
  • Phppgadmin
  • Four users. One admin user and one for each database.
    • bsc_admin (Admin)
    • bsc_projects (Trac)
    • bsc_project_meta (Django)
    • bsc_elgg (Elgg)
  • Auth info located in /data
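
The one-role-per-database layout above can be captured in a small lookup. The following is a minimal sketch, not part of Beaversource: conninfo() is a hypothetical helper, the password is a placeholder, and it assumes each database is named after its role, as in the Elgg and Django settings shown elsewhere in this guide. The admin role is left out because it has no database of its own in the list.

```python
# Sketch only: role names and host are taken from the list above;
# conninfo() is a hypothetical helper, not part of Beaversource.
DB_HOST = "dogwood.osuosl.org"

# Application served by each per-database role.
ROLES = {
    "bsc_projects": "Trac",
    "bsc_project_meta": "Django",
    "bsc_elgg": "Elgg",
}


def conninfo(role, password):
    """Build a libpq-style connection string for one application role.

    Assumes the database is named after the role.
    """
    if role not in ROLES:
        raise KeyError("unknown role: %s" % role)
    return "host=%s dbname=%s user=%s password=%s" % (
        DB_HOST, role, role, password)


print(conninfo("bsc_project_meta", "<enter password here>"))
```

The resulting string has the same shape as the mp_connectionstring value used in the Elgg configuration.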

File ACL

For finer permission controls, we use file ACLs. The current ACL in use has been included below.

# file: data
# owner: root
# group: beaversource
user::rwx
group::r-x
group:beaversource:rwx
mask::rwx
other::r-x
default:user::rwx
default:group::r-x
default:group:beaversource:rwx
default:mask::rwx
default:other::r-x
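
To check a listing like the one above programmatically, the entries can be parsed into a dictionary. This is a minimal sketch assuming getfacl-style text as input; parse_acl() is a hypothetical helper and the sample is an excerpt of the ACL above.

```python
# Sketch only: parse_acl() is a hypothetical helper; the sample text is
# an excerpt of the ACL listing above.
def parse_acl(text):
    """Parse getfacl-style output into {(scope, tag, qualifier): perms}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the file/owner/group header
        parts = line.split(":")
        scope = "access"
        if parts[0] == "default":
            scope = "default"
            parts = parts[1:]
        tag, qualifier, perms = parts
        entries[(scope, tag, qualifier)] = perms
    return entries


SAMPLE = """\
# file: data
# owner: root
# group: beaversource
user::rwx
group::r-x
group:beaversource:rwx
mask::rwx
other::r-x
default:group:beaversource:rwx
"""

acl = parse_acl(SAMPLE)
# The beaversource group should have full access now and by default.
assert acl[("access", "group", "beaversource")] == "rwx"
assert acl[("default", "group", "beaversource")] == "rwx"
```

The default entries matter because they are what newly created files and directories under /data inherit.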

Apache

For our Apache setup, we have two separate vhosts: beaversource.oregonstate.edu and code.oregonstate.edu. The first vhost (beaversource) handles all of the web front end, including Elgg, Trac, and Django. The second vhost (code) is specifically for handling the SVN repositories. The working vhosts are included below.

beaversource.oregonstate.edu vhost:

<VirtualHost *:80>
    ServerAdmin code-admins@lists.oregonstate.edu
    ServerName beaversource.oregonstate.edu
    ServerAlias www.beaversource.oregonstate.edu beaversource.osuosl.org

    RewriteEngine On
    #RewriteCond %{HTTP_HOST} !^beaversource\.oregonstate\.edu$ [NC]
    #RewriteRule ^(.*)$ http://beaversource.oregonstate.edu$1 [R=301,L]

    DocumentRoot /var/www/beaversource.oregonstate.edu
    BrowserMatch .*Googlebot.* gb

    Alias /social /var/www/beaversource.oregonstate.edu
    <Directory /var/www/beaversource.oregonstate.edu>
        Options SymLinksIfOwnerMatch
        AllowOverride All
        Order allow,deny
        deny from env=gb
        allow from all

        RedirectMatch ^/$ /social
    </Directory>

    # Trac Configuration
    WSGIDaemonProcess trac processes=5 threads=20 maximum-requests=1500
    WSGIScriptAlias /projects /var/lib/trac/apache/trac.wsgi
    <Directory /var/lib/trac/apache>
        #This will redirect people to the elgg project listing instead of one created by trac
        RedirectMatch ^/projects/?$ /social/mod/browser/index.php?display=projects

        WSGIProcessGroup trac
        WSGIApplicationGroup %{GLOBAL}
        Order deny,allow
        Allow from all
    </Directory>

    #Django(bettse) based version of the metaproject
    Alias /media /var/lib/django/webmanagement/media
    WSGIDaemonProcess django processes=5 threads=20
    WSGIProcessGroup django
    WSGIScriptAlias /request /var/lib/django/webmanagement/apache/django.wsgi

    <Directory /var/lib/django/webmanagement/media>
        Order deny,allow
        Allow from all
    </Directory>

    <Directory /var/lib/django/webmanagement/apache>
        Order deny,allow
        Allow from all
    </Directory>

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Cookie}i\"" beaversource
    CustomLog "|/usr/sbin/cronolog /var/log/apache2/beaversource.oregonstate.edu/transfer/%Y%m%d.log" beaversource
    ErrorLog "|/usr/sbin/cronolog /var/log/apache2/beaversource.oregonstate.edu/error/%Y%m%d.log"
</VirtualHost>

code.oregonstate.edu vhost:

<VirtualHost *:443>
    ServerName code.oregonstate.edu
    ServerAdmin code-admins@lists.oregonstate.edu

    SSLEngine on
    BrowserMatch .*Googlebot.* gb
    DocumentRoot /var/www/code.oregonstate.edu/

    <Directory /var/www/code.oregonstate.edu/>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride None
        Order allow,deny
        deny from env=gb
        allow from all
        # Redirect requests for the site root to the SVN repository listing
        RedirectMatch ^/$ /svn/
    </Directory>

    <Directory />
        AuthBasicAuthoritative off
        AuthLDAPBindDN "uid=bsc_auth,ou=specials,o=orst.edu"
        AuthLDAPBindPassword <enter pass here>
        AuthLDAPURL "ldap://ldap.onid.orst.edu:389/o=orst.edu?uid?sub?(objectClass=*)" STARTTLS
        AuthzLDAPAuthoritative off
    </Directory>

    <Location /svn/>
        DAV svn
        SVNParentPath /var/lib/svn
        SVNListParentPath on
        AuthType basic
        AuthBasicProvider ldap
        AuthName "ONID login to SVN Repositories"
        AuthUserFile /dev/null
        <LimitExcept GET PROPFIND OPTIONS REPORT>
            Require valid-user
        </LimitExcept>
    </Location>

    Include /etc/apache2/vhosts.d/policies/bs_projects

    CustomLog "|/usr/sbin/cronolog /var/log/apache2/code.oregonstate.edu/transfer/%Y%m%d.log" combined
    ErrorLog "|/usr/sbin/cronolog /var/log/apache2/code.oregonstate.edu/error/%Y%m%d.log"
</VirtualHost>

<VirtualHost *:80>
    ServerName code.oregonstate.edu
    RewriteEngine on
    RewriteRule /(.*) https://code.oregonstate.edu/$1 [R=permanent]
</VirtualHost>

Subversion

We implement hooks in each project repository to manage commit size limits, enforce permissions, and let Trac interact with commits. On hardened Gentoo, grsec would not let us execute the chain of scripts for each hook, because the hook scripts did not include the absolute path to the interpreter and relied on shebang lines to identify it. To solve this, we first made sure the whole chain of hook scripts was in a single language (purely for simplicity), and then prefixed each script invocation with the path to that interpreter. Since all but one of our scripts were Python, we rewrote the remaining script in Python and prefixed each hook command with the path to the Python interpreter (/usr/bin/python in our case). This was done for all existing repositories with the following script, which was run from /data/svn with root privileges.

#!/bin/bash
# Prefix the first /data path on each hook line with the Python interpreter.
# Run once only: a second run would add the prefix again.
for repo in *
do
        if [ -d "$repo" ]; then
                for hook in start-commit pre-commit post-commit
                do
                        sed -i -e 's/\/data/\/usr\/bin\/python \/data/' "$repo/hooks/$hook"
                done
        fi
done

We also needed to ensure that all new projects have their hooks created properly, so we edited beaversource.py to prepend the interpreter path whenever hooks are created.
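
For reference, the same rewrite can be sketched in Python in a way that is safe to run more than once. This is only an illustration: prefix_interpreter() and fix_repositories() are hypothetical names, not the code in beaversource.py, and the sketch assumes each hook calls its scripts by a path starting with /data.

```python
import os

INTERPRETER = "/usr/bin/python"
HOOKS = ("start-commit", "pre-commit", "post-commit")


def prefix_interpreter(text, interpreter=INTERPRETER):
    """Put the interpreter in front of the first /data path on each line.

    Lines that already carry the interpreter are left alone, so the
    function can be run more than once without stacking prefixes.
    """
    out = []
    for line in text.splitlines(True):
        if "/data" in line and (interpreter + " /data") not in line:
            line = line.replace("/data", interpreter + " /data", 1)
        out.append(line)
    return "".join(out)


def fix_repositories(parent):
    """Rewrite the three hooks in every repository under parent."""
    for repo in sorted(os.listdir(parent)):
        for hook in HOOKS:
            path = os.path.join(parent, repo, "hooks", hook)
            if not os.path.isfile(path):
                continue  # not a repository, or hook not installed
            with open(path) as handle:
                original = handle.read()
            updated = prefix_interpreter(original)
            if updated != original:
                with open(path, "w") as handle:
                    handle.write(updated)
```

Calling fix_repositories("/data/svn") would cover the same repositories as the shell script; rewriting the file in place keeps its existing permissions.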

Elgg

Elgg didn't change much with the migration; the Elgg setup was pretty straightforward.

The first thing we needed to do was check out the latest Elgg tag from the admin project's source repository into the right directory:

$ cd /var/www/beaversource.oregonstate.edu
$ svn co https://code.oregonstate.edu/svn/admin/elgg/tags/20090531/ .

With Elgg checked out, we could then create the settings file. We did this by copying the example file into place.

$ cp config-dist.php config.php

The configuration file Beaversource uses differs significantly from the standard file that ships with Elgg. An example is included below.

Elgg Configuration:

<?php
// ELGG system configuration parameters.
// You could override default values here, to see all available
// options see lib/config-defaults.php
// Note: some values are overridden by the values stored in database
// through admin manager

// External URL to the site (eg http://elgg.bogton.edu/)

   $CFG->wwwroot = "http://beaversource.oregonstate.edu/social/"; // **MUST** have a final slash at the end

// Database configuration

    $CFG->dbtype = "postgres7";
    $CFG->dbhost = "dogwood.osuosl.org";

    $CFG->dbuser = "bsc_elgg";
    $CFG->dbpass = "<enter password here>";

    $CFG->dbname = "bsc_elgg";
    $CFG->prefix = "elgg_";

    $CFG->sysadminemail = "code-admin@lists.oregonstate.edu";

// Settings for initial administrator, only used at installation time
    $CFG->newsinitialusername = "news";
    $CFG->newsinitialpassword = "<enter password here>";
    
    $CFG->auth = 'sso';
    $CFG->sso_user_create = true;
    $CFG->disable_passwordchanging = true;

    $CFG->mp_connectionstring = "host=dogwood.osuosl.org dbname=bsc_project_meta user=bsc_project_meta password=<enter password here>";

?>

A few notes on the Beaversource Elgg configuration:

  • We are using Postgres 8, but the config still requires the value 'postgres7' for dbtype.
  • The added config options are for both SSO and connecting to the project metadata (Django). These are VERY IMPORTANT.

Apache Vhost:

Alias /social /var/www/beaversource.oregonstate.edu
<Directory /var/www/beaversource.oregonstate.edu>
    Options SymLinksIfOwnerMatch
    AllowOverride All
    Order allow,deny
    deny from env=gb
    allow from all
    RedirectMatch ^/$ /social
</Directory>

Trac

WSGI Script:

import sys
sys.stdout = sys.stderr

import os
os.environ['TRAC_ENV_PARENT_DIR'] = '/var/lib/trac/sites'
os.environ['PYTHON_EGG_CACHE'] = '/var/lib/trac/egg-cache'

import trac.web.main
application = trac.web.main.dispatch_request

import trac.db.postgres_backend
trac.db.postgres_backend.PostgreSQLConnection.poolable = False

Apache Vhost:

WSGIDaemonProcess trac processes=5 threads=20 maximum-requests=1500
WSGIScriptAlias /projects /var/lib/trac/apache/trac.wsgi
<Directory /var/lib/trac/apache>
    #This will redirect people to the elgg project listing instead of one created by trac
    RedirectMatch ^/projects/?$ /social/mod/browser/index.php?display=projects
    WSGIProcessGroup trac
    WSGIApplicationGroup %{GLOBAL}
    Order deny,allow
    Allow from all
</Directory>

Django

  • Django sites are stored in /var/lib/django/
  • This makes webmanagement stored at /var/lib/django/webmanagement
  • A symlink has been made from /data/django -> /var/lib/django
  • Django 1.0 is installed and managed by the OSL.
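
Since several paths in this setup depend on symlinks, it can help to verify them in a script. The following is a minimal sketch; ensure_symlink() is a hypothetical helper, not something shipped with Beaversource.

```python
import os


def ensure_symlink(link, target):
    """Create link -> target, or confirm an existing link already matches.

    Returns True if the link was created, False if it was already correct.
    Raises RuntimeError if link exists and points somewhere else.
    """
    if os.path.islink(link):
        if os.path.realpath(link) == os.path.realpath(target):
            return False
        raise RuntimeError("%s points to %s" % (link, os.readlink(link)))
    if os.path.exists(link):
        raise RuntimeError("%s exists and is not a symlink" % link)
    os.symlink(target, link)
    return True
```

For the layout described above, the call would be ensure_symlink("/data/django", "/var/lib/django").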

With Django installed, we needed to check out the webmanagement project from Subversion.

$ cd /data/django/webmanagement
$ svn co https://code.oregonstate.edu/svn/admin/webmanagement/ .

With the project checked out, the next step was to create the settings file.

$ cp settings.py.dist settings.py

Open settings.py in a text editor and update the pertinent information (email addresses, DB info, file paths). A working example settings file is included below.

One of the few differences caused by WSGI affects the urls file. Since WSGI treats the URL we mount the application at (/request in this case) as the root mount point (/), it's important to remove 'request' from our urlconf. You can see this in the following diff:

URL Diff:

     # Uncomment this for admin:
-    (r'^request/admin/(.*)', admin.site.root),
-    (r'request/', include('webmanagement.project_request.urls')),
+    (r'^admin/(.*)', admin.site.root),
+    (r'^', include('webmanagement.project_request.urls')),
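
The reason the prefix has to go can be sketched in a few lines of Python. This is a simplified model, not Django's or mod_wsgi's actual code: split_mount() and matches() are hypothetical helpers that mimic how a request path is split into SCRIPT_NAME and PATH_INFO at the mount point, and how URL patterns are then tested against PATH_INFO without its leading slash.

```python
import re


def split_mount(url_path, mount="/request"):
    """Split a request path the way a WSGI server does for a mounted app.

    Returns (SCRIPT_NAME, PATH_INFO), or None if the path is outside the mount.
    """
    if url_path == mount or url_path.startswith(mount + "/"):
        return mount, url_path[len(mount):]
    return None


def matches(pattern, path_info):
    """Test a URL pattern against PATH_INFO minus its leading slash."""
    if path_info.startswith("/"):
        path_info = path_info[1:]
    return re.search(pattern, path_info) is not None


script_name, path_info = split_mount("/request/admin/")
assert (script_name, path_info) == ("/request", "/admin/")

# The application only sees "/admin/", so the old prefixed pattern no
# longer matches while the new one does.
assert matches(r"^admin/", path_info)
assert not matches(r"^request/admin/(.*)", path_info)
```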

SSO is the last component to check in order to verify that the webmanagement Django project is working properly. When you checked out webmanagement from svn, you also pulled in the SSO middleware that was written to handle Oregon State SSO auth. This should all work out of the box once you place trac.ini into /data/management. That step should be completed as part of getting the Trac environment up and running.

I've included the sections of the Apache config and the WSGI script we use for Django.

Apache Vhost:

Alias /media /var/lib/django/webmanagement/media
WSGIDaemonProcess django processes=5 threads=20
WSGIProcessGroup django
WSGIScriptAlias /request /var/lib/django/webmanagement/apache/django.wsgi

<Directory /var/lib/django/webmanagement/media>
    Order deny,allow
    Allow from all
</Directory>

<Directory /var/lib/django/webmanagement/apache>
    Order deny,allow
    Allow from all
</Directory>

WSGI Script:

import os, sys
sys.path.append('/var/lib/django')
os.environ['DJANGO_SETTINGS_MODULE'] = 'webmanagement.settings'

import django.core.handlers.wsgi

application = django.core.handlers.wsgi.WSGIHandler()

#from webmanagement.settings import URL_PREFIX

Settings:

# Django settings for webmanagement project.

DEBUG = True
TEMPLATE_DEBUG = DEBUG

ADMINS = (
    ('Code Admins', 'code-admins@lists.oregonstate.edu'),
    )

REQUEST_EMAIL = 'request@beaversource.oregonstate.edu'

MANAGERS = ADMINS

DATABASE_ENGINE = 'postgresql_psycopg2'           # 'postgresql_psycopg2', 'postgresql', 'mysql', 'sqlite3' or 'ado_mssql'.
DATABASE_NAME = 'bsc_project_meta'             # Or path to database file if using sqlite3.
DATABASE_USER = 'bsc_project_meta'             # Not used with sqlite3.
DATABASE_PASSWORD = '<enter pass here>'         # Not used with sqlite3.
DATABASE_HOST = 'dogwood.osuosl.org'             # Set to empty string for localhost. Not used with sqlite3.
DATABASE_PORT = ''             # Set to empty string for default. Not used with sqlite3.

# Local time zone for this installation. Choices can be found here:
# http://www.postgresql.org/docs/8.1/static/datetime-keywords.html#DATETIME-TIMEZONE-SET-TABLE
# although not all variations may be possible on all operating systems.
# If running in a Windows environment this must be set to the same as your
# system time zone.
TIME_ZONE = 'America/Los_Angeles'

# Language code for this installation. All choices can be found here:
# http://www.w3.org/TR/REC-html40/struct/dirlang.html#langcodes
# http://blogs.law.harvard.edu/tech/stories/storyReader$15
LANGUAGE_CODE = 'en-us'

SITE_ID = 1

# If you set this to False, Django will make some optimizations so as not
# to load the internationalization machinery.
USE_I18N = True

# Absolute path to the directory that holds media.
# Example: "/home/media/media.lawrence.com/"
MEDIA_ROOT = ''
# URL that handles the media served from MEDIA_ROOT.
# Example: "http://media.lawrence.com"
MEDIA_URL = ''

# URL prefix for admin media -- CSS, JavaScript and images. Make sure to use a
# trailing slash.
# Examples: "http://foo.com/media/", "/media/".
ADMIN_MEDIA_PREFIX = '/media/'

# Make this unique, and don't share it with anybody.
SECRET_KEY = '2m&0fn1qkswiukfya3mfdas34*1c*5r0-av&g1p+(ct+zfadsfdaslfihw9k!^fqke'

# List of callables that know how to import templates from various sources.
TEMPLATE_LOADERS = (
    'django.template.loaders.filesystem.load_template_source',
    'django.template.loaders.app_directories.load_template_source',
#     'django.template.loaders.eggs.load_template_source',
)

MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.doc.XViewMiddleware',
)

ROOT_URLCONF = 'webmanagement.urls'

TEMPLATE_DIRS = (
    # Put strings here, like "/home/html/django_templates" or "C:/www/django/templates".
    # Always use forward slashes, even on Windows.
    # Don't forget to use absolute paths, not relative paths.
    '/data/django/webmanagement/templates',
)

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'webmanagement.project_request',
    'django.contrib.admin'
)

EMAIL_HOST = 'localhost'

AUTHENTICATION_BACKENDS = (
    'webmanagement.SSOBackend.SSOBackend',
#    'django.contrib.auth.backends.ModelBackend',
)

LOGIN_URL = '/request/sso/login'
SESSION_EXPIRE_AT_BROWSER_CLOSE = True
SESSION_COOKIE_AGE = 3600
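
One Python detail is worth keeping in mind when editing tuple settings such as TEMPLATE_DIRS: parentheses alone do not make a tuple, so a single-entry setting needs a trailing comma. The short sketch below illustrates the difference; the variable names are illustrative only.

```python
# Sketch only: variable names are illustrative; the path is the
# template directory used in the settings above.
without_comma = ('/data/django/webmanagement/templates')
with_comma = ('/data/django/webmanagement/templates',)

# Without the comma the parentheses are just grouping, so the value is a str.
assert isinstance(without_comma, str)
assert isinstance(with_comma, tuple)

# Iterating a str yields single characters rather than directory paths.
assert list(without_comma)[:2] == ['/', 'd']
assert list(with_comma) == ['/data/django/webmanagement/templates']
```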