Oct 3 09

mSpace2.0 Architecture Documentation

by Daniel Alexander Smith

Over the next couple of weeks we’ll be documenting how the components of mSpace work, in preparation for the open source release of mSpace2.0, a complete rewrite of mSpace that powers sites such as the iPlayer mSpace.

Firstly, a short overview of the architecture, as shown in the diagram below:

mSpace Architecture, as of September 2009

mSpace is made up of the following components:

  1. Semantic Web Import Engine
  2. Basic mSpace subsystem
    1. mSpace server
    2. mSpace front-end
  3. Pivoting subsystem

In summary, there is an “mSpace Maker” service that uses the “Facet Ontology” (more information coming soon) to describe how to gather and query RDF in order to generate a faceted view onto the data. A file of RDF that conforms to the Facet Ontology is submitted to the mSpace Maker, which indexes the source data RDF and creates an mSpace.

The mSpace Server is a PHP/MySQL server-side component that is queried by the mSpace front-end, a JavaScript/HTML/CSS component that faces the users. These two components are due to be open sourced as soon as possible.

In the coming weeks we plan a number of blog posts that we hope will explain the core concepts of the various mSpace subsystems, to help anyone who is interested in using the mSpace software.

Oct 3 09

iPlayer mSpace: The power of custom Columns

by jl2

One of the reasons for the second iteration of the mSpace client was to make it easier to provide custom column visualisations for specific data sets. Whereas in v1.0 we were content with just textual lists, in 2.0 we wanted to be able to offer a richer, more tailored UI experience where suitable.

To illustrate the power of custom columns we have recently worked to put the BBC iPlayer metadata into an mSpace. For this dataset we have produced two custom columns, a DatePicker and a graphical Channel selection.

DatePicker ChannelPicker

The inclusion of these more bespoke graphical elements increases utility when browsing and also produces a more compelling experience for the user. The addition of more visual elements reduces the “fear factor” associated with long lists of text and encourages a larger number of users to interact in a more natural way.

We have made the mSpace iPlayer available as a beta here:

http://iplayer.mspace.fm

As with all Web2.0 betas, this is a work in progress so if you find any bugs or have any suggestions please feel free to leave a comment!

Oct 1 09

iPhone Mobile Safari & CSS position: fixed

by jl2

When developing a web app for the iPhone it’s conceivable that you may want some UI elements to appear fixed. An example of this might be a header or toolbar.

With modern desktop browsers this type of scenario can easily be solved with the CSS property:

position: fixed

Unfortunately, this does not work as expected on the iPhone. Instead of staying fixed, the element scrolls with the page as the user scrolls up and down.

Why does this happen?

Technically the CSS property is being obeyed even though the appearance is unexpected. To understand what is happening you need to remember that what you’re seeing on the iPhone is actually a viewport into the full HTML page.

This is explained well by Richard Herrera on his blog @ Doctyper:

Imagine a book in front of you. Take a piece of paper, cut a 320×416 square in it, and lay it over the book. To read the book, move the paper around and position the hole over the words you want to see. This is exactly what Mobile Safari’s viewport is doing. When you flick and scroll, you’re moving the viewport around while the website behind it stays static.

So the CSS property is valid and working; it’s just that, because the viewport is moving rather than the page, the element appears to move.
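
The viewport explanation above can be put into numbers. The following is only a toy model of the behaviour described, not Mobile Safari's actual layout code; the function names and the pixel values are made up for illustration.

```python
# Toy model: where does an element styled "position: fixed; top: 0" appear
# on screen after the user has scrolled down by scroll_offset pixels?

def screen_y_desktop(fixed_top: int, scroll_offset: int) -> int:
    """Desktop browsers: the element is pinned to the window, so scrolling is ignored."""
    return fixed_top


def screen_y_iphone(fixed_top: int, scroll_offset: int) -> int:
    """Behaviour described in this post: the element stays put on the page
    while the viewport slides over it, so it drifts off screen."""
    return fixed_top - scroll_offset


for offset in (0, 100, 400):
    print(offset, screen_y_desktop(0, offset), screen_y_iphone(0, offset))
```

A negative value means the element has moved above the top edge of the visible area, which is exactly the "header scrolls away" effect.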

How do you work around this?

The neatest solution I found to the above issue was to use CSS Animations to fake a scrollable DIV with a fixed height. You can find code and working examples over at cubiq.org.

Sep 18 09

Agile development in practice

by jl2

A number of people have asked recently what development techniques have been employed in the creation of the mSpace2.0 framework, so we thought it would be best to answer these questions here!

The development team consists of just two members, which facilitates a very agile, extreme programming style of development. One of the key decisions early on in the project was to have a very clear distinction as to which code each developer was responsible for.

The mSpace framework is essentially made up of two distinct tasks:

  1. The Server: this does all the heavy lifting and data manipulation
  2. The UI: this presents the information from the server to the user

With this distinction there was a very obvious line that could be drawn: I would be responsible for the UI and Dan for the Server. Whilst we both had the technical knowledge to handle code in either department, we each took responsibility for our respective parts. This separation allowed the time we spent together to be focused on developing the protocol by which the two sections interconnected, instead of obsessing over the inner workings of each part. Essentially, we could treat both the server and the UI as black boxes, which made for much more efficient and profitable meetings.

Agile Programming

Borrowing ideas from the Scrum framework, we would have short weekly meetings first thing on a Monday to discuss progress since the last meeting and goals for the next week. These meetings would also be the time to discuss anything related to the interconnection between the server and UI (the API) and any extensions that were required to facilitate the week’s work.

To aid testing of the incremental builds we maintained a number of different data sources, so that at any one time we had multiple examples of the mSpace UI attached to multiple data sets. This was important as different data sets required different specialisation and provided a wider testing scope in terms of features and users.

Whilst simple, the techniques above have proved invaluable for our development! Hopefully this has been helpful to some. If people are interested, we could talk more about the techniques and technologies that aided the development of the server and UI.

A public beta of the mSpace code will be available soon!

Sep 11 09

PAXS Export Plugin Released

by jl2

Our export plugin for EPrints3 is ready for download from the PAXS website.

PAXS Export Plugin (r9)

The plugin aims to extend the existing ‘Export All’ feature which is built into the core EPrints3 codebase.

So what exactly does it allow you to do?

Perhaps you are only interested in the metadata of a subset of the results produced by a search? With the ‘Export All’ functionality in EPrints3 you would not be able to choose which results to include in a metadata export: if you got 100 results, you would have 100 records in your export, irrespective of how many were actually of interest to you.

The PAXS Multi Export plugin lets you pick from the list of results which ones you wish to export. It also interfaces with EPrints to offer all the export formats that you have available with regular ‘Export All’. The ‘Export All’ functionality is still included should you need it, but now you have a more refinable export option!

The PAXS Multi Export plugin attaches itself using JavaScript to minimise the number of template files you need to modify in your installation. The plugin will degrade gracefully if JavaScript is not enabled and will revert to the standard ‘Export All’ functionality.

Important: the PAXS Multi Export plugin will work with a default EPrints3 installation out of the box, but you may have some issues if you have heavily customised the DOM layout of your results pages. Should you need help installing the plugin on your own repository, please contact us.

Aug 21 09

Tuning Sphinx to search unicode effectively

by Daniel Alexander Smith

In this post, I’m going to briefly describe some settings that I’ve found make results from Sphinx searches match the expectations of users most closely. Specifically, for things like performing a search on “theatre” and getting results that match “théâtre” too.

Firstly, make sure your database is properly encoding utf8 (see previous posts here for some hints).

In the index definition in your sphinx.conf, you may want to have the following lines.

  • enable_star = 1 means wildcards are on.
  • min_prefix_len = 1 means that people can search for “b*”; by default it’s 3 characters before a *.
  • morphology = stem_en means that query expansion using English word stemming is enabled, so that “walk” will also match “walking”, for example.
  • charset_type = utf-8 tells Sphinx that we’re using UTF-8, and that some characters will be multi-byte because of this.
  • charset_table = ….. is a list of mappings that tell Sphinx to translate certain characters before indexing, which means that e.g.:
    • Unicode code point U+00C0 (À) should be considered equal to a.

    I got this list from http://yob.id.au/blog/2008/05/08/thinking_sphinx_and_unicode/.

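This setup means that we will match “théâtre” when we do a search for “theatre.” Excellent. Note that the charset_table value below is wrapped for readability; in sphinx.conf a value that continues onto the next line needs a trailing backslash on each continued line (or must be kept on a single line).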

	enable_star = 1
	min_prefix_len = 1
	morphology = stem_en
	charset_type = utf-8
	charset_table = 0..9, a..z, _, A..Z->a..z, U+00C0->a, U+00C1->a,
    U+00C2->a, U+00C3->a, U+00C4->a, U+00C5->a, U+00C7->c, U+00C8->e,
    U+00C9->e, U+00CA->e, U+00CB->e, U+00CC->i, U+00CD->i, U+00CE->i,
    U+00CF->i, U+00D1->n, U+00D2->o, U+00D3->o, U+00D4->o, U+00D5->o,
    U+00D6->o, U+00D9->u, U+00DA->u, U+00DB->u, U+00DC->u, U+00DD->y,
    U+00E0->a, U+00E1->a, U+00E2->a, U+00E3->a, U+00E4->a, U+00E5->a,
    U+00E7->c, U+00E8->e, U+00E9->e, U+00EA->e, U+00EB->e, U+00EC->i,
    U+00ED->i, U+00EE->i, U+00EF->i, U+00F1->n, U+00F2->o, U+00F3->o,
    U+00F4->o, U+00F5->o, U+00F6->o, U+00F9->u, U+00FA->u, U+00FB->u,
    U+00FC->u, U+00FD->y, U+00FF->y, U+0100->a, U+0101->a, U+0102->a,
    U+0103->a, U+0104->a, U+0105->a, U+0106->c, U+0107->c, U+0108->c,
    U+0109->c, U+010A->c, U+010B->c, U+010C->c, U+010D->c, U+010E->d,
    U+010F->d, U+0112->e, U+0113->e, U+0114->e, U+0115->e, U+0116->e,
    U+0117->e, U+0118->e, U+0119->e, U+011A->e, U+011B->e, U+011C->g,
    U+011D->g, U+011E->g, U+011F->g, U+0120->g, U+0121->g, U+0122->g,
    U+0123->g, U+0124->h, U+0125->h, U+0128->i, U+0129->i, U+012A->i,
    U+012B->i, U+012C->i, U+012D->i, U+012E->i, U+012F->i, U+0130->i,
    U+0134->j, U+0135->j, U+0136->k, U+0137->k, U+0139->l, U+013A->l,
    U+013B->l, U+013C->l, U+013D->l, U+013E->l, U+0142->l, U+0143->n,
    U+0144->n, U+0145->n, U+0146->n, U+0147->n, U+0148->n, U+014C->o,
    U+014D->o, U+014E->o, U+014F->o, U+0150->o, U+0151->o, U+0154->r,
    U+0155->r, U+0156->r, U+0157->r, U+0158->r, U+0159->r, U+015A->s,
    U+015B->s, U+015C->s, U+015D->s, U+015E->s, U+015F->s, U+0160->s,
    U+0161->s, U+0162->t, U+0163->t, U+0164->t, U+0165->t, U+0168->u,
    U+0169->u, U+016A->u, U+016B->u, U+016C->u, U+016D->u, U+016E->u,
    U+016F->u, U+0170->u, U+0171->u, U+0172->u, U+0173->u, U+0174->w,
    U+0175->w, U+0176->y, U+0177->y, U+0178->y, U+0179->z, U+017A->z,
    U+017B->z, U+017C->z, U+017D->z, U+017E->z, U+01A0->o, U+01A1->o,
    U+01AF->u, U+01B0->u, U+01CD->a, U+01CE->a, U+01CF->i, U+01D0->i,
    U+01D1->o, U+01D2->o, U+01D3->u, U+01D4->u, U+01D5->u, U+01D6->u,
    U+01D7->u, U+01D8->u, U+01D9->u, U+01DA->u, U+01DB->u, U+01DC->u,
    U+01DE->a, U+01DF->a, U+01E0->a, U+01E1->a, U+01E6->g, U+01E7->g,
    U+01E8->k, U+01E9->k, U+01EA->o, U+01EB->o, U+01EC->o, U+01ED->o,
    U+01F0->j, U+01F4->g, U+01F5->g, U+01F8->n, U+01F9->n, U+01FA->a,
    U+01FB->a, U+0200->a, U+0201->a, U+0202->a, U+0203->a, U+0204->e,
    U+0205->e, U+0206->e, U+0207->e, U+0208->i, U+0209->i, U+020A->i,
    U+020B->i, U+020C->o, U+020D->o, U+020E->o, U+020F->o, U+0210->r,
    U+0211->r, U+0212->r, U+0213->r, U+0214->u, U+0215->u, U+0216->u,
    U+0217->u, U+0218->s, U+0219->s, U+021A->t, U+021B->t, U+021E->h,
    U+021F->h, U+0226->a, U+0227->a, U+0228->e, U+0229->e, U+022A->o,
    U+022B->o, U+022C->o, U+022D->o, U+022E->o, U+022F->o, U+0230->o,
    U+0231->o, U+0232->y, U+0233->y, U+1E00->a, U+1E01->a, U+1E02->b,
    U+1E03->b, U+1E04->b, U+1E05->b, U+1E06->b, U+1E07->b, U+1E08->c,
    U+1E09->c, U+1E0A->d, U+1E0B->d, U+1E0C->d, U+1E0D->d, U+1E0E->d,
    U+1E0F->d, U+1E10->d, U+1E11->d, U+1E12->d, U+1E13->d, U+1E14->e,
    U+1E15->e, U+1E16->e, U+1E17->e, U+1E18->e, U+1E19->e, U+1E1A->e,
    U+1E1B->e, U+1E1C->e, U+1E1D->e, U+1E1E->f, U+1E1F->f, U+1E20->g,
    U+1E21->g, U+1E22->h, U+1E23->h, U+1E24->h, U+1E25->h, U+1E26->h,
    U+1E27->h, U+1E28->h, U+1E29->h, U+1E2A->h, U+1E2B->h, U+1E2C->i,
    U+1E2D->i, U+1E2E->i, U+1E2F->i, U+1E30->k, U+1E31->k, U+1E32->k,
    U+1E33->k, U+1E34->k, U+1E35->k, U+1E36->l, U+1E37->l, U+1E38->l,
    U+1E39->l, U+1E3A->l, U+1E3B->l, U+1E3C->l, U+1E3D->l, U+1E3E->m,
    U+1E3F->m, U+1E40->m, U+1E41->m, U+1E42->m, U+1E43->m, U+1E44->n,
    U+1E45->n, U+1E46->n, U+1E47->n, U+1E48->n, U+1E49->n, U+1E4A->n,
    U+1E4B->n, U+1E4C->o, U+1E4D->o, U+1E4E->o, U+1E4F->o, U+1E50->o,
    U+1E51->o, U+1E52->o, U+1E53->o, U+1E54->p, U+1E55->p, U+1E56->p,
    U+1E57->p, U+1E58->r, U+1E59->r, U+1E5A->r, U+1E5B->r, U+1E5C->r,
    U+1E5D->r, U+1E5E->r, U+1E5F->r, U+1E60->s, U+1E61->s, U+1E62->s,
    U+1E63->s, U+1E64->s, U+1E65->s, U+1E66->s, U+1E67->s, U+1E68->s,
    U+1E69->s, U+1E6A->t, U+1E6B->t, U+1E6C->t, U+1E6D->t, U+1E6E->t,
    U+1E6F->t, U+1E70->t, U+1E71->t, U+1E72->u, U+1E73->u, U+1E74->u,
    U+1E75->u, U+1E76->u, U+1E77->u, U+1E78->u, U+1E79->u, U+1E7A->u,
    U+1E7B->u, U+1E7C->v, U+1E7D->v, U+1E7E->v, U+1E7F->v, U+1E80->w,
    U+1E81->w, U+1E82->w, U+1E83->w, U+1E84->w, U+1E85->w, U+1E86->w,
    U+1E87->w, U+1E88->w, U+1E89->w, U+1E8A->x, U+1E8B->x, U+1E8C->x,
    U+1E8D->x, U+1E8E->y, U+1E8F->y, U+1E96->h, U+1E97->t, U+1E98->w,
    U+1E99->y, U+1EA0->a, U+1EA1->a, U+1EA2->a, U+1EA3->a, U+1EA4->a,
    U+1EA5->a, U+1EA6->a, U+1EA7->a, U+1EA8->a, U+1EA9->a, U+1EAA->a,
    U+1EAB->a, U+1EAC->a, U+1EAD->a, U+1EAE->a, U+1EAF->a, U+1EB0->a,
    U+1EB1->a, U+1EB2->a, U+1EB3->a, U+1EB4->a, U+1EB5->a, U+1EB6->a,
    U+1EB7->a, U+1EB8->e, U+1EB9->e, U+1EBA->e, U+1EBB->e, U+1EBC->e,
    U+1EBD->e, U+1EBE->e, U+1EBF->e, U+1EC0->e, U+1EC1->e, U+1EC2->e,
    U+1EC3->e, U+1EC4->e, U+1EC5->e, U+1EC6->e, U+1EC7->e, U+1EC8->i,
    U+1EC9->i, U+1ECA->i, U+1ECB->i, U+1ECC->o, U+1ECD->o, U+1ECE->o,
    U+1ECF->o, U+1ED0->o, U+1ED1->o, U+1ED2->o, U+1ED3->o, U+1ED4->o,
    U+1ED5->o, U+1ED6->o, U+1ED7->o, U+1ED8->o, U+1ED9->o, U+1EDA->o,
    U+1EDB->o, U+1EDC->o, U+1EDD->o, U+1EDE->o, U+1EDF->o, U+1EE0->o,
    U+1EE1->o, U+1EE2->o, U+1EE3->o, U+1EE4->u, U+1EE5->u, U+1EE6->u,
    U+1EE7->u, U+1EE8->u, U+1EE9->u, U+1EEA->u, U+1EEB->u, U+1EEC->u,
    U+1EED->u, U+1EEE->u, U+1EEF->u, U+1EF0->u, U+1EF1->u, U+1EF2->y,
    U+1EF3->y, U+1EF4->y, U+1EF5->y, U+1EF6->y, U+1EF7->y, U+1EF8->y,
    U+1EF9->y

In my source definition I also include the following line:

        sql_query_pre           = SET NAMES 'utf8'

This should ensure that all characters read from the database into the Sphinx indexer will be UTF-8.
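
To make the idea behind the charset_table mappings concrete, here is a small Python sketch that derives entries of the same "U+XXXX->letter" form from Unicode canonical decomposition. This is not how the list above was produced (it came from the linked blog post), and the function names are my own. It is also only an approximation: characters with no canonical decomposition, such as ł (U+0142), are in the list above but would not be generated by this approach.

```python
import unicodedata
from typing import List, Optional


def base_letter(ch: str) -> Optional[str]:
    """Return the plain a-z letter under an accented character, or None.

    A character qualifies only if its canonical decomposition (NFD) is an
    ASCII letter followed by nothing but combining marks.
    """
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) < 2:
        return None  # nothing to strip (plain letter, or no decomposition)
    base = decomposed[0].lower()
    marks = decomposed[1:]
    if "a" <= base <= "z" and all(unicodedata.combining(m) for m in marks):
        return base
    return None


def charset_table_entries(start: int, end: int) -> List[str]:
    """Build 'U+XXXX->letter' entries for code points start..end inclusive."""
    entries = []
    for code_point in range(start, end + 1):
        letter = base_letter(chr(code_point))
        if letter is not None:
            entries.append(f"U+{code_point:04X}->{letter}")
    return entries


def fold(text: str) -> str:
    """Apply the same folding to a whole string, as the indexer would to tokens."""
    return "".join(base_letter(ch) or ch for ch in text)


print(", ".join(charset_table_entries(0x00C0, 0x00C5)))
print(fold("th\u00e9\u00e2tre"))
```

Folding both the indexed text and the query in this way is what lets a search for “theatre” match “théâtre”.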

Jul 30 09

MySQL Unicode INSERTs

by Daniel Alexander Smith

Note to anybody trying to get UTF-8 data into MySQL properly: it might help you to know that you can prepend strings with the _utf8 introducer so that they get INSERTed properly, e.g.:

INSERT INTO `People` VALUES (_utf8'Däniél Smîth');

There was a tiny hint in the code sample at the end of the following page that let me in on the secret:

http://dev.mysql.com/tech-resources/articles/4.1/unicode.html
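
If the SQL is being generated by a script, one way to avoid depending on the encoding of the script file or terminal is to combine the _utf8 introducer with a MySQL hexadecimal literal. The sketch below is a minimal illustration under that assumption; the helper name utf8_literal is hypothetical, and for application code a parameterised query over a UTF-8 connection is generally the safer choice.

```python
def utf8_literal(text: str) -> str:
    """Render text as a MySQL hex literal tagged with the _utf8 introducer.

    The string is encoded to UTF-8 bytes and written as hex digits, so the
    resulting SQL contains only ASCII characters.
    """
    hex_bytes = text.encode("utf-8").hex().upper()
    return f"_utf8 X'{hex_bytes}'"


# Same example name as above, written with escapes to keep this file ASCII.
name = "D\u00e4ni\u00e9l Sm\u00eeth"
statement = f"INSERT INTO `People` VALUES ({utf8_literal(name)});"
print(statement)
```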

Jul 30 09

Upgrading PHP and MySQL on RHEL 5.3

by Daniel Alexander Smith

Just a quick note to self that I’ve been using the method from the following site to upgrade PHP to 5.2.x on RHEL 5.3 servers.

http://binit933x.wordpress.com/2009/03/05/how-i-updated-php-mysql-on-redhat-enterprise-linux-rhel-53/

Jul 25 09

SCAMP (Semantic Centralised Aggregated Metadata Platform)

by jl2

Getting rich and usable data from a repository is a problem. The current solution is OAI, but this often suffers from the internal data aligning poorly with the Dublin Core XML standard. JISC regularly run ‘Rapid Innovation’ projects, which are short-term, lightweight development projects, often focusing on repository data. The issue has always been how much ‘innovation’ can actually be done in the short six-month window, given the start-up cost of having to write code to extract useful data from a series of distributed repositories.

A Proposed Solution

We propose a framework that aggregates metadata from all UK EPrints repositories, and provides a programming API in order to access the aggregated set of data in a simple and clean way. We propose to extend the technologies developed under the RichTags project to clean up and perform classification of the aggregated metadata, such as inferring the topic of the paper by the journal or conference to which it is submitted, and make this additional metadata available through the API. Rapid Innovation developers can then concentrate their efforts on how they use the metadata and not how to retrieve it.

The framework aggregates data from repositories and provides an API, an interface for programmers, to develop services that can simply tap into an integrated repository data source/service.

Detailed flow diagram

The framework presents a “push” based architecture whereby a repository plug-in (using live XML-RPC updates upon submission) will notify subscribing services that a new submission has been made. At this point, the aggregator performs classification of the metadata, and pushes the update to subscribed applications immediately, so that all applications use live metadata.
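
The push flow described above can be sketched in a few lines of Python. This is an in-process illustration only: the real proposal uses XML-RPC between the repository plug-in and the aggregator, and the class name, the record fields and the venue-to-topic table here are all hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Subscriber = Callable[[Record], None]

# Hypothetical lookup used to infer a topic from where a paper was submitted.
VENUE_TOPICS = {"Journal of Web Semantics": "Semantic Web"}


class Aggregator:
    """Receives new submissions, classifies them, and pushes them to subscribers."""

    def __init__(self) -> None:
        self._subscribers: List[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        """Register an application that wants live metadata updates."""
        self._subscribers.append(callback)

    def classify(self, record: Record) -> Record:
        """Return a copy of the record with an inferred 'topic' field added."""
        enriched = dict(record)
        venue = record.get("venue", "")
        enriched["topic"] = VENUE_TOPICS.get(venue, "unclassified")
        return enriched

    def notify(self, record: Record) -> Record:
        """Called by the repository plug-in when a new submission is made."""
        enriched = self.classify(record)
        for callback in self._subscribers:
            callback(enriched)  # push immediately; subscribers never poll
        return enriched


received: List[Record] = []
aggregator = Aggregator()
aggregator.subscribe(received.append)
aggregator.notify({"title": "An example paper", "venue": "Journal of Web Semantics"})
print(received)
```

The point of the design is that subscribing applications only ever see already-classified records, and see them as soon as the repository reports the submission.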

Jul 17 09

PAXS Project

by jl2

PAXS (Plugins for Advanced Search & eXport) is a Rapid Innovation project we are doing in association with JISC. We are building two plugins for the EPrints self-archiving repository that aid usability.

We have nightly builds available from our project website: paxs.mspace.fm.

More details will be coming shortly.