OUseful.Info, the blog…

Trying to find useful things to do with emerging technologies in open education

Exploring ONCourse Data a Little More…


Following on from my preliminary tinkering with the University of Lincoln course data API, I had a bit more of a play last night…

In particular, I started looking at the programme level to see how we might be able to represent the modules contained within it. Modules are associated with individual programmes, with a core attribute specifying whether a module is “core” (presumably meaning that it is required).

oncourse programme core module

I thought it might be interesting to try to pull out a “core module” profile for each programme. Looking at the data available for each module, there were a couple of things that immediately jumped out at me as being “profileable” (sp?!) – assessment data (assessment method, weighting, whether it was group work or not):

oncourse assessment

and contact times (that is, hours of contact and contact type):

oncourse contact

I had some code kicking around for doing treemap views so I did a couple of quick demos [code]. First up, a view over the assessment strategy for core modules in a particular programme:

oncourse 0 assessment treemap

(I really need to make the programme explicit…)

Blocks are coloured according to level and sized according to product of points value for the course and assessment type weighting.

We can also zoom in:

oncourse assesssment treemap zoom

I took the decision to use colour to represent level, but it would also be possible to pull the level out into the hierarchy used to configure the tree (for example, having Level 1, Level 2, Level 3 groupings at the top?). I initially used the weighting to set the block size, but then tweaked it to show the product of the weighting and the number of credit points for the module, so for example a 60 point module with exam weighting 50% would have size 60*0.5 = 30, whereas a 15 point module with exam weighting 80% would have size 15*0.8 = 12.

Note that there are other ways we could present these hierarchies. For example, another view might structure the tree as: programme – level – module – assessment type. Different arrangements can tell different stories, be differently meaningful to different audiences, and be useable in different ways… Part of the art comes from picking the view that is most appropriate for addressing a particular problem, question or intention.
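By way of a concrete sketch of that second arrangement, here’s roughly how the programme – level – module – assessment type hierarchy and the block sizes might be assembled before being handed to a treemap renderer. Note that the module records below are made up for the purposes of illustration rather than pulled from the API:

#A minimal sketch - the module records are invented for illustration, not pulled from the API
modules = [
    {'title': 'Module A', 'level': 'Level 1', 'credits': 30,
     'assessments': [{'method': 'Exam', 'weighting': 50}, {'method': 'Coursework', 'weighting': 50}]},
    {'title': 'Module B', 'level': 'Level 2', 'credits': 15,
     'assessments': [{'method': 'Exam', 'weighting': 80}, {'method': 'Coursework', 'weighting': 20}]}
]

#Build a programme - level - module - assessment type hierarchy,
#sizing each leaf block by credit points * assessment weighting
tree = {'name': 'Programme X', 'children': []}
levels = {}
for m in modules:
    if m['level'] not in levels:
        levels[m['level']] = {'name': m['level'], 'children': []}
        tree['children'].append(levels[m['level']])
    node = {'name': m['title'], 'children': []}
    levels[m['level']]['children'].append(node)
    for a in m['assessments']:
        node['children'].append({'name': a['method'],
                                 'size': m['credits'] * a['weighting'] / 100.0})

#For example, Module B's Exam leaf gets size 15 * 0.8 = 12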

Here’s an example of view over the contact hours associated with core modules in the same programme:

oncourse contact hours treemap

(Note that I didn’t make use of the group work attribute which should probably also be added in to the mix?).

Looking at different programmes, we can spot different sorts of profile. Note that there is a lot wrong with these visualisations, but I think they do act as a starting point for helping us think about what sorts of view we might be able to start pulling out of the data now it’s available. For example, how are programmes balanced in terms of assessment or contact over their core modules? One thing developing the above charts got me thinking about was how to step up a level to allow comparison of core module assessment and contact profiles across programmes leading to a particular qualification, or across schools? (I suspect a treemap is not the answer!)

It’s also worth noting that different sorts of view might be appropriate for different sorts of “customer”: potential student choosing a programme, student on a programme, programme manager, programme quality committee, and so on.

And it’s also worth noting that different visualisation types might give a more informative view over the same data structure. On my to do list is to have a play with viewing the data used above in some sort of circular enclosure diagram (or whatever that chart type is called!) for example.

Having had a play, a couple more observations came to mind about the API. Firstly, it could be useful to annotate modules with a numerical (integer) attribute relating to a standardised educational level, such as the FHEQ levels. (At the moment, modules are given level descriptors along the lines of “Level 3”, relating to a third year course, akin to FHEQ level 6). Secondly, relating to assessment, it might be useful (down the line) to know how the grade achieved in a module at a particular level contributes to the final grade achieved at the end of the programme.

Playing with the data, I also found a little bug: in resources of the form https://n2.online.lincoln.ac.uk/module_links/id/273, the nucleus_url is incorrectly given as https://n2.online.lincoln.ac.uk/module_link/id/273 (i.e. the incorrect path element module_link, which should be module_links).

Written by Tony Hirst

January 19, 2013 at 2:21 pm

Posted in Project


Getting ONCourse With Course Data


In a post a couple of weeks ago – COuRsE Data – I highlighted several example questions that might usefully be asked of course data, things like “which modules are associated with any particular qualification?” or “which modules deliver which qualification level learning outcomes?”.

As the University of Lincoln ONCourse project comes to an end [disclosure: I am contracted on the project to do some evaluation], I thought it might be interesting to explore the API that’s been produced to see just what sorts of question we might be able to ask, out of the can.

Note that I don’t have privileged access to the database (though I could possibly request read access, or maybe even a copy of the database, or at least its schema), but that’s not how I tend to work. I play with things that are in principle publicly available, ideally via openly published URLs and without authentication (no VPN); using URL parameter keys is about as locked down as I can usually cope with;-)

So what’s available? A quick skim of the Objects returned via the API turns up some potentially interesting sounding ones in a course data context, such as: Accrediting Body, Assessment, Contact Type and Time, Course Code, Delivery Mode, Learning Outcome, Module Code, Programme Outcome.

So let’s just pick one and see how far we can get… We can start at the bottom maybe, which is presumably Module.

A call to the API on /modules returns listings of the form:

{
  id: 97,
  nucleus_url: "https://n2.online.lincoln.ac.uk/modules/id/97",
  module_code: {
    id: 28,
    nucleus_url: "https://n2.online.lincoln.ac.uk/module_codes/id/28",
    code: "FRS2002M"
  },
  title: "Advanced Crime Scene Investigation and Analysis 2012-13",
  level: {
    id: 2,
    nucleus_url: "https://n2.online.lincoln.ac.uk/levels/id/2",
    description: "Level 2"
  }
}

Looking at the record for an actual module gives us a wealth of data, including: the module code, level and number of credit points; a synopsis and marketing synopsis; an outline syllabus (unstructured text, which could cause layout problems?); the learning and teaching strategy and assessment strategy; a set of module links to programmes the module is associated with, along with core programme data; a breakdown of contact time and assessments; and prerequisites, co-requisites and excluded combinations.

There does appear to be some redundancy in the data provided for a module, though this is not necessarily a bad thing in pragmatic terms (it can make life convenient). For example, the top level of a modules/id record looks like this:

n2 modules-id top level

and lower down the record, in the /assessments element we get duplication of the data:

n2 assessments module data dupe

As something of a tinkerer, this sort of thing works for me – I can grab the /assessments object out of a modules/id result and pass it around with the corresponding module data neatly bundled up too. But a puritan might take issue with the repetition…

Out of the box then, I can already start to write queries on a module if I have the module ID. I’ll give some code snippets of example Python routines as I play my way through the API…

import urllib2, json
def gurl(url):
  return json.load(urllib2.urlopen(url))

def jgrabber(u):
  root='https://n2.online.lincoln.ac.uk'
  return gurl(root+u)

#Note that API calls are actually signed with a key (not shown in URLs)
def dd(): print "-----"

#Can I look up the programmes associated with a module given its ID?
def grabModulesId(id):
  return jgrabber( '/modules/id/'+str(id) )['result']

def printProgsFromModuleID(moduleID,d=''):
  if d=='': d=grabModulesId(moduleID)
  dd()
  print 'Module:', d['title'], '(', d['module_code']['code'], d['level']['description'], ',', d['credit_rating'], 'points)'
  dd()
  for r in d['module_links']:
    print 'Programme:', r['programme']['programme_title']
    print 'Course:', r['programme']['course_title'], "(",r['programme']['course_code']['code'],")"
    dd()

'''Example result:

>>> printProgsFromModuleID(97)
-----
Module: Advanced Crime Scene Investigation and Analysis 2012-13 ( FRS2002M Level 2 , 30 points)
-----
Programme: Forensic Science
Course: Forensic Science Bachelor of Science with Honours (BSc (Hons)) 2011-12 ( FRSFRSUB )
-----
Programme: Criminology and Forensic Invesitgation
Course: Criminology and Forensic Invesitgation Bachelor of Science with Honours (BSc (Hons)) 2011-12 ( CRIFRSUB )
'''

The /module_codes call then allows us to see when this module was presented:

#When was a module presented?
def grabModuleCodesId(id):
  return jgrabber( '/module_codes/id/'+str(id) )['result']

def printPresentationsFromModuleCode(moduleCode,d=''):
  if d=='': d=grabModuleCodesId(moduleCode)
  dd()
  print 'Presentations of',d['code']
  dd()
  for r in d['modules']:
    print r['title']
  dd()

'''Example result:

>>> printPresentationsFromModuleCode(28)
-----
Presentations of FRS2002M
-----
Advanced Crime Scene Investigation and Analysis 2010-11
Advanced Crime Scene Investigation and Analysis 2011-12
Advanced Crime Scene Investigation and Analysis 2012-13
Advanced Crime Scene Investigation and Analysis 2013-14
-----
'''

We can also tweak the first query to build a bigger report. For example:

#Combination effect
def getModuleCodeFromModuleID(moduleID,d=''):
  if d=='':d=grabModulesId(moduleID)
  return d['module_code']['id']

moduleID=97

d=grabModulesId(moduleID)
printProgsFromModuleID(moduleID,d)

moduleCode = getModuleCodeFromModuleID(moduleID,d)
printPresentationsFromModuleCode(moduleCode)

One thing I’m not sure about is the way in to a moduleID in the first instance? For example, I’d like to be able to straightforwardly get the IDs for any upcoming presentations of FRS2002M in the current academic year?

So what else might we be able to do around a module? How about check out its learning outcomes? My first thought was the learning outcomes might be available from the /modules data (i.e. actual module presentations), but they don’t seem to be there. Next thought was to look for them in the abstracted module definition (from the module_codes), but again: no. Hmmm… Assessments are associated with a module, so maybe that’s where the learning outcomes come in (as subservient to assessment rather than a module?)

The assessment record in the module data looks like this:

n2 modules assessement

And if we click through to an assessment record we get something like this, which does include Learning Outcomes:

n2 assessment LOs

Philosophically, I’m not sure about this? I know that assessment is supposed to be tied back to LOs, and quality assurance around a course and its assessment is typically a driver for the use of LOs. But if we can only find the learning outcomes associated with a module via its assessment..? Hmmm… Data modeling is often fraught with philosophical problems, and I think this is one such case?

Anyway, how might we report on the learning outcomes associated with a particular module presentation? Here’s one possible way:

#Look up assessments
def assessmentByModuleID(moduleID,d=''):
  if d=='':d=grabModulesId(moduleID)
  return d['assessments']

#Look up learning outcomes
def learningOutcomesByModuleID(moduleID,d=''):
  if d=='': d=assessmentByModuleID(moduleID)
  dd()
  for r in d:
    assessment=gurl(r['nucleus_url'])['result']
    print assessment['module']['title'], '(', assessment['module']['module_code']['code'], ')', assessment['assessment_method'],'Learning Outcomes:'
    learningOutcomes=assessment['learning_outcomes']
    for learningOutcome in learningOutcomes:
      print '\t- ',learningOutcome['description']
    dd()

#Example call
learningOutcomesByModuleID(97)

'''Example output

Advanced Crime Scene Investigation and Analysis 2012-13 ( FRS2002M ) Exam Learning Outcomes:
	-  Identify factors which affect persistence and transfer of trace evidence materials and demonstrate practical skills in recovery and physical and chemical analysis of trace evidence using a variety of specialised techniques
	-  Discriminate different types of hairs and fibres on the basis of morphology and optical (e.g. birefringence) and chemical properties (e.g. dye composition)
...
'''

We can also query the API to get a list of learning outcomes directly, and this turns up a list of assessments as well as the single (?) module associated with the learning outcome. Does this mean that a particular learning outcome can’t be associated with two modules? I’m not sure that’s right? Presumably, database access would allow us to query learning outcomes by moduleID?

Okay – that’s probably enough for now. The API seems to be easy enough to use, and I guess it wouldn’t be too hard to come up with some Google Spreadsheet formulae to demonstrate how they could be used to pull the course data into that sort of workspace (eg along the lines of Using Data From Linked Data Datastores the Easy Way (i.e. in a spreadsheet, via a formula)).

In the next post, I’ll take a peek at some of the subject keywords, and maybe have a play at walking through the data in order to generate some graphs (eg maybe along the lines of what Jamie has tried out previously: In Search of Similar Courses). There’s also programme level outcomes, so it might be interesting trying to do some sort of comparison between those and the learning outcomes associated with assessment in modules on a programme. And all this, of course, without (too much) privileged access…

PS Sod’s Law from Lincoln’s side means that even though I only touched a fragment of the data, I turned up some errors. So for example, in the documentation on bitbucket the award.md “limit” var was classed as a “bool” rather than an “int”, which suggests a careful eye maybe needs casting over all the documentation (or is that supposed to be my job?! Erm…;-). In programme/id=961 there are a couple of typos: programme_title: “Criminology and Forensic Invesitgation”, course_title: “Criminology and Forensic Invesitgation Bachelor of Science with Honours (BSc (Hons)) 2011-12”. Doing a bit of text mining on the data and pulling out unique words can help spot some typos, though in the above case “Invesitgation” would appear at least twice. Hmmm…

Looking at some of the API calls, it would be generally useful to have something along the lines of offset=N to skip the first N results, as well as returning “total-Results=” in the response. Some APIs provide helper data along the lines of “next:”, where eg next=offset+num_results can be plugged in as the offset in the next call if you want to roll your own paging. (This may be in the API, but I didn’t spot it?) When scripting this, care just needs to be taken to check that a fence post error doesn’t sneak in.
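To make the fence post point concrete, here’s a sketch of the sort of roll-your-own paging loop I have in mind, reusing the jgrabber() helper from above. The offset, limit and total_results parameters are hypothetical rather than confirmed features of the n2 API, and I’m assuming that list calls return their items under a ‘result’ key, so treat this as a pattern rather than working code:

#Hypothetical paging pattern - 'offset', 'limit' and 'total_results' are NOT
#confirmed features of the n2 API; they just illustrate the fence post issue
def pagedResults(path, limit=25):
  results = []
  offset = 0
  while True:
    page = jgrabber(path + '?offset=' + str(offset) + '&limit=' + str(limit))
    items = page['result']
    results.extend(items)
    #Advance by the number of items actually returned, not by the requested limit,
    #otherwise a short page (or an off-by-one at either end) can skip or repeat items
    offset = offset + len(items)
    if len(items) < limit:
      break
    if 'total_results' in page and offset >= page['total_results']:
      break
  return results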

Okay – enough for now. Time to go and play in the snow before it all melts…

Written by Tony Hirst

January 18, 2013 at 3:34 pm

Posted in Project


Further Dabblings with the Cloudworks API


Picking up on A Couple of Proof of Concept Demos with the Cloudworks API, and some of the comments that came in around it (thanks Sheila et al:-), I spent a couple more hours tinkering around with it and came up with the following…

A prettier view, stolen from Mike Bostock (I think?)

prettier view d3js force directed layout

I also added a slider to tweak the layout (opening it up by increasing the repulsion between nodes) [h/t @mhawksey for the trick I needed to make this work] but still need to figure this out a bit more…

I also added in some new parameterised ways of accessing various different views over Cloudworks data using the root https://views.scraperwiki.com/run/cloudworks_network_d3js_force_directed_view_pretti/

Firstly, we can make calls of the form: ?cloudscapeID=2451&viewtype=cloudscapecloudcloudscape

cloudworks cloudscapes by cloud

This grabs the clouds associated with a particular cloudscape (given the cloudscape ID), and then constructs the network containing those clouds and all the cloudscapes they are associated with.

The next view uses a parameter set of the form cloudscapeID=2451&viewtype=cloudscapecloudtags and displays the clouds associated with a particular cloudscape (given the cloudscape ID), along with the tags associated with each cloud:

cloudworks cloudscape cloud tags

Even though there aren’t many nodes or edges, this is quite a cluttered view, so I maybe need to rethink how best to visualise this information?

I’ve also done a couple of views that make use of follower data. For example, here’s how to call on a view that visualises how the folk who follow a particular cloudscape follow each other (this is actually the default if no viewtype is given) -
cloudscapeID=2451&viewtype=cloudscapeinnerfollowers

cloudworks cloudscape innerfollowers

And here’s how to call a view that grabs a particular user’s clouds, looks up the cloudscapes they belong to, then graphs those cloudscapes and the people who follow them: ?userID=1174&viewtype=usercloudcloudscapefollower

cloudworks followers of cloudscapes containing a user's clouds

Here’s another way of describing that graph – followers of cloudscapes containing a user’s clouds.

The optional argument filterNdegree=N (where N is an integer) will filter the displayed network to remove nodes with degree <=N. Here’s the above example, but filtered to remove the nodes that have degree 2 or less: ?userID=1174&viewtype=usercloudcloudscapefollower&filterNdegree=2

cloudworks graph filtered

That is, we prune the graph of people who follow no more than two of the cloudscapes to which the specified user has added a cloud. In other words, we depict folk who follow at least three of the cloudscapes to which the specified user has added a cloud.

(Note that on inspecting that graph it looks as if there is at least one node that has degree 2, rather than degree 3 and above. I’m guessing that it originally had degree 3 or more but that at least one of the nodes it was connected to was pruned out? If that isn’t the case, something’s going wrong…)
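That is essentially what you’d expect from a single pass filter: removing the low degree nodes also removes their edges, which can leave surviving nodes with a degree at or below the threshold. Here’s a quick sketch of the difference using networkx (this is not the actual scraper code, and it uses the networkx 1.x degree() dict API):

import networkx as nx

def filterByDegree(G, N):
  #Single pass: drop nodes with degree <= N once; survivors can still end up
  #with degree <= N after their pruned neighbours' edges disappear
  H = G.copy()
  H.remove_nodes_from([n for n, d in H.degree().items() if d <= N])
  return H

def filterByDegreeRepeatedly(G, N):
  #Keep pruning until every remaining node has degree > N
  H = G.copy()
  lowDegree = [n for n, d in H.degree().items() if d <= N]
  while lowDegree:
    H.remove_nodes_from(lowDegree)
    lowDegree = [n for n, d in H.degree().items() if d <= N]
  return H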

Also note that it would be neater to pull in the whole graph and filter the d3.js rendered version interactively, but I don’t know how to do this?

However…I also added a parameter to the script that generates the JSON data files from data pulled from the Cloudworks API calls that allows me to generate a GEXF network file that can be saved as an XML file (.gexf suffix, cf. Visualising Networks in Gephi via a Scraperwiki Exported GEXF File) and then visualised using a tool such as Gephi. The trick? Add the URL parameter &format=gexf (the (optional) default is &format=json) [example].

gephiview of cloudworks graph
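Because the graph is already assembled as a networkx object, the GEXF branch doesn’t amount to much more than switching the serialiser. Here’s a rough sketch of the idea (the actual view script streams the output back over HTTP rather than writing a local file, and also sets the response content type):

import json, networkx as nx
from networkx.readwrite import json_graph

def exportGraph(DG, format='json', fname='cloudworks_graph.gexf'):
  #DG is the networkx graph assembled from the Cloudworks API calls
  if format == 'gexf':
    nx.write_gexf(DG, fname) #GEXF XML, loadable directly into Gephi
    return fname
  return json.dumps(json_graph.node_link_data(DG))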

Gephi, of course, is a wonderful tool for the interactive exploration of graph-based data sets…. including a wide range of filters…

So, where are we at? The d3.js force directed layout is all very shiny but the graphs quickly get cluttered. I’m not sure if there are any interactive parameter controls I can add, but at the moment the visualisations border on the useless. At the very least, I need to squirt a header into the page from the supplied parameters so we know what the visualisation refers to. (The data I’ve played with to date – which has been very limited – doesn’t seem to be that interesting either from what I’ve seen? But maybe the rich structure isn’t there yet? Or maybe there is nothing to be had from these simple views?)

It may be worth exploring some other visualisation types to see if they are any more legible, at least, though it would be even more helpful if they were simply more informative ;-)

PS just in case, here’s a link to the Cloudworks API documentation.

PPS if there are any terms of service associated with the API, I didn’t read them. So if I broke them, oops. But that said – such is life; never ever trust that anybody you give data to will look after it;-)

Written by Tony Hirst

January 17, 2013 at 11:00 pm

A Couple of Proof of Concept Demos With the Cloudworks API


Via a tweet from @mhawksey in response to a tweet from @sheilmcn, or something like that, I came across a post by Sheila on the topic of Cloud gazing, maps and networks – some thoughts on #oldsmooc so far. The post mentioned a prototyped mindmap style browser for Cloudworks, created in part to test out the Cloudworks API.

Having tinkered with mindmap style presentations using the d3.js library in the browser before (Viewing OpenLearn Mindmaps Using d3.js; the app itself may well have rotted by now) I thought I’d have a go at exploring something similar for Cloudworks. With an API key promptly delivered by Nick Freear, it only took a few minutes to repurpose an old script to cast a test call to the Cloudworks API into a form that could easily be visualised using the d3.js library. The approach I took? To grab JSON data from the API, construct a tree using the Python networkx library, and drop a JSON serialisation of the network into a templated d3.js page. (networkx has a couple of JSON export functions that will create tree based and graph/network based JSON data structures that d3.js can feed from.)

Here’s the Python fragment:

#http://cloudworks.ac.uk/api/clouds/{cloud_id}.{format}?api_key={api_key}

import urllib2,json, networkx as nx
from networkx.readwrite import json_graph

id=cloudscapeID #need logic

urlstub="http://cloudworks.ac.uk/api/"
urlcloudscapestub=urlstub+"cloudscapes/"+str(id)
urlsuffix=".json?api_key="+str(key)

ctyp="/clouds"
url=urlcloudscapestub+ctyp+urlsuffix

entities=json.load(urllib2.urlopen(url))

#print entities

#I seem to remember issues with non-ascii before, though maybe that was for XML? Hmmm...
def ascii(s): return "".join(i for i in s.encode('utf-8') if ord(i)<128)

def graphRoot(DG,title,root=1):
    DG.add_node(root,name=ascii(title))
    return DG,root

def gNodeAdd(DG,root,node,name):
    node=node+1
    DG.add_node(node,name=ascii(name))
    DG.add_edge(root,node)
    return DG,node

DG=nx.DiGraph()
DG,root=graphRoot(DG,str(id))
currnode=root

#This simple example just grabs a list of clouds associated with a cloudscape
for c in entities['items']:
    DG,currnode=gNodeAdd(DG,root,currnode,c['title'])
    
#We're going to use the tree based JSON data format to feed the d3.js mindmap view
jdata = json_graph.tree_data(DG,root=1)
#print json.dumps(jdata)

#The page template is defined elsewhere.
#It loads the JSON from a declaration in the Javascript of the form: jsonData=%(jdata)s
print page_template % vars()

The rendered view is something along the lines of:

cloudscapeTree

You can find the original code here.

Now I know that: a) this isn’t very interesting to look at; and b) doesn’t even work as a navigation surface, but my intention was purely to demonstrate a recipe for getting data out of the Cloudworks API and into a d3.js mindmap view in the browser, and it does that. A couple of obvious next steps: i) add in additional API calls to grow the tree (easy); ii) linkify some of the nodes (I’m not sure I know how to do that at the moment?)

Sheila’s post ended with a brief reflection: “I’m also now wondering if a network diagram of cloudscape (showing the interconnectedness between clouds, cloudscapes and people) would be helpful ? Both in terms of not only visualising and conceptualising networks but also in starting to make more explicit links between people, activities and networks.”

So here’s another recipe, again using networkx but this time dropping the data into a graph based JSON format and using the d3.js force based layout to render it. What the script does is grab the followers of a particular cloudscape, grab each of their followers, and then graph how the followers of a particular cloudscape follow each other.

Because I had some problems getting the data into the template, I also used a slightly different wiring approach:

import urllib2,json,scraperwiki,networkx as nx
from networkx.readwrite import json_graph

id=cloudscapeID #need logic
typ='cloudscape'

urlstub="http://cloudworks.ac.uk/api/"
urlcloudscapestub=urlstub+"cloudscapes/"+str(id)
urlsuffix=".json?api_key="+str(key)

ctyp="/followers"
url=urlcloudscapestub+ctyp+urlsuffix
entities=json.load(urllib2.urlopen(url))

def ascii(s): return "".join(i for i in s.encode('utf-8') if ord(i)<128)

def getUserFollowers(id):
    urlstub="http://cloudworks.ac.uk/api/"
    urluserstub=urlstub+"users/"+str(id)
    urlsuffix=".json?api_key="+str(key)

    ctyp="/followers"
    url=urluserstub+ctyp+urlsuffix
    results=json.load(urllib2.urlopen(url))
    #print results
    f=[]
    for r in results['items']: f.append(r['user_id'])
    return f

DG=nx.DiGraph()

followerIDs=[]

#Seed graph with nodes corresponding of followers of a cloudscape
for c in entities['items']:
    curruid=c['user_id']
    DG.add_node(curruid,name=ascii(c['name']).strip())
    followerIDs.append(curruid)

#construct graph of how followers of a cloudscape follow each other
for c in entities['items']:
    curruid=c['user_id']
    followers=getUserFollowers(curruid)
    for followerid in followers:
        if followerid in followerIDs:
            DG.add_edge(curruid,followerid)

scraperwiki.utils.httpresponseheader("Content-Type", "text/json")

#Print out the json representation of the network/graph as JSON
jdata = json_graph.node_link_data(DG)
print json_graph.dumps(jdata)

In this case, I generate a JSON representation of the network that is then loaded into a separate HTML page that deploys the d3.js force directed layout visualisation, in this case how the followers of a particular cloudscape follow each other.

cloudworks_innerfrendsNet

This hits the Cloudworks API once for the cloudscape, then once for each follower of the cloudscape, in order to construct the graph and then pass the JSON version to the HTML page.

Again, I’m posting it as a minimum viable recipe that could be developed as a way of building out Sheila’s idea (though the graph definition would probably need to be a little more elaborate, eg in terms of node labeling). Some work on the graph rendering probably wouldn’t go amiss either, eg in respect of node sizing, colouring and labeling.

Still, what do you expect in just a couple of hours?!;-)

Written by Tony Hirst

January 16, 2013 at 10:08 pm

Posted in Tinkering


The Basics of Betting as a Way of Keeping Score…

with 8 comments

Another preparatory step before I start learning about stats in the context of Formula One… There are a couple of things I’m hoping to achieve when I actually start the journey: 1) finding ways of using stats to help to pull out patterns and events that are interesting from a storytelling or news perspective; 2) seeing if I can come up with any models that help forecast or predict race winners or performances over a race weekend.

There are a couple of problems I can foresee (?!) when it comes to the predictions: firstly, unlike horseracing, there aren’t that many F1 races each year to test the predictions against. Secondly, how do I even get a baseline start on the probabilities that driver X or team Y might end up on the podium?

It seems to me as if betting odds provide one publicly available “best guess” at the likelihood of any driver winning a race (a range of other bets are possible, of course, that give best guess predictions for other situations…) Having had a sheltered life, the world of betting is completely alien to me, so here’s what I think I’ve learned so far…

Odds are related to the anticipated likelihood of a particular event occurring and represent the winnings you get back (plus your stake) if a particular event happens. So 2/1 (2 to 1) fractional odds say: if the event happens, you’ll get 2 back for every 1 you placed, plus your stake back. If I bet 1 unit at 2/1 and win, I get 3 back: my original 1 plus 2 more. If I bet 3, I get 9 back: my original 3 plus 2 for every 1 I placed. Since I placed 3 1s, I get back 3 x 2 = 6 in winnings. Plus my original 3, which gives me 9 back on 3 staked, a profit of 6.

Odds are related (loosely) to the likelihood of an event happening. 2/1 odds represent a likelihood (probability) that an event will happen (1/3) = 0.333… of the time (to cut down on confusion between fractional odds and fractional probabilities, I’ll try to remember to put the fractional probabilities in brackets; so 1/2 is fractional odds of 2 to 1 on, and (1/2) is a probability of one half). To see how this works, consider an evens bet, fractional odds of 1/1, such as someone might make for tossing a coin. The probability of getting heads on a single toss is (1/2); the probability of getting tails is also (1/2). If I’m giving an absolutely fair book based on these likelihoods, I’d offer you even odds that you get a head, for example, on a single toss. After all, it’s fifty/fifty (a fifty per cent chance either way) whether heads or tails will land face up. If there are three equally possible outcomes, (1/3) each, then I’d offer 2/1. After all, it’s twice as likely that something other than the single outcome you called would come up. If there are four possible outcomes, I’d offer 3/1, because it’s likely (if we played repeatedly) that three times out of four, you’d be wrong. So every three times out of four you’d lose and I’d take your stake. And on the fourth go, when you get it right, I give you your stake back for that round plus three for winning, so over all we’d be back where we started.

Decimal odds are a way of describing the return you get on a unit stake. So for a 2/1 bet, the decimal odds are 3. For a 4/1 bet they’d be 5. For an N/1 bet they’d be 1+N. For a 1/2 (two to one on) bet they’d be 1.5, for a 1/10 bet they’d be 1.1. So for a 1/M bet, 1+1/M. Generally, for an N/M bet, decimal odds are 1+N/M.

Decimal odds give an easy way in to calculating the likelihood of an event. Decimal odds of 3, (that is, fractional odds 2/1), describe an event that will happen (1/3) of the time in a fair game. That is (1/(decimal odds)) of the time. For fractional odds of N/M, you expect the event to happen with probability (1/(1+N/M))

In a completely fair book (?my phrase), the implied probabilities of all the possible outcomes should sum to 1. Bookmakers weight the odds in their favour though, so the summed probabilities on a book will add up to more than 1 – this represents the bookmaker’s margin. If you’re betting on the toss of a coin with a bookie, they may offer you 99/100 for heads, evens for tails. If you play 400 games and bet 200 heads and 200 tails, winning 100 of each, you’ll overall stake 400, win 100 (plus 100 back) on tails along with 99 (plus 100 original stake) on heads. That is, you’ll have staked 400 and got back 399. The bookie will be 1 up overall. The summed probabilities add up to more than 1, since (1/2) + (1/(1+99/100)) = (0.5 + ~0.5025) > 1.
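As a quick sanity check on the arithmetic, here’s the fractional-to-decimal-to-probability conversion, along with the margin (sometimes referred to as the overround) on that 99/100 heads / evens tails coin book, sketched out in a few lines of Python:

def decimalOdds(n, m=1):
  #Fractional odds n/m -> decimal odds (total return per unit staked)
  return 1.0 + float(n) / m

def impliedProb(n, m=1):
  #Probability implied by fractional odds n/m in a fair book
  return 1.0 / decimalOdds(n, m)

print(impliedProb(2))      # 2/1 -> 0.333...
print(impliedProb(1, 2))   # 1/2 (two to one on) -> 0.666...

#Bookmaker's margin on the coin book: evens on tails, 99/100 on heads
print(impliedProb(1) + impliedProb(99, 100))  # ~1.0025, i.e. sums to more than 1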

One off bets are no basis for a strategy. You need to bet regularly. One way of winning is to follow a value betting strategy where you place bets on outcomes that you predict are more likely than the odds you’re offered. This is counter to how the bookie works. If a bookie offers you fractional odds of 3/1 (expectation that the event will happen (1/4) of the time), and you have evidence that suggests it will happen (1/3) of the time (decimal odds of 3, fractional odds 2/1) then it’s worth your while repeatedly accepting the bet. After all, if you play 12 rounds, you’ll wager 12, and win on 12/3=4 occasions, getting 4 back (3 + your stake) each time, to give you a net return of 4 x 4 – 12 = 16 – 12 = +4. If the event had happened at the bookie’s predicted likelihood of 1/4 of the time, you would have got back ( 12/4 ) * 4 – 12 = +0 overall.

I’ve tried to do an R script to explore this:

#My vocabulary may be a bit confused herein
#Corrections welcome in the comments from both statisticians and gamblers;-)

#The offered odds
price=4 #3/1 -> 3+1 That is, the decimal odds on fractional odds of 3/1
odds=1/price

#The odds I've predicted
myodds=1/3 #2/1 -> 1/(2+1)

#The number of repeated trials in the game
trials=10000

#The amount staked
bet=1

#The experiment that we'll run trials number of times
expt=function(trials,odds,myodds,bet){
  #trial sets a uniform random number in range 0..1
  df=data.frame(trial=runif(1:trials))
  #The win condition happens at my predicted odds, ie if trial value is less than my odds
  #So if my odds are (1/4) = 0.25, a trial value in range 0..0.25 counts as a win
  # (df$trial<myodds) is TRUE if trial < myodds, which is cast by as.integer() to value 1
  # If (df$trial<myodds) is FALSE, as.integer() returns 0
  df$win=as.integer(df$trial<myodds)
  df$bet=bet
  #The winnings are calculated at the offered odds and are net of the stake
  #The df$win/odds = 1/odds = price (the decimal odds) on a win, else 0
  #The actual win is the product of the stake (bet) and the decimal odds
  #The winnings are the return net of the initial amount staked
  #Where there is no win, the winnings are a loss of the value of the bet
  df$winnings=df$bet*df$win/odds-df$bet
  df
}

df.e=expt(trials,odds,myodds,bet)

#The overall net winnings
sum(df.e$winnings)

#If myodds > odds, then I'm likely to end up winning on a value betting strategy

#A way of running the experiment several times
#There are probably better R protocols for doing this?
runs=10
df.r=data.frame(s=numeric(),v=numeric())
for (i in 1:runs){
  e=expt(trials,odds,myodds,bet)
  df.r=rbind(df.r,data.frame(s=sum(e$winnings),v=sd(e$winnings)))
}

#It would be nice to do some statistical graphics demonstrations of the different distributions of possible outcomes for different regimes. For example:
## different combinations of odds and myodds
## different numbers of trials
## different bet sizes

There are apparently also “efficient” ways of working out what stake to place (the “staking strategy”). The value strategy gives you the edge to win over the long term; the staking strategy is how you maximise profits. See for example Horse Racing Staking and Betting: Horse racing basics part 2 or more mathematical treatments, such as The Kelly Criterion.

There is possibly some mileage to be had in getting to grips with R modeling using staking strategy models as an opening exercise, along with statistical graphical demonstrations of the same, but that is perhaps a little off topic for now…

To recap then, what I think I’ve learned is that we can test predictions against the benchmark of offered odds. The offered odds in themselves give us a ballpark estimate of what the (expert) bookmakers, as influenced by the betting/prediction market, expect the outcome of an event to be. Note that the odds are rigged to give summed probabilities over a range of events happening to be greater than 1, to build in a profit margin (does it have a proper name?) for the bookmaker. If we have a prediction model that appears to offer better odds on an event than the odds that are actually offered, and we believe in our prediction, we can run a value betting strategy on that basis and hopefully come out, over the long term, with a profit. The size of the profit is in part an indicator of how much more accurate our model is as a predictive model than the expert knowledge and prediction market basis that is used to set the bookie’s odds.

Written by Tony Hirst

January 16, 2013 at 1:11 am

Posted in Anything you want, Rstats


#Midata Is Intended to Benefit Whom, Exactly?


A CTRL-Shift blog post entitled MIDATA Legislation Begins mentions, but doesn’t link to, “an amendment to the Enterprise and Regulator Reform Bill in the House of Lords”, presumably referring to paragraphs 58C*, 58D* and 58E* proposed by Viscount Younger of Leckie in the Seventh Marshalled List of Amendments:

58C*

Insert the following new Clause—

“Supply of customer data

(1) The Secretary of State may by regulations require a regulated person to provide customer data—

(a) to a customer, at the customer’s request;

(b) to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.

(2) “Regulated person” means—

(a) a person who, in the course of a business, supplies gas or electricity to any premises;

(b) a person who, in the course of a business, provides a mobile phone service;

(c) a person who, in the course of a business, provides financial services consisting of the provision of current account or credit card facilities;

(d) any other person who, in the course of a business, supplies or provides goods or services of a description specified in the regulations.

(3) “Customer data” means information which—

(a) is held in electronic form by or on behalf of the regulated person, and

(b) relates to transactions between the regulated person and the customer.

(4) Regulations under subsection (1) may make provision as to the form in which customer data is to be provided and when it is to be provided (and any such provision may differ depending on the form in which a request for the data is made).

(5) Regulations under subsection (1)—

(a) may authorise the making of charges by a regulated person for complying with requests for customer data, and

(b) if they do so, must provide that the amount of any such charge—

(i) is to be determined by the regulated person, but

(ii) may not exceed the cost to that person of complying with the request.

(6) Regulations under subsection (1)(b) may provide that the requirement applies only if the authorised person satisfies any conditions specified in the regulations.

(7) In deciding whether to specify a description of goods or services for the purposes of subsection (2)(d), the Secretary of State must (among other things) have regard to the following—

(a) the typical duration of the period during which transactions between suppliers or providers of the goods or services and their customers take place;

(b) the typical volume and frequency of the transactions;

(c) the typical significance for customers of the costs incurred by them through the transactions;

(d) the effect that specifying the goods or services might have on the ability of customers to make an informed choice about which supplier or provider of the goods or services, or which particular goods or services, to use;

(e) the effect that specifying the goods or services might have on competition between suppliers or providers of the goods or services.

(8) The power to make regulations under this section may be exercised—

(a) so as to make provision generally, only in relation to particular descriptions of regulated persons, customers or customer data or only in relation to England, Wales, Scotland or Northern Ireland;

(b) so as to make different provision for different descriptions of regulated persons, customers or customer data;

(c) so as to make different provision in relation to England, Wales, Scotland and Northern Ireland;

(d) so as to provide for exceptions or exemptions from any requirement imposed by the regulations, including doing so by reference to the costs to the regulated person of complying with the requirement (whether generally or in particular cases).

(9) For the purposes of this section, a person (“C”) is a customer of another person (“R”) if—

(a) C has at any time, including a time before the commencement of this section, purchased (whether for the use of C or another person) goods or services supplied or provided by R or received such goods or services free of charge, and

(b) the purchase or receipt occurred—

(i) otherwise than in the course of a business, or

(ii) in the course of a business of a description specified in the regulations.

(10) In this section, “mobile phone service” means an electronic communications service which is provided wholly or mainly so as to be available to members of the public for the purpose of communicating with others, or accessing data, by mobile phone.”

58D*

Insert the following new Clause—

“Supply of customer data: enforcement

(1) Regulations may make provision for the enforcement of regulations under section (Supply of customer data) (“customer data regulations”) by the Information Commissioner or any other person specified in the regulations (and, in this section, “enforcer” means a person on whom functions of enforcement are conferred by the regulations).

(2) The provision that may be made under subsection (1) includes provision—

(a) for applications for orders requiring compliance with the customer data regulations to be made by an enforcer to a court or tribunal;

(b) for notices requiring compliance with the customer data regulations to be issued by an enforcer and for the enforcement of such notices (including provision for their enforcement as if they were orders of a court or tribunal).

(3) The provision that may be made under subsection (1) also includes provision—

(a) as to the powers of an enforcer for the purposes of investigating whether there has been, or is likely to be, a breach of the customer data regulations or of orders or notices of a kind mentioned in subsection (2)(a) or (b) (which may include powers to require the provision of information and powers of entry, search, inspection and seizure);

(b) for the enforcement of requirements imposed by an enforcer in the exercise of such powers (which may include provision comparable to any provision that is, or could be, included in the regulations for the purposes of enforcing the customer data regulations).

(4) Regulations under subsection (1) may—

(a) require an enforcer (if not the Information Commissioner) to inform the Information Commissioner if the enforcer intends to exercise functions under the regulations in a particular case;

(b) provide for functions under the regulations to be exercisable by more than one enforcer (whether concurrently or jointly);

(c) where such functions are exercisable concurrently by more than one enforcer—

(i) designate one of the enforcers as the lead enforcer;

(ii) require the other enforcers to consult the lead enforcer before exercising the functions in a particular case;

(iii) authorise the lead enforcer to give directions as to which of the enforcers is to exercise the functions in a particular case.

(5) Regulations may make provision for applications for orders requiring compliance with the customer data regulations to be made to a court or tribunal by a customer who has made a request under those regulations or in respect of whom such a request has been made.

(6) Subsection (8)(a) to (c) of section (Supply of customer data) applies for the purposes of this section as it applies for the purposes of that section.

(7) The Secretary of State may make payments out of money provided by Parliament to an enforcer.

(8) In this section, “customer” and “regulated person” have the same meaning as in section (Supply of customer data).”

58E*

Insert the following new Clause—

“Supply of customer data: supplemental

(1) The power to make regulations under section (Supply of customer data) or (Supply of customer data: enforcement) includes—

(a) power to make incidental, supplementary, consequential, transitional or saving provision;

(b) power to provide for a person to exercise a discretion in a matter.

(2) Regulations under either of those sections must be made by statutory instrument.

(3) A statutory instrument containing regulations which consist of or include provision made by virtue of section (Supply of customer data)(2)(d) may not be made unless a draft of the instrument has been laid before, and approved by a resolution of, each House of Parliament.

(4) A statutory instrument containing any other regulations under section (Supply of customer data) or section (Supply of customer data: enforcement) is subject to annulment in pursuance of a resolution of either House of Parliament.”

Note that 58C/1/b states that data could be released “to a person who is authorised by a customer to receive the data, at the customer’s request or, if the regulations so provide, at the authorised person’s request.” So if I say to my electricity company that they can share the data with you (“a person who is authorised by a customer to receive the data”), the company can share the data with you if I ask them to or if you ask them. Which is presumably a bit like how direct debits work (I sign something and give it to you and you then go to my bank and request access to my bank account). So the proposed legislation seems to allow for (or at least, not exclude?) the creation of data aggregators who might start to aggregate data from a variety of “regulated persons” at my authorisation.

Note that I assume other regulations, such as the Data Protection Act, preclude those data aggregators from acting as data brokers, “companies that collect personal information about consumers from a variety of public and non-public sources and resell the information to other companies” (FTC [the US Federal Trade Commission] to Study Data Broker Industry’s Collection and Use of Consumer Data).

It’s also worth mentioning that the amendment doesn’t actually seem to set about enacting any actual midata legislation: “The Secretary of State may by regulations require…” which is presumably setting up the opportunity for the Secretary of State to bring it about through a Statutory Instrument or similar?

(In passing, the tabled amendments to the Bill also includes amendments relating to proposed amendments to the Copyright, Designs and Patents Act 1988 (part 6 of the Bill, relating to licensing of orphan works, collection licensing, duration of copyright et al.) as well as the creation of a Director General of Intellectual Property Rights (28C).)

The day before, CTRL-Shift had also published a post on Building Relationships for a New Data Age:

The challenge (and opportunity) is to start building an information sharing relationship with customers where both sides use data sharing to save time, cut costs and be more efficient – and to add new value.

In a world that’s rapidly going digital, an information sharing relationship makes it normal for individuals to provide the organisations they deal with new, additional and updated data, and for organisations to also routinely provide customers with additional data or data-based services. Information sharing relationships and services are becoming a key influence on which organisations customers choose to do business with, and how valuable this business becomes.

The question is, how do we get from A to B? From today’s ‘one way’ norm where organisations collect data about customers and send messages to them, to a more equal and valuable information sharing partnership? There are three key pillars to an information sharing relationship:

- establish a trustworthy ‘default setting’ for the use of personal data
- give users/customers control
- earn VPI (volunteered personal information) via new information services.

Volunteered personal information, a phrase straight out of the Facebook playbook…

The post then discusses the importance of getting default settings right, in part to avoid a public backlash and a “loss of trust” when folk realise the terms and conditions allow the companies involved to do whatever it is they say the company can, before describing how companies can Earn VPI via information services:

Getting default settings right and giving users control only create the context needed for a healthy information sharing relationship. They don’t actually get the information flowing. To do that, organisations need to:

- elicit valuable additional information from customers
- release and provide customers with additional information and/or information based services that help them make better decisions and make it easier for them to get stuff done and achieve their goals – i.e. services that add new value.

In theory, eliciting VPI and offering added value information services are two separate things. But in reality they are likely to advance hand in hand: with individuals offering additional information (in an environment they can trust because of default settings and user control) as a way to get additional value from information-driven services.

Hmmm… elicit valuable additional information from customers; and then release and provide customers with … services that add new value (I can play the selective cut and paste game too…;-) #midata is presumably being sold to consumers on the basis of the latter, particularly those services that “help them make better decisions and make it easier for them to get stuff done and achieve their goals”.

And then we read:

In theory, eliciting VPI and offering added value information services are two separate things. But in reality they are likely to advance hand in hand: with individuals offering additional information (in an environment they can trust because of default settings and user control) as a way to get additional value from information-driven services.

In theory, eliciting VPI and offering added value information services are two separate things. In the land where the flowers grow and the flopsy bunnies frolic, blissfully unaware that they are what Farmer McGregor actually sells to the butcher, presumably at a greater price than he can sell the lettuces the flopsy bunnies eat to the local greengrocer. Or something like that.

But in reality sound the drums of doom…in reality they are likely to advance hand in hand. Erm…of course… No-one wants shed loads of transactional data for personal use…with individuals offering additional information as a way to get additional value from information-driven services.

Yep… #midata is a way of getting you to give shed loads of low quality transactional data to third parties (who may or may not aggregate it with other data you grant them access to) and then give them a shed load more data before it actually becomes useful. Because that’s how data works…but it’s not how the dream is sold…

Hmmm… I wonder, does the draft legislation say anything about the extent to which an authorised person is allowed to aggregate and mine data from regulated person(s) that relates to data collected from different customers, either of the same or different regulated persons? Because therein lies another of those “in reality” sources of potential value add…though we really should also try to imagine what those sources might be. (Is receiving targeted ads “value add” for me over random junk mail?)

On the other side of the fence, sort of, we see a Private Member’s Bill (Ten Minute Rule Bill?) from John Denham, Labour MP for Southampton, Itchen (not, apparently, the constituency in which the University of Southampton resides…) on Supermarket price transparency which seeks to require supermarkets “to release pricing data product by product and store by store [update: Supermarket Pricing Information Bill 2012-13]. This price information would not only enable the comparison of basic product prices, but also enable consumers to understand the differences in pricing between stores within the same retail chain, or variations in pricing of goods in different areas and regions.” In addition, it is claimed that the Private Member’s Bill “would also enable efficient scrutiny of special offers, multi-buys, ‘bogofs’ and other price promotions that have been the subject of recent criticism and regulatory action.”.

PS See also So What, #midata? And #yourData, #ourData…

Written by Tony Hirst

January 14, 2013 at 8:46 pm

Posted in Anything you want, Policy


Press Releases and Convenient Report Publication Formats for Data Journalists


One of the things that I’ve been pondering lately is how I increasingly read the news in a “View Source”* frame of mind, wanting to look behind news stories as reported to read the actual survey report, press release, or Hansard report they take their lead from (more from this in a future post…) – see for example Two can play at that game: When polls collide for a peek behind the polls that drove a couple of conflicting recent news stories. Once you start reading news stories in the context of the press releases that drove them, you can often start to see how little journalistic value add there is to a large proportion of particular sorts of news stories. When FutureLearn was announced, most of the early stories were just a restatement of the press release, for example.

[*View Source refers to the ability, in most desktop based web browsers, to view the HTML source code that is used to generate a rendered HTML web page. That is, you can look to see how a particular visual or design effect in web page was achieved by looking at the code that describes how it was done.]

I’m still a little hazy about what the distinguishing features of “data journalism” actually are (for example, Sketched Thoughts On Data Journalism Technologies and Practice), but for the sake of this post let’s just assume that doing something with an actual data file is a necessary part of the process when producing a data driven journalistic story. Note that this might just be limited to re-presenting a supplied data set in a graphical form, or it might involve a rather more detailed analysis that requires, in part, the combination of several different original data sets.

So what might make for a useful “press release” or report publication as far as a data journalist goes? One example might be raw data drops published as part of a predefined public data publication scheme by a public body. But again, for the purposes of this post, I’m more interested in examples of data that is released in a form that reduces the work the data journalist needs to do and yet still allows them to argue that what they’re doing is data journalism, as defined above (i.e. it involves doing something with a dataset…).

Here are three examples that I’ve seen “in the wild” lately, without doing any real sort of analysis or categorisation of the sorts of thing they contain, the way in which they publish the data, or the sorts of commentary they provide around it. That can come later, if anyone thinks there is mileage in trying to look at data releases in this way…

The press release for the UCAS End of Cycle report 2012 includes headline statistical figures, a link to a PDF report, a link to PNG files of the figures used in the report (so that they can be embedded in articles about the report, presumably) and a link to the datasets used to create the figures used in the report.

ucasEndCycleReportPR

Each figure has its own datafile in CSV format:

ucasCycleReportData

Each datafile also contains editorial metadata, such as chart title and figure number:

ucasendcyclereportdatafig

The released data thus allows the data journalist (or the art department of a large news organisation…) to publish their own stylised view of the charts (or embed their own biases in the way they display the data…) and do a very limited amount of analysis on that data. The approach is still slightly short of true reproducibility, or replicability, though – it might take a little bit of effort for us to replicate the figure as depicted from the raw dataset, for example in the setting of range limits for numerical axes. (For an old example of what a replicable report might look like, see How Might Data Journalists Show Their Working?. Note that tools and workflows have moved on since that post was written – I really need to do an update. If you’re interested in examples of what’s currently possible, search for knitr…)

In this sort of release, where data is available separately for each published figure, it may be possible for the data journalist to combine data from different chart-related datasets (if they are compatible) into a new dataset. For example, if two separate charts displayed the performance of the same organisations on two different measures, we might be able to generate a single dataset that lets us plot a “dodged” bar chart showing the performance of each of those organisations against the two measures on the same chart; where two charts compare the behaviour of the same organisations at two different times, we may be able to combine the data to produce a slopegraph. And so on…
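For what it’s worth, if the per-figure data files do share a common organisation column, that sort of recombination is only a few lines of work with something like pandas. The filenames and column names below are made up for the sake of the example:

import pandas as pd

#Hypothetical per-figure data files sharing an 'Institution' column
year1 = pd.read_csv('figure_3_acceptance_rates_2011.csv')
year2 = pd.read_csv('figure_3_acceptance_rates_2012.csv')

#One row per institution, one column per year - the shape needed for a
#slopegraph (or, for two different measures, a "dodged" bar chart)
combined = pd.merge(year1, year2, on='Institution', suffixes=(' 2011', ' 2012'))
print(combined.head())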

The ONS – the Office for National Statistics – had a hard time in December 2012 from the House of Commons Public Administration Committee over its website as part of an inquiry on Communicating and publishing statistics (see also the session the day before). I know I struggle with the ONS website from time to time, but it’s maybe worth considering it as a minimum viable product, and starting to iterate from there…?

So for example, the ONS publishes lots of statistical bulletins using what appears to be a templated format. For example, if we look at the Labour Market Statistics, December 2012, we see a human readable summary of the headline items in the release along with links to specific data files containing the data associated with each chart and a download area for data associated with the release:

ONSstatisitcalbulletin

If we look at the Excel data file associated with a “difference over time” chart, we notice that the data used to derive the difference is also included:

onsexcel

In this case, we could generate a slope graph directly from the datafile associated with the chart, even though not all that information was displayed in the original chart.

(This might then be a good rule of thumb for testing the quality of “change” data supplied as part of a data containing press release – are the original figures that are differenced to create the difference values also released?)

If we follow the data in this release link, we find a set of links to a whole range of downloadable statistical data tables, as well as “Datasets associated with this release”:

onsDatawiththisrelease

It can all start getting a bit rathole, rabbit warren from here on in… For example, here are the datasets associated with the statistical bulletin:

onsDatasets withRelease

Here’s a page for the Labour Market statistics dataset, and so on…

onsDataset

That said, the original statistical bulletin does provide specific data downloads that are closely tied to each chart contained within the bulletin.

The third example is the Chief Medical Officer’s 2012 annual report, a graphically rich report published in November 2012. (It’s really worth a look…) The announcement page mentions that “All of the underlying data used to create the images in this report will be made available at data.gov.uk.” (The link points to the top level of the data.gov.uk site). A second link invites you to Read the CMO’s report, leading to a page that breaks out the report in the form of links to chapter level PDFs. However, that page also describes how “When planning this report, the Chief Medical Officer decided to make available all of the data used to create images in the report”, which in turn leads to a page that contains links to a set of Dropbox pages that allow you to download data on a chapter by chapter basis from the first volume of the report in an Excel format.

Whilst the filenames are cryptic, and the figures in the report not well identified, the data is available, which is a Good Thing. (The page also notes: “The files produced this year cannot be made available in csv format. This option will become available once the Chief Medical Officer’s report is refreshed.” I’m not sure if that means CSV versions of the data will be produced for this report, or will be produced for future versions of the report, in the sense of the CMO’s Annual Report for 2013, etc?)

Once again, though, there may still be work to be done recreating a particular chart from a particular dataset (not least because some of the charts are really quite beautiful!;-) Whilst it may seem a little churlish to complain about a lack of detail about how to generate a particular chart from a particular dataset, I would just mention that one reason the web developed its graphical richness so quickly was that by “Viewing Source” developers could pinch the good design ideas they saw on other websites and implement (and further develop) them simply by cutting and pasting code from one page into another.

What each of the three examples described shows is an opening up of the data immediately behind a chart (and, in at least one example from the ONS, the data from which the differences displayed in a chart were calculated): good examples of a basic form of data transparency. The reader does not have to take a ruler to a chart to work out what value a particular point is (which can be particularly hard on log-log or log-lin scale charts!), they can look it up in the original data table used to generate the chart. Taking them as examples of support for a “View Source” style of behaviour, what other forms of “View Source” supporting behaviour should we be trying to encourage?

PS If we now assume that the PR world is well versed with the idea that there are data journalists (or chart producing graphics editors) out there and that they do produce data bearing press releases for them. How might the PR folk try to influence the stories the data journalists tell by virtue of the data they release to them, and the way in which they release it?

PPS by the by, I noticed today that there is a British Standard Guide to presentation of tables and graphs [ BS 7581:1992 ] (as well as several other documents providing guidance on different forms of “statistical interpretation”). But being a British Standard, you have to pay to see it… unless you have a subscription, of course; which is one of the perks you get as a member of an academic library with just such a subscription. H/T to “Invisible librarian” (in the sense of Joining the Flow – Invisible Library Tech Support) Richard Nurse (@richardn2009) for prefetching me a link to the OU’s subscription on British Standards Online in response to a tweet I made about it:-)

Written by Tony Hirst

January 14, 2013 at 12:02 pm

Posted in onlinejournalismblog

