National Institute for Literacy
Archived Content


Transcript: Adult and Family Literacy

March 15, 2002

Research and Design Methods
Dr. David Francis, University of Houston

Peggy:
Hi. I'm going to let David tell you who he is. What we're trying to do is keep somebody talking to the folks on the web at all times, which is why we're having this little shuffle, so there's always going to be somebody in this sort of hot zone here. So this is Dr. David Francis from the University of Houston.

And I guess I really ought to say something because David's very modest and he won't. And I'll just say in my estimation, he's one of the best research methodologists in the country today. And he's heading a major research project on literacy that's part of our $30 million literacy research network and has worked extensively in the area of reading research. So I think you can trust what David has to tell you.

David Francis:
Thanks, Peggy. Can you hear me OK? As Peggy said, I'm on campus here at the University of Houston, so appropriately I was the last one to get here. What I want to do today is to talk to you about research design from the standpoint of thinking about writing a grant application. And so we're going to talk about research design in a somewhat different way. And you'll notice that the title of this talk is "Many questions and some answers." That is, I'm going to pose a lot more questions to you than I'm going to answer. And that's because I think when you're writing a grant and you're thinking about writing a grant, the quality of that application comes from asking yourself the right questions. Now of course, to do a research proposal, you have to have a question. If you don't have a question, there's not really a proposal there. But anything that's not worth doing is not worth doing well. So you have to start out with a good basic question. But the success of your application will depend a lot more, I think, on the kinds of questions you ask yourself in the process of designing your study and laying out your grant application, and on the answers that you come to in terms of those questions that you ask yourself about the problem that you're dealing with and how best to approach that problem. I have to use the finger method to control the slides, there's no button?

So the way that I want to try to approach this is to talk to you first about a sort of traditional view of experimental design, and we'll talk about some specific criteria for evaluating designs, because the criteria for evaluating studies are the same criteria that will be applied to evaluating your particular experimental design and the study that you propose. And then I'm going to talk to you about a somewhat expanded view of experimental design, one that I think is beneficial to you in terms of writing your grant application. And I'm going to talk to you about this from the standpoint of someone who has written a number of grant applications, but also from the standpoint of someone who has served for a number of years on review panels for NIH and has seen the kinds of problems that show up in applications that come into the NIH and also the Department of Education, and why those applications sometimes fail when the people actually have a good idea, they have a good question to address, the problem that they're addressing is an important one, but they haven't operationalized it in a way that makes it successful as a grant application.

And throughout this process, you will hear me talking about specific aims and the central importance of specific aims to your research proposal. The specific aims are those things which you intend to accomplish in your grant application. And if you think of the specific aims as the backbone of your grant application, really the glue that holds the entire application together, you'll find that your applications will generally tend to be more successful. So throughout this process, as I'm posing questions to you and thinking about ways in which you might want to answer those questions, I'm going to keep referring back to the specific aims, because the answers to the questions that you'll pose will depend completely on your specific aims. And with different aims, you might answer those questions in a very different way. And so, if you can keep in mind throughout the process of writing your grant application what your specific aims are, your applications will be more successful.

And what time did I start and how much time do I have? You'll figure that out and you'll tell me, OK. In general, I hate to lecture. I'd much rather interact with you around this material, so as I talk about it, if you have questions, please raise those questions as I'm going along. Is that like an invalid approach when we're being broadcast on the web? OK.

Peggy:
No, they just have to use the mic so that the people on the web can hear the questions. So if you have questions, I know it's a little bit inconvenient, but just make your way to one of the mics. David will see you standing up and he'll pause for your question. I think we can make it fairly natural and interactive.

D. Francis:
OK, well that would be my hope for how we approach this. So let me talk a little bit about sort of a traditional view of experimental design. Kerlinger has told us that the purpose of experimental design is to help us answer questions as validly, as objectively, as accurately and as economically as possible. And in that regard, he has come up with this "max-min-con" principle. And it's important to sort of keep this principle in mind. Really, design attempts to do three things. One, design attempts to help us maximize systematic variance. And systematic variance is generally the kind of thing that we're trying to study. So for example, if we're interested in integrated approaches to family literacy versus parent-focused approaches to family literacy, our interest there is in the distinction between integrated approaches and parent-centered approaches. And the systematic variability is that which is associated with this distinction between these two types of approaches to helping parents and families gain literacy skills.

Minimizing error variance is attempting to reduce all of those things which will cause variability in the outcomes that we're interested in. So for example, if we're interested in studying family literacy, one of the things we might want to know is the degree to which parents who entered our programs not being able to read are now able to read. And we'll have specific ways in which we intend to measure that. We might measure them reading directly to their child. We might give them a text to read and try to assess their comprehension. We might attempt to assess their ability simply to read words. We might also want to assess their vocabulary skills. All these things are different kinds of behaviors that we would want to measure. And in trying to minimize error variance, what we're trying to do is to make sure that the scores that we get, the observations that we make, are closely tied to the thing that we're interested in trying to measure and are not influenced by other factors such as unreliability in the test. So for example, a short test tends to give scores that will vary quite a bit from situation to situation, whereas a longer test tends to give information that's more precise. So it varies less from situation to situation. So trying to minimize error variance is a way to ensure that the scores that we obtain or the observations that we make are more closely tied to the thing that we're interested in.
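
A minimal sketch of that last point in Python (the noise level and item counts here are made up, not from the talk): it simulates repeated administrations of a short and a long test and shows that scores averaged over more items vary less.

```python
# Illustrative simulation: why a longer test gives more precise scores.
# Each item adds independent measurement error; averaging over more items
# shrinks that error, so observed scores vary less across administrations.
import numpy as np

rng = np.random.default_rng(0)
true_score = 50.0                       # one learner's actual skill level
for n_items in (5, 50):
    # 10,000 simulated administrations of an n_items-item test
    scores = (true_score + rng.normal(0, 10, size=(10_000, n_items))).mean(axis=1)
    print(f"{n_items:>3} items: SD of observed scores = {scores.std():.2f}")
# Expected: roughly 10/sqrt(5) = 4.5 with 5 items, 10/sqrt(50) = 1.4 with 50.
```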

And then controlling extraneous sources of variability. So for example, in our study of looking at integrated approaches to family literacy versus parent-centered approaches, it's entirely possible that one of the things that might affect the outcome of that treatment, the distinction between those treatments, is whether or not the family is a single-parent family or a multi-parent family. And so we might want to consider whether or not we should study single-parent families, multi-parent families or both, because that may affect the outcome of the treatment. So in controlling extraneous sources of variability, we would consider family structure as an extraneous source of variability here. And in controlling it, we have several options. One is we can attempt to homogenize the study with respect to that factor; in other words, we could study only single-parent families. Maybe that's the way for us to approach this problem because, looking at the centers that we have access to, we have a whole lot of single-parent families. So maybe we'll position our study in terms of looking at this specifically in single-parent families.

Or perhaps the second approach, instead of homogenizing the study with respect to this factor, we can actually measure this factor and then control it in the analysis and use it as a variable that attempts to explain some of the treatment effect and why that treatment effect may not be the same in different kinds of families. So our two approaches are to either control it by homogenizing the study with respect to that factor or measuring it and sampling enough units of each type that we can actually assess the effect of that extraneous variable in the analysis of our study.
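
A minimal sketch of that second strategy, assuming the pandas and statsmodels libraries and hypothetical file and column names: measure the extraneous factor and put it in the model, with an interaction that asks whether the treatment contrast differs by family structure instead of averaging over it.

```python
# Illustrative sketch: measure the extraneous factor (family structure)
# and model it, rather than restricting the sample to one family type.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data with columns: post_score, pre_score,
# approach ('integrated' or 'parent_centered'), single_parent (0/1)
df = pd.read_csv("family_literacy.csv")

# C(approach) * single_parent expands to both main effects plus their
# interaction, so we can see whether the treatment effect depends on
# family structure.
model = smf.ols("post_score ~ pre_score + C(approach) * single_parent",
                data=df).fit()
print(model.summary())
```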

Well, when you think about experimental design from a sort of a traditional standpoint, you usually think about several different things. In particular, we'll think about who it is that we're going to study, the conditions that we're going to study them under, and how they're going to become assigned to those conditions. And if you read this overhead, "the means by which the units of observation come to be observed" sounds kind of jargon-ey. You probably think it's written by a statistician instead of someone who understands the English language. It's not really convoluted. It's complicated because when you're studying things like literacy, the units that you want to study are not necessarily people. They may be institutions. They may not be the students, it may be the teachers. It may be the families, it may be schools or classrooms. So I've written this in a way to indicate that one of the things that you need to think about is, first of all, what are the units of observation, and are there multiple units of observation. Am I interested in the children and the parents and the family, which might be different than the observations you would make on the students or the parents separately? Am I interested in the people that are delivering the interventions or the instruction? Am I interested in the centers that are responsible for delivering the project, the intervention?

All of those are different kinds of units of observation and you need to decide what are the units that are most central to your aims. Now typically when we're doing a study, we might be interested in multiple units. For example, in making our evaluations of our literacy trainer's delivery of literacy intervention, we might measure that effect by looking at how the students that are working with that literacy trainer benefit from that intervention. But we might also make observations on the person doing the training, as well. The conditions that we intend to observe, obviously that's the most critical thing in terms of the specific aims of our study. What is it that we're trying to evaluate, what is it that we're trying to understand from a theoretical perspective?

But we also have to decide how it is that the units, that is the things that we're trying to observe, are going to get paired up with the conditions that are important to our specific aims. And there's a number of ways in which that might happen. First of all, we might decide that we want to observe the same people, the same units in multiple conditions. That is, I want to know how something works and the way I'm going to evaluate this is by giving treatment A and then giving treatment B to the same units. Or it might be that I need to give one treatment to one group of individuals and another treatment to another group of individuals, or one approach to instruction to one group and another approach to instruction to another group. It may not make sense in the context of the things that I'm interested in trying to study. For example, an integrated approach versus a parent-centered approach, it may not make sense to try to do that with the same units, to give them a parent-centered approach first and then an integrated approach second. That may or may not make sense. So if you wanted to study the difference between an integrated approach and a parent-centered approach, you might want to study some people under a parent-centered approach and different people under the integrated approach.

And we also have to decide not just am I going to study these conditions on the same individuals or the same units, or am I going to study different units in different conditions, but how am I going to pair them up? When I think about connecting up the individual units with the conditions, am I going to do it randomly, so I'll sort of put the units in a hat and put the treatments in a hat and just kind of pair them up, or is it going to be through some non-random process like self-selection, allowing the units to decide what they'll do? And what kinds of concerns does using a non-random process raise about my ability to know and understand what it is that I'm finding? We're going to talk a lot about that as we go on, so I'm not going to talk more about that now. The main thing to understand here is just that there are different ways in which the units get paired with the conditions. Different approaches have different strengths and weaknesses. And one of the things that you have to do in thinking about your study is, what's the best approach for me to take, given my resources, given my specific aims, given the kinds of things that I can do; what does it make sense for me to do here in terms of pairing up these units with these conditions?
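
Here is a minimal sketch of the "names in a hat" idea, with hypothetical unit labels and group sizes:

```python
# Illustrative sketch: random assignment of units to two conditions,
# so chance rather than self-selection decides the pairing.
import random

random.seed(42)  # fixed seed only so the example is reproducible
families = [f"family_{i:03d}" for i in range(40)]  # hypothetical unit labels
random.shuffle(families)                           # "shake the hat"

half = len(families) // 2
assignment = {f: "integrated" for f in families[:half]}
assignment.update({f: "parent_centered" for f in families[half:]})
print(assignment["family_000"])  # this unit's condition was set by chance
```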

I still don't know my time.

Peggy:
You're doing fine [unintelligible]

D. Francis:
Well I know that but I have a lot of slides.

Peggy:
40 to 45 minutes.

D. Francis:
I'm going to jump over those and just come back to this point of when we think about sort of traditional view of experimental design, we usually think of who, under what conditions and how am I going to make the assignments to the conditions. And typically when you take an experimental design class, that's about what people will talk about and then they'll talk about different ways of making these assignments and how do you analyze the data from those assignments but that's about the extent of it.

And I think that doesn't help you very much when it comes to writing your grants. Well it helps you a little but there are other things that you could think about that would help you more.

We'll still want to ask questions like who will be observed, but we're also going to want to think about what it is that we'll measure and how often we'll measure it. So: what needs to be assessed, when does it need to be assessed, and how frequently does it need to be assessed?

Again, what are the conditions of observation and how do I pair those up. So it's not really all that different and it may seem somewhat to you like a trivial distinction to sort of add these other components but I would like to make the argument that if you can think a little bit more expansively as you are designing your studies and writing up your grant applications, that in fact your applications will be stronger and will hang together better. And in particular if, as you're going through these questions, you stay focused on your specific aims and how the answers to each of these questions gets dictated on the basis of your specific aims, your grants will hang together much better.

So I want to give you a couple of references, because what we're going to talk about next are some criteria for evaluating studies, and these are some specific references that you might want to look at in that regard. In particular, a sort of tried and true reference on experimental design, especially in educational settings, is Campbell and Stanley. This is a fairly old reference at this point but it is still, I think, an excellent reference. I still use it when I teach experimental design. I still make my students read it because I think it's a very readable introduction to thinking about how to evaluate studies. Cook and Campbell is an elaboration on Campbell and Stanley that focused more on quasi-experimental studies. It is more technically difficult to read but it is still an excellent resource. And it focused, as I said, exclusively on studies where randomization is not possible. The book by David Krathwohl, and my apologies to Dr. Krathwohl if I'm not pronouncing his name correctly, is really an elaboration on both the Cook and Campbell and Campbell and Stanley books. And he attempts to update the language somewhat as well as to elaborate somewhat on some of their concepts. So I recommend it to you as well. And then the last book by Rosenbaum is specifically about observational studies. It is a more recent reference and talks about some statistical approaches that are more recent for dealing with studies where you don't have the luxury of random assignment.

Well, in order to do this, meaning to talk about how to evaluate studies, I need to set this up in the sense that I need to make a distinction between scientific knowledge and other forms of knowledge, because we all have a sort of common-sense notion of what we mean by knowledge. But sometimes that doesn't necessarily coincide with what we might mean by scientific knowledge.

Knowledge results when we have a consensus of opinion about something and in science, the consensus gets formed around a set of rules that we agree, within a science, to accept as the way in which we'll form a consensus. Each particular thing that we might agree is a piece of knowledge we might call a knowledge claim. And we sort of hold those knowledge claims in an area as acceptable until we have evidence to suggest that there's some reason to believe that this may not be the case. So we never really confirm knowledge claims in science, we can only dis-confirm them. So we sort of hold them as acceptable until we have evidence to suggest otherwise. The knowledge base in an area is built up out of the set of things that we would consider to be knowledge claims.

So each of these claims, and as a result the base of knowledge, is built up out of the claims that result from individual studies. And it's important to understand that a single individual study does not by itself establish a knowledge claim, nor does it establish a knowledge base. Those knowledge claims and the knowledge base in an area are the result of a collection of studies in an area. And those studies will frequently not be in complete agreement with one another. But they will point in a certain direction until the preponderance of evidence is strong enough to suggest that a certain claim can be accepted, that there is a consensus around a particular claim.

So the goal or the value of any particular study is the degree to which it can reduce uncertainty about a particular claim, with the understanding that no study can reduce that uncertainty from 100 percent to zero. If you think about everything that we might think we know in science, it starts out with an uncertainty of 100 percent. And the goal is to get that uncertainty down to a low enough level that virtually everyone agrees that that's the case, that we can say that we know this.

For example, I suspect if I asked people to raise their hands with respect to who believes that smoking causes lung cancer in humans, most of us would raise our hands. But there is no single study that you can point to that establishes that as a fact. Rather, there is a collection of studies, both treatment studies and cohort studies and a variety of different kinds of studies, including experimental studies in animals. And there is a body of evidence that, when we look at it, we say, "You know, the bulk of the evidence here really suggests that cigarette smoking causes lung cancer in humans." And there aren't very many people that would disagree with that claim today. Now some people might want to qualify it and say, "Well it's not really the tobacco smoking but it's the additives in the tobacco that are responsible for the carcinogenic effects of smoking in humans as it relates to lung cancer." But that's a different issue. So you see how no single study can establish a claim; the claims get established on the basis of a series of studies, and the value of any particular study is the degree to which that particular study reduces uncertainty about a phenomenon.

That implies that we should have some criteria for being able to evaluate these knowledge claims, that is these things that we say we will accept as facts. And the criteria are especially important when we are thinking about how to reduce uncertainty or arrive at consensus in areas where there might only be a partial consensus. So how can we move from having a partial consensus to a stronger consensus?

Well, the criteria for evaluating individual knowledge claims are, in fact, the same criteria that we use to evaluate studies. And I'm going to talk about them from the standpoint of Campbell and Stanley. I've thrown up some additional language that refers to the other texts that have talked about these things but I think the Campbell and Stanley language is the easiest to keep track of and it will really sort of give you an entrée into the entire literature in this area. So I'm going to refer to the Campbell and Stanley language.

And in what remains of this talk, we're really only going to focus on the first three of these. I'm not going to talk a lot about the last one, maybe a little bit. But in particular, we'll think about internal validity, external validity, statistical conclusion validity and construct validity of both cause and effect.

Internal validity is the extent to which we can make a causal inference on the basis of a particular study. So internal validity deals explicitly with our ability to infer a cause/effect relationship between the thing we're manipulating or the thing we're studying and the outcomes that we're looking at. So in the case of looking at integrated versus parent-centered family literacy interventions and their effect on family literacy, the degree to which one of those causes effects on family literacy that are different from the other, that's a question about internal validity. When we're thinking cause/effect and we're thinking about the strength of our study from a cause/effect point of view, we're talking about the internal validity of the study. When we think about the body of evidence in an area from a cause/effect point of view, we think in terms of the collection of studies and the internal validity of that collection of studies.

There are a number of threats that arise to the internal validity of a study. To say that something is a threat, or that it's a potential threat, does not mean that it is in fact occurring in a particular study, that it has happened and that consequently the treatment did not cause the effect. To say that something is a potential threat to the internal validity of a study means that it may have been operating, and it is incumbent on you as an investigator to do what you can to evaluate the likelihood that these threats are, in fact, operational or non-operational in your particular study. So I'm going to talk a little bit about what some of these threats are. I'm not going to go into a lot of detail but just give you sort of a sense of this: history effects are things that are happening external to the treatment and external to the organisms - organisms, god that sounds so bad - the units that you're trying to study, that could have produced the change in the outcome. So for example, things that might be happening in society that could result in a change in the outcome that is separate from the treatment.

So for example, if I have a study that I've done where I'm looking only at one group, I'm looking at integrated family literacy, that's the only thing I've done, and I've got a group of parents and I assess those parents and their families on some literacy outcomes before the treatment. I then put them in a family literacy program that's an integrated family literacy program and I look at their outcomes three months later, and I say, "Well look, there's a difference, therefore it's the family literacy program that caused that difference." Well, the fact is there are many other things that could have been operating to cause that difference between the scores that we saw before the intervention and the scores we saw after the intervention.

History effects would be things happening external to the people. So for example, maybe this community started an alternative family literacy program or a new television show focused on literacy and most of the parents or many of the parents in your program started watching that. And that program had a strong effect on literacy. You don't know about it so there's some event external to your study that, in fact, is responsible for this change that you're seeing.

Maturation effects are effects that are internal to the units. So for example, units get older, they get tired, they get bored, maybe they get energetic and they get excited and that causes them to then engage in more literacy practices, which then are ultimately responsible for the difference between the pre-test and the post-test. So maturation effects are effects that are internal to the units that you're trying to study, as opposed to history effects which are external.

Testing effects are maybe the fact that I gave them the pre-test changes their results on the post-test. So the fact that I'm engaged in assessment causes assessments to improve. That's a possibility.

Instrumentation effects are different from testing effects in that instrumentation effects refer to changes in the instrument being responsible for the change that you see. So for example, being the good experimenter that I am, I give form A at the pre-test of a test and I give form B at the post-test of the test. Unbeknown to me, form B is easier than form A so I get a big improvement on the post-test but it's not because people know more, it's because I've given them an easier test. That's an instrumentation effect.

One place where this occurs where people don't think about it occurring is when you're doing observations and making judgments on the basis of human observation. So for example, if we're observing parents in a literacy setting and we're making judgments about their literacy practices, the observers may become more sensitized to certain kinds of behavior over time, so that they get better at seeing certain kinds of things and therefore we see more of that later in the study than we saw earlier in the study. Or they become desensitized to certain kinds of things, or their criteria for deciding that yes, something happened change with time. Those are instrumentation effects because here the humans making the observations are, in fact, your instruments. So you as an experimenter have to build into your study mechanisms for safeguarding that these kinds of instrumentation effects are not what's responsible for the outcomes that you're seeing.

External validity concerns the degree to which the results that you're finding are going to generalize beyond the specifics of your study. So when I find something in a particular study, to what extent will what I found actually characterize what would be found in another study? If I did this study in a different setting, would I get the same result? If I used a different kind of outcome measure, would I get the same result? If I used a different form of this outcome measure, would I get the same result? What if I accessed a slightly different population? For example, there are different kinds of family literacy centers. Suppose I did my study in one particular type of family literacy center, like neighborhood centers; would I get a different result if I focused on church-based centers? So external validity concerns the degree to which your findings will generalize beyond the specific situations that you're studying in your particular study. You can think about generalization across people, across settings, across time, across measurements.

Rather than talk about threats to external validity, what I'd like to do is talk about the tradeoff between internal validity and external validity, because these are two forces, if you will, that tend to work in opposition to one another. They are both things that we would like our studies to have. We would love for our studies to have strong internal validity, to give us really important information about cause/effect relationships, and at the same time provide us information that we know is going to generalize across settings and across people and across time and across measures. And in many cases, the external validity of a study is partly due to the phenomenon that is being studied. If I were studying basic physiological processes of visual perception, I would not find large differences across settings and across time and across different kinds of people. But when I'm talking about trying to study family literacy outcomes as a result of family literacy interventions, we can expect that that kind of phenomenon is not going to generalize to the same extent that something like basic physiological processes is going to generalize.

So when you go about the process of designing your study and carrying it out, you're going to find that you have these two sort of opposing forces: the desire to have generalizability, which in some ways means having lots of different kinds of people, lots of different kinds of settings, lots of different kinds of measures, things that are sort of uncontrolled is at odds with the desire to have strong internal validity, which requires that you have a lot of control over the settings, a lot of control over the individuals, a lot of control over the delivery of the treatment.

So what happens is we find ourselves having to balance out the desire for strong internal validity with the desire for strong external validity, and all of our studies, all of our studies, are a compromise or a series of compromises between the desire for strong internal validity and the desire for strong external validity. And how do you decide where to come down on this compromise? It's your specific aims. Your aims tell you which of these two forces deserves the most attention. Now hopefully your aims are predicated on a clear understanding of the literature and the problem, so that the problem is also dictating what your study should be accomplishing and therefore where your energy should be focused. The field should be telling you the most important question to be addressed for this particular problem is internal validity, or hey, we know these interventions work, the real critical question for these interventions is for whom do they work the best, under what conditions do they work the best, OK. So the field will dictate to you to a certain extent where this balance should come down, but it should show up in your specific aims.

So why do we want to do randomized studies? Well, randomization is, in fact, the key to strong internal validity. Randomization doesn't rule out all of the threats to internal validity, but it rules out the most straightforward of the operating threats to internal validity. So when you want to design a study that will be strong in internal validity, you start with trying to use randomization. It's the single step that you can take at the outset that will help you with respect to internal validity.
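
A quick simulation can make this concrete; "motivation" below stands in for a hypothetical unmeasured confounder, and the numbers are made up:

```python
# Illustrative simulation: random assignment balances an unmeasured factor
# between groups on average, while self-selection builds in a group
# difference that can masquerade as a treatment effect.
import numpy as np

rng = np.random.default_rng(1)
motivation = rng.normal(0, 1, 1000)          # confounder we never measure

random_group = rng.integers(0, 2, 1000)      # coin-flip assignment
selected = (motivation + rng.normal(0, 1, 1000) > 0).astype(int)  # opt-in

for name, g in (("randomized", random_group), ("self-selected", selected)):
    gap = motivation[g == 1].mean() - motivation[g == 0].mean()
    print(f"{name:>13}: group gap in unmeasured motivation = {gap:+.2f}")
# The randomized gap hovers near zero; the self-selected gap does not.
```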

So why not use randomization? When we have studies that don't involve randomization, we call these quasi-experiments. They're quasi-experiments because they're not "true" experiments. A true experiment implies that we have controlled the assignment of units to conditions in some random way, so that chance has entered into our study in a controlled way, that is, we can assess the effects of chance. But quite frequently in studying family literacy, and in studying education in general, we're not able to assign units randomly to the conditions that they'll be observed in.

There are a number of barriers to the use of random assignment in these settings, so we need to be able to design good quasi-experiments. What are some of the barriers to random assignment? Well, for example, if I am interested in comparing an integrated family literacy approach versus a parent-centered family literacy approach, it would be very hard for me to do that and ask a particular trainer or teacher to use the family-centered approach with certain clients and the parent-centered approach with other clients. Once the teacher gets a sense that, hey, this approach seems to work better, guess what's going to happen in the other approach? Your study is going to fall apart. So randomization of families to teachers is probably not going to work, and probably also not going to work because if the teacher is working a certain schedule and the parents have got a different schedule, then those two can't hook up. So simple randomization wouldn't work there.

But you might be able to randomize centers. You might be able to run family-centered approaches in one group of family literacy centers and parent-centered approaches in another group of family literacy centers. Or you might be able to have some teachers within a center using family-centered, is that what I called it, integrated family approaches, versus parent-centered approaches, so that maybe half the teachers in each center are doing one or the other, though you still have some concerns there about teachers talking with one another. So that's a little bit easier to manage than trying to get the same person to deliver one treatment to one group and another treatment to a different group.

So there are a number of reasons why randomization, or at least simple randomization, may not work. Sometimes it's not ethical. There are certain conditions under which it is not ethical to randomize. We can't randomize people to be smokers or non-smokers. Sometimes we're interested in studying intact groups. We're interested in whether or not family-centered literacy practices have the same effect with Latino families as they do with African-American families or with Caucasian families. We can't randomly assign people to be in one of those three ethnic groups. We're interested in actually studying intact groups and the degree to which our approaches are equally useful across those three different kinds of families. So some variables that we're interested in studying simply don't lend themselves to random assignment. So when that's the case, we have to think about quasi-experiments, and we have to design those quasi-experiments in a way that allows us to make as strong an inference as possible about internal validity. But it will always be weaker than the kind of inference that we could have made about internal validity if we had had the luxury of random assignment, even though the study may be stronger with respect to external validity.

It's important, once you find yourself in the realm of having to design a quasi-experiment, to understand that not all quasi-experiments are created equal. Different quasi-experiments, different designs for quasi-experiments, have certain strengths and weaknesses. And in particular, when you deal in the realm of quasi-experiments, it is important to have multiple time points for observation, to have multiple observations, multiple groups.

For example, a quasi-experiment with a single group will be so seriously threatened in terms of internal validity as to not make the study worth doing. That first study I described to you, where we had one group, we test them at the pre-test, we do integrated family literacy practices and then we measure them at the outcome: there are so many reasons that the difference between the pre-test and the post-test could be due to anything other than the treatment that the study is not worth doing. You have to have multiple groups. That's what "K" on the slide refers to: groups, multiple groups. And if I just had multiple groups and I only assessed them after the literacy practices had been delivered, I wouldn't know whether they were different before. So in a quasi-experiment, you have to have multiple time points of observation.

You have to assess before the intervention and then you have to assess after the intervention. And you need to have multiple measures. "P" refers to measures. You need to measure a large number of things. In general, you want to measure things that you would expect to be related to the treatment, things that should change as a consequence of the treatment, and things that should change as a consequence of the changes that happen because of the treatment. For example, suppose I have a literacy practice that I'm going to engage in where I'm going to help parents develop decoding skills, the ability to recognize words that are printed on a page. My intervention strongly emphasizes decoding skills, so I would expect the largest effect of my intervention to be on decoding skills. But I would also expect - and the only reason for doing a study like this is because I expect - that by improving decoding skills I should also improve comprehension, the ability of people to understand what it is that they read, because they have better access to the words on the page. So I should also measure comprehension, but I should not expect the same size effect on comprehension as I expect on decoding, because there are many other things that also affect comprehension which my treatment has not addressed. So we think of decoding here as a proximal outcome and comprehension as a distal outcome.

And we should measure both proximal and distal outcomes in our quasi-experiments. But it's also helpful to measure things that we expect not to be affected by the treatment. And the reason for that is that if our treatment is what we say it is, then it should affect those things that are related to it, and we should not see differences between these groups in things that are unrelated to it. If we see differences in variables that are unrelated to the treatment, it suggests that our treatment groups were different in some way that we didn't know about, OK. So it is a way of bolstering the argument that there is not some generalized difference between these groups, but rather the difference is specific to what we did with them, OK? So we want measures of proximal outcomes, we want measures of distal outcomes and we want measures of things that ought not be different and ought not change as a consequence of the treatment, in order to triangulate this inference about internal validity. And again, we haven't proved cause/effect, but if our study does these things, then our quasi-experiment will do more to reduce the uncertainty about that cause/effect relationship than if we had not measured both proximal and distal outcomes and measured things that are not related to the treatment.
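
As a sketch of that triangulation, assuming pandas and hypothetical file and column names, one could lay out pre/post gains by group for a proximal outcome, a distal outcome, and a measure the treatment should not touch:

```python
# Illustrative sketch: in a two-group pre/post quasi-experiment, the hoped-for
# pattern is a large group difference in decoding gains (proximal), a smaller
# one in comprehension gains (distal), and none on an unrelated measure;
# a "difference" on the unrelated measure hints at preexisting group gaps.
import pandas as pd

df = pd.read_csv("quasi_experiment.csv")  # hypothetical file with *_pre and
                                          # *_post columns and a 'group' column
for outcome in ("decoding", "comprehension", "unrelated_measure"):
    df[f"{outcome}_gain"] = df[f"{outcome}_post"] - df[f"{outcome}_pre"]
    print(df.groupby("group")[f"{outcome}_gain"].mean(), "\n")
```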

OK, we're in good shape, I think. Statistical conclusion validity refers to: does the statistical evidence imply a true relation? Not necessarily a cause/effect relation, but just a relation. I like to translate this into: have the data been analyzed correctly? And there are a number of reasons why people fail at this particular step. One is that they answer the wrong question. For example, that little letter on the slide that looks like a "P" is actually the Greek letter rho, and it stands for the population correlation. So for example, I might be looking at the correlation between decoding skills and comprehension. And I might want to know if that correlation is the same in one group of clients and in another group of clients. Maybe I've designed an intervention in such a way that I should be disrupting that relationship. I should be making decoding skills more important to comprehension in one group and less important to comprehension in another group.

So my hypothesis is that the correlation of decoding and comprehension is the same in group 1 as it is in group 2. I don't have a pointer, so, that's that, OK. But what I do is I test whether or not the correlation is zero in the first group and then I test whether the correlation is zero in the second group. And let's say I can reject the hypothesis that the correlation is zero in the first group, so I find some non-zero correlation in that group. But in the second group I don't have enough evidence to say that the correlation is not zero. So I say, "Oh, see, the correlation is not the same in these two groups." Not true, I haven't tested that. I tested something very different from that. And it's entirely possible that my test for the first correlation being zero would tell me that it's not, the second would tell me the opposite, and yet if I tested whether or not these two are equal, I couldn't reject that hypothesis, OK. That would be called an invalid statistical conclusion: trying to infer this hypothesis from separate tests of these two hypotheses, OK.
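
The direct test being pointed to here can be sketched with the standard Fisher r-to-z comparison of two independent correlations; the sample correlations and sizes below are hypothetical:

```python
# Illustrative sketch: test H0: rho1 == rho2 directly, instead of running
# two separate "is r zero?" tests and eyeballing which one rejected.
import math
from scipy.stats import norm

def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z test of equal correlations in independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher r-to-z transform
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 2 * norm.sf(abs(z))

# Hypothetical case: r = .40 is "significant" in group 1 and r = .25 "is not"
# in group 2, yet the direct comparison may well fail to reject equality.
z, p = compare_correlations(0.40, 60, 0.25, 40)
print(f"z = {z:.2f}, p = {p:.3f}")
```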

Sometimes in our zeal to get something from our grant, we will simply test anything and everything that we can think of. We sometimes call these fishing expeditions, and the more tests that we run, the more likely we are to find something. Sometimes we call this pony research. That's an off-color joke that I'll tell you when I'm not on camera. The process of testing many hypotheses will lead to the possibility of rejecting some hypotheses not because, in fact, they are worthy of being rejected, but because of statistical artifacts in the study. So we need to take care to make sure that we don't engage in that process.
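
The arithmetic behind the fishing-expedition problem is short enough to show; with 20 tests of true null hypotheses, the numbers follow directly:

```python
# Illustrative arithmetic: with 20 independent tests of true null hypotheses
# at alpha = .05, the chance of at least one false positive is substantial.
alpha, n_tests = 0.05, 20
p_any_false_positive = 1 - (1 - alpha) ** n_tests
print(f"P(at least one spurious 'finding') = {p_any_false_positive:.2f}")  # ~0.64

# One common safeguard is to tighten the per-test threshold (Bonferroni):
print(f"Bonferroni-adjusted alpha = {alpha / n_tests:.4f}")  # 0.0025
```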

Sometimes our analyses will ignore very important assumptions about the statistical design and consequently we'll form invalid conclusions on the basis of our analyses, because we haven't taken into account the real operating characteristics or design features of the study. One example that happens frequently in educational research is that I study children in classrooms, but when I do the analysis, I analyze the data in terms of the children and not in terms of the classrooms. Children in classrooms are not independent of each other. Think about it. If we were all sitting here and I was trying to talk to you about this information and we had somebody standing in the back of this classroom, jumping up and down and shouting and screaming the whole time, and then I did it again in a different environment where that person wasn't here jumping up and down and yelling and screaming, what you got from those two lectures would be quite different. Well, that same thing happens in individual classrooms in schools. So children within a classroom, what they learn or what they acquire from within that classroom is not independent of the other individuals in that classroom. So when I analyze my data, I must take into account the fact that those observations that I make on children in the same classroom are not independent of one another. And if I ignore that assumption, I will tend to have a statistical test that doesn't behave in the way that the data are behaving, OK.
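
One way to see the cost of ignoring that dependence is the standard design-effect formula; the class size and intraclass correlation below are hypothetical:

```python
# Illustrative sketch: with m children per classroom and intraclass
# correlation rho, the design effect 1 + (m - 1) * rho says how much the
# variance of a mean is inflated relative to an independent sample.
def design_effect(m, rho):
    return 1 + (m - 1) * rho

m, rho = 25, 0.15                      # hypothetical class size and ICC
deff = design_effect(m, rho)
print(f"design effect = {deff:.1f}")   # 4.6x: nominal precision is overstated
print(f"effective sample per classroom = {m / deff:.1f} of {m} children")
# Multilevel (mixed) models are the usual way to build this dependence
# into the analysis itself rather than treating children as independent.
```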

So let's talk a little bit about this expanded view of, I've got about five minutes or so, 10 minutes? This is just a recap of what our expanded view is. Who is going to be assessed? This is clearly relevant from the standpoint of external validity, and here we need to distinguish between who is the target population, that is, who are the units that we would really like to study, and who are the units that we actually have access to. My interest might be in teachers in family literacy centers, but what I have access to are teachers in neighborhood family literacy centers in Houston, OK. While that may not be too bad from the standpoint of external validity, it may in fact be that teachers in neighborhood family literacy centers within the city of Houston are a lot like teachers in neighborhood family literacy centers elsewhere in the country. So clearly, you can see the link here between the accessible population that I'm going to be able to work with and the target, which is the broader population. And my real interest is to what extent does what I find in this accessible population relate to this broader target population.

It clearly concerns selection, but it includes a number of other things, such as sampling units: am I going to sample teachers, am I going to sample families, am I going to sample kids? What exactly are the units that are being sampled and assigned to these treatment or intervention conditions? When I think about the study in terms of who will be assessed, some assessments are very expensive to collect. Fred is going to talk to you about some stuff that is really costly. That's why you should never do qualitative research, it's very expensive. So the point is, different kinds of observations have different costs associated with them. And a well-designed study will take that into account, and you may do some kinds of observations on everyone because they're inexpensive to get, and others on a subset of people within the study because they're more costly to get. So you'll make some decisions about which assessments to get on everyone and which assessments to get on subsets of individuals, using the more expensive observations to elaborate on the cheap, easy assessments and get a better sense of what you are missing from them, but not on everyone or not at every occasion of measurement, OK. So we shouldn't just think in terms of, well, if I have to get observational data, I have to get it on everybody at all the same times that I make my cheaper observations.
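
A minimal sketch of that tiered-measurement decision, with hypothetical IDs and subset size:

```python
# Illustrative sketch: give the cheap assessment to everyone, then draw a
# random subset for the expensive observation so the costly data can
# calibrate what the cheap scores miss.
import random

random.seed(7)
everyone = [f"parent_{i:03d}" for i in range(200)]  # all get the cheap test
expensive_subset = random.sample(everyone, k=40)    # only 40 observed in-home
print(len(expensive_subset), "of", len(everyone), "chosen for costly observation")
```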

What is it that is going to be assessed? We already talked about this a little bit, but I just want to hit on some of these points again. You don't want to assess only things that are likely to be affected by the treatment unless you're doing a really tightly controlled experimental study. And I would argue probably even then you'd want more expansive measurement than that. If you're going to take the time to do a study that's going to take you two to three years to do and is going to cost the government more than I'll make in a lifetime to do, then you probably want to know more from the outcome of that study than just did it affect this one particular measure, OK. Usually getting the subjects and completing the study is the hardest part. So we'll collect more assessments to minimize the cost of the next study and point us in the right direction. Again, we want to look at skills that are not likely to be affected by the treatment, as well as those things that are closely tied to the treatment and those things that are related to the treatment but more distally related.

Do I have specific hypotheses about variables that are likely to mediate the effects of the treatment? So for example, a mediating variable in the family literacy study might be time spent engaged in literacy practices outside the literacy study. I might want to ask parents to provide me with a diary of how much time they spend engaged in literacy practices outside of the center, because even though I've delivered good instruction, it may require that people engage in the practice outside of the instructional setting to actually benefit from it. So that might be a mediating variable; I want to be careful how I say this, but once I take into account the amount of time spent, there are not differences between treatment and control: the difference between the treatment and the control is the same at different levels of the mediating variable, time. A moderating variable is something that actually changes the effect of the treatment. Maybe the treatment has a big effect at some levels of this other variable and a small effect at other levels. Maybe time here functions more like a moderator, in that if people don't engage in literacy practices outside of the center, there's no difference between the two kinds of treatment, and when they do engage, there are large effects of the treatment. And the effect is actually graded in relationship to the amount of time: the more time they spend, the bigger the treatment effect, OK. That would be an example of a moderating relationship. And you need to ask yourself, are there specific variables related to your aims that are likely to either mediate or moderate the treatment, and if so, you should measure them and analyze for those effects.
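
The mediator/moderator distinction maps onto two familiar regression moves; here is a sketch, assuming pandas and statsmodels and hypothetical variable names:

```python
# Illustrative sketch: moderation is tested with an interaction term;
# mediation shows up as the treatment coefficient shrinking once the
# intervening variable is added to the model.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: outcome, treatment (0/1), practice_hours (diary measure)
df = pd.read_csv("literacy_outcomes.csv")

# Moderation: does the treatment effect change with outside practice time?
moderation = smf.ols("outcome ~ treatment * practice_hours", data=df).fit()
print("interaction p:", moderation.pvalues["treatment:practice_hours"])

# Mediation: does practice time carry part of the treatment effect?
total = smf.ols("outcome ~ treatment", data=df).fit()
adjusted = smf.ols("outcome ~ treatment + practice_hours", data=df).fit()
print("treatment effect, total vs adjusted:",
      round(total.params["treatment"], 3),
      round(adjusted.params["treatment"], 3))
```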

What kind of assessment do you need? Do you need a norm-referenced test? Do you want a criterion-referenced test? A criterion-referenced test would be something that measures specific aspects of the behavior that you're interested in and can tell you whether or not people have mastered specific aspects of the behavior. Do you need something that is what we'd call a growth measure, that is, something that will be sensitive to change over time, that you can measure frequently and that you can monitor growth on?

Do you need individually administered assessments, or can you use group-administered assessments? Do you need to make observations? If you have a treatment study, what kind of observations do you need to make of the treatments to show that in fact the treatments were delivered in the way you intended, so that you can know that what was done was what you intended to be done? Again, how you answer these questions is going to depend on what your specific aims are.

When and how often do you assess? This is really critical when you're thinking about literacy studies, or studies in education in general. If you don't control the timing of assessments, then you will get differences between your groups simply on the basis of the fact that you measured them at different points in time. If I assess children three months into first grade versus five months into first grade versus six months into first grade, I will not see the same behaviors, at least not in the same quantities or at the same level of skill. You need to take into account when and how often to assess.

I'm not going to talk about this because we've covered it. And since it's like time to finish, I'll finish. So when you're thinking about writing your grants, it is really important that you keep in mind the specific aims of your proposal. And by the way, you guys have not lived up to your end of the bargain, there has not been a single question. When you are writing your grant, it is really important to keep in mind your specific aims. The specific aims are literally the most important part of the grant that you write. It's not the background that's significant, it's not the work you've done in advance to figure out that this is why the study needs to be done or the data that you have to point you in the direction for the study. It's not the design and it's not the methods and it's not the analysis. It's the aims. Everything starts with the aims. If it's not worth doing, it's certainly not worth spending a lot of money to do it. But even more than that, it's the degree to which you can take your specific aims and relate them to the background and significance. Why does the background literature support these particular aims? Why do the studies that have already been done say that these are the aims that should be investigated in this grant? How do the aims tell me how to design the study, which subjects to use, what kind of measures to assess, how often to assess?

All of those things should be predicated on the basis of what it is that you are attempting to accomplish in this grant. And the reason for letting the aims filter through the grant, and not just be isolated in the front part of the grant, is that when reviewers sit around the table and three people have read your grant and 30 people are discussing your grant, the extent to which those three people can keep clear in their minds your aims and how your aims relate to all the other aspects of your grant is going to make it easier for them to communicate to the other 27 what it is you're trying to do, why it is you're trying to do it, why it is important and how it is you're going to accomplish it, and for everyone else to understand the importance of what it is you're trying to do.

So I'm going to stop there, but I do want to read this quote from Dr. Tukey, who was a great data analyst and is the person who really started people thinking about trying to model data. And it is really critical that you keep this in mind as you write your grant: it is better to have an approximate answer to the right question than an exact answer to the wrong question, OK. I'll stop there.

Peggy:
Great. I see that we have a question, so we have 15 minutes for questions and answers and I understand there is a mechanism for the people who are logged in on the web, and there evidently are quite a few, [crosstalk] you have to take questions from people you can't see. There is an email address, I guess, so that you can email your questions and they'll bring them up to us. Meanwhile, we have a question in the audience. Go ahead.

Participant:
Yes, I'm interested in the question of the discourse of the grant. The reason for my question is that, as you may know, there is a gap in the field of adult literacy and family literacy between those who are most immersed on the ground in the field with practice and researchers. And my specific question is, to what extent do the proposals need to reflect the discourse of the presentation that you just made, as opposed to the content of the presentation that you just made? Does that make sense to you?

D. Francis:
Not to me as a reductionist, but-

Participant:
But how explicitly does the discourse of the field of experimental research need to be reflected in the actual language of the grant? So if you have a design that is a good design but which doesn't use the terminology that you've been using today, is that a disadvantage?

D. Francis:
Let me talk about that, because I do think I understand what you're getting at. And I think after you listen to Fred's presentation, you'll understand that it's not that we're looking for, or that the, I'm not looking for anything, but that Alex and Peggy are looking for specific kinds of studies. The important thing is that what it is that you're doing and how your study is designed and laid out is predicated on the basis of what it is you're intending to accomplish. And where you'll run into problems is if what you're saying you're trying to do is to study this as a cause of that, and what you've done is designed a passive observational study where you're going to be going in and just interviewing people and trying to look for connections between things on the basis of a broad set of interviews with a small set of people. That's a design that's disconnected from the broader goal of trying to study cause/effect.

If, on the other hand, your goal is to understand sort of the breadth and to provide a rich and accurate description of this phenomenon, then designing an experiment would be disconnected from what it is you're trying to accomplish.

So the critical thing is that you create the connection between what it is you're trying to do and how it is you're going to do it. And that connection is entirely up to you. Now, the importance of the question that you're trying to answer is predicated on the basis of where we're at in the field. So if people feel that we're at a place in the field where we're beyond rich description, but what you're proposing to do is rich description, your grant will not fare well.

If, on the other hand, if you're proposing a strong experiment in an area where we don't even know what the phenomenon is, your grant is not going to fare very well. So it's incumbent on you to understand the field and what needs to be done and then, given what needs to be done, connecting that up with how to do it, OK. Does that answer your question?

Peggy:
When you have a question, come to the mic because we want everyone, including people logged in on the web, to be able to hear your question. And if you see that we're winding down on an answer and you have a question, go ahead and start your way to the mic.

D. Francis:
Before you ask a question, let me say one more thing to the last questioner. If your concern is just about using the same language, I don't think that's the problem, I don't think that's going to get you in trouble. It's the principles that need to be there in the grant. For example, whether you call it internal validity and external validity is not the critical factor; it's that you're thinking about those things. You might talk about generalizability, or the degree to which findings are not restricted to this particular environment or setting. There are lots of different ways to describe these phenomena. You're not going to be held accountable for the language, per se, but more for the concepts of what's implied there.

Participant:
My question pertains to what is going to be assessed, in looking at the difference between the mediating and the moderating factors, what determines it? Is it basically the extent to which you get a significant output in the end or do you need to define these as we design the experiment?

D. Francis:
Well, there's a fundamental distinction between mediation and moderation. Moderation implies that the effect of one thing on another thing changes at different levels of a third thing. And there's a very specific approach to evaluating that kind of thing statistically: we're actually looking for interactions, OK. Whereas a mediating relationship implies that if I control for this third thing, there's no relationship between the first thing and the second thing. And there's a very specific way to approach that analysis; in particular, we would relate the first to the second and then we would introduce the third. And what we would expect to find is that the relationship between the first and the second diminishes as we introduce the third. So there are very specific statistical approaches to answering those questions.

Participant:
[unintelligible]

Peggy:
Any other questions?

Participant:
You all want a break that badly?

D. Francis:
Yeah, that's right, I'm keeping them from the bathroom.

Peggy:
OK, then given that we're going to break a few minutes early, I expect nobody to be late. Try to be in your seat by 11, I mean like really ready to listen because we're going to be here at a minute before 11 ready to get Fred on camera. Thanks.
