Assessing psychology students' difficulties in elementary variance analysis

In this paper we present a study where we assess the difficulties in understanding some elements of variance analysis using a questionnaire administered to a sample of 224 Psychology students after taking a data analysis course. We analyse the selection of a variance analysis model, the understanding of assumptions and of the associated linear model, the computations involved and the interpretation of results. These results provide information in an area where little prior research is available.


Introduction
Statistical inference plays a prominent role in the human sciences, including psychology, since most research in these areas is based on generalising findings from data collected in samples to populations. Few social science researchers can do their work effectively today without reference to empirical information, and statistics provides a set of tools to manage, organise, describe and interpret this information.
Despite the relevance of proper data analysis for sustaining scientific advancement, the use and interpretation of statistics in the social sciences are not always appropriate, as shown in diverse review papers (e.g. BATANERO; DIAZ, 2006; HARLOW; MULAIK; STEIGER, 1997). These papers criticise researchers' excessive confidence in statistical significance and their misinterpretation of statistical inference results. These bad practices particularly concern hypothesis tests and lead to a paradoxical situation where, on the one hand, a significant result is required to get a paper published in many journals and, on the other hand, significant results are misinterpreted in these publications (FALK; GREENBAUM, 1995; LECOUTRE; LECOUTRE, 2001).
Misconceptions and misinterpretations of statistical inference have also been found in many studies with university students (e.g. CASTRO-SOTOS et al., 2007; HARRADINE; BATANERO; ROSSMAN, 2011; KRAUSS; WASSNER, 2002; VALLECILLOS, 1994). Most of these studies focus on understanding the level of significance, a concept defined as the probability of rejecting a null hypothesis, given that it is true. The most common misinterpretation of this concept consists of switching the two terms in the conditional probability; i.e., interpreting the level of significance as the probability that the null hypothesis is true, once the decision to reject it has been taken. There is also confusion between the roles of the null and alternative hypotheses, as well as between the statistical alternative hypothesis and the research hypothesis (CHOW, 1996). Other studies centred on the interpretation of confidence intervals and show that, even when students master the computations, they often misinterpret the meaning of a confidence interval (see CUMMING; WILLIAMS; FIDLER, 2004).
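The difference between the two conditional probabilities described above can be illustrated numerically. The following is a minimal sketch, not part of the original study: the function name, the power of the test and the prior probability of the null hypothesis are hypothetical values chosen only for illustration. It applies Bayes' rule to show that the probability that the null hypothesis is true, given that it was rejected, generally differs from the significance level.

```python
def prob_null_given_rejection(alpha, power, prior_null):
    """Return P(H0 true | H0 rejected) using Bayes' rule.

    alpha      -- P(reject H0 | H0 true), the significance level
    power      -- P(reject H0 | H0 false); hypothetical value
    prior_null -- P(H0 true) before seeing the data; hypothetical value
    """
    reject_and_null = alpha * prior_null
    reject_and_alt = power * (1.0 - prior_null)
    return reject_and_null / (reject_and_null + reject_and_alt)


alpha = 0.05
for prior_null in (0.5, 0.9):
    posterior = prob_null_given_rejection(alpha, power=0.8, prior_null=prior_null)
    # The posterior changes with the prior, while alpha stays fixed at 0.05.
    print(f"P(H0) = {prior_null}: alpha = {alpha}, P(H0 | reject) = {posterior:.3f}")
```

Under these assumed values the two quantities only coincide by accident, which is why exchanging the terms of the conditional probability is an error rather than a change of wording.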
In this paper we are interested in students' understanding of variance analysis, a method that is frequently used in the social sciences and has received little attention from statistics educators. Although understanding the meaning of the significance level and of confidence intervals is needed to apply variance analysis correctly, other points are also required for a correct understanding and application of this method. Using responses from a sample of Psychology students after they had studied the topic, we analyse their understanding of the following points: selection of a particular variance analysis model; the assumptions needed to apply the method; the linear model associated with a specific variance analysis model; the computations involved; and the interpretation of results. In the next sections we describe the research background and method, discuss the results, and conclude with some implications for improving the teaching of variance analysis in the social sciences.

Theoretical Framework
In our research we use some ideas from the onto-semiotic approach to mathematics education (GODINO; BATANERO; FONT, 2007). In this framework, mathematical knowledge has a socio-epistemic dimension, since it is linked to the activity in which the subject is involved and depends on the institutional and social context in which it is embedded. Mathematical activity is oriented towards solving a problem, is described in terms of practices or sequences of actions, and is regulated by institutionally established rules (DRIJVERS et al., 2013). The authors distinguish between institutional and personal meanings for a mathematical object; in both cases meaning is linked to the mathematical practices carried out by somebody (a person or an institution) to solve specific mathematical problems. Around these mathematical practices different rules (concepts, propositions, procedures) emerge; these rules are supported by mathematical language (terms and expressions, symbols, graphs, etc.), which, in turn, is regulated by the rules. All these objects are linked to arguments that serve to communicate the problem solution, properties and procedures, and to validate and generalize them to other contexts and problems (GODINO, 2002).
For the specific case of variance analysis, the set of practices carried out in order to solve a related problem includes: identifying the particular variance analysis model to be applied to a problem; understanding and checking the assumptions needed to apply the method; understanding the linear model associated with a specific variance analysis model; being able to carry out the computations involved; and interpreting the results of the analysis. At the same time, students should master the verbal and symbolic language needed to express all these mathematical practices. We are interested in the personal meaning of variance analysis achieved by the students (which can be assessed through their responses to the questionnaire built for this research) and in comparing this personal meaning with the institutional meaning of variance analysis in statistics.

Previous Research
The second element supporting our study is previous research on the topic, which has focused only on the understanding of three concepts linked to variance analysis: the difference between dependent and independent variables; the role of randomisation in the interpretation of results; and interaction.
Among the few studies related to the teaching and learning of variance analysis, Rubin and Rosebery (1990) implemented a teaching experiment aimed at studying the difficulties in interpreting some basic ideas of variance analysis in a context of experimental design. Their results suggest confusion in distinguishing between independent, dependent and extraneous variables; this distinction is essential to differentiate the response variable from the factors when performing a variance analysis. Another difficult point in this study was understanding the role of randomization in balancing individual differences, which is an important assumption of variance analysis.
The concept of interaction is important in multi-factor variance analysis, since the presence of interaction should be taken into account when evaluating the effect of a factor at the different levels of another factor; for this reason, misinterpretation of interaction often leads to incorrect conclusions. However, Rosnow and Rosenthal (1991) indicated that interaction is the most universally misunderstood result in the field of psychology, and this has been reflected in a series of studies that examine results of variance analysis published in scientific journals. In an empirical study of articles published in prestigious journals, Zuckerman, Hodgins, Zuckerman, and Rosenthal (1993) found that approximately one third of these papers failed to interpret interaction correctly. Umesh, Peterson, McCann-Nelson, and Vaidyanathan (1996) found that 75% of papers in several social science journals contained interaction-related errors. Pardo, Garrido, Ruiz and San Martin (2007), in another study of papers published over 5 years, found that only 13 of them analysed and interpreted the interaction. A common error (79.1% of the reviewed papers) was to analyse and interpret each factor separately with no consideration of interaction. Green (2007) also found that university students had difficulties in interpreting the concept of interaction in variance analysis.
Although the above studies provide a guide for the lecturer, there are numerous elements involved in variance analysis and few studies have directly monitored what students actually learn after a regular university course. To fill this gap and complement previous research we present the current study, in which we specifically consider the selection of a model, the understanding and verification of the model assumptions, the understanding of how variance analysis tables are obtained, and the interpretation of their results. In this regard, our work provides new results in this incipient field. Below we describe the methodology employed and discuss the results.

Method
The sample consisted of 224 undergraduates in the second year of the Bachelor of Psychology program at the University of Huelva, Spain.
They completed the questionnaire after finishing two data analysis courses (each one year long). The research was carried out in collaboration with the lecturer who taught these students over the two years of the data analysis courses. He provided information on the course content and helped to develop the questionnaire. The first course included descriptive statistics (data distribution and representation; measures of centre and spread) and probability (simple, compound and conditional probability; random variables, binomial and normal distributions). In the second year the course content was statistical inference (confidence intervals, statistical tests and variance analysis). In both courses the students practised data analysis with the SPSS software, using real data taken from different experiments or provided by the lecturer.
The questionnaire was given to the students as part of their final assessment, to ensure they had studied the topic. The type of item selected for the questionnaire was familiar to the students: we used the multiple-choice format that these students encountered in all their examinations, and all the students in the sample had taken similar tests on other topics and in the first-year data analysis course.
The questionnaire comprised eight multiple-choice items (see Appendix). The items were selected after examining the content of variance analysis taught to these students during the course. Other statistics lecturers and some experts in statistics education research helped to select the particular items and to fix the wording of each item. All the items were first tried with a pilot sample of 93 students, and the difficulty and discrimination indexes, as well as the overall reliability of the instrument (Cronbach's alpha = 0.789), were found to be satisfactory for the research.
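For readers unfamiliar with the reliability coefficient mentioned above, the following minimal sketch shows the usual formula for Cronbach's alpha, k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). The function name and the small score matrix are hypothetical and serve only as illustration; they are not the pilot data of this study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the total score of each respondent.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical right/wrong (1/0) scores of six respondents on four items.
scores = [
    [1, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(scores), 3))
```

Values close to 1 indicate that the items vary together across respondents; the coefficient equals 1 only when all items are perfectly consistent.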
In Table 1 we summarise the content assessed in the different items. This content includes the main components of knowledge related to variance analysis: a) Selection of a variance analysis model, with identification of the situations where this model is applied, and understanding the number of variables and levels in a specific model; b) Understanding the variance analysis model, its assumptions and the decomposition of the variance; c) Understanding the computations in the variance analysis table: F statistic, mean square and sum of squares; d) Interpreting results from a variance analysis, both from an analysis of variance table and from computer output. The questionnaire was completed by the students as part of the final evaluation of the course; consequently their responses counted towards their final score (the final evaluation also contained more items related to other topics as well as some open-ended problems). An incorrect answer to an item was given a negative score, and therefore students only replied to the items where they were confident in their response. This rule of the didactic contract, common to other subjects, explains the fact that the percentage of non-responses is high in some items.

Results and discussion
Once the students' responses were collected we carried out the data analysis. In Table 2 we present the percentages of correct responses in each item, with the correct option marked. It is clear from this table that the difficulty of the different items was not homogeneous, and that some of the students did not achieve a full understanding of some components of variance analysis, even though they had prepared for the examination. The percentage of non-responses also varied among the items; as suggested previously, since students were penalised for giving incorrect responses, they did not reply to the items when they were uncertain about their responses. These percentages of non-responses suggest that the students' confidence in their knowledge varied across the different contents.
Finally, we remark that all distractors were selected by some students; this is an indication that the distractors were well chosen (we tried to reflect potential student errors in these distractors). To provide a synthesis of results and compare the different items, in Table 3 we present the difficulty index and 95% confidence intervals. Items have been re-ordered from the easiest to the most difficult and a brief sentence describing the item content has been added. From this table we see that the most difficult contents were interpreting results from a variance analysis, either from a table or from computer output, and recognising the variance analysis assumptions. On the other hand, the easiest part was understanding the computations involved and the meaning of the different statistics (except for the F statistic, which was of medium difficulty); identifying the model used from computer output and understanding the decomposition of variance were also easy. In the next sections we comment on these results in more detail.
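As an illustration of how a difficulty index (proportion of correct responses) and its 95% confidence interval can be obtained, the following minimal sketch uses the normal approximation for a proportion. This is an assumption: the text does not state which interval method was used for Table 3, and the function name is hypothetical. The example values are the 33% of correct responses reported for item 8 and the sample size n = 224.

```python
from math import sqrt


def difficulty_index_ci(p_correct, n, z=1.96):
    """Normal-approximation confidence interval for a proportion of correct answers.

    p_correct -- proportion of students answering the item correctly
    n         -- number of students
    z         -- standard normal quantile (1.96 for a 95% interval)
    """
    standard_error = sqrt(p_correct * (1.0 - p_correct) / n)
    return p_correct - z * standard_error, p_correct + z * standard_error


low, high = difficulty_index_ci(0.33, 224)
print(f"Difficulty index 0.33, 95% CI from {low:.3f} to {high:.3f}")
```

Non-overlapping intervals for two items give an informal indication that their difficulty differs in the population of students.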

Selecting a variance analysis model
In the first item we assess the students' understanding of basic elements of elementary variance analysis, and of the difference between the situations where a variance analysis model and the t-test should be used. Since in the problem proposed the teacher randomly divided the population into three groups and applied a different technique to each group, the correct answer is (c). Choosing either option (a) or (b) would indicate that students confuse the variance analysis model and the two-sample t-test.
About half the students chose the correct answer, which suggests they are competent in interpreting the situation described in the item and in choosing the most suitable model to solve the problem. 23.7% of the sample chose option (a); although they were able to differentiate independent and related samples in the problem context, a point considered difficult by Rubin and Rosebery (1990), they did not choose the appropriate procedure, since the t-test is only valid for comparing two groups. With data in the problem coming from three independent samples, and when the goal is to test whether some of these samples differ, a variance analysis model should immediately come to mind. Another 7.6% of students incorrectly selected the t-test and, furthermore, confused independent and related samples. Finally, about 18% provided no response.
In the second item we investigate whether the student understands the difference between a factor and a level of a factor and discriminates the situations where a researcher should apply a two-way variance analysis. The correct answer is (b). By choosing option (a), the student confuses the concepts of factor and level. Moreover, in the situation described in option (a) the adequate method is the paired samples t-test. Option (c) indicates an inability to distinguish between the two-factor variance analysis model and a multivariate analysis with just one factor.
This item is also related to the selection of a model, and was correctly answered by a high percentage of students (63.8%), which indicates good discrimination between the ideas of factor and level. A small percentage (4.9%) selected option (a) and thus did not recognize the situations where they should select a two-factor variance analysis, confusing it with the paired samples t-test. Another small group (9.8%, option c) confused the two-way variance analysis with a multivariate model with one factor. A relatively high percentage of students did not respond to the question (21.4%).

Understanding the variance analysis assumptions and model
Item 3 is intended to assess the students' understanding of the assumptions required to apply a variance analysis. The correct answer (c) was selected by 43.3% of participants (see Table 2), showing a moderate level of difficulty for this item. 11.2% of students chose option (a); these students forgot that the variances of the different populations to be compared should be statistically identical (homoscedasticity) in order to apply variance analysis. Although violating this assumption is not serious if all samples have the same number of cases and a fixed-effects model is used, it may lead to contradictory results when the samples differ greatly in size or when a random-effects model of variance analysis is considered (DUNN; CLARK, 1987). A few students (4%) chose option (b), forgetting the assumption of normality. To determine the importance of this assumption, the student should be aware that a moderate deviation from normality is not a major concern in fixed-effects models (MONTGOMERY, 2008). This item had a high percentage of non-response (41.5%); this suggests students were not confident in their knowledge of the variance analysis assumptions.
In item 4 we assess the students' understanding of the decomposition of variability that underlies a specific model of variance analysis. This decomposition is fundamental in the model and explains why we apply the name "variance analysis" to a method where the main goal is comparing means. The correct answer (c) was selected by 69.2% of students (Table 2), that is, by the majority of the sample, who showed an understanding of the decomposition of total variability into variability due to each factor, variability explained by the interaction between factors and random variability in the model proposed.
Students who chose option (a) (5.8%) did not consider the variability due to each individual factor or to the interaction; some of them may have confused the model proposed with single-factor variance analysis. 12.5% of the sample (selecting option b) confused the variability decomposition with that corresponding to a repeated measures factor. While more than half of the sample was able to associate an appropriate statistical model with two-factor variance analysis, 18.3% of the sample (options a and b) did not associate the interaction with the full two-factor variance analysis model. This result confirms the findings of several authors who have pointed out the difficulty of understanding the concept of interaction (ROSNOW; ROSENTHAL, 1991; PARDO et al., 2007), while other authors suggested there is also difficulty in understanding how the interaction is related to the interpretation of the main effects (GREEN, 2007). Finally, 12.5% of students did not respond to this item.
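The decomposition assessed in item 4 can be checked numerically. The following minimal sketch assumes a balanced two-factor fixed-effects design (the same number of observations in every cell); the function name and the small data set are hypothetical and are not taken from the questionnaire. It computes the sums of squares for factor A, factor B, the interaction and the error, and verifies that they add up to the total sum of squares.

```python
from statistics import mean


def two_way_sums_of_squares(cells):
    """Sums of squares for a balanced two-way design.

    cells[i][j] is the list of observations for level i of factor A
    and level j of factor B; every cell must have the same size.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    all_obs = [y for row in cells for cell in row for y in cell]
    grand = mean(all_obs)
    a_means = [mean([y for cell in row for y in cell]) for row in cells]
    b_means = [mean([y for row in cells for y in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (y - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for y in cells[i][j]
    )
    ss_total = sum((y - grand) ** 2 for y in all_obs)
    return {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_error, "total": ss_total}


# Hypothetical 2 x 2 design with two observations per cell.
cells = [[[4, 6], [7, 9]], [[5, 7], [12, 14]]]
ss = two_way_sums_of_squares(cells)
print(ss)
# Total variability = factor A + factor B + interaction + error.
print(abs(ss["A"] + ss["B"] + ss["AxB"] + ss["error"] - ss["total"]) < 1e-9)
```

In an unbalanced design the components no longer add up in this simple way, which is why the sketch is restricted to equal cell sizes.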
In item 9 students had to interpret output from a variance analysis performed with the help of SPSS. The students were asked to identify the model used (a factorial two-way analysis of variance). The percentage of correct responses was very high (74.6%), while only 4.9% assumed the analysis was a test of means in independent samples. These students did not realize there were two different factors being analysed, or else confused the idea of factor with that of population. In addition, 9.4% of students assumed the analysis corresponded to a one-way repeated measures analysis of variance. The error here is confusing independent and dependent samples, an error described by Rubin and Rosebery (1990). The percentage of non-response (11.2%) was low.

Understanding the computations in variance analysis
We used three items to assess the students' understanding of the steps needed to complete a variance analysis table. Item 5 assesses the participants' understanding of the definition of the F statistic as a quotient of variances. These students had previously studied the F distribution in other topics; for example, in inferences related to the population variance. The correct answer (a) was selected by 57.1% (Table 2), so that the difficulty of this item was only moderate. These students understood that the F statistic is used to compare the variability between groups with the random variability; if both variances are identical or close to each other, the differences between groups would be attributed to random variation and not to the effect of the factors.
In option (b), selected by only 7.1% of students, the variance among groups is confused with that corresponding to subjects; this comparison would not help in comparing the different groups and therefore, in addition to misunderstanding the calculation method, these students did not grasp the logic of variance analysis. Students who chose option (c) (21.9%) believed that variance analysis for repeated measures should compare the interaction between the subject and group variables, thus failing to identify this as an additive model. We found no previous research with which to compare our results.
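The quotient assessed in item 5 can be illustrated with a minimal sketch of a one-way repeated measures analysis under the additive model, where the total variability is split into between-groups, between-subjects and error components and F is the mean square between groups divided by the mean square error. The function name and the data (three subjects measured under three conditions) are hypothetical and chosen only to keep the arithmetic simple.

```python
from statistics import mean


def repeated_measures_f(data):
    """F statistic for a one-way repeated measures design (additive model).

    data[s][g] is the score of subject s under condition (group) g.
    """
    n_subjects, n_groups = len(data), len(data[0])
    grand = mean([y for row in data for y in row])
    group_means = [mean([row[g] for row in data]) for g in range(n_groups)]
    subject_means = [mean(row) for row in data]

    ss_groups = n_subjects * sum((m - grand) ** 2 for m in group_means)
    # Error: what remains after removing group and subject effects.
    ss_error = sum(
        (data[s][g] - group_means[g] - subject_means[s] + grand) ** 2
        for s in range(n_subjects) for g in range(n_groups)
    )
    ms_groups = ss_groups / (n_groups - 1)
    ms_error = ss_error / ((n_groups - 1) * (n_subjects - 1))
    return ms_groups / ms_error


# Hypothetical scores: rows are subjects, columns are conditions.
data = [[2, 4, 6], [3, 5, 10], [4, 6, 5]]
print(repeated_measures_f(data))
```

Dividing by the between-subjects mean square instead, as in distractor (b), would compare subjects rather than groups and so would not answer the research question.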
In items 6 and 7 the students had to complete a variance analysis table and, consequently, understand how the components of this table are determined for a two-factor variance analysis. Results in both items suggest a good understanding of the variance analysis table and its computation.
The correct answer to item 6 (related to the sum of squares) is (c), which was selected by 72.3% of students (Table 2); this item was therefore very easy for our participants. 3.6% chose option (a), giving the observed F value for that factor. In option (b) the value of the sum of squares for factor B is confused with the mean square for factor A (4%).
In item 7 the value of the mean square of a factor is required. We found 72.3% of correct responses (see Table 2), indicating that a high proportion of students understood the mean square and its computation. Those choosing option (a) (3.1%) confused the mean square of the factor with the mean square of the interaction, while those choosing option (c) (3.1%) confused this value with the observed F for factor B. A moderately high percentage of students did not respond (20.4%).
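The computations required in items 6 and 7 follow from two relations: each mean square is the sum of squares divided by its degrees of freedom, and each F value is the corresponding mean square divided by the mean square error. The following minimal sketch completes a two-factor table from given sums of squares. The design (two factors with 3 levels each and 45 subjects) is the one described in the questionnaire; the sums of squares and the function name are hypothetical, since the values of the original table are not reproduced here.

```python
def complete_anova_table(sums_of_squares, a_levels, b_levels, n_total):
    """Degrees of freedom, mean squares and F values for a two-way fixed-effects design."""
    df = {
        "A": a_levels - 1,
        "B": b_levels - 1,
        "AxB": (a_levels - 1) * (b_levels - 1),
        "error": n_total - a_levels * b_levels,
    }
    # Mean square = sum of squares / degrees of freedom.
    ms = {source: sums_of_squares[source] / df[source] for source in df}
    # F = mean square of the effect / mean square error.
    f = {source: ms[source] / ms["error"] for source in ("A", "B", "AxB")}
    return df, ms, f


# Hypothetical sums of squares, for illustration only.
hypothetical_ss = {"A": 120.0, "B": 80.0, "AxB": 40.0, "error": 180.0}
df, ms, f = complete_anova_table(hypothetical_ss, a_levels=3, b_levels=3, n_total=45)
for source in ("A", "B", "AxB"):
    print(source, df[source], ms[source], f[source])
print("error", df["error"], ms["error"])
```

The same relations can be used in reverse: when a mean square and its degrees of freedom are given, the missing sum of squares is their product.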

Interpretation of results
Finally, in item 8 we evaluate how well students perform when interpreting the results of the variance analysis table to decide whether or not to reject a hypothesis about the effect of the factors involved in the situation. In order to respond to the question, students must first complete the variance analysis table, which will give them three F values, all statistically significant. This means that there is an effect of the two factors and of the interaction. To reach this conclusion students must also correctly interpret a significant result; such interpretation has proved difficult in previous research (e.g., VALLECILLOS, 1994).
The correct answer is (a), since the p-value calculated for factor A is very close to 0 and the result would be very unlikely if this factor had no influence on the dependent variable. This item was very difficult for the students, as we found a low percentage of correct responses, 33% (Table 2), and a high percentage of non-responses (48.2%). A student who chose option (b) or (c) did not consider Factor B or the interaction to be statistically significant. Although these responses were selected by few students, the item was still difficult, in agreement with results of other research where students are asked to interpret results of statistical analyses (e.g. VALLECILLOS, 1994). Also relevant in this regard is research on confidence intervals, such as that by Olivo (2008), who suggested that interpreting statistical analyses is much more difficult than performing the calculations or understanding the definitions.
Another item where students are asked to interpret the results of a variance analysis is item 10, where the data provided are the output from an SPSS analysis. The difficulty of this item for the students is again visible, since only 26.8% provided the correct response. 13.8% of the sample selected option (a), suggesting that Factor A (type of work) significantly influences the work satisfaction of women. This result is not shown in this study (see Table 5), since the p-value corresponding to this factor is very high. This response suggests a poor understanding of the meaning of a p-value and an inability to interpret it, in agreement with results from previous research, such as that by Vallecillos (1994).
Students selecting option (c) (16.5%) misinterpreted the interaction between both factors, even though the p-value for the interaction is high (0.855). Here again we see a misinterpretation of p-values that may be linked to the misinterpretation of interactions described by Green (2007) and Pardo et al. (2007). There was a very high percentage of non-response in this item.

Implications for teaching variance analysis
In our investigation more than acceptable results were obtained in many of the items, even though our sample consists exclusively of psychology students, who are not academically used to working with mathematical models. In our opinion, these satisfactory results are due to the fact that, prior to the evaluation, our students had taken two statistics courses (a first-year and a second-year data analysis course). They initially studied the basics of hypothesis testing in general and, in the second year, they had the opportunity to apply what they had learnt about hypothesis tests to solve problems involving comparison of means and proportions in one and several samples, as well as variance analysis. Despite the high degree of non-response, our results showed the students' competence in solving questions directly related to calculation, selecting a model and understanding the decomposition of the variance.
Not all errors were eradicated during the two courses taken by the students. We found that some errors persisted after completing the two-year cycle, which, as noted above, comprised two courses in statistics with special emphasis on statistical inference. The results obtained in items 8 and 10 agree with those of Vallecillos (1994) and Olivo (2008), who clearly observed the difference in difficulty between performing computations and interpreting statistical results.
We found no research evaluating students' understanding of variance analysis assumptions. We believe it is important to emphasize these assumptions and the consequences of violating them, as poor comprehension makes researchers unable to interpret the results of their work or to critically evaluate the results of research published in journals in their field.
We agree with Vallecillos (1994) and Diaz, Batanero and Wilhelmi (2008) that the p-value and the level of significance are concepts that should be sufficiently clear and well connected, because correct decision-making ultimately depends on them. We need to review how statistical inference is taught, as stated in Vera, Diaz and Batanero (2011). It would be important to begin introducing these objects informally in middle school, in line with the current trend towards informal inference (see, for example, ZIEFFLER et al., 2008). We would also like to encourage other researchers to further analyse students' understanding of variance analysis and to propose educational activities that contribute to improving the learning of these concepts.

Table 1 - Content assessed by item

Table 2 - Percentage of correct responses per item (n = 224)

Table 3 - Item difficulty and confidence interval

Appendix. Questionnaire with correct responses marked in boldface

Item 1. A teacher believes that new physical activities will help improve elementary school children's motor skills. The teacher randomly divides his group of children into three equal parts. He gives each group a different type of exercise in order to compare which type of exercise gives the best results. Which statistical technique should be applied to test whether the methods applied provide different results?
a) t test for comparing independent means.
b) t test for comparing related samples.
c) Completely randomized one-way variance analysis.

Item 2. A researcher should use a two-factor, fixed effects ANOVA when:
a) The study involves an independent variable with two levels.
b) The study involves two independent variables, each of them with two or more levels.
c) The study involves two dependent variables.

Item 3. The assumptions required to apply ANOVA are:
a) Independence of observations, normal distribution, and additivity.
b) Independence of observations, equal variances, and additivity.
c) Independence of observations, normal distribution, and equal variances.

Item 4. In a two-way, fixed effects ANOVA the total variability is split into the following components:
a) Total variability = Between groups variability + error variability.
b) Total variability = Between groups variability + Between subjects variability + error variability.
c) Total variability = factor A variability + factor B variability + interaction variability + error variability.

Item 5. A researcher uses a repeated measures one-way variance analysis and obtains an empirical F value of 8.16. This means that:
a) MS between groups / MS error = 8.16.
b) MS between subjects / MS error = 8.16.
c) MS between groups / MS inter subject = 8.16.

To study the effect of some motivational variables on achievement, two variables are controlled: A: "Type of motivational training" (A1: instrumental; A2: attributional; A3: control) and B: "Classroom environment" (B1: cooperative; B2: competitive; B3: individual). 45 subjects were selected and divided into groups for each experimental condition. An incomplete variance analysis table is provided.

Table 4 (fragment): Mean square, F value; Factor A: 70

Item 6. The value of the sum of squares for factor B (see Table 4)

Item 7. The value of the mean square for factor A (see Table 4)

Item 8. One conclusion from the study is (alpha = 0.05):
a) Factor A ("training") influences achievement.
b) Factor B ("classroom climate") has no effect on achievement.
c) There is no interaction among factors.

A researcher studied the influence of two factors on women's work satisfaction. Factor A is type of work, with three different levels (A1: work with poor qualification; A2: work with middle qualification; A3: highly qualified work). Factor B is time flexibility, with two levels (B1: flexible; B2: rigid). Results of performing the analysis with SPSS are presented in Table 5.

Table 5 - SPSS output in variance analysis

Item 9. The model of analysis of variance applied in Table 5 is:
a) Comparing means in independent samples.
b) One-factor variance analysis with repeated measures.
c) Two-factor variance analysis.

Item 10. A possible conclusion of the study presented in Table 5 is (α = 0.05):
a) There is a statistically significant difference in work satisfaction depending on the type of work.
b) There is an effect of factor B (flexible time) on work satisfaction.
c) There is interaction between flexible time and type of work.