Procedures for testing unidimensionality
Prof. Dr. Ricardo Primi, USF
Monday, 16 September 2013
Outline
• Definition
• Models (uni-, multi-, bi-factor)
• "Operational" model: essential unidimensionality
• Procedures
• EFA: scree plot
• EFA: parallel analysis
• EFA with dichotomous items
• Full information factor analysis (FIFA)
• Comparison of the bi-factor model with unidimensionality
• Factor analysis of residuals
• CFA
• Programs
• SPSS: scree plot and parallel analysis
• TESTFACT: EFA, bi-factor CFA and FIFA
• MPLUS: EFA, bi-factor CFA and FIFA
• WINSTEPS
What is unidimensionality?
• A single dimension controls the item score
• Discrimination and factor loading
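The link between discrimination and factor loading can be made concrete. For a single factor under the normal-ogive model, a standardized loading λ and an IRT discrimination a are related by a = λ/√(1 − λ²) and, inversely, λ = a/√(1 + a²); multiplying a by about 1.702 moves it to the logistic metric. The sketch below assumes these standard conversions; the function names are illustrative only.

```python
import math

D = 1.702  # scaling constant that makes the logistic curve approximate the normal ogive

def loading_to_a(loading: float, logistic: bool = False) -> float:
    """Convert a standardized factor loading to an IRT discrimination (normal-ogive metric by default)."""
    if not -1.0 < loading < 1.0:
        raise ValueError("loading must lie strictly between -1 and 1")
    a = loading / math.sqrt(1.0 - loading ** 2)
    return a * D if logistic else a

def a_to_loading(a: float, logistic: bool = False) -> float:
    """Convert an IRT discrimination back to a standardized factor loading."""
    if logistic:
        a = a / D  # return to the normal-ogive metric first
    return a / math.sqrt(1.0 + a ** 2)

if __name__ == "__main__":
    for lam in (0.3, 0.5, 0.7, 0.9):
        print(f"loading={lam:.1f}  a(normal)={loading_to_a(lam):.3f}  "
              f"a(logistic)={loading_to_a(lam, logistic=True):.3f}")
```

For example, a loading of .70 corresponds to a ≈ 0.98 in the normal-ogive metric, or about 1.67 in the logistic metric. Higher loadings therefore mean steeper item characteristic curves.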
Latent constructs (example from education)
– Knowledge domain (MAT): equations, systems of equations, functions, etc. (http://www.khanacademy.org/math/algebra).
– Cause: mathematical reasoning and knowledge
– Transmission (how the cause is structured): through education (teaching and learning)
[Figure: Gq and RQ; cause (latent) → solving mathematical problems (observable)]
Latent constructs (example from medicine)
– Roseola: fever, runny nose, cough and red spots on the skin
– Cause: human herpesvirus 6 (HHV-6)
– Transmission (how the cause is structured): via saliva
[Figure: HHV-6; cause (latent) → symptoms (observable)]
Factor analysis
• For example, how do we know whether a set of symptoms has a common cause?
– Association/correlation
[Figure: cause (latent) ? symptoms (observable)]
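Correlation answers this question because a single common cause leaves a characteristic fingerprint: under a one-factor model, the correlation between any two symptoms equals the product of their loadings, r_ij = λ_i λ_j, so the loadings can even be recovered from correlations alone. A minimal sketch with hypothetical loadings (.8, .7, .6); the function names are illustrative.

```python
import numpy as np

def implied_correlations(loadings):
    """Correlation matrix implied by a single common factor: r_ij = l_i * l_j, with 1s on the diagonal."""
    lam = np.asarray(loadings, dtype=float)
    R = np.outer(lam, lam)
    np.fill_diagonal(R, 1.0)
    return R

def loading_from_triad(r12, r13, r23):
    """Recover the loading of variable 1 from three correlations under a one-factor model."""
    return np.sqrt(r12 * r13 / r23)

loadings = [0.8, 0.7, 0.6]  # hypothetical loadings of three symptoms on one latent cause
R = implied_correlations(loadings)
print(np.round(R, 2))
print(loading_from_triad(R[0, 1], R[0, 2], R[1, 2]))  # ≈ 0.8
```

Real data add sampling error and unique variance, so factor analysis fits this structure approximately rather than exactly.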
Factor analysis: psychology
• Factor-analytic study of intelligence
– Several tests with different contents are administered (logical reasoning, memory, creativity, problem solving, etc.)
– A matrix of associations among the tests is computed (correlation matrix)
– The tests are grouped by their correlations
– The tests are analyzed and the factors (the causes of performance) are inferred.
Parallel analysis
• Groups of items (factors) were extracted, from 1 to 5, and examined to see whether they made sense
• The two-factor model was the most coherent (correlated factors)
[Figure: Eigenvalues of tetrachoric/polychoric matrix. x-axis: factor number (0 to 150); y-axis: eigenvalues of original and simulated factors and components (0 to 15). Series: PC Actual Data, PC Simulated Data, FA Actual Data, FA Simulated Data]
Graphical representation of SEM models
the researcher is hoping to measure versus how much is due to secondary dimensions? This is exactly the question the bifactor model can address.
In most IRT applications, researchers explore whether their item-level data are sufficient for Model A as opposed to Models B or C. We argue that the more interesting comparisons are between Model A (unidimensional) versus Model D (bifactor), and the choice between Model C (multiple correlated dimensions) versus Model D. For the remainder, we will explore these comparisons using the CAHPS® 2.0 data introduced previously. Specifically, let us assume that we wish to measure a single construct (i.e., PPC), and thus we ask the standard question, “does this item set represent a single construct, or are the data too multidimensional for the application of a unidimensional IRT model?”
The first set of columns in Table 3 display estimated factor loadings from exploratory principal axis factorings of the polychoric correlation matrix for one-, two-, and five-factor extractions, with oblimin rotations. The five factor solution represents the a priori domains while the two factor solution represents a plausible alternative [15]. MICROFACT [39] was used to conduct these analyses.
The first column in Table 3 displays the loadings under Model A. Although the items vary widely in their loadings, an argument can be made that there is a “strong” general dimension here. All items load reasonably well (>.40) on the first factor; the first five eigenvalues for this matrix are 6.8, 1.4, .9, .8, and .7, and thus the ratio of the first to second eigenvalues is 4.9. The Goodness-of-fit (GFI) statistic [12] is .982, the mean residual is .001 with standard deviation of .06. Finally, the total variance explained by the first factor is 43%.
On the other hand, reasonable arguments could also be made that the data are multidimensional and that application of a unidimensional IRT model is not appropriate. Specifically, both two-factor and five-factor solutions are substantively interpretable. The two-factor solution can be
[Fig. 1: Four possible latent variable models (Model A, Model B, Model C, Model D)]
Qual Life Res (2007) 16:19–31, p. 23
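The summary statistics quoted in the excerpt follow directly from the eigenvalues. A correlation matrix of p items has total variance p, so with the 16 CAHPS items (the item count given in the abstract of the same article) the first eigenvalue divided by p gives the share of total variance; the small gap from the reported 43% is consistent with the eigenvalues being rounded.

```latex
\frac{\mathrm{Eig}_1}{\mathrm{Eig}_2} = \frac{6.8}{1.4} \approx 4.86 \approx 4.9,
\qquad
\frac{\mathrm{Eig}_1}{p} = \frac{6.8}{16} = 0.425 \approx 43\%
```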
Types of multidimensional models
What is multidimensionality?
[Figure 16-2. A Graphic Depiction of Within and Between Item Multi-dimensionality: items 1 to 9 linked to latent dimensions 1 to 3; one panel shows between-item multidimensionality, the other within-item multidimensionality]
5. FIRST ISSUE: WITHIN VERSUS BETWEEN MULTIDIMENSIONALITY
To assist in the discussion of different types of multidimensional models and tests we have introduced the notions of within and between item multidimensionality (Adams, Wilson & Wang, 1997; Wang, 1994; Wang & Wilson, in 1996). A test is regarded as multidimensional between item if it is made up of several unidimensional sub-scales. A test is considered multidimensional within item if at least one of the items relates to more than one latent dimension.
The Multidimensional Between-Item Model. Tests that contain several sub-scales each measuring related, but supposedly distinct, latent dimensions are very commonly encountered in practice. In such tests each item belongs to only one particular sub-scale and there are no items in common across the sub-scales. In the past, item response modelling of such tests has proceeded by either (a) applying a unidimensional model to each of the scales
EDMS 724 – Modern Measurement Theories (Spring 2008, Dr. André A. Rupp)
Graphical Illustration
[Figure: item response surface with a_j1 = 1.8, a_j2 = .3, d_j = .3, D_j = −.273; see Ackerman (1994)]
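A surface like this one can be evaluated numerically. The sketch below assumes a compensatory multidimensional 2PL item, P = 1/(1 + exp(−(a_j1·θ1 + a_j2·θ2 + d_j))), with no scaling constant, and uses the parameters shown (a_j1 = 1.8, a_j2 = .3, d_j = .3). The function name is hypothetical, and the sketch does not attempt to reproduce D_j.

```python
import numpy as np

def m2pl_probability(theta1, theta2, a1=1.8, a2=0.3, d=0.3):
    """Compensatory multidimensional 2PL: P = 1 / (1 + exp(-(a1*theta1 + a2*theta2 + d)))."""
    z = a1 * theta1 + a2 * theta2 + d
    return 1.0 / (1.0 + np.exp(-z))

# Evaluate the item response surface on a grid of the two latent traits (-3 to 3).
grid = np.linspace(-3, 3, 7)
t1, t2 = np.meshgrid(grid, grid, indexing="ij")
P = m2pl_probability(t1, t2)

# Because a1 is much larger than a2, the probability changes mostly along theta1.
print("P along theta1 (theta2 = 0):", np.round(P[:, 3], 2))
print("P along theta2 (theta1 = 0):", np.round(P[3, :], 2))

# Compensation: a low theta2 is offset by a higher theta1 that gives the same weighted sum.
print(m2pl_probability(1.0, -2.0), m2pl_probability(2 / 3, 0.0))
```

In a compensatory model, the two abilities trade off along lines of equal weighted sum. With a_j2 this small, the item behaves almost like a unidimensional item in θ1.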
Assessment 18(3), p. 294
to the standard ICC in order to account for the second latent variable.
Nonparametric, nonmonotone, and multiple group IRT models have all emerged (see van der Linden & Hambleton, 1997). However, many of these extensions pose formidable challenges to applied researchers (e.g., complex formulas, lack of accessible software for performing the analyses, and unknown statistical properties). Technological and theoretical advances must be made before the wide array of IRT models becomes accessible.
Applications of Item Response Theory to Clinical Assessment
Model Selection
The fundamental role of an IRT equation is to model examinees’ test response behavior. As previously reviewed, a number of IRT models can be selected for this purpose. In clinical assessment, Rasch models enjoy great popularity in Europe and have seen moderate use in the United States. However, because Rasch models demand identical item discrimination parameters, they often fail to fit scales developed with older technology (e.g., Tenenbaum, Furst, & Weingarten, 1985). Although Rasch models may be applicable to content-specific subscales couched within more complex frameworks (e.g., Bouman & Kok, 1987; Chang, 1996; Cole, Rabin, Smith, & Kaufman, 2004), they appear to be inappropriate for
scales measuring psychological syndromes. Nevertheless, because of the beneficial properties of the model, fit of a Rasch framework should be given thorough consideration. Two-parameter models appear to be more congruent with existing clinical measures than their Rasch (one-parameter) counterparts (Reise & Waller, 1990). They have accurately reproduced observed data where Rasch models have failed (e.g., Aggen, Neale, & Kendler, 2005; Cooper & Gomez, 2008; Ferrando, 1994; Gray-Little, Williams, & Hancock, 1997). Owing to its greater flexibility and its congruence with common factor theory, the two-parameter model is more common in clinical assessment.
The three-parameter model has been applied to clinical assessment less commonly. Although the model adds flexibility to analyses, conceptualizing the impact of “pseudo-guessing” on items related to personality and psychopathology can be difficult. On clinical tests, the lower asymptote parameter has occasionally been thought of as being indicative of a response style (e.g., social desirability, true response bias, etc.; see Zumbo, Pope, Watson, & Hubley, 1997). For example, if examinees are unwilling to respond openly to an item concerning sexual practices, drug use, mental health, and so on, responses could be drawn toward more conservative options. Rouse, Finger, and Butcher (1999) fit a three-parameter model to scales from the second edition of the Minnesota Multiphasic Personality Inventory (MMPI-2; Butcher et al., 2001) and found substantial correlations between estimates of lower asymptotes and indices
Figure 2. Item characteristic surface for a multidimensional item response model [axes: Anxiety (−2 to 2), Depression (−2 to 2); vertical axis 0.0 to 1.0]. Anxiety and depression are used as examples of two distinct latent variables that both influence the probability of item endorsement.
Essential unidimensionality
• An assumption of almost all classical analyses!
• One dominant factor; secondary factors, although present, are negligible
Procedures
• SPSS: scree plot
• Eigenvalue vs. rank order
• Eigen 1 / Eigen 2 > 5
• SPSS: RanEigen (Enzmann, 1997)
• Generates V random variables for N subjects, so that r(v1, v2) = 0 (parallel data)
• Extracts the factors from this matrix
• Repeats this i times
• Computes the mean of the first, second, third, ... eigenvalues
• Retains the factors whose observed eigenvalues exceed the corresponding mean random eigenvalues
• Enzmann, D. (1997). RanEigen: a program to determine the parallel analysis criterion for the number of principal components. Applied Psychological Measurement, 21, 232. (http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Software/Enzmann_Software.html)
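The parallel-analysis steps above, together with the Eig1/Eig2 rule of thumb, can be sketched in Python. This is a minimal illustration of Horn-style parallel analysis, assuming Pearson correlations and normally distributed random data. It does not compute the tetrachoric/polychoric matrices used in the dichotomous-item examples, and the function names are hypothetical, not RanEigen's interface.

```python
import numpy as np

def eigenvalues(data):
    """Eigenvalues of the Pearson correlation matrix, sorted from largest to smallest."""
    R = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(R))[::-1]

def parallel_analysis(data, n_iter=100, seed=0):
    """Compare observed eigenvalues with mean eigenvalues of uncorrelated random data of the same size."""
    rng = np.random.default_rng(seed)
    n, v = data.shape
    observed = eigenvalues(data)
    random_eigs = np.empty((n_iter, v))
    for i in range(n_iter):
        random_eigs[i] = eigenvalues(rng.standard_normal((n, v)))  # parallel data, r = 0 in the population
    mean_random = random_eigs.mean(axis=0)  # mean of the 1st, 2nd, 3rd, ... random eigenvalues
    # Retain leading factors while the observed eigenvalue exceeds the random mean.
    n_factors = 0
    for obs, rnd in zip(observed, mean_random):
        if obs <= rnd:
            break
        n_factors += 1
    return observed, mean_random, n_factors

if __name__ == "__main__":
    # Hypothetical one-factor data: 500 people, 10 items, loadings of .7 on a single factor.
    rng = np.random.default_rng(42)
    n, v, lam = 500, 10, 0.7
    f = rng.standard_normal((n, 1))
    x = lam * f + np.sqrt(1 - lam ** 2) * rng.standard_normal((n, v))
    obs, rnd, k = parallel_analysis(x)
    print("Eig1/Eig2 =", round(float(obs[0] / obs[1]), 2))
    print("observed:", np.round(obs[:4], 2), "random mean:", np.round(rnd[:4], 2))
    print("factors retained:", k)
```

Some implementations compare against the 95th percentile of the random eigenvalues rather than the mean. The mean is used here because that is the criterion described in the steps above.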
Example 1: ENEM 2006
Results
Two factors explaining 20.6% and 2.9% of the variance (Eig1/Eig2 = 4.54); Promax rotation, r(f1, f2) = 0.67
Rasch modeling and extraction of contrasts: r = 0.50
Thursday, 2 June 2011
Example 2: QFCP
• SPSS ...
Exploratory Factor Analysis (EFA) with dichotomous items
Ferguson, G. A. (1941). The factorial interpretation of test difficulty. Psychometrika, 6(5), 323–329.
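Ferguson's point can be illustrated with the largest phi coefficient attainable between two binary items. When proportions correct differ, phi cannot reach 1 even if everyone who passes the harder item also passes the easier one. Factoring phi correlations therefore tends to group items by difficulty rather than by content. This is one reason tetrachoric correlations or IRT-based (full information) factor analysis are preferred for dichotomous items. A minimal sketch; the function name is hypothetical.

```python
import math

def phi_max(p_i: float, p_j: float) -> float:
    """Largest phi coefficient possible between two binary items with proportions correct p_i and p_j."""
    if not (0 < p_i < 1 and 0 < p_j < 1):
        raise ValueError("proportions must lie strictly between 0 and 1")
    lo, hi = sorted((p_i, p_j))  # maximal when everyone passing the harder item also passes the easier one
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))

for p_easy in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"p = 0.5 vs p = {p_easy}: phi_max = {phi_max(0.5, p_easy):.3f}")
```

The ceiling drops quickly as the difficulties diverge: an item answered correctly by 20% of examinees and one answered correctly by 80% can correlate at most .25 in the phi metric.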
Full information factor analysis and factor analysis of residuals
• FIFA (TESTFACT)
• Factor analysis via IRT
• The item parameters are estimated (usually with the 3-parameter model, with c supplied as a fixed value) and the loadings are derived from the a parameters.
• Multidimensional IRT!
• Factor analysis of residuals (WINSTEPS)
• The unidimensional model (1st factor) is fitted to all items
• The residuals are computed
• A principal components analysis of the residuals (2nd, 3rd, 4th factors) is run without rotation
• The items that form groups in the contrasts are split apart
• Check whether the scores estimated from these item groups are correlated (this assesses the interference of the secondary dimensions)
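The residual-analysis steps can be sketched for dichotomous items under the Rasch model. This is a minimal illustration, assuming person measures (theta) and item difficulties (b) are already available. Here they are simulated, and theta is a crude stand-in rather than a real Rasch estimate. The function names are hypothetical and do not reflect WINSTEPS output or options. The standardized residuals z = (x − P)/√(P(1 − P)) are correlated, and their principal components are extracted without rotation. The items are then split by the sign of their loadings on the first contrast, and the two cluster scores are correlated.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch probability of a correct response for every person (rows) by item (columns)."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def residual_contrasts(x, theta, b):
    """Unrotated PCA of standardized Rasch residuals; eigenvalues and loadings sorted largest first."""
    p = rasch_prob(theta, b)
    z = (x - p) / np.sqrt(p * (1.0 - p))      # standardized residuals
    R = np.corrcoef(z, rowvar=False)          # item-by-item residual correlations
    vals, vecs = np.linalg.eigh(R)            # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    loadings = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return vals, loadings

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n = 2000
    trait_a = 1.5 * rng.standard_normal(n)    # hypothetical trait behind items 0-4
    trait_b = 1.5 * rng.standard_normal(n)    # hypothetical trait behind items 5-9
    b = np.tile(np.linspace(-1.0, 1.0, 5), 2) # hypothetical item difficulties
    true = np.column_stack([trait_a] * 5 + [trait_b] * 5)
    x = (rng.random((n, 10)) < 1.0 / (1.0 + np.exp(-(true - b)))).astype(float)

    theta = (trait_a + trait_b) / 2           # stand-in for a unidimensional Rasch measure
    vals, load = residual_contrasts(x, theta, b)
    first = load[:, 0]
    print("first contrast eigenvalue:", round(float(vals[0]), 2))
    print("first contrast loadings:", np.round(first, 2))

    # Split items by the sign of their first-contrast loading and correlate the raw cluster scores.
    pos, neg = first > 0, first <= 0
    r = np.corrcoef(x[:, pos].sum(axis=1), x[:, neg].sum(axis=1))[0, 1]
    print("correlation between cluster scores:", round(float(r), 2))
```

In this simulation, the two item clusters are driven by independent traits. The first contrast therefore separates them, and the cluster scores correlate only weakly. A high correlation would instead suggest that the secondary dimension barely affects measurement.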
Comparing the bi-factor model with unidimensionality
• Essentially a CFA
• Step 1: traditional unidimensional analysis
• Step 2: bi-factor analysis
• Compare the general-factor loadings from the two solutions
• If the specific factors do not "distort" the measurement, the general-factor loadings in both models will be similar. This is interpreted to mean that, even with specific factors present, measurement of the general factor is not distorted.
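The comparison in the last two steps can be summarized numerically. The sketch below assumes the general-factor loadings from both solutions have already been estimated by a CFA program such as Mplus or TESTFACT. The loading values are hypothetical. The summary indices (mean and maximum absolute difference, and the correlation between the two sets of loadings) are one reasonable choice, not a fixed rule.

```python
import numpy as np

def compare_general_loadings(uni, bifactor_g):
    """Summarize how much general-factor loadings change when specific factors are added."""
    uni = np.asarray(uni, dtype=float)
    g = np.asarray(bifactor_g, dtype=float)
    diff = uni - g
    return {
        "mean_abs_diff": float(np.mean(np.abs(diff))),
        "max_abs_diff": float(np.max(np.abs(diff))),
        "correlation": float(np.corrcoef(uni, g)[0, 1]),
    }

# Hypothetical loadings for six items: unidimensional model vs. general factor of the bi-factor model.
uni_loadings = [0.72, 0.65, 0.58, 0.70, 0.61, 0.55]
g_loadings = [0.70, 0.60, 0.57, 0.66, 0.60, 0.50]
print(compare_general_loadings(uni_loadings, g_loadings))
```

Small differences and a high correlation between the two sets of loadings correspond to the "no distortion" reading described above. Large drops for particular items indicate that their unidimensional loadings were inflated by a specific factor.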
The role of the bifactor model in resolving dimensionality issues in health outcomes measures
Steven P. Reise, Julien Morizot, Ron D. Hays
Received: 25 August 2006 / Accepted: 30 January 2007 / Published online: 4 May 2007
© Springer Science+Business Media B.V. 2007
Abstract
Objectives We propose the application of a bifactor model for exploring the dimensional structure of an item response matrix, and for handling multidimensionality.
Background We argue that a bifactor analysis can complement traditional dimensionality investigations by: (a) providing an evaluation of the distortion that may occur when unidimensional models are fit to multidimensional data, (b) allowing researchers to examine the utility of forming subscales, and, (c) providing an alternative to non-hierarchical multidimensional models for scaling individual differences.
Method To demonstrate our arguments, we use responses (N = 1,000 Medicaid recipients) to 16 items in the Consumer Assessment of Healthcare Providers and Systems (CAHPS® 2.0) survey.
Analyses Exploratory and confirmatory factor analytic and item response theory models (unidimensional, multidimensional, and bifactor) were estimated.
Results CAHPS® items are consistent with both unidimensional and multidimensional solutions. However, the bifactor model revealed that the overwhelming majority of common variance was due to a general factor. After controlling for the general factor, subscales provided little measurement precision.
Conclusion The bifactor model provides a valuable tool for exploring dimensionality related questions. In the Discussion, we describe contexts where a bifactor analysis is most productively used, and we contrast bifactor with multidimensional IRT models (MIRT). We also describe implications of bifactor models for IRT applications, and raise some limitations.
Keywords Bifactor model · Unidimensionality assumption · Item response theory · Multidimensional item response model · Health outcomes measurement
Item response theory (IRT) [1] methods were developed in the context of large-scale assessment to more efficiently and accurately measure broadband dimensional constructs such as verbal and quantitative aptitude. In recent years, IRT methods have been applied in a wider variety of substantive contexts, especially health outcomes research [2–6]. This article is not an introductory or didactic review of IRT methods as applied in the health outcomes domain. Such reports are available in a variety of sources [7, 8]. Rather, this article is aimed toward researchers who are familiar with IRT methods, and who are actively working on IRT applications. We assume the reader is familiar with the literature that relates factor analytic and IRT models [9–13].
We discuss two related topics in the application of IRT models to health outcomes measures: (a) (uni)dimensionality assessment, and (b) application of hierarchical versus non-hierarchical multidimensional models. We first draw a distinction between narrow and broad constructs/measures. We then argue that for broader measures with diverse indicators, researchers should consider use of the bifactor model for representing the (multi)dimensional structure of their data. We argue that a bifactor model: (a) allows for the examination of the distortion that may occur when unidimensional IRT models are fit to multidimensional data, (b) allows researchers to empirically examine the utility of forming subscales, and, (c) provides an alternative
S. P. Reise (corresponding author), J. Morizot, R. D. Hays
Department of Psychology, University of California, Franz Hall, Los Angeles, CA 90095-1563, USA
e-mail: [email protected]
Qual Life Res (2007) 16:19–31
DOI 10.1007/s11136-007-9183-7
The use of the bi-factor model to test the uni-dimensionality of a battery of reasoning tests
Running Head: Item factor analysis of a Battery of Reasoning Tests
Ricardo Primi,∗ Marjorie Cristina Rocha da Silva and Priscila Rodrigues Santana,
Graduate Program in Psychology,
Universidade São Francisco, Brazil
Monalisa Muniz,
University of Vale do Sapucaí, Brazil
and
Leandro S. Almeida
Institute of Education, University of Minho, Portugal.
Please address correspondence to:
Ricardo Primi Universidade São Francisco Laboratório de Avaliação Psicológica e Educacional (LabAPE) Rua Alexandre Rodrigues Barbosa, 45 CEP 13251-900, Itatiba São Paulo, Brazil. E-mail: [email protected]
Web: www.labape.com.br
∗ The research activities of the first author, which resulted in this article, are financed by the
Brazilian National Council for Scientific Research (CNPq) and the São Paulo Research
Foundation (FAPESP).
Abstract
The Battery of Reasoning Tests 5 (BPR-5) aims to assess the reasoning ability of individuals
using sub-tests with different formats and contents that require basic processes of inductive
and deductive reasoning in their resolution. The BPR has three sequential forms: BPR-5i
(for children from first to fifth grade), BPR-5 – Form A (for children from sixth to eighth
grade) and BPR-5 – form B (for high school and undergraduate students). The present study
analysed 412 questionnaires concerning BPR-5i, 603 questionnaires concerning BPR-5 –
Form A and 1748 questionnaires concerning BPR-5 – Form B. The main goal was to test the
uni-dimensionality of the battery and its tests in relation to items using the bi-factor model.
Results have indicated that the assumption of a general reasoning factor underlying different
contents items is supported.
Keywords: Battery of reasoning tests, factorial validity, item response theory, bi-factor model
Abstract (Spanish version)
The Battery of Reasoning Tests (BPR-5) aims to assess people's reasoning ability using shorter tests with different items and contents, which are nonetheless related with respect to the induction and deduction involved in solving the task. The BPR has a sequential organization: BPR-5i (for children in grades 1 to 5), BPR-5 version A (grades 6 to 8) and BPR-5 version B (secondary and tertiary education). The present study evaluated data from 289 BPR-5i protocols, 603 BPR-5 version A protocols and 1748 BPR-5 version B protocols. The main objective was to test the unidimensionality of the battery and of the tests that compose it. The results confirmed the existence of a single factor related to reasoning, regardless of the content of the tasks.
Keywords: Battery of Reasoning Tests, factorial validity, cognitive development, item response theory, test calibration.
Example: BPR-5 A, B and i
Figure 1. Item examples of BPR sub-tests
Abstract Reasoning: [figural item with a missing element (?); options A, B, C, D, E]
Verbal Reasoning: Day : Night is as Bright : ?
A. Light B. Energy C. Dark D. Clarity E. Cloud
Numerical Reasoning: 1 3 5 7 9 ? ?
Spatial Reasoning: [figural item; options A, B, C, D, E]
Mechanical Reasoning: Which level (A, B, C) allows a person to reach a greater depth after jumping? If they are equal, mark D.
Practical Reasoning (only in BPR-5i): John's house is near Anthony's. One house is white and the other is grey. Anthony's house is not white. State the colour of each man's house.
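Table 1. Summary results of bi-factor and full information factor models
| | BPR-5i | BPR-5 A | BPR-5 B |
|---|---|---|---|
| Bi-factor model | | | |
| g bi (%) | 35.00 | 34.43 | 30.72 |
| s AR (%) | 8.61 | 2.35 | 2.39 |
| s VR (%) | 6.57 | 2.24 | 2.77 |
| s MR (%) | – | 3.04 | 4.28 |
| s SR (%) | – | 2.88 | 2.58 |
| s NR (%) | 6.29 | 3.93 | 4.41 |
| s PR (%) | 3.25 | – | – |
| Uniqueness (%) | 40.25 | 51.13 | 52.83 |
| Reliability (of g-factor) | .92 | .90 | .90 |
| χ² / df | 27040.8 / 105 | 63719.7 / 256 | 177566.0 / 1443 |
| CFI | .981 | .974 | .951 |
| TLI | .980 | .973 | .949 |
| RMSEA | .017 | .014 | .017 |
| Uni-dimensional model (full information) | | | |
| Eig 1 / Eig 2 (ratio) | 35.9 / 6.4 (5.6) | 37.7 / 5.15 (7.32) | 39.37 / 6.41 (6.14) |
| g full (%) | 44.08 | 37.42 | 40.42 |
| Reliability (of g-factor) | .95 | .95 | .95 |
| χ² / df | 34891.1 / 207 | 65848.91 / 371 | 182457.3 / 1558 |
| CFI | .904 | .924 | .876 |
| TLI | .902 | .922 | .873 |
| RMSEA | .038 | .024 | .027 |
| Corrected chi-square difference test for the weighted least squares estimator (WLSMV) | | | |
| Δχ² | 1421.33 | 1540.46 | 3905.26 |
| df | 102 | 115 | 113 |
| p | <.0001 | <.0001 | <.0001 |
g = general factor; s = specific factor (AR abstract, VR verbal, MR mechanical, SR spatial, NR numerical, PR practical reasoning); – = sub-test not included in that form.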
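The variance percentages in Table 1 can be combined into the explained common variance (ECV), the share of common variance attributable to the general factor. The sketch below assumes the bi-factor percentages are shares of total item variance, so that g + specifics + uniqueness ≈ 100 (the rows satisfy this up to rounding). ECV itself is not reported in the table. The script also rechecks the Eig1/Eig2 ratios shown in parentheses.

```python
# Values transcribed from Table 1 (bi-factor model, % of total variance; eigenvalues from the full-information model).
table = {
    "BPR-5i":  {"g": 35.00, "s": [8.61, 6.57, 6.29, 3.25], "u": 40.25, "eig": (35.9, 6.4)},
    "BPR-5 A": {"g": 34.43, "s": [2.35, 2.24, 3.04, 2.88, 3.93], "u": 51.13, "eig": (37.7, 5.15)},
    "BPR-5 B": {"g": 30.72, "s": [2.39, 2.77, 4.28, 2.58, 4.41], "u": 52.83, "eig": (39.37, 6.41)},
}

def ecv(g, specifics):
    """Explained common variance: general-factor variance over all common (general + specific) variance."""
    return g / (g + sum(specifics))

for form, row in table.items():
    total = row["g"] + sum(row["s"]) + row["u"]
    eig1, eig2 = row["eig"]
    print(f"{form}: total = {total:.2f}%, ECV = {ecv(row['g'], row['s']):.3f}, "
          f"Eig1/Eig2 = {eig1 / eig2:.2f}")
```

By this calculation, the general factor accounts for about 59%, 70% and 65% of the common variance in BPR-5i, Form A and Form B, respectively.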
Figure 2. Scatter plots of factor loadings on the g factor under the bi-dimensional model for BPR-5i, the youngest children (top), Form A (lower left) and Form B (lower right).