Specialized Technical Reports

A Comparison of IRT Theta Estimates and Delta Scores From the Perspective of Additive Conjoint Measurement
Benjamin W. Domingue & Dimiter M. Dimitrov
The National Center for Assessment (NCA) is piloting an approach to scoring and equating tests with binary items, referred to as the delta-scoring method (Dimitrov, 2015). The purpose of this study was to evaluate the intervalness of the D scores, obtained under the delta-scoring method, in comparison with the intervalness of "theta" scores obtained under the three-parameter logistic model in item response theory (IRT). The question of interest was which scores (D or theta) are more consistent with the axioms of additive conjoint measurement (ACM; Luce & Tukey, 1964). This question was addressed through the ConjointChecks approach (Domingue, 2014), using real data from a large-scale assessment at the NCA. This study provides evidence that the D scores produce fewer violations of the ordering axioms of ACM than do the theta scores. ...

Differential Item Functioning Report for the New Arabic Test
Yong Luo
The current report focuses on differential item functioning (DIF) analysis of the 50 items in the Arabic test. Given the small sample size (N=275), many DIF detection methods based on item response theory (IRT) are not feasible, so we use the Mantel-Haenszel (MH) method (Holland & Thayer, 1988) as the main tool for the subsequent DIF analysis. ...

Using a Multiple Indicators Multiple Causes (MIMIC) model to Examine Item and Scale Performance across Different Response Time Groups
Ioannis Tsaousis
TR164-2016
Abstract: Item response latency has become a valuable source of information in modern psychometrics, especially with the increasing use of Computer Based Tests (CBT) in educational and psychological assessment. Previous research has shown that response latency is related to several internal (e.g., item difficulty, item discrimination, etc.)
and external (e.g., time framework, item length, etc.) characteristics of the tests, but also to temperamental characteristics of the respondents (e.g., fast vs. slow respondents). The aim of this study was to investigate the extent to which speededness (i.e., how fast or slow an individual responds to an item/scale) influences overall performance in three different cognitive areas (verbal, quantitative, and advance functioning). To examine this research question, data from 8,475 individuals completing the computerized version of the Postgraduate General Aptitude Test (PAGAT) were analyzed. To determine the extent to which speededness affects scale scores, a Multiple Indicators Multiple Causes (MIMIC) model was applied, using time group membership (i.e., fast vs. slow respondents) as a covariate. The results sugge ...

Examining Psychometric Properties of Qiyas for Arabic (L1): A Rasch Analysis Approach
Amjed A. Al-Owidha
One central process during test construction and development is checking the quality of test items to examine whether they are functioning and behaving as planned. Such a process is called a reliability and validity study. Test developers, at this stage, usually select stringent measurement models that are suitable for the type of examinee responses on the test itself. The purpose is to make sure that the data under study are handled appropriately before validation takes place. The purpose of this study was to examine the psychometric properties of the trial version of the Qiyas for Arabic (L1) test using Rasch analysis. The Qiyas for Arabic (L1) is a large-scale standardized test that was developed recently by NCA to measure four Arabic language skills for native speakers. Responses from over 200 examinees were analyzed in this study. The initial results supported the notion that the Qiyas L1 test maintains satisfactory psychometric properties. ...

The Effects of Repeated Testing on Person Performance on the Computerized Version of the GAT and PGAT Tests
Georgios Sideridis
TR157-2016
Abstract: The purpose of the present studies was to explore how subsequent testing influences examinee performance using the novel methodology of latent transition analysis. Participants in Study 1 were 5,091 examinees who took the computerized version of the GAT test across three time intervals. Four groups of individuals were created and empirically tested using latent class analysis, with the predictors being 4 ability testlets (from low to high ability items). The four groups that emerged resembled 4 levels of performance such as those in the interquartile range. Results indicated that individuals were very stable at high ability levels, with most movement observed in the above average group moving towards high ability. Interestingly, the most unstable group comprised low achievers, who moved towards all possible groups, even the highest ability one. Transitioning from Time 2 to Time 3 essentially replicated the transitioning from Time 1 to Time 2. Study 2 attempted to replicate the above findings with the computerized version of the PGAT, which measures 3 general domains. Results from the latent profile analysis suggested that besides a low and a high ability ...

Investigating the Psychometric Properties of the New Arabic Language Test in both IRT and CTT frameworks
Amjed A. Al-Owidha, Yong Luo
One central process during test construction and development is checking the quality of test items to examine whether they are functioning and behaving as planned. This is part of the validity and reliability studies. The test developer, at this stage, usually selects stringent measurement models that are suitable for the type of examinee responses on the test itself. The purpose is to make sure that the data under study are handled appropriately before validation takes place.
The purpose of this study was to examine the psychometric properties of the trial version of the Qiyas for Arabic (L1) test using Rasch IRT analysis and Classical Test Theory (CTT). Responses from over 200 examinees were analyzed in this study. Given the rather small sample size, it was deemed useful to conduct analyses in both the CTT and IRT frameworks and to take results from both into consideration when making judgments about the quality of the test (items). The Qiyas for Arabic (L1) is a large-scale standardized test that was developed recently by NCA to measure four Arabic language skills for native speakers. The initial results supported the notion that ...

The Effects of Pretesting on Computer Based Testing Using the GAT
Georgios Sideridis
TR152-2016
Abstract: The purpose of the present study was to explore how subsequent testing influences examinee performance using the novel methodology of latent transition analysis. Participants were 5,091 examinees who took the CBT test more than once. Four groups were created and empirically tested using latent profile analysis, based on subtests composed of four aggregates reflecting quartile performance levels. Thus, four ability groups were created based on performance on the quantitative component of the CBT. After testing how individuals transitioned between ability classes, results indicated that examinees were mostly stable when they belonged to the high ability group at time 1, with only 4.5% of them moving to the immediately lower ability group. Individuals transitioned mostly from lower ability groups to higher ability groups, with the largest number of transitions being to the ability group immediately above the one examinees were in at time 1. Thus, the hypothesis of a practice effect is most likely supported. Interesting effects were observed, with individuals in the above average group transitioning in large numbers (51% of them) to the highest ability group. Findings tha ...

Item Analysis Report for the New Arabic Test
Yong Luo
The current test was taken by 501 students, so the subsequent analyses are based on a sample of 501 students. Due to the small sample size, only classical test theory (CTT) is used as the measurement framework for the analyses conducted in this report. ...

Examining the prevalence and impact of non-attempted items in NCA educational tests
Iasonas Lamprianou
The aim of this study is to examine whether NCA achievement test scores are affected by response strategy decisions. The dataset consisted of the responses of 34,500 examinees to 52 verbal and 44 quantitative items. It was found that the frequency of missing responses in the data was very small, both for the Verbal and the Quantitative tests. The examinees who produced missing responses on one test also tended to produce missing responses on the other test. Coding the missing responses as missing rather than as incorrect did not affect either the model-data fit of the Rasch models or the difficulty estimates of the items. Also, coding the missing responses as missing rather than as incorrect did not noticeably affect the ability estimates of the overall sample. It was not possible to find evidence that examinees of lower ability tended to produce more missing responses. It is suggested that, although the phenomenon is not important enough to cause concerns regarding the validity and the reliability of the examination results, it should be monitored regularly. ...

Modeling Group Specific Differential Item Functioning in the STAPSOL Test
Khurrem Jehangir
Fit to item response theory (IRT) models in educational testing can be compromised by the presence of group-specific differential item functioning (GDIF). The current study proposes methods to detect GDIF and explores the feasibility of improving the fit of the measurement model by using group-specific item parameters to model GDIF.
In this approach, it is assumed that a scale consists of both items that are free of GDIF and items with GDIF. The first set of items ensures the validity of the measure across groups. The second set of items is calibrated concurrently with the first set, and both sets contribute to measurement precision. The procedure is used to model high DIF items in the STAPSOL test of Arabic proficiency. Using data from the groups that participated in this exam, concurrent maximum marginal likelihood (MML) estimates of the parameters of the Rasch model (1PLM) are obtained. Then information on observed and expected response frequencies is used to identify GDIF items. Group-specific item parameters are introduced for the items with the largest effect sizes of GDIF, and new MML estimates are obtained. The impact of using group-specific item parame ...