Reliability and Validity of the FCCERS-3
The FCCERS-3 is a revision of the widely used and documented Family Child Care Environment Rating Scale-Revised and Updated (FCCERS-R, 2007), one in a family of instruments designed to assess the overall quality of early childhood programs. Since the concurrent and predictive validity of the previously published ERS instruments is well established and the current revision maintains the basic properties of the original instrument, the first field studies of the FCCERS-3 have focused primarily on the degree to which the revised version can still be used reliably by trained observers, and on the basic functioning of the instrument. Additional studies will be needed to document the continued relationship with other measures of quality, as well as the scale's ability to predict child outcomes. As further studies are conducted, they will be posted on the ERSI website (www.ersi.info).
After extensive revision, the authors conducted small pilot trials of the FCCERS-3 in the late spring and summer of 2018 and a larger field test of the scale that fall. The field-test sample consisted of 63 family child care homes (FCCHs) in four states: Georgia (15), Pennsylvania (18), Washington (15), and Wisconsin (15). FCCHs were recruited with the goal of obtaining a sample of roughly one-third lower-quality, one-third moderate-quality, and one-third higher-quality programs, based on available data from state quality rating and improvement systems (QRISs). The resulting sample was somewhat skewed, with relatively few high-scoring programs and more in the moderate- to lower-quality ranges, but the distribution was adequate to allow examination of the scale's use across quality levels in these states.
Indicator Reliability. Indicator reliability is the percentage of scores that exactly match, for each indicator, when two assessors independently complete the FCCERS-3. Across the 33 Items in the FCCERS-3, there were a total of 477 indicators in the field-test version, and assessors were instructed to score all indicators for each FCCH. Indicators were scored either Yes or No, with several indicators allowed a score of NA (not applicable) in specified circumstances. In addition, six of the Items could be scored NA; in such cases, NA was assigned as the score for all indicators in those Items. The average reliability for exact matches across all indicators and assessor pairs was 85.5%. A few indicators scored below 75% exact agreement. Following the field test, the authors examined those indicators and either eliminated them or made minor adjustments to improve reliability; eliminating them raised overall indicator reliability. The final version of the scale includes 464 indicators across the 33 Items.
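As an illustration of this type of calculation (a minimal sketch, not the scoring procedure or data used in the field test), the percentage of exact matches between two assessors' indicator codes can be computed as follows; the assessor lists and codes are invented for the example:

```python
# Minimal sketch: percent exact agreement between two assessors'
# indicator-level codes ("Y", "N", or "NA"). Toy data, not field-test data.

def exact_agreement(codes_a, codes_b):
    """Percentage of indicators on which two assessors agree exactly."""
    assert len(codes_a) == len(codes_b)
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

assessor_1 = ["Y", "Y", "N", "NA", "Y", "N"]
assessor_2 = ["Y", "N", "N", "NA", "Y", "N"]

print(f"{exact_agreement(assessor_1, assessor_2):.1f}% exact agreement")
# -> 83.3% exact agreement
```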
Item Reliability. Because of the nature of the scoring system, it is theoretically possible to have high indicator agreement but low agreement at the Item level. Two measures of Item agreement were calculated. The first was agreement between pairs of observers within 1 point on the 7-point scale. For the full 33 Items, exact agreement occurred in 62.2% of the cases and agreement within 1 point was obtained 86.4% of the time. Item agreement within 1 point ranged from a low of 76.2% on two Items (Item 3, Arrangement of Indoor Space for Child Care, and Item 16, Art) to a high of 95.2% on four Items (Item 22, Appropriate Use of Screen Time; Item 23, Promoting Acceptance of Diversity; Item 24, Gross Motor; and Item 30, Interactions Among Children).
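Both Item-level agreement measures are straightforward to compute from paired 7-point scores. The sketch below is illustrative only; the score lists are invented:

```python
# Sketch: exact and within-1-point agreement on 7-point Item scores.
# The scores below are illustrative, not field-test data.

def item_agreement(scores_a, scores_b):
    """Return (percent exact agreement, percent agreement within 1 point)."""
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    within_1 = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return 100 * exact, 100 * within_1

obs_1 = [3, 5, 4, 7, 2, 6]
obs_2 = [3, 4, 4, 5, 2, 6]

exact, within_1 = item_agreement(obs_1, obs_2)
print(f"exact: {exact:.1f}%, within 1 point: {within_1:.1f}%")
# -> exact: 66.7%, within 1 point: 83.3%
```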
A second, more conservative measure of reliability is Cohen's kappa, which corrects agreement for chance. For measures with an ordinal scale, a weighted version of kappa, which takes into account the magnitude of the difference between scores, is most appropriate and is used here. The mean weighted kappa for the 33 Items was .64. Kappas ranged from a low of .43 for Item 21, Math/Number, to a high of .96 for Item 22, Appropriate Use of Screen Time. Two Items had weighted kappas of .50 or below; those Items received minor edits to help improve reliability. The indicator edits discussed above should result in somewhat higher kappas for the low-scoring Items without changing their basic content. These changes are included in the printed version of the scale. Even using the more conservative measure of reliability, the overall results indicate an acceptable level of reliability for the instrument as a whole.
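Weighted kappa is available in standard statistical libraries. The sketch below uses scikit-learn's cohen_kappa_score with linear weighting; the weighting scheme (linear vs. quadratic) is an assumption here, since the text does not specify which was used, and the scores are invented:

```python
# Sketch: weighted Cohen's kappa for two assessors' 7-point Item scores.
# Linear weighting is an assumption; the original weighting scheme is
# not specified. Scores are illustrative, not field-test data.
from sklearn.metrics import cohen_kappa_score

obs_1 = [3, 5, 4, 7, 2, 6, 5, 4]
obs_2 = [3, 4, 4, 5, 2, 6, 5, 3]

kappa = cohen_kappa_score(obs_1, obs_2, weights="linear")
print(f"weighted kappa = {kappa:.2f}")
```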
Intraclass Correlation. A third way of looking at reliability, intraclass correlation, examines the level of agreement between observers when they assess quality independently. It reflects both the correlation between the two observers' ratings and the difference in the ratings' absolute magnitude. We computed the absolute-agreement intraclass correlation coefficient (ICC) using a two-way mixed model with average-measures estimates, where 0 represents no correlation between assessments and 1 represents perfect correlation. At the Item level, the mean coefficient was .96, with a range from .76 for Item 3, Arrangement of Indoor Space for Child Care, to 1.0 for Item 22, Appropriate Use of Screen Time. Coefficients for the subscales and for the total score are shown in the table below, followed by a brief computational sketch. All of these measures indicate cohesion of the measurement of quality by the FCCERS-3.
Intraclass Correlation: Subscales and Full Scale

| Subscale | ICC |
|---|---|
| Subscale 1: Space and Furnishings | 0.73 |
| Subscale 2: Personal Care Routines | 0.80 |
| Subscale 3: Language and Books | 0.91 |
| Subscale 4: Activities | 0.88 |
| Subscale 5: Interaction | 0.92 |
| Subscale 6: Program Structure | 0.87 |
| Full Scale (Items 1-33) | 0.96 |
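One way to reproduce this type of coefficient is with the pingouin library, sketched below with invented long-format data. Pingouin labels its absolute-agreement, average-raters coefficient ICC2k (a two-way random-effects model); its point estimate is computed with the same formula as the two-way mixed, absolute-agreement, average-measures ICC described above, so it serves as a close stand-in:

```python
# Sketch: absolute-agreement, average-measures ICC for two assessors.
# The long-format toy data are invented; "home", "rater", and "rating"
# are hypothetical column names, not the field-test data layout.
import pandas as pd
import pingouin as pg

scores = pd.DataFrame({
    "home":   [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],   # FCCH observed
    "rater":  ["A", "B"] * 5,                    # assessor pair
    "rating": [4, 4, 2, 3, 6, 6, 5, 4, 3, 3],   # Item or subscale score
})

icc = pg.intraclass_corr(data=scores, targets="home",
                         raters="rater", ratings="rating")
print(icc.loc[icc["Type"] == "ICC2k", ["Type", "ICC", "CI95%"]])
```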
Internal Consistency. Finally, we examined the scale for internal consistency, a measure of the degree to which the full scale and the subscales appear to be measuring common concepts. Cronbach's alphas of .6 and higher are generally considered acceptable levels of internal consistency. Overall, the scale has a high level of internal consistency, with a Cronbach's alpha of .97. This figure indicates a high degree of confidence that a unified concept, which we call quality of the environment, is being measured. A second issue is the degree to which the subscales also show consistency, that is, whether each subscale measures its construct consistently. The table below shows the alphas for each subscale, followed by a sketch of the calculation:
Internal Consistency: Subscales and Full Scale

| Subscale | Cronbach's Alpha |
|---|---|
| Subscale 1: Space and Furnishings | 0.74 |
| Subscale 2: Personal Care Routines | 0.81 |
| Subscale 3: Language and Books | 0.92 |
| Subscale 4: Activities | 0.92 |
| Subscale 5: Interaction | 0.93 |
| Subscale 6: Program Structure | 0.88 |
| Full Scale (Items 1-33) | 0.97 |
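Cronbach's alpha can be computed directly from an observation-by-Item score matrix using the standard formula; below is a minimal NumPy sketch with an invented score matrix:

```python
# Sketch: Cronbach's alpha from an (observations x Items) score matrix.
# alpha = k / (k - 1) * (1 - sum of item variances / variance of totals).
# The score matrix is invented for illustration.
import numpy as np

def cronbach_alpha(scores):
    """scores: 2-D array, rows = observed programs, columns = Items."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of Items
    item_variances = scores.var(axis=0, ddof=1)      # variance of each Item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

toy_scores = np.array([[3, 4, 3, 5],
                       [2, 2, 3, 2],
                       [6, 5, 6, 7],
                       [4, 4, 5, 4],
                       [5, 6, 5, 6]])
print(f"alpha = {cronbach_alpha(toy_scores):.2f}")
# -> alpha = 0.95
```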