FINAL REPORT: Validation of Test Methods for Assessing Neurodevelopment in Children
A number of individual tests or tasks yielded predictive sensitivity and specificity estimates for some risk groups that could be considered high. The ROC analyses demonstrated that a number of tests and tasks had predictive sensitivity and specificity significantly greater than chance and could measure a wide spectrum of subtle effects across the several neurodevelopmental domains studied. Of the 63 tests and tasks across five domains, 23 (approximately one third), drawn from four of the five domains, had a high probability of predicting at least one risk category, as summarized in Table 20. These endpoints included electrophysiological measures of brain activity, learning and problem-solving ability, auditory processing ability, and motor control ability.
|Domain and Endpoint||Category of Developmental Risk|
|Visual and Auditory Information Processing|
|BAER Amplitude @4000 Hz 70dB||√|
|BAER Amplitude @6000 Hz 70dB||√|
|Auditory Processing: Pitch Pattern|
|Trials 1+2 Correct + Reversed||√||√|
|Trials 1+2 Correct Only||√||√|
|Trials 1+2 Gesture – Verbal Correct + Reversed||√|
|Monitoring and Vigilance|
|Percent Alarms 10-15 minutes||√|
|Average Tracking Error||√|
|% Hazard 10-15 min||√|
|Fine Motor Control|
|P300 Amplitude Cz||√|
|Mean CPT RT||√||√|
|Median Inter-response Time||√|
|Median Pauses to the Final One||√|
|Total Choices: High Reward, Long Delay||√|
|Mdn Latency: High Button Choices||√||√|
|Paired Associate Learning|
|Average Errors to Success||√||√|
|Average Trials to Success||√|
|DMS Task, Percent Correct Long Delay||√|
The battery proved more sensitive and specific to deficits in overall cognition as measured by IQ. Nearly one-third (18) of the 63 tests and tasks showed sensitivity and specificity of 70% or higher in predicting IQ. Of these 18 endpoints, five were also sensitive and specific to LD or neonatal risk. Since IQ, LD, and neonatal risk are not highly correlated either in our sample or in the general population, this finding may indicate a weakness in the battery for use in assessing a broad array of neurotoxic exposure effects. On the other hand, our adoption of 70% as the cutoff for acceptable sensitivity and specificity was arbitrary. Many other measures fell below 70% but still performed significantly better than chance in detecting both LD and neonatal risk. This question deserves further study, by applying the battery to selected groups of affected children, for several reasons. Areas under ROC curves depend upon the specific behavior, i.e., a test or task outcome, as well as on the overlap in performance between those who do and those who do not have the condition expected to influence task performance. Therefore, specificity and sensitivity may vary with the toxicant, as well as with parameters of neurotoxic exposure such as dose, age at exposure, duration of exposure, and others. Replication of our results using statistical approaches other than ROC curves, such as linear and non-linear multiple regression, would be useful. Further, separate secondary analyses comparing test performance as a function of verbal IQ and performance IQ, and follow-up of the school records of children in our cohort, might yield better information, at least on LD, than was possible using a psychometric classification criterion.
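The 70% criterion discussed above can be illustrated with a minimal sketch. The Python below is not the analysis code used in this study: the scores, risk labels, and cutoff are hypothetical, and a lower score is assumed to indicate poorer performance. It computes sensitivity and specificity at a single cutoff, and the area under the ROC curve as the probability that a randomly chosen at-risk child scores lower than a randomly chosen child who is not at risk.

```python
def sensitivity_specificity(scores, at_risk, cutoff):
    """Flag a child as 'at risk' when score <= cutoff (assumes lower = worse)."""
    pairs = list(zip(scores, at_risk))
    tp = sum(1 for s, r in pairs if r and s <= cutoff)       # at risk, flagged
    fn = sum(1 for s, r in pairs if r and s > cutoff)        # at risk, missed
    tn = sum(1 for s, r in pairs if not r and s > cutoff)    # not at risk, not flagged
    fp = sum(1 for s, r in pairs if not r and s <= cutoff)   # not at risk, flagged
    return tp / (tp + fn), tn / (tn + fp)


def roc_area(scores, at_risk):
    """AUC via pairwise comparison: P(at-risk score < comparison score), ties count 0.5."""
    pos = [s for s, r in zip(scores, at_risk) if r]
    neg = [s for s, r in zip(scores, at_risk) if not r]
    wins = sum(1.0 if p < n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def meets_criterion(sens, spec, threshold=0.70):
    """The report's (admittedly arbitrary) cutoff for acceptable performance."""
    return sens >= threshold and spec >= threshold


# Hypothetical data: four at-risk children followed by four comparison children.
scores = [3, 4, 5, 8, 6, 7, 9, 10]
at_risk = [True, True, True, True, False, False, False, False]

sens, spec = sensitivity_specificity(scores, at_risk, cutoff=6)
area = roc_area(scores, at_risk)
print(sens, spec, area, meets_criterion(sens, spec))
```

Because the area under the curve summarizes performance over every possible cutoff, a measure can fall below 70% at one cutoff and still discriminate better than chance (an area above 0.5), which is the situation described for many of the measures above.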
Our regression analyses indicated that covariate effects varied with specific domain and test or task. Factors such as age at testing and gender accounted for considerable variance in many analyses, and areas under the curve varied accordingly. In addition, performance on a number of computer-based tests and tasks was influenced by experience with computer manipulanda such as keyboards, joysticks, and mice, and by experience playing video games. Some of the auditory processing and auditory system tasks were also affected by the ability to hear the stimuli. Finally, some ROC curves describing an association between a particular test or task and one risk factor were influenced by one or more other risk factors. These results were interpreted as indications of factors that modified the area under the ROC curve; the ability to detect such modification is a particular strength of the ROC regression modeling strategy we followed.
Covariates such as gender, age at testing, and even handedness should not be viewed as confounders, but rather as potentially important determinants of selective neurotoxic effects. Neurotoxicants may cause IQ deficits, and may be selective in their effects on males and females. Our data strongly suggest that when tests or tasks reported in this study are employed in future studies, these covariates must be ascertained and employed to adjust the results accordingly. Subsequent analyses of these covariate data are also warranted to amplify the effect modification by some of the covariates. For example, the influence of gender on the predictiveness of some tests and tasks may prove useful in future studies that hypothesize gender effects on the neurotoxicity of some substances.
The most consistent sensitivity and specificity resulted for the learning and problem-solving tasks comprising the CANTAB. Given the types of deficits that defined our three risk categories, these results should not be surprising. The family of CANTAB tasks taps learning skills under various complexity conditions. These skills are basic to effective cognition, and depend less on motor skill or sensory functions. Nevertheless, many of the CANTAB tasks were affected by, but not dependent upon, familiarity with computers, handedness, and gender. The results indicated that combining the CANTAB paired associate learning, delayed match-to-sample, and dimensional shift tasks with selected endpoints from the FI/Choice paradigms would afford the best chance to predict risk in any of the three categories tested in this study. Therefore, we recommend that these tasks form the backbone of the final battery.
A number of the CANTAB tasks also were affected by experience using computers or playing video games. Performance was generally enhanced by such experience. It is not clear from our data why these differences occurred. The CANTAB tasks all involve responding to a touch screen or a computer keyboard, where motor proficiency might have been influenced by previous experience with similar tasks. As such, this outcome should be factored into the design of any study using the CANTAB.
The CANTAB tasks collectively do not emphasize language and auditory processing, nor are they correlated with sensory motor functioning. Any comprehensive battery should include tasks that do attend to these domains. Our data suggest that the fine motor control and the monitoring tasks, combined with the pitch-pattern sequence test, probably add sufficient scope to the battery to accomplish this end. The Pitch-Pattern Sequence task can be acquired inexpensively, is relatively easy to administer to a child, and could be attached to any battery involving the CANTAB with little difficulty. The fine motor control and monitoring and vigilance tasks are more complex, involving separate computer programs and apparatus that must be calibrated. Including these promising tasks in a final battery would be easier if they were integrated into the same computer used to run the CANTAB. Following the suggestion of White and colleagues (1994), we recommend that the battery be modified to include these measures when either language processing or motor control functions are hypothesized to be affected by specific exposures.
The electrophysiological endpoints were the least promising components of the battery based upon their application in this study. Only one or two auditory and visual endpoints showed sensitivity and specificity that was highly predictive of any of the three risk status categories tested in this study. Our results appear similar to findings for BAERs reported by Majnemer & Rosenblatt (2000), where predicting developmental outcomes at school age among high-risk newborns was limited by false negatives. Cognitive evoked potentials, OAE, and BAER measures proved feasible to administer in field settings, even if they are costly and somewhat cumbersome from the child’s point of view. However, we recommend that they be used as secondary tasks if investigational hypotheses predict effects on peripheral conductive or sensorineural hearing that in turn might influence information processing involving the auditory system. The behavioral measures associated with the cognitive evoked potentials mainly involve a visual continuous performance task that, as expected, did detect differences in IQ and the presence of learning disability, and could also be added to the battery.
The battery was initially conceived of as a supplement to, not a replacement for, more traditional tasks and tests, even though their sensitivity and specificity might not be known. The battery was also intended to serve as a second-tier evaluation following assessments that used such tools as the PENTB and the NES, or to replace them in research studies. The battery should prove superior to more traditional tests such as the IQ test, since it targets specific functions that may be more sensitive to subtle neurotoxic effects. Nevertheless, the battery should not be considered a replacement for assessing global cognitive ability using an IQ test, especially in research studies when the expected deficits might be severe. However, administration of an IQ test requires considerable time and expertise.
Since our results indicated that the auditory measures are affected by hearing status, a basic battery should include tympanometry and audiometry. Although it was not necessary to screen our cohort for undetected visual loss (all of our subjects with corrective lenses were tested with their eyewear), this variable may need to be controlled in other cohorts. Therefore, visual acuity should also be included in the battery. Snellen acuity can be ascertained rapidly and accurately at practically no cost.
It was disappointing to find that neither of the two tests or tasks in the battery designed to measure functioning in the somatosensory domain demonstrated high sensitivity or specificity in predicting any risk category. It may be that none of the three risk categories should have been expected to include somatosensory abnormality. Although it would be unlikely to vary with IQ, we did expect it to be associated with both LD and neonatal risk. It may also be that our measures of somatosensory functioning (visual spatial contrast sensitivity and scotopic visual form discrimination) were more dependent on visual functions, which might have been less likely to vary with any risk category. The issue of sensitivity and specificity of tests of somatosensory functioning should be investigated further, since such functions are thought to be influenced by some neurotoxicants and should be measured by the battery.
As noted in Tables 16, 17, 18, and 19, there was substantial overlap in the sensitivity and specificity of specific measures within the same test or task that our analysis indicated had high predictiveness of risk status. The absolute number of these overlapping measures was small, and the overlap within a particular risk category was also small. Our data also indicated that tests or tasks within a particular domain showed little overlap. These data do not seem to support pruning of various measures within a test or task, although this conclusion might change when the battery is applied to populations of children actually exposed to different neurotoxicants.
There are some limitations to this study. The cohort studied may have been biased. We were unable to ascertain whether differences were present between subjects in the study sample and those who made up the sampling population, because of confidentiality of records. The subjects were solicited through the NCCC staff and their names were not turned over to us until they had accepted our offer to participate in the study. Thus, we do not know who refused. There were some subjects who gave initial consent but did not show up for testing, but they would not be representative of those who declined to participate in the beginning. We were also unable to review school records or retrieve interim medical histories. Differences between sample and population could have been systematic and therefore might have introduced unaccountable variance to our results. Such bias is also an inevitable consequence of sampling clinical populations.
Generalizability of our findings will be established only through continuing research, especially on exposed populations. However, the test and task domains studied in this project measured behaviors that are known to be affected by various neurotoxicants, and the domains represent a wide enough spectrum of human function that different parts of the battery, combined with other standardized tests, should prove useful in future studies, such as the National Children’s Study.
Expanded Testing for Visual and Auditory Attention
The External Advisory Committee observed that there may be unnecessary overlap among our measures of auditory attention. It was suggested that we might be able to simplify the battery if we adopted a single measure of continuous auditory performance and continuous visual performance. They suggested that we consider the Test of Variables of Attention (TOVA), a computerized measure that evaluates attention and impulse control, to be given to a subset of the 294 children who participated in the study. The TOVA permits use of similar and comparable measures of visual and auditory attention and might prove easier to both administer and interpret than the measures used in the current study.
It was not possible to implement this recommendation during the course of main data collection. However, a research copy of the TOVA was obtained from the publisher and a small pilot study is underway involving children who participated in the original study. Fifty children are being recalled and will be administered both the Visual Attention and the Auditory Attention portions of the TOVA.
The visual portion of the test, depicted in Figure 20, requires that the child identify, by pushing a control button, the visual target stimulus each time it is presented. Two stimuli are given. The target stimulus is a square with a smaller square at the top. The non-target stimulus is a square with a smaller square at the bottom.
Fig. 22. Visual displays for the TOVA.
During the auditory portion of the test the child is presented with two auditory tones presented through external speakers. The target tone is a Middle G and the non-target tone is a Middle C.
The stimuli are presented for 100 msec every 2 seconds. The target stimulus is presented infrequently (72 out of 324 presentations) during the first half of the test and frequently (252 out of 324 presentations) during the second half of the test. Each child will require about 1.25 hours of testing. About two children will be tested each day, three to four days per week, during after-school hours. Each child will receive a cash payment of $10 for his or her participation. Edna Young is undertaking the testing.
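The presentation schedule above implies target rates and a running time that can be checked with simple arithmetic. The sketch below is illustrative only: it assumes that each half of the test contains 324 presentations and that the 2-second interval runs from one stimulus onset to the next. The variable names are ours, not the publisher's.

```python
# Figures taken from the schedule described above.
PRESENTATIONS_PER_HALF = 324   # assumption: 324 presentations in EACH half
TARGETS_FIRST_HALF = 72        # infrequent-target half
TARGETS_SECOND_HALF = 252      # frequent-target half
INTERVAL_SECONDS = 2           # assumption: onset-to-onset interval

first_half_rate = TARGETS_FIRST_HALF / PRESENTATIONS_PER_HALF     # 2/9
second_half_rate = TARGETS_SECOND_HALF / PRESENTATIONS_PER_HALF   # 7/9

total_presentations = 2 * PRESENTATIONS_PER_HALF
duration_minutes = total_presentations * INTERVAL_SECONDS / 60

print(f"Target rate, first half:  {first_half_rate:.1%}")
print(f"Target rate, second half: {second_half_rate:.1%}")
print(f"Approximate running time per modality: {duration_minutes:.1f} minutes")
```

Under these assumptions the target-to-non-target ratio in the second half (252 to 72) is the mirror image of the ratio in the first half (72 to 252).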
This project began in November 2003. Data collection should be completed by February 2004 and data analysis should require no more than six months.
The analysis plan includes measures of central tendency and variability and ROC curves for the three risk variables for the following endpoints:
- Errors of Omission: reported as a percentage and representing inattention
- Errors of Commission: reported as a percentage and representing impulsivity
- Response Time: a measure of processing time for correct target responses
- Response Time Variability: a measure of the subject's response time variance, or inconsistency in response times
- d': a measure of the accuracy of target and non-target discrimination, derived from Signal Detection Theory, that can be interpreted as "perceptual sensitivity"
- ADHD Score: indicates how similar the performance is to the ADHD profile
- Post Commission Response Time: a measure of the time in milliseconds that the subject took to respond to a target immediately after a commission had been recorded
- Anticipatory Responses: responses that represent the subject's "guess" at the next pending stimulus; considered a measure of test validity
- Multiple Response: the sum of the multiple responses when the button is pressed more than once for a stimulus presentation
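Several of the endpoints listed above follow from simple counts. The sketch below uses the textbook Signal Detection Theory definition, d' = z(hit rate) - z(false-alarm rate); it is not the TOVA publisher's scoring algorithm, which may differ. The response counts are hypothetical, and the small clamping constant that keeps rates away from 0 and 1 is our own assumption.

```python
from statistics import NormalDist


def omission_pct(missed_targets, targets):
    """Errors of omission: percentage of targets with no response."""
    return 100.0 * missed_targets / targets


def commission_pct(false_alarms, non_targets):
    """Errors of commission: percentage of non-targets that drew a response."""
    return 100.0 * false_alarms / non_targets


def d_prime(hit_rate, false_alarm_rate, eps=1e-4):
    """Textbook d' = z(hit rate) - z(false-alarm rate).

    eps is an assumed clamp so that rates of exactly 0 or 1 do not
    produce an infinite z-score.
    """
    z = NormalDist().inv_cdf
    hit_rate = min(max(hit_rate, eps), 1 - eps)
    false_alarm_rate = min(max(false_alarm_rate, eps), 1 - eps)
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical child on the infrequent-target half: 72 targets, 252 non-targets.
targets, non_targets = 72, 252
missed, false_alarms = 9, 12

om = omission_pct(missed, targets)
com = commission_pct(false_alarms, non_targets)
dp = d_prime(1 - missed / targets, false_alarms / non_targets)
print(f"omissions {om:.1f}%, commissions {com:.1f}%, d' {dp:.2f}")
```

A d' of zero corresponds to responding at chance, that is, a hit rate equal to the false-alarm rate; larger values indicate better discrimination of targets from non-targets.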
The TOVA may serve to replace the original measures of visual and auditory attention. If the pilot is successful, its use would also eliminate one test (The Auditory Continuous Performance Task) which was given orally by an examiner, and substitute the TOVA, adding one more task that can be administered by a computer. The TOVA would also lend itself to serve as the behavioral task used during the recording of cognitive evoked potentials.