Reply to Letter to the Editor: Age Assessment by Demirjian’s development stages of the third molar: a systematic review.

by Veslemøy Rolseth, Pål Skage Dahlberg, Øyvind Bleka and Gunn Elisabeth Vist

To the Editor,

Thank you for the opportunity to reply to the Letter to the Editor by Roberts, McDonald and Lucas about our systematic review: “Age assessment by Demirjian’s development stages of the third molar: a systematic review” in European Radiology 2018 [1]. In their letter, Roberts et al. claim that our review is too limited and in some ways misleading as to the value of Dental Age Estimation.

Our systematic review of Demirjian’s stages of the third molar is part of our work to systematically review the most commonly applied biological age assessment methods. We chose Demirjian’s stages because we found the largest number of studies on this grading system. Our aim was to analyse whether studies showed heterogeneity between populations and, if possible, to combine studies into estimates of age for each development stage. Results given as the mean and variation of chronological age within stage were by far the most prevalent, and this was chosen as the outcome measure of the meta-analysis. In the process, the problem with this way of displaying results became obvious to us: the results mimic the age structure of the reference sample. While several researchers in the field share our concern about this bias, it seems that Roberts et al. do not.

Since the ages of unaccompanied young asylum seekers are often unknown, we are puzzled by the statement in the Letter by Roberts et al.: “The majority of subjects for whom DAE is required are young males who are perceived as adults and who, wishing to take advantage of the social support for children or minors, persist in the claim of being under eighteen years old”. We have not seen any documentation that proves they are definitely “adult young males”. We ask Roberts et al. to reference their source.

We agree that stage H, as the final stage, is the most relevant to the 18-year threshold, and this is also stated in the discussion of our paper. We have not in any way claimed that stage G is used to determine the threshold of eighteen years. The aim of our publication was to review the mean age of all Demirjian’s stages for the lower left third molar. Stage G is presented in the text of the paper. However, all the other stages, including stage H, are shown in figures in the Supplementary material.

We find the reference to the English Immigration Courts by Roberts et al. irrelevant, since criteria in different countries undoubtedly differ. We describe the distributions and the evidence according to scientific criteria.

In their critique of Figure 3 in our systematic review, we believe Roberts et al. illustrate where they have misunderstood the concept of age mimicry bias: we do not claim that the “number of subjects in each age bands of a stage” is evidence of age mimicry. “Age mimicry” is a selection bias, so one needs to look at the age composition of the included reference sample, and not at the results given as age within stage. The data in Figure 3 are taken from Lee et al. [2], whose reference sample has almost equal numbers in each age group. It is thus a well-conducted study that avoids age mimicry, which leads to the results that Roberts et al. describe as a normal distribution. If the reference sample had included, for example, four times as many 15-year-old as 16-year-old individuals, the distribution would be different, with relatively more 15-year-olds falling into stage F. Hence, the distribution of stage F would be skewed by the age distribution of the reference sample. This would have given a biased estimate for stage F and is an example of age mimicry.
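The stage F example can be made concrete with a small calculation. The following is only a minimal sketch: the probabilities of being in stage F at each age are hypothetical numbers chosen for illustration (they are not taken from Lee et al.), ages are treated as whole years, and the function name is our own. The development of the tooth is identical in both scenarios; only the composition of the reference sample changes.

```python
# Hypothetical probability that an individual of a given age (whole years)
# is in stage F. Illustrative values only, NOT data from Lee et al.
P_STAGE_F_GIVEN_AGE = {13: 0.05, 14: 0.20, 15: 0.40, 16: 0.40, 17: 0.20, 18: 0.05}


def mean_age_in_stage(sample_sizes, p_stage_given_age):
    """Expected mean age of the individuals who fall into the stage."""
    expected_counts = {
        age: n * p_stage_given_age[age] for age, n in sample_sizes.items()
    }
    total = sum(expected_counts.values())
    return sum(age * count for age, count in expected_counts.items()) / total


# Reference sample A: equal numbers in every age group.
balanced = {age: 100 for age in P_STAGE_F_GIVEN_AGE}

# Reference sample B: four times as many 15-year-olds as 16-year-olds.
oversampled = dict(balanced)
oversampled[15] = 400

mean_balanced = mean_age_in_stage(balanced, P_STAGE_F_GIVEN_AGE)
mean_oversampled = mean_age_in_stage(oversampled, P_STAGE_F_GIVEN_AGE)

print(f"Mean age in stage F, balanced sample:    {mean_balanced:.2f}")     # 15.50
print(f"Mean age in stage F, oversampled age 15: {mean_oversampled:.2f}")  # 15.26
```

Under these assumed numbers the mean age in stage F moves from 15.50 to 15.26 years although the underlying biology is unchanged, which is the selection effect described above.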

It is not correct that we could have performed the truncation exercise in Figure 3 with any study. We could not have done this with a study only presenting mean age within stage. However, we chose one of the best designed studies we could find to illustrate the point, which was Lee et al. This study shows the number of persons from each age group falling into each stage and because of this frequency table (Table 4 in Lee et al. [2]) we were able to visualise how truncation of a data set affects the results.

Roberts et al. incorrectly claim that we do not explain how the data used in Figure 3 were extracted. It is clearly written in the article (page 7): “Figure 3 exemplifies the effect of the included age range in the reference sample by using data from this study (Table 4, page 158, data for tooth 38 in males [13])”. Hence, it is clear that the extracted data are from males. The left side of Table 4 in Lee et al. (2009) gives data for tooth 38 in males and is in accordance with our Figure 3. The only difference between their table and our figure is that we have left out the age groups 7 and 8 years. Otherwise, the numbers are exactly the same. It is worth adding that the figure only serves to illustrate a point, so it would not be sensitive to the exact nature of the extracted data. We have not in any way meant to do a disservice to the excellent study by Lee et al. and are truly sorry if this is the impression we have left. We have tried to emphasize that this study is one of the best examples of a well-conducted study in terms of portraying an unbiased distribution of age within stage.

Roberts et al. claim we have “failed to acknowledge what was essentially the same process from a prior publication [6].” This publication seems to focus on the upper age limit (the age prior) chosen in order to give an adequate probability distribution for stage H [3]. The same paper also suggests removing individual data points that lie more than three standard deviations from the mean within a stage. It is difficult to fathom how this paper represents any similar “process” compared to our publication.

As to the criticism of our choice to include only peer-reviewed publications, it is clear from our inclusion criteria that we only assess peer-reviewed papers for our meta-analysis. We recognise that it is a potential weakness that some data sets may be left out. However, we also acknowledge the quality control and quality enhancement that the peer-review process contributes. The link provided by Roberts et al. to “several large data sets”, most of them using Demirjian’s stages, also includes data sets published in peer-reviewed journals. Other data sets are presented without references (maybe not published elsewhere?) and provide limited information about how the data and participants were selected, how the assessment was conducted, etc. Hence, an informed risk of bias assessment of these data sets would be challenging.

Roberts et al. continue: “The failure of the authors to recognise the logically limiting effect of Censoring [6] results in incorrect estimates of the mean values used in Figure 5”. This statement is confusing, and they must have misunderstood the figure. Please note that the mean estimates of Demirjian’s stages in Figure 5 are not our calculations, but the study authors’ own estimates. The purpose of this figure is to show how the mean estimates are skewed in the direction of the age range of each study’s reference sample.

To clarify this point, we can give an example from Figure 5: the mean age of stage C cannot be estimated to be under 15 years when the reference sample only includes individuals from 15 years and up (as in Lopez et al. [4]). Hence, the studies by Lopez et al. and Lee et al. cannot be compared. The study by Lee et al., which includes lower ages, probably shows a less biased measure of age within stage C.
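The effect of the lower age limit can be sketched with a frequency table of the kind that makes a truncation exercise possible. The counts below are hypothetical and chosen only for illustration (they are not the data of Lee et al. or Lopez et al.), ages are treated as whole years, and the function name is our own.

```python
# Hypothetical number of individuals in stage C at each age (whole years).
# Illustrative values only, NOT data from Lee et al. or Lopez et al.
STAGE_C_COUNTS = {12: 10, 13: 30, 14: 40, 15: 15, 16: 5}


def mean_age(counts, min_age=None):
    """Mean age within a stage, optionally dropping ages below min_age."""
    kept = {
        age: n for age, n in counts.items() if min_age is None or age >= min_age
    }
    total = sum(kept.values())
    return sum(age * n for age, n in kept.items()) / total


# Reference sample that covers the whole age span of stage C.
full_mean = mean_age(STAGE_C_COUNTS)

# Same population, but the reference sample starts at 15 years.
truncated_mean = mean_age(STAGE_C_COUNTS, min_age=15)

print(f"Mean age in stage C, full age span:     {full_mean:.2f}")       # 13.75
print(f"Mean age in stage C, sample from 15 up: {truncated_mean:.2f}")  # 15.25
```

With these assumed counts, the truncated sample reports a mean of 15.25 years for a stage whose mean in the full sample is 13.75 years. A mean below the lower age limit is impossible by construction, so the two estimates describe the sampling frame as much as the stage.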

Roberts et al. claim: “Because the samples are appropriately derived the mean and stand deviation represent the average age of the characteristic and the variation in the population. This is not bias, it is inferential statistics, plain and simple!”

As to the representativeness of the samples included in the studies reviewed, we do agree that the summary statistics from a study represent the information from the reference sample of that study, and only that single reference sample. However, the problem arises when one compares studies and/or uses the results for age estimation of individuals from other populations (such as young asylum seekers). We know that such studies have been, and still are, in use in age assessment based on third molar development in several European countries. In order to compare studies and/or use them for age estimation of individuals of unknown age, it is of utmost importance to have a reference sample with an even age distribution and with an age span covering all ages that can possibly fall into the described stages.

Roberts et al. criticise our recommendation of a Bayesian approach to age estimation by referring to the paper of Patrick Thevissen from 2010 [5]. Roberts et al. claim: “This paper unequivocally demonstrates that there is no advantage in using Bayesian statistics when compared to conventional regression methods”. However, Thevissen et al. clearly state that, since the performance of an approach might be related to age, they use a reference sample with a uniform age distribution for the comparison of the two methods. It is also worth pointing to what is written in the conclusion of the paper: “Although the Bayesian model does not outperform the classical approaches, it allows a more appropriate discrimination of subjects being older than 18 years and produces more meaningful prediction intervals” [5].

The reason why we suggest a Bayesian approach is that it takes into account that the ages in a study sample may not be uniformly distributed. This can be handled by first training a model of stage given age, which is not affected by the age distribution of the sample, and then, in a second step, assuming a prior age distribution specifically for the tested individual.
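The two steps can be sketched as follows. This is only a minimal illustration of Bayes' rule on discrete ages, not the model used in any of the cited studies: the counts, the two priors, and the function names are hypothetical, and ages are treated as whole years. Step 1 estimates the probability of stage H within each age group, which does not depend on how many individuals of each age were sampled. Step 2 combines that likelihood with a prior chosen for the tested individual.

```python
# Hypothetical reference sample: size of each age group and the number of
# individuals in stage H within it. Illustrative values only.
GROUP_SIZES = {16: 100, 17: 100, 18: 100, 19: 100, 20: 100, 21: 100}
STAGE_H_COUNTS = {16: 2, 17: 10, 18: 30, 19: 50, 20: 70, 21: 85}


def stage_likelihood(stage_counts, group_sizes):
    """Step 1: P(stage | age), estimated separately within each age group."""
    return {age: stage_counts[age] / group_sizes[age] for age in group_sizes}


def posterior_age(likelihood, prior):
    """Step 2: P(age | stage) is proportional to P(stage | age) * prior(age)."""
    unnormalised = {age: likelihood[age] * prior[age] for age in likelihood}
    total = sum(unnormalised.values())
    return {age: value / total for age, value in unnormalised.items()}


def prob_at_least(posterior, threshold):
    """Posterior probability that age is at or above the threshold."""
    return sum(p for age, p in posterior.items() if age >= threshold)


likelihood = stage_likelihood(STAGE_H_COUNTS, GROUP_SIZES)

# Two assumed priors for a tested individual (hypothetical choices).
uniform_prior = {age: 1 / len(GROUP_SIZES) for age in GROUP_SIZES}
younger_prior = {16: 0.3, 17: 0.3, 18: 0.1, 19: 0.1, 20: 0.1, 21: 0.1}

p_adult_uniform = prob_at_least(posterior_age(likelihood, uniform_prior), 18)
p_adult_younger = prob_at_least(posterior_age(likelihood, younger_prior), 18)

print(f"P(age >= 18 | stage H), uniform prior: {p_adult_uniform:.3f}")
print(f"P(age >= 18 | stage H), younger prior: {p_adult_younger:.3f}")
```

Because the likelihood is a proportion within each age group, quadrupling the size of one age group in the reference sample would leave it unchanged in expectation. The dependence on the age distribution is moved into the prior, where it is stated openly and can be set for the individual case.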

We want to stress that many researchers in the field of dental age estimation are well aware of the age mimicry bias. Below is a paragraph from the introduction of a paper by Tangmose et al., of which Patrick Thevissen is the second author [6].

“The majority of reference studies investigating dental age report the correlation between chronological age and the stages of third molar development as means and standard deviations for the third molars individually: upper right (UR), upper left (UL), lower left (LL), and lower right (LR) [6], [7], [8]. Unfortunately, mean ages are affected by the age composition of the reference sample. This bias is known as age-mimicry [9]. Although small, when used for age estimation purposes this bias may affect whether the examinee is assessed as a child or an adult. For example, the mean age at stage R3/4 in UR has been reported as both 17.0 years and 18.3 years [8], [10]. Thus, simply taking the mean of means or using discrete age intervals may become attractive, although actually methodologically wrong”.

As a concluding remark, we would like to add that age mimicry is by no means restricted to the field of dental age estimation. We found biased age estimation studies based on (I) skeletal age of the hand by the Greulich & Pyle atlas [7], (II) ossification stages of the medial clavicular epiphysis [8] and (III) ossification stages of the knee [9]. By focusing on this rather obvious selection bias, we hope that more unbiased studies will become available in the future to the research community and to decision makers, such as courts of law.

Meanwhile, biased studies continue to be published. One recent example is by Jayaraman & Roberts [10], who make a direct comparison of age within stage between a Caucasian and a Chinese population. The reference samples for these two populations have different age structures and, from our perspective, should not be compared directly in this manner because of age mimicry bias.