Letter to the Editor: Age Assessment by Demirjian’s development stages of the third molar: a systematic review.

by Graham Roberts, Fraser McDonald, Victoria S Lucas

Age assessment by Demirjian’s development stages of the third molar: a systematic review

Dear Editor,

We read with concern the systematic review of the above title. After reflecting on the content of the review we arrived at the conclusion that the review, by limiting itself to Tooth Development Stages (TDS) described by Arto Demirjian [1], is too limited and in some ways misleading as to the value of Dental Age Estimation (DAE) when assessing the age of young subjects without birth records.
The majority of subjects for whom DAE is required are young males who are perceived as adults and who, wishing to take advantage of the social support for children or minors, persist in the claim of being under eighteen years old.

The first concern is the use of Demirjian’s TDSs in isolation. It is difficult to comprehend why the penultimate Stage G has been used. The most relevant at the 18 year threshold is stage H which is the final stage of development.

It is helpful to look at the data from three of the studies used by the systematic review team.

As can be seen from the last column on the right none of the probability values indicate that the subject is definitely over 18 years old – this affirmation would require a probability of 1.00. This stringent criterion is the standard required by the English Immigration Courts. In practical terms this rules out using third molars as the sole criterion at the 18 year threshold. Although the systematic review team have focused on Stage G, presumably as an illustration, Stage H is of much greater importance in relation to Dental Age Estimation in this context.  As regards subjects who may be under eighteen years old it is important to include second permanent molars.  The limitation of presenting the data for Stage G only raises the question of how this single tooth can justifiably be used to perform DAE.

Within the systematic review the section on truncation is inappropriate. This ‘false’ procedure is something that can easily be illustrated with any data set. This can be perceived from Figure 3 of the review. The left side of Figure 3 displays data for TDSs from A to H by one-year age bands. There is no explanation as to how the data were extracted from the paper quoted [5]. A rereading of this paper from Korea reveals that it is impossible for the readers of the systematic review to see how this extraction of data has taken place. What is worse is that there is no indication as to the gender of the individuals to whom the data in Figure 3 refer. Do the data pertain to males or females, or a combination of both? We have tried to relate the calculations to the total of all the subjects used in Figure 3 (n = 771). The tables provided in the original paper for the Lower Left Third Molar (tooth 38) as Table 4 give totals of 786 for males and 964 for females. It is impossible to relate the figures in Figure 3 of the systematic review to Table 4 in the original paper from South Korea [5]. In presenting the data in the way that they have, the systematic reviewers have done a disservice to the Korean research workers. It may be argued that this is presented only as an illustration of how data truncation can alter the values for the output data. This is an inappropriate use of the Korean data in this context as it could easily be illustrated with any data set. Surely this is at the ‘undergraduate’ level of data management. The authors have failed to acknowledge what was essentially the same process from a prior publication [6].

The graphic representation of ‘Truncation’, the right side of Figure 3, illustrates clearly what happens if you cut off large amounts of data, i.e. beginning with participants from 9 to 23 years and successively cutting off a year at the lower end and a year at the upper end, down to 15 to 17 years. It is not surprising that this changes the mean or average value and the extent of the ± 1 sd. This is a wholly artificial procedure. There is not one single research paper in DAE that has performed this type of misleading procedure with the intent of carrying out DAE.
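The arithmetic effect of truncation can be reproduced with a few lines of Python. The sketch below is an illustration only: it uses the Stage F counts that this letter quotes from Figure 3 of the review, treats each one-year age band as a single age value (an assumption), and uses function names of our own choosing. It is not the procedure used by the reviewers.

```python
from math import sqrt

# Stage F counts by one-year age band, as quoted in this letter from Figure 3.
# Assumption: each band is represented by its age in whole years.
STAGE_F_COUNTS = {13: 1, 14: 5, 15: 27, 16: 24, 17: 22, 18: 15, 19: 3, 20: 1, 21: 1}


def summarise(counts, low, high):
    """Return (n, mean, sample sd) for subjects aged low..high inclusive."""
    kept = {age: n for age, n in counts.items() if low <= age <= high}
    total = sum(kept.values())
    mean = sum(age * n for age, n in kept.items()) / total
    variance = sum(n * (age - mean) ** 2 for age, n in kept.items()) / (total - 1)
    return total, mean, sqrt(variance)


def truncation_series(counts, low=9, high=23, stop_low=15):
    """Cut one year off each end of the age window until it starts at stop_low."""
    rows = []
    while low <= stop_low:
        rows.append((low, high, *summarise(counts, low, high)))
        low += 1
        high -= 1
    return rows


for low, high, n, mean, sd in truncation_series(STAGE_F_COUNTS):
    print(f"{low:2d}-{high:2d} years: n={n:3d}  mean={mean:5.2f}  sd={sd:4.2f}")
```

Each narrower window discards subjects at the extremes, so the mean shifts and the standard deviation shrinks. This follows from the arithmetic alone and would occur with any data set.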

A further concern is the way in which the authors have selected the papers for detailed scrutiny. They have limited the search to ‘…potentially eligible publications…’. This was a mistake of restriction as there are several large data sets published which provide detailed information on the data of all the Demirjian TDSs over several different ethnic groups [2]. The limitation to journal publication searches, combined with the failure to explore the reference lists of the selected papers, has resulted in a serious shortfall of the data available. The failure of the authors to recognise the logically limiting effect of Censoring [6] results in incorrect estimates of the mean values used in Figure 5 (a), (b), (c), and (d). The reasons for this were put to the authors and the response was ‘… we consider censoring of data inappropriate.’ This is unsatisfactory as it is easy to see how uncensored data leads to the inclusion of redundant data.

The authors introduce the so-called Quality Assessment with the introduction of the 95% Confidence Interval, and the 95% Prediction Interval.

The underlying belief in this paper is that the data sets chosen by the authors demonstrate ‘… the high risk of age mimicry bias …’. It is contended that this is again an inappropriate and therefore misleading interpretation of the data from the papers quoted. The figures produced in the papers are based on carefully drawn samples. These samples are from large clinical radiographic archives. Although they are not true random samples it is the view of Forensic Odontologists that the subjects (dental patients) are physically and developmentally normal. Because of the large numbers available for study, it is believed that the samples used for DAE are reliable and representative of the populations from which they are drawn. They act as robust surrogates for a randomly drawn sample from the populations studied. The claim by the authors of this review that they demonstrate ‘… age mimicry bias …’ is factually incorrect and misleading.

One of the supporting elements of DAE research is that many authors have validated their methods by conducting masked DAE of subjects of known age. These demonstrate unequivocally the validity of the Reference Data Sets (RDSs) used [8-11]. This is sound biological, clinical, and statistical practice [7] and is the appropriate approach to derive a representative sample from a population and use the information obtained to make inferences about all the individuals from that population. Such carefully chosen samples yield reliable answers. The claim that the samples chosen by the authors of this systematic review are biased is not sustainable. All the authors of the studies used by the reviewers have taken care to ensure the samples are representative of the populations from which they are drawn.

The use of one sd coupled with the 95% Confidence Interval for each of the stages (Figure 5 (a), (b), and (c)) demonstrates clearly what is well known in Dental Age Research and the development of third molars. The mean Age at Assessment of the Demirjian Stages of LL8 increases steadily from Stage A through to Stage H. This is clearly apparent for 5 (a), 5 (b), and 5 (c). This is not novel and is consistent with the many studies on third molar development assessed using Demirjian TDSs.

The claim that the number of subjects in each age band is evidence of ‘… age mimicry bias …’ is misleading. The underlying presentation of the data in DAE samples is complex. The number of subjects in a given stage (A to H) will vary across different age bands. This is because of the underlying natural variation of tooth development. This can be seen in Figure 3 of the systematic review in, for example, stage F. As can be seen, the number of patients demonstrating stage F at 13 years is only 1, rising to 5 at 14 years, 27 at 15 years, 24 at 16 years, 22 at 17 years, 15 at 18 years, 3 at 19 years, and 1 each at 20 years and 21 years. Looked at from a different viewpoint it is graphically reminiscent of a Normal distribution.
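The shape of the stage F data can be checked directly from the counts quoted above. The following Python sketch is illustrative only: it assumes each one-year age band can be represented by its age in whole years, and the variable names are ours rather than anything taken from the review or the Korean paper.

```python
import statistics

# Stage F counts by one-year age band, as quoted above from Figure 3 of the review.
stage_f_counts = {13: 1, 14: 5, 15: 27, 16: 24, 17: 22, 18: 15, 19: 3, 20: 1, 21: 1}

# Expand the frequency table into one age value per subject.
# Assumption: every subject in a band is assigned the band's whole-year age.
ages = [age for age, n in stage_f_counts.items() for _ in range(n)]

mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)  # sample standard deviation (n - 1 denominator)

print(f"n = {len(ages)}, mean = {mean_age:.2f} years, sd = {sd_age:.2f} years")

# A crude text histogram makes the single-peaked shape visible.
for age, n in stage_f_counts.items():
    print(f"{age:2d} years | {'#' * n} ({n})")
```

The printed histogram rises to a single peak around 15 to 17 years and falls away on either side, which is the pattern described in the paragraph above.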

The difficulty is that it is possible to have evenly balanced numbers in specified age bands. However, as is shown in the left side of Figure 3 in the systematic review, the numbers of TDSs vary. This is the classic presentation of the data for DAE studies. It is a normal and robust sign of an appropriate reference population from which a representative sample has been drawn.

The claim that the samples for third molar studies represent an ‘age mimicry bias’ is incorrect. Clearly the summary statistics from these samples represent the information derived from the sample. Because the samples are appropriately derived, the mean and standard deviation represent the average age of the characteristic and the variation in the population. This is not bias, it is inferential statistics, plain and simple!

It is helpful to try and understand the reasons for the approach adopted by the Norwegian systematic review team. The approach would appear to have its basis in the source of some of the references offered in the bibliography. These originate with a paper published on Palaeodemography [12] and subsequent conference proceedings published in the form of a book [13]. The underlying issue with the palaeodemographic methods is that investigators are forced to rely on samples for which the true age is not known. This is expressed as ‘… despite the absence of a reliable method for estimating the age distribution for the adult skeletons.’ This brings to the fore the unbridgeable difference between the approach of Anthropologists, obliged to work with incomplete and varied specimens where, it is acknowledged, the methods of age estimation are not reliable, and the straightforward approach used by forensic odontologists, who use clinical records from living subjects for whom the exact age is available: the date of the radiograph from which is subtracted the date of birth. The Chronological Age (CA) is the Gold Standard by which forensic odontologists determine the reliability of their Reference Data Sets. Palaeodemographers do not appear to have had that luxury and have been forced to use an oblique approach using complex and sophisticated statistical routines. Dental Age Estimation (DAE) of the living is, by comparison, a simple procedure and is based entirely on the comparison with the Gold Standard of chronological age.
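The calculation of Chronological Age described above is simple enough to state in a few lines of Python. This is a minimal sketch: the function name, the example dates, and the convention of dividing by 365.25 days to obtain decimal years are our assumptions for illustration, not a prescribed standard.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed convention for converting days to decimal years


def chronological_age(date_of_birth: date, date_of_radiograph: date) -> float:
    """Chronological Age in decimal years: radiograph date minus date of birth."""
    if date_of_radiograph < date_of_birth:
        raise ValueError("The radiograph cannot pre-date the date of birth.")
    return (date_of_radiograph - date_of_birth).days / DAYS_PER_YEAR


# Hypothetical subject, invented for illustration only.
ca = chronological_age(date(2000, 1, 1), date(2017, 7, 2))
print(f"Chronological Age = {ca:.2f} years")
```

Because both dates are taken from clinical records, the CA of every subject in a Reference Data Set is known exactly, which is the luxury that skeletal samples do not offer.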

It is difficult to see how the statistical contortionism so carefully expounded [13] relates to what is, in reality, a relatively straightforward practical problem.

One of the consequences of Forensic Age Estimation is the need to justify the age estimation provided to a court of law. Is it possible to imagine the confusion in the mind of a judge when faced with an equation similar to the above (10.12) on the grounds that this estimates the ‘…target age-at-death distribution’? It is incomprehensible to non-specialists. It is impossible to visualise such mathematical thinking working to provide an age estimate of a living subject of, for example, 17.5 years. The intellectual capabilities of even the most mathematically minded judge would be tested beyond comprehension if such evidence were placed before the court.

The authors of the systematic review have attempted to promote the use of Bayesian statistics in age estimation by quoting modern papers. They have failed to draw attention to the principal outcome of the excellent paper by Professor Patrick Thevissen [14]. In this paper Professor Thevissen compared outcomes from ‘linear and polynomial statistical regression analysis and a newly constructed Bayesian model …’ and reported that ‘… both models provide similar accuracy.’ This paper unequivocally demonstrates that there is no advantage in using Bayesian statistics when compared to conventional regression methods.
Perhaps the most disappointing aspect of the systematic review is that the authors have confined themselves to the electronic searches of journals whilst ignoring the large amounts of reference data available through reference list searches and web based searches.

It reveals the inherent weakness of Systematic reviews that such reviews are only as reliable as the care and thoroughness of the procedures followed by the reviewers.

An important part of the process of DAE is the ability to communicate with the clients, their representatives, and the adjudicating authorities.

On that issue alone the reviewers have struggled.