64 Hypothyroidism

KEY FACTS

• Scoring systems for the estimation of the probability of hypothyroidism based on the clinical picture have been developed. That of Billewicz rates symptoms according to their importance; others use a simpler count of the number of symptoms.

• The clinical assessment is important: it helps the physician to order thyroid function tests (TFTs) in those for whom they are appropriate, it allows a distinction between clinical and subclinical hypothyroidism, and it helps in the detection of situations in which the TFTs are misleading.

• A history of predisposing causes, the examination of the thyroid, the pattern of the TFTs and the thyroid autoantibody result usually allow the GP to diagnose the aetiology of the hypothyroidism.

A PRACTICAL APPROACH

* If the clinical picture raises even a small suspicion of hypothyroidism, check the TFTs. Most patients do not have the classical clinical picture. If an abnormal result is obtained, repeat the test after a wait of at least 2 weeks.

* If the free T₄ (FT₄) and free T₃ (FT₃) are low and the thyroid stimulating hormone (TSH) is raised, hypothyroidism is confirmed. If there are symptoms, it is overt; if not, it is subclinical.

* If the FT₄ and FT₃ are low and the TSH is normal or low, suspect secondary hypothyroidism. Consider referral for investigation of the hypothalamic/pituitary axis. However, a commoner cause, especially if only the FT₃ is low, is the failure of peripheral conversion of FT₄ to FT₃ seen in any serious illness and in old age.

* If the FT₄ and FT₃ are normal and the TSH is raised, suspect subclinical hypothyroidism. Check anti-thyroid antibodies to decide on the prognosis and need for surveillance.

* If the TSH is normal and the clinical picture is not suggestive of hypothyroidism, hypothyroidism is effectively excluded.

* If the situation fits none of the above, seek specialist advice.
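The steps above can be sketched as a simple decision function. This is an illustrative sketch only, not a validated tool: the function name, the "low"/"normal"/"raised" encoding of each result and the single `symptomatic` flag (standing in for "the clinical picture suggests hypothyroidism") are assumptions made for the example, and the rules are applied in the order in which they are listed.

```python
def interpret_tfts(tsh, ft4, ft3, symptomatic):
    """Map a pattern of thyroid function tests to the suggested next step.

    tsh, ft4 and ft3 are each "low", "normal" or "raised" (an assumed
    encoding; no numeric cut-offs are applied here). symptomatic is a bool
    standing in for a clinical picture that suggests hypothyroidism.
    """
    both_low = ft4 == "low" and ft3 == "low"
    both_normal = ft4 == "normal" and ft3 == "normal"

    # Low FT4 and FT3 with a raised TSH: hypothyroidism confirmed.
    if both_low and tsh == "raised":
        return "overt hypothyroidism" if symptomatic else "subclinical hypothyroidism"
    # Low FT4 and FT3 with a normal or low TSH: pituitary cause or sick euthyroid.
    if both_low and tsh in ("normal", "low"):
        return "suspect secondary hypothyroidism or non-thyroidal illness"
    # Normal FT4 and FT3 with a raised TSH.
    if both_normal and tsh == "raised":
        return "suspect subclinical hypothyroidism; check anti-thyroid antibodies"
    # Normal TSH and nothing clinically suggestive.
    if tsh == "normal" and not symptomatic:
        return "hypothyroidism effectively excluded"
    # Anything else fits none of the patterns.
    return "seek specialist advice"


print(interpret_tfts("raised", "normal", "normal", symptomatic=False))
```

Any pattern not named in the list, such as a normal TSH in a patient whose clinical picture is suggestive, falls through to specialist advice.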

It is tempting to think that the diagnosis of hypothyroidism is completely resolved by the accuracy of modern thyroid function tests (TFTs). Unfortunately, three factors complicate the diagnosis:

1. The typical clinical picture is uncommon; non-specific complaints may not suggest the possibility of hypothyroidism to the doctor.

2. Subclinical hypothyroidism is common, especially in older women. The finding of abnormal TFTs may be incidental. The clinical picture should also confirm the diagnosis, or at least offer no better explanation.


3. The interpretation of TFTs can be rendered difficult by a number of factors, especially by two conditions which reduce the peripheral conversion of T₄ to T₃: serious concomitant illness and old age.

Prevalence

The prevalence of overt, as opposed to subclinical, hypothyroidism varies with age and sex as shown in Table 64.1.¹

Table 64.1 Prevalence of hypothyroidism

Population	Prevalence	95% confidence interval
Women aged 40–60	0.5%	0.3–0.7%
Women aged >60	0.7%	0.5–0.9%
Women aged 70–80	2%	1.5–2.8%
Men aged 40–60	0.5%	0.3–1.0%
Men aged >60	0.7%	0.4–1.2%

Clinical diagnosis

The clinical features of hypothyroidism are well described in medical textbooks. A review has summarised the likelihood ratios attached to individual features.² No single symptom or sign is diagnostic, with the highest LR+ being for coarse skin (5.6) and the lowest LR− for the absence of periorbital oedema (0.6) and of an enlarged thyroid (0.6). It is not clear how the various features can be combined to produce a post-test probability; the likelihood ratios cannot be applied sequentially because they are not necessarily independent of each other.

In attempts to provide an overall assessment, two types of scoring system have been devised: the Billewicz score, which combines the history and examination³ (Table 64.2), and more recent scores based on the number of suggestive symptoms.

Table 64.2 The Billewicz score in the diagnosis of hypothyroidism³

The Billewicz score

In Billewicz’s patients, a score of +25 or over correctly identified 34 hypothyroid patients and gave no false positives, but missed 22 hypothyroid patients who had scores between −29 and +24. A score of −30 or below correctly identified 95 patients out of 162 as euthyroid and included none who were hypothyroid, but it was no help in sorting out the 123 patients who had scores of −29 or above, of whom 56 were hypothyroid (see Table 64.3 for the likelihood ratios and probabilities derived from these figures).
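The point estimates in Table 64.3 for the very high and very low scores can be recovered from these counts. The sketch below assumes, as the figures imply, a total of 56 hypothyroid patients (the 34 identified plus the 22 missed) and 162 euthyroid patients, of whom 95 scored −30 or below; the function and variable names are illustrative.

```python
import math


def likelihood_ratio(hypo_with_result, hypo_total, eu_with_result, eu_total):
    """LR = P(result | hypothyroid) / P(result | euthyroid)."""
    p_hypo = hypo_with_result / hypo_total
    p_eu = eu_with_result / eu_total
    if p_eu == 0:
        # No euthyroid patient had this result: the ratio is unbounded.
        return math.inf if p_hypo > 0 else math.nan
    return p_hypo / p_eu


HYPOTHYROID = 56  # assumed total: 34 scoring +25 or over, 22 scoring lower
EUTHYROID = 162   # 95 of these scored -30 or below

lr_high = likelihood_ratio(34, HYPOTHYROID, 0, EUTHYROID)               # score >= +25
lr_not_high = likelihood_ratio(22, HYPOTHYROID, EUTHYROID, EUTHYROID)   # score < +25
lr_low = likelihood_ratio(0, HYPOTHYROID, 95, EUTHYROID)                # score <= -30
lr_not_low = likelihood_ratio(HYPOTHYROID, HYPOTHYROID, EUTHYROID - 95, EUTHYROID)  # score > -30

for label, lr in [(">= +25", lr_high), ("< +25", lr_not_high),
                  ("<= -30", lr_low), ("> -30", lr_not_low)]:
    print(f"score {label}: LR {lr:.1f}")
```

The confidence intervals and the figures for the intermediate band need the full distribution of scores and are not reproduced here.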

Table 64.3 The probability of hypothyroidism, according to the Billewicz score, at an initial risk of hypothyroidism of 5%

Score	Score in this range?	Likelihood ratio (95% CI)	Probability of hypothyroidism
Score very high ≥+25	Yes	Infinity (14 706–infinity)	100%
Score very high ≥+25	No (score below +25)	0.4 (0.3–0.5)	2%
Score intermediate +10 to +24	Yes	7.3 (2.9–18)	28%
Score intermediate +10 to +24	No	0.5 (0.3–0.8)	3%
Score very low ≤−30	Yes	0.0 (0.0–0.1)	0–0.5%
Score very low ≤−30	No (score above −30)	2.4 (2.0–2.9)	11%

Follow each row from left to right to see how the score alters the probability of hypothyroidism.

The initial risk of 5% might be that of a 70-year-old woman with a facial appearance suggestive of hypothyroidism.

Be careful in interpreting Table 64.3. The post-test probability of 11% for a score above −30 covers every score from −29 upwards. It is therefore less useful than the more precise probabilities given for scores of +10 to +24 and of +25 and above.
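The probabilities in Table 64.3 follow from converting the initial risk to odds, multiplying by the likelihood ratio and converting back. A minimal Python sketch of that arithmetic, using the 5% initial risk and the point estimates from the table (the function name is illustrative):

```python
import math


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    if math.isinf(post_test_odds):
        # An infinite likelihood ratio makes the diagnosis certain.
        return 1.0
    return post_test_odds / (1 + post_test_odds)


INITIAL_RISK = 0.05  # the 5% initial risk used in Table 64.3

for lr in (math.inf, 0.4, 7.3, 0.5, 0.0, 2.4):
    probability = post_test_probability(INITIAL_RISK, lr)
    print(f"LR {lr}: {probability:.1%}")
```

Changing `INITIAL_RISK` shows how strongly the intermediate rows depend on the starting probability, while the very high and very low scores do not.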


How to use the Billewicz score

The scores may be used in two ways: either as a guide to what are the most discriminating symptoms (feeling cold, dry skin and hoarseness) and signs (slow movements, coarse skin and a sluggish ankle jerk); or as a complete score from which the post-test probabilities can be calculated.

When applying the score to an individual patient, very high or very low scores make the pre-test probability irrelevant. The LR+ of a high score is so high that, even if the pre-test probability is hardly more than the general population (because, say, of a vague complaint such as fatigue), a score of +25 or above makes hypothyroidism virtually certain. Similarly, a score of −30 or below virtually rules it out. However, a score of −29 to +24 rules it neither in nor out. It is in these cases that the post-test probability matters, when it becomes the pre-test probability in the interpretation of laboratory tests.

However, two cautions should be noted:

1. Billewicz’s score was validated in secondary care where patients were scored either by inexperienced house physicians, or by more experienced registrars, in Glasgow or Aberdeen. There was complete agreement on only three-quarters of all scores. GPs might score patients differently.

2. Billewicz’s doctors performed better when using an information sheet which advised them how to score. That sheet is not published and so subsequent doctors cannot know how close they are to scoring the questions ‘correctly’.

Scoring systems based on symptoms alone

More recent studies have confirmed that a firm diagnosis, or the firm exclusion of the diagnosis, is rarely possible on the basis of symptoms alone.⁴ In one study, only 30% of cases had symptoms recognisable as thyroid-related and 17% of controls had at least one of the same symptoms.⁵ The older the patient, the less likely it is that typical symptoms will be present.⁶

However, what does emerge is that a simple count of the number of suggestive symptoms, regardless of which ones they are, is useful. The more symptoms the patient has, the more likely is the diagnosis of hypothyroidism.⁵

• A study from India found that more than five symptoms carries an LR+ of 19 while fewer than two symptoms and signs carries an LR− of 0.1.⁴

• A US case-control study⁵ analysed symptoms in more detail and divided them into two types:

current symptoms, of which only a hoarse voice, dry skin and muscle cramps were significantly more common in patients with hypothyroidism than in controls; and

symptoms which have changed over the last year, of which 13 were identified as significant. In decreasing order of usefulness they are: a voice that is deeper or hoarser, finding mental arithmetic more difficult, drier skin, puffier eyes, more often constipated, feeling colder, memory worse, thinking slower, muscle cramps more often, muscles weaker, more depressed, more tired.


Menstrual changes and hypersomnia were not significantly more common in cases than in controls, nor was coarser hair (too few cases reported it for the difference to reach significance).

From this study, the most useful likelihood ratios proved to be those related to a simple count of symptoms. A change in seven or more symptoms had an LR+ of 8.7 (95% CI 3.8 to 20) and an unhelpful LR− of 0.7 (95% CI 0.6 to 0.8) while a change in three or more symptoms had an LR+ of 2.8 (95% CI 2.0 to 4.1) and an LR− of 0.5 (95% CI 0.4 to 0.7).
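To see what these ratios mean in practice, the sketch below applies them to an assumed pre-test probability of 2% (the prevalence given in Table 64.1 for women aged 70–80). The two cut-offs are treated as separate dichotomous tests, as in the study; the names used are illustrative.

```python
# (LR+ if the count reaches the cut-off, LR- if it does not), from the text above.
SYMPTOM_COUNT_LRS = {
    7: (8.7, 0.7),
    3: (2.8, 0.5),
}


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    odds = pre_test_probability / (1 - pre_test_probability) * likelihood_ratio
    return odds / (1 + odds)


def probability_after_symptom_count(pre_test_probability, changed_symptoms, cut_off):
    """Post-test probability for a count of changed symptoms at one cut-off."""
    lr_positive, lr_negative = SYMPTOM_COUNT_LRS[cut_off]
    lr = lr_positive if changed_symptoms >= cut_off else lr_negative
    return post_test_probability(pre_test_probability, lr)


PRE_TEST = 0.02  # assumed baseline, taken from Table 64.1

for count, cut_off in [(7, 7), (5, 7), (4, 3), (2, 3)]:
    p = probability_after_symptom_count(PRE_TEST, count, cut_off)
    print(f"{count} changed symptoms (cut-off {cut_off}): {p:.1%}")
```

At this low starting probability even seven changed symptoms leave the diagnosis unlikely, which is why the count serves to set the pre-test probability for laboratory testing rather than to make the diagnosis.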

Summary of the implications of the clinical examination

If the clinical picture of hypothyroidism is strong (Billewicz score ≥+25), the correct diagnosis is very likely to be hypothyroidism. If the clinical picture is strongly against hypothyroidism (Billewicz score ≤−30), the patient is almost certainly not hypothyroid. However, if the clinical picture is less clear (−29 to +24) the diagnosis could go either way.

If symptoms only are assessed, without, as in the Billewicz score, the examination being included, a score of 7 or more suggestive symptoms which have changed in the last year shifts the probabilities usefully in favour of the diagnosis while a low score shifts the probability only slightly against it.

The value of the history and examination is not in making a definitive diagnosis but in assisting the clinician to gauge the pre-test probability before performing thyroid function tests.
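A sketch of how the clinical estimate feeds into the laboratory result: the post-test probability after the clinical score becomes the pre-test probability for the TSH. The figures are taken from the tables in this chapter (an initial risk of 5%, an LR of 7.3 for an intermediate Billewicz score, and LRs of 99 and 0.01 for a raised and a normal TSH). Chaining the two steps assumes that the score and the TSH are independent given the patient's true thyroid status, which is an assumption of the example.

```python
def update(probability, likelihood_ratio):
    """Convert to odds, apply the likelihood ratio, convert back."""
    odds = probability / (1 - probability) * likelihood_ratio
    return odds / (1 + odds)


initial_risk = 0.05

# Step 1: the clinical score (Billewicz +10 to +24, LR 7.3).
after_score = update(initial_risk, 7.3)

# Step 2: the TSH, starting from the probability reached in step 1.
after_raised_tsh = update(after_score, 99)
after_normal_tsh = update(after_score, 0.01)

print(f"after clinical score: {after_score:.1%}")
print(f"then raised TSH:      {after_raised_tsh:.1%}")
print(f"then normal TSH:      {after_normal_tsh:.1%}")
```

The same chaining should not be used for individual symptoms and signs, whose likelihood ratios are not necessarily independent of each other.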

Does a family history of thyroid disease alter the probabilities?

The US study quoted above⁵ found that a family history of thyroid disease was present in 42% of those who were hypothyroid but was also present in 18% of controls. This gave an LR+ of 2.5 (95% CI 1.6 to 4.0) and an LR− of 0.7 (95% CI 0.6 to 0.9). A family history therefore increases the probability of hypothyroidism slightly and its absence reduces it slightly. The value of the question lies more in gaining an understanding of what the patient’s experience of thyroid disease might be, rather than in assisting with the diagnosis.

Thyroid function tests

* Check the TSH. If the clinical suspicion of hypothyroidism is strong, check the free T₄ at this stage to save time.

A third-generation TSH assay has a sensitivity and a specificity of 99%,⁷ giving the likelihood ratios and probabilities at different levels of initial risk shown in Table 64.4.

Table 64.4 The probability of hypothyroidism, according to the TSH

Initial risk of hypothyroidism	TSH result	Likelihood ratio	Probability of hypothyroidism
1.5%	Raised	99	60%
1.5%	Normal	0.01	<0.1%
50%	Raised	99	99%
50%	Normal	0.01	1%

Follow each row from left to right to see how the TSH alters the probability of hypothyroidism.

The initial risk of 1.5% is that of the 70-year-old woman. A risk of 50% is that of a woman aged 70 with a Billewicz score of 20.

Note that the lower initial probability means that a raised TSH supports the diagnosis but does not prove it, while a normal result rules it out. Conversely, the higher initial probability means that the TSH rules the diagnosis in, if raised, but cannot totally rule it out if normal.

Caution: the TSH can be depressed by concomitant illness and is occasionally raised during recovery. During concomitant illness, its sensitivity remains high at 99% but its specificity falls to 95%.⁷
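The likelihood ratios for the TSH follow directly from its sensitivity and specificity. The sketch below derives them for the well patient (99% and 99%) and for the patient with concomitant illness (99% and 95%), and then applies the positive ratio to the initial risk of 1.5% used in Table 64.4. Function names are illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    odds = pre_test_probability / (1 - pre_test_probability) * likelihood_ratio
    return odds / (1 + odds)


lr_pos_well, lr_neg_well = likelihood_ratios(0.99, 0.99)
lr_pos_ill, lr_neg_ill = likelihood_ratios(0.99, 0.95)

# A raised TSH at an initial risk of 1.5%, in a well and in an ill patient.
raised_well = post_test_probability(0.015, lr_pos_well)
raised_ill = post_test_probability(0.015, lr_pos_ill)

print(f"well: LR+ {lr_pos_well:.0f}, LR- {lr_neg_well:.2f}, raised TSH -> {raised_well:.0%}")
print(f"ill:  LR+ {lr_pos_ill:.1f}, LR- {lr_neg_ill:.2f}, raised TSH -> {raised_ill:.0%}")
```

The fall in specificity from 99% to 95% cuts the positive likelihood ratio about fivefold, so a raised TSH found during an acute illness is much weaker evidence of hypothyroidism.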

* Check the free T₄ and free T₃. Free T₄ is 90% sensitive and 90% specific, falling to 60% and 80% in a patient with concomitant illness, and free T₃ is 97% sensitive and 97% specific but becomes useless in a patient with concomitant illness (see box below).

Understanding TFTs

The classic feedback loop. Put simply, thyroid function is controlled by the pituitary production of TSH, which in turn is regulated by feedback to the hypothalamus and the pituitary of plasma T₄ and T₃. If the thyroid gland fails, plasma T₄ and T₃ levels fall and so plasma TSH rises. In mild disease, this rise may return T₄ and T₃ levels to normal. If the hypothalamus or pituitary fails, free T₄, T₃ and TSH fall, or, at least, TSH is inappropriately low (while possibly within the normal range) for the low T₄ and T₃.

Sick euthyroid syndrome. Serious concomitant illness, and old age, can disturb this picture by reducing peripheral conversion of T₄ to T₃. Certain drugs (lithium, amiodarone, non-selective beta-blockers in high dosage and corticosteroids) can do the same. Free T₃ may be low although the patient has no thyroid disease, and TSH and free T₄ may be low because the failure of conversion of T₄ to T₃ in the hypothalamus and pituitary reduces TSH secretion.

Total T₄ and T₃ are unreliable because they depend on the concentration of thyroid binding proteins, which is raised in pregnancy and by oral contraceptives and other drugs.

Misleading elevation of the TSH. The TSH may be artificially raised due to:

(a) the presence of heterophile antibody or antibody to thyroid hormone; they interfere with the test

(b) drugs: amiodarone, sertraline, cholestyramine; to add to the confusion, amiodarone can also cause true hypothyroidism

(d) rare congenital defects which may only be detected in adult life because the patient is clinically euthyroid⁸

(e) adrenal glucocorticoid insufficiency

(f) renal failure⁹

(g) undertreated hypothyroidism where the dose of thyroxine has been increased in the last 8 weeks.


What level of TSH is abnormal?

The normal range is usually quoted as 0.45 mU/L to 4.5 mU/L with a mean of 1.5 mU/L.¹⁰ However, these cut-off points are arbitrary. Patients at the upper range of normal may have subclinical hypothyroidism; indeed, those with a TSH >2.5 mU/L have serum cholesterol levels higher than those with low-normal values, suggesting that some of them are hypothyroid.⁹

What is the cause of the hypothyroidism?

The commonest causes are:

(a) autoimmune thyroiditis: this is the cause in 60–70% of patients with hypothyroidism

(b) previous surgery or radiotherapy: 20–30% of cases of hypothyroidism fall into this category.

Other causes are uncommon or rare:

(a) drugs: amiodarone or lithium; anti-thyroid drugs, cytokines (e.g. interferon alpha)

(b) congenital hypothyroidism

(d) iodine deficiency (though this is a common cause in areas of iodine deficiency)

(e) thyroid blocking substances in the diet (e.g. brassicas and cassava)

(f) the transient hypothyroidism of viral thyroiditis or postpartum thyroiditis

(g) Riedel’s thyroiditis

(h) secondary to hypothalamic/pituitary failure.

Deciding on the aetiology

* Check that the TFTs suggest primary hypothyroidism (i.e. that the TSH is raised). If not, refer for suspected hypothalamic/pituitary failure.

* Check that there is no history of surgery or radiotherapy to the neck.

* Check that the patient is not taking drugs that could suppress thyroid function.

* Check that the patient is not postpartum.

* Examine and refer if the thyroid is tender (acute thyroiditis), or hard (Riedel’s thyroiditis).

* Send blood for anti-thyroid antibodies. They are almost always present in autoimmune thyroiditis, i.e. the sensitivity is almost 100%. However, they are not very specific: they are present in 10% of the healthy female population (giving LR+ 10 and LR− 0.01) and 2% of the male population (giving LR+ 49 and LR− 0.01). This means that they are very useful in confirming a diagnosis already reached, as above. They should not be used to make a diagnosis of autoimmune thyroiditis in a patient who is euthyroid or in whom the clinical picture suggests another cause.
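The likelihood ratios quoted for the antibody test can be reproduced from the two figures given. The sketch assumes a sensitivity of 99% (standing in for "almost 100%") and takes the specificity as one minus the proportion of healthy people who carry the antibodies; both the 99% figure and the function name are assumptions of the example.

```python
def antibody_likelihood_ratios(sensitivity, positive_in_healthy):
    """Return (LR+, LR-) given the proportion of healthy people who test positive."""
    specificity = 1 - positive_in_healthy
    lr_positive = sensitivity / positive_in_healthy
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


SENSITIVITY = 0.99  # assumed value for "almost 100%"

lr_pos_women, lr_neg_women = antibody_likelihood_ratios(SENSITIVITY, 0.10)
lr_pos_men, lr_neg_men = antibody_likelihood_ratios(SENSITIVITY, 0.02)

print(f"women: LR+ {lr_pos_women:.1f}, LR- {lr_neg_women:.3f}")
print(f"men:   LR+ {lr_pos_men:.1f}, LR- {lr_neg_men:.3f}")
```

The positive ratio is driven almost entirely by how common the antibodies are in healthy people, which is why it is so much lower in women than in men.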


Subclinical hypothyroidism

Definition.

Elevated TSH with a normal FT₄ in a patient with no clinical evidence of hypothyroidism. The elevated TSH should not be due to one of the causes of misleading elevation listed in the box ‘Understanding TFTs’.

Prevalence.

This is high and rises with age. A US study found that 4.6% of the population, thought to be free from thyroid disease, had a raised TSH; in 4.3%, the hypothyroidism was subclinical.¹¹ In the UK, the Whickham study found a similar prevalence.¹² In those aged 65 and over, the prevalence is triple (1.7% overt, 13.7% subclinical).¹¹

Of those with subclinical hypothyroidism, 2–5% progress to overt hypothyroidism each year.¹⁰ The risk is higher in those with higher TSH levels and in those with anti-thyroid antibodies. A therapeutic trial of thyroxine is needed to see whether any symptoms are due to the condition or are coincidental. About 20% of patients report subjective improvement. If thyroxine is not given they need regular surveillance; only in 5% does the TSH revert to normal.¹⁰

Example

A 72-year-old woman is found to have atrial fibrillation. Thyroid function tests are performed but, far from showing hyperthyroidism, they show a raised TSH and FT₃ and FT₄ that are borderline low. Her GP has to decide whether this is overt or subclinical hypothyroidism.

He questions her along the lines of the Billewicz score but finds himself unable to score her answers. She says she ‘doesn’t sweat and never has’, her skin has ‘always been dry’, she’s ‘always hated the cold’, she’s deaf ‘but isn’t everyone my age?’ etc. He decides to use the US criteria in which only a change in symptoms counts as positive. This works better and her score is only 2 (she is more tired and her memory is worse).

He calculates that this lowers the probability of hypothyroidism from the baseline of 2% for her age to 1% (Fig. 64.1). Examination adds nothing and he concludes that her hypothyroidism is subclinical.

Figure 64.1 The probability of overt hypothyroidism after a score of 2 in this example.

He presents this patient’s case to his partners at a clinical meeting and they question whether the formal scoring was worthwhile. He argues that it was. Without it he would have worried that her tiredness and deterioration in memory might be due to hypothyroidism. He might have been tempted into a trial of thyroxine and a positive placebo response might have meant that she was inappropriately treated long-term. As it is, he will monitor her without treatment.


REFERENCES

1 Helfand M, Redfern C. Screening for thyroid disease: an update. American College of Physicians. Ann Intern Med. 1998;129:144-158.

2 McGee S. Evidence-based physical diagnosis. Philadelphia: Saunders, 2001.

3 Billewicz W, Chapman R, Crooks J, et al. Statistical methods applied to the diagnosis of hypothyroidism. Q J Med. 1969;150:255-266.

4 Indra R, Patil S, Joshi R, et al. Accuracy of physical examination in the diagnosis of hypothyroidism: a cross-sectional, double-blind study. J Postgrad Med. 2004;50:7-10.

5 Canaris G, Steiner J, Ridgway E. Do traditional symptoms of hypothyroidism correlate with biochemical disease? J Gen Intern Med. 1997;12:544-550.

6 Doucet J, Trivalle C, Chassagne P, et al. Does age play a role in clinical presentation of hypothyroidism? J Am Geriatr Soc. 1994;42:984-986.

7 Dolan JD, Wittlin SD. Hyperthyroidism and hypothyroidism. In: Black ER, Bordley DR, Tape TG, Panzer RJ, editors. Diagnostic strategies for common medical problems. 2nd edn. Philadelphia: American College of Physicians; 1999:473-483.

8 Dayan CM. Interpretation of thyroid function tests. Lancet. 2001;357:619-624.

9 Roberts C, Ladenson P. Hypothyroidism. Lancet. 2004;363:793-803.

10 Surks M, Ortiz E, Daniels G, et al. Subclinical thyroid disease: scientific review and guidelines for diagnosis and management. JAMA. 2004;291:228-238.

11 Hollowell J, Staehling N, Flanders W, et al. Serum TSH, T4, and thyroid antibodies in the United States population (1988–1994): National Health and Nutrition Examination Survey (NHANES III). J Clin Endocrinol Metab. 2002;87:489-499.

12 Tunbridge W, Evered D, Hall R, et al. The spectrum of thyroid disease in a community: the Whickham survey. Clin Endocrinol. 1977;7:115-125.