In 2002 when the Task Force treated
women over the age of 39 as a homogeneous group, it recommended all women 40
years and older be screened every one to two years.
In 2009
when it reviewed screening for women of different ages, it recommended women
age 50-74 be screened biennially rather than annually, advised against routine
screening for women age 40-49, and because it felt the data were scanty, made
no recommendation for women 75 and older.
The public backlash to the new
recommendations was fierce and immediate.
In response, the Task Force changed one: its recommendation against
routine screening of women age 40-49. The revised recommendation reads: “The
decision to start regular, biennial screening mammography before the age of 50
years should be an individual one and take patient context into account,
including the patient’s values regarding specific benefits and harms.” As explanation, the Task Force added the
following footnote to its website: “On December 4 2009, the USPSTF unanimously
voted to update the language of their recommendation regarding women less than
50 years of age to clarify their original and continued intention.”
I
reviewed the Task Force’s research. I
agree with some, not all, of the recommendations.
I chose
to focus on three sections of the Task Force’s research: (1) the data obtained
from the Breast Cancer Surveillance Consortium, (2) the predictions by six
models of reduction-in-mortality rates for twenty different screening
strategies, and (3) the results of the decision analysis used to decide
between routine annual and biennial screening. I think the three provide enough
information to understand how the Task Force arrived at its recommendations.
The Breast Cancer Surveillance Consortium Data
The Consortium is part of the National Cancer Institute. It consists of a network of seven mammography
registries linked to tumor and pathology registries and a central Statistical
Coordinating Center. The Consortium supplied the Task Force with information for
600,830 women age 40 years and older who had at least one screening mammogram
every two years between 2000 and 2005. If a woman had two or more mammograms
during that period, one was randomly selected.
The Consortium data in the first table below
provide information about screening mammography for the following five
age groups: 40-49, 50-59, 60-69, 70-79, and 80-89 years. The table is like a
composite photograph of screening mammography for women of different ages.
The top third of the table contains the rates
(per 1000 screening mammograms) for invasive breast cancer (cancer that’s moved
outside a duct), DCIS (cancer that’s fully enclosed by a duct), and the number of
false positives and false negatives. The middle third contains the number of
patients undergoing (1) mammography, (2) additional imaging and (3) biopsy to
diagnose one case of invasive breast cancer. As you read across the rows, you
can see how the numbers change as the groups get older.
The bottom
third of the table contains the results of calculations I made for (1) the
number of true positives (the number of screen-detected DCIS plus the number of
invasive breast cancers), (2) the number of true negatives (1000 minus the sum
of the true positives, false positives and false negatives), (3) the
sensitivity, (4) the specificity and (5) the positive predictive value (PPV) of
screening mammography for each of the age groups. (In case you’ve forgotten,
true positives occur when radiologists detect breast cancer in women who have
it and false positives when the women don’t. True negatives occur when
radiologists don’t detect breast cancer in women who don’t have it and false
negatives when the women do.)
Sensitivity and specificity are the most popular measures of the effectiveness
of screening. Sensitivity is equal to the number of true positives divided by
the sum of the number of true positives and false negatives. It’s a measure of
how well a method detects all those who have a disease.
Specificity is equal to the number of true
negatives divided by the sum of the number of true negatives and false
positives. It’s a measure of how well a method detects all those who don’t have
a disease. According to the Task Force report, the sensitivity of screening
mammography generally ranges from 77% to 95% and the specificity from 94% to
97%. The sensitivity and specificity for the Consortium sample of women are
somewhat lower.
Positive predictive value (PPV) is equal to
the number of true positives divided by the sum of true and false positives.
It’s a measure of how well a method detects only those who have a disease and
reveals how likely radiologists’ suspicions of breast cancer will be true or
false.
For the Consortium sample, positive predictive
value ranges from 2.6% to 12.5%. In other words, 87.5% to 97.4% of positive
screenings turned out to be false alarms.
Mammography screening is not nearly as impressive looking when measured
by positive predictive value. It’s unfortunate that it’s reported so rarely. If
women knew how often radiologists’ suspicions of breast cancer are false positives,
they might feel a lot less anxious when asked to come back for additional
imaging and/or biopsies.
                                         Age Group
Screening Result                40-49   50-59   60-69   70-79   80-89

Rates per 1000 screening mammograms
False negatives                   1.0     1.1     1.4     1.5     1.4
False positives                  97.8    86.6    79.0    68.8    59.4
Additional imaging               84.3    75.9    70.2    64.0    56.3
Biopsy                            9.3    10.8    11.6    12.2    10.5
Screen-detected invasive cancer   1.8     3.4     5.0     6.5     7.0
Screen-detected DCIS              0.8     1.3     1.5     1.4     1.5

Number of patients undergoing each procedure to diagnose one case of invasive cancer
Mammography                       556     294     200     154     143
Additional imaging                 47      22      14      10       8
Biopsy                              5       3       2       2     1.5

Calculated measures
True positives                    2.6     4.7     6.5     7.9     8.5
True negatives                  898.5   907.6   913.1   921.8   930.7
Sensitivity                       72%     81%     83%     84%     86%
Specificity                       90%     91%     92%     93%     94%
PPV                              2.6%    5.1%    7.6%   10.3%   12.5%
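The calculations behind the bottom third of the table can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name are assumptions made for the example, not anything published by the Consortium or the Task Force. A few results differ from the table in the last digit (true negatives for the 40-49 group come out at 898.6 rather than 898.5, for example), presumably because of rounding.

```python
# Rates per 1000 screening mammograms, copied from the table above.
# Tuple order: (false negatives, false positives, invasive cancers, DCIS).
RATES = {
    "40-49": (1.0, 97.8, 1.8, 0.8),
    "50-59": (1.1, 86.6, 3.4, 1.3),
    "60-69": (1.4, 79.0, 5.0, 1.5),
    "70-79": (1.5, 68.8, 6.5, 1.4),
    "80-89": (1.4, 59.4, 7.0, 1.5),
}


def screening_measures(fn, fp, invasive, dcis, total=1000.0):
    """Hypothetical helper: derive the calculated measures for one age group."""
    tp = invasive + dcis           # screen-detected cancers
    tn = total - (tp + fp + fn)    # everyone else
    return {
        "true_positives": tp,
        "true_negatives": tn,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


MEASURES = {age: screening_measures(*rates) for age, rates in RATES.items()}

for age, m in MEASURES.items():
    print(f"{age}: sensitivity {m['sensitivity']:.1%}, "
          f"specificity {m['specificity']:.1%}, PPV {m['ppv']:.1%}")
```

Because the positive predictive value divides by all positive screens, the large number of false positives in every age group is what keeps it so low even where sensitivity and specificity look respectable.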
The data in the table show that the sensitivity and number of true positives
for 40-49-year-old women are disproportionately low relative to those in the
next older age group (50-59-year-old women). In addition, the numbers of
patients undergoing mammography and additional imaging to diagnose one invasive
breast cancer are almost twice as high for women in their 40s as for women in
their 50s. The differences between the two adjacent age groups suggest a
discontinuity from one developmental stage to another, analogous to leaving
childhood and entering adolescence. The changes between the two groups do not
resemble the smaller, more continuous-looking changes between the four older
age groups.
The
average age of menopause for U.S. women is 51 years. Most women in their 40s are
premenopausal. Pre- and post-menopausal breasts differ. Premenopausal breast
tissue tends to be dense. It may resemble and/or obscure breast cancer on a
mammogram. Tissue that appears to be breast cancer or may be hiding breast
cancers will invariably lead to too many false positives, additional images,
and biopsies. The Consortium data indicate screening mammography is less
effective and potentially more harmful for premenopausal women.
In contrast, the data indicate that screening mammography is increasingly
effective for postmenopausal women as they age. The numbers of false positives
and additional imaging, the numbers of patients undergoing mammography,
additional imaging or biopsy to detect one invasive cancer, sensitivity,
specificity and positive predictive value all improve. The potential benefit of
mammography screening, in fact, appears to be quite good for women in their 70s
and 80s.
The
Consortium data don’t link screening outcomes to treatment. We don’t know how women detected with breast
cancer fared during or after treatment. The six models, in contrast, do attempt
to link screening and treatment.
The Six Models
Models are used to make
predictions. They use available data. And, they make assumptions about the data.
The reality their predictions depict is less like a photograph and more like a cubist
painting.
Generally
speaking, models are tested against reality. We know how accurate their
predictions are and how much we can rely on them. The models with which we have
the most experience are probably those that forecast the weather.
In
previous collaborations, the six models estimated how much treatment or
screening each contributed to decreases in breast cancer mortality rate. According
to the Task Force, their “qualitative estimates” were “similar.” That’s not
saying much. It means their predictions lined up in the same order from high to
low, but the absolute values of their predictions didn’t match. It’s like having
a weather forecast that can reliably tell you it’s going to be warmer tomorrow,
but not what the temperature will be.
Each model was developed at a different cancer
center: Dana-Farber Cancer Institute, Boston; Erasmus Medical Center,
Rotterdam; Georgetown University, Washington, D.C./Albert Einstein College of
Medicine, Bronx; M.D. Anderson Cancer Center, Houston; Stanford University, Palo
Alto; and the University of Wisconsin,
Madison/Harvard, Boston. Their task was
to predict the percentage of reduction in breast cancer mortality associated
with screening vs. no screening for twenty screening strategies (ten annual and
ten biennial) beginning and ending at different ages (40-69 years, 40-79 years,
40-84 years, 45-69 years, 50-69 years, 50-74 years, 50-79 years, 50-84 years,
55-69 years, and 60-69 years).
The
models compared the probability of unscreened women dying of breast cancer to
the probability of screened women dying of breast cancer during their lifetime.
To estimate the probability of unscreened women dying of breast cancer from age
40 to death, the models used data gathered from a cohort of women born in 1960 being
followed from the age of 25 until their death. Estimates of the future
incidence of breast cancer were extrapolated forward using breast cancer
incidence data available in 2000. Using the data and extrapolations, the models
estimated 3% of unscreened women will die of breast cancer.
Thus,
if a model predicts 2.7% of screened women will die of breast cancer, the
probability of dying is 0.3% less than 3% and equivalent to a 10% reduction in
mortality rate. A 0.3% reduction in deaths (or 10% reduction in mortality rate)
is equal to about three fewer women dying of breast cancer per 1000
women screened.
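The arithmetic in this example can be sketched in Python. The 3% baseline is the models' estimate described above; the function and variable names are illustrative assumptions, not part of the Task Force's work.

```python
BASELINE_RISK = 0.03  # models' estimate: 3% of unscreened women die of breast cancer


def relative_reduction(screened_risk, baseline=BASELINE_RISK):
    """Reduction in mortality rate, as a fraction of the baseline risk."""
    return (baseline - screened_risk) / baseline


def deaths_averted_per_1000(screened_risk, baseline=BASELINE_RISK):
    """Absolute difference in risk, expressed per 1000 women screened."""
    return (baseline - screened_risk) * 1000


print(f"{relative_reduction(0.027):.0%} reduction in mortality rate")
print(f"{deaths_averted_per_1000(0.027):.1f} fewer deaths per 1000 women")
```

The same two functions make the difference between the relative figure (10%) and the absolute one (0.3%, or three women per 1000) explicit, which matters because the tables that follow report only the relative figure.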
I
averaged the reduction-of-mortality rate across the six models for each of the
twenty screening strategies and arranged them from highest to lowest reduction
in mortality rate in the following table.
Without exception the reduction in mortality rate is higher for each
annual screening strategy than its corresponding biennial screening strategy.
Age     Screening   Reduction in         Number of    Efficiency
        Interval    Mortality Rate (%)   Mammograms   Rating

40-84   Annual      39.5                 36,550       E (8th)
        Biennial    31.8                 18,708       E (7th)
40-79   Annual      36.8                 34,078       b
        Biennial    29.2                 17,241       b
50-84   Annual      35.0                 26,905       b
        Biennial    28.5                 13,837       E (6th)
50-79   Annual      32.7                 24,419       b
        Biennial    26.2                 12,366       E (5th)
50-74   Annual      28.7                 21,330       i
        Biennial    23.2                 11,066       E (4th)
40-69   Annual      28.5                 27,428*      i
        Biennial    21.7                 13,831*      i
45-69   Annual      26.8                 22,546*      i
        Biennial    21.2                 11,694*      i
50-69   Annual      23.8                 17,737       i
        Biennial    18.3                  8,947       E (3rd)
55-69   Annual      19.5                 13,009       i
        Biennial    15.7                  6,890       E (2nd)
60-69   Annual      14.3                  8,438       i
        Biennial    11.0                  4,263       E (1st)

Mean    Annual      29.0                 23,244
        Biennial    22.7                 10,884
The
models estimate annual screening will reduce mortality rate by an average of
29% and biennial screening, by 22.7%. For annual mammography that translates
into a 2.1% probability of dying from breast cancer (about 9 fewer deaths) and
biennial mammography a 2.3% probability of dying from breast cancer (about 7
fewer deaths). Thus, annual screening
would result in about two fewer deaths per 1000 women screened than biennial
screening. In terms of number of deaths averted (ignoring the harms of
screening), the benefit of biennial screening is about 77% of the benefit of
annual screening. (Using slightly different figures, the Task Force found
“screening biennially maintained an average of 81% of the benefit of annual
screening.”)
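To see where these figures come from, the sketch below converts the two mean reductions into lifetime risks and deaths averted, assuming the models' 3% baseline risk for unscreened women. The names are illustrative assumptions. On unrounded figures the retained benefit works out to about 78%, close to the 77% quoted above.

```python
BASELINE_RISK = 0.03  # 3% of unscreened women die of breast cancer
MEAN_REDUCTION = {"annual": 0.29, "biennial": 0.227}  # mean reductions quoted above

for interval, reduction in MEAN_REDUCTION.items():
    risk = BASELINE_RISK * (1 - reduction)          # lifetime risk with screening
    averted = BASELINE_RISK * reduction * 1000      # deaths averted per 1000 women
    print(f"{interval}: risk {risk:.1%}, about {averted:.1f} fewer deaths per 1000")

# Share of the annual benefit that biennial screening retains.
retained = MEAN_REDUCTION["biennial"] / MEAN_REDUCTION["annual"]
print(f"biennial retains {retained:.0%} of the annual benefit")
```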
I don’t trust that the screening strategies save as many lives as the models
predict, given that the average number of true positives in the Consortium data
is 5.6 detected per 1000 women for one screening round, or 11.2 detected per
1000 women for two screening rounds. Biennial screening can’t possibly save
more lives (7) than the number of breast cancers detected (5.6). And, although
it’s not impossible, it’s highly unlikely that annual screening saves 9 of the
11.2 women whose breast cancers are detected.
I am more inclined, however, to trust that annual screening saves about two
more lives than biennial screening. A major advantage of modeling is that one
is able to select certain conditions, hold them constant and apply them to
different possibilities. That means exactly the same data and assumptions were
applied to each annual screening strategy and its corresponding biennial
strategy. Given they were perfectly matched, it’s highly likely that the
reduction in mortality rate is higher for annual screening and that it probably
averts about two more deaths than biennial screening.
In an
earlier post I wrote that I believed early detection didn’t save lives. One reason is that I’ve always worried about
how many non-life-threatening cancers and non-existent cancers (biopsy errors)
contributed to the number of lives “saved by screening.” The result that under identical conditions annual
screening saves more lives than biennial screening indicates that more frequent
screening and by implication, early detection, might save lives.
Two
models (Erasmus Medical Center and the University of Wisconsin) explicitly assumed
there would be cases of DCIS that would not be life-threatening and the
University of Wisconsin model assumed, in addition, some small invasive cancers
would not be life-threatening. The
Erasmus model’s predictions indicate annual screening would save an additional
two lives and the University of Wisconsin model’s predictions, an additional
three lives, again indicating that screening more frequently would save lives
and implying that early detection could save lives, even when
non-life-threatening breast cancers are removed from the data.
This is
the first research I’ve seen that directly addresses some of my doubts and I’m
beginning to believe that early detection might, in fact, save lives. However,
the only model (M.D. Anderson) that explicitly assumed a better prognosis for
screening-detected than for clinically-detected early-stage breast cancers predicted
the lowest average reduction-in-mortality rates (20.5 for annual and 21.7 for
biennial screening strategies) ---lower than the overall averages for both
annual and biennial screening strategies.
The Decision Analysis
The Task Force rated the screening strategies’
effectiveness on two dimensions simultaneously: (1) number of mammography
screenings they required (a measure of human and financial cost) and (2)
reduction in mortality rate (a measure of benefit). To do this, it borrowed and adapted a
decision analysis from economics.
A screening strategy was “efficient” if it had
a higher reduction in mortality rate and required fewer mammograms than
another. It was “inefficient”
(“dominated” in economics lingo) if it either had a lower reduction in
mortality rate or required more mammograms.
The strategies were classified into three categories: (1) if a strategy was
dominated by other strategies in five of the six models, it was “inefficient”;
(2) if it was never dominated, it was “efficient”; and (3) in all other cases,
it was “borderline.” The Task Force’s categorizations for each of the twenty
screening strategies, efficient (E), borderline (b) and inefficient (i), are
listed in the rightmost column of the table above.
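The idea of dominance can be sketched in Python using the averaged figures from the table above. This is only an illustration under stated assumptions: the Task Force applied its test within each of the six models rather than to averages, and the definition used here (another strategy achieves at least the same reduction with no more mammograms and is strictly better on one of the two) is the standard one from economics, assumed rather than taken from the Task Force report. The function names are hypothetical.

```python
# (age range, interval, mean reduction in mortality rate %, number of mammograms)
STRATEGIES = [
    ("40-84", "annual", 39.5, 36550), ("40-84", "biennial", 31.8, 18708),
    ("40-79", "annual", 36.8, 34078), ("40-79", "biennial", 29.2, 17241),
    ("50-84", "annual", 35.0, 26905), ("50-84", "biennial", 28.5, 13837),
    ("50-79", "annual", 32.7, 24419), ("50-79", "biennial", 26.2, 12366),
    ("50-74", "annual", 28.7, 21330), ("50-74", "biennial", 23.2, 11066),
    ("40-69", "annual", 28.5, 27428), ("40-69", "biennial", 21.7, 13831),
    ("45-69", "annual", 26.8, 22546), ("45-69", "biennial", 21.2, 11694),
    ("50-69", "annual", 23.8, 17737), ("50-69", "biennial", 18.3, 8947),
    ("55-69", "annual", 19.5, 13009), ("55-69", "biennial", 15.7, 6890),
    ("60-69", "annual", 14.3, 8438), ("60-69", "biennial", 11.0, 4263),
]


def dominates(a, b):
    """True if a is at least as good as b on both dimensions and better on one."""
    _, _, reduction_a, mammograms_a = a
    _, _, reduction_b, mammograms_b = b
    at_least_as_good = reduction_a >= reduction_b and mammograms_a <= mammograms_b
    strictly_better = reduction_a > reduction_b or mammograms_a < mammograms_b
    return at_least_as_good and strictly_better


def is_dominated(strategy):
    """True if any other strategy in the list dominates this one."""
    return any(dominates(other, strategy) for other in STRATEGIES)


DOMINATED = [s[:2] for s in STRATEGIES if is_dominated(s)]

for age, interval in DOMINATED:
    print(f"{age} {interval} is dominated")
```

On these averaged figures, eight strategies come out as dominated, and they are the same eight rated inefficient (i) in the table, including the four marked with asterisks.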
Seven
of the eight “efficient” screening strategies turned out to be biennial and six
of the eight initiated screening with women in their 50s. These two results contributed to the Task
Force’s recommendations for biennial screening for women 50- to 74-years-old
and against routine screening for women in their 40s.
The large differences between the number of
screenings required and the small differences in reductions of rate of
mortality between screening strategies make it difficult to see if one of the
two dimensions (number of mammograms and reduction in mortality rate) trumped
the other.
I’ve
included in the rightmost column of the table above the order in which the “efficient”
strategies were listed in the Task Force’s table from top (1st) to
bottom (8th). The order from 1st to 8th is in
exact reverse order to reduction-in-mortality rate and in perfect corresponding
order with number of mammograms.
I also calculated two correlation coefficients, one between the rank, from 1 to
20, of each of the twenty strategies and its reduction-in-mortality rate and
the other between the rank and the number of mammograms each strategy required.
The first correlation (0.22), with predicted reduction in mortality, was too
low to be significant, indicating no association between the strategies’
rankings and their predicted reduction in mortality. The second correlation
(0.43), with number of mammograms, was significant, indicating a positive
association between the strategies’ rankings and the number of mammograms they
required.
It appears the decision analysis as adapted by
the Task Force did, in fact, favor the harms of screening (as measured by the number of
mammograms required) over the benefits of screening (as measured by the
reduction in rate of mortality). That’s
not good, but neither is it necessarily that bad.
A woman
who is recalled for additional screening and learns she doesn’t have breast
cancer may experience several days of additional anxiety. A woman recalled for biopsy may experience
greater anxiety for a longer period of time, pain and/or disfigurement. A woman
whose screening culminates in a true positive who doesn’t have breast cancer (a
biopsy error) or whose breast cancer will never be life-threatening may
experience life-long anxiety, life-long discomfort of lymphedema caused in an
arm by axillary lymph node extraction, recurring pain due to the severing of
nerves during lumpectomies or mastectomies and many other side effects of
treatment---all for no benefit whatsoever.
If we
could assign weights to the harms, our estimates of screening’s harms would be better.
But we can’t. The data don’t exist. And, since there’s no way to identify either
those women whose biopsy results were wrong or those whose cancers will never
become life-threatening, they will always appear to have benefited the most
from screening, when in fact they are the most grievously harmed.
Unfortunately, number of mammograms is the only
estimate of harm we have. We know the less often a woman is screened, the less she’s
likely to be harmed. And, depending upon how one would weigh the possible harms
if one could, a bias in favor of number of mammograms could be wrong---or
right.
The Task Force’s recommendation against routine screening for women aged 40-49
is not biased. Neither is its stance on women 75 years old and older, although
I think, given the Consortium data, it may be too conservative.
If you follow the columns listing the number of mammograms and the reduction in
rate of mortality from the top row of the table above down, you can see that
the number of mammograms required for the screening strategies consistently
decreases in step with the decreases in reduction of mortality rate until you
reach the 40-69 years and 45-69 years annual and biennial screening strategies.
The numbers of mammograms required for these strategies are out of line;
they’re too high (see asterisks in the fourth column of the table above).
That’s not true, however, when the screening strategy groups women in their 40s
with women in their 70s and 80s (see the top four rows of the table). It
appears the benefits of screening for women over 70 may compensate for the
extra mammograms needed for women in their 40s.
I compared the predicted reductions-in-mortality rate for the four screening
strategies initiating screening with women in their 40s and ending with women
in their 70s and 80s to those beginning with women in their 50s (e.g., annual
screening of 40-84-year-old women vs. annual screening of 50-84-year-old women,
etc.). When 40-year-old women are included in a strategy, the probability of
dying from breast cancer drops by about 0.1 percentage point (the equivalent of
going from 3% to 2.9%, or about one fewer death per 1000 women screened).
(Using a different analysis of the data, the Task Force concluded that “greater
mortality reductions could be achieved by stopping at an older age than by
initiating screening at an earlier age.”)
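The comparison can be reproduced from the averaged reductions in the table above. This sketch assumes, as the models estimate, that 3% of unscreened women die of breast cancer; the pairing of strategies and the variable names are illustrative assumptions.

```python
BASELINE_RISK = 0.03  # 3% of unscreened women die of breast cancer

# Mean reduction in mortality rate (%) when screening starts at 40 vs. at 50,
# for the four pairs of strategies that continue to age 79 or 84.
PAIRS = {
    "annual, to 84": (39.5, 35.0),
    "biennial, to 84": (31.8, 28.5),
    "annual, to 79": (36.8, 32.7),
    "biennial, to 79": (29.2, 26.2),
}

gains = [start_40 - start_50 for start_40, start_50 in PAIRS.values()]
mean_gain = sum(gains) / len(gains)  # extra reduction, in percentage points

# Convert the extra relative reduction into deaths averted per 1000 women.
extra_averted = BASELINE_RISK * mean_gain / 100 * 1000

print(f"mean extra reduction: {mean_gain:.1f} points")
print(f"about {extra_averted:.1f} fewer deaths per 1000 women screened")
```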
In
general, the models’ predictions for women in their 40s, 70s and 80s confirm what
the Consortium data suggest for women in these age groups--- screening appears
to be better for women in their 70s and 80s and worse for those in their 40s.
The results indicate the models do, to some extent, accurately reflect the
different realities for women in their 40s, 70s, and 80s with breast cancer.
Breast
cancers diagnosed in 40-year-old women are more likely to be aggressive; the
cells of their breast cancers are more likely to divide and proliferate more
quickly; and, the women are more likely to die of their breast cancer no matter
how soon their breast cancers are detected or treated. In contrast, breast
cancers diagnosed in older women are more likely to be indolent; the cells of their
breast cancers more likely to divide and proliferate more slowly; and the older
the woman, the more likely her breast cancer will be indolent and her treatment
successful.
The
models made many assumptions about treatment.
I’m going to discuss the two assumptions I know something about: (1)
that premenopausal women with hormone-receptor-positive breast cancers would be
treated with tamoxifen and postmenopausal women with hormone-receptor-positive
breast cancers treated with an aromatase inhibitor and (2) that patients would be
100% compliant with these treatments.
Tamoxifen
and aromatase inhibitors interfere with the growth and spread of breast cancer;
tamoxifen by attaching itself to estrogen-receptor-positive breast cancer cells
and acting as a barrier between the cells and estrogen; and aromatase
inhibitors by stopping the action of aromatase, an enzyme needed to manufacture
estrogen. Since most of the estrogen in premenopausal women is produced
directly by their ovaries and is not dependent upon the enzyme, aromatase
inhibitors can only interfere with the production of estrogen elsewhere, e.g. in
women’s adrenal glands or bones.
The
Food and Drug Administration approved aromatase inhibitors for postmenopausal
women, but many doctors prescribe them “off-label” for premenopausal women. That means the women must either have their
ovaries removed or take another potent drug to eliminate the estrogen being
produced by their ovaries.
They become
postmenopausal in a matter of weeks. Their hair thins. Their skin thins and dries
out, often making sex painful and unpleasant.
Some lose bone mass, a serious problem for women as young as 40. And many
suffer from bone, muscle and/or joint pain. Some take the drug irregularly to
deal with the side effects. Others stop completely.
The results
of trials comparing tamoxifen to aromatase inhibitors show that aromatase
inhibitors benefit some women by allowing them to live longer before their
breast cancer recurs, but they don’t prolong their lives overall.
Neither
of the models’ assumptions about the treatment of premenopausal women with
hormone-receptor-positive breast cancer is realistic. Many premenopausal women are not
treated with tamoxifen and neither the women nor their doctors are 100%
compliant. Unrealistic assumptions about treatment undermine the reliability of
the models’ absolute predictions of the reduction in rate of mortality. In this
case, they may have contributed to their being too high.
That said,
I appreciate the model builders’ taking treatment into account; very little data
exists linking screening and treatment. It’s
as if an impenetrable door exists between the two. For example, the Consortium data don’t follow up
women who’ve been diagnosed with breast cancer. And, trials evaluating
treatment don’t report if participants have been screened. In neither case do
we know how screened patients fared. Assumptions made by models may be better
than nothing. Actual data would be a lot better.
Rationing of Health Care
I think the Task Force’s biggest mistake was failing to adequately
communicate the results of its research before announcing its new
recommendations. That failure pretty much guaranteed its recommendations would
be perceived by many as a crude attempt to ration health care. And, its subsequent retraction of the recommendation
for women in their 40s probably reinforced that perception.
Although
rationing screening would immediately ration treatment, not much, if any, money
would be saved. Everyone with a cancer eventually shows up in a doctor’s office,
a clinic or a hospital complaining of symptoms and needing treatment. If anything is rationed, it seems more reasonable,
I think, to ration treatment directly.
Compared to the cost of treatment, the cost of
screening is insignificant, insignificant enough that one hospital is willing
to provide prostate screening for free. An
article about Dr. Otis Brawley, medical director of the American Cancer
Society, published in USA Today last January, related that the hospital’s marketing
executive voluntarily told Dr. Brawley about
how his (the marketing executive’s) hospital was providing “’free’ prostate
screenings as a way to find patients for more lucrative radiation treatments,
cancer surgeries, even incontinence therapy and impotence drugs.”
Finally, based on what I know: (1) if I were 40 and/or premenopausal, I
wouldn’t be screened (in fact, I can hardly believe the Task Force retracted
the one recommendation for which it had the best evidence); (2) if I were in my
70s or 80s, taking into account my age and how likely my breast cancer would be
indolent, I would be screened maybe every 2 to 3 years and would make sure I
was treated with the least aggressive treatment; and (3) if I were between 50
and 70 years old, I would consider being screened biennially in my 50s and
annually in my 60s.
That’s
based on what I know, but it's not enough. I would like to know more. For instance I would like to know how many DCIS or invasive breast
cancers are not life-threatening and when they’re likely to occur. That would
mean we would have to know enough about breast cancers to identify those
that are not life-threatening. Unfortunately,
I think the path which much of breast cancer research appears to be on is
not likely to lead to that information.
My next
post will be about how the marketing of medicine cultivates our ignorance, takes
advantage of our trust and misdirects cancer research.