Patient Power (mtz): May 2012

In 2002 when the Task Force treated women over the age of 39 as a homogeneous group, it recommended all women 40 years and older be screened every one to two years.

In 2009 when it reviewed screening for women of different ages, it recommended women age 50-74 be screened biennially rather than annually, advised against routine screening for women age 40-49, and because it felt the data were scanty, made no recommendation for women 75 and older.

The public backlash to the new recommendations was fierce and immediate. In response, the Task Force changed one of them: its recommendation against routine screening of women age 40-49. The revised recommendation reads: “The decision to start regular, biennial screening mammography before the age of 50 years should be an individual one and take patient context into account, including the patient’s values regarding specific benefits and harms.” As explanation, the Task Force added the following footnote to its website: “On December 4 2009, the USPSTF unanimously voted to update the language of their recommendation regarding women less than 50 years of age to clarify their original and continued intention.”

I reviewed the Task Force’s research. I agree with some, not all, of the recommendations.

I chose to focus on three sections of the Task Force’s research: (1) the data obtained from the Breast Cancer Surveillance Consortium, (2) the predictions by six models of reduction-in-mortality rates for twenty different screening strategies, and (3) the results of the decision analysis used to decide between routine annual and biennial screening. I think the three provide enough information to understand how the Task Force arrived at its recommendations.

The Breast Cancer Surveillance Consortium Data

The Consortium is part of the National Cancer Institute. It consists of a network of seven mammography registries linked to tumor and pathology registries and a central Statistical Coordinating Center. The Consortium supplied the Task Force with information for 600,830 women age 40 years and older who had at least one screening mammogram every two years between 2000 and 2005. If a woman had two or more mammograms during that period, one was randomly selected.

The Consortium data in the first table below provide information about screening mammography for the following five age groups: 40-49, 50-59, 60-69, 70-79, and 80-89 years. The table is like a composite photograph of screening mammography for women of different ages.

The top third of the table contains the rates (per 1000 screening mammograms) for false negatives, false positives, additional imaging, biopsy, invasive breast cancer (cancer that’s moved outside a duct), and DCIS (cancer that’s fully enclosed by a duct). The middle third contains the number of patients undergoing (1) mammography to diagnose one case of invasive breast cancer, (2) additional imaging to diagnose one case of invasive breast cancer and (3) biopsy to diagnose one case of invasive breast cancer. As you read across the rows, you can see how the numbers change as the groups get older.

The bottom third of the table contains the results of calculations I made for (1) the number of true positives (the number of screen-detected DCIS plus the number of invasive breast cancers), (2) the number of true negatives (1000 minus the sum of the true positives, false positives and false negatives), (3) the sensitivity, (4) the specificity and (5) the positive predictive value (PPV) of screening mammography for each of the age groups. (In case you’ve forgotten, true positives occur when radiologists detect breast cancer in women who have it and false positives when the women don’t. True negatives occur when radiologists don’t detect breast cancer in women who don’t have it and false negatives when the women do.)

Sensitivity and specificity are the most popular measures of the effectiveness of screening. Sensitivity is equal to the number of true-positives divided by the sum of the number of true positives and false negatives. It’s a measure of how well a method detects all those who have a disease.

Specificity is equal to the number of true negatives divided by the sum of the number of true negatives and false positives. It’s a measure of how well a method detects all those who don’t have a disease. According to the Task Force report, the sensitivity of screening mammography generally ranges from 77% to 95% and the specificity from 94% to 97%. The sensitivity and specificity for the Consortium sample of women are somewhat lower.

Positive predictive value (PPV) is equal to the number of true positives divided by the sum of true and false positives. It’s a measure of how well a method detects only those who have a disease and reveals how likely radiologists’ suspicions of breast cancer will be true or false.

For the Consortium sample, positive predictive value ranges from 2.6% to 12.5%. In other words, 87.5% to 97.4% of positive screenings turned out to be false alarms. Mammography screening is not nearly as impressive looking when measured by positive predictive value. It’s unfortunate that it’s reported so rarely. If women knew how often radiologists’ suspicions of breast cancer are false positives, they might feel a lot less anxious when asked to come back for additional imaging and/or biopsies.

                                                         Age Group
Screening Result                             40-49   50-59   60-69   70-79   80-89

False negatives                                1.0     1.1     1.4     1.5     1.4
False positives                               97.8    86.6    79.0    68.8    59.4
Additional imaging                            84.3    75.9    70.2    64.0    56.3
Biopsy                                         9.3    10.8    11.6    12.2    10.5
Screen-detected invasive cancer                1.8     3.4     5.0     6.5     7.0
Screen-detected DCIS                           0.8     1.3     1.5     1.4     1.5

# Patients undergoing mammography              556     294     200     154     143
  to diagnose one case of invasive cancer
# Patients undergoing additional imaging        47      22      14      10       8
  to diagnose one case of invasive cancer
# Patients undergoing biopsy                     5       3       2       2     1.5
  to diagnose one case of invasive cancer

True positives                                 2.6     4.7     6.5     7.9     8.5
True negatives                               898.5   907.6   913.1   921.8   930.7
Sensitivity                                    72%     81%     83%     84%     86%
Specificity                                    90%     91%     92%     93%     94%
PPV                                           2.6%    5.1%    7.6%   10.3%   12.5%
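
For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the bottom third of the table from the top third, using the formulas given in the text. It is an illustration only: the function and variable names are assumptions introduced here, and because the inputs are already rounded, a result can differ from the table in the last digit.

```python
# Per-1000 screening rates copied from the table above, by age group.
AGE_GROUPS = ["40-49", "50-59", "60-69", "70-79", "80-89"]
FALSE_NEG = [1.0, 1.1, 1.4, 1.5, 1.4]
FALSE_POS = [97.8, 86.6, 79.0, 68.8, 59.4]
INVASIVE = [1.8, 3.4, 5.0, 6.5, 7.0]
DCIS = [0.8, 1.3, 1.5, 1.4, 1.5]


def screening_metrics(invasive, dcis, false_pos, false_neg, per=1000.0):
    """Return (TP, TN, sensitivity, specificity, PPV) for one age group."""
    tp = invasive + dcis                      # true positives
    tn = per - (tp + false_pos + false_neg)   # true negatives
    sensitivity = tp / (tp + false_neg)
    specificity = tn / (tn + false_pos)
    ppv = tp / (tp + false_pos)
    return tp, tn, sensitivity, specificity, ppv


# Hypothetical container name: one tuple of metrics per age group.
METRICS = {
    age: screening_metrics(inv, d, fp, fn)
    for age, inv, d, fp, fn in zip(AGE_GROUPS, INVASIVE, DCIS, FALSE_POS, FALSE_NEG)
}

for age, (tp, tn, sens, spec, ppv) in METRICS.items():
    print(f"{age}: TP={tp:.1f} TN={tn:.1f} "
          f"sens={sens:.0%} spec={spec:.0%} PPV={ppv:.1%}")
```

The share of positive screenings that are false alarms is simply one minus the PPV for each age group.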

The data in the table show that the sensitivity and number of true positives for 40-49 year-old women are markedly lower than those for the next older age group (50-59 year-old women). In addition, the numbers of patients undergoing mammography and additional imaging to diagnose one invasive breast cancer are about twice as high for women in their 40s as for women in their 50s. The differences between the two adjacent age groups suggest a discontinuity from one developmental stage to another, analogous to leaving childhood and entering adolescence. The changes between the two groups do not resemble the smaller, more continuous-looking changes between the four older age groups.
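
The comparison between women in their 40s and 50s can be checked with a short Python sketch. It assumes the middle rows of the table are the per-1000 rates divided by the invasive-cancer rate, which reproduces the table's figures for these two groups; the names used are illustrative.

```python
# Per-1000 rates for the two youngest age groups, copied from the table above.
RATES = {
    "40-49": {"invasive": 1.8, "imaging": 84.3, "biopsy": 9.3},
    "50-59": {"invasive": 3.4, "imaging": 75.9, "biopsy": 10.8},
}


def per_invasive_cancer(rates, per=1000.0):
    """Procedures undergone for each screen-detected invasive cancer."""
    return {
        "mammography": per / rates["invasive"],
        "imaging": rates["imaging"] / rates["invasive"],
        "biopsy": rates["biopsy"] / rates["invasive"],
    }


forties = per_invasive_cancer(RATES["40-49"])
fifties = per_invasive_cancer(RATES["50-59"])

for step in ("mammography", "imaging", "biopsy"):
    ratio = forties[step] / fifties[step]
    print(f"{step}: 40s={forties[step]:.0f} 50s={fifties[step]:.0f} ratio={ratio:.1f}")
```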

The average age of menopause for U.S. women is 51 years. Most women in their 40s are premenopausal. Pre- and post-menopausal breasts differ. Premenopausal breast tissue tends to be dense. It may resemble and/or obscure breast cancer on a mammogram. Tissue that appears to be breast cancer or may be hiding breast cancers will invariably lead to too many false positives, additional images, and biopsies. The Consortium data indicate screening mammography is less effective and potentially more harmful for premenopausal women.

In contrast, the data indicate that screening mammography is increasingly effective for postmenopausal women as they age. The numbers of false positives and additional imaging, the numbers of patients undergoing mammography, additional imaging or biopsy to detect one invasive cancer, sensitivity, specificity and positive predictive value all improve. The potential benefit of mammography screening, in fact, appears to be quite good for women in their 70s and 80s.

The Consortium data don’t link screening outcomes to treatment. We don’t know how women detected with breast cancer fared during or after treatment. The six models, in contrast, do attempt to link screening and treatment.

The Six Models

Models are used to make predictions. They use available data. And, they make assumptions about the data. The reality their predictions depict is less like a photograph and more like a cubist painting.

Generally speaking, models are tested against reality. We know how accurate their predictions are and how much we can rely on them. The models with which we have the most experience are probably those that forecast the weather.

In previous collaborations, the six models estimated how much treatment or screening each contributed to decreases in breast cancer mortality rate. According to the Task Force, their “qualitative estimates” were “similar.” That’s not saying much. It means their predictions lined up in the same order from high to low, but the absolute values of their predictions didn’t match. It’s like having a weather forecast that can reliably tell you it’s going to be warmer tomorrow, but not what the temperature will be.

Each model was developed at a different cancer center: Dana-Farber Cancer Institute, Boston; Erasmus Medical Center, Rotterdam; Georgetown University, Washington, D.C./Albert Einstein College of Medicine, Bronx; M.D. Anderson Cancer Center, Houston; Stanford University, Palo Alto; and the University of Wisconsin, Madison/Harvard, Boston. Their task was to predict the percentage of reduction in breast cancer mortality associated with screening vs. no screening for twenty screening strategies (ten annual and ten biennial) beginning and ending at different ages (40-69 years, 40-79 years, 40-84 years, 45-69 years, 50-69 years, 50-74 years, 50-79 years, 50-84 years, 55-69 years, and 60-69 years).

The models compared the probability of unscreened women dying of breast cancer to the probability of screened women dying of breast cancer during their lifetime. To estimate the probability of unscreened women dying of breast cancer from age 40 to death, the models used data gathered from a cohort of women born in 1960 being followed from the age of 25 until their death. Estimates of the future incidence of breast cancer were extrapolated forward using breast cancer incidence data available in 2000. Using the data and extrapolations, the models estimated 3% of unscreened women will die of breast cancer.

Thus, if a model predicts 2.7% of screened women will die of breast cancer, the probability of dying is 0.3% less than 3% and equivalent to a 10% reduction in mortality rate. A 0.3% reduction in deaths (or 10% reduction in mortality rate) is equal to about three fewer women dying of breast cancer per 1000 women screened.
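
The conversion between a relative reduction in mortality rate and deaths averted per 1000 women can be written out as a small Python sketch. The 3% baseline and the 2.7% example come from the text; the function names are assumptions made for illustration.

```python
BASELINE_DEATH_PROB = 0.03  # models' estimate for unscreened women (from the text)


def mortality_reduction(screened_prob, baseline_prob=BASELINE_DEATH_PROB):
    """Relative reduction in breast cancer mortality, as a fraction."""
    return (baseline_prob - screened_prob) / baseline_prob


def deaths_averted_per_1000(screened_prob, baseline_prob=BASELINE_DEATH_PROB):
    """Absolute reduction expressed as deaths averted per 1000 women screened."""
    return (baseline_prob - screened_prob) * 1000


reduction = mortality_reduction(0.027)
averted = deaths_averted_per_1000(0.027)
print(f"relative reduction: {reduction:.0%}, deaths averted per 1000: {averted:.0f}")
```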

I averaged the reduction-of-mortality rate across the six models for each of the twenty screening strategies and arranged them from highest to lowest reduction in mortality rate in the following table. Without exception the reduction in mortality rate is higher for each annual screening strategy than its corresponding biennial screening strategy.

Age     Screening Interval   Reduction in Mortality Rate (%)   # Mammograms   Efficiency Rating

40-84   Annual               39.5                              36,550         E (8th)
40-84   Biennial             31.8                              18,708         E (7th)
40-79   Annual               36.8                              34,078         b
40-79   Biennial             29.2                              17,241         b
50-84   Annual               35.0                              26,905         b
50-84   Biennial             28.5                              13,837         E (6th)
50-79   Annual               32.7                              24,419         b
50-79   Biennial             26.2                              12,366         E (5th)
50-74   Annual               28.7                              21,330         i
50-74   Biennial             23.2                              11,066         E (4th)
40-69   Annual               28.5                              27,428*        i
40-69   Biennial             21.7                              13,831*        i
45-69   Annual               26.8                              22,546*        i
45-69   Biennial             21.2                              11,694*        i
50-69   Annual               23.8                              17,737         i
50-69   Biennial             18.3                              8,947          E (3rd)
55-69   Annual               19.5                              13,009         i
55-69   Biennial             15.7                              6,890          E (2nd)
60-69   Annual               14.3                              8,438          i
60-69   Biennial             11.0                              4,263          E (1st)

Mean    Annual               29.0                              23,244
Mean    Biennial             22.7                              10,884

The models estimate annual screening will reduce mortality rate by an average of 29% and biennial screening, by 22.7%. For annual mammography that translates into a 2.1% probability of dying from breast cancer (about 9 fewer deaths) and biennial mammography a 2.3% probability of dying from breast cancer (about 7 fewer deaths). Thus, annual screening would result in about two fewer deaths per 1000 women screened than biennial screening. In terms of number of deaths averted (ignoring the harms of screening), the benefit of biennial screening is about 77% of the benefit of annual screening. (Using slightly different figures, the Task Force found “screening biennially maintained an average of 81% of the benefit of annual screening.”)
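
These figures can be retraced with a few lines of Python. This is a sketch that takes the 3% baseline and the two average reductions as given in the text; the names are illustrative, and because it works from unrounded values the benefit ratio comes out a shade above the figure obtained from the rounded 7 and 9 deaths.

```python
BASELINE_DEATHS_PER_1000 = 30.0   # 3% of unscreened women, per the models
ANNUAL_REDUCTION = 0.29           # average across annual strategies (from the text)
BIENNIAL_REDUCTION = 0.227        # average across biennial strategies (from the text)


def deaths_averted(reduction, baseline=BASELINE_DEATHS_PER_1000):
    """Deaths averted per 1000 women for a given relative mortality reduction."""
    return baseline * reduction


annual_averted = deaths_averted(ANNUAL_REDUCTION)
biennial_averted = deaths_averted(BIENNIAL_REDUCTION)
extra_deaths_averted = annual_averted - biennial_averted
benefit_retained = biennial_averted / annual_averted

print(f"annual: {annual_averted:.1f}, biennial: {biennial_averted:.1f} per 1000")
print(f"extra deaths averted by annual screening: {extra_deaths_averted:.1f}")
print(f"share of annual benefit kept by biennial: {benefit_retained:.0%}")
```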

I don’t trust that the screening strategies save as many lives as the models predict, given that the average number of true positives in the Consortium data is 5.6 detected per 1000 women for one screening round or 11.2 detected per 1000 women per two screening rounds. Biennial screening can’t possibly save more lives (7) than the number of breast cancers detected (5.6). And, although it’s not impossible, it’s highly unlikely that annual screening saves 9 of the 11.2 breast cancers detected.

I am more inclined, however, to trust that annual screening saves about two more lives than biennial screening. A major advantage of modeling is that one is able to select certain conditions, hold them constant and apply them to different possibilities. That means exactly the same data and assumptions were applied to each annual screening strategy and its corresponding biennial strategy. Given they were perfectly matched, it’s highly likely the reduction in mortality rate is higher for annual screening and that annual screening probably averts about two more deaths than biennial screening.

In an earlier post I wrote that I believed early detection didn’t save lives. One reason is that I’ve always worried about how many non-life-threatening cancers and non-existent cancers (biopsy errors) contributed to the number of lives “saved by screening.” The result that under identical conditions annual screening saves more lives than biennial screening indicates that more frequent screening and by implication, early detection, might save lives.

Two models (Erasmus Medical Center and the University of Wisconsin) explicitly assumed there would be cases of DCIS that would not be life-threatening and the University of Wisconsin model assumed, in addition, some small invasive cancers would not be life-threatening. The Erasmus model’s predictions indicate annual screening would save an additional two lives and the University of Wisconsin model’s predictions, an additional three lives, again indicating that screening more frequently would save lives and implying that early detection could save lives, even when non-life-threatening breast cancers are removed from the data.

This is the first research I’ve seen that directly addresses some of my doubts and I’m beginning to believe that early detection might, in fact, save lives. However, the only model (M.D. Anderson) that explicitly assumed a better prognosis for screening-detected than for clinically-detected early-stage breast cancers predicted the lowest average reduction-in-mortality rates (20.5 for annual and 21.7 for biennial screening strategies), lower than the overall averages for both annual and biennial screening strategies.

The Decision Analysis

The Task Force rated the screening strategies’ effectiveness on two dimensions simultaneously: (1) number of mammography screenings they required (a measure of human and financial cost) and (2) reduction in mortality rate (a measure of benefit). To do this, it borrowed and adapted a decision analysis from economics.

A screening strategy was “efficient” if it had a higher reduction in mortality rate and required fewer mammograms than another. It was “inefficient” (“dominated” in economics lingo) if it either had a lower reduction in mortality rate or required more mammograms.

The strategies were classified into three categories: (1) if a strategy was dominated by other strategies in five of the six models, it was “inefficient”; (2) if it was never dominated, it was “efficient”; and (3) in all other cases, it was “borderline.” The Task Force’s categorizations, efficient (E), borderline (b) and inefficient (i), for each of the twenty screening strategies are listed in the rightmost column of the table above.
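
The idea of one strategy “dominating” another can be sketched in Python. This is not the Task Force’s procedure, which applied the test inside each of the six models; it is a simplified illustration run on the six-model averages from the table above, using the usual dominance rule (at least as much benefit for no more mammograms, and strictly better on one of the two). The names are assumptions introduced here.

```python
# (ages, interval, average % reduction in mortality, mammograms, Task Force rating)
STRATEGIES = [
    ("40-84", "annual", 39.5, 36550, "E"), ("40-84", "biennial", 31.8, 18708, "E"),
    ("40-79", "annual", 36.8, 34078, "b"), ("40-79", "biennial", 29.2, 17241, "b"),
    ("50-84", "annual", 35.0, 26905, "b"), ("50-84", "biennial", 28.5, 13837, "E"),
    ("50-79", "annual", 32.7, 24419, "b"), ("50-79", "biennial", 26.2, 12366, "E"),
    ("50-74", "annual", 28.7, 21330, "i"), ("50-74", "biennial", 23.2, 11066, "E"),
    ("40-69", "annual", 28.5, 27428, "i"), ("40-69", "biennial", 21.7, 13831, "i"),
    ("45-69", "annual", 26.8, 22546, "i"), ("45-69", "biennial", 21.2, 11694, "i"),
    ("50-69", "annual", 23.8, 17737, "i"), ("50-69", "biennial", 18.3, 8947, "E"),
    ("55-69", "annual", 19.5, 13009, "i"), ("55-69", "biennial", 15.7, 6890, "E"),
    ("60-69", "annual", 14.3, 8438, "i"), ("60-69", "biennial", 11.0, 4263, "E"),
]


def dominates(a, b):
    """True if a gives at least b's benefit for no more mammograms,
    and is strictly better on at least one of the two measures."""
    at_least_as_good = a[2] >= b[2] and a[3] <= b[3]
    strictly_better = a[2] > b[2] or a[3] < b[3]
    return at_least_as_good and strictly_better


dominated = [s for s in STRATEGIES if any(dominates(o, s) for o in STRATEGIES)]
frontier = [s for s in STRATEGIES if s not in dominated]

for ages, interval, _, _, rating in dominated:
    print(f"dominated on averages: {ages} {interval} (Task Force rating: {rating})")
```

On these averages, the dominated strategies are the eight rated “i.” Separating “efficient” from “borderline” needs the per-model results, which the averaged table doesn’t contain.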

Seven of the eight “efficient” screening strategies turned out to be biennial and six of the eight initiated screening at age 50 or older. These two results contributed to the Task Force’s recommendations for biennial screening for women 50- to 74-years-old and against routine screening for women in their 40s.

The large differences between the number of screenings required and the small differences in reductions of rate of mortality between screening strategies make it difficult to see if one of the two dimensions (number of mammograms and reduction in mortality rate) trumped the other.

I’ve included in the rightmost column of the table above the order in which the “efficient” strategies were listed in the Task Force’s table from top (1st) to bottom (8th). The order from 1st to 8th is in exact reverse order to reduction-in-mortality rate and in perfect corresponding order with number of mammograms.

I also calculated two correlation coefficients, one between the rank, from 1 to 20, of each of the twenty strategies with its reduction-of-mortality rate and the other with number of mammograms each strategy required. The correlation (0.22) with predicted reduction in mortality was too low to be significant, indicating no association between the strategies’ rankings and their predicted reduction in mortality. The second correlation (0.43), with number of mammograms, was significant, indicating a positive association between the strategies’ rankings and the number of mammograms they required.
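
The ranks for all twenty strategies aren’t listed in this post, so the two coefficients above can’t be reproduced here. As a rough sketch of how such a coefficient is computed, the Python below applies Pearson’s formula to the eight “efficient” strategies, whose listed order appears in the table above. The function and variable names are illustrative, and the choice of Pearson’s coefficient is an assumption.

```python
from math import sqrt

# The eight "efficient" strategies in the order the Task Force listed them
# (1st to 8th), with their values from the table above.
LISTED_ORDER = [1, 2, 3, 4, 5, 6, 7, 8]
REDUCTION = [11.0, 15.7, 18.3, 23.2, 26.2, 28.5, 31.8, 39.5]
MAMMOGRAMS = [4263, 6890, 8947, 11066, 12366, 13837, 18708, 36550]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r_reduction = pearson(LISTED_ORDER, REDUCTION)
r_mammograms = pearson(LISTED_ORDER, MAMMOGRAMS)
print(f"order vs. reduction in mortality: {r_reduction:.2f}")
print(f"order vs. number of mammograms:   {r_mammograms:.2f}")
```

Because these eight strategies are ordered by both measures at once, both coefficients come out strongly positive. That is a different calculation from the twenty-strategy one described above and says nothing about which measure drove the full ranking.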

It appears the decision analysis as adapted by the Task Force did, in fact, favor the harms of screening (as measured by the number of mammograms required) over the benefits of screening (as measured by the reduction in rate of mortality). That’s not good, but neither is it necessarily that bad.

A woman who is recalled for additional screening and learns she doesn’t have breast cancer may experience several days of additional anxiety. A woman recalled for biopsy may experience greater anxiety for a longer period of time, pain and/or disfigurement. A woman whose screening culminates in a true positive who doesn’t have breast cancer (a biopsy error) or whose breast cancer will never be life-threatening may experience life-long anxiety, life-long discomfort of lymphedema caused in an arm by axillary lymph node extraction, recurring pain due to the severing of nerves during lumpectomies or mastectomies and many other side effects of treatment, all for no benefit whatsoever.

If we could assign weights to the harms, our estimates of screening’s harms would be better. But we can’t. The data don’t exist. And, since there’s no way to identify either those women whose biopsy results were wrong or those whose cancers will never become life-threatening, they will always appear to have benefited the most from screening, when in fact they are the most grievously harmed.

Unfortunately, number of mammograms is the only estimate of harm we have. We know the less often a woman is screened, the less she’s likely to be harmed. And, depending upon how one would weigh the possible harms if one could, a bias in favor of number of mammograms could be wrong---or right.

The Task Force’s recommendation against routine screening for women aged 40-49 is not biased. Neither is its position on women 75 years old and older, although I think, given the Consortium data, it may be too conservative.

If you follow the columns listing the number of mammograms and the reduction in rate of mortality from the top row of the table above down, you can see that the number of mammograms required for the screening strategies consistently decreases in step with the decreases in reduction of mortality rate until you reach the 40-69 years and 45-69 years annual and biennial screening strategies. The numbers of mammograms required for these strategies are out of line; they’re too high (see asterisks in the fourth column of the table above).

That’s not true, however, when the screening strategy groups women in their 40s with women in their 70s and 80s (see the four top rows of the table). It appears the benefits of screening for women over 70 may compensate for the extra mammograms needed for women in their 40s.

I compared the predicted reductions-in-mortality rate for the four screening strategies initiating screening beginning with women in their 40s and ending with women in their 70s and 80s to those beginning with women in their 50s (e.g., annual screening of 40-84 year-olds vs. annual screening of 50-84 year-old women, etc.). When 40-year-old women are included in a strategy, the probability of dying from breast cancer is 2.9% (about one fewer death per 1000 women screened). (Using a different analysis of the data, the Task Force concluded that “greater mortality reductions could be achieved by stopping at an older age than by initiating screening at an earlier age.”)
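
The “about one fewer death per 1000” figure can be retraced from the averaged table. The Python sketch below pairs each strategy that starts at 40 with its counterpart that starts at 50, averages the extra reduction in mortality rate, and converts it with the 3% baseline. The names are illustrative, and averaging the four pairs is an assumption about how the comparison was summarized.

```python
BASELINE_DEATHS_PER_1000 = 30.0  # 3% of unscreened women, per the models

# (stop age, interval): (% reduction starting at 40, % reduction starting at 50)
PAIRS = {
    ("84", "annual"): (39.5, 35.0),
    ("84", "biennial"): (31.8, 28.5),
    ("79", "annual"): (36.8, 32.7),
    ("79", "biennial"): (29.2, 26.2),
}

# Extra percentage points of mortality reduction gained by starting at 40.
extra_points = [from_40 - from_50 for from_40, from_50 in PAIRS.values()]
mean_extra_points = sum(extra_points) / len(extra_points)

# Convert the relative gain into deaths averted per 1000 women screened.
extra_deaths_averted = BASELINE_DEATHS_PER_1000 * mean_extra_points / 100

print(f"average extra reduction: {mean_extra_points:.1f} percentage points")
print(f"extra deaths averted per 1000 women: {extra_deaths_averted:.1f}")
```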

In general, the models’ predictions for women in their 40s, 70s and 80s confirm what the Consortium data suggest for women in these age groups: screening appears to be better for women in their 70s and 80s and worse for those in their 40s. The results indicate the models do, to some extent, accurately reflect the different realities for women in their 40s, 70s, and 80s with breast cancer.

Breast cancers diagnosed in 40-year-old women are more likely to be aggressive; the cells of their breast cancers are more likely to divide and proliferate more quickly; and, the women are more likely to die of their breast cancer no matter how soon their breast cancers are detected or treated. In contrast, breast cancers diagnosed in older women are more likely to be indolent; the cells of their breast cancers more likely to divide and proliferate more slowly; and the older the woman, the more likely her breast cancer will be indolent and her treatment successful.

The models made many assumptions about treatment. I’m going to discuss the two assumptions I know something about: (1) that premenopausal women with hormone-receptor-positive breast cancers would be treated with tamoxifen and postmenopausal women with hormone-receptor-positive breast cancers treated with an aromatase inhibitor and (2) that patients would be 100% compliant with these treatments.

Tamoxifen and aromatase inhibitors interfere with the growth and spread of breast cancer; tamoxifen by attaching itself to estrogen-receptor-positive breast cancer cells and acting as a barrier between the cells and estrogen; and aromatase inhibitors by stopping the action of aromatase, an enzyme needed to manufacture estrogen. Since most of the estrogen in premenopausal women is produced directly by their ovaries and is not dependent upon the enzyme, aromatase inhibitors can only interfere with the production of estrogen elsewhere, e.g. in women’s adrenal glands or bones.

The Food and Drug Administration approved aromatase inhibitors for postmenopausal women, but many doctors prescribe them “off-label” for premenopausal women. That means the women must either have their ovaries removed or take another potent drug to eliminate the estrogen being produced by their ovaries.

They become postmenopausal in a matter of weeks. Their hair thins. Their skin thins and dries out, often making sex painful and unpleasant. Some lose bone mass, a serious problem for women as young as 40. And many suffer from bone, muscle and/or joint pain. Some take the drug irregularly to deal with the side effects. Others stop completely.

The results of trials comparing tamoxifen to aromatase inhibitors show that aromatase inhibitors benefit some women by allowing them to live longer before their breast cancer recurs, but they don’t prolong their lives overall.

Neither of the models’ assumptions about the treatment of premenopausal women with hormone-receptor-positive breast cancer is realistic. Many premenopausal women are not treated with tamoxifen and neither the women nor their doctors are 100% compliant. Unrealistic assumptions about treatment undermine the reliability of the models’ absolute predictions of the reduction in rate of mortality. In this case, they may have contributed to their being too high.

That said, I appreciate the model builders’ taking treatment into account; very little data exists linking screening and treatment. It’s as if an impenetrable door exists between the two. For example, the Consortium data don’t follow up women who’ve been diagnosed with breast cancer. And, trials evaluating treatment don’t report if participants have been screened. In neither case do we know how screened patients fared. Assumptions made by models may be better than nothing. Actual data would be a lot better.

Rationing of Health Care

I think the Task Force’s biggest mistake was failing to adequately communicate the results of its research before announcing its new recommendations. That failure pretty much guaranteed its recommendations would be perceived by many as a crude attempt to ration health care. And, its subsequent retraction of the recommendation for women in their 40s probably reinforced that perception.

Although rationing screening would immediately ration treatment, not much, if any, money would be saved. Everyone with a cancer eventually shows up in a doctor’s office, a clinic or a hospital complaining of symptoms and needing treatment. If anything is rationed, it seems more reasonable, I think, to ration treatment directly.

Compared to the cost of treatment, the cost of screening is insignificant, insignificant enough that one hospital is willing to provide prostate screening for free. An article about Dr. Otis Brawley, medical director of the American Cancer Society, published in USA Today last January, related that the hospital’s marketing executive voluntarily told Dr. Brawley about how his (the marketing executive’s) hospital was providing “’free’ prostate screenings as a way to find patients for more lucrative radiation treatments, cancer surgeries, even incontinence therapy and impotence drugs.”

Finally, based on what I know: (1) if I were 40 and/or premenopausal, I wouldn’t be screened (in fact, I can hardly believe the Task Force retracted the one recommendation for which it had the best evidence); (2) if I were in my 70s or 80s, taking into account my age and how likely my breast cancer would be indolent, I would be screened maybe every 2 to 3 years and would make sure I was treated with the least aggressive treatment; and (3) if I were between 50 and 70 years old, I would consider being screened biennially in my 50s and annually in my 60s.

That’s based on what I know, but it’s not enough. I would like to know more. For instance, I would like to know how many DCIS or invasive breast cancers are not life-threatening and when they’re likely to occur. That would mean we would have to know enough about breast cancers to identify those that are not life-threatening. Unfortunately, I think the path which much of breast cancer research appears to be on is not likely to lead to that information.

My next post will be about how the marketing of medicine cultivates our ignorance, takes advantage of our trust and misdirects cancer research.

p.s. I learned about aromatase inhibitors when I volunteered to help analyze the data of a study on their side effects. Two reports were published online by Breast Cancer Action, a non-profit advocacy group. If you’re interested, they can be found at http://bcaction.org/wp-content/uploads/2011/11/bca_ai_report_jan_23-indd.pdf and http://archive.bcaction.org/uploads/PDF/AIReport.pdf.

Thursday, May 31, 2012

Task Force Recommendations: Screening and Age
