Sunday, November 17, 2013

Parachute falsification, the FDA and non-science

I seem to have annoyed some statistics fundamentalists with the last statistics post. So let me spell it out from first principles...

The current gold standard in epistemology is Karl Popper's view that all knowledge is provisional - and you are entitled to believe your educated guess so long as it agrees with experiment. However, when someone does an experiment that contradicts your "knowledge", you are no longer entitled to your educated guess. Your educated guess goes into the book of failed theories.

To continue to believe your educated guess is non-scientific.

Popperism applies not only to your theories but to your meta-theories - your theories as to how to test knowledge.

A simple example suffices. People - good, clever people - for centuries believed that the only way to determine whether something was right was to look in the Bible. If the Bible supported it, then it was right.

Nowadays we have developed several well-tested theories which are in direct contradiction to the Bible (see deep time in geology and evolution for the best examples).

Not only is the creation myth in Genesis falsified but so is the meta-theory that the appropriate test for knowledge is that it is printed in the Bible.

The FDA's meta-theory

During the FDA panel hearing for Lemtrada (an MS drug under development from Genzyme/Sanofi), the panel voted no on the proposition that the trials are adequate and well-controlled, and yes on the proposition that applicant provided substantial evidence of effectiveness of alemtuzumab [Lemtrada] for the treatment of patients with relapsing forms of MS.

The FDA staff asserted that you could not possibly vote no on whether the test was adequate and well-controlled and yes on effectiveness.

In doing this the FDA staff were asserting a meta-theory - the theory that the only test of effectiveness is an adequate and well-controlled test.

I observe that "adequate and well-controlled" is something defined in legislation (see here). And parachutes, as a device to prevent trauma and death to people who jump out of planes, have never been subjected to an adequate and well-controlled test as defined in the legislation.

Yet we know that parachutes are effective despite the absence of an adequate and well-controlled test.

In other words, we have a contradiction to the meta-theory. The meta-theory demands an adequate and well-controlled test as the only method of knowledge, and the parachute example is a direct counter-example.

The FDA's meta-theory has been contradicted.

However, much to my surprise, and several years after the publication of the famous parachute paper, the FDA staff (and some of my blog readers) still assert their meta-theory.

I now place the FDA staff in the pseudo-science camp: those who, like their fellow-traveller creation scientists, hold to their meta-theory in the face of direct falsification.

Hopeless. Stupid too.





John

28 comments:

Julius Poh said...

Just wondering... Was it actually "FDA staff" or just some FDA guy/spokesman's offhand comment?

Josh Silverman said...

The requirement of well-controlled trials for a finding of substantial evidence of efficacy does not come from "a meta-theory of FDA staff."

It comes from the statute itself. Section 505(d) of the Food, Drug and Cosmetic Act, 21 U.S.C. 355(d), provides:

"the term 'substantial evidence' means evidence consisting of adequate and well-controlled investigations, including clinical investigations, by experts qualified by scientific training and experience to evaluate the effectiveness of the drug involved, on the basis of which it could fairly and responsibly be concluded by such experts that the drug will have the effect it purports or is represented to have under the conditions of use prescribed, recommended, or suggested in the labeling or proposed labeling thereof."

http://www.gpo.gov/fdsys/pkg/USCODE-2010-title21/html/USCODE-2010-title21-chap9-subchapV-partA-sec355.htm

There are certainly reasons to argue that the bar for approval should be set lower (or higher), but that's where it currently is in the U.S.

Nick said...

Philosophically that's all fine, but to play devil's advocate here: are you sure the FDA is making a logical claim rather than a process claim? Because if I were at the FDA, I might be willing to ignore common sense if it bought me a procedure that weeds out bad science. There are a good number of poorly designed studies, and often the people running them have an interest in seeing them work.

pelicans said...

Could you quote where the FDA universally claims 'that the only test of effectiveness is an adequate and well-controlled test'? Their arguments seem to support this claim specifically in the context of approval of alemtuzumab for multiple sclerosis.

I think you're making a straw-man argument here. As far as I could tell, none of the respondents in the previous post were arguing that double-blinded RCTs were the only valid source of knowledge available to humans either. Pharmacology is complicated and often counter-intuitive and difficult to test, and that is why it seems reasonable for the FDA to require a higher level of evidence for drug approval than we as a society do for the use of parachutes.

Can you agree that there could exist a world where, despite agreeing with your view of epistemology (as they certainly do, as does pretty much everyone), the FDA might still not believe that a poorly conducted clinical trial is sufficient evidence to justify approval of a drug for a specific condition?

Claude said...

Dear John,

I had the same idea about the FDA advisory panel, when I listened to their votes and deliberations on another drug. They have their bureaucrat hats on, not their doctor hats.

Thanks for teaching me the phrase "parachute falsification." Off to find the paper you mention...

John Hempton said...

There is a requirement in the legislation for full double-blind tests, but it waives those requirements when the test is functionally impossible (this is an example), where the results are obvious (the legislation mentions anaesthetics - and I suggest you try being the control group in a trial of anaesthetics for surgery), or where the placebo results in known death/disability (I guess MS is a good example).

The legislation is far less ideological about this than FDA staff.

FDA staff asked ideological and loaded questions of the panel. They also asked questions that betrayed an underlying non-science in their approach.

J

Fred Schwed said...

You may be correct that the FDA is using the wrong techniques to determine whether Lemtrada is effective. But your goal as an investor is to predict the outcome of their process, not to critique its efficacy. This post smacks of the same type of sour grapes that we've seen from the goldbugs when inflation failed to materialize.

CHT said...

I'm very much Talebian in this regard: for matters of (potentially) large consequences, which are concave, the burden of proof is on the science to prove it is not harmful. He covers this in Antifragile.
John, this was a brave article for a number of reasons, well done!

Martin Barry said...

John, I think the main issue is that the first question seems to have been phrased as "Was it a well-controlled trial?" which the panel could only honestly answer "no".

The FDA officials should really have asked "Did the trial meet the standards set by the legislation with any relevant exemptions applied?" to which the panel could have answered "yes".

The FDA's reaction to the second question's vote would only make sense if they had asked that revised first question and got a "no" from the panel.

Seems designing good questions to vote on is just as hard as designing good trials. :-P

Anonymous said...

John,

As much as I respect you, I still believe you're wrong on this, and the parachute is apples and oranges.

Firstly, the fact that there's no double-blind test on parachutes is irrelevant. Given that we can perfectly predict what happens with parachutes, we know what the test would show. We could (if someone could actually design it) carry it out, but then we could carry out a DBT for just about anything, including whether the sun comes up in the west. It's like requiring a DBT for 1+1=2. We don't do that, because following the basic axioms of an agreed system, that's the result you get (and the only result you can get). It would be pointless.

Then there's a point which I'd argue is even more important. A DBT is not what makes something an experiment (it sits over and above the experiment). It is designed to remove a _bias_ that _could_ affect the experiment. It's about controlling a variable. What bias would you like removed in the parachute experiment? Do you disagree that a medical experiment may contain a bias? Do you disagree that bias/placebo effect is a variable that, in a medical experiment, should be controlled for?

If I say I can't rule out bias in the test, then I cannot say that the experiment was a success, because I failed to control a vital variable. A DBT is ultimately about controlling that variable (and controlling it is irrelevant for a parachute test, because we can perfectly predict the outcome and thus control for all relevant variables directly). The best I can say is that on a sample with not all variables controlled, the test succeeded, and thus it's _possible_ that the drug (in this case) is effective. When we say "yes, it's effective, but no, the experiments didn't control for a variable we know can seriously affect the outcome, and which has been known to show 'effectiveness' of ineffective treatments", then you are contradicting yourself.

dearieme said...

"the parachute is apples and oranges": a friend of mine argues that it's perfectly reasonable to compare apples and oranges; it's comparing apples and Tuesday that he frowns on.

Anonymous said...

John, I was 100% on your side until you wrote your creation example. Creation is rejected not because of any empirical evidence but because it is inherently unfalsifiable. The core element of falsifiability is the assumption of space-time universalism. Because of this, NO creation theory (including the high school Big Bang Theory, where space-time emerges from a point at t=0) is scientific. Any creation theory suffers from two inherently unfalsifiable problems. At t=0, there is a space-time inconsistency that is not falsifiable in this universe. Further, at t=1, there is the problem of the location of all matter/energy and the exact laws of nature. This is the multiverse problem.

This is not terribly controversial and physicists like Hawking have recognized this and so have tried to create an infinite time universe to get around the philosophical problem.

Some philosophers have tried to argue that science has "proven" universalism. This is very tricky, and you end up with a circular reasoning problem, because there is no science without the assumption of universalism. Even within the most general form of your Popper framework, I can always offer a creationist theory that is completely consistent with a non-creationist theory, simply by saying that everything was started at some time z and thenceforth there was space-time invariance. The two worlds would be empirically identical. You would only reject the created one because of your assumption of space-time invariance.

White Picket Fence said...

In the US military we do control for a variable that John might not be considering. The packers periodically jump a randomly selected chute to keep them honest and focused. I'm not sure John is right on the merits here.

Jon S. said...

If they are following a principle of "first, do no harm" the distinction you draw is pedantic: maybe technically correct but practically useless.

FDA staff guy says don't throw patients out of airplanes.

Anonymous said...

Dear John,

Besides the other insightful comments on possible flaws in your admittedly nice logic (especially that this is more of a procedural case than a logical one), let me add another.

The FDA does not necessarily claim simply that "the only test of effectiveness is an adequate and well-controlled test."

Given the cost of error, they might actually be claiming that "the only test of effectiveness they are willing to accept as a reasonable guarantee against further complications is an adequate and well-controlled test."

In other words, while the state of present knowledge might be binary, there are more grades to expectations. One could reasonably claim that one's trust in the durability of knowledge gained through one specific, strict method is higher than in knowledge gained otherwise, and that in mission-critical situations (such as approving medicines) one will demand this grade of certainty, and no less.

Regards,
Dmitry.

Michael Nau said...

Interesting post, very thought-provoking. As a social scientist, I'd like to add that some of the commenters skeptical of your argument have an unnecessarily narrow and stereotyped view of the scientific method.

Causality is a very tough thing to prove, and folks like Popper would say that it is essentially impossible. The trick is designing tests that are "rigorous enough" to provide support for a claim unless/until it is falsified. For example, in my field, we cannot have double-blind random treatment assignments to marriage, childbirth or college, but we can still devise clever ways with statistics to estimate the risk of an event occurring or the "effects" of such events.

When ethics or other considerations apply (as in the case of MS treatments), it reminds me of what we social scientists have been dealing with all along. Just because it doesn't look like what applied physicists do does not mean that it is not science. The scientific method needs to be adapted to the object of study. Obviously, when people are the object of study, we need to be methodologically flexible and creative.

John Hempton said...

It's blatantly obvious that double-blind trials can't be designed for lots of things - try anaesthetics for operations versus, say, no anaesthetic.

It's also obvious when some things work (see anaesthetics).

The panel was full of MS specialists. The FDA staff full of statisticians.

These were people from different worlds. The statistics paper was statistics-ideology driven - as if statisticians have the only valid method. The question that the FDA staff asked of the panel was (a) loaded, and (b) incompetent.

My view: the staff view at the FDA in this case is worthless because this was a drug for which (i) double blind was impossible and (ii) the staff got hung up on that point.

J

Gabe said...

The staff question was a long time coming. Genzyme/Sanofi failed to persuade FDA that the clinical trial of Lemtrada should not, or could not, include a DBT -- over many years and on many occasions, e.g.:

--the FDA-Genzyme conference call of Sept. 13, 2005 ("necessary to provide strong double-blind evidence of a treatment effect");

--the FDA-Genzyme meeting of November 21, 2006 ("totally blinded study is more likely to be found persuasive");

--the protocols submitted on March 16 and 21, 2007 (to which FDA responded it "strongly recommends that you use a double-dummy placebo...");

--the March 17, 2010 meeting ("Blinding procedures were discussed in detail... unblinding of physicians and patients remains a significant problem..."); and

--the Jan. 24, 2011 meeting with FDA ("the lack of double blinding has consistently concerned us. The lack of blinding remains a major concern.").

Genzyme/Sanofi made John's argument, e.g. on November 21, 2006: "the sponsor was hesitant to have a treatment arm with ineffective therapy, such as a placebo..." But FDA responded directly: "FDA again noted that they prefer double-blinded, controlled studies, especially for the pivotal trials."

John Marler's clinical review in the CDER report (starting at p. 13) gives all the details.

If Genzyme/Sanofi's argument has not worked yet, I doubt it will prevail between now and March 2014. Doubling down on the CVRs could potentially yield multiples, but there is a reason for that.

buffettinvestor said...

Dear John,

By researching the Alemtuzumab case, it is evident that:
a) Alemtuzumab is a highly effective treatment, comparable to Natalizumab;
b) it has a different profile of serious side effects: thyroid disorders (~20%) and ITP (1-3%), vs. PML (1-3%) for Natalizumab.

I am trying to obtain statistics on past cases where the FDA and EMA reached differing opinions. So if you have any suggestions supporting the belief that the FDA statisticians will be overridden by the MS specialists, I would be grateful.

Kindly SaS

Anonymous said...

The statute that you link to basically requires the following for an “adequate, well-controlled” study:

1. A clear statement of the objectives;
2. A valid comparison with a control;
3. Adequate assurance that the subjects have the disease or condition;
4. Assignment of subjects that minimizes bias;
5. Minimize bias on the part of subjects, observers, and analysts;
6. The methods of assessment are well-defined and reliable; and
7. Adequate analysis.

I suspect that your complaint about the parachute test rests solely on the idea that there can’t be a valid comparison to a control because we don’t want to throw 50% of the subjects from an aircraft.

However, why don’t you think that an “historical control” is sufficient?

“(v) Historical control. The results of treatment with the test drug are compared with experience historically derived from the adequately documented natural history of the disease or condition, or from the results of active treatment, in comparable patients or populations. Because historical control populations usually cannot be as well assessed with respect to pertinent variables as can concurrent control populations, historical control designs are usually reserved for special circumstances. Examples include studies of diseases with high and predictable mortality (for example, certain malignancies) and studies in which the effect of the drug is self-evident (general anesthetics, drug metabolism).”

I suspect that “people falling from aircraft” present just the type of special circumstances where the control population can, in fact, be well assessed – especially after WWI, no?

DCE in NYC

Anonymous said...

Did anyone catch, in the comments that followed the vote on Question 1, that 3 of the NO voters agreed the trial was adequate but did not feel it was also well-controlled? Said another way, had the question been "was the trial adequate?", the vote would have been at least 9-8. If you have an adequate trial for a terrible disease that shows effectiveness not against a placebo but against a highly effective active control, shouldn't the drug be allowed?

gv said...

I feel uncomfortable with your line of reasoning, though not with your conclusion, which I share for different reasons.
Analogies usually oversimplify reality, and that is why they are so popular. On top of that, irony is used to shield the simplification from criticism. Let's continue the analogy.
We have the test, though not well-controlled. Human failure, negligence and foolishness provide us with the placebo or a drug/chute that doesn't work. And the statistics are overwhelming.
Or do I need to continue the silliness of the analogy and say that a chute that opens half compares with the psychological effect of the placebo?

There's a lot more. Briefly: today Popper's theory has led to an arrogance of knowing, while it is equally important to realize there's still a lot we don't know (in physics we could capture the effect of the chute in a solid formula, I guess, but do I want to take a drug based on provisional knowledge alone?); and there is the question of who bears the responsibility (the para makes his own choice, while the doctor/FDA agent must decide the fate of the patients), ...

John Hempton said...

Historical control is not adequate. It is almost certain that a large proportion of people who jumped from airplanes without a parachute were suicidal.

Suicidal people have a high death rate.

[I know it's an old joke about correlation and causality - but in this case the historic control is OK...]

--

Also as a counterfactual I reckon more people would have died as a result of parachute accidents than non-suicidal people who jumped from planes.

J

Doug Friedman said...

John,

Thank you for sharing all of your thoughts on Lemtrada. I have been following the situation as well and had a few thoughts / questions for you.

I agree with your points regarding the ridiculousness of requiring a double blind study for patients where death/disability is a likely outcome. A placebo in such an event is clearly unethical. However, my concern is not about what is the correct decision, but what decision the FDA will actually make.

Do you have any insight into how the FDA makes the final decision? Are Dr. Mentari, Dr. Marler and Dr. Yan involved in the decision, or does their involvement end with the background report? Do you have a sense of the background of the individuals who make these decisions (generally more statisticians or doctors)? Why would the FDA be inclined to come to the logical decision, as opposed to agreeing with their staff's background report? I would be extremely grateful to hear your thinking along any of these lines of questioning.

Secondly, as you know FDA approval is not the only payout attached to the right. Based on the Sanofi MS presentation from October, Lemtrada will be launching in 10 European countries and Argentina over the coming year. Using www.atlasofms.org that amounts to a bit over 500k MS patients.

From an article (http://www.cbgnetwork.org/5275.html) I estimated pricing at ~€53,000 for the first treatment and ~€32,000 for the second treatment assuming dosages of 60mg and 36mg per the trials (this is ~$72,000 and ~$43,000 converted to USD, respectively). Simply applying these prices to the sales target means Lemtrada would need to reach ~1.1% of the total MS population in these countries. Now it obviously isn’t this simple – there are questions of up-take in the market, coverage, inventory stocking, pricing will invariably differ by country and how many of those patients are currently being successfully treated on other drugs are just some of the complicating factors. However, it is worth noting that Lemtrada is considered a first-line therapy in the EU, which even if approved in the US will not be the case. In short John, do you have a sense of how to properly handicap the probability of Lemtrada hitting the $400mm milestone without FDA approval?

Doug

Anonymous said...

Re: historical control

Given the tone-free medium, I may be missing just how firmly your tongue is planted in your cheek, but I'd note that somehow "falling from an aircraft" morphed into "jumped from an aircraft."

We don't really need intentionality here, right?

On the more substantive issue, I agree with you that the presumption that the second condition cannot be true given the falsity of the first is likely incorrect.

However, I also think that the FDA can't easily modify the interpretation of "adequate and well-controlled," so they aren't free to be correct in the way you posit.

All rules are both under- and over-inclusive, but that doesn't make them poor tools. As George Box put it, "all models are wrong, but some are useful."

DCE in NYC

John Hempton said...

I think we do need to exclude suicidal people as creating bias.

Suicidal tendency is correlated with jumping out of airplanes.

Suicide is correlated with death.

We need to exclude that correlation for a test to be well-controlled.

So historical examples are not good.

We need a true double-blind test.

---

Now yes I am teasing the double-blind fundamentalists. But then I think what they argue (and I am including the FDA staff here) is unscientific nonsense. I class it with the creationists...

J

Anonymous said...

Rebif was approved in 2002 with two trials. One, the EVIDENCE trial, was open-label and rater-blinded, precisely how Lemtrada was tested. The FDA was so enamored with this trial in 2002 that it included this in Rebif's label:

“Patients treated with Rebif 44 mcg [micrograms] sc [delivered subcutaneously] tiw [3 times per week] were more likely to remain relapse-free at 24 and 48 weeks than were patients treated with Avonex 30 mcg im [delivered intramuscularly] qw [once per week].”

How does the FDA justify including the data from an open-label trial for Rebif, while completely disregarding the validity of the data from an open-label trial for Lemtrada?

In the first trial for Rebif, 200 patients injected themselves 225 times over 18 months with inactive placebo. 225 fake shots in the butt or thigh. Is that science, or some type of tactic taken from WWII Germany?


General disclaimer

The content contained in this blog represents the opinions of Mr. Hempton. You should assume Mr. Hempton and his affiliates have positions in the securities discussed in this blog, and such beneficial ownership can create a conflict of interest regarding the objectivity of this blog. Statements in the blog are not guarantees of future performance and are subject to certain risks, uncertainties and other factors. Certain information in this blog concerning economic trends and performance is based on or derived from information provided by third-party sources. Mr. Hempton does not guarantee the accuracy of such information and has not independently verified the accuracy or completeness of such information or the assumptions on which such information is based. Such information may change after it is posted and Mr. Hempton is not obligated to, and may not, update it. The commentary in this blog in no way constitutes a solicitation of business, an offer of a security or a solicitation to purchase a security, or investment advice. In fact, it should not be relied upon in making investment decisions, ever. It is intended solely for the entertainment of the reader, and the author. In particular this blog is not directed for investment purposes at US Persons.