Raw Dogs

December 1, 2009

Andrew Revkin has a recent post addressing the climate data set from CRU and the FOIA request to access the raw data that CRU apparently now claims to have discarded. To facilitate the discourse and cut speculation off at the pass, Real Climate has helpfully provided a set of links to publicly available data sets. Not outrageously, noting the disappearance of the raw data, some in the contrarian community are concerned that manipulation happened sometime between the collection of the raw data and the “cleaning up” of that data.

This whole discussion reminds me of another recent flap in which batshit crazy dentist-lawyers were pounding down the doors of Hawaii’s state offices to obtain an actual long-form birth certificate from the current occupant of the White House. If you recall, the question in that case was whether a proxy certificate (issued and validated by official state institutions) should “count” as evidence of Obama’s live birth in Hawaii. Taitz and her entourage were concerned about monkey-business in the Statehouse, insistent that the only acceptable validation of Obama’s US citizenship would be the yellowed and coffee-stained document that was signed on the day of Obama’s birth by the attending physician. Anything else was just a proxy, open to manipulation.

As a result, some people called for Obama simply to release his actual long-form birth certificate and be done with the politicized controversy. The White House responded that they didn’t actually have a long-form birth certificate. Alas, this was a sure indication that Obama could not be an American citizen. Why else would the Obamas and the State of Hawaii bury such an important document? Every decent American citizen, as we know, is a meticulous file-keeper.

Likewise, friend and colleague Roger Pielke Jr. argues on political grounds that CRU should’ve saved the raw data. They’d then be in a position to release it later, if there were any questions about the research. And hey, whaddoyaknow, there are questions about the research!

It’s probably true that as a matter of political expediency, it would’ve been better for CRU to release the raw data than to have to issue a statement that they don’t have it. Looks and smells fishy to the skeptical folk.

Unlike Obama, who I think oughtn’t release his long-form birth certificate on constitutional grounds — it sets a bad precedent if he capitulates to political pressure to release this document — maybe it should be the standard that all collectors of data hold on to their original notes. Ought all data to be saved?

Good question, let’s explore…

Ought all ethnographers hold on to their original documentation? Ought all cardiologists hold on to their EKGs from ten years ago, for all of their patients, in all circumstances, no matter what? Ought all journalists keep their notepads forever into the future, or can they rely, after a time, on their own translation of their notes? Ought all professors keep raw data from their students, in the form of the actual papers with the actual comments, as evidence of the grades that we offer, or can we just rely on our gradebooks?

Maybe so. Certainly, to avoid political liability, to cover their asses, it would make sense to do so. The more data we have, the easier it is to make our case. But sometimes, we can’t hold all that data. Sometimes it is simply a matter of space, particularly when you’re talking about thousands of weather stations collecting analog data. That’s probably less true in recent years, where we can store data digitally, forever and ever. But if the data is stored digitally, then it is certainly not unreasonable to want the analog stuff as well, to get as close to the point of instrumentation as possible.

Moreover, huge amounts of data are a problem organizationally. How the hell do we keep it straight? The organizational decisions we make about storage of the data — where to file it, for instance — are themselves alterations of the data.

In this case, to be sure, if the data were available, it would at least set this particular political issue to rest. The problem is that there will always be lingering questions about the authenticity and rawness of data, no matter the field. There are always questions about the reliability of instrumentation, the software used to interpret the data, the techniques of translation, and the methods of collection. This is one of the reasons that people like Karl Popper rejected verificationism (Hi Smokey!), and it is why Pierre Duhem’s holism poses a problem for Popper’s own falsificationism. In principle, all data, even the most reliable, is only ever a proxy for what’s happening in the world.

It seems to me, therefore, that in principle (a point that does not address the political concern Roger raises) we ought to reject the call for the original raw data.

Yep. You heard that right. I think we basically need to trust the cleaned-up CRU data. For one thing, the actual raw data doesn’t exist. That’s the best we’ve got. That’s all there is. That’s all we’re going to be able to get. But that’s not my point. The more important point is that data is always questionable, even the rawest of raw data, so there will always be layers of questions about the data anyway.

So there.

A somewhat tangential observation, of course, is that the 1000 or so e-mails released in the CRU hack are themselves a cleaned-up selection from what must’ve been an original data set of hundreds of thousands of e-mails. They too have been filtered and culled by a third party, a party clearly aware of and plugged into the political context. Should this data also not be trusted?



  1. keep spinning, Ben

    meanwhile, in the real world

    UK climate scientist to temporarily step down

    LONDON (AP) — Britain’s University of East Anglia says the director of its prestigious Climatic Research Unit is stepping down pending an investigation into allegations that he overstated the case for man-made climate change.

    The university says Phil Jones will relinquish his position until the completion of an independent review into allegations that he worked to alter the way in which global temperature data was presented.

    • That’s the politics speaking. I’m not spinning anything. Just calling it like I see it.

      • Ben – you’ve been spinning the CRU scandal on a consistent basis, quick to claim there’s nothing there and when things are pointed out you try to excuse it on one basis or another. Is your flavor of “environmental ethics” only about how humans should treat the planet and nothing to do with the ethics of the people involved?

      • sure Ben, you’ve been the paragon of objectivity

        the weak spin you’re offering to excuse the emails is irrelevant, Jones and Mann are both being investigated, I guess the two Universities involved had a bit more integrity.

        I doubt you are fluent in Fortran; even if you were, you’d be unable to spin the HARRY_READ_ME file

        It was included by the mysterious FOIA for a reason

        That reminds me, why no word on the “Police investigation” into the “cyber terrorism theft” of the emails?

        I wonder if they already know who FOIA is? This is going to get real tricky for the spin meisters.

      • It’s good that Penn State and CRU are starting investigations, but we don’t know what they will result in. The emails by Jones seem to be the most damning, and if he actually did what he said and suggested, there probably will be repercussions. But I expect that most of the investigations will clear the people involved, including Mann.

        There just isn’t as much there, there, as some people are hoping and praying for.

      • read this carefully Dean

        Phil Jones wrote:
        Can you delete any emails you may have had with Keith re AR4? Keith will do likewise. He’s not in at the moment – minor family crisis.
        Can you also email Gene and get him to do the same? I don’t have his new email address.
        We will be getting Caspar to do likewise.
        I see that CA claim they discovered the 1945 problem in the Nature paper!!
        Michael Mann replied:
        Hi Phil,
        laughable that CA would claim to have discovered the problem. They would have run off to the Wall Street Journal for an exclusive were that to have been true.
        I’ll contact Gene about this ASAP. His new email is: generwahl@xxxxxxxxx.xxx
        talk to you later,

        notice how Mann just agrees with Jones, yep I’ll contact Gene ASAP!! Here’s his new address!!

        why do you think Mann made his only response to this at Romm’s site, where he could be sure there wouldn’t be any inconvenient comments. Heh Heh

        both of them are toast

      • Icecore,

        It is refreshing to hear someone who is interested in evidence and not hearsay or conspiracy. So, with that in mind:

        A) Correct me if I am wrong, but isn’t Romm copying and pasting Mann’s response?

        B) Correct me if I am wrong, but there is no reason given as to why the emails should be deleted. If there is not wrongdoing mentioned in the email, then how do we know that the email was to be deleted because of a wrongdoing? I am looking for evidence here and there isn’t any. Occasionally, I ask colleagues to delete emails because the subject is sensitive and to be protected (say discussions of job candidates or tenure files). Deleting emails isn’t necessarily wrong (though it may be imprudent – it causes suspicion for example).


      • I’m with Jay. It’s not spinning. I’m looking for anything that will either exculpate or incriminate a given figure. If it’s not there, I’ll tell you, just as I wrote a whole post telling Mann what he might do if he wants to relieve himself of that burden. If you don’t like what I’m saying, that doesn’t make it spin. That’s your problem.

      • > Correct me if I am wrong, but there is no
        > reason given as to why the emails should
        > be deleted. If there is not wrongdoing
        > mentioned in the email, then how do we
        > know that the email was to be deleted
        > because of a wrongdoing?

        Regarding the deleting issue that ice core mentions, Phil Jones was copied on a May 27th E-mail from Tim Osborn to Caspar Ammann stating that UEA had received a FOIA request and inquiring whether Ammann had E-mails or other documents sent to UEA regarding the IPCC AR4 assessment. Two days later Jones E-mailed Mann with the subject heading “IPCC & FOI” asking Mann to delete E-mails he had with Keith Briffa regarding IPCC AR4. This is the context, and it shows Jones knew of the active FOIA request and coordinated the deletion of the requested E-mails specifically relating to AR4 among Mann, Briffa, Ammann and Gene Wahl with FOIA in mind. Jones is clearly in the wrong here.

        The E-mails and some discussion of it is in this thread:


        where you will see Ben working hard to come up with a reason to exculpate Jones while failing to acknowledge anything incriminating. Shhh… he thinks he’s unbiased in this matter.

      • I’m not working that hard. Those are plausible reasons. I’ve already admitted that the matter of deletion of e-mails with regard to a FOIA request is one on which we need to know more. This thread is about ousting people in peer review. Tell me again what is incriminating or problematic about saying that one will go through official channels to oust a given editor?

      • JimR,

        Thanks for the context; I hadn’t read that thread. Here is my hypothesis. David Holland is a mathematician who models (roughly) the Greenland Ice Sheet.


        However, he has written on “bias and concealment” in the IPCC in 2007.


        Thus, Jones and Mann knew this and wanted to delete emails to obstruct Holland’s FOIA request. This much I think is evident. However, without knowing the content of the emails (possibly) deleted, nothing more can be said given the evidence. Specifically I don’t think we can infer they were hiding something. Still, I am now inclined to agree that at least Jones and maybe Mann acted inappropriately.


      • Not working that hard? Ben, you’ve been quick to come up with reasons to excuse these scientists (seemingly before you had even read what was in them suggesting “not much there”). When incriminating E-mails are pointed out your strongest response is “much more compelling case”, yet later you posit Jones may be in the clear if he had “bad legal advice”! Now the most you will say about incriminating evidence is “we need to know more”. You’ll forgive me if that doesn’t equate in my mind to someone “looking for anything that will either exculpate or incriminate a given figure”? Yes to the former, demonstrably NO to the latter.

        > This thread is about ousting people
        > in peer review.

        Is it? Are you sure?

        Looks to me like the head post is about releasing data where you strike a blow against transparency in climate science by saying we should just trust the cleaned up CRU data. No surprise there.

        Try reading the Harry log file (ask a programmer for help if you need to) and then say we should just trust CRU data, cleaned up or not. And from what the very agitated Harry has to say, I’d have to wonder if it can ever be cleaned up.

      • Jay – I’m very curious if the content of those E-mails will come out. Most corporations and Universities archive E-mails so if they were deleted on the professors computer the Universities involved may still have them. Jones didn’t get to his position by being stupid so one would think there was something incriminating in them for Jones to violate FOIA.

        Two of the scientists mentioned in the E-mail deletions were Caspar Ammann and Gene Wahl, who were in the IPCC AR4. Bishop Hill wrote an excellent post on the Ammann and Wahl paper and what it took to get it into IPCC AR4 simply to support Mann’s hockey stick.

        It’s an interesting story.


      • JimR,

        I will read the link you sent in the morning. There may be something incriminating there, but as I say, we just don’t know yet. However, what we do know is that Holland would have been someone they didn’t like and would have tried to stifle for poorly chosen reasons. Here is an actual link to the paper of interest:


        I haven’t had a chance to do more than skim it but you can see why they would have been upset with him.


      • This thread here, Jim.

      • > This thread here, Jim.

        Ben, either you are confused, tried to link to another thread and forgot the link or are trying to change the subject in light of your bias showing. You seem to make as many logical errors as I make typos! 😉

        Jay – thanks for the link.

      • Yep. I was confused. Sorry about that. I thought I was commenting in this thread:


  2. oops sorry, just meant to post the last 2 paragraphs

    [Fixed — Ben]

  3. Howdy Ben,

    A couple of other thoughts:

    A) The original data was gotten rid of well before Jones, Mann, etc. appeared at CRU. Thus, we should recognize that they are not responsible for the discarded data. My suspicion is that people didn’t realize the original, non-normalized data would be requested.

    B) RealClimate has linked the data that has been in existence and available for a long while and they are just making it more easily available.

    C) There are several major climate research centers which house long-term data of which CRU is one – GISS is another example and they are relatively independent of each other and at the end of each year compare yearly statistics. So, even if you are suspicious of CRU, you need some argument to worry about GISS, etc.


  4. As a rule of thumb, when a paper is published, all of the data and the programming which processed the data used in the publication should be archived.

    That is what GISS finally did, and it did enable the correction of several errors, which improves the quality of the data and its reliability.

    CRU advertises their data as value-enhanced data (or words to that effect) – and yet today cannot validate how its value was enhanced.

    That is a problem for them.

    Any time actual observations are altered, the original data should be kept, so others can replicate the changes made and determine whether they actually enhanced the value of the data or diminished it.

  5. You talk as if I didn’t mention that the value was enhanced. It’s true that it’s probably better to have the raw data than the enhanced data, but my point was that all data is enhanced in some way, even the rawest of raw data. We’re always viewing the data through the lens of whatever instrument we’re using. Simply because there are enhancements doesn’t mean that the data is tainted or manipulated or bad or unusable. It’s data. That’s what we have. That’s what we have to go with.

  6. A quick point. When there is raw data at a center like GISS, they can corroborate the adjusted data against the raw and see if there are differences, as RickA notes. However, we can do the same between GISS and CRU. If the raw GISS data is significantly different from the adjusted CRU data, we have a way of independently evaluating the latter.

    This is powerful because each climate center’s data differ but usually not very much and so differences will stand out.


  7. One thing I haven’t seen discussed much is that if these folks were so inclined to delete emails, why were these emails available? Why would they delete others and not these?

    People often discuss doing things in moments of frustration that when it comes down to hitting the button, they may decline to do. I wouldn’t be surprised if the investigations now being started find that nothing was actually deleted.

    That still leaves the issue of peer review. But there as well, what pressures or actions were actually taken?

  8. […] this relates to claims like mine from yesterday, asserting that, effectively, all data is […]

