Sunday, January 25, 2015

Trends, breakpoints and derivatives - part 2

In part 1, I discussed how trends work as a derivative estimate for noisy data. They give the minimum variance estimator for a prescribed number of data points, but leave quite a lot of high frequency noise, which can cause confusion. I also gave some of the Savitzky-style theory for calculating derivative operators, and introduced the Welch taper, which I'll use for better smoothing. I've chosen Welch (a parabola) because it is simple, about as good as any, and arises naturally when integrating (summing) the trend coefficient by parts.
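As a rough illustration of that last remark (my own sketch, not the code behind the post), the OLS trend over n equally spaced points is a weighted sum of the data with weights linear in time. Summing by parts turns it into a weighted sum of the first differences, and the new weights form a parabola. The function names here are made up for the example, and unit time spacing is assumed.

```python
import numpy as np

def trend_weights(n):
    """OLS slope weights c_i, so that slope = sum(c_i * y_i) (unit spacing assumed)."""
    i = np.arange(n, dtype=float)
    d = i - i.mean()
    return d / np.sum(d * d)

def parabolic_diff_weights(n):
    """Weights W_j on first differences y[j+1]-y[j], from summation by parts."""
    c = trend_weights(n)
    # Because sum(c) = 0, W_j = -(c_0 + ... + c_j) reproduces the same slope.
    return -np.cumsum(c)[:-1]

n = 11
rng = np.random.default_rng(0)
y = rng.normal(size=n)                      # arbitrary test data

slope_direct = trend_weights(n) @ y
slope_by_parts = parabolic_diff_weights(n) @ np.diff(y)
```

The two slopes agree to rounding, and the difference weights work out to (j+1)(n-1-j) divided by a constant, which is a parabola that vanishes just outside the window.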

I gave theory for the operators previously. The basic plan here is to apply them, particularly the second derivative (acceleration), to see if it helps clarify breakpoints and the general pattern of temperatures. The better smoothing might seem contrary to detecting breakpoints, since it smooths them. But that actually helps to avoid spurious cases. I'll show here just the analysis of GISS Land/Ocean.

I'll start with the spectrum of acceleration below. As I said in Part 1, you can actually get much the same results by differencing the smooth (twice for accel), or smoothing the difference. But the combined operator shows best what is happening in the frequency domain.
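The equivalence of differencing the smooth and smoothing the difference follows from convolution being commutative and associative. Here is a minimal sketch on synthetic data. The `welch` function is my assumption of a parabolic taper normalised to unit sum, not necessarily the exact form used in Part 1, and the series is a random walk rather than a real temperature record.

```python
import numpy as np

def welch(m):
    """Parabolic (Welch-style) taper of length m, normalised to sum to 1 (assumed form)."""
    k = np.arange(m, dtype=float)
    half = (m + 1) / 2.0
    w = 1.0 - ((k - (m - 1) / 2.0) / half) ** 2
    return w / w.sum()

rng = np.random.default_rng(1)
y = np.cumsum(rng.normal(size=200))         # synthetic noisy series
w = welch(21)
d2 = np.array([1.0, -2.0, 1.0])             # second-difference stencil

# Smooth first, then difference twice
accel_a = np.diff(np.convolve(y, w, mode="valid"), n=2)
# Difference twice first, then smooth
accel_b = np.convolve(np.diff(y, n=2), w, mode="valid")
# One combined operator applied directly to the data
combined = np.convolve(w, d2)
accel_c = np.convolve(y, combined, mode="valid")
```

All three arrays match to rounding. The `combined` kernel is the one whose frequency response is worth inspecting, since it shows both the differentiation and the smoothing in a single operator.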

Wednesday, January 21, 2015

Trends, breakpoints and derivatives

This post is partly following a comment by Carrick on acceleration in time series. We talk a lot about trends, using them in effect as an estimate of derivative. They are a pretty crude estimate, and I have long thought we could do better. Acceleration is of course second derivative.

Carrick cited Savitzky-Golay filters. I hadn't paid these much attention, but I see the relevant feature here is something that I had been using for a long time. If you want a linear convolution filter to return a derivative, or second derivative etc, just include test equations applying to some basis of powers and solve for the coefficients.
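To make the "test equations" idea concrete, here is a minimal sketch (an illustration under my own naming, not the code used for these posts). For a window of offsets k = -h..h, require that the filter return the right derivative for each power k^p up to some maximum, then solve for the coefficients. When there are more points than equations, `numpy.linalg.lstsq` picks the minimum-norm solution, which is the natural choice for noisy data.

```python
import numpy as np
from math import factorial

def derivative_filter(half_width, deriv, max_power):
    """Coefficients c_k with sum(c_k * k**p) = p! if p == deriv, else 0, for p = 0..max_power."""
    k = np.arange(-half_width, half_width + 1, dtype=float)
    A = np.vander(k, max_power + 1, increasing=True).T   # row p holds k**p
    rhs = np.zeros(max_power + 1)
    rhs[deriv] = factorial(deriv)                        # d-th derivative of k**d at 0 is d!
    coeffs, *_ = np.linalg.lstsq(A, rhs, rcond=None)     # minimum-norm solution
    return k, coeffs

# Second-derivative (acceleration) filter on 11 points, exact for quadratics
k, c2 = derivative_filter(5, 2, 2)
accel = c2 @ (3.0 + 2.0 * k + 0.5 * k**2)   # second derivative of 0.5*k**2 is 1

# First-derivative filter exact for lines
_, c1 = derivative_filter(5, 1, 1)
```

With `deriv=1` and `max_power=1` the coefficients are just k / sum(k^2), i.e. the ordinary least squares trend, which is the link between trends and derivative filters.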

I've been writing a post on this for a while, and it has grown long, so I'll split in two. The first will be mainly on the familiar linear trends - good and bad points. The second will be on more general derivatives, with application to global temperature series.

Historic progress of temperature records

2014 as a record warm year has been in the news lately. I made plots of the progress of the current "record year" in each of the usual datasets (as plotted here). Each rectangle shows the level of the then record year (on the left) and how long it held the record. Datasets are listed below the graph.
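The bookkeeping behind such a plot is simple. A minimal sketch, with a hypothetical function name and made-up numbers rather than any real index:

```python
def record_progress(years, values):
    """Return (year, value, years_held) for each year that set a new record high."""
    records = []
    for yr, v in zip(years, values):
        if not records or v > records[-1][1]:
            records.append((yr, v))
    last_year = years[-1]
    out = []
    for idx, (yr, v) in enumerate(records):
        # A record is held until the next record year, or to the end of the data
        end = records[idx + 1][0] if idx + 1 < len(records) else last_year
        out.append((yr, v, end - yr))
    return out

# Made-up annual anomalies, for illustration only
demo = record_progress([2000, 2001, 2002, 2003, 2004],
                       [0.1, 0.3, 0.2, 0.25, 0.4])
```

Each tuple corresponds to one rectangle: where it starts, how high it is, and how long it lasts.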

There have been suggestions that records are a figment of adjustment processes. The TempLS plots shown are based on unadjusted GHCN and ERSST 4.

The plots are based on annual averages to date. For HADCRUT and Cowtan and Way, for example, that means 2014 to November. Use the buttons to click through.

Tuesday, January 20, 2015

So 2014 may not have been warmest?

That has been the meme from people who don't like the thought. Bob Tisdale, at WUWT, gives a rundown. There is endless misinterpretation of a badly expressed section in the joint press release from NOAA and GISS announcing the record.

The naysayers' drift seems to be that there is uncertainty, so we can't say there is a record. But this is no different from any year/month in the past, warmest or coldest. 2005 was uncertain, 2010 also. Here they are, for example, proving that July 1936 was the hottest month in the US. Same uncertainties apply, but no, it was the hottest.

So what was badly expressed by NOAA/GISS? They quoted uncertainties without giving the basis for them. What do they mean and how were they calculated? Just quoting the numbers without that explanation is asking for trouble.

The GISS numbers seem to be calculated as described by Hansen, 2010, paras 86, 87, and Table 1. It's based on the vagaries of spatial sampling. Temperature is a continuum - we measure it at points and try to infer the global integral. That is, we're sampling, and different samples will give different results. We're familiar with that; temperature indices do vary. UAH and RSS say no records, GISS says yes, just, and NOAA yes, verily. HADCRUT will be very close; Cowtan and Way say 2010 was top.

I think NOAA are using the same basis. GISS estimates the variability from GCMs, and I think NOAA mainly from subsetting.

Anyway, this lack of specificity about the meaning of CIs is a general problem that I want to write about. People seem to say there should be error bars, but when they see a number, enquire no further. CIs represent the variation of a population of which that number is a member, and you need to know what that population is.

In climate talk, there are at least three quite different types of CI:
  • Measurement uncertainty - variation if we could re-measure same times and places
  • Spatial sampling uncertainty - variation if we could re-measure same times, different places
  • Time sampling uncertainty - variation if we could re-measure at different times (see below), same places
I'll discuss each below the jump. (The plot that was here has been moved to a new post.)

Thursday, January 15, 2015

Temperatures 2014 summary

I headed the last post on 2014 "Prospects for surface temperatures 2014 final". In my town, the evening paper used to come in three editions, announced by many newsboys - Final, Late Final, and Late Final Extra. So this is Late Final - my excuse is that GISS is dragging its feet (and NOAA hasn't even posted its November MLOST file).

I ran the TempLS Grid version, and it showed a considerable rise for December - from 0.518°C to 0.638°C. That actually makes December the warmest month of 2014. TempLS Mesh is also showing a greater rise with extra data, now from 0.59°C to 0.655°C. So I think it is time to make predictions (while we wait):

                   2014 Jan-Dec   2010 Jan-Dec
GISS Land/Ocean    0.67           0.66
NOAA L/O           0.68           0.65
HADCRUT 4          0.563          0.556

This is on the basis that GISS agrees with TempLS mesh, and NOAA/HADCRUT with TempLS grid. As you see, HADCRUT and GISS narrowly reach a record, NOAA with more to spare. Actually, my GISS estimate came to 0.675, so 0.68 is equally likely.

Update: GISS and NOAA have now released their results with a joint press release. GISS gave 0.68°C as their 2014 value; NOAA announced 0.69°C (re 20th Cen ave, it's worse than I thought ;)).

Update. There is an active plot of the historic record years of all major indices (and also both TempLS) in this later post.

Friday, January 9, 2015

December TempLS up 0.045°C - some 2014 records likely

After earlier (false) signs of a greater rise, with 3833 stations reporting, TempLS mesh has risen from 0.591 in Nov to 0.636 in Dec 2014. The Nov number rose a little with later data, so Dec is now back to October levels. The report is here.

The Ncep/Ncar index showed a similar fall/rise, but only came back to about August level. GISS should track the TempLS mesh level reasonably, so a record is likely there, as with NOAA. HADCRUT remains uncertain.

Monday, January 5, 2015

Monckton and Goddard - O Lord!

Viscount Monckton of Brenchley has produced yet another in his series of the "Great Pause" - now 18 years 3 months. He uses only the troposphere average RSS - to quote Roy Spencer on how RSS is differing from his UAH index:
"But, until the discrepancy is resolved to everyone’s satisfaction, those of you who REALLY REALLY need the global temperature record to show as little warming as possible might want to consider jumping ship, and switch from the UAH to RSS dataset."

Lord M heard. But in his latest post he is defensive about it. He says:
"But is the RSS satellite dataset “cherry-picked”? No. There are good reasons to consider it the best of the five principal global-temperature datasets."

There is an interesting disagreement there. Carl Mears, the man behind RSS, says
"A similar, but stronger case can be made using surface temperature datasets, which I consider to be more reliable than satellite datasets (they certainly agree with each other better than the various satellite datasets do!)."

You can see in this plot how much an outlier RSS is. The plot shows the trend from the date on the x-axis to present. You can see the blue RSS crossing the axis on the left, around 1996. That is Lord M's Pause. No other indices cross at all until UAH in 2008. In the earlier years, UAH often has the highest trend.
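For anyone wanting to reproduce that kind of plot, here is a minimal sketch of the calculation, assuming plain OLS trends and a hypothetical function name. The series is synthetic (an exact straight line), just to show the mechanics; real monthly anomalies would go in its place.

```python
import numpy as np

def trends_to_present(t, y, min_points=24):
    """OLS slope of y against t from each start point to the end of the series."""
    starts, slopes = [], []
    for i in range(len(t) - min_points + 1):
        slopes.append(np.polyfit(t[i:], y[i:], 1)[0])
        starts.append(t[i])
    return np.array(starts), np.array(slopes)

# Synthetic monthly series: a straight line rising 0.015 per year from 1979
t = 1979 + np.arange(432) / 12.0
y = 0.015 * (t - 1979)
starts, slopes = trends_to_present(t, y)
```

Plotting `slopes` against `starts` gives one curve per dataset. A "pause" start date is where that curve crosses zero, and it moves a lot with the choice of dataset.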

Anyway, Lord M cites in his defence "The indefatigable “Steven Goddard” demonstrated in the autumn of 2014 that the RSS dataset – at least as far as the Historical Climate Network is concerned – shows less warm bias than the GISS [3] or UAH [2] records."

He shows this graph:

No details on how HCN is done, but certainly there is no TOBS adjustment, which for USHCN is essential. That is the main problem, but the clearly wrong averaging contributes. In the past, Goddard has vigorously defended his rights as a citizen to just average all the raw data in each month (eschewing anything "fabricated"), and I'm sure that is what we see here.

So what is wrong with it? We saw the effects in the Goddard spike. The problem is that in each month, a different set of stations report. SG is averaging the raw temperatures, so what kind of stations are included can have big differences in average temp, without any actual change in temp. If a station in Florida drops out, the US average (SG-style) goes down. Nothing to do with the weather.

NZ Prime Minister Muldoon understood this. When the NZ economy hit a rough patch, he was scornful of locals leaving for Australia. But he took consolation. He said that this would improve the average IQ of both countries. It helped me - I can now figure out what he meant.

I wrote at some length about the Goddard spike issues here. But this example gives a simple case of the problem and an easy refutation of the method.

Every month, a different group of stations reports. Suppose we switch to a world in which temperatures do not change from year to year. Each reporting station reports the long term average for that month. So there is no change of actual weather (except seasonal). But the population of stations reporting varies in the same way as before.

For a fixed subset of stations, the average would be constant, as it should. But here it isn't. In fact, over time, the average goes down. That is because the stations dropping out (as they have, recently) tend to be warmer than most. I don't know why, but that is what the graph shows. It covers the period from 1979 to 2013, and shows the Goddard average raw in blue and the average of averages in red. It also shows the trends over this time, with slope on the legend in °C/century.

And that is the key. The cooling (in long term average) of the set of reporting stations induces a spurious cooling trend of 0.33°C/cen. That isn't large relative to the actual warming trend, but it makes a significant difference to the plots that Lord M showed. And it is simple error.
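The mechanism is easy to reproduce with invented numbers. In this sketch (my own illustration, not the calculation behind the graph), every station always reports its fixed long-term mean, so there is no climate change at all. The station means, the dropout pattern of one warmest station per year, and the resulting trend size are all assumptions for the example, and the seasonal cycle is ignored.

```python
import numpy as np

n_stations, n_years = 100, 35
climatology = np.linspace(5.0, 25.0, n_stations)   # made-up long-term means, °C
years = 1979 + np.arange(n_years)

# Assumption: the warmest remaining station stops reporting each year
reporting = [np.arange(n_stations - y) for y in range(n_years)]

# Goddard-style: average whatever stations report that year
varying_mean = np.array([climatology[idx].mean() for idx in reporting])

# Fixed subset: only stations that report throughout
fixed_idx = reporting[-1]
fixed_mean = np.array([climatology[fixed_idx].mean() for _ in range(n_years)])

# Trends in °C/century
trend_varying = np.polyfit(years, varying_mean, 1)[0] * 100
trend_fixed = np.polyfit(years, fixed_mean, 1)[0] * 100
```

The fixed-subset trend is zero, as it should be. The varying-set average cools steadily although no station's temperature ever changed. The size of the spurious trend here is arbitrary; only its sign and origin matter.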