SciGuy

A science blog with Eric Berger
Nov 10, 2011

Pi. Pie. Oh my.

This is spreading around the interwebs. As a nerd, I thought I’d do my part.

All I can say is, “Wow! A double pi(e)! What does it mean?!”


How statistically significant were Houston’s six ridiculously hot months?

A month ago I noted that during the six-month period from April through September each of Houston’s average monthly temperatures ranked in the top 10 on record, and five of the six months were among the three warmest on record. I wondered how rare such a streak might be, and turned to statistical wunderkind JohnD. You may recall his work with the August heatwave as well as the accuracy of seasonal hurricane forecasting.

In any case, here’s his tour-de-force analysis…

Nota Bene: Before we get into the analysis, a note of caution. The following post contains mathematics and relates to weather. If you dislike either topic, you would probably be better served by skipping over to another post. {It also involves eigenfunctions, but those are just cool.}

The past six months have been extraordinary, weather-wise. Between the drought and the number of temperature records reached or broken, we appear to have set a world’s record for breaking records. But the question has arisen – how unexpected has the past year been? More exactly, what was asked was “(What is) the probability of the city, with 118 years of data, recording six consecutive months of top-10 warm monthly temperatures?”

That turns out to be a very difficult question to answer, which means that it is the sort of question that we science-types like best (except when they are “left as an exercise for the student”). Fortunately, mathematicians long ago worked out the solution to tackling difficult problems: start with the easy part first.

Here the easy part is simply looking at which years had the ten warmest and ten coldest average monthly temperatures for each month. To do that, we again use the ThreadEX data for 1899 to 2011. If you recall our previous excursion into monthly temperatures, we saw that all of the months except October were good approximations to the Gaussian normal function. The values for October clustered a little closer to the mean than a Gaussian would predict (normalized kurtosis of 1.67), which means that we can use the Gaussian but must remember that it will over-predict the spread of values (a more sophisticated analysis would use a Pearson distribution to correct for the kurtosis).

Looking at the data for each month’s average temperature gives the following Top Ten Coldest Years:


And for the Top Ten Warmest Years:


Looking at the tables, it becomes clear that “Top Ten” is a rather squishy parameter. There are several ties, leading to the odd result of having 124 months in the 120 “Top Ten Warmest Month” slots, and having 125 months in the 120 “Top Ten Coldest Month” slots. This will become important in a moment.

An alternative way to look at the data is to plot it up with the months as the X-axis and the years along the Y-axis. If we put a red dot wherever there is a Top Ten Warmest month, and a blue dot where the Top Ten Coolest months reside, we get the following:

Looking at the chart, you can see that the data has a slight tendency to cluster. This is what we would expect if Houston’s temperatures are driven by some set of medium-term natural phenomena. A purely random fluctuation should not show a large number of clusters, whereas a repeated forcing event that lasts for a few months would. But how many clusters would a random effect create?

To decide that, we have to know how likely they are. If the odds of being in the Top Ten are the same for any given month (i.e., that the weather is purely random, like flipping a fair coin), then the probability of any given month being in the Top Ten is simply (125/(12*123)=) 8.47%. And this would be true whether or not the previous month was in the Top Ten. However, there is a slight complication that we need to remember when calculating the odds of any two months in a row being in the Top Ten; there are more ways of having two months in a row than most people realize.

To understand this, consider the odds of flipping a coin three times and getting two heads in a row. Most people would say that the odds are 1:4 (1*1:2*2); they would be wrong. The actual odds of getting two heads in a row are half again as high (3 in 8) because there are three ways that you could get that outcome: ttt, ttH, tHt, tHH, Htt, HtH, HHt, HHH. Similarly, the odds of getting a Top Ten month two times in a row in a given year are not simply (8.47%*8.47%=)0.717%; instead, they are more than eleven times greater (because there are 11 two month periods in a calendar year, plus the possibility of having a longer run) at 7.89%. To get the exact probability, we can use a binomial table. The only tricky part is remembering to find the expected number of times for having six Top Ten months in a year, then adding that to the expected number of times for having five Top Ten months in a year, and so on back to the start in order to get a proper distribution. Doing so gives us this plot:
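For readers who want to check the coin-flip argument, here is a minimal Python sketch. It is not JohnD's calculation: the function names are made up for illustration, and it assumes every month is an independent trial with the same 8.47% chance. The first function simply lists every sequence; the second keeps track of the current streak so that it also works for 12 months. Note that adding up the 11 adjacent pairs counts a three-month run twice, so the exact figure from the second function should come out a little below the pair-counting estimate.

```python
from itertools import product

def prob_run_enumerate(n_flips, run_len):
    """Fraction of equally likely coin sequences with at least run_len heads in a row."""
    hits = 0
    for seq in product("tH", repeat=n_flips):
        if "H" * run_len in "".join(seq):
            hits += 1
    return hits / 2 ** n_flips

def prob_run(n_trials, run_len, p):
    """Exact chance of at least one run of run_len successes in n_trials independent trials."""
    # state[k] = probability that no qualifying run has happened yet
    # and the current streak of successes has length k.
    state = [1.0] + [0.0] * (run_len - 1)
    for _ in range(n_trials):
        new = [0.0] * run_len
        new[0] = sum(state) * (1 - p)      # a miss resets the streak
        for k in range(run_len - 1):
            new[k + 1] = state[k] * p      # a hit extends the streak
        state = new                        # streaks reaching run_len drop out
    return 1 - sum(state)

coin = prob_run_enumerate(3, 2)            # the three-flip example
p_month = 125 / (12 * 123)                 # chance of a Top Ten month, from the text
two_in_a_row = prob_run(12, 2, p_month)    # at least two in a row within 12 months

print(f"three flips, two heads in a row: {coin:.3f}")
print(f"twelve months, two Top Ten months in a row: {two_in_a_row:.4f}")
```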

If you compare the prediction to the observation, you’ll notice that the expected values for the first three categories are definitely wrong. Why is that? The most obvious answer is that Mother Nature doesn’t live by the Gregorian calendar. By imposing the artificial constraint of “calendar year”, we have biased the outcome. Fortunately, the solution is simple (albeit tedious); we simply calculate a running sum that gives the number of Top Ten months in any consecutive twelve-month period. However, this adds a subtle change to the data. Because we are looking at twelve month long periods and not at calendar years, the number of “years” has increased by quite a lot. This happens because two consecutive years give thirteen possible year long periods (Jan1-Dec1, Feb1-Jan2, Mar1-Feb2, Apr1-Mar2, May1-Apr2, Jun1-May2, Jul1-Jun2, Aug1-Jul2, Sep1-Aug2, Oct1-Sep2, Nov1-Oct2, Dec1-Nov2, Jan2-Dec2).
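The running sum described above can be sketched in a few lines of Python. The flags below are invented purely to show the mechanics (1 marks a Top Ten month, 0 a normal one), and `rolling_counts` is a hypothetical helper name, not something from the original analysis.

```python
def rolling_counts(flags, window=12):
    """Number of flagged months in every consecutive `window`-month period."""
    if len(flags) < window:
        return []
    total = sum(flags[:window])
    counts = [total]
    for i in range(window, len(flags)):
        # Slide the window one month: add the new month, drop the oldest.
        total += flags[i] - flags[i - window]
        counts.append(total)
    return counts

# Two made-up years of monthly flags, January of year 1 through December of year 2.
two_years = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

counts = rolling_counts(two_years)
print(len(counts))   # number of year long periods in two calendar years
print(counts)
```

Two calendar years produce thirteen year long periods, matching the list above, and in general N months produce N - 11 of them.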

The proportions of the expected values won’t change, but the total numbers will because we are now adding together 1,462 different runs. When we chart the results up on a log10 plot (to make it easier to read), we get:

The chart clearly shows two things: First, that the behavior of the Top Ten months is much more reasonable when the artificial constraint of “calendar year” is removed. And second, that the data still doesn’t match a random distribution. But this may be due to another artificial constraint – the very definition of “Top Ten”. If Top Ten were a strong constraint, then there would only be 240 Top Ten months (120 each for warmest and coolest); instead, there are 249. In addition, for many months the difference between tenth and eleventh place is less than 0.1°F, which is statistically meaningless.

We fix this by looking at how far each value is from the mean. Because the monthly temperature distributions are reasonably close to being Gaussian, we can use a trick known as the Z-score, which allows us to compare different Gaussian distributions as if they were the same by providing each value’s distance from the mean in standard deviations. If we use the one-σ limit for the data to define “warm”/“cool”, then the chart of monthly extremes fills in, with the 1960s warm period and the 1970s cool period becoming more obvious, and the chart of consecutive anomalous months becomes more consistent between the cool and warm anomalies. In addition, the current warm trend stretches back seven months, instead of six.
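As a rough illustration of the Z-score step, the Python sketch below labels each value for one calendar month as warm, cool or normal. The temperatures are invented, the function names are hypothetical, and using the sample standard deviation is an assumption; the post does not say which form was used.

```python
from statistics import mean, stdev

def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    mu = mean(values)
    sigma = stdev(values)          # sample standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]

def classify(values, threshold=1.0):
    """Label each value by the one-sigma rule."""
    labels = []
    for z in z_scores(values):
        if z > threshold:
            labels.append("warm")
        elif z < -threshold:
            labels.append("cool")
        else:
            labels.append("normal")
    return labels

# Made-up July averages (deg F) for eight years; not ThreadEX data.
july = [83.1, 84.0, 82.5, 85.9, 83.6, 81.2, 84.3, 83.8]
labels = classify(july)
print(labels)
```

Because each month is scored against its own mean and spread, a warm January and a warm July end up on the same scale, which is what lets the runs be counted across months.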

Here’s another way of visualizing the data:

This is also seen in the table of the data:

One σ Monthly Anomalies

Length of Run    Warm Anomalies    Cool Anomalies
0                1229              1244
1 or more        233               218
2 or more        64                62
3 or more        21                25
4 or more        8                 10
5 or more        3                 3
6 or more        1                 1
7                1                 1

Unfortunately, the mark-one eyeball tells us that the binomial expected values are still a poor fit to the data. Thus, the hypothesis that this was a binomial problem was wrong, and we need a more powerful mathematical tool to deal with the data. We must move into Bayesian statistics.

In its simplest form, Bayesian statistics can be thought of as being like eating M&Ms. In a typical bag of M&Ms there are eight blue, five red, seven orange, five yellow, six green, and five brown candies. As a result, you have roughly a 1 in 4 chance (8 in 36, or about 22%) of grabbing a blue M&M when you pull the first candy out of the bag. (This assumes that you have enough self-control to only eat one candy at a time and are not a pig like the author, who gobbles them down by the handful.) That then changes the distribution so that there are now seven blue, five red, seven orange, five yellow, six green, and five brown candies. Your odds of grabbing a blue candy are now only 1 in 5 (7 in 35) and your odds of grabbing some other color have increased. Each time you grab a candy, the odds change in a way that can be predicted by the candies that you have already eaten. This is known as conditional probability, and is the workhorse for Bayesian statistics. It is the basis of “card counting” in casinos and is important for speech recognition programs, medical research, and climate effects (to name a few).
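The M&M example is easy to replay in Python using the bag counts quoted above. This is only a sketch of the conditional-probability idea; the `chance` helper is a made-up name.

```python
from fractions import Fraction

# Candy counts for the bag described in the text.
bag = {"blue": 8, "red": 5, "orange": 7, "yellow": 5, "green": 6, "brown": 5}

def chance(bag, color):
    """Chance of drawing `color` from the bag as it stands right now."""
    return Fraction(bag[color], sum(bag.values()))

first_blue = chance(bag, "blue")          # before anything is eaten

after_one_blue = dict(bag)
after_one_blue["blue"] -= 1               # one blue candy has been eaten
second_blue = chance(after_one_blue, "blue")

print(first_blue, float(first_blue))
print(second_blue, float(second_blue))
print(chance(bag, "red"), chance(after_one_blue, "red"))
```

Eating a blue candy lowers the chance of the next one being blue and nudges every other color up, which is the whole point: what has already happened changes the odds of what comes next.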

We can do the same sort of analysis for the Top Ten data by comparing the change in each category with respect to the previous one. Looked at that way, the probability of getting a run is:

One σ Monthly Anomalies

Length of Run    Warm Anomalies    Cool Anomalies
0                84.063%           85.089%
1 or more        15.937%           14.911%
2 or more        4.378%            4.241%
3 or more        1.436%            1.710%
4 or more        0.547%            0.684%
5 or more        0.205%            0.205%
6 or more        0.068%            0.068%
7                0.068%            0.068%

And now the hidden pattern begins to emerge. Because we are looking at a Gaussian distribution, we expect roughly one value in six (16.7%) to lie more than one σ above the mean, and as many to lie more than one σ below it; what we observe is 15.4%. Based on the data, it appears that only about 1 in 4 months following an anomalous month will also be anomalous; put another way, if December was normal and January is unusually warm, then there is roughly a 75% chance that February will either be normal or unusually cool.
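The expected and observed shares can be checked with a short Python sketch. It assumes an exact Gaussian for the expected value and takes the observed share as the average of the warm and cool "1 or more" rows over the 1,462 year long periods; `upper_tail` is a hypothetical helper name.

```python
import math

def upper_tail(z):
    """Chance that a standard Gaussian value lies more than z sigma above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))

expected_one_side = upper_tail(1.0)

# "1 or more" rows from the table: 233 warm and 218 cool, out of 1,462 periods each.
observed = (233 + 218) / (2 * 1462)

print(f"expected beyond one sigma (one side): {expected_one_side:.4f}")
print(f"observed (warm and cool averaged):    {observed:.4f}")
```

For an exact Gaussian the one-sided share works out slightly under one in six, so the observed value sits close to what the distribution predicts.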

But it is the longer-length runs that are truly interesting. There is roughly a 36% chance of having a third anomalous month in a row, so that if December was normal and January and February were unusually warm there is about a 1 in 3 chance of having a warm March. And that ratio appears to be constant for each succeeding run; 36% of the three month runs go on to be four month runs and 36% of the four month runs go on to be five month runs. Applying those values to a Bayesian prediction gives:
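The continuation odds can be read straight off the run counts in the first table. The Python sketch below shows one way the 28% and 36% figures could be recovered, by averaging the warm and cool ratios; the post does not spell out the exact procedure, so treat the averaging as an assumption. The final "7" row is left out because it repeats the "6 or more" count.

```python
# Run counts from the table: "1 or more" through "6 or more".
warm = [233, 64, 21, 8, 3, 1]
cool = [218, 62, 25, 10, 3, 1]

def continuation(counts):
    """Share of runs of each length that go on for at least one more month."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]

warm_ratios = continuation(warm)
cool_ratios = continuation(cool)

# Chance that a second anomalous month follows the first.
second_month = (warm_ratios[0] + cool_ratios[0]) / 2

# Average chance that a run of two or more months gets one month longer.
later = warm_ratios[1:] + cool_ratios[1:]
later_months = sum(later) / len(later)

print([round(r, 3) for r in warm_ratios])
print([round(r, 3) for r in cool_ratios])
print(round(second_month, 3), round(later_months, 3))
```

With only a handful of long runs in the record, the individual ratios bounce around quite a bit; it is the average that lands near 36%.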

It is no surprise that the Bayesian prediction is a better fit to the data as it was derived from the data. What may surprise you is that this makes it possible to answer the question that was asked at the start: “What are the odds of having seven consecutive months of anomalously warm temperatures?”

To find the odds for a run of seven months, we simply multiply the odds for each preceding set: 0.167*0.28*0.36*0.36*0.36*0.36*0.36=0.000283 or 1 in 3500. So we can expect to see a seven month long stretch of anomalously warm temperatures about once every 3500 year long periods. However, we have to remember that a year long period is not the same as a year! A 300 year long stretch will contain ((299*12)+1=)3589 year long periods. Thus, given the level of uncertainty in our data, it would be reasonable to say that we can expect to see seven consecutive months of abnormally warm (or cold!) temperatures about once every 300 years. (The upper and lower limits of this estimate are left as an exercise for the student.)
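The multiplication above is quick to reproduce in Python. The three probabilities are the ones quoted in the text; the variable names are just for illustration, and dividing by twelve relies on a new year long period starting every month.

```python
p_first = 0.167      # chance that a month is more than one sigma warm
p_second = 0.28      # chance that the next month is also anomalous
p_continue = 0.36    # chance that each later month extends the run
run_length = 7

# One first month, one second month, then (run_length - 2) continuations.
p_run = p_first * p_second * p_continue ** (run_length - 2)

periods_per_event = 1 / p_run               # year long periods between such runs
years_per_event = periods_per_event / 12    # roughly twelve new periods start each year

print(f"probability of a seven month run: {p_run:.6f}")
print(f"about 1 in {periods_per_event:.0f} year long periods")
print(f"roughly once every {years_per_event:.0f} years")
```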

In conclusion, we can say the following:

  • This year has been unusual but not unprecedented
  • Sometimes the question you start with isn’t the one that you need to answer
  • Statistics is fun

As always, this post has benefited greatly from the input of the reviewers. And, as always, any mistakes are mine, not theirs.

Nov 09, 2011

See inside a hurricane, only with snow

Weather geeks will appreciate the Bering Sea “super storm” that reached its peak intensity early Wednesday, bottoming out in the 940-945 millibar range.

That’s a low-pressure value typically associated with major hurricanes.

The storm has been bringing some really gnarly weather to Nome, Alaska, including a seven foot storm surge as can be seen in the plot below.

Sea water levels at Nome. (NOAA)

Wave heights reached 40 feet offshore.

A YouTube user named WeatherNut27 has posted the video below showing conditions as of 8 p.m. last night in Nome. Looks, umm, unpleasant.

Anyway, please think of the good people in Nome when you’re enjoying sunny skies and temperatures in the upper 60s tomorrow.


Energy Agency: World locking itself into unsustainable future, climate crisis

The International Energy Agency, founded in the wake of the 1970s oil crisis, had some blunt words on climate change and energy today in its annual World Energy Outlook.

“As each year passes without clear signals to drive investment in clean energy, the ‘lock-in’ of high-carbon infrastructure is making it harder and more expensive to meet our energy security and climate goals,” said Fatih Birol, IEA Chief Economist.

Birol used to work for OPEC.

According to the agency four fifths of the energy infrastructure — power plants, factories, etc. — that will raise atmospheric levels of carbon dioxide to 450 parts per million are already locked in, leading to an estimated 3.5 Fahrenheit degree increase in the global temperature.

Nearly half of new energy used in the last decade came from coal. (IEA)

If few significant changes are made between now and 2017, which seems likely to me after the first decade of this century, the 450 ppm level will be locked in by then.

Additionally, if China and other developing economies grow to a level of per capita energy use like that of the Westernized world, primarily using fossil fuels such as coal, global temperatures could eventually increase by as much as 11 degrees Fahrenheit.

The message from reports like this is clear: The world is going to grow its economies and use the most cost effective forms of energy, regardless of concerns about climate change.

If that’s the case, and it is, then what are the policy options for those worried about the implications of a warming world?


The graphic warnings cigarette companies don’t want you to see

As we discussed this summer, the FDA has ordered tobacco companies to print graphic warning labels on their cigarette packages in a bid to further reduce smoking in the United States.

Almost immediately after the FDA issued its demands the tobacco companies sued in the District of Columbia district court. Now Judge Richard J. Leon has granted the tobacco companies a temporary block on the labels.

Here’s a look at the labels that won’t be appearing on cigarette packages any time soon:


FDA cigarette health warnings


The judge’s ruling (see .pdf) essentially concludes that the government’s labeling program violates the tobacco companies’ First Amendment right to free commercial speech:

Unfortunately for the Government, the evidence here overwhelmingly suggests that the rule’s graphic-image requirements are not the type of purely factual and uncontroversial disclosures that are reviewable under this less stringent standard. Indeed, the fact alone that some of the graphic images here appear to be cartoons, and others appear to be digitally enhanced or manipulated, would seem to contravene the very definition of “purely factual.” That the images were unquestionably designed to evoke emotion – or, at the very least, that their efficacy was measured by their “salience,” which the FDA defines in large part as a viewer’s emotional reaction, further undercuts the Government’s argument that the images are purely factual and not controversial.

Interestingly, at least 43 other countries require graphic warnings on cigarette packages, according to the Campaign for Tobacco Free Kids. There is some scientific evidence the labels work, but not everyone agrees.

Anyway, I’m far from being a constitutional lawyer, but it seems to me that if the labels prevented a few kids from smoking they would be promoting the general welfare of the country.

Nov 08, 2011

Storm system brings rain, area-wide tornado watch to Houston

A strong storm system associated with a cold front has produced several tornadoes in and near Texas today, including one just to the northeast of Houston. As a result the entire Houston metro area remains under a tornado watch until 8 p.m. this evening.

Mike Smith, at Meteorological Musings, has posted a helpful map showing where the tornadoes have developed.

(Meteorological Musings)

The Houston tornado developed just to the southeast of Kingwood, and the Atascocita Fire Department made a confirmed sighting 2 miles to the east-southeast of Humble as the tornado moved quickly to the northeast.

There were no immediate reports of injuries or damage.

Today’s storms are associated with a moist and unstable airmass, and an approaching cold front is triggering numerous showers and thunderstorms.

Some areas just north of Houston have already received significantly more than 1 inch of rain from the storms, but amounts could be considerably less than that to the south and southwest of Houston.

The storms should continue producing rain until around midnight, when the cold front’s dry air should move into the area, setting up the region for a great second half of the work week.


Storms arriving: NE Harris County under a tornado warning

The National Weather Service has issued a tornado warning for northern Harris County, as well as western Liberty County and southeastern Montgomery County.

Area of tornado warning. (National Weather Service)

Forecasters say a severe thunderstorm capable of producing tornadoes is moving through this area toward the northeast. Included in the line of the warning is Bush Intercontinental Airport.

The warning expires at 2 p.m.


							

Let’s pretend the fly-by asteroid strikes Texas. What happens next?

Today’s the big day, when Asteroid 2005 YU55 will pass within about 200,000 miles of our fair planet (and come slightly closer to the moon tomorrow).

The closeness of the approach can be seen in the short movie below (click it to activate).

Click to play movie. (Jet Propulsion Laboratory)

The very dark asteroid is estimated to measure about 400 meters across. Here’s a photo of the rock as it sped through space in our direction yesterday. It’s a pretty good shot considering the rock was still more than 600,000 miles away at the time.

(NASA/JPL-Caltech)

We’re all familiar with asteroids presenting end-of-the-world scenarios because of movies like Armageddon, in which a “Texas”-sized asteroid threatened Earth.

This, of course, is laughable because Texas is about 1,400 kilometers across and the largest known asteroid in the solar system, Ceres, measures just 900 km in diameter. For a deeper dissection of Armageddon’s scientific flaws, see here.

(Side note: The final scene of Armageddon offers an interesting take on the Russian approach to fixing mechanical problems with spaceflight equipment, especially in light of Sunday night’s launch of a Soyuz spacecraft. But I digress.)

Armageddon: Fun movie, flawed science. (Touchstone Pictures)

Anyway, it wouldn’t take an asteroid the size of Texas to cause a global catastrophe. According to NASA, an asteroid would need to have a diameter in excess of 2 km to have planet-wide environmental consequences.

And 2005 YU55 is much smaller than that. Which is not to say it would not have an impact.

So what would happen if YU55, traveling relative to Earth at a velocity of 30,000 mph, struck the planet? Bad things, but not catastrophic things unless you’re living underneath the impact.

Just for fun, let’s say it hit land about 100 miles west of Houston (it was nice knowing you, Schulenburg).

This particular asteroid would probably produce a crater about 4 miles across. If it hit 100 miles from Houston it would produce a wind moving through the city at about 35 mph, and make a noise something like very loud traffic. We would also experience seismic shaking equivalent to about a 6.8 magnitude earthquake. There would be some dust.

If you live in Katy and are planning ahead, be sure to evacuate toward the east.

You can model your own asteroid impact effects at the delightful Impact: Earth! website.

Source: Wikipedia. Cloud texture from public domain NASA image.
