Someone Was Wrong On The Internet


Wednesday 18 September 2013

Miscaptioning Atrazine

What a difference a few words make. Today's offering is a figure caption from Wikipedia. Maybe it's unfair to pick on Wikipedia - but since it has become the launching point for many an inquiry, I don't think they should be exempted from scrutiny. All things considered, I think Wikipedia is a good thing. I'm a big fan of not having barriers to knowledge for people outside of academe. Given the open and egalitarian nature of Wikipedia, there's far more that's right with it than wrong. The downside of Wikipedia is that it takes time to craft quality articles from a neutral perspective when anyone at all can contribute to writing and editing. It will never be the Encyclopedia Britannica but it has become a great place to start a research project on the net.

I debated whether to even bother with a post about one small figure caption on Wikipedia. Then I realized that if the same figure caption had shown up in a scientific article that I had been asked to peer review for a journal, I would have no mercy on the article authors. Why? Because figure captions matter. A lot of science professionals read articles outside their discipline by skimming in the following manner: first one reads the abstract followed by the figures and figure captions. Depending on the ego and nastiness of any given scientist, some would include a third step which would be to check the references to see if one had been cited. After all, it really is a publish and perish world out there and citations matter.

Basically, figure captions matter. When you consider that journalists and bloggers often lift figures out of journal articles and reprint them in internet or newspaper content, then figure captions matter a whole lot more than one would think. So in this context, I decided that, yes, I would indeed pick on just one short figure caption in Wikipedia.

Earlier today, I was reading a string of comments on Facebook about a murderer and his victims. Someone made a comment speculating that the murderer could have poisoned one of his victims with atrazine. This immediately hit my HUH? filter big time and left me wondering how much atrazine comprised a lethal dose for an adult human.

These days, I tend to look at Wikipedia first for regulatory, physical chemistry and toxicology information since many chemical pages on Wikipedia include that info. If the Wikipedia page is any good, there will be a link back to a public health, industrial hygiene, health physics or environmental science authority or journal where cited numbers can be verified. For the record, unless I already know a number off the top of my head (for example, I know most of the EPA MCLs for heavy metals by heart), I almost always verify numbers, especially if I'm going to be commenting or blogging about them later. Just as a quick FYI, the CDC is even better than the EPA if you want to look up understandable environmental and toxicological info about pollutants.

Getting back to our main topic here, which is a figure caption on the English-language Wikipedia site for atrazine, I found the comment from Facebook rather odd since herbicides are not popular or widely used poisons for homicides. As I suspected after looking at the toxicology numbers for atrazine, the amount needed to poison someone would be several tablespoons. Nope, atrazine would make a lousy homicide poison on the basis of quantity required. I suspect it would taste bad too. Arsenic and strychnine are in no danger of being displaced as effective human poisons by atrazine. I'm sure that's a great relief to know! You can sleep better tonight knowing that evil atrazine from the blue earth corn fields of Minnesota will not waylay you and bring you to death's door before you wake.

Of course, atrazine has its own little anti-fan club because of its use in American farming, for cereal crops and especially maize, the iconic crop of the Midwest. Like all other things that farmers put on their crops in liquid form, atrazine has infiltrated into drinking water aquifers wherever farming is big. If you believe that atrazine is a danger to public health or the environment, then this is a matter of concern.

Regardless of the real or imagined danger posed by atrazine, having good facts at hand on its spread, prevalence and impact is necessary for meaningful debate. For the people out there who go to Wikipedia - and no farther - for their information, getting the facts right on the page for Atrazine strikes me as highly desirable. Now there are a few things that could use some fixing on this wiki page, but the one and only figure caught my eye immediately. Here's what it looks like, straight off my monitor screen: atrazine2.png

Did you spot the caption below the figure? "Atrazine use in pounds per square mile by county."

I made the mistake of really getting eye tracks all over this figure BEFORE blowing it up for finer inspection. Right off the bat, I thought all that green-level use of atrazine in New England was off-base. Seriously, New England - the home of rocky ground and non-existent top soil - was using that much atrazine? You don't use atrazine if you're farming apples, potatoes, maple syrup, trees or cows - which are all the main aggie products in the New England states. Now look at California and southern Idaho - especially southern Idaho where one of the biggest crops is barley. I would have thought the atrazine use in these areas would be much higher than on the figure.

So I enlarged the figure: atrazine.png

I just love how the highest usage area overlaps the Midwest corn belt. Check out the non-linear scale too. There's all sorts of fun on this figure.

The enlarged figure did two things for me. First, I could actually read the text inside the figure box. I couldn't before because I've reached that point of middle age where I should really be wearing reading glasses and I'm too vain to enlarge the type size like an old person. After enlarging the figure, I could read the rest of the text on the figure and saw that the original caption was "Average annual use of active ingredient (pounds per square mile of agricultural land in county)."

Wow! That's a big difference. Use per square mile of farm land in a county is a lot different than use per square mile of all land in a county! This figure would never convey how much bulk atrazine was being spread around on a per area basis. It only tells you how intensively farms use atrazine on a county by county basis, regardless of how much farm land is in any given county.
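To make the denominator problem concrete, here is a minimal sketch with two made-up counties. The county names and every number in it are invented for illustration; none of them come from the USGS data.

```python
# Hypothetical counties; every number here is invented for illustration.
counties = {
    "Farm County": {"atrazine_lb": 50_000, "total_sq_mi": 500, "ag_sq_mi": 400},
    "Forest County": {"atrazine_lb": 500, "total_sq_mi": 500, "ag_sq_mi": 5},
}


def use_per_sq_mi(county, denominator):
    """Pounds of atrazine per square mile, using the chosen area as the denominator."""
    return county["atrazine_lb"] / county[denominator]


for name, c in counties.items():
    per_farmland = use_per_sq_mi(c, "ag_sq_mi")
    per_all_land = use_per_sq_mi(c, "total_sq_mi")
    print(
        f"{name}: {per_farmland:.0f} lb per sq mi of farm land, "
        f"{per_all_land:.0f} lb per sq mi of all land"
    )
```

Measured per square mile of farm land, the mostly wooded county looks almost as heavy a user as the farm county. Measured per square mile of all land, it uses a hundredth as much. That is the kind of gap that could make a New England county show up green on a map normalized by farm land.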

The bottom line is that the Wikipedia caption that's big enough for an old person like me to read is misleading. As soon as I can figure out how to send in edits to Wikipedia, I'll try to fix this caption.

The second thing that enlarging the figure did for me was confuse me terribly. If the figure is showing me usage BY COUNTY, then I should be able to discern county shapes in the data but I should not be able to pick up details smaller than counties. The problem here is that there are features in the data that are obviously smaller than whole counties.

For starters, you can pick out pieces of interstates, like I-80 west of Chicago and the I-39 corridor in northern Illinois. You can see the Platte River in eastern and central Nebraska. You can see Columbus, Indianapolis, Peoria, and Cleveland but not Toledo or Des Moines. Cities and rivers are at scales finer than counties. A figure that's captioned as presenting data on a "by county" basis is mislabeled if you're seeing details smaller than counties.

The explanation, it turns out, is that in a weird sort of way the figure really isn't on a per-county basis, but you have to go to the source of the data to find that out. The source of the figure turns out to be respectable and reputable. The data and the figure both are from a very recent USGS report on pesticide usage in the USA. The complete citation is: Thelin, G.P., and Stone, W.W., 2013, Estimation of annual agricultural pesticide use for counties of the conterminous United States, 1992–2009: U.S. Geological Survey Scientific Investigations Report 2013-5009. You can also find it online at (accessed 18 Sept 2013). The authors of this USGS report did something kinda strange with their data and I'm left wondering why they bothered since it strikes me as somewhat counter-intuitive. Here's their explanation from the USGS webpage that explains how they made the pesticide usage maps in their report:

Individual crop types....were reclassified to simply differentiate agricultural land (including pasture and hay) from non-agricultural land....then generalized to one square kilometer cell size and the percentage of agricultural land for each cell was calculated. The proportion of county agricultural land included in each one square kilometer cell was multiplied by the total county use for each pesticide to calculate the proportional amount of use allocated to each cell. To display pesticide use on the annual maps for each compound, all of the cell values nationwide for the entire period were divided into quintiles and a color-coded map was generated for each year; the quintile classes were converted to pounds per square mile.


You follow all of that? They proportioned out the farm land in each county by one kilometer cells, allocated to each cell the amount of pesticide known for the county multiplied by the proportion of farmland in the cell, and then rebinned it all to present it on one national map in units of pounds of pesticide used on a per square mile basis. At the scale of the entire country, this conversion from kilometers to miles is a monstrous amount of work which would not change the level of detail one could see on the maps in their report. For their purpose, the conversion step was essentially superfluous!
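For anyone who would rather see the recipe than read it, here is a rough sketch of the allocation step as I understand it from the quoted description. This is my reading, not the USGS code: the function name, the six-cell county and the 1,000 lb total are all hypothetical.

```python
import statistics

SQ_KM_PER_SQ_MI = 2.589988  # one square mile is about 2.59 square kilometers


def allocate_to_cells(county_use_lb, cell_ag_fraction):
    """Split a county total across 1 km^2 cells in proportion to each cell's farm land."""
    total_ag = sum(cell_ag_fraction)
    return [county_use_lb * frac / total_ag for frac in cell_ag_fraction]


# Hypothetical county: 1,000 lb of pesticide and six 1 km^2 cells.
# Each value is the fraction of the cell that is agricultural land;
# the zeros stand in for a city, a river or an interstate.
cell_ag_fraction = [1.0, 0.5, 0.5, 0.0, 0.0, 0.0]
cell_use_lb = allocate_to_cells(1000.0, cell_ag_fraction)

# Each cell is 1 km^2, so lb per cell is lb per km^2; rescale to lb per square mile.
cell_use_lb_per_sq_mi = [lb * SQ_KM_PER_SQ_MI for lb in cell_use_lb]

# Quintile break points. The report pooled all cells nationwide for the
# whole period; this toy example only has six cells to work with.
breaks = statistics.quantiles(cell_use_lb_per_sq_mi, n=5)

print(cell_use_lb)
print(breaks)
```

Cells with no farm land are allocated nothing, which would explain why cities, rivers and highway corridors show up as holes in a map built from county totals.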

One last thing. If you sit down and actually read this USGS report, you'll discover that the usage numbers for almost all the pesticide and herbicide data broken out by county are estimated based on statewide data.

My brain hurts.

Wednesday 20 March 2013

Black Lung Cases Go Up While Black Lung Deaths Go Down.

While pondering a return to deaths caused by nuclear energy, I decided to look at the number of nuclear industry deaths vs. the number of coal mining and coal-burning power plant deaths. Doing this right should involve not only direct deaths (i.e. death by industrial accident) but also indirect deaths from chronic occupational diseases. As I was collecting my data, I spotted a handful of news reports from last summer claiming a resurgence of black lung disease. Two of those reports were done by NPR, a news outlet I like and usually trust for unbiased news.

NPR reported a "surge" of black lung cases (, accessed 3/21/13):

Incidence of the disease that steals the breath of coal miners doubled in the last decade, according to data analyzed by epidemiologist Scott Laney at the National Institute for Occupational Safety and Health (NIOSH).

The NPR report showed the following graphic on their website as support for their statement and cited the National Institute for Occupational Safety and Health ("NIOSH") data (ibid.):


The NPR report along with several other news reports (e.g.,, accessed 3/20/13) claim an increase in black lung cases, especially in coal miners with over 25 years experience and in miners with relatively short experience. The upward trend in the bins on the far right of their graphic appears to support this. (Since this data is based on a NIOSH program for screening miners to find black lung symptoms, I'm labeling this as "screened-miner data" in the rest of this post.)

There's a problem here. CDC numbers for black lung incidence have some big data gaps. Here's the raw NIOSH data fresh off of the CDC website (; accessed 3/20/13):


It is a bit disturbing that in categories with no data, there is a percentage reported. This is a problem since the missing data makes it impossible to test the claim that decadal black lung rates have doubled. There is a possible workaround and that is to use the number of annual black lung deaths. People with advanced-stage black lung do not live long so any increases in the number of new black lung cases should be reflected in the number of black lung deaths. Oddly, this isn't apparent in the death statistics. Here's NIOSH's own graphic for black lung cases by year (; accessed 3/20/13):


There is a way to possibly reconcile the news report claims and the actual raw NIOSH data. The news reports look at the percentage of screened miners with black lung symptoms as revealed by chest x-rays, whereas the raw death statistics deal with death only. Because of this, it is possible that a real increase in black lung cases has not yet had time to impact the reported rates of black lung deaths. If this is the case, then there should be an increase in annual black lung deaths in the immediate future.

There is a second possibility to account for the uptick shown in the NPR graph for the most experienced miners. The NPR graph lumps all the screened-miner data into five year averages. Given the obvious wobble in the annual NIOSH death figures, the apparent increase in the screened-miner averaged data could be a statistical fluke. It's an old trick to massage one's numbers when binning by changing the bin size or shifting the bin position. The trends shown on the NPR graph of NIOSH data are not as nice nor as conclusive if one uses a smaller bin size. Any real trend of increasing black lung cases should be as apparent in the annual data (bin size = 1 year) as in the half-decade data (bin size = 5 years). Here's the NIOSH black lung incidence data plotted by year:


Looking at the black lung rate data on an annual basis shows that there is a lot of variability from year to year. The one period where this is not true is the 1990s, when the rates smoothed out. Given the overall variability, it is possible that the hypothesized increase from the 1990s to the 2000s is really the result of data variability. Given the low overall numbers of black lung cases, variability is not at all surprising. This is how a lot of small datasets behave. At this point, one can argue that the 1990s data are the odd man out here due to their lack of variability. Such a hypothesis is equally plausible compared to a claim that black lung cases have doubled. The variability in the annual plot of black lung rates calls the decadal increase in black lung incidence into question. Given the small number of data points and the gaps in the discrete data, the claimed increase in black lung incidence rests on a dataset with some troubles.
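The point about bin position is easy to demonstrate with a toy series. The sketch below uses invented numbers that wobble from year to year with no trend at all; it is not the NIOSH data, and the helper function is hypothetical.

```python
def bin_means(values, size, offset=0):
    """Average consecutive runs of `size` values starting at `offset`; drop any incomplete bin."""
    usable = values[offset:]
    n_bins = len(usable) // size
    return [sum(usable[i * size:(i + 1) * size]) / size for i in range(n_bins)]


# Invented annual rates (percent) that alternate high and low with no trend.
annual = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5]

bins_from_first_year = bin_means(annual, size=5, offset=0)
bins_from_second_year = bin_means(annual, size=5, offset=1)

print(bins_from_first_year)   # five-year averages when the bins start in year 1
print(bins_from_second_year)  # the same series when the bins start in year 2
```

Start the five-year bins in the first year and the averages fall from one bin to the next. Start them one year later and the same series appears to rise. Real data will not be this tidy, but the more a series wobbles, the more the choice of bin edges can manufacture or hide a trend.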

Another problem with trying to use the screened-miner data is that the screening may not be representative of all miners because NIOSH screening for black lung is voluntary. There is no real control on who gets screened. A further factor involves where NIOSH collected their data. NIOSH offered screening to miners in 16 states; however, NIOSH offered enhanced additional screening to underground miners in just the "hot spot" states of Virginia, West Virginia and Kentucky (, accessed 3/21/13). This raises the possibility of real bias in the NIOSH screened-miner data both by area and by mine type (underground vs. surface).
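How much could voluntary screening skew the numbers? Here is a back-of-the-envelope sketch. The 3% true rate and the participation rates are assumptions picked purely for illustration, not estimates from NIOSH.

```python
def screened_prevalence(true_prevalence, p_screen_if_sick, p_screen_if_healthy):
    """Prevalence seen among volunteers when the odds of showing up depend on health."""
    sick_screened = true_prevalence * p_screen_if_sick
    healthy_screened = (1 - true_prevalence) * p_screen_if_healthy
    return sick_screened / (sick_screened + healthy_screened)


true_rate = 0.03  # assumed true share of miners with black lung symptoms

# Everyone equally likely to volunteer: the screened rate matches the true rate.
equal = screened_prevalence(true_rate, 0.30, 0.30)

# Symptomatic miners twice as likely to volunteer: the screened rate is inflated.
skewed = screened_prevalence(true_rate, 0.60, 0.30)

print(f"equal participation:  {equal:.1%}")
print(f"skewed participation: {skewed:.1%}")
```

The bias can just as easily run the other way. If miners with symptoms avoid screening out of fear for their jobs, swap the two participation rates and the screened rate comes out below the true one. Without knowing who shows up, there is no telling which direction applies.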

Regardless of the decreasing death rate, researchers at NIOSH do believe that the number of black lung cases is increasing (e.g., CDC, Pneumoconiosis and advanced occupational lung disease among surface coal miners--16 states, 2010-2011: MMWR Morb Mortal Wkly Rep. 2012 Jun 15;61(23):431-4); however, even if the black lung rate doubled from the 1990s to 2000s as reported by NPR, that rate would still be an order of magnitude less than rates for the 1970s. And this is what NPR labeled as a "surge" in black lung cases. It is worth noting that the news reports appear to have targeted and emphasized the increased number of black lung cases in the youngest and oldest miners when compared to the non-sensational presentation of data on the NIOSH website and in peer-reviewed studies by NIOSH researchers. A quick cruise through recent papers and abstracts on tells a different story from the news reports. After looking at long-term rates of black lung, the only less-than-trivial increase in black lung disease was in underground miners in Central Appalachia (ibid.). Small underground coal mines were singled out as having five times the rate of black lung compared to large mines, especially in Appalachia. Oddly, x-ray images of surface coal miners showed an unexpected incidence of silicosis along with some observations of black lung. (Laney AS, Attfield MD. (2010): Coal workers' pneumoconiosis and progressive massive fibrosis are increasingly more prevalent among workers in small underground coal mines in the United States. Occup Environ Med. 2010 Jun;67(6):428-31. doi: 10.1136/oem.2009.050757.) It was the news reports which made a big deal out of the relative increase in black lung cases, not NIOSH.

Frankly, it's a mess. Only time will tell if the black lung death rate catches up with the NIOSH screened-miner black lung symptoms data. Given the problems with the black lung incidence rates, using the death stats as a surrogate has great appeal. The death stats have none of the problems that the rate data have. The virtue of death statistics is their simplicity. There is usually no second-guessing or doubts of biasing with death stats. The screened-miner data is really a mess in comparison. While I'm not completely sure that someone was wrong on the internet, it is more than certain that someone was confusing!

Pushing the data around masks the ongoing tragedy of black lung disease. While an increase in cases for the whole country is debatable, there is data to support that black lung cases in Central Appalachia and in small underground mines really have increased. Black lung disease in this country greatly decreased after 1970 because of the regulation of coal dusts that started in 1969. This is a clear cause and effect relationship between regulation and desired result. If the coal dust regulations are faithfully followed, black lung cases become increasingly rare. The tragedy here is that black lung is one of the truly preventable occupational diseases. Arm waving about data trends and variability will not make the black lung "hot spot" in Appalachia go away - only better enforcement of the coal dust regulations will do that.

You may want to note that the Mine Safety and Health Administration's budget for coal mine inspection and safety enforcement has been steadily cut for the last two decades so enforcement of the coal dust regulations is now uncommon compared to 30 years ago.

Anyone can play with the black lung data compiled by NIOSH at: (accessed 3/21/13).