This paper explores the relationship between the average attendance of each baseball team, along with the temperature of the city in which the team is located, and the success of that team in its last home game. Although factors other than these can strongly influence which team wins or loses a baseball game, it is hypothesized that the nicer the weather and the more people attending, the better the home team plays. What the study finds is that common sense does not always prevail: variables that seem likely to be determining factors may in fact not be significant at all.
There is nothing like baseball on a sunny, warm spring afternoon stretching into the evening. Baseball has long been considered America’s favorite pastime. It is a simple sport in which two teams take turns swinging a bat at a ball thrown by the opposing team. The professional record of baseball started with the New York metropolitan area's sudden obsession with the sport during the mid-1800s and with the National Association of Base Ball Players (Sullivan). The association lasted from 1871 to 1875, until the first formally structured league, the National League, was formed. Later, in 1901, the American League joined the National League as a formal, professional circuit of competitive baseball. Today, the same two leagues exist and the sport is as popular as ever. Each year, the champions of the American and National Leagues meet in an event known as “The World Series of Baseball” to compete for the ultimate championship. The excitement generated by the event, and by the games leading up to it, serves as a social and cultural bond among people in society, despite their differences. The 1980s era strengthened that cultural shift in sports. Today, baseball is still considered one of America’s favorite sports and a major source of leisure and recreation for Americans of all backgrounds. This paper provides a statistical analysis of temperature and average attendance versus the home team’s ability to score. It is hypothesized that the better the weather and the higher the average attendance, the better the team’s performance as measured by runs scored.
The most important part of statistical analysis is acquiring raw data. Without data, there would be nothing to analyze. In the case of baseball, data was well kept long before the widespread use of databases and electronic recording systems. Some general statistics kept on baseball are each team's average attendance over the course of a season, its total attendance for the entire season, the league's average total attendance for the season, and team winning percentage. There are also extensive records kept on batting and pitching averages. There is even a website known as “wunderground” that displays every city where a major league baseball team is playing and the weather in that city. For statistical analysis, baseball is a great sport to study because of the availability of statistics.
There are several different sources of data used in this statistical analysis. First, since the study uses temperature and attendance as independent variables, an issue of scale needs to be resolved. The attendance statistics readily available through the Major League Baseball website are displayed as each team's average for the entire year. This would be acceptable if the study were comparing attendance with the average temperature of the team's city for the entire year. However, weather varies widely because of the different seasons caused by the earth’s tilt on its axis (Earth’s). Therefore, this study calls for a game-by-game analysis. Unfortunately, the only easily accessible attendance records are the various teams’ average attendance over the course of an entire season. To solve this difference in “scale”, this study assumes that attendance is constant from game to game and equal to the yearly average, leaving variation only in weather and runs scored. The data that has been collected is displayed in a table located in the appendix of the paper. The names and attendance records of the teams were collected from the Baseball Almanac's attendance records. For each team, the average attendance for each year is listed. The records go as far back as the early 1900s. Many new teams have emerged since then, meaning their entire history of average attendance is recorded. This type of information is difficult to find for other sports. The weather was simply collected from The Weather Channel’s website, city by city, on or around the date of May 7, 2013. The temperature for each city varies from 65 degrees to 84 degrees. Finally, the runs scored by each team at its last home game serve as the dependent variable of the study.
There are several different statistics that can be pulled from the data acquired. The average attendance of the baseball teams is 30,893 attendees per game, per team; it can be clearly seen that baseball is a lucrative industry. The average temperature for the games sampled is 73.33 degrees, a perfect temperature that is neither too hot nor too cool. The average number of runs scored per game is about 4. This shows that a certain level of excitement can be expected during each game, whereas other sports like soccer can continue for some time without any scoring. For each variable, the median falls slightly short of the average. There is no mode for attendance due to the wide range, but temperature has two modes (69 and 76) and runs has a single mode of 4.
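As an illustration of how summary statistics like these can be computed, the following is a minimal sketch using Python's standard `statistics` module. The list `temps` is a hypothetical set of temperatures chosen for illustration only; it is not the data from the appendix. The sample variance and standard deviation are included alongside the measures of central tendency.

```python
import statistics

# Hypothetical temperatures in degrees Fahrenheit, for illustration only;
# these are NOT the values collected for the study.
temps = [65, 69, 69, 72, 74, 76, 76, 84]

mean_temp = statistics.mean(temps)        # arithmetic average
median_temp = statistics.median(temps)    # middle value of the sorted data
modes_temp = statistics.multimode(temps)  # every most-frequent value (may be several)
var_temp = statistics.variance(temps)     # sample variance (divides by n - 1)
sd_temp = statistics.stdev(temps)         # sample standard deviation = sqrt(variance)

print(mean_temp, median_temp, modes_temp, var_temp, sd_temp)
```

`multimode` is used rather than `mode` because a variable can have more than one mode, as temperature does in this study.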
The averages for each of the variables are helpful to note in order to have an idea of the central tendency of the data. The variances are a little less helpful in that they do not present much new information about the variables. Nonetheless, finding the variance is essential to deriving the standard deviation, which is its square root. The standard deviation can arguably be considered the most important statistic because it describes the range around the mean within which the bulk of the population is located. With the descriptive statistics of the independent and dependent variables noted, the following two tables show the individual relationships of temperature and attendance with runs scored at home.
Initially, it appears there may be a positive relationship between the temperature and runs scored in the last home game. However, there appears to be no relationship between the average attendance and runs scored in the last home game. It is important to note the relationships between the individual variables before moving on to the all-encompassing regression analysis. It can already be predicted that average attendance will have little determining effect on the number of runs scored in the last home game. At the same time, it is possible that the sample selected is not representative of the correlation in the whole population of data, or that the effects of attendance play different roles at different points in the year. In May, when the season is still young, the crowd may be enjoying the game peacefully and watching what goes on. But near October, when the playoffs approach, the crowd’s energy can intensify greatly, boosting the morale of the home team while disturbing the concentration of the visiting team. As of now, it appears that there are more factors that may affect the runs scored in the last home game.
This section provides and analyzes the frequency distribution of each variable and the implications it has for the study. The following is the frequency distribution for each variable:
First is the frequency distribution for average attendance. It appears the average is around 25,000 to 30,000, but then there are those major teams that attract averages between 40,000 and 45,000.
The results of the frequency distribution show an even spread surrounding the mean, a significant difference from the frequency distribution of temperature. The temperature curve seems to be slightly less normally distributed, with the average temperature falling between 70 and 74 degrees and a couple of outliers in southern cities like Houston.
Finally, the frequency distribution for runs scored seems to be centered between 1 run and 4 runs, with the higher run totals pulling the average up to four. However, 66% of the runs scored fall between 1 and 4. It is interesting to note that no home team in our sample scored 0 runs.
These frequency distribution histograms are helpful in visualizing how the values of each variable are spread out. This is important when moving forward in the study.
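A frequency distribution of this kind can be built by counting how many observations fall into each equal-width bin. The sketch below is one way to do it; the function name `frequency_distribution`, the bin width of 5 degrees, and the `temps` list are all assumptions made for illustration and are not the study's data or binning.

```python
from collections import Counter

def frequency_distribution(values, bin_width, start):
    """Count how many values fall in each bin [lo, lo + bin_width)."""
    counts = Counter()
    for v in values:
        # Lower edge of the bin that contains v.
        lo = start + ((v - start) // bin_width) * bin_width
        counts[lo] += 1
    return dict(sorted(counts.items()))

# Hypothetical temperatures, for illustration only (not the study's data).
temps = [65, 69, 69, 72, 74, 76, 76, 84]
dist = frequency_distribution(temps, bin_width=5, start=65)

# Print a simple text histogram, one row per bin.
for lo, n in dist.items():
    print(f"{lo}-{lo + 4}: {'#' * n}")
```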
Before the study can proceed with hypothesis testing, a regression analysis must be constructed in order to show the relationship between the independent and dependent variables. For the purposes of this study, and due to the differences in scale between the two independent variables and the dependent variable, the method uses a rating system to rate the attendance and weather data. Since it is hypothesized at the beginning of the study that the higher the temperature and the attendance of the team, the higher the runs scored during the last home game, data near the bottom of the distribution will be rated as “1” and data towards the top will be rated as “10”. The ratings of each variable for each team are listed in the appendix.
The method used for rating each variable depended on the range of that variable. The range, as identified in the “Presentation of the Data” section, was divided by 10 to get 10% of the range. For every 10% of the range by which a statistic exceeded the lowest value, its rating received a point. For example, the range for temperature is 20 degrees. Dividing 20 by 10 results in 2 degrees being 10% of the range. Every two degrees above the lowest temperature recorded, in this case 65 degrees, resulted in a rating increase of one. Thus 69 degrees qualifies as a rating of 2 on the 1-to-10 scale. Based on this type of rating assignment, the study can test the hypothesis of whether or not these variables have an effect on the runs scored during a team’s last home game. The rating for attendance was added to the rating for temperature, and the total was then correlated with the rating for runs scored at home in the following regression analysis.
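The rating rule can be sketched in code as follows. This is only one reading of the description above: the function names `rate` and `total_rating` are hypothetical, and flooring the result and clamping it to the 1-to-10 scale (so that the lowest value receives a 1 rather than a 0) are assumptions, since the text does not spell out how boundary values were handled.

```python
import math

def rate(value, lowest, value_range, levels=10):
    """Rate a value on a 1..levels scale, one point per 10% of the range.

    Assumption: partial steps are rounded down, and the result is clamped
    so that the lowest values still receive a rating of 1.
    """
    step = value_range / levels                 # e.g. 20 degrees / 10 = 2 degrees
    points = math.floor((value - lowest) / step)
    return max(1, min(levels, points))

def total_rating(attendance_rating, temperature_rating):
    """Combined rating that is correlated with the runs-scored rating."""
    return attendance_rating + temperature_rating

# The worked example from the text: 69 degrees, lowest 65, range 20.
print(rate(69, lowest=65, value_range=20))
```

Under these assumptions the call reproduces the rating of 2 given for 69 degrees in the example.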
As can be clearly seen, there is hardly any correlation between attendance, temperature, and the runs scored in the last home game. In fact, this specific sample regression equation shows an ever so slight negative relationship between the Total Rating and the Runs at Home Rating. R2 is known as the coefficient of determination; it comes from the coefficient of correlation (r), squared, which discards the positive or negative sign.
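The relationship between r and R2 can be illustrated with a short computation. The sketch below computes the coefficient of correlation directly from its definition and then squares it; the function name `pearson_r` and the two rating lists are hypothetical and are not taken from the study's appendix.

```python
import math

def pearson_r(xs, ys):
    """Coefficient of correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ratings, for illustration only (not the study's data).
total_ratings = [2, 5, 8, 11, 14]
runs_ratings = [6, 4, 7, 3, 5]

r = pearson_r(total_ratings, runs_ratings)
r_squared = r ** 2  # coefficient of determination; the sign of r is lost

print(r, r_squared)
```

Because squaring removes the sign, the direction of a relationship has to be read from r or from the slope of the regression line rather than from R2 itself.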
Estimation with 95% Confidence. In an outdoor sport influenced by thousands of fans each game, could it be true that the attendance and weather at a specific team's stadium have little to no effect on the team’s performance? Do teams with better-subsidized stadiums equipped for weather conditions have a higher turnout? To address this question, our study looks to the standard hypothesis test at the 95% confidence level. The first step in hypothesis testing is stating the null and alternative hypotheses. In this case:
H0 : Xbar (Sample Average) = μ (Population Average)
Ha : Xbar (Sample Average) ≠ μ (Population Average)
The next step is to identify the critical values. For the two-tailed test stated above at the 95% confidence level, a critical value of about 2.04 is needed, as the standard t-table shows in the 30 degrees of freedom row (the one-tailed value in the same row is about 1.70). Next, our test statistic must be derived from the standard formula. Generally, this involves subtracting the population average from the sample average and dividing the difference by the standard error, that is, the sample standard deviation divided by the square root of the sample size. In the context of this study, finding the population average for total rating versus runs-scored-at-home rating for the entire population of games would involve thousands and thousands of combinations, making it very difficult to compute in this specific study. This leads us to the lessons learned.
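For reference, the standard one-sample test statistic described in this section can be sketched as below. The function name `one_sample_t`, the sample values, and the population mean of 4.0 are all hypothetical; as noted, the study itself did not have a population average available, so this shows only the form of the calculation.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """t = (sample mean - population mean) / (s / sqrt(n))."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - population_mean) / standard_error

# Hypothetical sample and population mean, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
t = one_sample_t(sample, population_mean=4.0)

print(t)
```

The null hypothesis would be rejected only if the absolute value of t exceeded the critical value read from the t-table for the chosen confidence level and degrees of freedom.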
Baseball is a sport that has been around for generations. It is considered one of America’s favorite pastimes. It is because of Americans' love for the sport that such extensive documentation and records have been kept of each game for as long as the sport has been played professionally. Of all the records kept, not many attempt to correlate the weather with home team performance. There is also the concept of “home field advantage”, the idea that teams perform better when their home crowd makes noise. Any reasonable person might hypothesize that the better the weather is, the better the performance of the players will be. Any reasonable person might also hypothesize that the more fans in attendance, the higher the team's morale and the more likely the team is to win. However, as this study showed through a sample of 30 games, there is, in fact, little to no correlation between either of these variables and the success of the home team. Why is that?
Part of the reason these factors do not affect baseball is that many other things may come into play when determining the success of a home team. One important factor to consider is who the pitcher is. If the visiting team decides to start its best pitcher, it may be a long while before the home team finally gets a hit. Another factor likely explaining the lack of a relationship between attendance, temperature, and runs scored at home is the abundance of games played. Each team plays 162 games a year in many different parts of the country. Because the season takes place from mid-April to mid-October, the weather is generally warm no matter where the team travels and is unlikely to cause concern. Finally, baseball is not much of an interactive sport. It requires a lot of marketing to get people in the stands. Therefore, although there may be a full stadium of people, they are likely enjoying their day instead of yelling and cheering on every pitch. In short, many different factors go into the success of a baseball team. It may be intuitive to assume that attendance and temperature affect the success of a home team (such as with Notre Dame's turnouts), but this study suggests that the assumption does not hold.
There are many little things one may assume simply on the basis of common sense. This study showed that common sense can sometimes lead one to an incorrect conclusion. Although baseball is a well-documented sport, the combinations of teams and games played are too numerous to test the population of games as a whole. In this sample study of 30 randomly selected games, the coefficient of determination for the relationship between the independent variables, attendance and temperature, and the dependent variable, runs scored at home, was .00127. This means that .127% of the variation in the dependent variable, the runs scored at home, is explained by these variables, and 99.873% is explained by other factors. It is on this basis, instead of the sample vs. population testing, that we conclude with over 95% confidence that temperature and attendance do not significantly affect the success of a home team in the sport of baseball.
Baseball Attendance by Baseball Almanac. (2013) Baseball Attendance by Baseball Almanac. Retrieved from <http://www.baseball-almanac.com/baseball_attendance.shtml>.
Earth’s Seasons - Zoom Astronomy. (n.d.) EARTH'S Seasons - Zoom Astronomy. Retrieved from <http://www.enchantedlearning.com/subjects/astronomy/planets/earth/Seasons.shtml>.
Freedman, D., Pisani, R., & Purves, R. (2007). Statistics. New York: W.W. Norton & Company.
National and Local Weather Forecast, Hurricane, Radar, and Report. (n.d.) The Weather Channel. Retrieved from <http://www.weather.com/>.
Scoreboard. (n.d.) Major League Baseball. Retrieved from <http://mlb.mlb.com/mlb/scoreboard/index.jsp?tcid=nav_mlb_scoreboard>.
Sports Weather. (n.d.) Major League Baseball Weather. Retrieved from <http://www.wunderground.com/sports/MLB/>.
Sullivan, D. A. (1997). Early Innings: A Documentary History of Baseball, 1825-1908. Lincoln: University of Nebraska.