AZDuffman
AZDuffman
  • Threads: 242
  • Posts: 14070
Joined: Nov 2, 2009
January 26th, 2011 at 7:14:24 AM permalink
I need an idea or two how to set a spreadsheet up to analyze baseball streaks. Let me open by saying I have fairly good spreadsheet knowlege but am having a hard time figuring how to get this one to work. First, the thought process.

Any book on sports betting says to be serious you need to know enough to bet on baseball. No other sport has a many games or goes on as long. And in summer if you don't bet baseball there is no other real sports action going unless you want to learn foreign sports. I am not learning crickett. Baseball is known to be the stat-junkee's heaven. I'm not going to the level of figuring RHP vs LHH in a certain park. I don't have the time. But there is something I notice about team win-loss streaks:

1. They are printed in the box scores for eveyone to see
2. They are thus easy to track
3. They rarely reach 10 or more, espically in wins
4. Betting against the winning-streak team should give beter than 1-1 odds.

So what I plan to do is take last years numbers and count how long the win or loss streaks went as well as if they "borke" home or away more often in a meaningful way. I will number the games 1-162 and then put a "W" or "L" for win/loss then "H" or "A" for home/away. Thus the cells in the spreadsheet will look like this:



TEAM: YANKEES

1 W H
2 W H
3 L H
4 W A
5 W A

One team per sheet.

This is where I get stumped. In the next cell I suppose I could put a command say "IF C3 = C4 THEN +1 ELSE 0" for the column. The some kind of running total with an if/then qualifier. Would that work? Then I would need to put the grand totals on a final page, which I might be able to figure out but am stumped before I see it all. So, any ideas on the spreadsheet construction?
All animals are equal, but some are more equal than others
weaselman
weaselman
  • Threads: 20
  • Posts: 2349
Joined: Jul 11, 2010
January 26th, 2011 at 8:15:50 AM permalink
Quote: AZDuffman


This is where I get stumped. In the next cell I suppose I could put a command say "IF C3 = C4 THEN +1 ELSE 0"


Yeah, should work, I think. The correct syntax is something like this: "if(c3=c4;d3+1;0)" (I think you may need to replace semicolons with comas depending on which software you use - IIRC, it's semicolons in excel and open office, but comas in google docs). It also seems more logical to use "1" instead of "0", but that's a question of taste :)
"When two people always agree one of them is unnecessary"
goatcabin
goatcabin
  • Threads: 4
  • Posts: 665
Joined: Feb 13, 2010
January 26th, 2011 at 12:46:28 PM permalink
Quote: AZDuffman

But there is something I notice about team win-loss streaks:

1. They are printed in the box scores for eveyone to see
2. They are thus easy to track
3. They rarely reach 10 or more, espically in wins
4. Betting against the winning-streak team should give beter than 1-1 odds.



Why?
Cheers,
Alan Shank
Woodland, CA
Cheers, Alan Shank "How's that for a squabble, Pugh?" Peter Boyle as Mister Moon in "Yellowbeard"
AZDuffman
AZDuffman
  • Threads: 242
  • Posts: 14070
Joined: Nov 2, 2009
January 26th, 2011 at 12:54:33 PM permalink
Quote: goatcabin

Why?



If you are going to bet sports it is as good a reason as any. Did a little of it last year and was up over a month.
All animals are equal, but some are more equal than others
boymimbo
boymimbo
  • Threads: 17
  • Posts: 5994
Joined: Nov 12, 2009
January 26th, 2011 at 1:01:07 PM permalink
You can test your hypothesis by downloading results from baseball-reference.com.

You can click on any team, and get the schedules and scores for that year, by game

You can then copy and paste the results into a spreadsheet. For example, here is the first 22 games of Toronto's illustrious season last year.

The Win/Loss result is in Column D
The Loss Streak is Column E
The Win Streak is Column F

Rk Tm Opp Result Lose Win
1 TOR TEX L =IF(D10="L",1,0) =IF(D10="W",1,0)
2 TOR TEX W =IF(D11="L",E10+1,0) =IF(D11="W",F10+1,0)
3 TOR TEX W =IF(D12="L",E11+1,0) =IF(D12="W",F11+1,0)
4 TOR BAL W =IF(D13="L",E12+1,0) =IF(D13="W",F12+1,0)
5 TOR BAL W =IF(D14="L",E13+1,0) =IF(D14="W",F13+1,0)
6 TOR BAL W =IF(D15="L",E14+1,0) =IF(D15="W",F14+1,0)
7 TOR CHW L =IF(D16="L",E15+1,0) =IF(D16="W",F15+1,0)
8 TOR CHW W =IF(D17="L",E16+1,0) =IF(D17="W",F16+1,0)
9 TOR CHW L =IF(D18="L",E17+1,0) =IF(D18="W",F17+1,0)
10 TOR CHW W =IF(D19="L",E18+1,0) =IF(D19="W",F18+1,0)
11 TOR LAA L =IF(D20="L",E19+1,0) =IF(D20="W",F19+1,0)
12 TOR LAA L =IF(D21="L",E20+1,0) =IF(D21="W",F20+1,0)
13 TOR LAA L =IF(D22="L",E21+1,0) =IF(D22="W",F21+1,0)
14 TOR KCR W =IF(D23="L",E22+1,0) =IF(D23="W",F22+1,0)
15 TOR KCR W =IF(D24="L",E23+1,0) =IF(D24="W",F23+1,0)
16 TOR KCR L =IF(D25="L",E24+1,0) =IF(D25="W",F24+1,0)
17 TOR TBR W =IF(D26="L",E25+1,0) =IF(D26="W",F25+1,0)
18 TOR TBR L =IF(D27="L",E26+1,0) =IF(D27="W",F26+1,0)
19 TOR TBR L =IF(D28="L",E27+1,0) =IF(D28="W",F27+1,0)
20 TOR BOS L =IF(D29="L",E28+1,0) =IF(D29="W",F28+1,0)
21 TOR BOS L =IF(D30="L",E29+1,0) =IF(D30="W",F29+1,0)
22 TOR BOS L =IF(D31="L",E30+1,0) =IF(D31="W",F30+1,0)
----- You want the truth! You can't handle the truth!
guido111
guido111
  • Threads: 10
  • Posts: 707
Joined: Sep 16, 2010
January 26th, 2011 at 3:03:25 PM permalink
Quote: boymimbo

You can test your hypothesis by downloading results from baseball-reference.com.


I have always just imported a table into Excel, deleted the columns I did not want and I was set.
baseball-reference.com season tables has a streak column at the far right, but it is quite useless IMHO.
-
--
+
++
+++
What is that all about?
I use =Len() function with a few IFs and get something like this.

Gm# Streak loss win
1 - 1 0
2 -- 2 0
3 + 0 1
4 ++ 0 2
5 - 1 0
6 -- 2 0
7 + 0 1
8 - 1 0
9 + 0 1
10 ++ 0 2
11 - 1 0
12 + 0 1
13 - 1 0
14 + 0 1
15 - 1 0
16 -- 2 0
17 + 0 1
18 - 1 0
19 -- 2 0
20 --- 3 0
21 ---- 4 0
22 ----- 5 0
23 + 0 1
24 ++ 0 2
25 +++ 0 3


Good Luck breaking streaks!
AZDuffman
AZDuffman
  • Threads: 242
  • Posts: 14070
Joined: Nov 2, 2009
January 30th, 2011 at 11:31:51 AM permalink
One more formula idea needed. Thank you both for the great site (I wasted lots of time at MLB.com downloading schedules to do it manually, oh well!) I have been able to copy teams in and get the streak results. But now I am at a loss where to go for the next part(s.)

I want to be able to put a summary at the bottom and show how many streaks there were of a given legnth. ie: there were 5 win streaks of 2 games, 6 of 3 games, etc. I am near a total loss on this one. Is there a openoffice command to "rank the modes?" Not even sure if that is the right terminology.

My final goal is to be able to get something like the following. Using any team for an example:


In 2010 there were "x" streaks of at least 1 win in a row
there were "y" streaks of at least 2 wins in a row
there were "z" streaks of at least 3 wins in a row

Then the same for losses, though I am looking more to bet against a winning streak. So a .500 team would have at least 81 streaks of at least 1 win in a row. Then put it all on a summary sheet for all of MLB.

Or am I thinkng on this wrong? My goal is to find when say 50% of winning streaks break.
All animals are equal, but some are more equal than others
guido111
guido111
  • Threads: 10
  • Posts: 707
Joined: Sep 16, 2010
February 27th, 2011 at 11:24:40 PM permalink
Quote: AZDuffman


Then the same for losses, though I am looking more to bet against a winning streak.
So a .500 team would have at least 81 streaks of at least 1 win in a row. Then put it all on a summary sheet for all of MLB.

Or am I thinkng on this wrong? My goal is to find when say 50% of winning streaks break.



Spring training around the corner. How are your studies coming?
Just in case you have not figured out the answer to your question.
Yes, your math is off.
81.5 would be the total average of win and lose runs. (streaks)

For a 162 game MLB season, .500 win percentage
Average # of win streaks(run is the math term) is 40.750
Mode 42
SD 3.18
10 million season sim distribution results

w runs freq freq/100
25 2 0.00%
26 7 0.00%
27 22 0.00%
28 99 0.00%
29 406 0.00%
30 1302 0.01%
31 4142 0.04%
32 11425 0.11%
33 28855 0.29%
34 65642 0.66%
35 134208 1.34%
36 247190 2.47%
37 414186 4.14%
38 628217 6.28%
39 862772 8.63%
40 1076148 10.76%
41 1214810 12.15%
42 1242931 12.43%
43 1156223 11.56%
44 975464 9.75%
45 745298 7.45%
46 515560 5.16%
47 323582 3.24%
48 184150 1.84%
49 94106 0.94%
50 44163 0.44%
51 18473 0.18%
52 7069 0.07%
53 2506 0.03%
54 771 0.01%
55 216 0.00%
56 45 0.00%
57 7 0.00%
58 3 0.00%


The expeced number of runs of each length

run EV
1 20.50
2 10.19
3 5.06
4 2.52
5 1.25
6 0.62
7 0.31
8 0.15
9 0.08
10 0.04


I do not use open office, only Excel so I could assist you if you have access to Excel.
My son's friend's father passed away last week and he gave my son his dads laptop. Nothing fancy but a lot of Excel 2007 folders. Says he was into gambling math.
I have looked through it and could not figure out how he did many of his formulas found in his worksheets. To make a long story short, I found all his hidden folders and have found some treasures.
They are in Excel 2007 and would have to be converted back to Excel .xls format instead of .xlsx if you have access to Excel before version 2007.

Here are a few snap shots of what I threw together for the 2009 LA Dodgers season.
Let me know if you are interested and I could send you some formulas or a copy of the worksheet.

This photo shows the distribution of each streak, it was on a different page but could easily be incorporated onto the same worksheet as below. The grey cells are the expected values.

This shows the running count in numbers in columns as well as a graph.


Have fun busting win streaks this season!

Batter Up!
  • Jump to: