oldernow
2024-04-24 14:14:44 UTC
online text, and whoever owns the site containing it keeps
the format the same for fairly long periods of time?
A couple of years ago, I wrote a script called "mlb" that
scrapes a site showing American Major League Baseball
standings (it's not the official mlb.com site, which is
utterly useless for that, being a typical modern
JavaScript nightmare). And it still works! I present it
as a fairly simple example of how to leverage Lua for
such a task, in case a Lua neophyte chances upon this post:
(Pardon the absence of comments and intervening empty
lines, but I no longer believe in such: I generally
can't figure out what my comments mean no matter
how hard I try to make them enduringly meaningful, and I
prefer being able to see more code at once than having
parts of it set apart by empty lines.)
(In a nutshell, it gets formatted web page output
via "elinks -dump", parses it, and presents just the
information I'm interested in via "less -X". I can't
remember what the -X accomplishes, and the 'less' man page
didn't clarify what I apparently understood back when I
wrote the script.)
----------------------------------------------------------
#! /usr/bin/env lua
local show = false
local function remove_playoff_indicators(s)
  s = string.gsub(s, ' x%-', ' ')
  s = string.gsub(s, ' y%-', ' ')
  return s
end
local out = io.popen('less -X', 'w')
local handle = io.popen('elinks -dump https://www.baseball-reference.com/leagues/MLB-standings.shtml')
local function emit(line)
  line = string.gsub(line, '%[%d+%]', ' ')
  if not string.match(line, '^%s*$') and string.match(line, '^%s') then
    out:write(line .. '\n')
  end
end
for line in handle:lines() do
  line = remove_playoff_indicators(line)
  if show then
    if string.match(line, 'Major League Baseball Detailed Standings') then
      break
    end
    emit(line)
  elseif string.match(line, 'East Division Table') then
    show = true
    emit(line)
  end
end
handle:close()
out:close()
----------------------------------------------------------
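For anyone who wants to poke at the parsing without running elinks or less, here's a hedged sketch that pulls the script's filtering rules into a pure function working on a table of lines. The name `filter_standings` and the sample input are made up for illustration; they are not part of the original script. It also returns whether the start marker was ever seen, which is one cheap way to notice when the site's format finally does change.

```lua
-- Hypothetical refactor (not the original "mlb" script): the same filtering
-- rules, applied to a plain Lua table of lines instead of a pipe.
local START = 'East Division Table'
local STOP = 'Major League Baseball Detailed Standings'

local function remove_playoff_indicators(s)
  s = string.gsub(s, ' x%-', ' ')
  s = string.gsub(s, ' y%-', ' ')
  return s
end

-- Returns the kept lines and whether START was ever seen; a false flag
-- is a cheap hint that the page layout has changed.
local function filter_standings(lines)
  local kept, show = {}, false
  for _, line in ipairs(lines) do
    line = remove_playoff_indicators(line)
    if show and string.match(line, STOP) then
      break
    end
    if not show and string.match(line, START) then
      show = true
    end
    if show then
      -- Replace [n] markers with a space, then keep only non-blank
      -- lines that start with whitespace.
      line = string.gsub(line, '%[%d+%]', ' ')
      if not string.match(line, '^%s*$') and string.match(line, '^%s') then
        kept[#kept + 1] = line
      end
    end
  end
  return kept, show
end

-- Made-up sample input, loosely shaped like an elinks dump.
local sample = {
  'Some page header',
  '   East Division Table[12]',
  '   Tm W L W-L% GB',
  '   y-New York Yankees 16 8 .667 --',
  '',
  'Major League Baseball Detailed Standings',
  '   this line is never reached',
}

local kept, found = filter_standings(sample)
if not found then
  io.stderr:write('start marker not found; page layout may have changed\n')
end
for _, line in ipairs(kept) do
  print(line)
end
```

To use it in the real script, the lines from `handle:lines()` would be collected into a table first, and each kept line written to the `less -X` pipe with `out:write(line .. '\n')`.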
How the output in "less" looks at the moment:
----------------------------------------------------------
   East Division Table
   Tm                      W   L  W-L%    GB
   New York Yankees       16   8  .667    --
   Baltimore Orioles      15   8  .652   0.5
   Boston Red Sox         13  11  .542   3.0
   Toronto Blue Jays      13  11  .542   3.0
   Tampa Bay Rays         12  13  .480   4.5
   Central Division Table
   Tm                      W   L  W-L%    GB
   Cleveland Guardians    17   6  .739    --
   Detroit Tigers         14  10  .583   3.5
   Kansas City Royals     14  10  .583   3.5
   Minnesota Twins         9  13  .409   7.5
   Chicago White Sox       3  20  .130  14.0
   West Division Table
   Tm                      W   L  W-L%    GB
   Seattle Mariners       12  11  .522    --
   Texas Rangers          12  12  .500   0.5
   Los Angeles Angels     10  14  .417   2.5
   Oakland Athletics       9  15  .375   3.5
   Houston Astros          7  17  .292   5.5
   East Division Table
   Tm                      W   L  W-L%    GB
   Atlanta Braves         16   6  .727    --
   Philadelphia Phillies  15   9  .625   2.0
   New York Mets          12  11  .522   4.5
   Washington Nationals   10  12  .455   6.0
   Miami Marlins           6  19  .240  11.5
   Central Division Table
   Tm                      W   L  W-L%    GB
   Milwaukee Brewers      14   8  .636    --
   Chicago Cubs           14   9  .609   0.5
   Cincinnati Reds        13  10  .565   1.5
   Pittsburgh Pirates     13  11  .542   2.0
   St. Louis Cardinals    10  14  .417   5.0
   West Division Table
   Tm                      W   L  W-L%    GB
   Los Angeles Dodgers    14  11  .560    --
   San Diego Padres       13  13  .500   1.5
   Arizona Diamondbacks   12  13  .480   2.0
   San Francisco Giants   12  13  .480   2.0
   Colorado Rockies        6  18  .250   7.5
(END)
----------------------------------------------------------
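If you want to double-check the scraped numbers, here's a small hedged sketch that recomputes the W-L% and GB columns from W and L alone. It assumes the usual definitions (W-L% is W / (W + L); games behind is half the gap between the leader's wins-minus-losses margin and a team's). `pct` and `summarize_division` are hypothetical helpers, not part of the script, and the data is copied from the first East Division table above.

```lua
-- Hypothetical helpers (not part of the "mlb" script) that recompute the
-- W-L% and GB columns from wins and losses, assuming the usual definitions:
--   W-L% = W / (W + L), printed to three places without the leading zero
--   GB   = (leader's W-L margin - team's W-L margin) / 2
local function pct(w, l)
  return (string.format('%.3f', w / (w + l)):gsub('^0', ''))
end

local function summarize_division(teams)
  -- The leader is the team with the largest wins-minus-losses margin.
  local best = -math.huge
  for _, t in ipairs(teams) do
    best = math.max(best, t.w - t.l)
  end
  local rows = {}
  for _, t in ipairs(teams) do
    local gb = (best - (t.w - t.l)) / 2
    rows[#rows + 1] = {
      name = t.name,
      pct = pct(t.w, t.l),
      -- Zero games back is shown as '--', like the leader rows above.
      gb = gb == 0 and '--' or string.format('%.1f', gb),
    }
  end
  return rows
end

-- Wins and losses copied from the first East Division table above.
local east = {
  { name = 'New York Yankees', w = 16, l = 8 },
  { name = 'Baltimore Orioles', w = 15, l = 8 },
  { name = 'Boston Red Sox', w = 13, l = 11 },
  { name = 'Toronto Blue Jays', w = 13, l = 11 },
  { name = 'Tampa Bay Rays', w = 12, l = 13 },
}

for _, r in ipairs(summarize_division(east)) do
  print(string.format('%-20s %s %5s', r.name, r.pct, r.gb))
end
```

The printed percentages and GB values match that table (.667 and --, .652 and 0.5, and so on down to .480 and 4.5).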
Ain't that a thing of pure beauty to any "terminal first"
- or, better yet, "terminal only" - types out there?
Also, go Yankees and Brewers! :-)
--
oldernow
xyz001 at nym.hush.com