Getting Stock Quotes from www.settrade.com

Many thanks to m3rLinEz, whose page showed me how to fetch stock data from http://www.settrade.com.

Please take a look at his Python code via this link: http://www.solidskill.net/post/Fetch-SET-Data-with-Python.aspx. The code runs in 3 steps, from requesting an HTML page through to parsing it.

That is:

  1. To get data: urllib2
    http://docs.python.org/library/urllib2.html#urllib2.urlopen
    http://personalpages.tds.net/~kent37/kk/00010.html
    http://www.doughellmann.com/PyMOTW/urllib2/
    http://www.voidspace.org.uk/python/articles/urllib2.shtml
  2. Fault-tolerant HTML parser: Beautiful Soup
    http://www.crummy.com/software/BeautifulSoup/documentation.html
    http://segfault.in/2010/07/parsing-html-table-in-python-with-beautifulsoup/
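As a rough sketch of step 1 on Python 3, where urllib2 was folded into urllib.request, a request for one page of history could be prepared as below. The helper name build_request is hypothetical; the URL pattern and the User-agent value are the ones used in the listing in this post. Nothing is sent over the network here; urllib.request.urlopen(request) would do that.

```python
import urllib.request

# URL pattern taken from the listing in this post
SET_URL_FORMAT = ('http://www.settrade.com/C04_02_stock_historical_p1.jsp'
                  '?txtSymbol=%s&from=%d')


def build_request(symbol, start=1):
    """Prepare (but do not send) a request for one page of history.

    build_request is a hypothetical helper, not part of any library.
    """
    url = SET_URL_FORMAT % (symbol, start)
    # Use a browser-like agent string, as the listing in this post does
    return urllib.request.Request(url, headers={'User-agent': 'Mozilla/5.0'})


request = build_request('ptt')
print(request.full_url)
print(request.get_header('User-agent'))
```

The response body would then go to the parser in step 2.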

However, I still had trouble requesting the HTML page with m3rLinEz's code, so I modified parts of it as follows.

'''
Created on Oct 16, 2010

@author: ipas
'''

import re
import urllib2
from datetime import datetime
from decimal import Decimal

from BeautifulSoup import BeautifulSoup


class DayData:
    """One trading day of data for a symbol."""
    date        = None
    open        = None
    close       = None
    max         = None
    min         = None
    volume      = None
    value       = None
    set_index   = None

    @staticmethod
    def strip_td(cell):
        # Drop the surrounding <td ...> and </td> tags, keeping only the cell text.
        # flags must be passed by keyword (Python 2.7+): the fourth positional
        # argument of re.sub is count, not flags.
        text = str(cell)
        text = re.sub('<td[^>]*>', '', text, flags=re.I)
        text = re.sub('</td>', '', text, flags=re.I)
        return text.strip()

    @staticmethod
    def to_datetime(cell):
        # The cell is read as dd/mm/yy; the two-digit year is taken as 20yy.
        ds = [int(x) for x in DayData.strip_td(cell).split('/')]
        return datetime(ds[2] + 2000, ds[1], ds[0])

    @staticmethod
    def to_decimal(cell):
        # Remove thousands separators before converting.
        return Decimal(DayData.strip_td(cell).replace(',', ''))


class SETFetch:

    @staticmethod
    def fetch(symbol):
        set_url_format  = 'http://www.settrade.com/C04_02_stock_historical_p1.jsp?txtSymbol=%s&from=%d'
        cur_pos         = 1
        all_data        = []

        # Read page after page until one comes back with no data rows
        while True:
            request   = urllib2.Request(set_url_format % (symbol, cur_pos))
            #request.add_header('If-Modified-Since', 'Sun, 02 Mar 2008 04:00:08 GMT')
            request.add_header('User-agent', 'Mozilla/5.0')
            response  = urllib2.urlopen(request)
            page      = response.read()

            # Clean the dirty HTML: strip these tags before parsing
            for pattern in ('<form[^>]+>', '</form>', '<input[^>]+>', '<img[^>]+>', '</a>'):
                page = re.sub(pattern, '', page, flags=re.I)

            # Parse the HTML; data rows carry one of two CSS classes
            soup      = BeautifulSoup(page)
            read_data = soup.findAll('tr', 'tdbg_gray20') + soup.findAll('tr', 'tdbg_white20')

            if len(read_data) == 0: # no more data to read
                break

            cur_pos  = cur_pos + len(read_data)
            all_data = all_data + read_data
            # break # --debug-- (uncomment to stop after the first page)

        # Convert the cells to typed values
        daily = []
        for day in all_data:
            flds    = day.findAll('td')
            current = DayData()
            current.date        = DayData.to_datetime(flds[0])
            current.open        = DayData.to_decimal(flds[1])
            current.max         = DayData.to_decimal(flds[2])
            current.min         = DayData.to_decimal(flds[3])
            current.close       = DayData.to_decimal(flds[5])
            current.volume      = DayData.to_decimal(flds[8]) * 1000
            current.value       = DayData.to_decimal(flds[9])
            current.set_index   = DayData.to_decimal(flds[10])
            daily.append(current)

        # Newest day first
        daily.sort(key=lambda x: x.date, reverse=True)
        return daily


if __name__ == '__main__':
    print 'test'
    daily = SETFetch.fetch('ptt')
    for day in daily:
        print day.date, day.close, day.value
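
One detail of re.sub is easy to trip over in code like this: its fourth positional parameter is count, not flags. Passing re.I in that position limits the number of replacements to 2 (the integer value of re.I) and leaves matching case-sensitive. A small standard-library check on a made-up HTML snippet illustrates the difference:

```python
import re

html = '<IMG src="a.gif"><img src="b.gif"><img src="c.gif"><img src="d.gif">text'

# What re.sub(pattern, '', html, re.I) actually means: re.I lands in `count`.
# The keyword is spelled out here to make that explicit.
as_count = re.sub('<img[^>]+>', '', html, count=int(re.I))

# What was intended: case-insensitive matching, no limit on replacements.
as_flags = re.sub('<img[^>]+>', '', html, flags=re.I)

print(as_count)  # <IMG src="a.gif"><img src="d.gif">text
print(as_flags)  # text
```

The flags keyword of re.sub is available from Python 2.7 onward.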
 

All data were gathered from http://www.settrade.com.
http://marketdata.set.or.th/mkt/mainboardstocklistresult.do
http://www.settrade.com/C04_02_stock_historical_p1.jsp
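
The historical page above is read in chunks through its from parameter: each request starts at a row offset, and the offset advances by the number of rows just received until a page comes back empty. That loop can be sketched on its own, without any network access. The names read_all, fetch_page and fake_page, and the page size of 20, are hypothetical stand-ins for the download-and-parse step, not part of the site or of any library.

```python
def read_all(fetch_page, start=1):
    """Collect rows page by page until a page comes back empty.

    fetch_page(offset) is a hypothetical callable that returns the list of
    rows beginning at the given 1-based offset.
    """
    rows = []
    offset = start
    while True:
        page = fetch_page(offset)
        if not page:          # no more data to read
            break
        rows.extend(page)
        offset += len(page)   # next request starts after the rows already seen
    return rows


# Stand-in for the site: 45 rows, served at most 20 at a time (assumed size)
DATA = list(range(1, 46))


def fake_page(offset, page_size=20):
    return DATA[offset - 1:offset - 1 + page_size]


rows = read_all(fake_page)
print(len(rows))  # 45
```

Advancing by the number of rows actually received, instead of a fixed page size, keeps the loop correct even when the last page is short.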

 
