pw usermod tom -G ftpusers
Monthly Archives: April 2011
FreeBSD python
“The Ports Collection supports parallel installation of multiple Python versions. Ports should make sure to use a correct python interpreter, according to the user-settable PYTHON_VERSION variable.”
AHA! It was PYTHON_VERSION… So the only thing I need to do is set that variable in the user environment to get the module installed for the correct Python version:
prunus# setenv PYTHON_VERSION python2.4
prunus# make clean install
Alternatively, add “PYTHON_VERSION=python2.6” to /etc/make.conf.
TinyDNS Data File Syntax
Z defines the zone record
& defines a name server
@ defines an MX record
+ defines an A record
^ defines a PTR record
= defines BOTH an A record and the PTR record at once
C defines a CNAME -- DO NOT USE THESE
*.domain can be used to create a wildcard
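As a rough illustration of the list above, the leading character of each line in a tinydns data file selects the record type. The Python sketch below covers only the prefixes listed here. The `classify` helper and the sample lines (example.com names, 192.0.2.x addresses) are made up for illustration, and the rest of each line is not parsed.

```python
# Prefix characters and descriptions copied from the list above.
RECORD_TYPES = {
    "Z": "zone record",
    "&": "name server",
    "@": "MX record",
    "+": "A record",
    "^": "PTR record",
    "=": "A record and PTR record",
    "C": "CNAME",
}


def classify(line):
    """Return the record type for one data-file line, or None if unknown."""
    line = line.strip()
    if not line or line.startswith("#"):  # blank line or comment
        return None
    return RECORD_TYPES.get(line[0])


# Hypothetical sample lines, for illustration only.
sample = [
    "+www.example.com:192.0.2.10:3600",
    "@example.com:192.0.2.20:mail.example.com:10:3600",
    "=host.example.com:192.0.2.30:3600",
    "# a comment",
]
for line in sample:
    print(classify(line), "<-", line)
```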
python SQL
Warning
Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
The correct way to pass variables in a SQL command is using the second argument of the execute() method:
>>> SQL = "INSERT INTO authors (name) VALUES (%s);" # Notice: no quotes
>>> data = ("O'Reilly", )
>>> cur.execute(SQL, data) # Notice: no % operator
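To try the same idea without a database server, here is a minimal sketch using the standard-library sqlite3 module instead of the driver in the snippet above. Note the swap: sqlite3 uses `?` placeholders rather than `%s`, and the `authors` table is created in memory just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE authors (name TEXT)")

name = "O'Reilly"

# Correct: the value travels separately as the second argument of execute().
cur.execute("INSERT INTO authors (name) VALUES (?)", (name,))

# Wrong: interpolating the value lets its apostrophe end the SQL string early.
bad_sql = "INSERT INTO authors (name) VALUES ('%s')" % name
try:
    cur.execute(bad_sql)
    interpolation_failed = False
except sqlite3.OperationalError:
    interpolation_failed = True

rows = cur.execute("SELECT name FROM authors").fetchall()
print(rows, interpolation_failed)
```

Here the interpolated statement merely fails with a syntax error; with hostile input it could just as well run SQL the author never intended.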
domain name regular expression
Match with the ignore-case flag set, since the pattern only lists lowercase letters:
\b((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b
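A short sketch of how the pattern above might be used from Python. The `is_domain` helper is a made-up name, and using `fullmatch` (so the whole string has to be a domain name, rather than searching inside longer text) is a choice made for this example.

```python
import re

# The pattern from above, compiled with the ignore-case flag.
DOMAIN_RE = re.compile(
    r"\b((?=[a-z0-9-]{1,63}\.)(xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,63}\b",
    re.IGNORECASE,
)


def is_domain(text):
    """Return True if the whole string looks like a domain name."""
    return DOMAIN_RE.fullmatch(text) is not None


for candidate in ["example.com", "EXAMPLE.co.uk", "xn--bcher-kva.example",
                  "-bad.com", "bad-.com", "localhost"]:
    print(candidate, is_domain(candidate))
```

The lookahead caps each label at 63 characters, and the label body rules out leading or trailing hyphens.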
bash
No filename expansion or word splitting takes place between [[ and ]], but there is parameter expansion and command substitution.
Using the [[ ... ]] test construct, rather than [ ... ] can prevent many logic errors in scripts. For example, the &&, ||, <, and > operators work within a [[ ]] test, despite giving an error within a [ ] construct.
Python Webscraper
Use Scrapy.
It is a Twisted-based web crawler framework. It is still under heavy development, but it already works. It has many goodies:
- Built-in support for parsing HTML, XML, CSV, and Javascript
- A media pipeline for scraping items with images (or any other media) and downloading the image files as well
- Support for extending Scrapy by plugging your own functionality using middlewares, extensions, and pipelines
- Wide range of built-in middlewares and extensions for handling of compression, cache, cookies, authentication, user-agent spoofing, robots.txt handling, statistics, crawl depth restriction, etc
- Interactive scraping shell console, very useful for developing and debugging
- Web management console for monitoring and controlling your bot
- Telnet console for low-level access to the Scrapy process
Example code to extract information about all torrent files added today on the mininova torrent site, by using an XPath selector on the HTML returned:
class Torrent(ScrapedItem):
    pass

class MininovaSpider(CrawlSpider):
    domain_name = 'mininova.org'
    start_urls = ['http://www.mininova.org/today']
    rules = [Rule(RegexLinkExtractor(allow=[r'/tor/\d+']), 'parse_torrent')]

    def parse_torrent(self, response):
        x = HtmlXPathSelector(response)
        torrent = Torrent()
        torrent.url = response.url
        torrent.name = x.x("//h1/text()").extract()
        torrent.description = x.x("//div[@id='description']").extract()
        torrent.size = x.x("//div[@id='info-left']/p[2]/text()[2]").extract()
        return [torrent]
MySQLDB – parameterised SQL – “TypeError: not all arguments converted during string formatting”
One: I believe MySQLdb uses %s placeholders, not ?
>>> import MySQLdb
>>> MySQLdb.paramstyle
'format'
From the DB-API PEP
paramstyle
    String constant stating the type of parameter marker formatting expected by the interface. Possible values are [2]:

    'qmark'     Question mark style, e.g. '…WHERE name=?'
    'numeric'   Numeric, positional style, e.g. '…WHERE name=:1'
    'named'     Named style, e.g. '…WHERE name=:name'
    'format'    ANSI C printf format codes, e.g. '…WHERE name=%s'
    'pyformat'  Python extended format codes, e.g. '…WHERE name=%(name)s'
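The error message in the post title can be reproduced without a database. As far as I can tell, MySQLdb builds the final statement with Python's % operator after escaping the arguments. That is an assumption about its internals, not something the PEP states. A query written with ? markers has no conversion specifier to consume the argument, so plain Python raises the same TypeError. The query text and value below are made up for illustration.

```python
args = ("O'Reilly",)

# A 'qmark' style query contains no printf conversion for the argument,
# so %-formatting cannot consume it and raises TypeError.
try:
    "SELECT * FROM authors WHERE name = ?" % args
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

Switching the marker to %s and still passing the values as the second argument of execute() avoids the error.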
Python, MySQLdb and UTF-8
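import MySQLdb

# use_unicode=True makes MySQLdb return text columns as unicode objects
db = MySQLdb.connect(user="guest", passwd="guest", db="dbname", use_unicode=True)
db.set_character_set('utf8')  # character set of the connection
c = db.cursor()
# Ask the server to use UTF-8 for this session as well
c.execute('SET NAMES utf8;')
c.execute('SET CHARACTER SET utf8;')
c.execute('SET character_set_connection=utf8;')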
Massive Data stores
No company better illustrates the advantages of leveraging massive volumes of data for competitive advantage than Wal-Mart, which operates a data warehouse with, at last count, 583 terabytes of sales and inventory data built on a massively parallel 1,000-processor system from data-warehouse-technology vendor Teradata, an NCR Corp. subsidiary. While some companies might consider having more than half a petabyte of data overkill, at Wal-Mart it’s the way to do business.
“Our database grows because we capture data on every item, for every customer, for every store, every day,” Phillips says. Wal-Mart deletes data after two years and doesn’t track individual customer purchases, he says.
By refreshing the information its data warehouse holds every hour (1 billion rows of data or more are updated every day), Wal-Mart turned its data warehouse into an operational system for managing daily store operations. Store managers used to query the database at the end of the day to see what was selling at their location. Now they can check hourly and see what’s happening at stores throughout a region that might be experiencing an unusual event such as a snowstorm or hurricane.
Wal-Mart certainly applauds that approach. The $200 billion retail giant runs its central-office applications on DB2 for mainframes. However, the company’s retail stores typically run applications on Informix for UNIX.
When Informix’s financial performance eroded in the late 1990s, Wal-Mart became nervous. The retail giant knew that migrating all of its data to DB2 was a difficult proposition. Seeking to protect its IT investments, Wal-Mart quietly suggested that IBM buy Informix, said Janet Perna, general manager of IBM’s data management software group.
In Winter Corp.’s most recent survey, conducted in mid-2005, the Yahoo Search Marketing database came out on top as the largest commercial database, with 100.4 terabytes of data running on an Oracle database and Unix-based Fujitsu-Siemens server. Second place went to AT&T Labs Research, which was running a 93.9-terabyte data warehouse using its proprietary Daytona database software running on a Unix-based Hewlett-Packard server. That system has since exceeded 100 terabytes, says David Browne, AT&T executive director of enterprise data warehousing.

