Python Data Mining Quick Start Guide

Databases

A relational database is one of the most common ways for enterprises to store data, so loading from and interacting with databases is essential for most fieldwork. The Python library that we will use is sqlite3, which is part of the Python standard library and therefore ships with Anaconda. Let's begin by connecting to the database, which is stored in a .db file and included with the book materials. After we connect to the database, we will create a cursor object that we will use to traverse the database's records during a query. Next, we will select the entire contents of the boston table with the * wildcard and limit the rows to five (only so that we can display the output without overloading our console). Finally, we will execute() the query and fetchall() the results. The data returned from the query is controlled by the search terms (select and limit, in this case). The next section will introduce more of the terms commonly used in queries:

import sqlite3
sqlite_file = './data/boston.db'

# connecting to the database file
conn = sqlite3.connect(sqlite_file)

# initialize a cursor object
cur = conn.cursor()

# execute a query that selects the first five rows
cur.execute("select * from boston limit 5;")

# fetch and print
data = cur.fetchall()
print(data)

The output from the print() statement is a list of 5 records, each a tuple with 15 entries, which correspond to the rows and columns of the data table, respectively.
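
To make the shape of the fetchall() result concrete, here is a minimal sketch that does not depend on the book's boston.db file. It assumes an in-memory database and a hypothetical three-column table named demo (the table name, column names, and values are made up for illustration and are not the book's data). It also reads the column names from cur.description, since fetchall() returns only the values:

```python
import sqlite3

# in-memory database so the sketch runs without the book's boston.db file
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# hypothetical three-column table standing in for the boston table
cur.execute("create table demo (crim real, rm real, medv real)")
cur.executemany(
    "insert into demo values (?, ?, ?)",
    [(0.1, 6.5, 24.0), (0.2, 6.4, 21.6), (0.3, 7.1, 34.7)],
)

# same pattern as above: select everything, but cap the number of rows
cur.execute("select * from demo limit 2;")
rows = cur.fetchall()

# cur.description holds one 7-item tuple per column; item 0 is the name
columns = [col[0] for col in cur.description]

print(columns)  # column names, in table order
print(rows)     # a list with one tuple per returned row
conn.close()
```

Each tuple in rows is one record, and the position of a value inside the tuple matches the position of its name in columns.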

For our example, we connect to the included database (.db) file, which stands in for an actual remote database. In practice, you will connect to a remote location by using a network address and login credentials.
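
Python database drivers generally follow the same DB-API pattern (connect, cursor, execute, fetchall), so the query code can stay the same when the connection changes. The sketch below illustrates this under stated assumptions: fetch_rows and local_connect are hypothetical helper names, and an in-memory SQLite database with a made-up demo table stands in for the remote server. With a real remote database, you would pass a function that calls your driver's own connect() with the network address and credentials instead:

```python
import sqlite3
from contextlib import closing


def fetch_rows(connect, query):
    """Run a query through any DB-API style connect callable (hypothetical helper)."""
    # closing() makes sure conn.close() runs even if the query fails
    with closing(connect()) as conn:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()


def local_connect():
    # stand-in for a remote connection: an in-memory SQLite database
    conn = sqlite3.connect(":memory:")
    conn.execute("create table demo (id integer, label text)")
    conn.execute("insert into demo values (1, 'a'), (2, 'b')")
    return conn


rows = fetch_rows(local_connect, "select * from demo order by id limit 5;")
print(rows)
```

Keeping the connection step separate from the query step means that only local_connect would need to be replaced when moving from a local file to a remote server.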