Get online data using the HTTP Python interface in Compose
Data acquisition online
In this part we’ll grab some data from the Chicago public data portal, which serves a large number of data sources in a variety of domains, including administration, buildings, education, and environment. We’ll compute some statistics about the schools in the city of Chicago and find the top 5 schools by math grade level and by parent engagement score.
Since the data source is provided on the web for public download, we will use the Requests Python package to download it. It is a popular HTTP library for Python that sends web requests and handles responses. For detailed usage documentation, please visit the Requests docs.
First, in solidThinking Compose, create a new Python file and enter the following code.
[crayon-6006c12f57261292342841 inline="true" class="python"]import requests

# Send a browser-like User-Agent header with the request
headers = {'User-Agent':"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0"}
data = requests.get('https://data.cityofchicago.org/api/views/9xs2-f89t/rows.csv?accessType=DOWNLOAD',headers=headers)
data.raise_for_status()  # stop here if the server returned an HTTP error
# 'wb' overwrites the file; 'ab' would append a second copy on every rerun
with open('C:/Users/compose_user/schools.csv', 'wb') as f:
    f.write(data.content)
[/crayon]
Next, you’ll find the CSV data table created at the specified location. You can open this file with Microsoft Excel to inspect it.
Note: if the requests.get call reports errors in the Python command window, it could be a network issue; just try a few more times.
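The "try a few more times" advice can also be automated. Below is a minimal sketch, not part of the original workflow: the helper name download_with_retries and its attempts, delay, and get parameters are hypothetical, and it assumes that transient failures surface as OSError (the Requests exceptions derive from it).

```python
import time


def download_with_retries(url, path, headers=None, get=None, attempts=3, delay=2.0):
    """Download url to path, retrying on failure (hypothetical helper).

    Returns the number of the attempt that succeeded.
    """
    if get is None:
        import requests  # imported lazily so a stand-in can be injected for testing
        get = requests.get
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            response = get(url, headers=headers, timeout=30)
            response.raise_for_status()
        except OSError as exc:  # requests.RequestException is a subclass of OSError
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # wait a little before the next try
            continue
        # 'wb' replaces any earlier copy of the file
        with open(path, 'wb') as f:
            f.write(response.content)
        return attempt
    raise RuntimeError('download failed after %d attempts' % attempts) from last_error
```

Calling download_with_retries with the portal URL, the target path, and the headers dictionary from the script above would then replace the plain requests.get call.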
Processing data with Compose
With the CSV data file in place, the next step is to use Compose to read the data and find the information we are interested in.
Run this command in the Compose OML command window to read the CSV file.
[crayon-6006c12f5726c706567097 inline="true" class="matlab"][num, txt, raw] = xlsread('c:\users\compose_user\schools.csv');
[/crayon]
We’ll look for schools with good math results in grades 3-5. In Excel, the header label Gr3-5 Grade Level Math % is at column AM; in Compose we need the actual column index, which we get by running these commands.
[crayon-6006c12f57272936298539 inline="true" class="matlab"]for k = 1 : length(raw(1,:))
    if strcmp(raw{1,k},'Gr3-5 Grade Level Math %')
        printf('Index %d - %s\n',k, 'Gr3-5 Grade Level Math %')
    end
end
% Output: Index 39 - Gr3-5 Grade Level Math %
[/crayon]
It displays column index 39, so we’ll display all schools with a grade level score over 80%.
[crayon-6006c12f57277646340134 inline="true" class="matlab"]for k = 1: size(raw,1)
    if ~isstr(raw{k,39}) && raw{k,39} > 80
        printf('%-70s %8d\n', raw{k,2}, raw{k,39});
    end
end
[/crayon]
The results are displayed as follows.
[crayon-6006c12f5727d459769570 inline="true" ]Index 39 - Gr3-5 Grade Level Math %
Abraham Lincoln Elementary School                          89
Andrew Jackson Elementary Language Academy                 90
Annie Keller Elementary Gifted Magnet School              100
Christian Ebinger Elementary School                        81
Edgar Allan Poe Elementary Classical School                94
Edgebrook Elementary School                                88
Hawthorne Elementary Scholastic Academy                    82
James E McDade Elementary Classical School                 87
James G Blaine Elementary School                           88
LaSalle Elementary Language Academy                        81
Lenart Elementary Regional Gifted Center                   93
Mark Skinner Elementary School                             92
Oriole Park Elementary School                              86
Orozco Fine Arts & Sciences Elementary School              82
Skinner North                                             100
South Loop Elementary School                               81
Stephen Decatur Classical Elementary School                98
Thomas A Edison Regional Gifted Center Elementary School   92
[/crayon]
To display the top 5 schools by math score, the following script gives the desired result.
[crayon-6006c12f57283254064452 inline="true" class="matlab"]math_scores = [];
for k = 2: size(raw,1)
    if ~isstr(raw{k,39})
        math_scores(k) = raw{k,39};
    else
        math_scores(k) = -1;
    end
end
[m,idx] = sort(math_scores,'descend');
for k = 1: 5
    printf('%-70s %8d\n', raw{idx(k),2}, raw{idx(k),39});
end
[/crayon]
[crayon-6006c12f57288133372081 inline="true" ]Annie Keller Elementary Gifted Magnet School   100
Skinner North                                  100
Stephen Decatur Classical Elementary School     98
Edgar Allan Poe Elementary Classical School     94
Lenart Elementary Regional Gifted Center        93
[/crayon]
Similarly, running the following script gives the highest parent engagement scores among all schools.
[crayon-6006c12f5728e637094346 inline="true" class="matlab"]for k = 1 : length(raw(1,:))
    if strcmp(raw{1,k},'Parent Engagement Score')
        printf('Index %d - %s\n',k, 'Parent Engagement Score')
        parent_engage_index = k;
    end
end
parent_engage = [];
for k = 2: size(raw,1)
    if ~isstr(raw{k,parent_engage_index})
        parent_engage(k) = raw{k,parent_engage_index};
    else
        parent_engage(k) = -1;
    end
end
[m,idx] = sort(parent_engage,'descend');
for k = 1: 5
    printf('%-70s %8d\n', raw{idx(k),2}, raw{idx(k),parent_engage_index});
end
[/crayon]
[crayon-6006c12f57293158890505 inline="true" ]Index 30 - Parent Engagement Score
Frederick Stock Elementary School                                  69
Alice L Barnard Computer Math & Science Center Elementary School   69
George W Curtis Elementary School                                  68
Annie Keller Elementary Gifted Magnet School                       68
Frazier Prospective IB Magnet Elementary School                    66
[/crayon]
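For comparison, the same two steps (look up a column by its header text, then rank the rows by that column) can be sketched in plain Python with only the standard csv module. This is an illustrative sketch rather than part of the Compose workflow: the function name top_schools is hypothetical, and the sample rows are made up so the snippet runs without the downloaded file.

```python
import csv
import io


def top_schools(csv_file, score_column, name_column='Name of School', n=5):
    """Return the n (name, score) pairs with the highest score (hypothetical helper)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Find the column positions from the header row instead of hard-coding them
    score_idx = header.index(score_column)
    name_idx = header.index(name_column)
    scored = []
    for row in reader:
        try:
            scored.append((row[name_idx], float(row[score_idx])))
        except ValueError:
            continue  # skip non-numeric entries such as 'NDA'
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Made-up sample rows, for illustration only
sample = io.StringIO(
    'School ID,Name of School,Parent Engagement Score\n'
    '1,School A,51\n'
    '2,School B,NDA\n'
    '3,School C,69\n'
)
print(top_schools(sample, 'Parent Engagement Score', n=2))
# → [('School C', 69.0), ('School A', 51.0)]
```

To use it on the real data, pass a file opened with open(path, newline='') in place of the in-memory sample.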
Processing data with Python pandas
Installation
pandas is one of the most popular packages for data analysis in the Python community. It provides fast, flexible, and expressive data structures designed to make working with structured (tabular, multidimensional, potentially heterogeneous) and time series data both easy and intuitive. By default, solidThinking Compose doesn’t come with pandas preinstalled, so the steps below show how to install the package from scratch.
This assumes you have VS 2013 Express installed. From the Windows Start menu, click the shortcut VS2013 x64 Cross Tools Command Prompt to open the x64 target development command prompt. In this prompt, the Windows SDK environment is set up properly, so the pandas source code can be downloaded and compiled correctly on your machine.
In the command prompt, run the following commands. The first points the VS100COMNTOOLS variable (Visual Studio 2010, the toolchain Python 3.4 looks for by default) at the Visual Studio 2013 tools; the second uses the pip command to fetch the pandas package from the official PyPI repository.
[crayon-6006c12f5729c999836330 inline="true" class="batch"]set VS100COMNTOOLS=%VS120COMNTOOLS%
<Compose install path>\python\python3.4\win64\python.exe -m pip install pandas
[/crayon]
After a few minutes, when the following message appears in the command prompt, pandas has been installed successfully.
Successfully installed pandas-0.20.3
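To confirm from inside Python that the interpreter used by Compose can actually see the new package, a quick check like the one below may help. It is only a sketch: the helper name is_installed is hypothetical, and it relies on importlib.util.find_spec from the standard library.

```python
import importlib.util


def is_installed(package_name):
    """Return True if this interpreter can locate the package (hypothetical helper)."""
    return importlib.util.find_spec(package_name) is not None


if is_installed('pandas'):
    import pandas
    print('pandas', pandas.__version__)
else:
    print('pandas is not available in this interpreter')
```

If the check reports that pandas is missing, the pip command was most likely run with a different python.exe than the one Compose uses.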
Processing data
To read the CSV file into Python with pandas, run these commands in the Python console.
[crayon-6006c12f572a1498185603 inline="true" class="python"]import pandas
schools = pandas.read_csv('c:/Users/compose_user/schools.csv')
print(schools)
[/crayon]
It displays the result as a pandas DataFrame.
[crayon-6006c12f572a3765244014 inline="true" ]   School ID                                     Name of School  \
0     610038                  Abraham Lincoln Elementary School
1     610281  Adam Clayton Powell Paideia Community Academy ...
2     610185                Adlai E Stevenson Elementary School
3     609993                    Agustin Lara Elementary Academy
4     610513                      Air Force Academy High School
5     610212                  Albany Park Multicultural Academy
6     609720                Albert G Lane Technical High School
7     610342            Albert R Sabin Elementary Magnet School
...
[/crayon]
Now display the 5 schools with the highest math scores.
[crayon-6006c12f572a6229381915 inline="true" class="python"]import pandas

# The column names are on the first line of the CSV, so the default header row is used
schools = pandas.read_csv('c:/Users/compose_user/schools.csv')
# Drop rows without data ('NDA'); copy so the conversion below works on an independent frame
schools_strip = schools[schools['Gr3-5 Grade Level Math %'] != 'NDA'].copy()
schools_strip['Gr3-5 Grade Level Math %'] = schools_strip['Gr3-5 Grade Level Math %'].astype(float)
schools_sorted = schools_strip.sort_values(['Gr3-5 Grade Level Math %'], ascending=False).head(5)
print(schools_sorted[['Name of School', 'Gr3-5 Grade Level Math %']])
[/crayon]
[crayon-6006c12f572a9428430431 inline="true" ]                                   Name of School  Gr3-5 Grade Level Math %
480                                 Skinner North                     100.0
27   Annie Keller Elementary Gifted Magnet School                     100.0
488   Stephen Decatur Classical Elementary School                      98.6
118   Edgar Allan Poe Elementary Classical School                      94.0
333      Lenart Elementary Regional Gifted Center                      93.9
[/crayon]
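As an alternative to filtering out the 'NDA' strings by hand, pandas can coerce non-numeric entries to NaN and pick the largest values directly. The sketch below is one possible variant, not the method used above; it runs on a few made-up rows so it does not need the downloaded file.

```python
import pandas

column = 'Gr3-5 Grade Level Math %'

# Made-up rows, for illustration only
schools = pandas.DataFrame({
    'Name of School': ['School A', 'School B', 'School C'],
    column: ['81.5', 'NDA', '98.6'],
})

numeric = schools.copy()
# errors='coerce' turns anything that is not a number (such as 'NDA') into NaN
numeric[column] = pandas.to_numeric(numeric[column], errors='coerce')
# Drop the rows without a score, then keep the 2 highest
top = numeric.dropna(subset=[column]).nlargest(2, column)
print(top[['Name of School', column]])
```

With the real data, the DataFrame returned by pandas.read_csv takes the place of the sample frame and nlargest(5, column) gives the top five.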
Now the data can be converted to a list and prepared for further use in Compose.
[crayon-6006c12f572ab571564218 inline="true" class="python"]import numpy
results = schools_sorted[['Name of School', 'Gr3-5 Grade Level Math %']]
results_data = numpy.array(results).tolist()
[/crayon]
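To make the shape of the converted data concrete, here is a small sketch on made-up rows. It uses the DataFrame's own values attribute instead of numpy.array, which is an assumption about taste rather than a required change; either way each row becomes a [name, score] list.

```python
import pandas

# Made-up rows, for illustration only
results = pandas.DataFrame({
    'Name of School': ['School A', 'School B'],
    'Gr3-5 Grade Level Math %': [100.0, 98.6],
})

# Select the columns explicitly so the order inside each row is predictable
results_data = results[['Name of School', 'Gr3-5 Grade Level Math %']].values.tolist()
print(results_data)
```

A nested list of plain strings and numbers like this is a simple structure to hand over to another workspace.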
Run the following command in the Compose OML window to fetch the data from the Python workspace into the Compose OML workspace.
[crayon-6006c12f572ae112102619 inline="true" class="matlab"][results, status, errmsg] = getpythonvar('results_data')
[/crayon]
The data retrieved from Python is now available in Compose for further analysis and post-processing.
This article is from the 扉启 blog; when reposting, please credit the source and include the corresponding link.
Permanent link: https://www.feiqy.com/get-online-data-using-http-python-interface-in-compose/