Original article: SQL Data Analysis
Using SQL for data analysis.
This test is designed to give us a sense of your SQL and data analysis skills and your experience managing large datasets. Below are some important things to remember:
You should have Python, R, or your preferred data analysis tools installed before beginning. For the spatial analysis in problem 3, you will also need software or a package that can handle point-in-polygon queries, specifically testing whether a lat/long point lies inside a geofence (e.g. psql/PostgreSQL). You will also want to have US state shapefiles pre-loaded before you begin.
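To see what a point-in-polygon test actually does, here is a minimal sketch of the even-odd (ray casting) rule on raw longitude/latitude pairs treated as planar coordinates. The function name and the rectangular geofence are hypothetical, and the sketch ignores holes, multipolygons, and points lying exactly on an edge. For real US state shapefiles, a spatial library (PostGIS `ST_Contains`, or shapely's `Polygon.contains`) is the safer choice.

```python
def point_in_polygon(lon, lat, polygon):
    """Return True if (lon, lat) lies inside polygon, a list of (lon, lat) vertices.

    Even-odd (ray casting) rule on a planar approximation: cast a ray toward
    +longitude and count how many polygon edges it crosses. Points exactly on
    an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    j = n - 1  # previous vertex index; the edge is (polygon[j], polygon[i])
    for i in range(n):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the horizontal line through the point matter
        if (yi > lat) != (yj > lat):
            # Longitude where this edge crosses that horizontal line
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside  # each crossing toggles inside/outside
        j = i
    return inside


if __name__ == "__main__":
    # Hypothetical rectangular geofence given as (lon, lat) vertices
    fence = [(-122.52, 37.70), (-122.35, 37.70), (-122.35, 37.83), (-122.52, 37.83)]
    print(point_in_polygon(-122.42, 37.77, fence))  # → True
```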
Consider the following database, whose server time zone is UTC. Please answer the following questions using SQL (any SQL dialect is fine).
Table Name: trips
Column Name | Data Type |
---|---|
uuid | Integer (key) |
driver_uuid | Integer (foreign keyed to driver.uuid) |
city_uuid | Integer (foreign keyed to city.uuid) |
status | Enum('completed', 'cancelled') |
request_at | Timestamp with timezone |
completed_at | Timestamp with timezone |
Table Name: driver
Column Name | Data Type |
---|---|
uuid | Integer (key) |
is_test_account | Boolean |
Table Name: city
Column Name | Data Type |
---|---|
uuid | Integer (key) |
timezone | Character varying |
city_name | Character varying |
country_name | Character varying |
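To experiment with queries locally, the three tables above can be mirrored in an in-memory SQLite database. This is a sketch under stated assumptions: SQLite has neither an ENUM type nor a timezone-aware timestamp, so `status` becomes TEXT with a CHECK constraint and timestamps are stored as ISO-8601 UTC strings. The toy rows and the completed-trips-per-city query are made up for illustration and are not one of the original questions; they just show how the foreign keys join.

```python
import sqlite3

# SQLite version of the schema above (types adapted; see lead-in)
SCHEMA = """
CREATE TABLE city (
    uuid         INTEGER PRIMARY KEY,
    timezone     TEXT,
    city_name    TEXT,
    country_name TEXT
);
CREATE TABLE driver (
    uuid            INTEGER PRIMARY KEY,
    is_test_account BOOLEAN
);
CREATE TABLE trips (
    uuid         INTEGER PRIMARY KEY,
    driver_uuid  INTEGER REFERENCES driver(uuid),
    city_uuid    INTEGER REFERENCES city(uuid),
    status       TEXT CHECK (status IN ('completed', 'cancelled')),
    request_at   TEXT,
    completed_at TEXT
);
"""

# Illustrative query: completed trips per city, excluding test accounts
COMPLETED_PER_CITY = """
SELECT c.city_name, COUNT(*) AS completed_trips
FROM trips AS t
JOIN driver AS d ON d.uuid = t.driver_uuid
JOIN city   AS c ON c.uuid = t.city_uuid
WHERE t.status = 'completed'
  AND NOT d.is_test_account
GROUP BY c.city_name
ORDER BY c.city_name;
"""


def build_and_query():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy rows (hypothetical); driver 11 is a test account
    conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)",
                     [(1, "America/Los_Angeles", "San Francisco", "USA"),
                      (2, "America/New_York", "New York", "USA")])
    conn.executemany("INSERT INTO driver VALUES (?, ?)", [(10, 0), (11, 1)])
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?, ?, ?)", [
        (100, 10, 1, "completed", "2024-01-01T08:00:00Z", "2024-01-01T08:20:00Z"),
        (101, 10, 2, "cancelled", "2024-01-02T09:00:00Z", None),
        (102, 11, 1, "completed", "2024-01-03T10:00:00Z", "2024-01-03T10:15:00Z"),
    ])
    rows = conn.execute(COMPLETED_PER_CITY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(build_and_query())  # → [('San Francisco', 1)]
```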
D. Provide SQL queries that do the following.
Column Name | Data Type |
---|---|
driver_uuid | Integer (unique identifier of driver) |
trip_uuid | Integer (unique identifier of trip) |
pct_cancelled | Double (cancellation rate across all of the current driver's trips prior to the current trip) |
pct_cancelled_last100 | Double (cancellation rate across the current driver's last 100 trips prior to the current trip) |
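The two rate columns map naturally onto window functions. A sketch, assuming trips are ordered by `request_at` (with `uuid` breaking ties), that the current trip is excluded from its own rate, and that test accounts are not filtered out (join `driver` and filter on `is_test_account` if they should be). It runs against SQLite 3.25 or later through Python's `sqlite3` module, and the SQL itself should also be valid in PostgreSQL. The helper function and its reduced `trips` table are hypothetical scaffolding for testing.

```python
import sqlite3

# For each trip: the driver's cancellation rate over all earlier trips and
# over the (up to) 100 immediately preceding trips. A driver's first trip has
# an empty frame, so AVG returns NULL for both columns.
QUERY = """
SELECT
    driver_uuid,
    uuid AS trip_uuid,
    AVG(CASE WHEN status = 'cancelled' THEN 1.0 ELSE 0.0 END) OVER (
        PARTITION BY driver_uuid
        ORDER BY request_at, uuid
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS pct_cancelled,
    AVG(CASE WHEN status = 'cancelled' THEN 1.0 ELSE 0.0 END) OVER (
        PARTITION BY driver_uuid
        ORDER BY request_at, uuid
        ROWS BETWEEN 100 PRECEDING AND 1 PRECEDING
    ) AS pct_cancelled_last100
FROM trips
ORDER BY driver_uuid, request_at, uuid;
"""


def cancellation_rates(trips):
    """trips: iterable of (uuid, driver_uuid, status, request_at) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (uuid INTEGER PRIMARY KEY, "
                 "driver_uuid INTEGER, status TEXT, request_at TEXT)")
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?, ?)", trips)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # Toy history: one driver, first trip cancelled, next 101 completed
    toy = [(i, 1, "cancelled" if i == 0 else "completed",
            f"2024-01-01 {i // 60:02d}:{i % 60:02d}:00") for i in range(102)]
    rows = cancellation_rates(toy)
    print(rows[0], rows[1], rows[-1])
```

`ROWS` framing (rather than `RANGE`) makes `100 PRECEDING` count trips, not timestamp units, which is what "the last 100 trips" asks for.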
This problem uses a mock dataset ('mock_accident_data.csv'). Please download the file and confirm that you have received 7,911 rows. The dataset includes trip miles and reported accidents by month, city, product (e.g. UberX, UberEATS), and segment (e.g. a driver, rider, or trip segmentation). The safety and insurance team is interested in understanding the reported accident rate, namely total reported accidents per million miles.
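The target metric is a ratio of sums. A minimal sketch, assuming each row carries an accident count and a mileage field; the names `accidents`, `miles`, and `city` are hypothetical, since the CSV's column names are not given. Pooling accidents and miles within a group before dividing weights each row by its mileage, whereas averaging per-row rates would let low-mileage rows dominate.

```python
from collections import defaultdict


def accident_rate_per_million(rows, key):
    """Reported accidents per million miles, grouped by the field `key`.

    rows: iterable of dicts with hypothetical fields 'accidents' and 'miles'
    plus the grouping field. Computes sum(accidents) / sum(miles) * 1e6 per
    group; groups with zero total miles are skipped.
    """
    accidents = defaultdict(float)
    miles = defaultdict(float)
    for r in rows:
        accidents[r[key]] += float(r["accidents"])
        miles[r[key]] += float(r["miles"])
    return {k: accidents[k] / miles[k] * 1e6 for k in accidents if miles[k] > 0}
```

With the real file, `csv.DictReader` over 'mock_accident_data.csv' could supply `rows` once the field names are adjusted to match its actual header.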
Please include any code or formulas (R, Python, SQL, etc.) you wrote for the analysis in your response, and delete the dataset when you have finished the challenge.
Using the attached dataset, please do the following:
D. Build a model (e.g., a generalized linear model) of the reported accident rate per mile as a function of the features in the data (i.e., Month, Segment, City, and Product). Please consider the following in your analysis:
Summarize the steps you take to build the model and report the results. Explain the findings from the model.
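One common way to model a rate per mile is a Poisson GLM for the accident count with a log link and `log(miles)` as an offset, so that `exp(coefficient)` reads as a rate ratio. The sketch below fits such a model by iteratively reweighted least squares in NumPy; the Poisson family, the offset, and the function name are assumptions rather than the challenge's prescribed method. In practice a library such as statsmodels (`sm.GLM(y, X, family=sm.families.Poisson(), offset=np.log(miles))`) would also report standard errors, and overdispersion (e.g. via a negative binomial model) is worth checking on real data.

```python
import numpy as np


def fit_poisson_offset(X, y, exposure, max_iter=50, tol=1e-10):
    """Fit log E[y] = X @ beta + log(exposure) by IRLS (Newton's method).

    X: (n, p) design matrix, e.g. an intercept column plus one-hot dummies
       for Month/Segment/City/Product with one reference level dropped each.
    y: (n,) accident counts; exposure: (n,) miles.
    Returns beta; exp(beta[j]) is the rate ratio for column j.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.log(np.asarray(exposure, dtype=float))
    mu = y + 0.5                  # start from the data, avoiding log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # Working response with the offset removed; weights W = diag(mu)
        z = eta - offset + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


if __name__ == "__main__":
    # Toy data: group A (rows 0-1) vs group B (rows 2-3)
    X = [[1, 0], [1, 0], [1, 1], [1, 1]]
    y = [2, 4, 1, 1]
    miles = [1e6, 1e6, 2e6, 2e6]
    # exp(beta) ≈ [3e-6 accidents per mile in A, B/A rate ratio of 1/6]
    print(np.exp(fit_poisson_offset(X, y, miles)))
```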
E. Conduct the following hypothesis tests and report your results:
All subsequent questions concern this dataset. Please download this file and confirm that you have received 3,060,528 rows of data. Please support your answers with clear code in R, Python, or another language.
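Confirming the row count is a quick sanity check before any analysis. A sketch using Python's `csv` module, so quoted fields with embedded newlines are counted once. Whether the expected totals (7,911 and 3,060,528) include a header line is not stated, so the `has_header` flag is an assumption to adjust; the function name is hypothetical.

```python
import csv


def count_data_rows(path, has_header=True):
    """Count records in a CSV file, excluding the header line if present.

    csv.reader parses quoted fields, so a field containing a newline does not
    inflate the count the way a raw line count would.
    """
    with open(path, newline="") as f:
        n = sum(1 for _ in csv.reader(f))
    return n - 1 if has_header and n > 0 else n
```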
A. Compute the following metrics for the 'widgets' column in the dataset.
C. The dataset gives the number of widgets at each latitude/longitude point on Earth. Answer the following:
(This article is from csprojectedu.com; please credit the source when reposting.)