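Original article: Analyzing Stocks with the Pandas Library
Use Pandas and other third-party libraries to analyze stocks, following the relevant Financial Technology programs.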
This assignment builds on Lectures 7 to 9 and on Tutorials 6 and 7. You might want to consider using some of the Python code discussed in those lectures and tutorials to answer some of the questions below.
Important: Do not change the type (markdown vs. code) of any cell, and do not copy, paste, or duplicate any cell! If the cell type is markdown, you are supposed to write text, not code, and vice versa. Provide your answer to each question in the allocated cell. Do not create additional cells. Answers provided in any other cell will not be marked. Do not rename the assignment files. All files should be left as is in the assignment directory.
You are given two datasets:
You may find the following commands helpful to complete some of the questions.
In Tutorial 7, we worked with a variable called FSscore. Suppose we wanted to divide all the values of this variable by 100 and store the outcome in a new column. This can be done in one step. The code df['FSscore_scaled'] = df['FSscore']/100 creates a new column with the name FSscore_scaled and stores the modified values.
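As a minimal sketch of the one-step column creation described above, the snippet below applies the same expression to a toy dataframe. The FSscore values are made up for illustration and do not come from the tutorial data.

```python
import pandas as pd

# Toy data; these FSscore values are invented for illustration only.
df = pd.DataFrame({"FSscore": [250, 80, 125]})

# Element-wise division creates the new column in one step.
df["FSscore_scaled"] = df["FSscore"] / 100

print(df)
```

The original FSscore column is left unchanged, so the scaled and unscaled values can be compared side by side.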
Please run the following cell to import the required libraries and to see an example of string operations.
In [1]:
## Execute this cell

####################### Package Setup ##########################

# Disable FutureWarning for better aesthetics.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

# essential libraries for this assignment
from finml import *
import numpy as np
import pandas as pd
%matplotlib inline

# for logistic regression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score

# suppress warnings for deprecated methods from TensorFlow
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

################################################################

# Example of string operations
import pandas as pd
example_data = {'alphabets':['a,b,c', 'd,e,f', 'a,z,x', 'a,s,p']}
example_df = pd.DataFrame(example_data)

# Chain two string operations
example_df['alphabets'].str.upper().str.split(",")
Out[1]:
0    [A, B, C]
1    [D, E, F]
2    [A, Z, X]
3    [A, S, P]
Name: alphabets, dtype: object
The dataset has the following three columns:
Multiple headlines on the same day are joined by the <end> string. For example, if there are three headlines h1, h2, and h3 on a given day, the headline cell for that day will be the string h1<end>h2<end>h3.
In your assessment, please address the following questions.
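The <end> convention for joined headlines can be unpacked with the same chained string operations used in the setup cell. The following is a small sketch on made-up rows; the column name headlines and the helper columns headline_list and n_headlines are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical rows; real headlines come from the assignment dataset.
toy = pd.DataFrame({"headlines": ["h1<end>h2<end>h3", "h4", "h5<end>h6"]})

# Split each cell on the separator, then count the pieces per row.
toy["headline_list"] = toy["headlines"].str.split("<end>")
toy["n_headlines"] = toy["headline_list"].str.len()

print(toy[["headlines", "n_headlines"]])
```

A cell without the separator yields a one-element list, so a day with a single headline is counted as one.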
Load the dataset into a Pandas dataframe and write Python code that plots the time series of the daily Apple returns (returns on the y-axis and dates on the x-axis). Make sure your plot's axes are appropriately labelled.
Note: Please use df as the variable name for the dataframe and the parse_dates argument to correctly parse the date column.
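In [41]:
"""Write your code in this cell"""
import pandas as pd

# Use the first column as the index and parse it as dates
# (assumes the dates are stored in the first column of the file).
df = pd.read_csv('AAPL_returns.csv', index_col=0, parse_dates=True)

# Plot the returns against the date index and label both axes.
ax = df.plot()
ax.set_xlabel("Date")
ax.set_ylabel("Daily Apple returns")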
Write Python code that plots the time series of daily headline frequencies (the number of headlines per day on the y-axis and the corresponding date on the x-axis). Make sure your plot's axes are appropriately labelled.
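In [*]:
"""Write your code in this cell"""
import matplotlib.pyplot as plt

# The column names 'date' and 'headlines' are assumed here.
df = pd.read_csv('Assignment4-data.csv', encoding="ISO-8859-1", parse_dates=['date'])

# Headlines of one day are joined by '<end>', so split and count the pieces.
headline_counts = df['headlines'].str.split('<end>').str.len()

plt.plot(df['date'], headline_counts)
plt.xlabel("Date")
plt.ylabel("Number of headlines")
plt.show()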
We will use neural networks to explore the relationship between the content of financial news and the direction of stock returns, i.e., their classification into positive or negative returns.
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
For this question, please restrict your computations to the first 100 headline dates. You can select them by using the head function of Pandas. Calculate the tf-idf metric for the following word and headline(s) pairs:
Please write Python code that calculates the metrics from the df dataframe.
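To make the tf-idf calculation concrete, here is a minimal sketch in plain Python. It assumes one common variant of the metric, namely term frequency as the word count divided by the number of tokens in the headline, and inverse document frequency as the natural log of the number of headlines divided by the number of headlines containing the word. The definition used in the lectures (or in finml) may differ, and the function tf_idf and the toy corpus are hypothetical.

```python
import math


def tf_idf(word, doc, corpus):
    """tf-idf of `word` in `doc`, relative to `corpus` (a list of documents).

    Assumed variant: tf = count / tokens in doc, idf = ln(N / docs containing word).
    """
    tokens = doc.lower().split()
    if not tokens:
        return 0.0
    tf = tokens.count(word) / len(tokens)
    n_containing = sum(word in d.lower().split() for d in corpus)
    if n_containing == 0:
        return 0.0  # the word never occurs, so it carries no weight
    idf = math.log(len(corpus) / n_containing)
    return tf * idf


# Made-up headlines for illustration only.
corpus = [
    "apple shares rise",
    "apple unveils new phone",
    "markets fall on rate fears",
]
score = tf_idf("apple", corpus[0], corpus)
print(round(score, 4))
```

Under this definition a word that appears in every headline gets an idf of zero, which is why very common words contribute little to the metric.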
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
Build and train a one-layer neural network with two units (neurons) to explain return directions based on financial news. Report and interpret the following three performance measures: "Precision", "Recall", and "Accuracy". In your opinion, which performance measure(s) is (are) most important in the context of linking news headlines to stock returns, and why?
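The three performance measures can be computed with sklearn.metrics once predictions are available. The sketch below uses made-up labels (1 for a positive return, 0 for a negative return) rather than output from an actual network, so the numbers only illustrate how the measures differ.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels: 1 = positive return, 0 = negative return.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

# Precision: share of predicted positives that are truly positive.
precision = precision_score(y_true, y_pred)
# Recall: share of true positives that the model found.
recall = recall_score(y_true, y_pred)
# Accuracy: share of all predictions that are correct.
accuracy = accuracy_score(y_true, y_pred)

print(precision, recall, accuracy)
```

Here there are 3 true positives, 2 false positives, 1 false negative, and 2 true negatives, so precision is 3/5, recall is 3/4, and accuracy is 5/8.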
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
YOUR ANSWER HERE
Explore different neural network models by changing the number of layers and units.
You can use up to three layers and five units.
Complete the table below by adding your results for the test data set. You should duplicate the table format in your own markdown cell and replace the "-" placeholders with the corresponding values. Discuss your findings for both the test and train data sets.
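A comparison across architectures can be organised as a simple loop. The sketch below swaps in scikit-learn's MLPClassifier instead of the TensorFlow-based tools used in the course, and it runs on synthetic data from make_classification rather than on the headline features, so both the model class and the data are stand-ins for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the text features and return directions.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Each tuple lists the number of units per hidden layer.
architectures = [(2,), (5,), (5, 5), (5, 5, 5)]
results = {}
for hidden in architectures:
    model = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    model.fit(X_train, y_train)
    # Store (train accuracy, test accuracy) for this architecture.
    results[hidden] = (model.score(X_train, y_train), model.score(X_test, y_test))

for hidden, (train_acc, test_acc) in results.items():
    print(hidden, round(train_acc, 3), round(test_acc, 3))
```

Keeping train and test accuracy side by side makes it easier to spot architectures that fit the training data well but generalise poorly.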
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
YOUR ANSWER HERE
Explore the effects of different splits between the training and testing data on the performance of a given neural network model.
Complete the table below by adding your results. You should duplicate the table format in your own markdown cell and replace the "-" placeholders with the corresponding values. Discuss your findings.
Complete the table below by adding your results for the test data set. You should use the same markdown format and simply replace the "-" placeholders with the corresponding values. Discuss your findings for the different test and train data sets.
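Different train/test splits can be generated by varying the test_size argument of scikit-learn's train_test_split. The sketch below only shows how the split sizes change on synthetic data; fitting and scoring the network on each split is left out, and the particular shares are arbitrary choices rather than the ones required by the assignment.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features come from the headlines.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

split_sizes = {}
for test_share in (0.1, 0.2, 0.3, 0.4):
    # A fixed random_state keeps each split reproducible across runs.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_share, random_state=0, stratify=y
    )
    split_sizes[test_share] = (len(X_train), len(X_test))

for share, (n_train, n_test) in split_sizes.items():
    print(share, n_train, n_test)
```

A larger test share gives a more stable performance estimate but leaves fewer observations for training, which is the trade-off the question asks about.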
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
YOUR ANSWER HERE
Run a logistic regression with the same independent and dependent variables as used for the above neural network models. You have access to the sklearn package, which should help you answer this question. To work with the sklearn package, you may find the following links helpful.
Evaluating a logit model:
Compare and contrast your findings with the above findings based on neural network models.
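A logistic regression can be fitted and evaluated with the sklearn classes already imported in the setup cell. The following sketch uses synthetic data from make_classification in place of the headline features, so the resulting numbers say nothing about the assignment data; it only shows the fit, predict, and score workflow.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the independent and dependent variables.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the logit model on the training data only.
logit = LogisticRegression(max_iter=1000)
logit.fit(X_train, y_train)

# Evaluate on the held-out test data with the same three measures.
y_pred = logit.predict(X_test)
metrics = {
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "accuracy": accuracy_score(y_test, y_pred),
}
print(metrics)
```

Using the same split and the same measures as for the neural networks keeps the comparison between the two model families like for like.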
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
YOUR ANSWER HERE
So far, you have explained stock returns with contemporaneous financial news, i.e., news released on the same date as the return. To explore how well a neural network can predict the direction of future returns based on our text data, you should do the following.
Interpret your results in the context of the Efficient Market Hypothesis (EMH).
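One possible way to move from a contemporaneous to a predictive setup is to pair each day's headlines with the return direction of the following trading day. The sketch below shows this with pandas shift on a toy dataframe; the column names date, headlines, and returns, the helper columns, and the one-day horizon are all assumptions for illustration, not the steps prescribed by the assignment.

```python
import pandas as pd

# Toy data; dates, headlines, and returns are invented for illustration.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-02", "2020-01-03", "2020-01-06", "2020-01-07"]
        ),
        "headlines": ["h1<end>h2", "h3", "h4<end>h5", "h6"],
        "returns": [0.01, -0.02, 0.005, -0.001],
    }
).sort_values("date")

# Direction of the same-day return (1 = positive, 0 = otherwise).
toy["direction"] = (toy["returns"] > 0).astype(int)

# shift(-1) moves the next row's direction up by one row, so each row
# pairs today's headlines with the next trading day's direction.
toy["next_direction"] = toy["direction"].shift(-1)

# The last day has no following return, so it is dropped.
predictive = toy.dropna(subset=["next_direction"]).copy()
predictive["next_direction"] = predictive["next_direction"].astype(int)

print(predictive[["date", "headlines", "next_direction"]])
```

Sorting by date before shifting matters, because shift works on row order rather than on the calendar.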
In [ ]:
"""Write your code in this cell"""
# YOUR CODE HERE
raise NotImplementedError()
YOUR ANSWER HERE
(This article is from csprojectedu.com; please credit the source when reposting.)