# 8: 04 In-Class Assignment - Linear Algebra and Python

In order to successfully complete this assignment you need to participate both individually and in groups during class. If you are attending asynchronously, turn in your assignment using D2L no later than 11:59pm on the day of class. See links at the end of this document for access to the class timeline for your section.

## Applications of Linear Algebra: PCA

We will explore three applications of linear algebra in data analysis: change of basis (for dimension reduction), projections (for solving linear systems), and the quadratic form (for optimization). The first application is the change of basis to the eigenvector basis that underlies Principal Components Analysis (PCA).

We will review the following in class:

• The standard basis
• Orthonormal basis and orthogonal matrices
• Change of basis
• Similar matrices
• Eigendecomposition
• Sample covariance
• Covariance as a linear transform
• PCA and dimension reduction
• PCA and “explained variance”
• SVD
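Several of the topics listed above can be tied together in a short NumPy sketch. It is only an illustration under stated assumptions: the data are synthetic, and the mixing matrix, sample size, and variable names are made up for the example rather than taken from the assignment. It computes the sample covariance, changes basis to the covariance eigenvectors, reports the explained variance, and cross-checks the eigenvalues against the SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data with correlated features (illustrative values only).
n = 500
latent = rng.normal(size=(n, 2))
mixing = np.array([[3.0, 0.0],
                   [1.5, 0.5]])
X = latent @ mixing.T

# Centre the data and form the sample covariance matrix (divide by n - 1).
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (n - 1)

# Eigendecomposition of the symmetric covariance matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so sort descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Change of basis: coordinates of every sample in the eigenvector basis.
Z = Xc @ eigvecs

# Fraction of the total variance explained by each principal component.
explained_variance_ratio = eigvals / eigvals.sum()

# Dimension reduction: keep only the first principal component.
Z1 = Z[:, :1]

# Cross-check with the SVD of the centred data:
# squared singular values divided by (n - 1) equal the eigenvalues of C.
singular_values = np.linalg.svd(Xc, compute_uv=False)
svd_eigvals = singular_values**2 / (n - 1)

print("explained variance ratio:", explained_variance_ratio)
```

In the new basis the covariance of `Z` is diagonal, which is the sense in which the eigenvector basis decorrelates the features.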

UPDATE: This answer has become somewhat outdated over the past 4 years, so here is an update. You have many options:

If you do not have to do it in Python, then it is a lot easier to do this in a modeling language; see Any good tools to solve integer programs on linux?

I personally use Gurobi these days through its Python API. It is a commercial, closed-source product but free for academic research.

With PuLP you can create MPS and LP files and then solve them with GLPK, COIN CLP/CBC, CPLEX, or XPRESS through their command-line interface. This approach has its advantages and disadvantages.

The OR-Tools from Google is an open source software suite for optimization, tuned for tackling the world's toughest problems in vehicle routing, flows, integer and linear programming, and constraint programming.

Pyomo is a Python-based, open-source optimization modeling language with a diverse set of optimization capabilities.

SciPy offers linear programming: scipy.optimize.linprog. (I have never tried this one.)
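As a rough illustration of the SciPy option, here is a minimal sketch that solves a tiny continuous linear program with `scipy.optimize.linprog`. The problem itself (maximize $x + 2y$ subject to $x + y \le 4$, $x + 3y \le 6$, $x, y \ge 0$) is made up for the example. Note that `linprog` minimizes, so the objective is negated.

```python
from scipy.optimize import linprog

# Illustrative problem (not from the original answer):
#   maximize   x + 2y
#   subject to x +  y <= 4
#              x + 3y <= 6
#              x, y   >= 0
# linprog minimizes c @ x, so pass the negated objective.
c = [-1.0, -2.0]
A_ub = [[1.0, 1.0],
        [1.0, 3.0]]
b_ub = [4.0, 6.0]
bounds = [(0, None), (0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)

print("status:", res.message)
print("optimal point:", res.x)
print("maximum objective:", -res.fun)
```

The optimum sits at the vertex where both inequality constraints are active, $(x, y) = (3, 1)$, with objective value 5.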

Apparently, CVXOPT offers a Python interface to GLPK; I did not know that. I have been using GLPK for 8 years now and I can highly recommend it. The examples and tutorial of CVXOPT seem really nice!

You can find other possibilities in the Wikibook under GLPK/Python. Note that many of these are not necessarily restricted to GLPK.

Recall that if we enumerate the estimation of the data at each data point, $x_i$, this gives us the following system of equations:

If the data was absolutely perfect (i.e., no noise), then the estimation function would go through all the data points, resulting in the following system of equations:

If we take $A$ to be as defined previously, this would result in the matrix equation $Y = A\beta$.

However, since the data is not perfect, there will not be an estimation function that can go through all the data points, and this system will have $\textit{no solution}$. Therefore, we need to use the least squares regression that we derived in the previous two sections to get a solution.
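A small numerical sketch of this situation follows. The data are synthetic, and the linear model with basis functions $x$ and $1$ is an assumption made for the example. The overdetermined system $Y = A\beta$ has no exact solution, so the code uses the standard least squares solution $\beta = (A^T A)^{-1} A^T Y$ and compares it with NumPy's built-in least squares routine.

```python
import numpy as np

# Synthetic noisy data around y = 1.5 * x + 1 (illustrative values only).
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 101)
y = 1.0 + 1.5 * x + 0.1 * rng.normal(size=x.size)

# Each column of A is one basis function evaluated at every x_i.
# Here the assumed basis functions are f1(x) = x and f2(x) = 1.
A = np.column_stack([x, np.ones_like(x)])
Y = y.reshape(-1, 1)

# Least squares solution from the normal equations (A^T A) beta = A^T Y.
beta_normal = np.linalg.solve(A.T @ A, A.T @ Y)

# The same solution from NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(A, Y, rcond=None)

print("slope, intercept (normal equations):", beta_normal.ravel())
print("slope, intercept (np.linalg.lstsq): ", beta_lstsq.ravel())
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly. For ill-conditioned $A$, `np.linalg.lstsq` is usually the safer choice.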

## Applied Machine Learning

Learn and apply key concepts of modeling, analysis and validation from Machine Learning, Data Mining and Signal Processing to analyze and extract meaning from data. Implement algorithms and perform experiments on images, text, audio and mobile sensor measurements. Gain working knowledge of supervised and unsupervised techniques including classification, regression, clustering, feature selection, association rule mining and dimensionality reduction.

### Prerequisites

CS 2800 or equivalent, Linear Algebra, and experience programming with Python or Matlab, or permission of the instructor.

### Room & Time

Tuesdays and Thursdays, 10:55AM-12:10PM, Bloomberg Center 131, Cornell Tech

Links: CMS for homework submission, Slack for discussions.

Required:
T. Hastie, R. Tibshirani and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edition), Springer-Verlag, 2008.
Recommended:
Yaser S. Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin: Learning from Data, AMLBook, 2012.
P. Harrington, Machine Learning in Action, Manning, 2012.
A. Rajaraman, J. Leskovec and J. Ullman, Mining of Massive Datasets, v1.1.
H. Daumé III, A Course in Machine Learning, v0.8.

Grade Breakdown: Your grade will be determined by the assignments (30%), one prelim (30%), a final exam (30%), and in-class quizzes (10%).

Homework: There will be four assignments and an “assignment 0” for environment setup. Each assignment will have a due date for completion. Half of the points of the lowest-scoring assignment will count as extra credit, meaning the points received for homeworks 1, 2, 3, and 4 are calculated as (sum of scores) / 3.5.

Late Policy: Each student has a total of one slip day that may be used without penalty.

External Code: Unless otherwise specified, you are allowed to use well-known libraries such as scikit-learn, scikit-image, numpy, scipy, etc. in the assignments. Any reference to or copy of public code repositories should be properly cited in your submission (examples include Github, Wikipedia, Blogs). In some assignments, you are NOT allowed to use any of the libraries above; please refer to the individual HW instructions for more details.

Collaboration: You are encouraged (but not required) to work in groups of no more than 2 students on each assignment. Please indicate the name of your collaborator at the top of each assignment and cite any references you used (including articles, books, code, websites, and personal communications). If you’re not sure whether to cite a source, err on the side of caution and cite it. You may submit just one writeup for the group. Remember not to plagiarize: all solutions must be written by members of the group.

Quizzes: There will be surprise in-class quizzes to make sure you attend and pay attention to the class.

Prelim: October 5 in class. The exam is closed book but you are allowed to bring one sheet of written notes (Letter size, two-sided). You are allowed to use a calculator.

Final Exam: November 28 through December 6. The final exam will be hosted on Kaggle. You will develop an algorithm, prepare a professional paper, submit an anonymized version to the EasyChair conference system, and peer-review the work from other groups. You are strongly encouraged to work in a group of three students.

SciPy also has a basic solver for boundary value problems, the scipy.integrate.solve_bvp function. The function solves a first-order system of ODEs subject to two-point boundary conditions. The function's construction is shown below:

CONSTRUCTION:

Let $F$ be a function object to the function that computes

$$\frac{dS(t)}{dt} = F(t, S(t)), \qquad S(t_0) = S_0$$

$t$ is a one-dimensional independent variable (time), $S(t)$ is an n-dimensional vector-valued function (state), and $F(t, S(t))$ defines the differential equations. $S_0$ is an initial value for $S$. The function $F$ must have the form $dS = F(t, S)$, although the name does not have to be $F$. The goal is to find the $S(t)$ approximately satisfying the differential equations, given the initial value $S(t_0) = S_0$.

The way we use the solver to solve the differential equation is: $solve\_ivp(fun, t\_span, s0, method = 'RK45', t\_eval=None)$

where $fun$ takes in the function on the right-hand side of the system. $t\_span$ is the interval of integration $(t0, tf)$, where $t0$ is the start and $tf$ is the end of the interval. $s0$ is the initial state. There are a couple of methods that we can choose; the default is 'RK45', which is the explicit Runge-Kutta method of order 5(4). There are other methods you can use as well; see the end of this section for more information. $t\_eval$ takes in the times at which to store the computed solution, and must be sorted and lie within $t\_span$.
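A minimal usage sketch of `scipy.integrate.solve_ivp` is given below. The test problem $\frac{dS}{dt} = \cos(t)$ with $S(0) = 0$ is an assumption chosen for illustration because its exact solution, $S(t) = \sin(t)$, makes the error easy to check.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Right-hand side F(t, S) of the illustrative ODE dS/dt = cos(t).
def F(t, S):
    return np.array([np.cos(t)])

t_span = (0.0, np.pi)                             # interval of integration (t0, tf)
s0 = [0.0]                                        # initial state S(t0)
t_eval = np.linspace(t_span[0], t_span[1], 50)    # sorted times inside t_span

sol = solve_ivp(F, t_span, s0, method="RK45", t_eval=t_eval)

# sol.t holds the stored times; sol.y has shape (n_states, n_times).
max_error = np.max(np.abs(sol.y[0] - np.sin(sol.t)))
print("success:", sol.success)
print("max abs error vs sin(t):", max_error)
```

Tightening the optional `rtol` and `atol` arguments reduces the error at the cost of more function evaluations.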

There are two main issues to consider with regard to integration schemes for ODEs: accuracy and stability. Accuracy refers to a scheme’s ability to get close to the exact solution, which is usually unknown, as a function of the step size $h$. Previous chapters have referred to accuracy using the notation $O(h^p)$. The same notation translates to solving ODEs. The stability of an integration scheme is its ability to keep the error from growing as it integrates forward in time. If the error does not grow, then the scheme is stable; otherwise it is unstable. Some integration schemes are stable for certain choices of $h$ and unstable for others; these integration schemes are also referred to as unstable.

To illustrate issues of stability, we numerically solve the pendulum equation using the Euler Explicit, Euler Implicit, and Trapezoidal Formulas.

TRY IT! Use the Euler Explicit, Euler Implicit, and Trapezoidal Formulas to solve the pendulum equation over the time interval $[0,5]$ in increments of $0.1$ and for an initial solution of $S_0 = \left[\begin{array}{c} 1 \\ 0 \end{array}\right]$. For the model parameters, use $\sqrt{\frac{g}{l}} = 4$. Plot the approximate solution on a single graph.
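One way to set up this exercise is sketched below. It rests on several assumptions: the pendulum is taken in its linearized (small-angle) form $\frac{dS}{dt} = A S$ with $A = \begin{bmatrix} 0 & 1 \\ -\omega^2 & 0 \end{bmatrix}$, the parameter value 4 is read as the natural frequency $\omega$, and the initial state is read as angle 1 with angular velocity 0. The function name `integrate` and the energy-like quantity $\omega^2\theta^2 + \dot\theta^2$ are introduced only for this illustration. Plotting is left out so that the stability behaviour can be read directly from the printed numbers.

```python
import numpy as np

h = 0.1                              # step size
t = np.linspace(0.0, 5.0, 51)        # time grid on [0, 5] in increments of 0.1
w = 4.0                              # assumed natural frequency
A = np.array([[0.0, 1.0],
              [-w**2, 0.0]])         # assumed linearized pendulum: dS/dt = A S
S0 = np.array([1.0, 0.0])            # assumed initial angle and angular velocity
I = np.eye(2)

# One-step update matrices, S_{j+1} = M S_j, for each formula.
step_explicit = I + h * A                                            # Euler Explicit
step_implicit = np.linalg.inv(I - h * A)                             # Euler Implicit
step_trapezoid = np.linalg.inv(I - 0.5 * h * A) @ (I + 0.5 * h * A)  # Trapezoidal

def integrate(step_matrix, s0, n_steps):
    """Apply the one-step update repeatedly and return all states."""
    S = np.zeros((n_steps + 1, 2))
    S[0] = s0
    for j in range(n_steps):
        S[j + 1] = step_matrix @ S[j]
    return S

def energy(S):
    """Quantity conserved by the exact solution: w^2 * theta^2 + theta_dot^2."""
    return w**2 * S[:, 0]**2 + S[:, 1]**2

n_steps = len(t) - 1
S_exp = integrate(step_explicit, S0, n_steps)
S_imp = integrate(step_implicit, S0, n_steps)
S_trap = integrate(step_trapezoid, S0, n_steps)

print("exact angle at t = 5:   ", np.cos(w * t[-1]))
print("Euler Explicit at t = 5:", S_exp[-1, 0])
print("Euler Implicit at t = 5:", S_imp[-1, 0])
print("Trapezoidal at t = 5:   ", S_trap[-1, 0])
```

For this linear model the explicit scheme multiplies the energy by $1 + h^2\omega^2$ at every step and therefore blows up, the implicit scheme divides it by the same factor and decays, and the trapezoidal scheme keeps it constant. Plotting `S_exp[:, 0]`, `S_imp[:, 0]`, and `S_trap[:, 0]` against `t` on one graph shows the same picture.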

## Schedule

In this schedule page, you will find the list of assignments you will have to complete this semester:

1. L#: These are the Asynchronous PrairieLearn Lecture Assignments including the pre-recorded lectures and short questions testing concepts introduced in the videos. These PL assessments will also include links to lecture notes, slides and annotations from the lectures. You are encouraged to complete these assessments on the date they appear on the course schedule. Tuesday lectures will be due on the following Sunday at 9am, and Thursday lectures will be due on the following Tuesday at 9am (5 days to complete). You can find all the due dates directly in PrairieLearn. There will be a total of 24 graded lecture assignments, and the lowest 4 scores will be dropped. Combined they will count towards 4% of your final grade.
2. GA#: These are the Synchronous PrairieLearn Group Activities to be completed on Tuesdays either during the 2pm zoom lecture, or at the other two available zoom meetings on the same day at 9am or 8pm. The GAs cover material learned during previous lecture assignments, so make sure to complete the open PrairieLearn Lecture Assignments before you meet with your group. You can find a lot more information about the group activities on the Collaborate page.
3. HW#: These are the PrairieLearn Homework Assignments due on most Tuesdays and Thursdays at 8pm. These are individual assessments. The schedule indicates when the HW will be open, and the due date for 100% credit. You can still submit all the HWs by May 5 for 96% credit. Almost every PrairieLearn Lecture Assignment has a corresponding HW. You are strongly encouraged to complete the PrairieLearn Lecture Assignment before you start your HW. There will be a total of 20 HW assignments, and the lowest 2 scores will be dropped. Combined they will count towards 25% of your grade.
4. MP#: These are the PrairieLearn Machine Problems. The schedule indicates when the MPs will be open, and the due date for 100% credit (all at 8pm). You can still submit all the MPs by May 5 for 96% credit. These are individual assessments. There will be a total of 6 MP assignments, and no drops. Combined they will count towards 10% of your grade.
5. Q#: These are the Synchronous PrairieLearn Quizzes that happen on Thursdays during the official lecture time at 2pm. The quizzes will be delivered using CBTF-online. Make sure to register at CBTF (You can find more details on their website). If you need to request to take the conflict quiz, you can get instructions here. You can find on the schedule the content covered in each quiz (corresponding HW assignments) and also the dates when the Practice Quizzes (Q#P) will be open. Practice Quizzes will not count towards your grade (zero credit). There will be a total of 6 quizzes and the lowest score will be dropped. Combined they will count towards 35% of your grade.

The only required synchronous components of this class are the GAs and Quizzes. Attendance in the first week of classes is strongly encouraged, since we will be talking about all the logistics of the course, and having a demo for the group work.

You can see from the schedule below that many of the Thursday classes have a placeholder for a "Demo" class. These classes are not required. In Fall 2020, these demo classes (that include live coding using jupyter notebooks) were pre-recorded and included in the Lecture Assignments. However, many students indicated they would prefer to have this type of activity in a synchronous format. Hence this semester we will have "demo classes" on some Thursdays, which will be recorded and posted inside a PrairieLearn assignment (for the students that prefer the asynchronous format). These jupyter notebooks are available in PrairieLearn under the label WS#: Workspace.

## PyTorch Tutorial Overview

The focus of this tutorial is on using the PyTorch API for common deep learning model development tasks; we will not be diving into the math and theory of deep learning. For that, I recommend starting with this excellent book.

The best way to learn deep learning in python is by doing. Dive in. You can circle back for more theory later.

I have designed each code example to use best practices and to be standalone so that you can copy and paste it directly into your project and adapt it to your specific needs. This will give you a massive head start over trying to figure out the API from official documentation alone.

It is a large tutorial, and as such, it is divided into three parts; they are:

1. How to Install PyTorch
    1. What Are Torch and PyTorch?
    2. How to Install PyTorch
    3. How to Confirm PyTorch Is Installed
2. PyTorch Deep Learning Model Life-Cycle
    1. Step 1: Prepare the Data
    2. Step 2: Define the Model
    3. Step 3: Train the Model
    4. Step 4: Evaluate the Model
    5. Step 5: Make Predictions
3. How to Develop PyTorch Deep Learning Models
    1. How to Develop an MLP for Binary Classification
    2. How to Develop an MLP for Multiclass Classification
    3. How to Develop an MLP for Regression
    4. How to Develop a CNN for Image Classification

### You Can Do Deep Learning in Python!

Work through this tutorial. It will take you 60 minutes, max!

You do not need to understand everything (at least not right now). Your goal is to run through the tutorial end-to-end and get a result. Write down your questions as you go. Make heavy use of the API documentation to learn about all of the functions that you’re using.

You do not need to know the math first. Math is a compact way of describing how algorithms work, specifically tools from linear algebra, probability, and calculus. These are not the only tools that you can use to learn how algorithms work. You can also use code and explore algorithm behavior with different inputs and outputs. Knowing the math will not tell you what algorithm to choose or how to best configure it. You can only discover that through carefully controlled experiments.

You do not need to know how the algorithms work. It is important to know about the limitations and how to configure deep learning algorithms. But learning about algorithms can come later. You need to build up this algorithm knowledge slowly over a long period of time. Today, start by getting comfortable with the platform.

You do not need to be a Python programmer. The syntax of the Python language can be intuitive if you are new to it. Just like other languages, focus on function calls (e.g. function()) and assignments (e.g. a = "b"). This will get you most of the way. You are a developer, so you know how to pick up the basics of a language really fast. Just get started and dive into the details later.

You do not need to be a deep learning expert. You can learn about the benefits and limitations of various algorithms later, and there are plenty of tutorials that you can read to brush up on the steps of a deep learning project.

## Andrew Ng’s Machine Learning Course in Python (Linear Regression)

I am a pharmacy undergraduate and had always wanted to do much more than the scope of a clinical pharmacist. I had tried to find some sort of integration between my love for IT and the healthcare knowledge I possess but one would really feel lost in the wealth of information available in this day and age.

Six months ago, I chanced upon the concept of data science and its application in the healthcare industry. Given the advances in data and computing power, utilizing a computer to identify, diagnose, and treat diseases is no longer a dream. At a more advanced level, computer vision can help identify diseases using radiography images, while at a simpler level, algorithms can detect potentially life-changing drug interactions.

With the goal of venturing into the health IT industry, I came up with a data science curriculum for those with a non-technical background, which I showcased here.

Machine Learning by Andrew Ng, offered by Stanford on Coursera (https://www.coursera.org/learn/machine-learning), is one of the most highly recommended courses in the data science community. After 6 months of basic maths and python training, I started this course to step into the world of machine learning. As many of you know, the course is conducted in Octave or Matlab. Although it is all well and good to learn some Octave programming and complete the programming assignments, I would like to test my knowledge of python and try to complete the assignments in python from scratch.

This article will be part of a series I will be writing to document my python implementation of the programming assignments in the course. This is by no means a guide for others, as I am also learning as I move along, but it can serve as a starting point for those who wish to do the same. With that said, I am more than happy to receive some constructive feedback from you guys.

First off will be univariate linear regression using the dataset ex1data1.txt

To start off, I will import all relevant libraries and load the dataset into jupyter notebook

To build up a good habit, I would always have a look at the data and have a good sense of the data

Plotting of the data to visualize the relationship between the dependent(y) and the independent(X) variable

I am used to this way of plotting graphs but do realize that there is an object-oriented way of using matplotlib. I will be using that in some other graphs within this assignment

Next to compute the cost function J(Θ)

Initialize X,y and compute the cost of using Θ = (0,0)

This might not be the best way of doing things but it is the only solution I found to add a column of ones for X₀. The computeCost function here will give 32.072733877455676
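For readers following along without the notebook, here is a hedged sketch of what a `computeCost` function of this kind might look like. It is not the code from this post: it assumes the usual squared-error cost $J(\Theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\Theta(x^{(i)}) - y^{(i)}\right)^2$ and uses a tiny made-up dataset instead of ex1data1.txt, so the number it prints differs from the value quoted above. The column of ones for X₀ is added with `np.column_stack`, which is one of several ways to do it.

```python
import numpy as np

def computeCost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residuals = X @ theta - y
    return float(np.sum(residuals**2)) / (2 * m)

# Tiny made-up dataset (NOT ex1data1.txt): y = 2 * x exactly.
x = np.array([1.0, 2.0, 3.0])
y = np.array([[2.0], [4.0], [6.0]])

# Add a column of ones so that theta[0] acts as the intercept term.
X = np.column_stack([np.ones_like(x), x])

theta_zero = np.zeros((2, 1))
theta_exact = np.array([[0.0], [2.0]])

print("cost at theta = (0, 0):", computeCost(X, y, theta_zero))
print("cost at theta = (0, 2):", computeCost(X, y, theta_exact))
```

At Θ = (0, 0) the cost is the mean of the squared targets divided by two, and at the exact parameters it drops to zero.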

Now to implement gradient descent to optimize Θ, by minimizing the cost function J(Θ)

The print statement will print out the hypothesis: h(x) = -3.63 + 1.17x₁ which shows the optimized Θ values rounded off to 2 decimal places
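A hedged sketch of batch gradient descent for this setting is shown below. Again, this is an illustration rather than the post's own code: the update rule $\Theta := \Theta - \frac{\alpha}{m} X^T (X\Theta - y)$ is the standard one for the squared-error cost, the dataset is made up (so the fitted values differ from the hypothesis printed above), and the learning rate and iteration count are arbitrary choices that happen to converge for this data.

```python
import numpy as np

def computeCost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    return float(np.sum((X @ theta - y)**2)) / (2 * m)

def gradientDescent(X, y, theta, alpha, num_iters):
    """Batch gradient descent; returns the final theta and the cost history."""
    m = len(y)
    J_history = []
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
        J_history.append(computeCost(X, y, theta))
    return theta, J_history

# Made-up noiseless data (NOT ex1data1.txt): y = 1 + 2 * x.
x = np.linspace(0.0, 2.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = (1.0 + 2.0 * x).reshape(-1, 1)

theta_opt, J_history = gradientDescent(X, y, np.zeros((2, 1)), alpha=0.1, num_iters=2000)

print("h(x) = {:.2f} + {:.2f}x1".format(theta_opt[0, 0], theta_opt[1, 0]))
print("first and last cost:", J_history[0], J_history[-1])
```

Keeping the cost after every iteration in `J_history` is what makes the cost-versus-iterations plot possible.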

To make the assignment more complete, I also went ahead and tried to visualize the cost function for a standard univariate case

The block of code above generates the 3d surface plot as shown. As mentioned in the lecture, the cost function is a convex function which only has 1 global minimum; hence, gradient descent with a suitably small learning rate converges to the global minimum

By the way, I used the mplot3d tutorial to help me with the 3d plotting. (https://matplotlib.org/mpl_toolkits/mplot3d/tutorial.html)

Plotting the cost function against the number of iterations gave a nice descending trend, indicating that the gradient descent implementation works in reducing the cost function

Now with those optimized Θ values, I will plot the graph together with the predicted values (the line of best fit)

Again, might not be the best way to generate a line based on Θ, let me know if there is a better way of doing so

The last part of the assignment involved making predictions based on your model

The first print statement prints: For population = 35,000, we predict a profit of \$4520.0

The second print statement prints: For population = 70,000, we predict a profit of \$45342.0

Now on to multivariate linear regression using the dataset ex1data2.txt

As with all datasets, I started off by loading the data and looking into the data

As you can see, now there are 2 features for X, making it a multivariate problem

Plotting the price against each feature shows the relationship between them. Just by looking at the plot, we should expect some degree of positive correlation between the dependent and the independent variables.

For a multivariate problem optimized with gradient descent, feature normalization is recommended because it speeds up convergence when the features have very different scales.
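A minimal sketch of feature normalization follows, assuming the common mean-and-standard-deviation scaling. The function name `featureNormalization` and the small feature matrix are hypothetical and chosen only for illustration; they are not taken from ex1data2.txt. The mean and standard deviation are returned so that a new input can be scaled in the same way before making a prediction.

```python
import numpy as np

def featureNormalization(X):
    """Scale each column to zero mean and unit standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std

# Made-up features (NOT ex1data2.txt): house size in square feet, bedrooms.
X2 = np.array([[1000.0, 2.0],
               [1500.0, 3.0],
               [2000.0, 3.0],
               [2500.0, 4.0],
               [3000.0, 5.0]])

X2_norm, mean_X2, std_X2 = featureNormalization(X2)

# A new input must be scaled with the SAME mean and std as the training data.
house = np.array([[1650.0, 3.0]])
house_norm = (house - mean_X2) / std_X2

print("normalized features:\n", X2_norm)
print("normalized new house:", house_norm)
```

`np.std` divides by the number of samples by default; passing `ddof=1` would use the sample standard deviation instead, which changes the scaled values slightly but not the idea.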

Next is to test if our previous functions, computeCost(X, y, theta) and gradientDescent(X, y, theta, alpha, num_iters), work with multiple input features

Using computeCost(X2,y2,theta2) gives 65591548106.45744, which is the cost of using Θ = (0,0,0) as parameters

The print statement prints: h(x) = 334302.06 + 99411.45x₁ + 3267.01x₂, which shows the optimized Θ values rounded to 2 decimal places

Plotting J(Θ) against the number of iterations gives a descending trend, indicating that our gradientDescent function works for multivariate cases too

Lastly, making predictions using the optimized Θ values for a 1650-square-foot house with 3 bedrooms.

This print statement prints: For size of house = 1650, Number of bedroom = 3, we predict a house value of \$430447.0