Project 1: Create a python file with a function that takes a
positive integer m as its input and finds the first m primies (singular: primy). A primy is defined as follows.
First, 2 and 3 are primies. An integer greater than 3 is a primy
if it cannot be written as the product of 2 or 3 primies
(that are not necessarily distinct).
Thus, 4 is not a primy, because it can be written as 2*2.
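Similarly, 12 is not a primy, because it can be written as 2*2*3.
But 16 is a primy: its only factorizations into 2 or 3 integers greater than 1
are 2*8, 4*4, and 2*2*4, and neither 4 nor 8 (= 2*2*2) is a primy, so 16 cannot be written as
the product of 2 or 3 primies.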
The first several primies are 2,3,5,7,11,13,16,17,19.
In this and other projects, you
should submit the file and also a set of console input/output
examples showing that it works. The function should return an error
message if the input is not a positive integer (e.g., text,
a negative number, or something like 2.8). Note that your output is
the first m primies, *not* the set of primies less than m.
Example of output: function applied to m=19: [2, 3, 5, 7, 11, 13, 16, 17, 19, 23, 24, 29, 31, 36, 37, 40, 41, 43, 47]
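One possible approach, offered only as a sketch and not as the required solution: test the integers 2, 3, 4, ... in order, and for each n greater than 3 check whether n is a product of exactly 2 or exactly 3 primies already found. Any primy factor of n is smaller than n, so it is already in the list when n is tested. The names first_primies and _is_product are made up for this illustration.

```python
def _is_product(n, primies, primy_set, factors):
    """Return True if n is a product of exactly `factors` known primies."""
    if factors == 1:
        return n in primy_set
    return any(
        n % p == 0 and _is_product(n // p, primies, primy_set, factors - 1)
        for p in primies
    )


def first_primies(m):
    """Return a list of the first m primies, or an error message string."""
    # bool is a subclass of int in python, so True/False are rejected explicitly.
    if isinstance(m, bool) or not isinstance(m, int) or m < 1:
        return "Error: the input must be a positive integer."
    primies, primy_set = [], set()
    n = 2
    while len(primies) < m:
        # 2 and 3 are primies by definition; a larger n is a primy exactly
        # when it is not a product of 2 or of 3 primies found so far.
        if n <= 3 or not (
            _is_product(n, primies, primy_set, 2)
            or _is_product(n, primies, primy_set, 3)
        ):
            primies.append(n)
            primy_set.add(n)
        n += 1
    return primies


print(first_primies(9))    # [2, 3, 5, 7, 11, 13, 16, 17, 19]
print(first_primies(2.8))  # Error: the input must be a positive integer.
```

The sketch favors readability over speed; it rescans the whole list of primies for every candidate n.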
Project 2: Make a GUI that allows the user
to input the integer k into a text box, and then the GUI prints out the first k
fourceful prime pairs, which are pairs of primes that differ by 4 (e.g., (3,7), (7,11), (13,17) are the first
three fourceful prime pairs).
Use colors and images to make your GUI look interesting.
If the user inputs something that is not a positive integer, the GUI should display an error message.
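The non-GUI part of this project can be sketched as follows. This is only an illustration under assumptions: the function names parse_positive_int, is_prime, and fourceful_pairs are hypothetical, trial division is just one simple way to test primality, and the GUI itself (text box, colors, images) is not shown.

```python
def parse_positive_int(text):
    """Return the positive integer typed in a text box, or None if invalid."""
    try:
        value = int(text)  # raises ValueError for input like "abc" or "2.8"
    except ValueError:
        return None
    return value if value > 0 else None


def is_prime(n):
    """Trial-division primality test; adequate for small inputs."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def fourceful_pairs(k):
    """Return the first k pairs (p, p + 4) in which both numbers are prime."""
    pairs = []
    p = 2
    while len(pairs) < k:
        if is_prime(p) and is_prime(p + 4):
            pairs.append((p, p + 4))
        p += 1
    return pairs


print(fourceful_pairs(3))  # [(3, 7), (7, 11), (13, 17)]
```

In a GUI, the button callback would pass the text box contents to parse_positive_int and display an error message whenever it returns None.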
Project 3: Create a python file with a function that generates one
4x4 matrix, where the computer has filled the entries with random integers between
-7 and 5, inclusive. The script should have another function
that asks the user how many computations should be made. After the user inputs
a positive integer, say for example 37, the python script should generate 37 of
these matrices
and create a csv data file with 37 rows, one per matrix, where the first column is the first-row, first-column entry of the matrix,
the second column is the trace of the
matrix, and the third column is the determinant of the matrix.
Note that the matrices themselves are not stored in the csv file.
Next, the script should output (in the console) the following information: the mean,
median, and standard deviation
of each column in your csv file (9 values total), and also the percentage of
the matrices
whose trace is less than 0 and the percentage of the
matrices whose determinant
is less than 10 in absolute value.
(So this is a total of 11 outputs of information in the console,
in addition to the csv file.)
The script should produce an error message if the user does not input a positive
integer.
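A minimal sketch of the computations in this project, using only the python standard library. Everything named here is an assumption made for illustration: the function names, the default file name matrix_stats.csv, the matrix size being a parameter (defaulting to 4), and the use of the population standard deviation (statistics.pstdev) where the project may intend the sample version (statistics.stdev). Prompting the user and validating the input are not shown.

```python
import csv
import random
import statistics


def random_matrix(size=4, low=-7, high=5):
    """Square matrix of random integers in [low, high], inclusive."""
    return [[random.randint(low, high) for _ in range(size)] for _ in range(size)]


def trace(mat):
    """Sum of the diagonal entries."""
    return sum(mat[i][i] for i in range(len(mat)))


def determinant(mat):
    """Exact integer determinant by cofactor expansion along the first row."""
    if len(mat) == 1:
        return mat[0][0]
    total = 0
    for col, entry in enumerate(mat[0]):
        minor = [row[:col] + row[col + 1:] for row in mat[1:]]
        total += (-1) ** col * entry * determinant(minor)
    return total


def make_rows(count):
    """One (top-left entry, trace, determinant) tuple per random matrix."""
    rows = []
    for _ in range(count):
        mat = random_matrix()
        rows.append((mat[0][0], trace(mat), determinant(mat)))
    return rows


def write_csv(rows, path="matrix_stats.csv"):
    """Store the rows only; the matrices themselves are not saved."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)


def summarize(rows):
    """Return per-column (mean, median, std dev) and the two percentages."""
    stats = [
        (statistics.mean(col), statistics.median(col), statistics.pstdev(col))
        for col in zip(*rows)
    ]
    pct_negative_trace = 100 * sum(t < 0 for _, t, _ in rows) / len(rows)
    pct_small_det = 100 * sum(abs(d) < 10 for _, _, d in rows) / len(rows)
    return stats, pct_negative_trace, pct_small_det
```

Cofactor expansion is slow for large matrices but is exact for integer entries and perfectly adequate at this size.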
Project 4: Create a python script that does the following.
It should have a function that chooses two points at random
in the unit square { (x,y): 0< x< 1, 0 < y < 1 }.
The script should start with a GUI with a text box, and the user
should input a positive integer n.
After the user pushes a button,
the script should call the function n times and
then create an xlsx file with five labeled columns 'x1', 'y1', 'x2', 'y2', 'd',
and n rows. Each row
should have 5 entries: the x- and y-values of
the two points, followed by the distance d between the two
points.
Also, the GUI should then display 2 plots. The first plot should graph each
pair of points and the line segment connecting them.
The second plot should make a histogram of all the values of d from your set of n
pairs of points.
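The point-generating function and the rows of the spreadsheet could be sketched as follows. This is an illustration only: the names open_unit, random_pair, and random_pairs are hypothetical, and writing the xlsx file, the GUI, and the two plots are not shown (they need third-party packages).

```python
import math
import random


def open_unit():
    """Random float strictly between 0 and 1."""
    x = random.random()  # random.random() may return exactly 0.0
    while x == 0.0:
        x = random.random()
    return x


def random_pair():
    """Return (x1, y1, x2, y2, d) for two random points in the unit square."""
    x1, y1, x2, y2 = (open_unit() for _ in range(4))
    d = math.hypot(x2 - x1, y2 - y1)  # Euclidean distance between the points
    return x1, y1, x2, y2, d


def random_pairs(n):
    """Call random_pair n times; each tuple is one row of the spreadsheet."""
    return [random_pair() for _ in range(n)]


header = ["x1", "y1", "x2", "y2", "d"]
rows = random_pairs(5)
```

Each d lies between 0 and the square root of 2, the length of the diagonal of the unit square, which gives a quick sanity check on the histogram.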
Project 5:
Make an R script that does the following.
All of this should be in a shiny GUI, where the user
selects an integer k between 1 and 100, inclusive. You may choose
to use a slider or dropdown menu for this part. There should be some text above the selection
control which tells the user what to do.
After the selection is made, the script does the following.
First, it generates a CSV file with the following data, in 200 rows. For
each row, the first column should be a randomly chosen prime number between
1 and 4000. The second column should be a floating point number that is taken
from a normal distribution whose mean is k times the given prime number and whose standard
deviation is 10. You should make up some appropriate column headings for this csv file.
The GUI displays the plot of all the points (x,y), where x is the prime in the first column and y is the
value in the second column, and also displays
the least squares fit line on the same plot. It should also show the equation of the line drawn.
The GUI also displays the plot of all the points (x,y) such that the prime x is less than 3000
and also displays the new least squares fit line (restricted to that data) on the same plot.
Again, it should also show the equation of the line drawn.
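The project itself must be written in R with shiny; the following python sketch only illustrates the logic of the data generation and of the two least squares fits, which carries over directly. The function names and the use of the closed-form formulas slope = Sxy/Sxx and intercept = mean(y) - slope*mean(x) are assumptions made for this illustration; the csv file, the plots, and the GUI are not shown.

```python
import random


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes p with 2 <= p <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, limit + 1, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


def generate_rows(k, n_rows=200, upper=4000, sd=10):
    """Rows (prime, value) with value drawn from Normal(k * prime, sd)."""
    primes = primes_up_to(upper)
    rows = []
    for _ in range(n_rows):
        p = random.choice(primes)
        rows.append((p, random.gauss(k * p, sd)))
    return rows


def least_squares(points):
    """Return (slope, intercept) of the least squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)  # zero if all x are equal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = generate_rows(k=3)
full_fit = least_squares(rows)
# Second fit: restricted to the points whose prime x is less than 3000.
restricted_fit = least_squares([(x, y) for x, y in rows if x < 3000])
```

Because the standard deviation (10) is small compared with the spread of the primes, both fitted slopes should come out very close to k.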
Project 6:
For this exercise, you will determine the sequence of SQL commands needed to do the job, and then
submit those commands to me in a text file.
Your project should do the following. Create a MySQL database, and create a table with the following columns.
The first column should give a first name, chosen randomly from a set of 5 first names of people in
your family (or extended family). The second column y should be computed as the number of characters in the first name.
The third column should be a randomly chosen number between y and y^2.
You should generate 100 rows in your table.
Next, generate a second table with all of the data in the first table, except that you have removed
all of the data corresponding to one particular first name, and you have sorted the table
rows according to the third column values, in descending order. Find the average of the second column and
the maximum value M of the third column.
Generate a new table from the first table, only including the rows with third column value greater than M/2.
Finally, generate a list of all the distinct numbers in the second column of the first table.
Final Project:
This final project will involve an investigation of some machine learning algorithms that are
used in predictive modeling of data. You will be making two scripts - one in
python and one in R, and both scripts should use a GUI to display the results nicely.
To see examples of the packages and code that should be used in your projects, go to these websites:
simple python machine-learning project
simple R machine-learning project
First, go to kaggle.com, and register for a free account to access datasets. After you join, click on
Datasets from the menu, and find a dataset that you want to do your project on, and download it.
Find a parameter in the dataset that you would like to predict from the other values. The user
decides what
percentage of the dataset should be the part that the machine learning model should use as the
training set. The actual rows used in the training set should be randomly chosen every time the
program runs. Test at least 3 different machine learning models on the data, and
evaluate the accuracy (similar to what
the websites above do).
Your GUI should have a slider or textbox where the user selects the percentage of training data from the whole
dataset. After the selection, the script displays the results of the test. A graph should be shown.
Since the training set is
randomly chosen, each time a selection is made, a different result should appear.
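The random choice of training rows can be sketched with the standard library alone. This is only an illustration: split_rows is a hypothetical name, and in practice a machine-learning package's own helper (for example train_test_split in scikit-learn) would normally do this job.

```python
import random


def split_rows(rows, train_percent):
    """Randomly split rows into (training set, test set)."""
    shuffled = list(rows)     # copy, so the caller's data is left untouched
    random.shuffle(shuffled)  # a new random order on every call
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


train, test = split_rows(range(10), 70)
print(len(train), len(test))  # 7 3
```

Because no random seed is fixed, each call selects a different training set, so the reported accuracy changes from run to run.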
Open a web browser and navigate to https://labaccess.tcu.edu.
Choose a computer in TUC 353 and click on it. If you have done this before, it will be faster if you choose the same number as before.
A file will download to your computer. Double-click on it, and complete whatever log-ins or installs are necessary.
You will now be running the PC in TUC 353.
You can then start working. You might want to either save your files to your U: or M: drive or email them to
yourself, and then delete them from the TUC 353 computer before logging off.
*Very Important* Be sure to log off the lab computer when you are finished.