Neural Networks: Learning neural network architectures through experiments (including CNN, RNN, LSTM, etc.). [link]
German Credit Risk: The primary objective of this project is to develop a predictive model using the German credit risk dataset that can accurately
forecast the credit amount that potential borrowers are likely to receive.
[link]
Chicago Schools Data EDA: My main goal is to gain insight into educational equality in Chicago. To achieve that,
I look at student demographics and performance by location. I hope to learn whether a school's
location and student body affect its college enrollment rate.
[link]
Survival Analysis of Political Leaders:
This data set documents party leadership succession in 23 parliamentary
democracies (as defined by Lijphart 1999). The data has 25 columns and 4559 rows.
It includes the country, party information, name, sex, and term
information for each leader, along with a status vector that uses 1
to indicate that the leader is still in office and 0 to indicate that they are out of
office. There are, however, many missing values in the data set due to the lack
of information for some countries. In this project, we use tenure, the leader's
time in office (in years), as the time variable and status as the
censoring indicator: leaders with status 1 are still in office, so their tenure
is right-censored, while status 0 means the leader has finished their term.
The original paper studied the effect of
succession on terms. In this project, however, we would like to find out whether
there is any relationship between calendar time and the length of tenure (for example,
whether a more recent election or in-office year is associated with a shorter term).
(Horiuchi and Laing, 2015)
[link]
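To make the censoring convention concrete, here is a minimal Kaplan-Meier sketch in plain Python. It assumes the status encoding described above (1 = still in office, 0 = out of office), which is the reverse of the event indicator many survival libraries expect. The function name and the sample tenures are hypothetical and are not taken from the data set or from the project's actual code.

```python
from collections import Counter


def kaplan_meier(tenures, status):
    """Return [(time, survival_probability)] at each observed exit time.

    status follows the encoding described above:
    1 = still in office (right-censored), 0 = left office (event observed).
    """
    # Number of observed exits (status == 0) at each distinct tenure value.
    events = Counter(t for t, s in zip(tenures, status) if s == 0)
    # Everyone, censored or not, leaves the risk set at their own tenure.
    leaving = Counter(tenures)

    at_risk = len(tenures)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical tenures in years, for illustration only.
tenures = [2, 3, 3, 5, 8]
status = [0, 0, 1, 0, 1]
curve = kaplan_meier(tenures, status)
for t, s in curve:
    print(f"S({t}) = {s:.2f}")
```

Leaders with status 1 still count toward the risk set until their recorded tenure, but they never trigger a drop in the curve.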
Ground Ozone Data Time Series Analysis:
This report examines the monthly average of ground-level ozone in Los Angeles,
California from 2000 to 2020. After checking data transformations and differencing,
I find that the original time series does not need a transformation and that
it has a seasonal pattern but no significant trend. I identified candidate models
by looking at the ACF and PACF of the differenced time series.
The best model, with the lowest AICc, is
a seasonal ARIMA model with (p, d, q)×(P, D, Q) = (1, 0, 1)×(1, 1, 2).
[link]
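As a rough illustration of two steps mentioned above, the sketch below applies one seasonal difference (D = 1, d = 0) and computes AICc from a log-likelihood. The seasonal period of 12 is an assumption based on the data being monthly. The function names, the synthetic series, and the numbers passed to `aicc` are hypothetical and are not results from the report.

```python
import math


def seasonal_difference(series, lag=12):
    """Subtract the value one season earlier: y[t] - y[t - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for k estimated parameters and n observations."""
    aic = -2 * log_likelihood + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Synthetic monthly series with a pure 12-month cycle (not the ozone data).
series = [10 + 3 * math.sin(2 * math.pi * (m % 12) / 12) for m in range(48)]
diffed = seasonal_difference(series)

print(len(diffed))                  # 12 observations are lost to the seasonal lag
print(max(abs(x) for x in diffed))  # the seasonal cycle is removed

# Hypothetical log-likelihood and parameter count, for illustration only.
print(aicc(log_likelihood=-100.0, k=5, n=50))
```

When comparing candidate models, the one with the lowest AICc is preferred, and the correction term matters most when the number of observations is small relative to the number of parameters.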