Aug.14

Deep learning

This project is all about deep learning in Python. I used two datasets (Titanic, and hourly wages), and I applied different neural-network techniques to compare the accuracy each achieves on these datasets.

I started from the basics and worked up to more advanced deep learning, describing what forward propagation is and the important role the activation function plays. I also tried different learning rates to see which one gives the lowest loss value.

I used graphs to show which model has the better loss score, building two models with different densities and optimization functions. I believe it is always a good idea to split the data into training and test sets to check the performance of the model. In deep learning we can control the complexity of the model by changing the density as well as the number of hidden layers.
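The forward propagation and activation ideas above can be sketched with plain NumPy. This is a minimal illustration, not the project's actual network: the weights, layer sizes, and input values here are made up for the example.

```python
import numpy as np

def relu(x):
    # ReLU activation: pass positives through, zero out negatives.
    return np.maximum(0, x)

def forward(x, weights):
    """One forward pass: input -> ReLU hidden layer -> linear output."""
    hidden = relu(weights['W1'] @ x + weights['b1'])
    output = weights['W2'] @ hidden + weights['b2']
    return output

# Hand-picked toy weights for a 2-input, 2-hidden-unit, 1-output network.
weights = {
    'W1': np.array([[1.0, 2.0], [-1.0, 1.0]]),
    'b1': np.array([0.0, 0.0]),
    'W2': np.array([[2.0, -1.0]]),
    'b2': np.array([0.5]),
}
x = np.array([3.0, 1.0])
print(forward(x, weights))  # the ReLU zeroes the second hidden unit here
```

Swapping `relu` for another activation (sigmoid, tanh) changes what the hidden layer can represent, which is why the choice of activation function matters so much.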

Link to my work.

Deep learning

Project,Deep learning

Aug.10

Housing value prediction with XGBoost

Gradient boosting is currently one of the most popular techniques for efficient modeling of tabular datasets of all sizes. XGBoost is a very fast, scalable implementation of gradient boosting that has taken data science by storm: models using XGBoost regularly win online data science competitions and are used at scale across different industries. I learned how to use this powerful library alongside pandas and scikit-learn to build and tune supervised learning models. I worked with real-world datasets to solve both classification and regression problems.

Extreme gradient boosting

Project

Aug.08

Aug.07

The Hottest Topics in Machine Learning

Click to know about this project.

notebook-2

The NIPS conference (Neural Information Processing Systems) is one of the most prestigious yearly events in the machine learning community. At each NIPS conference, a large number of research papers are published. Over 50,000 PDF files were automatically downloaded and processed to obtain a dataset on various machine learning techniques.

These papers discuss a wide variety of topics in machine learning, from neural networks to optimization methods and many more.

 

Project

Aug.06

Capstone project

This is my capstone project

Please click the link to see my work.

Python capstone project

In this project I had to put myself in the shoes of a loan issuer and manage credit risk, using past data to decide whom to give loans to in the future. The text files contain complete loan data for all loans issued by XYZ Corp. from 2007 to 2015. The data contains an indicator of default, payment information, credit history, etc.

I divided the data into train (June 2007 – May 2015) and out-of-time test (June 2015 – Dec 2015) sets. I used the training data to build the models/analytical solution, and finally applied it to the test data to measure the performance and robustness of the models.

There are 855,969 rows and 73 columns in the whole dataset, and it has several problems: many columns have missing values, and some columns do not have the type the analysis requires. The date column was also completely jumbled, which made it difficult to split the data as required (train: June 2007 – May 2015; out-of-time test: June 2015 – Dec 2015).
So I decided to treat the date column ('issue_d'). First, I split 'issue_d' into two separate columns and replaced the values as required. Then, with the help of the map function, I rejoined the split columns into a single new column ('period').
Then I sorted the 'period' column and made it the index for slicing as required. After sorting, I treated all the missing values and dropped the columns that were not relevant to the problem statement. I also converted the types of some columns, because it is important to convert columns to their proper types: in Python the default datatype of a string is object, and if a column has some NaN values while the remaining values are integers or floats, pandas treats that column as object.
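The split-map-rejoin-sort treatment of 'issue_d' described above can be sketched in pandas like this. The raw "Dec-2011" format is an assumption about the source files, and the four sample rows are made up for the example.

```python
import pandas as pd

# Toy rows standing in for the real loan data.
df = pd.DataFrame({'issue_d': ['Dec-2011', 'Jun-2007', 'May-2015', 'Jun-2015']})

# Split month and year, map month names to numbers, then rejoin as a
# sortable 'period' string like '201112'.
month_map = {'Jan': '01', 'Feb': '02', 'Mar': '03', 'Apr': '04',
             'May': '05', 'Jun': '06', 'Jul': '07', 'Aug': '08',
             'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}
parts = df['issue_d'].str.split('-', expand=True)
df['period'] = parts[1] + parts[0].map(month_map)

# Sort on 'period' and use it as the index, so the train / out-of-time
# split becomes a simple label slice.
df = df.sort_values('period').set_index('period')
train = df.loc['200706':'201505']   # June 2007 - May 2015
test = df.loc['201506':'201512']    # June 2015 - Dec 2015
print(len(train), len(test))
```

Sorting the index first matters: pandas label slicing with `.loc` only behaves predictably on a sorted index.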
For this project I did not need to keep all the columns for prediction, so I deleted the columns where the majority of values were missing or that were not relevant to the model. I also used RandomForestClassifier to select important features for the model. With the help of a forward stepwise process, I computed the AUC score of the features to determine whether the features I had selected were good enough for the model. I also used a confusion matrix and a classification report to measure the accuracy of the model.
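The feature-selection step above can be sketched as follows: rank features by random-forest importance, then add them one at a time (forward stepwise) and record the AUC after each addition. The data here is synthetic, not the loan dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic binary target driven mostly by the first two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Rank features by importance, then grow the feature set one at a time
# and score each subset by test-set AUC.
order = np.argsort(rf.feature_importances_)[::-1]
for k in range(1, len(order) + 1):
    cols = order[:k]
    m = RandomForestClassifier(n_estimators=100, random_state=0)
    m.fit(X_tr[:, cols], y_tr)
    auc = roc_auc_score(y_te, m.predict_proba(X_te[:, cols])[:, 1])
    print(k, cols.tolist(), round(auc, 3))
```

When the AUC plateaus as features are added, the remaining features contribute little and can be dropped, which is the point of the stepwise check.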

I used different machine learning techniques for this, such as KNN classification and logistic regression.
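A hedged sketch of comparing the two classifiers named above, with the confusion matrix and classification report mentioned earlier; the data is synthetic, not the loan data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

# Synthetic binary problem with a linear decision boundary.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fit each model and print the same evaluation for both.
for name, clf in [('KNN', KNeighborsClassifier(n_neighbors=5)),
                  ('LogReg', LogisticRegression())]:
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(name)
    print(confusion_matrix(y_te, pred))
    print(classification_report(y_te, pred))
```

Running both through identical train/test splits and metrics is what makes the comparison between models fair.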

 

Capstone project,Project
