Predict credit card approval using Jupyter notebook, sklearn, and Postman.
A common misconception about doing data science or machine learning in the cloud is that it is complex and difficult. While it is true that intimate knowledge of cloud technology and networking is required to deploy something into production, nothing stops us from playing around with the technology and deploying proof-of-concept projects to demonstrate what’s possible.
In this tutorial, we will play around with Microsoft’s Azure Machine Learning platform and train a simple binary classification model to predict whether an applicant will get approved for a credit card or not. We will then take that model and deploy it in the cloud as a REST endpoint. We will do all this using a Jupyter notebook that is hosted in the cloud, and we will also use Postman to simulate the prediction process.
WARNING! This is an epic post. Lots of pictures abound. I tried as hard as I could to capture each step of the way with a screenshot to give a sense of security for newbies out there like me!
In this post, we will:
- Sign up for Azure Machine Learning Studio
- Create a machine learning workspace
- Request for an increase in quota
- Create a compute instance
- Download a dataset from Kaggle
- Create a dataset in Azure Machine Learning Studio
- Play around in Jupyter notebook and predict the future!
1. Sign up for Azure Machine Learning Studio
To start, let’s go to https://azure.microsoft.com/en-us/free/ and click on the big green button that says “Start free” as shown below:
It will ask you to sign in to your Microsoft account (sign up for one if you don’t have one already).
Then it will prompt you for details to put on your Microsoft Azure profile.
Next, it will ask you to verify your identity by texting a verification code to your phone.
Then comes the dreaded part: entering your credit card information. Go ahead; fortune favors the bold… but don’t blame me if anything goes awry.
Finally, we’re ready to start.
2. Create a machine learning workspace
Go to https://portal.azure.com and on the homepage, click on “Create a resource”
Select “AI + Machine Learning” under the Categories panel, or simply type “machine learning” in the search box that says “Search services and marketplace” as shown below:
Then, on the machine learning card/tile that appears, click on “Create.”
Select the subscription that you would like to use. The default is usually something like “Azure subscription 1.” In this case, I renamed my subscription to ECDEDIOS-DEV.
For the resource group, click on “Create new” and type in a name for your project.
Next, fill in the workspace details…
Below, you can see the details of the machine learning workspace.
Next, scroll down.
Click on the “Launch studio” button to go to the Azure Machine Learning Studio.
3. Request for an increase in quota
In Azure Machine Learning Studio, if not already visible, click on the hamburger icon (three horizontal lines) on the top left of the page to expand the left blade or sidebar.
Click on “Compute”
Click on “New” to attempt creating a new compute instance.
On the right blade or sidebar that appears, fill in the details of your compute.
If nothing seems to be available when you select the size of the virtual machine, you might need to request a quota increase first.
Click on the text that says “Click here to view and request quota.”
Click on “Request quota.”
Fill in the details of your support request.
Don’t forget to click on “Enter details” to fill in the rest of the support ticket information.
When done, you’ll see a pop-up window on the top right of the page that says “New Support Request.” It will display the support request number for your ticket.
You should get an email that looks similar to the one below.
If approved, you will get a notification that your quotas have increased. This should take only a few hours but it really depends on the quota team’s workload.
4. Create a compute instance
Once approved, you can continue creating the compute instance.
In this case, I chose the Standard_DS11_v2 because it’s the cheapest one at $0.18 per hour.
It will take a few moments for the compute instance to be provisioned.
5. Download a dataset from Kaggle
While we’re waiting for the compute instance to be provisioned, let’s head on over to Kaggle for a clean credit card dataset.
Click on the “Download” button to download the dataset.
6. Create a dataset in Azure Machine Learning Studio
Next, let’s backtrack a little bit by clicking on the name of your workspace (creditcardproject).
Let’s expand the left blade/sidebar by clicking on the hamburger icon again.
And, click on “Datasets” on the left blade/sidebar.
On the next screen, click on the “Create dataset” button.
Select “From local files.”
Fill in the details and click on “Next.”
On the drop-down that appears, select “Browse files” and navigate to the place where you downloaded the clean dataset from Kaggle.
Click “Next” after uploading the dataset.
Double-check that the contents look correct. Click “Next” when you’re done.
On the next screen, you’ll get a chance to specify the data type of some, all, or none of the columns.
Pay special attention to the columns and their corresponding data types. Below, I changed integer to string for the ZipCode column. Click “Next” when you’re done examining and correcting data types.
Examine the details of the dataset on the next screen and click “Create” to register the clean credit card dataset as an Azure Machine Learning Studio dataset that can be used in experiment runs.
After creating the dataset, click on its name so we can explore its properties and get the details needed to use it for our experiments.
Notice that Azure Machine Learning Studio keeps track of the version number of the dataset.
It can also profile the dataset to give you a feel for its contents.
Click on “Consume” when you’re done.
Here on the “Consume” tab, we can get code snippets that we can use to refer to the dataset.
On the “Explore” tab, you can see the actual contents of the dataset.
7. Play around in Jupyter notebook and predict the future!
Now it’s time for the real fun to begin!
In the compute screen, find the compute instance that you want to use and click on “Jupyter” under Applications.
Acknowledge the warning that pops up about trusted code and click “Continue” when done.
Navigate to the Users folder and then to the folder named after your username.
On the top right of the Jupyter notebook interface, click on the “New” button and select Python 3.8 AzureML.
Let’s give the notebook a proper name.
Copy and paste the following into a cell in Jupyter.
Above, we’re simply importing the needed packages.
Next, we’re simply configuring Jupyter’s output to not truncate anything and display everything.
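As a rough idea of what such a cell can look like, here is a minimal sketch. It assumes the output we care about is pandas DataFrames, so the pandas display options are the ones being changed; the exact options used in the notebook may differ.

```python
import pandas as pd

# Assumption: the tables printed in the notebook are pandas DataFrames.
# Setting a limit to None removes it, so rows and columns are no longer
# elided with "..." in the middle of the output.
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

# Also stop long cell values from being cut short.
pd.set_option("display.max_colwidth", None)
```

Be careful with `display.max_rows` on a large DataFrame, since the notebook will then try to render every single row.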
Above, we’re supplying the details of our subscription, resource group, workspace, and dataset that we’re going to use.
Here, we’re dropping attributes that may be considered discriminatory and assigning the resulting DataFrame to a new variable.
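To make the idea concrete, here is a small sketch on a toy DataFrame. The column names, the values, and the choice of which columns count as sensitive are all assumptions made up for illustration; they are not taken from the notebook.

```python
import pandas as pd

# Toy stand-in for the registered dataset. Column names and values are
# invented for this example.
df = pd.DataFrame({
    "Gender": [1, 0, 1],
    "Ethnicity": ["A", "B", "A"],
    "Income": [1000, 0, 560],
    "Approved": [1, 0, 1],
})

# Hypothetical list of attributes we do not want the model to see.
sensitive_columns = ["Gender", "Ethnicity"]

# drop() returns a new DataFrame and leaves the original untouched.
df_reduced = df.drop(columns=sensitive_columns)

print(df_reduced.columns.tolist())
```

Because `drop()` returns a copy, the original DataFrame is still around if we want to compare the two later.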
Next, we’ll create our X and y variables with X as our feature (independent) variables and y as our target (dependent) variable.
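A minimal sketch of that split into features and target, again on a toy DataFrame. The column names (including “Approved” as the target) and the values are assumptions for illustration only.

```python
import pandas as pd

# Toy data with invented values; "Approved" is assumed to be the target.
df = pd.DataFrame({
    "Debt": [0.0, 4.46, 0.5, 1.54],
    "YearsEmployed": [1.25, 3.04, 1.5, 3.75],
    "Income": [0, 560, 824, 3],
    "Approved": [1, 1, 0, 0],
})

target_column = "Approved"  # hypothetical target name

# X holds every column except the target; y holds only the target.
X = df.drop(columns=[target_column])
y = df[target_column]

print(X.shape, y.shape)
```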
Then, we’ll normalize our data with sklearn’s MinMaxScaler.
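For a feel of what MinMaxScaler does, here is a small sketch on made-up numbers. By default it rescales each column independently to the range 0 to 1 using (x - min) / (max - min).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature matrix: two columns on very different scales.
X = np.array([
    [20.0, 0.0],
    [30.0, 500.0],
    [40.0, 1000.0],
])

scaler = MinMaxScaler()  # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)

print(X_scaled)
# [[0.  0. ]
#  [0.5 0.5]
#  [1.  1. ]]
```

One thing to keep in mind: fitting the scaler on the whole dataset before splitting lets the test rows influence the scaling. For a proof of concept that is fine, but the stricter option is to fit the scaler on the training split only and then call `transform()` on the test split.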
Then, we’ll connect to our workspace, create an experiment called “credit-card-project” and set up MLflow.
Here, we are splitting our dataset to create training and testing sets.
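A minimal sketch of such a split with sklearn’s train_test_split. The 80/20 ratio and the random seed are assumptions for this example, not necessarily what the notebook uses, and the data is made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 10 rows, 2 features, binary labels.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 20% of the rows for testing. random_state makes the shuffle
# repeatable so that we get the same split on every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```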
Now, we’re just setting up LogisticRegression as our classifier and training it.
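Here is a stripped-down sketch of the training step. It uses synthetic data from sklearn’s make_classification instead of the credit card dataset and leaves out the MLflow tracking, so treat it as an illustration of the sklearn calls only. The `max_iter` value is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# max_iter is raised from the default to give the solver room to converge.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given data.
accuracy = clf.score(X_test, y_test)
predictions = clf.predict(X_test)
print(f"Test accuracy: {accuracy:.3f}")
```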
We should get something that looks like this:
Let’s go back to the Azure Machine Learning Studio UI and navigate to “Models.” The model list should be blank for now because we haven’t registered any models yet.
Let’s change that.
On the cell above, we’re simply registering the model and naming it “credit_card_model.” If you refresh the Model List, you should now see the model.
Let’s set up the environment.
Above, we’re creating an environment for our deployment. In this case, we’re using an Ubuntu image that is already loaded with Python 3.8 and sklearn.
Next, we’re just configuring some of the deployment details like the number of CPU cores, memory, and description.
Above is where we are actually deploying the model so that we can use it for inferencing.
In the above cell, we’re simply preparing the sample data and headers that will go into the request.
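A minimal sketch of what preparing the sample data and headers can look like. The feature values are made up, and the “data” key is an assumption: the real key is whatever the scoring script behind the endpoint expects.

```python
import json

# Hypothetical feature values for one applicant, already scaled to [0, 1].
sample_row = [0.25, 0.10, 0.0, 1.0, 0.35]

# Assumption: the scoring script reads a list of rows under the "data" key.
payload = json.dumps({"data": [sample_row]})

# Tell the endpoint that the request body is JSON.
headers = {"Content-Type": "application/json"}

print(payload)
```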
The magic actually happens in the cell above. Here, we are making a POST request to the URI of the model that we deployed earlier.
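To show the shape of such a call, here is a sketch that uses only Python’s standard library, which may differ from what the notebook uses. The URI is a placeholder, the “data” key is an assumption about the scoring script, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, rows):
    """Build (but do not send) a JSON POST request for the endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")  # assumed payload key
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder URI; in the notebook this would be the deployed service's URI.
request = build_scoring_request(
    "http://example.invalid/score", [[0.25, 0.10, 0.0, 1.0, 0.35]]
)
print(request.get_method(), request.full_url)
```

Calling `send(request)` with the real URI in place of the placeholder would return whatever the scoring script responds with.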
Let’s see the result:
Above, we’re printing the REST endpoint, the actual label, and the prediction. Here’s what the output should look like:
You can check out the whole notebook here.
In this section, we’re going to use Postman to simulate the prediction process by sending a POST request to the endpoint.
First, let’s make sure that we are making a POST request and not a GET request. Second, let’s enter the URI of the endpoint (service.uri) in the request URL field. Third, click on the “Body” tab, select “raw,” and select “JSON.” Fourth, put the body of the request in the textbox as shown above.
And finally, click on the “Send” button.