Identify whether the given transaction is an intercompany transaction or not.
It is a legal obligation for “Company A” to disclose its financials to the internal and/or external stakeholders of the company. During this process, “Company A” collects financial data from all of its subsidiaries and then identifies intercompany transactions (for example, transactions that took place between the parent company and one of its subsidiaries), since these are not real financial transactions.
Traditionally, “Company A” has performed this process using various systems, several data sets, and a group of accounting experts, and it has been a week-long task.
“Company A” decided to use artificial intelligence to automate this process, in order to increase productivity and reduce the time taken to complete the task.

Model selection is the process of choosing between different deep learning approaches - e.g. ANN, CNN, RNN - or choosing between different hyperparameters or sets of features for the same deep learning approach.
The choice of the actual deep learning algorithm (e.g. ANN or CNN) is less important than you might think - there may be a "best" algorithm for a particular problem, but its performance is often not much better than that of other well-performing approaches to the same problem.
There may be certain qualities you look for in a model:
Our problem here is a supervised classification problem: to identify whether a given transaction is an intercompany transaction or not. This type of problem can be solved by the following models.
Advantages of Feature Engineering:
The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN, etc.). This is the data from which the model, through the libraries' APIs and training algorithms, learns to make predictions automatically.
The following section describes the training data and its associated characteristics. These characteristics are:
The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
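As a minimal sketch of how such a split can be produced (the tiny DataFrame below is synthetic and the column names are illustrative, not the lab's real data):

import pandas as vAR_pd
from sklearn.model_selection import train_test_split

# Synthetic rows standing in for the lab's transaction data.
vAR_df = vAR_pd.DataFrame({
    'Company':         [1000, 1000, 2000, 2000, 3000, 3000],
    'Trading_Partner': [2000, 9999, 1000, 8888, 1000, 7777],
    'Inter_Company_Transaction': [1, 0, 1, 0, 1, 0]})

vAR_Features = vAR_df[['Company', 'Trading_Partner']]
vAR_Labels = vAR_df['Inter_Company_Transaction']

# Hold out a third of the rows as unseen test data.
vAR_Features_train, vAR_Features_test, vAR_Labels_train, vAR_Labels_test = \
    train_test_split(vAR_Features, vAR_Labels, test_size=0.33, random_state=42)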
The following section describes the features that are used in the model for predictions.
There are several machine learning and data engineering libraries available. We are using the following libraries; they and their associated functions are readily available in Python for developing business applications.
As explained above, we are using a supervised machine learning model, an Artificial Neural Network (ANN).
Model Building Blocks: There are several technical and functional components involved in implementing this model. Here are the key building blocks used to implement it.
Implementing a model to address a given problem involves several steps. Here are the key steps involved. You can customize these steps as needed; we developed them for learning purposes only.
Jupyter Notebook is launched through the command prompt. Search for cmd and open a Command Prompt terminal.
Now type jupyter notebook and press Enter, as shown.
After you press Enter, the page below opens.
To open a new file, follow the instructions below.
Go to New >>> Python [conda root]
Give the file a meaningful name, as shown below.
For our model implementation we need the following libraries:
TensorFlow: TensorFlow is an open-source machine learning library for research and production. It offers APIs for beginners and experts, targeting desktop, mobile, web, and cloud.
Keras: Keras is an open-source neural network library written in Python. It is capable of running on top of TensorFlow, Microsoft Cognitive Toolkit, or Theano.
Pandas: Pandas is a library for data manipulation and analysis. In our implementation we use it for importing the data file and creating DataFrames (which store the data).
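A minimal import cell for these libraries might look as follows (the aliases are our own choices, and depending on your installation Keras can also be imported standalone as import keras):

# Import the libraries used throughout this lab.
import tensorflow as vAR_tf      # model building and training
from tensorflow import keras     # high-level neural network API
import pandas as vAR_pd          # data import and DataFrame handling

print(vAR_tf.__version__)        # quick sanity check that the install works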
The next step after importing the libraries is to import the training data. We import the training data stored on our local system using the Pandas library.
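A hedged sketch of this step (the file name is a placeholder for your own training file, not a file shipped with the lab):

import pandas as vAR_pd

# Load the training data from the local system into a DataFrame.
vAR_df = vAR_pd.read_excel('Training_Data.xlsx')   # use read_csv for a .csv file
print(vAR_df.head())                               # inspect the first few rows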
Step 3 of the implementation is feature selection. Deep learning works on a simple rule - if you put garbage in, you will get garbage out. By garbage, we mean noise in the data. This becomes even more important when the number of features is very large. We need only those features (inputs) that are a function of the labels (outputs). For example, to predict whether a given fruit is an apple or an orange, the color/texture of the fruit becomes a feature to consider: if the color is red, it is an apple; if it is orange, it is an orange.
The selected features must be numerical. If they are not, they have to be converted from categorical values to numerical values. In our scenario we use a LabelEncoder for the conversion.
The selected features are Company, Trading Partner, Transaction Type, and Data Category. The label is the target variable, i.e. the intercompany transaction flag. A small sketch of the encoding step follows.
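Here is a minimal sketch of the LabelEncoder conversion (the example category values are made up to show the mechanics):

from sklearn.preprocessing import LabelEncoder
import pandas as vAR_pd

# Convert a categorical feature into numeric codes, one integer per category.
vAR_le = LabelEncoder()
vAR_Transaction_Type = vAR_pd.Series(['Sale', 'Purchase', 'Transfer', 'Sale'])
vAR_Transaction_Type_Converted = vAR_le.fit_transform(vAR_Transaction_Type)
print(vAR_Transaction_Type_Converted)   # [1 0 2 1]: categories coded in alphabetical order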
Step 4 is training the model: making the model learn, understand, and recognize the patterns in the data. Before training, the data is split into two sets: Train is the model's input and Test is for testing the model.
We use a binary classification model for our classification problem.
Model.fit(Features_train, Labels_train)
As a next step, we review how the algorithm has learned from the features we provided, as shown.
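As a hedged, end-to-end sketch of this step (the layer sizes, optimizer, epoch count, and synthetic data below are illustrative choices, not the lab's exact configuration):

import numpy as vAR_np
from tensorflow import keras

# Synthetic stand-ins: four numeric features per row, one binary label.
vAR_Features_train = vAR_np.random.rand(100, 4)
vAR_Labels_train = vAR_np.random.randint(0, 2, size=100)

# A small binary classifier: the sigmoid output squashes to a yes/no probability.
vAR_model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')])
vAR_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

vAR_model.fit(vAR_Features_train, vAR_Labels_train, epochs=10, verbose=0)
vAR_model.summary()   # review the layers and learned parameter counts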
Import the test data; this is the data used to test how the model performs.
Next, we review the output of the model, i.e. the predictions it has made on the test data.
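Continuing the sketch above (vAR_model comes from the training sketch; the test rows are synthetic):

import numpy as vAR_np

# Score the trained model on unseen rows and inspect the predictions.
vAR_Features_test = vAR_np.random.rand(10, 4)
vAR_Probabilities = vAR_model.predict(vAR_Features_test)   # values in [0, 1]
vAR_Labels_Pred = (vAR_Probabilities > 0.5).astype(int)    # threshold at 0.5
print(vAR_Labels_Pred.ravel())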
Open the written file and check the outcome, as shown. Execute the cell to view the data.
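The write-out step might look like this sketch (the output file name is a placeholder, and Excel output assumes the xlsxwriter package is installed):

import pandas as vAR_pd

# Persist the predictions alongside the test rows for later review.
vAR_Results = vAR_pd.DataFrame(vAR_Features_test,
                               columns=['Company', 'Trading_Partner',
                                        'Transaction_Type', 'Data_Category'])
vAR_Results['Predicted_Inter_Transaction'] = vAR_Labels_Pred.ravel()
vAR_Results.to_excel('Model_Outcome.xlsx', engine='xlsxwriter', index=False)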
In this lab work, we used an Artificial Neural Network, a deep learning model, to predict whether a given transaction is an intercompany transaction or not. The model performed well on the test data and predicted the outcomes as expected. For further data analysis and business decisions, the model's outcome is persisted to a file.
This is a very basic implementation, intended to help you learn and better understand the overall steps and processes involved in implementing a deep learning model. Real-world projects involve many more steps, processes, data sources, and technologies. We strongly recommend that you continue learning and prepare yourself to address real-world problems.
Fitting is a measure of how well a machine learning model generalizes to data similar to that on which it was trained. A well-fitted model produces accurate outcomes; an overfitted model matches the training data too closely; an underfitted model does not match it closely enough. Fitting is the essence of machine learning: if your model does not fit your data correctly, the outcomes it produces will not be accurate enough to be useful for practical decision-making.
Best Fitting: The model is best-fitted when it performs well on the training examples and also performs well on unseen data. Ideally, a model that makes predictions with zero error is said to have a best fit on the data. This is achievable at a spot between overfitting and underfitting. To understand it, we have to look at the performance of our model over time as it learns from the training dataset.
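One hedged way to watch that performance over time is to hold out a validation split during training and compare the two loss curves (vAR_model, vAR_Features_train, and vAR_Labels_train come from the Step 4 sketch; the epoch count and file name are placeholders):

import matplotlib.pyplot as vAR_plt

# Track training vs. validation loss per epoch; a well-fitted model shows
# both curves decreasing and staying close together.
vAR_history = vAR_model.fit(vAR_Features_train, vAR_Labels_train,
                            validation_split=0.2, epochs=20, verbose=0)
vAR_plt.plot(vAR_history.history['loss'], label='training loss')
vAR_plt.plot(vAR_history.history['val_loss'], label='validation loss')
vAR_plt.xlabel('Epoch')
vAR_plt.ylabel('Loss')
vAR_plt.legend()
vAR_plt.savefig('Fit_Over_Time.png')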
Training Data Set (Example-1): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-1): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Best_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    import pandas as vAR_pd
    # Load the best-fit example data and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Best_Fit_File_Example_1, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='r')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Best_Fit_Image_Example_1)
Training Data Set (Example-2): The training data set is the actual dataset used to train the model for performing various deep learning operations (CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-2): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Best_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    import pandas as vAR_pd
    # Load the second best-fit example and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Best_Fit_File_Example_2, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='b')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Best_Fit_Image_Example_2)
Training Data Set (Example-3): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-3): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Best_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    import pandas as vAR_pd
    # Load the third best-fit example and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Best_Fit_File_Example_3, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='g')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Best_Fit_Image_Example_3)
The model is overfitting when it performs well on the training examples but does not perform well on unseen data. Overfitting is often the result of an excessively complex model. It happens because the model memorizes the relationship between the input examples (often called X) and the target variable (often called y) and is therefore unable to generalize to new data. An overfitted model predicts the targets in the training data set very accurately, but not elsewhere.
Training Data Set (Example-1): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-1): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Over_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    # Load the over-fit example data and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Over_Fit_File_Example_1, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='r')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Over_Fit_Image_Example_1)
Training Data Set (Example-2): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-2): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Over_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    # Load the second over-fit example and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Over_Fit_File_Example_2, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='r')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Over_Fit_Image_Example_2)
Training Data Set (Example-3): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-3): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Over_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    # Load the third over-fit example and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Over_Fit_File_Example_3, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='r')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Over_Fit_Image_Example_3)
A predictive model is said to be underfitting if it performs poorly on the training data. This happens because the model is unable to capture the relationship between the input examples and the target variable, often because the model is too simple, i.e. the input features are not expressive enough to describe the target variable well. An underfitted model does not predict the targets in the training data set accurately. Underfitting can be avoided by using more data and by refining the feature set through feature selection.
Training Data Set (Example-1): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-1): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Under_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    # Load the under-fit example data and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Under_Fit_File_Example_1, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='r')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Under_Fit_Image_Example_1)
Training Data Set (Example-2): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-2): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Under_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    # Load the second under-fit example and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Under_Fit_File_Example_2, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='y')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Under_Fit_Image_Example_2)
Training Data Set (Example-3): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-3): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Model_Fitting_Under_Fit_Test == 'Y':
    import matplotlib.pyplot as vAR_plt
    # Load the third under-fit example and attach the model's predictions.
    vAR_df8 = vAR_pd.read_csv(open(vAR_Fetched_Data_Under_Fit_File_Example_3, 'r', encoding='utf-8'))
    vAR_df9 = vAR_df8.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    # Scatter the actual labels against Company, then overlay the predictions.
    vAR_plt.scatter(vAR_df.iloc[:, 0], vAR_df.iloc[:, 12], s=100, c='g')
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5])
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted_Inter_Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Under_Fit_Image_Example_3)
Cross-validation is a technique in which we train our model on a subset of the dataset and then evaluate it on the complementary subset of the dataset.
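Before the plotting examples below, a hedged one-line view of cross-validation is cross_val_score, which reports one accuracy per fold (the classifier choice, fold count, and synthetic data are illustrative):

import numpy as vAR_np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-ins for vAR_Features_train / vAR_Label_train.
vAR_Features = vAR_np.random.rand(100, 4)
vAR_Labels = vAR_np.random.randint(0, 2, size=100)

vAR_scores = cross_val_score(LogisticRegression(), vAR_Features, vAR_Labels, cv=5)
print(vAR_scores)          # one accuracy value per fold
print(vAR_scores.mean())   # overall estimate of generalization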
Training Data Set (Example-1): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-1): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Cross_Validation_Required == 'Y':
    from sklearn.model_selection import cross_val_predict
    from sklearn.linear_model import LogisticRegression
    import matplotlib.pyplot as vAR_plt
    # Produce out-of-fold predictions with 2-fold cross-validation.
    vAR_model = LogisticRegression()
    vAR_Predicted = cross_val_predict(vAR_model, vAR_Features_train, vAR_Label_train, cv=2)
    # Plot predicted vs. actual labels; the dashed line marks perfect agreement.
    vAR_fig, vAR_ax = vAR_plt.subplots()
    vAR_ax.scatter(vAR_Label_train, vAR_Predicted, edgecolors=(0, 0, 0))
    vAR_ax.plot([vAR_Label_train.min(), vAR_Label_train.max()],
                [vAR_Label_train.min(), vAR_Label_train.max()], 'k--', lw=4)
    vAR_ax.set_xlabel('Actual Intercompany Transaction')
    vAR_ax.set_ylabel('Predicted Intercompany Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Cross_Validation_Image_Example_1)
Training Data Set (Example-2): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN, etc.). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-2): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Cross_Validation_Required == 'Y':
    from sklearn.model_selection import cross_val_predict
    from sklearn.linear_model import LogisticRegression
    import matplotlib.pyplot as vAR_plt
    # Produce out-of-fold predictions with 5-fold cross-validation.
    vAR_model = LogisticRegression()
    vAR_Predicted = cross_val_predict(vAR_model, vAR_Features_train, vAR_Label_train, cv=5)
    # Plot the first 20 predicted vs. actual labels; the dashed line marks perfect agreement.
    vAR_fig, vAR_ax = vAR_plt.subplots()
    vAR_ax.scatter(vAR_Label_train[:20], vAR_Predicted[:20], edgecolors=(0, 0, 0))
    vAR_ax.plot([vAR_Label_train[:20].min(), vAR_Label_train[:20].max()],
                [vAR_Label_train[:20].min(), vAR_Label_train[:20].max()], 'k--', lw=4)
    vAR_ax.set_xlabel('Actual Intercompany Transaction')
    vAR_ax.set_ylabel('Predicted Intercompany Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Cross_Validation_Image_Example_2)
Training Data Set (Example-3): The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN, etc.). This is the data from which the model learns to make predictions automatically.
Test Data Set (Example-3): The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Cross_Validation_Required == 'Y':
    from sklearn.model_selection import cross_val_predict
    from sklearn.linear_model import LogisticRegression
    import matplotlib.pyplot as vAR_plt
    # Produce out-of-fold predictions with 10-fold cross-validation.
    vAR_model = LogisticRegression()
    vAR_Predicted = cross_val_predict(vAR_model, vAR_Features_train, vAR_Label_train, cv=10)
    # Plot the first 15 predicted vs. actual labels; the dashed line marks perfect agreement.
    vAR_fig, vAR_ax = vAR_plt.subplots()
    vAR_ax.scatter(vAR_Label_train[:15], vAR_Predicted[:15], edgecolors=(0, 0, 0))
    vAR_ax.plot([vAR_Label_train[:15].min(), vAR_Label_train[:15].max()],
                [vAR_Label_train[:15].min(), vAR_Label_train[:15].max()], 'k--', lw=4)
    vAR_ax.set_xlabel('Actual Intercompany Transaction')
    vAR_ax.set_ylabel('Predicted Intercompany Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Cross_Validation_Image_Example_3)
Hyperparameter optimization, or tuning, is the problem of choosing a set of optimal hyperparameters for a learning algorithm. The same kind of machine learning model can require different constraints, weights, or learning rates to generalize to different data patterns. These settings are called hyperparameters, and they have to be tuned so that the model can optimally solve the machine learning problem. Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model, one that minimizes a predefined loss function on given independent data.
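The examples below tune LogisticRegression by hand (first with default settings, then with C=10.0, fit_intercept=False, warm_start=True). A more systematic, hedged alternative is a small grid search with cross-validation; the parameter grid and synthetic data here are illustrative only:

import numpy as vAR_np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-ins for the training features and labels.
vAR_Features = vAR_np.random.rand(100, 4)
vAR_Labels = vAR_np.random.randint(0, 2, size=100)

# Try every combination in the grid and keep the best cross-validated one.
vAR_grid = GridSearchCV(LogisticRegression(),
                        param_grid={'C': [0.1, 1.0, 10.0],
                                    'fit_intercept': [True, False]},
                        cv=5)
vAR_grid.fit(vAR_Features, vAR_Labels)
print(vAR_grid.best_params_)   # the combination with the best score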
The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN, etc.). This is the data from which the model learns to make predictions automatically.
The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
# Hyperparameter Tuning (baseline: default LogisticRegression settings)
if vAR_Fetched_Data_Hyperparameter_Tuning_Required == 'Y':
    import matplotlib.pyplot as vAR_plt
    # Encode the categorical training columns as numeric values.
    vAR_le = LabelEncoder()
    vAR_Transaction_Type_Conversion = vAR_le.fit_transform(vAR_df.iloc[:, 7])
    vAR_Transaction_Type_Conversion_df = vAR_pd.DataFrame(vAR_Transaction_Type_Conversion, columns=['Transaction_Type_Converted'])
    vAR_Data_Category_Conversion = vAR_le.fit_transform(vAR_df.iloc[:, 9])
    vAR_Data_Category_Conversion_df = vAR_pd.DataFrame(vAR_Data_Category_Conversion, columns=['Data_Category_Converted'])
    # Train a baseline model with default hyperparameters.
    vAR_Features_train = vAR_pd.read_excel(vAR_Fetched_Data_Train_All_Features)
    vAR_Label_train = vAR_df.iloc[:, 12]
    vAR_model = LogisticRegression()
    vAR_model.fit(vAR_Features_train, vAR_Label_train)
    vAR_plt.scatter(vAR_Features_train.iloc[:, 0], vAR_Label_train, s=100, c='g')
    # Encode the categorical test columns the same way.
    vAR_df3 = vAR_pd.read_excel(vAR_Fetched_Data_Source_Path_Test_Data)
    vAR_Transaction_Type_Conversion_test = vAR_le.fit_transform(vAR_df3.iloc[:, 3])
    vAR_Transaction_Type_Conversion_test_df = vAR_pd.DataFrame(vAR_Transaction_Type_Conversion_test, columns=['Transaction_Type_Converted'])
    vAR_Data_Category_Conversion_test = vAR_le.fit_transform(vAR_df3.iloc[:, 4])
    vAR_Data_Category_Conversion_test_df = vAR_pd.DataFrame(vAR_Data_Category_Conversion_test, columns=['Data_Category_Converted'])
    vAR_df4 = vAR_df3.merge(vAR_Transaction_Type_Conversion_test_df, left_index=True, right_index=True)
    vAR_df5 = vAR_df4.merge(vAR_Data_Category_Conversion_test_df, left_index=True, right_index=True)
    # Predict on the test features and persist the outcome for review.
    vAR_Features_test = vAR_pd.read_excel(vAR_Fetched_Data_Test_All_Features)
    vAR_Labels_Pred = vAR_model.predict(vAR_Features_test)
    vAR_Labels_Pred = vAR_pd.DataFrame(vAR_Labels_Pred, columns=['Predicted_Inter_Transaction_Type'])
    vAR_df6 = vAR_pd.read_excel(vAR_Fetched_Data_Source_Path_Test_Data)
    vAR_df7 = vAR_df6.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    vAR_df7.to_excel(vAR_Fetched_Data_Model_Path, engine='xlsxwriter')
    vAR_df9 = vAR_pd.read_excel(vAR_Fetched_Data_Model_Path)
    # Plot the predictions against Company and save the figure.
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5], c='b')
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted Intercompany Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_Before_Hyperparameter_Tuning_Image)
The training data set is the actual dataset used to train the model for performing various deep learning operations (DNN, CNN, RNN, etc.). This is the data from which the model learns to make predictions automatically.
The test data set helps you validate that training has been effective in terms of accuracy, precision, and so on. This data is used to test whether the model responds and performs appropriately.
if vAR_Fetched_Data_Hyperparameter_Tuning_Required == 'Y':
    import matplotlib.pyplot as vAR_plt
    # Encode the categorical training columns as numeric values.
    vAR_le = LabelEncoder()
    vAR_Transaction_Type_Conversion = vAR_le.fit_transform(vAR_df.iloc[:, 7])
    vAR_Transaction_Type_Conversion_df = vAR_pd.DataFrame(vAR_Transaction_Type_Conversion, columns=['Transaction_Type_Converted'])
    vAR_Data_Category_Conversion = vAR_le.fit_transform(vAR_df.iloc[:, 9])
    vAR_Data_Category_Conversion_df = vAR_pd.DataFrame(vAR_Data_Category_Conversion, columns=['Data_Category_Converted'])
    vAR_Features_train = vAR_pd.read_excel(vAR_Fetched_Data_Train_All_Features)
    vAR_Label_train = vAR_df.iloc[:, 12]
    # Retrain with tuned hyperparameters.
    vAR_model = LogisticRegression(C=10.0, fit_intercept=False, warm_start=True)
    vAR_model.fit(vAR_Features_train, vAR_Label_train)
    vAR_plt.scatter(vAR_Features_train.iloc[:, 0], vAR_Label_train, s=100, c='g')
    # Encode the categorical test columns the same way.
    vAR_df3 = vAR_pd.read_excel(vAR_Fetched_Data_Source_Path_Test_Data)
    vAR_Transaction_Type_Conversion_test = vAR_le.fit_transform(vAR_df3.iloc[:, 3])
    vAR_Transaction_Type_Conversion_test_df = vAR_pd.DataFrame(vAR_Transaction_Type_Conversion_test, columns=['Transaction_Type_Converted'])
    vAR_Data_Category_Conversion_test = vAR_le.fit_transform(vAR_df3.iloc[:, 4])
    vAR_Data_Category_Conversion_test_df = vAR_pd.DataFrame(vAR_Data_Category_Conversion_test, columns=['Data_Category_Converted'])
    vAR_df4 = vAR_df3.merge(vAR_Transaction_Type_Conversion_test_df, left_index=True, right_index=True)
    vAR_df5 = vAR_df4.merge(vAR_Data_Category_Conversion_test_df, left_index=True, right_index=True)
    # Predict on the test features and persist the outcome for review.
    vAR_Features_test = vAR_pd.read_excel(vAR_Fetched_Data_Test_All_Features)
    vAR_Labels_Pred = vAR_model.predict(vAR_Features_test)
    vAR_Labels_Pred = vAR_pd.DataFrame(vAR_Labels_Pred, columns=['Predicted_Inter_Transaction_Type'])
    vAR_df6 = vAR_pd.read_excel(vAR_Fetched_Data_Source_Path_Test_Data)
    vAR_df7 = vAR_df6.merge(vAR_Labels_Pred, left_index=True, right_index=True)
    vAR_df7.to_excel(vAR_Fetched_Data_Model_Path, engine='xlsxwriter')
    vAR_df9 = vAR_pd.read_excel(vAR_Fetched_Data_Model_Path)
    # Plot the predictions against Company and save the figure.
    vAR_plt.plot(vAR_df9.iloc[:, 0], vAR_df9.iloc[:, 5], c='b')
    vAR_plt.xlabel('Company')
    vAR_plt.ylabel('Predicted Intercompany Transaction')
    vAR_plt.savefig(vAR_Fetched_Data_After_Hyperparameter_Tuning_Image)
Jothi Periasamy
Chief AI Architect
2100 Geng Road
Suite 210
Palo Alto
CA 94303
(916)-296-0228