How to correctly apply the same data transformation, used on the training dataset, on real data in a...
Let's say I used MinMaxScaler while creating my model.
Now I'm loading that model via pickle in a Flask app. When a request containing a data point arrives, I would like to apply to it the same transformations that I applied to my training dataset before calling the predict() method. How do I transfer that set of transformations from the training file to the web service?
machine-learning data
edited Mar 27 at 3:23
Ethan
asked Mar 26 at 13:52
Blenzus
2 Answers
Rather than storing and loading many files, create a scikit-learn Pipeline that contains all of your transformations and the model, fit it, and save it as a single pickle or joblib file.

    import joblib  # sklearn.externals.joblib has been removed from recent scikit-learn releases
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import MinMaxScaler

    pipeline = Pipeline([
        ('normalization', MinMaxScaler()),
        ('classifier', RandomForestClassifier())
    ])
    # Fit before saving; X_train and y_train stand for your training data.
    pipeline.fit(X_train, y_train)
    joblib.dump(pipeline, 'transform_predict.joblib')

You can then load that one pipeline and call predict, which applies the fitted transformations to the input data and returns predictions for it:

    import joblib
    pipeline = joblib.load('transform_predict.joblib')
    predictions = pipeline.predict(new_data)
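To connect this to the Flask side of the question, here is a minimal sketch of how a saved pipeline might be served. The toy training data, the `/predict` route, and the `{"features": [...]}` payload shape are all assumptions made for illustration; they are not part of the original answer.

```python
import joblib
import numpy as np
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# --- Training side (normally a separate script) ---
# Toy data, purely illustrative.
X_train = np.array([[0.0, 10.0], [1.0, 20.0], [2.0, 30.0], [3.0, 40.0]])
y_train = np.array([0, 0, 1, 1])

pipeline = Pipeline([
    ("normalization", MinMaxScaler()),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipeline.fit(X_train, y_train)
joblib.dump(pipeline, "transform_predict.joblib")

# --- Serving side ---
app = Flask(__name__)
# Load the fitted pipeline once at startup, not on every request.
model = joblib.load("transform_predict.joblib")


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical payload: {"features": [[f1, f2], ...]}
    payload = request.get_json()
    features = np.asarray(payload["features"], dtype=float)
    # The pipeline scales with the training min/max, then classifies.
    predictions = model.predict(features)
    return jsonify(predictions=predictions.tolist())
```

Because the scaler lives inside the pipeline, the request handler never touches the scaling step directly, which removes the risk of forgetting it or refitting it by mistake.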
Thanks, this is what I was looking for.
– Blenzus
Mar 26 at 14:43
Does this apply to dummy variables?
– Blenzus
Mar 26 at 14:55
If you're using scikit-learn's OneHotEncoder, then yes. Any scikit-learn 'transformer' can be used in a pipeline, that is, anything that implements TransformerMixin and BaseEstimator: github.com/scikit-learn/scikit-learn/blob/7b136e9/sklearn/… This also means you can create your own custom 'transformers' and add them to a pipeline by implementing these in the same way.
– Dan Carter
Mar 26 at 15:34
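As a rough illustration of the last comment, the sketch below combines OneHotEncoder with a small custom transformer inside one pipeline. The `Log1pTransformer` class, the column names, and the toy data are hypothetical; `ColumnTransformer` is one way to route columns to different transformers and is not mentioned in the thread.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder


class Log1pTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: applies log(1 + x) element-wise."""

    def fit(self, X, y=None):
        # Nothing to learn; fit only has to return self.
        return self

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))


# Toy training frame; column names are made up.
train = pd.DataFrame({
    "income": [1000.0, 2500.0, 4000.0, 8000.0],
    "city": ["paris", "lyon", "paris", "nice"],
})
labels = [0, 0, 1, 1]

preprocess = ColumnTransformer(
    [
        ("numeric", Pipeline([
            ("log", Log1pTransformer()),
            ("scale", MinMaxScaler()),
        ]), ["income"]),
        # handle_unknown="ignore" encodes unseen categories as all zeros.
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("classifier", RandomForestClassifier(n_estimators=10, random_state=0)),
])
pipeline.fit(train, labels)

# A new row with a city that never appeared during training.
new_data = pd.DataFrame({"income": [3000.0], "city": ["marseille"]})
encoded = pipeline.named_steps["preprocess"].transform(new_data)
prediction = pipeline.predict(new_data)
```

One caveat when pickling such a pipeline: pickle stores custom classes by reference, so the module that defines `Log1pTransformer` must also be importable in the Flask app that loads the file.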
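You need to save the MinMaxScaler along with the model. In the Flask app, you can:
- Load the scaler from file
- Use this scaler instance to scale the input values

    # While training (scaler is the MinMaxScaler already fitted on the training data)
    import joblib
    scaler_filename = "saved_scaler"
    joblib.dump(scaler, scaler_filename)

    # In the Flask app (input_values stands for the features from the request)
    import joblib
    scaler_filename = "saved_scaler"
    scaler = joblib.load(scaler_filename)
    # Call transform, not fit_transform, so the training min/max are reused
    scaled_input = scaler.transform(input_values)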
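A small sketch of why the saved instance matters: a scaler reloaded from disk keeps the minimum and maximum learned from the training data, whereas a scaler fitted on the incoming data point alone has nothing to compare it against. The one-column toy data below is an assumption chosen to keep the arithmetic obvious.

```python
import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training column spanning 0..100.
X_train = np.array([[0.0], [50.0], [100.0]])

scaler = MinMaxScaler()
scaler.fit(X_train)
joblib.dump(scaler, "saved_scaler")

# Later, in the web service:
loaded_scaler = joblib.load("saved_scaler")
new_point = np.array([[25.0]])

# Reusing the fitted scaler: 25 on the training range 0..100 maps to 0.25.
reused = loaded_scaler.transform(new_point)

# Fitting a fresh scaler on the single point: its range is zero,
# so the point maps to 0.0 regardless of its value.
refit = MinMaxScaler().fit_transform(new_point)
```

The two results differ, which is why the web service should only ever call `transform` on the loaded scaler.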
Do I need to do this for every normalization library I use? I'll be loading many files into memory. Isn't there a way to load something that contains every step of the data transformations?
– Blenzus
Mar 26 at 14:17
You can save and load all scalers at the same time. Example: stackoverflow.com/questions/33497314/…
– Shamit Verma
Mar 26 at 14:33
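The linked example is not reproduced here. One way to save several fitted scalers in a single file, sketched under the assumption that they are simply collected in a dictionary, looks like this (the key names, file name, and toy data are hypothetical):

```python
import joblib
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy training data with two columns, each scaled differently.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Fit each scaler on its own column and keep them together in one dict.
preprocessors = {
    "minmax": MinMaxScaler().fit(X_train[:, [0]]),
    "standard": StandardScaler().fit(X_train[:, [1]]),
}
joblib.dump(preprocessors, "preprocessors.joblib")

# In the web service: one load call restores every fitted scaler.
loaded = joblib.load("preprocessors.joblib")
new_point = np.array([[2.0, 20.0]])
scaled_first = loaded["minmax"].transform(new_point[:, [0]])
scaled_second = loaded["standard"].transform(new_point[:, [1]])
```

This keeps the file count down, but the order and column assignment of the transformations still has to be repeated by hand in the service, which a Pipeline handles automatically.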
edited Mar 26 at 14:45
answered Mar 26 at 14:40
Dan Carter
answered Mar 26 at 14:08
Shamit Verma