ValueError: Input contains NaN, infinity or a value too large for dtype('float64') in Python

Dung Do Tien Sep 04 2021 236

Hi guys. In Python, I want to train a model on data stored in a pandas DataFrame.

When I standardize the data with scikit-learn's StandardScaler, the following error occurs.

from sklearn.preprocessing import StandardScaler

# Training data (pandas.DataFrame type)
X = training_data()

# Standardization
sc = StandardScaler()
sc.fit(X)

But fit() throws the following exception:

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

How can I solve it?

Thanks for any suggestions.

1 answer found.

    Nguyen Truong Giang Sep 04 2021

    To avoid this error, remove NaN and infinite values from the input data before fitting.
    For example, the code below removes every column of X that contains at least one NaN.

    import numpy as np

    # Remove columns containing NaN from X (drop returns a new DataFrame, so assign the result)
    X = X.drop(X.columns[np.isnan(X).any()], axis=1)

    Description of each function

    • np.isnan(X): Returns a Boolean DataFrame that is True for NaN elements and False for all other elements
    • np.isnan(X).any(): Returns a Boolean Series, indexed by column, that is True for columns containing at least one NaN
    • X.columns[np.isnan(X).any()]: Returns the names of the columns that contain NaN
    • X.drop('col', axis=1): Returns a copy of X without the column named col; X itself is not modified
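
The error message also mentions infinity, which np.isnan does not detect. A minimal sketch of one way to cover both cases is shown below. The small DataFrame is made up for illustration and stands in for the training data from the question; the names X_clean, bad_columns, X_cols and X_rows are likewise only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the training data in the question
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [1.0, np.nan, 3.0, 4.0],   # contains NaN
    "c": [1.0, 2.0, np.inf, 4.0],   # contains infinity
})

# Turn +/-infinity into NaN so that a single check covers both problems
X_clean = X.replace([np.inf, -np.inf], np.nan)

# Names of the columns that contain at least one NaN or infinite value
bad_columns = X_clean.columns[X_clean.isna().any()].tolist()

# Option 1: drop every column that contains a missing value
X_cols = X_clean.dropna(axis=1)

# Option 2: drop every row that contains a missing value
X_rows = X_clean.dropna(axis=0)

print(bad_columns)
print(X_cols.shape, X_rows.shape)
```

Either X_cols or X_rows can then be passed to sc.fit(). Whether dropping columns or dropping rows is the better choice depends on how much data each option discards.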