python - OneHotEncoded features causing error when input to Classifier


I'm trying to prepare data for input to a decision tree and a multinomial naïve Bayes classifier.

This is what the data looks like (a pandas DataFrame):

    label  feat1  feat2  feat3  feat4
    0      1      3      2      1
    1      0      1      1      2
    2      2      2      1      1
    3      3      3      2      3

I have split the data into datalabel and datafeatures, and prepared datalabel using datalabel.ravel().

I need to discretize the features so that the classifiers treat them as categorical rather than numerical.

I'm trying to do this using OneHotEncoder:

    from sklearn.preprocessing import OneHotEncoder

    enc = OneHotEncoder()
    enc.fit(datafeatures)
    chk = enc.transform(datafeatures)

    from sklearn.naive_bayes import MultinomialNB
    mnb = MultinomialNB()

    from sklearn import metrics
    from sklearn.cross_validation import cross_val_score
    scores = cross_val_score(mnb, y, chk, cv=10, scoring='accuracy')

I get this error: bad input shape (64, 16)

These are the shapes of the label and the input:

    datalabel.shape = 72
    chk.shape = 72,16

Why won't the classifier accept the one-hot-encoded features?

Edit: here is the entire stack trace:

    /root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
      DeprecationWarning)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1433, in cross_val_score
        for train, test in cv)
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 800, in __call__
        while self.dispatch_one_batch(iterator):
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 658, in dispatch_one_batch
        self._dispatch(tasks)
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 566, in _dispatch
        job = ImmediateComputeBatch(batch)
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 180, in __init__
        self.results = batch()
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 72, in __call__
        return [func(*args, **kwargs) for func, args, kwargs in self.items]
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1531, in _fit_and_score
        estimator.fit(X_train, y_train, **fit_params)
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/naive_bayes.py", line 527, in fit
        X, y = check_X_y(X, y, 'csr')
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 515, in check_X_y
        y = column_or_1d(y, warn=True)
      File "/root/anaconda2/lib/python2.7/site-packages/sklearn/utils/validation.py", line 551, in column_or_1d
        raise ValueError("bad input shape {0}".format(shape))
    ValueError: bad input shape (64, 16)

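First, you have to swap chk and y: cross_val_score expects the estimator, then the features X, then the labels y (see the cross_val_score documentation). As written, the encoded feature matrix is passed as the labels, which is why the error reports a 2D shape (64, 16) for y. Next, you didn't show how y is defined; I hope it's a 1d array. And last, instead of applying the encoder separately, it's better to combine the transformer and the classifier into one estimator using Pipeline. Like this: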

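    from sklearn import metrics
    from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in scikit-learn 0.18+
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    # Chain the encoder and the classifier so the encoder is refit on each training fold
    clf = Pipeline([
        ('transformer', OneHotEncoder()),
        ('estimator', MultinomialNB()),
    ])

    # Argument order: estimator, X (features), y (1d labels)
    scores = cross_val_score(clf, datafeatures.values, y, cv=10, scoring='accuracy')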
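To try this end to end, here is a minimal sketch on made-up data. It assumes a newer scikit-learn (0.20 or later), where cross_val_score is imported from sklearn.model_selection. The random features and the label array y are hypothetical stand-ins for the question's 72-row DataFrame, and handle_unknown="ignore" is my own addition so that a category missing from a training fold does not raise during scoring.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in data: 72 rows, four integer-coded features in 0..3
rng = np.random.RandomState(0)
datafeatures = pd.DataFrame(
    rng.randint(0, 4, size=(72, 4)),
    columns=["feat1", "feat2", "feat3", "feat4"],
)
y = np.arange(72) % 4  # 1d label array of shape (72,), 18 samples per class

clf = Pipeline([
    ("transformer", OneHotEncoder(handle_unknown="ignore")),
    ("estimator", MultinomialNB()),
])

# Correct order: estimator, X (2d features), y (1d labels)
scores = cross_val_score(clf, datafeatures.values, y, cv=10, scoring="accuracy")
print(scores.shape)

# Passing a 2d matrix where the labels belong fails with a ValueError,
# which is the kind of failure the swapped arguments in the question trigger
try:
    MultinomialNB().fit(y.reshape(-1, 1), datafeatures.values)
    swapped_failed = False
except ValueError as exc:
    swapped_failed = True
    print("2d labels rejected:", exc)
```

The scores here are meaningless because the features are random; the point is only that the call runs with the arguments in the right order and that a 2d array in the label position is rejected.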
