I am new to Python. I am trying to predict the "time_to_failure" for the given "acoustic_data" in the test CSV files using the CatBoost algorithm.
def catbostregtest(X_train, y_train):
    submission = pd.read_csv('sample_submission.csv', index_col='seg_id')
    X_test = pd.DataFrame()
    # Build one row of features per test segment
    for seg_id in submission.index:
        seg = pd.read_csv('test/' + seg_id + '.csv')
        ch = gen_features(seg['acoustic_data'])
        X_test = X_test.append(ch, ignore_index=True)
    model = CatBoostRegressor(iterations=10000, loss_function='MAE', boosting_type='Ordered')
    model.fit(X_train, y_train)
    y_hat = model.predict(X_test)  # the error is raised here
    submission['time_to_failure'] = y_hat
    submission.to_csv('submissionCAT.csv')
    print(model.best_score_)
The function "catbostregtest" fails with the following traceback:
Traceback (most recent call last):

  File "E:\dir\code.py", line 290, in <module>
    main()

  File "E:\dir\code.py", line 230, in main
    catbostregtest(X_train, y_train)

  File "E:\dir\code.py", line 175, in catbostregtest
    y_hat = model.predict(X_test)

  File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 4365, in predict
    return self._predict(data, "RawFormulaVal", ntree_start, ntree_end, thread_count, verbose, 'predict')

  File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 1854, in _predict
    predictions = self._base_predict(data, prediction_type, ntree_start, ntree_end, thread_count, verbose)

  File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 1271, in _base_predict
    return self._object._base_predict(pool, prediction_type, ntree_start, ntree_end, thread_count, verbose)

  File "_catboost.pyx", line 4015, in _catboost._CatBoost._base_predict

  File "_catboost.pyx", line 4020, in _catboost._CatBoost._base_predict

CatBoostError: c:/goagent/pipelines/buildmaster/catboost.git/catboost/libs/data/model_dataset_compatibility.cpp:236: Feature 0 from pool must be mean.
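As far as I can tell from the message, the model expects the feature at position 0 to be named "mean", while the data passed to predict has a different name at that position. The snippet below is only my own sketch of that kind of name comparison, not CatBoost's actual code. The helper `first_name_mismatch` and both example name lists are hypothetical and are not taken from my project.

```python
from itertools import zip_longest


def first_name_mismatch(train_names, test_names):
    """Return (position, expected, actual) for the first differing
    feature name, or None if both sequences match."""
    pairs = zip_longest(train_names, test_names)  # pads the shorter list with None
    for position, (expected, actual) in enumerate(pairs):
        if expected is None or actual is None or str(expected) != str(actual):
            return position, expected, actual
    return None


# Hypothetical names: named columns at fit time versus the default
# integer labels that pd.Series(list) produces.
train_names = ["mean", "std", "min", "max"]
test_names = [0, 1, 2, 3]

mismatch = first_name_mismatch(train_names, test_names)
if mismatch is not None:
    position, expected, actual = mismatch
    print(f"Feature {position}: expected {expected!r}, got {actual!r}")
```

With a pandas DataFrame, the two lists could be taken from `list(X_train.columns)` and `list(X_test.columns)`.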
This is the gen_features function:
def gen_features(X):
    strain = []
    strain.append(X.mean())
    strain.append(X.std())
    strain.append(X.min())
    strain.append(X.max())
    strain.append(X.kurtosis())
    strain.append(X.skew())
    strain.append(np.quantile(X, 0.01))
    strain.append(np.quantile(X, 0.05))
    strain.append(np.quantile(X, 0.95))
    strain.append(np.quantile(X, 0.99))
    strain.append(np.abs(X).max())
    strain.append(np.abs(X).mean())
    strain.append(np.abs(X).std())
    return pd.Series(strain)
It is called from the main function:
def main():
    # Read the training data in chunks of 150,000 rows
    train1 = pd.read_csv('train.csv', iterator=True, chunksize=150_000, dtype={'acoustic_data': np.int16, 'time_to_failure': np.float64})
    X_train = pd.DataFrame()
    y_train = pd.Series(dtype=np.float64)
    for df in train1:
        ch = gen_features(df['acoustic_data'])
        X_train = X_train.append(ch, ignore_index=True)
        # Target is the time_to_failure at the last row of the chunk
        y_train = y_train.append(pd.Series(df['time_to_failure'].values[-1]))
    catbostregtest(X_train, y_train)
Here is the structure of the train.csv file:
train (image on ImgBB)
Here is the structure of the sample_submission.csv file:
submission (image on ImgBB)
Here is the structure of one of the test CSV files:
test (image on ImgBB)
How can I fix the error that occurs when making predictions with the CatBoost model? Please help. You can download the project and run it in the Spyder IDE from this link:
Link
What I have tried:
I have tried all the procedures described at these links:
Usage examples - CatBoost. Documentation
python - Catboost Regression. Function Extrapolation - Stack Overflow