I am new to Python. I read this
Kaggle kernel.
In that kernel, the author reads the training data in chunks of 150,000 rows:
train = pd.read_csv('../input/train.csv', iterator=True, chunksize=150_000, dtype={'acoustic_data': np.int16, 'time_to_failure': np.float64})
I visualized X_train (statistical features) against y_train (the given time_to_failure) in Python, and it gave me good visualizations:
import numpy as np
import pandas as pd

train = pd.read_csv('../input/train.csv', iterator=True, chunksize=150_000, dtype={'acoustic_data': np.int16, 'time_to_failure': np.float64})
X_train = pd.DataFrame()
y_train = pd.Series()
for df in train:
    ch = gen_features(df['acoustic_data'])
    X_train = X_train.append(ch, ignore_index=True)
    y_train = y_train.append(pd.Series(df['time_to_failure'].values[-1]))
plotstatfeature(X_train, y_train.to_numpy(dtype='float32'))
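gen_features is not defined in this post; in the kernel it computes summary statistics over each 150,000-row chunk. A hypothetical minimal version (the exact feature set in the kernel may differ):

```python
import pandas as pd

def gen_features(x):
    """Summary statistics for one chunk of acoustic data.

    Hypothetical feature set for illustration; the kernel's
    actual features may differ.
    """
    x = pd.Series(x)
    # One-row DataFrame so chunks can be appended into X_train
    return pd.DataFrame({'mean': [x.mean()],
                         'std':  [x.std()],
                         'min':  [x.min()],
                         'max':  [x.max()]})
```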
For the test data, I plotted the same visualization between X_test (statistical features) and y_hat (predicted time_to_failure) using the same function:
submission = pd.read_csv('../input/sample_submission.csv', index_col='seg_id')
X_test = pd.DataFrame()
for seg_id in submission.index:
    seg = pd.read_csv('../input/test/' + seg_id + '.csv')
    ch = gen_features(seg['acoustic_data'])
    X_test = X_test.append(ch, ignore_index=True)
X_test = scaler.transform(X_test)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
y_hat = model.predict(X_test)
submission['time_to_failure'] = y_hat
submission.to_csv('submission.csv')
plotstatfeature(X_test, y_hat.astype('float32'))  # predict() returns a NumPy array, so astype, not to_numpy
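The reshape step above adds a trailing axis so the 2-D feature matrix matches the 3-D (samples, features, 1) input shape that sequence models such as Conv1D or LSTM layers expect. A tiny sketch of the shape change (the sizes here are made up for illustration):

```python
import numpy as np

# 4 segments, 12 statistical features each (illustrative sizes)
X = np.random.rand(4, 12)

# Add a trailing axis: (samples, features) -> (samples, features, 1)
X3 = X.reshape(X.shape[0], X.shape[1], 1)
print(X.shape, X3.shape)
```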
Question 1:
Is it meaningful to visualize X_test (statistical features) against y_hat (predicted time_to_failure)?
Question 2 (main question):
The visualizations of the test data are not as clear as those of the train data, because the train data is read in chunks of 150,000 rows, which gives a clear visualization, while the test data is processed as a whole, which gives a denser, unclear visualization.
How can I convert the test data into the same chunk size of 150,000, for a uniform visualization just like the train data visualization?
To convert the test data into the same chunk size of 150,000, I tried modifying the code by introducing iterator and chunksize.
First case:
submission = pd.read_csv('../input/sample_submission.csv', index_col='seg_id' , iterator=True, chunksize=150_000)
But it gave me this error:
Quote:
Traceback (most recent call last):
File "", line 1, in runfile('D:/code.py', wdir='D:/')
File "C:\Users\abc\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile execfile(filename, namespace)
File "C:\Users\abc\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/code.py", line 299, in main()
File "D:/code.py", line 239, in main test(X_train, y_train)
File "D:/code.py", line 168, in test for seg_id in submission.index:
AttributeError: 'TextFileReader' object has no attribute 'index'
Second case:
seg = pd.read_csv('test/' + seg_id + '.csv' , iterator=True, chunksize=150000)
It gave me this error:
Quote:
Traceback (most recent call last):
File "", line 1, in runfile('D:/code.py', wdir='D:/')
File "C:\Users\abc\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile execfile(filename, namespace)
File "C:\Users\abc\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile exec(compile(f.read(), filename, 'exec'), namespace)
File "D:/code.py", line 299, in main()
File "D:/code.py", line 239, in main test(X_train, y_train)
File "D:/code.py", line 170, in test ch = gen_features(seg['acoustic_data'])
TypeError: 'TextFileReader' object is not subscriptable
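For context on both errors: when pd.read_csv is given iterator=True or chunksize, it returns a TextFileReader rather than a DataFrame, so it has no .index attribute and cannot be indexed with ['acoustic_data']; it has to be iterated (or advanced with get_chunk()) to obtain DataFrames. A minimal sketch using a small in-memory CSV in place of a segment file:

```python
import io
import pandas as pd

# A small in-memory CSV standing in for one test segment file
csv = io.StringIO("acoustic_data\n" + "\n".join(str(i) for i in range(10)))

reader = pd.read_csv(csv, chunksize=4)   # a TextFileReader, not a DataFrame
for chunk in reader:                     # iterating yields DataFrames
    print(chunk['acoustic_data'].shape)  # column access works on each chunk
```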
How can I introduce the chunksize into the test data?
What I have tried:
The two cases and their tracebacks shown above.