I am new to Python. I read this Kaggle kernel.
In that kernel, the author reads the training data with a chunksize of 150_000:
Python
train = pd.read_csv('../input/train.csv', iterator=True, chunksize=150_000, dtype={'acoustic_data': np.int16, 'time_to_failure': np.float64})


I visualized X_train (statistical features) against y_train (the given time_to_failure) in Python, and it gave me good visualizations:


Python
train = pd.read_csv('../input/train.csv', iterator=True, chunksize=150_000, dtype={'acoustic_data': np.int16, 'time_to_failure': np.float64})

X_train = pd.DataFrame()
y_train = pd.Series()
# One row of features per 150_000-row chunk; the target is the last time_to_failure in the chunk
for df in train:
    ch = gen_features(df['acoustic_data'])
    X_train = X_train.append(ch, ignore_index=True)
    y_train = y_train.append(pd.Series(df['time_to_failure'].values[-1]))

# Visualization function
plotstatfeature(X_train, y_train.to_numpy(dtype='float32'))


For the test data, I plotted the same visualizations of X_test (statistical features) against y_hat (the predicted time_to_failure) using the same function:

Python
submission = pd.read_csv('../input/sample_submission.csv', index_col='seg_id')
X_test = pd.DataFrame()

# prepare test data
for seg_id in submission.index:
    seg = pd.read_csv('../input/test/' + seg_id + '.csv')
    ch = gen_features(seg['acoustic_data'])
    X_test = X_test.append(ch, ignore_index=True)

X_test = scaler.transform(X_test)    
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)
y_hat = model.predict(X_test)
submission['time_to_failure'] = y_hat
submission.to_csv('submission.csv')

# Visualization function
plotstatfeature(X_test,y_hat.to_numpy(dtype ='float32'))


Question 1:

Is it meaningful to visualize X_test (statistical features) against y_hat (the predicted time_to_failure)?

Question 2 (main question):
The visualizations of the test data are not as good as those of the train data. I think this is because the train data is read in chunks of 150000 rows, which gives a clear visualization, while each test file is read in full, which gives a denser, unclear visualization. How can I read the test data in the same chunk size of 150000, so that its visualization is uniform with the train data visualization?

To read the test data in the same chunk size of 150000, I tried adding iterator and chunksize to the read_csv calls.

First case:
submission = pd.read_csv('../input/sample_submission.csv', index_col='seg_id' , iterator=True, chunksize=150_000)


But it gave me this error:
Quote:
Traceback (most recent call last):

File "", line 1, in runfile('D:/code.py', wdir='D:/')

File "C:\Users\abc\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile execfile(filename, namespace)

File "C:\Users\abc\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

File "D:/code.py", line 299, in main()

File "D:/code.py", line 239, in main test(X_train, y_train)

File "D:/code.py", line 168, in test for seg_id in submission.index:

AttributeError: 'TextFileReader' object has no attribute 'index'
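A note on this error: when `chunksize` (or `iterator=True`) is passed, `pd.read_csv` returns a `TextFileReader`, an iterator that yields one DataFrame per chunk, rather than a DataFrame, so it has no `.index`. A minimal sketch of the difference follows. The in-memory CSV and its `seg_a`, `seg_b`, `seg_c` values are made up for illustration and only stand in for `sample_submission.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for sample_submission.csv (values are made up).
csv_text = "seg_id,time_to_failure\nseg_a,0\nseg_b,0\nseg_c,0\n"

# Without chunksize: a DataFrame, so .index is available.
submission = pd.read_csv(io.StringIO(csv_text), index_col="seg_id")
seg_ids = list(submission.index)

# With chunksize: a TextFileReader that yields DataFrames when iterated.
reader = pd.read_csv(io.StringIO(csv_text), index_col="seg_id", chunksize=2)
reader_type = type(reader).__name__
chunk_ids = [list(chunk.index) for chunk in reader]

print(seg_ids)
print(reader_type)
print(chunk_ids)
```

Since the code above reads one test file per `seg_id` in the submission file, chunking the submission file itself would only split the list of segment names, not the acoustic data, so reading it whole is probably what is wanted here.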



Second case:

seg = pd.read_csv('test/' + seg_id + '.csv'  , iterator=True, chunksize=150000)



It gave me this error:

Quote:
Traceback (most recent call last):

File "", line 1, in runfile('D:/code.py', wdir='D:/')

File "C:\Users\abc\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile execfile(filename, namespace)

File "C:\Users\abc\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

File "D:/code.py", line 299, in main()

File "D:/code.py", line 239, in main test(X_train, y_train)

File "D:/code.py", line 170, in test ch = gen_features(seg['acoustic_data'])

TypeError: 'TextFileReader' object is not subscriptable
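The cause is the same as in the first case: with `chunksize`, `seg` is a `TextFileReader`, so `seg['acoustic_data']` fails, but each chunk it yields is an ordinary DataFrame that can be subscripted. A hedged sketch of looping over the chunks of one segment is below. Here `gen_features` is a hypothetical stand-in (the kernel's real function is not shown in this post), and the data is synthetic, with 10 rows and a chunk size of 4 instead of 150000.

```python
import io

import numpy as np
import pandas as pd


def gen_features(x):
    # Hypothetical stand-in for the kernel's gen_features: a few summary stats.
    return pd.Series({"mean": x.mean(), "std": x.std(), "max": x.max()})


# Synthetic segment: 10 rows instead of 150,000, to keep the example small.
values = np.arange(10, dtype=np.int16)
csv_text = pd.DataFrame({"acoustic_data": values}).to_csv(index=False)

rows = []
# Iterate the reader; each chunk is a DataFrame with at most 4 rows here.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4,
                         dtype={"acoustic_data": np.int16}):
    rows.append(gen_features(chunk["acoustic_data"]))

# One feature row per chunk.
X_seg = pd.DataFrame(rows).reset_index(drop=True)
print(X_seg)
```

If a test file has no more rows than the chunk size, the loop runs once and produces the same features as reading the file whole, so chunking would not change the plot in that case. Checking `len(seg)` on a few test files before changing the code may be worthwhile.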


How can I introduce the chunksize in the test data?

Comments
Richard MacCutchan 6-Jun-20 4:07am    
The error messages are quite clear: you cannot use indexes or subscripts on the object. Check the documentation for the TextFileReader class.
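As a small illustration of what the `TextFileReader` object does offer: besides being iterable, it has a `get_chunk(size)` method that returns the next `size` rows as a DataFrame. This is only a sketch on made-up in-memory data, not the code from the question.

```python
import io

import pandas as pd

# Made-up single-column CSV with five rows: 0, 1, 2, 3, 4.
csv_text = "acoustic_data\n" + "\n".join(str(i) for i in range(5)) + "\n"

# iterator=True returns a TextFileReader instead of a DataFrame.
reader = pd.read_csv(io.StringIO(csv_text), iterator=True)

first = reader.get_chunk(2)  # DataFrame with the first two rows
rest = reader.get_chunk(3)   # DataFrame with the next three rows
reader.close()

print(first["acoustic_data"].tolist())
print(rest["acoustic_data"].tolist())
```

Each value returned by `get_chunk` is a DataFrame, so column subscripts such as `first["acoustic_data"]` work on it even though they do not work on the reader itself.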

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


