Dataset & Training Guide
Reference for preparing data, formatting CSVs, and training the classifier
The pipeline is designed around KOI (Kepler Object of Interest) naming conventions.
If your CSV uses different column names, rename them before uploading:
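import pandas as pd

# Load the original CSV.
df = pd.read_csv('my_data.csv')

# Map your column names (left) to the KOI names the pipeline expects (right).
rename_map = {
    'period': 'koi_period',
    'depth': 'koi_depth',
    'duration': 'koi_duration',
    'prad': 'koi_prad',
}
df = df.rename(columns=rename_map)

# Save without the index column.
df.to_csv('my_data_renamed.csv', index=False)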
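After renaming, it can help to confirm that every KOI column is actually present before uploading. The following is a minimal sketch, not part of the pipeline itself: the helper missing_koi_columns and the EXPECTED_COLUMNS list are hypothetical, and the list assumes that the four rename targets above are the columns the pipeline needs.

```python
import pandas as pd

# Assumption: these are the columns the pipeline needs. The guide only names
# them as rename targets, so adjust the list to match your setup.
EXPECTED_COLUMNS = ["koi_period", "koi_depth", "koi_duration", "koi_prad"]


def missing_koi_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [name for name in expected if name not in df.columns]


# Small in-memory frame standing in for my_data.csv; it has no 'prad' column.
raw = pd.DataFrame({"period": [3.5], "depth": [120.0], "duration": [2.1]})

# rename() silently ignores keys that are not present, so a missing source
# column does not raise an error here. That is why the explicit check is useful.
renamed = raw.rename(columns={
    "period": "koi_period",
    "depth": "koi_depth",
    "duration": "koi_duration",
    "prad": "koi_prad",
})

missing = missing_koi_columns(renamed)
print(missing)  # ['koi_prad']
```

An empty list means all expected columns are in place; anything else names the columns that still need to be added or renamed.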
Check that koi_disposition exists with consistent labels (e.g. CONFIRMED, CANDIDATE, FALSE POSITIVE).
Fill missing categorical values with 'UNKNOWN'; encode with LabelEncoder.
Engineered features: planets_to_star_radius_ratio, log_period, depth_to_duration.
Suggested hyperparameters: n_estimators=200, max_depth=6–10, learning_rate=0.05–0.1.
Use empty cells (NaN or null) for missing values.
Include koi_disposition for labelled uploads; omit or leave blank for test uploads.
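The categorical encoding and engineered features mentioned above could be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: the guide gives only the feature names, so the three formulas are guesses, koi_srad (stellar radius) is an assumed input column, and koi_tce_delivname is just an illustrative categorical column. The helpers encode_categorical and add_engineered_features are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def encode_categorical(series):
    """Fill missing entries with 'UNKNOWN', then integer-encode with LabelEncoder."""
    filled = series.fillna("UNKNOWN").astype(str)
    encoder = LabelEncoder()
    codes = encoder.fit_transform(filled)
    return pd.Series(codes, index=series.index), encoder


def add_engineered_features(df):
    """Return a copy of df with three derived columns (formulas are assumptions)."""
    out = df.copy()
    # Assumed: planet radius divided by stellar radius (koi_srad is an assumed column).
    out["planets_to_star_radius_ratio"] = out["koi_prad"] / out["koi_srad"]
    # Assumed: base-10 logarithm of the orbital period.
    out["log_period"] = np.log10(out["koi_period"])
    # Assumed: transit depth divided by transit duration.
    out["depth_to_duration"] = out["koi_depth"] / out["koi_duration"]
    return out


# Two made-up rows for illustration only; these are not real KOI measurements.
sample = pd.DataFrame({
    "koi_period": [10.0, 100.0],
    "koi_depth": [200.0, 50.0],
    "koi_duration": [4.0, 2.0],
    "koi_prad": [2.0, 1.0],
    "koi_srad": [1.0, 0.5],
    "koi_tce_delivname": ["q1_q17_dr25_tce", None],
})

features = add_engineered_features(sample)
codes, encoder = encode_categorical(sample["koi_tce_delivname"])
print(features[["planets_to_star_radius_ratio", "log_period", "depth_to_duration"]])
print(list(encoder.classes_), list(codes))
```

Keep the fitted encoder and reuse it on test uploads so the same category maps to the same integer. If the planet and stellar radii are in different units, the ratio is only a relative feature unless you convert one of them first.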