How to split the data?
Scikit-learn (imported as sklearn) is a widely used and robust machine learning library for Python. Its model_selection module provides the train_test_split() function, which splits a dataset into a training set and a test set.
# import modules
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
# read the dataset
df = pd.read_csv('Real estate.csv')
# select the features (every column except the last) and the target (the last column)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
# split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.05, random_state=0)
In the above example, we import pandas and the parts of scikit-learn we need, then load the CSV file with the read_csv() method, so the variable df holds the data frame. “house price” is the column we want to predict. It is the last column, so we take it as y with df.iloc[:, -1] and use the remaining columns as X. Setting test_size=0.05 means 5% of the data goes to the test set and the other 95% to the training set. Setting random_state to a fixed integer makes the shuffle reproducible, so we get the same split every time the code runs.
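To check what the split actually produces, here is a minimal sketch that runs without the CSV file. The small data frame and its column names (area, rooms, age, house_price) are made up for illustration and are not the real dataset. Only the train_test_split() call mirrors the example above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real estate data: 100 rows,
# three feature columns, with the target as the last column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "area": rng.uniform(30, 200, size=100),
    "rooms": rng.integers(1, 6, size=100),
    "age": rng.uniform(0, 40, size=100),
})
df["house_price"] = 1000 * df["area"] + 5000 * df["rooms"] - 200 * df["age"]

# Features are every column except the last; the target is the last column.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# Hold out 5% of the rows for testing, with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.05, random_state=0)

print(X_train.shape, X_test.shape)  # (95, 3) (5, 3)

# Repeating the call with the same random_state selects the same rows.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.05, random_state=0)
same_split = X_test.index.equals(X_test2.index)
print(same_split)  # True
```

With 100 rows, a test_size of 0.05 leaves only 5 rows for testing. On a small dataset a larger share, such as 0.2, usually gives a more stable estimate of model performance.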