Posts

Showing posts from May, 2022

spark-Pyspark

Spark: in-memory computation. Spark reads data from the hard disk into RAM, keeps the intermediate result of each step in RAM, and writes to the hard disk only after the job completes.

Hadoop MapReduce: it performs an operation and writes the result to the hard disk at every step, so each step of the job reads from and writes to disk. As a result, latency is high.

Lazy execution: calling a function to read data does not actually read it, because no operation has been requested yet. Spark does not read the data until we perform some operation or computation that needs a result. By contrast, pd.read_csv in pandas reads the data immediately and stores it in RAM.

Parallel processing: the data is split up and distributed across the nodes of a cluster, so the pieces can be processed at the same time.

Spark supports both batch processing and real-time processing. An example of real-time processing is deciding whether a credit card transaction is genuine or fake.
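To make the lazy execution idea concrete, here is a small plain-Python analogy. It is not Spark and does not use the Spark API: the `LazyDataset` class, its method names, and the `read_amounts` function are all hypothetical, made up only for this sketch. The point it tries to show is that the steps are only recorded at first, and the data is read when a result is finally requested.

```python
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = source  # callable that produces the raw rows
        self._steps = steps    # recorded steps, not yet applied

    def map(self, fn):
        # Only record the step and return a new dataset; nothing runs yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Same idea: record the step, do no work.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # A result is requested: read the source and run every recorded step.
        rows = iter(self._source())
        for kind, fn in self._steps:
            rows = map(fn, rows) if kind == "map" else filter(fn, rows)
        return list(rows)


reads = []  # grows by one entry each time the source is actually read


def read_amounts():
    # Hypothetical data source standing in for a file on disk.
    reads.append("read")
    return [120.0, 15.5, 980.0, 42.0]


ds = LazyDataset(read_amounts).filter(lambda x: x > 40).map(lambda x: x * 2)
print(len(reads))    # 0, nothing has been read so far
print(ds.collect())  # the read and both steps happen here
```

In this sketch, building `ds` costs nothing; the source is touched only inside `collect()`. pandas behaves the other way around, since `pd.read_csv` loads the file as soon as it is called.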

Logistic Regression

Logistic Regression: logistic regression is a supervised learning algorithm used for classification problems, mainly binary classification. The target variable is categorical, and the algorithm predicts which group the current object belongs to. It does this by estimating a probability from a weighted combination of the independent variables. To produce this probability it uses the logistic function, which takes any real value and maps it to a value between 0 and 1. These values are then converted into the binary values 0 and 1 using a threshold. The default threshold value is 0.5.
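The prediction step described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the weights, bias, and feature values are made-up numbers (not a trained model), and the function names `sigmoid` and `predict` are chosen here for the example.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real value to a value between 0 and 1."""
    # Two branches so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one example."""
    # Weighted combination of the independent variables.
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = sigmoid(z)
    # Convert the probability to 0 or 1 using the threshold.
    return p, int(p >= threshold)


# Made-up weights and inputs, for illustration only.
p, label = predict(features=[2.0, 1.0], weights=[0.8, -0.4], bias=-0.2)
print(round(p, 3), label)  # 0.731 1
```

Raising the threshold (for example to 0.8) would turn the same probability into class 0, which is why the threshold is often tuned when one kind of mistake is more costly than the other.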