spark-Pyspark

spark :

In-memory computation :
Spark : the data is read from the hard disk and stored in RAM. The intermediate result of each step is kept in RAM, and only after the job completes is the final result written to the hard disk.

Hadoop MapReduce : after each step it writes the intermediate result to the hard disk, and the next step reads it back from the hard disk. Because every step of the job reads from and writes to disk, latency is high.
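To make the contrast concrete, here is a toy plain-Python sketch (not Spark or Hadoop code). The same two hypothetical steps run twice: once with the intermediate results kept in memory, and once with each intermediate result written to a temporary file and read back, counting the disk operations. Both give the same answer; only the amount of disk I/O differs.

```python
import json
import os
import tempfile

# Two HYPOTHETICAL processing steps applied one after another.
steps = [lambda xs: [x * 2 for x in xs], lambda xs: [x + 1 for x in xs]]

def run_in_memory(data):
    """Spark-style: intermediate results stay in RAM between steps."""
    for step in steps:
        data = step(data)
    return data

def run_via_disk(data, workdir):
    """MapReduce-style: each step's output is written to disk and read back."""
    disk_ops = 0
    for i, step in enumerate(steps):
        data = step(data)
        path = os.path.join(workdir, f"step{i}.json")
        with open(path, "w") as f:
            json.dump(data, f)      # write the intermediate result
        with open(path) as f:
            data = json.load(f)     # the next step reads it back
        disk_ops += 2
    return data, disk_ops

with tempfile.TemporaryDirectory() as d:
    result, ops = run_via_disk([1, 2, 3], d)

print(run_in_memory([1, 2, 3]))  # [3, 5, 7]
print(result, ops)               # [3, 5, 7] 4
```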

Lazy execution :

When we call a function to read the data, Spark does not actually read it yet, because no operation has been performed on it. The data is not read until we perform some operation or computation on it. By contrast, pd.read_csv in pandas reads the data immediately and stores it in RAM.
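A plain-Python analogy of lazy execution (the class and method names below are hypothetical, not PySpark's API): transformations only record a plan, and the data is read only when an action runs. In PySpark itself, transformations such as filter or select are lazy in the same spirit, and actions such as count() or show() trigger the actual work.

```python
class LazyDataset:
    """Toy lazy dataset: transformations are recorded, not executed."""

    def __init__(self, loader, steps=None):
        self._loader = loader          # function that actually reads the data
        self._steps = steps or []      # recorded transformations (the "plan")

    def map(self, fn):
        # Transformation: return a new plan, do no work yet.
        return LazyDataset(self._loader, self._steps + [("map", fn)])

    def filter(self, pred):
        # Transformation: also just extends the plan.
        return LazyDataset(self._loader, self._steps + [("filter", pred)])

    def collect(self):
        # Action: only now is the data read and the plan executed.
        rows = self._loader()
        for kind, fn in self._steps:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

reads = []

def load_numbers():
    reads.append("read")  # record each time the data is actually read
    return [1, 2, 3, 4, 5]

ds = LazyDataset(load_numbers).map(lambda x: x * 10).filter(lambda x: x > 20)
print(reads)          # [] -> nothing has been read yet
print(ds.collect())   # [30, 40, 50]
print(reads)          # ['read']
```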

Parallel Processing :

The data is divided into partitions that are distributed across the nodes of a cluster, and each node processes its partitions in parallel.
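A single-machine sketch of the partition, process in parallel, combine pattern. Python's concurrent.futures thread pool stands in for the cluster nodes here (an assumption for illustration only; with threads in CPython this shows the structure rather than a real speed-up for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n_parts):
    """Split data into n_parts roughly equal chunks (like partitions)."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]

def process_partition(chunk):
    """Work done independently on one partition (here: sum of squares)."""
    return sum(x * x for x in chunk)

data = list(range(1, 11))
parts = partition(data, 3)
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_results = list(pool.map(process_partition, parts))
total = sum(partial_results)  # combine the partial results, like a reduce

print(parts)   # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
print(total)   # 385
```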

Spark supports both batch processing and real-time (streaming) processing, e.g. checking credit card transactions as they happen to decide whether each one is genuine or fake.
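A toy sketch of the difference between batch and real-time processing. The amount threshold is a HYPOTHETICAL placeholder for a real fraud-detection model; the point is only that batch code sees the whole dataset at once, while streaming code handles each transaction as it arrives.

```python
from typing import Iterable, Iterator, List, Tuple

FLAG_THRESHOLD = 1000.0  # HYPOTHETICAL rule standing in for a real fraud model

def looks_fake(amount: float) -> bool:
    return amount > FLAG_THRESHOLD

def batch_check(amounts: List[float]) -> List[bool]:
    """Batch: the whole dataset is available and processed in one go."""
    return [looks_fake(a) for a in amounts]

def stream_check(events: Iterable[float]) -> Iterator[Tuple[float, bool]]:
    """Real-time: each transaction is checked as soon as it arrives."""
    for amount in events:
        yield amount, looks_fake(amount)

print(batch_check([20.0, 5000.0]))  # [False, True]
for amount, flagged in stream_check(iter([20.0, 5000.0])):
    print(amount, "fake?" if flagged else "genuine")
```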

Deploying a Machine Learning Model : pkl, Flask, Postman

1) Create and train the model

# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import requests
from pickle import dump, load

# Load dataset
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
names = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

# Loading dataset
df = pd.read_csv(url, names=names)
df.tail(11)
df.columns
test = [{'sepal_length': 5.1, 'sepal_width': 3.5, 'peta...
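The excerpt stops before the model is saved. The pkl step can be sketched with the same pickle dump/load functions imported in that code. This sketch assumes a HYPOTHETICAL dict-based "model" with a toy petal_length threshold rule and hypothetical input records (none of these are the post's trained model or data), and it leaves Flask out so it runs with the standard library alone. In a Flask app, predict would typically sit behind a POST route, and Postman would send the JSON body.

```python
import json
import os
import tempfile
from pickle import dump, load

# HYPOTHETICAL stand-in for a trained model: a plain dict of parameters.
model = {"feature": "petal_length", "threshold": 2.5,
         "below": "Iris-setosa", "above": "other"}

def save_model(obj, path):
    """Pickle the model to a .pkl file (pickle needs binary mode 'wb')."""
    with open(path, "wb") as f:
        dump(obj, f)

def load_model(path):
    """Load the pickled model back, e.g. when the web app starts."""
    with open(path, "rb") as f:
        return load(f)

def predict(loaded, request_body):
    """Score a JSON list of records, like the body a Postman request might send."""
    records = json.loads(request_body)
    return [loaded["below"] if r[loaded["feature"]] < loaded["threshold"]
            else loaded["above"] for r in records]

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "model.pkl")
    save_model(model, path)
    restored = load_model(path)

body = json.dumps([{"petal_length": 1.4}, {"petal_length": 4.7}])  # hypothetical records
print(predict(restored, body))  # ['Iris-setosa', 'other']
```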

Binomial Distribution

The binomial distribution formula is:

$b(x; n, P) = \binom{n}{x} P^{x} (1 - P)^{n - x}$

Where:
b = binomial probability
x = total number of “successes” (pass or fail, heads or tails etc.)
P = probability of a success on an individual trial
n = number of trials

Note: The binomial distribution formula can also be written in a slightly different way, because $\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$ (this form of the formula uses factorials):

$b(x; n, P) = \frac{n!}{x!\,(n - x)!} P^{x} q^{n - x}$

“q” in this formula is just the probability of failure (subtract your probability of success from 1): $q = 1 - P$.

Using the First Binomial Distribution Formula

The binomial distribution formula calculates the probability of getting exactly x successes in n trials. Often you’ll be told to “plug in” the numbers to the formul...
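A short Python check of the formula, using math.comb for nCx and math.factorial for the factorial form with q = 1 - P. For example, the probability of exactly 2 heads in 3 fair coin flips is 3 × 0.5² × 0.5 = 0.375. The function names are illustrative.

```python
from math import comb, factorial

def binomial_pmf(x, n, p):
    """b(x; n, P) = nCx * P**x * (1 - P)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_pmf_factorial(x, n, p):
    """Same probability written with factorials and q = 1 - P."""
    if not 0 <= x <= n:
        return 0.0
    q = 1 - p
    return factorial(n) / (factorial(x) * factorial(n - x)) * p**x * q**(n - x)

print(binomial_pmf(2, 3, 0.5))  # 0.375
```

Summing binomial_pmf(k, n, p) over k = 0..n gives 1 (up to floating-point rounding), which is a quick sanity check when plugging in numbers.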

Command for installing a library in Python

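Command for installing in a Jupyter notebook: pip install library_name, e.g. pip install numpy. In a notebook cell, %pip install library_name installs the library into the environment of the running kernel.

Installing from the Anaconda Prompt:
1. pip install numpy
2. conda install -c conda-forge matplotlib

To find the conda command for a library such as matplotlib, search for it and check the official website.

Installing from Anaconda Navigator is easy. If it sometimes gives an error, open it as administrator.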
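When a notebook and a terminal use different Python environments, installing through the running interpreter avoids the library landing in the wrong one. A minimal sketch (the helper name ensure_installed is hypothetical) that runs sys.executable -m pip and skips the install if the library can already be imported. Note that the import name and the pip name can differ (e.g. sklearn vs scikit-learn), hence the optional second argument.

```python
import importlib.util
import subprocess
import sys

def ensure_installed(import_name, pip_name=None):
    """Install a library with pip only if it cannot already be imported.

    Uses `sys.executable -m pip` so the library goes into the same Python
    environment that runs this code (e.g. the Jupyter kernel).
    Returns True if an install was run, False if it was already importable.
    """
    if importlib.util.find_spec(import_name) is not None:
        return False
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or import_name])
    return True

# Usage (needs network access if the library is missing):
# ensure_installed("numpy")
# ensure_installed("sklearn", "scikit-learn")
```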

aviator coding
