
Spark - PySpark

Spark:


In-memory computation

Spark: data is read from the hard disk and kept in RAM. The intermediate result of each step stays in memory as far as memory allows, and the final output is written to disk only after the job completes.

Hadoop MapReduce: each step writes its output to the hard disk, and the next step reads it back from disk. Because every step of the job involves disk reads and writes, latency is high.
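To make the difference concrete, here is a toy model in plain Python (not how Spark or Hadoop are actually implemented). The same three-step pipeline runs twice: once with every intermediate result kept in memory, and once MapReduce-style, writing each step's output to a JSON file and reading it back before the next step. The function names and the use of JSON files are illustrative assumptions.

```python
import json
import os
import tempfile


def run_in_memory(data, steps):
    """Spark-style (toy): each intermediate result stays in RAM."""
    result = data
    for step in steps:
        result = step(result)  # output of one step feeds the next directly
    return result


def run_disk_per_step(data, steps, workdir):
    """MapReduce-style (toy): every step reads its input from disk and writes its output to disk."""
    path = os.path.join(workdir, "input.json")
    with open(path, "w") as f:
        json.dump(data, f)
    for i, step in enumerate(steps):
        with open(path) as f:
            current = json.load(f)  # read the previous step's output from disk
        path = os.path.join(workdir, f"step_{i}.json")
        with open(path, "w") as f:
            json.dump(step(current), f)  # write this step's output to disk
    with open(path) as f:
        return json.load(f)


# A small three-step job: double, keep multiples of 3, sum.
steps = [
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x for x in xs if x % 3 == 0],
    lambda xs: [sum(xs)],
]

print(run_in_memory(list(range(10)), steps))  # [36]
with tempfile.TemporaryDirectory() as d:
    print(run_disk_per_step(list(range(10)), steps, d))  # [36]
```

Both versions return the same answer; the disk-based one also leaves four files behind (the input plus one per step), and that repeated disk I/O is exactly the cost the in-memory approach avoids.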

Lazy execution:

When you call a function to read the data, Spark does not actually read it yet, because no operation has been performed on it. The data is read only when an operation or computation needs the result. By contrast, pd.read_csv in pandas reads the whole file into RAM immediately.
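A toy model of lazy execution in plain Python. It is not Spark's actual implementation: the class LazyDataset and the function load_rows are hypothetical, although the method names echo PySpark RDD methods (map and filter are transformations, collect and count are actions). Transformations only record what should happen; the data source is touched only when an action runs.

```python
class LazyDataset:
    """Toy model of lazy evaluation (hypothetical, not Spark's real implementation)."""

    def __init__(self, source, ops=None):
        self.source = source    # zero-argument function that loads the data
        self.ops = ops or []    # recorded transformations, not yet executed

    def map(self, fn):
        # Transformation: record the step and return a new dataset; nothing is read.
        return LazyDataset(self.source, self.ops + [("map", fn)])

    def filter(self, pred):
        # Transformation: also just recorded.
        return LazyDataset(self.source, self.ops + [("filter", pred)])

    def collect(self):
        # Action: only now is the data loaded and the recorded plan executed.
        rows = self.source()
        for kind, fn in self.ops:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows

    def count(self):
        # Another action; it re-runs the whole plan from the source.
        return len(self.collect())


reads = []


def load_rows():
    reads.append(1)  # record every time the "file" is actually read
    return [1, 2, 3, 4, 5]


ds = LazyDataset(load_rows).map(lambda x: x * 10).filter(lambda x: x > 20)
print(len(reads))    # 0: defining the pipeline read nothing
print(ds.collect())  # [30, 40, 50]
print(len(reads))    # 1: the action triggered the read
```

Each action re-runs the plan from the source, so calling ds.count() afterwards causes a second read. Spark behaves the same way for data that has not been cached, which is why it offers cache() and persist() for results that are reused.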

Parallel processing:

The data is split into partitions that are distributed across the nodes of a cluster, and each node processes its partitions in parallel.
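A sketch of the partition-then-process idea in plain Python: the list is split into partitions, each partition is handled by a separate worker, and a driver combines the partial results. Threads stand in for cluster nodes here, which is an assumption for illustration; because of CPython's GIL they show the structure rather than a real CPU speedup, whereas Spark runs partitions in separate executor processes, typically on different machines. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split data into nearly equal chunks, like partitions spread over nodes."""
    size, rem = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `rem` partitions get one extra element.
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def process_partition(part):
    # Work done independently on one partition (here: sum of squares).
    return sum(x * x for x in part)


def parallel_sum_of_squares(data, num_partitions=4):
    parts = partition(data, num_partitions)
    # Each thread plays the role of a worker node processing one partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(process_partition, parts))
    return sum(partials)  # the driver combines the partial results


data = list(range(1, 101))
print(partition(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(parallel_sum_of_squares(data))  # 338350
```

Using divmod keeps partition sizes within one element of each other, so no single worker gets a much larger share of the data.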

Spark supports both batch processing and real-time (stream) processing, e.g. checking credit card transactions as they arrive to decide whether each one is genuine or fake.
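A minimal sketch of the difference between the two modes, assuming a hypothetical rule that flags any transaction above 5000 as fake (a real fraud check would use a trained model). The batch function receives the whole dataset at once; the streaming function handles one transaction at a time and yields a verdict as soon as each one arrives. All names, the threshold and the sample transactions are illustrative assumptions, not Spark APIs.

```python
from typing import Iterable, Iterator

FRAUD_THRESHOLD = 5000  # hypothetical rule for illustration only; not a real fraud model


def is_suspicious(txn: dict) -> bool:
    # Hypothetical rule: large amounts are treated as fake.
    return txn["amount"] > FRAUD_THRESHOLD


def batch_check(transactions: list) -> list:
    # Batch: the whole dataset is available up front and processed in one go.
    return [t["id"] for t in transactions if is_suspicious(t)]


def stream_check(transactions: Iterable[dict]) -> Iterator[tuple]:
    # Real-time: each transaction is judged as soon as it arrives.
    for t in transactions:
        yield (t["id"], "fake" if is_suspicious(t) else "genuine")


txns = [
    {"id": "t1", "amount": 120},
    {"id": "t2", "amount": 9800},
    {"id": "t3", "amount": 45},
]
print(batch_check(txns))  # ['t2']
for verdict in stream_check(iter(txns)):
    print(verdict)  # prints ('t1', 'genuine'), ('t2', 'fake'), ('t3', 'genuine') on separate lines
```

Because stream_check is a generator, it works on an unbounded source just as well as on a list: each verdict is available before the next transaction is even read.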




