Posts

Confounding Variables

  A confounder is a variable whose presence affects the variables being studied so that the results do not reflect the actual relationship. There are various ways to exclude or control confounding variables, including randomization, restriction, and matching. A confounding variable is an “extra” variable that you didn’t account for. Confounding variables can ruin an experiment and give you useless results. They can suggest there is a correlation when in fact there isn’t, and they can even introduce bias. That’s why it’s important to know what a confounding variable is and how to keep it out of your experiment in the first place. The independent variable typically has an effect on your dependent variable. For example, if you are researching whether lack of exercise leads to weight gain, then lack of exercise is your independent variable and weight gain is your dependent variable. Confounding variable...
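The claim that a confounder can suggest a correlation that is not really there can be illustrated with a small simulation. This is only a sketch under stated assumptions: the data are synthetic, the confounder is an unnamed hypothetical third factor, and inactivity is deliberately given no causal effect on weight gain at all. The names `pearson`, `confounder`, `inactivity`, and `weight_gain` are illustrative and do not come from the post.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical confounder: a third factor that drives both variables.
confounder = [rng.gauss(0, 1) for _ in range(n)]

# Assumption: inactivity and weight gain each depend on the confounder
# plus independent noise, and NOT on each other.
inactivity = [c + rng.gauss(0, 1) for c in confounder]
weight_gain = [c + rng.gauss(0, 1) for c in confounder]

# Observational data: the two variables look clearly correlated.
r_observed = pearson(inactivity, weight_gain)

# Randomization: assign inactivity independently of the confounder.
randomized_inactivity = [rng.gauss(0, 1) for _ in range(n)]
r_randomized = pearson(randomized_inactivity, weight_gain)

print(f"observed r:   {r_observed:.2f}")
print(f"randomized r: {r_randomized:.2f}")
```

Under these assumptions the observed correlation comes out near 0.5 even though neither variable influences the other, while the randomized assignment gives a correlation close to zero. That is the sense in which randomization controls for a confounder.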

Alpha Values and P-values

  In a test of significance, or hypothesis test, there are two numbers that are easy to confuse because both are probabilities between zero and one. One is the p-value of the test statistic. The other is the level of significance, or alpha (α). We will examine these two probabilities and determine the difference between them.

  Hypothesis Test

  There are two types of hypotheses: the null hypothesis and the alternative hypothesis. The null hypothesis of a test always predicts no effect or no relationship between variables. The alternative hypothesis states your research prediction of an effect or relationship...
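To make the distinction concrete, here is a minimal sketch of a two-sided one-sample z-test using only the standard library. All numbers (`sample_mean`, `null_mean`, `sigma`, `n`) are made up for illustration, and the population standard deviation is assumed to be known. Alpha is chosen by the researcher before looking at the data, while the p-value is computed from the data.

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


# Alpha: the significance level, fixed in advance by the researcher.
alpha = 0.05

# Hypothetical data summary (illustrative values, not from the post).
null_mean = 100.0    # value of the mean under the null hypothesis
sample_mean = 103.0  # observed sample mean
sigma = 15.0         # assumed known population standard deviation
n = 100              # sample size

# Test statistic: how many standard errors the sample mean lies from the null mean.
z = (sample_mean - null_mean) / (sigma / math.sqrt(n))

# P-value: probability, under the null, of a result at least this extreme.
p_value = two_sided_p_value(z)

# Decision rule: reject the null hypothesis when the p-value does not exceed alpha.
reject_null = p_value <= alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject null: {reject_null}")
```

With these illustrative numbers the test statistic is 2.0 and the p-value is about 0.0455, which is below the chosen alpha of 0.05, so the null hypothesis would be rejected. With the same data and alpha set to 0.01, it would not be.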

Joblib

  Joblib is a set of tools that provide lightweight pipelining in Python. In particular, it offers:
  - transparent disk-caching of functions and lazy re-evaluation (memoize pattern)
  - easy, simple parallel computing

  Why is it used?
  - Better performance and reproducibility
  - Avoids computing the same thing twice
  - Persists to disk transparently

  Features
  - Transparent and fast disk-caching of output values
  - Embarrassingly parallel helper
  - Fast compressed persistence

  Importing libraries

      from joblib import Memory, Parallel, delayed, dump, load
      import pandas as pd
      import numpy as np
      import math

  Data Creation

      my_dir = '/content/sample_data'
      a = np.vander(np.arange(3))
      print(a)

  output:

      [[0 0 1]
       [1 1 1]
       [4 2 1]]

  Memory

      mem = Memory(my_dir)
      sqr = mem.cache(np.square)
      b = sqr(a)
      print(b)

  output:

      [[ 0  0  1]
       [ 1  1  1]
       [16  4  1]]

  Parallel

      %%time
      Parallel(n_jobs=1)(delayed(np.square)(i) for i in range(10))

  output:

      CPU times: user 2.85 ms, sys: 0 ns, total: 2.85 ms Wal...
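The "fast compressed persistence" feature listed above relies on the `dump` and `load` functions that the excerpt imports. The following is a minimal sketch of how they can be used, not the post's own example: the dictionary `data`, the file name `data.joblib`, and the compression level `3` are all illustrative assumptions, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

from joblib import dump, load

# Hypothetical object to persist; any picklable Python object would do.
data = {"squares": [i * i for i in range(10)]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data.joblib")

    # Write the object to disk; compress accepts an integer level from 0 to 9.
    dump(data, path, compress=3)

    # Read it back into memory.
    restored = load(path)

print(restored == data)
```

Higher compression levels trade write speed for smaller files, so the best setting depends on the data and on how often it is written.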