[{"data":1,"prerenderedAt":7869},["ShallowReactive",2],{"content-query-nydz0E8puB":3},[4,610,1279,2255,2575,2799,3350,5560,6607,7768],{"_path":5,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":9,"description":10,"date":11,"draft":7,"tags":12,"thumbnail":14,"alt_description":15,"slug":16,"body":17,"_type":604,"_id":605,"_source":606,"_file":607,"_stem":608,"_extension":609},"/posts/two-approaches-to-create-a-dummy-variable","posts",false,"","Two approaches to dummy variables","Learn about the difference between supervised and unsupervised machine learning and the importance of choosing a dummy variable approach","2026-02-21T00:00:00.000Z",[13],"the shape of data","/img/two_approaches_to_create_a_dummy_data.png","Creating dummy variables for different purposes","two-approaches-to-create-a-dummy-variable",{"type":18,"children":19,"toc":595},"root",[20,38,43,50,55,60,65,71,76,81,87,92,258,263,296,301,473,478,484,489,499,504,512,521,527,532,540,545,553,558,566,572,577,585,590],{"type":21,"tag":22,"props":23,"children":24},"element","blockquote",{},[25,32],{"type":21,"tag":26,"props":27,"children":28},"p",{},[29],{"type":30,"value":31},"text","Perfection is a myth. The best solution is simply the one with tradeoffs you can live with.",{"type":21,"tag":26,"props":33,"children":35},{"align":34},"right",[36],{"type":30,"value":37},"-- Smartest guy in the room",{"type":21,"tag":26,"props":39,"children":40},{},[41],{"type":30,"value":42},"There are a lot of different Machine Learning algorithms being created each and every day. 
For the sake of simplicity, let's say there are just two categories: supervised and unsupervised.",{"type":21,"tag":44,"props":45,"children":47},{"id":46},"supervised-learning",[48],{"type":30,"value":49},"Supervised Learning",{"type":21,"tag":26,"props":51,"children":52},{},[53],{"type":30,"value":54},"This is the kind we use when we have a specific number or target to predict. That target is usually called the dependent variable, while the data points we use to predict it are called independent variables or predictors. When we predict a numerical value it's called regression; when we predict a categorical variable it's called classification.",{"type":21,"tag":26,"props":56,"children":57},{},[58],{"type":30,"value":59},"You may have heard of k-nearest neighbours (k-NN), support vector machines, random forest, etc. Using a supervised learning algorithm usually involves splitting your data into two sets. There's training data, where the algorithm adjusts the parameters of its function so that the predicted values are as close as possible to the actual values, and there's test data, held back to check how well the trained model predicts values it has not seen.",{"type":21,"tag":26,"props":61,"children":62},{},[63],{"type":30,"value":64},"After a supervised learning algorithm has been trained, we can use it to make new predictions and to estimate the impact of each independent variable on the dependent variable.",{"type":21,"tag":44,"props":66,"children":68},{"id":67},"unsupervised-learning",[69],{"type":30,"value":70},"Unsupervised Learning",{"type":21,"tag":26,"props":72,"children":73},{},[74],{"type":30,"value":75},"These algorithms tend to focus on data exploration - for example, reducing the dimensionality of a dataset to better visualize it, finding how the data points are related to each other, or detecting anomalous data points. 
In unsupervised learning there is no dependent or independent variable and no split of the dataset; instead the entire dataset is fed to the algorithm.",{"type":21,"tag":26,"props":77,"children":78},{},[79],{"type":30,"value":80},"My favourite techniques (k-means, hierarchical clustering and principal component analysis) are good examples: clustering is used for market segmentation, and dimensionality reduction serves as a preprocessing step to improve a supervised learning algorithm.",{"type":21,"tag":44,"props":82,"children":84},{"id":83},"structured-data-and-dummy-variables",[85],{"type":30,"value":86},"Structured Data and Dummy Variables",{"type":21,"tag":26,"props":88,"children":89},{},[90],{"type":30,"value":91},"Data science and statistical methods usually operate on structured data such as dataframes, which are basically Excel-like tables with columns, rows and strict rules on how the observed data points are inserted.",{"type":21,"tag":93,"props":94,"children":95},"table",{},[96,126],{"type":21,"tag":97,"props":98,"children":99},"thead",{},[100],{"type":21,"tag":101,"props":102,"children":103},"tr",{},[104,110,116,121],{"type":21,"tag":105,"props":106,"children":107},"th",{},[108],{"type":30,"value":109},"Variable 1",{"type":21,"tag":105,"props":111,"children":113},{"align":112},"center",[114],{"type":30,"value":115},"Variable 2",{"type":21,"tag":105,"props":117,"children":118},{"align":112},[119],{"type":30,"value":120},"Variable 
3",{"type":21,"tag":105,"props":122,"children":123},{"align":112},[124],{"type":30,"value":125},"Outcome",{"type":21,"tag":127,"props":128,"children":129},"tbody",{},[130,154,176,197,217,237],{"type":21,"tag":101,"props":131,"children":132},{},[133,139,144,149],{"type":21,"tag":134,"props":135,"children":136},"td",{},[137],{"type":30,"value":138},"0",{"type":21,"tag":134,"props":140,"children":141},{"align":112},[142],{"type":30,"value":143},"1",{"type":21,"tag":134,"props":145,"children":146},{"align":112},[147],{"type":30,"value":148},"Male",{"type":21,"tag":134,"props":150,"children":151},{"align":112},[152],{"type":30,"value":153},"3.21",{"type":21,"tag":101,"props":155,"children":156},{},[157,161,166,171],{"type":21,"tag":134,"props":158,"children":159},{},[160],{"type":30,"value":143},{"type":21,"tag":134,"props":162,"children":163},{"align":112},[164],{"type":30,"value":165},"5",{"type":21,"tag":134,"props":167,"children":168},{"align":112},[169],{"type":30,"value":170},"Female",{"type":21,"tag":134,"props":172,"children":173},{"align":112},[174],{"type":30,"value":175},"3.99",{"type":21,"tag":101,"props":177,"children":178},{},[179,183,188,192],{"type":21,"tag":134,"props":180,"children":181},{},[182],{"type":30,"value":143},{"type":21,"tag":134,"props":184,"children":185},{"align":112},[186],{"type":30,"value":187},"4",{"type":21,"tag":134,"props":189,"children":190},{"align":112},[191],{"type":30,"value":170},{"type":21,"tag":134,"props":193,"children":194},{"align":112},[195],{"type":30,"value":196},"2.70",{"type":21,"tag":101,"props":198,"children":199},{},[200,204,208,212],{"type":21,"tag":134,"props":201,"children":202},{},[203],{"type":30,"value":138},{"type":21,"tag":134,"props":205,"children":206},{"align":112},[207],{"type":30,"value":143},{"type":21,"tag":134,"props":209,"children":210},{"align":112},[211],{"type":30,"value":148},{"type":21,"tag":134,"props":213,"children":214},{"align":112},[215],{"type":30,"value":216},"3.19",{"type":21,"tag":10
1,"props":218,"children":219},{},[220,224,228,232],{"type":21,"tag":134,"props":221,"children":222},{},[223],{"type":30,"value":143},{"type":21,"tag":134,"props":225,"children":226},{"align":112},[227],{"type":30,"value":165},{"type":21,"tag":134,"props":229,"children":230},{"align":112},[231],{"type":30,"value":170},{"type":21,"tag":134,"props":233,"children":234},{"align":112},[235],{"type":30,"value":236},"4.00",{"type":21,"tag":101,"props":238,"children":239},{},[240,244,248,253],{"type":21,"tag":134,"props":241,"children":242},{},[243],{"type":30,"value":143},{"type":21,"tag":134,"props":245,"children":246},{"align":112},[247],{"type":30,"value":187},{"type":21,"tag":134,"props":249,"children":250},{"align":112},[251],{"type":30,"value":252},"Non-binary",{"type":21,"tag":134,"props":254,"children":255},{"align":112},[256],{"type":30,"value":257},"1.29",{"type":21,"tag":26,"props":259,"children":260},{},[261],{"type":30,"value":262},"This is an example of three independent variables and the dependent variable that we are trying to predict, which is the \"Outcome\" column. 
Let's examine the data types of each column:",{"type":21,"tag":264,"props":265,"children":266},"ul",{},[267,278,287],{"type":21,"tag":268,"props":269,"children":270},"li",{},[271,276],{"type":21,"tag":272,"props":273,"children":274},"strong",{},[275],{"type":30,"value":109},{"type":30,"value":277}," - is a binary numerical column containing only 1 and 0",{"type":21,"tag":268,"props":279,"children":280},{},[281,285],{"type":21,"tag":272,"props":282,"children":283},{},[284],{"type":30,"value":115},{"type":30,"value":286}," - is a discrete numerical value column that takes whole numbers only",{"type":21,"tag":268,"props":288,"children":289},{},[290,294],{"type":21,"tag":272,"props":291,"children":292},{},[293],{"type":30,"value":120},{"type":30,"value":295}," - is a categorical variable representing gender",{"type":21,"tag":26,"props":297,"children":298},{},[299],{"type":30,"value":300},"Some algorithms accept categorical values, but most of them do not. Therefore data manipulation is required to convert categorical text data into numerical values.",{"type":21,"tag":93,"props":302,"children":303},{},[304,332],{"type":21,"tag":97,"props":305,"children":306},{},[307],{"type":21,"tag":101,"props":308,"children":309},{},[310,314,318,323,328],{"type":21,"tag":105,"props":311,"children":312},{},[313],{"type":30,"value":109},{"type":21,"tag":105,"props":315,"children":316},{"align":112},[317],{"type":30,"value":115},{"type":21,"tag":105,"props":319,"children":320},{"align":112},[321],{"type":30,"value":322},"Variable Male",{"type":21,"tag":105,"props":324,"children":325},{"align":112},[326],{"type":30,"value":327},"Variable 
Female",{"type":21,"tag":105,"props":329,"children":330},{"align":112},[331],{"type":30,"value":125},{"type":21,"tag":127,"props":333,"children":334},{},[335,358,381,404,427,450],{"type":21,"tag":101,"props":336,"children":337},{},[338,342,346,350,354],{"type":21,"tag":134,"props":339,"children":340},{},[341],{"type":30,"value":138},{"type":21,"tag":134,"props":343,"children":344},{"align":112},[345],{"type":30,"value":143},{"type":21,"tag":134,"props":347,"children":348},{"align":112},[349],{"type":30,"value":143},{"type":21,"tag":134,"props":351,"children":352},{"align":112},[353],{"type":30,"value":138},{"type":21,"tag":134,"props":355,"children":356},{"align":112},[357],{"type":30,"value":153},{"type":21,"tag":101,"props":359,"children":360},{},[361,365,369,373,377],{"type":21,"tag":134,"props":362,"children":363},{},[364],{"type":30,"value":143},{"type":21,"tag":134,"props":366,"children":367},{"align":112},[368],{"type":30,"value":165},{"type":21,"tag":134,"props":370,"children":371},{"align":112},[372],{"type":30,"value":138},{"type":21,"tag":134,"props":374,"children":375},{"align":112},[376],{"type":30,"value":143},{"type":21,"tag":134,"props":378,"children":379},{"align":112},[380],{"type":30,"value":175},{"type":21,"tag":101,"props":382,"children":383},{},[384,388,392,396,400],{"type":21,"tag":134,"props":385,"children":386},{},[387],{"type":30,"value":143},{"type":21,"tag":134,"props":389,"children":390},{"align":112},[391],{"type":30,"value":187},{"type":21,"tag":134,"props":393,"children":394},{"align":112},[395],{"type":30,"value":138},{"type":21,"tag":134,"props":397,"children":398},{"align":112},[399],{"type":30,"value":143},{"type":21,"tag":134,"props":401,"children":402},{"align":112},[403],{"type":30,"value":196},{"type":21,"tag":101,"props":405,"children":406},{},[407,411,415,419,423],{"type":21,"tag":134,"props":408,"children":409},{},[410],{"type":30,"value":138},{"type":21,"tag":134,"props":412,"children":413},{"align":112},[414],{"type":30,"
value":143},{"type":21,"tag":134,"props":416,"children":417},{"align":112},[418],{"type":30,"value":143},{"type":21,"tag":134,"props":420,"children":421},{"align":112},[422],{"type":30,"value":138},{"type":21,"tag":134,"props":424,"children":425},{"align":112},[426],{"type":30,"value":216},{"type":21,"tag":101,"props":428,"children":429},{},[430,434,438,442,446],{"type":21,"tag":134,"props":431,"children":432},{},[433],{"type":30,"value":143},{"type":21,"tag":134,"props":435,"children":436},{"align":112},[437],{"type":30,"value":165},{"type":21,"tag":134,"props":439,"children":440},{"align":112},[441],{"type":30,"value":138},{"type":21,"tag":134,"props":443,"children":444},{"align":112},[445],{"type":30,"value":143},{"type":21,"tag":134,"props":447,"children":448},{"align":112},[449],{"type":30,"value":236},{"type":21,"tag":101,"props":451,"children":452},{},[453,457,461,465,469],{"type":21,"tag":134,"props":454,"children":455},{},[456],{"type":30,"value":143},{"type":21,"tag":134,"props":458,"children":459},{"align":112},[460],{"type":30,"value":187},{"type":21,"tag":134,"props":462,"children":463},{"align":112},[464],{"type":30,"value":138},{"type":21,"tag":134,"props":466,"children":467},{"align":112},[468],{"type":30,"value":138},{"type":21,"tag":134,"props":470,"children":471},{"align":112},[472],{"type":30,"value":257},{"type":21,"tag":26,"props":474,"children":475},{},[476],{"type":30,"value":477},"Notice how the Non-binary observation has 0 in both the Variable Male and Variable Female columns. This is where we have two different ways to feed gender values to the algorithm.",{"type":21,"tag":44,"props":479,"children":481},{"id":480},"n-value-dummy-variables",[482],{"type":30,"value":483},"n-value dummy variables",{"type":21,"tag":26,"props":485,"children":486},{},[487],{"type":30,"value":488},"We have three distinct values (Male, Female and Non-binary), so n equals 3. 
With n-value dummy variables we map gender in the following manner:",{"type":21,"tag":490,"props":491,"children":493},"pre",{"code":492},"Male       → (1, 0, 0)\nFemale     → (0, 1, 0)\nNon-binary → (0, 0, 1)\n",[494],{"type":21,"tag":495,"props":496,"children":497},"code",{"__ignoreMap":8},[498],{"type":30,"value":492},{"type":21,"tag":26,"props":500,"children":501},{},[502],{"type":30,"value":503},"This ensures that each category is equally distant from the others. The Euclidean distance between any two categories is:",{"type":21,"tag":490,"props":505,"children":507},{"code":506},"distance(Male, Female)       = sqrt((1-0)^2 + (0-1)^2 + (0-0)^2) = sqrt(2) ≈ 1.41\ndistance(Male, Non-binary)   = sqrt((1-0)^2 + (0-0)^2 + (0-1)^2) = sqrt(2) ≈ 1.41\ndistance(Female, Non-binary) = sqrt((0-0)^2 + (1-0)^2 + (0-1)^2) = sqrt(2) ≈ 1.41\n",[508],{"type":21,"tag":495,"props":509,"children":510},{"__ignoreMap":8},[511],{"type":30,"value":506},{"type":21,"tag":26,"props":513,"children":514},{},[515],{"type":21,"tag":516,"props":517,"children":520},"img",{"alt":518,"src":519},"Three value dummy variable","/img/img44.png",[],{"type":21,"tag":44,"props":522,"children":524},{"id":523},"n-1-dummy-variables",[525],{"type":30,"value":526},"n-1 dummy variables",{"type":21,"tag":26,"props":528,"children":529},{},[530],{"type":30,"value":531},"Using only 2 dummy variables means we drop one category:",{"type":21,"tag":490,"props":533,"children":535},{"code":534},"Male       → (1, 0)\nFemale     → (0, 1)\nNon-binary → (0, 0)\n",[536],{"type":21,"tag":495,"props":537,"children":538},{"__ignoreMap":8},[539],{"type":30,"value":534},{"type":21,"tag":26,"props":541,"children":542},{},[543],{"type":30,"value":544},"Now let's check the distances on a two-dimensional plane:",{"type":21,"tag":490,"props":546,"children":548},{"code":547},"distance(Male, Female)       = sqrt((1-0)^2 + (0-1)^2) = sqrt(2) ≈ 1.41\ndistance(Male, Non-binary)   = sqrt((1-0)^2 + (0-0)^2) = sqrt(1) = 1.00\ndistance(Female, 
Non-binary) = sqrt((0-0)^2 + (1-0)^2) = sqrt(1) = 1.00\n",[549],{"type":21,"tag":495,"props":550,"children":551},{"__ignoreMap":8},[552],{"type":30,"value":547},{"type":21,"tag":26,"props":554,"children":555},{},[556],{"type":30,"value":557},"Both Male and Female are closer to Non-binary than they are to each other, distorting the true relationships between categories.",{"type":21,"tag":26,"props":559,"children":560},{},[561],{"type":21,"tag":516,"props":562,"children":565},{"alt":563,"src":564},"Two value dummy variable","/img/img43.png",[],{"type":21,"tag":44,"props":567,"children":569},{"id":568},"tradeoff-between-n-and-n-1-approaches",[570],{"type":30,"value":571},"Tradeoff between n and n-1 approaches",{"type":21,"tag":26,"props":573,"children":574},{},[575],{"type":30,"value":576},"Have you heard of multicollinearity? That's exactly what happens with n-value dummy variables — each variable is completely and linearly determined by the others. Algebraically it's a linear dependence, meaning one column is a linear combination of the other columns.",{"type":21,"tag":490,"props":578,"children":580},{"code":579},"Variable_Male + Variable_Female + Variable_NonBinary = 1  (always)\n",[581],{"type":21,"tag":495,"props":582,"children":583},{"__ignoreMap":8},[584],{"type":30,"value":579},{"type":21,"tag":26,"props":586,"children":587},{},[588],{"type":30,"value":589},"Multicollinearity causes computational problems for linear and logistic regression, so for those algorithms we should use n-1 dummy variables rather than all n.",{"type":21,"tag":26,"props":591,"children":592},{},[593],{"type":30,"value":594},"On the other hand, for algorithms like k-NN where distances between data points are crucial, we don't want to drop any category, as that would skew the distances and lead to suboptimal 
performance.",{"title":8,"searchDepth":596,"depth":596,"links":597},2,[598,599,600,601,602,603],{"id":46,"depth":596,"text":49},{"id":67,"depth":596,"text":70},{"id":83,"depth":596,"text":86},{"id":480,"depth":596,"text":483},{"id":523,"depth":596,"text":526},{"id":568,"depth":596,"text":571},"markdown","content:posts:two-approaches-to-create-a-dummy-variable.md","content","posts/two-approaches-to-create-a-dummy-variable.md","posts/two-approaches-to-create-a-dummy-variable","md",{"_path":611,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":612,"description":613,"date":614,"draft":7,"tags":615,"thumbnail":619,"alt_description":620,"slug":621,"body":622,"_type":604,"_id":1276,"_source":606,"_file":1277,"_stem":1278,"_extension":609},"/posts/derivative-optimization","Using Calculus to Optimize Cryptocurrency Grid Trading","Optimizing cryptocurrency trading with calculus","2025-12-12T00:00:00.000Z",[616,617,618],"trading","calculus","crypto","/img/derivative_optimization.png","Using calculus for cryptocurrency trading algorithms","derivative-optimization",{"type":18,"children":623,"toc":1254},[624,633,639,644,649,661,667,672,679,691,699,704,740,745,751,756,764,776,869,881,887,892,903,909,914,923,929,934,940,949,955,964,970,975,984,990,995,1003,1016,1029,1041,1054,1060,1065,1073,1079,1091,1100,1105,1129,1135,1140,1151,1156,1162,1215,1221,1226,1249],{"type":21,"tag":26,"props":625,"children":626},{},[627],{"type":21,"tag":628,"props":629,"children":630},"em",{},[631],{"type":30,"value":632},"How first and second derivatives can predict market reversals and dynamically adjust trading parameters",{"type":21,"tag":44,"props":634,"children":636},{"id":635},"introduction",[637],{"type":30,"value":638},"Introduction",{"type":21,"tag":26,"props":640,"children":641},{},[642],{"type":30,"value":643},"Grid trading is a popular algorithmic strategy that places buy and sell orders at regular intervals around a center price. 
When price oscillates, the bot profits from the spread between buy and sell prices. Simple, right?",{"type":21,"tag":26,"props":645,"children":646},{},[647],{"type":30,"value":648},"The challenge is that markets aren't always \"ranging.\" Sometimes they trend strongly in one direction, like a falling knife or a rocket, leaving your grid behind. Other times, volatility spikes and your fixed grid spacing becomes suboptimal.",{"type":21,"tag":26,"props":650,"children":651},{},[652,654,659],{"type":30,"value":653},"After running a grid trading bot for several months, I realized the key to profitability wasn't just having a grid; it was knowing ",{"type":21,"tag":272,"props":655,"children":656},{},[657],{"type":30,"value":658},"when to adjust it",{"type":30,"value":660},". This led me to implement derivative-based optimization: using calculus to detect market regime changes before they fully develop.",{"type":21,"tag":44,"props":662,"children":664},{"id":663},"the-mathematical-foundation",[665],{"type":30,"value":666},"The Mathematical Foundation",{"type":21,"tag":26,"props":668,"children":669},{},[670],{"type":30,"value":671},"Remember calculus? 
Here's where it becomes useful.",{"type":21,"tag":673,"props":674,"children":676},"h3",{"id":675},"first-derivative-momentum",[677],{"type":30,"value":678},"First Derivative: Momentum",{"type":21,"tag":26,"props":680,"children":681},{},[682,684,689],{"type":30,"value":683},"The first derivative of price with respect to time gives us ",{"type":21,"tag":272,"props":685,"children":686},{},[687],{"type":30,"value":688},"momentum",{"type":30,"value":690},"—the rate of price change:",{"type":21,"tag":490,"props":692,"children":694},{"code":693},"momentum = dP/dt = (P[t] - P[t-1]) / Δt\n",[695],{"type":21,"tag":495,"props":696,"children":697},{"__ignoreMap":8},[698],{"type":30,"value":693},{"type":21,"tag":26,"props":700,"children":701},{},[702],{"type":30,"value":703},"Where:",{"type":21,"tag":264,"props":705,"children":706},{},[707,718,729],{"type":21,"tag":268,"props":708,"children":709},{},[710,716],{"type":21,"tag":495,"props":711,"children":713},{"className":712},[],[714],{"type":30,"value":715},"P[t]",{"type":30,"value":717}," is the current price",{"type":21,"tag":268,"props":719,"children":720},{},[721,727],{"type":21,"tag":495,"props":722,"children":724},{"className":723},[],[725],{"type":30,"value":726},"P[t-1]",{"type":30,"value":728}," is the previous price",{"type":21,"tag":268,"props":730,"children":731},{},[732,738],{"type":21,"tag":495,"props":733,"children":735},{"className":734},[],[736],{"type":30,"value":737},"Δt",{"type":30,"value":739}," is the time interval (I use 30 seconds)",{"type":21,"tag":26,"props":741,"children":742},{},[743],{"type":30,"value":744},"Positive momentum means price is rising. Negative means falling. 
The magnitude tells us how fast.",{"type":21,"tag":673,"props":746,"children":748},{"id":747},"second-derivative-acceleration",[749],{"type":30,"value":750},"Second Derivative: Acceleration",{"type":21,"tag":26,"props":752,"children":753},{},[754],{"type":30,"value":755},"The second derivative—the rate of change of momentum—is where things get interesting:",{"type":21,"tag":490,"props":757,"children":759},{"code":758},"acceleration = d²P/dt² = (momentum[t] - momentum[t-1]) / Δt\n",[760],{"type":21,"tag":495,"props":761,"children":762},{"__ignoreMap":8},[763],{"type":30,"value":758},{"type":21,"tag":26,"props":765,"children":766},{},[767,769,774],{"type":30,"value":768},"This is the ",{"type":21,"tag":272,"props":770,"children":771},{},[772],{"type":30,"value":773},"key metric for early reversal detection",{"type":30,"value":775},":",{"type":21,"tag":93,"props":777,"children":778},{},[779,800],{"type":21,"tag":97,"props":780,"children":781},{},[782],{"type":21,"tag":101,"props":783,"children":784},{},[785,790,795],{"type":21,"tag":105,"props":786,"children":787},{},[788],{"type":30,"value":789},"Trend",{"type":21,"tag":105,"props":791,"children":792},{},[793],{"type":30,"value":794},"Acceleration",{"type":21,"tag":105,"props":796,"children":797},{},[798],{"type":30,"value":799},"Meaning",{"type":21,"tag":127,"props":801,"children":802},{},[803,821,838,854],{"type":21,"tag":101,"props":804,"children":805},{},[806,811,816],{"type":21,"tag":134,"props":807,"children":808},{},[809],{"type":30,"value":810},"Uptrend",{"type":21,"tag":134,"props":812,"children":813},{},[814],{"type":30,"value":815},"Positive",{"type":21,"tag":134,"props":817,"children":818},{},[819],{"type":30,"value":820},"Trend 
strengthening",{"type":21,"tag":101,"props":822,"children":823},{},[824,828,833],{"type":21,"tag":134,"props":825,"children":826},{},[827],{"type":30,"value":810},{"type":21,"tag":134,"props":829,"children":830},{},[831],{"type":30,"value":832},"Negative",{"type":21,"tag":134,"props":834,"children":835},{},[836],{"type":30,"value":837},"Trend weakening → possible reversal",{"type":21,"tag":101,"props":839,"children":840},{},[841,846,850],{"type":21,"tag":134,"props":842,"children":843},{},[844],{"type":30,"value":845},"Downtrend",{"type":21,"tag":134,"props":847,"children":848},{},[849],{"type":30,"value":832},{"type":21,"tag":134,"props":851,"children":852},{},[853],{"type":30,"value":820},{"type":21,"tag":101,"props":855,"children":856},{},[857,861,865],{"type":21,"tag":134,"props":858,"children":859},{},[860],{"type":30,"value":845},{"type":21,"tag":134,"props":862,"children":863},{},[864],{"type":30,"value":815},{"type":21,"tag":134,"props":866,"children":867},{},[868],{"type":30,"value":837},{"type":21,"tag":26,"props":870,"children":871},{},[872,874,879],{"type":30,"value":873},"The magic is that acceleration changes ",{"type":21,"tag":272,"props":875,"children":876},{},[877],{"type":30,"value":878},"before",{"type":30,"value":880}," price reverses. When you're in an uptrend and acceleration turns negative, momentum is still positive, but it's slowing down. 
This gives you 2-3 data points of warning before the actual reversal.",{"type":21,"tag":44,"props":882,"children":884},{"id":883},"implementation",[885],{"type":30,"value":886},"Implementation",{"type":21,"tag":26,"props":888,"children":889},{},[890],{"type":30,"value":891},"Here's the core Python implementation:",{"type":21,"tag":490,"props":893,"children":898},{"code":894,"language":895,"meta":8,"className":896},"from collections import deque\nimport numpy as np\n\nclass DerivativeOptimizer:\n    def __init__(self, symbol: str, lookback_periods: int = 30, time_interval: int = 30):\n        self.symbol = symbol\n        self.lookback_periods = lookback_periods\n        self.time_interval = time_interval\n        \n        # Circular buffers for efficient storage\n        self.price_history = deque(maxlen=lookback_periods)\n        self.momentum_history = deque(maxlen=lookback_periods)\n    \n    def add_price(self, price: float):\n        \"\"\"Add new price point\"\"\"\n        self.price_history.append(price)\n    \n    def calculate_momentum(self) -> float:\n        \"\"\"First derivative: rate of price change\"\"\"\n        if len(self.price_history) \u003C 2:\n            return None\n        \n        price_change = self.price_history[-1] - self.price_history[-2]\n        momentum = price_change / self.time_interval\n        self.momentum_history.append(momentum)\n        \n        return momentum\n    \n    def calculate_acceleration(self) -> float:\n        \"\"\"Second derivative: rate of momentum change\"\"\"\n        if len(self.momentum_history) \u003C 2:\n            return None\n        \n        momentum_change = self.momentum_history[-1] - self.momentum_history[-2]\n        return momentum_change / self.time_interval\n    \n    def calculate_smoothed_momentum(self, window: int = 5) -> float:\n        \"\"\"Moving average momentum to reduce noise\"\"\"\n        if len(self.price_history) \u003C window + 1:\n            return None\n        \n        
prices = np.array(list(self.price_history)[-window - 1:])\n        price_changes = np.diff(prices)\n        return np.mean(price_changes) / self.time_interval\n","python",[897],"language-python",[899],{"type":21,"tag":495,"props":900,"children":901},{"__ignoreMap":8},[902],{"type":30,"value":894},{"type":21,"tag":44,"props":904,"children":906},{"id":905},"detecting-reversals",[907],{"type":30,"value":908},"Detecting Reversals",{"type":21,"tag":26,"props":910,"children":911},{},[912],{"type":30,"value":913},"The reversal detection algorithm looks for divergence between momentum and acceleration:",{"type":21,"tag":490,"props":915,"children":918},{"code":916,"language":895,"meta":8,"className":917},"def detect_reversal(self, momentum: float, acceleration: float) -> dict:\n    \"\"\"Detect potential trend reversals using derivative analysis\"\"\"\n    \n    # No reversal if data insufficient\n    if momentum is None or acceleration is None:\n        return {\"detected\": False}\n    \n    # Bullish reversal: downtrend losing steam\n    # momentum \u003C 0 (falling) but acceleration > 0 (slowing)\n    if momentum \u003C -0.001 and acceleration > 0.00005:\n        confidence = min(1.0, abs(acceleration) / 0.0002)\n        return {\n            \"detected\": True,\n            \"type\": \"bullish\",\n            \"confidence\": confidence,\n            \"description\": \"Downtrend losing momentum\"\n        }\n    \n    # Bearish reversal: uptrend losing steam\n    # momentum > 0 (rising) but acceleration \u003C 0 (slowing)\n    if momentum > 0.001 and acceleration \u003C -0.00005:\n        confidence = min(1.0, abs(acceleration) / 0.0002)\n        return {\n            \"detected\": True,\n            \"type\": \"bearish\", \n            \"confidence\": confidence,\n            \"description\": \"Uptrend losing momentum\"\n        }\n    \n    return {\"detected\": 
False}\n",[897],[919],{"type":21,"tag":495,"props":920,"children":921},{"__ignoreMap":8},[922],{"type":30,"value":916},{"type":21,"tag":44,"props":924,"children":926},{"id":925},"dynamic-grid-adjustment",[927],{"type":30,"value":928},"Dynamic Grid Adjustment",{"type":21,"tag":26,"props":930,"children":931},{},[932],{"type":30,"value":933},"Based on derivative signals, the grid adjusts automatically:",{"type":21,"tag":673,"props":935,"children":937},{"id":936},"_1-grid-spacing",[938],{"type":30,"value":939},"1. Grid Spacing",{"type":21,"tag":490,"props":941,"children":944},{"code":942,"language":895,"meta":8,"className":943},"def recommend_spacing_adjustment(self, volatility: float, trend_strength: str) -> float:\n    \"\"\"Adjust grid spacing based on market conditions\"\"\"\n    \n    base_spacing = 0.025  # 2.5% default\n    \n    if trend_strength in [\"strong_up\", \"strong_down\"]:\n        # Widen grid during trends to avoid getting run over\n        return base_spacing * 1.5\n    \n    elif trend_strength == \"weak\":\n        # Tighten grid during ranging for more trades\n        return base_spacing * 0.8\n    \n    # Adjust for volatility\n    if volatility > 0.02:\n        return base_spacing * 1.3\n    \n    return base_spacing\n",[897],[945],{"type":21,"tag":495,"props":946,"children":947},{"__ignoreMap":8},[948],{"type":30,"value":942},{"type":21,"tag":673,"props":950,"children":952},{"id":951},"_2-order-size-multiplier",[953],{"type":30,"value":954},"2. 
Order Size Multiplier",{"type":21,"tag":490,"props":956,"children":959},{"code":957,"language":895,"meta":8,"className":958},"def calculate_order_multiplier(self, momentum: float, trend_strength: str) -> float:\n    \"\"\"Adjust order size based on conditions\"\"\"\n    \n    if trend_strength == \"weak\":\n        # Ranging market = best for grid trading\n        # Increase size to capture more profit\n        return 1.3\n    \n    elif trend_strength in [\"strong_up\", \"strong_down\"]:\n        # Trending = risky for grid\n        # Reduce size to limit exposure\n        return 0.7\n    \n    return 1.0\n",[897],[960],{"type":21,"tag":495,"props":961,"children":962},{"__ignoreMap":8},[963],{"type":30,"value":957},{"type":21,"tag":44,"props":965,"children":967},{"id":966},"trend-strength-classification",[968],{"type":30,"value":969},"Trend Strength Classification",{"type":21,"tag":26,"props":971,"children":972},{},[973],{"type":30,"value":974},"I classify trends based on momentum magnitude, calibrated for crypto volatility:",{"type":21,"tag":490,"props":976,"children":979},{"code":977,"language":895,"meta":8,"className":978},"def classify_trend_strength(self, momentum: float) -> str:\n    \"\"\"Classify market regime based on momentum\"\"\"\n    \n    # Get current price for percentage calculation\n    current_price = self.price_history[-1]\n    \n    # Convert momentum to percentage per second\n    momentum_pct = abs(momentum) / current_price\n    \n    # Thresholds (per second):\n    # - Strong: >0.03% (1.8% per minute)\n    # - Moderate: 0.01-0.03% (0.6-1.8% per minute)  \n    # - Weak: \u003C0.01% (\u003C0.6% per minute)\n    \n    if momentum_pct > 0.0003:\n        return \"strong_up\" if momentum > 0 else \"strong_down\"\n    elif momentum_pct > 0.0001:\n        return \"moderate_up\" if momentum > 0 else \"moderate_down\"\n    else:\n        return \"weak\"  # Ranging 
market\n",[897],[980],{"type":21,"tag":495,"props":981,"children":982},{"__ignoreMap":8},[983],{"type":30,"value":977},{"type":21,"tag":44,"props":985,"children":987},{"id":986},"real-world-results",[988],{"type":30,"value":989},"Real-World Results",{"type":21,"tag":26,"props":991,"children":992},{},[993],{"type":30,"value":994},"Here's what the logs look like when the system is running:",{"type":21,"tag":490,"props":996,"children":998},{"code":997},"🔬 ETHUSDT Derivatives: ready\n   trend=weak, momentum=+0.046333, acceleration=-0.002911\n📊 ETHUSDT order multiplier: 1.30x\n\n🔬 SOLUSDT Derivatives: ready\n   trend=moderate_up, momentum=+0.685667, acceleration=+0.008289\n📊 SOLUSDT order multiplier: 0.85x\n",[999],{"type":21,"tag":495,"props":1000,"children":1001},{"__ignoreMap":8},[1002],{"type":30,"value":997},{"type":21,"tag":26,"props":1004,"children":1005},{},[1006,1008,1014],{"type":30,"value":1007},"When a ranging market is detected (",{"type":21,"tag":495,"props":1009,"children":1011},{"className":1010},[],[1012],{"type":30,"value":1013},"trend=weak",{"type":30,"value":1015},"), the system:",{"type":21,"tag":264,"props":1017,"children":1018},{},[1019,1024],{"type":21,"tag":268,"props":1020,"children":1021},{},[1022],{"type":30,"value":1023},"Tightens grid spacing for more frequent trades",{"type":21,"tag":268,"props":1025,"children":1026},{},[1027],{"type":30,"value":1028},"Increases order size by 1.3x to capture more profit per cycle",{"type":21,"tag":26,"props":1030,"children":1031},{},[1032,1034,1040],{"type":30,"value":1033},"When a trend is detected (",{"type":21,"tag":495,"props":1035,"children":1037},{"className":1036},[],[1038],{"type":30,"value":1039},"trend=moderate_up",{"type":30,"value":1015},{"type":21,"tag":264,"props":1042,"children":1043},{},[1044,1049],{"type":21,"tag":268,"props":1045,"children":1046},{},[1047],{"type":30,"value":1048},"Widens grid spacing to avoid being left 
behind",{"type":21,"tag":268,"props":1050,"children":1051},{},[1052],{"type":30,"value":1053},"Reduces order size to 0.85x to limit directional exposure",{"type":21,"tag":673,"props":1055,"children":1057},{"id":1056},"grid-adjustment-example",[1058],{"type":30,"value":1059},"Grid Adjustment Example",{"type":21,"tag":26,"props":1061,"children":1062},{},[1063],{"type":30,"value":1064},"The system logged this automatic adjustment:",{"type":21,"tag":490,"props":1066,"children":1068},{"code":1067},"SOLUSDT: Ranging market - tighten grid for more trading frequency (conf: 100%)\nSpacing: 3.00% → 2.40%\nOrder Size: $16.20 → $16.20\n",[1069],{"type":21,"tag":495,"props":1070,"children":1071},{"__ignoreMap":8},[1072],{"type":30,"value":1067},{"type":21,"tag":44,"props":1074,"children":1076},{"id":1075},"architecture-centralized-derivative-service",[1077],{"type":30,"value":1078},"Architecture: Centralized Derivative Service",{"type":21,"tag":26,"props":1080,"children":1081},{},[1082,1084,1089],{"type":30,"value":1083},"One important optimization: I use a ",{"type":21,"tag":272,"props":1085,"children":1086},{},[1087],{"type":30,"value":1088},"centralized derivative service",{"type":30,"value":1090}," that calculates derivatives once per symbol, shared across all trading logic. 
This avoids redundant calculations:",{"type":21,"tag":490,"props":1092,"children":1095},{"code":1093,"language":895,"meta":8,"className":1094},"class CentralizedDerivativeService:\n    \"\"\"Singleton service - one calculation per symbol for all consumers\"\"\"\n    \n    _instance = None\n    \n    @classmethod\n    def get_instance(cls, binance_client):\n        if cls._instance is None:\n            cls._instance = cls(binance_client)\n        return cls._instance\n    \n    async def get_signals_for_symbol(self, symbol: str) -> dict:\n        \"\"\"Get derivative signals - calculated once, used everywhere\"\"\"\n        if symbol not in self.optimizers:\n            return None\n        return self.optimizers[symbol].get_derivative_signals(use_cache=True)\n",[897],[1096],{"type":21,"tag":495,"props":1097,"children":1098},{"__ignoreMap":8},[1099],{"type":30,"value":1093},{"type":21,"tag":26,"props":1101,"children":1102},{},[1103],{"type":30,"value":1104},"The service:",{"type":21,"tag":1106,"props":1107,"children":1108},"ol",{},[1109,1114,1119,1124],{"type":21,"tag":268,"props":1110,"children":1111},{},[1112],{"type":30,"value":1113},"Fetches price every 30 seconds",{"type":21,"tag":268,"props":1115,"children":1116},{},[1117],{"type":30,"value":1118},"Calculates derivatives once",{"type":21,"tag":268,"props":1120,"children":1121},{},[1122],{"type":30,"value":1123},"Caches results for 15 seconds",{"type":21,"tag":268,"props":1125,"children":1126},{},[1127],{"type":30,"value":1128},"Multiple consumers (grid manager, analytics, protection service) all read the same cached values",{"type":21,"tag":44,"props":1130,"children":1132},{"id":1131},"data-persistence",[1133],{"type":30,"value":1134},"Data Persistence",{"type":21,"tag":26,"props":1136,"children":1137},{},[1138],{"type":30,"value":1139},"All derivative calculations are stored in PostgreSQL for 
analysis:",{"type":21,"tag":490,"props":1141,"children":1146},{"code":1142,"language":1143,"meta":8,"className":1144},"CREATE TABLE derivative_calculations (\n    id SERIAL PRIMARY KEY,\n    symbol VARCHAR(20),\n    timestamp TIMESTAMPTZ,\n    momentum DECIMAL(20, 10),\n    acceleration DECIMAL(20, 10),\n    smoothed_momentum DECIMAL(20, 10),\n    trend_strength VARCHAR(20),\n    volatility DECIMAL(10, 6)\n);\n","sql",[1145],"language-sql",[1147],{"type":21,"tag":495,"props":1148,"children":1149},{"__ignoreMap":8},[1150],{"type":30,"value":1142},{"type":21,"tag":26,"props":1152,"children":1153},{},[1154],{"type":30,"value":1155},"This enables backtesting and threshold tuning based on historical data.",{"type":21,"tag":44,"props":1157,"children":1159},{"id":1158},"key-lessons-learned",[1160],{"type":30,"value":1161},"Key Lessons Learned",{"type":21,"tag":1106,"props":1163,"children":1164},{},[1165,1175,1185,1195,1205],{"type":21,"tag":268,"props":1166,"children":1167},{},[1168,1173],{"type":21,"tag":272,"props":1169,"children":1170},{},[1171],{"type":30,"value":1172},"Smoothing matters",{"type":30,"value":1174},": Raw derivatives are noisy. Using a 5-period moving average for smoothed momentum reduces false signals significantly.",{"type":21,"tag":268,"props":1176,"children":1177},{},[1178,1183],{"type":21,"tag":272,"props":1179,"children":1180},{},[1181],{"type":30,"value":1182},"Require consecutive signals",{"type":30,"value":1184},": Don't adjust on a single spike. I require 5 consecutive strong signals (~2.5 minutes) before triggering a grid adjustment.",{"type":21,"tag":268,"props":1186,"children":1187},{},[1188,1193],{"type":21,"tag":272,"props":1189,"children":1190},{},[1191],{"type":30,"value":1192},"Thresholds need calibration",{"type":30,"value":1194},": What counts as \"strong\" momentum varies by asset. ETH and BTC can move 1% in a minute during news events. 
ADA might take 5 minutes to move 1%.",{"type":21,"tag":268,"props":1196,"children":1197},{},[1198,1203],{"type":21,"tag":272,"props":1199,"children":1200},{},[1201],{"type":30,"value":1202},"Cache aggressively",{"type":30,"value":1204},": Derivative calculations are called frequently. A 15-second cache prevents redundant numpy operations.",{"type":21,"tag":268,"props":1206,"children":1207},{},[1208,1213],{"type":21,"tag":272,"props":1209,"children":1210},{},[1211],{"type":30,"value":1212},"Reversals are probabilistic",{"type":30,"value":1214},": Even a high-confidence reversal signal only means \"momentum is slowing.\" Price could still continue in the same direction.",{"type":21,"tag":44,"props":1216,"children":1218},{"id":1217},"conclusion",[1219],{"type":30,"value":1220},"Conclusion",{"type":21,"tag":26,"props":1222,"children":1223},{},[1224],{"type":30,"value":1225},"Applying calculus to trading isn't just academic—it provides actionable signals for adaptive strategies. By treating price as a function of time and analyzing its derivatives, we can:",{"type":21,"tag":264,"props":1227,"children":1228},{},[1229,1234,1239,1244],{"type":21,"tag":268,"props":1230,"children":1231},{},[1232],{"type":30,"value":1233},"Detect trend changes 2-3 steps early",{"type":21,"tag":268,"props":1235,"children":1236},{},[1237],{"type":30,"value":1238},"Dynamically adjust grid spacing and order sizing",{"type":21,"tag":268,"props":1240,"children":1241},{},[1242],{"type":30,"value":1243},"Classify market regimes (trending vs. ranging)",{"type":21,"tag":268,"props":1245,"children":1246},{},[1247],{"type":30,"value":1248},"Improve profitability by trading more aggressively in favorable conditions",{"type":21,"tag":26,"props":1250,"children":1251},{},[1252],{"type":30,"value":1253},"The code in this article is simplified from my production system, but the core concepts are the same. 
If you're building any kind of algorithmic trading system, consider whether derivatives could help you anticipate rather than react to market changes.",{"title":8,"searchDepth":596,"depth":596,"links":1255},[1256,1257,1262,1263,1264,1268,1269,1272,1273,1274,1275],{"id":635,"depth":596,"text":638},{"id":663,"depth":596,"text":666,"children":1258},[1259,1261],{"id":675,"depth":1260,"text":678},3,{"id":747,"depth":1260,"text":750},{"id":883,"depth":596,"text":886},{"id":905,"depth":596,"text":908},{"id":925,"depth":596,"text":928,"children":1265},[1266,1267],{"id":936,"depth":1260,"text":939},{"id":951,"depth":1260,"text":954},{"id":966,"depth":596,"text":969},{"id":986,"depth":596,"text":989,"children":1270},[1271],{"id":1056,"depth":1260,"text":1059},{"id":1075,"depth":596,"text":1078},{"id":1131,"depth":596,"text":1134},{"id":1158,"depth":596,"text":1161},{"id":1217,"depth":596,"text":1220},"content:posts:derivative-optimization.md","posts/derivative-optimization.md","posts/derivative-optimization",{"_path":1280,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":1281,"description":1282,"date":1283,"draft":7,"tags":1284,"thumbnail":1287,"alt_description":1288,"slug":1289,"body":1290,"_type":604,"_id":2252,"_source":606,"_file":2253,"_stem":2254,"_extension":609},"/posts/a-guide-to-plink-data-in-sql","A Guide to Manage Bioinformatics Data in SQL Database","How to work with genotype data in databases","2025-01-04T00:00:00.000Z",[1143,1285,1286],"plink","bioinformatics","/img/a_guide_to_plink_data_in_sql.png","Getting better with databases as 
bioinformaticians","a-guide-to-plink-data-in-sql",{"type":18,"children":1291,"toc":2227},[1292,1310,1315,1320,1325,1343,1348,1354,1360,1365,1371,1376,1382,1387,1393,1398,1404,1409,1420,1425,1434,1444,1450,1455,1464,1473,1482,1491,1509,1514,1523,1532,1541,1550,1555,1564,1573,1582,1591,1600,1607,1612,1621,1630,1636,1641,1650,1657,1663,1668,1677,1686,1695,1704,1713,1720,1725,1731,1740,1745,1754,1759,1768,1775,1781,1786,1795,1800,1809,1818,1825,1831,1836,1845,1850,1856,1861,1870,1879,1888,1895,1901,1906,1915,1924,1933,1940,1945,1951,1967,1979,1986,1995,2000,2007,2012,2019,2024,2029,2038,2047,2056,2063,2068,2077,2085,2094,2099,2106,2112,2117,2126,2133,2138,2144,2149,2158,2167,2173,2178,2183,2201,2206,2211,2222],{"type":21,"tag":22,"props":1293,"children":1294},{},[1295,1300,1305],{"type":21,"tag":26,"props":1296,"children":1297},{},[1298],{"type":30,"value":1299},"Dedicated to Tamerlan.",{"type":21,"tag":26,"props":1301,"children":1302},{},[1303],{"type":30,"value":1304},"The world belongs to those who believe in the beauty of their dreams",{"type":21,"tag":26,"props":1306,"children":1307},{"align":34},[1308],{"type":30,"value":1309},"-- Not Random Indonesian Girl",{"type":21,"tag":26,"props":1311,"children":1312},{},[1313],{"type":30,"value":1314},"Many bioinformaticians excel at processing genetic data but have limited exposure to modern database practices. This tutorial aims to help laboratory specialists enhance their data management skills by building a practical SQLite database for PLINK genotype data.",{"type":21,"tag":26,"props":1316,"children":1317},{},[1318],{"type":30,"value":1319},"PLINK data, widely used in genetic analysis for applications like disease risk assessment and pharmacogenomics, typically exists in text-based formats. We'll demonstrate how to transform this data into a queryable SQL database using Python, following current best practices. 
This approach will introduce bioinformatics professionals to essential database skills while working with familiar genetic data.",{"type":21,"tag":26,"props":1321,"children":1322},{},[1323],{"type":30,"value":1324},"Our step-by-step guide will cover:",{"type":21,"tag":264,"props":1326,"children":1327},{},[1328,1333,1338],{"type":21,"tag":268,"props":1329,"children":1330},{},[1331],{"type":30,"value":1332},"Setting up a Python project for database operations",{"type":21,"tag":268,"props":1334,"children":1335},{},[1336],{"type":30,"value":1337},"Converting PLINK text files to SQLite format",{"type":21,"tag":268,"props":1339,"children":1340},{},[1341],{"type":30,"value":1342},"Accessing the database through DBeaver",{"type":21,"tag":26,"props":1344,"children":1345},{},[1346],{"type":30,"value":1347},"This tutorial is designed for bioinformaticians and other Data Clerks looking to expand their technical toolkit without disrupting their current workflow.",{"type":21,"tag":44,"props":1349,"children":1351},{"id":1350},"python-project-components",[1352],{"type":30,"value":1353},"Python Project Components",{"type":21,"tag":673,"props":1355,"children":1357},{"id":1356},"fastapi",[1358],{"type":30,"value":1359},"FastAPI",{"type":21,"tag":26,"props":1361,"children":1362},{},[1363],{"type":30,"value":1364},"Imagine our web application as a receptionist: whenever someone requests data, FastAPI handles the request quickly (hence the name). It makes it easy to create APIs, which are the way different programs talk to each other. In our example, when we want to store PLINK data in a database, FastAPI handles that request and sends back the result.",{"type":21,"tag":673,"props":1366,"children":1368},{"id":1367},"sqlmodel",[1369],{"type":30,"value":1370},"SQLModel",{"type":21,"tag":26,"props":1372,"children":1373},{},[1374],{"type":30,"value":1375},"Think of it as a translator between your Python code and your database. 
It helps you work with your database and define a precise structure for your PLINK data. Experienced data specialists may recognize it as a layer built on top of SQLAlchemy and Pydantic.",{"type":21,"tag":673,"props":1377,"children":1379},{"id":1378},"uv",[1380],{"type":30,"value":1381},"UV",{"type":21,"tag":26,"props":1383,"children":1384},{},[1385],{"type":30,"value":1386},"Last but not least, UV is a Python package manager written in Rust that makes it easy to start a project quickly and cleanly. It can be considered an alternative to pip. It initializes a Git repository, creates a virtual environment, keeps track of your project dependencies, and much more.",{"type":21,"tag":44,"props":1388,"children":1390},{"id":1389},"set-up",[1391],{"type":30,"value":1392},"Set up",{"type":21,"tag":26,"props":1394,"children":1395},{},[1396],{"type":30,"value":1397},"First, open a terminal to install the components and set up the project by typing the following commands:",{"type":21,"tag":673,"props":1399,"children":1401},{"id":1400},"install-uv",[1402],{"type":30,"value":1403},"Install UV",{"type":21,"tag":26,"props":1405,"children":1406},{},[1407],{"type":30,"value":1408},"If using Linux or Windows:",{"type":21,"tag":490,"props":1410,"children":1415},{"className":1411,"code":1413,"language":1414,"meta":8},[1412],"language-bash","pip install uv\n","bash",[1416],{"type":21,"tag":495,"props":1417,"children":1418},{"__ignoreMap":8},[1419],{"type":30,"value":1413},{"type":21,"tag":26,"props":1421,"children":1422},{},[1423],{"type":30,"value":1424},"Or, if using Mac:",{"type":21,"tag":490,"props":1426,"children":1429},{"className":1427,"code":1428,"language":1414,"meta":8},[1412],"brew install 
uv\n",[1430],{"type":21,"tag":495,"props":1431,"children":1432},{"__ignoreMap":8},[1433],{"type":30,"value":1428},{"type":21,"tag":26,"props":1435,"children":1436},{},[1437,1442],{"type":21,"tag":516,"props":1438,"children":1441},{"alt":1439,"src":1440},"terminal_installation","/img/plink/plink_1.png",[],{"type":30,"value":1443},"\nIn my case I have it installed, so nothing really happens here after the prompt.",{"type":21,"tag":673,"props":1445,"children":1447},{"id":1446},"create-project",[1448],{"type":30,"value":1449},"Create Project",{"type":21,"tag":26,"props":1451,"children":1452},{},[1453],{"type":30,"value":1454},"Now let's initiate the project with UV",{"type":21,"tag":490,"props":1456,"children":1459},{"className":1457,"code":1458,"language":1414,"meta":8},[1412],"uv init plink_data\n",[1460],{"type":21,"tag":495,"props":1461,"children":1462},{"__ignoreMap":8},[1463],{"type":30,"value":1458},{"type":21,"tag":26,"props":1465,"children":1466},{},[1467,1471],{"type":21,"tag":516,"props":1468,"children":1470},{"alt":1439,"src":1469},"/img/plink/plink_2.png",[],{"type":30,"value":1472},"\nChange directory to a new project via \"cd plink_data\" and type \"ls\" to see files inside the project.",{"type":21,"tag":490,"props":1474,"children":1477},{"className":1475,"code":1476,"language":1414,"meta":8},[1412],"cd plink_data\nls\n",[1478],{"type":21,"tag":495,"props":1479,"children":1480},{"__ignoreMap":8},[1481],{"type":30,"value":1476},{"type":21,"tag":26,"props":1483,"children":1484},{},[1485,1489],{"type":21,"tag":516,"props":1486,"children":1488},{"alt":1439,"src":1487},"/img/plink/plink_3.png",[],{"type":30,"value":1490},"\nAs soon as we switched to plink_data project we can see three basic files 
here",{"type":21,"tag":264,"props":1492,"children":1493},{},[1494,1499,1504],{"type":21,"tag":268,"props":1495,"children":1496},{},[1497],{"type":30,"value":1498},"hello.py",{"type":21,"tag":268,"props":1500,"children":1501},{},[1502],{"type":30,"value":1503},"pyproject.toml",{"type":21,"tag":268,"props":1505,"children":1506},{},[1507],{"type":30,"value":1508},"README.md",{"type":21,"tag":26,"props":1510,"children":1511},{},[1512],{"type":30,"value":1513},"We also have an initialized Git repository. Let's explore it first:",{"type":21,"tag":490,"props":1515,"children":1518},{"className":1516,"code":1517,"language":1414,"meta":8},[1412],"git status\n",[1519],{"type":21,"tag":495,"props":1520,"children":1521},{"__ignoreMap":8},[1522],{"type":30,"value":1517},{"type":21,"tag":26,"props":1524,"children":1525},{},[1526,1530],{"type":21,"tag":516,"props":1527,"children":1529},{"alt":1439,"src":1528},"/img/plink/plink_4.png",[],{"type":30,"value":1531},"\nGit says we are on the master branch with no commits and a couple of untracked files. If you don't know what Git is, don't worry about it and let's carry on with our project. Let's kick it off:",{"type":21,"tag":490,"props":1533,"children":1536},{"className":1534,"code":1535,"language":1414,"meta":8},[1412],"uv run hello.py\n",[1537],{"type":21,"tag":495,"props":1538,"children":1539},{"__ignoreMap":8},[1540],{"type":30,"value":1535},{"type":21,"tag":26,"props":1542,"children":1543},{},[1544,1548],{"type":21,"tag":516,"props":1545,"children":1547},{"alt":1439,"src":1546},"/img/plink/plink_5.png",[],{"type":30,"value":1549},"\nWe just ran our project with CPython, created a virtual environment, and received greetings from the plink-data project. 
Good job so far!",{"type":21,"tag":26,"props":1551,"children":1552},{},[1553],{"type":30,"value":1554},"Now let's add our project components by running the following command:",{"type":21,"tag":490,"props":1556,"children":1559},{"className":1557,"code":1558,"language":1414,"meta":8},[1412],"uv add fastapi sqlmodel python-multipart uvicorn\n",[1560],{"type":21,"tag":495,"props":1561,"children":1562},{"__ignoreMap":8},[1563],{"type":30,"value":1558},{"type":21,"tag":26,"props":1565,"children":1566},{},[1567,1571],{"type":21,"tag":516,"props":1568,"children":1570},{"alt":1439,"src":1569},"/img/plink/plink_6.png",[],{"type":30,"value":1572},"\nAll components are installed, and we can synchronize them:",{"type":21,"tag":490,"props":1574,"children":1577},{"className":1575,"code":1576,"language":1414,"meta":8},[1412],"uv sync\n",[1578],{"type":21,"tag":495,"props":1579,"children":1580},{"__ignoreMap":8},[1581],{"type":30,"value":1576},{"type":21,"tag":26,"props":1583,"children":1584},{},[1585,1589],{"type":21,"tag":516,"props":1586,"children":1588},{"alt":1439,"src":1587},"/img/plink/plink_7.png",[],{"type":30,"value":1590},"\nWe can also view the project's dependency tree:",{"type":21,"tag":490,"props":1592,"children":1595},{"className":1593,"code":1594,"language":1414,"meta":8},[1412],"uv tree\n",[1596],{"type":21,"tag":495,"props":1597,"children":1598},{"__ignoreMap":8},[1599],{"type":30,"value":1594},{"type":21,"tag":26,"props":1601,"children":1602},{},[1603],{"type":21,"tag":516,"props":1604,"children":1606},{"alt":1439,"src":1605},"/img/plink/plink_8.png",[],{"type":21,"tag":26,"props":1608,"children":1609},{},[1610],{"type":30,"value":1611},"The tree shows our plink-data project and its components: fastapi depends on pydantic and starlette, sqlmodel depends on sqlalchemy, and so on. Now let's activate our Python virtual environment:",{"type":21,"tag":490,"props":1613,"children":1616},{"className":1614,"code":1615,"language":1414,"meta":8},[1412],". 
.venv/bin/activate\n",[1617],{"type":21,"tag":495,"props":1618,"children":1619},{"__ignoreMap":8},[1620],{"type":30,"value":1615},{"type":21,"tag":26,"props":1622,"children":1623},{},[1624,1628],{"type":21,"tag":516,"props":1625,"children":1627},{"alt":1439,"src":1626},"/img/plink/plink_9.png",[],{"type":30,"value":1629},"\nBy following these steps, we set up our project in a couple of minutes without spending time on creating a Git repository and a virtual environment or declaring our dependencies. UV did it for us, and that's badass. Now let's write some source code.",{"type":21,"tag":44,"props":1631,"children":1633},{"id":1632},"src",[1634],{"type":30,"value":1635},"SRC",{"type":21,"tag":26,"props":1637,"children":1638},{},[1639],{"type":30,"value":1640},"Let's create a source directory where the main Python code will live:",{"type":21,"tag":490,"props":1642,"children":1645},{"className":1643,"code":1644,"language":1414,"meta":8},[1412],"mkdir src\ncd src\n",[1646],{"type":21,"tag":495,"props":1647,"children":1648},{"__ignoreMap":8},[1649],{"type":30,"value":1644},{"type":21,"tag":26,"props":1651,"children":1652},{},[1653],{"type":21,"tag":516,"props":1654,"children":1656},{"alt":1439,"src":1655},"/img/plink/plink_10.png",[],{"type":21,"tag":673,"props":1658,"children":1660},{"id":1659},"database",[1661],{"type":30,"value":1662},"Database",{"type":21,"tag":26,"props":1664,"children":1665},{},[1666],{"type":30,"value":1667},"Here we need to define the database structure:",{"type":21,"tag":490,"props":1669,"children":1672},{"className":1670,"code":1671,"language":1414,"meta":8},[1412],"nano database.py\n",[1673],{"type":21,"tag":495,"props":1674,"children":1675},{"__ignoreMap":8},[1676],{"type":30,"value":1671},{"type":21,"tag":26,"props":1678,"children":1679},{},[1680,1684],{"type":21,"tag":516,"props":1681,"children":1683},{"alt":1439,"src":1682},"/img/plink/plink_11.png",[],{"type":30,"value":1685},"\nHere we would need to write 
following",{"type":21,"tag":490,"props":1687,"children":1690},{"className":1688,"code":1689,"language":1414,"meta":8},[1412],"from sqlmodel import SQLModel, create_engine\n\nDATABASE_URL = \"sqlite:///genotypes.db\"\nengine = create_engine(DATABASE_URL)\n\ndef create_db_and_tables():\n    SQLModel.metadata.create_all(engine)\n",[1691],{"type":21,"tag":495,"props":1692,"children":1693},{"__ignoreMap":8},[1694],{"type":30,"value":1689},{"type":21,"tag":26,"props":1696,"children":1697},{},[1698,1702],{"type":21,"tag":516,"props":1699,"children":1701},{"alt":1439,"src":1700},"/img/plink/plink_12.png",[],{"type":30,"value":1703},"\nThen press Ctrl + X, then \"Y\" and \"ENTER\" to save the content",{"type":21,"tag":490,"props":1705,"children":1708},{"className":1706,"code":1707,"language":1414,"meta":8},[1412],"cat database.py\n",[1709],{"type":21,"tag":495,"props":1710,"children":1711},{"__ignoreMap":8},[1712],{"type":30,"value":1707},{"type":21,"tag":26,"props":1714,"children":1715},{},[1716],{"type":21,"tag":516,"props":1717,"children":1719},{"alt":1439,"src":1718},"/img/plink/plink_13.png",[],{"type":21,"tag":26,"props":1721,"children":1722},{},[1723],{"type":30,"value":1724},"I actually use bat, but it is an additional tool that has to be installed first. cat gives you the same output, just without syntax highlighting.",{"type":21,"tag":673,"props":1726,"children":1728},{"id":1727},"models",[1729],{"type":30,"value":1730},"Models",{"type":21,"tag":490,"props":1732,"children":1735},{"className":1733,"code":1734,"language":1414,"meta":8},[1412],"nano models.py\n",[1736],{"type":21,"tag":495,"props":1737,"children":1738},{"__ignoreMap":8},[1739],{"type":30,"value":1734},{"type":21,"tag":26,"props":1741,"children":1742},{},[1743],{"type":30,"value":1744},"The following code creates a GenotypeData class, i.e. the PLINK data 
structure",{"type":21,"tag":490,"props":1746,"children":1749},{"className":1747,"code":1748,"language":1414,"meta":8},[1412],"from datetime import datetime\nfrom typing import Optional\n\nfrom sqlmodel import Field, SQLModel\n\nclass GenotypeData(SQLModel, table=True):\n    id: Optional[int] = Field(default=None, primary_key=True)\n    family_id: str = Field(index=True)\n    individual_id: str = Field(index=True)\n    paternal_id: str\n    maternal_id: str\n    sex: int\n    phenotype: int\n    snp1: str\n    snp2: str\n    snp3: str\n    snp4: str\n    snp5: str\n    uploaded_at: datetime = Field(default_factory=datetime.utcnow)\n",[1750],{"type":21,"tag":495,"props":1751,"children":1752},{"__ignoreMap":8},[1753],{"type":30,"value":1748},{"type":21,"tag":26,"props":1755,"children":1756},{},[1757],{"type":30,"value":1758},"Save it with Ctrl + X, press \"Y\" and \"ENTER\", and check the content",{"type":21,"tag":490,"props":1760,"children":1763},{"className":1761,"code":1762,"language":1414,"meta":8},[1412],"cat models.py\n",[1764],{"type":21,"tag":495,"props":1765,"children":1766},{"__ignoreMap":8},[1767],{"type":30,"value":1762},{"type":21,"tag":26,"props":1769,"children":1770},{},[1771],{"type":21,"tag":516,"props":1772,"children":1774},{"alt":1439,"src":1773},"/img/plink/plink_14.png",[],{"type":21,"tag":673,"props":1776,"children":1778},{"id":1777},"main",[1779],{"type":30,"value":1780},"Main",{"type":21,"tag":26,"props":1782,"children":1783},{},[1784],{"type":30,"value":1785},"Create main python file",{"type":21,"tag":490,"props":1787,"children":1790},{"className":1788,"code":1789,"language":1414,"meta":8},[1412],"nano main.py\n",[1791],{"type":21,"tag":495,"props":1792,"children":1793},{"__ignoreMap":8},[1794],{"type":30,"value":1789},{"type":21,"tag":26,"props":1796,"children":1797},{},[1798],{"type":30,"value":1799},"Pass the following 
code",{"type":21,"tag":490,"props":1801,"children":1804},{"className":1802,"code":1803,"language":1414,"meta":8},[1412],"from fastapi import FastAPI, UploadFile\nfrom sqlmodel import Session\n\nfrom .database import create_db_and_tables, engine\nfrom .models import GenotypeData\n\n\napp = FastAPI()\n\n\n@app.on_event(\"startup\")\ndef on_startup():\n    # Create the SQLite file and tables before the first request\n    create_db_and_tables()\n\n\n@app.post(\"/upload/\")\nasync def upload_file(file: UploadFile):\n    content = (await file.read()).decode()\n    with Session(engine) as session:\n        for line in content.splitlines():\n            fields = line.strip().split()\n            if not fields:  # Skip empty lines\n                continue\n\n            # Columns 0-5 are sample metadata; each later pair of columns is one SNP\n            genotype_data = GenotypeData(\n                family_id=fields[0],\n                individual_id=fields[1],\n                paternal_id=fields[2],\n                maternal_id=fields[3],\n                sex=int(fields[4]),\n                phenotype=int(fields[5]),\n                snp1=f\"{fields[6]} {fields[7]}\",\n                snp2=f\"{fields[8]} {fields[9]}\",\n                snp3=f\"{fields[10]} {fields[11]}\",\n                snp4=f\"{fields[12]} {fields[13]}\",\n                snp5=f\"{fields[14]} {fields[15]}\",\n            )\n            session.add(genotype_data)\n        # Commit once, while the session is still open\n        session.commit()\n\n    return {\"message\": f\"Data from {file.filename} uploaded successfully\"}\n",[1805],{"type":21,"tag":495,"props":1806,"children":1807},{"__ignoreMap":8},[1808],{"type":30,"value":1803},{"type":21,"tag":490,"props":1810,"children":1813},{"className":1811,"code":1812,"language":1414,"meta":8},[1412],"cat main.py\n",[1814],{"type":21,"tag":495,"props":1815,"children":1816},{"__ignoreMap":8},[1817],{"type":30,"value":1812},{"type":21,"tag":26,"props":1819,"children":1820},{},[1821],{"type":21,"tag":516,"props":1822,"children":1824},{"alt":1439,"src":1823},"/img/plink/plink_15.png",[],{"type":21,"tag":673,"props":1826,"children":1828},{"id":1827},"init",[1829],{"type":30,"value":1830},"Init",{"type":21,"tag":26,"props":1832,"children":1833},{},[1834],{"type":30,"value":1835},"We also need a simple init file so that Python treats the whole src directory as a Python 
package",{"type":21,"tag":490,"props":1837,"children":1840},{"className":1838,"code":1839,"language":1414,"meta":8},[1412],"touch __init__.py\n",[1841],{"type":21,"tag":495,"props":1842,"children":1843},{"__ignoreMap":8},[1844],{"type":30,"value":1839},{"type":21,"tag":26,"props":1846,"children":1847},{},[1848],{"type":30,"value":1849},"And that's it.",{"type":21,"tag":673,"props":1851,"children":1853},{"id":1852},"create-a-sample-data-or-use-your-own",[1854],{"type":30,"value":1855},"Create a sample data or use your own",{"type":21,"tag":26,"props":1857,"children":1858},{},[1859],{"type":30,"value":1860},"I will create a sample file to ingest. If you have your own PLINK data, feel free to put your samples into the same folder we are working in:",{"type":21,"tag":490,"props":1862,"children":1865},{"className":1863,"code":1864,"language":1414,"meta":8},[1412],"nano sample.txt\n",[1866],{"type":21,"tag":495,"props":1867,"children":1868},{"__ignoreMap":8},[1869],{"type":30,"value":1864},{"type":21,"tag":490,"props":1871,"children":1874},{"className":1872,"code":1873,"language":1414,"meta":8},[1412],"FAM1    IND1    0    0    1    2    A A    G G    A C    T T    A G\nFAM1    IND2    0    0    2    2    A G    G T    C C    T T    G G\nFAM2    IND3    0    0    1    1    G G    T T    C C    A T    G G\nFAM2    IND4    0    0    2    1    A G    G T    0 0    T T    A G\nFAM3    IND5    0    0    1    2    A A    G G    C C    T T    G G\n",[1875],{"type":21,"tag":495,"props":1876,"children":1877},{"__ignoreMap":8},[1878],{"type":30,"value":1873},{"type":21,"tag":490,"props":1880,"children":1883},{"className":1881,"code":1882,"language":1414,"meta":8},[1412],"cat 
sample.txt\n",[1884],{"type":21,"tag":495,"props":1885,"children":1886},{"__ignoreMap":8},[1887],{"type":30,"value":1882},{"type":21,"tag":26,"props":1889,"children":1890},{},[1891],{"type":21,"tag":516,"props":1892,"children":1894},{"alt":1439,"src":1893},"/img/plink/plink_16.png",[],{"type":21,"tag":673,"props":1896,"children":1898},{"id":1897},"upload-sample",[1899],{"type":30,"value":1900},"Upload sample",{"type":21,"tag":26,"props":1902,"children":1903},{},[1904],{"type":30,"value":1905},"First, launch the application with uvicorn. Because the command refers to src.main, run it from the project root (the directory that contains src):",{"type":21,"tag":490,"props":1907,"children":1910},{"className":1908,"code":1909,"language":1414,"meta":8},[1412],"uvicorn src.main:app --reload\n",[1911],{"type":21,"tag":495,"props":1912,"children":1913},{"__ignoreMap":8},[1914],{"type":30,"value":1909},{"type":21,"tag":26,"props":1916,"children":1917},{},[1918,1922],{"type":21,"tag":516,"props":1919,"children":1921},{"alt":1439,"src":1920},"/img/plink/plink_17.png",[],{"type":30,"value":1923},"\nCool! The app is live and running. One nuance: we have to keep this terminal running as it is and open another terminal to ingest the file.\nIn the new terminal, from the folder that contains sample.txt, run the following command:",{"type":21,"tag":490,"props":1925,"children":1928},{"className":1926,"code":1927,"language":1414,"meta":8},[1412],"curl -X POST -F \"file=@sample.txt\" http://localhost:8000/upload/\n",[1929],{"type":21,"tag":495,"props":1930,"children":1931},{"__ignoreMap":8},[1932],{"type":30,"value":1927},{"type":21,"tag":26,"props":1934,"children":1935},{},[1936],{"type":21,"tag":516,"props":1937,"children":1939},{"alt":1439,"src":1938},"/img/plink/plink_18.png",[],{"type":21,"tag":26,"props":1941,"children":1942},{},[1943],{"type":30,"value":1944},"Congrats! 
Your data has been ingested.",{"type":21,"tag":44,"props":1946,"children":1948},{"id":1947},"read-data-using-sql",[1949],{"type":30,"value":1950},"Read data using SQL",{"type":21,"tag":26,"props":1952,"children":1953},{},[1954,1956,1965],{"type":30,"value":1955},"First you need a program that lets you query your database with SQL. My go-to tool is DBeaver, but you can use any other program, such as DataGrip. I already have it installed; if you don't, go to the official website to ",{"type":21,"tag":1957,"props":1958,"children":1962},"a",{"href":1959,"rel":1960},"https://dbeaver.io/download/",[1961],"nofollow",[1963],{"type":30,"value":1964},"download",{"type":30,"value":1966}," and install it. The Community edition is free.",{"type":21,"tag":26,"props":1968,"children":1969},{},[1970,1972,1977],{"type":30,"value":1971},"This is what the interface looks like. Click the socket icon with the + sign to add the database\n",{"type":21,"tag":516,"props":1973,"children":1976},{"alt":1974,"src":1975},"dbeaver_interface","/img/plink/plink_19.png",[],{"type":30,"value":1978},"\nChoose SQLite and press Next",{"type":21,"tag":26,"props":1980,"children":1981},{},[1982],{"type":21,"tag":516,"props":1983,"children":1985},{"alt":1974,"src":1984},"/img/plink/plink_20.png",[],{"type":21,"tag":26,"props":1987,"children":1988},{},[1989,1991],{"type":30,"value":1990},"Press Open\n",{"type":21,"tag":516,"props":1992,"children":1994},{"alt":1974,"src":1993},"/img/plink/plink_21.png",[],{"type":21,"tag":26,"props":1996,"children":1997},{},[1998],{"type":30,"value":1999},"Then choose the genotype.db file, press Open, and then Finish",{"type":21,"tag":26,"props":2001,"children":2002},{},[2003],{"type":21,"tag":516,"props":2004,"children":2006},{"alt":1974,"src":2005},"/img/plink/plink_22.png",[],{"type":21,"tag":26,"props":2008,"children":2009},{},[2010],{"type":30,"value":2011},"Look at the bar where the genotypes.db connection is chosen instead of N/A. 
You have to explicitly choose it.",{"type":21,"tag":26,"props":2013,"children":2014},{},[2015],{"type":21,"tag":516,"props":2016,"children":2018},{"alt":1974,"src":2017},"/img/plink/plink_23.png",[],{"type":21,"tag":673,"props":2020,"children":2021},{"id":1143},[2022],{"type":30,"value":2023},"SQL",{"type":21,"tag":26,"props":2025,"children":2026},{},[2027],{"type":30,"value":2028},"Now we can run some basic SELECT statements, like so",{"type":21,"tag":490,"props":2030,"children":2033},{"className":2031,"code":2032,"language":1143,"meta":8},[1145],"SELECT * FROM genotypedata\n",[2034],{"type":21,"tag":495,"props":2035,"children":2036},{"__ignoreMap":8},[2037],{"type":30,"value":2032},{"type":21,"tag":26,"props":2039,"children":2040},{},[2041,2045],{"type":21,"tag":516,"props":2042,"children":2044},{"alt":1974,"src":2043},"/img/plink/plink_24.png",[],{"type":30,"value":2046},"\nNow that we have all the data at hand, let's explore some DML (Data Manipulation Language) functionality. For example, we might need to see how many individuals are in each family",{"type":21,"tag":490,"props":2048,"children":2051},{"className":2049,"code":2050,"language":1143,"meta":8},[1145],"SELECT\n family_id,\n COUNT(*) as individual_count\nFROM genotypedata\nGROUP BY family_id;\n",[2052],{"type":21,"tag":495,"props":2053,"children":2054},{"__ignoreMap":8},[2055],{"type":30,"value":2050},{"type":21,"tag":26,"props":2057,"children":2058},{},[2059],{"type":21,"tag":516,"props":2060,"children":2062},{"alt":1974,"src":2061},"/img/plink/plink_25.png",[],{"type":21,"tag":26,"props":2064,"children":2065},{},[2066],{"type":30,"value":2067},"Or let's say we want to see only females with phenotype 2",{"type":21,"tag":490,"props":2069,"children":2072},{"className":2070,"code":2071,"language":1143,"meta":8},[1145],"SELECT *\nFROM genotypedata\nWHERE sex = 2\nAND phenotype = 
2;\n",[2073],{"type":21,"tag":495,"props":2074,"children":2075},{"__ignoreMap":8},[2076],{"type":30,"value":2071},{"type":21,"tag":26,"props":2078,"children":2079},{},[2080],{"type":21,"tag":516,"props":2081,"children":2084},{"alt":2082,"src":2083},"dbeaver interface","/img/plink/plink_26.png",[],{"type":21,"tag":490,"props":2086,"children":2089},{"className":2087,"code":2088,"language":1143,"meta":8},[1145],"SELECT \n family_id, \n COUNT(*) as total_records, \n SUM(CASE WHEN sex = 1 THEN 1 ELSE 0 END) as male_count, \n SUM(CASE WHEN sex = 2 THEN 1 ELSE 0 END) as female_count \nFROM genotypedata\nGROUP BY family_id;\n",[2090],{"type":21,"tag":495,"props":2091,"children":2092},{"__ignoreMap":8},[2093],{"type":30,"value":2088},{"type":21,"tag":26,"props":2095,"children":2096},{},[2097],{"type":30,"value":2098},"Get total records and split by sex",{"type":21,"tag":26,"props":2100,"children":2101},{},[2102],{"type":21,"tag":516,"props":2103,"children":2105},{"alt":1974,"src":2104},"/img/plink/plink_27.png",[],{"type":21,"tag":673,"props":2107,"children":2109},{"id":2108},"advanced-sql",[2110],{"type":30,"value":2111},"Advanced SQL",{"type":21,"tag":26,"props":2113,"children":2114},{},[2115],{"type":30,"value":2116},"Let's say we want to see Genotype Distribution by Phenotype analyzing relationships between genotypes and phenotypes",{"type":21,"tag":490,"props":2118,"children":2121},{"className":2119,"code":2120,"language":1143,"meta":8},[1145],"SELECT\n phenotype,\n snp1,\n COUNT(*) as count,\n ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY phenotype), 2) as percentage\nFROM genotypedata\nGROUP BY phenotype, snp1\nORDER BY phenotype, count 
DESC;\n",[2122],{"type":21,"tag":495,"props":2123,"children":2124},{"__ignoreMap":8},[2125],{"type":30,"value":2120},{"type":21,"tag":26,"props":2127,"children":2128},{},[2129],{"type":21,"tag":516,"props":2130,"children":2132},{"alt":1974,"src":2131},"/img/plink/plink_28.png",[],{"type":21,"tag":26,"props":2134,"children":2135},{},[2136],{"type":30,"value":2137},"We can see that for phenotype 1 the snp1 genotypes are evenly split 50 / 50, but not for phenotype 2, where the distribution is 67 / 33",{"type":21,"tag":673,"props":2139,"children":2141},{"id":2140},"hardy-weinberg-equilibrium-hwe-check",[2142],{"type":30,"value":2143},"Hardy-Weinberg Equilibrium (HWE) Check",{"type":21,"tag":26,"props":2145,"children":2146},{},[2147],{"type":30,"value":2148},"It's based on a fundamental principle: in a stable population, the frequency of genotypes should follow a predictable pattern unless something is interfering.",{"type":21,"tag":490,"props":2150,"children":2153},{"className":2151,"code":2152,"language":1143,"meta":8},[1145],"WITH allele_counts AS (\n SELECT\n  COUNT(*) as total,\n  SUM(CASE WHEN snp1 LIKE 'A A' THEN 1 ELSE 0 END) as AA,\n  SUM(CASE WHEN snp1 LIKE 'A G' OR snp1 LIKE 'G A' THEN 1 ELSE 0 END) as AG,\n  SUM(CASE WHEN snp1 LIKE 'G G' THEN 1 ELSE 0 END) as GG\n FROM genotypedata\n)\nSELECT\n AA as observed_AA,\n AG as observed_AG,\n GG as observed_GG,\n ROUND(POWER((2*AA + AG)/(2.0*total), 2) * total, 2) as expected_AA,\n ROUND(2 * ((2*AA + AG)/(2.0*total)) * ((2*GG + AG)/(2.0*total)) * total, 2) as expected_AG,\n ROUND(POWER((2*GG + AG)/(2.0*total), 2) * total, 2) as expected_GG\nFROM allele_counts;\n",[2154],{"type":21,"tag":495,"props":2155,"children":2156},{"__ignoreMap":8},[2157],{"type":30,"value":2152},{"type":21,"tag":26,"props":2159,"children":2160},{},[2161,2165],{"type":21,"tag":516,"props":2162,"children":2164},{"alt":2082,"src":2163},"/img/plink/plink_29.png",[],{"type":30,"value":2166},"\nThe differences between observed and expected aren't 
large, but noticeable enough to warrant attention in quality control processes.",{"type":21,"tag":44,"props":2168,"children":2170},{"id":2169},"wrapping-up-from-lab-benches-to-database-queries",[2171],{"type":30,"value":2172},"Wrapping Up: From Lab Benches to Database Queries 🧬",{"type":21,"tag":26,"props":2174,"children":2175},{},[2176],{"type":30,"value":2177},"We've come a long way from those text-based PLINK files to a fully functional SQL database. Pretty cool transformation, right?",{"type":21,"tag":26,"props":2179,"children":2180},{},[2181],{"type":30,"value":2182},"Here's what you've accomplished:",{"type":21,"tag":264,"props":2184,"children":2185},{},[2186,2191,2196],{"type":21,"tag":268,"props":2187,"children":2188},{},[2189],{"type":30,"value":2190},"Set up a modern Python project faster than you can say \"nucleotide sequencing\"",{"type":21,"tag":268,"props":2192,"children":2193},{},[2194],{"type":30,"value":2195},"Transformed genetic data into queryable gold using SQLite",{"type":21,"tag":268,"props":2197,"children":2198},{},[2199],{"type":30,"value":2200},"Learned to use SQL queries (and even tackled Hardy-Weinberg equilibrium!)",{"type":21,"tag":26,"props":2202,"children":2203},{},[2204],{"type":30,"value":2205},"The best part? This is just the beginning. With your genetic data now living in a proper database, you've opened up a whole new world of possibilities for analysis and collaboration.",{"type":21,"tag":26,"props":2207,"children":2208},{},[2209],{"type":30,"value":2210},"Keep experimenting, keep querying, and most importantly - keep pushing the boundaries of what's possible with your data!",{"type":21,"tag":26,"props":2212,"children":2213},{},[2214,2216,2220],{"type":30,"value":2215},"Yours,",{"type":21,"tag":2217,"props":2218,"children":2219},"br",{},[],{"type":30,"value":2221},"\nBad Dog",{"type":21,"tag":26,"props":2223,"children":2224},{},[2225],{"type":30,"value":2226},"P.S. 
Remember: Every great bioinformatician started somewhere. Today, that somewhere was turning PLINK files into SQL magic! 🪄",{"title":8,"searchDepth":596,"depth":596,"links":2228},[2229,2234,2238,2246,2251],{"id":1350,"depth":596,"text":1353,"children":2230},[2231,2232,2233],{"id":1356,"depth":1260,"text":1359},{"id":1367,"depth":1260,"text":1370},{"id":1378,"depth":1260,"text":1381},{"id":1389,"depth":596,"text":1392,"children":2235},[2236,2237],{"id":1400,"depth":1260,"text":1403},{"id":1446,"depth":1260,"text":1449},{"id":1632,"depth":596,"text":1635,"children":2239},[2240,2241,2242,2243,2244,2245],{"id":1659,"depth":1260,"text":1662},{"id":1727,"depth":1260,"text":1730},{"id":1777,"depth":1260,"text":1780},{"id":1827,"depth":1260,"text":1830},{"id":1852,"depth":1260,"text":1855},{"id":1897,"depth":1260,"text":1900},{"id":1947,"depth":596,"text":1950,"children":2247},[2248,2249,2250],{"id":1143,"depth":1260,"text":2023},{"id":2108,"depth":1260,"text":2111},{"id":2140,"depth":1260,"text":2143},{"id":2169,"depth":596,"text":2172},"content:posts:a-guide-to-plink-data-in-sql.md","posts/a-guide-to-plink-data-in-sql.md","posts/a-guide-to-plink-data-in-sql",{"_path":2256,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":2257,"description":2258,"date":2259,"draft":7,"tags":2260,"thumbnail":2264,"alt_description":2265,"slug":2266,"body":2267,"_type":604,"_id":2572,"_source":606,"_file":2573,"_stem":2574,"_extension":609},"/posts/fastest-way-to-upload-data-into-postgresql","The Fastest Way to Upload Data into PostgreSQL","Learn how to significantly speed up your PostgreSQL data uploads by switching from INSERT to COPY method","2024-10-27T00:00:00.000Z",[2261,2262,2263],"data-engineering","postgresql","airflow","/img/the_fastest_way_to_upload_data.png","Uploading data into PostgreSQL using COPY 
method","fastest-way-to-upload-data-into-postgresql",{"type":18,"children":2268,"toc":2562},[2269,2282,2287,2293,2306,2311,2320,2325,2334,2339,2345,2350,2359,2364,2370,2424,2429,2435,2441,2484,2490,2533,2537,2542,2554],{"type":21,"tag":22,"props":2270,"children":2271},{},[2272,2277],{"type":21,"tag":26,"props":2273,"children":2274},{},[2275],{"type":30,"value":2276},"According to the Pareto principle, 20% of your code do 80% of compilation",{"type":21,"tag":26,"props":2278,"children":2279},{},[2280],{"type":30,"value":2281},"                                                                            -- Christian Mayer, The Art of Clean Code",{"type":21,"tag":26,"props":2283,"children":2284},{},[2285],{"type":30,"value":2286},"Recently I changed jobs, from Data Analyst at a Big Tech company to Data Product Manager at an enterprise. I was freaking out with FOMO, worried that an enterprise company would not challenge me enough to stay sharp on technical approaches to creative solutions. I gotta say, I was wrong: my learning curve is as steep as it should be for anyone who changes jobs to seize a learning opportunity.",{"type":21,"tag":44,"props":2288,"children":2290},{"id":2289},"the-problem-with-pandas-default-insert",[2291],{"type":30,"value":2292},"The Problem with Pandas Default Insert",{"type":21,"tag":26,"props":2294,"children":2295},{},[2296,2298,2304],{"type":30,"value":2297},"Recently our Data Team faced an interesting challenge. Our Airflow DAG was taking forever to upload a DataFrame into PostgreSQL. The culprit? 
The default pandas ",{"type":21,"tag":495,"props":2299,"children":2301},{"className":2300},[],[2302],{"type":30,"value":2303},"to_sql()",{"type":30,"value":2305}," method that uses INSERT statements.",{"type":21,"tag":26,"props":2307,"children":2308},{},[2309],{"type":30,"value":2310},"Here's what happens under the hood when you use the default INSERT approach:",{"type":21,"tag":490,"props":2312,"children":2315},{"className":2313,"code":2314,"language":895,"meta":8},[897],"df.to_sql('table_name', engine, if_exists='append')\n",[2316],{"type":21,"tag":495,"props":2317,"children":2318},{"__ignoreMap":8},[2319],{"type":30,"value":2314},{"type":21,"tag":26,"props":2321,"children":2322},{},[2323],{"type":30,"value":2324},"This innocent-looking line generates something like this for EACH row:",{"type":21,"tag":490,"props":2326,"children":2329},{"className":2327,"code":2328,"language":1143,"meta":8},[1145],"INSERT INTO table_name (col1, col2, col3) \nVALUES ('value1', 'value2', 'value3');\n",[2330],{"type":21,"tag":495,"props":2331,"children":2332},{"__ignoreMap":8},[2333],{"type":30,"value":2328},{"type":21,"tag":26,"props":2335,"children":2336},{},[2337],{"type":30,"value":2338},"Imagine doing this millions of times! Each INSERT statement requires a round trip to the database. It's like delivering packages one by one instead of using a container ship. No wonder our DAG was running slower than my previous employer's internet connection.",{"type":21,"tag":44,"props":2340,"children":2342},{"id":2341},"enter-the-copy-method",[2343],{"type":30,"value":2344},"Enter the COPY Method",{"type":21,"tag":26,"props":2346,"children":2347},{},[2348],{"type":30,"value":2349},"One of the greatest minds on our team came up with the thought: \"I may have realized the fastest way to upload a DataFrame into PostgreSQL\". 
Here's the core of the solution:",{"type":21,"tag":490,"props":2351,"children":2354},{"className":2352,"code":2353,"language":895,"meta":8},[897],"import csv\nfrom io import StringIO\n\n\ndef psql_insert_copy(table, conn, keys, data_iter):\n    # Intended as a callable for DataFrame.to_sql(method=...)\n    dbapi_conn = conn.connection\n    with dbapi_conn.cursor() as cur:\n        # Serialize the rows into an in-memory CSV buffer\n        s_buf = StringIO()\n        writer = csv.writer(s_buf)\n        writer.writerows(data_iter)\n        s_buf.seek(0)\n\n        columns = ', '.join('\"{}\"'.format(k) for k in keys)\n        # table.schema is None when no schema is passed to to_sql()\n        if table.schema:\n            table_name = '{}.{}'.format(table.schema, table.name)\n        else:\n            table_name = table.name\n        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(\n            table_name, columns)\n        cur.copy_expert(sql=sql, file=s_buf)\n",[2355],{"type":21,"tag":495,"props":2356,"children":2357},{"__ignoreMap":8},[2358],{"type":30,"value":2353},{"type":21,"tag":26,"props":2360,"children":2361},{},[2362],{"type":30,"value":2363},"The idea is to use PostgreSQL's COPY command: this beast can handle bulk data loading like it's nothing. Instead of sending individual INSERT statements, COPY streams the data in a single transaction. 
It's like upgrading from a bicycle courier to a cargo plane!",{"type":21,"tag":44,"props":2365,"children":2367},{"id":2366},"the-results-speak-for-themselves",[2368],{"type":30,"value":2369},"The Results Speak for Themselves",{"type":21,"tag":2371,"props":2372,"children":2375},"div",{"className":2373},[2374],"table-container",[2376],{"type":21,"tag":93,"props":2377,"children":2378},{},[2379,2395],{"type":21,"tag":97,"props":2380,"children":2381},{},[2382],{"type":21,"tag":101,"props":2383,"children":2384},{},[2385,2390],{"type":21,"tag":105,"props":2386,"children":2387},{},[2388],{"type":30,"value":2389},"Method",{"type":21,"tag":105,"props":2391,"children":2392},{},[2393],{"type":30,"value":2394},"Time to Upload 1M Rows",{"type":21,"tag":127,"props":2396,"children":2397},{},[2398,2411],{"type":21,"tag":101,"props":2399,"children":2400},{},[2401,2406],{"type":21,"tag":134,"props":2402,"children":2403},{},[2404],{"type":30,"value":2405},"INSERT",{"type":21,"tag":134,"props":2407,"children":2408},{},[2409],{"type":30,"value":2410},"~20 minutes",{"type":21,"tag":101,"props":2412,"children":2413},{},[2414,2419],{"type":21,"tag":134,"props":2415,"children":2416},{},[2417],{"type":30,"value":2418},"COPY",{"type":21,"tag":134,"props":2420,"children":2421},{},[2422],{"type":30,"value":2423},"~20 seconds",{"type":21,"tag":26,"props":2425,"children":2426},{},[2427],{"type":30,"value":2428},"Yes, you read that right. What used to take about 20 minutes now completes in about 20 seconds. 
Our DBAs finally stopped giving us the evil eye during peak load times.",{"type":21,"tag":44,"props":2430,"children":2432},{"id":2431},"lets-break-it-down",[2433],{"type":30,"value":2434},"Let's break it down",{"type":21,"tag":673,"props":2436,"children":2438},{"id":2437},"insert-method",[2439],{"type":30,"value":2440},"INSERT Method",{"type":21,"tag":264,"props":2442,"children":2443},{},[2444,2449,2454,2459,2464,2469,2474,2479],{"type":21,"tag":268,"props":2445,"children":2446},{},[2447],{"type":30,"value":2448},"Simple to implement",{"type":21,"tag":268,"props":2450,"children":2451},{},[2452],{"type":30,"value":2453},"Good for small datasets",{"type":21,"tag":268,"props":2455,"children":2456},{},[2457],{"type":30,"value":2458},"Better for real-time row-by-row updates",{"type":21,"tag":268,"props":2460,"children":2461},{},[2462],{"type":30,"value":2463},"Easier error handling per row",{"type":21,"tag":268,"props":2465,"children":2466},{},[2467],{"type":30,"value":2468},"Painfully slow for bulk uploads",{"type":21,"tag":268,"props":2470,"children":2471},{},[2472],{"type":30,"value":2473},"Creates heavy network traffic",{"type":21,"tag":268,"props":2475,"children":2476},{},[2477],{"type":30,"value":2478},"Causes database connection overhead",{"type":21,"tag":268,"props":2480,"children":2481},{},[2482],{"type":30,"value":2483},"Makes DBAs cry",{"type":21,"tag":673,"props":2485,"children":2487},{"id":2486},"copy-method",[2488],{"type":30,"value":2489},"COPY Method",{"type":21,"tag":264,"props":2491,"children":2492},{},[2493,2498,2503,2508,2513,2518,2523,2528],{"type":21,"tag":268,"props":2494,"children":2495},{},[2496],{"type":30,"value":2497},"Blazing fast for bulk uploads",{"type":21,"tag":268,"props":2499,"children":2500},{},[2501],{"type":30,"value":2502},"Minimal network overhead",{"type":21,"tag":268,"props":2504,"children":2505},{},[2506],{"type":30,"value":2507},"Single 
transaction",{"type":21,"tag":268,"props":2509,"children":2510},{},[2511],{"type":30,"value":2512},"Makes DBAs smile",{"type":21,"tag":268,"props":2514,"children":2515},{},[2516],{"type":30,"value":2517},"More complex implementation",{"type":21,"tag":268,"props":2519,"children":2520},{},[2521],{"type":30,"value":2522},"All-or-nothing transaction",{"type":21,"tag":268,"props":2524,"children":2525},{},[2526],{"type":30,"value":2527},"Harder to handle individual row errors",{"type":21,"tag":268,"props":2529,"children":2530},{},[2531],{"type":30,"value":2532},"Not suitable for real-time updates",{"type":21,"tag":44,"props":2534,"children":2535},{"id":1217},[2536],{"type":30,"value":1220},{"type":21,"tag":26,"props":2538,"children":2539},{},[2540],{"type":30,"value":2541},"If you're dealing with bulk data uploads in PostgreSQL, switching from INSERT to COPY is like upgrading from a Honda Civic to a Ferrari (without the expensive maintenance). Just remember - with great power comes great responsibility. 
Make sure your data is clean before attempting the upload, as COPY is an all-or-nothing operation.",{"type":21,"tag":26,"props":2543,"children":2544},{},[2545,2547],{"type":30,"value":2546},"The full implementation and comparison are available in Askin Tamanli's ",{"type":21,"tag":1957,"props":2548,"children":2551},{"href":2549,"rel":2550},"https://github.com/askintamanli/Fastest-Methods-to-Bulk-Insert-Pandas-Dataframe-into-PostgreSQL",[1961],[2552],{"type":30,"value":2553},"repository",{"type":21,"tag":26,"props":2555,"children":2556},{},[2557,2558,2561],{"type":30,"value":2215},{"type":21,"tag":2217,"props":2559,"children":2560},{},[],{"type":30,"value":2221},{"title":8,"searchDepth":596,"depth":596,"links":2563},[2564,2565,2566,2567,2571],{"id":2289,"depth":596,"text":2292},{"id":2341,"depth":596,"text":2344},{"id":2366,"depth":596,"text":2369},{"id":2431,"depth":596,"text":2434,"children":2568},[2569,2570],{"id":2437,"depth":1260,"text":2440},{"id":2486,"depth":1260,"text":2489},{"id":1217,"depth":596,"text":1220},"content:posts:fastest-way-to-upload-data-into-postgresql.md","posts/fastest-way-to-upload-data-into-postgresql.md","posts/fastest-way-to-upload-data-into-postgresql",{"_path":2576,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":2577,"description":2578,"date":2579,"draft":7,"tags":2580,"thumbnail":2584,"alt_description":2585,"slug":2586,"body":2587,"_type":604,"_id":2796,"_source":606,"_file":2797,"_stem":2798,"_extension":609},"/posts/ai-powered-assistant-for-self-service-analytics","AI-powered Assistant for Self-Service Analytics","Building Self-Service Analytics using telegram bot and ChatGPT","2024-05-26T00:00:00.000Z",[2581,2582,2583],"nlp","ai","self-service","/img/ai_powered_assistant_for_self_service_analytics.png","Using ChatGPT NLP model for self-service 
analytics","ai-powered-assistant-for-self-service-analytics",{"type":18,"children":2588,"toc":2789},[2589,2602,2607,2613,2627,2633,2638,2646,2679,2685,2690,2698,2703,2711,2716,2724,2729,2735,2740,2745,2750,2762,2766,2771,2781],{"type":21,"tag":22,"props":2590,"children":2591},{},[2592,2597],{"type":21,"tag":26,"props":2593,"children":2594},{},[2595],{"type":30,"value":2596},"You would never be able to hire enough data professionals to meet the data demands of the business, so instead, why not turn the business into data professionals?\"",{"type":21,"tag":26,"props":2598,"children":2599},{},[2600],{"type":30,"value":2601},"                                                                                    -- Random Guy from LinkedIn",{"type":21,"tag":26,"props":2603,"children":2604},{},[2605],{"type":30,"value":2606},"In small start-ups, the demand for data products is ever-growing. As the Lead Data Analyst, I've developed numerous solutions, from automated email reporting systems and comprehensive dashboards integrating external data sources to real-time fraud detection algorithms. However, maintaining these projects makes it challenging to develop new solutions. With budget constraints preventing the hiring of more data professionals, why not empower the business itself to be a Data Product? 
This article explains how to build Self-Service Analytics using a Telegram bot and ChatGPT's Natural Language Processing (NLP).",{"type":21,"tag":44,"props":2608,"children":2610},{"id":2609},"prerequisites",[2611],{"type":30,"value":2612},"Prerequisites",{"type":21,"tag":26,"props":2614,"children":2615},{},[2616,2618,2625],{"type":30,"value":2617},"As mentioned in my ",{"type":21,"tag":1957,"props":2619,"children":2622},{"href":2620,"rel":2621},"https://blog.baddogdata.com/how-to-build-data-warehouse-in-activity-schema-with-clickhouse",[1961],[2623],{"type":30,"value":2624},"previous article",{"type":30,"value":2626},", as a Data Professional, the first step is ensuring denormalization. Data Marts and Data Warehouses are crucial utilities for your Data Team. Having all data in one place makes it easier to deliver business objectives. Once you have consistent data in one place, you are ready to implement AI into your infrastructure.",{"type":21,"tag":44,"props":2628,"children":2630},{"id":2629},"user-to-assistant-interaction",[2631],{"type":30,"value":2632},"User to Assistant Interaction",{"type":21,"tag":26,"props":2634,"children":2635},{},[2636],{"type":30,"value":2637},"The following scheme explains how self-service analytics works in 6 simple steps.",{"type":21,"tag":26,"props":2639,"children":2640},{},[2641],{"type":21,"tag":516,"props":2642,"children":2645},{"alt":2643,"src":2644},"User and NLP interaction scheme","/img/img39.png",[],{"type":21,"tag":1106,"props":2647,"children":2648},{},[2649,2654,2659,2664,2669,2674],{"type":21,"tag":268,"props":2650,"children":2651},{},[2652],{"type":30,"value":2653},"User sends a request for data in a specific format.",{"type":21,"tag":268,"props":2655,"children":2656},{},[2657],{"type":30,"value":2658},"Backend logic utilizes system prompts so NLP understands user intention.",{"type":21,"tag":268,"props":2660,"children":2661},{},[2662],{"type":30,"value":2663},"NLP returns the SQL query to the 
backend.",{"type":21,"tag":268,"props":2665,"children":2666},{},[2667],{"type":30,"value":2668},"Backend logic uses SQL to query the Data Warehouse.",{"type":21,"tag":268,"props":2670,"children":2671},{},[2672],{"type":30,"value":2673},"Data Warehouse responds with the relevant Data Frame.",{"type":21,"tag":268,"props":2675,"children":2676},{},[2677],{"type":30,"value":2678},"Backend performs feature processing to return the data in the requested format.",{"type":21,"tag":44,"props":2680,"children":2682},{"id":2681},"user-story",[2683],{"type":30,"value":2684},"User Story",{"type":21,"tag":26,"props":2686,"children":2687},{},[2688],{"type":30,"value":2689},"Imagine a Digital Marketing Specialist named Zhanibek wants to see the sessions by month. He opens the Telegram chat and asks, \"show me sessions by month\".",{"type":21,"tag":26,"props":2691,"children":2692},{},[2693],{"type":21,"tag":516,"props":2694,"children":2697},{"alt":2695,"src":2696},"Chat Bot interaction","/img/img40.png",[],{"type":21,"tag":26,"props":2699,"children":2700},{},[2701],{"type":30,"value":2702},"The response happens in milliseconds. Zhanibek is very excited; he just retrieved the necessary data without even talking to the Data Team. 
Now, Zhanibek wants to use this data in a spreadsheet, so he asks, \"show me engaged sessions by month in excel\".",{"type":21,"tag":26,"props":2704,"children":2705},{},[2706],{"type":21,"tag":516,"props":2707,"children":2710},{"alt":2708,"src":2709},"Chat Bot export to excel","/img/img41.png",[],{"type":21,"tag":26,"props":2712,"children":2713},{},[2714],{"type":30,"value":2715},"Zhanibek is amazed by the export to Excel feature, but he realizes that a spreadsheet is not convenient enough to comprehend the trends, so he requests, \"show me the chart of activeUsers\".",{"type":21,"tag":26,"props":2717,"children":2718},{},[2719],{"type":21,"tag":516,"props":2720,"children":2723},{"alt":2721,"src":2722},"Chat Bot outputs chart","/img/img42.png",[],{"type":21,"tag":26,"props":2725,"children":2726},{},[2727],{"type":30,"value":2728},"The number of features depends on your actual goal. If needed you can also add a forecasting feature. However, my goal is to resolve ad-hoc requests so my Data Team can focus on product metrics instead of calling the Google Analytics API every time someone needs the current sessions. So I focused mainly on three features - Querying Database, Export to Excel and Matplotlib Charts.",{"type":21,"tag":44,"props":2730,"children":2732},{"id":2731},"concerns",[2733],{"type":30,"value":2734},"Concerns",{"type":21,"tag":26,"props":2736,"children":2737},{},[2738],{"type":30,"value":2739},"It's surprising that tech companies often resist implementing AI. The CTO may warn that the lack of control over NLP may lead to critical DDL statements or even SQL injections. The Head of Cyber Security might warn that calling third-party APIs could leak classified information.",{"type":21,"tag":26,"props":2741,"children":2742},{},[2743],{"type":30,"value":2744},"These are concerns you must address and manage. 
For example, set read-only credentials for the database account used by the backend logic, so no DELETE operations are permitted.",{"type":21,"tag":26,"props":2746,"children":2747},{},[2748],{"type":30,"value":2749},"The NLP model doesn't need to see the actual data, although it works better if a data sample is provided. You can synthesize the data sample so the NLP model doesn't access classified data.",{"type":21,"tag":26,"props":2751,"children":2752},{},[2753,2760],{"type":21,"tag":1957,"props":2754,"children":2757},{"href":2755,"rel":2756},"https://github.com/ydataai/ydata-synthetic",[1961],[2758],{"type":30,"value":2759},"YdataAI",{"type":30,"value":2761}," is a highly recommended tool for synthesizing data samples.",{"type":21,"tag":44,"props":2763,"children":2764},{"id":1217},[2765],{"type":30,"value":1220},{"type":21,"tag":26,"props":2767,"children":2768},{},[2769],{"type":30,"value":2770},"Empowering business users with self-service analytics through AI can significantly reduce the burden on your Data Team. By addressing potential security concerns and providing robust tools, you can enable your business to become more data-savvy and self-reliant. 
This approach not only enhances efficiency but also ensures that your data professionals can focus on delivering Product Metrics.",{"type":21,"tag":26,"props":2772,"children":2773},{},[2774],{"type":21,"tag":1957,"props":2775,"children":2778},{"href":2776,"rel":2777},"https://github.com/AkzhanBerdi/telechat",[1961],[2779],{"type":30,"value":2780},"GitHub Repo",{"type":21,"tag":26,"props":2782,"children":2783},{},[2784,2785,2788],{"type":30,"value":2215},{"type":21,"tag":2217,"props":2786,"children":2787},{},[],{"type":30,"value":2221},{"title":8,"searchDepth":596,"depth":596,"links":2790},[2791,2792,2793,2794,2795],{"id":2609,"depth":596,"text":2612},{"id":2629,"depth":596,"text":2632},{"id":2681,"depth":596,"text":2684},{"id":2731,"depth":596,"text":2734},{"id":1217,"depth":596,"text":1220},"content:posts:ai-powered-assistant-for-self-service-analytics.md","posts/ai-powered-assistant-for-self-service-analytics.md","posts/ai-powered-assistant-for-self-service-analytics",{"_path":2800,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":2801,"description":2802,"date":2803,"draft":7,"tags":2804,"thumbnail":2807,"alt_description":2808,"slug":2809,"body":2810,"_type":604,"_id":3347,"_source":606,"_file":3348,"_stem":3349,"_extension":609},"/posts/how-to-build-data-warehouse-in-activity-schema-with-clickhouse","Building Data Warehouse in Activity Schema with ClickHouse","How to Create a Data Warehouse in Activity-Schema with OLAP database ClickHouse on your local computer","2024-04-07T00:00:00.000Z",[2805,2261,2806],"coding","data-piplines","/img/how_to_build_data_warehouse_in_activity_schema_with_clickhouse.png","Building Data Ware House using 
ClickHouse","how-to-build-data-warehouse-in-activity-schema-with-clickhouse",{"type":18,"children":2811,"toc":3337},[2812,2824,2829,2837,2843,2848,2853,2858,2863,2869,2874,2882,2887,2892,2897,2903,2908,2916,2930,2935,2941,2955,2964,2969,2977,2982,2991,2999,3004,3013,3021,3026,3035,3043,3048,3057,3065,3070,3098,3103,3112,3120,3125,3131,3136,3145,3150,3159,3164,3173,3182,3187,3195,3200,3206,3211,3220,3225,3233,3239,3248,3253,3262,3270,3275,3284,3292,3297,3305,3309,3314,3319,3324,3329],{"type":21,"tag":22,"props":2813,"children":2814},{},[2815,2820],{"type":21,"tag":26,"props":2816,"children":2817},{},[2818],{"type":30,"value":2819},"\"It doesn't matter who you hire first, whether it's a Data Scientist, Data Analyst, or ML Engineer. The first hire eventually would do Data Engineering job.\"",{"type":21,"tag":26,"props":2821,"children":2822},{},[2823],{"type":30,"value":2601},{"type":21,"tag":26,"props":2825,"children":2826},{},[2827],{"type":30,"value":2828},"I learned about Activity-Schema when I applied for a Data Engineering role at a company that does Data Analysis as a Service. They work with a lot of data, so the technical task was to implement an Activity-Schema and build a Data Pipeline. 
This article introduces Activity-Schema for Data Warehousing and shows how to build one using ClickHouse, so let's kick off.",{"type":21,"tag":26,"props":2830,"children":2831},{},[2832],{"type":21,"tag":516,"props":2833,"children":2836},{"alt":2834,"src":2835},"Data Pipeline Schema, modeling and querying data","/img/img25.png",[],{"type":21,"tag":44,"props":2838,"children":2840},{"id":2839},"understanding-data-warehouses-vs-traditional-databases",[2841],{"type":30,"value":2842},"Understanding Data Warehouses vs Traditional Databases",{"type":21,"tag":26,"props":2844,"children":2845},{},[2846],{"type":30,"value":2847},"When it comes to working with data, we've got two essential storage types. The first is the Traditional Database, which stores day-to-day transactions and handles so-called OLTP, or On-Line Transactional Processing. For instance, you open your bank app to check your balance. You go through authentication first, so the app knows who you are. Then you go to the balance page, and for the given authentication ID the app retrieves exactly your balance rather than someone else's.",{"type":21,"tag":26,"props":2849,"children":2850},{},[2851],{"type":30,"value":2852},"An OLTP database is essential for any web application that needs to perform CRUD operations as fast as possible. On the other hand, imagine that you are a newly hired BI Analyst at a random company. You ask what data storages the company has, and your manager answers: \"oh yeah, there is one called Oracle\". This is the moment when you should realize that you screwed up.",{"type":21,"tag":26,"props":2854,"children":2855},{},[2856],{"type":30,"value":2857},"The reason you screwed up is that OLTP databases, as good as they are for keeping your product up and running, are not optimized for data analysis at all. 
Therefore, you're either forced to work in a non-optimized environment or you have to build an optimized one yourself.",{"type":21,"tag":26,"props":2859,"children":2860},{},[2861],{"type":30,"value":2862},"The good news is that you are reading exactly the right article to build an OLAP (On-Line Analytical Processing) storage, the so-called Data Warehouse, which is optimized for Data Analysis. For that purpose we will not use a traditional Star-Schema, but rather a relatively new concept called Activity-Schema.",{"type":21,"tag":44,"props":2864,"children":2866},{"id":2865},"why-not-to-use-a-star-schema",[2867],{"type":30,"value":2868},"Why not to use a Star Schema",{"type":21,"tag":26,"props":2870,"children":2871},{},[2872],{"type":30,"value":2873},"So what is a schema in the first place? When working with databases you have a certain architecture of tables and of how they relate to one another. It's usually a matter of scale, but the simplest example of a Star-Schema would look like this:",{"type":21,"tag":26,"props":2875,"children":2876},{},[2877],{"type":21,"tag":516,"props":2878,"children":2881},{"alt":2879,"src":2880},"Star-Schema explained","/img/img26.png",[],{"type":21,"tag":26,"props":2883,"children":2884},{},[2885],{"type":30,"value":2886},"A Fact table is a transactional table that, say, accumulates order records, and Dimension tables are the ones that feed Fact tables with details like the client's personal data, the delivery address, and literally anything else. It gets more complex when you start generating the Primary and Foreign keys that are used to link tables, and even the links themselves have to be further defined as one-to-many or many-to-many relationships. Considering that orders is just one of the countless fact tables a business may operate, this schema gets complex quickly.",{"type":21,"tag":26,"props":2888,"children":2889},{},[2890],{"type":30,"value":2891},"Splitting data across many related tables like this is called normalization. 
The more tables and relationships between them, the more normalized your data infrastructure becomes. And it works well in OLTP. But if you want to query a table with millions of rows and analyze it further, identifying Primary and Foreign keys and writing complex SQL statements with a lot of dependencies to take into account is a painful experience. And that, my hermano, is no bueno.",{"type":21,"tag":26,"props":2893,"children":2894},{},[2895],{"type":30,"value":2896},"So, in order to optimize such a normalized infrastructure for analytics, you have to reverse engineer its architecture and denormalize it. You don't have to be an IT architecture guy to do that, and that's where Activity-Schema comes into play.",{"type":21,"tag":44,"props":2898,"children":2900},{"id":2899},"so-what-is-activity-schema-and-why-to-use-it",[2901],{"type":30,"value":2902},"So what is Activity Schema and why to use it",{"type":21,"tag":26,"props":2904,"children":2905},{},[2906],{"type":30,"value":2907},"Activity-Schema, in contrast to the Star-Schema, doesn't have Primary-Foreign key relationships or any other dependencies, for one good reason: it has only a single table, following the \"One-Big-Table\" approach, and instead of dimension tables it has a nested key/value column.",{"type":21,"tag":26,"props":2909,"children":2910},{},[2911],{"type":21,"tag":516,"props":2912,"children":2915},{"alt":2913,"src":2914},"Activity-Schema explained","/img/img27.png",[],{"type":21,"tag":26,"props":2917,"children":2918},{},[2919,2921,2928],{"type":30,"value":2920},"If you read the official ",{"type":21,"tag":1957,"props":2922,"children":2925},{"href":2923,"rel":2924},"https://www.activityschema.com/",[1961],[2926],{"type":30,"value":2927},"documentation",{"type":30,"value":2929},", then this column is called \"Feature\". However, for some reason, I call it 'Attributes'. It doesn't really matter what you call this column. 
In the end, it's not the name but the data type that is crucial. As you can see, the value in this column may be reminiscent of JSON, but in this guide we will use ClickHouse's Map data type instead of something like jsonb in Postgres.",{"type":21,"tag":26,"props":2931,"children":2932},{},[2933],{"type":30,"value":2934},"This approach is one of the most denormalized, since there are no dependencies between tables. In fact, the only join that is possible with Activity-Schema is the self-join. This table is going to be enormous, and for that reason you should consider a column-oriented database that is optimized to handle large amounts of data, like DuckDB or ClickHouse.",{"type":21,"tag":44,"props":2936,"children":2938},{"id":2937},"set-up-your-clickhouse",[2939],{"type":30,"value":2940},"Set-up your ClickHouse",{"type":21,"tag":26,"props":2942,"children":2943},{},[2944,2946,2953],{"type":30,"value":2945},"For the sake of simplicity, we will use our local machine in this tutorial, whereas in the real world a Data Warehouse should be deployed somewhere in the cloud. With this in mind, let's assume that our laptop is a powerful on-premise server. It just so happens that my laptop runs Linux. 
If you are a Windows guy, then my previous article on ",{"type":21,"tag":1957,"props":2947,"children":2950},{"href":2948,"rel":2949},"https://blog.baddogdata.com/how-to-learn-coding",[1961],[2951],{"type":30,"value":2952},"how to learn coding",{"type":30,"value":2954},", where I've included the installation of WSL2 (Windows Subsystem for Linux), would help you keep up with this guide.",{"type":21,"tag":490,"props":2956,"children":2959},{"className":2957,"code":2958,"language":1414,"meta":8},[1412],"curl https://clickhouse.com/ | sh\n",[2960],{"type":21,"tag":495,"props":2961,"children":2962},{"__ignoreMap":8},[2963],{"type":30,"value":2958},{"type":21,"tag":26,"props":2965,"children":2966},{},[2967],{"type":30,"value":2968},"With the power of the Terminal, run the above bash command, and that should download the ClickHouse binary for you. It will take a while to download, and then it should display \"Successfully Downloaded\".",{"type":21,"tag":26,"props":2970,"children":2971},{},[2972],{"type":21,"tag":516,"props":2973,"children":2976},{"alt":2974,"src":2975},"Console log after installing ClickHouse","/img/img28.png",[],{"type":21,"tag":26,"props":2978,"children":2979},{},[2980],{"type":30,"value":2981},"Your next move after installation is to run the ClickHouse Server with the following command.",{"type":21,"tag":490,"props":2983,"children":2986},{"className":2984,"code":2985,"language":1414,"meta":8},[1412],"./clickhouse server\n",[2987],{"type":21,"tag":495,"props":2988,"children":2989},{"__ignoreMap":8},[2990],{"type":30,"value":2985},{"type":21,"tag":26,"props":2992,"children":2993},{},[2994],{"type":21,"tag":516,"props":2995,"children":2998},{"alt":2996,"src":2997},"Console log after running the server","/img/img29.png",[],{"type":21,"tag":26,"props":3000,"children":3001},{},[3002],{"type":30,"value":3003},"Then open a new terminal where you will run the ClickHouse 
Client",{"type":21,"tag":490,"props":3005,"children":3008},{"className":3006,"code":3007,"language":1414,"meta":8},[1412],"./clickhouse client\n",[3009],{"type":21,"tag":495,"props":3010,"children":3011},{"__ignoreMap":8},[3012],{"type":30,"value":3007},{"type":21,"tag":26,"props":3014,"children":3015},{},[3016],{"type":21,"tag":516,"props":3017,"children":3020},{"alt":3018,"src":3019},"Console log after running the client","/img/img30.png",[],{"type":21,"tag":26,"props":3022,"children":3023},{},[3024],{"type":30,"value":3025},"I'm not even trolling: the ClickHouse command line has a smiley face as its prompt, isn't it fun? The next ClickHouse commands should be run on the Client side to communicate with your Server. Let's create a database and name it 'activity'.",{"type":21,"tag":490,"props":3027,"children":3030},{"className":3028,"code":3029,"language":1143,"meta":8},[1145],"CREATE DATABASE activity\n",[3031],{"type":21,"tag":495,"props":3032,"children":3033},{"__ignoreMap":8},[3034],{"type":30,"value":3029},{"type":21,"tag":26,"props":3036,"children":3037},{},[3038],{"type":21,"tag":516,"props":3039,"children":3042},{"alt":3040,"src":3041},"Console log after creating a database","/img/img31.png",[],{"type":21,"tag":26,"props":3044,"children":3045},{},[3046],{"type":30,"value":3047},"This indicates a successful database creation. Now let's create the stream table and define each column and its data type. 
And as mentioned before, we are going to use the Map data type for the attributes column.",{"type":21,"tag":490,"props":3049,"children":3052},{"className":3050,"code":3051,"language":1143,"meta":8},[1145],"CREATE TABLE activity.stream (\n    timestamp DateTime,\n    activity_id UUID,\n    activity String,\n    entity String,\n    attributes Map(String, String)\n) ENGINE = MergeTree()\nORDER BY timestamp;\n",[3053],{"type":21,"tag":495,"props":3054,"children":3055},{"__ignoreMap":8},[3056],{"type":30,"value":3051},{"type":21,"tag":26,"props":3058,"children":3059},{},[3060],{"type":21,"tag":516,"props":3061,"children":3064},{"alt":3062,"src":3063},"Console log after creating a table","/img/img32.png",[],{"type":21,"tag":26,"props":3066,"children":3067},{},[3068],{"type":30,"value":3069},"Here, we have defined the columns for the Activity-Schema:",{"type":21,"tag":264,"props":3071,"children":3072},{},[3073,3078,3083,3088,3093],{"type":21,"tag":268,"props":3074,"children":3075},{},[3076],{"type":30,"value":3077},"timestamp - the date and time when the activity occurred;",{"type":21,"tag":268,"props":3079,"children":3080},{},[3081],{"type":30,"value":3082},"activity_id - the unique identifier of the activity;",{"type":21,"tag":268,"props":3084,"children":3085},{},[3086],{"type":30,"value":3087},"activity - the topic or nature of the data;",{"type":21,"tag":268,"props":3089,"children":3090},{},[3091],{"type":30,"value":3092},"entity - the subject of the activity;",{"type":21,"tag":268,"props":3094,"children":3095},{},[3096],{"type":30,"value":3097},"attributes - the set of key/value pairs that belong to the entity.",{"type":21,"tag":26,"props":3099,"children":3100},{},[3101],{"type":30,"value":3102},"Now let's take a look at our table with this line of code",{"type":21,"tag":490,"props":3104,"children":3107},{"className":3105,"code":3106,"language":1143,"meta":8},[1145],"DESCRIBE TABLE 
activity.stream\n",[3108],{"type":21,"tag":495,"props":3109,"children":3110},{"__ignoreMap":8},[3111],{"type":30,"value":3106},{"type":21,"tag":26,"props":3113,"children":3114},{},[3115],{"type":21,"tag":516,"props":3116,"children":3119},{"alt":3117,"src":3118},"Consol-Log describing the table","/img/img33.png",[],{"type":21,"tag":26,"props":3121,"children":3122},{},[3123],{"type":30,"value":3124},"The Activity-Schema is now ready to ingest some data.",{"type":21,"tag":44,"props":3126,"children":3128},{"id":3127},"extracting-data-from-google-analytics",[3129],{"type":30,"value":3130},"Extracting Data From Google Analytics",{"type":21,"tag":26,"props":3132,"children":3133},{},[3134],{"type":30,"value":3135},"Now that we have created our stream table in Activity-Schema, let's ingest some data. I'd use Google Analytics API to ingest the events that are coming to my blog. In order to do that I open a new Terminal to create a virtual environment like so",{"type":21,"tag":490,"props":3137,"children":3140},{"className":3138,"code":3139,"language":1414,"meta":8},[1412],"python3 -m venv .venv\n",[3141],{"type":21,"tag":495,"props":3142,"children":3143},{"__ignoreMap":8},[3144],{"type":30,"value":3139},{"type":21,"tag":26,"props":3146,"children":3147},{},[3148],{"type":30,"value":3149},"then activating it using this ...",{"type":21,"tag":490,"props":3151,"children":3154},{"className":3152,"code":3153,"language":1414,"meta":8},[1412],"source .venv/bin/activate\n",[3155],{"type":21,"tag":495,"props":3156,"children":3157},{"__ignoreMap":8},[3158],{"type":30,"value":3153},{"type":21,"tag":26,"props":3160,"children":3161},{},[3162],{"type":30,"value":3163},"and finally installing dependencies for API requests, and ClickHouse connections.",{"type":21,"tag":490,"props":3165,"children":3168},{"className":3166,"code":3167,"language":1414,"meta":8},[1412],"pip install google-analytics-data clickhouse-connect 
google-auth-oauthlib pandas\n",[3169],{"type":21,"tag":495,"props":3170,"children":3171},{"__ignoreMap":8},[3172],{"type":30,"value":3167},{"type":21,"tag":490,"props":3174,"children":3177},{"className":3175,"code":3176,"language":895,"meta":8},[897],"from google.analytics.data_v1beta import BetaAnalyticsDataClient\nfrom google.oauth2 import service_account\nfrom google.analytics.data_v1beta.types import (\n    DateRange,\n    Dimension,\n    Metric,\n    RunReportRequest,\n)\nimport clickhouse_connect\nimport pandas as pd\nfrom datetime import date, timedelta\nfrom typing import Dict, Any\n\n# Defining key to access our GA4 project\nKEY = 'key.json'\n\n# Defining main function to access GA4 data\n# row_limit caps the number of returned rows; 10000 is an assumed default\ndef extract_data(property_id='4#12$6&0%', key_file_path=KEY, row_limit=10000):\n    # Defining credentials with our KEY\n    credentials = service_account.Credentials.from_service_account_file(\n        key_file_path,\n        scopes=[\"https://www.googleapis.com/auth/analytics.readonly\"]\n    )\n\n    # Creating client instance \n    client = BetaAnalyticsDataClient(credentials=credentials)\n\n    # Using RunReportRequest to define what we want from Google\n    request = RunReportRequest(\n        property=f\"properties/{property_id}\",\n        dimensions=[\n            Dimension(name='date'),\n            Dimension(name='sessionDefaultChannelGroup'),\n            Dimension(name='country'),\n        ],\n        metrics=[\n            Metric(name=\"sessions\"),\n            Metric(name=\"activeUsers\"),\n            Metric(name=\"engagedSessions\"),\n            Metric(name=\"bounceRate\"),\n        ],\n        date_ranges=[DateRange(start_date='2024-01-01', end_date=f'{date.today() + timedelta(days = -1)}')],\n        limit=row_limit\n    )\n\n    # Main code that runs report and saves the values into the rows[]\n    try:\n        response = client.run_report(request)\n\n        rows = []\n        for row in response.rows:\n            rows.append([\n                
row.dimension_values[0].value,\n                row.dimension_values[1].value,\n                row.dimension_values[2].value,\n                row.metric_values[0].value,\n                row.metric_values[1].value,\n                row.metric_values[2].value,\n                row.metric_values[3].value,\n            ])\n\n        # Column order matches the dimensions and metrics requested above\n        columns = ['date', 'channel', 'country', 'sessions', 'activeUsers', 'engagedSessions', 'bounceRate']\n        df = pd.DataFrame(rows, columns=columns)\n        df = df.sort_values(by=['date'])\n        df['date'] = pd.to_datetime(df['date'])\n\n        # Printing our metrics in the console before returning it\n        print(df)\n        return df\n\n    except Exception as e:\n        print(f\"Error: {e}\")\n\n# Calling our function, should be removed in the next Transform step\nextract_data()\n",[3178],{"type":21,"tag":495,"props":3179,"children":3180},{"__ignoreMap":8},[3181],{"type":30,"value":3176},{"type":21,"tag":26,"props":3183,"children":3184},{},[3185],{"type":30,"value":3186},"Running this script returns a DataFrame of our traffic, like so",{"type":21,"tag":26,"props":3188,"children":3189},{},[3190],{"type":21,"tag":516,"props":3191,"children":3194},{"alt":3192,"src":3193},"VSCode terminal after extracting data","/img/img34.png",[],{"type":21,"tag":26,"props":3196,"children":3197},{},[3198],{"type":30,"value":3199},"Be aware that you won't be able to access my blog's data, since you don't have the credentials for it, so you might want to refactor this script to access any other data available to you.",{"type":21,"tag":44,"props":3201,"children":3203},{"id":3202},"transforming-data-into-activity-schema",[3204],{"type":30,"value":3205},"Transforming Data into Activity Schema",{"type":21,"tag":26,"props":3207,"children":3208},{},[3209],{"type":30,"value":3210},"As you can see, the returned DataFrame consists of 7 columns and 298 rows. It's not exactly Big Data, but it's a good sample for us to play around with. Now our task is to transform this data into a format that fits the Activity-Schema, which has only 5 columns. 
Let's do it with pandas",{"type":21,"tag":490,"props":3212,"children":3215},{"className":3213,"code":3214,"language":895,"meta":8},[897],"\"\"\" \n    Remove the last line of code from previous script\n    Which is calling extract_data() function \n    Replace it with this block of code\n    To eliminate log prints\n\"\"\"\n\nimport pandas as pd\nfrom typing import Dict, Any\n\n# This function helps to convert any incoming data type into a string\ndef convert_data_to_string(dictionary: Dict[Any, Any]) -> Dict[str, str]:\n    return {k: str(v) for k, v in dictionary.items()}\n\n# Main transform function\ndef transform_data(df: pd.DataFrame = None) -> pd.DataFrame:\n    # Extract at call time instead of at function-definition time\n    if df is None:\n        df = extract_data()\n    df['activity'] = 'INCOMING_TRAFFIC'\n    df['timestamp'] = df['date']\n    df['entity'] = df['country']\n    df['attributes'] = df[\n        ['channel', 'sessions', 'activeUsers', 'engagedSessions', 'bounceRate']].to_dict(orient='records')\n    # .copy() avoids pandas' SettingWithCopyWarning on the assignment below\n    df = df[['timestamp', 'activity', 'entity', 'attributes']].copy()\n    df['attributes'] = df['attributes'].apply(convert_data_to_string)\n    print(df)\n    return df\n\ntransform_data() # This line also should be removed in the Load step\n",[3216],{"type":21,"tag":495,"props":3217,"children":3218},{"__ignoreMap":8},[3219],{"type":30,"value":3214},{"type":21,"tag":26,"props":3221,"children":3222},{},[3223],{"type":30,"value":3224},"Running the above script creates a new column called activity and populates it with the \"INCOMING_TRAFFIC\" value, as this batch of data is about incoming traffic. Some other columns, like date and country, have been renamed to match the ClickHouse table we created earlier. 
And finally we created the attributes column, which takes the rest of the data, like the channel source, the number of sessions, etc.",{"type":21,"tag":26,"props":3226,"children":3227},{},[3228],{"type":21,"tag":516,"props":3229,"children":3232},{"alt":3230,"src":3231},"VSCode Terminal after transforming the data","/img/img35.png",[],{"type":21,"tag":44,"props":3234,"children":3236},{"id":3235},"loading-data-into-clickhouse",[3237],{"type":30,"value":3238},"Loading Data into ClickHouse",{"type":21,"tag":490,"props":3240,"children":3243},{"className":3241,"code":3242,"language":895,"meta":8},[897],"\"\"\"\n    Same as before \n    Replace the transform_data() call\n    With the following script to avoid console noise\n\"\"\"\n\ndef load_data(df=None):\n    # Transform at call time instead of at function-definition time\n    if df is None:\n        df = transform_data()\n\n    # initiate client \n    client = clickhouse_connect.get_client(host='localhost', username='default', password='')\n\n    # Execute insertion command; insert_df is the DataFrame-specific insert method\n    client.insert_df('activity.stream', df[['timestamp', 'activity', 'entity', 'attributes']])\n\n    # Close ClickHouse connection\n    client.close()\n    print(\"DataFrame has been uploaded into Activity-Schema\")\n\nload_data()\n",[3244],{"type":21,"tag":495,"props":3245,"children":3246},{"__ignoreMap":8},[3247],{"type":30,"value":3242},{"type":21,"tag":26,"props":3249,"children":3250},{},[3251],{"type":30,"value":3252},"To check that the data has been loaded, let's get back to the ClickHouse Client terminal and count the number of rows by running the following SQL script.",{"type":21,"tag":490,"props":3254,"children":3257},{"className":3255,"code":3256,"language":1143,"meta":8},[1145],"SELECT count() FROM activity.stream\n",[3258],{"type":21,"tag":495,"props":3259,"children":3260},{"__ignoreMap":8},[3261],{"type":30,"value":3256},{"type":21,"tag":26,"props":3263,"children":3264},{},[3265],{"type":21,"tag":516,"props":3266,"children":3269},{"alt":3267,"src":3268},"Console log after counting rows in the 
table","/img/img36.png",[],{"type":21,"tag":26,"props":3271,"children":3272},{},[3273],{"type":30,"value":3274},"We have ingested Google Analytics data into the Activity-Schema. Now let's see how to access the attributes data. Imagine that we are interested in total sessions by country.",{"type":21,"tag":490,"props":3276,"children":3279},{"className":3277,"code":3278,"language":1143,"meta":8},[1145],"SELECT\n    entity,\n    SUM(CAST(attributes['sessions'], 'UInt64')) as sessions_total\nFROM activity.stream\nGROUP BY entity\nORDER BY sessions_total DESC\n",[3280],{"type":21,"tag":495,"props":3281,"children":3282},{"__ignoreMap":8},[3283],{"type":30,"value":3278},{"type":21,"tag":26,"props":3285,"children":3286},{},[3287],{"type":21,"tag":516,"props":3288,"children":3291},{"alt":3289,"src":3290},"Console log after aggregating data in the table","/img/img37.png",[],{"type":21,"tag":26,"props":3293,"children":3294},{},[3295],{"type":30,"value":3296},"And there you have it! A Data Warehouse in Activity-Schema, ready to store millions of rows of data under different activities that can be self-joined via timestamp, entity or attributes. Here, I have built a simple report on sessions by channel and country.",{"type":21,"tag":26,"props":3298,"children":3299},{},[3300],{"type":21,"tag":516,"props":3301,"children":3304},{"alt":3302,"src":3303},"The dashboard of sessions by channel and countries","/img/img38.png",[],{"type":21,"tag":44,"props":3306,"children":3307},{"id":1217},[3308],{"type":30,"value":1220},{"type":21,"tag":26,"props":3310,"children":3311},{},[3312],{"type":30,"value":3313},"In this article we have discussed OLTP vs OLAP databases and talked about the differences between the Star-Schema and the Activity-Schema. We then set up the ClickHouse environment and created the activity.stream table. 
Next, we extracted Google Analytics data, transformed it into the Activity-Schema format, and finally uploaded it to our Data Warehouse table named activity.stream.",{"type":21,"tag":26,"props":3315,"children":3316},{},[3317],{"type":30,"value":3318},"If you want to dive into Data Warehousing, then you should learn MPP, or Massively Parallel Processing. MPP is a technical approach in which your OLAP database spreads data across different shards, which are separate nodes in a cluster that process data in parallel and then share the results with one another. Pretty cool stuff!",{"type":21,"tag":26,"props":3320,"children":3321},{},[3322],{"type":30,"value":3323},"Another point to keep in mind is that in this article we have performed ETL manually. A crucial skill in Data Analytics and Engineering is being able to automate this workflow. For that purpose, Data Orchestration tools like Apache Airflow are very useful. We will cover them in the next articles, very soon.",{"type":21,"tag":26,"props":3325,"children":3326},{},[3327],{"type":30,"value":3328},"I hope you have enjoyed reading this article 
!",{"type":21,"tag":26,"props":3330,"children":3331},{},[3332,3333,3336],{"type":30,"value":2215},{"type":21,"tag":2217,"props":3334,"children":3335},{},[],{"type":30,"value":2221},{"title":8,"searchDepth":596,"depth":596,"links":3338},[3339,3340,3341,3342,3343,3344,3345,3346],{"id":2839,"depth":596,"text":2842},{"id":2865,"depth":596,"text":2868},{"id":2899,"depth":596,"text":2902},{"id":2937,"depth":596,"text":2940},{"id":3127,"depth":596,"text":3130},{"id":3202,"depth":596,"text":3205},{"id":3235,"depth":596,"text":3238},{"id":1217,"depth":596,"text":1220},"content:posts:how-to-build-data-warehouse-in-activity-schema-with-clickhouse.md","posts/how-to-build-data-warehouse-in-activity-schema-with-clickhouse.md","posts/how-to-build-data-warehouse-in-activity-schema-with-clickhouse",{"_path":3351,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":3352,"description":3353,"date":3354,"draft":7,"tags":3355,"thumbnail":3357,"alt_text":3358,"slug":3359,"body":3360,"_type":604,"_id":5557,"_source":606,"_file":5558,"_stem":5559,"_extension":609},"/posts/product-analyst-technical-task","Product Analyst Technical Task Explained","I've done a technical task when applying for a product analyst job. 
Let's break it down and learn from it together.","2024-02-24T00:00:00.000Z",[3356],"product_analytics","/img/product_analyst_technical_task.png","Technical Task Explained","product-analyst-technical-task",{"type":18,"children":3361,"toc":5535},[3362,3367,3372,3378,3386,3400,3414,3420,3428,3468,3473,3478,3506,3511,3517,3525,3544,3549,3555,3563,3568,3573,3578,3584,3592,3597,3602,3620,3625,3631,3639,3644,3650,3655,3661,3666,3672,3677,3683,3688,3694,3698,3703,3715,3726,3737,3748,3760,3768,3771,3777,3784,3817,3822,3827,3833,3838,3843,3848,3853,3859,3875,3880,3885,3890,3896,3924,3930,3941,3947,3970,3976,3984,4253,4259,4264,4854,4864,4870,4884,4893,4911,4920,4928,4933,4939,4944,4983,5508,5513,5527],{"type":21,"tag":26,"props":3363,"children":3364},{},[3365],{"type":30,"value":3366},"This article breaks down the technical task for the Product Analyst position at Kaspi.kz. Generally speaking, this company runs a bank-owned Super App from Central Asia that provides Peer to Peer Instant Money Transactions, a Marketplace, Flight Ticket Booking, Classified Advertising, and even a wide range of Government Services, from Municipal Fines and Fees to Incorporating a Company.",{"type":21,"tag":26,"props":3368,"children":3369},{},[3370],{"type":30,"value":3371},"According to the job description, a Product Analyst should assist the Product Manager in defining metrics by analyzing user behaviour. For that purpose, not only are hard skills such as Statistics, SQL and Python required, but the basic principles of Product Management are also crucial. The latter is actually the only thing that distinguishes the Product Analyst from the Data Scientist, the Data Analyst and the rest of the gang. 
So let's see what it's all about.",{"type":21,"tag":44,"props":3373,"children":3375},{"id":3374},"task-1",[3376],{"type":30,"value":3377},"Task 1",{"type":21,"tag":26,"props":3379,"children":3380},{},[3381],{"type":21,"tag":272,"props":3382,"children":3383},{},[3384],{"type":30,"value":3385},"Choose any popular web or native application and imagine that you are launching your own app with the same business model. Now build a dashboard for keeping track of overall app performance.",{"type":21,"tag":26,"props":3387,"children":3388},{},[3389,3391,3398],{"type":30,"value":3390},"This task is all about your strategic vision! Whatever app you choose, it tells a lot about your interests, background and experience. Personally, I'd choose a B2C and C2C marketplace like Alibaba's ",{"type":21,"tag":1957,"props":3392,"children":3395},{"href":3393,"rel":3394},"https://taobao.cn",[1961],[3396],{"type":30,"value":3397},"Taobao",{"type":30,"value":3399},".",{"type":21,"tag":26,"props":3401,"children":3402},{},[3403,3405,3412],{"type":30,"value":3404},"As an international student in China I used that product a lot. I also currently work at a marketplace similar to Taobao, so it is the perfect choice for me. 
I'll use ",{"type":21,"tag":1957,"props":3406,"children":3409},{"href":3407,"rel":3408},"https://excalidraw.com",[1961],[3410],{"type":30,"value":3411},"excalidraw",{"type":30,"value":3413}," to draw a simple dashboard.",{"type":21,"tag":673,"props":3415,"children":3417},{"id":3416},"_11-the-buyers-overview",[3418],{"type":30,"value":3419},"1.1 The Buyers Overview",{"type":21,"tag":26,"props":3421,"children":3422},{},[3423],{"type":21,"tag":516,"props":3424,"children":3427},{"alt":3425,"src":3426},"The Buyers Overview Dashboard","/img/img20.png",[],{"type":21,"tag":26,"props":3429,"children":3430},{},[3431,3433,3438,3440,3445,3447,3452,3454,3459,3461,3466],{"type":30,"value":3432},"The buyers overview gives you a descriptive, high-level picture of how your app is doing as a business. It shows the total number of ",{"type":21,"tag":272,"props":3434,"children":3435},{},[3436],{"type":30,"value":3437},"Users",{"type":30,"value":3439},", the number of ",{"type":21,"tag":272,"props":3441,"children":3442},{},[3443],{"type":30,"value":3444},"New Users",{"type":30,"value":3446}," who visited the app for the first time, as opposed to ",{"type":21,"tag":272,"props":3448,"children":3449},{},[3450],{"type":30,"value":3451},"Retention",{"type":30,"value":3453},", which shows the percentage of users who visit the app more than once. ",{"type":21,"tag":272,"props":3455,"children":3456},{},[3457],{"type":30,"value":3458},"CSAT",{"type":30,"value":3460},", which stands for Customer Satisfaction, shows the percentage of satisfied users, and ",{"type":21,"tag":272,"props":3462,"children":3463},{},[3464],{"type":30,"value":3465},"AOV",{"type":30,"value":3467},", or Average Order Value, is the average amount your users spend per order in your app. That's the baseline for business performance evaluation. 
It doesn't matter what you do with your app, these metrics will signal whether you are doing it the right or the wrong way.",{"type":21,"tag":26,"props":3469,"children":3470},{},[3471],{"type":30,"value":3472},"A little side note. If you have ever filled in a form such as:",{"type":21,"tag":26,"props":3474,"children":3475},{},[3476],{"type":30,"value":3477},"How would you rate your level of satisfaction with our services?",{"type":21,"tag":1106,"props":3479,"children":3480},{},[3481,3486,3491,3496,3501],{"type":21,"tag":268,"props":3482,"children":3483},{},[3484],{"type":30,"value":3485},"Very Unsatisfied",{"type":21,"tag":268,"props":3487,"children":3488},{},[3489],{"type":30,"value":3490},"Somewhat Unsatisfied",{"type":21,"tag":268,"props":3492,"children":3493},{},[3494],{"type":30,"value":3495},"Neutral",{"type":21,"tag":268,"props":3497,"children":3498},{},[3499],{"type":30,"value":3500},"Somewhat Satisfied",{"type":21,"tag":268,"props":3502,"children":3503},{},[3504],{"type":30,"value":3505},"Very Satisfied",{"type":21,"tag":26,"props":3507,"children":3508},{},[3509],{"type":30,"value":3510},"Then you have contributed to the CSAT metric. If your answer was between 1 and 3, you were counted as not satisfied; an answer of 4 or 5 counts you as a satisfied user.",{"type":21,"tag":673,"props":3512,"children":3514},{"id":3513},"_12-the-funnel-overview",[3515],{"type":30,"value":3516},"1.2 The Funnel Overview",{"type":21,"tag":26,"props":3518,"children":3519},{},[3520],{"type":21,"tag":516,"props":3521,"children":3524},{"alt":3522,"src":3523},"The Funnel Overview Dashboard","/img/img21.png",[],{"type":21,"tag":26,"props":3526,"children":3527},{},[3528,3530,3535,3537,3542],{"type":30,"value":3529},"The funnel overview shows you at what stage your users drop out of the funnel. 
This is usually done with web analytics tools such as ",{"type":21,"tag":272,"props":3531,"children":3532},{},[3533],{"type":30,"value":3534},"Google Analytics 4",{"type":30,"value":3536}," or ",{"type":21,"tag":272,"props":3538,"children":3539},{},[3540],{"type":30,"value":3541},"Amplitude",{"type":30,"value":3543},". Web analytics tools let you track custom events, such as a user viewing an item page. You then compare how many users started a session and viewed items on the marketplace, what percentage of them added an item to the cart, and finally what percentage actually bought something. That's how you calculate the conversion rate between events. By default the conversion is calculated from the top of the funnel to the bottom, but that may depend on the task you are working on.",{"type":21,"tag":26,"props":3545,"children":3546},{},[3547],{"type":30,"value":3548},"The numbers get smaller at each stage, assuming that users generally follow a certain path from the main page to the purchase. That is not always true, by the way, because users sometimes arrive via a direct link and break the funnel logic. In general, though, the event where you lose the largest share of your traffic is the one your diagnostic analysis should focus on. It's very common for a marketplace to have a session-to-purchase conversion rate as low as 1% to 3%. 
Funnels are also among the most basic charts in Product Management, so being familiar with them helps a ton.",{"type":21,"tag":673,"props":3550,"children":3552},{"id":3551},"_13-items-and-sellers-selection",[3553],{"type":30,"value":3554},"1.3 Items and Sellers Selection",{"type":21,"tag":26,"props":3556,"children":3557},{},[3558],{"type":21,"tag":516,"props":3559,"children":3562},{"alt":3560,"src":3561},"The Items Concentration Dashboard","/img/img23.png",[],{"type":21,"tag":26,"props":3564,"children":3565},{},[3566],{"type":30,"value":3567},"Having numbers on a dashboard is good, but what do you do when they are not meeting your targets? Then it's a good idea to go one level down in your business concept and come up with a customer story. In the case of a marketplace, your customer story might sound like:",{"type":21,"tag":26,"props":3569,"children":3570},{},[3571],{"type":30,"value":3572},"\"I want to buy a product, so I open this app to look for it. Is what I'm looking for there? Are there many alternatives to what I'm looking for?\"",{"type":21,"tag":26,"props":3574,"children":3575},{},[3576],{"type":30,"value":3577},"To answer this abstract and general user question, we will evaluate our supply, that is, the items and sellers (so-called SKUs and Merchants), by generated revenue. The Pareto principle is a good reference for a supply-side concentration analysis: the top SKUs or top Merchants that generate 80% of the revenue should make up at least 20% of the total number of SKUs or Merchants. If the Pareto principle is not satisfied, then neither is your customer story. 
That's an example of diagnostic analytics.",{"type":21,"tag":673,"props":3579,"children":3581},{"id":3580},"_14-customer-segmentation-overview",[3582],{"type":30,"value":3583},"1.4 Customer Segmentation Overview",{"type":21,"tag":26,"props":3585,"children":3586},{},[3587],{"type":21,"tag":516,"props":3588,"children":3591},{"alt":3589,"src":3590},"The RFM Analysis Dashboard","/img/img22.png",[],{"type":21,"tag":26,"props":3593,"children":3594},{},[3595],{"type":30,"value":3596},"This is where the analytics gets more advanced, because it takes you to prescriptive analysis. When you have a big customer base, it's crucial to have a differentiated, or rather personal, approach to marketing. You can also be very selective and provide different service levels depending on how important a certain customer is to your revenue.",{"type":21,"tag":26,"props":3598,"children":3599},{},[3600],{"type":30,"value":3601},"The RFM analysis ranks your customer base by three attributes: Recency, Frequency and Monetary. Basically, you take all your users and divide them into 10 groups by each attribute:",{"type":21,"tag":1106,"props":3603,"children":3604},{},[3605,3610,3615],{"type":21,"tag":268,"props":3606,"children":3607},{},[3608],{"type":30,"value":3609},"How recently they have been active in your app",{"type":21,"tag":268,"props":3611,"children":3612},{},[3613],{"type":30,"value":3614},"How frequently they have been active in your app",{"type":21,"tag":268,"props":3616,"children":3617},{},[3618],{"type":30,"value":3619},"How much revenue they have generated in your app",{"type":21,"tag":26,"props":3621,"children":3622},{},[3623],{"type":30,"value":3624},"You end up with three scores, each ranked from 1 to 10, for every customer. You add up the three scores for each customer and divide the sum by 3 to get the total RFM score, which helps you segment the customers. 
How exactly you should segment your customer base depends entirely on your specific business model. In general, though, you need to identify price-sensitive customers and attract them with promotions, whereas big spenders should be attracted with more exclusive offers.",{"type":21,"tag":44,"props":3626,"children":3628},{"id":3627},"task-2",[3629],{"type":30,"value":3630},"Task 2",{"type":21,"tag":26,"props":3632,"children":3633},{},[3634],{"type":21,"tag":272,"props":3635,"children":3636},{},[3637],{"type":30,"value":3638},"Given the following ideas for increasing the conversion rate from service page to contact, help your Product Manager on Classified Advertisement Services prioritize the experiments. Also pick at least one idea and provide a preliminary experiment design.",{"type":21,"tag":26,"props":3640,"children":3641},{},[3642],{"type":30,"value":3643},"This task evaluates your communication skills, including your attention to detail and your ability to ask the right questions and clarify the given information.",{"type":21,"tag":673,"props":3645,"children":3647},{"id":3646},"idea-a",[3648],{"type":30,"value":3649},"Idea A",{"type":21,"tag":26,"props":3651,"children":3652},{},[3653],{"type":30,"value":3654},"The rating and review blocks. Making rating and review blocks available on the listing page and service card may increase the conversion rate.",{"type":21,"tag":673,"props":3656,"children":3658},{"id":3657},"idea-b",[3659],{"type":30,"value":3660},"Idea B",{"type":21,"tag":26,"props":3662,"children":3663},{},[3664],{"type":30,"value":3665},"The subscription feature. 
Giving users an option to subscribe to any service and be notified when a new one is added to the platform would increase the conversion rate.",{"type":21,"tag":673,"props":3667,"children":3669},{"id":3668},"idea-c",[3670],{"type":30,"value":3671},"Idea C",{"type":21,"tag":26,"props":3673,"children":3674},{},[3675],{"type":30,"value":3676},"The service booking feature. Giving users an option to book the service directly in the app would increase the conversion rate from service page to contact.",{"type":21,"tag":673,"props":3678,"children":3680},{"id":3679},"idea-d",[3681],{"type":30,"value":3682},"Idea D",{"type":21,"tag":26,"props":3684,"children":3685},{},[3686],{"type":30,"value":3687},"The working hours block. Adding the working hours to the service card would increase the conversion rate.",{"type":21,"tag":44,"props":3689,"children":3691},{"id":3690},"_21-answering-via-e-mail-to-the-product-manager",[3692],{"type":30,"value":3693},"2.1 Answering via E-mail to the Product Manager",{"type":21,"tag":3695,"props":3696,"children":3697},"hr",{},[],{"type":21,"tag":26,"props":3699,"children":3700},{},[3701],{"type":30,"value":3702},"Dear John Doe,",{"type":21,"tag":26,"props":3704,"children":3705},{},[3706,3708,3713],{"type":30,"value":3707},"Idea ",{"type":21,"tag":272,"props":3709,"children":3710},{},[3711],{"type":30,"value":3712},"A",{"type":30,"value":3714}," sounds good to me! Let's add it to the current sprint. It seems easy to execute, and I'm very confident about the impact, because every user should be able to see the recommendation block on the listing page. I'll provide the A/B test design by the end of the day, so we can run it tomorrow.",{"type":21,"tag":26,"props":3716,"children":3717},{},[3718,3719,3724],{"type":30,"value":3707},{"type":21,"tag":272,"props":3720,"children":3721},{},[3722],{"type":30,"value":3723},"B",{"type":30,"value":3725}," sounds easy to implement, however I'm uncertain about how many users actually need this feature. 
It doesn't seem like users tend to search for the same service over and over again. However, if the development side is ready to implement it, this experiment might also go into the backlog this week.",{"type":21,"tag":26,"props":3727,"children":3728},{},[3729,3730,3735],{"type":30,"value":3707},{"type":21,"tag":272,"props":3731,"children":3732},{},[3733],{"type":30,"value":3734},"C",{"type":30,"value":3736}," sounds cool! Should this feature be integrated into the suppliers' CRM? If the online booking feature is available on the listing page, I'm highly confident that it would boost the conversion rate, but we have to be certain that it will be ready soon; otherwise let's postpone it to the next sprint.",{"type":21,"tag":26,"props":3738,"children":3739},{},[3740,3741,3746],{"type":30,"value":3707},{"type":21,"tag":272,"props":3742,"children":3743},{},[3744],{"type":30,"value":3745},"D",{"type":30,"value":3747}," in my opinion may decrease the conversion rate, because it makes clear whether the working hours suit your needs or not. 
But if you insist, it's still very easy to implement, so we can run the test this week.",{"type":21,"tag":26,"props":3749,"children":3750},{},[3751,3753,3758],{"type":30,"value":3752},"Let's discuss with the rest of the team, but I'd suggest prioritizing the ideas in the following order: ",{"type":21,"tag":272,"props":3754,"children":3755},{},[3756],{"type":30,"value":3757},"A -> C -> B -> D",{"type":30,"value":3759},".",{"type":21,"tag":26,"props":3761,"children":3762},{},[3763,3764,3767],{"type":30,"value":2215},{"type":21,"tag":2217,"props":3765,"children":3766},{},[],{"type":30,"value":2221},{"type":21,"tag":3695,"props":3769,"children":3770},{},[],{"type":21,"tag":673,"props":3772,"children":3774},{"id":3773},"_22-the-experiment-design-example-for-idea-a",[3775],{"type":30,"value":3776},"2.2 The Experiment Design Example for Idea A",{"type":21,"tag":3778,"props":3779,"children":3781},"h4",{"id":3780},"experiment-objectives",[3782],{"type":30,"value":3783},"Experiment Objectives:",{"type":21,"tag":26,"props":3785,"children":3786},{},[3787,3789,3794,3796,3801,3803,3808,3810,3815],{"type":30,"value":3788},"Determine whether the new ",{"type":21,"tag":272,"props":3790,"children":3791},{},[3792],{"type":30,"value":3793},"reviews",{"type":30,"value":3795}," and ",{"type":21,"tag":272,"props":3797,"children":3798},{},[3799],{"type":30,"value":3800},"rating",{"type":30,"value":3802}," blocks on the ",{"type":21,"tag":272,"props":3804,"children":3805},{},[3806],{"type":30,"value":3807},"listing page",{"type":30,"value":3809}," and ",{"type":21,"tag":272,"props":3811,"children":3812},{},[3813],{"type":30,"value":3814},"service card",{"type":30,"value":3816}," significantly increase the conversion rate, by at least 5% in relative terms.",{"type":21,"tag":26,"props":3818,"children":3819},{},[3820],{"type":30,"value":3821},"H0: New blocks do not increase conversion rate from service page to 
contact",{"type":21,"tag":26,"props":3823,"children":3824},{},[3825],{"type":30,"value":3826},"H1: New blocks do increase conversion rate from service page to contact",{"type":21,"tag":3778,"props":3828,"children":3830},{"id":3829},"aa-test-groups",[3831],{"type":30,"value":3832},"A/A test groups:",{"type":21,"tag":26,"props":3834,"children":3835},{},[3836],{"type":30,"value":3837},"A preliminary experiment to confirm that the two groups are comparable. Both groups receive an identical interface, which checks that users are split randomly between the groups, so that any difference in the A/B test can be attributed to the change under test.",{"type":21,"tag":26,"props":3839,"children":3840},{},[3841],{"type":30,"value":3842},"Group A_1: control group",{"type":21,"tag":26,"props":3844,"children":3845},{},[3846],{"type":30,"value":3847},"Group A_2: experimental group",{"type":21,"tag":26,"props":3849,"children":3850},{},[3851],{"type":30,"value":3852},"The A/A test should run for as long as it takes to collect the required sample size.",{"type":21,"tag":3778,"props":3854,"children":3856},{"id":3855},"ab-test-groups",[3857],{"type":30,"value":3858},"A/B test groups:",{"type":21,"tag":26,"props":3860,"children":3861},{},[3862,3864,3868,3869,3873],{"type":30,"value":3863},"The main experiment, where the control group A (formerly A_1) receives the standard interface, and the experimental group B (formerly A_2) additionally receives the new blocks of ",{"type":21,"tag":272,"props":3865,"children":3866},{},[3867],{"type":30,"value":3793},{"type":30,"value":3795},{"type":21,"tag":272,"props":3870,"children":3871},{},[3872],{"type":30,"value":3800},{"type":30,"value":3874}," on the listing page and service card.",{"type":21,"tag":26,"props":3876,"children":3877},{},[3878],{"type":30,"value":3879},"Group A: control group",{"type":21,"tag":26,"props":3881,"children":3882},{},[3883],{"type":30,"value":3884},"Group B: experimental 
group",{"type":21,"tag":26,"props":3886,"children":3887},{},[3888],{"type":30,"value":3889},"The A/B test should run for at least 7 days, preferably 14. Running it for whole weeks helps to properly capture behavior on both weekends and working days.",{"type":21,"tag":3778,"props":3891,"children":3893},{"id":3892},"metrics-to-be-tracked-during-experiment",[3894],{"type":30,"value":3895},"Metrics to be tracked during the experiment:",{"type":21,"tag":1106,"props":3897,"children":3898},{},[3899,3904,3909,3914,3919],{"type":21,"tag":268,"props":3900,"children":3901},{},[3902],{"type":30,"value":3903},"Conversion Rate - the ratio of contact page visits to service card visits",{"type":21,"tag":268,"props":3905,"children":3906},{},[3907],{"type":30,"value":3908},"ARPPU - Average Revenue Per Paying User",{"type":21,"tag":268,"props":3910,"children":3911},{},[3912],{"type":30,"value":3913},"AOV - Average Order Value",{"type":21,"tag":268,"props":3915,"children":3916},{},[3917],{"type":30,"value":3918},"CTR to service card - click-through rate from the listing page to the service card",{"type":21,"tag":268,"props":3920,"children":3921},{},[3922],{"type":30,"value":3923},"CTR to contact page - click-through rate from the service card to the contact page",{"type":21,"tag":3778,"props":3925,"children":3927},{"id":3926},"calculating-sample-size-using-r",[3928],{"type":30,"value":3929},"Calculating sample size using R:",{"type":21,"tag":490,"props":3931,"children":3936},{"className":3932,"code":3934,"language":3935,"meta":8},[3933],"language-r","library(pwr)\n\np1 \u003C- 0.0121 # Current conversion rate: 1.21%\np2 \u003C- 0.0127 # Target conversion rate: +5% relative lift, 1.27%\n\nalpha \u003C- 0.05 # Significance level of 5% (95% confidence level)\nbeta \u003C- 0.20 # Type II error rate of 20%, i.e. power of 80% (1 - beta)\n\n# Critical z-values, kept for reference only; they are not passed to pwr.2p.test\nz_alpha \u003C- qnorm(1 - alpha/2)\nz_beta \u003C- qnorm(1 - beta)\n\n# Required size of each group for a two-proportion test with effect size h\nsample_size \u003C- pwr.2p.test(h = ES.h(p1, p2), sig.level = alpha, power = 1 - 
beta)\n\nsample_size\n","r",[3937],{"type":21,"tag":495,"props":3938,"children":3939},{"__ignoreMap":8},[3940],{"type":30,"value":3934},{"type":21,"tag":3778,"props":3942,"children":3944},{"id":3943},"sample-size",[3945],{"type":30,"value":3946},"Sample size:",{"type":21,"tag":26,"props":3948,"children":3949},{},[3950,3958,3960,3968],{"type":21,"tag":628,"props":3951,"children":3952},{},[3953],{"type":21,"tag":272,"props":3954,"children":3955},{},[3956],{"type":30,"value":3957},"1,067,834",{"type":30,"value":3959}," observations in total, or ",{"type":21,"tag":628,"props":3961,"children":3962},{},[3963],{"type":21,"tag":272,"props":3964,"children":3965},{},[3966],{"type":30,"value":3967},"533,917",{"type":30,"value":3969}," observations per group",{"type":21,"tag":44,"props":3971,"children":3973},{"id":3972},"task-3",[3974],{"type":30,"value":3975},"Task 3",{"type":21,"tag":26,"props":3977,"children":3978},{},[3979],{"type":21,"tag":272,"props":3980,"children":3981},{},[3982],{"type":30,"value":3983},"Using the given table, calculate the Monthly Active Users (MAU) for 
October",{"type":21,"tag":93,"props":3985,"children":3986},{},[3987,4010],{"type":21,"tag":97,"props":3988,"children":3989},{},[3990],{"type":21,"tag":101,"props":3991,"children":3992},{},[3993,3997,4001,4005],{"type":21,"tag":105,"props":3994,"children":3995},{},[3996],{"type":30,"value":3712},{"type":21,"tag":105,"props":3998,"children":3999},{},[4000],{"type":30,"value":3723},{"type":21,"tag":105,"props":4002,"children":4003},{},[4004],{"type":30,"value":3734},{"type":21,"tag":105,"props":4006,"children":4007},{},[4008],{"type":30,"value":4009},"MAU",{"type":21,"tag":127,"props":4011,"children":4012},{},[4013,4036,4058,4080,4102,4124,4146,4168,4190,4211,4232],{"type":21,"tag":101,"props":4014,"children":4015},{},[4016,4021,4026,4031],{"type":21,"tag":134,"props":4017,"children":4018},{},[4019],{"type":30,"value":4020},"year_month",{"type":21,"tag":134,"props":4022,"children":4023},{},[4024],{"type":30,"value":4025},"new_users",{"type":21,"tag":134,"props":4027,"children":4028},{},[4029],{"type":30,"value":4030},"retention_rate",{"type":21,"tag":134,"props":4032,"children":4033},{},[4034],{"type":30,"value":4035},"-",{"type":21,"tag":101,"props":4037,"children":4038},{},[4039,4044,4049,4054],{"type":21,"tag":134,"props":4040,"children":4041},{},[4042],{"type":30,"value":4043},"1/1/2023",{"type":21,"tag":134,"props":4045,"children":4046},{},[4047],{"type":30,"value":4048},"12000",{"type":21,"tag":134,"props":4050,"children":4051},{},[4052],{"type":30,"value":4053},"100",{"type":21,"tag":134,"props":4055,"children":4056},{},[4057],{"type":30,"value":4035},{"type":21,"tag":101,"props":4059,"children":4060},{},[4061,4066,4071,4076],{"type":21,"tag":134,"props":4062,"children":4063},{},[4064],{"type":30,"value":4065},"2/1/2023",{"type":21,"tag":134,"props":4067,"children":4068},{},[4069],{"type":30,"value":4070},"11500",{"type":21,"tag":134,"props":4072,"children":4073},{},[4074],{"type":30,"value":4075},"0.2053",{"type":21,"tag":134,"props":4077,"children":4078},{},[4
079],{"type":30,"value":4035},{"type":21,"tag":101,"props":4081,"children":4082},{},[4083,4088,4093,4098],{"type":21,"tag":134,"props":4084,"children":4085},{},[4086],{"type":30,"value":4087},"3/1/2023",{"type":21,"tag":134,"props":4089,"children":4090},{},[4091],{"type":30,"value":4092},"10000",{"type":21,"tag":134,"props":4094,"children":4095},{},[4096],{"type":30,"value":4097},"0.1812",{"type":21,"tag":134,"props":4099,"children":4100},{},[4101],{"type":30,"value":4035},{"type":21,"tag":101,"props":4103,"children":4104},{},[4105,4110,4115,4120],{"type":21,"tag":134,"props":4106,"children":4107},{},[4108],{"type":30,"value":4109},"4/1/2023",{"type":21,"tag":134,"props":4111,"children":4112},{},[4113],{"type":30,"value":4114},"17000",{"type":21,"tag":134,"props":4116,"children":4117},{},[4118],{"type":30,"value":4119},"0.1715",{"type":21,"tag":134,"props":4121,"children":4122},{},[4123],{"type":30,"value":4035},{"type":21,"tag":101,"props":4125,"children":4126},{},[4127,4132,4137,4142],{"type":21,"tag":134,"props":4128,"children":4129},{},[4130],{"type":30,"value":4131},"5/1/2023",{"type":21,"tag":134,"props":4133,"children":4134},{},[4135],{"type":30,"value":4136},"14350",{"type":21,"tag":134,"props":4138,"children":4139},{},[4140],{"type":30,"value":4141},"0.1595",{"type":21,"tag":134,"props":4143,"children":4144},{},[4145],{"type":30,"value":4035},{"type":21,"tag":101,"props":4147,"children":4148},{},[4149,4154,4159,4164],{"type":21,"tag":134,"props":4150,"children":4151},{},[4152],{"type":30,"value":4153},"6/1/2023",{"type":21,"tag":134,"props":4155,"children":4156},{},[4157],{"type":30,"value":4158},"12200",{"type":21,"tag":134,"props":4160,"children":4161},{},[4162],{"type":30,"value":4163},"0.144",{"type":21,"tag":134,"props":4165,"children":4166},{},[4167],{"type":30,"value":4035},{"type":21,"tag":101,"props":4169,"children":4170},{},[4171,4176,4181,4186],{"type":21,"tag":134,"props":4172,"children":4173},{},[4174],{"type":30,"value":4175},"7/1/2023",{"type
":21,"tag":134,"props":4177,"children":4178},{},[4179],{"type":30,"value":4180},"11100",{"type":21,"tag":134,"props":4182,"children":4183},{},[4184],{"type":30,"value":4185},"0.14",{"type":21,"tag":134,"props":4187,"children":4188},{},[4189],{"type":30,"value":4035},{"type":21,"tag":101,"props":4191,"children":4192},{},[4193,4198,4203,4207],{"type":21,"tag":134,"props":4194,"children":4195},{},[4196],{"type":30,"value":4197},"8/1/2023",{"type":21,"tag":134,"props":4199,"children":4200},{},[4201],{"type":30,"value":4202},"14784",{"type":21,"tag":134,"props":4204,"children":4205},{},[4206],{"type":30,"value":4185},{"type":21,"tag":134,"props":4208,"children":4209},{},[4210],{"type":30,"value":4035},{"type":21,"tag":101,"props":4212,"children":4213},{},[4214,4219,4224,4228],{"type":21,"tag":134,"props":4215,"children":4216},{},[4217],{"type":30,"value":4218},"9/1/2023",{"type":21,"tag":134,"props":4220,"children":4221},{},[4222],{"type":30,"value":4223},"13347",{"type":21,"tag":134,"props":4225,"children":4226},{},[4227],{"type":30,"value":4185},{"type":21,"tag":134,"props":4229,"children":4230},{},[4231],{"type":30,"value":4035},{"type":21,"tag":101,"props":4233,"children":4234},{},[4235,4240,4245,4249],{"type":21,"tag":134,"props":4236,"children":4237},{},[4238],{"type":30,"value":4239},"10/1/2023",{"type":21,"tag":134,"props":4241,"children":4242},{},[4243],{"type":30,"value":4244},"20220",{"type":21,"tag":134,"props":4246,"children":4247},{},[4248],{"type":30,"value":4185},{"type":21,"tag":134,"props":4250,"children":4251},{},[4252],{"type":30,"value":4035},{"type":21,"tag":44,"props":4254,"children":4256},{"id":4255},"_31-calculating-mau",[4257],{"type":30,"value":4258},"3.1 Calculating MAU",{"type":21,"tag":26,"props":4260,"children":4261},{},[4262],{"type":30,"value":4263},"Assuming that the given APP has been launched in January 2023, then MAU for January should be equal to the amount of new users, then for the february you'd need to multiply previous month 
MAU by the current retention rate and add the new users. From March onward, each earlier monthly cohort of new users is multiplied by the retention rate for its own age and added to the current month's new users, which is what the following Excel-style table calculates.",{"type":21,"tag":93,"props":4265,"children":4266},{},[4267,4328],{"type":21,"tag":97,"props":4268,"children":4269},{},[4270],{"type":21,"tag":101,"props":4271,"children":4272},{},[4273,4278,4282,4287,4292,4297,4302,4307,4312,4316,4320,4324],{"type":21,"tag":105,"props":4274,"children":4275},{},[4276],{"type":30,"value":4277},"Month",{"type":21,"tag":105,"props":4279,"children":4280},{},[4281],{"type":30,"value":3444},{"type":21,"tag":105,"props":4283,"children":4284},{},[4285],{"type":30,"value":4286},"20.53%",{"type":21,"tag":105,"props":4288,"children":4289},{},[4290],{"type":30,"value":4291},"18.12%",{"type":21,"tag":105,"props":4293,"children":4294},{},[4295],{"type":30,"value":4296},"17.15%",{"type":21,"tag":105,"props":4298,"children":4299},{},[4300],{"type":30,"value":4301},"15.95%",{"type":21,"tag":105,"props":4303,"children":4304},{},[4305],{"type":30,"value":4306},"14.40%",{"type":21,"tag":105,"props":4308,"children":4309},{},[4310],{"type":30,"value":4311},"14%",{"type":21,"tag":105,"props":4313,"children":4314},{},[4315],{"type":30,"value":4311},{"type":21,"tag":105,"props":4317,"children":4318},{},[4319],{"type":30,"value":4311},{"type":21,"tag":105,"props":4321,"children":4322},{},[4323],{"type":30,"value":4311},{"type":21,"tag":105,"props":4325,"children":4326},{},[4327],{"type":30,"value":4009},{"type":21,"tag":127,"props":4329,"children":4330},{},[4331,4389,4447,4504,4560,4614,4666,4716,4764,4810],{"type":21,"tag":101,"props":4332,"children":4333},{},[4334,4339,4343,4348,4353,4358,4363,4368,4373,4377,4381,4385],{"type":21,"tag":134,"props":4335,"children":4336},{},[4337],{"type":30,"value":4338},"January",{"type":21,"tag":134,"props":4340,"children":4341},{},[4342],{"type":30,"value":4048},{"type":21,"tag":134,"props":4344,"children":4345},{},[4346],{"type":30,"value":4347},"2464",{"type":21,"tag":134,"props":4349,"children":4
350},{},[4351],{"type":30,"value":4352},"2174",{"type":21,"tag":134,"props":4354,"children":4355},{},[4356],{"type":30,"value":4357},"2058",{"type":21,"tag":134,"props":4359,"children":4360},{},[4361],{"type":30,"value":4362},"1914",{"type":21,"tag":134,"props":4364,"children":4365},{},[4366],{"type":30,"value":4367},"1728",{"type":21,"tag":134,"props":4369,"children":4370},{},[4371],{"type":30,"value":4372},"1680",{"type":21,"tag":134,"props":4374,"children":4375},{},[4376],{"type":30,"value":4372},{"type":21,"tag":134,"props":4378,"children":4379},{},[4380],{"type":30,"value":4372},{"type":21,"tag":134,"props":4382,"children":4383},{},[4384],{"type":30,"value":4372},{"type":21,"tag":134,"props":4386,"children":4387},{},[4388],{"type":30,"value":4048},{"type":21,"tag":101,"props":4390,"children":4391},{},[4392,4397,4401,4406,4411,4416,4421,4426,4431,4435,4439,4442],{"type":21,"tag":134,"props":4393,"children":4394},{},[4395],{"type":30,"value":4396},"February",{"type":21,"tag":134,"props":4398,"children":4399},{},[4400],{"type":30,"value":4070},{"type":21,"tag":134,"props":4402,"children":4403},{},[4404],{"type":30,"value":4405},"2361",{"type":21,"tag":134,"props":4407,"children":4408},{},[4409],{"type":30,"value":4410},"2084",{"type":21,"tag":134,"props":4412,"children":4413},{},[4414],{"type":30,"value":4415},"1972",{"type":21,"tag":134,"props":4417,"children":4418},{},[4419],{"type":30,"value":4420},"1834",{"type":21,"tag":134,"props":4422,"children":4423},{},[4424],{"type":30,"value":4425},"1656",{"type":21,"tag":134,"props":4427,"children":4428},{},[4429],{"type":30,"value":4430},"1610",{"type":21,"tag":134,"props":4432,"children":4433},{},[4434],{"type":30,"value":4430},{"type":21,"tag":134,"props":4436,"children":4437},{},[4438],{"type":30,"value":4430},{"type":21,"tag":134,"props":4440,"children":4441},{},[],{"type":21,"tag":134,"props":4443,"children":4444},{},[4445],{"type":30,"value":4446},"13964",{"type":21,"tag":101,"props":4448,"children":4449},{},[44
50,4455,4459,4464,4469,4474,4479,4484,4489,4493,4496,4499],{"type":21,"tag":134,"props":4451,"children":4452},{},[4453],{"type":30,"value":4454},"March",{"type":21,"tag":134,"props":4456,"children":4457},{},[4458],{"type":30,"value":4092},{"type":21,"tag":134,"props":4460,"children":4461},{},[4462],{"type":30,"value":4463},"2053",{"type":21,"tag":134,"props":4465,"children":4466},{},[4467],{"type":30,"value":4468},"1812",{"type":21,"tag":134,"props":4470,"children":4471},{},[4472],{"type":30,"value":4473},"1715",{"type":21,"tag":134,"props":4475,"children":4476},{},[4477],{"type":30,"value":4478},"1595",{"type":21,"tag":134,"props":4480,"children":4481},{},[4482],{"type":30,"value":4483},"1440",{"type":21,"tag":134,"props":4485,"children":4486},{},[4487],{"type":30,"value":4488},"1400",{"type":21,"tag":134,"props":4490,"children":4491},{},[4492],{"type":30,"value":4488},{"type":21,"tag":134,"props":4494,"children":4495},{},[],{"type":21,"tag":134,"props":4497,"children":4498},{},[],{"type":21,"tag":134,"props":4500,"children":4501},{},[4502],{"type":30,"value":4503},"14535",{"type":21,"tag":101,"props":4505,"children":4506},{},[4507,4512,4516,4521,4526,4531,4536,4541,4546,4549,4552,4555],{"type":21,"tag":134,"props":4508,"children":4509},{},[4510],{"type":30,"value":4511},"April",{"type":21,"tag":134,"props":4513,"children":4514},{},[4515],{"type":30,"value":4114},{"type":21,"tag":134,"props":4517,"children":4518},{},[4519],{"type":30,"value":4520},"3490",{"type":21,"tag":134,"props":4522,"children":4523},{},[4524],{"type":30,"value":4525},"3080",{"type":21,"tag":134,"props":4527,"children":4528},{},[4529],{"type":30,"value":4530},"2916",{"type":21,"tag":134,"props":4532,"children":4533},{},[4534],{"type":30,"value":4535},"2712",{"type":21,"tag":134,"props":4537,"children":4538},{},[4539],{"type":30,"value":4540},"2448",{"type":21,"tag":134,"props":4542,"children":4543},{},[4544],{"type":30,"value":4545},"2380",{"type":21,"tag":134,"props":4547,"children":4548},{},[
],{"type":21,"tag":134,"props":4550,"children":4551},{},[],{"type":21,"tag":134,"props":4553,"children":4554},{},[],{"type":21,"tag":134,"props":4556,"children":4557},{},[4558],{"type":30,"value":4559},"23195",{"type":21,"tag":101,"props":4561,"children":4562},{},[4563,4568,4572,4577,4582,4587,4592,4597,4600,4603,4606,4609],{"type":21,"tag":134,"props":4564,"children":4565},{},[4566],{"type":30,"value":4567},"May",{"type":21,"tag":134,"props":4569,"children":4570},{},[4571],{"type":30,"value":4136},{"type":21,"tag":134,"props":4573,"children":4574},{},[4575],{"type":30,"value":4576},"2946",{"type":21,"tag":134,"props":4578,"children":4579},{},[4580],{"type":30,"value":4581},"2600",{"type":21,"tag":134,"props":4583,"children":4584},{},[4585],{"type":30,"value":4586},"2461",{"type":21,"tag":134,"props":4588,"children":4589},{},[4590],{"type":30,"value":4591},"2289",{"type":21,"tag":134,"props":4593,"children":4594},{},[4595],{"type":30,"value":4596},"2066",{"type":21,"tag":134,"props":4598,"children":4599},{},[],{"type":21,"tag":134,"props":4601,"children":4602},{},[],{"type":21,"tag":134,"props":4604,"children":4605},{},[],{"type":21,"tag":134,"props":4607,"children":4608},{},[],{"type":21,"tag":134,"props":4610,"children":4611},{},[4612],{"type":30,"value":4613},"23538",{"type":21,"tag":101,"props":4615,"children":4616},{},[4617,4622,4626,4631,4636,4641,4646,4649,4652,4655,4658,4661],{"type":21,"tag":134,"props":4618,"children":4619},{},[4620],{"type":30,"value":4621},"June",{"type":21,"tag":134,"props":4623,"children":4624},{},[4625],{"type":30,"value":4158},{"type":21,"tag":134,"props":4627,"children":4628},{},[4629],{"type":30,"value":4630},"2505",{"type":21,"tag":134,"props":4632,"children":4633},{},[4634],{"type":30,"value":4635},"2211",{"type":21,"tag":134,"props":4637,"children":4638},{},[4639],{"type":30,"value":4640},"2092",{"type":21,"tag":134,"props":4642,"children":4643},{},[4644],{"type":30,"value":4645},"1946",{"type":21,"tag":134,"props":4647,"childre
n":4648},{},[],{"type":21,"tag":134,"props":4650,"children":4651},{},[],{"type":21,"tag":134,"props":4653,"children":4654},{},[],{"type":21,"tag":134,"props":4656,"children":4657},{},[],{"type":21,"tag":134,"props":4659,"children":4660},{},[],{"type":21,"tag":134,"props":4662,"children":4663},{},[4664],{"type":30,"value":4665},"23504",{"type":21,"tag":101,"props":4667,"children":4668},{},[4669,4674,4678,4683,4688,4693,4696,4699,4702,4705,4708,4711],{"type":21,"tag":134,"props":4670,"children":4671},{},[4672],{"type":30,"value":4673},"July",{"type":21,"tag":134,"props":4675,"children":4676},{},[4677],{"type":30,"value":4180},{"type":21,"tag":134,"props":4679,"children":4680},{},[4681],{"type":30,"value":4682},"2279",{"type":21,"tag":134,"props":4684,"children":4685},{},[4686],{"type":30,"value":4687},"2011",{"type":21,"tag":134,"props":4689,"children":4690},{},[4691],{"type":30,"value":4692},"1904",{"type":21,"tag":134,"props":4694,"children":4695},{},[],{"type":21,"tag":134,"props":4697,"children":4698},{},[],{"type":21,"tag":134,"props":4700,"children":4701},{},[],{"type":21,"tag":134,"props":4703,"children":4704},{},[],{"type":21,"tag":134,"props":4706,"children":4707},{},[],{"type":21,"tag":134,"props":4709,"children":4710},{},[],{"type":21,"tag":134,"props":4712,"children":4713},{},[4714],{"type":30,"value":4715},"24051",{"type":21,"tag":101,"props":4717,"children":4718},{},[4719,4724,4728,4733,4738,4741,4744,4747,4750,4753,4756,4759],{"type":21,"tag":134,"props":4720,"children":4721},{},[4722],{"type":30,"value":4723},"August",{"type":21,"tag":134,"props":4725,"children":4726},{},[4727],{"type":30,"value":4202},{"type":21,"tag":134,"props":4729,"children":4730},{},[4731],{"type":30,"value":4732},"3035",{"type":21,"tag":134,"props":4734,"children":4735},{},[4736],{"type":30,"value":4737},"2679",{"type":21,"tag":134,"props":4739,"children":4740},{},[],{"type":21,"tag":134,"props":4742,"children":4743},{},[],{"type":21,"tag":134,"props":4745,"children":4746},{},[]
,{"type":21,"tag":134,"props":4748,"children":4749},{},[],{"type":21,"tag":134,"props":4751,"children":4752},{},[],{"type":21,"tag":134,"props":4754,"children":4755},{},[],{"type":21,"tag":134,"props":4757,"children":4758},{},[],{"type":21,"tag":134,"props":4760,"children":4761},{},[4762],{"type":30,"value":4763},"29176",{"type":21,"tag":101,"props":4765,"children":4766},{},[4767,4772,4776,4781,4784,4787,4790,4793,4796,4799,4802,4805],{"type":21,"tag":134,"props":4768,"children":4769},{},[4770],{"type":30,"value":4771},"September",{"type":21,"tag":134,"props":4773,"children":4774},{},[4775],{"type":30,"value":4223},{"type":21,"tag":134,"props":4777,"children":4778},{},[4779],{"type":30,"value":4780},"2740",{"type":21,"tag":134,"props":4782,"children":4783},{},[],{"type":21,"tag":134,"props":4785,"children":4786},{},[],{"type":21,"tag":134,"props":4788,"children":4789},{},[],{"type":21,"tag":134,"props":4791,"children":4792},{},[],{"type":21,"tag":134,"props":4794,"children":4795},{},[],{"type":21,"tag":134,"props":4797,"children":4798},{},[],{"type":21,"tag":134,"props":4800,"children":4801},{},[],{"type":21,"tag":134,"props":4803,"children":4804},{},[],{"type":21,"tag":134,"props":4806,"children":4807},{},[4808],{"type":30,"value":4809},"29913",{"type":21,"tag":101,"props":4811,"children":4812},{},[4813,4818,4822,4825,4828,4831,4834,4837,4840,4843,4846,4849],{"type":21,"tag":134,"props":4814,"children":4815},{},[4816],{"type":30,"value":4817},"October",{"type":21,"tag":134,"props":4819,"children":4820},{},[4821],{"type":30,"value":4244},{"type":21,"tag":134,"props":4823,"children":4824},{},[],{"type":21,"tag":134,"props":4826,"children":4827},{},[],{"type":21,"tag":134,"props":4829,"children":4830},{},[],{"type":21,"tag":134,"props":4832,"children":4833},{},[],{"type":21,"tag":134,"props":4835,"children":4836},{},[],{"type":21,"tag":134,"props":4838,"children":4839},{},[],{"type":21,"tag":134,"props":4841,"children":4842},{},[],{"type":21,"tag":134,"props":4844,"ch
ildren":4845},{},[],{"type":21,"tag":134,"props":4847,"children":4848},{},[],{"type":21,"tag":134,"props":4850,"children":4851},{},[4852],{"type":30,"value":4853},"38625",{"type":21,"tag":26,"props":4855,"children":4856},{},[4857,4859],{"type":30,"value":4858},"The MAU for October 2023 should be equal to 22 414 users. But we are not just regular Excel users, are we? Let's predict the MAU for the next year using the forecasting model ",{"type":21,"tag":272,"props":4860,"children":4861},{},[4862],{"type":30,"value":4863},"Prophet",{"type":21,"tag":44,"props":4865,"children":4867},{"id":4866},"_32-predicting-mau",[4868],{"type":30,"value":4869},"3.2 Predicting MAU",{"type":21,"tag":26,"props":4871,"children":4872},{},[4873,4875,4882],{"type":30,"value":4874},"Let's create a plot using Facebook's library for time series analysis called ",{"type":21,"tag":1957,"props":4876,"children":4879},{"href":4877,"rel":4878},"https://facebook.github.io/prophet/",[1961],[4880],{"type":30,"value":4881},"prophet",{"type":30,"value":4883},". 
Install the package by running the following code",{"type":21,"tag":490,"props":4885,"children":4888},{"className":4886,"code":4887,"language":3935,"meta":8},[3933],"install.packages(\"prophet\")\n",[4889],{"type":21,"tag":495,"props":4890,"children":4891},{"__ignoreMap":8},[4892],{"type":30,"value":4887},{"type":21,"tag":26,"props":4894,"children":4895},{},[4896,4898,4903,4904,4909],{"type":30,"value":4897},"Then run the following code, assuming our table is saved in the working directory as 'input.csv' and the year_month and MAU columns have been renamed to ",{"type":21,"tag":272,"props":4899,"children":4900},{},[4901],{"type":30,"value":4902},"ds",{"type":30,"value":3795},{"type":21,"tag":272,"props":4905,"children":4906},{},[4907],{"type":30,"value":4908},"y",{"type":30,"value":4910}," respectively.",{"type":21,"tag":490,"props":4912,"children":4915},{"className":4913,"code":4914,"language":3935,"meta":8},[3933],"# Load the library\nlibrary(prophet)\n\n# Read data from CSV file\ndf \u003C- read.csv('input.csv')\n\n# Convert the ds column to Date\ndf$ds \u003C- as.Date(df$ds)\n\n# Initialize the prophet model\nm \u003C- prophet()\n\n# Fit the model\nm \u003C- fit.prophet(m, df)\n\n# Create a dataframe for future dates\nfuture \u003C- make_future_dataframe(m, periods = 12, freq = \"month\")\n\n# Predict MAU for future dates\nforecast \u003C- predict(m, future)\n\n# Plot the forecast\nplot(m, forecast)\n",[4916],{"type":21,"tag":495,"props":4917,"children":4918},{"__ignoreMap":8},[4919],{"type":30,"value":4914},{"type":21,"tag":26,"props":4921,"children":4922},{},[4923],{"type":21,"tag":516,"props":4924,"children":4927},{"alt":4925,"src":4926},"The Prophet Time Series Forecasting Plot","/img/img24.png",[],{"type":21,"tag":26,"props":4929,"children":4930},{},[4931],{"type":30,"value":4932},"On the plot, the black dots represent the actual values, the blue line shows the model's predicted values, and the light-blue area is the 80% uncertainty interval that Prophet draws by default. 
According to this model, we are likely to exceed 25 000 MAU by the end of the year.",{"type":21,"tag":673,"props":4934,"children":4936},{"id":4935},"conclusion-or-what-could-be-done-differently",[4937],{"type":30,"value":4938},"Conclusion or What Could be Done Differently",{"type":21,"tag":26,"props":4940,"children":4941},{},[4942],{"type":30,"value":4943},"I really enjoyed completing this technical task, except maybe writing SQL statements, which is why they are not included in the article. However, there are a few things that could have been done differently.",{"type":21,"tag":264,"props":4945,"children":4946},{},[4947,4963,4968,4973],{"type":21,"tag":268,"props":4948,"children":4949},{},[4950,4952,4957,4958],{"type":30,"value":4951},"The Dashboard metrics could be organized in the ",{"type":21,"tag":272,"props":4953,"children":4954},{},[4955],{"type":30,"value":4956},"North Star",{"type":30,"value":3536},{"type":21,"tag":272,"props":4959,"children":4960},{},[4961],{"type":30,"value":4962},"MECE framework",{"type":21,"tag":268,"props":4964,"children":4965},{},[4966],{"type":30,"value":4967},"The email answer to John Doe could mention the effect on the key metrics",{"type":21,"tag":268,"props":4969,"children":4970},{},[4971],{"type":30,"value":4972},"The A/B test design could contain the Client Story and Problem",{"type":21,"tag":268,"props":4974,"children":4975},{},[4976,4978],{"type":30,"value":4977},"The MAU calculation might have taken into account the retention for each month as a cohort, so the MAU for October would be equal to ",{"type":21,"tag":272,"props":4979,"children":4980},{},[4981],{"type":30,"value":4982},"38 
625",{"type":21,"tag":93,"props":4984,"children":4985},{},[4986,5040],{"type":21,"tag":97,"props":4987,"children":4988},{},[4989],{"type":21,"tag":101,"props":4990,"children":4991},{},[4992,4996,5000,5004,5008,5012,5016,5020,5024,5028,5032,5036],{"type":21,"tag":105,"props":4993,"children":4994},{},[4995],{"type":30,"value":4277},{"type":21,"tag":105,"props":4997,"children":4998},{},[4999],{"type":30,"value":3444},{"type":21,"tag":105,"props":5001,"children":5002},{},[5003],{"type":30,"value":4286},{"type":21,"tag":105,"props":5005,"children":5006},{},[5007],{"type":30,"value":4291},{"type":21,"tag":105,"props":5009,"children":5010},{},[5011],{"type":30,"value":4296},{"type":21,"tag":105,"props":5013,"children":5014},{},[5015],{"type":30,"value":4301},{"type":21,"tag":105,"props":5017,"children":5018},{},[5019],{"type":30,"value":4306},{"type":21,"tag":105,"props":5021,"children":5022},{},[5023],{"type":30,"value":4311},{"type":21,"tag":105,"props":5025,"children":5026},{},[5027],{"type":30,"value":4311},{"type":21,"tag":105,"props":5029,"children":5030},{},[5031],{"type":30,"value":4311},{"type":21,"tag":105,"props":5033,"children":5034},{},[5035],{"type":30,"value":4311},{"type":21,"tag":105,"props":5037,"children":5038},{},[5039],{"type":30,"value":4009},{"type":21,"tag":127,"props":5041,"children":5042},{},[5043,5094,5144,5193,5241,5288,5334,5379,5423,5466],{"type":21,"tag":101,"props":5044,"children":5045},{},[5046,5050,5054,5058,5062,5066,5070,5074,5078,5082,5086,5090],{"type":21,"tag":134,"props":5047,"children":5048},{},[5049],{"type":30,"value":4338},{"type":21,"tag":134,"props":5051,"children":5052},{},[5053],{"type":30,"value":4048},{"type":21,"tag":134,"props":5055,"children":5056},{},[5057],{"type":30,"value":4347},{"type":21,"tag":134,"props":5059,"children":5060},{},[5061],{"type":30,"value":4352},{"type":21,"tag":134,"props":5063,"children":5064},{},[5065],{"type":30,"value":4357},{"type":21,"tag":134,"props":5067,"children":5068},{},[5069],{"type":3
0,"value":4362},{"type":21,"tag":134,"props":5071,"children":5072},{},[5073],{"type":30,"value":4367},{"type":21,"tag":134,"props":5075,"children":5076},{},[5077],{"type":30,"value":4372},{"type":21,"tag":134,"props":5079,"children":5080},{},[5081],{"type":30,"value":4372},{"type":21,"tag":134,"props":5083,"children":5084},{},[5085],{"type":30,"value":4372},{"type":21,"tag":134,"props":5087,"children":5088},{},[5089],{"type":30,"value":4372},{"type":21,"tag":134,"props":5091,"children":5092},{},[5093],{"type":30,"value":4048},{"type":21,"tag":101,"props":5095,"children":5096},{},[5097,5101,5105,5109,5113,5117,5121,5125,5129,5133,5137,5140],{"type":21,"tag":134,"props":5098,"children":5099},{},[5100],{"type":30,"value":4396},{"type":21,"tag":134,"props":5102,"children":5103},{},[5104],{"type":30,"value":4070},{"type":21,"tag":134,"props":5106,"children":5107},{},[5108],{"type":30,"value":4405},{"type":21,"tag":134,"props":5110,"children":5111},{},[5112],{"type":30,"value":4410},{"type":21,"tag":134,"props":5114,"children":5115},{},[5116],{"type":30,"value":4415},{"type":21,"tag":134,"props":5118,"children":5119},{},[5120],{"type":30,"value":4420},{"type":21,"tag":134,"props":5122,"children":5123},{},[5124],{"type":30,"value":4425},{"type":21,"tag":134,"props":5126,"children":5127},{},[5128],{"type":30,"value":4430},{"type":21,"tag":134,"props":5130,"children":5131},{},[5132],{"type":30,"value":4430},{"type":21,"tag":134,"props":5134,"children":5135},{},[5136],{"type":30,"value":4430},{"type":21,"tag":134,"props":5138,"children":5139},{},[],{"type":21,"tag":134,"props":5141,"children":5142},{},[5143],{"type":30,"value":4446},{"type":21,"tag":101,"props":5145,"children":5146},{},[5147,5151,5155,5159,5163,5167,5171,5175,5179,5183,5186,5189],{"type":21,"tag":134,"props":5148,"children":5149},{},[5150],{"type":30,"value":4454},{"type":21,"tag":134,"props":5152,"children":5153},{},[5154],{"type":30,"value":4092},{"type":21,"tag":134,"props":5156,"children":5157},{},[5158],
{"type":30,"value":4463},{"type":21,"tag":134,"props":5160,"children":5161},{},[5162],{"type":30,"value":4468},{"type":21,"tag":134,"props":5164,"children":5165},{},[5166],{"type":30,"value":4473},{"type":21,"tag":134,"props":5168,"children":5169},{},[5170],{"type":30,"value":4478},{"type":21,"tag":134,"props":5172,"children":5173},{},[5174],{"type":30,"value":4483},{"type":21,"tag":134,"props":5176,"children":5177},{},[5178],{"type":30,"value":4488},{"type":21,"tag":134,"props":5180,"children":5181},{},[5182],{"type":30,"value":4488},{"type":21,"tag":134,"props":5184,"children":5185},{},[],{"type":21,"tag":134,"props":5187,"children":5188},{},[],{"type":21,"tag":134,"props":5190,"children":5191},{},[5192],{"type":30,"value":4503},{"type":21,"tag":101,"props":5194,"children":5195},{},[5196,5200,5204,5208,5212,5216,5220,5224,5228,5231,5234,5237],{"type":21,"tag":134,"props":5197,"children":5198},{},[5199],{"type":30,"value":4511},{"type":21,"tag":134,"props":5201,"children":5202},{},[5203],{"type":30,"value":4114},{"type":21,"tag":134,"props":5205,"children":5206},{},[5207],{"type":30,"value":4520},{"type":21,"tag":134,"props":5209,"children":5210},{},[5211],{"type":30,"value":4525},{"type":21,"tag":134,"props":5213,"children":5214},{},[5215],{"type":30,"value":4530},{"type":21,"tag":134,"props":5217,"children":5218},{},[5219],{"type":30,"value":4535},{"type":21,"tag":134,"props":5221,"children":5222},{},[5223],{"type":30,"value":4540},{"type":21,"tag":134,"props":5225,"children":5226},{},[5227],{"type":30,"value":4545},{"type":21,"tag":134,"props":5229,"children":5230},{},[],{"type":21,"tag":134,"props":5232,"children":5233},{},[],{"type":21,"tag":134,"props":5235,"children":5236},{},[],{"type":21,"tag":134,"props":5238,"children":5239},{},[5240],{"type":30,"value":4559},{"type":21,"tag":101,"props":5242,"children":5243},{},[5244,5248,5252,5256,5260,5264,5268,5272,5275,5278,5281,5284],{"type":21,"tag":134,"props":5245,"children":5246},{},[5247],{"type":30,"value":45
67},{"type":21,"tag":134,"props":5249,"children":5250},{},[5251],{"type":30,"value":4136},{"type":21,"tag":134,"props":5253,"children":5254},{},[5255],{"type":30,"value":4576},{"type":21,"tag":134,"props":5257,"children":5258},{},[5259],{"type":30,"value":4581},{"type":21,"tag":134,"props":5261,"children":5262},{},[5263],{"type":30,"value":4586},{"type":21,"tag":134,"props":5265,"children":5266},{},[5267],{"type":30,"value":4591},{"type":21,"tag":134,"props":5269,"children":5270},{},[5271],{"type":30,"value":4596},{"type":21,"tag":134,"props":5273,"children":5274},{},[],{"type":21,"tag":134,"props":5276,"children":5277},{},[],{"type":21,"tag":134,"props":5279,"children":5280},{},[],{"type":21,"tag":134,"props":5282,"children":5283},{},[],{"type":21,"tag":134,"props":5285,"children":5286},{},[5287],{"type":30,"value":4613},{"type":21,"tag":101,"props":5289,"children":5290},{},[5291,5295,5299,5303,5307,5311,5315,5318,5321,5324,5327,5330],{"type":21,"tag":134,"props":5292,"children":5293},{},[5294],{"type":30,"value":4621},{"type":21,"tag":134,"props":5296,"children":5297},{},[5298],{"type":30,"value":4158},{"type":21,"tag":134,"props":5300,"children":5301},{},[5302],{"type":30,"value":4630},{"type":21,"tag":134,"props":5304,"children":5305},{},[5306],{"type":30,"value":4635},{"type":21,"tag":134,"props":5308,"children":5309},{},[5310],{"type":30,"value":4640},{"type":21,"tag":134,"props":5312,"children":5313},{},[5314],{"type":30,"value":4645},{"type":21,"tag":134,"props":5316,"children":5317},{},[],{"type":21,"tag":134,"props":5319,"children":5320},{},[],{"type":21,"tag":134,"props":5322,"children":5323},{},[],{"type":21,"tag":134,"props":5325,"children":5326},{},[],{"type":21,"tag":134,"props":5328,"children":5329},{},[],{"type":21,"tag":134,"props":5331,"children":5332},{},[5333],{"type":30,"value":4665},{"type":21,"tag":101,"props":5335,"children":5336},{},[5337,5341,5345,5349,5353,5357,5360,5363,5366,5369,5372,5375],{"type":21,"tag":134,"props":5338,"children":53
39},{},[5340],{"type":30,"value":4673},{"type":21,"tag":134,"props":5342,"children":5343},{},[5344],{"type":30,"value":4180},{"type":21,"tag":134,"props":5346,"children":5347},{},[5348],{"type":30,"value":4682},{"type":21,"tag":134,"props":5350,"children":5351},{},[5352],{"type":30,"value":4687},{"type":21,"tag":134,"props":5354,"children":5355},{},[5356],{"type":30,"value":4692},{"type":21,"tag":134,"props":5358,"children":5359},{},[],{"type":21,"tag":134,"props":5361,"children":5362},{},[],{"type":21,"tag":134,"props":5364,"children":5365},{},[],{"type":21,"tag":134,"props":5367,"children":5368},{},[],{"type":21,"tag":134,"props":5370,"children":5371},{},[],{"type":21,"tag":134,"props":5373,"children":5374},{},[],{"type":21,"tag":134,"props":5376,"children":5377},{},[5378],{"type":30,"value":4715},{"type":21,"tag":101,"props":5380,"children":5381},{},[5382,5386,5390,5394,5398,5401,5404,5407,5410,5413,5416,5419],{"type":21,"tag":134,"props":5383,"children":5384},{},[5385],{"type":30,"value":4723},{"type":21,"tag":134,"props":5387,"children":5388},{},[5389],{"type":30,"value":4202},{"type":21,"tag":134,"props":5391,"children":5392},{},[5393],{"type":30,"value":4732},{"type":21,"tag":134,"props":5395,"children":5396},{},[5397],{"type":30,"value":4737},{"type":21,"tag":134,"props":5399,"children":5400},{},[],{"type":21,"tag":134,"props":5402,"children":5403},{},[],{"type":21,"tag":134,"props":5405,"children":5406},{},[],{"type":21,"tag":134,"props":5408,"children":5409},{},[],{"type":21,"tag":134,"props":5411,"children":5412},{},[],{"type":21,"tag":134,"props":5414,"children":5415},{},[],{"type":21,"tag":134,"props":5417,"children":5418},{},[],{"type":21,"tag":134,"props":5420,"children":5421},{},[5422],{"type":30,"value":4763},{"type":21,"tag":101,"props":5424,"children":5425},{},[5426,5430,5434,5438,5441,5444,5447,5450,5453,5456,5459,5462],{"type":21,"tag":134,"props":5427,"children":5428},{},[5429],{"type":30,"value":4771},{"type":21,"tag":134,"props":5431,"childre
n":5432},{},[5433],{"type":30,"value":4223},{"type":21,"tag":134,"props":5435,"children":5436},{},[5437],{"type":30,"value":4780},{"type":21,"tag":134,"props":5439,"children":5440},{},[],{"type":21,"tag":134,"props":5442,"children":5443},{},[],{"type":21,"tag":134,"props":5445,"children":5446},{},[],{"type":21,"tag":134,"props":5448,"children":5449},{},[],{"type":21,"tag":134,"props":5451,"children":5452},{},[],{"type":21,"tag":134,"props":5454,"children":5455},{},[],{"type":21,"tag":134,"props":5457,"children":5458},{},[],{"type":21,"tag":134,"props":5460,"children":5461},{},[],{"type":21,"tag":134,"props":5463,"children":5464},{},[5465],{"type":30,"value":4809},{"type":21,"tag":101,"props":5467,"children":5468},{},[5469,5473,5477,5480,5483,5486,5489,5492,5495,5498,5501,5504],{"type":21,"tag":134,"props":5470,"children":5471},{},[5472],{"type":30,"value":4817},{"type":21,"tag":134,"props":5474,"children":5475},{},[5476],{"type":30,"value":4244},{"type":21,"tag":134,"props":5478,"children":5479},{},[],{"type":21,"tag":134,"props":5481,"children":5482},{},[],{"type":21,"tag":134,"props":5484,"children":5485},{},[],{"type":21,"tag":134,"props":5487,"children":5488},{},[],{"type":21,"tag":134,"props":5490,"children":5491},{},[],{"type":21,"tag":134,"props":5493,"children":5494},{},[],{"type":21,"tag":134,"props":5496,"children":5497},{},[],{"type":21,"tag":134,"props":5499,"children":5500},{},[],{"type":21,"tag":134,"props":5502,"children":5503},{},[],{"type":21,"tag":134,"props":5505,"children":5506},{},[5507],{"type":30,"value":4853},{"type":21,"tag":26,"props":5509,"children":5510},{},[5511],{"type":30,"value":5512},"I always find SQL super boring in technical tasks, because there is no real reward, whereas in real life extracting the right data creates the fuel for your model, dashboard or whatever data product you might need.",{"type":21,"tag":26,"props":5514,"children":5515},{},[5516,5518,5525],{"type":30,"value":5517},"However if you are 
interested in learning SQL, which is crucial for data jobs, I'd highly recommend the free resources by ",{"type":21,"tag":1957,"props":5519,"children":5522},{"href":5520,"rel":5521},"https://dataacademy.kz",[1961],[5523],{"type":30,"value":5524},"DataAcademy",{"type":30,"value":5526},". The content is clean, with no ads or paid subscriptions. So definitely check it out!",{"type":21,"tag":26,"props":5528,"children":5529},{},[5530,5531,5534],{"type":30,"value":2215},{"type":21,"tag":2217,"props":5532,"children":5533},{},[],{"type":30,"value":2221},{"title":8,"searchDepth":596,"depth":596,"links":5536},[5537,5543,5549,5552,5553,5554],{"id":3374,"depth":596,"text":3377,"children":5538},[5539,5540,5541,5542],{"id":3416,"depth":1260,"text":3419},{"id":3513,"depth":1260,"text":3516},{"id":3551,"depth":1260,"text":3554},{"id":3580,"depth":1260,"text":3583},{"id":3627,"depth":596,"text":3630,"children":5544},[5545,5546,5547,5548],{"id":3646,"depth":1260,"text":3649},{"id":3657,"depth":1260,"text":3660},{"id":3668,"depth":1260,"text":3671},{"id":3679,"depth":1260,"text":3682},{"id":3690,"depth":596,"text":3693,"children":5550},[5551],{"id":3773,"depth":1260,"text":3776},{"id":3972,"depth":596,"text":3975},{"id":4255,"depth":596,"text":4258},{"id":4866,"depth":596,"text":4869,"children":5555},[5556],{"id":4935,"depth":1260,"text":4938},"content:posts:product-analyst-technical-task.md","posts/product-analyst-technical-task.md","posts/product-analyst-technical-task",{"_path":5561,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":5562,"description":5563,"date":5564,"draft":7,"tags":5565,"thumbnail":5567,"alt_description":5568,"slug":5569,"body":5570,"_type":604,"_id":6604,"_source":606,"_file":6605,"_stem":6606,"_extension":609},"/posts/statistical-data-analysis-101","Statistical Data Analysis 101","Learn about Statistical Analysis, Central Tendency, Variability, Variance, and back 
propagation.","2024-02-01T00:00:00.000Z",[5566],"stats","/img/statistical_data_analysis_101.png","Statistical Data Analysis in 4 steps","statistical-data-analysis-101",{"type":18,"children":5571,"toc":6584},[5572,5578,5600,5611,5620,5632,5638,5643,5649,5654,5662,5670,5674,5717,5723,5734,5742,5747,5756,5764,5769,5774,5780,5792,5800,5808,5813,5821,5829,5834,5843,5851,5856,5862,5874,5883,5891,5896,5905,5913,5918,5924,5929,6076,6081,6087,6092,6100,6108,6116,6124,6130,6135,6143,6151,6156,6161,6170,6182,6187,6196,6205,6213,6238,6243,6248,6254,6259,6265,6270,6278,6286,6290,6327,6332,6340,6345,6353,6361,6366,6376,6384,6394,6402,6407,6413,6418,6423,6428,6436,6441,6447,6458,6466,6474,6479,6487,6493,6510,6518,6526,6531,6537,6542,6550,6558,6563,6571,6576],{"type":21,"tag":44,"props":5573,"children":5575},{"id":5574},"bloody-beginners-guide-part-i",[5576],{"type":30,"value":5577},"Bloody Beginners Guide. Part I",{"type":21,"tag":26,"props":5579,"children":5580},{},[5581,5583,5590,5592,5598],{"type":30,"value":5582},"In this article we will take a look at the basic concepts of Statistical Analysis. We will be using the R programming language to visualize essential details, so download ",{"type":21,"tag":1957,"props":5584,"children":5587},{"href":5585,"rel":5586},"https://posit.co/download/rstudio-desktop/",[1961],[5588],{"type":30,"value":5589},"RStudio Desktop",{"type":30,"value":5591}," and install both R and RStudio. If you are not comfortable with programming at all, check out my previous post about ",{"type":21,"tag":1957,"props":5593,"children":5595},{"href":2948,"rel":5594},[1961],[5596],{"type":30,"value":5597},"how to start coding",{"type":30,"value":5599}," and learn the basics of using the Terminal and VS Code. Even though it doesn't cover R, it's still a good place to start.",{"type":21,"tag":26,"props":5601,"children":5602},{},[5603,5605,5610],{"type":30,"value":5604},"Now let's kick off with the following command to generate a random dataset. 
Paste the entire block of code into the RStudio editor, then select the code you want to run and press ",{"type":21,"tag":628,"props":5606,"children":5607},{},[5608],{"type":30,"value":5609},"Ctrl + Enter",{"type":30,"value":3759},{"type":21,"tag":490,"props":5612,"children":5615},{"code":5613,"language":3935,"meta":8,"className":5614},"set.seed(123)  # Allows you to reproduce the same random data as mine\ndata \u003C- rnorm(100, mean = 50, sd = 10) # Generating 100 normally distributed random values\ndata # Printing the dataset to the console\n",[3933],[5616],{"type":21,"tag":495,"props":5617,"children":5618},{"__ignoreMap":8},[5619],{"type":30,"value":5613},{"type":21,"tag":26,"props":5621,"children":5622},{},[5623,5625,5630],{"type":30,"value":5624},"Everything after the hash sign ",{"type":21,"tag":272,"props":5626,"children":5627},{},[5628],{"type":30,"value":5629},"#",{"type":30,"value":5631}," is a comment for you to read; it's not part of the code.",{"type":21,"tag":44,"props":5633,"children":5635},{"id":5634},"measure-of-central-tendency",[5636],{"type":30,"value":5637},"Measure of Central Tendency",{"type":21,"tag":26,"props":5639,"children":5640},{},[5641],{"type":30,"value":5642},"The most basic concept of statistical data analysis is the typical, or expected, value. Let's find the mean and the median of the dataset we've just created. Building visual graphs helps to make things clear at a glance. We will start with the mean.",{"type":21,"tag":673,"props":5644,"children":5646},{"id":5645},"arithmetic-mean",[5647],{"type":30,"value":5648},"Arithmetic Mean",{"type":21,"tag":26,"props":5650,"children":5651},{},[5652],{"type":30,"value":5653},"The mean, of course, is just the ratio between the sum of the values and the total number of observations. It sounds simple, but look at this scary formula. 
It's worth breaking it down and memorizing, as you will see it more often on your learning journey.",{"type":21,"tag":26,"props":5655,"children":5656},{},[5657],{"type":21,"tag":272,"props":5658,"children":5659},{},[5660],{"type":30,"value":5661},"Mean Formula:",{"type":21,"tag":490,"props":5663,"children":5665},{"code":5664},"x̄ = (1/n) × Σ(xᵢ) for i=1 to n\n",[5666],{"type":21,"tag":495,"props":5667,"children":5668},{"__ignoreMap":8},[5669],{"type":30,"value":5664},{"type":21,"tag":26,"props":5671,"children":5672},{},[5673],{"type":30,"value":703},{"type":21,"tag":264,"props":5675,"children":5676},{},[5677,5687,5697,5707],{"type":21,"tag":268,"props":5678,"children":5679},{},[5680,5685],{"type":21,"tag":272,"props":5681,"children":5682},{},[5683],{"type":30,"value":5684},"x̄",{"type":30,"value":5686}," (x-bar) = the mean",{"type":21,"tag":268,"props":5688,"children":5689},{},[5690,5695],{"type":21,"tag":272,"props":5691,"children":5692},{},[5693],{"type":30,"value":5694},"n",{"type":30,"value":5696}," = number of observations",{"type":21,"tag":268,"props":5698,"children":5699},{},[5700,5705],{"type":21,"tag":272,"props":5701,"children":5702},{},[5703],{"type":30,"value":5704},"Σ",{"type":30,"value":5706}," (sigma) = sum of all values",{"type":21,"tag":268,"props":5708,"children":5709},{},[5710,5715],{"type":21,"tag":272,"props":5711,"children":5712},{},[5713],{"type":30,"value":5714},"xᵢ",{"type":30,"value":5716}," = individual values",{"type":21,"tag":3778,"props":5718,"children":5720},{"id":5719},"formula-explanation",[5721],{"type":30,"value":5722},"Formula Explanation",{"type":21,"tag":26,"props":5724,"children":5725},{},[5726,5728,5732],{"type":30,"value":5727},"The mean, denoted by ",{"type":21,"tag":272,"props":5729,"children":5730},{},[5731],{"type":30,"value":5684},{"type":30,"value":5733},", equals the sum of all values (expressed by Σ from i=1 to n of xᵢ), and the multiplication by (1/n) is merely a division by the number of observations. 
It's fair to read this formula as:",{"type":21,"tag":490,"props":5735,"children":5737},{"code":5736},"x̄ = (x₁ + x₂ + ... + xₙ) / n\n",[5738],{"type":21,"tag":495,"props":5739,"children":5740},{"__ignoreMap":8},[5741],{"type":30,"value":5736},{"type":21,"tag":26,"props":5743,"children":5744},{},[5745],{"type":30,"value":5746},"Now, enough math! Let's visualize our dataset using a histogram. We will also draw two vertical lines: a red line indicating the mean and a green line representing the median.",{"type":21,"tag":490,"props":5748,"children":5751},{"code":5749,"language":3935,"meta":8,"className":5750},"# Plotting the data\nhist(data, main = \"Measure of Central Tendency\", \n           xlab = \"Values\", \n           ylab = \"Observations\",\n           col = \"lightblue\", border = \"black\")\n\n# Adding vertical lines for the mean and median\nabline(v = mean(data), col = \"red\", lwd = 2)\nabline(v = median(data), col = \"green\", lwd = 2)\n# Printing the mean and median values\nprint(mean(data))\nprint(median(data))\n",[3933],[5752],{"type":21,"tag":495,"props":5753,"children":5754},{"__ignoreMap":8},[5755],{"type":30,"value":5749},{"type":21,"tag":26,"props":5757,"children":5758},{},[5759],{"type":21,"tag":516,"props":5760,"children":5763},{"alt":5761,"src":5762},"Normal Distribution with mean and median","/img/img12.png",[],{"type":21,"tag":26,"props":5765,"children":5766},{},[5767],{"type":30,"value":5768},"Looking at the graph, we can see the mean and median both near the center, as expected. 
The console printed $50.90 and $50.61 as the mean and median respectively.",{"type":21,"tag":26,"props":5770,"children":5771},{},[5772],{"type":30,"value":5773},"But what is this median and why do we need it?",{"type":21,"tag":673,"props":5775,"children":5777},{"id":5776},"median",[5778],{"type":30,"value":5779},"Median",{"type":21,"tag":26,"props":5781,"children":5782},{},[5783,5785,5790],{"type":30,"value":5784},"Unlike the mean, the median (denoted by ",{"type":21,"tag":272,"props":5786,"children":5787},{},[5788],{"type":30,"value":5789},"x̃",{"type":30,"value":5791},") is the exact middle value in the dataset. It's pretty straightforward with an odd number of observations: you just sort the spendings in ascending order and pick the value in the middle. As the formula suggests, divide the number of observations by 2 and round up; that gives the position of the median in the sorted list.",{"type":21,"tag":26,"props":5793,"children":5794},{},[5795],{"type":21,"tag":272,"props":5796,"children":5797},{},[5798],{"type":30,"value":5799},"Median Formula (odd number of values):",{"type":21,"tag":490,"props":5801,"children":5803},{"code":5802},"x̃ = x₍ₙ₊₁₎/₂\n",[5804],{"type":21,"tag":495,"props":5805,"children":5806},{"__ignoreMap":8},[5807],{"type":30,"value":5802},{"type":21,"tag":26,"props":5809,"children":5810},{},[5811],{"type":30,"value":5812},"However, our dataset has an even number of values: 100. In this case, dividing by 2 does not land on a single middle value, so we have to take the arithmetic mean of the two values closest to the middle. 
That would be the sum of the 50th and 51st sorted values, divided by 2.",{"type":21,"tag":26,"props":5814,"children":5815},{},[5816],{"type":21,"tag":272,"props":5817,"children":5818},{},[5819],{"type":30,"value":5820},"Median Formula (even number of values):",{"type":21,"tag":490,"props":5822,"children":5824},{"code":5823},"x̃ = (x₍ₙ/₂₎ + x₍ₙ/₂₊₁₎) / 2\n",[5825],{"type":21,"tag":495,"props":5826,"children":5827},{"__ignoreMap":8},[5828],{"type":30,"value":5823},{"type":21,"tag":26,"props":5830,"children":5831},{},[5832],{"type":30,"value":5833},"Use the following script to get a new random distribution, but a less symmetric one this time.",{"type":21,"tag":490,"props":5835,"children":5838},{"code":5836,"language":3935,"meta":8,"className":5837},"set.seed(123)  # Keep this for reproducibility\n# Generating a new random dataset with a longer tail on one side\ndata \u003C- c(rnorm(80, mean = 50, sd = 10), rnorm(20, mean = 80, sd = 10))\n\n# Plotting the data\nhist(data, main = \"Measure of Central Tendency\", \n           xlab = \"Values\", \n           ylab = \"Observations\",col = \"lightblue\", border = \"black\")\n\n# Adding vertical lines for the mean and median\nabline(v = mean(data), col = \"red\", lwd = 2)\nabline(v = median(data), col = \"green\", lwd = 2)\n# Printing the mean and median values with labels\ncat(\"Mean:\", mean(data), \"\\n\")\ncat(\"Median:\", median(data), \"\\n\")\n",[3933],[5839],{"type":21,"tag":495,"props":5840,"children":5841},{"__ignoreMap":8},[5842],{"type":30,"value":5836},{"type":21,"tag":26,"props":5844,"children":5845},{},[5846],{"type":21,"tag":516,"props":5847,"children":5850},{"alt":5848,"src":5849},"Not Normal Distribution with mean and median","/img/img13.png",[],{"type":21,"tag":26,"props":5852,"children":5853},{},[5854],{"type":30,"value":5855},"Note how the median and mean are now further apart: $52.78 and $56.90 respectively. That's because we've added a group of much higher spendings, centred around $80. 
And that's why the median is often preferred: it's resistant to outliers and always points to the middle value, unlike the mean.",{"type":21,"tag":44,"props":5857,"children":5859},{"id":5858},"quantiles",[5860],{"type":30,"value":5861},"Quantiles",{"type":21,"tag":26,"props":5863,"children":5864},{},[5865,5867,5872],{"type":30,"value":5866},"Quantiles are cut points that divide the dataset into parts with an equal number of observations. The median is a particular case of a quantile: it divides the dataset into two halves. Let's create a couple of variables using the ",{"type":21,"tag":272,"props":5868,"children":5869},{},[5870],{"type":30,"value":5871},"quantile()",{"type":30,"value":5873}," function.",{"type":21,"tag":490,"props":5875,"children":5878},{"code":5876,"language":3935,"meta":8,"className":5877},"# Defining the median using the quantile() function\nmy_median \u003C- quantile(data, probs = c(0.50))\n\n# Creating histogram plot\nhist(data, main = \"Median\", \n     xlab = \"Values\", \n     ylab = \"Observations\",col = \"lightblue\", border = \"black\")\n\n# Drawing the median line\nabline(v = my_median, col = \"orange\", lwd = 2)\n",[3933],[5879],{"type":21,"tag":495,"props":5880,"children":5881},{"__ignoreMap":8},[5882],{"type":30,"value":5876},{"type":21,"tag":26,"props":5884,"children":5885},{},[5886],{"type":21,"tag":516,"props":5887,"children":5890},{"alt":5888,"src":5889},"Displaying Median using quantiles function","/img/img14.png",[],{"type":21,"tag":26,"props":5892,"children":5893},{},[5894],{"type":30,"value":5895},"In this way we divide the dataset into two parts: everything above the median and everything below it. The dataset may be sliced into any number of such parts. 
Let's cut it into four quartiles.",{"type":21,"tag":490,"props":5897,"children":5900},{"code":5898,"language":3935,"meta":8,"className":5899},"# Defining quartiles\nquartile \u003C- quantile(data, probs = c(0.25,0.50, 0.75))\n\n# Creating histogram plot\nhist(data, main = \"Quartiles\", \n     xlab = \"Values\", \n     ylab = \"Observations\",col = \"lightblue\", border = \"black\")\n\n# Drawing the quartile lines\nabline(v = quartile, col = \"orange\", lwd =2)\n",[3933],[5901],{"type":21,"tag":495,"props":5902,"children":5903},{"__ignoreMap":8},[5904],{"type":30,"value":5898},{"type":21,"tag":26,"props":5906,"children":5907},{},[5908],{"type":21,"tag":516,"props":5909,"children":5912},{"alt":5910,"src":5911},"Displaying quartiles using the quantile function","/img/img15.png",[],{"type":21,"tag":26,"props":5914,"children":5915},{},[5916],{"type":30,"value":5917},"Now we have four parts that are equal in terms of the number of values, but notice the difference in range. We can keep slicing and dicing the distribution into quintiles, deciles or even percentiles if we need to. But we will move on to the next concept.",{"type":21,"tag":44,"props":5919,"children":5921},{"id":5920},"measure-of-variability",[5922],{"type":30,"value":5923},"Measure of variability",{"type":21,"tag":26,"props":5925,"children":5926},{},[5927],{"type":30,"value":5928},"Central tendency is not the only measure to look at when analyzing data. Now imagine you've been sent on a business trip to three different cities. 
Let's check the weather to decide how to dress and what clothes to pack in your luggage.",{"type":21,"tag":93,"props":5930,"children":5931},{},[5932,5973],{"type":21,"tag":97,"props":5933,"children":5934},{},[5935],{"type":21,"tag":101,"props":5936,"children":5937},{},[5938,5943,5948,5953,5958,5963,5968],{"type":21,"tag":105,"props":5939,"children":5940},{},[5941],{"type":30,"value":5942},"City",{"type":21,"tag":105,"props":5944,"children":5945},{"align":112},[5946],{"type":30,"value":5947},"Monday",{"type":21,"tag":105,"props":5949,"children":5950},{"align":112},[5951],{"type":30,"value":5952},"Tuesday",{"type":21,"tag":105,"props":5954,"children":5955},{"align":112},[5956],{"type":30,"value":5957},"Wednesday",{"type":21,"tag":105,"props":5959,"children":5960},{"align":112},[5961],{"type":30,"value":5962},"Thursday",{"type":21,"tag":105,"props":5964,"children":5965},{"align":112},[5966],{"type":30,"value":5967},"Friday",{"type":21,"tag":105,"props":5969,"children":5970},{"align":112},[5971],{"type":30,"value":5972},"Saturday",{"type":21,"tag":127,"props":5974,"children":5975},{},[5976,6008,6041],{"type":21,"tag":101,"props":5977,"children":5978},{},[5979,5984,5988,5992,5996,6000,6004],{"type":21,"tag":134,"props":5980,"children":5981},{},[5982],{"type":30,"value":5983},"Lulea",{"type":21,"tag":134,"props":5985,"children":5986},{"align":112},[5987],{"type":30,"value":138},{"type":21,"tag":134,"props":5989,"children":5990},{"align":112},[5991],{"type":30,"value":138},{"type":21,"tag":134,"props":5993,"children":5994},{"align":112},[5995],{"type":30,"value":138},{"type":21,"tag":134,"props":5997,"children":5998},{"align":112},[5999],{"type":30,"value":138},{"type":21,"tag":134,"props":6001,"children":6002},{"align":112},[6003],{"type":30,"value":138},{"type":21,"tag":134,"props":6005,"children":6006},{"align":112},[6007],{"type":30,"value":138},{"type":21,"tag":101,"props":6009,"children":6010},{},[6011,6016,6021,6025,6029,6033,6037],{"type":21,"tag":1
34,"props":6012,"children":6013},{},[6014],{"type":30,"value":6015},"Columbus",{"type":21,"tag":134,"props":6017,"children":6018},{"align":112},[6019],{"type":30,"value":6020},"-5",{"type":21,"tag":134,"props":6022,"children":6023},{"align":112},[6024],{"type":30,"value":165},{"type":21,"tag":134,"props":6026,"children":6027},{"align":112},[6028],{"type":30,"value":6020},{"type":21,"tag":134,"props":6030,"children":6031},{"align":112},[6032],{"type":30,"value":165},{"type":21,"tag":134,"props":6034,"children":6035},{"align":112},[6036],{"type":30,"value":6020},{"type":21,"tag":134,"props":6038,"children":6039},{"align":112},[6040],{"type":30,"value":165},{"type":21,"tag":101,"props":6042,"children":6043},{},[6044,6049,6054,6059,6064,6068,6072],{"type":21,"tag":134,"props":6045,"children":6046},{},[6047],{"type":30,"value":6048},"Dublin",{"type":21,"tag":134,"props":6050,"children":6051},{"align":112},[6052],{"type":30,"value":6053},"-30",{"type":21,"tag":134,"props":6055,"children":6056},{"align":112},[6057],{"type":30,"value":6058},"-10",{"type":21,"tag":134,"props":6060,"children":6061},{"align":112},[6062],{"type":30,"value":6063},"10",{"type":21,"tag":134,"props":6065,"children":6066},{"align":112},[6067],{"type":30,"value":6063},{"type":21,"tag":134,"props":6069,"children":6070},{"align":112},[6071],{"type":30,"value":6063},{"type":21,"tag":134,"props":6073,"children":6074},{"align":112},[6075],{"type":30,"value":6063},{"type":21,"tag":26,"props":6077,"children":6078},{},[6079],{"type":30,"value":6080},"If you take the weekly average temperature for Lulea, Columbus and Dublin, you will get exactly the same 0°C in every city. 
The mean is not much help in this situation; what we do care about is the range, or spread, of the values.",{"type":21,"tag":673,"props":6082,"children":6084},{"id":6083},"range",[6085],{"type":30,"value":6086},"Range",{"type":21,"tag":26,"props":6088,"children":6089},{},[6090],{"type":30,"value":6091},"Calculating the range is pretty simple. You sort the values in ascending order, then find the difference between the maximum and minimum values. However, be aware that, just like the mean, the range is quite sensitive to outliers.",{"type":21,"tag":26,"props":6093,"children":6094},{},[6095],{"type":21,"tag":272,"props":6096,"children":6097},{},[6098],{"type":30,"value":6099},"Range Formula:",{"type":21,"tag":490,"props":6101,"children":6103},{"code":6102},"Sort values: x₁ ≤ x₂ ≤ ... ≤ xₙ\nRange (R) = xₙ - x₁\n",[6104],{"type":21,"tag":495,"props":6105,"children":6106},{"__ignoreMap":8},[6107],{"type":30,"value":6102},{"type":21,"tag":26,"props":6109,"children":6110},{},[6111],{"type":21,"tag":272,"props":6112,"children":6113},{},[6114],{"type":30,"value":6115},"Example:",{"type":21,"tag":490,"props":6117,"children":6119},{"code":6118},"R(Dublin) = 10 - (-30) = 40\n",[6120],{"type":21,"tag":495,"props":6121,"children":6122},{"__ignoreMap":8},[6123],{"type":30,"value":6118},{"type":21,"tag":673,"props":6125,"children":6127},{"id":6126},"interquartile-range",[6128],{"type":30,"value":6129},"Interquartile Range",{"type":21,"tag":26,"props":6131,"children":6132},{},[6133],{"type":30,"value":6134},"Since outliers are extreme values, they lie at either edge of the dataset, i.e. within the first or fourth quartile. That's why the Interquartile Range, or IQR for short, is used to reduce the impact of outliers. 
We slice the dataset to contain only the second and third quartiles, and that is the sweet spot of the data for us to draw conclusions from.",{"type":21,"tag":26,"props":6136,"children":6137},{},[6138],{"type":21,"tag":272,"props":6139,"children":6140},{},[6141],{"type":30,"value":6142},"IQR Formula:",{"type":21,"tag":490,"props":6144,"children":6146},{"code":6145},"IQR = x̃₀.₇₅ - x̃₀.₂₅\n",[6147],{"type":21,"tag":495,"props":6148,"children":6149},{"__ignoreMap":8},[6150],{"type":30,"value":6145},{"type":21,"tag":26,"props":6152,"children":6153},{},[6154],{"type":30,"value":6155},"(This is the 75th percentile minus the 25th percentile)",{"type":21,"tag":26,"props":6157,"children":6158},{},[6159],{"type":30,"value":6160},"Let's create a new dataset of order transactions and group it by payment provider. Run the following script to create the data frame.",{"type":21,"tag":490,"props":6162,"children":6165},{"code":6163,"language":3935,"meta":8,"className":6164},"set.seed(123) # Keep this for reproducibility\n\n# Create a new dataset for each payment provider (the last value is an extra point)\ndata1 \u003C- c(rnorm(50, mean = 150, sd = 30), rnorm(50, mean = 200, sd = 40), 300)\ndata2 \u003C- c(rnorm(50, mean = 180, sd = 20), rnorm(50, mean = 220, sd = 30), 250)\ndata3 \u003C- c(rnorm(50, mean = 160, sd = 25), rnorm(50, mean = 210, sd = 35), 270)\n\n# Create the data frame\ndf \u003C- data.frame(\n  Group = rep(c(\"Jusan\", \"Halyk\", \"Kaspi\"), each = 101),\n  Values = c(data1, data2, data3)\n)\n",[3933],[6166],{"type":21,"tag":495,"props":6167,"children":6168},{"__ignoreMap":8},[6169],{"type":30,"value":6163},{"type":21,"tag":26,"props":6171,"children":6172},{},[6173,6175,6180],{"type":30,"value":6174},"Next we will use the ",{"type":21,"tag":272,"props":6176,"children":6177},{},[6178],{"type":30,"value":6179},"'ggplot2'",{"type":30,"value":6181}," visualization library to create a boxplot that will enable us to assess measures of variability and central tendency 
alike.",{"type":21,"tag":26,"props":6183,"children":6184},{},[6185],{"type":30,"value":6186},"The installation would be required first, and then you can run the boxplot script.",{"type":21,"tag":490,"props":6188,"children":6191},{"code":6189,"language":3935,"meta":8,"className":6190},"# Installing ggplot2 library\ninstall.packages(\"ggplot2\")\n# Importing it\nlibrary(ggplot2)\n",[3933],[6192],{"type":21,"tag":495,"props":6193,"children":6194},{"__ignoreMap":8},[6195],{"type":30,"value":6189},{"type":21,"tag":490,"props":6197,"children":6200},{"code":6198,"language":3935,"meta":8,"className":6199},"# Creating a boxplot\nggplot(df, aes(x = Group, y = Values, fill = Group)) +\n  geom_boxplot(notch = TRUE, color = \"darkblue\", alpha = 0.7) +\n  labs(title = \"Comparison of Three Groups\",\n       y = \"Values\", x = \"Groups\") +\n  theme_bw() +\n  scale_fill_manual(values = c(\"Jusan\" = \"darkorange\", \"Halyk\" = \"darkgreen\", \"Kaspi\" = \"#bd1206\"))\n",[3933],[6201],{"type":21,"tag":495,"props":6202,"children":6203},{"__ignoreMap":8},[6204],{"type":30,"value":6198},{"type":21,"tag":26,"props":6206,"children":6207},{},[6208],{"type":21,"tag":516,"props":6209,"children":6212},{"alt":6210,"src":6211},"The boxplot displaying payment providers","/img/img16.png",[],{"type":21,"tag":26,"props":6214,"children":6215},{},[6216,6218,6223,6225,6230,6231,6236],{"type":30,"value":6217},"Looking at this graph we can see three different plots representing ",{"type":21,"tag":272,"props":6219,"children":6220},{},[6221],{"type":30,"value":6222},"'Halyk'",{"type":30,"value":6224},", ",{"type":21,"tag":272,"props":6226,"children":6227},{},[6228],{"type":30,"value":6229},"'Jusan'",{"type":30,"value":3809},{"type":21,"tag":272,"props":6232,"children":6233},{},[6234],{"type":30,"value":6235},"'Kaspi'",{"type":30,"value":6237}," in green, orange and red respectively.",{"type":21,"tag":26,"props":6239,"children":6240},{},[6241],{"type":30,"value":6242},"Think of the sandglass shape 
like this: the middle line is the median, the whole sandglass thing is the IQR. The bottom part is the second quartile, the top part is the third quartile, above it is the fourth quartile, and below the IQR is the first quartile. And the outlier? It's the lonely point separated from the Halyk group's boxplot.",{"type":21,"tag":26,"props":6244,"children":6245},{},[6246],{"type":30,"value":6247},"If you made it this far, then congrats, fellow! You've learned how to perform a decent descriptive analysis using measures of Central Tendency and Variability. The next section gets a little bit tricky, but it's pretty exciting for nerds like me.",{"type":21,"tag":44,"props":6249,"children":6251},{"id":6250},"variance",[6252],{"type":30,"value":6253},"Variance",{"type":21,"tag":26,"props":6255,"children":6256},{},[6257],{"type":30,"value":6258},"Sometimes we have to understand how far the values in the dataset are from the mean or median. That helps in both Statistics and Machine Learning.",{"type":21,"tag":673,"props":6260,"children":6262},{"id":6261},"absolute-deviation",[6263],{"type":30,"value":6264},"Absolute Deviation",{"type":21,"tag":26,"props":6266,"children":6267},{},[6268],{"type":30,"value":6269},"The absolute deviation is based on the sum of the differences between every value in the dataset and the average, divided by the total number of observations. 
The math is pretty straightforward.",{"type":21,"tag":26,"props":6271,"children":6272},{},[6273],{"type":21,"tag":272,"props":6274,"children":6275},{},[6276],{"type":30,"value":6277},"Absolute Deviation Formula:",{"type":21,"tag":490,"props":6279,"children":6281},{"code":6280},"D = Σ(xᵢ - A) / n   for i=1 to n\n",[6282],{"type":21,"tag":495,"props":6283,"children":6284},{"__ignoreMap":8},[6285],{"type":30,"value":6280},{"type":21,"tag":26,"props":6287,"children":6288},{},[6289],{"type":30,"value":703},{"type":21,"tag":264,"props":6291,"children":6292},{},[6293,6302,6311,6319],{"type":21,"tag":268,"props":6294,"children":6295},{},[6296,6300],{"type":21,"tag":272,"props":6297,"children":6298},{},[6299],{"type":30,"value":3745},{"type":30,"value":6301}," = deviation",{"type":21,"tag":268,"props":6303,"children":6304},{},[6305,6309],{"type":21,"tag":272,"props":6306,"children":6307},{},[6308],{"type":30,"value":3712},{"type":30,"value":6310}," = average (mean or median)",{"type":21,"tag":268,"props":6312,"children":6313},{},[6314,6318],{"type":21,"tag":272,"props":6315,"children":6316},{},[6317],{"type":30,"value":5714},{"type":30,"value":5716},{"type":21,"tag":268,"props":6320,"children":6321},{},[6322,6326],{"type":21,"tag":272,"props":6323,"children":6324},{},[6325],{"type":30,"value":5694},{"type":30,"value":5696},{"type":21,"tag":26,"props":6328,"children":6329},{},[6330],{"type":30,"value":6331},"However, there is a technical problem with this formula: values below the average give negative differences and values above it give positive ones, so they cancel each other out. When A is the mean, the sum is always exactly zero.",{"type":21,"tag":26,"props":6333,"children":6334},{},[6335],{"type":21,"tag":516,"props":6336,"children":6339},{"alt":6337,"src":6338},"Absolute Deviation of a certain value","/img/img17.png",[],{"type":21,"tag":26,"props":6341,"children":6342},{},[6343],{"type":30,"value":6344},"Just like with the weather example, where the negative and positive values cancel each other out, it 
would be rather meaningless to have 0 deviation at actual 10°C range, so for this reason the formula has been modified to use absolute values (modules) like so:",{"type":21,"tag":26,"props":6346,"children":6347},{},[6348],{"type":21,"tag":272,"props":6349,"children":6350},{},[6351],{"type":30,"value":6352},"Modified Formula:",{"type":21,"tag":490,"props":6354,"children":6356},{"code":6355},"D = Σ|xᵢ - A| / n   for i=1 to n\n",[6357],{"type":21,"tag":495,"props":6358,"children":6359},{"__ignoreMap":8},[6360],{"type":30,"value":6355},{"type":21,"tag":26,"props":6362,"children":6363},{},[6364],{"type":30,"value":6365},"The vertical bars | | represent absolute value, which makes all differences positive.",{"type":21,"tag":26,"props":6367,"children":6368},{},[6369,6371],{"type":30,"value":6370},"Depending on your requirements you may need either an ",{"type":21,"tag":272,"props":6372,"children":6373},{},[6374],{"type":30,"value":6375},"Absolute Median Deviation:",{"type":21,"tag":490,"props":6377,"children":6379},{"code":6378},"D(x̃₀.₅) = Σ|xᵢ - x̃₀.₅| / n   for i=1 to n\n",[6380],{"type":21,"tag":495,"props":6381,"children":6382},{"__ignoreMap":8},[6383],{"type":30,"value":6378},{"type":21,"tag":26,"props":6385,"children":6386},{},[6387,6389],{"type":30,"value":6388},"Or an ",{"type":21,"tag":272,"props":6390,"children":6391},{},[6392],{"type":30,"value":6393},"Absolute Mean Deviation:",{"type":21,"tag":490,"props":6395,"children":6397},{"code":6396},"D(x̄) = Σ|xᵢ - x̄| / n   for i=1 to n\n",[6398],{"type":21,"tag":495,"props":6399,"children":6400},{"__ignoreMap":8},[6401],{"type":30,"value":6396},{"type":21,"tag":26,"props":6403,"children":6404},{},[6405],{"type":30,"value":6406},"Essentially the same thing, but different point of reference to the average.",{"type":21,"tag":44,"props":6408,"children":6410},{"id":6409},"back-propagation",[6411],{"type":30,"value":6412},"Back 
Propagation",{"type":21,"tag":26,"props":6414,"children":6415},{},[6416],{"type":30,"value":6417},"Absolute values do a great job eliminating the problem of negative values. However, when the task requires optimization, as in machine learning algorithms such as neural networks, absolute values become a problem.",{"type":21,"tag":26,"props":6419,"children":6420},{},[6421],{"type":30,"value":6422},"How does a machine learn to distinguish between a cat and a dog? It learns through a process of back propagation: it takes a shot, saying \"cat is dog\", then receives an error, and this error is propagated back through the algorithm, which adjusts itself over many more iterations.",{"type":21,"tag":26,"props":6424,"children":6425},{},[6426],{"type":30,"value":6427},"At their very core, neural networks use a function's derivative to adjust themselves in case of error. But what happens if the algorithm uses absolute deviation?",{"type":21,"tag":26,"props":6429,"children":6430},{},[6431],{"type":21,"tag":516,"props":6432,"children":6435},{"alt":6433,"src":6434},"Absolute Value Function","/img/img18.png",[],{"type":21,"tag":26,"props":6437,"children":6438},{},[6439],{"type":30,"value":6440},"The Absolute Value Function has a corner point at zero where it is not differentiable, so at that point no derivative is available to propagate the error back to the machine, and the gradient-based update breaks down.",{"type":21,"tag":673,"props":6442,"children":6444},{"id":6443},"mean-squared-error",[6445],{"type":30,"value":6446},"Mean Squared Error",{"type":21,"tag":26,"props":6448,"children":6449},{},[6450,6452,6456],{"type":30,"value":6451},"Finding the derivative requires a tangent line that gives the machine a new direction to adjust itself. 
Therefore the Absolute Deviation evolved into ",{"type":21,"tag":272,"props":6453,"children":6454},{},[6455],{"type":30,"value":6446},{"type":30,"value":6457}," which uses the square instead of the absolute value.",{"type":21,"tag":26,"props":6459,"children":6460},{},[6461],{"type":21,"tag":272,"props":6462,"children":6463},{},[6464],{"type":30,"value":6465},"Mean Squared Error Formula:",{"type":21,"tag":490,"props":6467,"children":6469},{"code":6468},"S²(A) = Σ(xᵢ - A)² / n   for i=1 to n\n",[6470],{"type":21,"tag":495,"props":6471,"children":6472},{"__ignoreMap":8},[6473],{"type":30,"value":6468},{"type":21,"tag":26,"props":6475,"children":6476},{},[6477],{"type":30,"value":6478},"Squaring keeps every term non-negative and the function differentiable everywhere at the same time, and that allows back propagation to work as expected.",{"type":21,"tag":26,"props":6480,"children":6481},{},[6482],{"type":21,"tag":516,"props":6483,"children":6486},{"alt":6484,"src":6485},"Tangent line on a function","/img/img19.png",[],{"type":21,"tag":673,"props":6488,"children":6490},{"id":6489},"sample-variance",[6491],{"type":30,"value":6492},"Sample Variance",{"type":21,"tag":26,"props":6494,"children":6495},{},[6496,6498,6503,6505,6509],{"type":30,"value":6497},"The optimization task requires the Mean Squared Error to be as small as possible. That's pretty obvious! We want our neural network to be precise and not make mistakes, hence the name Mean Squared ",{"type":21,"tag":272,"props":6499,"children":6500},{},[6501],{"type":30,"value":6502},"Error",{"type":30,"value":6504},". 
The MSE gets its minimum value when the Central Tendency equals the Arithmetic Mean (A = x̄), and that's called ",{"type":21,"tag":272,"props":6506,"children":6507},{},[6508],{"type":30,"value":6492},{"type":30,"value":3759},{"type":21,"tag":26,"props":6511,"children":6512},{},[6513],{"type":21,"tag":272,"props":6514,"children":6515},{},[6516],{"type":30,"value":6517},"Sample Variance Formula:",{"type":21,"tag":490,"props":6519,"children":6521},{"code":6520},"S̃² = Σ(xᵢ - x̄)² / n   for i=1 to n\n",[6522],{"type":21,"tag":495,"props":6523,"children":6524},{"__ignoreMap":8},[6525],{"type":30,"value":6520},{"type":21,"tag":26,"props":6527,"children":6528},{},[6529],{"type":30,"value":6530},"It may feel a bit off to use square values, but so what? We don't care about actual values if it allows the machine to learn how to distinguish between cancer and healthy cells, right?",{"type":21,"tag":673,"props":6532,"children":6534},{"id":6533},"standard-deviation",[6535],{"type":30,"value":6536},"Standard Deviation",{"type":21,"tag":26,"props":6538,"children":6539},{},[6540],{"type":30,"value":6541},"In the end if the original values are required, we are always allowed to apply the square root to the Sample Variance, and that's how the Standard Deviation kicks into the game.",{"type":21,"tag":26,"props":6543,"children":6544},{},[6545],{"type":21,"tag":272,"props":6546,"children":6547},{},[6548],{"type":30,"value":6549},"Standard Deviation Formula:",{"type":21,"tag":490,"props":6551,"children":6553},{"code":6552},"S̃ = √[Σ(xᵢ - x̄)² / n]   for i=1 to n\n",[6554],{"type":21,"tag":495,"props":6555,"children":6556},{"__ignoreMap":8},[6557],{"type":30,"value":6552},{"type":21,"tag":26,"props":6559,"children":6560},{},[6561],{"type":30,"value":6562},"Or written out:",{"type":21,"tag":490,"props":6564,"children":6566},{"code":6565},"S̃ = square root of [(1/n) × sum of (xᵢ - x̄)² for all i from 1 to 
n]\n",[6567],{"type":21,"tag":495,"props":6568,"children":6569},{"__ignoreMap":8},[6570],{"type":30,"value":6565},{"type":21,"tag":26,"props":6572,"children":6573},{},[6574],{"type":30,"value":6575},"That's enough for an introduction into Statistical Data Analysis, I hope you enjoyed.",{"type":21,"tag":26,"props":6577,"children":6578},{},[6579,6580,6583],{"type":30,"value":2215},{"type":21,"tag":2217,"props":6581,"children":6582},{},[],{"type":30,"value":2221},{"title":8,"searchDepth":596,"depth":596,"links":6585},[6586,6587,6591,6592,6596,6599],{"id":5574,"depth":596,"text":5577},{"id":5634,"depth":596,"text":5637,"children":6588},[6589,6590],{"id":5645,"depth":1260,"text":5648},{"id":5776,"depth":1260,"text":5779},{"id":5858,"depth":596,"text":5861},{"id":5920,"depth":596,"text":5923,"children":6593},[6594,6595],{"id":6083,"depth":1260,"text":6086},{"id":6126,"depth":1260,"text":6129},{"id":6250,"depth":596,"text":6253,"children":6597},[6598],{"id":6261,"depth":1260,"text":6264},{"id":6409,"depth":596,"text":6412,"children":6600},[6601,6602,6603],{"id":6443,"depth":1260,"text":6446},{"id":6489,"depth":1260,"text":6492},{"id":6533,"depth":1260,"text":6536},"content:posts:statistical-data-analysis-101.md","posts/statistical-data-analysis-101.md","posts/statistical-data-analysis-101",{"_path":6608,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":6609,"description":6610,"date":6611,"draft":7,"tags":6612,"thumbnail":6613,"alt_description":6614,"slug":6615,"body":6616,"_type":604,"_id":7765,"_source":606,"_file":7766,"_stem":7767,"_extension":609},"/posts/how-to-learn-coding","How to Learn Coding","Start coding with these 3 steps! Learn to code in a browser, explore the terminal, and dive into an IDE like VS Code. 
Follow hands-on exercises and build a bilingual project","2023-12-27T00:00:00.000Z",[2805],"/img/how_to_learn_coding.png","Learn to code in three simple steps","how-to-learn-coding",{"type":18,"children":6617,"toc":7750},[6618,6624,6629,6634,6639,6645,6688,6696,6701,6707,6712,6723,6731,6736,6742,6754,6763,6768,6777,6785,6804,6809,6815,6832,6838,6857,6868,6873,6878,6883,6892,6897,6906,6911,6919,6930,6939,6944,6950,6962,6971,6980,6999,7008,7013,7022,7027,7033,7038,7047,7092,7098,7128,7137,7149,7157,7163,7168,7177,7189,7198,7209,7221,7229,7234,7240,7252,7271,7284,7293,7299,7304,7313,7344,7352,7382,7390,7402,7411,7419,7450,7462,7471,7479,7495,7504,7514,7523,7533,7542,7559,7568,7578,7587,7605,7610,7614,7619,7624,7638,7649,7655,7661,7673,7679,7691,7697,7720,7726,7731,7737,7742],{"type":21,"tag":44,"props":6619,"children":6621},{"id":6620},"start-to-code-in-these-3-steps",[6622],{"type":30,"value":6623},"Start to Code in These 3 Steps",{"type":21,"tag":26,"props":6625,"children":6626},{},[6627],{"type":30,"value":6628},"You probably landed on this page because you want to learn to code, which is great! Most people still haven't figured out what they want to do in their life, but you have already done the hardest part. Coding is a truly powerful skill. 
It enables you to write instructions for a machine, which in turn builds stuff like websites, mobile applications, missile launch systems and much more.",{"type":21,"tag":26,"props":6630,"children":6631},{},[6632],{"type":30,"value":6633},"As Data Scientists we use coding skills to access, process and manage raw data, build graphs to visually comprehend the nature of the dataset we ended up working with, and eventually develop statistical and machine learning models to predict the likelihood of certain events, depending on the given task.",{"type":21,"tag":26,"props":6635,"children":6636},{},[6637],{"type":30,"value":6638},"In this article we will focus on how to start coding immediately, without too much theory behind it. We will run a decent number of lines of code, following the best practice of learning by doing. Please enjoy.",{"type":21,"tag":44,"props":6640,"children":6642},{"id":6641},"step-1-learn-to-code-in-a-browser",[6643],{"type":30,"value":6644},"Step 1: Learn to Code in a Browser",{"type":21,"tag":26,"props":6646,"children":6647},{},[6648,6650,6655,6657,6662,6663,6668,6670,6675,6676,6681,6682,6686],{"type":30,"value":6649},"Since you are reading this article in a browser, you already have the tools at your fingertips. 
Press ",{"type":21,"tag":272,"props":6651,"children":6652},{},[6653],{"type":30,"value":6654},"Ctrl",{"type":30,"value":6656}," + ",{"type":21,"tag":272,"props":6658,"children":6659},{},[6660],{"type":30,"value":6661},"Shift",{"type":30,"value":6656},{"type":21,"tag":272,"props":6664,"children":6665},{},[6666],{"type":30,"value":6667},"J",{"type":30,"value":6669}," if running on Windows or ",{"type":21,"tag":272,"props":6671,"children":6672},{},[6673],{"type":30,"value":6674},"Command",{"type":30,"value":6656},{"type":21,"tag":272,"props":6677,"children":6678},{},[6679],{"type":30,"value":6680},"Option",{"type":30,"value":6656},{"type":21,"tag":272,"props":6683,"children":6684},{},[6685],{"type":30,"value":6667},{"type":30,"value":6687}," if running on macOS. It will open the console, which appears on the right side of the browser in my case. It might appear at the bottom as well; either way, let's just focus on JavaScript.",{"type":21,"tag":26,"props":6689,"children":6690},{},[6691],{"type":21,"tag":516,"props":6692,"children":6695},{"alt":6693,"src":6694},"Browser console ready to receive an input","/img/img1.png",[],{"type":21,"tag":26,"props":6697,"children":6698},{},[6699],{"type":30,"value":6700},"JavaScript is an excellent language to start experimenting with, especially if you're attracted to dynamic interfaces. I encourage you to type all of the commands manually to get a feel for what it's like to be a programmer.",{"type":21,"tag":3778,"props":6702,"children":6704},{"id":6703},"arithmetics",[6705],{"type":30,"value":6706},"Arithmetics",{"type":21,"tag":26,"props":6708,"children":6709},{},[6710],{"type":30,"value":6711},"We will start with math operations using numbers and arithmetic signs in the console, just like we would with a simple calculator. 
The console will preview the result even before you hit the Enter key; try it yourself.",{"type":21,"tag":490,"props":6713,"children":6718},{"className":6714,"code":6716,"language":6717,"meta":8},[6715],"language-javascript","3 + 4 // Performs addition\n\n10 - 5 // Performs subtraction\n\n25 * 3 // Performs multiplication\n\n125 / 5 // Performs division\n\n5 ** 2 // Performs exponentiation\n\nMath.sqrt(25) // The method that computes the square root\n\nclear() // Function that clears the console for convenience, optional\n","javascript",[6719],{"type":21,"tag":495,"props":6720,"children":6721},{"__ignoreMap":8},[6722],{"type":30,"value":6716},{"type":21,"tag":26,"props":6724,"children":6725},{},[6726],{"type":21,"tag":516,"props":6727,"children":6730},{"alt":6728,"src":6729},"Arithmetic commands executed in the console","/img/img2.png",[],{"type":21,"tag":26,"props":6732,"children":6733},{},[6734],{"type":30,"value":6735},"Well done! Did the Math.sqrt() method feel awkward? You will learn about methods along the way, and don't worry if you misread a word or two. When learning programming, it's inevitable to come across concepts that are not yet familiar to you. 
Now let's get a little bit dirty and nail one of those concepts.",{"type":21,"tag":3778,"props":6737,"children":6739},{"id":6738},"variables",[6740],{"type":30,"value":6741},"Variables",{"type":21,"tag":26,"props":6743,"children":6744},{},[6745,6747,6752],{"type":30,"value":6746},"Let's create a variable ",{"type":21,"tag":272,"props":6748,"children":6749},{},[6750],{"type":30,"value":6751},"number",{"type":30,"value":6753}," that stores whatever number you enter when prompted.",{"type":21,"tag":490,"props":6755,"children":6758},{"className":6756,"code":6757,"language":6717,"meta":8},[6715],"const number = prompt('Enter the square rootable number: ');\n",[6759],{"type":21,"tag":495,"props":6760,"children":6761},{"__ignoreMap":8},[6762],{"type":30,"value":6757},{"type":21,"tag":26,"props":6764,"children":6765},{},[6766],{"type":30,"value":6767},"Now you are about to find out why it's getting dirty. When coding, it's essential for a programmer to be very attentive and accurate at the command prompt. 
Let's try the following code and see what happens.",{"type":21,"tag":490,"props":6769,"children":6772},{"className":6770,"code":6771,"language":6717,"meta":8},[6715],"const result = Math.sqrt(number);\nconsole.log(`The square root of ${number} is ${result}`);\n",[6773],{"type":21,"tag":495,"props":6774,"children":6775},{"__ignoreMap":8},[6776],{"type":30,"value":6771},{"type":21,"tag":26,"props":6778,"children":6779},{},[6780],{"type":21,"tag":516,"props":6781,"children":6784},{"alt":6782,"src":6783},"Creating a variable from prompt input and printing it using console.log","/img/img3.png",[],{"type":21,"tag":26,"props":6786,"children":6787},{},[6788,6790,6795,6797,6802],{"type":30,"value":6789},"If you are absent-minded like me, there is a good chance you confused the backtick sign ",{"type":21,"tag":628,"props":6791,"children":6792},{},[6793],{"type":30,"value":6794},"`",{"type":30,"value":6796},"   with the quote sign  ",{"type":21,"tag":628,"props":6798,"children":6799},{},[6800],{"type":30,"value":6801},"'",{"type":30,"value":6803},"   which would not return the expected result. Sooner or later the learning curve will leave you frustrated with the console and its error messages, but as aspiring programmers we should embrace that as part of the job and keep learning.",{"type":21,"tag":26,"props":6805,"children":6806},{},[6807],{"type":30,"value":6808},"JavaScript is a badass programming language. It has both front-end and back-end web frameworks, and even mobile-app frameworks for Android and iOS devices alike. 
To put it into perspective, my beloved Python only handles the back end of web applications.",{"type":21,"tag":673,"props":6810,"children":6812},{"id":6811},"step-2-learn-to-code-in-terminal",[6813],{"type":30,"value":6814},"Step 2: Learn to Code in Terminal",{"type":21,"tag":26,"props":6816,"children":6817},{},[6818,6820,6824,6825,6830],{"type":30,"value":6819},"So far we've been using the browser as the mediator between the user and the machine, which is only possible with JavaScript, but now we would like to write instructions directly to the machine itself. If you're using the Windows operating system, you might need to make a few tweaks to ensure a seamless coding experience. While Windows is widely used and affordable, it was never really meant to be a development machine. On macOS, on the other hand, you may just press ",{"type":21,"tag":272,"props":6821,"children":6822},{},[6823],{"type":30,"value":6674},{"type":30,"value":6656},{"type":21,"tag":272,"props":6826,"children":6827},{},[6828],{"type":30,"value":6829},"Т",{"type":30,"value":6831}," to find a Terminal and skip the next section.",{"type":21,"tag":3778,"props":6833,"children":6835},{"id":6834},"install-wsl2-on-windows",[6836],{"type":30,"value":6837},"Install WSL2 on Windows",{"type":21,"tag":26,"props":6839,"children":6840},{},[6841,6843,6848,6850,6855],{"type":30,"value":6842},"To follow along on a Windows machine, you should work in the Windows Subsystem for Linux, or ",{"type":21,"tag":272,"props":6844,"children":6845},{},[6846],{"type":30,"value":6847},"WSL2",{"type":30,"value":6849}," for short. 
Linux has many advantages over Windows, but we will focus only on ",{"type":21,"tag":272,"props":6851,"children":6852},{},[6853],{"type":30,"value":6854},"Terminal",{"type":30,"value":6856},", and for that we should open the PowerShell as Administrator and run the following command.",{"type":21,"tag":490,"props":6858,"children":6863},{"className":6859,"code":6861,"language":6862,"meta":8},[6860],"language-powershell","wsl --install\n","powershell",[6864],{"type":21,"tag":495,"props":6865,"children":6866},{"__ignoreMap":8},[6867],{"type":30,"value":6861},{"type":21,"tag":26,"props":6869,"children":6870},{},[6871],{"type":30,"value":6872},"Take a little break, since the installation might take a few minutes.",{"type":21,"tag":26,"props":6874,"children":6875},{},[6876],{"type":30,"value":6877},"If running on Windows 11, the Ubuntu distro should be installed by default. If so, create a Linux username and password. I would suggest setting an easy password, like a single space.",{"type":21,"tag":26,"props":6879,"children":6880},{},[6881],{"type":30,"value":6882},"Windows 10 users should proceed with the distro installation as follows.",{"type":21,"tag":490,"props":6884,"children":6887},{"className":6885,"code":6886,"language":6862,"meta":8},[6860],"wsl --list --online\n",[6888],{"type":21,"tag":495,"props":6889,"children":6890},{"__ignoreMap":8},[6891],{"type":30,"value":6886},{"type":21,"tag":26,"props":6893,"children":6894},{},[6895],{"type":30,"value":6896},"This command will list the available Linux distributions. Choose any distro name. 
I'll choose Debian for demonstrative purposes, but you can install Ubuntu or Kali, it doesn't really matter.",{"type":21,"tag":490,"props":6898,"children":6901},{"className":6899,"code":6900,"language":6862,"meta":8},[6860],"wsl --install -d Debian\n",[6902],{"type":21,"tag":495,"props":6903,"children":6904},{"__ignoreMap":8},[6905],{"type":30,"value":6900},{"type":21,"tag":26,"props":6907,"children":6908},{},[6909],{"type":30,"value":6910},"Create a username and password after the installation. You might need to restart your machine and check the start menu to see if there is a Linux distribution available. Open your Debian or Ubuntu or whatever distro you've installed, and that's your Terminal.",{"type":21,"tag":26,"props":6912,"children":6913},{},[6914],{"type":21,"tag":516,"props":6915,"children":6918},{"alt":6916,"src":6917},"Linux Terminal installed and opened via Start menu","/img/img4.png",[],{"type":21,"tag":26,"props":6920,"children":6921},{},[6922,6924,6928],{"type":30,"value":6923},"Inside the Linux Terminal you are able to run ",{"type":21,"tag":628,"props":6925,"children":6926},{},[6927],{"type":30,"value":1414},{"type":30,"value":6929}," commands, try this for instance:",{"type":21,"tag":490,"props":6931,"children":6934},{"className":6932,"code":6933,"language":1414,"meta":8},[1412],"sudo apt-get update && sudo apt-get upgrade\n",[6935],{"type":21,"tag":495,"props":6936,"children":6937},{"__ignoreMap":8},[6938],{"type":30,"value":6933},{"type":21,"tag":26,"props":6940,"children":6941},{},[6942],{"type":30,"value":6943},"It should prompt for your password and update your Linux system. Isn't it cool? You'd never miss any update with such a bad ass interactive interface. 
I literally run this command every single day, and it still blows my mind.",{"type":21,"tag":3778,"props":6945,"children":6947},{"id":6946},"terminal-and-python-shell",[6948],{"type":30,"value":6949},"Terminal and Python Shell",{"type":21,"tag":26,"props":6951,"children":6952},{},[6953,6955,6960],{"type":30,"value":6954},"Now regardless of your Operating System, you can run ",{"type":21,"tag":272,"props":6956,"children":6957},{},[6958],{"type":30,"value":6959},"Python",{"type":30,"value":6961}," directly in your Terminal using this bash command.",{"type":21,"tag":490,"props":6963,"children":6966},{"className":6964,"code":6965,"language":895,"meta":8},[897],"python3\n",[6967],{"type":21,"tag":495,"props":6968,"children":6969},{"__ignoreMap":8},[6970],{"type":30,"value":6965},{"type":21,"tag":490,"props":6972,"children":6975},{"className":6973,"code":6974,"language":1414,"meta":8},[1412],"Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\n>>>\n",[6976],{"type":21,"tag":495,"props":6977,"children":6978},{"__ignoreMap":8},[6979],{"type":30,"value":6974},{"type":21,"tag":26,"props":6981,"children":6982},{},[6983,6985,6990,6992,6997],{"type":30,"value":6984},"Once we see the triple arrows ",{"type":21,"tag":272,"props":6986,"children":6987},{},[6988],{"type":30,"value":6989},">>>",{"type":30,"value":6991},", we are in the so-called ",{"type":21,"tag":272,"props":6993,"children":6994},{},[6995],{"type":30,"value":6996},"Python Shell",{"type":30,"value":6998}," which is a powerful Python way to interact with your machine. 
We can perform the arithmetic commands as we did using JavaScript.",{"type":21,"tag":490,"props":7000,"children":7003},{"className":7001,"code":7002,"language":895,"meta":8},[897],"1 + 1\n2 * 5\n5 ** 2\n13 / 4\n13 // 4 # This is a floor division: it drops the remainder and returns 3\n",[7004],{"type":21,"tag":495,"props":7005,"children":7006},{"__ignoreMap":8},[7007],{"type":30,"value":7002},{"type":21,"tag":26,"props":7009,"children":7010},{},[7011],{"type":30,"value":7012},"The square root is trickier: to do that you have to import the math module first.",{"type":21,"tag":490,"props":7014,"children":7017},{"className":7015,"code":7016,"language":895,"meta":8},[897],"import math\nmath.sqrt(25)\n",[7018],{"type":21,"tag":495,"props":7019,"children":7020},{"__ignoreMap":8},[7021],{"type":30,"value":7016},{"type":21,"tag":26,"props":7023,"children":7024},{},[7025],{"type":30,"value":7026},"While the necessity of importing modules might look like a disadvantage compared to JavaScript, the modularity in Python is actually what makes it so widely used in many different fields, from Machine Learning to Space Engineering, because there is a nearly limitless number of modules available for different purposes.",{"type":21,"tag":3778,"props":7028,"children":7030},{"id":7029},"lists",[7031],{"type":30,"value":7032},"Lists",{"type":21,"tag":26,"props":7034,"children":7035},{},[7036],{"type":30,"value":7037},"Now let's create a few variables using the Python Shell.",{"type":21,"tag":490,"props":7039,"children":7042},{"className":7040,"code":7041,"language":895,"meta":8},[897],"oranges = 5\nkids = ['Aki', 'Kenshin', 'Konrat']\n",[7043],{"type":21,"tag":495,"props":7044,"children":7045},{"__ignoreMap":8},[7046],{"type":30,"value":7041},{"type":21,"tag":26,"props":7048,"children":7049},{},[7050,7052,7056,7058,7063,7065,7070,7072,7077,7078,7083,7085,7090],{"type":30,"value":7051},"The first variable contains the number 
",{"type":21,"tag":272,"props":7053,"children":7054},{},[7055],{"type":30,"value":165},{"type":30,"value":7057}," representing the amount of oranges, and the second one contains the ",{"type":21,"tag":272,"props":7059,"children":7060},{},[7061],{"type":30,"value":7062},"list",{"type":30,"value":7064}," of names representing kids names ",{"type":21,"tag":272,"props":7066,"children":7067},{},[7068],{"type":30,"value":7069},"Aki",{"type":30,"value":7071}," , ",{"type":21,"tag":272,"props":7073,"children":7074},{},[7075],{"type":30,"value":7076},"Kenshin",{"type":30,"value":3795},{"type":21,"tag":272,"props":7079,"children":7080},{},[7081],{"type":30,"value":7082},"Konrat",{"type":30,"value":7084}," respectively. It's way more intuitive to create a variable when comparing to JavaScript, because of the Python's beautiful syntax, however we are paying the price for that ease of syntax, and the price is the ",{"type":21,"tag":272,"props":7086,"children":7087},{},[7088],{"type":30,"value":7089},"indentation",{"type":30,"value":7091},", which makes it a bit more challenging. We will get to that in the next section.",{"type":21,"tag":3778,"props":7093,"children":7095},{"id":7094},"for-loops",[7096],{"type":30,"value":7097},"For loops",{"type":21,"tag":26,"props":7099,"children":7100},{},[7101,7103,7108,7110,7115,7116,7121,7123],{"type":30,"value":7102},"Now imaging that we have to supply our kids with oranges programmatically. Let's do that using the ",{"type":21,"tag":272,"props":7104,"children":7105},{},[7106],{"type":30,"value":7107},"for loop",{"type":30,"value":7109}," which is the sequence of instructions written in short. 
Make sure to follow the indentation format by creating space using ",{"type":21,"tag":272,"props":7111,"children":7112},{},[7113],{"type":30,"value":7114},"tab",{"type":30,"value":3536},{"type":21,"tag":272,"props":7117,"children":7118},{},[7119],{"type":30,"value":7120},"four spaces",{"type":30,"value":7122}," on each line that follows ",{"type":21,"tag":272,"props":7124,"children":7125},{},[7126],{"type":30,"value":7127},"for i in kids:",{"type":21,"tag":490,"props":7129,"children":7132},{"className":7130,"code":7131,"language":895,"meta":8},[897],"for i in kids:\n    oranges -= 1\n    print(f'{i} received an orange, {oranges} oranges remaining')\n",[7133],{"type":21,"tag":495,"props":7134,"children":7135},{"__ignoreMap":8},[7136],{"type":30,"value":7131},{"type":21,"tag":26,"props":7138,"children":7139},{},[7140,7142,7147],{"type":30,"value":7141},"Every indented line will be repeated for each kid in the list: it reduces the number of oranges by 1, then prints the kid's name and the number of oranges remaining. Now execute the loop above in the Python Shell by ",{"type":21,"tag":272,"props":7143,"children":7144},{},[7145],{"type":30,"value":7146},"double",{"type":30,"value":7148}," pressing the Enter key. The expected result is shown below.",{"type":21,"tag":26,"props":7150,"children":7151},{},[7152],{"type":21,"tag":516,"props":7153,"children":7156},{"alt":7154,"src":7155},"Terminal displays Python Shell for loop result","/img/img5.png",[],{"type":21,"tag":3778,"props":7158,"children":7160},{"id":7159},"if-else",[7161],{"type":30,"value":7162},"If else",{"type":21,"tag":26,"props":7164,"children":7165},{},[7166],{"type":30,"value":7167},"Now we've got a problem: there are just two oranges left for three kids. How would you solve this problem? 
If you are an old school parent like mine, you might take a straightforward approach like so:",{"type":21,"tag":490,"props":7169,"children":7172},{"className":7170,"code":7171,"language":895,"meta":8},[897],"for i in kids:\n    if 'K' in i:\n        oranges -= 1\n        print(f'{i} received an orange, {oranges} oranges remaining')\n",[7173],{"type":21,"tag":495,"props":7174,"children":7175},{"__ignoreMap":8},[7176],{"type":30,"value":7171},{"type":21,"tag":26,"props":7178,"children":7179},{},[7180,7182,7187],{"type":30,"value":7181},"This code scans every name in the list, and if it contains a capital letter 'K' then the kid gets an orange. This is due to the if statement on line 2, which leaves Aki without an orange. Let's use an ",{"type":21,"tag":272,"props":7183,"children":7184},{},[7185],{"type":30,"value":7186},"else",{"type":30,"value":7188}," statement to fix this.",{"type":21,"tag":490,"props":7190,"children":7193},{"className":7191,"code":7192,"language":895,"meta":8},[897],"if oranges == 3:\n    for i in kids:\n        oranges -= 1\n        print(f'{i} received an orange, {oranges} oranges remaining')\nelse:\n    print('We will go to the Whole Foods to buy more oranges for our kids')\n",[7194],{"type":21,"tag":495,"props":7195,"children":7196},{"__ignoreMap":8},[7197],{"type":30,"value":7192},{"type":21,"tag":26,"props":7199,"children":7200},{},[7201,7203,7207],{"type":30,"value":7202},"In this solution we first check if the stock of oranges is equal to the exact number of our kids, which is 3. If so we proceed as before, ",{"type":21,"tag":272,"props":7204,"children":7205},{},[7206],{"type":30,"value":7186},{"type":30,"value":7208}," we are stating:",{"type":21,"tag":22,"props":7210,"children":7211},{},[7212],{"type":21,"tag":26,"props":7213,"children":7214},{},[7215,7220],{"type":21,"tag":628,"props":7216,"children":7217},{},[7218],{"type":30,"value":7219},"We will go to the Whole Foods to buy more oranges for our 
kids",{"type":30,"value":3759},{"type":21,"tag":26,"props":7222,"children":7223},{},[7224],{"type":21,"tag":516,"props":7225,"children":7228},{"alt":7226,"src":7227},"Terminal displays Python Shell If Else statement result ","/img/img6.png",[],{"type":21,"tag":26,"props":7230,"children":7231},{},[7232],{"type":30,"value":7233},"If you got this far into the article, then you are a true bad dog, congrats! By now you have an idea of what variables, lists, loops and if-else statements are. We will revisit them in upcoming articles in this beginner's guide series, but for now take a rest for a while, cause that was a lot to process for a bloody beginner.",{"type":21,"tag":673,"props":7235,"children":7237},{"id":7236},"step-3-learn-to-code-in-ide",[7238],{"type":30,"value":7239},"Step 3: Learn to Code in IDE",{"type":21,"tag":26,"props":7241,"children":7242},{},[7243,7245,7250],{"type":30,"value":7244},"Just now we've been using the Python Shell to communicate directly with the machine one line at a time. In fact it's not the most effective way to write and execute code. Now let's explore the Integrated Development Environment or ",{"type":21,"tag":272,"props":7246,"children":7247},{},[7248],{"type":30,"value":7249},"IDE",{"type":30,"value":7251}," for short.",{"type":21,"tag":26,"props":7253,"children":7254},{},[7255,7257,7262,7264,7269],{"type":30,"value":7256},"An IDE is simply a text editor for creating files of code, which are often referred to as ",{"type":21,"tag":272,"props":7258,"children":7259},{},[7260],{"type":30,"value":7261},"scripts",{"type":30,"value":7263},". Running scripts through a Terminal allows you to execute multiple lines of code at once. 
We are going to use ",{"type":21,"tag":272,"props":7265,"children":7266},{},[7267],{"type":30,"value":7268},"VS Code",{"type":30,"value":7270}," because eventually you will be watching YouTube to keep learning programming, and most of the time it will be using VS code.",{"type":21,"tag":26,"props":7272,"children":7273},{},[7274,7276,7282],{"type":30,"value":7275},"Follow the link to download and install the ",{"type":21,"tag":1957,"props":7277,"children":7280},{"href":7278,"rel":7279},"https://code.visualstudio.com/",[1961],[7281],{"type":30,"value":7268},{"type":30,"value":7283},". There is one gangsta move to open IDE in Linux Terminal. Try this bash command if running on WSL.",{"type":21,"tag":490,"props":7285,"children":7288},{"className":7286,"code":7287,"language":1414,"meta":8},[1412],"code .\n",[7289],{"type":21,"tag":495,"props":7290,"children":7291},{"__ignoreMap":8},[7292],{"type":30,"value":7287},{"type":21,"tag":3778,"props":7294,"children":7296},{"id":7295},"directory",[7297],{"type":30,"value":7298},"Directory",{"type":21,"tag":26,"props":7300,"children":7301},{},[7302],{"type":30,"value":7303},"Open your Terminal and make directory for our project using bash.",{"type":21,"tag":490,"props":7305,"children":7308},{"className":7306,"code":7307,"language":1414,"meta":8},[1412],"mkdir orange_project\n",[7309],{"type":21,"tag":495,"props":7310,"children":7311},{"__ignoreMap":8},[7312],{"type":30,"value":7307},{"type":21,"tag":26,"props":7314,"children":7315},{},[7316,7318,7323,7325,7330,7332,7337,7339,7343],{"type":30,"value":7317},"It creates a directory named ",{"type":21,"tag":272,"props":7319,"children":7320},{},[7321],{"type":30,"value":7322},"orange_project",{"type":30,"value":7324}," in your WSL home directory. Projects are managed in directories accessible from your IDE. 
Open VS Code, look at the tool bar and choose ",{"type":21,"tag":272,"props":7326,"children":7327},{},[7328],{"type":30,"value":7329},"File",{"type":30,"value":7331}," -> ",{"type":21,"tag":272,"props":7333,"children":7334},{},[7335],{"type":30,"value":7336},"Open Folder",{"type":30,"value":7338}," it will drop-down the menu like on the picture shown below where you can choose any available directory. Let's choose ",{"type":21,"tag":272,"props":7340,"children":7341},{},[7342],{"type":30,"value":7322},{"type":30,"value":3759},{"type":21,"tag":26,"props":7345,"children":7346},{},[7347],{"type":21,"tag":516,"props":7348,"children":7351},{"alt":7349,"src":7350},"VS Code drop-down menu for choosing a directory","/img/img7.png",[],{"type":21,"tag":26,"props":7353,"children":7354},{},[7355,7357,7362,7364,7368,7370,7374,7376,7381],{"type":30,"value":7356},"Creating new files should be as easy, but choose ",{"type":21,"tag":272,"props":7358,"children":7359},{},[7360],{"type":30,"value":7361},"New File",{"type":30,"value":7363}," instead of ",{"type":21,"tag":272,"props":7365,"children":7366},{},[7367],{"type":30,"value":7336},{"type":30,"value":7369}," at the tool bar. Alternatively you can click on the icons to create the file or the directory. 
Don't be confused by the word ",{"type":21,"tag":628,"props":7371,"children":7372},{},[7373],{"type":30,"value":7295},{"type":30,"value":7375}," as it's just a fancy way to say ",{"type":21,"tag":628,"props":7377,"children":7378},{},[7379],{"type":30,"value":7380},"folder",{"type":30,"value":3759},{"type":21,"tag":26,"props":7383,"children":7384},{},[7385],{"type":21,"tag":516,"props":7386,"children":7389},{"alt":7387,"src":7388},"VS Code icons to create file or folder","/img/img8.png",[],{"type":21,"tag":26,"props":7391,"children":7392},{},[7393,7395,7400],{"type":30,"value":7394},"Let's create two files in our orange project. The first one is named ",{"type":21,"tag":272,"props":7396,"children":7397},{},[7398],{"type":30,"value":7399},"script.py",{"type":30,"value":7401},", and it will contain the Python program for the fair orange distribution system that we developed in Step 2.",{"type":21,"tag":490,"props":7403,"children":7406},{"className":7404,"code":7405,"language":6717,"meta":8},[6715],"oranges = 5\nkids = ['Aki', 'Kenshin', 'Konrat']\nif oranges >= 3:\n    for i in kids:\n        oranges -= 1\n        print(f'{i} received an orange, {oranges} oranges remaining')\nelse:\n    print('We are going to buy more oranges for our kids')\n",[7407],{"type":21,"tag":495,"props":7408,"children":7409},{"__ignoreMap":8},[7410],{"type":30,"value":7405},{"type":21,"tag":26,"props":7412,"children":7413},{},[7414],{"type":21,"tag":516,"props":7415,"children":7418},{"alt":7416,"src":7417},"Python script created in VS Code","/img/img9.png",[],{"type":21,"tag":26,"props":7420,"children":7421},{},[7422,7424,7429,7430,7435,7437,7441,7443,7448],{"type":30,"value":7423},"After creating/updating each file, remember to press ",{"type":21,"tag":272,"props":7425,"children":7426},{},[7427],{"type":30,"value":7428},"'Ctrl'",{"type":30,"value":6656},{"type":21,"tag":272,"props":7431,"children":7432},{},[7433],{"type":30,"value":7434},"'S'",{"type":30,"value":7436}," or go to 
",{"type":21,"tag":272,"props":7438,"children":7439},{},[7440],{"type":30,"value":7329},{"type":30,"value":7442}," and hit ",{"type":21,"tag":272,"props":7444,"children":7445},{},[7446],{"type":30,"value":7447},"Save",{"type":30,"value":7449},". Otherwise VS Code would not be able to find your files in the project.",{"type":21,"tag":26,"props":7451,"children":7452},{},[7453,7455,7460],{"type":30,"value":7454},"Now create the second file ",{"type":21,"tag":272,"props":7456,"children":7457},{},[7458],{"type":30,"value":7459},"script.js",{"type":30,"value":7461}," which is the same programm, but written in JavaScript.",{"type":21,"tag":490,"props":7463,"children":7466},{"className":7464,"code":7465,"language":895,"meta":8},[897],"let oranges = 2;\nlet kids = ['Aki', 'Kenshin', 'Konrat'];\n\nif (oranges >= 3) {\n    for (let i of kids) {\n        oranges -= 1;\n        console.log(`${i} received an orange, ${oranges} oranges remaining`);\n    }\n} else {\n    console.log('We are going to buy more oranges for our kids');\n}\n",[7467],{"type":21,"tag":495,"props":7468,"children":7469},{"__ignoreMap":8},[7470],{"type":30,"value":7465},{"type":21,"tag":26,"props":7472,"children":7473},{},[7474],{"type":21,"tag":516,"props":7475,"children":7478},{"alt":7476,"src":7477},"JavaScript script created in VS Code","/img/img10.png",[],{"type":21,"tag":26,"props":7480,"children":7481},{},[7482,7484,7488,7489,7493],{"type":30,"value":7483},"Again press ",{"type":21,"tag":272,"props":7485,"children":7486},{},[7487],{"type":30,"value":7428},{"type":30,"value":6656},{"type":21,"tag":272,"props":7490,"children":7491},{},[7492],{"type":30,"value":7434},{"type":30,"value":7494}," and lets run our very last bash commands into the 
Terminal.",{"type":21,"tag":490,"props":7496,"children":7499},{"className":7497,"code":7498,"language":1414,"meta":8},[1412],"pwd\n",[7500],{"type":21,"tag":495,"props":7501,"children":7502},{"__ignoreMap":8},[7503],{"type":30,"value":7498},{"type":21,"tag":26,"props":7505,"children":7506},{},[7507,7512],{"type":21,"tag":272,"props":7508,"children":7509},{},[7510],{"type":30,"value":7511},"PWD",{"type":30,"value":7513}," - Prints Working Directory starting from the root (/)",{"type":21,"tag":490,"props":7515,"children":7518},{"className":7516,"code":7517,"language":1414,"meta":8},[1412],"ls\n",[7519],{"type":21,"tag":495,"props":7520,"children":7521},{"__ignoreMap":8},[7522],{"type":30,"value":7517},{"type":21,"tag":26,"props":7524,"children":7525},{},[7526,7531],{"type":21,"tag":272,"props":7527,"children":7528},{},[7529],{"type":30,"value":7530},"LS",{"type":30,"value":7532}," - Lists the files in your current directory",{"type":21,"tag":490,"props":7534,"children":7537},{"className":7535,"code":7536,"language":1414,"meta":8},[1412],"cd orange_project\n",[7538],{"type":21,"tag":495,"props":7539,"children":7540},{"__ignoreMap":8},[7541],{"type":30,"value":7536},{"type":21,"tag":26,"props":7543,"children":7544},{},[7545,7550,7552,7557],{"type":21,"tag":272,"props":7546,"children":7547},{},[7548],{"type":30,"value":7549},"CD",{"type":30,"value":7551}," - Changes directory. When it comes to working with files you actually don't need to print it's entire name. Just specify the first couple of letters like ",{"type":21,"tag":628,"props":7553,"children":7554},{},[7555],{"type":30,"value":7556},"ora",{"type":30,"value":7558}," and press tab. 
The bash should type the rest of the name for you.",{"type":21,"tag":490,"props":7560,"children":7563},{"className":7561,"code":7562,"language":1414,"meta":8},[1412],"python3 script.py\n",[7564],{"type":21,"tag":495,"props":7565,"children":7566},{"__ignoreMap":8},[7567],{"type":30,"value":7562},{"type":21,"tag":26,"props":7569,"children":7570},{},[7571,7576],{"type":21,"tag":272,"props":7572,"children":7573},{},[7574],{"type":30,"value":7575},"PYTHON3",{"type":30,"value":7577}," - Executes Python files",{"type":21,"tag":490,"props":7579,"children":7582},{"className":7580,"code":7581,"language":1414,"meta":8},[1412],"node script.js\n",[7583],{"type":21,"tag":495,"props":7584,"children":7585},{"__ignoreMap":8},[7586],{"type":30,"value":7581},{"type":21,"tag":26,"props":7588,"children":7589},{},[7590,7595,7597,7600],{"type":21,"tag":272,"props":7591,"children":7592},{},[7593],{"type":30,"value":7594},"NODE",{"type":30,"value":7596}," - Executes JavaScript files",{"type":21,"tag":2217,"props":7598,"children":7599},{},[],{"type":21,"tag":516,"props":7601,"children":7604},{"alt":7602,"src":7603},"Python and JavaScript files executed using Terminal","/img/img11.png",[],{"type":21,"tag":26,"props":7606,"children":7607},{},[7608],{"type":30,"value":7609},"And that's it! Your very first executed Python and JavaScript files.",{"type":21,"tag":44,"props":7611,"children":7612},{"id":1217},[7613],{"type":30,"value":1220},{"type":21,"tag":26,"props":7615,"children":7616},{},[7617],{"type":30,"value":7618},"In this article we've accessed the console in the browser, explored the Terminal and even created bilingual project using VS Code. It's way too much for the first experience with programming. 
Congrats on such a huge milestone!",{"type":21,"tag":26,"props":7620,"children":7621},{},[7622],{"type":30,"value":7623},"Keep in mind that this guide is meant for complete beginners, to boost their confidence with some heavy-lifting environment set-up and hands-on experience, but the basics and theory are essential to back up this practice. So follow the FAQ section below to see the recommendations.",{"type":21,"tag":22,"props":7625,"children":7626},{},[7627],{"type":21,"tag":26,"props":7628,"children":7629},{},[7630,7632],{"type":30,"value":7631},"The words printed here are experience. You must go through the concepts.\n-- ",{"type":21,"tag":7633,"props":7634,"children":7635},"cite",{},[7636],{"type":30,"value":7637},"Bad Dog",{"type":21,"tag":26,"props":7639,"children":7640},{},[7641,7643,7647],{"type":30,"value":7642},"Follow me on LinkedIn to stay up to date on upcoming articles in this series of ",{"type":21,"tag":272,"props":7644,"children":7645},{},[7646],{"type":30,"value":6609},{"type":30,"value":7648},". Also feel free to reach out with any kind of feedback. I'd appreciate that.",{"type":21,"tag":44,"props":7650,"children":7652},{"id":7651},"faq-from-coding-beginners",[7653],{"type":30,"value":7654},"FAQ from Coding Beginners",{"type":21,"tag":673,"props":7656,"children":7658},{"id":7657},"whats-better-javascript-or-python",[7659],{"type":30,"value":7660},"What's better JavaScript or Python?",{"type":21,"tag":26,"props":7662,"children":7663},{},[7664,7666,7671],{"type":30,"value":7665},"It depends on what your final goal is. If you are trying to build a website, then JavaScript is the better choice for the front-end elements. For Machine Learning tasks you are going to use Python modules, which will save you a lifetime of work compared to writing them from scratch in any other language. 
For Statistical Learning, ",{"type":21,"tag":272,"props":7667,"children":7668},{},[7669],{"type":30,"value":7670},"R",{"type":30,"value":7672}," is the way to go. So focus on your goal and the choice of programming language will come easy.",{"type":21,"tag":673,"props":7674,"children":7676},{"id":7675},"can-i-pivot-from-data-science-to-software-engineering",[7677],{"type":30,"value":7678},"Can I pivot from Data Science to Software Engineering?",{"type":21,"tag":26,"props":7680,"children":7681},{},[7682,7684,7689],{"type":30,"value":7683},"Indeed, many people may find Data Science way too theoretical, with no guarantee that you will get to the actual ",{"type":21,"tag":628,"props":7685,"children":7686},{},[7687],{"type":30,"value":7688},"population",{"type":30,"value":7690},". After all, Software Engineering is about building actual stuff you can use, whereas Data Science is more about pointing at the significance of some data. So yes, Software Engineering is cool, and there is no shame in switching to it.",{"type":21,"tag":673,"props":7692,"children":7694},{"id":7693},"what-should-i-learn-next-to-be-a-good-coder",[7695],{"type":30,"value":7696},"What Should I learn next to be a good coder?",{"type":21,"tag":26,"props":7698,"children":7699},{},[7700,7702,7709,7711,7718],{"type":30,"value":7701},"You can start with the YouTube ",{"type":21,"tag":1957,"props":7703,"children":7706},{"href":7704,"rel":7705},"https://www.youtube.com/watch?v=IDDmrzzB14M&list=PLhQjrBD2T380F_inVRXMIHCqLaNUd7bN4",[1961],[7707],{"type":30,"value":7708},"CS50",{"type":30,"value":7710}," courses. They will back you up with solid fundamentals to better understand the content you've read so far. They are also essential for growth in Tech companies, so just start with that and keep wondering what's next. 
Another source I'd recommend is ",{"type":21,"tag":1957,"props":7712,"children":7715},{"href":7713,"rel":7714},"https://interviews.school",[1961],[7716],{"type":30,"value":7717},"interviews.school",{"type":30,"value":7719},". This website was developed by a Google employee to guide users preparing for interviews at big Tech companies such as MANG.",{"type":21,"tag":673,"props":7721,"children":7723},{"id":7722},"should-i-read-programming-books",[7724],{"type":30,"value":7725},"Should I Read Programming Books?",{"type":21,"tag":26,"props":7727,"children":7728},{},[7729],{"type":30,"value":7730},"Books are another good resource for contextual learning. However, there is just so much to cover as a beginner. I'd rather focus on IT basics like HTTP, Network Protocols, Binary Code Processing, Data Types and Data Structures, Databases, Git and so much more, so Google Search is better at this point.",{"type":21,"tag":673,"props":7732,"children":7734},{"id":7733},"should-i-use-llm-like-gpt-and-gemini",[7735],{"type":30,"value":7736},"Should I use LLM like GPT and Gemini?",{"type":21,"tag":26,"props":7738,"children":7739},{},[7740],{"type":30,"value":7741},"Sure yes! The only scenario where I'd not recommend an LLM is right before a homework deadline, because of the temptation to copy and paste the response without even reading it. 
Always be curious and critical about every response an LLM provides you.",{"type":21,"tag":26,"props":7743,"children":7744},{},[7745,7746,7749],{"type":30,"value":2215},{"type":21,"tag":2217,"props":7747,"children":7748},{},[],{"type":30,"value":2221},{"title":8,"searchDepth":596,"depth":596,"links":7751},[7752,7753,7757,7758],{"id":6620,"depth":596,"text":6623},{"id":6641,"depth":596,"text":6644,"children":7754},[7755,7756],{"id":6811,"depth":1260,"text":6814},{"id":7236,"depth":1260,"text":7239},{"id":1217,"depth":596,"text":1220},{"id":7651,"depth":596,"text":7654,"children":7759},[7760,7761,7762,7763,7764],{"id":7657,"depth":1260,"text":7660},{"id":7675,"depth":1260,"text":7678},{"id":7693,"depth":1260,"text":7696},{"id":7722,"depth":1260,"text":7725},{"id":7733,"depth":1260,"text":7736},"content:posts:how-to-learn-coding.md","posts/how-to-learn-coding.md","posts/how-to-learn-coding",{"_path":7769,"_dir":6,"_draft":7,"_partial":7,"_locale":8,"title":7770,"description":7771,"date":7772,"draft":7,"tags":7773,"thumbnail":7775,"alt_description":7776,"slug":7777,"body":7778,"_type":604,"_id":7866,"_source":606,"_file":7867,"_stem":7868,"_extension":609},"/posts/hello-world","I'm a Bad Dog","Welcome to Bad Dog Data blog! Here we will learn about programming, statistical data analysis and machine learning","2023-12-16T00:00:00.000Z",[7774],"personal","/img/hello_world.png","The bad dog is a data scientist","hello-world",{"type":18,"children":7779,"toc":7863},[7780,7786,7791,7796,7802,7820,7826,7844,7849,7854],{"type":21,"tag":44,"props":7781,"children":7783},{"id":7782},"greetings-to-all-curious-minds",[7784],{"type":30,"value":7785},"Greetings to all curious minds!",{"type":21,"tag":26,"props":7787,"children":7788},{},[7789],{"type":30,"value":7790},"I'm Akzhan, your bad dog from the vibrant city of Almaty, Kazakhstan. By day, I learn just enough of new stuff to navigate the ever-shifting landscape of Data World as a Data Product Manager. 
I do love coding, stats, and business – a potent trifecta for understanding the world through numbers and algorithms, or even scaling any business like online commerce for instance.",{"type":21,"tag":26,"props":7792,"children":7793},{},[7794],{"type":30,"value":7795},"Statistical analysis is so much fun, sometimes even mind-blowing! Forget boring spreadsheets. Here, you'll learn about statistical learning, how to transform it into machine learning, hence interpret the world around you and make data-driven decisions.",{"type":21,"tag":3778,"props":7797,"children":7799},{"id":7798},"well-dive-deep-into",[7800],{"type":30,"value":7801},"We'll dive deep into:",{"type":21,"tag":264,"props":7803,"children":7804},{},[7805,7810,7815],{"type":21,"tag":268,"props":7806,"children":7807},{},[7808],{"type":30,"value":7809},"Statistical and Machine Learning models  ",{"type":21,"tag":268,"props":7811,"children":7812},{},[7813],{"type":30,"value":7814},"The frameworks for consumer behavior analysis  ",{"type":21,"tag":268,"props":7816,"children":7817},{},[7818],{"type":30,"value":7819},"Latest paradigms in the field of Data Engineering  ",{"type":21,"tag":3778,"props":7821,"children":7823},{"id":7822},"well-also",[7824],{"type":30,"value":7825},"We'll also:",{"type":21,"tag":264,"props":7827,"children":7828},{},[7829,7834,7839],{"type":21,"tag":268,"props":7830,"children":7831},{},[7832],{"type":30,"value":7833},"Translate the complex data models, turning numbers into powerful stories.  ",{"type":21,"tag":268,"props":7835,"children":7836},{},[7837],{"type":30,"value":7838},"Hack the programming and algorithms, making coding accessible and engaging.  ",{"type":21,"tag":268,"props":7840,"children":7841},{},[7842],{"type":30,"value":7843},"Bridge the technical gap and encourage non-technical crowd to start new projects.  
",{"type":21,"tag":26,"props":7845,"children":7846},{},[7847],{"type":30,"value":7848},"Whether you're a seasoned pro or a curious newbie, this blog may serve you with different technical approaches when it comes to working with data. Join me as we explore the fascinating intersection of data, business, and the human experience. Let's learn, grow, and maybe even rewrite the rules of business, one byte at a time.",{"type":21,"tag":26,"props":7850,"children":7851},{},[7852],{"type":30,"value":7853},"Wanna become a real dog? Let's go!",{"type":21,"tag":26,"props":7855,"children":7856},{},[7857,7858,7861],{"type":30,"value":2215},{"type":21,"tag":2217,"props":7859,"children":7860},{},[],{"type":30,"value":7862},"\nBad Dog.",{"title":8,"searchDepth":596,"depth":596,"links":7864},[7865],{"id":7782,"depth":596,"text":7785},"content:posts:hello-world.md","posts/hello-world.md","posts/hello-world",1775984392935]