Advertised Sequences and View Counts - Printable Version
+- Online Sequencer Forums (https://onlinesequencer.net/forum)
+-- Forum: Off Topic (https://onlinesequencer.net/forum/forum-7.html)
+--- Forum: General Discussion (https://onlinesequencer.net/forum/forum-8.html)
+--- Thread: Advertised Sequences and View Counts (/thread-2701.html)
Advertised Sequences and View Counts - Kirbyderp - 06-01-2018

Advertised Sequences and View Counts
A Statistical Inference to Answer an OS-Related Question
Welcome to what might be the one time I actually get a thread I made pinned! But on a more serious note, this was a huge project I did, and it will hopefully be useful to other users as well. While I'll try to make it as easy to understand as possible (as I know that there are a multitude of users who don't know statistics), there are a few things that I'll assume you already know. Since I have a feeling that a majority of OS users will just TL;DR this, I'll mark important sections like this.

Assumed Knowledge

I will be assuming that you already have knowledge of the things listed below: addition, subtraction, multiplication, division, fractions, percentages, variables, linear functions, PEMDAS, inequalities, radicals, and averages/means. You should also know a little about experiments.

Now that that's out of the way, I'll give you a quick rundown on many of the different terms and symbols that you might find in this thread.
The Question

Hey! Remember this? https://onlinesequencer.net/forum/showthread.php?tid=2655

You might've noticed that only 15 of the 30 non-drum-kit instruments were advertised there. If you were especially curious about why, you might've visited my user page and run into the other 15 sequences. Well, this wasn't your everyday ideas factory. It was an experiment! You see, many users (me included) self-advertise their sequences on the OS forums. While I can't be sure of why they do it, I can make a few guesses, such as:
Does advertising your sequences on the OS forums actually increase their view count?

The Experiment

So, in order to answer this, I set up an experiment. This experiment had 2 treatment groups: the first being 15 sequences that were advertised, and the second being the other 15 sequences that weren't.

The 30 sequences were made, and each was assigned a number based on the picture below, which you'll probably recognize as the instrument select dropdown menu. Each sequence used 1 instrument, different from the instruments of the other sequences. Each sequence was the same note-wise, although maybe not octave-wise (as many instruments have different ranges). This was mostly for efficiency purposes.

Each sequence was assigned a number from 1 to 30 based on the instrument it used. I assigned the numbers from top to bottom, ignoring the drum kits (so electric piano would be 1, grand piano would be 2, ..., music box would be 5, xylophone would be 6, and so on). I then used a random sequence generator, which randomly divided the 30 numbers into 2 columns. The numbers in the left column represented the advertised sequences, and the numbers in the right column represented the un-advertised sequences. The output is below.

Bias Control

Here's where I get to give my shoutout to Lucent. Without her, these results would've been a lot less reliable. You see, since all of the sequences were identical note-wise, someone would've pointed this out. To prevent this, I contacted Lucent and asked if the experiment thread (https://onlinesequencer.net/forum/showthread.php?tid=2655) could be locked so that no one could reply to it. If someone had replied saying that the sequences were all the same, then that would definitely affect how (and how much) the sequences were viewed, and none of the results from my experiment would've been trustworthy. That's why I had to make sure that it couldn't be replied to.
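The random split described above can be reproduced with a short script. (This is just a sketch: the instrument numbering 1–30 comes from the post, but the original used an online random sequence generator, so the seed and `random.Random` here are my own stand-ins and won't reproduce the exact columns from the experiment.)

```python
import random

# Numbers 1-30, one per non-drum-kit instrument, assigned top to bottom
# in the instrument select dropdown (1 = electric piano, 2 = grand piano, ...).
instruments = list(range(1, 31))

# Shuffle and split into the two treatment groups of 15.
# The fixed seed is only for reproducibility of this sketch.
rng = random.Random(42)
rng.shuffle(instruments)

advertised = sorted(instruments[:15])      # "left column" of the generator output
not_advertised = sorted(instruments[15:])  # "right column"

print("Advertised:    ", advertised)
print("Not advertised:", not_advertised)
```

Because every sequence has an equal chance of landing in either column, any pre-existing differences between instruments (popularity, sound quality, position in the dropdown) are spread randomly across the two groups rather than piling up in one of them.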
The Data

Once the experiment was over, the view counts of each sequence were counted up and put into the data table below. Oh yeah, I forgot to show that there was candy in the data.

Hypotheses

There are 2 hypotheses in a hypothesis test: the null hypothesis and the alternative hypothesis. The null hypothesis (H0) is that advertising sequences does not increase the view counts of the sequences advertised (µA = µN, or µA − µN = 0). The alternative hypothesis (HA) is that advertising sequences does increase the view counts of the sequences advertised (µA > µN, or µA − µN > 0).

The reason the hypotheses use µ is that we want to know if this is true for the population (we already know the results for the sample). We can write the hypotheses to suggest a potential causal relationship because an experiment was performed. (There are 2 ways to get data for tests like this: surveying or doing an experiment. An experiment controls for outside factors that can influence data, while surveying can't. So we can imply causation with a well-designed experiment, but not with a survey.) You'll see why there is an alternative way of writing the hypotheses (using µA − µN) later on.

Conditions

In order to perform a hypothesis test, some conditions must be met.
We can check to see if randomization was used. In this case, it was. Since we don't have a large sample, we have to check whether it is likely that the data came from a population that is Normally distributed. We do this by making histograms of the data (one histogram for each group). See the histograms below. Since both histograms are unimodal and symmetric, we are good to proceed with the test. We'll use a t-model (with the parameters below) to perform a 2-sample t-test for the difference in means.

RE: Advertised Sequences and View Counts - Kirbyderp - 06-01-2018

Doing the Test

Now let's actually get into the mechanics of doing the hypothesis test. These are the summary statistics that will be needed to perform the hypothesis test. We will be using the top 3 rows out of the bottom 4 to compute the test statistic (this test uses a t-ratio). The t-ratio is determined by this equation: (x̄A − x̄N − (µA − µN)) / SE(x̄A − x̄N).

x̄A − x̄N represents the observed difference in the means of the data. µA − µN represents what we are hypothesizing to be the true difference in the actual means of the populations. We subtract µA − µN from x̄A − x̄N because it shows the difference from what we'd expect. It's this difference that we want to test for significance.

So why do we divide by SE(x̄A − x̄N)? Well, different sets of data from different populations will have different means and will vary differently. However, the test being performed uses a single model (or equation, if you will) to determine whether a difference is significant. The problem is, while the equation might be good for one scale, it would be terrible for another. Dividing by SE(x̄A − x̄N) "standardizes" the data onto one common scale, allowing the test to use its "one-size-fits-all" method.

We know that the observed difference in the data (x̄A − x̄N) is 4.7333333.... If we assume the null hypothesis to be true, then we are assuming that there is no difference between the actual population means (we are assuming that µA − µN = 0.
This is why that alternative way of writing the null hypothesis is used.) We can also calculate SE(x̄A − x̄N) to be about 1.04. This makes the t-ratio (4.7333333... − 0) / ~1.04 ≈ 4.54.

We then use the model this test uses (a t-model) to turn the t-ratio of about 4.54 into a P-value. A P-value is the probability of getting the difference we saw (or a larger difference) in the data if there is no actual difference in the populations. The area under the curve (where the arrow is pointing, to the right of the bar) shows the P-value. The area is very small. So small, in fact, that you can't even see it. As it turns out, the P-value is about .015%, which is extremely minuscule.

Conclusion

The P-value is very small. The probability of seeing a difference of 4.7333... between the means of the data is .015% if the null hypothesis (that there is no difference between the true means) is true. I find this probability too small, and reject the null hypothesis (a fancy way of saying "this probability is so small that I don't think the null hypothesis is actually correct") in favor of the alternative. These results lead me to believe that advertising your sequences on the OS forums does increase the number of views they get.

Well, if you're reading this text, then you have most likely just read through (or at least skimmed, or looked at the highlighted sections of) this giant wall of text that is me trying to explain a statistical process to you. Thank you for not TL;DR-ing this (I put a lot of effort into it). If you have any questions regarding this inference, please don't hesitate to ask! I'd be happy to clear up any confusion.
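The arithmetic of the test can be sketched in a few lines. (The observed difference of about 4.733 and the SE of about 1.04 are the values reported above; the raw per-sequence view counts were in an image attachment, so only the summaries are used here, and the `se_two_sample` helper just restates the general two-sample standard-error formula.)

```python
import math

# Summary values reported in the post.
n_a = n_n = 15          # sequences per treatment group
diff = 4.7333333        # observed difference in means, x̄A - x̄N
se_diff = 1.04          # reported SE(x̄A - x̄N), rounded

# Under H0, µA - µN = 0, so the t-ratio simplifies to diff / SE.
t_ratio = (diff - 0) / se_diff
print(f"t ≈ {t_ratio:.2f}")  # close to the ~4.54 in the post (rounding in SE)

# In general, the standard error of the difference in means is
#   SE(x̄A - x̄N) = sqrt(sA^2 / nA + sN^2 / nN)
# where sA and sN are the sample standard deviations of the two groups.
def se_two_sample(s_a: float, s_n: float, n_a: int, n_n: int) -> float:
    return math.sqrt(s_a**2 / n_a + s_n**2 / n_n)
```

Turning the t-ratio into a P-value then means finding the area under the t-model's curve to the right of the t-ratio, which is what the post's plot shows; that step needs the t-distribution itself (e.g. a table or a stats library) rather than plain arithmetic.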
RE: Advertised Sequences and View Counts - Kirbyderp - 06-01-2018

Reserved cuz only 5 attachments per post :/

RE: Advertised Sequences and View Counts - Kirbyderp - 06-01-2018

I also like to organize my longer posts by sections, so yeah.

RE: Advertised Sequences and View Counts - Kirbyderp - 06-01-2018

I think I'll leave it here for now. This'll hopefully be enough space. I'll probably get it done tomorrow, but I gtg for tonight. For now, I'll leave you to ponder this question: does advertising sequences actually increase their view count?

RE: Advertised Sequences and View Counts - Palpatrump - 06-02-2018

OBJECTION! did you advertise it in chat? tl;dr

RE: Advertised Sequences and View Counts - Kirbyderp - 06-02-2018

(06-02-2018, 05:29 PM)Palpatrump Wrote: OBJECTION!

No, in the forums.