No More Seat Costs: Semaphore Plans Just Got Better!

  • Talk to a developer
  • Start building for free
  • System Status
  • Semaphore Cloud
  • Semaphore On-Premise
  • Semaphore Hybrid
  • Premium support
  • Docker & Kubernetes
  • vs GitHub Actions
  • vs Travis CI
  • vs Bitbucket
  • Write with us
  • Get started

Getting Started With Property-Based Testing in Python With Hypothesis and Pytest

Avatar

This tutorial will be your gentle guide to property-based testing. Property-based testing is a testing philosophy; a way of approaching testing, much like unit testing is a testing philosophy in which we write tests that verify individual components of your code.

By going through this tutorial, you will:

  • learn what property-based testing is;
  • understand the key benefits of using property-based testing;
  • see how to create property-based tests with Hypothesis;
  • attempt a small challenge to understand how to write good property-based tests; and
  • Explore several situations in which you can use property-based testing with zero overhead.

What is Property-Based Testing?

In the most common types of testing, you write a test by running your code and then checking if the result you got matches the reference result you expected. This is in contrast with property-based testing , where you write tests that check that the results satisfy certain properties . This shift in perspective makes property-based testing (with Hypothesis) a great tool for a variety of scenarios, like fuzzing or testing roundtripping.

In this tutorial, we will be learning about the concepts behind property-based testing, and then we will put those concepts to practice. In order to do that, we will use three tools: Python, pytest, and Hypothesis.

  • Python will be the programming language in which we will write both our functions that need testing and our tests.
  • pytest will be the testing framework.
  • Hypothesis will be the framework that will enable property-based testing.

Both Python and pytest are simple enough that, even if you are not a Python programmer or a pytest user, you should be able to follow along and get benefits from learning about property-based testing.

Setting up your environment to follow along

If you want to follow along with this tutorial and run the snippets of code and the tests yourself – which is highly recommendable – here is how you set up your environment.

Installing Python and pip

Start by making sure you have a recent version of Python installed. Head to the Python downloads page and grab the most recent version for yourself. Then, make sure your Python installation also has pip installed. [ pip ] is the package installer for Python and you can check if you have it on your machine by running the following command:

(This assumes python is the command to run Python on your machine.) If pip is not installed, follow their installation instructions .

Installing pytest and Hypothesis

pytest, the Python testing framework, and Hypothesis, the property-based testing framework, are easy to install after you have pip. All you have to do is run this command:

This tells pip to install pytest and Hypothesis and additionally it tells pip to update to newer versions if any of the packages are already installed.

To make sure pytest has been properly installed, you can run the following command:

The output on your machine may show a different version, depending on the exact version of pytest you have installed.

To ensure Hypothesis has been installed correctly, you have to open your Python REPL by running the following:

and then, within the REPL, type import hypothesis . If Hypothesis was properly installed, it should look like nothing happened. Immediately after, you can check for the version you have installed with hypothesis.__version__ . Thus, your REPL session would look something like this:

Your first property-based test

In this section, we will write our very first property-based test for a small function. This will show how to write basic tests with Hypothesis.

The function to test

Suppose we implemented a function gcd(n, m) that computes the greatest common divisor of two integers. (The greatest common divisor of n and m is the largest integer d that divides evenly into n and m .) What’s more, suppose that our implementation handles positive and negative integers. Here is what this implementation could look like:

If you save that into a file, say gcd.py , and then run it with:

you will enter an interactive REPL with your function already defined. This allows you to play with it a bit:

Now that the function is running and looks about right, we will test it with Hypothesis.

The property test

A property-based test isn’t wildly different from a standard (pytest) test, but there are some key differences. For example, instead of writing inputs to the function gcd , we let Hypothesis generate arbitrary inputs. Then, instead of hardcoding the expected outputs, we write assertions that ensure that the solution satisfies the properties that it should satisfy.

Thus, to write a property-based test, you need to determine the properties that your answer should satisfy.

Thankfully for us, we already know the properties that the result of gcd must satisfy:

“[…] the greatest common divisor (GCD) of two or more integers […] is the largest positive integer that divides each of the integers.”

So, from that Wikipedia quote, we know that if d is the result of gcd(n, m) , then:

  • d is positive;
  • d divides n ;
  • d divides m ; and
  • no other number larger than d divides both n and m .

To turn these properties into a test, we start by writing the signature of a test_ function that accepts the same inputs as the function gcd :

(The prefix test_ is not significant for Hypothesis. We are using Hypothesis with pytest and pytest looks for functions that start with test_ , so that is why our function is called test_gcd .)

The arguments n and m , which are also the arguments of gcd , will be filled in by Hypothesis. For now, we will just assume that they are available.

If n and m are arguments that are available and for which we want to test the function gcd , we have to start by calling gcd with n and m and then saving the result. It is after calling gcd with the supplied arguments and getting the answer that we get to test the answer against the four properties listed above.

Taking the four properties into account, our test function could look like this:

Go ahead and put this test function next to the function gcd in the file gcd.py . Typically, tests live in a different file from the code being tested but this is such a small example that we can have everything in the same file.

Plugging in Hypothesis

We have written the test function but we still haven’t used Hypothesis to power the test. Let’s go ahead and use Hypothesis’ magic to generate a bunch of arguments n and m for our function gcd. In order to do that, we need to figure out what are all the legal inputs that our function gcd should handle.

For our function gcd , the valid inputs are all integers, so we need to tell Hypothesis to generate integers and feed them into test_gcd . To do that, we need to import a couple of things:

given is what we will use to tell Hypothesis that a test function needs to be given data. The submodule strategies is the module that contains lots of tools that know how to generate data.

With these two imports, we can annotate our test:

You can read the decorator @given(st.integers(), st.integers()) as “the test function needs to be given one integer, and then another integer”. To run the test, you can just use pytest :

(Note: depending on your operating system and the way you have things configured, pytest may not end up in your path, and the command pytest gcd.py may not work. If that is the case for you, you can use the command python -m pytest gcd.py instead.)

As soon as you do so, Hypothesis will scream an error message at you, saying that you got a ZeroDivisionError . Let us try to understand what Hypothesis is telling us by looking at the bottom of the output of running the tests:

This shows that the tests failed with a ZeroDivisionError , and the line that reads “Falsifying example: …” contains information about the test case that blew our test up. In our case, this was n = 0 and m = 0 . So, Hypothesis is telling us that when the arguments are both zero, our function fails because it raises a ZeroDivisionError .

The problem lies in the usage of the modulo operator % , which does not accept a right argument of zero. The right argument of % is zero if n is zero, in which case the result should be m . Adding an if statement is a possible fix for this:

However, Hypothesis still won’t be happy. If you run your test again, with pytest gcd.py , you get this output:

This time, the issue is with the very first property that should be satisfied. We can know this because Hypothesis tells us which assertion failed while also telling us which arguments led to that failure. In fact, if we look further up the output, this is what we see:

This time, the issue isn’t really our fault. The greatest common divisor is not defined when both arguments are zero, so it is ok for our function to not know how to handle this case. Thankfully, Hypothesis lets us customise the strategies used to generate arguments. In particular, we can say that we only want to generate integers between a minimum and a maximum value.

The code below changes the test so that it only runs with integers between 1 and 100 for the first argument ( n ) and between -500 and 500 for the second argument ( m ):

That is it! This was your very first property-based test.

Why bother with Property-Based Testing?

To write good property-based tests you need to analyse your problem carefully to be able to write down all the properties that are relevant. This may look quite cumbersome. However, using a tool like Hypothesis has very practical benefits:

  • Hypothesis can generate dozens or hundreds of tests for you, while you would typically only write a couple of them;
  • tests you write by hand will typically only cover the edge cases you have already thought of, whereas Hypothesis will not have that bias; and
  • thinking about your solution to figure out its properties can give you deeper insights into the problem, leading to even better solutions.

These are just some of the advantages of using property-based testing.

Using Hypothesis for free

There are some scenarios in which you can use property-based testing essentially for free (that is, without needing to spend your precious brain power), because you don’t even need to think about properties. Let’s look at two such scenarios.

Testing Roundtripping

Hypothesis is a great tool to test roundtripping. For example, the built-in functions int and str in Python should roundtrip. That is, if x is an integer, then int(str(x)) should still be x . In other words, converting x to a string and then to an integer again should not change its value.

We can write a simple property-based test for this, leveraging the fact that Hypothesis generates dozens of tests for us. Save this in a Python file:

Now, run this file with pytest. Your test should pass!

Did you notice that, in our gcd example above, the very first time we ran Hypothesis we got a ZeroDivisionError ? The test failed, not because of an assert, but simply because our function crashed.

Hypothesis can be used for tests like this. You do not need to write a single property because you are just using Hypothesis to see if your function can deal with different inputs. Of course, even a buggy function can pass a fuzzing test like this, but this helps catch some types of bugs in your code.

Comparing against a gold standard

Sometimes, you want to test a function f that computes something that could be computed by some other function f_alternative . You know this other function is correct (that is why you call it a “gold standard”), but you cannot use it in production because it is very slow, or it consumes a lot of resources, or for some other combination of reasons.

Provided it is ok to use the function f_alternative in a testing environment, a suitable test would be something like the following:

When possible, this type of test is very powerful because it directly tests if your solution is correct for a series of different arguments.

For example, if you refactored an old piece of code, perhaps to simplify its logic or to make it more performant, Hypothesis will give you confidence that your new function will work as it should.

The importance of property completeness

In this section you will learn about the importance of being thorough when listing the properties that are relevant. To illustrate the point, we will reason about property-based tests for a function called my_sort , which is your implementation of a sorting function that accepts lists of integers.

The results are sorted

When thinking about the properties that the result of my_sort satisfies, you come up with the obvious thing: the result of my_sort must be sorted.

So, you set out to assert this property is satisfied:

Now, the only thing missing is the appropriate strategy to generate lists of integers. Thankfully, Hypothesis knows a strategy to generate lists, which is called lists . All you need to do is give it a strategy that generates the elements of the list.

Now that the test has been written, here is a challenge. Copy this code into a file called my_sort.py . Between the import and the test, define a function my_sort that is wrong (that is, write a function that does not sort lists of integers) and yet passes the test if you run it with pytest my_sort.py . (Keep reading when you are ready for spoilers.)

Notice that the only property that we are testing is “all elements of the result are sorted”, so we can return whatever result we want , as long as it is sorted. Here is my fake implementation of my_sort :

This passes our property test and yet is clearly wrong because we always return an empty list. So, are we missing a property? Perhaps.

The lengths are the same

We can try to add another obvious property, which is that the input and the output should have the same length, obviously. This means that our test becomes:

Now that the test has been improved, here is a challenge. Write a new version of my_sort that passes this test and is still wrong. (Keep reading when you are ready for spoilers.)

Notice that we are only testing for the length of the result and whether or not its elements are sorted, but we don’t test which elements are contained in the result. Thus, this fake implementation of my_sort would work:

Use the right numbers

To fix this, we can add the obvious property that the result should only contain numbers from the original list. With sets, this is easy to test:

Now that our test has been improved, I have yet another challenge. Can you write a fake version of my_sort that passes this test? (Keep reading when you are ready for spoilers).

Here is a fake version of my_sort that passes the test above:

The issue here is that we were not precise enough with our new property. In fact, set(result) <= set(int_list) ensures that we only use numbers that were available in the original list, but it doesn’t ensure that we use all of them. What is more, we can’t fix it by simply replacing the <= with == . Can you see why?I will give you a hint. If you just replace the <= with a == , so that the test becomes:

then you can write this passing version of my_sort that is still wrong:

This version is wrong because it reuses the largest element of the original list without respecting the number of times each integer should be used. For example, for the input list [1, 1, 2, 2, 3, 3] the result should be unchanged, whereas this version of my_sort returns [1, 2, 3, 3, 3, 3] .

The final test

A test that is correct and complete would have to take into account how many times each number appears in the original list, which is something the built-in set is not prepared to do. Instead, one could use the collections.Counter from the standard library:

So, at this point, your test function test_my_sort is complete. At this point, it is no longer possible to fool the test! That is, the only way the test will pass is if my_sort is a real sorting function.

Use properties and specific examples

This section showed that the properties that you test should be well thought-through and you should strive to come up with a set of properties that are as specific as possible. When in doubt, it is better to have properties that may look redundant over having too few.

Another strategy that you can follow to help mitigate the danger of having come up with an insufficient set of properties is to mix property-based testing with other forms of testing, which is perfectly reasonable.

For example, on top of having the property-based test test_my_sort , you could add the following test:

This article covered two examples of functions to which we added property-based tests. We only covered the basics of using Hypothesis to run property-based tests but, more importantly, we covered the fundamental concepts that enable a developer to reason about and write complete property-based tests.

Property-based testing isn’t a one-size-fits-all solution that means you will never have to write any other type of test, but it does have characteristics that you should take advantage of whenever possible. In particular, we saw that property-based testing with Hypothesis was beneficial in that:

This article also went over a couple of common gotchas when writing property-based tests and listed scenarios in which property-based testing can be used with no overhead.

If you are interested in learning more about Hypothesis and property-based testing, we recommend you take a look at the Hypothesis docs and, in particular, to the page “What you can generate and how” .

CI/CD Weekly Newsletter 🔔

Semaphore uncut podcast 🎙️.

python hypothesis settings

Learn CI/CD

Level up your developer skills to use CI/CD at its max.

5 thoughts on “ Getting Started With Property-Based Testing in Python With Hypothesis and Pytest ”

Awesome intro to property based testing for Python. Thank you, Dan and Rodrigo!

Greeting! Unfortunately, I don’t understand due to translation difficulties. PyCharm writes error messages and does not run the codes. The installation was done fine, check ok. I created a virtual environment. I would like a single good, usable, complete code, an example of what to write in gcd.py and what in test_gcd.py, which the development environment runs without errors. Thanks!

Thanks for article!

“it is better to have properties that may look redundant over having too few” Isn’t it the case with: assert len(result) == len(int_list) and: assert Counter(result) == Counter(int_list) ? I mean: is it possible to satisfy the second condition without satisfying the first ?

Yes. One case could be if result = [0,1], int_list = [0,1,1], and the implementation of Counter returns unique count.

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Avatar

CI/CD Weekly Newsletter

🔔 Get notified when the new articles and interviews are out.

Pytest With Eric

How to Use Hypothesis and Pytest for Robust Property-Based Testing in Python

There will always be cases you didn’t consider, making this an ongoing maintenance job. Unit testing solves only some of these issues.

Example-Based Testing vs Property-Based Testing

Project set up, getting started, prerequisites, simple example, source code, simple example — unit tests, example-based testing, running the unit test, property-based testing, complex example, source code, complex example — unit tests, discover bugs with hypothesis, define your own hypothesis strategies, model-based testing in hypothesis, additional reading.

Statology

Statistics Made Easy

How to Perform Hypothesis Testing in Python (With Examples)

A hypothesis test is a formal statistical test we use to reject or fail to reject some statistical hypothesis.

This tutorial explains how to perform the following hypothesis tests in Python:

  • One sample t-test
  • Two sample t-test
  • Paired samples t-test

Let’s jump in!

Example 1: One Sample t-test in Python

A one sample t-test is used to test whether or not the mean of a population is equal to some value.

For example, suppose we want to know whether or not the mean weight of a certain species of some turtle is equal to 310 pounds.

To test this, we go out and collect a simple random sample of turtles with the following weights:

Weights : 300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303

The following code shows how to use the ttest_1samp() function from the scipy.stats library to perform a one sample t-test:

The t test statistic is  -1.5848 and the corresponding two-sided p-value is  0.1389 .

The two hypotheses for this particular one sample t-test are as follows:

  • H 0 :  µ = 310 (the mean weight for this species of turtle is 310 pounds)
  • H A :  µ ≠310 (the mean weight is not  310 pounds)

Because the p-value of our test (0.1389) is greater than alpha = 0.05, we fail to reject the null hypothesis of the test.

We do not have sufficient evidence to say that the mean weight for this particular species of turtle is different from 310 pounds.

Example 2: Two Sample t-test in Python

A two sample t-test is used to test whether or not the means of two populations are equal.

For example, suppose we want to know whether or not the mean weight between two different species of turtles is equal.

To test this, we collect a simple random sample of turtles from each species with the following weights:

Sample 1 : 300, 315, 320, 311, 314, 309, 300, 308, 305, 303, 305, 301, 303

Sample 2 : 335, 329, 322, 321, 324, 319, 304, 308, 305, 311, 307, 300, 305

The following code shows how to use the ttest_ind() function from the scipy.stats library to perform this two sample t-test:

The t test statistic is – 2.1009 and the corresponding two-sided p-value is 0.0463 .

The two hypotheses for this particular two sample t-test are as follows:

  • H 0 :  µ 1 = µ 2 (the mean weight between the two species is equal)
  • H A :  µ 1 ≠ µ 2 (the mean weight between the two species is not equal)

Since the p-value of the test (0.0463) is less than .05, we reject the null hypothesis.

This means we have sufficient evidence to say that the mean weight between the two species is not equal.

Example 3: Paired Samples t-test in Python

A paired samples t-test is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample.

For example, suppose we want to know whether or not a certain training program is able to increase the max vertical jump (in inches) of basketball players.

To test this, we may recruit a simple random sample of 12 college basketball players and measure each of their max vertical jumps. Then, we may have each player use the training program for one month and then measure their max vertical jump again at the end of the month.

The following data shows the max jump height (in inches) before and after using the training program for each player:

Before : 22, 24, 20, 19, 19, 20, 22, 25, 24, 23, 22, 21

After : 23, 25, 20, 24, 18, 22, 23, 28, 24, 25, 24, 20

The following code shows how to use the ttest_rel() function from the scipy.stats library to perform this paired samples t-test:

The t test statistic is – 2.5289  and the corresponding two-sided p-value is 0.0280 .

The two hypotheses for this particular paired samples t-test are as follows:

  • H 0 :  µ 1 = µ 2 (the mean jump height before and after using the program is equal)
  • H A :  µ 1 ≠ µ 2 (the mean jump height before and after using the program is not equal)

Since the p-value of the test (0.0280) is less than .05, we reject the null hypothesis.

This means we have sufficient evidence to say that the mean jump height before and after using the training program is not equal.

Additional Resources

You can use the following online calculators to automatically perform various t-tests:

One Sample t-test Calculator Two Sample t-test Calculator Paired Samples t-test Calculator

Featured Posts

5 Regularization Techniques You Should Know

Hey there. My name is Zach Bobbitt. I have a Masters of Science degree in Applied Statistics and I’ve worked on machine learning algorithms for professional businesses in both healthcare and retail. I’m passionate about statistics, machine learning, and data visualization and I created Statology to be a resource for both students and teachers alike.  My goal with this site is to help you learn statistics through using simple terms, plenty of real-world examples, and helpful illustrations.

One Reply to “How to Perform Hypothesis Testing in Python (With Examples)”

Nice post. Could you please clear my one doubt regarding alpha value . i can see in your example, it is a two tail test. As i understand in that case our alpha value should be alpha/2 i.e 0.025 . Here you are taking it as 0.05. ?

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Join the Statology Community

Sign up to receive Statology's exclusive study resource: 100 practice problems with step-by-step solutions. Plus, get our latest insights, tutorials, and data analysis tips straight to your inbox!

By subscribing you accept Statology's Privacy Policy.

python hypothesis settings

Visual Design.

Upgrade to get unlimited access ($10 one off payment).

7 Tips for Beginner to Future-Proof your Machine Learning Project

7 Tips for Beginner to Future-Proof your Machine Learning Project

LLM Prompt Engineering Techniques for Knowledge Graph Integration

LLM Prompt Engineering Techniques for Knowledge Graph Integration

Develop a Data Analytics Web App in 3 Steps

Develop a Data Analytics Web App in 3 Steps

What Does ChatGPT Say About Machine Learning Trend and How Can We Prepare For It?

What Does ChatGPT Say About Machine Learning Trend and How Can We Prepare For It?

  • Apr 14, 2022

An Interactive Guide to Hypothesis Testing in Python

Updated: Jun 12, 2022

Statistical Test in Python Cheatsheet

upgrade and grab the cheatsheet from our infographics gallery

What is hypothesis testing.

Hypothesis testing is an essential part in inferential statistics where we use observed data in a sample to draw conclusions about unobserved data - often the population.

Implication of hypothesis testing:

clinical research: widely used in psychology, biology and healthcare research to examine the effectiveness of clinical trials

A/B testing: can be applied in business context to improve conversions through testing different versions of campaign incentives, website designs ...

feature selection in machine learning: filter-based feature selection methods use different statistical tests to determine the feature importance

college or university: well, if you major in statistics or data science, it is likely to appear in your exams

For a brief video walkthrough along with the blog, check out my YouTube channel.

4 Steps in Hypothesis testing

Step 1. define null and alternative hypothesis.

Null hypothesis (H0) can be stated differently depends on the statistical tests, but generalize to the claim that no difference, no relationship or no dependency exists between two or more variables.

Alternative hypothesis (H1) is contradictory to the null hypothesis and it claims that relationships exist. It is the hypothesis that we would like to prove right. However, a more conservational approach is favored in statistics where we always assume null hypothesis is true and try to find evidence to reject the null hypothesis.

Step 2. Choose the appropriate test

Common Types of Statistical Testing including t-tests, z-tests, anova test and chi-square test

how to choose the statistical test

T-test: compare two groups/categories of numeric variables with small sample size

Z-test: compare two groups/categories of numeric variables with large sample size

ANOVA test: compare the difference between two or more groups/categories of numeric variables

Chi-Squared test: examine the relationship between two categorical variables

Correlation test: examine the relationship between two numeric variables

Step 3. Calculate the p-value

How p value is calculated primarily depends on the statistical testing selected. Firstly, based on the mean and standard deviation of the observed sample data, we are able to derive the test statistics value (e.g. t-statistics, f-statistics). Then calculate the probability of getting this test statistics given the distribution of the null hypothesis, we will find out the p-value. We will use some examples to demonstrate this in more detail.

Step 4. Determine the statistical significance

p value is then compared against the significance level (also noted as alpha value) to determine whether there is sufficient evidence to reject the null hypothesis. The significance level is a predetermined probability threshold - commonly 0.05. If p value is larger than the threshold, it means that the value is likely to occur in the distribution when the null hypothesis is true. On the other hand, if lower than significance level, it means it is very unlikely to occur in the null hypothesis distribution - hence reject the null hypothesis.

Hypothesis Testing with Examples

Kaggle dataset “ Customer Personality Analysis” is used in this case study to demonstrate different types of statistical test. T-test, ANOVA and Chi-Square test are sensitive to large sample size, and almost certainly will generate very small p-value when sample size is large . Therefore, I took a random sample (size of 100) from the original data:

T-test is used when we want to test the relationship between a numeric variable and a categorical variable.There are three main types of t-test.

one sample t-test: test the mean of one group against a constant value

two sample t-test: test the difference of means between two groups

paired sample t-test: test the difference of means between two measurements of the same subject

For example, if I would like to test whether “Recency” (the number of days since customer’s last purchase - numeric value) contributes to the prediction of “Response” (whether the customer accepted the offer in the last campaign - categorical value), I can use a two sample t-test.

The first sample would be the “Recency” of customers who accepted the offer:

The second sample would be the “Recency” of customers who rejected the offer:

To compare the “Recency” of these two groups intuitively, we can use histogram (or distplot) to show the distributions.

python hypothesis settings

It appears that positive response have lower Recency compared to negative response. To quantify the difference and make it more scientific, let’s follow the steps in hypothesis testing and carry out a t-test.

Step1. define null and alternative hypothesis

null: there is no difference in Recency between the customers who accepted the offer in the last campaign and who did not accept the offer

alternative: customers who accepted the offer has lower Recency compared to customers who did not accept the offer

Step 2. choose the appropriate test

To test the difference between two independent samples, two-sample t-test is the most appropriate statistical test which follows student t-distribution. The shape of student-t distribution is determined by the degree of freedom, calculated as the sum of two sample size minus 2.

In python, simply import the library scipy.stats and create the t-distribution as below.

Step 3. calculate the p-value

There are some handy functions in Python calculate the probability in a distribution. For any x covered in the range of the distribution, pdf(x) is the probability density function of x — which can be represented as the orange line below, and cdf(x) is the cumulative density function of x — which can be seen as the cumulative area. In this example, we are testing the alternative hypothesis that — Recency of positive response minus the Recency of negative response is less than 0. Therefore we should use a one-tail test and compare the t-statistics we get against the lowest value in this distribution — therefore p-value can be calculated as cdf(t_statistics) in this case.

python hypothesis settings

ttest_ind() is a handy function for independent t-test in python that has done all of these for us automatically. Pass two samples rececency_P and recency_N as the parameters, and we get the t-statistics and p-value.

t-test in python

Here I use plotly to visualize the p-value in t-distribution. Hover over the line and see how point probability and p-value changes as the x shifts. The area with filled color highlights the p-value we get for this specific test.

Check out the code in our Code Snippet section, if you want to build this yourself.

An interactive visualization of t-distribution with t-statistics vs. significance level.

Step 4. determine the statistical significance

The commonly used significance level threshold is 0.05. Since p-value here (0.024) is smaller than 0.05, we can say that it is statistically significant based on the collected sample. A lower Recency of customer who accepted the offer is likely not occur by chance. This indicates the feature “Response” may be a strong predictor of the target variable “Recency”. And if we would perform feature selection for a model predicting the "Recency" value, "Response" is likely to have high importance.

Now that we know t-test is used to compare the mean of one or two sample groups. What if we want to test more than two samples? Use ANOVA test.

ANOVA examines the difference among groups by calculating the ratio of variance across different groups vs variance within a group . Larger ratio indicates that the difference across groups is a result of the group difference rather than just random chance.

As an example, I use the feature “Kidhome” for the prediction of “NumWebPurchases”. There are three values of “Kidhome” - 0, 1, 2 which naturally forms three groups.

Firstly, visualize the data. I found box plot to be the most aligned visual representation of ANOVA test.

box plot for ANOVA test

It appears there are distinct differences among three groups. So let’s carry out ANOVA test to prove if that’s the case.

1. define hypothesis:

null hypothesis: there is no difference among three groups

alternative hypothesis: there is difference between at least two groups

2. choose the appropriate test: ANOVA test for examining the relationships of numeric values against a categorical value with more than two groups. Similar to t-test, the null hypothesis of ANOVA test also follows a distribution defined by degrees of freedom. The degrees of freedom in ANOVA is determined by number of total samples (n) and the number of groups (k).

dfn = n - 1

dfd = n - k

3. calculate the p-value: To calculate the p-value of the f-statistics, we use the right tail cumulative area of the f-distribution, which is 1 - rv.cdf(x).

python hypothesis settings

To easily get the f-statistics and p-value using Python, we can use the function stats.f_oneway() which returns p-value: 0.00040.

An interactive visualization of f-distribution with f-statistics vs. significance level. (Check out the code in our Code Snippet section, if you want to build this yourself. )

4. determine the statistical significance : Compare the p-value against the significance level 0.05, we can infer that there is strong evidence against the null hypothesis and very likely that there is difference in “NumWebPurchases” between at least two groups.

Chi-Squared Test

Chi-Squared test is for testing the relationship between two categorical variables. The underlying principle is that if two categorical variables are independent, then one categorical variable should have similar composition when the other categorical variable change. Let’s look at the example of whether “Education” and “Response” are independent.

First, use stacked bar chart and contingency table to summary the count of each category.

python hypothesis settings

If these two variables are completely independent to each other (null hypothesis is true), then the proportion of positive Response and negative Response should be the same across all Education groups. It seems like composition are slightly different, but is it significant enough to say there is dependency - let’s run a Chi-Squared test.

null hypothesis: “Education” and “Response” are independent to each other.

alternative hypothesis: “Education” and “Response” are dependent to each other.

2. choose the appropriate test: Chi-Squared test is chosen and you probably found a pattern here, that Chi-distribution is also determined by the degree of freedom which is (row - 1) x (column - 1).

3. calculate the p-value: p value is calculated as the right tail cumulative area: 1 - rv.cdf(x).

python hypothesis settings

Python also provides a useful function to get the chi statistics and p-value given the contingency table.

An interactive visualization of chi-distribution with chi-statistics vs. significance level. (Check out the code in our Code Snippet section, if you want to build this yourself. )

4. determine the statistical significanc e: the p-value here is 0.41, suggesting that it is not statistical significant. Therefore, we cannot reject the null hypothesis that these two categorical variables are independent. This further indicates that “Education” may not be a strong predictor of “Response”.

Thanks for reaching so far, we have covered a lot of contents in this article but still have two important hypothesis tests that are worth discussing separately in upcoming posts.

z-test: test the difference between two categories of numeric variables - when sample size is LARGE

correlation: test the relationship between two numeric variables

Hope you found this article helpful. If you’d like to support my work and see more articles like this, treat me a coffee ☕️ by signing up Premium Membership with $10 one-off purchase.

Take home message.

In this article, we interactively explore and visualize the difference between three common statistical tests: t-test, ANOVA test and Chi-Squared test. We also use examples to walk through essential steps in hypothesis testing:

1. define the null and alternative hypothesis

2. choose the appropriate test

3. calculate the p-value

4. determine the statistical significance

  • Data Science

Recent Posts

How to Self Learn Data Science in 2022

Commentaires

What Is Hypothesis Testing? Types and Python Code Example

MENE-EJEGI OGBEMI

Curiosity has always been a part of human nature. Since the beginning of time, this has been one of the most important tools for birthing civilizations. Still, our curiosity grows — it tests and expands our limits. Humanity has explored the plains of land, water, and air. We've built underwater habitats where we could live for weeks. Our civilization has explored various planets. We've explored land to an unlimited degree.

These things were possible because humans asked questions and searched until they found answers. However, for us to get these answers, a proven method must be used and followed through to validate our results. Historically, philosophers assumed the earth was flat and you would fall off when you reached the edge. While philosophers like Aristotle argued that the earth was spherical based on the formation of the stars, they could not prove it at the time.

This is because they didn't have adequate resources to explore space or mathematically prove Earth's shape. It was a Greek mathematician named Eratosthenes who calculated the earth's circumference with incredible precision. He used scientific methods to show that the Earth was not flat. Since then, other methods have been used to prove the Earth's spherical shape.

When there are questions or statements that are yet to be tested and confirmed based on some scientific method, they are called hypotheses. Basically, we have two types of hypotheses: null and alternate.

A null hypothesis is one's default belief or argument about a subject matter. In the case of the earth's shape, the null hypothesis was that the earth was flat.

An alternate hypothesis is a belief or argument a person might try to establish. Aristotle and Eratosthenes argued that the earth was spherical.

Other examples of a random alternate hypothesis include:

  • The weather may have an impact on a person's mood.
  • More people wear suits on Mondays compared to other days of the week.
  • Children are more likely to be brilliant if both parents are in academia, and so on.

What is Hypothesis Testing?

Hypothesis testing is the act of testing whether a hypothesis or inference is true. When an alternate hypothesis is introduced, we test it against the null hypothesis to know which is correct. Let's use a plant experiment by a 12-year-old student to see how this works.

The hypothesis is that a plant will grow taller when given a certain type of fertilizer. The student takes two samples of the same plant, fertilizes one, and leaves the other unfertilized. He measures the plants' height every few days and records the results in a table.

After a week or two, he compares the final height of both plants to see which grew taller. If the plant given fertilizer grew taller, the hypothesis is established as fact. If not, the hypothesis is not supported. This simple experiment shows how to form a hypothesis, test it experimentally, and analyze the results.

In hypothesis testing, there are two types of error: Type I and Type II.

When we reject the null hypothesis in a case where it is correct, we've committed a Type I error. Type II errors occur when we fail to reject the null hypothesis when it is incorrect.

In our plant experiment above, if the student finds out that both plants' heights are the same at the end of the test period yet opines that fertilizer helps with plant growth, he has committed a Type I error.

However, if the fertilized plant comes out taller and the student records that both plants are the same or that the one without fertilizer grew taller, he has committed a Type II error because he has failed to reject the null hypothesis.

What are the Steps in Hypothesis Testing?

The following steps explain how we can test a hypothesis:

Step #1 - Define the Null and Alternative Hypotheses

Before making any test, we must first define what we are testing and what the default assumption is about the subject. In this article, we'll be testing if the average weight of 10-year-old children is more than 32kg.

Our null hypothesis is that 10 year old children weigh 32 kg on average. Our alternate hypothesis is that the average weight is more than 32kg. Ho denotes a null hypothesis, while H1 denotes an alternate hypothesis.

Step #2 - Choose a Significance Level

The significance level is a threshold for determining if the test is valid. It gives credibility to our hypothesis test to ensure we are not just luck-dependent but have enough evidence to support our claims. We usually set our significance level before conducting our tests. The criterion for determining our significance value is known as p-value.

A lower p-value means that there is stronger evidence against the null hypothesis, and therefore, a greater degree of significance. A p-value of 0.05 is widely accepted to be significant in most fields of science. P-values do not denote the probability of the outcome of the result, they just serve as a benchmark for determining whether our test result is due to chance. For our test, our p-value will be 0.05.

Step #3 - Collect Data and Calculate a Test Statistic

You can obtain your data from online data stores or conduct your research directly. Data can be scraped or researched online. The methodology might depend on the research you are trying to conduct.

We can calculate our test using any of the appropriate hypothesis tests. This can be a T-test, Z-test, Chi-squared, and so on. There are several hypothesis tests, each suiting different purposes and research questions. In this article, we'll use the T-test to run our hypothesis, but I'll explain the Z-test, and chi-squared too.

T-test is used for comparison of two sets of data when we don't know the population standard deviation. It's a parametric test, meaning it makes assumptions about the distribution of the data. These assumptions include that the data is normally distributed and that the variances of the two groups are equal. In a more simple and practical sense, imagine that we have test scores in a class for males and females, but we don't know how different or similar these scores are. We can use a t-test to see if there's a real difference.

The Z-test is used for comparison between two sets of data when the population standard deviation is known. It is also a parametric test, but it makes fewer assumptions about the distribution of data. The z-test assumes that the data is normally distributed, but it does not assume that the variances of the two groups are equal. In our class test example, with the t-test, we can say that if we already know how spread out the scores are in both groups, we can now use the z-test to see if there's a difference in the average scores.

The Chi-squared test is used to compare two or more categorical variables. The chi-squared test is a non-parametric test, meaning it does not make any assumptions about the distribution of data. It can be used to test a variety of hypotheses, including whether two or more groups have equal proportions.

Step #4 - Decide on the Null Hypothesis Based on the Test Statistic and Significance Level

After conducting our test and calculating the test statistic, we can compare its value to the predetermined significance level. If the test statistic falls beyond the significance level, we can decide to reject the null hypothesis, indicating that there is sufficient evidence to support our alternative hypothesis.

On the other contrary, if the test statistic does not exceed the significance level, we fail to reject the null hypothesis, signifying that we do not have enough statistical evidence to conclude in favor of the alternative hypothesis.

Step #5 - Interpret the Results

Depending on the decision made in the previous step, we can interpret the result in the context of our study and the practical implications. For our case study, we can interpret whether we have significant evidence to support our claim that the average weight of 10 year old children is more than 32kg or not.

For our test, we are generating random dummy data for the weight of the children. We'll use a t-test to evaluate whether our hypothesis is correct or not.

For a better understanding, let's look at what each block of code does.

The first block is the import statement, where we import numpy and scipy.stats . Numpy is a Python library used for scientific computing. It has a large library of functions for working with arrays. Scipy is a library for mathematical functions. It has a stat module for performing statistical functions, and that's what we'll be using for our t-test.

The weights of the children were generated at random since we aren't working with an actual dataset. The random module within the Numpy library provides a function for generating random numbers, which is randint .

The randint function takes three arguments. The first (20) is the lower bound of the random numbers to be generated. The second (40) is the upper bound, and the third (100) specifies the number of random integers to generate. That is, we are generating random weight values for 100 children. In real circumstances, these weight samples would have been obtained by taking the weight of the required number of children needed for the test.

Using the code above, we declared our null and alternate hypotheses stating the average weight of a 10-year-old in both cases.

t_stat and p_value are the variables in which we'll store the results of our functions. stats.ttest_1samp is the function that calculates our test. It takes in two variables, the first is the data variable that stores the array of weights for children, and the second (32) is the value against which we'll test the mean of our array of weights or dataset in cases where we are using a real-world dataset.

The code above prints both values for t_stats and p_value .

Lastly, we evaluated our p_value against our significance value, which is 0.05. If our p_value is less than 0.05, we reject the null hypothesis. Otherwise, we fail to reject the null hypothesis. Below is the output of this program. Our null hypothesis was rejected.

In this article, we discussed the importance of hypothesis testing. We highlighted how science has advanced human knowledge and civilization through formulating and testing hypotheses.

We discussed Type I and Type II errors in hypothesis testing and how they underscore the importance of careful consideration and analysis in scientific inquiry. It reinforces the idea that conclusions should be drawn based on thorough statistical analysis rather than assumptions or biases.

We also generated a sample dataset using the relevant Python libraries and used the needed functions to calculate and test our alternate hypothesis.

Thank you for reading! Please follow me on LinkedIn where I also post more data related content.

Technical support engineer with 4 years of experience & 6 months in data analytics. Passionate about data science, programming, & statistics.

If you read this far, thank the author to show them you care. Say Thanks

Learn to code for free. freeCodeCamp's open source curriculum has helped more than 40,000 people get jobs as developers. Get started

  • Pydantic Settings
  • Pydantic People

Hypothesis is the Python library for property-based testing . Hypothesis can infer how to construct type-annotated classes, and supports builtin types, many standard library types, and generic types from the typing and typing_extensions modules by default.

Pydantic v2.0 drops built-in support for Hypothesis and no more ships with the integrated Hypothesis plugin.

We are temporarily removing the Hypothesis plugin in favor of studying a different mechanism. For more information, see the issue annotated-types/annotated-types#37 .

The Hypothesis plugin may be back in a future release. Subscribe to pydantic/pydantic#4682 for updates.

  • Android Malware 22
  • Artificial Intelligence 4
  • Check Point Research Publications 370
  • Cloud Security 1
  • Data & Threat Intelligence 1
  • Data Analysis 0
  • Global Cyber Attack Reports 303
  • How To Guides 11
  • Ransomware 1
  • Russo-Ukrainian War 1
  • Security Report 1
  • Threat and data analysis 0
  • Threat Research 169
  • Web 3.0 Security 8

python hypothesis settings

Foxit PDF “Flawed Design” Exploitation

Research by: Antonis Terefos

Introduction

PDF (Portable Document Format) files have become an integral part of modern digital communication. Renowned for their universality and fidelity, PDFs offer a robust platform for sharing documents across diverse computing environments. PDFs have evolved into a standard format for presenting text, images, and multimedia content with consistent layout and formatting, irrespective of the software, hardware, or operating system used to view them. This versatility has made PDFs indispensable in fields ranging from business and academia to government and personal use, serving as a reliable means of exchanging information in a structured and accessible manner.

In the realm of PDF viewers, Adobe Acrobat Reader reigns supreme as the industry’s dominant player. However, while Adobe Acrobat Reader holds the biggest market share, notable contenders are vying for attention, with Foxit PDF Reader being a prominent alternative. With more than 700 million users located in more than 200 countries and significant customers in the government sector like the US Air Force, Army, Navy & Missile Defense Agency, as well as in the technological sector like Google, Microsoft, Intel & Dell.

Check Point Research has identified an unusual pattern of behavior involving PDF exploitation, mainly targeting users of Foxit Reader. This exploit triggers security warnings that could deceive unsuspecting users into executing harmful commands. Check Point Research has observed variants of this exploit being actively utilized in the wild. Its low detection rate is attributed to the prevalent use of Adobe Reader in most sandboxes or antivirus solutions, as Adobe Reader is not susceptible to this specific exploit. Additionally, Check Point Research has observed various exploit builders, ranging from those coded in .NET to those written in Python, being used to deploy this exploit.

This exploit has been used by multiple threat actors, from e-crime to espionage. The campaigns have taken advantage of the low detection rate and protection against this exploit where actors have been spotted sharing those malicious PDF files even using nontraditional means such as Facebook. Check Point Research isolated and investigated three in-depth cases, ranging from an espionage campaign with a military focus to e-crime with multiple links and tools, achieving impressive attack chains.

The “Flawed Design”

Check Point Research discovered that samples from  EXPMON  produced unusual behavior when executed with Foxit Reader compared to Adobe Reader. The exploitation of victims occurs through a flawed design in Foxit Reader, which shows as a default option the “OK,” which could lead the majority of the targets to ignore those messages and execute the malicious code. The malicious command is executed once the victim “Agrees” to the default options twice.

The victim scenario is shown below: when opening the file, we come across the first pop-up, the default option “Trust once,” which is the correct approach.

Figure 1 - First pop-up warning.

Once clicking “OK“, the target comes across a second pop-up. If there were any chance the targeted user would read the first message, the second would be “Agreed” without reading. This is the case that the Threat Actors are taking advantage of this flawed logic and common human behavior, which provides as the default choice the most “harmful” one.

python hypothesis settings

Attaching a debugger, we can observe the executed command and, with the use of PowerShell, will download and execute a malicious file.

python hypothesis settings

Executed Command:

Analyzing the PDF file statically, we can obtain the executed logic behind it.

The initial link, which references the root of the PDF, is shown using the key  /Root . In this case, points to object  1 . Following this object, we can observe the key  /OpenAction , which by itself doesn’t indicate malicious activity. This is a key in a PDF file’s catalog dictionary. It specifies an action to be performed automatically when the document is opened. The next keys are responsible for the execution of the command,  /S /Launch  indicating to the Foxit Reader to launch an external application and  /Win  providing the information needed for the launched application. Later, keys  /F  and  /P  provide the application to execute and its parameters.

This sequence of keys triggers the two previous warnings in Foxit Reader, and with the flawed design and careless users, it is able to execute malicious commands that appear highly leveraged by threat actors. Meanwhile, the key  /Launch  appears not to be triggered for Adobe Reader.

Campaigns utilizing PDF Exploit

Check Point Research collected a plethora of malicious PDF files, taking advantage of the specific exploit targeting Foxit Reader users. Despite the majority of sandboxes and VirusTotal failing to trigger the exploit, given Adobe’s prevalence as the primary PDF Reader, numerous files from previous campaigns remained unretrieved. Nonetheless, we acquired a sufficient amount of dropped payloads from various origins, revealing a diverse range of malicious tools within the infection chain and prominent malware families such as:

  • Agent-Tesla
  • NanoCore RAT

We meticulously isolated and conducted in-depth research on particular instances where the initial PDF samples resulted in interesting campaigns. Through the analysis, we aimed to uncover unique insights into the nature and mechanisms of these infections.

Case I. Windows & Android Botnets with a Scent of Espionage

While researching, we stumbled upon a malicious PDF file with a suspicious “military” related name, “ Regarding Invitation to attend defence services Asia 2024 and National Security Asia 2024.pdf ”. The PDF was possibly distributed via a link to download. The campaign’s attack chain is simple, with the PDF downloading and executing a downloader of two executables, which will later on collect and upload various files such as Documents, Images, Archives, and Databases.

python hypothesis settings

Command & Control

The downloader provides no functionality other than downloading and executing the two payloads, and the information sent to the C&C, which registers the bot, only displays the victims that received the following stage payloads.

Figure 5 - Downloader’s Control Panel, Bot information.

Based on the creation date of those “bot-registration” files, we obtained the campaign dates and number of Bots added to the Botnet per day. The primary campaign appears to have occurred on April 5, 2024, which is the day with the most registered bots.

Figure 6 - Registered Bots per Day.

The attack chain and the use of specific tools testify to a campaign focused on espionage, and further findings of android infections using Rafel RAT testify to this assumption even further. Based on the obtained victim data, the Threat Actor has the capability of performing hybrid campaigns, which also resulted in a Two Factor Authentication (2FA) bypass. Check Point Research considers these campaigns to have been performed by the  APT-C-35 / DoNot Team .

Windows Campaign Technical Analysis

The PDF document was still hosted on the C&C, suggesting it could be downloaded using a download link instead of being sent as a file to potential victims.

Figure 7 - Distributing Server Open-directory.

Check Point Research analyzed the specific PDF document and discovered it was built using an open-source  PDF Builder , released on February 13, 2024. The command used once the “exploit” is triggered downloads an executable file from a remote server and executes it.

Machine Information

The executed downloader collects machine Information and writes it into “ %Appdata%/TestLog/$PC_Name.txt ”:

  • Computer name

String decryption

The malware contains strings important to its functionality and is encrypted with a custom algorithm.

Figure 8 - Decryption Algorithm.

Network communication

The downloader has an unusual approach to retrieving the data that will be sent. It enumerates the files inside the folder  %Appdata%/TestLog/  and uploads it to the C&C:  hxxps://mailservicess.com/ghijkl/ghijkl/index.php

After registering the bot to the C&C, it downloads two payloads and stores them as %Appdata%/Intel/index.exe and %Appdata%/Intel/upload.exe . Both are executed with parameters “ pp ” with a “big” time difference between each other.

Persistence

The malware copies itself at  %Appdata%/Intel/Mozila/Systems.exe  and sets a Run registry path “ SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\Run ” named “ TailoredExperiencesWithDiagnosticDataEnabled ” and values the copied path.

Downloaded Payloads

The first payload, “ in.exe ”, stored as “ index.exe ”, does not contain any network-related functionality. It is used for listing files inside the specific root directories  C:\\ ,  D:\\ ,  E:\\ ,  F:\\ ,  G:\\ ,  H:\\ ,  I:\\  and  Z:\\  and copies files with the below extensions to folder  %AppData%/htdocs/ .

A text summary of all copied files will be created at  %AppData%/output.exe .

The second payload, “ up.exe ,” stored as “ upload.exe ,” is executed after some time from the first payload and uses a similar string decryption to the downloader.

The uploader enumerates the files from  %Appdata%/htdocs/  and uploads them to the C&C using the same network communication used for the downloader.

The group has used those two downloaded payloads, but through further research, we discovered another tool that could be dropped depending on the interests of the group. The internal tool names are:

  • indexer , which copies and makes a summary of files of interest.
  • upload , a tool that uploads previous files
  • screen,  a tool making screenshots and saves them to the same folder to be picked up by  upload .

Based on analyzed tools, we believe that further undiscovered tools could exist that serve different needs, such as stealers, which would drop their results into the mentioned folder so the upload tool could send them to the C&C.

Check Point Research also observed evidence of other malware and tooling from directories discovered on the C&C, but we haven’t managed to obtain any samples that could further verify our findings. The folders we discovered were:

  • /AhMyth/ , is an open-source Android RAT
  • /sliver/ , an open-source cross-platform red team framework similar to Cobalt Strike.
  • /Keres/,  is a PowerShell reverse-shell backdoor with persistence for Windows and Linux.

Case II. Chained-Campaign

During this campaign, the multiple links to follow, commands, and files executed in order to result in a stealer and two mines. The initial part of the infection chain was achieved with a malicious PDF document targeting Foxit PDF Reader users. The file’s name is “ swift v2.pdf ”, possibly mainly targeting users from the United States, among other countries.

Figure 9 - Attack Chain.

To this day, the PDF file still has a low detection rate among antivirus solutions, posing an even bigger threat. In one of the campaigns, the Threat Actor distributed it also via Facebook, passing undetected by the Social Media’s malware detectors.

Figure 10 - VirusTotal low detection rate of PDF file.

Campaign Technical Analysis

Analyzing statically, the command triggered is  cmd.exe , and the malicious BAT file is downloaded by executing  curl .

The malicious payload opens the browser on a Facebook page; we are not exactly sure what this action is done for, possibly to distract the user from the malicious activities to be performed or from the empty PDF page. We managed to obtain similar BAT payloads with different legitimate pages opened, such as Amazon. One hypothesis could be that the website opened could indicate the platform where the users were targeted.

The “First payload” will download a second BAT file and store it in the  %Startup%  folder as  WindowsUpdate.bat  to maintain persistence. On reboot, the machine will execute using PowerShell, a Python file.

The “First payload” once dropping the persistence BAT file downloads and “installs” Python 3.9 at  C:\\Users\\Public\\python . At this point, it is even clearer that the final payload will be a Python file, which will be downloaded again using  curl  and then executed.

This Python file is a Loader that executes dynamically downloaded code. The first  exec  call will download an obfuscated Python info stealer and Miner dropper and the second  exec  will execute it. This info stealer targets only Chrome and Edge browsers and steals user’s credentials and cookies. In order to retrieve the actual C&C, the malware makes a GET request and then a POST to  /up/cookie-password-all  with the user’s Personal Identifiable Information (PII).

Figure 11 - Actual C&C.

For “closing”, the malware makes two last GET requests to retrieve the actual URL for the miners to drop. Using PowerShell commands, downloads unzips and executes the miners.

Both of the miners are stored in Gitlab ( @topworld20241 ), and both of the ZIP archives contain the file to be executed  config.vbs  with the instructions and configuration of each miner.

python hypothesis settings

Case III. Python Stealer with Low Detection

Another way of delivering the malicious end file could be more direct, such as downloading the malicious file from DiscordApp and executing it. This was the case with the below PDF infection chain downloading a malicious Python file.

Figure 13 - Blank-Grabber low detection rate

Python files are not the usual suspects, which is testified even by the low detection rate; even more shocking is that this Python stealer is an open-source project called  Blank-Grabber  and not a newly discovered malware.

The PDF executes PowerShell and downloads the malicious file from DiscordApp, resulting in a “legitimate” appearing network traffic. The Python malware is then downloaded as  lol.pyw  and executed on the victim’s machine.

The malware is functional and possesses many features, from a Graphical Builder to UAC Bypass, Anti VM, and stealing capabilities from various browsers and applications.

Figure 14 - Features as listed on the GitHub project.

What is interesting to see is the  class VmProtect , which lists all the anti-VM techniques.

From identifying known:

  • Computer names
  • Users names
  • Registry key & values
  • killing running tasks related to Virtual machines or any other malware-reversing related tools.
  • Making Internet-related checks regarding the emulation of the network and checking if the system is hosted online.

Another interesting part is the function, which results in UAC bypass,  def UACbypass .

Blank-Grabber appears to be a fully functional open-source infostealer, and its low detection rate makes it an even bigger threat for targeted users.

Case IV. From PDF to PDF to … Remcos

Another interesting case occurred when a malicious PDF included a hyperlink to an attachment hosted on  trello.com . Upon downloading, it revealed a secondary PDF file containing malicious code, which takes advantage of this “exploitation” of Foxit Reader users. The attack chain is once again impressive, with multiple files being dropped in order to infect the victim with the final payload. In total, more than 10 files were executed, with the final malware  Remcos  RAT being injected into memory using the  DynnamicWrapperX .

Figure 15 - Attack Chain leading to Remcos RAT

The Threat Actor performing this campaign,  @Silentkillertv , appears to be selling also malicious tools via Telegram. On the 27th of April, the Threat Actor published a PDF Exploit, which targets Foxit PDF Reader and has  “100% Bypass of Anti-Viruses”  as well as is able to bypass  “Gmail, Yahoo, Facebook, and Hotmail file sharing restrictions” .

Figure 16 - Telegram Channel.

The first PDF file contained a malicious hyperlink that downloaded a PDF file named “ Facebook_Adversting_project.pdf ”.

Figure 17 - First PDF file.

Once clicking the link, the victim receives the second PDF file, which is hosted on trello.com a legitimate website. Similar to Discord, Threat Actors have been taking advantage of legitimate websites in order to host and distribute malicious files.

Figure 18 - The second PDF hosted on trello.com.

The file was uploaded on the 27th of April by “ Bechtelar Libby @bechtelarlibby ”.

Figure 19 -

The user’s initial activity seems to date back to March 1st, 2024. Judging by the file and folder names generated by the suspicious account, it appears that the targeted countries included Vietnam and Korea, among others.

Figure 20 - Activity traced back to the 1st of March.

In the second PDF, the exploitation targeting Foxit users is executed through the following PowerShell command:

At this point, multiple “links”/files need to be followed in order to retrieve the final payload. The first payload is downloaded as  NCGHDFHGTDFJMDFGKJHFTYFUKFYU.LNK  and is a  .lnk . This file downloads using curl a  .hta  file from a remote server and executes it from this location  %AppData%\STARTGOVFFGHJFKJHFTFDGHJF.HTA .

The HTA file initiates two requests to the identical server, fetching two files. One is a VBScript file, while the other is a genuine image, utilized as a decoy. Notably, this HTA file contained comments written in Arabic.

The third payload is stored as  %temp%\FGHJFTFDHBJVJHGVHJKFVJGTFKHFJH.VBS  and is executed before the genuine image. This VBScript is straightforward, downloading additional VBScript code and executing the “response” accordingly.

The VBScript code executes the following command:

The fifth payload is yet another  .hta  file that communicates with the same endpoint. It downloads and executes another VBScript file (sixth), which in turn downloads yet another VBScript file (seventh).

At this stage, the attack chain employed two PDF files employing distinct methods of “exploitation” and entailed seven requests and executions of scripting language files. The seventh payload (VBS) contains embedded Base64 strings.

  • DynamicWrapperX  Loader, dynwrapx.dll (stored as  AUTOUPDATESTART.dll )
  • Shellcode, to be injected into the Loader

Once the injection process is completed, it proceeds to load and execute the Shellcode, which subsequently decrypts the malicious executable. The infection ultimately manifests as Remcos RAT with the command and control server located at  139.99.85[.]106:2404 , operating under the botnet name “ Telegram : @Silentkillertv ”. Another instance of Remcos, identified by the hash  2266f701f749d4f393b8a123bd7208ec7d5b18bbd22eb47853b906686327ad59 , also utilizes the same command and control server. However, in this case, the botnet name was “ RemoteHost ”.

Check Point managed to uncover various “online-fingerprints,” ranging from YouTube and TikTok accounts to Telegram accounts and channels established by the actor. These platforms were utilized to disseminate malicious tools and resources.

Figure 21 - <code>@Silentkillertv</code> YouTube Chanel.

Telegram Message from the Threat Actor:

After comprehending the exploit and identifying its key components, we initiated a hunt for additional malicious samples. Among the pool of collected files, we discovered several .NET and Python files that triggered our detection rule. Upon closer examination, we determined that these files were, in fact, the builders responsible for generating malicious samples.

Regardless of the programming language, all builders exhibit a consistent structure. The PDF template utilized for the exploit includes placeholder text, which is intended to be substituted once the user provides input for the URL from which to download the malicious file.

Python Builders

Check Point obtained two Python Builders, which were developed by the same author as they feature identical Python code. The only variation lies in the PowerShell command embedded in the PDF exploit template.

python hypothesis settings

Figure 23 – Python Builder Source Code.

The command employed in this builder initially utilizes CMD, which then triggers PowerShell.

python hypothesis settings

In the other Python builder, instead of dropping the payload as a  .vbs  file, it is dropped as a  .exe  file.

Once the actor has successfully built the PDF exploit, the final message is written in Portuguese:  “Payload generated successfully .”

.NET Builders

For .NET, we obtained multiple Builders “Avict Softwares I Exploit PDF”, “PDF Exploit Builder 2023”, and “FuckCrypt.” All of those three builders have similar code, and we wouldn’t be surprised if actors stole each other’s code and made their own builders.

python hypothesis settings

“FuckCrypt” comprises two functionalities: one is “Exe to VBS,” and the other is the PDF exploit.

All the builders have the “same” commands and flow. The only thing different between them is the filenames. Below is their generic command with $+STRING, which shows the differences between them.

The Python builders share similar names with the “PDF Exploit Builder” (supporting only EXE), implying either they were developed by the same individual or that one of the builders was “copied” and developed to another language. A scenario where the code was stolen from .NET and rewritten Python seems more plausible. The similarity in names between “Avict Software” (which supports only EXE) and “FuckCrypt” (VBS) indicates a similar situation of potential code stealing between developers or the same author, as seen in the previous scenario.

Builders Statistics

From the observed filenames in the commands, it appears that the most frequently used builder is the “PDF Exploit Builder” & Python variants. There’s also the possibility that manual commands were added or that additional builders exist beyond those obtained.

Figure 23 - Most Used Builders based on PDF Commands analysis.

Except from the observed Builders, we also discovered a  GitHub  project created on February 13 providing another .NET builder with exactly the same “exploit” commands as the previously mentioned. This same Builder is used by the APT group  APT-C-35 / DoNot Team.

Figure 25 - GitHub Builder

While this “exploit” doesn’t fit the classical definition of triggering malicious activities, it could be more accurately categorized as a form of “phishing” or manipulation aimed at Foxit PDF Reader users, coaxing them into habitually clicking “OK” without understanding the potential risks involved. Threat Actors vary from E-crime to APT groups, with the underground ecosystem taking advantage of this “exploit” for years, as it had been “rolling undetected” as most AV & Sandboxes utilize the major player in PDF Readers, Adobe. The infection success and the low detection rate allow PDFs to be distributed via many untraditional ways, such as Facebook, without being stopped by any detection rules. Check Point reported the issue to Foxit Reader, which acknowledged it and stated that it would be resolved in version 2024 3.

Recommendations

Until the software update is applied, Foxit users are advised to remain vigilant about potential exploitation and adhere to classic defense practices. To mitigate the risks of being affected by such threats, it is essential to:

  • Keep operating systems and applications updated through timely patches and other means.
  • Be cautious of unexpected emails with links, especially from unknown senders.
  • Enhance cybersecurity awareness among employees.
  • Consult security specialists for any doubts or uncertainties.

Check Point Threat Emulation, Harmony Endpoint, and Harmony Mobile Protect provide comprehensive coverage of attack tactics, file types, and operating systems and protect its customers against the type of attacks and the “exploit” described in this report.

  • Exploit.Wins.FoxitExploit.ta.A

Malicious Files:

POPULAR POSTS

python hypothesis settings

  • Artificial Intelligence
  • Check Point Research Publications

python hypothesis settings

  • Threat Research

python hypothesis settings

BLOGS AND PUBLICATIONS

python hypothesis settings

  • Global Cyber Attack Reports

“The Turkish Rat” Evolved Adwind in a Massive Ongoing Phishing Campaign

python hypothesis settings

“The Next WannaCry” Vulnerability is Here

python hypothesis settings

‘RubyMiner’ Cryptominer Affects 30% of WW Networks

python hypothesis settings

SUBSCRIBE TO CYBER INTELLIGENCE REPORTS

Country —Please choose an option— China India United States Indonesia Brazil Pakistan Nigeria Bangladesh Russia Japan Mexico Philippines Vietnam Ethiopia Egypt Germany Iran Turkey Democratic Republic of the Congo Thailand France United Kingdom Italy Burma South Africa South Korea Colombia Spain Ukraine Tanzania Kenya Argentina Algeria Poland Sudan Uganda Canada Iraq Morocco Peru Uzbekistan Saudi Arabia Malaysia Venezuela Nepal Afghanistan Yemen North Korea Ghana Mozambique Taiwan Australia Ivory Coast Syria Madagascar Angola Cameroon Sri Lanka Romania Burkina Faso Niger Kazakhstan Netherlands Chile Malawi Ecuador Guatemala Mali Cambodia Senegal Zambia Zimbabwe Chad South Sudan Belgium Cuba Tunisia Guinea Greece Portugal Rwanda Czech Republic Somalia Haiti Benin Burundi Bolivia Hungary Sweden Belarus Dominican Republic Azerbaijan Honduras Austria United Arab Emirates Israel Switzerland Tajikistan Bulgaria Hong Kong (China) Serbia Papua New Guinea Paraguay Laos Jordan El Salvador Eritrea Libya Togo Sierra Leone Nicaragua Kyrgyzstan Denmark Finland Slovakia Singapore Turkmenistan Norway Lebanon Costa Rica Central African Republic Ireland Georgia New Zealand Republic of the Congo Palestine Liberia Croatia Oman Bosnia and Herzegovina Puerto Rico Kuwait Moldov Mauritania Panama Uruguay Armenia Lithuania Albania Mongolia Jamaica Namibia Lesotho Qatar Macedonia Slovenia Botswana Latvia Gambia Kosovo Guinea-Bissau Gabon Equatorial Guinea Trinidad and Tobago Estonia Mauritius Swaziland Bahrain Timor-Leste Djibouti Cyprus Fiji Reunion (France) Guyana Comoros Bhutan Montenegro Macau (China) Solomon Islands Western Sahara Luxembourg Suriname Cape Verde Malta Guadeloupe (France) Martinique (France) Brunei Bahamas Iceland Maldives Belize Barbados French Polynesia (France) Vanuatu New Caledonia (France) French Guiana (France) Mayotte (France) Samoa Sao Tom and Principe Saint Lucia Guam (USA) Curacao (Netherlands) Saint Vincent and the Grenadines Kiribati United States Virgin Islands (USA) Grenada Tonga Aruba (Netherlands) Federated States of Micronesia Jersey (UK) Seychelles Antigua and Barbuda Isle of Man (UK) Andorra Dominica Bermuda (UK) Guernsey (UK) Greenland (Denmark) Marshall Islands American Samoa (USA) Cayman Islands (UK) Saint Kitts and Nevis Northern Mariana Islands (USA) Faroe Islands (Denmark) Sint Maarten (Netherlands) Saint Martin (France) Liechtenstein Monaco San Marino Turks and Caicos Islands (UK) Gibraltar (UK) British Virgin Islands (UK) Aland Islands (Finland) Caribbean Netherlands (Netherlands) Palau Cook Islands (NZ) Anguilla (UK) Wallis and Futuna (France) Tuvalu Nauru Saint Barthelemy (France) Saint Pierre and Miquelon (France) Montserrat (UK) Saint Helena, Ascension and Tristan da Cunha (UK) Svalbard and Jan Mayen (Norway) Falkland Islands (UK) Norfolk Island (Australia) Christmas Island (Australia) Niue (NZ) Tokelau (NZ) Vatican City Cocos (Keeling) Islands (Australia) Pitcairn Islands (UK)

We value your privacy!

BFSI uses cookies on this site. We use cookies to enable faster and easier experience for you. By continuing to visit this website you agree to our use of cookies.

IMAGES

  1. how to find hypothesis in python for beginners #2

    python hypothesis settings

  2. A Complete Guide to Hypothesis Testing in Python

    python hypothesis settings

  3. hypothesis: Property-based Testing in Python

    python hypothesis settings

  4. 5-minute intro to property-based testing in Python with hypothesis

    python hypothesis settings

  5. Hypothesis Testing in Python: Specifying the Decision Rule

    python hypothesis settings

  6. A Complete Guide to Hypothesis Testing in Python

    python hypothesis settings

VIDEO

  1. Test of Hypothesis using Python

  2. Week 12: Lecture 60

  3. What Is A Hypothesis?

  4. Data Analyst with Python-Hypothesis Testing with Men's and Women's Soccer Matches Project

  5. Assignment 4|Statistical Analysis in Python

  6. NPTEL NOC24-CS20

COMMENTS

  1. Settings

    If True, seed Hypothesis' random number generator using a hash of the test function, so that every run will test the same set of examples until you update Hypothesis, Python, or the test function. This allows you to check for regressions and look for bugs using separate settings profiles - for example running quick deterministic tests on ...

  2. python

    Configuring all tests. To change default behavior for all tests somewhere during bootstrap of your test run (e.g. for pytest it can be conftest.py module) we can define a custom hypothesis settings profile and then use it during tests invocation like. from hypothesis import settings. settings.register_profile('my-profile-name', max_examples=10)

  3. Hypothesis Testing with Python: Step by step hands-on tutorial with

    It tests the null hypothesis that the population variances are equal (called homogeneity of variance or homoscedasticity). Suppose the resulting p-value of Levene's test is less than the significance level (typically 0.05).In that case, the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances.

  4. Automating Unit Tests in Python with Hypothesis

    Using Hypothesis settings for property-based testing of Python code Upping your game: Using composite strategies. So far, the examples I've used are simple. Hypothesis can handle much more complex test cases using composite strategies, which, as the name suggests, allows you to combine strategies to generate testing examples.

  5. Hypothesis Testing with Python

    Hypothesis testing is used to address questions about a population based on a subset from that population. For example, A/B testing is a framework for learning about consumer behavior based on a small sample of consumers. This course assumes some preexisting knowledge of Python, including the NumPy and pandas libraries.

  6. Getting Started With Property-Based Testing in Python With Hypothesis

    We can write a simple property-based test for this, leveraging the fact that Hypothesis generates dozens of tests for us. Save this in a Python file: from hypothesis import given, strategies as st. @given(st.integers()) def test_int_str_roundtripping(x): assert x == int(str(x)) Now, run this file with pytest.

  7. Property-Based Testing In Python: Hypothesis is AWESOME

    Hypothesis is an awesome Python package for property-based testing. In this video I talk about what property-based testing is, and show you a practical examp...

  8. How to Use Hypothesis and Pytest for Robust Property-Based Testing in

    To use Hypothesis in this example, we import the given, strategies and assume in-built methods.. The @given decorator is placed just before each test followed by a strategy.. A strategy is specified using the strategy.X method which can be st.list(), st.integers(), st.text() and so on.. Here's a comprehensive list of strategies.. Strategies are used to generate test data and can be heavily ...

  9. How to Perform Hypothesis Testing in Python (With Examples)

    Example 1: One Sample t-test in Python. A one sample t-test is used to test whether or not the mean of a population is equal to some value. For example, suppose we want to know whether or not the mean weight of a certain species of some turtle is equal to 310 pounds. To test this, we go out and collect a simple random sample of turtles with the ...

  10. An Interactive Guide to Hypothesis Testing in Python

    In this article, we interactively explore and visualize the difference between three common statistical tests: t-test, ANOVA test and Chi-Squared test. We also use examples to walk through essential steps in hypothesis testing: 1. define the null and alternative hypothesis. 2. choose the appropriate test.

  11. Demystifying hypothesis testing with simple Python examples

    The basis of hypothesis testing has two attributes: (a) Null Hypothesis and (b) Alternative Hypothesis. The null hypothesis is, in general, the boring stuff i.e. it assumes that nothing interesting happens/happened.. The alternative hypothesis is, where the action is i.e. some observation/ phenomenon is real (i.e. not a fluke) and statistical analysis will give us more insights on that.

  12. 17 Statistical Hypothesis Tests in Python (Cheat Sheet)

    In this post, you will discover a cheat sheet for the most popular statistical hypothesis tests for a machine learning project with examples using the Python API. Each statistical test is presented in a consistent way, including: The name of the test. What the test is checking. The key assumptions of the test. How the test result is interpreted.

  13. A Guide on How to Simulate and Visualize Hypothesis Tests Using Python

    The Hypothesis The hypothesis forms the base of our experiment based on which the rest of the steps are carried out. Here, we define the null and alternate hypotheses for our experiment.

  14. What Is Hypothesis Testing? Types and Python Code Example

    A null hypothesis is one's default belief or argument about a subject matter. In the case of the earth's shape, the null hypothesis was that the earth was flat. ... Numpy is a Python library used for scientific computing. It has a large library of functions for working with arrays. Scipy is a library for mathematical functions.

  15. Hypothesis

    Hypothesis. Hypothesis is the Python library for property-based testing.Hypothesis can infer how to construct type-annotated classes, and supports builtin types, many standard library types, and generic types from the typing and typing_extensions modules by default. Pydantic v2.0 drops built-in support for Hypothesis and no more ships with the integrated Hypothesis plugin.

  16. Quick start guide

    A detail: This works because Hypothesis ignores any arguments it hasn't been told to provide (positional arguments start from the right), so the self argument to the test is simply ignored and works as normal. This also means that Hypothesis will play nicely with other ways of parameterizing tests. e.g it works fine if you use pytest fixtures ...

  17. Foxit PDF "Flawed Design" Exploitation

    The "Flawed Design". Check Point Research discovered that samples from EXPMON produced unusual behavior when executed with Foxit Reader compared to Adobe Reader. The exploitation of victims occurs through a flawed design in Foxit Reader, which shows as a default option the "OK," which could lead the majority of the targets to ignore ...

  18. Some more examples

    This is just insertion sort slightly modified - we swap a node backwards until swapping it further would violate the order constraints. The reason this works is because our order is a partial order already (this wouldn't produce a valid result for a general topological sorting - you need the transitivity).

  19. Hypothesis Testing Explained (How I Wish It Was Explained to Me)

    I first learned about hypothesis testing in the first year of my Bachelor's in Statistics. Ever since I've always felt that I was missing something about it.. What particularly bothered me was the presence of elements that seemed quite arbitrary, like those "magic numbers" such as 80% Power or 97.5% Confidence.. So I recently tried to make a deep dive into the topic and, at some point ...