Proxies are essential for a lot of online tasks, but when you first start learning about them, the topic can feel overwhelming. In this simple guide, you will learn everything you need to know, from start to finish, to feel comfortable with proxies.
What are proxies and what are they used for?
Every time you connect to the internet, your connection is assigned an Internet Protocol (IP) address. Without an address, no one would know where to send the stuff you request online. Unlike a street address, though, your IP address can change depending on your internet connection.
A proxy is a server that acts as a middleman of sorts: your web requests pass through it on their way to their destination. Because those requests can carry critical details such as credit card information that you do not want to expose, it isn't a good idea to route sensitive information through something like a free proxy. That doesn't mean you should avoid proxies altogether. In certain cases you are better off using a proxy server, especially one you have set up yourself, and there are many use cases for them.
Using a proxy or VPN doesn't change your actual IP address. Instead, you use that middleman to show the world a different IP address.
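To make the middleman idea concrete, here is a minimal sketch using Python's standard urllib module. The proxy address is a hypothetical placeholder from the 203.0.113.0/24 documentation range, so substitute a proxy you actually have access to. Requests sent through the returned opener go to the proxy first, and the destination sees the proxy's IP rather than yours.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical proxy address (documentation range); replace with your own.
PROXY = "http://203.0.113.10:8080"
opener = build_proxy_opener(PROXY)

# Uncomment to fetch a page through the proxy (requires a working proxy):
# with opener.open("http://example.com/", timeout=10) as resp:
#     print(resp.status)
```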
Simply put, a proxy server lets you use an internet connection from a different location. Proxies work much like VPNs and can let you access content that might otherwise be hidden from you. Different types of proxies specialize in different ways, as we will see below.
One of the most common use cases is protecting your privacy while holding on to an IP address that belongs to you for a while. This helps a lot when you deal with things such as social media platforms, which may be blocked in certain countries and which don't like it when your IP address keeps changing, as it does with many free VPNs.
A few more popular use cases
Web scraping: If you've ever worked on a project that involved scraping the web, you might have noticed that sometimes there just isn't a way around the IP-based blocks imposed for requesting too much information. Using proxies, especially the high-quality ones we will discuss in a bit, lets you work around those limits. Adding even one extra proxy to scrape from can roughly double your throughput, since each IP address gets its own share of the rate limit.
Monitoring usage: Big companies and universities often use proxies to monitor network traffic and block certain sites. There could be video-sharing or entertainment websites that your employer doesn't want you visiting during work hours. The proxy may simply log when you access such a site without blocking it, or it may block the site and notify you.
Security: Depending on the type of proxy you use, and provided you set everything up correctly, you can create a proxy server that only accepts and processes encrypted data. All of this has to be set up, though, and you shouldn't confuse proxies built specifically for high security with those meant only for scraping data.
Managing Social Media: If you run a social media marketing (SMM) agency or something similar, quality proxies are crucial. They let every social media profile your agency manages operate from its own IP address, preventing the security triggers caused by multiple accounts sharing the same IP.
Competitive Analysis: Advertising is a very competitive industry, especially in some niches, so using proxies for competitive analysis is pretty common. Many solo advertisers and marketing agencies run multiple profiles, each assigned to a different proxy, to gather data on the competition.
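For the web scraping case, the simplest way to spread requests over several IPs is round-robin rotation. The sketch below is an illustration under assumptions: the proxy addresses and URLs are hypothetical placeholders, and the actual download step is left to whichever HTTP client you use. With two proxies, each IP ends up handling half of the requests, which is where the rough doubling mentioned above comes from.

```python
from itertools import cycle
from typing import Iterable, Iterator, List, Tuple


def assign_proxies(urls: Iterable[str], proxies: List[str]) -> Iterator[Tuple[str, str]]:
    """Pair each URL with the next proxy in round-robin order."""
    if not proxies:
        raise ValueError("need at least one proxy")
    pool = cycle(proxies)  # endlessly repeats the proxy list in order
    for url in urls:
        yield url, next(pool)


# Hypothetical proxies and URLs, for illustration only.
proxies = ["http://203.0.113.10:8080", "http://203.0.113.20:8080"]
urls = [f"https://example.com/page/{i}" for i in range(4)]

for url, proxy in assign_proxies(urls, proxies):
    print(proxy, "->", url)  # fetch `url` through `proxy` with your HTTP client here
```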
Types of proxies
The two major categories of proxies are free and paid; every other type falls under one of these. For the most part you will want to work with paid proxies, for the reasons mentioned below, but there's nothing stopping you from experimenting with free ones.
Free proxies
Any proxy that you can find free of charge falls into this category. Free proxies can be of any quality and any type, and you never know what you're getting unless you test it. Once you start scraping and working with free proxies, you will notice over time that there are thousands of them available for you to use.
If you're using them for something like web scraping, go ahead. But if you plan to use them for anything that requires you to reveal personal information, you could be in trouble.
If something is free, there is almost always some sort of catch. Here, because the proxy server is the middleman, it can not only steal your credentials but also replace ads or inject extra ads and malicious code into the pages you load, at least over unencrypted connections.
That being said, free proxies are still worth trying out, especially if you work with tools that don't require you to load scripts or enter any sort of private data. If you filter them properly, you might find that free proxies work for a lot of things you previously thought impossible. I've run many throwaway social media accounts using just free proxies, and there are probably many people who use them en masse.
Free proxies have some drawbacks, but since the investment cost is literally zero, finding a few quality free proxies can make things very interesting. Just keep in mind that since they are free, you won't be the only one using them.
Paid proxies
Any proxy that has to be purchased and is off-limits to proxy scrapers falls into this category. Some services scrape free proxies and sell them, but since anyone can find those same proxies for free, they aren't really paid proxies. We generally consider paid proxies better than free ones because they are consistent, and you don't have to search for good proxies like looking for a needle in a haystack, as you do with free ones. Still, free proxies that you filter for quality can sometimes be better, especially compared with cheap data center proxies.
Where paid proxies really shine is with the other proxy types, especially residential and 4G mobile. Their main strength is consistency: as long as you renew your subscription, you keep the same IP address unless you ask for a change or choose a service that changes it frequently, such as rotating proxies.
When you purchase proxies, you generally have two options: shared or dedicated. Shared proxies are shared among 3-6 people, while dedicated proxies are, well, dedicated to you.
So you might wonder: what's even the point of shared proxies? The answer is simple. For some tasks you don't really mind sharing your IP with a few other people, and in most cases shared proxies cost less than half as much as their dedicated counterparts.
HTTP/HTTPS and SOCKS 4/5
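Any proxy that uses the HTTP protocol falls under the HTTP/HTTPS category. Because an HTTP proxy reads the requests it forwards, unencrypted traffic passing through it can be recorded; even if the chance is small, it exists, especially with public free proxies.
SOCKS proxies, however, don't parse the data they carry. After a short handshake that names the destination, they simply relay bytes back and forth, so they don't interpret or rewrite your requests the way an HTTP proxy can (unencrypted bytes still pass through them, so use HTTPS for anything sensitive). They are also usually faster than their HTTP/HTTPS counterparts. There is a lot more to say about these two protocols, but that's outside the scope of this guide.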
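The difference between the two protocols is easiest to see in the bytes a client sends to each kind of proxy. The sketch below only builds those messages, without opening any sockets, and covers SOCKS5 (RFC 1928) rather than SOCKS4. An HTTP proxy receives the full request line, while a SOCKS5 proxy only learns the destination host and port before it starts relaying bytes.

```python
import struct
from urllib.parse import urlsplit


def http_proxy_request(url: str) -> bytes:
    """Plain-HTTP request as sent to an HTTP proxy: the full URL is in the
    request line, so the proxy sees (and could log or alter) the request."""
    host = urlsplit(url).hostname
    return f"GET {url} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


def http_connect_request(host: str, port: int) -> bytes:
    """HTTPS through an HTTP proxy: CONNECT opens a tunnel, and the TLS
    traffic that follows it is encrypted end to end."""
    return f"CONNECT {host}:{port} HTTP/1.1\r\nHost: {host}:{port}\r\n\r\n".encode("ascii")


def socks5_greeting() -> bytes:
    """SOCKS5 greeting: version 5, one auth method offered, 0x00 = no auth."""
    return b"\x05\x01\x00"


def socks5_connect(host: str, port: int) -> bytes:
    """SOCKS5 CONNECT by domain name: VER=5, CMD=1 (connect), RSV=0,
    ATYP=3 (domain), name length, name, 2-byte big-endian port."""
    name = host.encode("idna")
    return struct.pack("!BBBBB", 5, 1, 0, 3, len(name)) + name + struct.pack("!H", port)


print(http_proxy_request("http://example.com/"))
print(http_connect_request("example.com", 443))
print(socks5_greeting() + socks5_connect("example.com", 443))
```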
Transparent proxies
Whenever you visit a restaurant or hotel and connect to its public Wi-Fi for the first time, you are greeted with a page asking you to register and/or accept some terms of service. Once you do, your traffic goes through a transparent proxy, which routes it to the actual destination. The destination can tell that you are behind a proxy.
As the name suggests, transparent proxies are transparent: you don't configure anything on your device, the proxy doesn't hide itself, and it can block access to certain sites based on the rules configured on the server.
Another reason for running transparent proxy servers is caching. Let's say a hundred people on your network all route their traffic through a transparent proxy server. The server can store the pages those users browse and serve repeat requests straight from its cache, without fetching them from the destination again.
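Here is a toy model of that caching behaviour, as a sketch: the fetch function is a stand-in for contacting the real site, and the fixed time-to-live (TTL) is an assumption made for simplicity. Real caching proxies such as Squid follow HTTP caching headers instead.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Serve repeated URLs from memory until their cached copy expires."""

    def __init__(self, fetch: Callable[[str], bytes], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch = fetch    # function that contacts the real destination
        self.ttl = ttl        # seconds a cached response stays fresh (assumed)
        self.clock = clock
        self.cache: Dict[str, Tuple[float, bytes]] = {}
        self.upstream_requests = 0

    def get(self, url: str) -> bytes:
        now = self.clock()
        hit = self.cache.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]     # cache hit: no request to the destination
        self.upstream_requests += 1
        body = self.fetch(url)  # cache miss or expired entry
        self.cache[url] = (now, body)
        return body


# Stand-in fetch function so the sketch runs offline.
proxy = CachingProxy(fetch=lambda url: f"content of {url}".encode())
for _ in range(100):          # a hundred users requesting the same page
    proxy.get("http://example.com/")
print(proxy.upstream_requests)  # 1
```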
When it comes to proxies for scraping or automation though, transparent proxies almost always belong to the free camp and are considered low quality.
Anonymous proxies
Anonymous proxies can be either free or paid, and this is the type of proxy considered top quality. The user's original IP can't be traced because the proxy server sends no information about who originally issued the request. One of the few ways most services even try to detect these proxies is by checking publicly available blacklists of free proxies.
This is one reason you can sometimes find gold nuggets among the proxies you scrape for free. Still, since these proxies are publicly available, they get blacklisted quickly.
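Proxy checkers typically tell transparent and anonymous proxies apart by looking at the headers the destination actually receives. Here is a minimal sketch of that classification; the header list is an assumption and not exhaustive, and the three levels mirror the Transparent/Anonymous/High labels that ProxyBroker uses.

```python
import re
from typing import Mapping

# Headers that proxies commonly add; an assumed, non-exhaustive list.
PROXY_HEADERS = ("via", "x-forwarded-for", "forwarded", "x-real-ip", "proxy-connection")


def mentions_ip(value: str, ip: str) -> bool:
    """True if `ip` appears as a whole token in a header value."""
    tokens = (token.strip('"[]') for token in re.split(r"[,\s;=]+", value))
    return ip in tokens


def anonymity_level(headers: Mapping[str, str], real_ip: str) -> str:
    """Classify a proxy from the request headers the destination received."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if any(mentions_ip(value, real_ip) for value in lowered.values()):
        return "transparent"  # your real IP was passed along
    if any(name in lowered for name in PROXY_HEADERS):
        return "anonymous"    # IP hidden, but the proxy announces itself
    return "high"             # no trace of you or of the proxy


REAL_IP = "198.51.100.7"  # hypothetical client address
print(anonymity_level({"X-Forwarded-For": REAL_IP, "Via": "1.1 proxy"}, REAL_IP))
print(anonymity_level({"Via": "1.1 proxy"}, REAL_IP))
print(anonymity_level({"User-Agent": "Mozilla/5.0"}, REAL_IP))
```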
If you run a service and are having trouble with users connecting through anonymous proxies, and your blacklist just doesn't seem to work, a good idea is to build your own auto-updating blacklists.
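Here is one possible shape for such a blacklist, as a sketch: it merges several sources, skips malformed entries, re-fetches everything once the lists are older than max_age, and keeps the IPs you report yourself across refreshes. The source functions are hypothetical stand-ins for downloading public lists or reading your own logs.

```python
import ipaddress
import time
from typing import Callable, Iterable, List, Optional, Set, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class AutoBlacklist:
    """Blacklist built from several sources that re-fetches itself when stale."""

    def __init__(self, sources: List[Callable[[], Iterable[str]]],
                 max_age: float = 3600.0, clock: Callable[[], float] = time.monotonic):
        self.sources = sources   # each source yields IPs or CIDR ranges as strings
        self.max_age = max_age   # seconds before the lists are re-fetched
        self.clock = clock
        self.networks: Set[Network] = set()
        self.reported: Set[Network] = set()  # your own findings; survive refreshes
        self.loaded_at: Optional[float] = None

    def refresh(self) -> None:
        networks: Set[Network] = set()
        for source in self.sources:
            for entry in source():
                try:
                    networks.add(ipaddress.ip_network(entry.strip(), strict=False))
                except ValueError:
                    continue     # skip malformed entries instead of failing
        self.networks = networks
        self.loaded_at = self.clock()

    def report(self, ip: str) -> None:
        self.reported.add(ipaddress.ip_network(ip))

    def is_blocked(self, ip: str) -> bool:
        if self.loaded_at is None or self.clock() - self.loaded_at >= self.max_age:
            self.refresh()
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in self.networks | self.reported)


# Hypothetical sources: a public list with one bad line, and your own list.
bl = AutoBlacklist([lambda: ["203.0.113.0/24", "not-an-ip"],
                    lambda: ["198.51.100.7"]])
print(bl.is_blocked("203.0.113.55"))  # True
```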
Data center proxies
The majority of the proxies you can find online within a few searches for a term such as "buy proxies" will be data center proxies. They are very cheap to create and to buy, which can be a good thing or a terrible thing depending on your objectives. In recent years, however, data center proxies have gained a poor reputation because they are so easy to blacklist en masse. Another drawback is that, because they sit close together on the same subnets, linking the proxies to one another becomes a lot easier.
One of the good things about this type of proxy is that it is usually fast and doesn't have usage limits.
They are also much cheaper than a VPN: you can grab one of these proxies for about a buck, or a shared proxy for a few cents, and easily browse from a completely different part of the world.
Residential proxies
When it comes to block rates, residential proxies have the lowest. This is because a residential proxy isn't tied to a data center in any way; the meaning of this proxy type is in its name.
We call any IP address issued by an ISP to a home connection a residential IP. Various companies employ a variety of techniques to bring you these proxies. Sometimes a home user leases out their IP address, or it could be part of some free software that, instead of asking for payment, uses your IP address as a proxy. Some companies acquire them in shady ways too, but that's a whole other topic.
Why choose residential over datacenter?
As mentioned earlier, this proxy type has the lowest block rates. In some cases you simply cannot use data center proxies because many platforms blacklist them completely. Do blacklists exist for residential proxies? Sure, but residential IPs are much more difficult to blacklist.
Mobile 3G/4G LTE
Mobile proxies are currently considered the crème de la crème of all proxy types, which is why they are more expensive than any other type. The key reason is that social media companies, and practically any platform with protection against proxy usage, trust mobile IP connections far more than others.
There are many reasons for this, but the biggest is that the majority of the world now connects to the internet from a mobile device over a 3G or 4G/LTE connection, so blocking mobile IPs risks blocking large numbers of real users. Because data centers cannot reproduce a 3G or 4G/LTE connection, mobile proxies have to be set up on real mobile connections.
If you have a 3G/4G LTE connection on your phone, you too can set up a mobile proxy of your own; we will cover how later in this guide. When you use a mobile proxy, your IP address largely stops being a way to detect automation. A single connection of this type usually gives you tons of IP addresses, since you are typically assigned a new one each time you reconnect to your carrier's network. This means that even if your particular mobile IP gets blacklisted, you can fetch a new one within seconds.
Scraping free proxies
There are two ways to get free proxies: use websites that already scrape proxies for you, or use libraries that help you scrape your own.
Using proxy list websites
Here are a few free proxy lists you can use to grab proxies right now.
- openproxy.space – This site has a great UI and is very easy to navigate. You can find various types of proxies neatly divided into categories. They update their lists daily and offer a large variety.
- spys.one – This site has been popular for years because it has been so consistent. They have proxies for almost every country and update their lists daily, with an average collection of over 20,000 proxies.
- proxy-list.download – With an amazing UI and a good sorting feature, this site is a pleasure to look at. The lists are decent in size, and it's one of the best proxy list sites out there.
Many of these proxy list sites say their proxies have already been checked and verified to work. Still, you never know with free proxies, so always run them through a checker and keep only the proxies that work and are sufficiently fast. You can run a check either programmatically or with websites such as checkerproxy.
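If you prefer to check proxies yourself, a minimal sketch with Python's standard library could look like this. The test URL, timeout and latency threshold are assumptions you should tune; filter_proxies runs the checks in parallel threads and returns the working proxies, fastest first.

```python
import http.client
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

TEST_URL = "http://example.com/"  # assumption: any stable page you may request


def measure(proxy: str, timeout: float = 5.0) -> Optional[float]:
    """Return how many seconds a request through `proxy` took, or None on failure."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    start = time.monotonic()
    try:
        with opener.open(TEST_URL, timeout=timeout) as resp:
            resp.read(1024)
    except (OSError, ValueError, http.client.HTTPException):
        return None  # dead, refused, timed out, or a malformed proxy address
    return time.monotonic() - start


def filter_proxies(proxies: List[str], max_latency: float = 2.0,
                   check: Callable[[str], Optional[float]] = measure,
                   workers: int = 20) -> List[str]:
    """Keep only proxies that respond within max_latency seconds, fastest first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(zip(proxies, pool.map(check, proxies)))
    good = [(latency, p) for p, latency in results
            if latency is not None and latency <= max_latency]
    return [p for _, p in sorted(good)]
```

For example, filter_proxies(["http://203.0.113.10:8080", "http://203.0.113.20:3128"]) would return only those (hypothetical) addresses that answered within two seconds.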
Programmatically using multiple sources
You can use any language to get this done; for the purpose of this post, I'll be using Python along with ProxyBroker.
- Python 3.5 or higher
Install the latest release by using:
pip install proxybroker
To find 50 working HTTP/HTTPS proxies run this script:
```python
import asyncio
from proxybroker import Broker


async def show(proxies):
    # Print each proxy as the broker finds it; None signals the search is done.
    while True:
        proxy = await proxies.get()
        if proxy is None:
            break
        print(proxy)


proxies = asyncio.Queue()
broker = Broker(proxies)
# Run the search and the printer concurrently until 50 proxies are found.
tasks = asyncio.gather(
    broker.find(types=['HTTP', 'HTTPS'], limit=50),
    show(proxies))

loop = asyncio.get_event_loop()
loop.run_until_complete(tasks)
```
You can find more examples of using this library at proxybroker.readthedocs.io. There are some neat things you can do, such as specifying the speed or the country you want your proxies from.
Setting up your own mobile proxy
For this to work, you will need at least one internet connection besides your primary one, preferably a 3G/4G LTE connection, and access to Windows so you can quickly test this out for yourself. If you use Linux, have a look at Squid; phoenixnap has an excellent tutorial on it.
CCProxy is an easy-to-use and powerful proxy server. It supports broadband, DSL, dial-up, optical fiber, satellite, ISDN and DDN connections, and it helps you build your own proxy server and share an Internet connection within a LAN efficiently and easily.
We will be using CCProxy on Windows to set up our first mobile proxy.
Begin by tethering your phone's 3G/4G LTE connection to your PC. Next, we need to figure out the local IP of the mobile connection. Usually this will be something like 192.168.1.100, but just to be sure, open Command Prompt, enter ipconfig /all, and look for the IPv4 address listed under the adapter for your tethered phone.
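If you'd rather not read through the whole ipconfig /all listing by eye, a short Python sketch can pull out the IPv4 address of every adapter. It assumes English-language Windows output (labels such as "IPv4 Address" differ on other system languages), and the name of your tethered adapter will vary by phone and driver.

```python
import re
import subprocess
import sys
from typing import Dict, List, Optional

# Adapter headings start at column 0, e.g. "Ethernet adapter Ethernet 3:".
ADAPTER_RE = re.compile(r"^(\S.*adapter .*?):\s*$")
# Indented lines like "IPv4 Address. . . . : 192.168.1.100(Preferred)".
IPV4_RE = re.compile(r"IPv4 Address[ .]*:\s*([\d.]+)")


def ipv4_by_adapter(ipconfig_output: str) -> Dict[str, List[str]]:
    """Map each adapter heading in `ipconfig /all` output to its IPv4 addresses."""
    result: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in ipconfig_output.splitlines():
        header = ADAPTER_RE.match(line)
        if header:
            current = header.group(1)
            result[current] = []
            continue
        match = IPV4_RE.search(line)
        if match and current is not None:
            result[current].append(match.group(1))
    return result


if __name__ == "__main__" and sys.platform == "win32":
    out = subprocess.run(["ipconfig", "/all"], capture_output=True, text=True).stdout
    for adapter, addresses in ipv4_by_adapter(out).items():
        print(adapter, addresses)
```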
Once you have the local IP of the mobile connection, head over to the Options menu.
Here you can choose whether to proxy all connections or only specific ones by selecting them. Let's say you have five mobile connections attached to your PC: you can select exactly which one to use as the proxy, and 0.0.0.0 stands for all of them. You can also change the port it uses, but we can leave it at 808 for now. Once you're done with that, head over to the advanced section and go to the networks tab.
Here, select Enable multiple IPs outgoing and deselect Disable external users. Click OK, and everything should be ready.
Open Firefox and Chrome (or any other browser) side by side. In Firefox, set the browser's proxy to the one we just created, using the local IP; in my case, that's 192.168.1.100:808. Then search Google for "what is my ip" in both browsers. If the proxy is working, the two browsers will report different IP addresses.
Now you can use this mobile proxy however you want: run your extra social media pages, use it for scraping, and so on; the possibilities are endless. Mobile data can be expensive in some countries, but it does its job and is often worth it.
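Before comparing IPs in the browsers, it can help to confirm that the proxy port is reachable at all. This sketch only checks that something accepts TCP connections on the address and port from the walkthrough; it doesn't prove that traffic leaves through the mobile connection, which is what the browser comparison confirms.

```python
import socket


def proxy_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # The address from the walkthrough above; replace it with your own.
    print(proxy_port_open("192.168.1.100", 808))
```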
How platforms detect you are using a proxy
In most cases, it really boils down to blacklisting. Let's say you are using proxies from a data center, and their IP addresses look like 203.0.113.10, 203.0.113.11, 203.0.113.12 and 203.0.113.13, all on the same subnet.
If even two of the 256 IPs on that subnet are caught performing automated activity, it becomes very easy for a blacklisting service to block every IP on the subnet. Often, blacklisting is as simple as looking at the ISP: some ISPs are well known for spam and have a poor reputation.
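The subnet logic above can be sketched in a few lines with Python's ipaddress module. This assumes IPv4 addresses and uses /24 subnets (256 addresses) with a threshold of two flagged IPs, mirroring the example; real blacklisting services apply their own rules.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Set


def subnets_to_block(flagged_ips: Iterable[str], threshold: int = 2,
                     prefix: int = 24) -> Set[str]:
    """Return the /prefix subnets containing at least `threshold` flagged IPv4 addresses."""
    counts = Counter(
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in flagged_ips)
    return {str(net) for net, n in counts.items() if n >= threshold}


# Hypothetical flagged addresses (documentation ranges).
flagged = ["203.0.113.10", "203.0.113.11", "198.51.100.7"]
print(subnets_to_block(flagged))  # {'203.0.113.0/24'}
```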
Another way a platform can detect that you are using a proxy is by looking at how your requests vary. Let's say you are using a social media automation tool that tells the server you are on "Device A" and sends a browser fingerprint, which includes details such as the app version, user-agent string, time zone, screen resolution and many other little details we often overlook.
With all of these details, if you use the same proxy for various accounts or requests that each produce a different fingerprint, it becomes very easy to flag that proxy, and the entire subnet if it happens repeatedly across it.
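A simplified version of that check might count how many distinct fingerprints each IP presents. In this sketch the fingerprint is reduced to a hypothetical (user agent, time zone, resolution) tuple and the threshold is arbitrary; real platforms collect far richer data. This naive rule would also flag legitimately shared IPs, such as mobile carrier addresses, which is part of why platforms are cautious with them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Simplified stand-in for a fingerprint: (user agent, time zone, resolution).
Fingerprint = Tuple[str, str, str]


def suspicious_ips(requests: Iterable[Tuple[str, Fingerprint]],
                   max_fingerprints: int = 3) -> Set[str]:
    """Flag IPs that present more than `max_fingerprints` distinct fingerprints."""
    seen: Dict[str, Set[Fingerprint]] = defaultdict(set)
    for ip, fingerprint in requests:
        seen[ip].add(fingerprint)
    return {ip for ip, fps in seen.items() if len(fps) > max_fingerprints}


# Hypothetical request log: one IP cycling through 5 fingerprints,
# another sending 10 requests with a single consistent fingerprint.
log = [("203.0.113.10", (f"Agent/{i}", "UTC", "1920x1080")) for i in range(5)]
log += [("198.51.100.7", ("Agent/1", "Europe/Berlin", "1366x768"))] * 10
print(suspicious_ips(log))  # {'203.0.113.10'}
```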
Dealing with the leaks
There are two main ways you could be leaking your real IP address. The one that stands out, and that most people are unaware of, is the WebRTC leak vulnerability. If you use something like Selenium, or any browser that supports WebRTC, your real IP can easily leak.
You might ask yourself: if it's this bad, why not just turn it off? The problem is that turning off a feature most browsers have on by default puts a mark on your browser and your proxy, making it easy to assume that every request from that IP is automated. A better approach is to report a fake value when a server requests it, or at the very least prevent it from getting your real IP. We can achieve the latter by loading our automation browsers with an extension such as WebRTC Leak Prevent.
The other way you can be exposed is through analysis of your browser fingerprints and/or blacklists. Until a few years ago, most platforms relied only on blacklists to detect whether the IP requesting data from their servers was a bot. Now, however, they use more sophisticated methods, and detection techniques will keep evolving as automators keep evolving too.