Tag Archive for: Data Extraction

Installing Firefox & The iMacros Addon + Speed Tips

The video below takes you through downloading Firefox and the iMacros extension, plus some speed tips to improve the performance of your macros in Firefox.

 

Installing Firefox & The iMacros Addon Video

 

 

Helpful Links

Data is the Life Blood of an eCommerce Business

Introduction

Firstly, let's quickly identify who this article is aimed at and why on earth you should even consider reading the rest of it.

I've personally spent over four years at 3rd-party software providers, working with businesses that have got themselves into operational & data issues on marketplaces such as eBay & Amazon that could have been avoided.

Some businesses had no tool prior to moving onto the marketplaces, and some did but made fatal mistakes along the way. Such mistakes have cost them untold amounts of wasted resources. This article is aimed at helping you identify such issues as early in the process as possible and at helping you understand that data is the lifeblood of an eCommerce business.

New Businesses / Proof of Concept
This article is aimed at a brand new business that has just started exploiting the marketplaces for the first time; after all, you need proof that the channels work and you need to start somewhere.

Manual processes at this stage are a "good thing". There is so much to be learned, and the 'manual' entry experience is absolutely required so that the people in the business understand the need for a tool as soon as possible and are able to scale their operation.

Take the points on board now regarding data and look to the future of your business.

Businesses that have proven the marketplaces work for them
This article is also aimed at businesses that have already proved that the marketplace channels are viable for them, and that are paying attention to them each day, manually loading products onto the channels and, most importantly, processing orders from them too.

There is one exception to this: a business that has carried out substantial intelligence on the marketplaces and needs to leapfrog the basics and move straight to a large-scale model as quickly as possible. If this is the case for your business, then you need to know how valuable data is and why you should retain it when it comes to the popular marketplaces.

As Your Business Grows

As the business grows, normally by working (harder) on achieving one or more of these factors:

  • More inventory
  • Better priced inventory
  • More staff within the team
  • Adding more sales channels
  • Higher quality descriptions
  • Better order management
  • Lower operational costs
  • More profit
  • etc…

The cracks from the starting phase begin to show very quickly. Some identify this early on, some never identify it at all. Some unknowingly start to tackle the issue, but don't actually realise why.

This could be through micro-management of staff & processes. After all, most small businesses are rarely started & run by highly experienced managers, but by normal people who haven't had the training or experience in dealing with such issues. I know personally that when I ran my own business, this is a trap that I got myself stuck in (and one that I've done my darn hardest to help others recognise).

Putting the internal management and the other points above to one side, what small business owners rarely recognise is that "data" is one of the core reasons they're able to operate a business at all.

Data in a Business is Like Blood in a Human

Without blood, we’re pretty much screwed. The same as data within a business. Data is the lifeblood of the business.

This also applies to non-eCommerce businesses too; however, it is amplified by the very nature of the "e" part in eCommerce.

This article is entirely aimed at helping you understand that when you're processing your orders in, say, eBay's Selling Manager Pro, or loading inventory to Amazon via spreadsheets or Seller Desktop, you are dealing with data. Your business's data.

So when you're ticking the boxes to print orders out in SMP, using the Sell Your Item form on eBay or adding items manually into Amazon using the Add a Product link, you are working with data, data that you should have absolute control over. It is, after all, the blood of the business.

Also, the data that you are creating is exceptionally valuable to the business. The most obvious is the inventory data; the not-so-obvious is the order data, which provides a unique history of and insight into what the business has done.

So the moment you manually add inventory data into a marketplace like eBay or Amazon, or process one or more orders in their web interfaces, you are potentially giving away the largest asset of the business to a process that you do not fully control or own. In the case of Amazon, you cannot get that data back out easily either.

An Example of Bad Data – Amazon

Let's take Amazon as an example. It's not unheard of for a business to have spent hundreds of hours loading thousands upon thousands of inventory records onto Amazon manually. And of course, this most likely had some huge positive effect on the business, namely orders & profit.

However, what has been happening is that the business has been building Amazon a superb product database that the business no longer has ready access to. You're unable (at least not easily, and certainly not officially) to export the product data that was originally created. At the same time, Amazon is using that data to market not only your products more effectively to more customers, but to other businesses too.

If you are manually creating products on Amazon using the web interface, stop immediately.

(More on this later in this article)

An Example of Bad Data – eBay

The data requirements of eBay are huge. Let's take the clothing categories for example. In recent updates, including attribute data as part of the listing process has become mandatory there (along with many other categories too).

Compared to Amazon, this has positive effects: eBay is able to leverage this data to allow customers to drill down in their searches, which makes for a better buying experience and ultimately more sales for the business.

However, those attributes are now locked into eBay's platform; getting them out again is painful and will cause a massive mop-up job of cleaning out the junk to make the data "portable" later.

If you are manually creating products on eBay using the web interface, stop immediately.

(More on this later in this article)

Ironically, in both examples it's your data. Once it's been entered into said platforms, getting it out again in a reusable format is in some cases just not possible.

I’ve only ever worked with one company that elected to completely ditch all their product data to start again afresh. What they did make of it a second time around with a structured approach was truly amazing, but out of the hundreds of businesses I’ve worked with, this was the exception, rather than the rule.

What is more common is for businesses to try to extract their product data and bring it into a 3rd-party tool. Doing so (as I mentioned above) creates an absolute mess to sort out. The descriptions that come back from eBay are the full descriptions, so the data held within them is difficult or impossible to extract. Item specifics can be lost completely, and any attribute data that formed variations can be lost too.

In a recent conversation, it was mentioned that a business had tried an import tool from a 3rd-party software provider; not only did it go wrong multiple times, the mess that was left was in such a state that they were forced to abandon it completely.

This is exactly the situation that you should avoid.

Step in the Saviours

It does not matter what software tool you use, and it could of course be a combination of tools. The critical factor is that you have access to both the inventory data and the sales data externally, outside of the marketplaces themselves.

To spell it out very clearly:

The moment you suspect the business is going to work long term, employ 3rd-party software that is not created by the marketplaces as soon as possible; if it is, make sure you can easily gain access to the original data.

Frankly, I do not care what software it is and neither should you (to a point, that point normally being cost), as long as it allows you to enter the product data outside of the marketplaces and retain that data.

As far as sales orders go, having the ability to process them externally is normally a huge positive for the business. In the case of Amazon, you are able to export the sales data for current or later use; with eBay there is no real export, and sales data can be lost after 45 days or so.

There are providers that cost several thousand pounds a year and are not suited to smaller businesses; however, there are plenty of options out there, and the core criteria outside of cost are:

  • Access to the raw product data
  • Access to sales data

Other requirements, such as stock control and order management, are for the purposes of this article either assumed or luxuries.

Free Examples

We all like free and here are two free options for both eBay & Amazon.

Amazon
If we take the Amazon marketplace as an example: stop entering data manually and start using the import sheets they provide for the inventory data. Because you're storing these sheets on a computer you have access to, keep them after sending them to Amazon to create the products. You can use them at a later date, either to update the records that were in the file or to transfer the data to another marketplace/sales channel.

eBay
If you're looking for a free option for eBay, then for product data eBay's File Exchange covers product creation, updates and order data too. However, it can be clunky, and on massive accounts it can take several hours to process requests (I tried this on an account with over 10,000 live listings and even a basic export took 5 hours; on a smaller scale it would be adequate though).

Other Providers
There are a multitude of providers that offer superior options beyond this. They are not all SaaS models (where you pay them a commission on sales), and if your requirements are low, then community extensions for popular open-source website products can be free or very inexpensive (covering these is not a concern of this article, just know that they exist).

Ultimate Goal For Your Data

However, the ultimate goal in regards to your sales and inventory data is that you have access to it and can re-use it. I call this "portable data": data that you can use as you need to, wherever that may be.

Sticking with the example approach of this article:

Let's say you start a business on eBay or Amazon, you see the signs that it's going to go well, and you put some form of software in place to maintain control and ownership of your data. You can then grow and grow as the business dictates. You can move software providers relatively easily, and if you want to add new sales channels, such as a website or another marketplace, a few tweaks to the data you already have access to will let you do so. The other option of not doing this is frankly nasty.

Summary

Data, either product or sales related, is the lifeblood of an eCommerce business.

It needs to be "clean", it needs to be "yours" and it needs to be "portable".

If you keep bashing data into marketplaces and don't retain a copy, then you're capping the potential of you and your business. And that's something neither of us desires.

Website Data Extraction/Scraping & Form Filling Expert

Due to demand, this service is now only available to existing clients.

Web scraping can be hugely advantageous to businesses, allowing them to function more effectively and keep up to date with information on specific websites more frequently and accurately.

This is especially true when you consider that the applications created can be run by numerous members of staff on an ad-hoc basis, or even automated every day at certain times, and that they allow access to complex data from suppliers for more effective merchandising or for keeping internal systems updated more frequently with stock and pricing information.

It can also be a very quick process, taking only a few hours to complete most projects; small projects then take just seconds to run, depending on the complexity and the speed of the user's internet connection.

I have several years' experience with web scraping across many projects and requirements. In this article I cover the details of extraction in more depth and include examples where suitable. If you have a project in mind, contact me today with your requirements.

I specialise in small & medium scale scraping projects, such as extracting data from supplier websites for product information, stock & price updates.

However I can readily tackle multi-tiered extractions and also create clean data from complex situations to import into 3rd party applications with little to no input from the user.

Not only can most data be extracted from most websites, data can also be posted to websites from data files such as CSV. This could be form filling for job applications, listing products onto websites or online dating requests, not just extracting product, service or article data from a website.

If you can do it in a web browser, then it can most likely be automated.

The possibilities are almost endless.

If you have a project in mind, Contact Matthew today; it could be completed in just a few hours.

 

Getting The Edge With Data Extraction

Using automated tools to grab or post data to the web could trim hours off each day or week. Extracting the latest stock & prices from suppliers could mean higher profitability and fewer back-orders. It could even mean reams of data from suppliers' websites, giving your business the edge over your competitors.

It doesn't matter if it's behind password-protected content: if you can "see it" in your web browser, chances are it can be extracted. If you're entering data manually into website forms, chances are high that it can be automated too.

I've worked on numerous projects where clients have been able to ensure that their back-office tools are as up to date as possible with the latest information from suppliers. It has even allowed businesses to work with suppliers they've never been able to work with before, because the requirements to extract data from those suppliers' websites had been too restrictive, either due to time or cost.

Knowing what your competitors' prices are can be a huge advantage when it comes to pricing, especially in the eCommerce environment we have today. If you've got the data and it can be matched to other sites, then within one click and a few minutes, the latest pricing information from competitors could be yours. As many times as you want, whenever you want.

Scraping & data extraction can solve this in a cost-effective manner. One script, used over and over. Anytime you want, by however many members of staff you have.

If you want the edge, Contact Matthew today.

 

The Required Tools Are Free

Using two free applications, the Firefox web browser and a free add-on called iMacros, anything from simple to very complex web automation can be completed.

This allows completed projects to be run by the owner using free-to-use tools, so that any extraction or processing can be run by the owner or staff members as many times and as often as they require.

Extra processing can also be achieved using JavaScript, to handle complex data inputs or data extracted from websites. I cover this in more detail in the "extra data processing" section.

Don't worry if you've never used either of these before; if you've used a web browser and can press a button, it's that simple. I'll help you get started and it's very easy to do. I also include instructional videos to get you set up. It'll take no more than 10 minutes.

Simple Extraction

In this scenario, data elements from a single page can be extracted and then saved to a CSV file.

Example:

This could be a product detail page of a TV and the required elements, such as:

  • Product title
  • Price(s)
  • Stock number
  • Model number
  • Images
  • Product specifics
  • Descriptions
  • Reviews

are all extracted and then saved to a CSV file for your own use.

The time it takes to make a simple extraction of data from a single page varies greatly; the data on the page can sometimes be very poorly formatted, and if there are lots of fields that need to be extracted this can take quite some time.
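To make the idea concrete, here is a minimal sketch of a "simple extraction" in plain JavaScript (the same language iMacros supports for scripting). The HTML fragment, class names and product values below are invented purely for illustration; a real macro would use patterns matched to the actual page markup.

```javascript
// Hypothetical product-page fragment; real pages need their own patterns.
const html = `
  <h1 class="product-title">Acme 42" LED TV</h1>
  <span class="price">£299.99</span>
  <span class="model">ACM-42LED</span>`;

// Pull a field out by the class name of its wrapping element.
function extractField(source, className) {
  const match = source.match(new RegExp(`class="${className}">([^<]+)<`));
  return match ? match[1].trim() : "";
}

const row = [
  extractField(html, "product-title"),
  extractField(html, "price"),
  extractField(html, "model"),
];

// Quote each field (doubling embedded quotes) so commas inside
// values don't break the CSV line.
const csvLine = row.map(f => `"${f.replace(/"/g, '""')}"`).join(",");
console.log(csvLine);
```

Each extra field is just another `extractField` call and another column in the row.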

If you have a project in mind, Contact Matthew today; it could be completed in just a few hours.

Extra Data Processing

Extra processing can be applied to the extracted data before saving to a CSV file. This is very handy when you only want cleaned data to be saved. Most of the time it's obvious that cleaning is needed, and basic cleaning of the data is included in the macro.

The quickest way of identifying any processing you require on the extracted data is to provide an example file of how you would like the final data to look.

Example:

If one of the extracted fields was a complex data field, such as an email address held with other data in JavaScript like this:

<script language="javascript" type="text/javascript">var contactInfoFirstName = "Vivian"; var contactInfoLastName = "Smith"; var contactInfoCompanyName = " REALTY LLC"; var contactInfoEmail = "[email protected]"; </script>

Instead of including the extra information in the export, the email address can be identified and only that data field is extracted. Or if all the data held in the JavaScript is required, this could be split into separate columns, such as:

First Name, Last Name, Company Name, Email Address
Vivian, Smith, REALTY LLC, [email protected]
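A sketch of how that split can be done in JavaScript, the language iMacros supports for this kind of extra processing. The variable names and values mirror the inline-script snippet above; the regular expression itself is just one illustrative way to capture them.

```javascript
// Input string mirroring the inline <script> snippet from the page.
const scriptText =
  'var contactInfoFirstName = "Vivian"; var contactInfoLastName = "Smith"; ' +
  'var contactInfoCompanyName = " REALTY LLC"; var contactInfoEmail = "[email protected]";';

// Capture every contactInfoXxx = "value" pair into a lookup table.
const fields = {};
const pattern = /contactInfo(\w+)\s*=\s*"([^"]*)"/g;
let m;
while ((m = pattern.exec(scriptText)) !== null) {
  fields[m[1]] = m[2].trim(); // trim removes the stray space in " REALTY LLC"
}

// Emit the four cleaned columns as one CSV row.
const csvRow = [
  fields.FirstName, fields.LastName, fields.CompanyName, fields.Email,
].join(",");
console.log(csvRow);
```

If only the email address were wanted, you would simply keep `fields.Email` and drop the rest.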

Also, if the data needs to be formatted for import into a 3rd-party application, such as ChannelAdvisor, eSellerPro, Linnworks or a website application, that isn't a problem either. I'm exceptionally competent with Microsoft Excel & VBA and can help you leverage the gained data, formatting it into a complete solution that requires the least amount of input from you or your staff.

Whether you have basic requirements or highly complex ones, Contact Matthew today; your data extraction project could be completed in just a few hours and fully customised to your business requirements.

Paginated Extraction

This can vary from site to site; however, a complex extraction could involve navigating several product pages on a website, such as search results, then navigating to each product in the search results and running a simple or complex extraction on the product's detail page.

Example (property)  – Website: Homepath.com

In this example, not only is the requirement to extract the data found for a specific property; it is also required that ALL the search results be extracted.

This would involve extracting all the results and then navigating to each property page and extracting the data on the property detail pages.

The time taken to extract the data from such pages varies with both the number of property results to go through and the amount of data to be extracted from the property details page.

Example (products) – Website: Microdirect.co.uk

In this example, similar to the properties, the requirement is to extract the data from each of the product detail pages, and to do so for all the pages in the search results.

The macro would navigate through each of the result pages (say 10), identify each of the products, then one by one work its way through the products, saving the data to a file.
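The loop structure behind that flow can be sketched as follows. The site URL, the "three products per page" fetcher and the per-product extractor are all stand-ins to show the shape of the nested iteration; in the real macro each step would be a browser navigation plus an extraction.

```javascript
// Hypothetical search-results URL; purely illustrative.
const BASE = "https://example-shop.test/search?page=";

// Stand-in results-page fetcher: pretend each page lists three products.
function getProductLinks(pageUrl) {
  const page = pageUrl.split("page=")[1];
  return [1, 2, 3].map(n => `https://example-shop.test/product/${page}-${n}`);
}

// Stand-in for the per-product "simple extraction" step.
function extractProduct(productUrl) {
  return { url: productUrl };
}

const results = [];
for (let page = 1; page <= 10; page++) {          // walk the 10 result pages
  for (const link of getProductLinks(BASE + page)) {
    results.push(extractProduct(link));            // one row per product
  }
}
console.log(results.length); // 10 pages of 3 products each
```

The outer loop is the pagination, the inner loop is the per-product visit; everything else is the same "simple extraction" described earlier.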

Need data from pages & pages of a website? Not a problem, Contact Matthew today, it could be completed in just a few hours.

Ultra Complex Extraction

These normally consist of a requirement for data to be processed from a CSV file, then external processing & scraping by the macro, and then, depending upon the results, possibly further processing or scraping. Such projects are normally very complex and can take some time to complete.

Working with multiple-tiered drop-down boxes (options) falls into this category, as by their very nature they can be complex to deal with. It's worth noting that it is possible to work with multiple tiers of options; for example, when making one selection, the results cause sub-options to appear. Sites that need image recognition technologies also fall into this category.

However, it's easier to explain with an example rather than go into minute detail.

Example

For this example, you have a CSV file with a number of terms that need to be searched for on a dating website. Once these searches are made, the details are saved, and the persons found then need to be contacted/emailed through another form.

The macro will make intelligent searches for these terms, and the matching results (these are likely to be paginated) are saved to a separate file. Then, for each result that was saved, the macro sends a customised contact message through another form found on the same or a different website.

Do you feel your requirements are complicated, or that the website you'd like to extract from or post to isn't simple? Contact Matthew today; I'll be able to let you know exact times & can create the project for you at a fixed cost.

Saving Data & File Types

Extracted data is normally saved as CSV files. The data is separated by commas “,” and will open in Microsoft Excel or Open Office easily. For most applications using a comma will work perfectly.

However, sometimes the extracted data is complex (such as raw HTML), and using a comma as the separator causes fields to leak across columns when viewed in Microsoft Excel or Open Office. This is when other characters, such as the pipe "|", come in very handy to separate the data fields (e.g. title and image).

The separator can be any single character or combination of characters you wish; some common examples are:

  • Comma “,”
  • Tab ”      “
  • Pipe “|”
  • Double pipe “||”
  • Semi-colon “;”
  • Double semi-colon “;;”

It will be quite clear from the outset which separator is required, either from the data being extracted or from the project's requirements. If you have any special requirements, please discuss them beforehand.
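A tiny illustration of why the separator matters, using made-up fields: a description containing commas "leaks" across columns in a naive comma-separated file, but stays intact with a pipe.

```javascript
// Three fields, one of which contains commas (and a quote mark).
const cols = ["ACM-42LED", 'Big, bright, 42" screen', "299.99"];

const commaRow = cols.join(",");
const pipeRow  = cols.join("|");

// Naively splitting the comma row back apart yields 5 columns, not 3,
// because the description's own commas look like field boundaries.
console.log(commaRow.split(",").length); // 5 - the row has "leaked"
console.log(pipeRow.split("|").length);  // 3 - intact
```

(Proper CSV quoting, as in RFC 4180, is the other fix; a distinctive separator like the pipe is simply the quicker one for scraped data.)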

XML or SQL insert statements can also be created if desired; however, this can add several hours onto projects due to the extra complexity.

File types an issue? I can pre-process data files beforehand in other applications if needed. Contact Matthew today, it could be completed in just a few hours.

Speed of Extraction/Form Filling

As a general rule, the projects I create run exceptionally fast, however there are two factors that will limit the speed of them:

  • The speed of the website being interacted with
  • The speed of your connection to the internet

You can also make project scripts run much faster by ensuring that the following options in your iMacros settings are set exactly the same as those shown below.

You can find the options page shown below by clicking the "Edit" tab on the iMacros sidebar, then pressing the "Options" button.

Imacros Option Panel

Even if the above looks complicated, it's not. Instructional videos are included and I'll make it exceptionally easy for you. Contact Matthew today, it could be completed in just a few hours.

Exceptions & Un-Scrape-able Pages

It is important that your processing requirements are discussed beforehand with examples, so that I can confirm whether or not automated scraping will suit your requirements. In most cases it will, but sometimes it's just not possible.

In some cases, it is not possible to extract data from pages over & over due to:

  • A poor ‘make up’ of the page
  • Inconsistent page layouts
  • Page structures that vary enormously from one page to another
  • Use of flash or excessive use of AJAX
  • User code 'captcha' boxes (like reCAPTCHA)

When this happens, the only consistent method of extracting data from such pages is a human, and scraping will be unlikely to suit your requirements. This is rare, but it does occur. If I identify this (and I will very quickly), I'll let you know ASAP.

I am unwilling to work with data of questionable content. The points below are just common sense really; I've added them for completeness.

  • Adult orientated material (porn is a no, services are a no, ‘products’ are ok)
  • Sites that are focused towards children
  • Identifiable records on people, such as medical records (order-related data is fine if it is yours)
  • Most government sites
  • Situations where I suspect the data will be grossly misused for fraudulent or illegal purposes

Unsure what your requirements are, or just not sure if web scraping is the right way forward for your business? Contact Matthew now; I'll be able to help you and turn it all into plain English.

What Are the Limits of Extraction/Processing?

Most normal limitations are caused by very large or very deep page requirements of a project. That doesn't mean they're not possible, just that it could take some time to code for, and also for you to run each time.

The projects that I create suit smaller scale situations, such as one off extractions or extractions that need to be run by the owner over and over, such as on a daily basis to collect the latest product & pricing information from a supplier.

The real limitations come into force when the requirements are for huge-scale extraction, such as hundreds of thousands of records, or for exceptionally complex and exceptionally deep extractions. This is when tools such as Python, C++, Perl or other languages that allow spidering of websites would be more suitable.

This is not a speciality of mine; however, due to my experience with scraping, I can assist you with the project management of such projects with 3rd-party contractors. Contact Matthew now if this is what you need.

Anonymity & Use of Proxies

If you need to keep the activities of such scripting hidden and remain anonymous, then this can be achieved on small-scale projects using free proxies with no interaction from yourself.

In larger or more repetitive situations, I can either help you set up your browser to use free proxies (which can be unreliable at times) or, as I've found works in most cases, leverage inexpensive services that are very easy to use and, most importantly, reliable.

If this is a concern for you, don’t worry I’ve done it all before. Contact Matthew now if this is a requirement for your project.

Do you provide ‘open’ code?

For 'small' or 'simple' macros, yes, the code is open and you or your development team are able to edit it as required.

However, for some complex or ultra-complex macros the code is obfuscated, due to the extra functions that are normally included. This is non-negotiable, as I have developed custom functions that allow me to uniquely deal with complex situations in data extraction & posting.

Is Web Scraping Legal?

The answer to this can be both yes and no depending upon the circumstances.

Legal Data Extraction
For example, if you own the site and the data being extracted, then you own the data and you're using it for your own purposes. If you gain permission beforehand, for example from a supplier, to extract data from their website, this is also legal.

I have worked on projects where an employee has left a company, there is no access to the back-end/administration consoles of the websites, and the only way of obtaining the data held on the site is by scraping. I've previously done BuddyPress, WordPress, PHP-Nuke, phpBB, e107 & vBulletin sites, to name just a few.

I have also completed many projects where product data is extracted so that a business can obtain up-to-date pricing and stock information from suppliers' websites, along with extra product & categorisation data too.

Illegal Data Extraction
Because the macros are run on your or your staff's computers, scenarios outside of those where the sites are owned or permission has been granted fall to your discretion.

I cannot be held responsible for any legal action that may proceed from your running such scripts on 3rd-party websites. As such, I strongly recommend that you contact the 3rd party to seek consent and check any privacy or usage policies they may have prior to extraction.

Contact Matthew

If you've got a clear idea of what you'd like done, or even if you're just not sure whether it's possible, Contact Matthew today and I'll be able to tell you if it is possible, how long it will take, how much it will cost and when the completed project will be with you.