ECommerce Website Designer & Solution Provider
We take care of the following things before building any E-commerce site.
Integrating the web data with external data (demographic, consumer and household data) and with business data (marketing, sales and product management) gives a broader picture and leads to more accurate solutions. The diagram shows a possible architecture for the web house.
Building a successful data warehouse is itself a challenging task, and building a data mining model on that data poses many further challenges, from understanding the business problem and preparing the data to building and deploying the mining model. The web poses specific challenges in cleaning, transforming and loading the data for analysis, as normally 90% of the click-stream data is of little importance from an analytic perspective.
List of possible challenges:
Calculation of the dwell time for a content page. The time a visitor spends on a particular page is a good measure of the visitor's interests. There is no direct way to calculate the dwell time of a visitor on a page.
Identification of a user session. A visitor can be characterized by studying his browsing behaviour in a session, which is a collection of web-based transactions related by time. Computing the start and end of a session is a complex process.
Managing E-commerce website structure information. The structure of the web site is important information. Because electronic documents are continuously created and maintained, the ETL process for loading and maintaining the web site structure faces multiple challenges. These include handling dynamic pages, handling ancillary pages, extracting the page title and category, and handling frequent changes in the pages served by the web site.
Let us look at each of these problems in detail and, more importantly, at how to address them.
Identification of the Origin of the Visitor
The web is the most anonymous thing on earth, and web site visitors want to remain anonymous. It is a great challenge to discover the personalities of these anonymous visitors from their behaviour while they interact with your web site, and to capture enough information to do so without infringing on their privacy. There are four levels at which a user can be identified:
o Based on the visitor's IP address
o A persistent identifier for that session only
o A persistent identifier that lets you know the same web browser on a particular computer has returned for a repeat session
o A persistent identifier that lets you know the particular human being has returned to your web site
The visitor's IP address yields the visitor's country rather than the person's name. Knowing at least the country is better than complete anonymity: it provides opportunities to personalize the web site for the visitor's needs and to study the visitor's browsing behaviour with respect to his local time.
IP addresses are allocated dynamically by the Internet Service Providers (ISPs) to their customers, so an IP address is not a unique way to identify a web site visitor. Databases are maintained for each part of the globe that give the country, the contact person at the ISP with his mail-id, phone number and fax number, the authority that allocated the IP address, the route to the IP address, and so on. These help identify the part of the globe from which the visitor originates.
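As an illustration of the country lookup described above, here is a minimal Python sketch. The `ALLOCATIONS` table and the function name `country_for_ip` are hypothetical: the networks are reserved documentation ranges and the country codes attached to them are made up. A real table would be loaded from the allocation databases mentioned above.

```python
import ipaddress

# Hypothetical allocation table: (network, country code).
# These are reserved documentation ranges with made-up country labels.
ALLOCATIONS = [
    (ipaddress.ip_network("192.0.2.0/24"), "IN"),
    (ipaddress.ip_network("198.51.100.0/24"), "US"),
    (ipaddress.ip_network("203.0.113.0/24"), "AU"),
]


def country_for_ip(ip_text):
    """Return the country code for an IP address, or None if it is unknown."""
    try:
        address = ipaddress.ip_address(ip_text)
    except ValueError:
        return None  # malformed entry in the web server log
    for network, country in ALLOCATIONS:
        if address in network:
            return country
    return None
```

The result identifies a region, not a person: two visitors behind the same ISP or proxy would map to the same country.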
A persistent identifier for that session only can be passed through URLs, hidden fields or session identifiers. This helps avoid the problem of proxy servers. However, only the current session can be recorded; there is no way of tracking repeat visits. Browser caching is a further problem: clicking the back button is not recorded in the web server log, which makes it impossible to have a complete map of the user's actions. A possible solution is the use of No-Cache tags in the HTML content.
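Passing a session identifier through the URL could look roughly like the following Python sketch. The parameter name `sid` and the helper names are assumptions made for illustration. The `NO_CACHE_HEADERS` dictionary shows standard HTTP response headers that ask browsers and proxies not to reuse a cached copy; it is offered here as the header-level counterpart of the No-Cache tags mentioned above, not as the article's own method.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SESSION_PARAM = "sid"  # hypothetical query parameter name


def new_session_id():
    """Create a random identifier that is valid for one session only."""
    return secrets.token_urlsafe(16)


def add_session_id(url, session_id):
    """Return url with the session identifier placed in its query string."""
    parts = urlsplit(url)
    # Drop any stale identifier, keep every other parameter in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    query.append((SESSION_PARAM, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Response headers asking browsers and proxies to fetch the page again,
# so that a revisit reaches the web server log.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}
```

Every link the server writes into a page would go through `add_session_id`, so that each request in the log carries the same identifier until the visit ends.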
A persistent identifier that lets you know the same web browser on a particular computer has returned for a repeat session can be implemented through persistent cookies stored on the client machine. A cookie is a record placed on a user's PC by a web browser in response to a request from a web server. The cookie contents are specified by the web server and can only be read from the domain that set the cookie. This identifies the machine from which the user is accessing the net, not the user. The problems with cookies are that the user might have disabled them and, even if they are enabled, the user may delete them at any time.
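A persistent cookie of this kind can be sketched with Python's standard `http.cookies` module. The cookie name `visitor_id`, the one-year lifetime and the function names are assumptions chosen for the example, not values taken from this article.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"             # hypothetical cookie name
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # assumed lifetime


def read_visitor_id(cookie_header):
    """Return the visitor id found in a Cookie request header, or None."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    morsel = jar.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None


def build_set_cookie(visitor_id):
    """Build the value of a Set-Cookie header that outlives the session."""
    jar = SimpleCookie()
    jar[COOKIE_NAME] = visitor_id
    jar[COOKIE_NAME]["max-age"] = ONE_YEAR_SECONDS
    jar[COOKIE_NAME]["path"] = "/"
    return jar[COOKIE_NAME].OutputString()


def identify_browser(cookie_header):
    """Return (visitor_id, set_cookie_value); the second item is None
    when the browser already presented a cookie (a repeat visit)."""
    visitor_id = read_visitor_id(cookie_header)
    if visitor_id is not None:
        return visitor_id, None
    visitor_id = uuid.uuid4().hex
    return visitor_id, build_set_cookie(visitor_id)
```

If the browser has cookies disabled or the user has deleted them, `identify_browser` simply issues a new identifier, so the same machine shows up as a new visitor.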
A persistent identifier that lets you know the particular human being has returned to the web site is normally implemented through access by user name and password. Online forms such as registration or customisation preferences are an excellent source for linking customers to the clicks they generate. It is by far the most effective method of gathering visitor information. Online forms also have problems. It is believed that when asked for their name on an Internet form, men will enter a pseudonym 50 percent of the time, and women will use a pseudonym 80 percent of the time. It is also not advisable to ask the user to fill in a form on his first visit to the site, as this can be off-putting.
Calculation of Dwell Time
Dwell time is the time spent by the visitor on a content page. It is an important measure of the relevance of the content for the user and of the effectiveness of the page in attracting the visitor. The dwell time can be calculated by taking the difference between the times of two consecutive content page requests and subtracting from it the time required to load the content page and its ancillary files. However, the time required to load streaming media files such as RealAudio and MPEG should not be considered in the dwell time computation. In this case, the dwell time is to be computed from the beginning of the streaming media download, regardless of whether the rest of the content is fully downloaded.
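The calculation can be sketched in Python as follows. The function names and the shape of the input, a list of (url, request time, load time) tuples for one session, are assumptions made for the example. For a page with streaming media, the load time passed in would cover only the period up to the start of the stream.

```python
from datetime import datetime, timedelta


def dwell_time(request_time, next_request_time, load_time):
    """Gap between two consecutive content page requests, minus the time
    needed to load the first page and its ancillary files."""
    dwell = (next_request_time - request_time) - load_time
    return max(dwell, timedelta(0))  # never report a negative dwell time


def dwell_times(page_views):
    """page_views: list of (url, request_time, load_time) for one session,
    ordered by request time. Returns a list of (url, dwell_time).

    The last page is left out: no later request marks the moment
    the visitor left it.
    """
    result = []
    for current, following in zip(page_views, page_views[1:]):
        url, request_time, load_time = current
        next_request_time = following[1]
        result.append((url, dwell_time(request_time, next_request_time, load_time)))
    return result
```

For example, a page requested at 10:00:00 that takes 3 seconds to load, followed by the next content page request at 10:00:45, gets a dwell time of 42 seconds.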
Identification of User Session
The start and end of a user session must be identified in order to analyse user behaviour within a session and to measure how effectively the design of the web site keeps the visitor on the site. It also helps in identifying the entry pages through which visitors enter the web site, so that these pages can be designed effectively by providing links to other pages and placing appropriate ad banners in them depending on the context. Any page in a web site can be the entry page for a visitor, as a keyword search in a search engine can lead the visitor to any page in the web site. Identifying sessions also helps in finding the most popular exit pages, which could be the session killers. Identifying the session killers and redesigning them effectively may keep users on the web site for longer. But there is no direct way to identify the start and the end of a user session.
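Since there is no direct way, one common heuristic is to start a new session whenever a visitor has been inactive for longer than a fixed timeout. The Python sketch below assumes a 30-minute timeout and a pre-sorted list of (timestamp, url) requests for a single visitor; both are assumptions made for the example and are not taken from this article.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity cut-off


def split_sessions(requests, timeout=SESSION_TIMEOUT):
    """requests: list of (timestamp, url) for ONE visitor, sorted by time.
    Returns a list of sessions, each a list of (timestamp, url)."""
    sessions = []
    for timestamp, url in requests:
        # Compare with the last request of the most recent session.
        if sessions and timestamp - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((timestamp, url))
        else:
            sessions.append([(timestamp, url)])
    return sessions


def entry_and_exit_pages(sessions):
    """Return one (entry_url, exit_url) pair per session."""
    return [(session[0][1], session[-1][1]) for session in sessions]
```

Counting how often each URL appears as the exit page across all sessions gives a first list of candidate session killers.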
Managing E-commerce Website Structure Information
Web sites may serve static pages, dynamic pages or a combination of both, and each page served may contain or link to different types of files: documents, images, multimedia, embedded scripts, etc. A page can be a static HTML document, or it can consist of just a template, with an application server supplying the content for the different components of the template. The types of files served may change frequently. Many new pages can be added on a daily or weekly basis, and old pages may be superseded. According to their purpose, files may be classified as Company information, Product catalogue, Technical support, Ordering page, etc. Content pages may have page titles, which are required for analysis and should be extracted and loaded into the page dimension.
Dynamic pages. Pages can be generated and served dynamically based on the parameters given by the visitor in a previous page. A dynamic page can consist of a template with different components, and the content for each component can be generated dynamically from a given set of parameters. The page used will be the same, but the content served will differ from one request to the next. Storing every instance of a dynamic page would drastically increase the size of the page dimension.