Saturday, August 31, 2019

Professional Values for the BSN Student Essay

Nursing students, as professionals in training in the hospital field, must have the ability to handle ethical issues within the industry. It is essential to consider this matter in healthcare professions such as nursing. Being a nurse requires one to become highly involved in different human operations, and nurses are thus more susceptible to issues related to those situations. Consequently, these healthcare professionals are required to learn about different levels of ethical concern even during their training years. This training ensures that they have the capability to manifest both personal concern for their patients and professional standing in the performance of their duties towards their clients.

Being a nurse requires hard work and perseverance in treating patients of different ages, genders and situations. According to the Department of Nursing Education, applicants for the nursing job must have certain characteristics that will help them give the needed medication to the patients they care for. These characteristics particularly involve the five major values of professionalism that must be given full attention by nurses in performing their duties to the public: altruism, autonomy, human dignity, integrity, and social justice. The first values to be discussed are autonomy and altruism, which suggest that individuals should be able to perform with a minimum level of supervision from their administrative officials. Among the required characteristics are being able to use the five major senses functionally under fast-paced demands, being able to observe a client at a certain distance, and being able to coordinate muscular movements with the functions of the major senses. In short, a nurse who deals with patients in the emergency room must be able to multitask in order to become an asset to the medical team.

Human dignity and integrity are also conscientiously considered as situations are dealt with by nurses in their profession. This includes the basic application of the major rules of nursing in actual operations when handling the health cases of patients in need of careful assistance. In addition to these attitudes, a nurse's basic understanding of hospital rules regarding visitation and family involvement in medication is also a vital factor that must be considered. According to Laura Marco, "Family life is vital to a person's health"; this is why family participation must always be considered a necessary agent in medicating any patient. In short, family participation must be continuous in order to achieve the "total care therapeutic surroundings."

One of the key abilities that an ideal nurse should acquire is being able to demonstrate intellectual strength. This means that a nurse must be able to exercise good judgment even when faced with exceptionally complicated and serious medical cases. A nurse must be able to promptly complete all responsibilities, such as attending to the care of each patient brought into the emergency room.
To become an asset to the medical team, a nurse should be able to think in a mature way and tolerate "physically taxing workloads" yet still function at the best performance possible. Stress is always a part of the nursing job.

Professionalism is also an important factor to be considered in nursing. In emergency situations, the nurse should anticipate receiving negative and demanding remarks from the patient or the relatives. Remaining calm and continuously focused on aiding the patient keeps the nurse within his professional character. In this way, instead of making the situation worse, the attending nurse is even able to help the relatives and the patient himself remain calm in the middle of a strenuous situation. This particular value involves the consideration that each nurse places on the responsibility to uphold social justice in their work. Professionalism also involves the consideration that each nurse places on the ethical standards that form part of their professional basis of competency at work.

Being a nurse has never been an easy task, and being part of a medical team helping patients during emergency cases is even more demanding. But the strong and ideal efforts a nurse puts into his job can help him attain the required attributes that a perfect nurse should have. His determination to continuously improve and develop his skills in becoming the perfect nurse is among the key factors that help him attain his best in his chosen career.

References:
Lewis, S. M. (1999). Medical-Surgical Nursing: Assessment and Management of Clinical Problems, Single Volume. Brooklyn, New York.
Potter, A. (2000). Fundamentals of Nursing. San Francisco, California.
Barnes, S. (2002). Fundamentals of Nursing: Concepts, Process, and Practice, Seventh Study Guide Edition. Chicago Press, Chicago.

Friday, August 30, 2019

Knowledge management and intellectual capital Essay

Knowledge is something that comes from information processed by using data. It includes experience, values, insights, and contextual information, and it helps in the evaluation and incorporation of new experiences and the creation of new knowledge. People use their knowledge in making decisions as well as in many other actions. In the last few years, many organizations have realized that they own a vast amount of knowledge and that this knowledge needs to be managed in order to be useful. "Knowledge management (KM) system" is a phrase used to describe the creation of knowledge repositories, the improvement of knowledge access and sharing as well as communication through collaboration, the enhancement of the knowledge environment, and the management of knowledge as an asset for an organization. Intellectual capital is considered a key influencer of innovation and competitive advantage in today's knowledge-based economy. Knowledge management helps in obtaining, growing and sustaining intellectual capital in organisations. This paper focuses on how knowledge management and intellectual capital help an organization achieve its goals, as well as on the relation between these two concepts.

Key words: knowledge management, intellectual capital, organizational goals, benefits

Introduction: Knowledge is something that comes from information processed by using data. It includes experience, values, insights, and contextual information and helps in the evaluation and incorporation of new experiences and the creation of new knowledge. Knowledge originates from, and is applied by, knowledge workers who are involved in a particular job or task. People use their knowledge in making decisions as well as in many other actions. In the last few years, many organizations have realized they own a vast amount of knowledge and that this knowledge needs to be managed in order to be useful. Knowledge management is not one single discipline; rather, it is an integration of numerous endeavours and fields of study. Knowledge management is a discipline that seeks to improve the performance of individuals and organizations by maintaining and leveraging the present and future value of knowledge assets. Knowledge management systems encompass both human and automated activities and their associated artifacts.

So, what is knowledge? Knowledge is a fluid mix of framed experience, values, contextual information, expert insight and intuition that provides an environment and framework for evaluating and incorporating new experiences and information. From this perspective, knowledge management is not so much a new practice as it is an integrating practice. It offers a framework for balancing the numerous technologies and approaches that provide value, tying them together into a seamless whole. It helps analysts and designers better address the interests of stakeholders across interrelated knowledge flows and, by doing so, better enables individuals, systems and organizations to exhibit truly intelligent behaviour in multiple contexts. The reasons why companies invest in KM are that it either gives them a temporal effectiveness or efficiency advantage over their competitors, or they do it to try to negate the competitive advantage of others.
For the purpose of this research, KM is defined to include five fundamental processes: (1) knowledge acquisition (KA), (2) knowledge creation (KC), (3) knowledge documentation (KD), (4) knowledge transfer (KT) and (5) knowledge application (KAP). These five KM processes are not necessarily sequential but rather iterative and overlapping. The effective management of knowledge necessitates a thorough understanding of the relationships not only among the KM processes themselves but also between the KM processes and the intellectual assets of an organization.

Intellectual capital (IC): Intellectual capital can include the skills and knowledge that a company has developed about how to make its goods and services. It also includes insight and information pertaining to the company's history, customers, vendors, processes and stakeholders, and any other information that might have value for a competitor and that, perhaps, is not common knowledge. Intellectual capital is therefore not only organizational knowledge; it is also industry knowledge. It is the combination of both cognitive knowledge and intuitive, experience-related knowledge. Intellectual capital is known for creating innovation and competitive advantage in this knowledge-based era, but knowledge management plays a dominant role in obtaining, growing and sustaining intellectual capital in organizations, which implies that the successful implementation and usage of KM ensures the acquisition and growth of intellectual capital. Organizations should deploy and manage their IC resources in order to maximize value creation.

The IC term was first introduced by Galbraith (1969) as a form of knowledge, intellect, and brainpower activity that uses knowledge to create value. Since then, different views of IC have emerged. For instance, some view IC as knowledge that can be converted into value, or as the aggregation of all knowledge and competencies of employees that enable an organization to achieve competitive advantages. In addition, IC has been defined to include all non-tangible assets and resources in an organization, including its processes, innovation capacity, and patents as well as the tacit knowledge of its members and their networks of collaborators and contacts. In spite of its multidimensionality, this research conceptualizes IC as consisting of three basic interrelated dimensions: human capital (HC), organizational (or structural) capital (OC), and relational (or customer) capital (RC). Human capital encompasses the attitudes, skills, and competences of the members of an organization. Organizational capital includes elements such as organizational culture, routines and practices, and intellectual property. Relational capital includes relationships with customers, partners, and other stakeholders. Investments in human capital, organizational capital, and relational capital are expected to increase the value of an organization.

The management of intellectual capital involves: identifying the key IC which drive the strategic performance of an organisation; visualizing the value creation pathways and transformations of key IC; measuring performance, and in particular the dynamic transformations; cultivating the key IC using KM processes; and the internal and external reporting of performance.

Knowledge management and intellectual capital: IC and KM serve different purposes and together include the whole range of intellectual activities from knowledge creation to knowledge leverage.
IC and KM can be viewed as a set of managerial activities aimed at identifying and valuing the knowledge assets of an organization as well as leveraging these assets through the creation and sharing of new knowledge. KM and IC are believed to be closely coupled. When KM activities are used to develop and maintain IC, it becomes a resource of sustainable competitive advantage. On the other hand, when IC is properly utilized and exploited, it increases the absorptive capacity of the organization, which, in turn, facilitates its KM processes. Knowledge can add value to organizations through intangible assets such as intellectual capital.

Conceivably, the socialization, externalization, combination, and internalization (SECI) model is a fitting theoretical foundation for understanding the KM-IC relationship. The SECI model outlines different interactive spaces (Ba) in which tacit knowledge can be made explicit. The IC components (i.e. HC, OC and RC) represent the input for the knowledge creation process in the SECI model, and its main output takes the form of commercially exploitable intangibles. The four processes of the SECI model involve not only knowledge creation and utilization but also the other KM components, including knowledge transfer, knowledge documentation, and knowledge acquisition. Knowledge transfer (sharing) is the common factor of the four processes of the SECI model. Socialization facilitates the conversion of new tacit knowledge through shared experience, which allows less easily communicated knowledge to be communicated; therefore, the socialization process involves knowledge transfer. In addition, externalization is the process of articulating tacit knowledge into explicit knowledge, which can then be shared by others. In the combination and internalization processes, knowledge is exchanged and reconfigured through documents, meetings, or communication networks. Effective execution of the SECI processes can generate different types of IC. Socialization involves the accumulation of HC, OC, and RC by sharing and transferring experiences through joint activities. The conversion of tacit knowledge into explicit knowledge through externalization creates and accumulates OC. Combination creates knowledge structures in the form of systemic, institutionalized knowledge (i.e. OC) that can be directly disseminated and distributed. Internalization, on the other hand, accumulates HC and RC through learning by doing.

Review of Literature: Francis Bacon emphasized the importance of knowledge in organizations with his famous phrase "knowledge is power" (Muller-Merbach, 2005). The strategy that considers knowledge, along with other resources such as land, labour and capital, as an asset is knowledge management (Nonaka and Takeuchi, 1995). Dell (1996) believes that knowledge management is a systematic approach for finding, understanding and applying knowledge in order to create knowledge. According to Simon (1999), knowledge management is the intelligent planning of processes, tools, structures, etc., with the purpose of increasing, restructuring, sharing or improving the application of knowledge that is apparent in each of the three elements of intellectual capital, i.e. structural, human and social. Some scholars believe that knowledge management is not a technology (Clair Guy, 2002; Lang, 2001; DiMatta, 1997; Koenig, 2002; McInerey, 2002). This process helps organizations use their assets, work faster and more wisely, and obtain more capital (Shawarswalder, 1999).
Increased attention is being focused on KM and the management of IC in organisations. In the last decade there has been a shift in management focus from traditional accountancy practices, where financial capital is paramount, to a growing realisation that intangible assets are of greater significance in our knowledge-based economy (Egbu et al 2000, 2001). Knowledge can be a valuable resource for competitive advantage, and harnessing its value is one of the pre-eminent challenges of management. Identifying and exploiting knowledge assets, or intellectual capital (IC), has been widely documented. There are different types of knowledge in an organisation, from the tacit knowledge of individuals, which is unarticulated and intuitive, to explicit knowledge that is codified and easily transmitted (Nonaka and Takeuchi, 1995). Further distinctions have been made by academics and practitioners involved in the IC debate. Three components of IC have been identified, comprising human, structural and customer capital (Edvinsson, 2000; Bontis, 1998; Bontis et al., 2000). However, it is asserted that the human capital in an organisation is the most important intangible asset, especially in terms of innovation (Edvinsson, 2000; Stewart, 1997; Brooking, 1996). Marr et al. (2003) argue that KM is a fundamental activity for growing and sustaining IC in organizations. Bontis (1999) posits that managing organizational knowledge encompasses two related issues: organizational learning flows and intellectual capital stocks. Organizational learning, as a part of KM (Rastogi, 2000), reflects management's effort to manage knowledge and ensures that IC is continually developed, accumulated, and exploited. A thorough review of the relevant literature and discussions with targeted researchers in the field suggest that the development of successful knowledge management programmes involves due cognisance of many factors.

Compilation of data: Knowledge management consists of managerial activities that focus on the development and control of knowledge in an organization in order to fulfil organizational objectives. Knowledge sharing takes place in organizations in two ways, explicit and tacit. Knowledge management seems to run along two tracks, treating knowledge either as a dynamic process or as a static object. Depending on how individuals understand what knowledge is and on their aims, both intellectual capital and knowledge management actors thus emphasize either the static or the dynamic properties of knowledge. Measuring knowledge management is a growing area of interest in the knowledge management field. Metrics are being developed and applied by some organizations, but a limitation of current measures is that they do not necessarily address the knowledge level and the types of value-added knowledge that individuals obtain. Intellectual capital is a most valuable asset, and this brings it firmly onto the management agenda: it is the sum of everything everybody in an organization knows that gives it a competitive edge in the market place, and individual intellect affects many attributes of an organization.
Intellectual capital can be characterized as intellectual material that has been formalized, captured and leveraged to produce value. The static properties of knowledge are inventions, ideas, computer programs, patents, etc. Intellectual capital also includes human resources (human capital), but it is clearly to the advantage of the knowledge firm to transform the innovations produced by its human resources into intellectual assets to which the firm can assert rights of ownership.

The measures for intellectual capital in use are: 1. value extraction, 2. customer capital, 3. structural capital, 4. value creation, and 5. human capital. The components of intellectual capital include human capital indicators and structural capital indicators. The knowledge management community needs to be responsive to the needs of management in the organization by trying to adequately measure intellectual capital and assess the worthiness of knowledge management initiatives. Developing metrics and studies for measuring intellectual capital will help to consolidate the knowledge management field and give the discipline further credibility.

Applying knowledge is also very important to supply chain design and operation, and intellectual capital and knowledge management principles can help enterprise supply chains. Knowledge management formalizes approaches to understanding and benefiting from knowledge assets at the firm level. The drivers which maximize value in enterprise supply chains are operational efficiency, opportunities to better serve customer and stakeholder needs, and a springboard for innovation. A foundational concept in the field of intangible assets, and one that is important for practice, is that there are two dimensions of knowledge, explicit and tacit. These ideas are developed further by interleaving intangible and traditional firm assets, and the special characteristics and priorities of the four generic supply chain models are then identified.

The intellectual capital approach: Intellectual capital comprises all the nonmonetary and nonphysical resources that are fully or partially controlled by the organization and contribute to value creation. The three categories of intellectual assets are organizational, relationship and human. Strategies to manage knowledge include: 1. operational excellence and 2. design excellence.

Conclusion: This paper has considered the importance of knowledge management and intellectual capital to organisations. Knowledge management practices differ from organisation to organisation. Organisations are at different stages in the knowledge management trajectory; they 'learn' at different rates and apply different techniques (formal and informal) in managing knowledge. In the study on which this paper is based, there is a general consensus that the management of knowledge assets is vital for business. Knowledge management and intellectual capital should be integrated to maximize organizational effectiveness. However, the relationship between KM and IC is complex, and so is its management. In order to effectively manage such a relationship, it is imperative to understand where and how the accumulated IC is reflected in managing KM activities in organizations. The management of knowledge and intellectual capital provides opportunities for project creativity and innovation. However, the effective implementation of knowledge management in organisations depends on many factors, which include people, culture, structure, leadership and the environment.
In most organisations, there is a lack of appropriate formal constructs for measuring the benefits of knowledge assets to organisational performance. Managers operating in the knowledge economy are required to be "knowledge leaders," who must be aware of the relationship between knowledge and those who possess it in order to successfully fulfil their leadership responsibilities. Based on the findings of this research, managers in organizations are expected to develop strategies, adopt structures, and construct systems that effectively coordinate and integrate the efforts aimed at managing knowledge, human resources, and customer relationships in order to enhance knowledge flows, accumulate IC, and create and sustain business value.

References:
Allanson, J. Intellectual capital and knowledge management: A new era of management thinking?
Hussi, T. Reconfiguring knowledge management - combining intellectual capital, intangible assets and knowledge creation.
Marr, B., Gupta, O., Pike, S., & Roos, G. Intellectual capital and knowledge management effectiveness.
Liebowitz, J. Developing knowledge management metrics for measuring intellectual capital.
Egbu, C., Botterill, K., & Bates, M. Influence of KM and intellectual capital on organisational innovations.

Pamantasan ng Lungsod ng Marikina Essay

There are lots of effects which DotA brings to our society and, in particular, to the youth. Without doubt this game is one of the hottest games in the market. In every cyber cafe you can see gamers glued to their screen, mouse and keyboard, their faces full of concentration and excitement as they find ways to defeat the opposing team. Computer games such as DotA also serve as a platform for the youth to communicate: teenagers who initially do not know each other can easily become friends through playing DotA.

Chapter I Introduction

The effects of DotA have continued for several years since the launch of Warcraft III and The Frozen Throne. Almost everyone, especially the youth, has played this game at some point. What content does DotA offer to gamers, and what excitement does it bring them? And you may ask a key question: how long will the effects of DotA last? The lifestyle of the youth who have been playing computer games, especially DotA, is affected by the game. There are both advantages and disadvantages for them. Let's talk about the advantages first. As one of the most played games online, DotA can make players mentally alert, and they also learn to be strategic and cooperative: through computing magic damage, gold, physical reduction, percentages and other statistics, they gain more knowledge of mathematics. Thus the youth can also get some benefits from playing computer games.

Statement of the Problem or Thesis Statement

Why are computer games, or DotA in particular, so addicting to students?

1. A time killer. Boredom is the most common problem of most people today. DotA can consume a lot of time without you even noticing it; you just say after the game, "WTF! I'm late!"

2. A non-exhausting game. Unlike basketball or other physical sports, you can play DotA for as long as you can still manage to sit, look at the monitor, use the mouse and keyboard and think. Yes, using your mind is also tiring, but it takes an average of three games before you would want to take a rest.

3. A source of fame. Most players want to be the best in this game to gain fame, which I find natural but technically nonsense. I have to admit that the thirst for fame drove me to practice and improve my game. After getting the fame I wanted, I asked myself, "Now what?" For players who do not plan on having DotA as their profession, fame isn't that important.

4. A teamwork game. When we were kids, we already loved having team battles. That is why a lot of team sports came up and multi-player computer games have been invented. Playing with teammates is more addicting than playing alone.

5. Tranquilizing. DotA makes you forget your problems and makes you think of simpler problems (like how to win the game).

6. Non-violent war. We love wars. That is why there are shows like wrestling, UFC, action films, etc. DotA is a chance to engage in wars safely. We can fight all day long and just stand up from our computer without even having a scratch on our face.

7. Easy to play. Surveys show that DotA is played by more people than other strategy games like StarCraft. One of the reasons is its simplicity: you only have to control one hero (great news for people who are not into doing micros).

8. No height or physical disadvantages. In basketball you cannot have a team composed of five short players. In rugby, you should be muscular. In DotA you can be as thin and as short as you want and still own everyone.
9. Losing makes you thirsty to win. On the other hand, losing is still addicting because you become more eager to have that wonderful feeling of winning.

10. Winning feels good. Winning in every game makes you feel good, and that is addicting.

Background of the Study

Significance of the Study

How to overcome DotA addiction?

1. Accept responsibility. The problem lies within the individual, not within DotA. No attempt at beating addiction can succeed until the individual accepts its existence.

2. Identify the impact. How many hours a day do you spend playing DotA? Do you normally go out on the weekend? When was the last time you read a book? Identifying the negative impacts of the addiction will help you focus on positive improvements and on getting back the things that you are really missing.

3. Avoid blame. Blaming others for problems that you alone must face does not solve the problems.

4. Set limits. Decide, for example, that you have one hour per day to spend playing DotA. Since DotA requires many hours of gameplay to have fun, you should likely consider a different game or a different genre of games.

5. Stay positive. Be positive whenever possible. While negative reinforcement is sometimes necessary, positive reinforcement will always go further in the end.

What is DotA? DotA is basically a game expanded from a custom version of Warcraft III, which was initially a strategy game similar to the Red Alert series, but it eventually evolved into its current state. Gamers can play DotA in a wide range of settings: single player, local LAN, or LAN over an internet connection with gamers from various countries. There is a variety of heroes to be chosen as your character, many types of gaming modes, and different types of maps for the game modes. There are lots of gamers who are ever more expert in the world of playing DotA.

Chapter V Summary and Conclusion

We know that playing computer games, especially DotA, brings bad effects to students: they are influenced by other gamers' "trash talk", and they learn how to gamble because they play DotA for "pustahan" (betting). DotA really affects the lifestyle of the youth who are into this game; although it has some benefits, it can corrupt the mind and the way the youth think. It can also weaken the body, and money and moral values are not given importance because of this game.

Thursday, August 29, 2019

Early Americas History Essay Example

This paper will analyze two of the articles available in the book, from a period when slavery was rampant in the South (Johnson, 2012). One of the articles is 'Plantation rules', a code of regulations written down by Barrow. The article appears in one of the plantation journals written at a time when the slavery of black people was the order of the day. The article presents us with a clear picture of the rules that a black slave had to adhere to and the level of ownership that the owner felt. The document depicts what was happening in the American past at around 1852. Barrow wrote down this article with his black slaves in mind because he expected them to understand precisely his expectations for as long as they worked under him (255-256).

From the article, some facts become clear about the 1800s in the United States. During this time, blacks worked for white landowners as slaves. In addition, the owners of land perceived slaves as their property. Therefore, they formulated rules that governed the entire life of the slave. The article highlights the restrictions that the slaves went through on the farms. They worked all day long and had to acquire permission in order to engage in any extra activity. The owner of the slaves controlled their movements. Through this, he intended to ensure that slaves did not interact with slaves from other farms. He was aware of the potential reactions of black slaves: if they met too often and without control from their owners, slaves were likely to stage a rebellion. From the article, a reader realizes that black slaves did not have an opportunity to enjoy their rights as free individuals. They received minimal allowances for their work and often worked for long hours. In addition, the owner limited the development of the slaves' relationships, forbidding them from marrying from a different farm. The article highlights the plight of slaves in the southern states in early America. It tells the facts from the owner's point of view without altering them, and it presents reliable information about the fate of the Negroes who ended up on white farms (258). However, the article does not reveal the story from the slave's point of view. Nevertheless, it provides the reader with an opportunity to experience the attitude and power exercised by slave owners in early America. The article highlights the core issue that led to racial tensions between whites and blacks.

The second article is the 'Confessions of Nat Turner', officially published by Ruffin Gray. Gray was the lawyer of Turner, a slave who was responsible for the organization of a slave revolt in one of the southern states, Virginia. Turner had been a slave who could not bear the conditions that surrounded slavery and called upon other slaves to rise in rebellion against the whites who were continually oppressing the slaves and overworking them. Turner's confession targeted the entire American public at that time. He was in jail and took the responsibility of narrating his reasons for and contributions to organizing the revolt. He made his confessions to Gray, the lawyer, who published the confession (259). From his account, it becomes evident that slavery in the southern states was very rampant. In addition, the article elaborates how the slaves perceived the situation they were going through. Turner claims that he sought to organize a revolt with a divine motivation. He described

Wednesday, August 28, 2019

Nursing Research Essay Example

On the other hand, the components of a research study involve a logical flow, since one step leads to the next as the researcher builds on the previous step to progress with the research. In effect, careful planning of the study will ensure that the researcher addresses expected limitations effectively, which eliminates the possibility of unexpected variables affecting the direction of the research. It is common knowledge that study designs are plans that indicate the process of collecting data, the research subjects, and the process of data analysis in order to answer the research questions. In line with this, researchers should select data collection instruments carefully and ensure that the instruments have passed reliability and validity tests in order to ensure that the results are beneficial to nursing practice. In order to establish the validity and reliability of the research instruments, it is important to carry out reliability and validity tests. During these tests, a reliable research instrument will produce the expected results from a research study, while an unreliable tool will not produce the expected results. In effect, such an instrument will not be valid, and the researcher should find other tools that will be reliable and

Tuesday, August 27, 2019

Systems Media Table Assignment Example

For instance, it can be used by a small business, for household records or as a phone directory. The hospital information system records, manages, stores, manipulates and displays records of patients, medicines, doctors, beds and other resources. Furthermore, the hospital management system is helpful in handling and running the complete administrative, financial and clinical operations of the hospital. Specialty information system technology offers many different types of services, covering a lot of areas, for example proprietary or specialized processes connected to IT applications (e.g. project management, systems planning, network administration, database design, systems integration, network engineering, helpdesk support, etc.). Administrative information systems offer facilities and support for business/enterprise-wide requirements, managing, maintaining and implementing everything from human resources and administration to finance, budgeting, payroll, research, and time and effort reporting. The main purpose of an operation support system is to deal with telecom-network-based supporting processes, such as provisioning services, maintaining network inventory, managing faults and configuring network components. A documentation system is a set of computer programs that is utilized to track and store electronic documents; it is also used to manage and handle images of paper documents. Basically, these systems are used by organizations, businesses, and institutions for basic content management. In addition, these systems are used in combination with digital asset management, enterprise content management (ECM) systems, workflow systems, document imaging, and records management systems.

Monday, August 26, 2019

Analysis and Summary Report of Findings Essay Example

of helping residents in NG7 reduce their household costs, such as heating bills, with the aim of converting fuel and food expenditure into other basic household necessities. The NEST project's approach entails the promotion of awareness of local fuel and food issues through open talks and educational outreach, which can act as a catalyst for development and financial support during the food and fuel crisis. Conservation and improvisation are some of the methods that can be used to reduce domestic expenditure. The NEST Project focuses its concerns on domestic income and expenditure through an analysis of the NG7 community. One of the major issues is the rise in electricity and water bills in relation to its impact on domestic expenditure and conservation practices. A research survey of local NG7 community residents indicates that rising gas, electricity and water bills create a negative impact on domestic expenditure, as less money is set aside to spend on basic necessities such as food, clothing and security (Henry, 2010). In this case, a large number of residents strongly agree with the proposition that rising gas, electricity and water bills reduce the budget available for the basic necessities of a household. The NEST project research survey for local residents in NG7 shows that rising gas, electricity and water bills directly affect the health and wellbeing of many families; a majority of the residents in the NG7 area agree with this notion. Family health and wellbeing largely depend on family income and expenditure. Where there is a lack of balance between income and expenditure, families tend to suffer from health and wellbeing issues, as many people ignore social and health responsibilities. Therefore, inadequate funds lead to poor health and wellbeing in most families in the NG7 area. One of the common ways used to cut down household expenditure is the use of DIY tools. The

Sunday, August 25, 2019

The positive, beneficial functions of IR law Essay

114). The losses suffered in World War I and World War II caused the international community to review international laws, and this brought about the creation of the United Nations, a body that is charged with upholding international laws and preventing such conflicts from reoccurring. For instance, the UN peacekeeping missions have brought about sustainable peace in conflict regions, such as in the Ethiopia-Eritrea war and the conflict in Darfur, Sudan. If international laws were absent in such cases, it is most likely that there would be never-ending conflicts in these parts of the world. International laws also regulate the conduct of states that have a competitive advantage over others with regard to the commons. Likewise, these laws are useful in terms of protecting the position of disadvantaged parties in such situations, as Hoffmann (1968) has pointed out (p. 115). It would be difficult for landlocked countries to have access to sea ports if international laws were non-existent. It is therefore significant that the United Nations Convention put in place the Law of the Sea, which provides landlocked countries with access to sea ports and therefore allows them to trade competitively. In the end, international law allows disadvantaged states to end up with a fairly level playing field similar to that of their more advantaged counterparts. Additionally, Hoffmann (1968) emphasizes that international laws allow for the gathering of support from the international community on matters of interest (p. 115). International laws recognize that a state's sovereignty does not necessarily translate into the protection of laws and the upholding of values. Human rights and environmental conditions can easily be victimized when a state's activities go unquestioned or unchallenged. The reason for the invasion of Iraq by the United States, although highly criticized, was argued to be a mission to disarm Saddam Hussein's regime of weapons of

Saturday, August 24, 2019

Chicano Studies Essay Example

These films call for close introspection, and a thorough reading enables us to delve deep into some of the important socio-economic and cultural discourses of the time, with a strong aesthetic operating throughout each film. A comparison of two films proceeds from some shared paradigm. Portrait of Teresa by Pastor Vega and Salt of the Earth by Herbert J. Biberman are both based on contemporary socio-economic issues of a similar geographical terrain. Both films encapsulate a strong feminist discourse and centre on the deconstruction of the stereotypical, traditional and conventional role of women in society.

Portrait of Teresa, directed by Pastor Vega, was released in 1979 and apparently seems a mere trajectory of a woman's life with much dramatic presentation. But the language of the camera pushes beyond the initial portrayal of Teresa, overwhelmed by her family, which comprises her husband Ramon and three children, and by her job as a crew leader in a textile factory, to a realm where she moves beyond the ordinary role of a household woman, seeking the attention of her husband and becoming expert merely in domestic duties, to become a revolutionary and a dominant motivating force in the labour movement (The Internet Movie Database, "Retrato de Teresa (1979)"). Teresa moves beyond the parameters of odd jobs and dirty dishes, and her husband, failing to accept her in the new role, separates from her and starts an affair. When her husband wishes to reconcile, Teresa asks him what he would think if, during the time of separation, she had also had an affair. Block-headed Ramon fails to pass Teresa's test with his chauvinist reply, "But men are different", and with it he loses Teresa forever, who, with her head held high in self-esteem, courageously wishes to move beyond the limits of an ordinary woman performing only her household duties (Rich, "Portrait of Teresa: Double Day, Double Standards"). On the other hand, the film Salt of the

Friday, August 23, 2019

First Amendment Freedoms Essay Example

The Bill of Rights consists of the first ten amendments, which contain procedural and substantive guarantees of individual liberties and limits upon government control and intervention. The First Amendment, perhaps the best known of these freedoms and protections, prohibits the establishment of a state-supported church, requires the separation of church and state, and guarantees freedom of worship, of speech and of the press, and the rights of peaceable assembly, association and petition. While some Supreme Court justices have declared that First Amendment freedoms are absolute or occupy a preferred position, the Court has routinely held that they may be limited so as to protect the rights of others (e.g. libel, privacy), or to guard against subversion of the government and the spreading of dissension in wartime. Thus, the Court's majority has remained firm: First Amendment rights are not absolute. Only two Supreme Court justices, Justice Hugo Black and Justice William O. Douglas, insisted that First Amendment rights are absolute, and their dissenting opinions fell by the wayside. Most court cases involving the First Amendment involve weighing two concerns: public vs. private. Also, the Supreme Court has often defined certain speech, known as "at risk speech," as being unprotected by the First Amendment (Corwin 56).

Freedom of speech and expression is not a luxury of democracy; it should be recognized as a necessity. In order for a democratic form of government to function and continue to exist, it must have free expression and educated criticism. Most of the development of the United States' free society has come about because of public debate and disclosure, in both oral and written form. The First Amendment was written because, at America's inception, citizens demanded a guarantee of their basic freedoms. Without the First Amendment, religious minorities could be persecuted, the government might well establish a national religion, protesters could be silenced, the press could not criticize government, and citizens could not mobilize for social change. When the U.S. Constitution was signed on Sept. 17, 1787, it did not contain the essential freedoms now outlined in the Bill of Rights, because many of the Framers viewed their inclusion as unnecessary. However, after vigorous debate, the Bill of Rights was adopted. The first freedoms guaranteed in this historic document were articulated in the 45 words written by James Madison that we have come to know as the First Amendment. The Bill of Rights, the first 10 amendments to the Constitution, went into effect on Dec. 15, 1791, when the state of Virginia ratified it, giving the bill the majority of ratifying states required to protect citizens from the power of the federal government.

The First Amendment speech and press provisions were absolutely rigid by original intent, higher than modern standards (indeed unreasonable by modern standards) and not coincident with eighteenth-century perceptions of the proper extent of the right to publish or speak freely. The strongest piece of new evidence involves the unofficial reporter who sat close to the Speaker of the House of Representatives near the end of the first session of the First Congress, after the drafting of the First Amendment (then third) but before submission of the amendment to the

Thursday, August 22, 2019

Chapter 3 - Neurology Clerkship Thesis Example

Taking into account the level of students in the course and the clinical nature of the course work, content that focused on practical clinical knowledge was made a priority for the eBook (M Nilsson, Nilsson, Pilhammar, & Wenestam, 2009). It was decided, however, that the content of the book must match the delivery of the content. Clinical students are expected to demonstrate proficiency through their ability to apply content learned in the classroom to patients themselves. Because of the nature of the medical profession, clinical students must demonstrate deep analytical skills with which they can diagnose and provide treatment based on such a diagnosis (Malau-Aduli, et al., 2013). Therefore, since the eBook was developed for such student stakeholders, it was agreed to develop an eBook that delivered content through an explanatory and problem-based learning model.

The development of the eBook occurred after the subject matter experts compiled its content. The content was broken down into two main sections: 1) a Neuroanatomy section and 2) a Pathology section. The Neuroanatomy section of the book centered its content on a review of the anatomy of the brain. Subject matter experts provided multiple computed tomography (CT) scans and magnetic resonance imaging (MRI) scans of healthy brains. This information was essential for students to progress adequately through the book, for much of the terminology and many of the images throughout the eBook would refer back to this section (Cotter & Cohan, 2011). This section also served as a glossary for those who needed to review basic information. The Pathology section introduced students to neurological ailments. The content for this section provided students with the information required to learn about specific neurological ailments they might encounter in the clinical setting. Subject matter experts, understanding that this section prepared students for their clinical rounds, provided real world

Antibiotic Sensitivity Lab Essay Example

A. What is selective toxicity, and why is it an important feature of antimicrobial agents?
Selective toxicity is the ability of a chemical or drug to kill a microorganism without harming its host. Selective toxicity is important to antimicrobial agents because it enables these agents to inhibit or kill a microorganism by interacting with microbial functions or structures different from those of the host, thereby showing little or no effect on the host.

B. What are broad and narrow spectrum antimicrobials? What are the pros and cons of each?
Broad spectrum antimicrobials are drugs that are effective against a wide variety of both gram positive and gram negative bacteria. Narrow spectrum antimicrobials are effective against only a limited range of bacteria, for example only gram positive or only gram negative organisms. Pros and cons of each: narrow spectrum antimicrobials are normally better to use because they cause less damage to the body's normal flora. They are less likely to produce drug-resistant strains of microorganisms because they are specific in nature, and they are less likely to cause superinfection by opportunistic microorganisms such as yeasts. The main disadvantage is that narrow spectrum antimicrobials are sometimes more prone to causing allergic reactions in the host. Broad spectrum antimicrobials, on the other hand, have the advantage of being able to deal with more than one kind of bacteria, so one does not have to use drugs indiscriminately, reducing the chances of allergic reactions and drug toxicity. The main disadvantage is that they cause more harm to the body's normal flora.

C. What is direct selection?
Direct selection is the selection of antibiotic-resistant normal flora in an individual whenever this individual is given an antibiotic. This process is normally accelerated significantly by either improper use or overuse of antibiotics.

D. What is the difference between an antibiotic and an antimicrobial chemical?
Antibiotics are substances that are produced as metabolic products of one microorganism and are able to inhibit or kill other microorganisms. Antimicrobial chemicals are chemicals that are synthesized in a laboratory and can be used therapeutically on microorganisms.

E. What is the mode of action for each of the following?
a. Bacitracin: works by inhibiting peptidoglycan synthesis in actively dividing bacteria, which normally results in osmotic lysis.
b. Nystatin: exerts its antifungal activity by binding to ergosterol found in fungal cell membranes. Binding to ergosterol causes the formation of pores in the membrane; potassium and other cellular constituents leak from the pores, causing cell death.
c. Tetracycline: exerts its bacteriostatic effect by inhibiting protein synthesis in bacteria. This antibiotic prevents transfer-RNA (tRNA) molecules from binding to the 30S subunit of bacterial ribosomes.
d. Ciprofloxacin: contains agents that inhibit one or more enzymes in the DNA synthesis pathway.

F. Describe three mechanisms by which microbes might become resistant to the action of an antimicrobial drug.
Microbes may become resistant by producing enzymes that detoxify or inactivate the antibiotic, such as penicillinase and other beta-lactamases. Microbes may also alter the target site in the bacterium to reduce or block binding of the antibiotic, for example producing a slightly altered ribosomal subunit that still functions but to which the drug can't bind. Microbes may also prevent the transport of the antimicrobial agent into the bacterium by producing an altered cytoplasmic membrane or outer membrane.
G. Why do you think neglecting to finish a prescribed course of antibiotics might contribute to the rise of antibiotic resistance?
If you don't finish the medication, all the bacteria causing the infection may not be killed. Then the infection could come back in that same place or even show up somewhere else. When the bacteria are undertreated, some of them may have enough time for resistance mutations to occur in their DNA. Then, when they multiply, you get a population of bacteria that no longer responds to the antibiotics.

H. What is a tube dilution test? How is it used to determine susceptibility?
The tube dilution test is one of the tests that can be used to tell which antimicrobial agent is most likely to combat a specific pathogen. This test is conducted by preparing a series of culture tubes, where each tube contains a liquid medium and a different concentration of an antimicrobial agent. These tubes are then inoculated with the test organism and incubated. After the incubation they are examined for growth.
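As a rough illustration of how the readings from such a dilution series are turned into a susceptibility result, the short Python sketch below finds the minimum inhibitory concentration (MIC), the lowest concentration with no visible growth, and compares it with a breakpoint. The concentrations, growth readings and breakpoint are invented example values, not results from an actual laboratory.

    # Hypothetical tube dilution (broth dilution) readings: one entry per tube.
    # Each tube holds a two-fold dilution of the antimicrobial agent; True means
    # visible growth after incubation, False means no growth.
    readings = {
        128.0: False,  # concentration in ug/mL -> growth observed?
        64.0: False,
        32.0: False,
        16.0: False,
        8.0: True,
        4.0: True,
        2.0: True,
    }

    def minimum_inhibitory_concentration(readings):
        """Return the lowest concentration that completely inhibited growth,
        or None if the organism grew in every tube."""
        inhibitory = [conc for conc, growth in readings.items() if not growth]
        return min(inhibitory) if inhibitory else None

    mic = minimum_inhibitory_concentration(readings)
    # An illustrative (not clinical) breakpoint for calling the isolate susceptible.
    BREAKPOINT_UG_PER_ML = 32.0
    if mic is None:
        print("Growth in all tubes: resistant at every tested concentration")
    else:
        verdict = "susceptible" if mic <= BREAKPOINT_UG_PER_ML else "resistant"
        print(f"MIC = {mic} ug/mL -> organism judged {verdict}")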

Wednesday, August 21, 2019

Data Pre-processing Tool

Chapter 2

Real life data rarely comply with the requirements of various data mining tools. They are usually inconsistent and noisy, and may contain redundant attributes, unsuitable formats, etc. Hence data has to be prepared vigilantly before the data mining actually starts. It is a well known fact that the success of a data mining algorithm is very much dependent on the quality of data processing. Data processing is one of the most important tasks in data mining, and in this context it is natural that data pre-processing is a complicated task involving large data sets. Sometimes data pre-processing takes more than 50% of the total time spent in solving the data mining problem. It is therefore crucial for data miners to choose efficient data pre-processing techniques for a specific data set, which can not only save processing time but also retain the quality of the data for the data mining process.

A data pre-processing tool should help miners with many data mining activities. For example, data may be provided in different formats, as discussed in the previous chapter (flat files, database files, etc.). Data files may also have different formats of values, and pre-processing may involve calculation of derived attributes, data filters, joined data sets, etc. The data mining process generally starts with understanding of the data, and in this stage pre-processing tools may help with data exploration and data discovery tasks. Data processing includes lots of tedious work; data pre-processing generally consists of data cleaning, data integration, data transformation, and data reduction. In this chapter we will study all these data pre-processing activities.

2.1 Data Understanding

In the data understanding phase the first task is to collect initial data and then proceed with activities in order to become familiar with the data, to discover data quality problems, to gain first insights into the data, or to identify interesting subsets to form hypotheses about hidden information. The data understanding phase according to the CRISP-DM model consists of the following tasks.

2.1.1 Collect Initial Data

The initial collection of data includes loading of data if this is required for data understanding. For instance, if a specific tool is applied for data understanding, it makes great sense to load your data into this tool. This step possibly leads to initial data preparation steps. However, if data is obtained from multiple data sources then integration is an additional issue.

2.1.2 Describe Data

Here the gross or surface properties of the gathered data are examined.

2.1.3 Explore Data

This task is required to handle the data mining questions, which may be addressed using querying, visualization and reporting. These include: distributions of key attributes, for instance the goal attribute of a prediction task; relations between pairs or small numbers of attributes; results of simple aggregations; properties of important sub-populations; and simple statistical analyses.

2.1.4 Verify Data Quality

In this step the quality of the data is examined. It answers questions such as: Is the data complete (does it cover all the cases required)? Is it accurate, or does it contain errors, and if there are errors how common are they? Are there missing values in the data? If so, how are they represented, where do they occur and how common are they?
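A minimal sketch of the explore-data and verify-data-quality tasks described above, written in Python with the pandas library, is given below. The file name customers.csv and the column names age, income and city are hypothetical examples used only for illustration; they are not part of any specific tool discussed in this chapter.

    import pandas as pd

    # Load a hypothetical customer data set (file name and columns are examples).
    df = pd.read_csv("customers.csv")

    # Describe data: gross/surface properties of the gathered data.
    print(df.shape)            # number of records and attributes
    print(df.dtypes)           # attribute types
    print(df.describe())       # simple statistical analyses of numeric attributes

    # Explore data: distribution of a key attribute and a simple aggregation.
    print(df["city"].value_counts())             # distribution of one attribute
    print(df.groupby("city")["income"].mean())   # relation between two attributes

    # Verify data quality: completeness and missing values.
    missing_per_column = df.isna().sum()
    print(missing_per_column)                    # where do missing values occur?
    completeness = 1.0 - df.isna().mean().mean() # share of cells that are filled
    print(f"Overall completeness: {completeness:.1%}")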
2.2 Data Preprocessing

The data preprocessing phase focuses on the pre-processing steps that produce the data to be mined. Data preparation, or preprocessing, is one of the most important steps in data mining. Industrial practice indicates that once data is well prepared, the mined results are much more accurate, which means this step is also very critical for the success of a data mining method. Among others, data preparation mainly involves data cleaning, data integration, data transformation, and data reduction.

2.2.1 Data Cleaning

Data cleaning is also known as data cleansing or scrubbing. It deals with detecting and removing inconsistencies and errors from data in order to get better quality data. When a single data source such as a flat file or a database is used, data quality problems arise due to misspellings during data entry, missing information or other invalid data. When the data is taken from the integration of multiple data sources, such as data warehouses, federated database systems or global web-based information systems, the requirement for data cleaning increases significantly. This is because the multiple sources may contain redundant data in different formats. Consolidation of the different data formats and elimination of redundant information become necessary in order to provide access to accurate and consistent data. Good quality data requires passing a set of quality criteria. Those criteria include:

Accuracy: an aggregated value over the criteria of integrity, consistency and density.
Integrity: an aggregated value over the criteria of completeness and validity.
Completeness: achieved by correcting data containing anomalies.
Validity: approximated by the amount of data satisfying integrity constraints.
Consistency: concerns contradictions and syntactical anomalies in the data.
Uniformity: directly related to irregularities in the data.
Density: the quotient of missing values in the data and the number of total values that ought to be known.
Uniqueness: related to the number of duplicates present in the data.

2.2.1.1 Terms Related to Data Cleaning

Data cleaning: the process of detecting, diagnosing, and editing damaged data.
Data editing: changing the value of data which are incorrect.
Data flow: the passing of recorded information through succeeding information carriers.
Inliers: data values falling inside the projected range.
Outliers: data values falling outside the projected range.
Robust estimation: the estimation of statistical parameters using methods that are less sensitive to the effect of outliers than more conventional methods.

2.2.1.2 Definition: Data Cleaning

Data cleaning is a process used to identify imprecise, incomplete, or irrational data and then to improve quality through the correction of detected errors and omissions. This process may include format checks, completeness checks, reasonableness checks, limit checks, review of the data to identify outliers or other errors, and assessment of the data by subject area experts (e.g. taxonomic specialists). Through this process suspected records are flagged, documented and subsequently checked, and finally these suspected records can be corrected. Sometimes validation checks also involve checking for compliance against applicable standards, rules, and conventions. The general framework for data cleaning is given as: define and determine error types; search and identify error instances; correct the errors; document error instances and error types; and modify data entry procedures to reduce future errors.
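To make two of the quality criteria listed above concrete, the following sketch computes density (here, following the definition above, the share of values that ought to be known but are missing) and uniqueness as simple ratios over a pandas DataFrame. The tiny data set is invented for the example.

    import pandas as pd

    def density(df: pd.DataFrame) -> float:
        """Quotient of missing values over the number of values that ought to
        be known (as defined in the criteria list above)."""
        return float(df.isna().sum().sum()) / df.size

    def uniqueness(df: pd.DataFrame) -> float:
        """Share of records that are not exact duplicates of another record."""
        return 1.0 - df.duplicated().mean()

    # Example usage with a tiny made-up data set.
    df = pd.DataFrame({
        "name": ["John Smith", "J. Smith", "Ann Lee", "Ann Lee"],
        "income": [52000, None, 61000, 61000],
    })
    print(f"density    = {density(df):.3f}")     # 0.125 -> one of eight cells is missing
    print(f"uniqueness = {uniqueness(df):.2f}")  # 0.75  -> one of four records is an exact duplicate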
The data cleaning process is referred to by different people using a number of terms; which one is used is a matter of preference. These terms include: error checking, error detection, data validation, data cleaning, data cleansing, data scrubbing and error correction. We use data cleaning to encompass three sub-processes: data checking and error detection; data validation; and error correction. A fourth sub-process, improvement of the error prevention processes, could perhaps be added.

2.2.1.3 Problems with Data
Here we note some key problems with data.
Missing data: this problem occurs for two main reasons: data are absent from a source where they are expected to be present, or data are present but not available in an appropriate form. Detecting missing data is usually straightforward.
Erroneous data: this problem occurs when a wrong value is recorded for a real-world value. Detection of erroneous data can be quite difficult (for instance, the incorrect spelling of a name).
Duplicated data: this problem occurs for two reasons: repeated entry of the same real-world entity with somewhat different values, or the same real-world entity appearing under different identifications. Repeated records are common and frequently easy to detect; different identifications of the same real-world entity can be a very hard problem to identify and solve.
Heterogeneities: when data from different sources are brought together in one analysis, heterogeneity may occur. Structural heterogeneity arises when the data structures reflect different business usage; semantic heterogeneity arises when the meaning of data is different in each system being combined. Heterogeneities are usually very difficult to resolve because they involve a lot of contextual data that is not well defined as metadata.
Information dependencies between different sets of attributes are commonly present, and wrong cleaning mechanisms can further damage the information in the data. Various analysis tools handle these problems in different ways; commercial offerings are available that assist the cleaning process, but they are often problem specific. Uncertainty in information systems is a well-recognized hard problem. A very simple example of missing and erroneous data is shown in the following figure.

Extensive support for data cleaning must be provided by data warehouses. Data warehouses have a high probability of "dirty data" since they load and continuously refresh huge amounts of data from a variety of sources. Since data warehouses are used for strategic decision making, the correctness of their data is important in order to avoid wrong decisions. The ETL (Extraction, Transformation, and Loading) process for building a data warehouse is illustrated in the following figure. Data transformations are concerned with schema or data translation and integration, and with filtering and aggregating data to be stored in the warehouse. All data cleaning is classically performed in a separate data staging area prior to loading the transformed data into the warehouse. A large number of tools of varying functionality are available to support these tasks, but often a significant portion of the cleaning and transformation work has to be done manually or by low-level programs that are difficult to write and maintain.
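As a small, hedged illustration of how the first three problem types above might be surfaced in practice, the sketch below checks a data set for missing values, out-of-range (erroneous) values and duplicate records. The column names, the example values and the valid age range are assumptions made only for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 102, 104],
    "name": ["Alice", "Bob", "Bob", None],
    "age":  [34, -5, 27, 41],          # -5 is an erroneous value
})

# Missing data: count absent values per attribute
print(df.isna().sum())

# Erroneous data: values outside a plausible range for the attribute
print(df[~df["age"].between(0, 120)])

# Duplicated data: repeated entries for the same real-world entity
print(df[df.duplicated(subset=["customer_id", "name"], keep=False)])
```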
A data cleaning method should ensure the following: it should detect and remove all major errors and inconsistencies, both in an individual data source and when integrating multiple sources; it should be supported by tools that limit manual inspection and programming effort, and it should be extensible so that additional sources can be covered; it should be performed together with schema-related data transformations based on metadata; and data cleaning mapping functions should be specified in a declarative way and be reusable for other data sources.

2.2.1.4 Data Cleaning: Phases
1. Analysis: to identify errors and inconsistencies in the database a detailed analysis is needed, involving both manual inspection and automated analysis programs. This reveals where (most of) the problems are.
2. Defining transformation and mapping rules: after discovering the problems, this phase defines how the solutions for cleaning the data will be automated. The analysis phase yields a list of problems that translate into a list of activities, for example: remove all entries for J. Smith because they are duplicates of John Smith; find entries with 'bule' in the colour field and change them to 'blue'; find all records where the phone number field does not match the pattern (NNNNN NNNNNN). Further steps for cleaning the data are then applied. (A small code sketch of such rules follows below.)
3. Verification: in this phase we check and assess the transformation plans made in phase 2. Without this step we may end up making the data dirtier rather than cleaner. Since data transformation is the step that actually changes the data itself, we need to be sure that the applied transformations will do so correctly; therefore test and examine the transformation plans very carefully. Example: suppose we have a very thick C++ book in which the word 'strict' appears in all the places where it should say 'struct'.
4. Transformation: once it is certain that cleaning will be done correctly, apply the transformations verified in the last step. For large databases this task is supported by a variety of tools.

Backflow of cleaned data: in data mining the main objective is to convert and move clean data into the target system, which creates a requirement to cleanse legacy data as well. Cleansing can be a complicated process depending on the technique chosen, and it has to be designed carefully to achieve the objective of removing dirty data. Methods to accomplish data cleansing of legacy systems include automated data cleansing, manual data cleansing, and a combined cleansing process.

2.2.1.5 Missing Values
Data cleaning addresses a variety of data quality problems, including noise and outliers, inconsistent data, duplicate data, and missing values. Missing values are one important problem to be addressed. The missing value problem occurs because many tuples may have no recorded value for several attributes. For example, consider a customer sales database consisting of a large number of records (say around 100,000) where some records have certain fields missing; say, customer income may be missing from the sales data. The goal is to find a way to predict what the missing data values should be (so that they can be filled in) based on the existing data.
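Here is a minimal sketch of the transformation and mapping rules listed in phase 2 above (duplicate removal, value correction, and a format check on the phone field). The table and column names are assumptions made for illustration; a real cleaning pipeline would be driven by whatever the analysis phase actually uncovered.

```python
import pandas as pd

df = pd.DataFrame({
    "name":   ["John Smith", "J. Smith", "Mary Jones"],
    "colour": ["blue", "bule", "green"],
    "phone":  ["01234 567890", "01234 567890", "1234-567"],
})

# Rule 1: remove entries for "J. Smith" because they duplicate "John Smith"
df = df[df["name"] != "J. Smith"]

# Rule 2: find entries with 'bule' in the colour field and change them to 'blue'
df["colour"] = df["colour"].replace("bule", "blue")

# Rule 3: flag records whose phone number does not match the pattern NNNNN NNNNNN
df["phone_ok"] = df["phone"].str.match(r"^\d{5} \d{6}$")

print(df)
```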
Missing data may arise for the following reasons: equipment malfunction; data that were inconsistent with other recorded data and were therefore deleted; data not entered due to misunderstanding; certain data not considered important at the time of entry; or history or changes of the data not being registered.

How to handle missing values? Dealing with missing values is a recurring question that depends on the actual meaning of the data. There are several methods for handling missing entries (a short sketch of methods 2 to 4 follows below).
1. Ignore the data row. One solution is simply to ignore the entire data row. This is generally done when the class label is missing (assuming the data mining goal is classification) or when many attributes are missing from the row, not just one. However, if the percentage of such rows is high, performance will definitely suffer.
2. Use a global constant to fill in missing values. We can fill in a global constant such as "unknown", "N/A" or minus infinity. This is done because at times it simply does not make sense to try to predict the missing value. For example, if the office address is missing for some customers in a sales database, filling it in does not make much sense. This method is simple but not foolproof.
3. Use the attribute mean. For example, if the average income of a family is X, that value can be used to replace missing income values in the customer sales database.
4. Use the attribute mean for all samples belonging to the same class. Suppose you have a car pricing database that, among other things, classifies cars as Luxury or Low budget, and you are dealing with missing values in the cost field. Replacing the missing cost of a luxury car with the average cost of all luxury cars is probably more accurate than the value you would get by also factoring in the low-budget cars.
5. Use a data mining algorithm to predict the value. The value can be determined using regression, inference-based tools using a Bayesian formalism, decision trees, clustering algorithms, and so on.

2.2.1.6 Noisy Data
Noise can be defined as a random error or variance in a measured variable. Because of this randomness it is very difficult to follow a fixed strategy for removing noise from data. Real-world data are not always faultless; they can suffer from corruption that affects the interpretation of the data, the models created from the data, and the decisions made based on the data. Incorrect attribute values can be present for the following reasons: faulty data collection instruments; data entry problems; duplicate records; incomplete data; inconsistent data; incorrect processing; data transmission problems; technology limitations; inconsistency in naming conventions; and outliers.

How to handle noisy data? The main methods for removing noise from data are:
1. Binning: first sort the data and partition it into (equal-frequency) bins, then smooth each bin using bin means, bin medians or bin boundaries.
2. Regression: smoothing is done by fitting the data to regression functions.
3. Clustering: clustering detects and removes outliers from the data.
4. Combined computer and human inspection: the computer detects suspicious values, which are then checked by human experts (e.g. to deal with possible outliers).
These methods are explained in more detail below.
Binning: a data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values.
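Before working through a binning example, here is a short sketch of the missing-value strategies from Section 2.2.1.5 (global constant, attribute mean, and attribute mean per class). It assumes a pandas DataFrame with hypothetical segment, income and office_address columns, and is meant only as an illustration of the strategies, not as a recommended default.

```python
import pandas as pd

df = pd.DataFrame({
    "segment": ["Luxury", "Luxury", "Low budget", "Low budget"],
    "income":  [90000, None, 30000, None],
    "office_address": [None, "12 High St", None, "3 Mill Rd"],
})

# Strategy 2: fill with a global constant where prediction makes no sense
df["office_address"] = df["office_address"].fillna("unknown")

# Strategy 3: fill with the overall attribute mean
df["income_mean_filled"] = df["income"].fillna(df["income"].mean())

# Strategy 4: fill with the attribute mean of samples in the same class
class_means = df.groupby("segment")["income"].transform("mean")
df["income_class_filled"] = df["income"].fillna(class_means)

print(df)
```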
For instance, age can be changed to bins such as 20 or under, 21-40, 41-65 and over 65. Binning methods smooth a sorted data set by consulting the values around each value; this is therefore called local smoothing. Consider the following binning example (a short code sketch of this example appears below).

Binning methods:
- Equal-width (distance) partitioning: divides the range into N intervals of equal size (a uniform grid). If A and B are the lowest and highest values of the attribute, the width of the intervals will be W = (B-A)/N. This is the most straightforward method, but outliers may dominate the presentation and skewed data is not handled well.
- Equal-depth (frequency) partitioning: divides the range (the values of a given attribute) into N intervals, each containing approximately the same number of samples (elements). It gives good data scaling, but managing categorical attributes can be tricky.
- Smoothing by bin means: each bin value is replaced by the mean of the values in its bin.
- Smoothing by bin medians: each bin value is replaced by the median of the values in its bin.
- Smoothing by bin boundaries: each bin value is replaced by the closest boundary value of its bin.

Example. Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34.
Partition into equal-frequency (equi-depth) bins:
  Bin 1: 4, 8, 9, 15
  Bin 2: 21, 21, 24, 25
  Bin 3: 26, 28, 29, 34
Smoothing by bin means (for example, the mean of 4, 8, 9, 15 is 9):
  Bin 1: 9, 9, 9, 9
  Bin 2: 23, 23, 23, 23
  Bin 3: 29, 29, 29, 29
Smoothing by bin boundaries:
  Bin 1: 4, 4, 4, 15
  Bin 2: 21, 21, 25, 25
  Bin 3: 26, 26, 26, 34

Regression: regression is a data mining technique used to fit an equation to a dataset. The simplest form is linear regression, which uses the formula of a straight line (y = b + wx) and determines suitable values for b and w to predict the value of y based on a given value of x. More sophisticated techniques, such as multiple regression, permit the use of more than one input variable and allow the fitting of more complex models, such as a quadratic equation. Regression is described further in a subsequent chapter, when predictions are discussed.

Clustering: clustering is a method of grouping data into different groups so that the data in each group share similar trends and patterns. Clustering algorithms constitute a major class of data mining algorithms; they automatically partition the data space into a set of regions or clusters. The goal of the process is to find all sets of similar examples in the data in some optimal fashion. The following figure shows three clusters; values that fall outside the clusters are outliers.

Combined computer and human inspection: these methods find suspicious values using computer programs, which are then verified by human experts. Through this process all outliers are checked.

2.2.1.7 Data Cleaning as a Process
Data cleaning is the process of detecting, diagnosing, and editing data. It is a three-stage method involving a repeated cycle of screening, diagnosing, and editing suspected data abnormalities. Many data errors are detected incidentally during study activities; however, it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always immediately clear whether a data point is erroneous, and many cases require careful examination. Likewise, missing values require additional checks. Therefore, predefined rules for dealing with errors and with true missing and extreme values are part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis data.
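Returning to the price example above, the following sketch reproduces equal-frequency binning and smoothing by bin means in plain Python, using the same twelve price values. The helper names are hypothetical and the sketch assumes the number of values divides evenly into the chosen number of bins.

```python
# Equal-frequency (equi-depth) binning with smoothing by bin means
prices = sorted([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])
n_bins = 3
bin_size = len(prices) // n_bins

bins = [prices[i * bin_size:(i + 1) * bin_size] for i in range(n_bins)]
smoothed_by_means = [[round(sum(b) / len(b))] * len(b) for b in bins]

for original, smoothed in zip(bins, smoothed_by_means):
    print(original, "->", smoothed)
# [4, 8, 9, 15]    -> [9, 9, 9, 9]
# [21, 21, 24, 25] -> [23, 23, 23, 23]
# [26, 28, 29, 34] -> [29, 29, 29, 29]
```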
In small studies, with the examiner intimately involved at all stages, there may be little or no difference between a database and an analysis dataset. During as well as after treatment, the diagnostic and treatment phases of cleaning need insight into the sources and types of errors at all stages of the study. The concept of data flow is therefore crucial in this respect. After measurement, research data go through repeated steps: they are entered into information carriers, extracted, transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of the data flow, including during data cleaning itself. Most of these problems are due to human error.

Inaccuracy of a single data point or measurement may be tolerable and related to the inherent technical error of the measurement device. The process of data cleaning must therefore focus on those errors that are beyond small technical variations and that form a major shift within or beyond the population distribution. In turn, it must be based on an understanding of technical errors and of the expected ranges of normal values. Some errors deserve higher priority, but which ones are most significant is highly study-specific. For instance, in most medical epidemiological studies, errors that need to be cleaned at all costs include missing gender, gender misspecification, birth date or examination date errors, duplication or merging of records, and biologically impossible results. Another example comes from nutrition studies, where date errors lead to age errors, which in turn lead to errors in weight-for-age scoring and, further, to misclassification of subjects as under- or overweight. Errors of sex and date are particularly important because they contaminate derived variables. Prioritization is essential if the study is under time pressure or if resources for data cleaning are limited.

2.2.2 Data Integration
Data integration is the process of taking data from one or more sources and mapping it, field by field, onto a new data structure. The idea is to combine data from multiple sources into a coherent form. Many data mining projects require data from multiple sources because:
- data may be distributed over different databases or data warehouses (for example, an epidemiological study that needs information about hospital admissions and car accidents);
- data may be required from different geographic regions, or historical data may be needed (e.g. integrating historical data into a new data warehouse);
- there may be a need to enhance the data with additional (external) data, to improve data mining precision.

2.2.2.1 Data Integration Issues
There are a number of issues in data integration. Consider two database tables, Database Table 1 and Database Table 2. In integrating these two tables a variety of issues are involved, such as:
1. The same attribute may have different names (for example, in the tables above, Name and Given Name are the same attribute with different names).
2. An attribute may be derived from another (for example, the attribute Age is derived from the attribute DOB).
3. Attributes might be redundant (for example, the attribute PID is redundant).
4. Values in attributes might differ (for example, for PID 4791 the values in the second and third fields differ between the two tables).
5. Duplicate records may appear under different keys (the same record may be replicated with different key values).
Schema integration and object matching can therefore be tricky. The question is how equivalent entities from different sources are matched; this is known as the entity identification problem. Conflicts have to be detected and resolved. Integration becomes easier if unique entity keys are available in all the data sets (or tables) to be linked. Metadata can help in schema integration (metadata for each attribute includes, for example, its name, meaning, data type and the range of values permitted).

2.2.2.2 Redundancy
Redundancy is another important issue in data integration. Two given attributes (such as DOB and Age in the tables above) may be redundant if one can be derived from the other attribute or from a set of attributes. Inconsistencies in attribute or dimension naming can also lead to redundancies in the given data sets.

Handling redundant data. Data redundancy problems can be handled in the following ways:
- use correlation analysis;
- consider different codings or representations (e.g. metric versus imperial measures);
- careful (manual) integration of the data can reduce or prevent redundancies (and inconsistencies);
- de-duplication (also called internal data linkage), used when no unique entity keys are available, based on analysing attribute values to find duplicates;
- process redundant and inconsistent data (easy if the values are the same): delete one of the values, average the values (only for numerical attributes), or take the majority value (if there are more than two duplicates and some values agree).

Correlation analysis is explained in more detail here. Correlation analysis (also called Pearson's product moment coefficient): some redundancies can be detected using correlation analysis. Given two attributes, such analysis measures how strongly one attribute implies the other. For numerical attributes we can compute the correlation coefficient of two attributes A and B to evaluate the correlation between them:

r_{A,B} = \frac{\sum_{i=1}^{n}(a_i - \bar{A})(b_i - \bar{B})}{n\,\sigma_A\,\sigma_B} = \frac{\sum_{i=1}^{n} a_i b_i - n\bar{A}\bar{B}}{n\,\sigma_A\,\sigma_B}

where n is the number of tuples, \bar{A} and \bar{B} are the respective means of A and B, \sigma_A and \sigma_B are the respective standard deviations of A and B, and \sum a_i b_i is the sum of the AB cross-product.
a. The value of r_{A,B} lies between -1 and +1. If r_{A,B} is greater than zero, A and B are positively correlated: the values of A increase as the values of B increase, and the higher the value, the stronger the correlation.
b. If r_{A,B} is equal to zero, A and B are independent of each other and there is no (linear) correlation between them.
c. If r_{A,B} is less than zero, A and B are negatively correlated: as the value of one attribute increases, the value of the other decreases, that is, each attribute discourages the other.
It is important to note that correlation does not imply causality. That is, if A and B are correlated, this does not necessarily mean that A causes B or that B causes A. For example, in analyzing a demographic database we may find that the attributes representing the number of accidents and the number of car thefts in a region are correlated. This does not mean that one causes the other; both may be related to a third attribute, namely population.

For discrete (categorical) data, a correlation between two attributes can be discovered by a χ² (chi-square) test. Let A have c distinct values a1, a2, ..., ac and B have r distinct values b1, b2, ..., br. The data tuples described by A and B can be shown as a contingency table, with the c values of A making up the columns and the r values of B making up the rows. The χ² value is computed over every (Ai, Bj) cell of the table as

\chi^2 = \sum_{i=1}^{c}\sum_{j=1}^{r} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}
where o_{ij} is the observed frequency (i.e. the actual count) of the joint event (Ai, Bj) and e_{ij} is the expected frequency, which can be computed as

e_{ij} = \frac{count(A = a_i) \times count(B = b_j)}{N}

where N is the number of data tuples, count(A = a_i) is the number of tuples having value ai for A, and count(B = b_j) is the number of tuples having value bj for B. The larger the χ² value, the more likely it is that the variables are related. The cells that contribute the most to the χ² value are those whose actual count is very different from the expected count.

Chi-square calculation: an example. Suppose a group of 1,500 people were surveyed. The gender of each person was noted, and each person was asked whether their preferred type of reading material was fiction or non-fiction. The observed frequency of each possible joint event is summarized in the following table (the numbers in parentheses are the expected frequencies). Calculate chi-square.

                 Male        Female       Sum (row)
  Fiction        250 (90)    200 (360)    450
  Non-fiction    50 (210)    1000 (840)   1050
  Sum (col.)     300         1200         1500

For example, e11 = count(male) x count(fiction) / N = 300 x 450 / 1500 = 90, and so on. The chi-square value is then

\chi^2 = \frac{(250-90)^2}{90} + \frac{(50-210)^2}{210} + \frac{(200-360)^2}{360} + \frac{(1000-840)^2}{840} = 284.44 + 121.90 + 71.11 + 30.48 \approx 507.93

For this table the degrees of freedom are (2-1)(2-1) = 1, since the table is 2x2. For 1 degree of freedom, the χ² value needed to reject the hypothesis at the 0.001 significance level is 10.828 (taken from the table of upper percentage points of the χ² distribution, available in any statistics textbook). Since the computed value is well above this, we can reject the hypothesis that gender and preferred reading are independent and conclude that the two attributes are strongly correlated for the given group.

Duplication must also be detected at the tuple level. The use of denormalized tables is another source of redundancy, and redundancies may further lead to data inconsistencies (when some copies are updated but not others).

2.2.2.3 Detection and Resolution of Data Value Conflicts
Another significant issue in data integration is the detection and resolution of data value conflicts. For example, for the same entity, attribute values from different sources may differ; weight may be stored in metric units in one source and in British imperial units in another. For instance, for a hotel cha
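Returning to the correlation and chi-square checks described above, the following sketch verifies the worked example numerically. It assumes numpy and scipy are available; correction=False disables the Yates continuity correction so the result matches the hand calculation, and the two numerical attributes at the end are invented values used only to illustrate the Pearson coefficient.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency table from the example: rows = fiction / non-fiction,
# columns = male / female
observed = np.array([[250, 200],
                     [50, 1000]])

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(expected)              # [[ 90. 360.] [210. 840.]]  matches the hand-computed e_ij
print(chi2, dof, p_value)    # roughly 507.9, 1 degree of freedom, p far below 0.001

# Pearson correlation between two numerical attributes A and B
# (hypothetical values, only to illustrate np.corrcoef)
A = np.array([2.0, 4.0, 6.0, 8.0])
B = np.array([1.0, 2.0, 2.9, 4.1])
print(np.corrcoef(A, B)[0, 1])   # close to +1: strongly positively correlated
```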