CBSE Class 12 Case Studies In Business Studies – Controlling

CONTROLLING

Controlling: Definition

Controlling means ensuring that activities in an organisation are performed as per the plans.

Importance of Controlling

  • It helps in accomplishing organisational goals by constantly monitoring the performance of the employees and bringing to light the deviations, if any, and taking appropriate corrective action.
  • It helps the business managers to judge the objectivity and accuracy of the standards.
  • It seeks to make efficient use of resources.
  • It seeks to motivate the employees and helps them in giving a better performance.
  • It creates an atmosphere of order and discipline in the organisation.
  • It facilitates coordination in action by providing direction to all activities within and among departments.

Features of Controlling

  • It is a goal-oriented function.
  • It is a pervasive function as it is used in the organisations of varying types and sizes.
  • It is considered to be a forward looking function as it helps to improve the planning by providing valuable feedback for reviewing and revising the standards.
  • It is considered to be a backward looking function as it is like the post mortem of the past activities to ascertain the deviations if any.
  • It is not the last function of management as it brings the management cycle back to the planning function.

Steps Involved in the Controlling Process

  • Setting performance standards in clear, specific and measurable terms.
  • Measurement of actual performance as far as possible in the same units in which standards are set.
  • Comparing actual performance with standards to identify deviations if any.
  • Analysing deviations through critical point control and management by exception approaches to identify the causes for their occurrence.
  • Taking corrective action whenever the deviation occurs beyond the permissible limits so that it does not reoccur in future.
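The steps above can be sketched as one pass of a control cycle for a single measure. This is only an illustrative sketch: the function name `control_cycle`, the figures used and the choice to flag deviations in both directions are assumptions, not part of the syllabus text.

```python
from dataclasses import dataclass


@dataclass
class ControlResult:
    deviation: float    # actual minus standard, in the same units
    needs_action: bool  # True when the deviation is beyond the permissible limit


def control_cycle(standard: float, actual: float, permissible_limit: float) -> ControlResult:
    """One pass of the controlling process for a single measure (hypothetical helper)."""
    # Steps 1 and 2 (setting the standard, measuring performance) arrive as arguments.
    # Step 3: compare actual performance with the standard.
    deviation = actual - standard
    # Steps 4 and 5: only a deviation beyond the permissible limit calls for
    # corrective action. Flagging both shortfalls and surpluses is an assumption.
    needs_action = abs(deviation) > permissible_limit
    return ControlResult(deviation, needs_action)


# Hypothetical figures: standard of 100 units, actual output of 90, tolerance of 5.
result = control_cycle(standard=100, actual=90, permissible_limit=5)
print(result.deviation, result.needs_action)
```

A deviation that stays inside the permissible limit returns `needs_action=False`, which mirrors the idea that corrective action is taken only when the limit is crossed.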

Relationship between Planning and Controlling

  • Planning without controlling is useless and controlling without planning is blind.
  • Planning provides the basis of controlling by setting the standards in advance. In the absence of these standards, managers will not know what all activities have to be controlled.
  • Planning is prescriptive in nature, whereas controlling is evaluative.
  • Thus, planning and controlling are interrelated and interdependent. As planning is based on facts, it makes controlling easier and effective whereas controlling helps to improve future planning by providing valuable information derived from the past experiences.

LATEST CBSE QUESTIONS

Question 1. Hina Sweets is a renowned name for quality sweets since 1935. Harsh the owner of Hina Sweets was worried as the sales had declined during the last three months. When he enquired from the Sales Manager, the Sales Manager reported that there were some complaints about the quality of sweets. Therefore Harsh ordered for sample checking of sweets. Identify the step taken by Harsh that is related to one of the functions of management. (CBSE, Delhi 2017) Answer: Measurement of actual performance is the step in controlling process being described.

Question 2. State the steps in the process of controlling. (CBSE, Delhi 2017) Answer: The various steps involved in the controlling process are described below:

  • Setting performance standards: The first step in the controlling process involves setting standards in clear, specific and measurable terms. Standards can be set in both quantitative as well as qualitative terms. It is important that standards should be flexible enough to be modified with the changes taking place in the internal and external business environment.
  • Measurement of actual performance: The next step relates to the measurement of actual performance. Performance should be measured in an objective and reliable manner. As far as possible, performance should be measured in the same units in which standards are set as this would make their comparison easier. Depending upon the nature of work various techniques for measurement of performance like personal observation, sample checking, performance reports, etc. may be used.
  • Comparison of actual performance with standards: This step involves comparison of actual performance with the standard. Such comparison reveals the deviations, if any. If the actual performance is more than the planned performance, the deviation is said to be positive; if it is less, the deviation is negative.
  • Critical point control: All the deviations may not be significant. Moreover, it may be neither economical nor easy to monitor each and every activity in the organisation. Therefore, every organisation identifies and states its specific key result areas (KRAs) or critical points which require tight control and are likely to have a significant effect on the working of the business. Any deviations on these points are attended to urgently by the management. For example, if the expenditure on refreshment of workers goes up by 10% it can be ignored, but if the production cost goes up by 5% it may call for managerial action.
  • Management by exception: Management by exception is the principle of management control which is based on the belief that if you try to control everything, you may end up controlling nothing. Therefore, only significant deviations which go beyond the permissible limits should be brought to the notice of the management. For example, output defects up to 2% may be considered acceptable, but if it goes up by 5% it may call for managerial action.
  • Taking corrective action: This is the final step involved in the controlling process. When the deviations are within acceptable limits no corrective action is required. However, when the deviations go beyond the acceptable range, especially in the important areas, it demands immediate managerial attention so that deviations do not occur again and standards are accomplished. Corrective action might involve training of employees, buying new machinery, increasing supervision and so on.
  • Planning is based on facts; it makes the controlling process easier and more effective.
  • Controlling also adds to the effectiveness of the planning process by providing valuable feedback based on past experiences.

Question 4. State any five points that highlight the importance of ‘controlling’ function of management. (CBSE, Delhi 2017) Answer: The importance of controlling function of management is described below:

  • Accomplishing organisational goals: The controlling function facilitates constant monitoring of the actual performance in comparison to the predetermined standards and brings to light the deviations, if any, and indicates corrective action. All these activities ensure that organisational goals are realised efficiently and effectively.
  • Judging accuracy of standards: A good control system enables management to verify whether the standards set are accurate and objective. Moreover, it helps to review and revise the standards in light of changes taking place in the organisation or business environment in general.
  • Making efficient use of resources: By implementing a good control system a manager seeks to reduce wastage and spoilage of resources. This is because each activity is performed in accordance with predetermined standards and norms rather than by the hit and trial method.
  • Improving employee motivation: An effective control system seeks to provide motivation to the employees as they are made aware well in advance what they are expected to do and what are the standards of performance on the basis of which they will be appraised. This approach helps them to give better performance.
  • Ensuring order and discipline: A constant check on the behaviour and work of the employees leads to creation of an atmosphere of order and discipline in the organisation.

Question 5. How does controlling help in “Judging accuracy of standards” and “Ensuring order and discipline” ? (CBSE, Sample Paper, 2017) Answer: Controlling helps in “Judging accuracy of standards” and “Ensuring order and discipline” as explained below:

  • Judging accuracy of standards: An efficient control system enables management to determine whether the standards set are accurate and objective. This is because it helps to review and revise the standards in light of the changes taking place in the organisation and in the environment.
  • Ensuring order and discipline: Controlling helps to minimise dishonest behaviour on the part of the employees by keeping a close check on their activities. Thus, it creates an atmosphere of order and discipline in the organisation.

Question 6. ‘If anything goes wrong with the performance of key activities, the entire organisation suffers. Therefore, the organisation should focus on them.’ Explain the statement with a suitable example. (CBSE, Sample Paper 2015-16) Answer: The given statement refers to the importance of ‘Critical Point Control’ in order to ensure effective performance of key activities in an organisation. Critical Point Control: It may be neither economical nor easy to monitor each and every activity in the organisation. Therefore, every organisation identifies and states its specific Key Result Areas (KRAs) or critical points which require tight control and are likely to have a significant effect on the working of the business. Any deviations on these points are attended to urgently by the management. For example, if in an organisation, the expenditure on stationery goes up by 10%, it can be ignored but if the production cost goes up by 5%, it may call for managerial action.
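The example in this answer can be expressed as a small check. It is a sketch under assumptions: the set `KEY_RESULT_AREAS` and the function name are hypothetical, and treating production cost as the only key result area simply follows the example given above.

```python
# Hypothetical list of key result areas (KRAs); the example above treats
# production cost as critical and stationery expenditure as non-critical.
KEY_RESULT_AREAS = {"production cost"}


def needs_urgent_attention(area: str, increase_pct: float) -> bool:
    """Flag an increase only when it occurs in a key result area."""
    return area in KEY_RESULT_AREAS and increase_pct > 0


print(needs_urgent_attention("stationery", 10))      # False: large rise, but not a KRA
print(needs_urgent_attention("production cost", 5))  # True: smaller rise, but in a KRA
```

The point of the sketch is that the size of a deviation alone does not decide the response; the area in which it occurs matters first.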

Question 7. Mr. Nath, a recently appointed production manager of Suntech Ltd., has decided to produce jute bags instead of plastic bags as these are banned by the government. He set a target of producing 1000 jute bags a day. It was reported that the employees were not able to achieve the target. After analysis, he found that employees were demotivated and not putting in their best for achieving the target. Mr. Nath’s behaviour is good towards the employees. His attitude is always positive. So, he announced various incentive schemes for the employees like:

  • Installing awards or certificates for best performance
  • Rewarding an employee for giving valuable suggestions

  • Identify the functions of management highlighted in the above paragraph.
  • State the incentive under which the employees are motivated.
  • State any two values which the production manager wants to communicate to society by his work and behaviour. (CBSE, Sample Paper 2015)

Answer:

  • The functions of management highlighted in the above paragraph are Controlling and Directing.
  • The employees are motivated under employee recognition programmes, which are a non-financial incentive. Employee recognition programmes help to fulfil the need for due consideration and appreciation of the people working in an organisation. They boost their self-esteem and motivate them to work with greater zeal and enthusiasm.
  • Respect for employees
  • Concern for environment

Question 8. A company was manufacturing LED bulbs which were in great demand. It was found that the target of producing 300 bulbs a day was not met by the employees. On analysis, it was found that the workers were not at fault. Due to electricity failure and shortage of workers, the company was not able to achieve the set targets and alternative arrangements were needed. To meet the increased demand, the company assessed that approximately 88 additional workers were required out of which 8 would work as heads of different departments and 10 would work as subordinates under each head. The required qualifications and job specifications were also enlisted. It was also decided that necessary relaxations should be given to encourage women, people from backward and rural areas and people with special abilities to assume responsible positions in the organisations. All efforts were made to match the ability of the applicants with the nature of work.

  • Identify the functions of management discussed above.
  • State the two steps in the process of each function discussed in the above paragraph.
  • List any two values which the company wants to communicate to the society. (CBSE, Delhi 2015)

Answer:

  • The functions of management discussed above are Staffing and Controlling.
  • The two steps involved in the staffing function are as follows:
  • Estimating manpower requirements: The manpower requirements of an organisation are estimated through workload analysis and workforce analysis. Workload analysis helps to determine the number and type of human resource required in the organisation to meet its present and future needs, whereas workforce analysis seeks to determine the number and type of human resource available within the organisation.
  • Recruitment: The process of recruitment involves searching for the prospective candidates and stimulating them to apply for jobs in the organisation. There are two sources of recruitment, namely internal and external.
  • The steps involved in the controlling function are as follows:
  • Comparing actual performance with standards to identify deviations if any. “It was found that the target of producing 300 bulbs a day was not met by the employees.”
  • Analysing deviations through critical point control and management by exception approach to identify the causes for their occurrence. “On analysis, it was found that the workers were not at fault. Due to electricity failure and shortage of workers, the company was not able to achieve the set targets and alternative arrangements were needed.”
  • Taking corrective action, if required “To meet the increased demand, the company assessed that approximately … as subordinate under each head.”
  • Women empowerment

Question 9. ‘AS Ltd.’ is a large company engaged in assembling of air-conditioners. Recently the company had conducted the ‘Time’ and ‘Motion’ study and concluded that on an average, a worker can assemble ten air-conditioners in a day. The target volume of the company in a day is assembling of 1,000 units of air-conditioners. The company is providing attractive allowances to reduce labour turnover and absenteeism. All the workers are happy. Even then the assembling of air-conditioners per day is 800 units only. To find out the reason, the company compared actual performance of each worker and observed through CCTV that some of the workers were busy in gossiping.

  • Identify the function of management discussed above.
  • State the steps in the process of the function identified which are discussed in the above paragraph. (CBSE, 2015)

Answer:

  • The function of management discussed above is Controlling.
  • Setting standards of performance: “concluded that on an average, a worker can assemble ten air-conditioners in a day.” “The target volume of the company in a day is assembling of 1,000 units of air-conditioners.”
  • Measurement of actual performance: “Even then the assembling of air-conditioners per day is 800 units only.”
  • Comparison of actual performance with the standards: The company compared actual performance of the workers with the planned performance and noted deviation of 200 units.
  • Analysing deviations: “To find out the reason, the company compared the actual performance of each worker and observed through CCTV that some of the workers were busy in gossiping.”
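The arithmetic behind this case can be checked with a few lines. The variable names are illustrative, and the implied number of workers is derived from the two figures in the case rather than stated in it.

```python
standard_per_worker = 10  # air-conditioners per worker per day (time and motion study)
target_units = 1000       # daily target stated in the case
actual_units = 800        # actual daily assembly stated in the case

# Deviation = actual performance minus standard; a negative value is a shortfall.
deviation = actual_units - target_units

# Shortfall expressed as a percentage of the target.
shortfall_pct = -deviation * 100 / target_units

# Derived, not stated in the case: workers needed if each meets the standard.
workers_implied = target_units // standard_per_worker

print(deviation, shortfall_pct, workers_implied)
```

The negative deviation of 200 units is the figure noted in the comparison step above.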

Question 10. PQR Ltd. is engaged in manufacturing machine components. The target production is 200 units per day. The company had been successfully attaining this target until two months ago. Over the last two months, it has been observed that daily production varies between 150-170 units.

  • Identify the management function to rectify the above situation.
  • Briefly state the procedure to be followed so that the actual production may come up to the target production. (CBSE, Delhi 2010)

Answer:

  • The controlling function of management is needed to rectify the above situation.
  • Providing training to workers if the workers are not well versed with the production process.
  • Improving the work environment if it is not conducive to efficient working.
  • Ensuring timely availability of the raw materials and other equipment if they are not made available on time.
  • Replacing the machinery if it is defective or has become obsolete.

Question 11. Rajeev and Sanjeev are managers in the same organisation heading different units. While discussing about the functions of management, Rajeev says that ‘Planning is looking ahead whereas controlling is looking back.’ But Sanjeev says, ‘You are wrong because planning is looking back whereas controlling is looking ahead.’ Both are giving reasons in favour of their statements. Explain the possible reasons given by both and justify who is correct. (CBSE, 2009) Answer: Both Rajeev and Sanjeev are correct in their statements as explained below:

  • Planning is considered as a forward looking function by Rajeev as plans are made for future.
  • Planning may be considered as a backward looking function by Sanjeev because the quality of planning can be improved with the help of valuable information provided by controlling in terms of results achieved.
  • Controlling is considered as a backward looking function by Rajeev as it is like the post mortem of the past activities to ascertain the deviations if any.
  • Controlling is considered as a forward looking function by Sanjeev as it helps to improve the future performance by providing guidance for taking corrective action so that deviations do not reoccur in future.

Question 12. Kapil & Co. is a large manufacturing unit. Recently the company had conducted time and motion studies and concluded that on an average, a worker could produce 300 units per day. However, it has been noticed that the average daily production per worker is in the range of 200-225 units.

  • Name the function of management and identify the steps in the process of this function which helped in finding out that the actual production of a worker is less than the set target.
  • To complete the process of the function identified in (1) and to ensure the performance as per time and motion studies, explain what further steps a manager has to take? (CBSE, 2010)

Answer:

  • The function of management is Controlling. The steps which helped in finding out that the actual production is less than the set target are given below.
  • Setting performance standards in clear, specific and measurable terms. “Recently the company had conducted time and motion studies and concluded that on an average, a worker could produce 300 units per day.”
  • Measurement of actual performance as far as possible in the same units in which standards are set. “It has been noticed that the average daily production per worker is in the range of 200-225 units”.
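  • Comparing actual performance with standards to identify deviations if any. In the given case there is a shortfall in output in the range of 75-100 units per worker (300 units less 225 to 200 units).
  • Analysing deviations: The possible causes of the shortfall need to be identified, for example:
  • The workers are not well versed with the production process.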
  • The working environment is not conducive to efficient working.
  • The raw materials and other equipment are not available on time.
  • Taking corrective action: The deviations require immediate management attention so that they do not reoccur in future. Therefore, the manager should take appropriate corrective action after analyzing the situation like providing training to workers, improving the work environment, and ensuring timely availability of the raw materials and other equipment.

Question 13. K&K Co. Ltd. is engaged in manufacturing of machine components. The target of production is 200 units daily. The company had been successfully attaining this target until two months ago. Over the last two months it has been observed that daily production varies between 150-170 units. Identify the possible causes for the decline in production and the steps to be taken to achieve the desired targets. (CBSE, 2008) Answer: The possible causes for decline in production are listed below:

  • The machinery is defective or has become obsolete.

The deviations require immediate management attention so that they do not reoccur in future. Therefore, the manager should take appropriate corrective action after analysing the situation, like providing training to workers, improving the work environment, ensuring timely availability of the raw materials and other equipment or replacing the machinery.

ADDITIONAL QUESTIONS

Question 1. ‘Taste Buds Ltd.’ is a company known for manufacturing good quality confectionery products. The automated system of production ensures uniformity in production and quality maintenance. The quality assurance team conducts stringent checks at all stages, records and analyses the deviations and takes the necessary corrective actions right from the procurement of raw material to its processing, production and packaging. The company has a well-equipped in-house quality inspection cell where confectionery products are tested on various parameters of quality by the team of experienced quality staff. In context of the above case:

  • Identify and explain the function of management being performed by the quality assurance team of ‘Taste Buds Ltd.’
  • Explain the statement, “records and analyses the deviations and takes the necessary corrective actions”.

Answer:

  • Controlling is the function of management being performed by the quality assurance team of ‘Taste Buds Ltd.’ Controlling is the process of ensuring that events conform to plans.
  • Comparing the actual performance with the standards: The actual performance is compared with the standards and deviations, if any, are recorded.
  • Critical point control: All the deviations may not be significant. Moreover, it may be neither economical nor easy to monitor each and every activity in the organisation. Therefore, every organisation identifies and states its specific key result areas (KRAs) or critical points which require tight control as they are likely to have a significant effect on the working of the business. Any deviations on these points are attended to urgently by the management. For example, if the expenditure on refreshment of workers goes up by 10% it can be ignored, but if the production cost goes up by 5% it may call for managerial action.
  • Management by exception: Management by exception is the principle of management control which is based on the belief that if you try to control everything, you may end up controlling nothing. Therefore, only significant deviations which go beyond the permissible limits should be brought to the notice of the management. For example, output defects up to 2% may be considered acceptable, but if it goes up by 5%, it may call for managerial action.
  • Taking corrective action: The last step in controlling process involves taking corrective action whenever the deviation occurs beyond the permissible limits so that they do not reoccur in future. However, the standards may be revised if it is not possible to check deviations through corrective action.

Question 2. Anubhav has set up an export house after completing his masters in fashion designing. As the quality of the garment depends on the quality of raw material used, he ensures that the fabric meets the requirements by conducting a series of laboratory tests on the fabrics, like shrinkage test, colour fastness to washing, colour fastness to light, colour fastness to perspiration etc. Later on, at the production areas, fabric inspection is also conducted by stopping the production process. The tests help to detect the deviations and also take corrective action. Moreover, he ensures that complete training about production work is given to every worker at the time of joining his export house. In context of the above case:

  • Identify the function of management being performed by Anubhav by conducting tests to assure the quality of the garments manufactured in his export house.
  • Briefly explain the term ‘deviations.’
  • Give any three advantages of giving training to the employees.

Answer:

  • Controlling is the function of management being performed by Anubhav by conducting tests to assure the quality of the garments manufactured in his export house.
  • The term ‘deviations’ refers to the difference between the actual performance and planned performance. If the actual performance is more than the planned performance, it may be said to be positive in nature or vice-versa.
  • Training imparts systematic learning to the employees thereby helping to avoid wastage of efforts and money and is considered better than the hit and trial method.
  • It increases the employees’ productivity both in terms of quantity and quality, leading to higher profits.
  • Training increases the morale of the employees and reduces absenteeism and employee turnover.

Question 3. Raghav started a take away eating joint in a nearby market. His business was doing well. He ensured that the food was properly cooked, a standard taste was maintained, packing of food was done effectively and the orders were executed on time. But unfortunately he met with an accident and was advised three months bed rest. In his absence, his cousin Rohit took charge of his business. When he resumed his work after three months, he realised that his clientele had dropped. The people were not happy with the services as the quality of food had deteriorated and the delivery time for orders had increased considerably. All this was happening because most of his previous staff had left as Rohit used to adopt a very strict and authoritative approach towards them. In context of the above case:

  • List any two aspects about his business that Raghav was controlling in order to make it successful.
  • Explain briefly any two points to highlight the importance of the controlling function.
  • Name and explain the style of leadership adopted by Rohit.

Answer:

  • A standard taste was maintained.
  • The orders were executed on time.
  • Judging accuracy of standards: The controlling function helps the business managers to judge the objectivity and accuracy of the current standards. It also assists in reviewing and revising the standards keeping in view the forthcoming changes in both the internal and external environment of the business.
  • Improving employee motivation: The controlling function seeks to motivate the employees and helps them to give better performance. This is because it makes them aware well in advance about what they are expected to do and what the standards of performance are on the basis of which they will be judged.
  • Rohit had adopted an autocratic style of leadership. An autocratic leader expects strict compliance from his subordinates with regard to the orders and instructions given by him. Therefore, it involves only one-way communication.

Question 4. ‘Saurashtra’ is a company involved in the export of indigenous food products like chutneys and pickles. It has tied up with the small farmers in various states for sourcing of fruits and vegetables. In this way it helps the small farmers to sell their produce at reasonable rates. The company follows a practice where only significant deviations from a budget or plan are brought to the attention of management. The degree of deviations allowed in different categories in the budget are well defined in advance, along with the appropriate levels of management who will respond to the deviations in question. For example, a deviation of Rs. 20,000 or more in purchase costs will be reported to the concerned department manager. In context of the above case:

  • Identify the principle of management control adopted by the company. State the belief underlying this principle.
  • List any two values that the company wants to communicate to the society.

Answer:

  • Management by exception is the principle of management control adopted by the company. It is based on the belief that ‘if you try to control everything, you may end up controlling nothing’.
  • Rural development
  • Sense of responsibility
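The reporting rule in this case can be sketched as a simple threshold check. Only the Rs. 20,000 limit and the department manager come from the case; the function name, the budget figures and the choice to treat overruns and underruns alike are assumptions made for illustration.

```python
from typing import Optional

PURCHASE_COST_LIMIT = 20_000  # Rs.; deviation at or above this is reported (from the case)


def report_purchase_deviation(budgeted: float, actual: float) -> Optional[str]:
    """Return who should be informed, or None if the deviation is not significant."""
    # Assumption: overruns and underruns are treated the same way.
    deviation = abs(actual - budgeted)
    if deviation >= PURCHASE_COST_LIMIT:
        return "department manager"
    return None  # within permissible limits, so management is not troubled


# Hypothetical budget figures for illustration only.
print(report_purchase_deviation(budgeted=500_000, actual=525_000))
print(report_purchase_deviation(budgeted=500_000, actual=510_000))
```

Only the first call is escalated, which reflects the belief that trying to control everything may end up controlling nothing.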

Question 5. Shruti has established a small scale factory after completing a course in textile designing. She has tied up with the big home furnishing retail outlets in the city for supplying to them good quality designer home furnishing products like bed covers, cushions etc. She believes that controlling without planning is blind. So, every time she gets an order, she sets the standards in terms of the number of personnel required, the estimated requirements in man-hours per product, the requirements of direct materials for the projected production and the amount of normal overhead expenses required at the projected work-load. She also keeps a close watch on the activities so as to ensure that they conform to plans. Whenever the order size is too large, she hires extra workers by placing a notice on the notice-board of the factory specifying the details of the jobs available. In context of the above case:

  • Identify the functions of management being performed by Shruti.
  • Do you think Shruti is right in her thinking that, “controlling without planning is blind.” Explain by bringing out the relationship between planning and controlling.
  • Name the source of recruitment adopted by Shruti. Also, mention its type.

Answer:

  • The functions of management being performed by Shruti are Controlling and Staffing.
  • Yes, Shruti is right in thinking that “controlling without planning is blind.” Planning provides the basis of controlling by setting the standards in advance. In the absence of these standards, managers will not know what all activities have to be controlled. Planning is prescriptive in nature whereas controlling is evaluative. Thus, planning and controlling are interrelated and interdependent: planning is based on facts and makes controlling easier and effective, whereas controlling helps to improve future planning by providing valuable information derived from the past experiences.
  • The source of recruitment adopted by Shruti is Direct Recruitment. It is an external source of recruitment.

Question 6. Vishesh works as an interior designer. He gets a contract to redesign a play school. He employs three painters on the site assuming that an average painter will be able to paint 10 desks in a day. At the end of the first day of their work, Vishesh finds that the painter A, painter B and painter C have painted 12, 14 and 15 desks respectively. On comparing the actual performance with the planned performance, he realises that the standard set by him is too low. Consequently, he decides to review and revise the standard and raise it. In context of the above case:

  • Identify the function of management being performed by Vishesh.
  • “Planning and controlling are both backward looking as well as forward looking functions.” Explain the statement with reference to the above paragraph.

Answer:

  • Controlling is the function of management being performed by Vishesh.
  • It is appropriate to say that “Planning and controlling are both backward looking as well as forward looking functions”, as is evident from the above case:
  • Planning is considered as a forward looking function as plans are made for future. “assuming that an average painter will be able to paint 10 desks in a day.”
  • Planning may be considered as a backward looking function because the quality of planning can be improved with the help of valuable information provided by controlling in terms of results achieved. “On comparing the actual performance with the planned performance, he realises that the standard set by him is too low.”
  • Controlling is considered as a backward looking function as it is like the post mortem of the past activities to ascertain the deviations if any. “At the end of the first day of their work, Vishesh finds that the painter A, painter B and painter C have painted 12, 14 and 15 desks respectively.”
  • Controlling is considered as a forward looking function as it helps to improve the future performance by providing guidance for taking corrective action so that deviations do not reoccur in future. “Consequently, he decides to review and revise the standard and raise it.”

Question 7. A critical point control (CPC) approach is followed by McDonald's in the cooking and handling process so that any food safety threat can be prevented, eliminated, or reduced to an acceptable level. Hence, continuous monitoring of activities is undertaken to ensure that the process is right at each critical control point. The main principle followed for cooking at McDonald's is “less amount many time”, which can ensure the high quality and freshness of the food. For instance, if four hamburgers have to be made, a worker cannot cook all four hamburgers at one time. The time figured out for making one hamburger is one hundred and forty-five seconds. Moreover, nearly all foods at McDonald's have a specific holding time: the holding time for hamburgers is ten minutes and for french fries is seven minutes. If an item is not sold within that time, it is thrown away. Also, the temperature of the milk sent by the supplier must be under 4° C; otherwise, it will be returned. In context of the above case:

  • Name the steps involved in the controlling process which is being discussed in the above lines.
  • What do you understand by ‘critical point control’? Explain.
  • How does the controlling function of management help in accomplishing organisational goals and ensure efficient use of resources?
  • Analysing deviations and taking corrective action are being discussed in the above lines.
  • Since it may be neither economical nor easy to monitor each and every activity in the organisation, every organisation identifies and states its specific key result areas (KRAs) or critical points which require tight control and are likely to have a significant effect on the working of the business. Any deviations on these points are attended to urgently by the management.
  • Accomplishing organisational goals: The controlling function helps in accomplishing organisational goals by constantly monitoring the performance of the employees and bringing to light the deviations, if any, and taking appropriate corrective action.
  • Making efficient use of resources: The controlling function enables the managers to work as per predetermined standards. This helps to avoid any ambiguity in business operations and reduce wastage and spoilage of resources in the organisation.
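The critical points named in the case (holding time and milk temperature) can be expressed as simple checks. The sketch below is illustrative only: the limits are taken from the case, but the function names `keep_item` and `accept_milk` and the data layout are assumptions, not McDonald's actual system.

```python
# Limits quoted in the case; the dictionary layout is an assumption.
HOLDING_LIMIT_MIN = {"hamburger": 10, "french fries": 7}
MAX_MILK_TEMP_C = 4

def keep_item(item, minutes_held):
    """True if the item is still within its holding time and may be sold."""
    return minutes_held <= HOLDING_LIMIT_MIN[item]

def accept_milk(temp_c):
    """True if the supplier's milk is under the 4 degree C limit."""
    return temp_c < MAX_MILK_TEMP_C

# Corrective action at each critical point: discard or return.
print("Sell hamburger held 9 min:", keep_item("hamburger", 9))
print("Sell fries held 8 min:", keep_item("french fries", 8))
print("Accept milk at 5 C:", accept_milk(5))
```

Only these few points are monitored tightly; any deviation on them triggers immediate corrective action (the food is thrown away or the milk is returned).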


CBSE NCERT Solutions


Case Study Chapter 8 Controlling

Please refer to Chapter 8 Controlling Case Study Questions with answers provided below. We have provided Case Study Questions for Class 12 Business Studies for all chapters as per CBSE, NCERT and KVS examination guidelines. These case based questions are expected to come in your exams this year. Please practise these case study based Class 12 Business Studies Questions and answers to get more marks in examinations.

Case Study Questions Chapter 8 Controlling

Read the source given below and answer the following questions : All deviations need not be brought to the notice of top management. A range of deviations should be established and only cases beyond this range should be brought to the knowledge of top level management. They must divide the deviations in two categories: deviations which need to be attended to urgently in one category and minor or insignificant deviations in the other category.

Questions :

Question. Which concept ignores deviations within the set range ? (a) Critical point control (b) Management by exception (c) Both (a) and (b) (d) None of the above

Question. Deviations can be : (a) Positive (b) Negative (c) Both positive as well as negative (d) None of the above

Question. Positive deviations indicate we need : (a) Strategic control (b) Operational control (c) No control (d) Both (a) and (b)

Question. Which concept focuses more on the deviations taking place in key area? (a) Critical point control (b) Management by exception (c) Both (a) and (b) (d) None of the above

Read the source given below and answer the following questions : ‘A.S. Ltd.’ is a large company engaged in assembly of air conditioners. Recently the company had conducted the ‘Time’ and ‘Motion’ study and concluded that on an average a worker can assemble ten air conditioners in a day. The target volume of the company in a day is assembling of 1,000 units of air conditioners. The company is providing attractive allowances to reduce labour turnover and absenteeism. All the workers are happy. Even then the assembly of air conditioners per day is 800 units only. To find out the reason the company compared actual performance of each worker and observed through C.C.T.V. that some of the workers were busy in gossiping.

Question. “Even then the assembly of air conditioners per day is 800 units only.” This indicates which step of the process of the above identified function? (a) Setting up of standard (b) Measuring the performance (c) Comparison between plan and actual performance (d) Analysing deviation

Question. The comparison between plan and actual performance helps to find : (a) reason of mismatch (b) accuracy of standards (c) deviation (d) none of the above

Question. The methods used by manager to analyse deviations are : (a) Critical point control (b) Management by Exception (c) Both (a) and (b) (d) None of the above

Question. Identify the function of management discussed above. (a) Planning (b) Organising (c) Staffing (d) Controlling

Read the source given below and answer the following questions : A manager who tries to control every thing may end up controlling nothing. The deviations which are beyond the specific range should only be handled by managers and minute or minor deviations can be ignored. Manager should not waste his time and energy in finding solutions for minor deviations rather he should concentrate on removing deviations of high degree.

Question. Under MBE Technique, the deviations are reported to superior immediately if : (a) these are within the range. (b) these are beyond the range. (c) both (a) and (b) (d) none of the above

Question. The above para is indicating which concept ? (a) Management by exception (b) Critical Point Control (c) Both (a) and (b) (d) None of the above

Question. Apart from MBE other Technique, for analysing deviation is : (a) Key area technique (b) Focus technique (c) Critical point control (d) None of the above

Question. With the help of M.B.E.: (a) Manager can save his energy (b) Manager wastes his energy (c) No effect on energy (d) None of the above


Case Studies - Controlling | Business Studies (BST) Class 12 - Commerce PDF Download


Q. 1. Babita Ltd. is engaged in manufacturing machine components.  The target production is 250 units per day per worker.  The company had been successfully attaining this target until two months ago.  Over the last two months it has been observed that daily production varies between 200-210 units per worker.

  • Name the function of management and identify the step in the process of this function which helped in finding out that the actual production of a worker is less than the set target.
  • To complete the process of the function identified in (a) and to ensure the performance as per set targets, explain what further steps a manager has to take. (5 marks)
  • The management function is Controlling.

“Comparing actual performance with standards” is the step involved in the process of controlling which helped in finding out that the actual production of a worker is less than the set target.

  • A manager has to take the following two further steps to complete the process of controlling:
  • Analysing deviations
  • Taking corrective action

Q. 2. Rajeev and Sanjeev are managers in the same organization, heading different units. While discussing the functions of management, Rajeev says, “Planning is looking ahead whereas controlling is looking back.” But Sanjeev says, “Planning is looking back whereas controlling is looking ahead.” Both are giving reasons in favour of their statements.

Explain the possible reasons given by both and justify who is correct.            (6 marks)

Ans.  Rajeev who says, “Planning is looking ahead whereas controlling is looking back” must be giving the following reason:

Sanjeev who says, “Planning is looking back whereas controlling is looking ahead” must be giving the following reasons.

Conclusion: Planning and controlling are both backward looking and forward looking functions.  Hence, both of them are partially correct.

Q. 3. ‘Saurashtra’ is a company involved in the export of indigenous food products like chutneys and pickles. It has tied up with small farmers in various states for sourcing of fruits and vegetables. In this way it helps the small farmers to sell their produce at reasonable rates. The company follows a practice where only significant deviations from a budget or plan are brought to the attention of management. The degree of deviation allowed in different categories in the budget is well defined in advance, along with the appropriate levels of management who will respond to the deviations in question. For example, a deviation of Rs. 20,000 or more in purchase costs will be reported to the concerned department manager.

In context of the above case:

  • Identify the principle of management control adopted by the company.  State the belief underlying this principle.
  • List any two values that the company wants to communicate to the society.
  • Management by exception is the principle of management control adopted by the company.  It is based on the belief that ‘if you try to control everything, you may end up controlling nothing.’
  • The two values that the company wants to communicate to the society are:
  • Rural development:
  • Sense of responsibility:
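Management by exception amounts to filtering deviations against a predefined limit. The following sketch assumes the Rs. 20,000 limit from the case; the sample purchase-cost deviations and the name `exceptions` are hypothetical, and treating overruns and savings alike (by magnitude) is also an assumption.

```python
# From the case: purchase-cost deviations of Rs. 20,000 or more are reported.
THRESHOLD_RS = 20_000

def exceptions(deviations_rs, threshold=THRESHOLD_RS):
    """Keep only deviations whose magnitude reaches the reporting threshold."""
    return {k: v for k, v in deviations_rs.items() if abs(v) >= threshold}

# Hypothetical deviations (actual minus budgeted purchase cost, in Rs.).
sample = {
    "order 1": 5_000,
    "order 2": 20_000,
    "order 3": -32_000,
    "order 4": 19_999,
}

reported = exceptions(sample)
print("Reported to the department manager:", reported)
```

Orders 1 and 4 fall within the permitted range and are not escalated, so the manager's attention goes only to the significant deviations.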

Q. 4. Anubhav has set up an export house after completing his masters in fashion designing. As the quality of a garment depends on the quality of the raw materials used, he ensures that the fabric meets the requirements by conducting a series of laboratory tests on the fabrics, such as a shrinkage test and tests of colour fastness to washing, light and perspiration. Later on, in the production areas, fabric inspection is also conducted by stopping the production process. The tests help to detect deviations and also to take corrective action. Moreover, he ensures that complete training about production work is given to every worker at the time of joining his export house.

  • Identify the function of management being performed by Anubhav by conducting tests to assure for the quality of the garments manufactured in his export house.
  • Briefly explain the term ‘deviations.’
  • Give any three advantages of giving training to the employees.
  • Controlling is the function of management being performed by Anubhav by conducting tests to assure for the quality of the garments manufactured in his export house.
  • The term ‘deviations’ refers to the difference between the actual performance and the planned performance. If the actual performance is more than the planned performance, the deviation is said to be positive in nature; if it is less, the deviation is negative.
  • Training imparts systematic learning to the employees thereby helping to avoid wastage of efforts and money and is considered better than the hit and trial method.
  • It increases the employees’ productivity both in terms of quantity and quality, leading to higher profits.
  • Training increases the morale of the employees and reduces absenteeism and employee turnover.

Q. 5. Atul and Ajay are good friends. They decide to set up a digital printing press together as both of them are computer wizards. They plan to offer various types of printed products including labels, manuals, marketing material, memo pads, business order forms, T-shirts, mugs etc. They set standards for every aspect of their work in order to create an efficient working environment. As per the standards, an average person types between 38 and 40 words per minute. Keeping this in mind, they engage two typists, Bitto and Raju, and assign them work accordingly. Within two days, they realize that the output in terms of typing work done by Raju is too low as compared to the desired output. On inspecting, Atul finds out that Raju’s typing speed is between 18 and 20 words per minute only. But Raju exhibits great skills in designing work and is a good human being. Hence, Atul and Ajay decide to retain him for doing creative work and appoint a new typist.

  • Identify and explain the function of management being discussed here.
  • List the steps involved in the function of management as identified in part (a).

Also, quote the lines from the paragraph relating to each step.

  • Controlling is the function of management being discussed here.
  • The steps involved in the process of controlling which are discussed in the above paragraph are:
  • Setting standards of performance:
  • Measurement of actual performance:
  • Comparison of actual performance with the standards:
  • Analyzing deviations:
  • Taking corrective action:

Q. 6. D & D Ltd. is a large manufacturing unit.  Recently, the company has conducted the ‘time’ and ‘motion’ studies and concluded that on an average a worker could produce 120 units per day.  However, it has been noticed that average daily production of a worker is in the range of 80-90 units.

Which function of management is needed to ensure that the actual performance is in accordance with the performance as per ‘time’ and ‘motion’ studies? State four features of this function of management. (5 marks)

Ans.  Controlling

Features of controlling:

  • Controlling is a goal-oriented function
  • Controlling is a pervasive function
  • Controlling is a continuous process
  • Controlling is both a backward looking as well as forward looking function.

Q. 7. ‘A.S. Ltd.’ is a large company engaged in assembly of air-conditioners. Recently the company had conducted the ‘Time’ and ‘Motion’ study and concluded that on an average a worker can assemble ten air-conditioners in a day. The target volume of the company in a day is assembling of 1,000 units of air-conditioners. The company is providing attractive allowances to reduce labour turnover and absenteeism. All the workers are happy. Even then the assembly of air-conditioners per day is 800 units only. To find out the reason the company compared actual performance of each worker and observed through C.C.T.V. that some of the workers were busy in gossiping.

  • Identify the function of management discussed above.
  • State those steps in the process of the function identified which are discussed in the above paragraph. (3 marks)
  • Controlling
  • Steps discussed in the above paragraph are:
  • Setting performance standards
  • Measurement of actual performance
  • Comparing actual performance with the standards
  • Analyzing deviations for their causes.
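The comparison step in this case is simple arithmetic on the figures given (a target of 1,000 units and actual assembly of 800 units per day). A minimal sketch follows; the variable names are illustrative assumptions.

```python
# Figures from the case.
target_units = 1_000   # planned assembly per day
actual_units = 800     # measured assembly per day

# Comparing actual performance with the standard.
deviation = actual_units - target_units                        # negative = shortfall
shortfall_pct = (target_units - actual_units) / target_units * 100

print(f"Deviation: {deviation} units per day")
print(f"Shortfall: {shortfall_pct:.0f}% of target")
```

A shortfall of this size is what prompts the next step, analysing the deviation for its cause, which the company does through C.C.T.V. observation.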

Q. 8. A company ‘M’ Ltd. is manufacturing mobile phones both for the domestic Indian market as well as for export. It has enjoyed a substantial market share and also had a loyal customer following. But lately it has been experiencing problems because its targets have not been met with regard to sales and customer satisfaction. Also, the mobile market in India has grown tremendously and new players have come with better technology and pricing. This is causing problems for the company. It is planning to revamp its controlling system and take other steps necessary to rectify the problems it is facing.

  • Identify the benefits the company will derive from a good control system.
  • How can the company relate its planning with control in this line of business to ensure that its plans are actually implemented and targets attained?
  • Give the steps in the control process that the company should follow to remove the problems it is facing.
  • Explain the importance of controlling.
  • The company can relate its planning with control in this line of business by implementing an effective controlling system and following a controlling process.
  • Explain steps in the process of controlling system.

Q. 9. Alpha Ltd. was manufacturing auto spare parts. To improve the efficiency of employees, the company provided training to its employees by inviting an expert who demonstrated the whole process of manufacturing. The expert said that all deviations cannot be controlled, so a manager must know which deviations in key areas must be attended to urgently as compared to deviations in non-key areas. He also suggested that human beings are bound to make mistakes, so a manager should not take strict action on every minute mistake of workers; rather, he can fix a range of deviation and take action if the deviation is above the specified range.

  • Identify the functions of management referred above.
  • Name the two ways of analyzing deviation mentioned above.
  • Name the method of training used by the company.
  • Identify the value being emphasized in above para.
  • Staffing and controlling
  • (i) Critical point control

(ii) Management by exception

  • Apprenticeship method of training
  • Value of Humanity.

Q. 10. A critical point control (CPC) approach is followed by McDonald's in the cooking and handling process so that any food safety threat can be prevented, eliminated, or reduced to an acceptable level. Hence, continuous monitoring of activities is undertaken to ensure that the process is right at each critical control point. The main principle followed for cooking at McDonald's is “less amount many time”, which can ensure the high quality and freshness of the food. For instance, if four hamburgers have to be made, a worker cannot cook all four hamburgers at one time. The time figured out for making one hamburger is one hundred and forty-five seconds. Moreover, nearly all foods at McDonald's have a specific holding time: the holding time for hamburgers is ten minutes and for French fries is seven minutes. If an item is not sold within that time, it is thrown away. Also, the temperature of the milk sent by the supplier must be under 4° C; otherwise, it will be returned.

  • Name the steps involved in the controlling process which is being discussed in the above lines.
  • What do you understand by ‘critical point control’?  Explain.
  • How does the controlling function of management help in accomplishing organizational goals and ensure efficient use of resources?
  • Analyzing deviations and taking corrective action are being discussed in the above lines.
  • Since it may be neither economical nor easy to monitor each and every activity in the organization, every organization identifies and states its specific key result areas (KRAs) or critical points which require tight control and are likely to have a significant effect on the working of the business. Any deviations on these points are attended to urgently by the management.
  • The two points that highlight the importance of the controlling function are listed below:
  • Accomplishing organizational goals:
  • Making efficient use of resources:

Q. 11. Raghav started a take away eating joint in a nearby market.  His business was doing well.  He ensured that the food was properly cooked, a standard taste was maintained, packing of food was done effectively and the orders were executed on time.  But unfortunately he met with an accident and was advised three months bed rest.  In his absence, his cousin Rohit took charge of his business.  When he resumed his work after three months, he realized that his clientele had dropped.  The people were not happy with the services as the quality of food had deteriorated and the delivery time for orders had increased considerably.  All this was happening because most of his previous staff had left as Rohit used to adopt a very strict and authoritative approach towards them.

  • List any two aspects about his business that Raghav was controlling in order to make it successful.
  • Explain briefly any two points to highlight the importance of the controlling function.
  • Name and explain the style of leadership adopted by Rohit.
  • The two aspects about his business that Raghav was controlling in order to make it successful are listed below:
  • A standard taste was maintained.
  • The orders were executed on time.
  • Judging accuracy of standards:
  • Improving employee motivation:
  • Rohit had adopted an autocratic style of leadership. An autocratic leader expects strict compliance from his subordinates with regard to the orders and instructions given by him. Therefore, it involves only one-way communication.

Q. 12. Mr. Nath, a recently appointed production manager of Suntech Ltd., has decided to produce jute bags instead of plastic bags as the latter are banned by the government. He set a target of producing 1000 jute bags a day. It was reported that the employees were not able to achieve the target. Mr. Nath’s behavior is good towards the employees. His attitude is always positive. So he announced various incentive schemes for the employees like:

- Instituting an award or certificate for best performance.

- Rewarding an employee for giving valuable suggestions.

- Congratulating the employees for good performance.

(a) Identify the functions of management highlighted in the above paragraph.

(b) State the ‘incentive’ under which the employees are motivated.

(c) State any two values which the production manager wants to communicate to the society by his work and behavior.                                                                       (5 marks)

  • Controlling and Directing
  • Employee recognition programme (non-monetary incentive)
  • Sensitivity to environment
  • Good behavior towards employees
  • Team work with employees

Q. 13. Joseph Bros. was a firm manufacturing jute lamp shades. It uses left over jute pieces from various jute factories to manufacture economical lamp shades which are supplied to various hotels in nearby towns. It employs men and women from nearby villages as workers for creating good lamp shade designs.

Joseph Bros. is not able to meet its targets. Namish, the supervisor of the company, was told to analyze the reasons for the poor performance. Namish found the following problems and suggested certain solutions in the working of the business. The number of workers employed was less than what was required for the work. As a result, the existing workers were overburdened. The firm decided to search for new workers and it asked the present employees to introduce candidates or recommend their friends and relatives to the firm. This enabled the firm in “putting people to jobs” and assured attainment of objectives according to plans.

  • Identify the functions of management being performed by the firm in the above situation.
  • Name the concept and its source used by the firm to attract more workers for the firm.
  • State any two values being followed by Joseph Bros. (5 marks)
  • Staffing and Controlling
  • Recruitment, External Source of Recruitment (Recommendations of employees)
  • Values being followed by Joseph Bros.:
  • Creating employment opportunities.
  • Utilizing resources efficiently by using leftover jute pieces.

Q. 14. A company was manufacturing ‘LED bulbs’ which were in great demand.  It was found that the target of producing 300 bulbs a day was not met by the employees.  On analysis, it was found that the workers were not at fault. Due to electricity failure and shortage of workers, the company was not able to achieve the set targets and alternative arrangements were needed.

To meet the increased demand, the company assessed that approximately 88 additional workers were required out of which 8 would work as heads of different departments and 10 would work as subordinates under each head.  The required qualifications and job specifications were also enlisted.  It was also decided that necessary relaxation should be given to encourage women, persons from backward and rural areas and persons with special abilities to assume responsible positions in the organization.  All efforts were made to match the ability of the applicants with the nature of work.

  • Identify the functions of management discussed above.
  • State the two steps in the process of each function discussed in the above para.
  • List any two values which the company wants to communicate to the society
  • Steps in staffing:
  • Estimating manpower requirements:
  • Recruitment:
  • Steps in controlling:
  • Values which the company wants to communicate to the society:
  • Using environment friendly methods of production.
  • Women empowerment.
  • Upliftment of underprivileged sections of the society.

Q. 15. Airtech Ltd. is manufacturing mobile phones both for the domestic Indian market as well as for export. It has enjoyed a substantial market share and also had a loyal customer following. But lately it has been experiencing problems because its targets have not been met with regard to sales and customer satisfaction. Also, the mobile market in India has grown tremendously and new players have come with better technology and pricing. This is causing problems for the company. It is planning to revamp its controlling system and take other steps necessary to rectify the problems it is facing. It also decides to offer its basic models of mobile phones at 50% discount to poor people.

  • State any two benefits the company will derive from a good control system.
  • How can the company relate its planning with control in this line of business to ensure that its plans are actually implemented and targets attained?
  • Give the steps that the company should follow to remove the problems it is facing.
  • Identify any one value which the company wants to communicate to the society.                                                                                        (6 marks)
  • Two benefits which the company will derive from a good control system are:
  • Accomplishing organizational goals of increasing market share and customer satisfaction.
  • Making efficient use of resources by controlling wastage and spoilage of resources; and ensuring that each activity is performed according to the predetermined standards.
  • Controlling will improve future planning by providing information to the company derived from past experience that its targets were not met with regard to sales and customer satisfaction.
  • The company should undertake technological upgradation of machinery and modify the existing process so that cost is reduced and the company can set a lower price for its mobile phones to beat its competitors.
  • Concern about poor people
  • Social responsibility

Q. 16. You are the manager of Bharti Chemicals Ltd. It is reported to you that postal expenses have increased by 10% over standard rates and cost of raw materials has increased by 2%. Which of the two deviations will be more critical to you? (1 mark)

Ans.  Increase in cost of raw materials by 2% is more critical.  (Critical Point Control)
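Why a 2% rise can matter more than a 10% rise becomes clear once the percentages are applied to the size of each cost item. In the sketch below the standard cost figures are purely hypothetical (the case gives only the percentages); they merely assume that raw materials are a far larger cost than postage.

```python
# Hypothetical standard costs per month (Rs.); the case gives only percentages.
standard_cost = {"postal expenses": 10_000, "raw materials": 1_000_000}

# Percentage increases reported in the case.
increase_pct = {"postal expenses": 10, "raw materials": 2}

# Extra cost in rupees caused by each deviation.
extra_cost = {
    item: standard_cost[item] * increase_pct[item] / 100
    for item in standard_cost
}

most_critical = max(extra_cost, key=extra_cost.get)
print(extra_cost)
print("More critical deviation:", most_critical)
```

Under these assumed figures the raw material deviation costs Rs. 20,000 against Rs. 1,000 for postage, which is the reasoning behind treating raw material cost as the critical point.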

Q. 17. Surbhi Ltd. produces safety pins on a mass scale. The company’s policy is that at most 2% of the daily production could be defective. Over a three months period, it has been observed that 8% - 10% of the production is defective. The cause of deviation found is defective machinery. What corrective action should be taken by the management? (1 mark)

Ans.  Repair the existing machine or replace the machine if it cannot be repaired.

Q. 18. K & K Co. Ltd. is engaged in manufacturing machine components.  The target production is 200 units daily.  The company had been successfully attaining this target until two months ago.  Over the last few months it has been observed that daily production varies between 150-170 units.

Identify the possible causes for the decline in production and the steps to be taken to achieve the desired targets .                                                                                                 (5 marks)


Studyresearch


Case Studies – Class 12 – Controlling

Case Study for chapter – Controlling

Especially for class 12 – cbse business studies students.

Q1: Managers at a Virginia city import-export company suspected wrongdoing by some employees. Corporate Defence Strategies (CDS) of Maywood, New Jersey advised the firm to install a software program that could secretly log every single stroke of the suspects’ computer keys and send an encrypted e-mail report to CDS. Investigators revealed that two employees were deleting orders from the corporate books after processing them, pocketing the revenue and building their own company from within. In the above case, one of the important functions of management is performed.

(a) Identify the function.

(b) Identify one point of importance of this function corresponding to the above case.

(c) Explain any two other points of importance of the identified function.

Q2: FedEx operates a $23 billion delivery system from its London hub and six international hubs. An important part of FedEx’s system was its ability to track customers’ parcels at each stage of collection, shipment and delivery. At FedEx, its system also helps identify which customers generate maximum profits and which eventually end up costing the company. FedEx closes the accounts that are not profitable to serve.

(a) In the above case, identify the function of management by quoting the lines which helped in identifying it.

(b) Also identify two steps of the above function by quoting the lines from the above case study.

Answers: (a) Controlling. Quoted lines: “An important part … profitable to serve.”

(b) Measuring actual performance: “An important … shipment and delivery.” Comparing actual performance with standards: “its system … costing the company.”

Taking corrective action: “FedEx closes the accounts … to serve.”

Q3: Raman and Aman work in the planning department and the quality check department of their organisation, respectively. Raman is of the view that quality checks are not possible without standards, and Aman is of the view that standards for the coming year cannot be made without the help of the quality check department. Who among them is correct? Give a conceptual reason for your answer.

For the reason, explain the relationship between planning and controlling.

Q4: Smith Courier System, based in Switzerland, is a provider of same-day delivery services. Although Smith may do everything right to meet its delivery commitments, it relies on commercial airlines to transport its parcels and occasionally fails to meet its deadlines. Delays are usually a result of packages being misplaced in airlines’ tracking systems. Such incidents are beyond Smith’s control, but from the customer’s vantage point, the failure is Smith’s problem. To control the damage created by such delays, Smith had to take some rectification measures. For example, for several months in 1990 and early 1991, several Smith deliveries disappeared in transit. The packages turned up later, but the customers had already suffered financial losses. Yet because the packages were eventually recovered, neither the insurance company nor the airlines were liable. The president, Glenn, had to decide whether to compensate the customers for their losses or simply not charge them for the shipments. Glenn concluded that not charging for the shipments was an inadequate response given the downtime suffered, but paying the $30,000 in losses would push the then five-year-old, $6 million company into a loss for the quarter. Glenn’s decision was to pay out the $30,000 in gratis service; the customers stayed, and Smith continued to grow.

(a) Identify one of the important management functions performed by Glenn which helped Smith Courier System to survive and grow.

(b) Also identify the steps in the process of the function identified in (a) by quoting the lines from the above case.

Answers: (a) Controlling

(b) Setting performance standards: “Smith Courier … delivery commitments.” Measurement of actual performance: “it relies … its deadlines.” Analysing deviations: “Delays are … tracking systems.” Taking corrective action: “Glenn’s decision … to grow.”

Q5: At Sam Defence, a lack of quality had created a crisis. When the government shut it down because it was not meeting quality standards, Sam brought back a TQM programme that restored quality. Although Sam’s weapons worked well, the government questioned the company’s quality practices and policies. To solve these problems, Sam Defence went through an organisational transformation. The key elements were: (i) to minimise dishonest behaviour on the part of the employees by keeping a close check on their activities; (ii) to empower employees by giving them responsibility and accountability for their performance; (iii) to provide common direction to all activities.

Explain the importance of controlling highlighted in the above key elements.

Ans: (i) Ensuring order and discipline.

(ii) Improving employee motivation.

(iii) Facilitating coordination in action.


NCERT Solutions for Class 12 Business Studies Chapter 8 - Controlling

NCERT Solutions are considered an extremely helpful resource to prepare for the CBSE Class 12 Chapter 8 Business Studies examinations. This study material provides students with a deep knowledge of the topics covered, and the NCERT solutions collated by the subject matter experts are easy to comprehend.


Access NCERT Solutions for Class 12 Chapter 8

Very Short Questions NCERT Business Studies Solutions Class 12 Chapter 8

1. Explain the meaning of controlling.

Controlling is referred to as the process of evaluation of the work that is done. It is all about setting standards for the work and then comparing the actual work that is done with the standard. It ensures that all the activities in an organisation are performed as per the decided plan.

2. Name the principle that a manager should consider while dealing with deviations effectively. State any one situation in which an organisation’s control system loses its effectiveness.

The principle that should be adopted to deal with deviations is management by exception. An organisation’s control system loses effectiveness when standards cannot be defined in quantitative terms. For example, job satisfaction will be different for different employees.

3. State any one situation in which an organisation’s control system loses its effectiveness.

A control system is bound to lose its effectiveness whenever the standards cannot be defined in quantitative terms, thereby making it difficult to measure deviations happening between actual and standard performance. For example, job satisfaction cannot be described in quantitative terms as it is different for different employees.

4. Give any two standards that can be used by a company to evaluate the performance of its Finance & Accounting department.

Standards that are used by a company for evaluating the performance of the Finance and Accounting department are as follows:

1. Liquidity

2. Flow of Capital

5. Which term is used to indicate the difference between standard performance and actual performance?

Deviation is the term used to indicate the difference between standard and actual performance.

Short Questions NCERT Business Studies Solutions Class 12 Chapter 8

1. ‘Planning is looking ahead and controlling is looking back’. Comment.

Planning is the process of creating a structure in advance regarding the work that needs to be done. It is helpful in defining the objectives and goals that need to be achieved by an individual or organisation. Therefore, it is said that planning is about looking ahead, which involves predicting the future. Controlling involves assessing past performance and comparing it with set standards. Due to these characteristics, it can be said to be a backward-looking function. But these statements are only partially correct: planning is done based on past experiences and how to do better, and controlling, although it looks at past performance, aims to improve future performance. Hence, it can be said that both planning and controlling are forward and backward-looking functions.

2. ‘An effort to control everything may end up in controlling nothing’. Explain.

This statement is with regard to the principle of management by exception. As stated in this rule, it is not possible to control everything effectively. This principle states that instead of trying to control all the deviations, there should be some defined ranges that are set up, and only when the deviations go beyond the range they should be notified to managers for control measures.
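The idea of a permissible range can be illustrated with a minimal Python sketch. The function name, the activity names, the figures and the 5% tolerance are all assumptions made up for illustration; they do not come from the NCERT text.

```python
def deviations_to_escalate(standards, actuals, tolerance_pct=5.0):
    """Return only the deviations that fall outside the permissible range."""
    flagged = {}
    for activity, standard in standards.items():
        # Deviation of actual from standard, expressed as a percentage.
        deviation_pct = (actuals[activity] - standard) / standard * 100
        if abs(deviation_pct) > tolerance_pct:
            flagged[activity] = round(deviation_pct, 1)
    return flagged


# Hypothetical figures, for illustration only.
standards = {"units_produced": 1000, "labour_cost": 50000, "raw_material_cost": 20000}
actuals = {"units_produced": 980, "labour_cost": 56000, "raw_material_cost": 20400}

print(deviations_to_escalate(standards, actuals))  # {'labour_cost': 12.0}
```

Only the 12% overrun in labour cost reaches the manager; the 2% deviations in output and raw material cost stay within the range and are handled at lower levels.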

3. Write a short note on budgetary control as a technique of managerial control.

Budgetary control is a technique in which plans are prepared in the form of budgets. A budget is a financial statement that tells us what needs to be achieved and the policies that need to be followed for a given time period. Actual performance is compared with the standards that were set in the budget. Such a comparison helps in the identification of deviations and in taking corrective steps. There can be different budgets for different divisions. Budgets act as motivation for employees and encourage them to reach their objectives. With proper budgeting measures, resources can be evenly distributed and utilised appropriately.
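The comparison of actual results with the budget can be sketched as follows. This is an illustrative Python example under stated assumptions: the function name, the budget heads and all figures are hypothetical. It also assumes the usual convention that overspending on a cost head is adverse while exceeding a revenue target is favourable.

```python
def budget_variances(budget, actual, cost_items):
    """Compare actual figures with the budget and label each variance."""
    report = {}
    for item, planned in budget.items():
        variance = actual[item] - planned
        is_cost = item in cost_items
        if variance == 0:
            label = "on budget"
        elif (variance > 0) == is_cost:
            # Cost above budget, or revenue below budget.
            label = "adverse"
        else:
            label = "favourable"
        report[item] = (variance, label)
    return report


# Hypothetical budget heads and figures, for illustration only.
budget = {"sales": 500000, "materials": 200000, "wages": 120000}
actual = {"sales": 470000, "materials": 190000, "wages": 120000}

for item, (variance, label) in budget_variances(budget, actual, {"materials", "wages"}).items():
    print(item, variance, label)
```

Here the sales shortfall is the adverse variance that calls for corrective steps, while the saving on materials is favourable.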

4. Explain how management audit serves as an effective technique of controlling.

Management audit is the process of appraisal of the management in an organisation. It is useful in improving the effectiveness of the management in carrying out its objectives. It evaluates the functions of the managers and highlights areas where deficiencies are observed. The following points explain the importance of management audit as a technique of controlling:

1. It helps in identifying deficiencies in the work, which will help in taking corrective measures necessary for improvement.

2. By performing a management audit, various management activities can be monitored, which helps in improving the overall efficiency of the organisation.

3. Enhanced coordination can be observed between employees and departments as work is monitored for effectiveness.

4. It helps the organisation adapt to environmental changes, which is ensured by keeping strategies and policies continuously updated.

5. Mr. Arfaaz had been heading the production department of Writewell Products Ltd., a firm manufacturing stationery items. The firm secured an export order that had to be completed on a priority basis, and production targets were defined for all the employees. One of the workers, Mr. Bhanu Prasad, fell short of his daily production target by 10 units for two days consecutively. Mr. Arfaaz approached Ms. Vasundhara, the CEO of the Company, to file a complaint against Mr. Bhanu Prasad and requested her to terminate his services. Explain the principle of management control that Ms. Vasundhara should consider while taking her decision.

In this situation, the principle of management by exception should be followed. This principle states that any effort to control everything may end up having control of nothing. Only deviations that are beyond the limit will need to be acknowledged, and appropriate actions need to be taken. Therefore, Mr. Bhanu should not be terminated for such a small reason.

Long Questions NCERT Business Studies Solutions Class 12 Chapter 8

1. Explain the various steps involved in the process of control.

Controlling is a systematic approach to managing the activities in an organisation. It includes the following steps:

1. Setting standards: This step involves setting standards and developing benchmarks on the basis of which actual performance can be determined. Standards can be either qualitative or quantitative.

2. Measure actual performance: After setting standards, the next step is determining the actual performance that is taking place through the activities. These can be determined by observation and obtaining data from performance reports.

3. Comparing performances: This step involves comparing the actual performance with the standard. It helps determine the deviations, which guides managers in assessing the performance and taking necessary steps.

4. Analysing deviation: When comparing actual performance with set standards, there will be deviations. It is, therefore, important to find these deviations in the key areas. The methods most used are Critical Point Control and Management by exception.

5. Corrective measures: When deviations reach beyond admissible limits, the management needs to take corrective actions. This step is all about correcting the errors so that they do not happen again. It is the last step that is taken in the process of controlling.
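The five steps above can be sketched as one pass of a control cycle. The Python below is a simplified illustration, not a prescribed method: the `Standard` class, the activity names, targets and limits are all hypothetical, and critical point control is reduced to the assumption that only activities marked as key result areas (KRAs) are analysed.

```python
from dataclasses import dataclass


@dataclass
class Standard:
    name: str
    target: float      # step 1: standard set in measurable terms
    limit_pct: float   # permissible deviation, in percent
    key_area: bool     # True if the activity is a key result area (KRA)


def compare(standard, actual):
    """Step 3: deviation of actual from standard, as a percentage."""
    return (actual - standard.target) / standard.target * 100


def control_cycle(standards, measurements):
    """Run steps 2 to 5 once and return the items needing corrective action."""
    actions = []
    for s in standards:
        actual = measurements[s.name]    # step 2: measure actual performance
        deviation = compare(s, actual)   # step 3: compare with the standard
        # Step 4: analyse. Critical point control looks only at KRAs;
        # management by exception ignores deviations within the limit.
        if s.key_area and abs(deviation) > s.limit_pct:
            actions.append((s.name, round(deviation, 1)))  # step 5: act on these
    return actions


# Hypothetical standards and measurements, for illustration only.
standards = [
    Standard("daily_output", 500, 2.0, True),
    Standard("rejection_rate", 4.0, 10.0, True),
    Standard("stationery_cost", 1000, 5.0, False),
]
measurements = {"daily_output": 495, "rejection_rate": 5.0, "stationery_cost": 1300}

print(control_cycle(standards, measurements))  # [('rejection_rate', 25.0)]
```

The 30% overrun in stationery cost is ignored because it is not a key area, and the 1% shortfall in output is within its limit, so only the rejection rate calls for corrective action.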

2. Explain the techniques of managerial control.

Managerial control techniques can be divided into two categories:

1. Traditional Technique

2. Modern Technique

Traditional techniques are techniques that were followed by managers in the old days. The following are the techniques followed:

1. Personal observation: Managers oversee the work of employees directly in this technique. They get first-hand information, and workers who know they are being observed tend to perform well. However, it is a time-consuming process.

2. Statistical reports: Managers can get performance data in the form of averages, percentages or ratios, which can be easily represented in charts and graphs. This makes it easier to compare performance with standards.

3. Break-even analysis: Break-even analysis studies the relationship between costs, volume and profit. The break-even point is the point where total costs become equal to total revenue, so there is neither profit nor loss. Using this technique, managers can determine profit or loss at different levels of activity and thereby devise ways to generate profit.

4. Budgetary control: It is a technique where future business operations are determined in the form of budgets. It sets standards for measuring actual performance.
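Break-even analysis, listed among the traditional techniques above, lends itself to a short worked example. The sketch below uses the standard relationship break-even units = fixed costs / (selling price per unit − variable cost per unit); the function names and all figures are hypothetical and chosen only for illustration.

```python
def break_even_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Units at which total revenue equals total cost (no profit, no loss)."""
    contribution = price_per_unit - variable_cost_per_unit
    if contribution <= 0:
        raise ValueError("Selling price must exceed variable cost per unit")
    return fixed_costs / contribution


def profit(units, fixed_costs, price_per_unit, variable_cost_per_unit):
    """Profit (positive) or loss (negative) at a given sales volume."""
    return units * (price_per_unit - variable_cost_per_unit) - fixed_costs


# Hypothetical figures: fixed costs 60,000; price 50; variable cost 30 per unit.
print(break_even_units(60000, 50, 30))  # 3000.0
print(profit(4000, 60000, 50, 30))      # 20000
print(profit(2000, 60000, 50, 30))      # -20000
```

Sales above 3,000 units produce a profit and sales below that produce a loss, which is the information a manager uses to set volume targets.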

Modern techniques were introduced more recently. They provide ideas for better control. The following are the modern methods:

1. Return on investment: It is referred to as the gains or similar benefits that are earned on the amount of investment done. It gives a good idea of the returns a company is earning with the amount of investment done.

2. Ratio analysis: It is calculating various ratios for analysing the financial statement. Ratios such as liquidity ratio, solvency ratio etc., help determine the stability of a business.

3. Responsibility accounting: Various responsibility centres are established, and each centre head is responsible for the outcome of the centres. The responsibility centre includes the cost centre, revenue centre, investment centre and profit centre.

4. Management audit: It is an audit of the management processes. It checks the capability of the management and identifies the deficiencies present in the system. It is done through continuous monitoring.

5. PERT and CPM: These are modern management techniques that help with scheduling and resource allocation and enable project execution in the most effective way. These techniques are used in construction, shipbuilding and similar industries.

6. MIS: A Management Information System provides information that supports effective decision-making. It is also cost-effective and helps in the collection and dispersal of information across levels.
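Return on investment and ratio analysis, from the list above, come down to simple arithmetic. The following Python sketch shows one common form of each measure: ROI as net income over total investment, a current ratio as a liquidity measure, and a debt-equity ratio as a solvency measure. The function names and the figures are assumptions for illustration; textbooks may define the numerators and denominators slightly differently.

```python
def return_on_investment(net_income, total_investment):
    """ROI as a percentage of the amount invested."""
    return net_income * 100 / total_investment


def current_ratio(current_assets, current_liabilities):
    """Liquidity ratio: ability to meet short-term obligations."""
    return current_assets / current_liabilities


def debt_equity_ratio(long_term_debt, shareholders_funds):
    """Solvency ratio: long-term debt relative to owners' funds."""
    return long_term_debt / shareholders_funds


# Hypothetical figures, for illustration only.
print(return_on_investment(150000, 1000000))  # 15.0
print(current_ratio(400000, 200000))          # 2.0
print(debt_equity_ratio(300000, 600000))      # 0.5
```

Each result becomes useful for control only when it is compared with a standard, such as last year's figure or a target set in the plan.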

3. Explain the importance of controlling in an organisation. What are the problems faced by the organisation in implementing an effective control system?

The following points explain the importance of controlling in an organisation:

1. Controlling helps in achieving organisational goals by optimum use of resources and correcting deficiencies in the process.

2. It helps in determining the accuracy of the standards set by management. It also helps in reviewing the standards as per changing business requirements.

3. It helps an employee to become motivated as they know what the management expects from them.

4. It also enables effective decision-making in the organisation by promoting order and discipline.

5. It improves coordination among employees and departments, which helps organisational productivity.

Controlling is effective for management, but there are certain problems that are faced by organisations which are highlighted below:

1. Standards cannot always be set in quantitative terms, and when they are defined in qualitative terms, controlling becomes less effective.

2. An organisation has little control over external factors, so changes in the business environment force changes in its control mechanisms.

3. Controlling will be resisted if it is against the comfort level of employees.

4. Controlling is a costly affair as infrastructure needs to be set up.

4. Discuss the relationship between planning and controlling.

Planning and controlling are very closely related functions of management. Planning is the process of creating a structure in advance regarding the work that needs to be done. It is helpful in defining the objectives and goals that need to be achieved by an individual or organisation. Therefore, it is said that planning is about looking ahead, which involves predicting the future. Controlling involves assessing past performance and comparing it with set standards. Due to these characteristics, it can be said to be a backward-looking function.

Various objectives that are determined by planning will serve as a set of standards against which performance will be determined. If no standards and objectives are present, there will be no control necessary. Similarly, only by planning without control no one will monitor the work, which will lead to inefficiency and lack of productivity. Planning and control complement each other.

But these statements are only partially correct: planning is done based on past experiences and how to do better, and controlling, although it looks at past performance, aims to improve future performance. Hence, it can be said that both planning and controlling are forward and backward-looking functions and are very important from an organisation’s point of view.

5. A company, ‘M’ limited, manufactures mobile phones, both for the domestic Indian market as well as for export. It enjoyed a substantial market share and also had a loyal customer following. But lately, it has been experiencing problems because its targets have not been met with regard to sales and customer satisfaction. Also, the mobile market in India has grown tremendously, and new players have come with better technology and pricing. This is causing problems for the company. It is planning to revamp its controlling system and take other steps necessary to rectify the problems it is facing.

a. Identify the benefits the company will derive from a good control system.

b. How can the company relate its planning with control in this line of business to ensure that its plans are actually implemented and targets attained?

c. Give the steps in the control process that the company should follow to remove the problems it is facing.

a. The company will derive the following benefits from a good control system:

i. Deficiencies in the system will be identified, and corrective steps can be taken accordingly. It helps the organisation to move towards the objective in the right way.

ii. Accuracy of set standards can be determined. If needed, the set of standards can be appropriately modified.

iii. Optimum resource utilisation will occur, so there will be less wastage of resources and more efficiency.

iv. The employees will be aware of their roles and expectations from the management, which motivates them to achieve the objective of the organisation.

b. Planning and controlling are closely related functions. While planning is all about what objectives need to be achieved and the steps to follow, controlling is about evaluating the work as per standards and taking necessary corrective actions as required. In the current situation, plans can be made with regard to customer satisfaction, sales and pricing policy. In the event of a lack of standards, there will be no control.

c. The company should follow the steps as mentioned below:

1. Standards should be set up which will serve as a benchmark for comparison against actual performance. Standards can be either qualitative or quantitative.

2. After setting the standards, actual performance needs to be measured. It can be done by personal observation and collecting reports of performance.

3. The next step would be to compare the performances with standards and find deviations; then, necessary corrective steps can be taken to rectify them.

4. Deviations that are over the permissible range should be worked upon. It can be analysed using critical point control and management by exception methods.

5. The corrective steps are the last part of controlling as it works towards correcting deficiencies of the organisation.

6. Mr. Shantanu is the Chief Manager of a reputed company that manufactures garments. He called the production manager and instructed him to keep a constant and continuous check on all the activities related to his department so that everything goes as per the set plan. Also, the Chief Manager suggested the production manager to keep track of the performance of all the employees in the organisation so that targets are achieved effectively and efficiently.

a. Describe any two features of controlling highlighted in the above situation. (Goal oriented, continuous and pervasive – any 2).

b. Explain any five points of importance of controlling.

a. Following features of controlling are highlighted here:

i. By keeping a close watch on the progress of work and constantly engaging in work towards attaining the goals of the organisation, it is a goal-oriented approach.

ii. Controlling is a pervasive function which can be exercised by managers of any level, division or department.

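b. The following are the points of importance of controlling:

1. Controlling helps in achieving the organisational goals by optimum use of resources and correcting deficiencies in the process.

2. It helps in determining the accuracy of the standards set by management. It also helps in reviewing the standards as per changing business requirements.

3. It helps employees become motivated as they know what the management expects from them.

4. It enables effective decision-making in the organisation by promoting order and discipline.

5. It improves coordination among employees and departments, which helps organisational productivity.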

NCERT Solutions for Class 12 Business Studies Chapter 8 – Controlling provides students with a comprehensive introduction to the concepts. It provides a clear picture of how to control the staff and management.

The concepts covered in this chapter are listed below:

  • Meaning of Controlling
  • Limitations of Controlling
  • Relationship between Planning and Controlling
  • Controlling Process

NCERT Solutions for Class 12 Business Studies Chapter 8 provides a wide range of illustrative examples, which helps the students to comprehend and learn quickly. The above-mentioned are the solutions according to the Class 12 CBSE syllabus. For more solutions and study materials of NCERT solutions for Class 12 Business Studies , visit BYJU’S or download BYJU’S – The Learning App for more information.


Controlling Class 12

Updated on Feb 20, 2021

Controlling Class 12 Business Studies Notes

Controlling is an important chapter in the syllabus of Business Studies Class 12. Learning this chapter helps you understand the meaning and importance of controlling, the relationship between planning and controlling, and the processes and techniques a manager may use to control adverse business situations effectively. Through this blog, we provide you with concise revision notes in Business Studies that will help you with your exam preparation for Controlling Class 12.


According to the Class 12 chapter, controlling is a management function which involves:

  • Setting standards
  • Measuring actual performance
  • Comparison of actual performance with the planned performance
  • Taking corrective action

Importance of Controlling

The chapter elaborates on the importance of controlling, as listed below.

  • Helps in achieving organizational goals and indicates deviations if any to take corrective action.
  • Judges accuracy of standards by carefully checking the changes taking place in an organizational environment.
  • Makes efficient use of resources and enables a manager to reduce wastage of resources.
  • It improves employee motivation by letting them know well in advance what they are expected to do and the standards of performance.
  • Ensures order and discipline by keeping a close check on the activities of its employees.
  • Facilitates Coordination in action by setting predetermined actions for governing various departments.

Limitations in Controlling

Moving further in the chapter, we look at the various limitations of controlling. They are mentioned below.

  • Difficulty in setting quantitative standards thus leading to loss in its effectiveness.
  • Little control on external factors like  government policies, technological changes, competition etc.
  • Resistance from employees when exerting control.
  • Costly affair involving expenditure, time and effort.

Features of Controlling

Controlling as a process has various features, as listed below.

  • Goal oriented
  • Is a backward looking function, as it looks back at the performance achieved by employees
  • Is a forward looking function
  • Depends on planning
  • Action oriented
  • Primary Function of Management
  • Brings the management cycle back to planning

Relationship between Planning and Controlling

Moving further in the chapter, we look at the relationship between planning and controlling. Planning and controlling are interrelated and influence each other.

  • Planning is necessary for controlling, as plans set the standards against which control is exercised.
  • Planning is only meaningful when control is exercised. Controlling identifies deviations and initiates corrective measures.
  • The effectiveness of planning can be measured by controlling.
  • Planning is future-oriented and involves looking in advance and making policies to maximise resources in the future. Whereas controlling is looking back on the performance already achieved by the employees and comparing it with the set standards.
  • Therefore they are connected as planning makes controlling effective whereas controlling improves future planning.

Controlling Processes

Moving further in the chapter, we identify the steps in the process of controlling. It is a 5-stage process which includes the following:

  • Setting performance standards
  • Measurement of actual performance
  • Comparing the actual performance with standards
  • Analysing deviations, through two approaches:
      • Critical Point Control – Control should focus on key result areas (KRAs) which are critical to the success of an organisation. These KRAs are set as the critical points.
      • Management by Exception – It is based on the belief that any attempt to control everything results in controlling nothing. In short, everything cannot be controlled at the same time.
  • Taking corrective action

Techniques of Managerial Control

The techniques of managerial control can be classified into 2 broad types.

Traditional Techniques

The types of traditional techniques are mentioned below.

  • Personal Observation – collection of first-hand information; it is very time-consuming and cannot be used in all kinds of jobs.
  • Statistical Reports – statistical analysis in the form of averages, percentages, ratios, correlation, etc.
  • Break Even Analysis – a technique to study the relationship between costs, volume and profits.
  • Budgetary Control – all activities are planned in advance in the form of budgets and actual results are compared with budgetary standards.

There are many kinds of budget, which form a part of the traditional techniques. They are mentioned below.

  • Sales budget
  • Production budget
  • Material budget
  • Cash budget
  • Capital budget
  • Research and development budget

Modern Techniques

The types of modern techniques are mentioned below.

  • Return on Investment – a technique which provides the basic tool for measuring whether or not invested capital has been used effectively for generating a reasonable amount of return.
  • Ratio Analysis – the analysis of financial statements through ratios such as:
      • Solvency Ratios
      • Profitability Ratios
      • Liquidity Ratios
      • Turnover Ratios
  • Responsibility Accounting – the organisation is divided into responsibility centres, each with a head who is accountable for its results:
      • Cost Centre: a segment of an organisation in which the manager is held responsible for the costs incurred. E.g. the production department of a manufacturing unit.
      • Revenue Centre: a segment of an organisation which is responsible for the generation of revenue. E.g. the marketing department.
      • Profit Centre: a segment of an organisation where the manager is responsible for both revenues and costs. E.g. the repair and maintenance department.
      • Investment Centre: a segment responsible for profits and for all investments made in the centre, e.g. assets.
  • Management Audit – the performance appraisal of the management of an organisation.

PERT and CPM

PERT (Programme Evaluation and Review Technique) and CPM (Critical Path Method) are important techniques that help plan and control complex projects. They are useful in the implementation of time-bound projects which comprise complex, diverse and interrelated activities.
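The core calculation behind CPM is finding the longest chain of dependent activities, since that chain fixes the minimum project duration. The Python sketch below is a minimal illustration under stated assumptions: the activity names, durations and dependencies are hypothetical, every duration is assumed positive, and the dependencies are assumed to contain no cycles.

```python
def critical_path(activities):
    """activities maps name -> (duration, [predecessor names]).

    Returns (project_duration, activities_on_the_critical_path).
    """
    earliest_finish = {}
    came_from = {}

    def finish(name):
        # Earliest finish = duration + latest earliest-finish among predecessors.
        if name not in earliest_finish:
            duration, predecessors = activities[name]
            start, previous = 0, None
            for p in predecessors:
                if finish(p) > start:
                    start, previous = finish(p), p
            earliest_finish[name] = start + duration
            came_from[name] = previous
        return earliest_finish[name]

    last = max(activities, key=finish)  # activity that finishes last
    path = []
    while last is not None:             # walk back along the longest chain
        path.append(last)
        last = came_from[last]
    return earliest_finish[path[0]], path[::-1]


# Hypothetical project: durations in days.
project = {
    "A": (3, []),
    "B": (2, ["A"]),
    "C": (4, ["A"]),
    "D": (2, ["B", "C"]),
}

print(critical_path(project))  # (9, ['A', 'C', 'D'])
```

Activity B has two days of slack, so a delay there need not delay the project, whereas any delay in A, C or D pushes the completion date. This is why managers concentrate control on the critical path.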

Management Information System

Next is the management information system. A Management Information System (MIS) is a computer-based system. It is an important control technique and provides information and support for managerial decision-making. Some of the advantages of MIS are:

  • It helps with the collection, management and dissemination of information.
  • It bolsters planning and controlling.
  • It enhances the quality of information.
  • It maintains cost effectiveness.
  • It helps reduce information overload


This was all about the chapter Controlling class 12. We hope this blog about Controlling class 12 will help you with a quick revision while preparing for the exam. If you want to avail similar notes in other subjects, check out our blogs at Leverage Edu . Confused about what to do after class 12th? Get in touch with our experts who will guide you every step of the way! Sign up for a free session today!


Team Leverage Edu


NCERT Solutions for Class 6, 7, 8, 9, 10, 11 and 12

NCERT Solutions for Class 12 Business Studies Chapter 8 Controlling

September 30, 2019 by Sastry CBSE


Short Answer Type Questions

1. Explain the meaning of controlling. Ans:  Controlling means ensuring that activities in an organisation are performed as per the plans. It also ensures that an organisation’s resources are used effectively and efficiently for the achievement of desired goals. Controlling is thus a goal-oriented function. It is an important managerial function because it enables a manager to compare actual performance with planned performance. Managers at all levels need to perform the controlling function to keep the activities under their charge on track.

2. ‘Planning is looking ahead and controlling is looking back.’ Comment. Ans:  Planning and controlling are inseparable. Planning is the primary function of every organisation. It is a thinking process that involves looking ahead and deciding how the desired goals will be achieved in future, so it is called a forward-looking function. Controlling, on the other hand, is a systematic function that measures actual performance against planned performance. It compares and analyses what has been done and takes corrective action, so it is called a backward-looking function. However, the statement “Planning is looking ahead and controlling is looking back” is only partially correct, because planning is guided by past experience, and the corrective action initiated by the control function aims to improve future performance. Thus, planning and controlling are both backward-looking as well as forward-looking functions.

3. ‘An effort to control everything may end up in controlling nothing’. Explain. Ans:  As the saying goes, “Jack of all trades, master of none”. When a manager tries to control everything, the result is that nothing is controlled, because it is neither economical nor easy to keep a check on every activity at the same time. Control should therefore focus on Key Result Areas (KRAs). Instead of controlling all activities, the manager controls the critical points where something going wrong would make the organisation suffer. KRAs are set as the critical points, so the manager knows exactly what has to be controlled.

4. Write a short note on budgetary control as a technique of managerial control. Ans:  Budgetary control is a technique of managerial control in which all operations are planned in advance in the form of budgets, which show how much has to be spent to achieve the desired results. Actual results are then compared with the budgetary standards. This comparison reveals the actions needed so that the organisational objectives are accomplished. Budgeting offers the following advantages (i) Budgeting focuses on specific and time-bound targets. (ii) Budgeting is a source of motivation for employees, who know the standards against which their performance will be appraised, and it thus enables them to perform better. (iii) Budgeting helps in optimum utilisation of resources by allocating them according to the requirements of different departments. (iv) It helps the management in setting standards.
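The comparison of actual results with budgetary standards described above can be illustrated with a minimal Python sketch. The department names and figures are hypothetical, chosen only for illustration, and the sign convention (positive variance means overspending) is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class BudgetLine:
    department: str
    budgeted: float  # planned spend for the period (hypothetical figure)
    actual: float    # spend actually recorded (hypothetical figure)

    @property
    def variance(self) -> float:
        # Assumed convention: positive variance means spending above budget.
        return self.actual - self.budgeted


def over_budget(lines):
    """Return the budget lines whose actual spend exceeds the budget."""
    return [line for line in lines if line.variance > 0]


lines = [
    BudgetLine("Production", budgeted=500_000, actual=540_000),
    BudgetLine("Marketing", budgeted=200_000, actual=185_000),
]

for line in lines:
    print(f"{line.department}: variance = {line.variance:+,.0f}")
```

Only the departments returned by `over_budget` would call for corrective action in this sketch; a real budget would also look at under-spending and at revenue targets.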

5. Explain how management audit serves as an effective technique of controlling. Ans:  Management audit is a technique that helps in measuring the efficiency and effectiveness of management. It is a comprehensive and constructive review of the overall performance of management, carried out with a view to improving its efficiency in future periods. It serves as an effective technique of controlling for the following reasons. (i) It helps to locate present and potential deficiencies in the performance of management functions. (ii) It helps to improve the control system of an organisation by continuously monitoring the performance of management. (iii) It ensures updating of existing managerial policies and strategies in the light of environmental changes. This results in efficient controlling of management.

Long Answer Type Questions

1. Explain the various steps involved in the process of control. Ans:  Controlling is a systematic process involving the following steps.

(i) Setting Performance Standards  The first step in the controlling process is setting up of performance standards. Standards are the criteria against which actual performance would be measured. Standards can be set in both quantitative as well as qualitative terms. Some of the quantitative standards are cost to be incurred, product units to be produced, time to be spent in performing a task etc. Improving goodwill and the motivation level of employees are examples of qualitative standards.

(ii) Measurement of Actual Performance  Once performance standards are set, the next step is measurement of actual performance. Performance should be measured in an objective and reliable manner. Some of the techniques used for measuring performance are personal observation, sample checking, performance reports etc.

(iii) Comparing Actual Performance with Standards  This step involves comparison of actual performance with the standards. Such comparison will reveal the deviation between actual and desired results. Comparison becomes easier when standards are set in quantitative terms. For instance, the performance of a worker in terms of units produced in a week can be easily measured against the standard output for the week.

(iv) Analysing Deviations  Some deviations in performance can be expected in all activities. It is, therefore, important to determine the acceptable range of deviations. Also, deviations in key areas of business need to be attended to more urgently than deviations in insignificant areas. Critical point control and management by exception should be used by a manager in this regard.

(v) Taking Corrective Action  The final step in the controlling process is taking corrective action. No corrective action is required when the deviations are within acceptable limits. However, when the deviations go beyond the acceptable range, especially in the important areas, they demand immediate managerial attention so that deviations do not occur again and standards are accomplished. In case the deviations cannot be corrected through managerial action, the standards may have to be revised.
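Steps (iii) to (v) can be sketched in Python as a small check that compares actual output with the standard and flags only deviations beyond an acceptable range, in the spirit of management by exception. The standard of 1,000 units, the 5% tolerance and the worker figures are hypothetical values assumed for illustration.

```python
def analyse_deviation(standard, actual, tolerance_pct):
    """Compare actual performance with the standard.

    Returns (deviation, deviation_pct, needs_action). needs_action is True
    only when the deviation falls outside the acceptable range.
    """
    deviation = actual - standard
    deviation_pct = deviation * 100 / standard
    needs_action = abs(deviation_pct) > tolerance_pct
    return deviation, deviation_pct, needs_action


weekly_standard = 1000  # standard output in units per week (assumed)
weekly_output = {"Worker A": 980, "Worker B": 850}  # assumed measurements

for name, actual in weekly_output.items():
    deviation, pct, needs_action = analyse_deviation(
        weekly_standard, actual, tolerance_pct=5
    )
    status = "corrective action needed" if needs_action else "within limits"
    print(f"{name}: deviation {deviation:+d} units ({pct:+.1f}%), {status}")
```

In this sketch Worker A's shortfall stays within the assumed tolerance and is ignored, while Worker B's shortfall is reported to management.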

2. Explain the techniques of managerial control. Ans:  The various techniques of managerial control may be classified into broad categories.

(i) Traditional Techniques  Traditional techniques are those that have been used by companies for a long time. They have not become obsolete and are still in use.

(a) Personal Observation  Personal observation enables the manager to collect first-hand information. It also creates psychological pressure on the employees to perform well, as they are aware that they are being observed personally on the job.

(b) Statistical Reports  Statistical analysis in the form of averages, percentages, ratios, correlation etc. presents useful information to the managers regarding the performance of the organisation in various areas. Such information, when presented in the form of charts, graphs, tables etc., is easier for managers to read and allows a comparison to be made with performance in previous periods and with benchmarks.

(c) Break-even Analysis  It is a technique used by managers to study the relationship between costs, volume and profits. It determines the probable profits and losses at different levels of activity. The sales volume at which there is no profit and no loss is known as the break-even point. It is a useful technique for managers as it helps in estimating profits at different levels of activity.

(d) Budgetary Control  It is a technique of managerial control in which all operations are planned in advance in the form of budgets and actual results are compared with budgetary standards. This comparison reveals the actions needed so that organisational goals are accomplished. A budget is a quantitative statement for a definite future period of time for the purpose of obtaining a given objective. It is also a statement which reflects the policy of that particular period. It will contain figures of forecasts both in terms of time and quantities.
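The break-even point described above can be worked out with the standard formula: fixed costs divided by the contribution per unit (selling price per unit minus variable cost per unit). The Python sketch below applies it; the cost and price figures are hypothetical.

```python
def break_even_units(fixed_costs, price_per_unit, variable_cost_per_unit):
    """Sales volume at which there is no profit and no loss."""
    contribution = price_per_unit - variable_cost_per_unit
    if contribution <= 0:
        raise ValueError("price must exceed variable cost per unit")
    return fixed_costs / contribution


def profit_at(units, fixed_costs, price_per_unit, variable_cost_per_unit):
    """Profit (or loss, if negative) at a given level of activity."""
    return units * (price_per_unit - variable_cost_per_unit) - fixed_costs


# Hypothetical figures: fixed costs 1,00,000; price 50; variable cost 30.
bep = break_even_units(100_000, 50, 30)
print("Break-even point:", bep, "units")

for units in (4_000, 5_000, 6_000):
    print(units, "units ->", profit_at(units, 100_000, 50, 30))
```

Below the break-even volume the function returns a loss, at the break-even volume it returns zero, and above it a profit, which is how the technique helps in estimating profits at different levels of activity.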


3. Explain the importance of controlling in an organisation. What are the problems faced by the organisation in implementing an effective control system? Ans:  Control is an indispensable function of management. Without control, the best of plans can go awry. A good control system helps an organisation in the following ways.

(i) Accomplishing Organisational Goals  The controlling function measures progress towards the organisational goals, brings to light the deviations, if any, and indicates corrective action. It thus guides the organisation and keeps it on the right track so that organisational goals might be achieved.

(ii) Judging Accuracy of Standards  A good control system enables management to verify whether the standards set are accurate and objective. An efficient control system keeps a careful check on the changes taking place in the organisation and in the environment and helps to review and revise the standards in light of such changes.

(iii) Making Efficient Use of Resources  By exercising control, a manager seeks to reduce wastage and spoilage of resources. Each activity is performed in accordance with pre-determined standards and norms. This ensures that resources are used in the most efficient and effective manner.

(iv) Improving Employee Motivation  A good control system ensures that employees know well in advance what they are expected to do and what standards of performance they will be appraised against. It thus motivates them and helps them to give a better performance.

(v) Ensuring Order and Discipline  Controlling creates an atmosphere of order and discipline in the organisation. It helps to minimise dishonest behaviour on the part of the employees by keeping a close check on their activities.

(vi) Facilitating Co-ordination in Action  Controlling provides direction to all activities and efforts for achieving organisational goals. Each department and employee is governed by pre-determined standards which are well co-ordinated with one another. This ensures that overall organisational objectives are accomplished.

Although controlling is an important function of management, it also suffers from the following limitations.

(i) Difficulty in Setting Quantitative Standards  A control system loses some of its effectiveness when standards cannot be defined in quantitative terms. This makes measurement of performance and its comparison with standards a difficult task. Employee morale, job satisfaction and human behaviour are areas where this problem might arise.

(ii) Little Control on External Factors  Generally, an enterprise cannot control external factors such as government policies, technological changes, competition etc.

(iii) Resistance from Employees  Control is often resisted by employees. They see it as a restriction on their freedom. For instance, employees might object when they are kept under a strict watch with the help of Closed Circuit Televisions (CCTVs).

(iv) Costly Affair  Control is a costly affair as it involves a lot of expenditure, time and effort. A small enterprise cannot afford to install an expensive control system, as it cannot justify the expenses involved. Managers must ensure that the costs of installing and operating a control system do not exceed the benefits derived from it.

4. Discuss the relationship between planning and controlling. Ans:  Planning and controlling are inseparable; they are the twins of management. A system of control presupposes the existence of certain standards. These standards of performance, which serve as the basis of controlling, are provided by planning. Once a plan becomes operational, controlling is necessary to monitor the progress, measure it, discover deviations and initiate corrective measures to ensure that events conform to plans. Planning is clearly a prerequisite for controlling. Controlling cannot be accomplished without planning. Without planning, there is no pre-determined understanding of the desired performance. Planning seeks consistent, integrated and articulated programmes, while controlling seeks to compel events to conform to plans.

Application Type Questions

Following are some behaviours that you and others might engage in on the job. For each item, choose the behaviour that management must keep a check on to ensure an efficient control system. 1. Biased performance appraisals. 2. Using company’s supplies for personal use. 3. Asking a person to violate company’s rules. 4. Calling office to take a day off when one is sick. 5. Overlooking boss’s error to prove loyalty. 6. Claiming credit for someone else’s work. 7. Reporting a violation on noticing it. 8. Falsifying quality reports. 9. Taking longer than necessary to do the job. 10. Setting standards in consultation with workers. You are also required to suggest to the management how the undesirable behaviour can be controlled.

Answers 1. To avoid biased appraisal, performance appraisal should be carried out by a committee of experts. 2. If the supplies are not expensive, this can be ignored. 3. Strict and immediate disciplinary action should be taken. 4. Mass bunking should not be allowed. 5. A secret suggestion box can be used to collect feedback about the boss for appraisal. 6. Performance records of employees should be maintained. 7. If the violation is minor, it can be overlooked. 8. Strict quality control techniques should be used. 9. Time and motion study should be used to fix the standard. 10. The use of scientific techniques can help in fixing the most feasible and optimum standards.

Case Problem

A company, M Limited, manufactures mobile phones both for the domestic Indian market and for export. It had enjoyed a substantial market share and a loyal customer following. Lately, however, it has been experiencing problems because its targets for sales and customer satisfaction have not been met. The mobile market in India has also grown tremendously, and new players have entered with better technology and pricing. This is causing problems for the company. It is planning to revamp its controlling system and take the other steps necessary to rectify the problems it is facing.

1. Identify the benefits the company will derive from a good control system. Ans:  When the company follows a good control system in its operations, it derives the following benefits (i) It helps in achieving the desired goals. (ii) It helps in judging the accuracy of standards. (iii) It makes efficient and effective use of resources. (iv) It improves employee morale. (v) It ensures order and discipline throughout the system. (vi) It facilitates co-ordination and improves the performance of every individual.

2. How can the company relate its planning with control in this line of business to ensure that its plans are actually implemented and targets attained? Ans:  The company can relate its planning with control by implementing an effective control system. This helps in two ways: planning makes controlling effective and efficient, whereas controlling improves future planning, because it is like a post-mortem of past activities that finds out deviations from the standards. To ensure that its plans are actually implemented and targets are attained, the company should follow the controlling process, which is systematic and involves the following steps.

(i) Setting Up of Standards  The company sets targets against which actual performance is measured.

(ii) Measuring of Performance  The company measures the performance and evaluates what has actually been done by the employees.

(iii) Comparing Performance  After evaluating the actual result, the company compares actual performance with the planned performance. This shows whether the desired goal has been achieved or not.

(iv) Analysing Deviations  A deviation is the difference between actual and desired performance. Analysing it tells the company whether the deviation is positive or negative and which areas need attention, rather than analysing the whole.

(v) Taking Corrective Measures  The final step is to identify the type of deviation and remove it so that performance matches the plans in future.

3. Give the steps in the control process that the company should follow to remove the problems it is facing. Ans:  The company should follow these steps in a systematic manner (i) Setting performance standards (ii) Measurement of actual performance (iii) Comparison of actual performance with standards (iv) Analysing deviations (v) Taking corrective actions

4. What techniques of control can the company use? Ans:  The company should use the modern techniques of control.

(i) ROI (Return on Investment)  It is a useful technique for controlling the overall performance of a company. It indicates how effectively resources are being used, facilitates balanced use of capital employed, focuses on profits and relates them to capital invested.

(ii) Responsibility Accounting  Under this technique, the organisation is divided into centres, each of which is responsible for the performance of its own department or section. Responsibility centres in the organisation are (a) Cost Centre  This centre is responsible for production and operational costs. (b) Revenue Centre  Sales or marketing departments come under this. It is responsible for generating revenue. (c) Profit Centre  Profit = Revenue – Cost. This centre is responsible for both revenue and cost, and hence for the profit derived from the business. (d) Investment Centre  This centre ensures the optimum use of assets and makes use of return on investment.

(iii) MIS (Management Information System)  It is a control technique which provides information and support for effective managerial decision-making. It provides accurate information to managers, helps in planning and controlling, and is a cost-effective source of information.
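The two formulas behind these techniques can be sketched in Python. Profit = Revenue – Cost comes from the answer above; the ROI formula used here (net income as a percentage of capital employed) is a common textbook form and is an assumption of this sketch, as are all the figures.

```python
def profit(revenue, cost):
    """Profit centre measure: Profit = Revenue - Cost."""
    return revenue - cost


def roi_percent(net_income, capital_employed):
    """Return on investment as a percentage of capital employed (assumed form)."""
    if capital_employed <= 0:
        raise ValueError("capital employed must be positive")
    return net_income * 100 / capital_employed


# Hypothetical figures for an investment centre of M Limited.
revenue = 1_200_000
cost = 900_000
capital_employed = 2_000_000

net_income = profit(revenue, cost)
print("Profit:", net_income)
print("ROI (%):", roi_percent(net_income, capital_employed))
```

Comparing this percentage across periods or across investment centres shows where capital is being used more or less effectively.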


Extra Questions of Class 12 Business Studies Controlling


Extra Questions of Class 12 Business Studies Controlling. myCBSEguide has released chapter-wise question answers for Class 12 Business Studies. These chapter-wise practice questions with complete solutions are available for download on the myCBSEguide website and mobile app. The test papers with solutions are prepared by our team of expert teachers who have been teaching in CBSE schools for years. There are around 4-5 sets of solved Business Studies test papers for each chapter. The chapter-wise questions are designed for the Board Exam so that students do not miss any concept. They cover every concept in the CBSE Class 12 Business Studies syllabus and are framed as per the latest marking scheme and blueprint issued by CBSE for Class 12.

CBSE Class 12 Business Studies Ch – 8


Business Studies Practice Question for Class 12 Chapter 8

Budgetary control requires the preparation of (1)

  • Responsibility centres
  • Network diagram
  • Training schedule

Controlling is blind without _________________ (1)

  • Capital market

Controlling is _________aspect of management (1)

  • Theoretical

What will be the corrective action for defective material? (1)

  • None of these
  • Change in Quantity
  • Change in Quality Specifications of the material used
  • Change in Price

What corrective action should be taken in case deviations are caused by defective machinery? (1)

What is important while analyzing deviations in controlling? (1)

Name the concept which suggests that only significant deviations which go beyond the permissible limit should be brought to the notice of management. (1)

Standards are set in quantitative as well as in qualitative terms. Identify the type of standards when it is set in terms of ‘time to be spent’. (1)

Why is it said that controlling is blind without planning? (3)

K&K Co. Ltd. is engaged in manufacturing of machine components. The target of production is 200 units daily. The company had been successfully attaining this target until two months ago. Over the last two months it has been observed that daily production varies between 150-170 units. Identify the possible causes for the decline in production and the steps to be taken to achieve the desired targets. (3)

Explain briefly the relationship between controlling and planning. (4)

“There is a close and reciprocal relationship between planning and controlling.” Explain the statement. (4)

“If planning is done carefully and accordingly other functions of management are going in the right direction, then there is no need for the controlling function of management”. Do you agree with the statement? Give any two reasons in support of your answer. (5)

“In the absence of a managerial function, planning goes unchecked“. Name the function and explain its importance. (5)

Explain the importance of controlling in an organization. What are the problems faced by the organization in implementing an effective control system? (6)

Chapter – 8 Controlling

  • Budgets Explanation: To audit performance means to compare it with the standards set, so it is important to prepare budgets, that is, to set standards in terms of achievable targets. For example, in a sales budget the number of units to be sold by each salesman is budgeted, which makes it possible to measure the variance in performance.
  • Planning Explanation: Setting performance standards through the process of planning is the first step of management control. A work plan, finance plan or strategic plan gives direction for performance and audit. Without a work plan, controlling would be like working without any foresight of the path to be taken.
  • Practical Explanation: Controlling is the doing function of management. It involves comparing the executed work with the planned work, measuring the difference and communicating it to the performer, so it is a practical job. It is neither a physical task, nor a mental exercise, nor a proposed theory.
  • Change in Quality Specifications of the material used Explanation: As the material used is defective the quality of the end product would also be defective. Thus, the corrective action to be taken by the materials management department or the stores manager is to check the quality of the material purchased and change the quality specifications of the material used.
  • Replacement or repairing the machinery can be a corrective action if it is causing deviations in the set targets.
  • While analysing deviations, it is important to focus more on the key areas which can affect an organization’s main objectives.
  • The concept is Management by Exception (MBE), which says that only significant deviations beyond the permissible limit should be brought to the notice of managers.
  • ‘Time to be spent’ comes under quantitative standards.
  • In the absence of controlling, actual performance will not be measured and compared. So, how far plans are implemented cannot be known.
  • The possible causes for the decline in production are: 1. There may be some defect in the machinery by which the components are produced. 2. Employees may not be performing efficiently. 3. Employees may have become lethargic. The company can take the following steps to achieve the desired targets: 1. If the fault is in the machinery, the company can replace it or get it repaired. 2. If employees are not performing efficiently, the company can give them training. 3. If employees are becoming lethargic, the company needs to supervise them closely.
  • Planning provides the basis for controlling activities.
  • Controlling ensures that planned goals are achieved efficiently and effectively. It measures the performance with the predetermined standards and finds out the deviations if any.
  • The causes of deviation as identified by controlling are the basis of effective future planning.
  • Planning and controlling both are forward-looking and backward-looking.
  • Both are integral parts of an organisation and are necessary for the smooth functioning of an enterprise.
  • Planning precedes controlling and controlling succeeds planning.
  • The process of planning and controlling works on the Systems Approach, which is “Planning → Results → Corrective Action”.
  • Planning is the basis for control in the sense that it provides the entire spectrum on which the control function is based. In fact, these two terms are often used together in the designation of the department which carries out production, planning and scheduling. Control measures the behaviour and activities in the organisation and suggests measures to remove deviations, if any. Control is the result of particular plans, goals and policies. Thus, planning offers and affects control. Planning is also affected by control, in the sense that much of the information provided by control is used for planning. Thus, there is a reciprocal relationship between planning and controlling.
  • Accomplishing organizational goals: Controlling plays an important role in the achievement of organizational goals. Organizational goals can be achieved only if all activities are going according to the plan. Through controlling managers ensure that all activities are taking place according to the plan and also measure that an organization is progressing towards its goals. If there is any deviation, they take corrective action. In this way, controlling is helpful in achieving the goals of the organisation.
  • Judging accuracy of standards: While performing the function of controlling, a manager compares the actual work performance with the standards. He tries to find out whether the laid-down standards are higher or lower than the general standards. If needed, they are redefined.
  • Accomplishing organizational goals: The process of controlling helps in accomplishing organizational goals or objectives. Controlling guides the activities of subordinates towards the goals. It ensures the use of human and material resources in the best possible manner so that the predetermined objectives of the organization are achieved.
  • Judging accuracy of the standards: A manager compares the actual work performance with the standards while performing the function of controlling. He tries to find out whether the standards are higher or lower than the general standards. If needed, they are redefined.
  • Improves efficiency: The organization sets goals for a future which is not certain. Controlling helps it deal with this uncertainty and attain its goals. Regular control shows the deviation between the plan and actual achievement, which helps to keep the staff on the right track.
  • Improves employee motivation: Motivation is defined as the process of inspiring someone to do something. Controlling makes all employees work with complete dedication, as they know that their work performance will be evaluated. Their identity will be established in the organization if the progress report is satisfactory.
  • Ensuring order and discipline: The implementation of controlling helps to check undesirable activities like theft, corruption, delay in work and uncooperative attitude. Controlling thus ensures order and discipline in the organization.
  • Accomplishing organizational goals: The controlling process is implemented to take care of the plans. With the help of controlling, deviations are immediately detected and corrective action is taken. Therefore, the difference between the expected results and the actual results is reduced to the minimum. In this way, controlling is helpful in achieving the goals of the organisation.
  • Judging accuracy of standards: While performing the function of controlling, a manager compares the actual work performance with the standards. He tries to find out whether the laid-down standards are higher or lower than the general standards. If needed, they are redefined.
  • Making efficient use of resources: Controlling plays an important role in reducing the wastage and spoilage of resources and ensures that the resources of an organization, i.e. technical, human, financial resources etc., are being used effectively and efficiently for the achievement of predetermined goals.
  • Improving employee motivation: A good control system communicates the goals and standards to employees well in advance. An effective control helps in removing the weaknesses of the employees so that they can contribute to the best of their efforts. It motivates them and helps them to give better performance.
  • Ensuring order and discipline: Controlling ensures order and discipline. With its implementation, all the undesirable activities like theft, corruption, delay in work and uncooperative attitude are checked.
  • Facilitating coordination in action: Controlling provides direction to all activities and efforts for achieving organizational goals. It facilitates coordination between different departments by laying down standards of performance. All departments are governed by the pre-determined standards which are well coordinated with one another. This ensures that overall organizational objectives are achieved.

Problems faced by the organization in implementing an effective control system are :

  • Difficulty in setting quantitative standards: Standards cannot always be set in quantitative terms. The controlling function becomes less effective when standards cannot be defined in quantitative terms.
  • Little control over external factors: External factors like government policies, technological changes and competition etc. cannot be controlled by the organization.
  • Resistance from employees: Employees think that control is a restriction on their freedom. For example, they do not like to be observed through CCTV.
  • Costly affair: Controlling is costly and time-consuming. Managers must ensure that the cost of controlling should not exceed the benefits derived from it.


Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

  • View all journals
  • Explore content
  • About the journal
  • Publish with us
  • Sign up for alerts
  • Open access
  • Published: 12 September 2024

An open-source framework for end-to-end analysis of electronic health record data

  • Lukas Heumos 1 , 2 , 3 ,
  • Philipp Ehmele 1 ,
  • Tim Treis 1 , 3 ,
  • Julius Upmeier zu Belzen   ORCID: orcid.org/0000-0002-0966-4458 4 ,
  • Eljas Roellin 1 , 5 ,
  • Lilly May 1 , 5 ,
  • Altana Namsaraeva 1 , 6 ,
  • Nastassya Horlava 1 , 3 ,
  • Vladimir A. Shitov   ORCID: orcid.org/0000-0002-1960-8812 1 , 3 ,
  • Xinyue Zhang   ORCID: orcid.org/0000-0003-4806-4049 1 ,
  • Luke Zappia   ORCID: orcid.org/0000-0001-7744-8565 1 , 5 ,
  • Rainer Knoll 7 ,
  • Niklas J. Lang 2 ,
  • Leon Hetzel 1 , 5 ,
  • Isaac Virshup 1 ,
  • Lisa Sikkema   ORCID: orcid.org/0000-0001-9686-6295 1 , 3 ,
  • Fabiola Curion 1 , 5 ,
  • Roland Eils 4 , 8 ,
  • Herbert B. Schiller 2 , 9 ,
  • Anne Hilgendorff 2 , 10 &
  • Fabian J. Theis   ORCID: orcid.org/0000-0002-2419-1943 1 , 3 , 5  

Nature Medicine (2024)

  • Epidemiology
  • Translational research

With progressive digitalization of healthcare systems worldwide, large-scale collection of electronic health records (EHRs) has become commonplace. However, an extensible framework for comprehensive exploratory analysis that accounts for data heterogeneity is missing. Here we introduce ehrapy, a modular open-source Python framework designed for exploratory analysis of heterogeneous epidemiology and EHR data. ehrapy incorporates a series of analytical steps, from data extraction and quality control to the generation of low-dimensional representations. Complemented by rich statistical modules, ehrapy facilitates associating patients with disease states, differential comparison between patient clusters, survival analysis, trajectory inference, causal inference and more. Leveraging ontologies, ehrapy further enables data sharing and training EHR deep learning models, paving the way for foundational models in biomedical research. We demonstrate ehrapy’s features in six distinct examples. We applied ehrapy to stratify patients affected by unspecified pneumonia into finer-grained phenotypes. Furthermore, we reveal biomarkers for significant differences in survival among these groups. Additionally, we quantify medication-class effects of pneumonia medications on length of stay. We further leveraged ehrapy to analyze cardiovascular risks across different data modalities. We reconstructed disease state trajectories in patients with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) based on imaging data. Finally, we conducted a case study to demonstrate how ehrapy can detect and mitigate biases in EHR data. ehrapy, thus, provides a framework that we envision will standardize analysis pipelines on EHR data and serve as a cornerstone for the community.

Electronic health records (EHRs) are becoming increasingly common due to standardized data collection 1 and digitalization in healthcare institutions. EHRs collected at medical care sites serve as efficient storage and sharing units of health information 2 , enabling the informed treatment of individuals using the patient’s complete history 3 . Routinely collected EHR data are approaching genomic-scale size and complexity 4 , posing challenges in extracting information without quantitative analysis methods. The application of such approaches to EHR databases 1 , 5 , 6 , 7 , 8 , 9 has enabled the prediction and classification of diseases 10 , 11 , study of population health 12 , determination of optimal treatment policies 13 , 14 , simulation of clinical trials 15 and stratification of patients 16 .

However, current EHR datasets suffer from serious limitations, such as data collection issues, inconsistencies and lack of data diversity. EHR data collection and sharing problems often arise due to non-standardized formats, with disparate systems using exchange protocols, such as Health Level Seven International (HL7) and Fast Healthcare Interoperability Resources (FHIR) 17 . In addition, EHR data are stored in various on-disk formats, including, but not limited to, relational databases and CSV, XML and JSON formats. These variations pose challenges with respect to data retrieval, scalability, interoperability and data sharing.

Beyond format variability, inherent biases of the collected data can compromise the validity of findings. Selection bias stemming from non-representative sample composition can lead to skewed inferences about disease prevalence or treatment efficacy 18 , 19 . Filtering bias arises through inconsistent criteria for data inclusion, obscuring true variable relationships 20 . Surveillance bias exaggerates associations between exposure and outcomes due to differential monitoring frequencies 21 . EHR data are further prone to missing data 22 , 23 , which can be broadly classified into three categories: missing completely at random (MCAR), where missingness is unrelated to the data; missing at random (MAR), where missingness depends on observed data; and missing not at random (MNAR), where missingness depends on unobserved data 22 , 23 . Information and coding biases, related to inaccuracies in data recording or coding inconsistencies, respectively, can lead to misclassification and unreliable research conclusions 24 , 25 . Data may even contradict itself, such as when measurements were reported for deceased patients 26 , 27 . Technical variation and differing data collection standards lead to distribution differences and inconsistencies in representation and semantics across EHR datasets 28 , 29 . Attrition and confounding biases, resulting from differential patient dropout rates or unaccounted external variable effects, can significantly skew study outcomes 30 , 31 , 32 . The diversity of EHR data that comprise demographics, laboratory results, vital signs, diagnoses, medications, x-rays, written notes and even omics measurements amplifies all the aforementioned issues.
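The three missingness categories can be made concrete with a small simulation. The sketch below is illustrative only: the cohort, the age-to-lab relationship and the missingness probabilities are invented assumptions, not values from any dataset discussed here. It shows that the mean of the observed values stays close to the true mean under MCAR but is biased under MAR and MNAR when nothing is done about the missingness.

```python
import random


def simulate_missingness(n=10_000, seed=0):
    """Return the mean of observed lab values under MCAR, MAR and MNAR masking."""
    rng = random.Random(seed)
    # Hypothetical cohort: age is always observed, the lab value rises with age.
    ages = [rng.uniform(0, 18) for _ in range(n)]
    labs = [10 + 2 * a + rng.gauss(0, 3) for a in ages]

    def observed_mean(keep):
        vals = [v for v, k in zip(labs, keep) if k]
        return sum(vals) / len(vals)

    # MCAR: every value has the same 30% chance of being missing.
    mcar = [rng.random() > 0.3 for _ in range(n)]
    # MAR: missingness depends only on an observed variable (age).
    mar = [rng.random() > (0.6 if a > 9 else 0.1) for a in ages]
    # MNAR: missingness depends on the unobserved lab value itself.
    mnar = [rng.random() > (0.6 if v > 28 else 0.1) for v in labs]

    return {
        "true": sum(labs) / n,
        "MCAR": observed_mean(mcar),
        "MAR": observed_mean(mar),
        "MNAR": observed_mean(mnar),
    }


if __name__ == "__main__":
    for name, value in simulate_missingness().items():
        print(f"{name:>5}: {value:.2f}")
```

Under MAR the bias could in principle be corrected by conditioning on age, which is observed; under MNAR no observed variable explains the missingness, which is why that case is the hardest to handle.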

Addressing these challenges requires rigorous study design, careful data pre-processing and continuous bias evaluation through exploratory data analysis. Several EHR data pre-processing and analysis workflows were previously developed 4 , 33 , 34 , 35 , 36 , 37 , but none of them enables the analysis of heterogeneous data, provides in-depth documentation, is available as a software package or allows for exploratory visual analysis. Current EHR analysis pipelines, therefore, differ considerably in their approaches and are often commercial, vendor-specific solutions 38 . This is in contrast to strategies using community standards for the analysis of omics data, such as Bioconductor 39 or scverse 40 . As a result, EHR data frequently remain underexplored and are commonly investigated only for a particular research question 41 . Even in such cases, EHR data are then frequently input into machine learning models with serious data quality issues that greatly impact prediction performance and generalizability 42 .

To address this lack of analysis tooling, we developed the EHR Analysis in Python framework, ehrapy, which enables exploratory analysis of diverse EHR datasets. The ehrapy package is purpose-built to organize, analyze, visualize and statistically compare complex EHR data. ehrapy can be applied to datasets of different data types, sizes, diseases and origins. To demonstrate this versatility, we applied ehrapy to datasets obtained from EHR and population-based studies. Using the Pediatric Intensive Care (PIC) EHR database 43 , we stratified patients diagnosed with ‘unspecified pneumonia’ into distinct clinically relevant groups, extracted clinical indicators of pneumonia through statistical analysis and quantified medication-class effects on length of stay (LOS) with causal inference. Using the UK Biobank 44 (UKB), a population-scale cohort comprising over 500,000 participants from the United Kingdom, we employed ehrapy to explore cardiovascular risk factors using clinical predictors, metabolomics, genomics and retinal imaging-derived features. Additionally, we performed image analysis to project disease progression through fate mapping in patients affected by coronavirus disease 2019 (COVID-19) using chest x-rays. Finally, we demonstrate how exploratory analysis with ehrapy unveils and mitigates biases in over 100,000 visits by patients with diabetes across 130 US hospitals. We provide online links to additional use cases that demonstrate ehrapy’s usage with further datasets, including MIMIC-II (ref. 45 ), and for various medical conditions, such as patients subject to indwelling arterial catheter usage. ehrapy is compatible with any EHR dataset that can be transformed into vectors and is accessible as a user-friendly open-source software package hosted at https://github.com/theislab/ehrapy and installable from PyPI. It comes with comprehensive documentation, tutorials and further examples, all available at https://ehrapy.readthedocs.io .

ehrapy: a framework for exploratory EHR data analysis

The foundation of ehrapy is a robust and scalable data storage backend that is combined with a series of pre-processing and analysis modules. In ehrapy, EHR data are organized as a data matrix where observations are individual patient visits (or patients, in the absence of follow-up visits), and variables represent all measured quantities ( Methods ). These data matrices are stored together with metadata of observations and variables. By leveraging the AnnData (annotated data) data structure that implements this design, ehrapy builds upon established standards and is compatible with analysis and visualization functions provided by the omics scverse 40 ecosystem. Readers are also available in R, Julia and JavaScript 46 . We additionally provide a dataset module with more than 20 public loadable EHR datasets in AnnData format to kickstart analysis and development with ehrapy.
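The layout described above, a visits-by-variables matrix stored alongside per-row and per-column metadata, can be sketched with a toy container. This is not the AnnData implementation: the class name `VisitMatrix`, its fields and the example values are hypothetical and only mirror the general idea of keeping the matrix and its annotations aligned.

```python
from dataclasses import dataclass, field


@dataclass
class VisitMatrix:
    """Toy stand-in for an annotated data matrix (rows = visits, columns = variables)."""

    X: list    # list of rows, one per patient visit
    obs: list  # one metadata dict per visit (row)
    var: list  # one metadata dict per variable (column)
    uns: dict = field(default_factory=dict)  # unstructured results added by analysis steps

    def __post_init__(self):
        # Keep matrix and annotations aligned, as an annotated matrix requires.
        if len(self.X) != len(self.obs):
            raise ValueError("one obs entry is needed per row of X")
        if any(len(row) != len(self.var) for row in self.X):
            raise ValueError("one var entry is needed per column of X")

    @property
    def shape(self):
        return (len(self.obs), len(self.var))

    def column(self, name):
        """Return all values of one variable across visits."""
        idx = [v["name"] for v in self.var].index(name)
        return [row[idx] for row in self.X]


# Invented example: three visits, two measured variables, one missing value.
visits = VisitMatrix(
    X=[[38.5, 12.0], [36.9, None], [39.1, 48.3]],
    obs=[
        {"visit_id": "v1", "unit": "PICU"},
        {"visit_id": "v2", "unit": "GICU"},
        {"visit_id": "v3", "unit": "PICU"},
    ],
    var=[
        {"name": "temperature_c", "kind": "numeric"},
        {"name": "crp_mg_l", "kind": "numeric"},
    ],
)
```

Keeping intermediate results in a slot such as `uns` on the same object is what lets each analysis step pass its output to the next without separate bookkeeping.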

For standardized analysis of EHR data, it is crucial that these data are encoded and stored in consistent, reusable formats. Thus, ehrapy requires that input data are organized in structured vectors. Readers for common formats, such as CSV, OMOP 47 or SQL databases, are available in ehrapy. Data loaded into AnnData objects can be mapped against several hierarchical ontologies 48 , 49 , 50 , 51 ( Methods ). Clinical keywords of free text notes can be automatically extracted ( Methods ).

Powered by scanpy, which scales to millions of observations 52 ( Methods and Supplementary Table 1 ) and the machine learning library scikit-learn 53 , ehrapy provides more than 100 composable analysis functions organized in modules from which custom analysis pipelines can be built. Each function interacts directly with the AnnData object and stores all intermediate results in it for simple access and reuse. To facilitate setting up these pipelines, ehrapy guides analysts through a general analysis pipeline (Fig. 1 ). At any step of an analysis pipeline, community software packages can be integrated without any vendor lock-in. Because ehrapy is built on open standards, it can be purposefully extended to solve new challenges, such as the development of foundational models ( Methods ).

figure 1

a , Heterogeneous health data are first loaded into memory as an AnnData object with patient visits as observational rows and variables as columns. Next, the data can be mapped against ontologies, and key terms are extracted from free text notes. b , The EHR data are subject to quality control where low-quality or spurious measurements are removed or imputed. Subsequently, numerical data are normalized, and categorical data are encoded. Data from different sources with data distribution shifts are integrated, embedded, clustered and annotated in a patient landscape. c , Further downstream analyses depend on the question of interest and can include the inference of causal effects and trajectories, survival analysis or patient stratification.

In the ehrapy analysis pipeline, EHR data are initially inspected for quality issues by analyzing feature distributions that may skew results and by detecting visits and features with high missing rates that ehrapy can then impute ( Methods ). ehrapy records all filtering steps and the resulting population dynamics to highlight potential selection and filtering biases ( Methods ). Subsequently, ehrapy’s normalization and encoding functions ( Methods ) are applied to achieve a uniform numerical representation that facilitates data integration and corrects for dataset shift effects ( Methods ). Calculated lower-dimensional representations can subsequently be visualized, clustered and annotated to obtain a patient landscape ( Methods ). Such annotated groups of patients can be used for statistical comparisons to find differences in features among them to ultimately learn markers of patient states.
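Two of the quality-control ideas in this paragraph, per-feature missing rates and a record of how each filter changes the cohort, can be sketched in plain Python. The function and class names below (`missing_rate_per_feature`, `CohortTracker`) are hypothetical and are not ehrapy's API; missing values are assumed to be encoded as `None`.

```python
def missing_rate_per_feature(rows, feature_names):
    """Fraction of visits with a missing (None) value, per feature."""
    n = len(rows)
    return {
        name: sum(row[j] is None for row in rows) / n
        for j, name in enumerate(feature_names)
    }


class CohortTracker:
    """Apply filters one by one and log the cohort size after each step."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.log = [("initial cohort", len(self.rows))]

    def filter(self, label, predicate):
        self.rows = [r for r in self.rows if predicate(r)]
        self.log.append((label, len(self.rows)))
        return self  # allow chaining


if __name__ == "__main__":
    # Invented toy data: two features, four visits.
    rows = [[1, None], [2, 3], [None, None], [4, 5]]
    print(missing_rate_per_feature(rows, ["a", "b"]))
    tracker = CohortTracker(rows)
    tracker.filter("has a", lambda r: r[0] is not None)
    tracker.filter("has b", lambda r: r[1] is not None)
    print(tracker.log)
```

Reading the log side by side with group composition at each step is one simple way to notice when a filter removes one subpopulation disproportionately.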

As analysis goals can differ between users and datasets, the ehrapy analysis pipeline is customizable during the final knowledge inference step. ehrapy provides statistical methods for group comparison and extensive support for survival analysis ( Methods ), enabling the discovery of biomarkers. Furthermore, ehrapy offers functions for causal inference to go from statistically determined associations to causal relations ( Methods ). Moreover, patient visits in aggregated EHR data can be regarded as snapshots where individual measurements taken at specific timepoints might not adequately reflect the underlying progression of disease and may instead result from unrelated variation due to, for example, day-to-day differences 54 , 55 , 56 . Therefore, disease progression models should rely on analysis of the underlying clinical data, as disease progression in an individual patient may not be monotonic in time. ehrapy allows for the use of advanced trajectory inference methods to overcome sparse measurements 57 , 58 , 59 . We show that this approach can order snapshots to calculate a pseudotime that can adequately reflect the progression of the underlying clinical process. Given a sufficient number of snapshots, ehrapy increases the potential to understand disease progression, which is likely not robustly captured within a single EHR but, rather, across several.

ehrapy enables patient stratification in pneumonia cases

To demonstrate ehrapy’s capability to analyze heterogeneous datasets from a broad patient set across multiple care units, we applied our exploratory strategy to the PIC 43 database. The PIC database is a single-center database hosting information on children admitted to critical care units at the Children’s Hospital of Zhejiang University School of Medicine in China. It contains 13,499 distinct hospital admissions of 12,881 individual pediatric patients admitted between 2010 and 2018 for whom demographics, diagnoses, doctors’ notes, vital signs, laboratory and microbiology tests, medications, fluid balances and more were collected (Extended Data Figs. 1 and 2a and Methods ). After missing data imputation and subsequent pre-processing (Extended Data Figs. 2b,c and 3 and Methods ), we generated a uniform manifold approximation and projection (UMAP) embedding to visualize variation across all patients using ehrapy (Fig. 2a ). This visualization of the low-dimensional patient manifold shows the heterogeneity of the collected data in the PIC database, with malformations, perinatal and respiratory being the most abundant International Classification of Diseases (ICD) chapters (Fig. 2b ). The most common respiratory disease categories (Fig. 2c ) were labeled pneumonia and influenza ( n  = 984). We focused on pneumonia to apply ehrapy to a challenging, broad-spectrum disease that affects all age groups. Pneumonia is a prevalent respiratory infection that poses a substantial burden on public health 60 and is characterized by inflammation of the alveoli and distal airways 60 . Individuals with pre-existing chronic conditions are particularly vulnerable, as are children under the age of 5 (ref. 61 ). Pneumonia can be caused by a range of microorganisms, encompassing bacteria, respiratory viruses and fungi.

figure 2

a , UMAP of all patient visits in the ICU with primary discharge diagnosis grouped by ICD chapter. b , The prevalence of respiratory diseases prompted us to investigate them further. c , Respiratory categories show the abundance of influenza and pneumonia diagnoses that we investigated more closely. d , We observed the ‘unspecified pneumonia’ subgroup, which led us to investigate and annotate it in more detail. e , The previously ‘unspecified pneumonia’-labeled patients were annotated using several clinical features (Extended Data Fig. 5 ), of which the most important ones are shown in the heatmap ( f ). g , Example disease progression of an individual child with pneumonia illustrating pharmacotherapy over time until positive A. baumannii swab.

We selected the age group ‘youths’ (13 months to 18 years of age) for further analysis, addressing a total of 265 patients who dominated the pneumonia cases and were diagnosed with ‘unspecified pneumonia’ (Fig. 2d and Extended Data Fig. 4 ). Neonates (0–28 d old) and infants (29 d to 12 months old) were excluded from the analysis as the disease context is significantly different in these age groups due to distinct anatomical and physical conditions. Patients were 61% male, had a total of 277 admissions, had a mean age at admission of 54 months (median, 38 months) and had an average LOS of 15 d (median, 7 d). Of these, 152 patients were admitted to the pediatric intensive care unit (PICU), 118 to the general ICU (GICU), four to the surgical ICU (SICU) and three to the cardiac ICU (CICU). Laboratory measurements typically had 12–14% missing data, except for serum procalcitonin (PCT), a marker for bacterial infections, with 24.5% missing, and C-reactive protein (CRP), a marker of inflammation, with 16.8% missing. Measurements assigned as ‘vital signs’ contained between 44% and 54% missing values. Stratifying patients with unspecified pneumonia further enables a more nuanced understanding of the disease, potentially facilitating tailored approaches to treatment.

To deepen clinical phenotyping for the disease group ‘unspecified pneumonia’, we calculated a k-nearest neighbor graph to cluster patients into groups and visualize these in UMAP space (Methods). Leiden clustering 62 identified four patient groupings with distinct clinical features that we annotated (Fig. 2e). To identify the laboratory values, medications and pathogens that were most characteristic for these four groups (Fig. 2f), we applied t-tests for numerical data and g-tests for categorical data between the identified groups using ehrapy (Extended Data Fig. 5 and Methods). Based on this analysis, we identified patient groups with ‘sepsis-like’, ‘severe pneumonia with co-infection’, ‘viral pneumonia’ and ‘mild pneumonia’ phenotypes. The ‘sepsis-like’ group of patients (n = 28) was characterized by rapid disease progression as exemplified by an increased number of deaths (adjusted P ≤ 5.04 × 10⁻³, 43% (n = 28), 95% confidence interval (CI): 23%, 62%); indication of multiple organ failure, such as elevated creatinine (adjusted P ≤ 0.01, 52.74 ± 23.71 μmol L⁻¹) or reduced albumin levels (adjusted P ≤ 2.89 × 10⁻⁴, 33.40 ± 6.78 g L⁻¹); and increased expression levels and peaks of inflammation markers, including PCT (adjusted P ≤ 3.01 × 10⁻², 1.42 ± 2.03 ng ml⁻¹), whole blood cell count, neutrophils, lymphocytes, monocytes and lower platelet counts (adjusted P ≤ 6.3 × 10⁻², 159.30 ± 142.00 × 10⁹ per liter) and changes in electrolyte levels, that is, lower potassium levels (adjusted P ≤ 0.09 × 10⁻², 3.14 ± 0.54 mmol L⁻¹).
Patients whom we associated with the term ‘severe pneumonia with co-infection’ (n = 74) were characterized by prolonged ICU stays (adjusted P ≤ 3.59 × 10⁻⁴, 15.01 ± 29.24 d); organ affection, such as higher levels of creatinine (adjusted P ≤ 1.10 × 10⁻⁴, 52.74 ± 23.71 μmol L⁻¹) and lower platelet count (adjusted P ≤ 5.40 × 10⁻²³, 159.30 ± 142.00 × 10⁹ per liter); increased inflammation markers, such as peaks of PCT (adjusted P ≤ 5.06 × 10⁻⁵, 1.42 ± 2.03 ng ml⁻¹), CRP (adjusted P ≤ 1.40 × 10⁻⁶, 50.60 ± 37.58 mg L⁻¹) and neutrophils (adjusted P ≤ 8.51 × 10⁻⁶, 13.01 ± 6.98 × 10⁹ per liter); detection of bacteria in combination with additional fungal pathogens in sputum samples (adjusted P ≤ 1.67 × 10⁻², 26% (n = 74), 95% CI: 16%, 36%); and increased application of medication, including antifungals (adjusted P ≤ 1.30 × 10⁻⁴, 15% (n = 74), 95% CI: 7%, 23%) and catecholamines (adjusted P ≤ 2.0 × 10⁻², 45% (n = 74), 95% CI: 33%, 56%). Patients in the ‘mild pneumonia’ group were characterized by positive sputum cultures in the presence of relatively lower inflammation markers, such as PCT (adjusted P ≤ 1.63 × 10⁻³, 1.42 ± 2.03 ng ml⁻¹) and CRP (adjusted P ≤ 0.03 × 10⁻¹, 50.60 ± 37.58 mg L⁻¹), while receiving antibiotics more frequently (adjusted P ≤ 1.00 × 10⁻⁵, 80% (n = 78), 95% CI: 70%, 89%) and additional medications (electrolytes, blood thinners and circulation-supporting medications) (adjusted P ≤ 1.00 × 10⁻⁵, 82% (n = 78), 95% CI: 73%, 91%).
Finally, patients in the ‘viral pneumonia’ group were characterized by shorter LOSs (adjusted P ≤ 8.00 × 10⁻⁶, 15.01 ± 29.24 d), a lack of non-viral pathogen detection in combination with higher lymphocyte counts (adjusted P ≤ 0.01, 4.11 ± 2.49 × 10⁹ per liter), lower levels of PCT (adjusted P ≤ 0.03 × 10⁻², 1.42 ± 2.03 ng ml⁻¹) and reduced application of catecholamines (adjusted P ≤ 5.96 × 10⁻⁷, 15% (n = 97), 95% CI: 8%, 23%), antibiotics (adjusted P ≤ 8.53 × 10⁻⁶, 41% (n = 97), 95% CI: 31%, 51%) and antifungals (adjusted P ≤ 5.96 × 10⁻⁷, 0% (n = 97), 95% CI: 0%, 0%).
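The group comparison for numerical features can be illustrated with a one-group-versus-rest t-statistic. The sketch below assumes Welch's unequal-variance form, which is a choice made here and not something the text specifies; it omits P values and multiple-testing adjustment, and the feature names and values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group, rest):
    """Welch's t-statistic and degrees of freedom for two samples.

    Each sample needs at least two values and non-zero variance overall.
    """
    n1, n2 = len(group), len(rest)
    v1, v2 = variance(group), variance(rest)
    se2 = v1 / n1 + v2 / n2
    t = (mean(group) - mean(rest)) / sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def rank_features(values_by_feature, labels, target):
    """Rank features by |t| when comparing the target cluster with all others."""
    ranking = []
    for name, values in values_by_feature.items():
        group = [v for v, lab in zip(values, labels) if lab == target]
        rest = [v for v, lab in zip(values, labels) if lab != target]
        t, _ = welch_t(group, rest)
        ranking.append((name, t))
    return sorted(ranking, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical cluster labels and toy measurements.
    labels = ["cluster_0"] * 3 + ["cluster_1"] * 3
    features = {
        "lab_a": [30, 31, 32, 40, 41, 42],
        "lab_b": [3.9, 4.0, 4.1, 4.0, 4.1, 4.2],
    }
    for name, t in rank_features(features, labels, "cluster_0"):
        print(f"{name}: t = {t:.2f}")
```

A negative statistic means the target cluster has the lower mean, which is how "reduced" and "elevated" markers would be told apart in such a ranking.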

To demonstrate the ability of ehrapy to examine EHR data from different levels of resolution, we additionally reconstructed a case from the ‘severe pneumonia with co-infection’ group (Fig. 2g ). In this case, the analysis revealed that CRP levels remained elevated despite broad-spectrum antibiotic treatment until a positive Acinetobacter baumannii result led to a change in medication and a subsequent decrease in CRP and monocyte levels.

ehrapy facilitates extraction of pneumonia indicators

ehrapy’s survival analysis module allowed us to identify clinical indicators of disease stages that could be used as biomarkers through Kaplan–Meier analysis. We found strong variance in overall aspartate aminotransferase (AST), alanine aminotransferase (ALT), gamma-glutamyl transferase (GGT) and bilirubin levels (Fig. 3a), including changes over time (Extended Data Fig. 6a,b), in all four ‘unspecified pneumonia’ groups. AST, ALT and GGT are routinely used to assess liver function, and studies provide evidence that their levels are elevated during respiratory infections 63, including severe pneumonia 64, and can guide diagnosis and management of pneumonia in children 63. We confirmed reduced survival in more severely affected children (‘sepsis-like pneumonia’ and ‘severe pneumonia with co-infection’) using Kaplan–Meier curves and a multivariate log-rank test (Fig. 3b; P ≤ 1.09 × 10⁻¹⁸) through ehrapy. To verify the association of this trajectory with altered AST, ALT and GGT expression levels, we further grouped all patients based on liver enzyme reference ranges (Methods and Supplementary Table 2). By Kaplan–Meier survival analysis, cases with peaks of GGT (P ≤ 1.4 × 10⁻², 58.01 ± 2.03 U L⁻¹), ALT (P ≤ 2.9 × 10⁻², 43.59 ± 38.02 U L⁻¹) and AST (P ≤ 4.8 × 10⁻⁴, 78.69 ± 60.03 U L⁻¹) classified as ‘outside the norm’ were found to correlate with lower survival in all groups (Fig. 3c and Extended Data Fig. 6), in line with previous studies 63, 65. Bilirubin was not found to significantly affect survival (P ≤ 2.1 × 10⁻¹, 12.57 ± 21.22 mg dl⁻¹).
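For readers unfamiliar with the Kaplan–Meier estimator, a minimal pure-Python version is sketched below. It is a textbook product-limit estimator written for illustration, not ehrapy's survival module, and the example durations are invented.

```python
def kaplan_meier(durations, events):
    """Product-limit survival estimate.

    durations: follow-up time per patient
    events:    True if the event (e.g. death) was observed, False if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    order = sorted(zip(durations, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = censored = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            if order[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        at_risk -= deaths + censored
    return curve


if __name__ == "__main__":
    # Toy cohort: five patients, two of them censored.
    print(kaplan_meier([1, 2, 3, 4, 5], [True, False, True, False, True]))
```

Censored patients never lower the curve directly; they only shrink the risk set, which makes each later event weigh more heavily.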

figure 3

a , Line plots of major hepatic system laboratory measurements per group show variance in the measurements per pneumonia group. b , Kaplan–Meier survival curves demonstrate lower survival for ‘sepsis-like’ and ‘severe pneumonia with co-infection’ groups. c , Kaplan–Meier survival curves for children with GGT measurements outside the norm range display lower survival.

ehrapy quantifies medication class effect on LOS

Pneumonia requires case-specific medications due to its diverse causes. To demonstrate the potential of ehrapy’s causal inference module, we quantified the effect of medication on ICU LOS to evaluate case-specific administration of medication. In contrast to causal discovery that attempts to find a causal graph reflecting the causal relationships, causal inference is a statistical process used to investigate possible effects when altering a provided system, as represented by a causal graph and observational data (Fig. 4a ) 66 . This approach allows identifying and quantifying the impact of specific interventions or treatments on outcome measures, thereby providing insight for evidence-based decision-making in healthcare. Causal inference relies on datasets incorporating interventions to accurately quantify effects.

figure 4

a , ehrapy’s causal module is based on the strategy of the tool ‘dowhy’. Here, EHR data containing treatment, outcome and measurements and a causal graph serve as input for causal effect quantification. The process includes the identification of the target estimand based on the causal graph, the estimation of causal effects using various models and, finally, refutation where sensitivity analyses and refutation tests are performed to assess the robustness of the results and assumptions. b , Curated causal graph using age, liver damage and inflammation markers as disease progression proxies together with medications as interventions to assess the causal effect on length of ICU stay. c , Determined causal effect strength on LOS in days of administered medication categories.

We manually constructed a minimal causal graph with ehrapy (Fig. 4b ) on records of treatment with corticosteroids, carbapenems, penicillins, cephalosporins and antifungal and antiviral medications as interventions (Extended Data Fig. 7 and Methods ). We assumed that the medications affect disease progression proxies, such as inflammation markers and markers of organ function. The selection of ‘interventions’ is consistent with current treatment standards for bacterial pneumonia and respiratory distress 67 , 68 . Based on the approach of the tool ‘dowhy’ 69 (Fig. 4a ), ehrapy’s causal module identified the application of corticosteroids, antivirals and carbapenems to be associated with shorter LOSs, in line with current evidence 61 , 70 , 71 , 72 . In contrast, penicillins and cephalosporins were associated with longer LOSs, whereas antifungal medication did not strongly influence LOS (Fig. 4c ).
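Why a causal graph matters for such an estimate can be shown with a deliberately small example. The sketch below is not the dowhy workflow: it assumes a single discrete confounder (disease severity) that satisfies the backdoor criterion and adjusts for it by stratification. All records and field names are invented.

```python
from collections import defaultdict
from statistics import mean


def naive_effect(records):
    """Unadjusted difference in mean LOS between treated and untreated visits."""
    treated = [r["los"] for r in records if r["treated"]]
    control = [r["los"] for r in records if not r["treated"]]
    return mean(treated) - mean(control)


def backdoor_adjusted_effect(records, confounder):
    """Average treatment effect on LOS, stratified on one discrete confounder."""
    strata = defaultdict(list)
    for r in records:
        strata[r[confounder]].append(r)
    n = len(records)
    effect = 0.0
    for rows in strata.values():
        treated = [r["los"] for r in rows if r["treated"]]
        control = [r["los"] for r in rows if not r["treated"]]
        if not treated or not control:
            raise ValueError("each stratum needs treated and untreated visits")
        # Weight each stratum-specific effect by the stratum's share of the cohort.
        effect += len(rows) / n * (mean(treated) - mean(control))
    return effect


# Invented toy data: severe cases are treated more often and stay longer,
# while the treatment shortens the stay by 2 days within each severity level.
records = (
    [{"severity": "mild", "treated": False, "los": 5}] * 3
    + [{"severity": "mild", "treated": True, "los": 3}]
    + [{"severity": "severe", "treated": False, "los": 15}]
    + [{"severity": "severe", "treated": True, "los": 13}] * 3
)
```

On this toy data the naive comparison suggests that treatment lengthens the stay by 3 days, whereas the stratified estimate recovers the 2-day reduction built into the data. A medication mostly given to sicker patients can therefore look harmful in an unadjusted comparison.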

ehrapy enables deriving population-scale risk factors

To illustrate the advantages of using a unified data management and quality control framework, such as ehrapy, we modeled myocardial infarction risk using Cox proportional hazards models on UKB 44 data. Large population cohort studies, such as the UKB, enable the investigation of common diseases across a wide range of modalities, including genomics, metabolomics, proteomics, imaging data and common clinical variables (Fig. 5a,b ). From these, we used a publicly available polygenic risk score for coronary heart disease 73 comprising 6.6 million variants, 80 nuclear magnetic resonance (NMR) spectroscopy-based metabolomics 74 features, 81 features derived from retinal optical coherence tomography 75 , 76 and the Framingham Risk Score 77 feature set, which includes known clinical predictors, such as age, sex, body mass index, blood pressure, smoking behavior and cholesterol levels. We excluded features with more than 10% missingness and imputed the remaining missing values ( Methods ). Furthermore, individuals with events up to 1 year after the sampling time were excluded from the analyses, ultimately selecting 29,216 individuals for whom all mentioned data types were available (Extended Data Figs. 8 and 9 and Methods ). Myocardial infarction, as defined by our mapping to the phecode nomenclature 51 , was defined as the endpoint (Fig. 5c ). We modeled the risk for myocardial infarction 1 year after either the metabolomic sample was obtained or imaging was performed.

figure 5

a , The UKB includes 502,359 participants from 22 assessment centers. Most participants have genetic data (97%) and physical measurement data (93%), but fewer have data for complex measures, such as metabolomics, retinal imaging or proteomics. b , We found a distinct cluster of individuals (bottom right) from the Birmingham assessment center in the retinal imaging data, which is an artifact of the image acquisition process and was, thus, excluded. c , Myocardial infarctions are recorded for 15% of the male and 7% of the female study population. Kaplan–Meier estimators with 95% CIs are shown. d , For every modality combination, a linear Cox proportional hazards model was fit to determine the prognostic potential of these for myocardial infarction. Cardiovascular risk factors show expected positive log hazard ratios (log (HRs)) for increased blood pressure or total cholesterol and negative ones for sampling age and systolic blood pressure (BP). log (HRs) with 95% CIs are shown. e , Combining all features yields a C-index of 0.81. c – e , Error bars indicate 95% CIs ( n  = 29,216).

Predictive performance for each modality was assessed by fitting Cox proportional hazards (Fig. 5c ) models on each of the feature sets using ehrapy (Fig. 5d ). The age of the first occurrence served as the time to event; alternatively, date of death or date of the last record in the EHR served as censoring times. Models were evaluated using the concordance index (C-index) ( Methods ). The combination of multiple modalities successfully improved the predictive performance for coronary heart disease by increasing the C-index from 0.63 (genetic) to 0.76 (genetics, age and sex) and to 0.77 (clinical predictors) with 0.81 (imaging and clinical predictors) for combinations of feature sets (Fig. 5e ). Our finding is in line with previous observations of complementary effects between different modalities, where a broader ‘major adverse cardiac event’ phenotype was modeled in the UKB achieving a C-index of 0.72 (ref. 78 ). Adding genetic data improves predictive potential, as it is independent of sampling age and has limited prediction of other modalities 79 . The addition of metabolomic data did not improve predictive power (Fig. 5e ).
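The C-index used for evaluation measures how often a model assigns the higher risk to the patient who experiences the event earlier. A minimal sketch of Harrell's concordance index follows; it uses a quadratic loop for clarity, and the tie handling (half credit for tied risk scores, tied times not comparable) is a common convention assumed here, not a detail taken from the text.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's time. The pair is concordant when the
    model gives patient i the higher risk score.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy example: the second patient is censored.
    print(concordance_index([1, 2, 3], [True, False, True], [0.3, 0.2, 0.5]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which gives a scale for reading the reported values between 0.63 and 0.81.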

Imaging-based disease severity projection via fate mapping

To demonstrate ehrapy’s ability to handle diverse image data and recover disease stages, we embedded pulmonary imaging data obtained from patients with COVID-19 into a lower-dimensional space and computationally inferred disease progression trajectories using pseudotemporal ordering. This describes a continuous trajectory or ordering of individual points based on feature similarity 80 . Continuous trajectories enable mapping the fate of new patients onto precise states to potentially predict their future condition.

In COVID-19, a highly contagious respiratory illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), symptoms range from mild flu-like symptoms to severe respiratory distress. Chest x-rays typically show opacities (bilateral patchy, ground glass) associated with disease severity 81 .

We used COVID-19 chest x-ray images from the BrixIA 82 dataset consisting of 192 images (Fig. 6a ) with expert annotations of disease severity. We used the BrixIA database scores, which are based on six regions annotated by radiologists, to classify disease severity ( Methods ). We embedded raw image features using a pre-trained DenseNet model ( Methods ) and further processed this embedding into a nearest-neighbors-based UMAP space using ehrapy (Fig. 6b and Methods ). Fate mapping based on imaging information ( Methods ) determined a severity ordering from mild to critical cases (Fig. 6b–d ). Images labeled as ‘normal’ are projected to stay within the healthy group, illustrating the robustness of our approach. Images of diseased patients were ordered by disease severity, highlighting clear trajectories from ‘normal’ to ‘critical’ states despite the heterogeneity of the x-ray images stemming from, for example, different zoom levels (Fig. 6a ).

Figure 6

a , Randomly selected chest x-ray images from the BrixIA dataset demonstrate its variance. b , UMAP visualization of the BrixIA dataset embedding shows a separation of disease severity classes. c , Calculated pseudotime for all images increases with distance to the ‘normal’ images. d , Stream projection of fate mapping in UMAP space showcases disease severity trajectory of the COVID-19 chest x-ray images.

Detecting and mitigating biases in EHR data with ehrapy

To showcase how exploratory analysis using ehrapy can reveal and mitigate biases, we analyzed the Fairlearn 83 version of the Diabetes 130-US Hospitals 84 dataset. The dataset covers 10 years (1999–2008) of clinical records from 130 US hospitals, detailing 47 features of diabetes diagnoses, laboratory tests, medications and additional data from up to 14 d of inpatient care of 101,766 diagnosed patient visits ( Methods ). It was originally collected to explore the link between the measurement of hemoglobin A1c (HbA1c) and early readmission.

The cohort primarily consists of White and African American individuals, with only a minority of cases from Asian or Hispanic backgrounds (Extended Data Fig. 10a ). ehrapy’s cohort tracker unveiled selection and surveillance biases when filtering for Medicare recipients for further analysis, resulting in a shift of age distribution toward an age of over 60 years in addition to an increasing ratio of White participants. Using ehrapy’s visualization modules, our analysis showed that HbA1c was measured in only 18.4% of inpatients, with a higher frequency in emergency admissions compared to referral cases (Extended Data Fig. 10b ). Normalization biases can skew data relationships when standardization techniques ignore subgroup variability or assume incorrect distributions. The choice of normalization strategy must be carefully considered to avoid obscuring important factors. When normalizing the number of applied medications individually, differences in distributions between age groups remained. However, when normalizing both distributions jointly with age group as an additional group variable, differences between age groups were masked (Extended Data Fig. 10c ). To investigate missing data and imputation biases, we introduced missingness for the number of applied medications according to an MCAR mechanism, which we verified using ehrapy’s Little’s test (P ≤ 0.01 × 10^−2), and an MAR mechanism ( Methods ). Whereas imputing the mean in the MCAR case did not affect the overall location of the distribution, it led to an underestimation of the variance, with the standard deviation dropping from 8.1 in the original data to 6.8 in the imputed data (Extended Data Fig. 10d ). Mean imputation in the MAR case skewed both location and variance, shifting the mean from 16.02 to 14.66 with a standard deviation of only 5.72 (Extended Data Fig. 10d ). 
Using ehrapy’s multiple imputation based MissForest 85 imputation on the MAR data resulted in a mean of 16.04 and a standard deviation of 6.45. To predict patient readmission in fewer than 30 d, we merged the three smallest race groups, ‘Asian’, ‘Hispanic’ and ‘Other’. Furthermore, we dropped the gender group ‘Unknown/Invalid’ owing to the small sample size making meaningful assessment impossible, and we performed balanced random undersampling, resulting in 5,677 cases from each condition. We observed an overall balanced accuracy of 0.59 using a logistic regression model. However, the false-negative rate was highest for the races ‘Other’ and ‘Unknown’, whereas their selection rate was lowest, and this model was, therefore, biased (Extended Data Fig. 10e ). Using ehrapy’s compatibility with existing machine learning packages, we used Fairlearn’s ThresholdOptimizer ( Methods ), which improved the selection rates for ‘Other’ from 0.32 to 0.38 and for ‘Unknown’ from 0.23 to 0.42 and the false-negative rates for ‘Other’ from 0.48 to 0.42 and for ‘Unknown’ from 0.61 to 0.45 (Extended Data Fig. 10e ).
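The two group-wise quantities reported above, selection rate and false-negative rate, can be computed as in the following sketch. It only illustrates the metrics; it does not reproduce Fairlearn's ThresholdOptimizer, and the function name, group labels and toy predictions are hypothetical.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: (selection_rate, false_negative_rate)} for binary labels.

    selection rate      = share of the group predicted positive
    false-negative rate = share of the group's true positives predicted negative
    """
    stats = defaultdict(lambda: {"n": 0, "selected": 0, "pos": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        s = stats[group]
        s["n"] += 1
        s["selected"] += int(pred == 1)
        s["pos"] += int(truth == 1)
        s["fn"] += int(truth == 1 and pred == 0)
    rates = {}
    for group, s in stats.items():
        selection_rate = s["selected"] / s["n"]
        # The false-negative rate is undefined for a group without positives.
        fnr = s["fn"] / s["pos"] if s["pos"] else float("nan")
        rates[group] = (selection_rate, fnr)
    return rates


# Toy example with two hypothetical groups "A" and "B".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = group_rates(y_true, y_pred, groups)
```

In this toy data, group B is selected less often and misses more of its true positives than group A, which is the pattern described above for the smallest race groups before threshold adjustment.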

Clustering offers a hypothesis-free alternative to supervised classification when clear hypotheses or labels are missing. It has enabled the identification of heart failure subtypes 86 and progression pathways 87 and COVID-19 severity states 88 . This concept, which is central to ehrapy, further allowed us to identify fine-grained groups of ‘unspecified pneumonia’ cases in the PIC dataset while discovering biomarkers and quantifying effects of medications on LOS. Such retroactive characterization showcases ehrapy’s ability to put complex evidence into context. This approach supports feedback loops to improve diagnostic and therapeutic strategies, leading to more efficiently allocated resources in healthcare.

ehrapy’s flexible data structures enabled us to integrate the heterogeneous UKB data for predictive performance in myocardial infarction. The different data types and distributions posed a challenge for predictive models that were overcome with ehrapy’s pre-processing modules. Our analysis underscores the potential of combining phenotypic and health data at population scale through ehrapy to enhance risk prediction.

By adapting pseudotime approaches that are commonly used in other omics domains, we successfully recovered disease trajectories from raw imaging data with ehrapy. The determined pseudotime, however, only orders data but does not necessarily provide a future projection per patient. Understanding the driver features for fate mapping in image-based datasets is challenging. The incorporation of image segmentation approaches could mitigate this issue and provide a deeper insight into the spatial and temporal dynamics of disease-related processes.

Limitations of our analyses include the lack of control for informative missingness where the absence of information represents information in itself 89 . Translation from Chinese to English in the PIC database can cause information loss and inaccuracies because the Chinese ICD-10 codes are seven characters long compared to the five-character English codes. Incompleteness of databases, such as the lack of radiology images in the PIC database, low sample sizes, underrepresentation of non-White ancestries and participant self-selection, cannot be accounted for and limit generalizability. This restricts deeper phenotyping of, for example, all ‘unspecified pneumonia’ cases with respect to their survival, which could be overcome by the use of multiple databases. Our causal inference use case is limited by unrecorded variables, such as Sequential Organ Failure Assessment (SOFA) scores, and pneumonia-related pathogens that are missing in the causal graph due to dataset constraints, such as high sparsity and substantial missing data, which risk overfitting and can lead to overinterpretation. We counterbalanced this by employing several refutation methods that statistically reject the causal hypothesis, such as a placebo treatment, a random common cause or an unobserved common cause. The longer hospital stays associated with penicillins and cephalosporins may be dataset specific and stem from higher antibiotic resistance, their use as first-line treatments, more severe initial cases, comorbidities and hospital-specific protocols.

Most analysis steps can introduce algorithmic biases where results are misleading or unfavorably affect specific groups. This is particularly relevant in the context of missing data 22 where determining the type of missing data is necessary to handle it correctly. ehrapy includes an implementation of Little’s test 90 , which tests whether data are distributed MCAR to discern missing data types. For MCAR data, single-imputation approaches, such as mean, median or mode imputation, can suffice, but these methods are known to reduce variability 91 , 92 . Multiple imputation strategies, such as Multiple Imputation by Chained Equations (MICE) 93 and MissForest 85 , as implemented in ehrapy, are effective for both MCAR and MAR data 22 , 94 , 95 . MNAR data require pattern-mixture or shared-parameter models that explicitly incorporate the mechanism by which data are missing 96 . Because MNAR involves unobserved data, the assumptions about the missingness mechanism cannot be directly verified, making sensitivity analysis crucial 21 . ehrapy’s wide range of normalization functions and grouping functionality makes it possible to account for intrinsic variability within subgroups, and its compatibility with Fairlearn 83 can potentially mitigate predictor biases. Generally, we recommend assessing all pre-processing in an iterative manner with respect to downstream applications, such as patient stratification. Moreover, sensitivity analysis can help verify the robustness of all inferred knowledge 97 .
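The variance reduction caused by mean imputation under MCAR can be reproduced with a short simulation. The sketch below uses synthetic Gaussian data; the distribution parameters, the 30% missingness rate and the random seed are assumptions chosen for illustration and are unrelated to the dataset analyzed here.

```python
import random
import statistics

random.seed(0)

# Synthetic "number of medications" values (assumed Gaussian for simplicity).
complete = [random.gauss(16.0, 8.0) for _ in range(5000)]

# MCAR: every value has the same probability of being deleted,
# independent of its own value and of any other variable.
observed = [x if random.random() > 0.3 else None for x in complete]

# Single (mean) imputation: replace each missing entry by the observed mean.
fill = statistics.fmean(x for x in observed if x is not None)
imputed = [fill if x is None else x for x in observed]

mean_before = statistics.fmean(complete)
mean_after = statistics.fmean(imputed)
sd_before = statistics.pstdev(complete)
sd_after = statistics.pstdev(imputed)

print(f"mean: {mean_before:.2f} -> {mean_after:.2f}")
print(f"sd:   {sd_before:.2f} -> {sd_after:.2f}")
```

The mean stays close to its original value because the deleted entries are a random subset, whereas the standard deviation shrinks because every imputed entry sits exactly at the center of the distribution.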

These diverse use cases illustrate ehrapy’s potential to sufficiently address the need for a computationally efficient, extendable, reproducible and easy-to-use framework. ehrapy is compatible with major standards, such as the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) 47 , HL7, FHIR or openEHR, with flexible support for common tabular data formats. Once data are loaded into an AnnData object, subsequent sharing of analysis results is made easy because AnnData objects can be stored and read platform independently. ehrapy’s rich documentation of the application programming interface (API) and extensive hands-on tutorials make EHR analysis accessible to both novices and experienced analysts.

As ehrapy remains under active development, users can expect ehrapy to continuously evolve. We are improving support for the joint analysis of EHR, genetics and molecular data where ehrapy serves as a bridge between the EHR and the omics communities. We further anticipate the generation of EHR-specific reference datasets, so-called atlases 98 , to enable query-to-reference mapping where new datasets get contextualized by transferring annotations from the reference to the new dataset. To promote the sharing and collective analysis of EHR data, we envision adapted versions of interactive single-cell data explorers, such as CELLxGENE 99 or the UCSC Cell Browser 100 , for EHR data. Such web interfaces would also include disparity dashboards 20 to unveil trends of preferential outcomes for distinct patient groups. Additional modules specifically for high-frequency time-series data, natural language processing and other data types are currently under development. With the widespread availability of code-generating large language models, frameworks such as ehrapy are becoming accessible to medical professionals without coding expertise who can leverage its analytical power directly. Therefore, ehrapy, together with a lively ecosystem of packages, has the potential to enhance the scientific discovery pipeline to shape the era of EHR analysis.

All datasets that were used during the development of ehrapy and the use cases were used according to their terms of use as indicated by each provider.

Design and implementation of ehrapy

A unified pipeline as provided by our ehrapy framework streamlines the analysis of EHR data by providing an efficient, standardized approach, which reduces the complexity and variability in data pre-processing and analysis. This consistency ensures reproducibility of results and facilitates collaboration and sharing within the research community. Additionally, the modular structure allows for easy extension and customization, enabling researchers to adapt the pipeline to their specific needs while building on a solid foundational framework.

ehrapy was designed from the ground up as an open-source effort with community support. The package, as well as all associated tutorials and dataset preparation scripts, is open source. Development takes place publicly on GitHub where the developers discuss feature requests and issues directly with users. This tight interaction between both groups ensures that we implement the most pressing needs to cater to the most important use cases and can guide users when difficulties arise. The open-source nature, extensive documentation and modular structure of ehrapy are designed for other developers to build upon and extend ehrapy’s functionality where necessary. This allows us to focus ehrapy on the most important features to keep the number of dependencies to a minimum.

ehrapy was implemented in the Python programming language and builds upon numerous existing numerical and scientific open-source libraries, specifically matplotlib 101 , seaborn 102 , NumPy 103 , numba 104 , Scipy 105 , scikit-learn 53 and Pandas 106 . Although taking considerable advantage of all packages implemented, ehrapy also shares the limitations of these libraries, such as a lack of GPU support or small performance losses due to the translation layer cost for operations between the Python interpreter and the lower-level C language for matrix operations. However, by building on very widely used open-source software, we ensure seamless integration and compatibility with a broad range of tools and platforms to promote community contributions. Additionally, by doing so, we enhance security by allowing a larger pool of developers to identify and address vulnerabilities 107 . All functions are grouped into task-specific modules whose implementation is complemented with additional dependencies.

Data preparation

Dataloaders.

ehrapy is compatible with any type of vectorized data, where vectorized refers to the data being stored in structured tables in either on-disk or database form. The input and output module of ehrapy provides readers for common formats, such as OMOP, CSV tables or SQL databases through Pandas. When reading in such datasets, the data are stored in the appropriate slots in a new AnnData 46 object. ehrapy’s data module provides access to more than 20 public EHR datasets that feature diseases, including, but not limited to, Parkinson’s disease, breast cancer, chronic kidney disease and more. All dataloaders return AnnData objects to allow for immediate analysis.

AnnData for EHR data

Our framework required a versatile data structure capable of handling various matrix formats, including Numpy 103 for general use cases and interoperability, Scipy 105 sparse matrices for efficient storage, Dask 108 matrices for larger-than-memory analysis and Awkward array 109 for irregular time-series data. We needed a single data structure that not only stores data but also includes comprehensive annotations for thorough contextual analysis. It was essential for this structure to be widely used and supported, which ensures robustness and continual updates. Interoperability with other analytical packages was a key criterion to facilitate seamless integration within existing tools and workflows. Finally, the data structure had to support both in-memory operations and on-disk storage using formats such as HDF5 (ref. 110 ) and Zarr 111 , ensuring efficient handling and accessibility of large datasets and the ability to easily share them with collaborators.

All of these requirements are fulfilled by the AnnData format, which is a popular data structure in single-cell genomics. At its core, an AnnData object encapsulates diverse components, providing a holistic representation of data and metadata that are always aligned in dimensions and easily accessible. A data matrix (commonly referred to as ‘ X ’) stands as the foundational element, embodying the measured data. This matrix can be dense (as Numpy array), sparse (as Scipy sparse matrix) or ragged (as Awkward array) where dimensions do not align within the data matrix. The AnnData object can feature several such data matrices stored in ‘layers’. Examples of such layers can be unnormalized or unencoded data. These data matrices are complemented by an observations (commonly referred to as ‘obs’) segment where annotations on the level of patients or visits are stored. Patients’ age or sex, for instance, are often used as such annotations. The variables (commonly referred to as ‘var’) section complements the observations, offering supplementary details about the features in the dataset, such as missing data rates. The observation-specific matrices (commonly referred to as ‘obsm’) section extends the capabilities of the AnnData structure by allowing the incorporation of observation-specific matrices. These matrices can represent various types of information at the level of individual observations (patients or visits), such as principal component analysis (PCA) results, t-distributed stochastic neighbor embedding (t-SNE) coordinates or other dimensionality reduction outputs. Analogously, AnnData features a variable-specific matrices (commonly referred to as ‘varm’) component. The observation-specific pairwise relationships (commonly referred to as ‘obsp’) segment complements the ‘obsm’ section by accommodating observation-specific pairwise relationships. This can include connectivity matrices, indicating relationships between patients. 
The inclusion of an unstructured annotations (commonly referred to as ‘uns’) component further enhances flexibility. This segment accommodates unstructured annotations or arbitrary data that might not conform to the structured observations or variables categories. Any AnnData object can be stored on disk in h5ad or Zarr format to facilitate data exchange.
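The alignment guarantee described above can be illustrated with a toy container. The class below is a hypothetical stand-in written for this illustration (the name MiniAnnData and its validation logic are assumptions); it is not the anndata API and omits most slots, such as ‘varm’ and ‘obsp’.

```python
from dataclasses import dataclass, field


@dataclass
class MiniAnnData:
    """Toy AnnData-like container that enforces aligned dimensions."""

    X: list                                     # n_obs x n_vars data matrix
    obs: dict = field(default_factory=dict)     # per-patient/visit annotations
    var: dict = field(default_factory=dict)     # per-feature annotations
    layers: dict = field(default_factory=dict)  # extra matrices shaped like X
    obsm: dict = field(default_factory=dict)    # per-observation matrices
    uns: dict = field(default_factory=dict)     # unstructured metadata

    def __post_init__(self):
        n_obs, n_vars = self.shape
        if any(len(row) != n_vars for row in self.X):
            raise ValueError("X must be rectangular")
        for name, column in self.obs.items():
            if len(column) != n_obs:
                raise ValueError(f"obs[{name!r}] must have {n_obs} entries")
        for name, column in self.var.items():
            if len(column) != n_vars:
                raise ValueError(f"var[{name!r}] must have {n_vars} entries")
        for name, matrix in self.layers.items():
            if len(matrix) != n_obs or any(len(r) != n_vars for r in matrix):
                raise ValueError(f"layers[{name!r}] must match the shape of X")
        for name, matrix in self.obsm.items():
            if len(matrix) != n_obs:
                raise ValueError(f"obsm[{name!r}] must have {n_obs} rows")

    @property
    def shape(self):
        return len(self.X), (len(self.X[0]) if self.X else 0)


# Three visits, two features, with annotations on both axes.
adata = MiniAnnData(
    X=[[7.2, 1.0], [6.9, 0.0], [7.4, 1.0]],
    obs={"age": [3, 11, 7], "sex": ["F", "M", "F"]},
    var={"missing_rate": [0.0, 0.1]},
    layers={"raw": [[7.2, 1.0], [6.9, 0.0], [7.4, 1.0]]},
    obsm={"X_pca": [[0.1, -0.3], [0.4, 0.2], [-0.5, 0.1]]},
    uns={"source": "toy example"},
)
```

Constructing the object with an annotation column of the wrong length raises an error, which mirrors the property that data and metadata always stay aligned in their dimensions.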

ehrapy natively interfaces with the scientific Python ecosystem via Pandas 112 and Numpy 103 . The development of deep learning models for EHR data 113 is further accelerated through compatibility with pathml 114 , a unified framework for whole-slide image analysis in pathology, and scvi-tools 115 , which provides data loaders for loading tensors from AnnData objects into PyTorch 116 or Jax arrays 117 to facilitate the development of generalizing foundational models for medical artificial intelligence 118 .

Feature annotation

After AnnData creation, any metadata can be mapped against ontologies using Bionty ( https://github.com/laminlabs/bionty-base ). Bionty provides access to the Human Phenotype, Phecodes, Phenotype and Trait, Drug, Mondo and Human Disease ontologies.

Key medical terms stored in an AnnData object in free text can be extracted using the Medical Concept Annotation Toolkit (MedCAT) 119 .

Data processing

Cohort tracking.

ehrapy provides a CohortTracker tool that traces all filtering steps applied to an associated AnnData object. To calculate cohort summary statistics, the implementation makes use of tableone 120 . The resulting statistics can subsequently be plotted as bar charts together with flow diagrams 121 that visualize the order and reasoning of filtering operations.
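The idea of tracing filtering steps can be sketched with a small helper that records the cohort size after each step. The class name ToyCohortTracker, its methods and the example records are hypothetical; this is not ehrapy's CohortTracker interface and it computes no tableone statistics.

```python
class ToyCohortTracker:
    """Records each filtering step and the resulting cohort size."""

    def __init__(self, records):
        self.records = list(records)
        self.steps = [("initial cohort", len(self.records))]

    def filter(self, label, keep):
        """Keep records for which `keep(record)` is true and log the step."""
        self.records = [r for r in self.records if keep(r)]
        self.steps.append((label, len(self.records)))
        return self

    def flow(self):
        """Text version of a flow diagram: one line per filtering step."""
        lines = []
        previous = None
        for label, n in self.steps:
            excluded = "" if previous is None else f" (excluded {previous - n})"
            lines.append(f"{label}: n={n}{excluded}")
            previous = n
        return lines


patients = [
    {"age": 72, "payer": "Medicare"},
    {"age": 45, "payer": "Private"},
    {"age": 66, "payer": "Medicare"},
    {"age": 58, "payer": "Medicare"},
]
tracker = ToyCohortTracker(patients)
tracker.filter("Medicare recipients", lambda r: r["payer"] == "Medicare")
tracker.filter("age over 60 years", lambda r: r["age"] > 60)
flow = tracker.flow()
```

Inspecting the recorded steps after each filter is what exposes shifts in cohort composition, such as the age shift reported for the Medicare subset.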

Basic pre-processing and quality control

ehrapy encompasses a suite of functionalities for fundamental data processing that are adopted from scanpy 52 but adapted to EHR data:

Regress out: To address unwanted sources of variation, a regression procedure is integrated, enhancing the dataset’s robustness.

Subsample: Selects a specified fraction of observations.

Balanced sample: Balances groups in the dataset by random oversampling or undersampling.

Highly variable features: The identification and annotation of highly variable features following the ‘highly variable genes’ function of scanpy is seamlessly incorporated, providing users with insights into pivotal elements influencing the dataset.

To identify and minimize quality issues, ehrapy provides several quality control functions:

Basic quality control: Determines the relative and absolute number of missing values per feature and per patient.

Winsorization: For data refinement, ehrapy implements a winsorization process, creating a version of the input array less susceptible to extreme values.

Feature clipping: Imposes limits on features to enhance dataset reliability.

Detect biases: Computes pairwise correlations between features, standardized mean differences for numeric features between groups of sensitive features, categorical feature value count differences between groups of sensitive features and feature importances when predicting a target variable.

Little’s MCAR test: Applies Little’s MCAR test whose null hypothesis is that data are MCAR. Rejecting the null hypothesis may not always mean that data are not MCAR, nor is accepting the null hypothesis a guarantee that data are MCAR. For more details, see Schouten et al. 122 .

Summarize features: Calculates statistical indicators per feature, including minimum, maximum and average values. This can be especially useful to reduce complex data with multiple measurements per feature per patient into sets of columns with single values.
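Two of the quality control steps listed above, missing-value rates and winsorization, are simple enough to sketch in plain Python. The function names and the rank-based clipping rule are assumptions for illustration and may differ from ehrapy's implementation.

```python
import math


def missing_fraction(column):
    """Share of entries that are None or NaN."""
    missing = sum(
        1 for v in column
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(column)


def winsorize(values, lower=0.05, upper=0.05):
    """Clip the lowest `lower` and highest `upper` fraction of values
    to the nearest value that is kept, leaving the sample size unchanged."""
    ordered = sorted(values)
    n = len(ordered)
    low_value = ordered[int(lower * n)]
    high_value = ordered[n - int(upper * n) - 1]
    return [min(max(v, low_value), high_value) for v in values]


lab_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
clipped = winsorize(lab_values, lower=0.1, upper=0.1)
rate = missing_fraction([1.0, None, float("nan"), 4.0])
```

Unlike trimming, winsorization keeps every observation and only pulls extreme values inward, so the extreme entry of 100 in the toy data is replaced by the largest retained value instead of being dropped.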

Imputation is crucial in data analysis to address missing values, ensuring the completeness of datasets that can be required for specific algorithms. The ‘ehrapy’ pre-processing module offers a range of imputation techniques:

Explicit Impute: Replaces missing values, in either all columns or a user-specified subset, with a designated replacement value.

Simple Impute: Imputes missing values in numerical data using mean, median or the most frequent value, contributing to a more complete dataset.

KNN Impute: Uses k -nearest neighbor imputation to fill in missing values in the input AnnData object, preserving local data patterns.

MissForest Impute: Implements the MissForest strategy for imputing missing data, providing a robust approach for handling complex datasets.

MICE Impute: Applies the MICE algorithm for imputing data. This implementation is based on the miceforest ( https://github.com/AnotherSamWilson/miceforest ) package.

Data encoding can be required if categoricals are a part of the dataset to obtain numerical values only. Most algorithms in ehrapy are compatible only with numerical values. ehrapy offers two encoding algorithms based on scikit-learn 53 :

One-Hot Encoding: Transforms categorical variables into binary vectors, creating a binary feature for each category and capturing the presence or absence of each category in a concise representation.

Label Encoding: Assigns a unique numerical label to each category, facilitating the representation of categorical data as ordinal values and supporting algorithms that require numerical input.
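The difference between the two encodings can be shown on a small categorical column. This sketch is written in plain Python for illustration; the function names and the alphabetical category order are assumptions, and it does not call the scikit-learn encoders on which ehrapy's functions are based.

```python
def label_encode(values):
    """Map each category to an integer (alphabetical order assumed)."""
    categories = sorted(set(values))
    mapping = {category: index for index, category in enumerate(categories)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Create one binary indicator column per category."""
    categories = sorted(set(values))
    rows = [[int(v == category) for category in categories] for v in values]
    return rows, categories


admission_type = ["emergency", "referral", "emergency", "elective"]
labels, mapping = label_encode(admission_type)
one_hot, columns = one_hot_encode(admission_type)
```

Label encoding yields a single column but imposes an arbitrary order on the categories, whereas one-hot encoding avoids that implied order at the cost of one column per category.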

To ensure that the distributions of the heterogeneous data are aligned, ehrapy offers several normalization procedures:

Log Normalization: Applies the natural logarithm function to the data, useful for handling skewed distributions and reducing the impact of outliers.

Max-Abs Normalization: Scales each feature by its maximum absolute value, ensuring that the maximum absolute value for each feature is 1.

Min-Max Normalization: Transforms the data to a specific range (commonly [0, 1]) by scaling each feature based on its minimum and maximum values.

Power Transformation Normalization: Applies a power transformation to make the data more Gaussian like, often useful for stabilizing variance and improving the performance of models sensitive to distributional assumptions.

Quantile Normalization: Aligns the distributions of multiple variables, ensuring that their quantiles match, which can be beneficial for comparing datasets or removing batch effects.

Robust Scaling Normalization: Scales data using the interquartile range, making it robust to outliers and suitable for datasets with extreme values.

Scaling Normalization: Standardizes data by subtracting the mean and dividing by the standard deviation, creating a distribution with a mean of 0 and a standard deviation of 1.

Offset to Positive Values: Shifts all values by a constant offset to make all values non-negative, with the lowest negative value becoming 0.
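Several of the normalization procedures listed above reduce to one-line formulas per feature. The sketch below applies them to a single feature in plain Python; the function names are assumptions, constant features (zero range or zero variance) are not handled, and ehrapy's own functions operate on AnnData objects instead.

```python
import statistics


def min_max(values, low=0.0, high=1.0):
    """Rescale to [low, high] using the feature's minimum and maximum."""
    lo, hi = min(values), max(values)
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


def max_abs(values):
    """Divide by the maximum absolute value, so that it becomes 1."""
    largest = max(abs(v) for v in values)
    return [v / largest for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def robust_scale(values):
    """Center on the median and divide by the interquartile range."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in values]


def offset_to_positive(values):
    """Shift values so that the lowest negative value becomes 0."""
    shift = -min(values) if min(values) < 0 else 0.0
    return [v + shift for v in values]


feature = [-2.0, 0.0, 2.0, 4.0, 6.0]
scaled = {
    "min_max": min_max(feature),
    "max_abs": max_abs(feature),
    "standard": standard_scale(feature),
    "robust": robust_scale(feature),
    "offset": offset_to_positive(feature),
}
```

Applying any of these functions separately per subgroup rather than to the pooled feature changes which differences survive normalization, which is why the grouping choice needs to be considered alongside the formula.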

Dataset shifts can be corrected using the scanpy implementation of the ComBat 123 algorithm, which employs a parametric and non-parametric empirical Bayes framework for adjusting data for batch effects that is robust to outliers.

Finally, a neighbors graph can be efficiently computed using scanpy’s implementation.

To obtain meaningful lower-dimensional embeddings that can subsequently be visualized and reused for downstream algorithms, ehrapy provides the following algorithms based on scanpy’s implementation:

t-SNE: Uses a probabilistic approach to embed high-dimensional data into a lower-dimensional space, emphasizing the preservation of local similarities and revealing clusters in the data.

UMAP: Embeds data points by modeling their local neighborhood relationships, offering an efficient and scalable technique that captures both global and local structures in high-dimensional data.

Force-Directed Graph Drawing: Uses a physical simulation to position nodes in a graph, with edges representing pairwise relationships, creating a visually meaningful representation that emphasizes connectedness and clustering in the data.

Diffusion Maps: Applies spectral methods to capture the intrinsic geometry of high-dimensional data by modeling diffusion processes, providing a way to uncover underlying structures and patterns.

Density Calculation in Embedding: Quantifies the density of observations within an embedding, considering conditions or groups, offering insights into the concentration of data points in different regions and aiding in the identification of densely populated areas.

ehrapy further provides algorithms for clustering and trajectory inference based on scanpy:

Leiden Clustering: Uses the Leiden algorithm to cluster observations into groups, revealing distinct communities within the dataset with an emphasis on intra-cluster cohesion.

Hierarchical Clustering Dendrogram: Constructs a dendrogram through hierarchical clustering based on specified group by categories, illustrating the hierarchical relationships among observations and facilitating the exploration of structured patterns.

Feature ranking

ehrapy provides two ways of ranking feature contributions to clusters and target variables:

Statistical tests: To identify marker features that differ significantly between obtained clusters, ehrapy extends scanpy’s ‘rank genes groups’. The original implementation, which features a t -test for numerical data, is complemented by a g -test for categorical data.

Feature importance: Calculates feature rankings for a target variable using linear regression, support vector machine or random forest models from scikit-learn. ehrapy evaluates the relative importance of each predictor by fitting the model and extracting model-specific metrics, such as coefficients or feature importances.
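The g-test statistic used for categorical features compares observed counts with the counts expected under independence, G = 2 Σ O ln(O / E). The sketch below computes the statistic and its degrees of freedom for a contingency table of cluster by category; the function name and the toy counts are assumptions, and converting G to a P value would additionally require a chi-square distribution function, which is omitted here.

```python
import math


def g_test(table):
    """Return (G statistic, degrees of freedom) for a contingency table.

    table -- list of rows; rows are groups (e.g. clusters), columns are
             categories of the feature, entries are observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # cells with zero count contribute nothing
                g += observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return 2.0 * g, dof


# Toy counts: two clusters (rows) by two categories of a feature (columns).
g_balanced, dof_balanced = g_test([[10, 10], [10, 10]])
g_skewed, dof_skewed = g_test([[30, 10], [10, 30]])
```

A statistic of zero means the category frequencies are identical across clusters, and larger values indicate that the feature distinguishes the clusters more strongly.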

Dataset integration

Based on scanpy’s ‘ingest’ function, ehrapy facilitates the integration of labels and embeddings from a well-annotated reference dataset into a new dataset, enabling the mapping of cluster annotations and spatial relationships for consistent comparative analysis. This process ensures harmonized clinical interpretations across datasets, especially useful when dealing with multiple experimental diseases or batches.

Knowledge inference

Survival analysis.

ehrapy’s implementation of survival analysis algorithms is based on lifelines 124 :

Ordinary Least Squares (OLS) Model: Creates a linear regression model using OLS from a specified formula and an AnnData object, allowing for the analysis of relationships between variables and observations.

Generalized Linear Model (GLM): Constructs a GLM from a given formula, distribution and AnnData, providing a versatile framework for modeling relationships with nonlinear data structures.

Kaplan–Meier: Fits the Kaplan–Meier curve to generate survival curves, offering a visual representation of the probability of survival over time in a dataset.

Cox Hazard Model: Constructs a Cox proportional hazards model using a specified formula and an AnnData object, enabling the analysis of survival data by modeling the hazard rates and their relationship to predictor variables.

Log-Rank Test: Calculates the P value for the log-rank test, comparing the survival functions of two groups, providing statistical significance for differences in survival distributions.

GLM Comparison: Given two fitted GLMs, where the larger encompasses the parameter space of the smaller, this function returns the P value indicating whether the larger model adds significant explanatory power beyond the smaller model.
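As an illustration of the Kaplan–Meier estimator listed above, the following sketch computes the survival function as a product over event times of (1 − deaths / number at risk). It is written in plain Python with hypothetical names and toy data, and it does not use lifelines or ehrapy.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  -- observed time per patient (event or censoring)
    events -- 1/True if the event was observed, 0/False if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at time t leaves the risk set afterwards.
        at_risk -= leaving
    return curve


# Five patients; the patients at times 2 (one of two) and 4 are censored.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
```

Censored patients do not lower the curve themselves, but they shrink the risk set, so later events carry more weight.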

Trajectory inference

Trajectory inference is a computational approach that reconstructs and models the developmental paths and transitions within heterogeneous clinical data, providing insights into the temporal progression underlying complex systems. ehrapy offers several inbuilt algorithms for trajectory inference based on scanpy:

Diffusion Pseudotime: Infers the progression of observations by measuring geodesic distance along the graph, providing a pseudotime metric that represents the developmental trajectory within the dataset.

Partition-based Graph Abstraction (PAGA): Maps out the coarse-grained connectivity structures of complex manifolds using a partition-based approach, offering a comprehensive visualization of relationships in high-dimensional data and aiding in the identification of macroscopic connectivity patterns.

Because ehrapy is compatible with scverse, further trajectory inference-based algorithms, such as CellRank, can be seamlessly applied.

Causal inference

ehrapy’s causal inference module is based on ‘dowhy’ 69 . It comprises four key steps that are all implemented in ehrapy:

Graphical Model Specification: Define a causal graphical model representing relationships between variables and potential causal effects.

Causal Effect Identification: Automatically identify whether a causal effect can be inferred from the given data, addressing confounding and selection bias.

Causal Effect Estimation: Employ automated tools to estimate causal effects, using methods such as matching, instrumental variables or regression.

Sensitivity Analysis and Testing: Perform sensitivity analysis to assess the robustness of causal inferences and conduct statistical testing to determine the significance of the estimated causal effects.
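The role of the causal graph in effect estimation can be illustrated with a toy backdoor adjustment. The sketch assumes a hypothetical graph in which disease severity influences both treatment and length of stay, and it adjusts for severity by stratification. All names and numbers are invented for illustration; the sketch does not call dowhy and covers neither identification nor refutation.

```python
from statistics import fmean


def naive_effect(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [r["los"] for r in rows if r["treated"]]
    control = [r["los"] for r in rows if not r["treated"]]
    return fmean(treated) - fmean(control)


def adjusted_effect(rows, confounder):
    """Backdoor adjustment by stratification on one confounder:
    average the within-stratum differences, weighted by stratum size."""
    strata = {}
    for r in rows:
        strata.setdefault(r[confounder], []).append(r)
    weighted_sum = 0.0
    total = 0
    for members in strata.values():
        treated = [r["los"] for r in members if r["treated"]]
        control = [r["los"] for r in members if not r["treated"]]
        if not treated or not control:
            continue  # no overlap in this stratum, so no comparison possible
        weighted_sum += len(members) * (fmean(treated) - fmean(control))
        total += len(members)
    return weighted_sum / total


# Toy data: severe cases are treated more often and stay longer regardless.
rows = (
    [{"severe": 0, "treated": True, "los": 4.0}]
    + [{"severe": 0, "treated": False, "los": 5.0}] * 3
    + [{"severe": 1, "treated": True, "los": 11.0}] * 3
    + [{"severe": 1, "treated": False, "los": 12.0}]
)
naive = naive_effect(rows)
adjusted = adjusted_effect(rows, "severe")
```

In this toy data, the unadjusted comparison suggests that treatment lengthens the stay, whereas the adjusted estimate recovers the one-day reduction built into each stratum. This is why unrecorded confounders, such as those named in the limitations above, matter for interpretation.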

Patient stratification

ehrapy’s complete pipeline from pre-processing to the generation of lower-dimensional embeddings, clustering, statistical comparison between determined groups and more facilitates the stratification of patients.

Visualization

ehrapy features an extensive visualization pipeline that is customizable and yet offers reasonable defaults. Almost every analysis function is matched with at least one visualization function that often shares the name but is available through the plotting module. For example, after importing ehrapy as ‘ep’, ‘ep.tl.umap(adata)’ runs the UMAP algorithm on an AnnData object, and ‘ep.pl.umap(adata)’ would then plot a scatter plot of the UMAP embedding.

ehrapy further offers a suite of more generally usable and modifiable plots:

Scatter Plot: Visualizes data points along observation or variable axes, offering insights into the distribution and relationships between individual data points.

Heatmap: Represents feature values in a grid, providing a comprehensive overview of the data’s structure and patterns.

Dot Plot: Displays count values of specified variables as dots, offering a clear depiction of the distribution of counts for each variable.

Filled Line Plot: Illustrates trends in data with filled lines, emphasizing variations in values over a specified axis.

Violin Plot: Presents the distribution of data through mirrored density plots, offering a concise view of the data’s spread.

Stacked Violin Plot: Combines multiple violin plots, stacked to allow for visual comparison of distributions across categories.

Group Mean Heatmap: Creates a heatmap displaying the mean count per group for each specified variable, providing insights into group-wise trends.

Hierarchically Clustered Heatmap: Uses hierarchical clustering to arrange data in a heatmap, revealing relationships and patterns among variables and observations.

Rankings Plot: Visualizes rankings within the data, offering a clear representation of the order and magnitude of values.

Dendrogram Plot: Plots a dendrogram of categories defined in a group by operation, illustrating hierarchical relationships within the dataset.

Benchmarking ehrapy

We generated a subset of the UKB data selecting 261 features and 488,170 patient visits. We removed all features with missingness rates greater than 70%. To demonstrate speed and memory consumption for various scenarios, we subsampled the data to 20%, 30% and 50%. We ran a minimal ehrapy analysis pipeline on each of those subsets and the full data, including the calculation of quality control metrics, filtering of variables by a missingness threshold, nearest neighbor imputation, normalization, dimensionality reduction and clustering (Supplementary Table 1 ). We conducted our benchmark on a single CPU with eight threads and 60 GB of maximum memory.

ehrapy further provides out-of-core implementations using Dask 108 for many algorithms in ehrapy, such as our normalization functions or our PCA implementation. Out-of-core computation refers to techniques that process data that do not fit entirely in memory, using disk storage to manage data overflow. This approach is crucial for handling large datasets without being constrained by system memory limits. Because the principal components get reused for other computationally expensive algorithms, such as the neighbors graph calculation, it effectively enables the analysis of very large datasets. We are currently working on supporting out-of-core computation for all computationally expensive algorithms in ehrapy.

We demonstrate the memory benefits in a hosted tutorial where the in-memory pipeline for 50,000 patients with 1,000 features required about 2 GB of memory, and the corresponding out-of-core implementation required less than 200 MB of memory.
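The idea behind out-of-core computation can be illustrated without Dask. The following minimal sketch streams a matrix in row chunks and accumulates per-feature sums, so that only one chunk needs to be held in memory at a time. The function name `streaming_mean_std` and the in-memory chunk generator are hypothetical stand-ins for illustration; they are not part of ehrapy, and in practice the chunks would be read from disk.

```python
import numpy as np

def streaming_mean_std(chunks):
    """Per-feature mean and standard deviation from an iterable of row chunks."""
    n = 0
    total = None
    total_sq = None
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if total is None:
            total = np.zeros(chunk.shape[1])
            total_sq = np.zeros(chunk.shape[1])
        n += chunk.shape[0]
        total += chunk.sum(axis=0)
        total_sq += (chunk ** 2).sum(axis=0)
    mean = total / n
    # Population variance from running sums; clip tiny negatives caused by rounding.
    var = np.maximum(total_sq / n - mean ** 2, 0.0)
    return mean, np.sqrt(var)

# Synthetic stand-in for a dataset that would normally live on disk.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5))
chunks = (data[i:i + 100] for i in range(0, 1000, 100))
mean, std = streaming_mean_std(chunks)
```

The resulting statistics could then be applied chunk by chunk to scale the data, which is the general pattern that out-of-core normalization relies on.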

The code for benchmarking is available at https://github.com/theislab/ehrapy-reproducibility . The implementation of ehrapy is accessible at https://github.com/theislab/ehrapy together with extensive API documentation and tutorials at https://ehrapy.readthedocs.io .

PIC database analysis

Study design

We collected clinical data from the PIC 43 version 1.1.0 database. PIC is a single-center, bilingual (English and Chinese) database hosting information on children admitted to critical care units at the Children’s Hospital of Zhejiang University School of Medicine in China. The requirement for individual patient consent was waived because the study did not impact clinical care, and all protected health information was de-identified. The database contains 13,499 distinct hospital admissions of 12,881 distinct pediatric patients. These patients were admitted to five ICUs (GICU, PICU, SICU, CICU and NICU) with 119 total critical care beds between 2010 and 2018. The mean age of the patients was 2.5 years, and 42.5% were female. The in-hospital mortality was 7.1%; the mean hospital stay was 17.6 d; the mean ICU stay was 9.3 d; and 468 (3.6%) patients were admitted multiple times. Demographics, diagnoses, doctors’ notes, laboratory and microbiology tests, prescriptions, fluid balances, vital signs and radiographic reports were collected from all patients. For more details, see the original publication of Zeng et al. 43 .

Study participants

Individuals older than 18 years were excluded from the study. We grouped the data into three distinct groups: ‘neonates’ (0–28 d of age; 2,968 patients), ‘infants’ (1–12 months of age; 4,876 patients) and ‘youths’ (13 months to 18 years of age; 6,097 patients). We primarily analyzed the ‘youths’ group with the discharge diagnosis ‘unspecified pneumonia’ (277 patients).

Data collection

The collected clinical data included demographics, laboratory and vital sign measurements, diagnoses, microbiology and medication information and mortality outcomes. The five-character English ICD-10 codes were used, whose values are based on the seven-character Chinese ICD-10 codes.

Dataset extraction and analysis

We downloaded version 1.1.0 of the PIC database from Physionet 1 to obtain 17 CSV tables. Using Pandas, we selected all information with more than 50% coverage rate, including demographics and laboratory and vital sign measurements (Fig. 2 ). To reduce the amount of noise, we calculated and added only the minimum, maximum and average of all measurements that had multiple values per patient. Examination reports were removed because they describe only diagnostics and not detailed findings. All further diagnoses and microbiology and medication information were included in the observations slot to ensure that the data were not used for the calculation of embeddings but were still available for the analysis. This ensured that any calculated embedding would not separate patients into treated and untreated groups but would, rather, be based solely on phenotypic features. We imputed all missing data through k-nearest neighbors imputation (k = 20) using the knn_impute function of ehrapy. Next, we log normalized the data with ehrapy using the log_norm function. Afterwards, we winsorized the data using ehrapy’s winsorize function to obtain 277 ICU visits (n = 265 patients) with 572 features. Of those 572 features, 254 were stored in the matrix X and the remaining 318 in the ‘obs’ slot in the AnnData object. For clustering and visualization purposes, we calculated 50 principal components using ehrapy’s pca function. The obtained principal component representation was then used to calculate a nearest neighbors graph using the neighbors function of ehrapy. The nearest neighbors graph then served as the basis for a UMAP embedding calculation using ehrapy’s umap function.
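As a rough illustration of the log normalization and winsorization steps, the following NumPy sketch applies a log transform and then clips each feature to chosen quantiles. The function names, the offset of 1 and the 1%/99% limits are assumptions made for this example; the defaults of ehrapy’s log_norm and winsorize functions may differ.

```python
import numpy as np

def log_normalize(x, offset=1.0):
    """Natural log of (x + offset); the offset keeps zeros finite (assumed value)."""
    return np.log(np.asarray(x, dtype=float) + offset)

def winsorize(x, lower=0.01, upper=0.99):
    """Clip each feature (column) to its lower and upper quantiles (assumed limits)."""
    x = np.asarray(x, dtype=float)
    low, high = np.nanquantile(x, [lower, upper], axis=0)
    return np.clip(x, low, high)

# Synthetic, skewed measurements with one extreme outlier.
rng = np.random.default_rng(0)
values = rng.lognormal(mean=1.0, sigma=1.0, size=(500, 3))
values[0, 0] = 1e6
processed = winsorize(log_normalize(values))
```

Winsorizing after the log transform limits the influence of extreme measurements on the subsequent principal component analysis without discarding any patient visit.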

We applied the community detection algorithm Leiden with resolution 0.6 on the nearest neighbor graph using ehrapy’s leiden function. The four obtained clusters served as input for two-sided t -tests for all numerical values and two-sided g -tests for all categorical values for all four clusters against the union of all three other clusters, respectively. This was conducted using ehrapy’s rank_feature_groups function, which also corrects P values for multiple testing with the Benjamini–Hochberg method 125 . We presented the four groups and the statistically significantly different features between the groups to two pediatricians who annotated the groups with labels.
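The Benjamini–Hochberg correction mentioned above can be sketched in a few lines of NumPy. This is a generic implementation of the standard procedure with a hypothetical function name, not the code path used inside ehrapy’s rank_feature_groups function.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest P value by m / i.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest P value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Four illustrative P values, for example from one-vs-rest tests of four features.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Features whose adjusted P value stays below the chosen significance level would be reported as differing between a cluster and the union of the remaining clusters.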

Our determined groups can be confidently labeled owing to their distinct clinical profiles. Nevertheless, we could only take into account clinical features that were measured. Insightful features, such as lung function tests, are missing. Moreover, the feature representation of the time-series data is simplified, which can hide some nuances between the groups. Generally, deciding on a clustering resolution is difficult. However, more fine-grained clusters obtained via higher clustering resolutions may become too specific and not generalize well enough.

Kaplan–Meier survival analysis

We selected patients with up to 360 h of total stay for Kaplan–Meier survival analysis to ensure a sufficiently high number of participants. We proceeded with the AnnData object prepared as described in the ‘Patient stratification’ subsection to conduct Kaplan–Meier analysis among all four determined pneumonia groups using ehrapy’s kmf function. Significance was tested through ehrapy’s test_kmf_logrank function, which tests whether two Kaplan–Meier series differ significantly, employing a chi-squared test statistic under the null hypothesis. Let h_i(t) be the hazard rate of group i at time t and c a constant that represents a proportional change in the hazard rate between the two groups, then:

$$H_0: h_1(t) = h_2(t) \qquad H_A: h_1(t) = c\,h_2(t),\quad c \neq 1$$

This implicitly uses the log-rank weights. An additional Kaplan–Meier analysis was conducted for all children jointly concerning the liver markers AST, ALT and GGT. To determine whether measurements were inside or outside the norm range, we used reference ranges (Supplementary Table 2 ). P values less than 0.05 were labeled significant.

Our Kaplan–Meier curve analysis depends on the groups being well defined and shares the same limitations as the patient stratification. Additionally, the analysis is sensitive to the reference table where we selected limits that generalize well for the age ranges, but, due to children of different ages being examined, they may not necessarily be perfectly accurate for all children.
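For readers unfamiliar with the estimator underlying these curves, the following minimal sketch computes the Kaplan–Meier survival function from durations and event indicators. It is a textbook implementation with a hypothetical function name and toy data, not ehrapy’s kmf function.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier estimate at each distinct event time.

    durations: observed time per patient (event or censoring time)
    events:    1/True if the event was observed, 0/False if censored
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=bool)
    times = np.unique(durations[events])
    survival = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)              # still under observation at t
        observed = np.sum((durations == t) & events)  # events occurring exactly at t
        s *= 1.0 - observed / at_risk
        survival.append(s)
    return times, np.array(survival)

# Five toy patients; the third and fifth are censored.
times, survival = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

In this toy example the estimated survival probability drops to 0.8, 0.6 and 0.3 at times 5, 8 and 12, respectively; censored patients only reduce the number at risk.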

Causal effect of mechanism of action on LOS

Although the dataset was not initially intended for investigating causal effects of interventions, we adapted it for this purpose by focusing on the LOS in the ICU, measured in months, as the outcome variable. This choice aligns with the clinical aim of stabilizing patients sufficiently for ICU discharge. We constructed a causal graph to explore how different drug administrations could potentially reduce the LOS. Based on consultations with clinicians, we included several biomarkers of liver damage (AST, ALT and GGT) and inflammation (CRP and PCT) in our model. Patient age was also considered a relevant variable.

Because several different medications act by the same mechanisms, we grouped specific medications by their drug classes. This grouping was achieved by cross-referencing the drugs listed in the dataset with DrugBank release 5.1 (ref. 126 ), using Levenshtein distances for partial string matching. After manual verification, we extracted the corresponding DrugBank categories, counted the number of features per category and compiled a list of commonly prescribed medications, as advised by clinicians. This approach facilitated the modeling of the causal graph depicted in Fig. 4 , where an intervention is defined as the administration of at least one drug from a specified category.
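The string matching step can be sketched as follows. The example computes the Levenshtein (edit) distance with dynamic programming and picks the closest reference name. The helper names and the drug names are illustrative assumptions and are not taken from the dataset or from DrugBank; the sketch also matches whole strings, whereas the analysis used partial matching followed by manual verification.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def closest_match(name, candidates):
    """Return the candidate with the smallest case-insensitive edit distance."""
    return min(candidates, key=lambda c: levenshtein(name.lower(), c.lower()))

# Hypothetical reference names and a misspelled entry.
reference = ["Amoxicillin", "Azithromycin", "Ceftriaxone"]
match = closest_match("amoxicilin", reference)
```

Because a small edit distance does not guarantee that two names refer to the same compound, a manual check of the proposed matches, as described above, remains necessary.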

Causal inference was then conducted with ehrapy’s ‘dowhy’ 69 -based causal inference module using the expert-curated causal graph. Medication groups were designated as causal interventions, and the LOS was the outcome of interest. Linear regression served as the estimation method for analyzing these causal effects. We excluded four patients from the analysis owing to their notably long hospital stays exceeding 90 d, which were deemed outliers. To validate the robustness of our causal estimates, we incorporated several refutation methods:

Placebo Treatment Refuter: This method involved replacing the treatment assignment with a placebo to test the effect of the treatment variable being null.

Random Common Cause: A randomly generated variable was added to the data to assess the sensitivity of the causal estimate to the inclusion of potential unmeasured confounders.

Data Subset Refuter: The stability of the causal estimate was tested across various random subsets of the data to ensure that the observed effects were not dependent on a specific subset.

Add Unobserved Common Cause: This approach tested the effect of an omitted variable by adding a theoretically relevant unobserved confounder to the model, evaluating how much an unmeasured variable could influence the causal relationship.

Dummy Outcome: The true outcome variable was replaced with a random variable. If the estimated causal effect vanishes, this supports the validity of the original causal relationship, indicating that the outcome is not driven by random factors.

Bootstrap Validation: Bootstrapping was used to generate multiple samples from the dataset, testing the consistency of the causal effect across these samples.

The selection of these refuters addresses a broad spectrum of potential biases and model sensitivities, including unobserved confounders and data dependencies. This comprehensive approach ensures robust verification of the causal analysis. Each refuter provides an orthogonal perspective, targeting specific vulnerabilities in causal analysis, which strengthens the overall credibility of the findings.
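The logic of two of these refuters can be illustrated on synthetic data without the ‘dowhy’ package. The sketch below estimates a treatment effect by linear regression while adjusting for a confounder, then repeats the estimate with a permuted (placebo) treatment and with an added random covariate. All variable names, effect sizes and the data-generating process are invented for illustration and do not reflect the PIC analysis.

```python
import numpy as np

def ols_effect(treatment, confounders, outcome):
    """Treatment coefficient of a linear regression that adjusts for confounders."""
    design = np.column_stack([np.ones(len(outcome)), treatment, confounders])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef[1]

# Synthetic data: age confounds both treatment assignment and the outcome.
rng = np.random.default_rng(42)
n = 5000
age = rng.normal(size=n)
treatment = (rng.normal(size=n) + 0.8 * age > 0).astype(float)
outcome = -1.5 * treatment + 0.7 * age + rng.normal(size=n)  # assumed true effect: -1.5

estimate = ols_effect(treatment, age, outcome)

# Placebo treatment: a permuted treatment should show an effect close to zero.
placebo = ols_effect(rng.permutation(treatment), age, outcome)

# Random common cause: an unrelated covariate should barely change the estimate.
random_cause = rng.normal(size=n)
with_random = ols_effect(treatment, np.column_stack([age, random_cause]), outcome)
```

A placebo estimate far from zero, or an estimate that shifts markedly after adding the random covariate, would cast doubt on the original result.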

UKB analysis

Study population

We used information from the UKB cohort, which includes 502,164 study participants from the general UK population without enrichment for specific diseases. The study involved the enrollment of individuals between 2006 and 2010 across 22 different assessment centers throughout the United Kingdom. The tracking of participants is still ongoing. Within the UKB dataset, metabolomics, proteomics and retinal optical coherence tomography data are available for a subset of individuals without any enrichment for specific diseases. Additionally, EHRs, questionnaire responses and other physical measures are available for almost everyone in the study. Furthermore, a variety of genotype information is available for nearly the entire cohort, including whole-genome sequencing, whole-exome sequencing, genotyping array data as well as imputed genotypes from the genotyping array 44 . Because only the latter two are available for download, and are sufficient for polygenic risk score calculation as performed here, we used the imputed genotypes in the present study. Participants visited the assessment center up to four times for additional and repeat measurements and completed additional online follow-up questionnaires.

In the present study, we restricted the analyses to data obtained from the initial assessment, including the blood draw, for obtaining the metabolomics data and the retinal imaging as well as physical measures. This restricts the study population to 33,521 individuals for whom all of these modalities are available. We have a clear study start point for each individual with the date of their initial assessment center visit. The study population has a mean age of 57 years, is 54% female and is censored at age 69 years on average; 4.7% experienced an incident myocardial infarction; and 8.1% have prevalent type 2 diabetes. The study population comes from six of the 22 assessment centers due to the retinal imaging being performed only at those.

For the myocardial infarction endpoint definition, we relied on the first occurrence data available in the UKB, which compiles the first date that each diagnosis was recorded for a participant in a hospital in ICD-10 nomenclature. Subsequently, we mapped these data to phecodes and focused on phecode 404.1 for myocardial infarction.

The Framingham Risk Score was developed on data from 8,491 participants in the Framingham Heart Study to assess general cardiovascular risk 77 . It includes easily obtainable predictors and is, therefore, easily applicable in clinical practice, although newer and more specific risk scores exist and might be used more frequently. It includes age, sex, smoking behavior, blood pressure, total and low-density lipoprotein cholesterol as well as information on insulin, antihypertensive and cholesterol-lowering medications, all of which are routinely collected in the UKB and used in this study as the Framingham feature set.

The metabolomics data used in this study were obtained using proton NMR spectroscopy, a low-cost method with relatively low batch effects. It covers established clinical predictors, such as albumin and cholesterol, as well as a range of lipids, amino acids and carbohydrate-related metabolites.

The retinal optical coherence tomography–derived features were returned by researchers to the UKB 75 , 76 . They used the available scans and determined the macular volume, macular thickness, retinal pigment epithelium thickness, disc diameter, cup-to-disc ratio across different regions as well as the thickness between the inner nuclear layer and external limiting membrane, inner and outer photoreceptor segments and the retinal pigment epithelium across different regions. Furthermore, they determined a wide range of quality metrics for each scan, including the image quality score, minimum motion correlation and inner limiting membrane (ILM) indicator.

Data analysis

After exporting the data from the UKB, all timepoints were transformed into participant age entries. Only participants without prevalent myocardial infarction (relative to the first assessment center visit at which all data were collected) were included.

The data were pre-processed for retinal imaging and metabolomics subsets separately, to enable a clear analysis of missing data and allow for the k -nearest neighbors–based imputation ( k  = 20) of missing values when less than 10% were missing for a given participant. Otherwise, participants were dropped from the analyses. The imputed genotypes and Framingham analyses were available for almost every participant and, therefore, not imputed. Individuals without them were, instead, dropped from the analyses. Because genetic risk modeling poses entirely different methodological and computational challenges, we applied a published polygenic risk score for coronary heart disease using 6.6 million variants 73 . This was computed using the plink2 score option on the imputed genotypes available in the UKB.
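The two pre-processing rules described above, dropping participants with too many missing values and imputing the rest from their nearest neighbors, can be sketched as follows. The function names are hypothetical, and the distance (mean squared difference over features observed in both rows) is one common choice that may differ from the ehrapy implementation.

```python
import numpy as np

def drop_sparse_rows(x, max_missing=0.10):
    """Keep rows whose fraction of missing values is below max_missing."""
    keep = np.isnan(x).mean(axis=1) < max_missing
    return x[keep], keep

def knn_impute(x, k=20):
    """Fill each missing entry with the mean of that feature among the k nearest rows."""
    x = np.asarray(x, dtype=float)
    filled = x.copy()
    missing = np.isnan(x)
    for i in np.flatnonzero(missing.any(axis=1)):
        shared = ~missing[i] & ~missing           # features observed in both rows
        diff = np.where(shared, x - x[i], 0.0)
        counts = shared.sum(axis=1)
        dist = np.full(len(x), np.inf)
        ok = counts > 0
        dist[ok] = (diff[ok] ** 2).sum(axis=1) / counts[ok]
        dist[i] = np.inf                          # a row is not its own neighbor
        for j in np.flatnonzero(missing[i]):
            donors = np.flatnonzero(~missing[:, j] & np.isfinite(dist))
            nearest = donors[np.argsort(dist[donors])[:k]]
            if nearest.size:
                filled[i, j] = x[nearest, j].mean()
    return filled

# Toy example with k = 2 instead of the k = 20 used in the study.
x = np.array([[1.0, 10.0], [1.1, np.nan], [5.0, 50.0], [0.9, 12.0]])
filled = knn_impute(x, k=2)

# Row 1 has 15% missing values and is dropped at the 10% threshold.
wide = np.ones((3, 20))
wide[0, 0] = np.nan
wide[1, :3] = np.nan
kept, keep_mask = drop_sparse_rows(wide)
```

In the toy example, the missing value in the second row is replaced by the mean of its two closest rows in the first feature, giving 11.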

UMAP embeddings were computed using default parameters on the full feature sets with ehrapy’s umap function. For all analyses, the same time-to-event and event-indicator columns were used. The event indicator is a Boolean variable indicating whether a myocardial infarction was observed for a study participant. If an event was observed, the time to event is defined as the timespan between the start of the study, in this case the date of the first assessment center visit, and the date of the event. Otherwise, it is the timespan from the start of the study to the start of censoring; in this case, this is set to the last date for which EHRs were available, unless a participant died, in which case the date of death is the start of censoring. Kaplan–Meier curves and Cox proportional hazards models were fit using ehrapy’s survival analysis module and the lifelines 124 package’s CoxPHFitter function with default parameters. For Cox proportional hazards models with multiple feature sets, individually imputed and quality-controlled feature sets were concatenated, and the model was fit on the resulting matrix. Models were evaluated using the C-index 127 as a metric. It can be seen as an extension of the common area under the receiver operating characteristic curve to time-to-event datasets, in which events are not observed for every sample, and ranges from 0.0 (entirely false) over 0.5 (random) to 1.0 (entirely correct). CIs for the C-index were computed based on bootstrapping by sampling 1,000 times with replacement from all computed partial hazards and computing the C-index over each of these samples. The percentiles at 2.5% and 97.5% then give the lower and upper confidence bounds for the 95% CIs.
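The evaluation procedure can be sketched as follows: Harrell’s C-index is computed over all comparable pairs, and a percentile bootstrap with 1,000 resamples yields the 95% CI. The function names and the synthetic data are assumptions for illustration; the study itself relied on the lifelines package, whose handling of ties may differ in detail.

```python
import numpy as np

def concordance_index(times, risk, events):
    """Harrell's C-index: share of comparable pairs in which the higher risk fails first."""
    times = np.asarray(times, dtype=float)
    risk = np.asarray(risk, dtype=float)
    events = np.asarray(events, dtype=bool)
    concordant = 0.0
    comparable = 0
    for i in np.flatnonzero(events):
        # j is comparable to i if i had the event before j's event or censoring time.
        later = times > times[i]
        comparable += later.sum()
        concordant += np.sum(risk[i] > risk[later]) + 0.5 * np.sum(risk[i] == risk[later])
    return concordant / comparable

def bootstrap_ci(times, risk, events, n_boot=1000, seed=0):
    """Percentile bootstrap CI (2.5% and 97.5%) for the C-index."""
    rng = np.random.default_rng(seed)
    times, risk, events = map(np.asarray, (times, risk, events))
    n = len(times)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample participants with replacement
        scores.append(concordance_index(times[idx], risk[idx], events[idx]))
    return np.percentile(scores, [2.5, 97.5])

# Synthetic cohort: a higher risk score shortens the expected time to event.
rng = np.random.default_rng(1)
n = 100
risk = rng.normal(size=n)
times = rng.exponential(scale=np.exp(-risk))
events = rng.random(n) < 0.7
c_index = concordance_index(times, risk, events)
lower, upper = bootstrap_ci(times, risk, events)
```

Resampling participants together with their predicted risk keeps each bootstrap replicate cheap, because the model does not need to be refit.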

In all UKB analyses, the unit of study for a statistical test or predictive model is always an individual study participant.

The generalizability of the analysis is limited as the UK Biobank cohort may not represent the general population, with potential selection biases and underrepresentation of some demographic groups. Additionally, by restricting analysis to initial assessment data and censoring based on the last available EHR or date of death, our analysis does not account for longitudinal changes and can introduce follow-up bias, especially if participants lost to follow-up have different risk profiles.

In-depth quality control of retina-derived features

A UMAP plot of the retina-derived features indicating the assessment centers shows a cluster of samples that lie somewhat outside the general population and mostly attended the Birmingham assessment center (Fig. 5b ). To further investigate this, we performed Leiden clustering with resolution 0.3 (Extended Data Fig. 9a ) and isolated this group in cluster 5. When comparing cluster 5 to the rest of the population in the retina-derived feature space, we noticed that many individuals in cluster 5 showed overall retinal pigment epithelium (RPE) thickness measures substantially elevated over the rest of the population in both eyes (Extended Data Fig. 9b ), which is mostly a feature of this cluster (Extended Data Fig. 9c ). To investigate potential confounding, we computed ratios between cluster 5 and the rest of the population over the ‘obs’ DataFrame containing the Framingham features, diabetes-related phecodes and genetic principal components. Of the ten features with the five highest and five lowest ratios, six are genetic principal components, which are commonly used to represent genetic ancestry in a continuous space (Extended Data Fig. 9d ). Additionally, diagnoses for type 1 and type 2 diabetes and antihypertensive use are enriched in cluster 5. Further investigating the ancestry, we computed log ratios for self-reported ancestries and absolute counts, which showed no robust enrichment or depletion effects.

A closer look at three quality control measures of the imaging pipeline revealed that cluster 5 was an outlier in terms of either image quality (Extended Data Fig. 9e ) or minimum motion correlation (Extended Data Fig. 9f ) and the ILM indicator (Extended Data Fig. 9g ), all of which can be indicative of artifacts in image acquisition and downstream processing 128 . Subsequently, we excluded 301 individuals from cluster 5 from all analyses.

COVID-19 chest-x-ray fate determination

Dataset overview

We used the public BrixIA COVID-19 dataset, which contains 192 chest x-ray images annotated with BrixIA scores 82 . For each image, six regions were annotated by a senior radiologist with more than 20 years of experience and a junior radiologist with a disease severity score ranging from 0 to 3. A global score was determined as the sum of all of these regions and, therefore, ranges from 0 to 18 (S-Global). S-Global scores of 0 were classified as normal. Images that only had severity values up to 1 in all six regions were classified as mild. Images with severity values greater than or equal to 2, but an S-Global score of less than 7, were classified as moderate. All images that contained at least one 3 in any of the six regions with an S-Global score between 7 and 10 were classified as severe, and all remaining images with S-Global scores greater than 10 with at least one 3 were labeled critical. The dataset and instructions to download the images can be found at https://github.com/ieee8023/covid-chestxray-dataset .
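The severity rules above can be written down as a small function. The function name is hypothetical, and one case is an assumption of this sketch: images with an S-Global score of 7 or more but no region scored 3 are not covered by the stated rules and are returned as ‘unclassified’ here.

```python
def classify_brixia(regions):
    """Map six regional BrixIA scores (0 to 3 each) to a severity label."""
    if len(regions) != 6 or any(r not in (0, 1, 2, 3) for r in regions):
        raise ValueError("expected six region scores between 0 and 3")
    s_global = sum(regions)
    highest = max(regions)
    if s_global == 0:
        return "normal"
    if highest <= 1:
        return "mild"
    if s_global < 7:
        return "moderate"   # at least one region scored 2 or 3
    if highest == 3:
        return "severe" if s_global <= 10 else "critical"
    return "unclassified"   # assumption: case not covered by the stated rules

labels = [
    classify_brixia([0, 0, 0, 0, 0, 0]),
    classify_brixia([1, 1, 0, 0, 1, 0]),
    classify_brixia([2, 1, 1, 0, 0, 0]),
    classify_brixia([3, 2, 1, 1, 1, 0]),
    classify_brixia([3, 3, 3, 2, 1, 0]),
]
```

The five example score vectors cover one image per severity class, from normal to critical.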

We first resized all images to 224 × 224. Afterwards, the images underwent a random affine transformation that involved rotation, translation and scaling. The rotation angle was randomly selected from a range of −45° to 45°. The images were also subject to horizontal and vertical translation, with the maximum translation being 15% of the image size in either direction. Additionally, the images were scaled by a factor ranging from 0.85 to 1.15. The purpose of applying these transformations was to enhance the dataset and introduce variations, ultimately improving the robustness and generalization of the model.
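To make the augmentation parameters concrete, the following sketch samples one random affine transformation with the stated ranges and builds the corresponding 3 × 3 matrix in pixel coordinates, rotating and scaling about the image center before translating. It only constructs the matrix and does not resample any image; the function name is hypothetical, and the actual pipeline may have used a library routine instead.

```python
import numpy as np

def random_affine_matrix(rng, size=224, max_deg=45.0, max_shift=0.15,
                         scale_range=(0.85, 1.15)):
    """Sample rotation, translation and scale, and return a 3x3 affine matrix."""
    angle = np.deg2rad(rng.uniform(-max_deg, max_deg))
    tx, ty = rng.uniform(-max_shift, max_shift, size=2) * size  # shift in pixels
    s = rng.uniform(*scale_range)
    c = (size - 1) / 2.0                                        # image center
    cos, sin = s * np.cos(angle), s * np.sin(angle)
    # Rotate and scale about the center, then translate by (tx, ty).
    return np.array([
        [cos, -sin, c - cos * c + sin * c + tx],
        [sin,  cos, c - sin * c - cos * c + ty],
        [0.0,  0.0, 1.0],
    ])

rng = np.random.default_rng(0)
matrix = random_affine_matrix(rng)
center = np.array([111.5, 111.5, 1.0])
moved_center = matrix @ center  # the center is only affected by the translation
```

Because rotation and scaling are applied about the center, the image center moves only by the sampled translation, which stays within 15% of the image size.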

To generate embeddings, we used a pre-trained DenseNet model with weights densenet121-res224-all of TorchXRayVision 129 . A DenseNet is a convolutional neural network that makes use of dense connections between layers (Dense Blocks) where all layers (with matching feature map sizes) directly connect with each other. To maintain a feed-forward nature, every layer in the DenseNet architecture receives supplementary inputs from all preceding layers and transmits its own feature maps to all subsequent layers. The model was trained on the nih-pc-chex-mimic_ch-google-openi-rsna dataset 130 .

Next, we calculated 50 principal components on the feature representation of the DenseNet model of all images using ehrapy’s pca function. The principal component representation served as input for a nearest neighbors graph calculation using ehrapy’s neighbors function. This graph served as the basis for the calculation of a UMAP embedding with three components that was finally visualized using ehrapy.

We randomly picked a root in the group of images that was labeled ‘Normal’. First, we calculated so-called pseudotime by fitting a trajectory through the calculated UMAP space using diffusion maps as implemented in ehrapy’s dpt function 57 . Each image’s pseudotime value represents its estimated position along this trajectory, serving as a proxy for its severity stage relative to others in the dataset. To determine fates, we employed CellRank 58 , 59 with the PseudotimeKernel. This kernel computes transition probabilities for patient visits based on the connectivity of the k-nearest neighbors graph and the pseudotime values of patient visits, which represent their progression through a process. Directionality is infused into the nearest neighbors graph in this process: the kernel either removes or downweights edges in the graph that contradict the directional flow of increasing pseudotime, thereby refining the graph to better reflect the developmental trajectory. We computed the transition matrix with a soft threshold scheme (a parameter of the PseudotimeKernel), which downweights edges that point against the direction of increasing pseudotime. Finally, we calculated a projection on top of the UMAP embedding with CellRank using the plot_projection function of the PseudotimeKernel, which we subsequently plotted.

This analysis is limited by the small dataset of 192 chest x-ray images, which may affect the model’s generalizability and robustness. Annotation subjectivity from radiologists can further introduce variability in severity scores. Additionally, the random selection of a root from ‘Normal’ images can introduce bias in pseudotime calculations and subsequent analyses.

Diabetes 130-US hospitals analysis

We used data from the Diabetes 130-US hospitals dataset that were collected between 1999 and 2008. It contains clinical care information at 130 hospitals and integrated delivery networks. The extracted database information pertains to hospital admissions specifically for patients diagnosed with diabetes. These encounters required a hospital stay ranging from 1 d to 14 d, during which both laboratory tests and medications were administered. The selection criteria focused exclusively on inpatient encounters with these defined characteristics. More specifically, we used a version that was curated by the Fairlearn team where the target variable ‘readmitted’ was binarized and a few features renamed or binned ( https://fairlearn.org/main/user_guide/datasets/diabetes_hospital_data.html ). The dataset contains 101,877 patient visits and 25 features. The dataset predominantly consists of White patients (74.8%), followed by African Americans (18.9%), with other racial groups, such as Hispanic, Asian and Unknown categories, comprising smaller percentages. Females make up a slight majority in the data at 53.8%, with males accounting for 46.2% and a negligible number of entries listed as unknown or invalid. A substantial majority of the patients are over 60 years of age (67.4%), whereas those aged 30–60 years represent 30.2%, and those 30 years or younger constitute just 2.5%.

All of the following descriptions start by loading the Fairlearn version of the Diabetes 130-US hospitals dataset using ehrapy’s dataloader as an AnnData object.

Selection and filtering bias

An overview of sensitive variables was generated using tableone. Subsequently, ehrapy’s CohortTracker was used to track the age, gender and race variables. The cohort was filtered for all Medicare recipients and subsequently plotted.

Surveillance bias

We plotted the HbA1c measurement ratios using ehrapy’s catplot .

Missing data and imputation bias

MCAR-type missing data for the number of medications variable (‘num_medications’) were introduced by randomly setting 30% of its values to be missing using NumPy’s choice function. We tested that the data are MCAR by applying ehrapy’s implementation of Little’s MCAR test, which returned a non-significant P value of 0.71. MAR data for the number of medications variable (‘num_medications’) were introduced by scaling the ‘time_in_hospital’ variable to have a mean of 0 and a standard deviation of 1, adjusting these values by multiplying by 1.2 and subtracting 0.6 to influence the overall missingness rate, and then using these values to generate MAR data in the ‘num_medications’ variable via a logistic transformation and binomial sampling. We verified that the newly introduced missing values are not MCAR with respect to the ‘time_in_hospital’ variable by applying ehrapy’s implementation of Little’s test, which was significant (P = 0.01 × 10⁻²). The missing data were imputed using ehrapy’s mean imputation and MissForest implementation.
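The two missingness mechanisms can be sketched with NumPy as follows. The columns are synthetic stand-ins for ‘time_in_hospital’ and ‘num_medications’, and the sketch assumes that the logistic output is used as the probability of a value being missing, so that longer stays lead to more missing values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Synthetic stand-ins for the two dataset columns.
time_in_hospital = rng.integers(1, 15, size=n).astype(float)
num_medications = rng.poisson(16, size=n).astype(float)

# MCAR: 30% of the values are set to missing, independent of any variable.
mcar = num_medications.copy()
missing_idx = rng.choice(n, size=round(0.3 * n), replace=False)
mcar[missing_idx] = np.nan

# MAR: missingness depends on the observed 'time_in_hospital' variable.
z = (time_in_hospital - time_in_hospital.mean()) / time_in_hospital.std()
logit = 1.2 * z - 0.6                          # slope and shift as described above
prob_missing = 1.0 / (1.0 + np.exp(-logit))    # logistic transformation
mar_mask = rng.binomial(1, prob_missing).astype(bool)
mar = num_medications.copy()
mar[mar_mask] = np.nan
```

Under MAR, patients with missing values have systematically longer stays than patients with observed values, which is exactly the dependence that Little’s test is expected to detect and that mean imputation ignores.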

Algorithmic bias

Variables ‘race’, ‘gender’, ‘age’, ‘readmitted’, ‘readmit_binary’ and ‘discharge_disposition_id’ were moved to the ‘obs’ slot of the AnnData object to ensure that they were not used for model training. We built a binary label ‘readmit_30_days’ indicating whether a patient had been readmitted in fewer than 30 d. Next, we combined the ‘Asian’ and ‘Hispanic’ categories into a single ‘Other’ category within the ‘race’ column of our AnnData object and then filtered out and discarded any samples labeled as ‘Unknown/Invalid’ under the ‘gender’ column and subsequently moved the ‘gender’ data to the variable matrix X of the AnnData object. All categorical variables were encoded. The data were split into train and test groups with a test size of 50%. The data were scaled, and a logistic regression model was trained using scikit-learn, which was also used to determine the balanced accuracy score. Fairlearn’s MetricFrame class was used to inspect the target model performance against the sensitive variable ‘race’. We subsequently fit Fairlearn’s ThresholdOptimizer using the logistic regression estimator with balanced_accuracy_score as the objective. The algorithmic demonstration of Fairlearn’s abilities on this dataset is shown here: https://github.com/fairlearn/talks/tree/main/2021_scipy_tutorial .
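The kind of disaggregated evaluation described above can be illustrated without Fairlearn. The sketch below computes the balanced accuracy overall and separately for each level of a sensitive variable. The function names, labels and group names are toy values for illustration and do not mirror Fairlearn’s API.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def balanced_accuracy_by_group(y_true, y_pred, groups):
    """Balanced accuracy computed separately within each group."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    return {
        str(g): balanced_accuracy(y_true[groups == g], y_pred[groups == g])
        for g in np.unique(groups)
    }

# Toy predictions for two hypothetical groups.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
overall = balanced_accuracy(y_true, y_pred)
per_group = balanced_accuracy_by_group(y_true, y_pred, groups)
```

Here the overall score of 0.75 hides that the model is perfect for group A but no better than chance for group B, which is the type of disparity that a per-group breakdown is meant to reveal.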

Normalization bias

We one-hot encoded all categorical variables with ehrapy using the encode function. We applied ehrapy’s implementation of scaling normalization with and without the ‘Age group’ variable as group key to scale the data jointly and separately using ehrapy’s scale_norm function.
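The difference between joint and group-wise scaling can be sketched as follows. The function name `zscore` and the toy values are assumptions for illustration and do not reproduce the signature of ehrapy’s scale_norm function.

```python
import numpy as np

def zscore(x, groups=None):
    """Scale features to zero mean and unit variance, jointly or within groups."""
    x = np.asarray(x, dtype=float)
    if groups is None:
        return (x - x.mean(axis=0)) / x.std(axis=0)
    groups = np.asarray(groups)
    out = np.empty_like(x)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] = (x[mask] - x[mask].mean(axis=0)) / x[mask].std(axis=0)
    return out

# One toy feature measured in two hypothetical age groups with different ranges.
values = np.array([[1.0], [2.0], [3.0], [10.0], [20.0], [30.0]])
age_group = np.array(["young", "young", "young", "old", "old", "old"])
joint = zscore(values)
separate = zscore(values, groups=age_group)
```

With joint scaling, all values of the ‘young’ group end up below zero, whereas group-wise scaling centers each group on zero and thereby removes the between-group difference; which behavior is appropriate depends on whether that difference is signal or bias.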

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Physionet provides access to the PIC database 43 at https://physionet.org/content/picdb/1.1.0 for credentialed users. The BrixIA images 82 are available at https://github.com/BrixIA/Brixia-score-COVID-19 . The data used in this study were obtained from the UK Biobank 44 ( https://www.ukbiobank.ac.uk/ ). Access to the UK Biobank resource was granted under application number 49966. The data are available to researchers upon application to the UK Biobank in accordance with their data access policies and procedures. The Diabetes 130-US Hospitals dataset is available at https://archive.ics.uci.edu/dataset/296/diabetes+130-us+hospitals+for+years+1999-2008 .

Code availability

The ehrapy source code is available at https://github.com/theislab/ehrapy under an Apache 2.0 license. Further documentation, tutorials and examples are available at https://ehrapy.readthedocs.io . We are actively developing the software and invite contributions from the community.

Jupyter notebooks to reproduce our analysis and figures, including Conda environments that specify all versions, are available at https://github.com/theislab/ehrapy-reproducibility .

References

Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101 , E215–E220 (2000).

Atasoy, H., Greenwood, B. N. & McCullough, J. S. The digitization of patient care: a review of the effects of electronic health records on health care quality and utilization. Annu. Rev. Public Health 40 , 487–500 (2019).

Jamoom, E. W., Patel, V., Furukawa, M. F. & King, J. EHR adopters vs. non-adopters: impacts of, barriers to, and federal initiatives for EHR adoption. Health (Amst.) 2 , 33–39 (2014).

Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1 , 18 (2018).

Wolf, A. et al. Data resource profile: Clinical Practice Research Datalink (CPRD) Aurum. Int. J. Epidemiol. 48 , 1740–1740g (2019).

Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12 , e1001779 (2015).

Pollard, T. J. et al. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data 5 , 180178 (2018).

Johnson, A. E. W. et al. MIMIC-III, a freely accessible critical care database. Sci. Data 3 , 160035 (2016).

Hyland, S. L. et al. Early prediction of circulatory failure in the intensive care unit using machine learning. Nat. Med. 26 , 364–373 (2020).

Rasmy, L. et al. Recurrent neural network models (CovRNN) for predicting outcomes of patients with COVID-19 on admission to hospital: model development and validation using electronic health record data. Lancet Digit. Health 4 , e415–e425 (2022).

Marcus, J. L. et al. Use of electronic health record data and machine learning to identify candidates for HIV pre-exposure prophylaxis: a modelling study. Lancet HIV 6 , e688–e695 (2019).

Kruse, C. S., Stein, A., Thomas, H. & Kaur, H. The use of electronic health records to support population health: a systematic review of the literature. J. Med. Syst. 42 , 214 (2018).

Sheikh, A., Jha, A., Cresswell, K., Greaves, F. & Bates, D. W. Adoption of electronic health records in UK hospitals: lessons from the USA. Lancet 384 , 8–9 (2014).

Sheikh, A. et al. Health information technology and digital innovation for national learning health and care systems. Lancet Digit. Health 3 , e383–e396 (2021).

Mc Cord, K. A. & Hemkens, L. G. Using electronic health records for clinical trials: where do we stand and where can we go? Can. Med. Assoc. J. 191 , E128–E133 (2019).

Landi, I. et al. Deep representation learning of electronic health records to unlock patient stratification at scale. NPJ Digit. Med. 3 , 96 (2020).

Ayaz, M., Pasha, M. F., Alzahrani, M. Y., Budiarto, R. & Stiawan, D. The Fast Health Interoperability Resources (FHIR) standard: systematic literature review of implementations, applications, challenges and opportunities. JMIR Med. Inform. 9 , e21929 (2021).

Peskoe, S. B. et al. Adjusting for selection bias due to missing data in electronic health records-based research. Stat. Methods Med. Res. 30 , 2221–2238 (2021).

Haneuse, S. & Daniels, M. A general framework for considering selection bias in EHR-based studies: what data are observed and why? EGEMS (Wash. DC) 4 , 1203 (2016).

Gallifant, J. et al. Disparity dashboards: an evaluation of the literature and framework for health equity improvement. Lancet Digit. Health 5 , e831–e839 (2023).

Sauer, C. M. et al. Leveraging electronic health records for data science: common pitfalls and how to avoid them. Lancet Digit. Health 4 , e893–e898 (2022).

Li, J. et al. Imputation of missing values for electronic health record laboratory data. NPJ Digit. Med. 4 , 147 (2021).

Rubin, D. B. Inference and missing data. Biometrika 63 , 581 (1976).

Scheid, L. M., Brown, L. S., Clark, C. & Rosenfeld, C. R. Data electronically extracted from the electronic health record require validation. J. Perinatol. 39 , 468–474 (2019).

Phelan, M., Bhavsar, N. A. & Goldstein, B. A. Illustrating informed presence bias in electronic health records data: how patient interactions with a health system can impact inference. EGEMS (Wash. DC) 5 , 22 (2017).

Secondary Analysis of Electronic Health Records (ed MIT Critical Data) (Springer, 2016).

Jetley, G. & Zhang, H. Electronic health records in IS research: quality issues, essential thresholds and remedial actions. Decis. Support Syst. 126 , 113137 (2019).

McCormack, J. P. & Holmes, D. T. Your results may vary: the imprecision of medical measurements. BMJ 368 , m149 (2020).

Hobbs, F. D. et al. Is the international normalised ratio (INR) reliable? A trial of comparative measurements in hospital laboratory and primary care settings. J. Clin. Pathol. 52 , 494–497 (1999).

Huguet, N. et al. Using electronic health records in longitudinal studies: estimating patient attrition. Med. Care 58 Suppl 6 Suppl 1 , S46–S52 (2020).

Zeng, J., Gensheimer, M. F., Rubin, D. L., Athey, S. & Shachter, R. D. Uncovering interpretable potential confounders in electronic medical records. Nat. Commun. 13 , 1014 (2022).

Getzen, E., Ungar, L., Mowery, D., Jiang, X. & Long, Q. Mining for equitable health: assessing the impact of missing data in electronic health records. J. Biomed. Inform. 139 , 104269 (2023).

Tang, S. et al. Democratizing EHR analyses with FIDDLE: a flexible data-driven preprocessing pipeline for structured clinical data. J. Am. Med. Inform. Assoc. 27 , 1921–1934 (2020).

Dagliati, A. et al. A process mining pipeline to characterize COVID-19 patients’ trajectories and identify relevant temporal phenotypes from EHR data. Front. Public Health 10 , 815674 (2022).

Sun, Y. & Zhou, Y.-H. A machine learning pipeline for mortality prediction in the ICU. Int. J. Digit. Health 2 , 3 (2022).

Mandyam, A., Yoo, E. C., Soules, J., Laudanski, K. & Engelhardt, B. E. COP-E-CAT: cleaning and organization pipeline for EHR computational and analytic tasks. In Proc. of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics. https://doi.org/10.1145/3459930.3469536 (Association for Computing Machinery, 2021).

Gao, C. A. et al. A machine learning approach identifies unresolving secondary pneumonia as a contributor to mortality in patients with severe pneumonia, including COVID-19. J. Clin. Invest. 133 , e170682 (2023).

Makam, A. N. et al. The good, the bad and the early adopters: providers’ attitudes about a common, commercial EHR. J. Eval. Clin. Pract. 20 , 36–42 (2014).

Amezquita, R. A. et al. Orchestrating single-cell analysis with Bioconductor. Nat. Methods 17 , 137–145 (2020).

Virshup, I. et al. The scverse project provides a computational ecosystem for single-cell omics data analysis. Nat. Biotechnol. 41 , 604–606 (2023).

Zou, Q. et al. Predicting diabetes mellitus with machine learning techniques. Front. Genet. 9 , 515 (2018).

Cios, K. J. & William Moore, G. Uniqueness of medical data mining. Artif. Intell. Med. 26 , 1–24 (2002).

Zeng, X. et al. PIC, a paediatric-specific intensive care database. Sci. Data 7 , 14 (2020).

Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562 , 203–209 (2018).

Lee, J. et al. Open-access MIMIC-II database for intensive care research. Annu. Int. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2011 , 8315–8318 (2011).

Virshup, I., Rybakov, S., Theis, F. J., Angerer, P. & Alexander Wolf, F. anndata: annotated data. Preprint at bioRxiv https://doi.org/10.1101/2021.12.16.473007 (2021).

Voss, E. A. et al. Feasibility and utility of applications of the common data model to multiple, disparate observational health databases. J. Am. Med. Inform. Assoc. 22 , 553–564 (2015).

Vasilevsky, N. A. et al. Mondo: unifying diseases for the world, by the world. Preprint at medRxiv https://doi.org/10.1101/2022.04.13.22273750 (2022).

Harrison, J. E., Weber, S., Jakob, R. & Chute, C. G. ICD-11: an international classification of diseases for the twenty-first century. BMC Med. Inform. Decis. Mak. 21 , 206 (2021).

Köhler, S. et al. Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources. Nucleic Acids Res. 47 , D1018–D1027 (2019).

Wu, P. et al. Mapping ICD-10 and ICD-10-CM codes to phecodes: workflow development and initial evaluation. JMIR Med. Inform. 7 , e14325 (2019).

Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19 , 15 (2018).

Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12 , 2825–2830 (2011).

de Haan-Rietdijk, S., Kuppens, P. & Hamaker, E. L. What’s in a day? A guide to decomposing the variance in intensive longitudinal data. Front. Psychol. 7 , 891 (2016).

Pedersen, E. S. L., Danquah, I. H., Petersen, C. B. & Tolstrup, J. S. Intra-individual variability in day-to-day and month-to-month measurements of physical activity and sedentary behaviour at work and in leisure-time among Danish adults. BMC Public Health 16 , 1222 (2016).

Roffey, D. M., Byrne, N. M. & Hills, A. P. Day-to-day variance in measurement of resting metabolic rate using ventilated-hood and mouthpiece & nose-clip indirect calorimetry systems. JPEN J. Parenter. Enter. Nutr. 30 , 426–432 (2006).

Haghverdi, L., Büttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13 , 845–848 (2016).

Lange, M. et al. CellRank for directed single-cell fate mapping. Nat. Methods 19 , 159–170 (2022).

Weiler, P., Lange, M., Klein, M., Pe'er, D. & Theis, F. CellRank 2: unified fate mapping in multiview single-cell data. Nat. Methods 21 , 1196–1205 (2024).

Zhang, S. et al. Cost of management of severe pneumonia in young children: systematic analysis. J. Glob. Health 6 , 010408 (2016).

Torres, A. et al. Pneumonia. Nat. Rev. Dis. Prim. 7 , 25 (2021).

Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9 , 5233 (2019).

Kamin, W. et al. Liver involvement in acute respiratory infections in children and adolescents—results of a non-interventional study. Front. Pediatr. 10 , 840008 (2022).

Shi, T. et al. Risk factors for mortality from severe community-acquired pneumonia in hospitalized children transferred to the pediatric intensive care unit. Pediatr. Neonatol. 61 , 577–583 (2020).

Dudnyk, V. & Pasik, V. Liver dysfunction in children with community-acquired pneumonia: the role of infectious and inflammatory markers. J. Educ. Health Sport 11 , 169–181 (2021).

Charpignon, M.-L. et al. Causal inference in medical records and complementary systems pharmacology for metformin drug repurposing towards dementia. Nat. Commun. 13 , 7652 (2022).

Grief, S. N. & Loza, J. K. Guidelines for the evaluation and treatment of pneumonia. Prim. Care 45 , 485–503 (2018).

Paul, M. Corticosteroids for pneumonia. Cochrane Database Syst. Rev. 12 , CD007720 (2017).

Sharma, A. & Kiciman, E. DoWhy: an end-to-end library for causal inference. Preprint at arXiv https://doi.org/10.48550/ARXIV.2011.04216 (2020).

Khilnani, G. C. et al. Guidelines for antibiotic prescription in intensive care unit. Indian J. Crit. Care Med. 23 , S1–S63 (2019).

Harris, L. K. & Crannage, A. J. Corticosteroids in community-acquired pneumonia: a review of current literature. J. Pharm. Technol. 37 , 152–160 (2021).

Dou, L. et al. Decreased hospital length of stay with early administration of oseltamivir in patients hospitalized with influenza. Mayo Clin. Proc. Innov. Qual. Outcomes 4 , 176–182 (2020).

Khera, A. V. et al. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nat. Genet. 50 , 1219–1224 (2018).

Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat. Commun. 14 , 604 (2023).

Ko, F. et al. Associations with retinal pigment epithelium thickness measures in a large cohort: results from the UK Biobank. Ophthalmology 124 , 105–117 (2017).

Patel, P. J. et al. Spectral-domain optical coherence tomography imaging in 67 321 adults: associations with macular thickness in the UK Biobank study. Ophthalmology 123 , 829–840 (2016).

D’Agostino Sr, R. B. et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 117 , 743–753 (2008).

Buergel, T. et al. Metabolomic profiles predict individual multidisease outcomes. Nat. Med. 28 , 2309–2320 (2022).

Xu, Y. et al. An atlas of genetic scores to predict multi-omic traits. Nature 616 , 123–131 (2023).

Saelens, W., Cannoodt, R., Todorov, H. & Saeys, Y. A comparison of single-cell trajectory inference methods. Nat. Biotechnol. 37 , 547–554 (2019).

Rousan, L. A., Elobeid, E., Karrar, M. & Khader, Y. Chest x-ray findings and temporal lung changes in patients with COVID-19 pneumonia. BMC Pulm. Med. 20 , 245 (2020).

Signoroni, A. et al. BS-Net: learning COVID-19 pneumonia severity on a large chest X-ray dataset. Med. Image Anal. 71 , 102046 (2021).

Bird, S. et al. Fairlearn: a toolkit for assessing and improving fairness in AI. https://www.microsoft.com/en-us/research/publication/fairlearn-a-toolkit-for-assessing-and-improving-fairness-in-ai/ (2020).

Strack, B. et al. Impact of HbA1c measurement on hospital readmission rates: analysis of 70,000 clinical database patient records. BioMed. Res. Int. 2014 , 781670 (2014).

Stekhoven, D. J. & Bühlmann, P. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics 28 , 112–118 (2012).

Banerjee, A. et al. Identifying subtypes of heart failure from three electronic health record sources with machine learning: an external, prognostic, and genetic validation study. Lancet Digit. Health 5 , e370–e379 (2023).

Nagamine, T. et al. Data-driven identification of heart failure disease states and progression pathways using electronic health records. Sci. Rep. 12 , 17871 (2022).

Da Silva Filho, J. et al. Disease trajectories in hospitalized COVID-19 patients are predicted by clinical and peripheral blood signatures representing distinct lung pathologies. Preprint at bioRxiv https://doi.org/10.1101/2023.09.08.23295024 (2023).

Haneuse, S., Arterburn, D. & Daniels, M. J. Assessing missing data assumptions in EHR-based studies: a complex and underappreciated task. JAMA Netw. Open 4 , e210184 (2021).

Little, R. J. A. A test of missing completely at random for multivariate data with missing values. J. Am. Stat. Assoc. 83 , 1198–1202 (1988).

Jakobsen, J. C., Gluud, C., Wetterslev, J. & Winkel, P. When and how should multiple imputation be used for handling missing data in randomised clinical trials—a practical guide with flowcharts. BMC Med. Res. Methodol. 17 , 162 (2017).

Dziura, J. D., Post, L. A., Zhao, Q., Fu, Z. & Peduzzi, P. Strategies for dealing with missing data in clinical trials: from design to analysis. Yale J. Biol. Med. 86 , 343–358 (2013).

White, I. R., Royston, P. & Wood, A. M. Multiple imputation using chained equations: issues and guidance for practice. Stat. Med. 30 , 377–399 (2011).

Jäger, S., Allhorn, A. & Bießmann, F. A benchmark for data imputation methods. Front. Big Data 4 , 693674 (2021).

Waljee, A. K. et al. Comparison of imputation methods for missing laboratory data in medicine. BMJ Open 3 , e002847 (2013).

Ibrahim, J. G. & Molenberghs, G. Missing data methods in longitudinal studies: a review. Test (Madr.) 18 , 1–43 (2009).

Li, C., Alsheikh, A. M., Robinson, K. A. & Lehmann, H. P. Use of recommended real-world methods for electronic health record data analysis has not improved over 10 years. Preprint at bioRxiv https://doi.org/10.1101/2023.06.21.23291706 (2023).

Regev, A. et al. The Human Cell Atlas. eLife 6 , e27041 (2017).

Megill, C. et al. cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. Preprint at bioRxiv https://doi.org/10.1101/2021.04.05.438318 (2021).

Speir, M. L. et al. UCSC Cell Browser: visualize your single-cell data. Bioinformatics 37 , 4578–4580 (2021).

Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9 , 90–95 (2007).

Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6 , 3021 (2021).

Harris, C. R. et al. Array programming with NumPy. Nature 585 , 357–362 (2020).

Lam, S. K., Pitrou, A. & Seibert, S. Numba: a LLVM-based Python JIT compiler. In Proc. of the Second Workshop on the LLVM Compiler Infrastructure in HPC. https://doi.org/10.1145/2833157.2833162 (Association for Computing Machinery, 2015).

Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17 , 261–272 (2020).

McKinney, W. Data structures for statistical computing in Python. In Proc. of the 9th Python in Science Conference (eds van der Walt, S. & Millman, J.). https://doi.org/10.25080/majora-92bf1922-00a (SciPy, 2010).

Boulanger, A. Open-source versus proprietary software: is one more reliable and secure than the other? IBM Syst. J. 44 , 239–248 (2005).

Rocklin, M. Dask: parallel computation with blocked algorithms and task scheduling. In Proc. of the 14th Python in Science Conference. https://doi.org/10.25080/majora-7b98e3ed-013 (SciPy, 2015).

Pivarski, J. et al. Awkward Array. https://doi.org/10.5281/ZENODO.4341376

Collette, A. Python and HDF5: Unlocking Scientific Data (O’Reilly Media, Inc., 2013).

Miles, A. et al. zarr-developers/zarr-python: v2.13.6. https://doi.org/10.5281/zenodo.7541518 (2023).

The pandas development team. pandas-dev/pandas: Pandas. https://doi.org/10.5281/ZENODO.3509134 (2024).

Weberpals, J. et al. Deep learning-based propensity scores for confounding control in comparative effectiveness research: a large-scale, real-world data study. Epidemiology 32 , 378–388 (2021).

Rosenthal, J. et al. Building tools for machine learning and artificial intelligence in cancer research: best practices and a case study with the PathML toolkit for computational pathology. Mol. Cancer Res. 20 , 202–206 (2022).

Gayoso, A. et al. A Python library for probabilistic analysis of single-cell omics data. Nat. Biotechnol. 40 , 163–166 (2022).

Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (eds Wallach, H. et al.). 8024–8035 (Curran Associates, 2019).

Frostig, R., Johnson, M. & Leary, C. Compiling machine learning programs via high-level tracing. https://cs.stanford.edu/~rfrostig/pubs/jax-mlsys2018.pdf (2018).

Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616 , 259–265 (2023).

Kraljevic, Z. et al. Multi-domain clinical natural language processing with MedCAT: the Medical Concept Annotation Toolkit. Artif. Intell. Med. 117 , 102083 (2021).

Pollard, T. J., Johnson, A. E. W., Raffa, J. D. & Mark, R. G. An open source Python package for producing summary statistics for research papers. JAMIA Open 1 , 26–31 (2018).

Ellen, J. G. et al. Participant flow diagrams for health equity in AI. J. Biomed. Inform. 152 , 104631 (2024).

Schouten, R. M. & Vink, G. The dance of the mechanisms: how observed information influences the validity of missingness assumptions. Sociol. Methods Res. 50 , 1243–1258 (2021).

Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8 , 118–127 (2007).

Davidson-Pilon, C. lifelines: survival analysis in Python. J. Open Source Softw. 4 , 1317 (2019).

Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57 , 289–300 (1995).

Wishart, D. S. et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 34 , D668–D672 (2006).

Harrell, F. E. Jr, Califf, R. M., Pryor, D. B., Lee, K. L. & Rosati, R. A. Evaluating the yield of medical tests. JAMA 247 , 2543–2546 (1982).

Currant, H. et al. Genetic variation affects morphological retinal phenotypes extracted from UK Biobank optical coherence tomography images. PLoS Genet. 17 , e1009497 (2021).

Cohen, J. P. et al. TorchXRayVision: a library of chest X-ray datasets and models. In Proc. of the 5th International Conference on Medical Imaging with Deep Learning (eds Konukoglu, E. et al.). 172 , 231–249 (PMLR, 2022).

Cohen, J. P., Hashir, M., Brooks, R. & Bertrand, H. On the limits of cross-domain generalization in automated X-ray prediction. In Proceedings of Machine Learning Research , Vol. 121 (eds Arbel, T. et al.) 136–155 (PMLR, 2020).

Acknowledgements

We thank M. Ansari who designed the ehrapy logo. The authors thank F. A. Wolf, M. Lücken, J. Steinfeldt, B. Wild, G. Rätsch and D. Shung for feedback on the project. We further thank L. Halle, Y. Ji, M. Lücken and R. K. Rubens for constructive comments on the paper. We thank F. Hashemi for her help in implementing the survival analysis module. This research was conducted using data from the UK Biobank, a major biomedical database ( https://www.ukbiobank.ac.uk ), under application number 49966. This work was supported by the German Center for Lung Research (DZL), the Helmholtz Association and the CRC/TRR 359 Perinatal Development of Immune Cell Topology (PILOT). N.H. and F.J.T. acknowledge support from the German Federal Ministry of Education and Research (BMBF) (LODE, 031L0210A), co-funded by the European Union (ERC, DeepCell, 101054957). A.N. is supported by the Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA) through the DAAD program Konrad Zuse Schools of Excellence in Artificial Intelligence, sponsored by the Federal Ministry of Education and Research. This work was also supported by the Chan Zuckerberg Initiative (CZIF2022-007488; Human Cell Atlas Data Ecosystem).

Open access funding provided by Helmholtz Zentrum München - Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH).

Author information

Authors and affiliations

Institute of Computational Biology, Helmholtz Munich, Munich, Germany

Lukas Heumos, Philipp Ehmele, Tim Treis, Eljas Roellin, Lilly May, Altana Namsaraeva, Nastassya Horlava, Vladimir A. Shitov, Xinyue Zhang, Luke Zappia, Leon Hetzel, Isaac Virshup, Lisa Sikkema, Fabiola Curion & Fabian J. Theis

Institute of Lung Health and Immunity and Comprehensive Pneumology Center with the CPC-M bioArchive; Helmholtz Zentrum Munich; member of the German Center for Lung Research (DZL), Munich, Germany

Lukas Heumos, Niklas J. Lang, Herbert B. Schiller & Anne Hilgendorff

TUM School of Life Sciences Weihenstephan, Technical University of Munich, Munich, Germany

Lukas Heumos, Tim Treis, Nastassya Horlava, Vladimir A. Shitov, Lisa Sikkema & Fabian J. Theis

Health Data Science Unit, Heidelberg University and BioQuant, Heidelberg, Germany

Julius Upmeier zu Belzen & Roland Eils

Department of Mathematics, School of Computation, Information and Technology, Technical University of Munich, Munich, Germany

Eljas Roellin, Lilly May, Luke Zappia, Leon Hetzel, Fabiola Curion & Fabian J. Theis

Konrad Zuse School of Excellence in Learning and Intelligent Systems (ELIZA), Darmstadt, Germany

Altana Namsaraeva

Systems Medicine, Deutsches Zentrum für Neurodegenerative Erkrankungen (DZNE), Bonn, Germany

Rainer Knoll

Center for Digital Health, Berlin Institute of Health (BIH) at Charité – Universitätsmedizin Berlin, Berlin, Germany

Roland Eils

Research Unit, Precision Regenerative Medicine (PRM), Helmholtz Munich, Munich, Germany

Herbert B. Schiller

Center for Comprehensive Developmental Care (CDeCLMU) at the Social Pediatric Center, Dr. von Hauner Children’s Hospital, LMU Hospital, Ludwig Maximilian University, Munich, Germany

Anne Hilgendorff

Contributions

L. Heumos and F.J.T. conceived the study. L. Heumos, P.E., X.Z., E.R., L.M., A.N., L.Z., V.S., T.T., L. Hetzel, N.H., R.K. and I.V. implemented ehrapy. L. Heumos, P.E., N.L., L.S., T.T. and A.H. analyzed the PIC database. J.U.z.B. and L. Heumos analyzed the UK Biobank database. X.Z. and L. Heumos analyzed the COVID-19 chest x-ray dataset. L. Heumos, P.E. and J.U.z.B. wrote the paper. F.J.T., A.H., H.B.S. and R.E. supervised the work. All authors read, corrected and approved the final paper.

Corresponding author

Correspondence to Fabian J. Theis.

Ethics declarations

Competing interests

L. Heumos is an employee of LaminLabs. F.J.T. consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd. and Omniscope Ltd. and has ownership interest in Dermagnostix GmbH and Cellarity. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Leo Anthony Celi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary handling editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Overview of the Paediatric Intensive Care (PIC) database.

The database consists of several tables corresponding to several data modalities and measurement types. All tables colored in green were selected for analysis and all tables in blue were discarded based on coverage rate. Despite the high coverage rate, we discarded the ‘OR_EXAM_REPORTS’ table because of the lack of detail in the exam reports.

Extended Data Fig. 2 Preprocessing of the Paediatric Intensive Care (PIC) dataset with ehrapy.

( a ) Heterogeneous data of the PIC database was stored in ‘data’ (matrix that is used for computations) and ‘observations’ (metadata per patient visit). During quality control, further annotations are added to the ‘variables’ (metadata per feature) slot. ( b ) Preprocessing steps of the PIC dataset. ( c ) Example of the function calls in the data analysis pipeline that resembles the preprocessing steps in ( b ) using ehrapy.

Extended Data Fig. 3 Missing data distribution for the ‘youths’ group of the PIC dataset.

The x-axis represents the percentage of missing values in each feature. The y-axis reflects the number of features in each bin with text labels representing the names of the individual features.

Extended Data Fig. 4 Patient selection during analysis of the PIC dataset.

Filtering for the pneumonia cohort of the youths filters out care units except for the general intensive care unit and the pediatric intensive care unit.

Extended Data Fig. 5 Feature rankings of stratified patient groups.

Scores reflect the z-score underlying the p-value per measurement for each group. Higher scores (above 0) reflect overrepresentation of the measurement compared to all other groups and vice versa. ( a ) By clinical chemistry. ( b ) By liver markers. ( c ) By medication type. ( d ) By infection markers.

Extended Data Fig. 6 Liver marker value progression for the ‘youths’ group and Kaplan-Meier curves.

( a ) Viral and severe pneumonia with co-infection groups display enriched gamma-glutamyl transferase levels in blood serum. ( b ) Aspartate transferase (AST) and Alanine transaminase (ALT) levels are enriched for severe pneumonia with co-infection during early ICU stay. ( c ) and ( d ) Kaplan-Meier curves for ALT and AST demonstrate lower survivability for children with measurements outside the norm.

Extended Data Fig. 7 Overview of medication categories used for causal inference.

( a ) Feature engineering process to group administered medications into medication categories using drugbank. ( b ) Number of medications per medication category. ( c ) Number of patients that received (dark blue) and did not receive specific medication categories (light blue).

Extended Data Fig. 8 UK-Biobank data overview and quality control across modalities.

( a ) UMAP plot of the metabolomics data demonstrating a clear gradient with respect to age at sampling, and ( b ) type 2 diabetes prevalence. ( c ) Analogously, the features derived from retinal imaging show a less pronounced age gradient, and ( d ) type 2 diabetes prevalence gradient. ( e ) Stratifying myocardial infarction risk by the type 2 diabetes comorbidity confirms vastly increased risk with a prior type 2 diabetes (T2D) diagnosis. Kaplan-Meier estimators with 95% confidence intervals are shown. ( f ) Similarly, the polygenic risk score for coronary heart disease used in this work substantially enriches myocardial infarction risk in its top 5%. Kaplan-Meier estimators with 95% confidence intervals are shown. ( g ) UMAP visualization of the metabolomics features colored by the assessment center shows no discernible biases. ( a – g ) n = 29,216.

Extended Data Fig. 9 UK-Biobank retina derived feature quality control.

( a ) Leiden clustering of retina derived feature space. ( b ) Comparison of ‘overall retinal pigment epithelium (RPE) thickness’ values between cluster 5 (n = 301) and the rest of the population (n = 28,915). ( c ) RPE thickness in the right eye outliers on the UMAP largely corresponds to cluster 5. ( d ) Log ratio of top and bottom 5 fields in obs dataframe between cluster 5 and the rest of the population. ( e ) Image quality of the optical coherence tomography scan as reported in the UKB. ( f ) Minimum motion correlation quality control indicator. ( g ) Inner limiting membrane (ILM) quality control indicator. ( d – g ) Data are shown for the right eye only; comparable results for the left eye are omitted. ( a – g ) n = 29,216.

Extended Data Fig. 10 Bias detection and mitigation study on the Diabetes 130-US hospitals dataset (n = 101,766 hospital visits, one patient can have multiple visits).

( a ) Filtering to the visits of Medicare recipients results in an increase of Caucasians. ( b ) Proportion of visits where HbA1c measurements are recorded, stratified by admission type. Adjusted P values were calculated with chi-squared tests and Bonferroni correction (Adjusted P values: Emergency vs Referral 3.3E-131, Emergency vs Other 1.4E-101, Referral vs Other 1.6E-4). ( c ) Normalizing feature distributions jointly vs. separately can mask distribution differences. ( d ) Imputing the number of medications for visits. Onto the complete data (blue), MCAR (30% missing data) and MAR (38% missing data) were introduced (orange), with the MAR mechanism depending on the time in hospital. Mean imputation (green) can reduce the variance of the distribution under MCAR and MAR mechanisms, and bias the center of the distribution under an MAR mechanism. Multiple imputation, such as MissForest imputation, can impute meaningfully even in MAR cases when it has access to the variables involved in the MAR mechanism. Each boxplot represents the IQR of the data, with the horizontal line inside the box indicating the median value. The left and right bounds of the box represent the first and third quartiles, respectively. The ‘whiskers’ extend to the minimum and maximum values within 1.5 times the IQR from the lower and upper quartiles, respectively. ( e ) Predicting the early readmission within 30 days after release on a per-stay level. Balanced accuracy can mask differences in selection and false negative rate between sensitive groups.

Supplementary information

Supplementary tables 1 and 2, reporting summary.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article.

Heumos, L., Ehmele, P., Treis, T. et al. An open-source framework for end-to-end analysis of electronic health record data. Nat Med (2024). https://doi.org/10.1038/s41591-024-03214-0

Received: 11 December 2023

Accepted: 25 July 2024

Published: 12 September 2024

DOI: https://doi.org/10.1038/s41591-024-03214-0

case study controlling class 12

COMMENTS

  1. CBSE Class 12 Business Studies Case Studies

    CBSE Class 12 Business Studies Case Studies - Controlling. Controlling is the process of monitoring organisational performance, finding deviations (gaps between the actual performance and the set standards), and taking corrective action in order to achieve organisational goals (as per the set standards). Nature of Controlling.

  2. CBSE Class 12 Case Studies In Business Studies

    Question 2. State the steps in the process of controlling. (CBSE, Delhi 2017) Answer: The various steps involved in the controlling process are described below: Setting performance standards: The first step in the controlling process involves setting standards in clear, specific and measurable terms. Standards can be set in both quantitative as ...

  3. Highly Expected Case Studies (Part 1)

    Previous Video: https://www.youtube.com/watch?v=te5VpSRCbh4 Next Video: https://www.youtube.com/watch?v=rBcI9utJPFk ...

  4. Controlling Class 12 Business Studies Notes and Questions

    Class 12 Business Studies Controlling Notes and Questions. Q. 1. Babita Ltd. is engaged in manufacturing machine components. The target production is 250 units per day per worker. The company had been successfully attaining this target until two months ago. Over the last two months it has been observed that daily production varies between 200 ...

  5. CONTROLLING

    Vishwaas - 12th Commerce Enrollment Link: https://bit.ly/Vishwaas_Batch For complete notes of Lectures, visit Vishwaas Batch in the Batch Section of Physics...

  6. Controlling: Case Study

    Hello everyone, today in this Class 12 Business Studies session with Disha Mam we will cover some case studies from the chapter Controlling. It is an importa...

  7. Case Study Chapter 8 Controlling

    These case-based questions are expected to come in your exams this year. Please practise these case-study-based Class 12 Business Studies questions and answers to get more marks in examinations. Case Study Questions Chapter 8 Controlling. Read the source given below and answer the following questions:

  8. Case Studies

    Introduction of Case Studies - Controlling in English is available as part of our Business Studies (BST) Class 12 for Commerce & Case Studies - Controlling in Hindi for Business Studies (BST) Class 12 course. Download more important topics related with notes, lectures and mock test series for Commerce Exam by signing up for free.

  9. Controlling Class 12 Notes CBSE Business Studies Chapter 8 [PDF]

    In these Class 12 Business Studies Controlling revision notes, we will look at all the major points, from the definitions to the properties and important questions related to this chapter. Students can download the Chapter 8 Class 12 Business Studies notes for free from Vedantu. The notes are available in PDF format.

  10. CBSE Class 12 Case Study

    Explain the importance of controlling highlighted in the above key elements. Ans: (i) Ensuring order and discipline. (ii) Improving employees' motivation. (iii) Facilitating coordination in action. Business Studies CBSE Class 12 Case Study - Controlling with answers. Managers at a New York City import-export company suspected Corporate Defence ...

  11. PDF Controlling notes for CBSE Class 12 Business Studies

    Controlling refers to the process of evaluation and assessment of the work done. Under the process of controlling, standards are set for various tasks and activities. Accordingly, the various tasks and activities are evaluated against the set standards. Deviations from the set targets are identified, and corrective actions to be taken are decided.

  12. NCERT Solution for Class 12 Business Studies Chapter 8 Controlling

    NCERT Solutions for Class 12 Business Studies Chapter 8 - Controlling provides students with a comprehensive introduction to the concepts. It provides a clear picture of how to control the staff and management. The concepts covered in this chapter are listed below: Meaning of Controlling. Limitations of Controlling.

  13. CBSE Class 12

    Understand the concept of Case Studies on Controlling with CBSE Class 12 course curated by Rana Rohit Kumar on Unacademy. The Business Studies course is delivered in Hinglish.

  14. CBSE Class 12 Business Studies Case Studies

    BST Class 12 Case Studies: You already know that, as per the new pattern, questions based on case studies can be asked in the exam. These types of questions are introduced to check students' ability to understand and apply their knowledge to a given situation. Do not fear the questions based on case studies. If you are well prepared and have a thorough understanding of the chapter, those questions will not be ...

  15. Important CBSE Questions on Class 12 Chapter 8

    Study Important Questions for Class 12 Business Studies Chapter 8 - Controlling. Very Short Answer Questions (1 or 2 Marks) 1. Kothari Sweets has been a renowned name for quality sweets since 2000. Adarsh, the owner of Kothari Sweets, was worried as the sales had declined during the last five months.

  16. Controlling

    Oswal-Gurukul 36 Sample Papers for Commerce Term 2 https://www.amazon.in/stores/page/74390834-1D1E-4456-91C3-D40FF74699DA?channel=cbse-36sp2-sunilp We also pr...

  17. PDF Revision Notes Class 12 Business Studies Chapter 8

    Revision Notes. Class 12 Business Studies, Chapter 8 - Controlling. Definition: "Managerial Control implies the measurement of accomplishment against the standard and the correction of dev. f objectives according to plans." Koontz and O'Donnell. Meaning: Controlling is a process that entails comparing actual performance to the desired o.

  18. Controlling Class 12: Important Revision Notes

    The Class 12 chapter on controlling elaborates on the importance of controlling. The main points are listed below. Helps in achieving organizational goals and indicates deviations, if any, so that corrective action can be taken. Judges the accuracy of standards by carefully checking the changes taking place in the organizational environment.

  19. CBSE Class 12

    Day 08 - CONTROLLING - CASE STUDIES. Lesson 38 of 64, 15:00 mins. Priya Jain. (Hindi) 13 Days Pledge to complete Business Studies with Case Studies: Class 12.

  20. NCERT Solutions for Class 12 Business Studies Chapter 8 Controlling

    1. Explain the meaning of controlling. Ans: Controlling means ensuring that activities in an organisation are performed as per the plans. Controlling also ensures that an organisation's resources are being used effectively and efficiently for the achievement of desired goals. Controlling is, thus, a goal-oriented function.

  21. CBSE Class 12

    Understand the concept of Case Studies on Controlling-2 with CBSE Class 12 course curated by Rana Rohit Kumar on Unacademy. The Business Studies course is delivered in Hinglish.

  22. Extra Questions of Class 12 Business Studies Controlling

    Identify the possible causes for the decline in production and the steps to be taken to achieve the desired targets. (3) Explain briefly the relationship between controlling and planning. (4) "There is a close and reciprocal relationship between planning and controlling." Explain the statement.

  23. PDF Controlling Chapter

    CHAPTER 8. g and controlling; Describe the techni. f controlling; and be cont. olled by a manager. It is quite clear from the example that a manager needs to take some sort of corrective action before any major damage is d. ne to the business. Controlling function of management comes to the rescu.

  24. An open-source framework for end-to-end analysis of electronic ...

    The application of such approaches to EHR databases 1,5,6,7,8,9 has enabled the prediction and classification of diseases 10,11, study of population health 12, determination of optimal treatment ...