Data Editing
ANALYSIS AND PROCESSING OF INFORMATION
Processing and analyzing the information is fundamental to guaranteeing the quality, consistency, completeness and timeliness of the information generated by agricultural censuses and sample surveys.
For the 2019 National Agricultural Survey (ENA 2019), specific processing activities were planned to guarantee the consistency and quality of the information. Validation and analysis ran from the moment of the interview, with validation criteria built into the Mobile Computing Device (DCM) and applied directly with the informant, through to the review and presentation of the results.
Accordingly, the information collected in the ENA 2019 was subjected to a set of processes that identify data failing the requirements of logical and arithmetic consistency, completeness and integrity, so that a solution could be applied under specific, homogeneous criteria that ensure the consistency and quality of the information.
Within the processing of the ENA 2019, various stages were defined to carry out the analysis and validation of the information. The processing stages were as follows:
- Online validation
- Monitoring
- Coding and normalization
- Validation within the questionnaire
- Validation between questionnaires
- Comparison with internal and external sources
ONLINE VALIDATION
Online validation was the first stage of processing. Its purpose was to detect and resolve inconsistencies at the time of the interview, directly with the informant, while the questionnaire was being applied on the DCM. Once the interviewer had recorded the data provided by the informant, the system displayed an error message if it detected any inconsistency, so that it could be corrected on the spot with the informant. There were more than 200 online validation criteria. They were designed to guarantee that the questionnaire held the minimum necessary information, to detect unanswered variables, and to validate breakdowns such as the destination of crop production and livestock inventories, among others.
Once the capture of the questionnaires was completed, the information was transferred via the Internet to the national capture database at INEGI's central offices.
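As an illustration of the kind of rule described above, the following minimal sketch checks one record for unanswered variables and for a destination breakdown that does not add up to total production. The field names (crop, production_total, destinations) and the messages are hypothetical; the actual ENA 2019 criteria are not listed in this document.

```python
def validate_record(record):
    """Return a list of error messages for one questionnaire record.

    Field names are hypothetical; they are not the ENA 2019 variable names.
    """
    errors = []

    # Rule 1: detect variables without answers.
    for field in ("crop", "production_total", "destinations"):
        if record.get(field) in (None, "", {}):
            errors.append(f"missing answer: {field}")

    # Rule 2: the breakdown by destination must add up to total production.
    total = record.get("production_total")
    destinations = record.get("destinations") or {}
    if total is not None and destinations:
        if abs(sum(destinations.values()) - total) > 1e-9:
            errors.append("destination breakdown does not match total production")

    return errors
```

An empty list means the record passes; otherwise each message would correspond to something the interviewer can correct with the informant before moving on.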
MONITORING
Monitoring was carried out in parallel with the field operation, using the information in the capture database. Its objective was to follow up on the information collected in the questionnaires and verify its completeness during the field operation, so that inconsistencies missed by the online validation could be detected in a timely manner. It also served as an alert system for the quality and completeness of the information, and provided the basis for retraining operational personnel in cases of omissions or recurring errors in the capture of information.
CODING OF CONCEPTS
To generate statistics, the information collected for each variable must be cataloged so that it can be properly classified and identified for integration into the database, for processing and analysis, and for an orderly presentation of results.
In agricultural statistics (censuses and surveys), catalogs are used to classify the response options for each variable in the questionnaire. The catalogs contain coded concepts developed from the investigation and analysis of each variable, and they include as many answer options as informants can feasibly give, according to the characteristics of each question.
The first coding took place at the time of the interview, since the catalogs were integrated into the mobile computing device. During the interview, the device displayed the catalog so that the interviewer could choose the concept matching the producer's response; when a concept was chosen, its key was stored at that moment. When the informant's answer did not match any concept in the catalog, the capture system allowed the answer to be captured as given. All such cases were coded once the captured information had been integrated into a database, using two processes: electronic coding and manual coding.
The questionnaires captured on each interviewer's mobile computing device were transferred weekly to a database consolidated at the state level, and each state coordination office in turn transferred its data to a national database integrated at the central offices of the Institute.
The information consolidated in the central office database was then processed by a system that electronically coded the cases left uncoded at capture because the concept had not been located in the catalog during the interview. That is, an automated system compared the captured concepts with the contents of the catalogs and filtered out the cases that did match a catalog concept but, for some reason, had not been located at the time of the interview. These cases were automatically assigned the key of the matching catalog concept.
After electronic coding, the cases still pending were transferred to manual coding. In this process, they were grouped by type of catalog for review and analysis by central office staff. Answers that were synonyms of concepts already in the catalog, or that had been misspelled at capture, were assigned the key of the corresponding catalog concept. Cases identified as new, after review, analysis and investigation (including consultations with staff from state offices), were assigned a new code and registered in the corresponding catalog.
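The electronic coding step, with unmatched answers left for manual coding, could be sketched roughly as follows. This is only an illustration: the document does not say how the automated comparison was implemented, so the matching rule used here (ignoring case, accents and extra spaces) and the catalog keys are assumptions.

```python
import unicodedata


def normalize_text(text):
    """Lowercase, strip accents and collapse whitespace (assumed matching rule)."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def electronic_coding(answers, catalog):
    """Split free-text answers into coded cases and cases pending manual coding.

    catalog maps a key to a concept, e.g. {"001": "Maíz"} (hypothetical keys).
    """
    lookup = {normalize_text(concept): key for key, concept in catalog.items()}
    coded, pending = {}, []
    for answer in answers:
        key = lookup.get(normalize_text(answer))
        if key is None:
            pending.append(answer)   # goes on to manual coding
        else:
            coded[answer] = key      # automatically assigned the catalog key
    return coded, pending
```

Synonyms and genuinely new concepts would not match under a rule like this, which is consistent with their being handled by staff in the manual stage.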
STANDARDIZATION
Regionalisms are found throughout Mexico, and the units of measurement for surface and volume are no exception. The quantitative variables captured in agricultural statistics include areas and capacities or volumes. Agricultural producers do not always express them in metric units (meters, hectares, liters, kilograms, tons, etc.); depending on where they are located, they may report the regional units customarily used in their community, such as the almud, tarea, media and cuerda, among others.
To publish results, the measurements must be homogenized to the decimal metric system. This homogenization process is called normalization: units of measurement outside the decimal metric system are reviewed and analyzed, and an equivalence is applied to convert them to the units in which they will be published (hectares, tons, liters, etc.).
First, electronic normalization is carried out: an automated process converts units of fixed equivalence (square meter, yard, acre, pound, gallon, etc.) to the units presented in the published results (liter; meter or hectare; kilogram or ton), depending on the variable in question, such as planted area, harvested area or production.
Manual normalization, on the other hand, covers all captured units that do not belong to the decimal metric system and have no established equivalence. These are analyzed and investigated, since their value can vary depending on the region where they were captured. Once their equivalence has been determined through exhaustive investigation and checked for consistency with other variables, the values are converted to publishable units of measurement, homogenizing them so that the information can be aggregated and presented in the survey results.
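The split between electronic and manual normalization can be illustrated with a small conversion table. The sketch below is an assumption about how such a step might look, not the system INEGI used: it converts units of fixed equivalence and returns None for any other unit, signalling that the case needs manual investigation. The gallon is assumed to be the US liquid gallon.

```python
# Fixed equivalences: unit -> (publication unit, conversion factor).
FIXED_EQUIVALENCES = {
    "square meter": ("hectare", 0.0001),
    "acre": ("hectare", 0.40468564224),
    "yard": ("meter", 0.9144),
    "pound": ("kilogram", 0.45359237),
    "gallon": ("liter", 3.785411784),  # assumption: US liquid gallon
}

METRIC_UNITS = {"hectare", "meter", "kilogram", "liter", "ton"}


def normalize_measure(value, unit):
    """Return (value, unit) in publication units, or None if manual work is needed."""
    if unit in METRIC_UNITS:
        return value, unit                  # already publishable
    if unit in FIXED_EQUIVALENCES:
        target, factor = FIXED_EQUIVALENCES[unit]
        return value * factor, target       # electronic normalization
    return None                             # regional unit: manual normalization
```

A regional unit such as the almud falls through to None because its equivalence varies by region and cannot be placed in a fixed table.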
VALIDATION INSIDE THE QUESTIONNAIRE
Validation within the questionnaire guarantees the internal consistency of each questionnaire by verifying the congruence between related variables. To this end, a significant number of logical validations were applied to each questionnaire. The process began once the preceding coding and normalization processes had been completed, so it started from the standardized information and was repeated until no questionnaire presented errors or discrepancies under the established criteria. For ENA 2019, 157 validation criteria were developed.
Validation within questionnaires used the 'theoretical vectors' method. Functions were defined in advance whose dependent variables were assigned values according to the questions and answers of each chapter of the questionnaire. From these values, the functions produced a set of 'images' corresponding to all possible combinations of answers to the questions under study, with each image identifying one and only one combination. Each image was then subjected to an analysis and correction methodology for any inconsistencies it might represent, so that records failing the established criteria were corrected automatically in some cases and flagged for manual debugging in others.
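One way to picture the idea of images that each identify exactly one combination of answers is a mixed-radix encoding, sketched below. This is an interpretation for illustration only: the document does not give the actual functions, and the two questions, their answer codes and the action table are all hypothetical.

```python
# Hypothetical pair of related questions:
#   q1 "Did you sow crops?"      0 = blank, 1 = yes, 2 = no
#   q2 "Planted area reported?"  0 = no,    1 = yes
RADICES = (3, 2)


def image(answers, radices=RADICES):
    """Encode a combination of answer codes as a single integer."""
    value = 0
    for answer, radix in zip(answers, radices):
        if not 0 <= answer < radix:
            raise ValueError("answer code out of range")
        value = value * radix + answer
    return value


# Hypothetical treatment assigned to each image after analysis.
ACTIONS = {
    0: "manual",        # blank, no area
    1: "auto-correct",  # blank but area reported: set q1 to yes
    2: "manual",        # yes but no area
    3: "ok",            # yes with area
    4: "ok",            # no, and no area
    5: "manual",        # no but area reported
}


def diagnose(answers):
    """Look up the treatment for one record's combination of answers."""
    return ACTIONS[image(answers)]
```

Because the encoding is one-to-one, every possible combination can be reviewed once, in advance, and given a fixed treatment.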
VALIDATION BETWEEN QUESTIONNAIRES
The stage of validation between questionnaires aimed to ensure that the information was consistent when grouped. To this end, an analysis was carried out across groups defined according to main activity, size of the production unit, and similar criteria (for example, corn production units, or livestock production units specialized in a given species). This made it possible to detect records whose behavior in certain variables differed from that of the group to which they belonged. Statistical tools such as univariate and multivariate analysis were applied to the grouped data. For the univariate analysis, intervals were statistically defined within which the values of a variable could fluctuate without departing from the average behavior of the group. These intervals were used to detect production units with atypical data, that is, values above or below the range set by the predetermined average behavior of the others. For the multivariate analysis, the variables that were correlated with and dependent on each other were identified, and production units with atypical values in the joint behavior of those variables were detected.
Validation between questionnaires required all questionnaires to be coded and standardized. This stage ran in parallel with the validation within questionnaires, as the database was progressively standardized, and continued until the end of processing. For the cases found to be inconsistent, a report was prepared so that they could be analyzed and, if necessary, debugged automatically or manually.
During the stages of validation within and between questionnaires, a re-consultation system was available. It allowed the central and state levels to exchange information on the cases reported as inconsistent, so that the state could analyze them and, if considered necessary, re-consult them in the field directly with the informant in order to confirm the data or adjust them.
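The univariate step can be illustrated with a simple interval rule applied to one group of production units. The document does not state how the intervals were defined, so the sketch below substitutes a common technique, Tukey's fences based on the interquartile range; the multiplier k and the unit identifiers are assumptions.

```python
from statistics import quantiles


def fluctuation_interval(values, k=1.5):
    """Interval outside which a value is treated as atypical (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def atypical_units(group, k=1.5):
    """Return the units of one group whose value falls outside the interval.

    group maps a production unit identifier to the value of one variable,
    for units that share the same main activity or size class.
    """
    low, high = fluctuation_interval(list(group.values()), k)
    return [unit for unit, value in group.items() if value < low or value > high]
```

Units flagged in this way would then be reported for automatic or manual debugging, or for re-consultation with the informant.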
COMPARISON WITH INTERNAL AND EXTERNAL SOURCES
In order to guarantee the quality of the information captured by the ENA 2019, it was important to carry out a comparison of information with that generated by other sources, both internal and from institutions related to the Agricultural Sector. The sources of consultation used were the following:
INTERNAL SOURCES: information from the 2007 census and 2012, 2014 and 2017 Agricultural Surveys.
EXTERNAL SOURCES: information from SIAP-SADER, SEMARNAT, CONAGUA, RAN, etc.
This comparison was carried out at two levels, national and state, giving priority to certain variables such as area, crops, production, yields, livestock head counts, etc.
This required diagnostic or preliminary tabulations, which made it possible to corroborate the expanded sample figures and to carry out re-consultations with producers in order to determine whether the information was correct or needed adjusting; where adjustments were made, the corresponding justification had to be provided.
Finally, this activity made it possible to detect similarities and differences in the expanded statistical data, and to determine whether the differences were due to conceptual or operational aspects.
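A comparison of this kind can be sketched as a relative difference between the expanded survey estimate and the figure from a reference source. The 10% tolerance and the variable names are assumptions made for illustration; the document does not state the thresholds actually applied.

```python
def relative_difference(estimate, reference):
    """Relative difference of the survey estimate with respect to the reference."""
    return (estimate - reference) / reference


def flag_discrepancies(estimates, references, tolerance=0.10):
    """Return the variables whose estimate departs from the reference source.

    tolerance is a hypothetical threshold, not one taken from ENA 2019.
    """
    flagged = {}
    for variable, estimate in estimates.items():
        reference = references.get(variable)
        if reference is None or reference == 0:
            continue  # no usable reference figure for this variable
        diff = relative_difference(estimate, reference)
        if abs(diff) > tolerance:
            flagged[variable] = diff
    return flagged
```

A flagged variable is only a prompt for review: the difference may reflect conceptual or operational differences between sources rather than an error in the survey data.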
Other Processing
The organizational structure of INEGI is oriented towards developing regulations at the central level and delegating the execution of statistical programs to the regional and state levels.
The sample for ENA 2019 was drawn from the universe of production units resulting from the Update of the 2016 Agricultural Census Framework and ENA 2017. The information was collected by an operational structure made up of Interviewers (ENT) and Interviewer Supervisors (SENT).
Planning was a basic process. It made it possible to calculate the number of operational and control personnel needed to collect the information, and to define and distribute workloads equitably by forming geographical areas of responsibility. In this way, the necessary material resources were determined, an adequate organization of work was established, and control of the field operation was optimized.
The general planning of the field operation was based on calculating the human resources needed to meet the requirements of the Survey. Operational planning was defined according to the size of the sample, the dispersion of the localities where the producers were domiciled, accessibility, average identification time, total land and the makeup of the Production Unit. Other variables were also considered, such as the number of days of the field operation and the average number of questionnaires an Interviewer would apply per day. With these elements, the workloads and work areas were defined, which determined the number of Interviewer
In the first planning phase, the universe of localities to be surveyed was specified, and the factors and criteria that the planning system considered in calculating the fraction of Interviewers and Interviewer Supervisors required per locality were described.
In the second planning phase, the criteria and guidelines for forming the Interviewer and Interviewer Supervisor areas were detailed; these were described in the reports generated by the system for assigning workloads to these roles.
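The idea of a fraction of Interviewers per locality can be illustrated with a deliberately simplified calculation that uses only the sample size, the number of field days and the average number of questionnaires per day. The real planning system also weighed dispersion, accessibility and identification time, which this sketch ignores; the function names and figures are hypothetical.

```python
import math


def interviewer_fraction(questionnaires, field_days, per_day):
    """Fraction of one Interviewer's capacity that a locality requires."""
    return questionnaires / (field_days * per_day)


def interviewers_needed(localities, field_days, per_day):
    """Whole number of Interviewers needed for a set of localities.

    localities maps a locality name to its number of sampled questionnaires.
    """
    total = sum(
        interviewer_fraction(count, field_days, per_day)
        for count in localities.values()
    )
    return math.ceil(total)
```

Working with fractions first and rounding up only at the end lets small neighboring localities be combined into one area of responsibility.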
Additionally, a reinforcement session was held in the field to give staff feedback on possible inconsistencies. The session consisted of asking the question “How many plots of land do you manage in this municipality?”. If the number stated was equal to the number registered, the application of the questionnaire continued; if it did not match, only the number of plots of land that the informant claimed to have was recorded.
Finally, progress reports were generated from the information stored in the DCM and transferred via the Web to the Management System for Control and Monitoring of Operational Processes (SACSPO). Their purpose was to keep track of the progress of the activities in each area of responsibility in the states and, accordingly, to take the pertinent measures where they were considered necessary.
The organizational structure of ENA 2019 was defined, depending on the workloads, as follows:
Control Chief (JC)
Responsible for planning the operation and for coordinating, advising, supervising and supporting the tasks carried out by the personnel under their charge, monitoring and resolving situations in accordance with the established guidelines, in order to ensure the planning, training, monitoring and control of the operation, as well as the requisition of financial and material resources. The JC also coordinates support for and dissemination of the project with municipal authorities and local leaders, and the hiring of the operational personnel participating in the survey.
IT Support
Acts as system administrator for the operation of the mobile computing devices, and advises and supports the ENT in sending DCM information through the Web. Also advises staff on the use and management of the device, at the request of the JC.
Lastly, generates follow-up reports for the weekly meetings or whenever the JC requires them.
Agricultural Instructor (IA)
In charge of training operational personnel (SENT, ENT, AI and Zone Administrative Assistant (AA)), including the personnel hired to fill vacancies left by resignations. The IA also supervises the field operation, that is, the SENT and the ENT, and supports the JC in its activities when requested.
Information Analyst (AI)
Reviewed and analyzed the reports of possible inconsistencies generated by the automated Monitoring and Validation systems and brought them to the attention of the JC. Inconsistencies resolved in the field or in the office had to be registered in the Reconsultation Module and followed up until it was corroborated that the returned records contained the correct information; for this reason, the AI communicated directly with the Reconsultation System.
Interviewer Supervisor (SENT)
Responsible for supervising that the ENT carry out the information capture activities properly. Delivers the DCM loaded with the ENA 2019 Information Capture System. Provides the materials needed for the activities (control, support, auxiliary and office). Assigns the initial workload. Together with the ENT, establishes the weekly schedule of visits to producers. Advises and supports in solving problems. Supervises the correct capture of information from the selected production units (UP). Reviews the progress of the operation based on the analysis of the follow-up reports. Verifies that information is sent through the Web completely and on time.
Interviewer (ENT)
In charge of capturing, through a direct interview, the information from the observation units assigned in their area of responsibility, based on the planning and strategy stipulated in the interviewer's manual for the generation of survey information.