B.Tech. (I) - ICP-132 (2009)
Assignment 7

Level 0
-------

1. Given a number n by the user, create a file data_n.txt in which line i
   contains i, i^2 and i^3, for i from 1 to n. If the file data_n.txt
   already exists, the program should print an error message and quit.

   Eg.
   Input:
   5
   Output: (file data_n.txt)
   1, 1, 1
   2, 4, 8
   3, 9, 27
   4, 16, 64
   5, 25, 125

2. Remove the commas from data_n.txt. If the file data_n.txt does not
   exist, the program should print an error message and quit.

3. Write a program that reads data_n.txt and appends its contents to a
   file data_an.txt. The program should create data_an.txt if it is not
   found and should print a message that the file has been created.

Level 1
-------

1. Manually remove the second line from data_n.txt. Write a program to
   print the missing lines in data_n.txt.

2. A user enters multiple integers in three lines. Print the number of
   integers in each line and print their values doubled. The numbers are
   whitespace separated.

   Eg.
   Input:
   5 3 2 2
   1
   23 2 1
   Output:
   4: 10 6 4 4
   1: 2
   3: 46 4 2

3. A user enters multiple reals in three lines. Print the number of reals
   in each line and print their values doubled, to single decimal
   accuracy. The numbers are comma separated.

   Eg.
   Input:
   5.1,3.2,2.1,2.1
   1.0
   22.5,3.4,1.0
   Output:
   4: 10.2 6.4 4.2 4.2
   1: 2.0
   3: 45.0 6.8 2.0

Level 2
-------

A database is a collection of records. A record is a collection of
fields. The following questions are based on two databases, SwissProt
and Nasdaq. The files are available at:

https://iws60.iiita.ac.in/icp/assg/sprot.txt
https://iws60.iiita.ac.in/icp/assg/nasdaq.txt

SwissProt: The file is a database of proteins. Records are stored
sequentially. Some fields might be missing in individual records. The
content of a field is preceded by a two-letter code that identifies the
name of the field. A record begins with the field 'ID' and ends with
'//'.

Eg.
ID   104K_SHEAN Reviewed; 893 AA.
AC   Q4U9M9;
DT   01-APR-1990, integrated into UniProtKB/Swiss-Prot.
DT   01-APR-1990, sequence version 1.
DT   24-JUL-2007, entry version 37.
DE   104 kDa microneme/rhoptry antigen precursor (p104).
GN   OrderedLocusNames=TP04_0437;
OS   Theileria parva.
SQ   SEQUENCE 893 AA; 101921 MW; 2F67CEB3B02E7AC1 CRC64;
     MKFLVLLFNI LCLFPILGAD ELVMSPIPTT DVQPKVTFDI NSEVSSGPLY LNPVEMAGVK
     YLQLQRQPGV QVHKVVEGDI VIWENEEMPL YTCAIVTQNE VPYMAYVELL EDPDLIFFLK
     EGDQWAPIPE DQYLARLQQL RQQIHTESFF SLNLSFQHEN YKYEMVSSFQ HSIKMVVFTP
     KNGHICKMVY DKNIRIFKAL YNEYVTSVIG FFRGLKLLLL NIFVIDDRGM IGNKYFQLLD
     DKYAPISVQG YVATIPKLKD FAEPYHPIIL DISDIDYVNF YLGDATYHDP GFKIVPKTPQ
     CITKVVDGNE VIYESSNPSV ECVYKVTYYD KKNESMLRLD LNHSPPSYTS YYAKREGVWV
     TSTYIDLEEK IEELQDHRST ELDVMFMSDK DLNVVPLTNG NLEYFMVTPK PHRDIIIVFD
     GSEVLWYYEG LENHLVCTWI YVTEGAPRLV HLRVKDRIPQ NTDIYMVKFG EYWVRISKTQ
     YTQEIKKLIK KSKKKLPSIE EEDSDKHGGP PKGPEPPTGP GHSSSESKEH EDSKESKEPK
     EHGSPKETKE GEVTKKPGPA KEHKPSKIPV YTKRPEFPKK SKSPKRPESP KSPKRPVSPQ
     RPVSPKSPKR PESLDIPKSP KRPESPKSPK RPVSPQRPVS PRRPESPKSP KSPKSPKSPK
     VPFDPKFKEK LYDSYLDKAA KTKETVTLPP VLPTDESFTH TPIGEPTAEQ PDDIEPIEES
     VFIKETGILT EEVKTEDIHS ETGEPEEPKR PDSPTKHSPK PTGTHPSMPK KRRRSDGLAL
     STTDLESEAG RILRDPTGKI VTMKRSKSFD DLTTVREKEH MGAEIRKIVV DDDGTEADDE
     DTHPSKEKHL STVRRRRPRP KKSSKSSKPR KPDSAFVPSI IFIFLVSLIV GIL
//

The above record contains the fields ID, AC, DT, DE, GN, OS and SQ. The
SQ field contains the sequence of the protein; the lines that hold the
actual sequence do not have the leading 'SQ'. The sequence of this
protein (ID=104K_SHEAN) and the record are terminated by '//'.

Nasdaq: Each line is a record and fields are separated by commas. There
are no missing fields. The first line gives the names of the various
indices. Each subsequent line gives the values of the corresponding
indices on a particular day. The first field, for example, is the
'Date'.

1. Print the IDs of all the proteins in sprot.txt.

2. Print the scientific names of all the organisms whose proteins have
   records. The output must be in sorted order.
   (Hint: You need to process the field OS and discard the information
   on strains.
   Use a shell command to get an idea, as in
   "grep ^OS sprot.txt | sort | uniq".)

3. Find the date on which the difference between the N100 and
   Financial100 is maximum.

Level 3
-------

1. Print the IDs of all the proteins and the number of distinct amino
   acids in their sequences. (Each amino acid is represented by a single
   letter.)

2. Given the ID of a protein, print the number of times each amino acid
   occurs in its sequence.

3. Find the Nasdaq index that continued to grow for the longest time at
   a stretch.
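
The SwissProt record layout described under Level 2 (a record opens with 'ID', sequence lines follow the 'SQ' line without a field code, and '//' closes the record) could be read along the following lines. This is only a minimal sketch in Python, not a prescribed solution: the function names read_records and residue_counts are hypothetical, and the short record DEMO_TEST is a made-up sample, not an entry from sprot.txt. It assumes that every line between the 'SQ' line and '//' holds sequence letters only.

```python
from collections import Counter


def read_records(lines):
    """Yield (protein_id, sequence) for each record in an iterable of lines."""
    protein_id = None
    in_sequence = False
    chunks = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # '//' terminates both the sequence and the record.
            if protein_id is not None:
                yield protein_id, "".join(chunks)
            protein_id, in_sequence, chunks = None, False, []
        elif in_sequence:
            # Sequence lines carry no field code; drop the blanks between blocks.
            chunks.append("".join(line.split()))
        elif line.startswith("ID"):
            # The ID is the first token after the two-letter code.
            parts = line.split()
            if len(parts) > 1:
                protein_id = parts[1]
        elif line.startswith("SQ"):
            # The SQ line itself holds only a summary; residues start on the next line.
            in_sequence = True


def residue_counts(sequence):
    """Map each amino-acid letter to the number of times it occurs."""
    return Counter(sequence)


if __name__ == "__main__":
    # Made-up sample record (assumption), shaped like the example in the text.
    sample = [
        "ID   DEMO_TEST Reviewed; 13 AA.\n",
        "OS   Demo organism.\n",
        "SQ   SEQUENCE 13 AA;\n",
        "     MKFLVLLFNI LCL\n",
        "//\n",
    ]
    for pid, seq in read_records(sample):
        counts = residue_counts(seq)
        # ID followed by the number of distinct amino acids in the sequence.
        print(pid, len(counts))
```

To run the same loop on the real file, the list sample could be replaced by an open file object, for example open("sprot.txt"), since read_records accepts any iterable of lines. The per-letter counts returned by residue_counts cover the second Level 3 question, and len(counts) covers the first.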