{"title":"D-optimal plans for variable selection in data bases","authors":"J. Schiffner, C. Weihs","doi":"10.17877/DE290R-8705","DOIUrl":"https://doi.org/10.17877/DE290R-8705","url":null,"abstract":"This paper is based on an article of Pumplun et al. (2005a) that investigates the use of Design of Experiments in data bases in order to select variables that are relevant for classification in situations where a sufficient number of measurements of the explanatory variables is available, but measuring the class label is hard, e. g. expensive or time-consuming. Pumplun et al. searched for D-optimal designs in existing data sets by means of a genetic algorithm and assessed variable importance based on the found plans. If the design matrix is standardized these D-optimal plans are almost orthogonal and the explanatory variables are nearly uncorrelated. Thus Pumplun et al. expected that their importance for discrimination can be judged independently of each other. In a simulation study Pumplun et al. applied this approach in combination with five classification methods to eight data sets and the obtained error rates were compared with those resulting from variable selection on the basis of the complete data sets. Based on the D-optimal plans in some cases considerably lower error rates were achieved. Although Pumplun et al. (2005a) obtained some promising results, it was not clear for different reasons if D-optimality actually is beneficial for variable selection. For example, D-efficiency and orthogonality of the resulting plans were not investigated and a comparison with variable selection based on random samples of observations of the same size as the D-optimal plans was missing. In this paper we extend the simulation study of Pumplun et al. (2005a) in order to verify their results and as basis for further research in this field. Moreover, in Pumplun et al. D-optimal plans are only used for data preprocessing, that is variable selection. 
The classification models are estimated on the whole data set in order to assess the effects of D-optimality on variable selection separately. Since the number of measurements of the class label in fact is limited one would normally employ the same observations that were used for variable selection for learning, too. For this reason in our simulation study the appropriateness of D-optimal plans for training classification methods is additionally investigated. It turned out that in general in terms of the error rate there is no difference between variable selection on the basis of D-optimal plans and variable selection on random samples. However, for training of linear classification methods D-optimal plans seem to be beneficial.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"64 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74715638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal designs for an interference model","authors":"J. Kunert, S. Mersmann","doi":"10.17877/DE290R-504","DOIUrl":"https://doi.org/10.17877/DE290R-504","url":null,"abstract":"Kunert and Martin (2000) determined optimal and efficient block designs in a model for field trials with interference effects, for block sizes up to 4. In this paper we use Kushner's method (Kushner, 1997) of finding optimal approximate designs to extend the work of Kunert and Martin (2000) to optimal designs with five or more plots per block. We give an overall upper bound $a^*_{t,b,k}$ for the trace of the information matrix of any design and show that a universally optimal approximate design will have all its sequences from merely four different equivalence classes. We further determine the efficiency of a binary type I orthogonal array under the general p-criterion. We find that these designs achieve high efficiencies of more than 0.94.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84088788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distribution hierarchies in directed networks","authors":"Ueli Peter, T. Hrúz","doi":"10.3929/ETHZ-A-006733711","DOIUrl":"https://doi.org/10.3929/ETHZ-A-006733711","url":null,"abstract":"Recently, Ahnert and Fink [AF08] showed that some classes of directed networks are cleanly separated in the space of the clustering signature. In this work we will study the relation hierarchy among subgraph distributions in directed networks and derive how the clustering signature fits into this hierarchy. Thereby we gather a fundamental understanding of the network dynamics and build a framework for the analysis of stochastic processes.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89745362","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constructing irregular histograms by penalized likelihood","authors":"Y. Rozenholc, Thoralf Mildenberger, U. Gather","doi":"10.17877/DE290R-587","DOIUrl":"https://doi.org/10.17877/DE290R-587","url":null,"abstract":"We propose a fully automatic procedure for the construction of irregular histograms. For a given number of bins, the maximum likelihood histogram is known to be the result of a dynamic programming algorithm. To choose the number of bins, we propose two different penalties motivated by recent work in model selection by Castellan [6] and Massart [26]. We give a complete description of the algorithm and a proper tuning of the penalties. Finally, we compare our procedure to other existing proposals for a wide range of different densities and sample sizes.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79147064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernelized design of experiments","authors":"S. Rüping, C. Weihs","doi":"10.17877/DE290R-8240","DOIUrl":"https://doi.org/10.17877/DE290R-8240","url":null,"abstract":"This paper describes an approach for selecting instances in regression problems in the cases where observations x are readily available, but obtaining labels y is hard. Given a database of observations, an algorithm inspired by statistical design of experiments and kernel methods is presented that selects a set of k instances to be chosen in order to maximize the prediction performance of a support vector machine. It is shown that the algorithm significantly outperforms related approaches on a number of real-world datasets.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81177561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online algorithms with advice","authors":"Hans-Joachim Bröckenhauer, D. Komm, Rastislav Královič, Richard Královič, Tobias Mömke","doi":"10.3929/ETHZ-A-006733662","DOIUrl":"https://doi.org/10.3929/ETHZ-A-006733662","url":null,"abstract":"In online problems, the input forms a finite sequence of requests. Each request must be processed, i. e., a partial output has to be computed only depending on the requests having arrived so far, and it is not allowed to change this partial output subsequently. The aim of an online algorithm is to produce a sequence of partial outputs that optimizes some global measure. The most frequently used tool for analyzing the quality of online algorithms is the competitive analysis which compares the solution quality of an online algorithm to the optimal solution for the whole input sequence, and in fact measures the degradation in the solution quality caused by the lack of any information about the input. In this paper, we investigate to what extent the solution quality can be improved by allowing the algorithm to extract a given amount of information about the input. We consider the recently introduced notion of advice complexity where the algorithm, in addition to being fed the requests one by one, has access to a tape of advice bits that were computed by some oracle function from the complete input. The advice complexity is the number of advice bits read. We introduce an improved model of advice complexity and investigate the connections of advice complexity to the competitive ratio of both deterministic and randomized online algorithms using the paging problem, job shop scheduling, and the routing problem on a line as sample problems. Our results for all of these problems show that very small advice (only three bits in the case of paging) already suffices to significantly improve over the best deterministic algorithm. 
Moreover, to achieve the same competitive ratio as any randomized online algorithm, a logarithmic number of advice bits is sufficient. On the other hand, to obtain optimality, much larger advice is necessary.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74545431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Token-Ring for the TRM","authors":"N. Wirth","doi":"10.3929/ETHZ-A-006828754","DOIUrl":"https://doi.org/10.3929/ETHZ-A-006828754","url":null,"abstract":"With the design of the Token-Ring for the TRM (Tiny Register Machine) I pursued mainly two aims. The first is to design a network connecting several TRM cores. The second is to go for a design that is as simple as possible, considering that the TRM project is oriented towards educational hard- and software. Featuring a ring architecture, it provides a welcome alternative to the already existing bus architecture implemented by Ling Liu, allowing to compare complexity and performance.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73234434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Separation, abstraction, multiple inheritance and view shifting","authors":"S. Staden","doi":"10.3929/ETHZ-A-006836686","DOIUrl":"https://doi.org/10.3929/ETHZ-A-006836686","url":null,"abstract":"Inheritance is a central mechanism in object-oriented programming. Many popular object-oriented languages support multiple inheritance or limited versions thereof. This work extends a powerful modular proof system for single inheritance, which uses separation logic and abstract predicate families, to multiple inheritance. The extended system allows view shifting in the logic: the ability to view an object under different abstractions and to shift between such views. Several examples illustrate the system’s use and utility.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77988599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Specifying Access Control in Event-B","authors":"Son Hoang","doi":"10.3929/ETHZ-A-006733720","DOIUrl":"https://doi.org/10.3929/ETHZ-A-006733720","url":null,"abstract":"We investigate the idea of developing access control systems in Event-B by specifying separately the \"insecure\" target system and the security authorisation, then combining them together in order to construct a secure system. This is based on the work by Basin et al. [6] where the chosen language is CSP-OZ. Moreover, in order to verify the secure system against some safety temporal properties, we propose an approach of constructing several abstract models corresponding to these properties, and using refinement to prove that the final system satisfies these properties.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79938607","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Formalization of JML in the Coq Proof System","authors":"Andreas Kägi, Hermann Lehner, Peter Müller","doi":"10.3929/ETHZ-A-006903145","DOIUrl":"https://doi.org/10.3929/ETHZ-A-006903145","url":null,"abstract":"JML is a complex specification language for Java. Its large scale and manifold features make it hard to precisely define its semantics in a reference manual. It is thus desirable to formally specify the syntax and semantics of JML. There are many good reasons for a formalized semantics of JML in a theorem prover: It can be used to develop a sound verification condition generator for JML constructs. By formally defining the semantics in a theorem prover, we can detect and eliminate ambiguities in the language. When using the semantics with an operational semantics for Java source code, we can define a runtime assertion checker and prove its soundness with respect to the semantics in Coq. We divide the problem of defining JML in Coq into several steps. Firstly, we define a basic JML subset that has the full expressiveness of JML, but without syntactic sugar. We define the semantics for this subset in Coq. We introduce an extended (full) JML Syntax and a syntactic rewriting function from the extended syntax into the basic syntax. Finally, we built a translation frontend that transforms a JML-annotated Java program into its equivalent in Coq. We managed to define the full JML and Java syntax in Coq, minus some very rare and not clearly described concepts and minus everything related to floating point numbers. We implemented a lightweight translation frontend in Java. We defined a large set of rewritings that simplify the syntax of JML without losing any precision. We then defined the semantics of the desugared JML, using Bicolano as a basis for the semantic domain. 
Finally, we conducted a case study evaluating the feasibility of proving on top of the formalisation.","PeriodicalId":10841,"journal":{"name":"CTIT technical reports series","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2009-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83195870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}